Interview: Whamcloud Wins FastForward Contract for Exascale R&D


Today Whamcloud announced that the company has been awarded the Storage and I/O Research & Development subcontract for the Department of Energy’s FastForward program. FastForward is set up to initiate partnerships with multiple companies to accelerate the R&D of critical technologies needed for extreme scale computing. To learn more, I caught up with Eric Barton, Whamcloud’s CTO.

insideHPC: Many DOE applications place extreme requirements on computations, data movement, and reliability. What aspects will Whamcloud focus on in this contract?

Eric Barton: All of the above. We’re researching a completely new I/O stack suitable for Exascale.

At the top of the stack we’re building an object-oriented storage API based on HDF5 to support high-level data models, their properties and relationships. This will use non-blocking initiation and completion notification APIs to ensure application developers can overlap compute and I/O naturally and efficiently. The API will also allow distributed updates to be grouped into atomic transactions to ensure that application data and metadata stored in the Exascale storage system remain self-consistent in the face of all possible failures.
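Editor's illustration (not part of the interview): the idea of non-blocking initiation with completion notification can be sketched in plain Python. This is only a minimal sketch using the standard library's `concurrent.futures`; it is not the HDF5-based API Barton describes, and the names `write_checkpoint`, `compute_step` and `run` are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_checkpoint(path, payload):
    """Stand-in for a slow storage write; returns the number of bytes written."""
    with open(path, "wb") as f:
        return f.write(payload)


def compute_step(step):
    """Stand-in for one timestep of application compute."""
    return sum(i * i for i in range(step * 1000))


def run(steps=3):
    results = []    # values computed at each step
    completed = []  # (step, bytes_written) recorded by completion callbacks
    tmpdir = tempfile.mkdtemp()
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None
        for step in range(1, steps + 1):
            # Compute runs while the previous step's write is still in flight.
            value = compute_step(step)
            results.append(value)
            if pending is not None:
                pending.result()  # keep at most one write outstanding
            path = os.path.join(tmpdir, f"ckpt_{step}.bin")
            # Initiation returns immediately with a future.
            pending = pool.submit(write_checkpoint, path, str(value).encode())
            # Completion notification: the callback fires when the write ends.
            pending.add_done_callback(
                lambda fut, s=step: completed.append((s, fut.result()))
            )
    # Leaving the "with" block waits for the worker, so all callbacks have run.
    return results, completed
```

Calling `run(3)` returns the computed values together with one completion record per checkpoint; the application never blocks on a write except to bound the number in flight.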

In the middle of the I/O stack, we’re prototyping a Burst Buffer using persistent solid-state storage accessed using OS bypass technology and a data layout optimizer based on PLFS. This part of the stack, running on dedicated I/O nodes of the Exascale machine, will handle the impedance mismatch between the smooth streaming I/O required for efficient disk utilization and the bursty, fragmented and misaligned I/O that Exascale applications will produce.
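Editor's illustration (not part of the interview): the "impedance mismatch" can be made concrete with a toy buffer that absorbs small, misaligned writes and emits whole, aligned stripes. This is a rough sketch under simplifying assumptions: everything fits in memory, gaps inside a touched stripe are zero-filled rather than read back from disk, and the function name `coalesce` and the `stripe` parameter are hypothetical. It does not reflect how PLFS or the prototype Burst Buffer actually work.

```python
def coalesce(writes, stripe=8):
    """Turn fragmented (offset, data) writes into aligned, full-size stripes.

    Writes are applied in arrival order, so a later write wins on overlap.
    Only stripes touched by at least one write are emitted.
    """
    writes = [(off, data) for off, data in writes if data]  # drop empty writes
    if not writes:
        return []
    end = max(off + len(data) for off, data in writes)
    size = -(-end // stripe) * stripe      # round up to a stripe boundary
    image = bytearray(size)                # assumption: untouched bytes are zero
    touched = set()
    for off, data in writes:
        image[off:off + len(data)] = data
        first = off // stripe
        last = (off + len(data) - 1) // stripe
        touched.update(range(first, last + 1))
    # Emit stripes in ascending offset order, i.e. as a sequential stream.
    return [(s * stripe, bytes(image[s * stripe:(s + 1) * stripe]))
            for s in sorted(touched)]
```

For example, `coalesce([(5, b"world"), (0, b"hello")], stripe=4)` takes two out-of-order, misaligned fragments and returns three aligned 4-byte stripes at offsets 0, 4 and 8.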

At the bottom of the stack we’re designing a new scalable I/O API to replace POSIX for distributed applications. Called DAOS, for Distributed Application Object Storage, this API will support asynchronous transactional I/O within scalable object collections. This will provide the functionality, performance, scalability and fault tolerance foundational to the whole Exascale I/O stack.
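Editor's illustration (not part of the interview): the transactional side of this design, where a group of updates becomes visible all at once or not at all, can be sketched with an in-memory object collection. This is a single-process toy and deliberately leaves out the asynchronous and distributed parts; `ObjectCollection` and `Transaction` are hypothetical names and are not the DAOS API.

```python
class ObjectCollection:
    """Toy key/value object collection held in memory."""

    def __init__(self):
        self._objects = {}

    def read(self, key):
        return self._objects.get(key)

    def transaction(self):
        return Transaction(self)


class Transaction:
    """Stages writes and applies them together on a clean exit."""

    def __init__(self, collection):
        self._collection = collection
        self._staged = {}

    def write(self, key, value):
        # Nothing is visible to readers until the transaction commits.
        self._staged[key] = value

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # Commit: all staged updates become visible together.
            self._collection._objects.update(self._staged)
        # On an exception the staged updates are simply discarded.
        self._staged.clear()
        return False  # never swallow the caller's exception
```

A block such as `with store.transaction() as tx: tx.write("data", b"..."); tx.write("meta", {...})` either publishes both objects or neither, which is the self-consistency property the interview refers to.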


Read more at insideHPC