Burst buffer

In high-performance computing environments, a burst buffer is a fast, intermediate storage layer positioned between the front-end computing processes and the back-end storage systems. It emerged as a storage solution to bridge the ever-increasing performance gap between the processing speed of the compute nodes and the input/output (I/O) bandwidth of the storage systems. [1] A burst buffer is built from high-performance storage devices, such as NVRAM and SSD. It typically offers substantially higher I/O bandwidth than the back-end storage systems.

Use cases

The emergence of burst buffers has fostered a wide variety of solutions that accelerate scientific data movement on supercomputers. For example, the life cycle of a scientific application typically alternates between computation phases and I/O phases. [2] Namely, after every round of computation, all the computing processes concurrently write their intermediate data to the back-end storage systems. With a burst buffer deployed, processes can quickly write their data to the burst buffer after a round of computation and immediately proceed to the next round; the data are then asynchronously flushed from the burst buffer to the storage systems while the next round of computation is running. In this way, the I/O time seen by the application is shortened because computation and data flushing overlap. [3] [4] In addition, buffering data in the burst buffer gives applications plenty of opportunities to reshape the data traffic to the back-end storage systems for efficient use of their bandwidth. [5] [6] In another common use case, scientific applications stage their intermediate data in and out of the burst buffer without interacting with the slower storage systems. Bypassing the storage systems allows applications to realize most of the performance benefit of the burst buffer. [7]
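The overlap between computation and flushing described above can be illustrated with a minimal Python sketch. It is not taken from any burst buffer product: the function `run_with_burst_buffer`, the in-memory queue standing in for the fast storage tier, and the list standing in for the back-end storage are all hypothetical and chosen only for illustration.

```python
import queue
import threading


def run_with_burst_buffer(rounds, compute, flush):
    """Overlap each round's flush with the following rounds of computation."""
    burst_buffer = queue.Queue()  # hypothetical stand-in for fast NVRAM/SSD storage
    backend = []                  # hypothetical stand-in for the slow back-end storage

    def drain():
        # Background flusher: moves buffered data to the back end in FIFO order.
        while True:
            item = burst_buffer.get()
            if item is None:      # sentinel: no more data will arrive
                break
            backend.append(flush(item))

    flusher = threading.Thread(target=drain)
    flusher.start()
    for step in range(rounds):
        data = compute(step)      # computation phase
        burst_buffer.put(data)    # fast write; does not wait for the back end
    burst_buffer.put(None)
    flusher.join()                # wait until all data has reached the back end
    return backend


checkpoints = run_with_burst_buffer(
    3,
    compute=lambda step: step * step,
    flush=lambda data: ("stored", data),
)
```

The compute loop only ever blocks on the fast `put` call, while the slow `flush` runs in the background thread. A real system would also have to bound the buffer's capacity and handle flush failures, which this sketch leaves out.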

Representative burst buffer architectures

There are two representative burst buffer architectures in high-performance computing environments: node-local burst buffer and remote shared burst buffer. In the node-local burst buffer architecture, burst buffer storage is located on the individual compute nodes, so the aggregate burst buffer bandwidth grows linearly with the compute node count. This scalability benefit has been well documented in recent literature. [8] [9] [10] [11] It also requires a scalable metadata management strategy to maintain a global namespace for data distributed across all the burst buffers. [12] [13] In the remote shared burst buffer architecture, burst buffer storage resides on a smaller number of I/O nodes positioned between the compute nodes and the back-end storage systems. Data movement between the compute nodes and the burst buffer must go through the network. Placing the burst buffer on the I/O nodes facilitates the independent development, deployment, and maintenance of the burst buffer service. Hence, several well-known commercial software products have been developed to manage this type of burst buffer, such as DataWarp and Infinite Memory Engine.
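The scaling contrast between the two architectures can be made concrete with a small Python sketch. The bandwidth figures and function names are hypothetical, and the hash-based `metadata_owner` function is only one simple way to spread a global namespace across node-local buffers; it is an assumption for illustration, not the method of any cited system.

```python
import hashlib


def node_local_bandwidth(compute_nodes, per_node_gbps):
    # Each compute node contributes its own device, so bandwidth scales linearly.
    return compute_nodes * per_node_gbps


def shared_bandwidth(io_nodes, per_io_node_gbps):
    # Bandwidth depends on the number of I/O nodes, not on the compute node count.
    return io_nodes * per_io_node_gbps


def metadata_owner(path, compute_nodes):
    # Assumed scheme: hash the file path to pick the node that tracks its metadata.
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % compute_nodes


small = node_local_bandwidth(1000, 2)   # hypothetical: 1000 nodes at 2 GB/s each
large = node_local_bandwidth(2000, 2)   # doubling the node count doubles bandwidth
shared = shared_bandwidth(50, 20)       # hypothetical: 50 I/O nodes at 20 GB/s each
owner = metadata_owner("/ckpt/step1.dat", 16)
```

With these assumed numbers the node-local aggregate grows from 2000 to 4000 GB/s as the machine doubles in size, while the shared aggregate stays at 1000 GB/s until more I/O nodes are added. Because the same path always hashes to the same node, any process can locate a file's metadata without a central lookup.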

Supercomputers deployed with burst buffer

Due to their importance, burst buffers have been widely deployed on leadership-scale supercomputers. For example, node-local burst buffers have been installed on the DASH supercomputer at the San Diego Supercomputer Center, [14] the Tsubame supercomputers at Tokyo Institute of Technology, the Theta and Aurora supercomputers at Argonne National Laboratory, the Summit supercomputer at Oak Ridge National Laboratory, and the Sierra supercomputer at Lawrence Livermore National Laboratory. Remote shared burst buffers have been adopted by the Tianhe-2 supercomputer at the National Supercomputer Center in Guangzhou, the Trinity supercomputer at Los Alamos National Laboratory, and the Cori supercomputer at Lawrence Berkeley National Laboratory.

References

  1. “On the Role of Burst Buffers in Leadership-Class Storage Systems” (PDF). IEEE. April 2012.
  2. “A Case of System-Wide Power Management for Scientific Applications” (PDF). IEEE. September 2013.
  3. “Jitter-Free Co-Processing on a Prototype Exascale Storage Stack” (PDF). IEEE. April 2012.
  4. “BurstMem: A High-Performance Burst Buffer System for Scientific Applications” (PDF). IEEE. October 2014.
  5. “TRIO: Burst Buffer Based I/O Orchestration” (PDF). IEEE. September 2015.
  6. “Leveraging Burst Buffer Coordination to Prevent I/O Interference” (PDF). IEEE. March 2017.
  7. “An Ephemeral Burst-Buffer File System for Scientific Applications” (PDF). IEEE. November 2016.
  8. “BurstFS: A Distributed Burst Buffer File System for Scientific Applications” (PDF). November 2015.
  9. “Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System” (PDF). ACM. November 2010.
  10. “A 1 PB/s File System to Checkpoint Three Million MPI Tasks” (PDF). ACM. June 2013.
  11. “FusionFS: Toward supporting data-intensive scientific applications on extreme-scale high-performance computing systems” (PDF). IEEE. October 2014.
  12. “MetaKV: A Key-Value Store for Metadata Management of Distributed Burst Buffers” (PDF). IEEE. May 2017.
  13. “ZHT: A Light-Weight Reliable Persistent Dynamic Scalable Zero-Hop Distributed Hash Table” (PDF). IEEE. May 2013.
  14. “DASH: A Recipe for a Flash-based Data Intensive Supercomputer” (PDF). ACM. November 2010.