The Oak Ridge Leadership Computing Facility (OLCF) has announced the first details about the Orion storage subsystem of its upcoming Frontier exascale supercomputer, set to go online in late 2021. As the industry’s first 1.5 ExaFLOPS supercomputer, Frontier will need a very fast storage subsystem, and it looks set to get one: up to 700 Petabytes of storage, 75 TB/s of throughput, and 15 billion IOPS (yes, billion) on tap.
“To the best of our knowledge, Orion will be the largest and fastest single file POSIX namespace file system in the world,” said Sarp Oral, Input/Output Working Group lead for Frontier at OLCF.
The Frontier supercomputer will actually have two storage sub-systems: an in-system storage layer with massive sequential-read performance of over 75TB/s and around 15 billion random-read IOPS, as well as a center-wide file system called Orion that offers a whopping 700PB of capacity.
The Orion Global File Storage System Layer: 700PB Capacity at 10TB/s
Since Frontier is based on HPE’s Cray Shasta architecture, its global file storage system will largely rely on the ClusterStor multi-tier architecture, which uses both PCIe 4.0 NVMe solid-state drives and traditional hard disk drives.
The Cray ClusterStor machines use AMD EPYC processors and can automatically align data flows with the workload, shifting I/O operations between different tiers of storage as needed. Such shifting makes applications behave as though they were accessing high-performance all-flash arrays, thus maximizing performance.
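To make the idea concrete, here is a minimal Python sketch of what policy-driven tiering looks like in principle; the class, thresholds, and LRU-style demotion below are invented for illustration and are not ClusterStor’s actual data-movement algorithm.

```python
# Toy illustration of tiered placement: new writes land on flash, and the
# coldest files are demoted to disk once flash exceeds its budget.
# All names and thresholds are made up for illustration purposes.
import time

FLASH_BUDGET_TB = 11_500  # roughly the size of Orion's NVMe tier (11.5PB)

class TieredStore:
    def __init__(self):
        self.flash = {}  # filename -> (size_tb, last_access_time)
        self.disk = {}

    def write(self, name, size_tb):
        # New data always lands on the flash tier first.
        self.flash[name] = (size_tb, time.time())
        self._demote_cold_files()

    def _demote_cold_files(self):
        used = sum(size for size, _ in self.flash.values())
        # Demote least-recently-used files until flash is back under budget.
        for name, (size, _) in sorted(self.flash.items(), key=lambda kv: kv[1][1]):
            if used <= FLASH_BUDGET_TB:
                break
            self.disk[name] = self.flash.pop(name)
            used -= size
```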
On the software side, Orion will use the open-source Lustre parallel file system (used by plenty of supercomputers worldwide, including OLCF’s Titan and Jaguar) with ZFS as its back-end volume manager.
In general, Frontier’s center-wide Orion will have three tiers:
- A metadata tier comprising 480 NVMe SSDs with 10PB of capacity.
- An NVMe storage tier that uses 5,400 SSDs providing 11.5PB of capacity, peak read-write speeds of 10TB/s, and over 2 million random-read input/output operations per second (IOPS).
- An HDD storage tier based on 47,700 PMR hard drives offering 679PB of capacity, a peak read speed of 5.5TB/s, a peak write speed of 4.6TB/s, and over 2 million random-read IOPS.
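A quick back-of-the-envelope check shows how those tiers add up to the headline capacity; decimal units are assumed, and the implied per-device sizes are our own estimates, not OLCF figures.

```python
# Orion tier capacities as quoted above, in petabytes
metadata_pb = 10.0    # 480 NVMe SSDs
flash_pb = 11.5       # 5,400 NVMe SSDs
disk_pb = 679.0       # 47,700 PMR hard drives

print(f"Total: {metadata_pb + flash_pb + disk_pb:.1f} PB")      # ~700 PB

# Implied per-device capacities (decimal TB, rounded)
print(f"Flash SSD: ~{flash_pb * 1000 / 5_400:.1f} TB each")     # ~2.1 TB
print(f"Capacity HDD: ~{disk_pb * 1000 / 47_700:.1f} TB each")  # ~14.2 TB
```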
OLCF says that Orion will have 40 Lustre metadata server nodes and 450 Lustre object storage service (OSS) nodes. Each OSS node will provide one object storage target (OST) device for performance and two OST devices for capacity, for a total of 1,350 OSTs systemwide. In addition, Orion will employ 160 routing nodes that will make peak read-write speeds of 3.2TB/s available to other OLCF resources and platforms.
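Those node counts line up with the quoted OST total, and they also give a rough sense of the per-server bandwidth involved; the even split assumed below is our simplification, not an OLCF figure.

```python
oss_nodes = 450
osts_per_oss = 1 + 2              # one performance OST + two capacity OSTs per OSS node
print(oss_nodes * osts_per_oss)   # 1,350 OSTs systemwide

# Rough per-OSS share of the NVMe tier's 10TB/s peak, assuming an even split
print(f"~{10_000 / oss_nodes:.0f} GB/s per OSS node")   # ~22 GB/s
```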
“Orion is pushing the envelope of what is possible technically due to its extreme scale and hard disk/NVMe hybrid nature,” said Dustin Leverman, leader of the OLCF’s High-Performance Computing Storage and Archive Group. “This is a complex system, but our experience and best practices will help us create a resource that allows our users to push science boundaries using Frontier.”
The In-System Storage Layer: Up to 75TB/s at 15 Billion Read IOPS
Frontier’s in-system storage layer comprises SSDs installed directly into the compute nodes and connected to AMD’s EPYC processors over a PCIe Gen 4 interface. These NVMe drives will offer aggregate performance of over 75TB/s of read speed, over 35TB/s of write speed, and over 15 billion random-read IOPS.
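Spread across the machine, those aggregate numbers imply fairly modest per-node figures. The sketch below assumes a node count on the order of 9,400, which is not stated in this article and should be treated as an assumption.

```python
nodes = 9_400                       # assumed node count, not an OLCF figure
read_tb_s, write_tb_s = 75, 35      # aggregate figures quoted above

print(f"~{read_tb_s * 1_000 / nodes:.1f} GB/s read per node")    # ~8 GB/s
print(f"~{write_tb_s * 1_000 / nodes:.1f} GB/s write per node")  # ~3.7 GB/s
# Both fit within what a couple of node-local PCIe 4.0 NVMe drives can deliver.
```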
The OLCF did not disclose the capacity of the in-system storage layer, but this is just node-local storage, so do not expect tens of petabytes here.
Summary
Overall, the in-system storage layer provides Frontier with a whopping 75TB/s of read performance, whereas the center-wide Orion offers a capacity of around 700PB. Together, this dual-layer, multi-tier storage subsystem provides just what a 1.5 EFLOPS machine with roughly 20MW of power consumption needs: unbeatable storage performance to feed data to the CPUs and GPUs, and the capacity to store the large datasets the supercomputer is built to process.