Wide adoption of high-performance computing (HPC) for scientific research in recent years has enabled scientific progress at a pace never seen before. Supercomputers can simulate various processes much faster than scientists can experiment in real life, and they can replicate processes that are impossible to experiment with. But simulations are not the only efficient method for scientific research, and they can be greatly enhanced with artificial intelligence and deep learning. That’s why the High-Performance Computing Center Stuttgart plans to upgrade its CPU-only Hawk supercomputer with Nvidia’s A100 GPUs in the coming months.
Deep Learning to Assist Simulations
For many years, scientists have used simulations to research areas like aerodynamics, climate modeling, computational fluid dynamics, and molecular dynamics. Simulation algorithms developed by researchers are based on fundamental scientific principles and are very accurate. That accuracy requires thousands of CPU cores as well as the double-precision floating-point format (FP64), which often means considerable programming effort to run efficiently, long computing times, and the generation of massive amounts of data. More recently, CPUs and GPUs have gained significant artificial intelligence (AI) and deep learning (DL) capabilities, which is why researchers have started to use AI and DL in their work.
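To illustrate why double precision matters for such codes, here is a minimal Python sketch (purely illustrative, not taken from any HLRS code) that accumulates the same sum in FP32 and FP64 and compares the rounding error:

```python
import numpy as np

# Sum 100,000 copies of 0.1; the exact answer is 10,000.
values = np.full(100_000, 0.1)

# Naive left-to-right accumulation in single and double precision.
fp32_total = np.float32(0.0)
fp64_total = np.float64(0.0)
for v in values:
    fp32_total += np.float32(v)
    fp64_total += np.float64(v)

print(f"FP32: {fp32_total}")  # drifts visibly away from 10,000
print(f"FP64: {fp64_total}")  # correct to roughly 12 significant digits
```

Over the billions of operations in a real simulation, that kind of drift is exactly what FP64 keeps in check.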
AI and DL algorithms do not offer the precision of simulations and are not meant to replace them. What they can do is rapidly identify patterns in large datasets and then build a computational model that approximates the actual behavior. In some cases, AI and DL can weed out scientifically accurate simulations that would lead nowhere, which greatly speeds up research. In fact, many scientists believe that combining AI/DL and simulations is the future of supercomputing.
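As a toy illustration of that surrogate-modeling idea (a sketch only, with a made-up expensive_simulation function standing in for a real solver), a small neural network can be trained on a limited number of accurate runs and then used to screen a large number of candidate inputs cheaply:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def expensive_simulation(x):
    """Stand-in for a costly FP64 simulation (here just an analytic function)."""
    return np.sin(3 * x[:, 0]) * np.exp(-x[:, 1] ** 2)

# 1. Run the "simulation" on a modest sample of inputs to build a training set.
X_train = rng.uniform(-1, 1, size=(500, 2))
y_train = expensive_simulation(X_train)

# 2. Fit a small neural-network surrogate that approximates the simulation.
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
surrogate.fit(X_train, y_train)

# 3. Use the cheap surrogate to screen a large batch of candidate inputs,
#    reserving full-precision simulation for the most promising ones.
X_candidates = rng.uniform(-1, 1, size=(100_000, 2))
scores = surrogate.predict(X_candidates)
top_candidates = X_candidates[np.argsort(scores)[-10:]]
print("Candidates worth simulating in full precision:\n", top_candidates)
```

The surrogate is only an approximation, but it is orders of magnitude cheaper to evaluate, which is precisely the trade-off described above.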
HLRS’s Hawk to Get GPUs
Like other academic research centers, the High-Performance Computing Center Stuttgart (HLRS) has relied on HPC simulations performed by CPU-based machines. This year it deployed Hawk, an AMD EPYC 7742-based supercomputer with 698,880 cores and 1,397,760 GB of memory that achieves 19,334 TFLOPS of Rmax Linpack performance. In addition, HLRS has been studying GPU-based AI systems since 2019, when it deployed a Cray CS-Storm system featuring 60 Nvidia GPUs.
Apparently, HLRS’s findings were quite promising: this week HLRS, Hewlett Packard Enterprise, and Nvidia announced that HLRS would upgrade its primary Hawk supercomputer with 24 HPE Apollo 6500 Gen10 Plus systems housing a total of 192 Nvidia A100 GPUs.
“At HLRS our mission has always been to provide systems that address the most important needs of our key user community, which is largely focused on computational engineering,” explained HLRS Director Dr.-Ing. Michael Resch. “For many years this has meant basing our flagship systems on CPUs to support codes used in computationally intensive simulation. Recently, however, we have seen growing interest in deep learning and artificial intelligence, which run much more efficiently on GPUs. Adding this second key type of processor to Hawk’s architecture will improve our ability to support scientists in academia and industry who are working at the forefront of computational research.”
The key part of the HLRS announcement is that the 24 HPE Apollo 6500 Gen10 Plus systems will be integrated with the Hawk supercomputer to build a hybrid machine with one filesystem that can run AI/DL algorithms on Nvidia GPUs and traditional high-precision simulations on AMD’s 64-core CPUs.
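Conceptually, the kind of coupled workflow such a hybrid machine enables looks something like the sketch below, where an accurate simulation step runs on CPU cores while a surrogate model trains and predicts on a GPU within the same job. This is a generic PyTorch sketch under assumed names such as cpu_simulation; it does not reflect HLRS’s actual software stack.

```python
import numpy as np
import torch

# Generic sketch of a coupled CPU-simulation / GPU-AI loop; all names are
# illustrative and nothing here reflects HLRS's actual software stack.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

surrogate = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1)
).to(device)
optimizer = torch.optim.Adam(surrogate.parameters(), lr=1e-3)

def cpu_simulation(params):
    """Placeholder for an accurate FP64 simulation step running on CPU cores."""
    return np.sin(3 * params[0]) * np.exp(-params[1] ** 2)

rng = np.random.default_rng(0)
params = np.array([0.1, 0.2])
for step in range(50):
    # 1. Run the accurate simulation on the CPU.
    result = cpu_simulation(params)

    # 2. Immediately train the surrogate on the new sample, on the GPU if present.
    x = torch.tensor(params, dtype=torch.float32, device=device).unsqueeze(0)
    y = torch.tensor([[result]], dtype=torch.float32, device=device)
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(surrogate(x), y)
    loss.backward()
    optimizer.step()

    # 3. Let the surrogate score candidate parameters and pick the next run,
    #    feeding AI results straight back into the simulation loop.
    candidates = params + rng.normal(scale=0.1, size=(32, 2))
    with torch.no_grad():
        scores = surrogate(torch.tensor(candidates, dtype=torch.float32, device=device))
    params = candidates[int(scores.argmax())]
```

Without shared hardware and a shared filesystem, each of those steps would typically run as a separate job on a separate system, with data shuttled between them.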
“Once Nvidia GPUs are integrated into Hawk, hybrid workflows combining HPC and AI will become much more efficient,” said Dennis Hoppe, who leads artificial intelligence operations at HLRS. “Losses of time that occur because of data transfer and the need to run different parts of the workflows in separate stages will practically disappear. Users will be able to stay on the computing cores they are using, run an AI algorithm, and integrate the results immediately.”
In addition, users of HLRS’s Hawk will be able to access Nvidia’s NGC catalog of GPU-optimized software for AI, DL, and HPC workloads, which should go a long way toward popularizing GPU usage among HLRS’s clients.
Tomorrow’s Supercomputers Today
HPC-enabled simulations are not going anywhere: when one needs to develop a real product or model something very precisely, there is currently no substitute for FP64 computations. GPUs like Nvidia’s A100 or AMD’s Instinct MI100 are also designed to run HPC workloads at FP64 precision, but since many existing algorithms are CPU-optimized, GPUs are not going to fully replace general-purpose processors any time soon, especially in scientific research.
But using AI and DL to assist or advance HPC simulations is about to become a new trend in the supercomputing world. Hawk will be one of Europe’s first hybrid supercomputers, but eventually such systems will be considerably more widespread.