AMD releases the first CDNA compute card: the Instinct MI100

Source: IO Tech added 20th Nov 2020


AMD’s new Instinct MI100 is the first compute card to exceed 10 TFLOPS of FP64 performance

AMD has today released its long-awaited first dedicated compute GPU, codenamed Arcturus. According to the company, the compute card, released under the name AMD Instinct MI100, is the fastest in the world and at the same time the first HPC-class GPU to exceed 10 teraFLOPS of FP64 performance.

The Instinct MI100, based on AMD’s CDNA architecture, is manufactured on TSMC’s 7-nanometer process, but the company did not disclose, for example, how many transistors it contains. The CDNA architecture itself builds on the further-developed foundations of the GCN architecture, but much has also changed.

The MI100 has 120 Compute Units divided into four Compute Engine blocks. Alongside the traditional scalar and vector units, each CU contains a Matrix Core Engine designed to accelerate matrix calculations. The MCE units execute Matrix Fused Multiply-Add (MFMA) operations on KxN matrices with INT8, FP16, BF16, and FP32 precision inputs. The result of an MFMA operation is accumulated at either INT32 or FP32 precision.
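The mixed-precision semantics described above can be sketched in NumPy: low-precision matrix inputs multiplied and accumulated into a wider result type. The tile shapes here are illustrative only, not the hardware's actual MFMA tile sizes.

```python
import numpy as np

# Sketch of MFMA-style semantics: D = A @ B + C, with FP16 inputs
# and a wider FP32 accumulator. Shapes are illustrative, not the
# hardware tile sizes.
def mfma(a_fp16: np.ndarray, b_fp16: np.ndarray, c_fp32: np.ndarray) -> np.ndarray:
    # Widen the inputs before multiplying to model FP32 accumulation
    return a_fp16.astype(np.float32) @ b_fp16.astype(np.float32) + c_fp32

a = np.ones((16, 16), dtype=np.float16)
b = np.ones((16, 16), dtype=np.float16)
c = np.zeros((16, 16), dtype=np.float32)
d = mfma(a, b, c)
print(d.dtype, d[0, 0])  # float32 16.0
```

The point of the wider accumulator is that summing many small FP16 products in FP16 would quickly lose precision; accumulating in FP32 avoids that while keeping the input bandwidth low.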

The theoretical FP32 performance of the MI100 is 23.1 TFLOPS and its FP64 performance 11.5 TFLOPS. For matrix calculations, the theoretical peak is 46.1 TFLOPS at FP32 precision, 184.6 TFLOPS at FP16 precision, and likewise 184.6 TOPS for INT4 and INT8 operations. At bfloat16 precision the theoretical peak is 92.3 TFLOPS.
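These figures can be cross-checked with a quick calculation. The ~1,502 MHz boost clock and 64 lanes per CU are not stated in the article; they come from AMD's published MI100 specifications, so treat this as a plausibility check rather than an official derivation.

```python
# Plausibility check of the quoted throughput figures.
# Assumed (not in the article): ~1.502 GHz boost clock, 64 lanes per CU.
cus, lanes, clock_ghz = 120, 64, 1.502

fp32_vector = cus * lanes * 2 * clock_ghz / 1000  # TFLOPS; FMA = 2 ops
fp64_vector = fp32_vector / 2                     # FP64 runs at half rate
fp32_matrix = fp32_vector * 2                     # MFMA doubles the FP32 rate
fp16_matrix = fp32_vector * 8
bf16_matrix = fp32_vector * 4

print(round(fp32_vector, 1))  # 23.1
print(round(fp64_vector, 1))  # 11.5
print(round(fp16_matrix, 1))  # 184.6
print(round(bf16_matrix, 1))  # 92.3
```

The results match the article's 23.1 / 11.5 / 184.6 / 92.3 TFLOPS figures, which also confirms the 120-CU count.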

The compute units are backed by 8 megabytes of L2 cache, which is said to offer a combined bandwidth of up to 6 TB per second. The 4096-bit memory controller supports both 4- and 8-stack HBM2 memories at 2.4 Gbps, giving a total memory bandwidth of 1.23 TB/s and 32 gigabytes of memory. The card's TDP is 300 watts.
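The 1.23 TB/s figure follows directly from the bus width and pin rate quoted in the text:

```python
# Memory-bandwidth arithmetic from the figures in the text:
# a 4096-bit interface running at 2.4 Gbps per pin.
bus_bits = 4096
pin_rate_gbps = 2.4

bandwidth_gbs = bus_bits * pin_rate_gbps / 8  # bits -> bytes
print(bandwidth_gbs)  # 1228.8 GB/s, i.e. ~1.23 TB/s
```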

The Instinct MI100 also supports second-generation Infinity Fabric links between compute cards, allowing up to four GPUs to be bridged into the same group. Each GPU has three IF links, giving a group of four MI100 accelerators a theoretical aggregate P2P bandwidth of 552 GB/s. The accelerators connect to the processor over a PCI Express 4.0 bus.
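One way to read the 552 GB/s aggregate is as the sum over all link endpoints in the four-GPU group. This per-link figure is an inference from the article's numbers, not something AMD states here:

```python
# Inferred per-link bandwidth, assuming the 552 GB/s aggregate is the
# simple sum over all IF links in a four-GPU group (an assumption).
gpus, links_per_gpu, aggregate_gbs = 4, 3, 552

per_link_gbs = aggregate_gbs / (gpus * links_per_gpu)
print(per_link_gbs)  # 46.0 GB/s per link
```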

Along with the new compute cards, a new open-source ROCm 4.0 release was published. The ROCm package includes a variety of tools for developers’ needs, from compilers to interfaces and ready-made libraries. The new open-source compiler in ROCm 4.0 supports both the OpenMP 5.0 and HIP interfaces.

According to AMD, ready-made server configurations with Instinct MI100 accelerators are promised from at least Dell, Gigabyte, Hewlett Packard Enterprise, and Supermicro.

Source: AMD

Read the full article at IO Tech

brands: AMD  Dell  Gigabyte  Infinity  
media: IO Tech  
keywords: Memory  Open Source  Server  
