Cerebras Second-Gen Wafer Scale Chip: 2.6 Trillion 7nm Transistors, 850,000 Cores, 15kW of Power

(Image credit: Cerebras)

Cerebras, the company behind the Wafer Scale Engine (WSE), the world’s largest single processor, shared more details about its new WSE-2 today at the Linley Spring Processor Conference. The WSE-2 is a 7nm update to the original Cerebras chip and is designed to tackle AI workloads with 850,000 cores at its disposal. Cerebras claims that this chip, housed in a single 26-inch-tall unit, replaces clusters of hundreds or even thousands of GPUs spread across dozens of server racks that consume hundreds of kilowatts of power.

The new WSE-2 wields 850,000 AI-optimized cores spread over 46,225 mm2 of silicon (roughly 8.5 × 8.5 inches) packed with 2.6 trillion transistors. Cerebras also revealed today that the second-gen chip has 40 GB of on-chip SRAM, 20 petabytes per second of memory bandwidth, and 220 petabits per second of aggregate fabric bandwidth. The company also revealed that the chip consumes the same 15kW of power as its predecessor but provides twice the performance, the benefit of moving from the previous-gen chip's 16nm node to the denser 7nm node.

Cerebras Wafer Scale Engine 2 (WSE-2) Specifications

| | Cerebras Wafer Scale Engine 2 | Cerebras Wafer Scale Engine | Nvidia A100 |
| --- | --- | --- | --- |
| Process Node | TSMC 7nm | TSMC 16nm | TSMC 7nm (N7) |
| AI Cores | 850,000 | 400,000 | 6,912 CUDA + 432 Tensor |
| Die Size | 46,225 mm2 | 46,225 mm2 | 826 mm2 |
| Transistors | 2.6 Trillion | 1.2 Trillion | 54 Billion |
| On-Chip SRAM | 40 GB | 18 GB | 40 MB |
| Memory Bandwidth | 20 PB/s | 9 PB/s | 1,555 GB/s |
| Fabric Bandwidth | 220 Pb/s | 100 Pb/s | 600 GB/s |
| Power (System / Chip) | 20kW / 15kW | 20kW / 15kW | 250W (PCIe) / 400W (SXM) |
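For a quick sanity check of the generation-over-generation claims, here is a back-of-the-envelope sketch in Python that uses only the figures from the table above to compute the scaling factors and the resulting transistor density on the wafer.

```python
# Back-of-the-envelope scaling math using only the spec-table figures above.
wse2 = {"cores": 850_000, "transistors": 2.6e12, "sram_gb": 40,
        "mem_bw_pbs": 20, "fabric_bw_pbs": 220, "die_mm2": 46_225}
wse1 = {"cores": 400_000, "transistors": 1.2e12, "sram_gb": 18,
        "mem_bw_pbs": 9, "fabric_bw_pbs": 100, "die_mm2": 46_225}

# Each headline metric lands at roughly 2.1x to 2.2x the first-gen WSE.
for key in ("cores", "transistors", "sram_gb", "mem_bw_pbs", "fabric_bw_pbs"):
    print(f"WSE-2 / WSE-1 {key}: {wse2[key] / wse1[key]:.2f}x")

# Transistor density on the same 46,225 mm2 of silicon (7nm vs. 16nm).
for name, chip in (("WSE-2", wse2), ("WSE-1", wse1)):
    density = chip["transistors"] / chip["die_mm2"] / 1e6
    print(f"{name} density: {density:.1f} million transistors per mm2")
```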

These almost unbelievable specifications stem from the fact that the company uses an entire TSMC 7nm wafer to construct one large chip, sidestepping the typical reticle limitations of modern chip manufacturing to create a wafer-sized processor. Cerebras builds redundant cores directly into the hardware so that defective cores can simply be disabled, neutralizing the manufacturing defects that would otherwise render a die this large unusable.
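To illustrate why that core-level redundancy matters at wafer scale, here is a rough sketch using a textbook Poisson yield model; the defect density is an assumed placeholder, not a Cerebras or TSMC figure, so treat the numbers as purely illustrative.

```python
import math
import random

# Illustrative only: the defect density is an assumption, not a Cerebras/TSMC figure.
DEFECT_DENSITY_PER_MM2 = 0.001      # assumed defects per mm2 of silicon
DIE_AREA_MM2 = 46_225               # wafer-scale die area from the article
TOTAL_CORES = 850_000               # WSE-2 core count from the article

expected_defects = DEFECT_DENSITY_PER_MM2 * DIE_AREA_MM2    # ~46 defects on average

# Without redundancy, a single defect can kill a monolithic die this large:
p_defect_free = math.exp(-expected_defects)                 # Poisson P(k = 0)
print(f"Chance of a completely defect-free wafer-scale die: {p_defect_free:.1e}")

# With redundant cores, each defect only disables the core it lands in, and the
# wafer still ships with the overwhelming majority of its cores intact.
random.seed(0)
killed = {random.randrange(TOTAL_CORES) for _ in range(round(expected_defects))}
share_lost = 100 * len(killed) / TOTAL_CORES
print(f"Cores lost to defects: {len(killed)} of {TOTAL_CORES} ({share_lost:.4f}%)")
```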

The company accomplishes this feat by stitching together the dies on the wafer with a communication fabric, allowing them to work as one large cohesive unit. This fabric provides 220 petabits per second of throughput for the WSE-2, slightly more than twice the 100 Pb/s of the first-gen model. The wafer also includes 40GB of on-chip memory that provides up to 20 petabytes per second of throughput; both figures are also more than double those of the previous-gen WSE.

Cerebras hasn’t specified the WSE-2’s clock speeds, but it has told us in the past that the first-gen WSE doesn’t run at a very “aggressive” clock (which the company defines as the 2.5GHz-to-3GHz range). We’re now told that the WSE-2 runs at the same clock speeds as the first-gen model but provides twice the performance within the same power envelope thanks to its increased resources. We certainly don’t see those kinds of generational performance improvements with CPUs, GPUs, or most accelerators. Cerebras says it has also made unspecified changes to the microarchitecture to extract more performance.
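The arithmetic behind that claim is simple if you assume peak throughput scales with core count at a fixed clock; the sketch below uses a placeholder clock and a placeholder ops-per-core-per-cycle figure, since Cerebras has disclosed neither.

```python
# Peak throughput scales with core count when the clock and per-core work are fixed.
# The clock and ops/core/cycle values below are placeholders, not disclosed figures.
ASSUMED_CLOCK_HZ = 1.0e9
ASSUMED_OPS_PER_CORE_PER_CYCLE = 1

def peak_ops_per_second(cores, clock_hz=ASSUMED_CLOCK_HZ,
                        ops_per_cycle=ASSUMED_OPS_PER_CORE_PER_CYCLE):
    return cores * clock_hz * ops_per_cycle

wse1_peak = peak_ops_per_second(400_000)
wse2_peak = peak_ops_per_second(850_000)
print(f"WSE-2 / WSE-1 peak throughput at the same clock: {wse2_peak / wse1_peak:.2f}x")
```

The roughly 2.1x from core count alone only sets the ceiling; whether the claimed 2x shows up in real wall-clock training time depends on how well the fabric, on-chip memory, and compiler keep those extra cores fed.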

As you can see below, cores are grouped into tiles, with each tile having its own router, SRAM, FMAC datapath, and tensor control. All cores are connected via a low-latency 2D mesh fabric. The company claims these optimizations result in a 2x improvement in wall-clock training time on a BERT-style network, trained using the same code and compiler as the first-gen wafer-scale chip.

(Image credit: Cerebras)
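To give a feel for what a 2D mesh fabric implies for on-wafer communication, here is an illustrative sketch of dimension-ordered (XY) routing between tiles; the routing policy and coordinates are generic textbook assumptions, not Cerebras's actual router design.

```python
# Illustrative dimension-ordered (XY) routing on a generic 2D mesh of tiles.
# The routing policy is a textbook assumption, not Cerebras's actual design.
def xy_route(src, dst):
    """Return the hop-by-hop path from src to dst: first along X, then along Y."""
    (sx, sy), (dx, dy) = src, dst
    path = []
    step = 1 if dx > sx else -1
    for x in range(sx, dx, step):
        path.append((x + step, sy))          # travel along the X dimension
    step = 1 if dy > sy else -1
    for y in range(sy, dy, step):
        path.append((dx, y + step))          # then travel along the Y dimension
    return path

src, dst = (10, 20), (13, 18)
path = xy_route(src, dst)
manhattan = abs(dst[0] - src[0]) + abs(dst[1] - src[1])
print(f"{len(path)} hops, Manhattan distance {manhattan}: {path}")
```

In a mesh like this, hop count (and thus latency) grows with the physical distance between tiles, which is why keeping a layer's weights and activations in the SRAM of nearby tiles matters for a wafer-scale design.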

As before, the chip comes wrapped in a specialized 15U system designed specifically to accommodate the unique characteristics of the wafer-scale device. We’re told that the new CS-2 variant changes very little from the first-gen CS-1 system, which you can read about in depth here. Given that the most important metrics, like power consumption and the size of the WSE, have remained the same, it makes sense that most of the system is identical.

Cerebras hasn’t specified pricing, but we expect the WSE-2 will continue to attract attention from the military and intelligence communities for a multitude of purposes, including nuclear modeling, though Cerebras can’t divulge several of its customers (for obvious reasons). It’s safe to assume they are the types with nearly unlimited budgets, so pricing isn’t a concern. On the public-facing side, Argonne National Laboratory is using the first systems for cancer research and basic science, like studying black holes.

Cerebras also notes that its compiler scaled easily to exploit twice the computational power, so the existing software ecosystem carries over. As such, the WSE-2 can accept standard PyTorch and TensorFlow code that is modified with the company’s software tools and APIs. The company also gives customers instruction-level access to the silicon, which stands in contrast to GPU vendors.
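For a sense of what “standard PyTorch code” means here, the minimal training loop below is the kind of framework-level code Cerebras says its tools adapt; the tiny stand-in model and random data are placeholders, and the Cerebras-specific compilation and API steps are deliberately not shown.

```python
import torch
import torch.nn as nn

# Generic PyTorch training loop; the stand-in model and random data are placeholders,
# and the Cerebras-specific compilation/adaptation step is intentionally omitted.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 2))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(32, 128)            # dummy batch of 32 samples
labels = torch.randint(0, 2, (32,))      # dummy binary targets

for step in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()
    optimizer.step()

print(f"final loss: {loss.item():.4f}")
```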

Cerebras has working systems already in service now, and general availability of the WSE-2 is slated for the third quarter of 2021.