The magic of the three Radeon RX 6000 is in the graphics chip , the navigation system 21 or, as it is also called, Big Navi. Its key data are quickly named: Manufactured by TSMC with 7 nanometer technology (what else?), 519 Square millimeters in size and with 26, 8 billion transistors provided. AMD brings in 77 Compute Units with 64 Shader cores under.
How does AMD manage to remove the chip that is in the RX 6800 XT has more than twice as much computing power as the navigation system 10 from the RX 5700 XT, to supply adequate data?
With the graphics memory itself, AMD leaves the church in the village and trusts in the tried and tested: 16 GByte GDDR6-RAM on a memory bus with 256 parallel Data lines. Only its data rate is 15 Gbit per second slightly higher than with the Radeon RX 5700 (XT), so that now nominally 512 GByte / s on the data sheet instead of 448. This helps not to drive up the price.
A first test of the Radeon RX 6800 and 6800 XT can also be found on heise online. The full test with further details, for example on Smart Access Memory, in the upcoming c’t edition 26 / 2020.
Infinity Cache An essential component of Big Navi is the Infinity Cache , IC for short. This is a last-level cache that is transparent for games and applications and that has an unheard-of size for graphics chips of 128 MByte and is supported by 4 MByte level 2 cache. AMD has experimented with several sizes in the simulator and from 128 MByte determined a certain saturation.
Cache organization for Big Navi
(Image: AMD)
In games, the IC should have a hit rate of 47 percent – almost six out of ten memory accesses can be intercepted here .
The IC delivers via 16 64 – bit channels with a maximum clock rate of 1, 94 GHz almost 2 TByte / s and is therefore four times as fast as the normal GDDR6 memory. However, this includes a boost function that is extra for 550 GByte / s is good – but only if the electrical power budget is not exhausted. We measured around 1.2 TByte / s in synthetic tests, which the mixture transfers to GDDR6 and Infinity Cache.
AMD draws on the expertise of the CPU department for the Infinity Cache which has a lot of experience with large, but compact, fast caches via the Zen processors. The IC should also help to improve the energy balance. Each bit from the cache only needs an energy of 1.6 picoJoule, while a memory access requires 7-8 pJ.
Raytracing and DirectX 12 Ultimate The Infinity Cache has another function: It saves parts of the ray tracing acceleration structures that are necessary for the ray accelerators. With these, AMD accelerates ray tracing for the first time using specialized hardware circuits. However, one goes a slightly different way than Nvidia.
The ray tracing units at AMD “only” take over the intersection calculations between the rays and the BVH boxes or at the lowest level of the Triangles. In contrast to the raster performance, the ray tracing performance does not keep pace with Nvidia’s latest RTX cards. In 3DMark Port Royale, for example, the RX 6800 XT is “only” as fast as an OC model of the GeForce RTX 2080 Ti and lies with 9031 Points approx. 10 percent before the RTX 3070. The situation is similar in the (soft) ray tracing benchmark Neon Noir.
The result of the new 3DMark feature test for ray tracing, which displays almost the entire scene via ray tracing and at the same time, was exciting offers a setting option for the number of samples per pixel (spp), i.e. the rays. Here it was shown that based on 2 spp the Radeon RX 6800 XT is behind the GeForce RTX 3070 from 31 Percent on 15 Percentage at the maximum setting of 20 spp
In the graphics evaluation of the 3DMark Firestrike Extreme, however, where the classic rasterization performance counts, it hangs the two GeForce cards by at least 40 percent and is still more than double so expensive RTX 3090.
Big Navi block diagram
(Image: AMD)
Thanks to ray tracing, Big Navi now also meets the Ultimate specification of DirectX 12. In addition, variable rate shading, mesh shaders and especially sampler feedback are now part of the package. The Radeon can also do DirectStorage, which is called RTX IO by Nvidia and which has already caused a stir on the Xbox Series S / X and Playstation 5 game consoles under a different name.
In some aspects such as the sampler feedback (Tier 1.0 instead of 0.9) or the fine grain of the variable rate shading (8 × 8 pixel fine tiles instead of 16 × 16), the Radeon even surpasses that GeForce RTX 3000.
Benchmark experiments I: Blender 2. 90. 1 We played around a bit with a few benchmarks beforehand. On the one hand, there is the rendering program Blender. In the classroom scene, the Radeon RX 6000 XT with the Cycles renderer itself is connected to a GeForce RTX 3070 with CUDA. However, that is only half the story. Because the GeForce is with the preset tile size of 32 × 32 Pixels already close to their performance limit, while the Radeon is catching up with larger tiles.
Briefly to explain: When GPU rendering in Blender, the entire graphics chip works on a single tile – in the example mentioned, in the case of the GeForce 5888 shaders -Units of only 1024 pixels. With CPUs, however, each CPU thread works on its own tile. The art is to ensure the utilization of the units and at the same time to get the data from the graphics memory or the caches fast enough.
Accordingly, the Radeon RX 6800 also increases significantly if the tile size under “Performance” – “Tile Size” is set to about 64 × 64 quadrupled. However, the Radeon VII, which is equipped with a generous memory transfer rate, has already achieved this, so it is not a direct effect of the Infinity Cache.
The calculation time of the classroom scene drops on our old test system, which we only have used for these benchmarks from around 136 seconds to just more 84 seconds – a time saving of nearly 40 Percent. With the GeForce RTX 3000 this was only around 2 percent, so then (without the faster Optix Renderer) falls behind the Radeon. When changing to 128 × 100 Pixel-sized tiles save another 8 percent on the Radeon and save the scene is after just under 77 Seconds in the can – with the GeForce RTX 3070 that doesn’t make any further profit, the time stays at around 90 seconds.
Benchmark experiments II: Synthetic benchmarks Since AMD also tinkered with the innards of the chip, we ran some synthetic benchmarks to feel the effects on the tooth. The raster output stages have now been doubled from four to eight pixels per unit. The throughput for early depth tests, which is also important for shadow mapping, was above the level of a Radeon RX 6800 XT GeForce RTX 3090. In most cases, the texture performance comes with Nvidia’s top model and is with four-channel FP 32 – Textures even clearly in front of it and compared to the RX 5700 XT more than doubled.
The throughput of the shader cores is both with the FP 31 – calculations as well as clock-adjusted for the special functions 100 Percent higher than with the RX 5700 XT and also above that of a GeForce RTX 2080 TI. Nvidia’s RTX 2080 series still has a big lead here, but often cannot achieve this in games Implement fps performance.
The triangular throughput has also increased in all disciplines of our measurement compared to the RX 5700 XT significantly increased – even in some cases where the memory speed plays a role, it is more than twice as fast.
New features in the Compute Unit at Big Navi
(Image: AMD)
(csp)