Nvidia introduces 4nm H100 GPU with 80 billion transistors, PCIe 5.0 and HBM3
Nvidia has announced its H100 accelerator for data centers and HPC. The PCIe 5.0 GPU is produced on TSMC's 4N node and features HBM3 memory with a bandwidth of up to 3TB/s. The Nvidia H100 succeeds the current A100 GPU.
The Nvidia H100 GPU is based on Hopper, a GPU architecture aimed at data centers and HPC that succeeds Ampere in that segment. The H100 consists of 80 billion transistors and is produced on TSMC's 4N process, a version of TSMC's N4 node customized specifically for Nvidia. Like the A100, the H100 is again a monolithic chip. Earlier rumors suggested that Nvidia would introduce a data center GPU with a multi-chip design consisting of multiple dies, as AMD did last year with its Instinct MI200 series.
The current A100 is produced on a modified version of TSMC's 7nm process and consists of 54.2 billion transistors. Nvidia claims that the H100 offers up to three times more computing power than the A100 in fp16, tf32 and fp64, and up to six times more in fp8. The H100 die measures 814mm², slightly smaller than the current GA100, which has a die surface of 826mm².
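Those multipliers follow from the peak figures in the spec table at the end of this article; below is a quick sanity check of the arithmetic. Note that the fp8-versus-fp16 comparison is our assumption, since the A100 has no fp8 mode.

```python
# Nvidia's claimed H100-vs-A100 speedups, recomputed from the peak
# figures in the spec table below (H100 SXM5 numbers, in Tflops/Tops).
h100 = {"fp64": 30, "fp16_tensor": 1000, "tf32_tensor": 500, "fp8_tensor": 2000}
a100 = {"fp64": 9.7, "fp16_tensor": 312, "tf32_tensor": 156}

print(h100["fp16_tensor"] / a100["fp16_tensor"])  # ~3.2x in fp16
print(h100["tf32_tensor"] / a100["tf32_tensor"])  # ~3.2x in tf32
print(h100["fp64"] / a100["fp64"])                # ~3.1x in fp64
# fp8 is new on Hopper; assuming its rate equals the 2000Tops int8 figure
# in the table, the six-times claim compares it to the A100's fp16 rate:
print(h100["fp8_tensor"] / a100["fp16_tensor"])   # ~6.4x "in fp8"
```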
The Nvidia H100 SXM5 (left) and H100 PCIe
HBM3 for SXM5 model, HBM2e for PCIe variant
Nvidia introduces two variants of the H100. The focus appears to be on the SXM5 variant, which has 132 streaming multiprocessors for a total of 16,896 fp32 CUDA cores. That card gets 50MB of L2 cache and 80GB of HBM3 memory on a 5120bit memory bus, for a maximum memory bandwidth of about 3TB/s, and has a tdp of 700W. Users can combine multiple H100 SXM GPUs via Nvidia's NVLink interconnect; according to Nvidia, this fourth generation offers bandwidths of up to 900GB/s.
There will also be a PCIe 5.0 x16 variant for more standard servers. That model gets 114 SMs and 14,592 CUDA cores, along with 40MB of L2 cache, the same as the current A100. Strikingly, the PCIe variant uses slower HBM2e memory, according to the Hopper whitepaper that Nvidia published on Tuesday, although at 80GB the amount is equal to the SXM model. The PCIe variant gets a tdp of 350W.
The Nvidia Hopper H100 GPU
New Hopper Features: Transformer Engine and DPX Instructions
The Hopper architecture itself has also been revised compared to Ampere. Hopper and the H100 feature a new Transformer Engine, which combines a new kind of Tensor core with a software stack to process fp8 and fp16 formats when training transformer networks, a type of deep learning model.
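Nvidia exposes the Transformer Engine through a software library on top of PyTorch. The sketch below shows roughly what fp8 training with it looks like, based on the library's public API; the layer sizes and recipe settings are arbitrary example values, not from Nvidia's announcement.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# A single Transformer-style linear layer built from a Transformer Engine
# module; the 768/3072 sizes are arbitrary example values.
layer = te.Linear(768, 3072, bias=True)
inp = torch.randn(32, 768, device="cuda", dtype=torch.float16)

# The "recipe" controls how fp8 scaling factors are tracked per layer;
# HYBRID uses the e4m3 fp8 format forward and e5m2 for gradients.
recipe = DelayedScaling(fp8_format=Format.HYBRID, amax_history_len=16)

# Inside this context, supported layers run their matrix multiplies in fp8
# on Hopper's Tensor cores; other operations stay in higher precision.
with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    out = layer(inp)
out.float().sum().backward()
```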
For cloud computing, the H100 can be partitioned into up to seven instances. Ampere could already do that, but with Hopper those instances are fully isolated from each other. In addition, Hopper gets new DPX instructions intended for dynamic programming. Nvidia claims that the H100 performs up to seven times better in this use case than an A100, which lacks DPX.
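Dynamic programming covers algorithms such as Smith-Waterman sequence alignment, whose inner loop repeats a fused add-and-maximum step millions of times. The plain-Python sketch below illustrates that pattern (the sequences and scoring values are made-up examples); on Hopper, a three-way maximum with a zero clamp like the one in the inner loop is the kind of operation DPX collapses into fewer hardware instructions.

```python
# A minimal Smith-Waterman-style recurrence in plain Python, showing the
# add-then-max pattern that dynamic-programming workloads repeat in their
# inner loops; scores and sequences are hypothetical examples.
def smith_waterman(a: str, b: str, match=3, mismatch=-3, gap=-2) -> int:
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # DP score matrix
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The core DP step: additions followed by a multi-way max
            # clamped at zero, the pattern DPX instructions accelerate.
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best

print(smith_waterman("GATTACA", "GCATGCU"))  # best local alignment score
```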
DGX Systems and SuperPods
Nvidia also introduces a DGX H100 system with eight H100 GPUs. Such a system has 640GB of HBM3 memory with a total bandwidth of 24TB/s. Users can combine up to 32 of those DGX systems over NVLink connections into what Nvidia calls a DGX SuperPod. According to Nvidia, such a 32-node system delivers one exaflop of fp8 computing power. The company is building an Eos supercomputer consisting of 18 DGX SuperPods, with a total of 4608 H100 GPUs.
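The system-level figures follow directly from the per-GPU numbers quoted above; a quick check is below. The sparsity note is our assumption, since the dense fp8 rate alone lands at about half an exaflop.

```python
# Back-of-the-envelope check of the DGX H100 and SuperPod figures,
# using the per-GPU numbers quoted in this article.
gpus_per_dgx = 8
print(gpus_per_dgx * 80)   # 640GB of HBM3 per DGX H100
print(gpus_per_dgx * 3)    # 24TB/s of aggregate memory bandwidth

dgx_per_superpod = 32
gpus = dgx_per_superpod * gpus_per_dgx
print(gpus)                # 256 H100 GPUs per SuperPod

# At the 2000Tflops dense fp8 rate from the spec table, a SuperPod lands
# at ~0.5 exaflops; the one-exaflop claim presumably assumes Nvidia's
# sparsity feature, which doubles the peak rate (our assumption).
print(gpus * 2000 / 1e6)   # 0.512 exaflops dense fp8

print(18 * gpus)           # 4608 GPUs in the 18-SuperPod Eos system
```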
Nvidia has not yet announced what the H100 GPU will cost, nor what the DGX H100 systems or DGX SuperPods will cost. Hopper is not expected to be used in consumer GPUs; later this year, Nvidia is rumored to introduce its Lovelace architecture for new GeForce RTX graphics cards.
GPU | H100 (TSMC 4nm) | GA100 (TSMC 7nm) | GV100 (TSMC 12nm)
Die size | 814mm² | 826mm² | 815mm²
Transistors | 80 billion | 54.2 billion | 21.1 billion
fp32 CUDA cores | SXM: 16,896 / PCIe: 14,592 | 6912 | 5120
Tensor cores | SXM: 528 / PCIe: 456 | 432 | 640
Memory | SXM: 80GB HBM3 / PCIe: 80GB HBM2e | 40GB / 80GB HBM2e | 16GB / 32GB HBM2
fp32 compute | SXM: 60Tflops / PCIe: 48Tflops | 19.5Tflops | 15.7Tflops
fp64 compute | SXM: 30Tflops / PCIe: 24Tflops | 9.7Tflops | 7.8Tflops
fp16 Tensor core | SXM: 1000Tflops / PCIe: 800Tflops | 312Tflops | 125Tflops
tf32 Tensor core | SXM: 500Tflops / PCIe: 400Tflops | 156Tflops | N/A
fp64 Tensor core | SXM: 60Tflops / PCIe: 48Tflops | 19.5Tflops | N/A
int8 Tensor core | SXM: 2000Tops / PCIe: 1600Tops | 624Tops | N/A
Tdp | Up to 700W | Up to 400W | Up to 300W
Form factor | SXM5 / PCIe 5.0 | SXM4 / PCIe 4.0 | SXM2 / PCIe 3.0