Nvidia introduces 4nm GPU H100 with 80 billion transistors, PCIe 5.0 and HBM3


Nvidia has announced its H100 accelerator for data centers and HPC. This PCIe 5.0 GPU is produced on TSMC’s 4N node and features HBM3 memory with a bandwidth of up to 3TB/s. The Nvidia H100 succeeds the current A100 GPU.

The Nvidia H100 GPU is based on Hopper, a GPU architecture aimed at data centers and HPC, where it succeeds Ampere. The H100 consists of 80 billion transistors and is produced on TSMC’s 4N process, a modified version of TSMC’s N4 process made specifically for Nvidia. Like the A100, the H100 is again a monolithic chip. Earlier rumors suggested that Nvidia would introduce a data center GPU with a multi-chip design consisting of multiple dies, as AMD did last year with its Instinct MI200 series.

The current A100 is produced on a modified version of TSMC’s 7nm process and consists of 54.2 billion transistors. Nvidia claims that the H100 offers up to three times more computing power than the A100 in fp16, tf32 and fp64, and six times more in fp8. The H100 die measures 814mm², slightly smaller than the current GA100, which has a die area of 826mm².
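Those multipliers can be roughly sanity-checked against the peak-throughput figures in the spec table at the end of this article. The short Python snippet below is our own illustration, not Nvidia’s math; the fp8 comparison point is our assumption, since Ampere’s tensor cores have no fp8 mode.

```python
# Peak tensor throughput in Tflops, taken from the spec table below
# (H100 SXM5 vs A100). Marketing peak numbers, not measured benchmarks.
a100 = {"fp16 tensor": 312, "tf32 tensor": 156, "fp64 tensor": 19.5}
h100 = {"fp16 tensor": 1000, "tf32 tensor": 500, "fp64 tensor": 60}

for fmt in a100:
    print(f"{fmt}: {h100[fmt] / a100[fmt]:.1f}x")  # ~3.1-3.2x across the board

# The 6x fp8 claim presumably compares H100 fp8 against A100 fp16:
print(f"fp8 vs fp16: {2000 / 312:.1f}x")           # ~6.4x
```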

The Nvidia H100 SXM5 (left) and H100 PCIe

HBM3 for SXM5 model, HBM2e for PCIe variant

Nvidia introduces two variants of the H100. The focus seems to be on the SXM5 variant, which has 132 streaming multiprocessors for a total of 16,896 fp32 CUDA cores. That card gets 50MB of L2 cache and 80GB of HBM3 memory on a 5120-bit memory bus, for a maximum memory bandwidth of about 3TB/s, and has a tdp of 700W. Users can combine multiple H100 SXM GPUs via Nvidia’s fourth-generation NVLink interconnect, which according to Nvidia offers bandwidths of up to 900GB/s.
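That bandwidth figure follows from the bus width and the per-pin data rate of the memory. As a back-of-the-envelope check using only the numbers above (the derived per-pin rate is our inference, not an official spec):

```python
# Peak memory bandwidth = bus width (bits) x per-pin data rate / 8.
# Solving for the HBM3 data rate implied by Nvidia's ~3TB/s figure:
bus_width_bits = 5120
bandwidth_bytes_per_s = 3.0e12                     # ~3TB/s, as quoted
data_rate = bandwidth_bytes_per_s * 8 / bus_width_bits
print(f"~{data_rate / 1e9:.1f} Gbit/s per pin")    # ~4.7 Gbit/s per pin
```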

There will also be a PCIe 5.0 x16 variant for more standard servers. That model gets 114 SMs and 14,592 CUDA cores. Furthermore, the PCIe variant gets 40MB of L2 cache, just like the current A100. Notably, the PCIe variant has slower HBM2e memory, according to the Hopper whitepaper that Nvidia published on Tuesday. At 80GB, the memory capacity is equal to that of the SXM model. The PCIe variant has a tdp of 350W.

The Nvidia Hopper H100 GPU

New Hopper Features: Transformer Engine, DPX Instruction Set

The Hopper architecture itself has also been modified compared to Ampere. Hopper and the H100 feature a new Transformer Engine, which combines a new kind of Tensor Core with a software stack to process fp8 and fp16 formats for training transformer networks, a type of deep learning model.
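Nvidia’s announcement does not go into the programming model. As a rough sketch of the mixed-precision training pattern this kind of hardware accelerates, here is a minimal fp16 training step in PyTorch; autocast is our illustrative stand-in, and Hopper’s fp8 support is exposed through Nvidia’s separate Transformer Engine software stack, which is not shown here.

```python
import torch
import torch.nn as nn

# Minimal fp16 mixed-precision training step on a single transformer layer.
# Model size, input data and loss are arbitrary placeholders.
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True).cuda()
optimizer = torch.optim.Adam(layer.parameters())
scaler = torch.cuda.amp.GradScaler()   # rescales gradients to avoid fp16 underflow

x = torch.randn(32, 128, 512, device="cuda")  # (batch, sequence, features)
optimizer.zero_grad()
with torch.autocast(device_type="cuda", dtype=torch.float16):
    out = layer(x)                     # matrix math runs in fp16 on tensor cores
    loss = out.float().pow(2).mean()   # dummy loss, computed in fp32

scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```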

For cloud computing, the H100 can be partitioned into up to seven instances. Ampere could already do that, but with Hopper those instances are completely isolated from each other. In addition, Hopper gets a new DPX instruction set intended for dynamic programming. Nvidia claims that the H100 performs up to seven times better in this use case than an A100 without DPX.
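Dynamic programming refers to algorithms that solve a problem by filling in a table of overlapping subproblems, with genome sequence alignment and shortest-path routing as typical examples. As a generic illustration of that algorithm class (our example, not taken from Nvidia’s material), the classic edit-distance recurrence in Python:

```python
def edit_distance(a: str, b: str) -> int:
    # dp[i][j] = minimum number of edits to turn a[:i] into b[:j]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i                                # delete all of a[:i]
    for j in range(len(b) + 1):
        dp[0][j] = j                                # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(a)][len(b)]

print(edit_distance("hopper", "ampere"))  # 4
```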

DGX Systems and SuperPods

Nvidia also introduces the DGX H100 system with eight H100 GPUs. With its eight GPUs, such a system has 640GB of HBM3 memory with a total bandwidth of 24TB/s. Users can combine up to 32 of these DGX systems over NVLink connections into what Nvidia calls a DGX SuperPod. Such a 32-node system should deliver an exaflop of fp8 computing power, Nvidia claims. The company is building an EOS supercomputer consisting of 18 DGX SuperPods with a total of 4608 H100 GPUs.
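Those system-level figures follow directly from the per-GPU numbers earlier in this article, as the snippet below recomputes:

```python
# Recomputing Nvidia's DGX H100 and SuperPod totals from per-GPU figures.
gpus_per_dgx = 8
print(gpus_per_dgx * 80)        # 640GB HBM3 per DGX H100 (80GB per GPU)
print(gpus_per_dgx * 3)         # 24TB/s aggregate bandwidth (3TB/s per GPU)
print(32 * gpus_per_dgx)        # 256 GPUs per 32-node DGX SuperPod
print(18 * 32 * gpus_per_dgx)   # 4608 GPUs in the EOS supercomputer
```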

Nvidia has not yet announced what the H100 GPU will cost. It is also not yet clear what the DGX H100 systems or DGX SuperPods will cost. Hopper is not expected to be used in consumer GPUs; later this year, Nvidia is rumored to introduce its Lovelace architecture for new GeForce RTX graphics cards.

Nvidia Hopper alongside previous Nvidia HPC GPUs

| Architecture | Hopper | Ampere | Volta |
| --- | --- | --- | --- |
| GPU | H100, TSMC 4nm | GA100, TSMC 7nm | GV100, TSMC 12nm |
| Die surface | 814mm² | 826mm² | 815mm² |
| Transistors | 80 billion | 54.2 billion | 21.1 billion |
| CUDA cores (fp32) | SXM: 16,896 / PCIe: 14,592 | 6912 | 5120 |
| Tensor cores | SXM: 528 / PCIe: 456 | 432 | 640 |
| Memory | SXM: 80GB HBM3 / PCIe: 80GB HBM2e | 40GB / 80GB HBM2e | 16GB / 32GB HBM2 |
| FP32 vector | SXM: 60Tflops / PCIe: 48Tflops | 19.5Tflops | 15.7Tflops |
| FP64 vector | SXM: 30Tflops / PCIe: 24Tflops | 9.7Tflops | 7.8Tflops |
| FP16 tensor | SXM: 1000Tflops / PCIe: 800Tflops | 312Tflops | 125Tflops |
| TF32 tensor | SXM: 500Tflops / PCIe: 400Tflops | 156Tflops | N/A |
| FP64 tensor | SXM: 60Tflops / PCIe: 48Tflops | 19.5Tflops | N/A |
| INT8 tensor | SXM: 2000Tops / PCIe: 1600Tops | 624Tops | N/A |
| tdp | Up to 700W | Up to 400W | Up to 300W |
| Form factor | SXM5 / PCIe 5.0 | SXM4 / PCIe 4.0 | SXM2 / PCIe 3.0 |