Nvidia introduces DGX GH200 supercomputer with 256 Grace CPUs and H100 GPUs


Nvidia announced at Computex that its Grace Hopper ‘superchips’ for data centers are now in full production. The company is also showing the DGX GH200, a supercomputer built from these chips with 144TB of shared memory.

The Nvidia DGX GH200 contains 256 Grace Hopper superchips. Each superchip combines an H100 GPU with an Nvidia Grace CPU on a single module of approximately 200 billion transistors. A superchip has 72 Neoverse V2 CPU cores and 16,896 CUDA cores, along with 96GB of HBM3 memory and 480GB of LPDDR5X memory. In total, the DGX GH200 system has 18,432 CPU cores, more than 4.3 million CUDA cores and 144TB of shared memory.
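The system totals follow directly from the per-superchip figures. The sketch below reproduces that arithmetic; the constant and function names are illustrative, and it assumes that 1TB is counted as 1024GB, which is the convention under which the quoted 144TB comes out exactly.

```python
# Per-superchip figures as quoted in the article.
SUPERCHIPS = 256
CPU_CORES_PER_CHIP = 72
CUDA_CORES_PER_CHIP = 16_896
HBM3_GB_PER_CHIP = 96
LPDDR_GB_PER_CHIP = 480


def system_totals(chips=SUPERCHIPS):
    """Scale the per-superchip figures up to a full system."""
    memory_gb = chips * (HBM3_GB_PER_CHIP + LPDDR_GB_PER_CHIP)
    return {
        "cpu_cores": chips * CPU_CORES_PER_CHIP,
        "cuda_cores": chips * CUDA_CORES_PER_CHIP,
        # Assumption: 1 TB = 1024 GB.
        "memory_tb": memory_gb / 1024,
    }


totals = system_totals()
print(totals)
```

With 256 superchips this gives 18,432 CPU cores, 4,325,376 CUDA cores and 144TB of combined HBM3 and LPDDR memory.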

Nvidia’s Grace Hopper superchip

The Grace Hopper superchips are linked by Nvidia’s own NVLink interconnect, which lets every GPU access the memory of all the others, so the system functions as a single GPU. According to the manufacturer, the NVLink fabric uses 96 L1 switches and 32 L2 switches. Nvidia says the GPU-to-GPU bandwidth is 900GB/s. For comparison, a GPU on a PCIe 5.0 x16 link has approximately 63GB/s of bandwidth.
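To put the two bandwidth figures side by side, the sketch below computes their ratio and an idealized time to move 96GB, the HBM3 capacity of one superchip, over each link. It ignores protocol overhead and latency, and it takes both quoted figures at face value even though they may not be measured on the same basis (total versus per-direction bandwidth). The names are illustrative.

```python
# Bandwidth figures as quoted in the article, in GB/s.
NVLINK_GB_PER_S = 900
PCIE5_X16_GB_PER_S = 63


def transfer_seconds(size_gb, bandwidth_gb_per_s):
    """Idealized transfer time: payload size divided by link bandwidth."""
    return size_gb / bandwidth_gb_per_s


ratio = NVLINK_GB_PER_S / PCIE5_X16_GB_PER_S
nvlink_time = transfer_seconds(96, NVLINK_GB_PER_S)
pcie_time = transfer_seconds(96, PCIE5_X16_GB_PER_S)

print(f"NVLink / PCIe 5.0 x16 ratio: {ratio:.1f}x")
print(f"96GB over NVLink: {nvlink_time:.2f}s, over PCIe 5.0 x16: {pcie_time:.2f}s")
```

On these numbers NVLink offers roughly 14 times the bandwidth of a PCIe 5.0 x16 link.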

Nvidia says the DGX GH200 is aimed at large AI workloads. According to the manufacturer, the system delivers ‘1 exaflop’ of FP8 computing power. The manufacturer does not share performance figures for other number formats. The company does share some benchmarks comparing the DGX GH200 to a current DGX A100 system, reports Tom’s Hardware. In those benchmarks, the new DGX GH200 system is 2.2 to 6.3x faster. However, the DGX A100 systems used for comparison have 32 to 256 A100 GPUs, depending on the benchmark, while the DGX GH200 system has 256 GPUs in every benchmark shown.
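As a rough sanity check, the headline FP8 figure can be divided over the 256 superchips to see what it implies per chip. This is only a sketch: it treats ‘1 exaflop’ as exactly 10^18 operations per second, although the figure is evidently rounded, and the names are illustrative.

```python
# System-level FP8 figure as quoted, taken as exactly 10**18 FLOPS (assumption).
SYSTEM_FP8_FLOPS = 10**18
SUPERCHIPS = 256

# Implied FP8 throughput per superchip, expressed in petaflops.
per_chip_petaflops = SYSTEM_FP8_FLOPS / SUPERCHIPS / 10**15

print(f"Implied FP8 per superchip: {per_chip_petaflops:.2f} petaflops")
```

That works out to about 3.9 petaflops of FP8 per superchip.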

According to the manufacturer, Google Cloud, Meta and Microsoft will be the first to get access to the DGX GH200 supercomputer. Eventually, Nvidia wants to make the blueprint for the system available to cloud providers and other hyperscalers. The DGX GH200 is due to be released at the end of this year. The manufacturer has not disclosed a price.

Nvidia will also build its own supercomputer, Helios, which combines four DGX GH200 systems. These four systems, which together contain 1024 Grace Hopper superchips, are connected via Nvidia’s Quantum-2 InfiniBand networking with a bandwidth of 400Gbit/s. Helios should come online at the end of this year.

Source: Nvidia, Tom’s Hardware
