AMD releases Instinct MI200 accelerators with multi-chip design and 128GB HBM2e
AMD announces its Instinct MI200 series of accelerators for data centers and supercomputers. These chips are based on the CDNA 2 architecture and have 128GB of HBM2e memory. They are also the first GPUs with a multi-chip module design.
AMD will initially release two accelerators in its MI200 series, the company announced Monday evening during its data center livestream: the AMD Instinct MI250x and MI250, which differ in the number of compute units. The chips use the CDNA 2 architecture, which is designed for data centers and supercomputers. AMD will later also release an MI210 GPU in a PCIe form factor, although no specifications are known yet.
The Instinct MI250 accelerators are the first GPUs with a so-called MCM (multi-chip module) design, in which multiple compute dies are combined in a single package. The MI250x and MI250 each feature two of these CDNA 2 dies, which are produced on TSMC's 6nm node.
MI200: 128GB HBM2e and up to 220 compute units
The MI250x features two compute modules, each containing 110 compute units, for a total of 220 CUs. The MI250, in turn, has two modules with 104 CUs each, for 208 compute units in total. Both chips are paired with 128GB of HBM2e memory with ECC. The maximum clock speed of both variants is 1.7GHz.
The accelerators also get four HBM2e memory controllers per compute die, for a total memory bandwidth of 3.2TB/s. In addition, the MI200 GPUs have eight third-generation Infinity Fabric links, which interconnect chips and enable memory coherence between CPUs and GPUs.
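The 3.2TB/s figure can be roughly reconstructed from HBM2e interface parameters. The sketch below is a back-of-the-envelope check, not an AMD specification: it assumes eight HBM2e stacks in total (one per memory controller, four per compute die), the standard 1024-bit interface per stack and a per-pin data rate of 3.2Gbit/s. None of these three parameters is stated in the article.

```python
# Back-of-the-envelope check of the MI200's aggregate memory bandwidth.
# ASSUMPTIONS (not from the article): 8 HBM2e stacks in total
# (4 per compute die, one per controller), a 1024-bit interface
# per stack, and a per-pin data rate of 3.2 Gbit/s.

STACKS = 8              # assumed: 2 compute dies x 4 stacks each
BUS_WIDTH_BITS = 1024   # HBM2e interface width per stack
PIN_RATE_GBPS = 3.2     # assumed per-pin data rate, Gbit/s


def aggregate_bandwidth_tbps(stacks, bus_bits, pin_gbps):
    """Return aggregate bandwidth in TB/s (decimal units)."""
    per_stack_gbs = bus_bits * pin_gbps / 8  # Gbit/s -> GB/s per stack
    return stacks * per_stack_gbs / 1000     # GB/s -> TB/s


bw = aggregate_bandwidth_tbps(STACKS, BUS_WIDTH_BITS, PIN_RATE_GBPS)
print(f"{bw:.2f} TB/s")
```

Under these assumptions each stack delivers about 409.6GB/s and the total comes to roughly 3.28TB/s, which is consistent with the quoted 3.2TB/s.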
The MI200 GPUs also feature AMD's second-generation Matrix cores, which are designed to perform FP64 and FP32 matrix calculations for HPC and AI applications. According to the manufacturer, these Matrix cores are up to four times faster than those in AMD's previous-generation Instinct MI100 accelerators. The MI250x has a total of 880 Matrix cores; the MI250 has 832.
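The Matrix core counts, together with the stream processor totals in the specification table, scale directly with the number of compute units. The sketch below divides each published total by the CU count to infer per-CU resources; the resulting 64 stream processors and 4 Matrix cores per CU are derived here, not figures the article states.

```python
# Infer per-CU resources from the chip-wide totals given in the article.
# The per-CU values are derived by division, not published figures.

SPECS = {
    # name: (compute units, stream processors, Matrix cores)
    "MI250x": (220, 14_080, 880),
    "MI250": (208, 13_312, 832),
}


def per_cu(cus, total):
    """Divide a chip-wide total by its CU count, requiring an exact split."""
    quotient, remainder = divmod(total, cus)
    if remainder != 0:
        raise ValueError(f"{total} is not a multiple of {cus}")
    return quotient


for name, (cus, sps, mcores) in SPECS.items():
    print(name, per_cu(cus, sps), "SPs/CU,", per_cu(cus, mcores), "Matrix cores/CU")
```

Both variants come out identical per CU, so the MI250's lower totals follow entirely from its 16 fewer compute units.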
AMD Instinct MI250x and Instinct MI250
Model | Compute units | Stream processors | Memory | Bandwidth | FP64/FP32 vector | FP64/FP32 matrix | FP16/BF16 | Form factor |
AMD Instinct MI250x | 220 | 14,080 | 128GB HBM2e (ECC) | 3.2TB/s | Up to 47.9Tflops (peak) | Up to 95.7Tflops (peak) | Up to 383Tflops (peak) | OAM |
AMD Instinct MI250 | 208 | 13,312 | 128GB HBM2e (ECC) | 3.2TB/s | Up to 45.3Tflops (peak) | Up to 90.5Tflops (peak) | Up to 362.1Tflops (peak) | OAM |
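The peak figures in the table follow the usual formula: compute units × FLOPs per CU per clock × clock speed. The sketch below reproduces them, assuming AMD's peaks are calculated at the 1.7GHz maximum clock. The per-CU rates (128 FP64 vector FLOPs per clock, i.e. 64 lanes × 2 for a fused multiply-add; 256 FP64 matrix; 1024 FP16/BF16) are inferred from the table, not published in the article.

```python
# Theoretical peak = CUs x FLOPs per CU per clock x clock (GHz).
# ASSUMPTIONS: peaks use the 1.7GHz maximum clock; the per-CU rates
# below are inferred from the table, not stated by AMD in the article.

CLOCK_GHZ = 1.7
FLOPS_PER_CU_PER_CLOCK = {
    "FP64 vector": 128,   # 64 lanes x 2 (fused multiply-add)
    "FP64 matrix": 256,
    "FP16/BF16": 1024,
}


def peak_tflops(cus, flops_per_clock, clock_ghz=CLOCK_GHZ):
    """Return theoretical peak throughput in Tflops."""
    return cus * flops_per_clock * clock_ghz / 1000  # GFLOPS -> TFLOPS


for name, cus in (("MI250x", 220), ("MI250", 208)):
    for kind, rate in FLOPS_PER_CU_PER_CLOCK.items():
        print(f"{name} {kind}: {peak_tflops(cus, rate):.1f} Tflops")
```

Rounded to one decimal, this matches every peak value in the table, which suggests the vendor figures are purely theoretical maxima rather than measured throughput.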
Peak performance: 'up to 47.9Tflops at FP64'
According to AMD, the MI200 series is significantly faster than the first-generation Instinct MI100 GPUs and also faster than the competition. The MI250x achieves FP64 vector performance of up to 47.9Tflops; by comparison, Nvidia's A100 achieves up to 9.7Tflops in such FP64 calculations. The company also quotes FP64 and FP32 matrix peak performance of up to 95.7Tflops and FP16 and BF16 performance of up to 383Tflops.
The first MI200 accelerators are already being delivered to the US Department of Energy’s Oak Ridge National Laboratory. The chips will be used in the Frontier exascale system. The supercomputer is expected to achieve peak performance of ‘more than 1.5 exaflops’. Thomas Zacharia of Oak Ridge National Laboratory says a single MI200 GPU is more powerful than a full node from the Summit supercomputer. He also mentions that Frontier is currently being installed and will be online ‘early next year’.
AMD announced its Instinct MI200 series on Monday evening, alongside its new EPYC Milan-X server processors with 3D V-Cache. The company also previewed its Zen 4 architecture, which will be built on TSMC's 5nm node and offered in EPYC CPUs with different core types.