Nvidia Announces Ampere Architecture for GeForce and Tesla
Nvidia has announced the A100, the first GPU the company produces on its new Ampere architecture. The chip debuts in a DGX system with eight A100 GPUs. GeForce cards will also get GPUs based on Ampere.
Ampere should eventually replace not only Volta but also Turing, serving as a single platform for both enterprise and consumer cards, Nvidia CEO Jensen Huang said ahead of the announcement, according to MarketWatch. Volta is the architecture of the Tesla V100 accelerator GPU; GeForce cards based on Volta never appeared, and the GeForce RTX 20 cards are based on the Turing architecture. Huang gave no further details about Ampere for GeForce, saying only that it will overlap considerably with Ampere for Tesla, but in different configurations.
The first GPU based on Ampere is the Tesla A100, intended for high-performance computing, artificial intelligence, and other data center applications. Nvidia produces the chip on a 7nm process, which contains 54 billion transistors on a die of 826mm². That is a considerable increase over the GV100 GPU of the Tesla V100, which has 21.1 billion transistors, while the die is barely larger: the GV100 measures 815mm².
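The density jump can be worked out from the figures above; a quick back-of-the-envelope calculation in Python:

```python
# Transistor density from the die sizes and transistor counts above
ga100_density = 54e9 / 826    # Tesla A100: transistors per mm²
gv100_density = 21.1e9 / 815  # Tesla V100: transistors per mm²

print(f"GA100: {ga100_density / 1e6:.1f}M transistors/mm²")  # 65.4M/mm²
print(f"GV100: {gv100_density / 1e6:.1f}M transistors/mm²")  # 25.9M/mm²
print(f"Density increase: {ga100_density / gv100_density:.1f}x")  # 2.5x
```

So the step from 12nm to 7nm brings roughly two and a half times as many transistors per square millimeter.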
The number of CUDA cores has increased from 5120 on the V100 to 6912 on the A100. The number of tensor cores has decreased from 640 to 432, but these are third-generation tensor cores that, according to Nvidia, improve on the previous generation. In fp64 calculations they offer more than twice the performance. For fp32 calculations the gain would even be tenfold, but there Nvidia compares calculations in its own tensor float 32 format with regular floating point 32 calculations. According to Nvidia, "tf32 works just like fp32 without having to change any code."
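The tf32 format keeps fp32's sign bit and 8-bit exponent but shrinks the mantissa from 23 to 10 bits, matching fp16 precision over the fp32 range. A minimal Python sketch (truncating the mantissa, where the hardware rounds) illustrates what that loss of precision means for an individual value:

```python
import struct

def to_tf32(x: float) -> float:
    """Approximate tf32 precision: keep the fp32 sign and 8-bit exponent,
    but only the top 10 of the 23 mantissa bits (truncating, for simplicity)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    bits &= ~((1 << 13) - 1)  # zero out the 13 lowest mantissa bits
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(to_tf32(1.5))     # 1.5 fits in 10 mantissa bits and survives exactly
print(to_tf32(1.0001))  # too fine for tf32: truncates to 1.0
```

Because the exponent range is identical to fp32, existing fp32 code keeps working unchanged; only the precision of individual multiplications drops, which is why Nvidia can apply tf32 transparently inside the tensor cores.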
The memory bus of the A100 is 5120 bits wide, good for a maximum memory bandwidth of 1555GB/s. The accelerator has 40MB of on-chip level 2 cache, almost seven times more than the previous generation, and can be fitted with 40GB of vram spread over six hbm2e stacks.
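That bandwidth figure follows directly from the bus width and the effective hbm2e data rate of 2430MHz listed in the specifications; a quick check in Python:

```python
bus_width_bits = 5120  # A100 memory bus width
data_rate_mhz = 2430   # effective hbm2e speed per pin

# bits -> bytes, MHz -> transfers per second, then to GB/s
bandwidth_gbs = bus_width_bits / 8 * data_rate_mhz * 1e6 / 1e9
print(f"{bandwidth_gbs:.0f}GB/s")  # 1555GB/s
```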
Also new is multi-instance GPU, for virtualization. Each A100 can be divided into up to seven instances, each of which works in isolation with its own memory, for different users. In addition, there is support for a new nvlink interconnect to connect GPUs within a server, offering a GPU-to-GPU bandwidth of 600GB/s.
Nvidia immediately announced a first system with the A100: the DGX A100. It contains eight A100 accelerators with 320GB of memory in total and is equipped with 200Gbit/s interconnects from Mellanox, which Nvidia acquired. Strikingly, Nvidia has switched from Intel to AMD: the DGX A100 contains two AMD Epyc processors, where the previous DGX-2 had two Intel Xeon Platinum 8168 processors. The manufacturer plans to offer the DGX A100 bundled in a cluster of 140 systems, in the form of the so-called DGX SuperPOD.
| | Tesla A100 | Tesla V100s | Tesla V100 | Tesla P100 |
|---|---|---|---|---|
| GPU | 7nm GA100 | 12nm GV100 | 12nm GV100 | 16nm GP100 |
| Die size | 826mm² | 815mm² | 815mm² | 610mm² |
| Transistors | 54 billion | 21.1 billion | 21.1 billion | 15.3 billion |
| SMs | 108 | 80 | 80 | 56 |
| Cudacores | 6912 | 5120 | 5120 | 3840 |
| Tensor cores | 432 | 640 | 640 | n/a |
| fp16 | 78 tflops | 32.8 tflops | 31.4 tflops | 21.2 tflops |
| fp32 | 19.5 tflops | 16.4 tflops | 15.7 tflops | 10.6 tflops |
| fp64 | 9.7 tflops | 8.2 tflops | 7.8 tflops | 5.3 tflops |
| Boost clock | ~1410MHz | ~1601MHz | ~1533MHz | ~1480MHz |
| Memory bandwidth | 1555GB/s | 1134GB/s | 900GB/s | 721GB/s |
| Memory speed (effective) | 2430MHz | 2214MHz | 1760MHz | 1408MHz |
| Memory | 40GB HBM2e | 32GB HBM2 | 16GB / 32GB HBM2 | 16GB HBM2 |
| Memory bus | 5120-bit | 4096-bit | 4096-bit | 4096-bit |
| TDP | 400W | 250W | 300W | 300W |
| Interface | SXM4/pci-e 4.0 | pci-e 3.0 | SXM2/pci-e 3.0 | SXM |