Nvidia Announces Tesla P40 and P4 Accelerators
Nvidia has announced two new Tesla cards with Pascal GPUs. The Tesla P40 is based on a GP102 GPU with 3840 CUDA cores. That’s more than the Tesla P100, Nvidia’s most powerful accelerator, has, but the P40 doesn’t include HBM2 memory.
The Tesla P40 and P4 are the successors to the M40 and M4, which are still based on Maxwell GPUs. Nvidia hasn’t announced pricing yet, but the P40 and P4 will be more affordable than the high-end P100 when they become available in October and November respectively.
Nvidia is aiming the cards at inference applications, in which deep neural networks are used to recognize speech, images and text. It is therefore pairing the launch with two software tools. TensorRT is a library for optimizing deep learning models, while the DeepStream SDK is meant to accelerate the decoding and AI-based analysis of video streams.
It is striking that Nvidia has enabled 3840 stream processors on the P40, compared with 3584 on the ‘flagship’ P100. The P100, however, has a considerably wider memory interface combined with fast HBM2 memory. As a result, its memory bandwidth is much higher than the P40’s: 720GB/s versus 346GB/s. Nvidia positions the Tesla P100 more for training deep learning networks, thanks to its substantial FP16 compute performance of 21.2 teraflops.
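The bandwidth gap can be broken down with a little arithmetic. The sketch below takes the bus widths and bandwidth figures from the article's table and derives the effective data rate per pin that they imply. The function names and the `CARDS` dictionary are made up for this illustration, and the per-pin rates are derived values, not figures Nvidia has published here.

```python
# Figures copied from the article's table: (bus width in bits, bandwidth in GB/s).
CARDS = {
    "Tesla P100": (4096, 720.0),
    "Tesla P40": (384, 346.0),
}


def per_pin_rate_gbps(bus_width_bits: int, bandwidth_gb_s: float) -> float:
    """Effective data rate per pin (Gbit/s) implied by bandwidth and bus width."""
    # bandwidth [GB/s] = bus width [bits] / 8 * per-pin rate [Gbit/s]
    return bandwidth_gb_s * 8 / bus_width_bits


def bandwidth_ratio(card_a: str, card_b: str) -> float:
    """How many times more memory bandwidth card_a has than card_b."""
    return CARDS[card_a][1] / CARDS[card_b][1]


for name, (width, bandwidth) in CARDS.items():
    rate = per_pin_rate_gbps(width, bandwidth)
    print(f"{name}: {width}-bit bus, about {rate:.2f} Gbit/s per pin")

ratio = bandwidth_ratio("Tesla P100", "Tesla P40")
print(f"P100 has about {ratio:.2f}x the bandwidth of the P40")
```

Under these figures the P100's HBM2 runs at a much lower rate per pin than the P40's GDDR5 (roughly 1.4 versus 7.2 Gbit/s), so its bandwidth advantage of about 2x comes entirely from the far wider interface.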
Nvidia Tesla | Tesla P100 | Tesla P40 | Tesla P4 | Tesla M40 | Tesla M4
CUDA cores | 3584 | 3840 | 2560 | 3072 | 1024
Base clock | 1328MHz | 1303MHz | 810MHz | 948MHz | 872MHz
Boost clock | 1480MHz | 1531MHz | 1063MHz | 1114MHz | 1072MHz
Memory interface | 4096-bit HBM2 | 384-bit GDDR5 | 256-bit GDDR5 | 384-bit GDDR5 | 128-bit GDDR5
Memory capacity | 16GB | 24GB | 8GB | 12GB/24GB | 4GB
Memory bandwidth | 720GB/s | 346GB/s | 192GB/s | 288GB/s | 88GB/s
Single precision (FP32) | 9.3Tflops | 12Tflops | 5.5Tflops | 7Tflops | 2.2Tflops
Transistors | 15.3 billion | 12 billion | 7.2 billion | 8 billion | 2.94 billion
TDP | 300W | 250W | 50W-75W | 250W | 50W-75W
Manufacturing process | TSMC 16nm | TSMC 16nm | TSMC 16nm | TSMC 28nm | TSMC 28nm
GPU | GP100 | GP102 | GP104 | GM200 | GM206
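
The FP32 row can be sanity-checked against the core counts and boost clocks in the same table. The sketch below assumes the usual rule of thumb that each CUDA core completes one fused multiply-add, counted as two floating-point operations, per clock cycle at the boost clock. The `SPECS` dictionary and the function name are made up for this illustration.

```python
# (CUDA cores, boost clock in MHz, FP32 Tflops as listed) from the table above.
SPECS = {
    "Tesla P100": (3584, 1480, 9.3),
    "Tesla P40": (3840, 1531, 12.0),
    "Tesla P4": (2560, 1063, 5.5),
    "Tesla M40": (3072, 1114, 7.0),
    "Tesla M4": (1024, 1072, 2.2),
}


def peak_fp32_tflops(cuda_cores: int, boost_mhz: float) -> float:
    """Theoretical FP32 peak, assuming 2 flops (one FMA) per core per clock."""
    return 2 * cuda_cores * boost_mhz * 1e6 / 1e12


for name, (cores, boost, listed) in SPECS.items():
    estimate = peak_fp32_tflops(cores, boost)
    print(f"{name}: estimated {estimate:.2f} Tflops, listed {listed} Tflops")
```

For four of the five cards the estimate rounds to the listed value. The exception is the P100: the listed clocks imply about 10.6 Tflops, and the 21.2 teraflops FP16 figure quoted in the text is twice that, while the table lists 9.3 Tflops. One possible explanation is that the 9.3 Tflops figure belongs to a lower-clocked variant of the card, but the article does not say so.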