Google builds 11.5-petaflop machine learning supercomputers


Google is building supercomputers optimized for machine learning that offer 11.5 petaflops of computing power. These TPU Pods are made up of second-generation tensor processing units (TPUs).

A TPU Pod consists of 64 of the new TPUs. A single TPU offers a processing power of 180 teraflops and is equipped with fast interconnects for communicating with the other chips. The system acts as an accelerator for machine learning applications, specifically for training a single large machine learning model.
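The headline figure follows directly from these per-chip numbers; a quick back-of-the-envelope check in Python, assuming all 64 chips contribute their full per-chip rating:

# Back-of-the-envelope check of the pod's aggregate throughput,
# assuming each of the 64 chips delivers its full 180 teraflops.
chips_per_pod = 64
teraflops_per_chip = 180

pod_teraflops = chips_per_pod * teraflops_per_chip
print(pod_teraflops / 1000, "petaflops")  # -> 11.52, i.e. roughly 11.5 petaflops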

According to Google, training a new translation model on 32 of “the best commercially available GPUs” takes a full day. The same model can be trained on one eighth of a TPU Pod in an afternoon. Google gives few details about the TPUs, but because they are used for model training, Top500 considers it likely that the chips support 16-bit and 32-bit floating point operations for their parallel matrix multiplications.
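To make the mixed-precision idea concrete, here is a minimal NumPy sketch of what 16-bit storage with 32-bit accumulation looks like in a matrix multiplication. It is a toy illustration of the concept only, not how Google's TPU hardware works internally:

import numpy as np

# Toy illustration of mixed-precision matrix multiplication:
# operands stored in 16-bit floats, products accumulated in 32-bit floats.
def mixed_precision_matmul(a_fp16, b_fp16):
    assert a_fp16.dtype == np.float16 and b_fp16.dtype == np.float16
    # Up-cast before multiplying so the partial sums keep 32-bit precision.
    return a_fp16.astype(np.float32) @ b_fp16.astype(np.float32)

a = np.random.randn(128, 256).astype(np.float16)
b = np.random.randn(256, 64).astype(np.float16)
c = mixed_precision_matmul(a, b)
print(c.dtype, c.shape)  # float32 (128, 64)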

The first generation of TPUs, which Google introduced last year, were still 8-bit integer chips, mainly used for running already trained models. With the new chip, Google already seems to have an alternative to Nvidia’s Volta GPU, which has Tensor Cores on board for training machine learning models. That GPU provides a computing power of 120 teraflops for mixed-precision 16-bit and 32-bit floating point operations. In raw computing power, Google may therefore have surpassed Nvidia with its optimized accelerator, although the actual ratio also depends on other specifications.

Google not only uses its new TPUs in the TPU Pods for its own calculations, but also integrates them into the Google Compute Engine as Cloud TPUs. This gives customers access to the chips, which they can combine with virtual machines and other hardware such as Intel Skylake CPUs and Nvidia GPUs. Users can program the TPUs with Google’s TensorFlow software.
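As a rough illustration of how a customer might point TensorFlow at a Cloud TPU, the sketch below uses TensorFlow's current distribution APIs (tf.distribute.TPUStrategy). The TPU name and the model are placeholders, and the setup calls are assumptions based on today's TensorFlow, not details given by Google in this article:

import tensorflow as tf

# Rough sketch: connecting TensorFlow to a Cloud TPU and building a model
# under a TPU distribution strategy. "my-cloud-tpu" is a placeholder name.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-cloud-tpu")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    # Any ordinary Keras model can be built here; the strategy handles
    # placing the computation on the TPU cores.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )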
