Specialised Chips Are Here. A New Battle Has Just Begun

Google’s TPU supercomputer shows the way forward for AI compute

Recently, researchers at Google released a paper summarising their experience of building and using an AI supercomputer powered by the latest iteration of the company’s tensor processing units (TPUs). According to the researchers, this supercomputer outperforms comparably sized systems while being more energy efficient than those built on NVIDIA’s A100 GPUs.

This finding is interesting as it suggests a new trend in the field of AI compute. Many companies, including AWS, Google, Intel, and even NVIDIA itself, have begun moving towards specialised chips for running AI workloads. If Google’s findings hold, the greener path towards AI compute lies in purpose-built silicon.

Specialised supercomputers

The industry as a whole seems to be moving towards building supercomputers made for AI compute, with the latest example being Cerebras. In November 2022, the budding AI compute startup revealed its new AI supercomputer, named Andromeda.

According to the company, this supercomputer is able to deliver over 1 exaflop of AI compute with over 13.5 million cores at its disposal. Interestingly, this approach has allowed Andromeda to become one of the few supercomputers to demonstrate “near-perfect linear scaling on LLM workloads”.

Usually, spreading a training job across a supercomputer wastes some compute on communication and coordination overhead, so doubling the number of chips delivers less than double the speed. On Andromeda, by contrast, training time shrinks in near-perfect proportion to the number of chips used, thanks to the specialised nature of the chips in the supercomputer.
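As a rough sketch of what that claim means, scaling efficiency compares the measured training time against the ideal of perfect linear speedup. The numbers below are hypothetical, chosen for illustration, and are not Cerebras or Google benchmarks.

# Scaling efficiency: how close a cluster gets to ideal linear speedup.
# All numbers here are made up for illustration, not measured results.
t_one_chip = 1000.0                # hypothetical time on 1 chip (hours)
n_chips = 64
t_ideal = t_one_chip / n_chips     # perfect linear scaling: 15.625 h
t_measured = 17.0                  # hypothetical time actually observed
efficiency = t_ideal / t_measured  # ~0.92, i.e. 92% of ideal
print(f"ideal: {t_ideal:.2f} h, measured: {t_measured:.2f} h, "
      f"scaling efficiency: {efficiency:.0%}")

Near-perfect linear scaling means this efficiency stays close to 100% even as the chip count grows into the thousands.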

This is just one of the advantages of specialised compute. Until now, companies have been using NVIDIA’s DGX platform to get a huge amount of dense compute for a relatively low price. For example, Meta’s AI Research SuperCluster (RSC) was custom-built using NVIDIA’s DGX A100 systems. Short of building a server farm from scratch, DGX has quickly become the go-to option for many companies looking to get into AI.

However, because the GPUs inside DGX are general-purpose parts, the platform gains no efficiency from specialisation. Google’s TPUs, on the other hand, have proven to be more efficient than NVIDIA’s A100 chips, with the trade-off that they can run only AI workloads.

TPUs vs GPUs

The latest generation of Google TPUs, now in its fourth iteration, delivers over 2.1x the performance and 2.7x the performance per watt of TPU v3. Reportedly, the system has been in operation since 2020 and was used to train models such as Midjourney. It is also four times larger than its predecessor, deploying up to 4,096 chips connected by a unique interconnect built from optical circuit switches (OCS).

When building the system and scaling it up to handle modern deep neural network workloads, Google’s engineers ran into reliability and scalability issues. To solve them, they built new features into TPU v4 that let it scale massively, with the OCS technology at the forefront of that effort.

Moreover, to make the TPUs even more energy efficient, Google embedded dataflow processors known as SparseCores, which accelerate embedding-heavy workloads by 5x to 7x while using only 5% of the die area and power.
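For a sense of the work a SparseCore offloads, here is a minimal NumPy sketch of an embedding lookup, the sparse gather at the heart of recommendation and language models. The SparseCore’s actual hardware interface is internal to Google; this only illustrates the memory-access pattern being accelerated, with arbitrary sizes.

import numpy as np

# An embedding table maps sparse integer IDs to dense vectors.
# Sizes here are arbitrary, chosen only for illustration.
vocab_size, dim = 100_000, 128
rng = np.random.default_rng(0)
table = rng.standard_normal((vocab_size, dim)).astype(np.float32)

token_ids = np.array([17, 42, 99_999, 3])  # a handful of sparse indices
vectors = table[token_ids]                 # gather 4 rows out of 100,000
print(vectors.shape)                       # (4, 128)

Such gathers are memory-bound and leave dense matrix units idle, which is why a small dedicated dataflow unit can handle them at a fraction of the area and power.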

When training a model across a large pool of distributed compute, individual nodes must be configured to exchange data with each other in a way that allows training to proceed. The optical circuit switches let Google’s engineers reconfigure the topology of the system on the fly, bypassing the limitations that usually come with scaling such systems.
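To see why nodes must interface with one another, here is a toy data-parallel training step written in JAX (which targets TPUs among other backends). Everything in it (the loss, the shapes, the learning rate) is made up for illustration, and the collective runs at the software level; it says nothing about how the OCS hardware routes the traffic.

from functools import partial

import jax
import jax.numpy as jnp

def loss(w, x):
    # Toy objective: mean squared activation of a linear layer.
    return jnp.mean((x @ w) ** 2)

@partial(jax.pmap, axis_name="chips")
def train_step(w, x):
    g = jax.grad(loss)(w, x)       # local gradient on this device's shard
    g = jax.lax.pmean(g, "chips")  # all-reduce: average across the interconnect
    return w - 0.01 * g            # identical SGD update on every device

n = jax.local_device_count()       # 1 on a laptop CPU; thousands on a TPU pod
w = jnp.broadcast_to(jnp.zeros((8, 1)), (n, 8, 1))  # replicated weights
x = jnp.ones((n, 16, 8))                            # one data shard per device
print(train_step(w, x).shape)                       # (n, 8, 1)

The pmean call is where the devices actually talk to each other; on a TPU pod, that traffic rides the very interconnect the OCS reconfigures.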

When compared to similarly sized systems, the TPU supercomputer is 4.3x to 4.5x faster than the Graphcore IPU Bow (an intelligence processing unit built for AI workloads) and 1.3x to 1.9x more power efficient than NVIDIA’s A100 chips. A point to note, however, is that these tests were run against NVIDIA’s previous generation of chips, since the TPU v4 system dates from 2020.

While Google’s TPUs seem to have taken the lead in energy efficiency, NVIDIA’s new H100 chips are yet to be tested against this specialised hardware. What’s more, the green giant just released a whole new set of chips with added hardware components built for specific AI workloads.

At GTC 2023, NVIDIA released the L4, the L40, the H100 NVL and the Grace Hopper accelerator chip, each made for specific AI workloads. Only time will tell whether TPU v4 beats NVIDIA’s current chips, but for now, we do know that the battle for specialised AI hardware is just beginning.

Anirudh VK