Specialised Chips are Here. A New Battle has Just Begun

Google’s TPU supercomputer shows the way forward for AI compute

Recently, researchers at Google released a paper summarising their experience of building and using an AI supercomputer powered by the latest iteration of the company’s tensor processing units (TPUs). According to the researchers, this supercomputer performs better than some of its competitors while being more energy efficient than NVIDIA’s A100 GPUs.

This finding is interesting as it suggests a new trend in the field of AI compute. Many companies, including AWS, Google, Intel, and even NVIDIA itself, have begun moving towards specialised chips for running AI workloads. Google's findings suggest that the greener path for AI compute lies in purpose-built chips.

Specialised supercomputers

The industry as a whole seems to be moving towards building supercomputers made for AI compute, with the latest example coming from Cerebras. In November last year, the budding AI compute startup revealed its new AI supercomputer, termed Andromeda.

According to the company, the supercomputer can deliver over 1 exaflop of AI compute with over 13.5 million cores at its disposal. Interestingly, this approach has allowed Andromeda to become one of the few supercomputers to demonstrate “near-perfect linear scaling on LLM workloads”.

Usually, running workloads on a supercomputer wastes some of its compute on scaling overheads, such as communication between nodes. On Andromeda, by contrast, training time shrinks in near-perfect proportion to the number of chips used, thanks to the specialised nature of the chips in the system.
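To make the scaling claim concrete, here is a small, hypothetical Python sketch of how scaling efficiency is typically measured; the chip counts and timings below are made up for illustration and do not come from Cerebras or Google.

```python
# Hypothetical illustration of scaling efficiency: the fraction of
# ideal linear speedup a cluster actually achieves. All numbers are
# invented for demonstration, not taken from any vendor.

def scaling_efficiency(t_one_chip: float, t_n_chips: float, n: int) -> float:
    """Fraction of ideal linear speedup achieved with n chips."""
    speedup = t_one_chip / t_n_chips
    return speedup / n

# Perfect linear scaling: 64 chips make training exactly 64x faster.
print(scaling_efficiency(6400.0, 100.0, 64))   # 1.0

# A typical cluster: communication overhead eats part of the speedup.
eff = scaling_efficiency(6400.0, 125.0, 64)    # only a 51.2x speedup
print(f"{eff:.0%}")                            # 80%
```

“Near-perfect linear scaling” means this ratio stays close to 1.0 even as the chip count grows.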

This is just one of the advantages of specialised compute. Until now, companies have relied on NVIDIA's DGX platform to get a large amount of dense compute at a relatively low price. Meta's AI Research SuperCluster (RSC), for example, was custom-built using NVIDIA's DGX A100 systems. For companies unwilling to set up a server farm of their own to train AI models, DGX has quickly become the go-to option for getting into AI.

However, because these GPUs are general-purpose rather than purpose-built, DGX sees none of the efficiency gains of specialised hardware. Google's TPUs, on the other hand, have proven more efficient than NVIDIA's A100 chips, with the trade-off of being specialised for AI workloads.

TPUs vs GPUs

The latest generation of Google's TPUs, now in its fourth iteration, delivers over 2.1x the performance and 2.7x the performance per watt of TPU v3. Reportedly, the system has been in operation since 2020 and was used to train models such as Midjourney. It is also four times larger than its predecessor, deploying up to 4,096 chips connected through a unique interconnect built on optical circuit switches (OCS).

When building the system and scaling it up to handle modern deep neural network workloads, Google's engineers ran into reliability and scalability issues. To solve these, they built new features into TPU v4 that allow it to scale massively, with the new OCS technology at the forefront of the innovation.

Moreover, to make the TPUs even more energy efficient, Google embedded dataflow processors known as SparseCores, which accelerate embedding-related workloads by 5x–7x while using only 5% of die area and power.
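The workloads SparseCores target are embedding lookups: gathering a handful of rows from a very large table and pooling them. A minimal NumPy sketch of that operation, with illustrative table sizes and IDs rather than figures from the TPU paper:

```python
# A minimal sketch of the embedding lookup that SparseCore-style units
# accelerate: gather a few rows of a large table and sum them.
# Table dimensions and feature IDs are invented for illustration.

import numpy as np

rng = np.random.default_rng(0)
table = rng.normal(size=(10_000, 8))   # 10k-entry embedding table, dim 8

def embed(ids):
    """Gather rows for a sparse set of feature IDs and pool them."""
    return table[ids].sum(axis=0)      # sparse gather + reduction

vec = embed([3, 17, 4242])
print(vec.shape)                       # (8,)
```

Because only a few rows of a huge table are touched per example, the operation is memory-bound and irregular, which is why a small dedicated unit can speed it up so cheaply.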

When training a model across a large pool of distributed compute, individual nodes must be configured to communicate with each other so that training can proceed. The optical circuit switches let Google's engineers reconfigure the system's topology on the fly, bypassing the limitations that usually come with scaling such systems.
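The traffic such an interconnect carries during training is dominated by collective operations like all-reduce, which average gradients so that every node applies the same update. A toy sketch in plain Python of what that collective computes (not Google's implementation, and ignoring how the bytes actually move):

```python
# Toy sketch of the all-reduce collective used in data-parallel training.
# Each "node" holds a local gradient vector; after the collective, every
# node would hold the same averaged gradient. Values are illustrative.

def all_reduce_mean(local_grads):
    """Average gradient vectors across nodes."""
    n = len(local_grads)
    dim = len(local_grads[0])
    total = [sum(g[i] for g in local_grads) for i in range(dim)]
    return [x / n for x in total]

node_grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 nodes, 2 params
print(all_reduce_mean(node_grads))                 # [3.0, 4.0]
```

In a real system this reduction is pipelined across the interconnect, so its cost depends heavily on the network topology, which is exactly what the OCS lets Google reshape on the fly.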

When compared to similarly sized systems, the TPU supercomputer is 4.3x to 4.5x faster than the Graphcore Bow IPU (an intelligence processing unit for AI workloads) and 1.3x to 1.9x more efficient than NVIDIA's A100 chips. A point to note, however, is that these comparisons were made against the previous generation of NVIDIA chips, since the newer H100 was not available when the system was benchmarked.

While Google's TPUs seem to have taken the lead in energy efficiency, NVIDIA's new H100 chips are yet to be tested against this specialised hardware. What's more, the green giant has just released a whole new set of chips with added hardware components built for specific AI workloads.

At GTC 2023, NVIDIA released the L4, the L40, the H100 NVL and the Grace Hopper accelerator chip, each made for specific AI workloads. Only time will tell whether TPU v4 beats NVIDIA's current chips, but for now, one thing is clear: the battle for specialised AI hardware has only just begun.


Anirudh VK
