
Forget ChatGPT vs Bard, The Real Battle is GPUs vs TPUs

While ChatGPT and Bard fight for their tech giant overlords, GPUs and TPUs work overtime to keep them running



As ChatGPT and Bard slug it out, two behemoths work in the shadows to keep them running – NVIDIA’s CUDA-powered GPUs (Graphics Processing Units) and Google’s custom-built TPUs (Tensor Processing Units). In other words, it’s no longer about ChatGPT vs Bard, but TPU vs GPU, and how effectively each can do matrix multiplication.

Why models should be optimised

Training costs are one of the biggest barriers to creating a large model. AI compute is usually measured in GPU-hours, the number of GPUs used multiplied by how long they run. Another unit, the petaflop/s-day (pf-day), is also common: 1 pf-day corresponds to performing 10^15 operations (a petaflop) every second for an entire day.
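To make the unit concrete, here is a quick back-of-the-envelope conversion (a minimal sketch; the constants follow directly from the definition above):

```python
# How many raw operations is one pf-day?
PETAFLOP = 1e15          # operations per second
SECONDS_PER_DAY = 86_400

ops_per_pf_day = PETAFLOP * SECONDS_PER_DAY
print(f"1 pf-day ≈ {ops_per_pf_day:.2e} operations")  # ≈ 8.64e+19
```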

For context, the largest version of GPT-3, with 175 billion parameters, took an estimated 3,640 pf-days to train. That is the equivalent of sustaining a petaflop of operations every second for almost 10 years. However, with parallelisation and a little help from Azure’s supercomputing cluster, estimates put the actual training time at about 34 days. In contrast, training the same model on a single NVIDIA V100 GPU would take over 355 years.
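The single-V100 figure can be roughly reproduced with the same arithmetic. This is only a sketch, and it assumes (a number not stated above) that a V100 sustains about 28 TFLOPS of effective throughput in practice:

```python
# Rough sanity check on the single-V100 estimate (a sketch, not a benchmark).
PF_DAYS = 3640                            # reported compute for GPT-3 175B
OPS_PER_PF_DAY = 1e15 * 86_400            # from the definition of a pf-day
V100_SUSTAINED_FLOPS = 28e12              # assumed effective throughput of one V100

total_ops = PF_DAYS * OPS_PER_PF_DAY      # ≈ 3.1e23 operations
seconds = total_ops / V100_SUSTAINED_FLOPS
years = seconds / (86_400 * 365)
print(f"Single V100: ~{years:.0f} years")  # ≈ 356 years, in line with "over 355 years"
```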

Cutting training time down to 34 days for such a big model sounds like a herculean feat, but the problem was essentially solved with brute force. Estimates place the training cost of this model at about $5 million for compute alone, based on cloud pricing of these GPUs, which generally rent for around $1.5 per hour per GPU.
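The $5 million estimate follows from the same numbers: parallelising across many GPUs shortens the calendar time but not the total GPU-hours billed. A rough sketch using the figures above:

```python
# Where the ~$5 million compute estimate comes from (rough sketch).
GPU_YEARS = 355                 # single-V100-equivalent work from the previous section
HOURS_PER_YEAR = 8_760
PRICE_PER_GPU_HOUR = 1.5        # USD, the typical cloud rate quoted above

gpu_hours = GPU_YEARS * HOURS_PER_YEAR
cost_millions = gpu_hours * PRICE_PER_GPU_HOUR / 1e6
print(f"{gpu_hours:,.0f} GPU-hours -> ${cost_millions:.1f}M")
# ≈ 3.1 million GPU-hours -> ≈ $4.7M, i.e. roughly $5 million for compute alone
```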

The single largest compute cost in training LLMs comes from matrix multiplication. In deep learning, model weights and activations are stored as tensors, multi-dimensional arrays that generalise matrices, and most of the work in a forward and backward pass boils down to multiplying these tensors together. These multiplications constitute the bulk of training, and they consume a lot of compute simply because of how many of them there are and how large they get.
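As a concrete illustration of why matrix multiplication dominates, here is what a single dense layer looks like in plain NumPy. This is a minimal sketch with made-up sizes, not GPT-3’s actual dimensions; a transformer stacks hundreds of such multiplications for every token it processes:

```python
import numpy as np

# A single dense layer is just a matrix multiplication plus a bias:
# activations (batch x d_in) times weights (d_in x d_out).
batch, d_in, d_out = 32, 4096, 4096           # illustrative sizes only
x = np.random.randn(batch, d_in).astype(np.float32)
W = np.random.randn(d_in, d_out).astype(np.float32)
b = np.zeros(d_out, dtype=np.float32)

y = x @ W + b                                 # the matmul is where the FLOPs go
print(y.shape)                                # (32, 4096)
print(f"~{2 * batch * d_in * d_out / 1e9:.1f} GFLOPs for this one layer")
```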

Matrix multiplication is one of the most fundamental operations in mathematics, yet doing it efficiently remains an open problem. Researchers have found schemes that multiply small matrices, such as 4×4, using fewer scalar multiplications than the schoolbook method, but finding and proving efficient algorithms for the much larger matrices seen in AI requires deep mathematical work.
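To see what a “more efficient” scheme looks like, here is Strassen’s classic 1969 construction for the 2×2 case, which uses 7 multiplications instead of the naive 8. Shown here on plain numbers for clarity; in practice it is applied recursively to large matrix blocks:

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices (nested lists) with 7 multiplications
    instead of the naive 8, using Strassen's construction."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B

    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4,           m1 - m2 + m3 + m6]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```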

Solving for efficient matrix multiplication can cut down the compute required for both training and inference. Other methods like quantisation and model shrinking also reduce compute, but they sacrifice accuracy. A tech giant building a state-of-the-art model would rather spend the $5 million than compromise on quality, if there is no other way to cut costs. However, DeepMind found a way, and the ball is now in Google’s court.
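Quantisation is the simplest example of that trade-off: storing weights as 8-bit integers instead of 32-bit floats cuts memory and compute roughly fourfold, but introduces rounding error that can dent accuracy. A minimal sketch of the idea, illustrative only and not how any particular LLM was quantised:

```python
import numpy as np

# Post-training int8 quantisation: shrink float32 weights to 8-bit integers
# plus a single scale factor, at the cost of rounding error.
w = np.random.randn(1024).astype(np.float32)

scale = np.abs(w).max() / 127.0               # symmetric per-tensor scale
w_int8 = np.round(w / scale).astype(np.int8)  # 4x smaller than float32
w_restored = w_int8.astype(np.float32) * scale

print(f"mean absolute rounding error: {np.abs(w - w_restored).mean():.5f}")
```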

GPUs vs TPUs

NVIDIA’s GPUs are well-suited to matrix multiplication because their architecture lets the work be parallelised across thousands of CUDA cores. Training models on GPUs became the status quo for deep learning in 2012, and the industry has never looked back.

Building on this idea, Google launched the first version of its tensor processing unit (TPU) in 2016, a custom ASIC (application-specific integrated circuit) optimised for tensor operations. Beyond the hardware itself, TPUs work extremely well with Google’s TensorFlow framework, the tool of choice for machine learning engineers at the company. This gives them an edge in AI compute tasks beyond matrix multiplication, and also speeds up fine-tuning and inference.

In addition, researchers at Google’s DeepMind have found a method to discover better matrix multiplication algorithms. Termed AlphaTensor, this AI system searches for multiplication schemes that need fewer scalar multiplications, providing more efficient formulae for the process.

While Google’s tech stack and emerging methods for optimising AI compute have yielded good results, competitors like Microsoft have been capitalising on NVIDIA’s industry entrenchment to eke out a competitive advantage. However, as today’s GPUs get more resource-hungry in terms of power draw and cooling, enterprises are looking at alternatives. Moreover, AI needs ever more compute to get better, with one widely cited study estimating that the compute used in the largest training runs doubles every 3.4 months.

To serve this need, NVIDIA has entered the AI accelerator space. While competitors like AMD and Intel have already created competing products, NVIDIA’s industry know-how and its iron-fisted hold over CUDA have netted it an advantage yet again. With the launch of the NVIDIA DGX line, companies could deploy a packaged hardware-and-software solution for almost any AI task: something competitors still cannot offer due to a lack of comparable intellectual property.

While other competitors like AWS have also launched AI accelerators (see Trainium and Inferentia), the battlefield is dominated by GPUs and TPUs for the time being. AI accelerators might figure into enterprises’ next upgrade cycle, but existing solutions provide USPs that cannot be replaced so easily.

The general-purpose nature of NVIDIA’s GPUs allows them to accelerate a wide variety of workloads, while the focused design of Google’s TPUs offers the best possible compute for those working within Google’s ecosystem of AI tools. A paradigm shift in the field might let one win out over the other, but with Moore’s Law dead, we will have to wait a while before the war is won.


Anirudh VK

I am an AI enthusiast and love keeping up with the latest events in the space. I love video games and pizza.