# Forget ChatGPT vs Bard, The Real Battle is GPUs vs TPUs

While ChatGPT and Bard fight for their tech giant overlords, GPUs and TPUs work overtime to keep them running

As ChatGPT and Bard slug it out, two behemoths work in the shadows to keep them running – NVIDIA’s CUDA-powered GPUs (Graphics Processing Units) and Google’s custom-built TPUs (Tensor Processing Units). In other words, it’s no longer about ChatGPT vs Bard, but TPU vs GPU, and how effectively they are able to do matrix multiplication.

### Why models should be optimised

Training costs are one of the biggest barriers to creating a large model. AI compute is generally measured in GPU-hours, which represent the time it takes for a model to be trained. Another unit, the petaflop/s-day (pf-day), is also used: one pf-day corresponds to performing 10^15 operations (a petaflop) per second for an entire day.

For context, the largest version of GPT-3, with 175 billion parameters, took 3,640 pf-days to train. That means a machine sustaining a petaflop of operations per second would need almost 10 years to do the job! However, with the aid of parallelisation and a little help from Azure’s supercomputing cluster, estimates place the time to train this model at about 34 days. In contrast, training the same model on a single NVIDIA V100 GPU would take over 355 years.
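The figures above can be sanity-checked with a few lines of arithmetic. This is a rough sketch: the 28 TFLOPS sustained throughput assumed for a single V100 is an estimate, not an official figure, and real utilisation varies.

```python
SECONDS_PER_DAY = 86_400
PF = 1e15  # one petaflop = 1e15 operations per second

# GPT-3 (175B) training compute, per the figure quoted in the text
pf_days = 3640
total_flops = pf_days * PF * SECONDS_PER_DAY  # ~3.1e23 operations

# Assumed sustained throughput of a single V100 (~28 TFLOPS with
# tensor cores is a common back-of-envelope estimate)
v100_flops_per_sec = 28e12

years_on_one_v100 = total_flops / v100_flops_per_sec / (SECONDS_PER_DAY * 365)
print(f"{years_on_one_v100:.0f} years")  # ~356 years
```

Under that assumption, the single-GPU estimate lands right around the 355-year figure quoted above.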

Even though cutting training time down to 34 days for such a big model was a herculean feat, the problem was essentially solved with brute force. Estimates place the training cost of this model at about \$5 million for compute alone, based on cloud pricing for these GPUs of roughly \$1.5 per GPU per hour.

The single largest compute cost in training LLMs comes from matrix multiplication. In AI, these matrices are generalised into multi-dimensional arrays known as tensors. Multiplying these tensors and feeding the output forward through the network constitutes a large part of training, and consumes a lot of compute due to the sheer number of operations involved.
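To see why matrix multiplication dominates, consider a single dense layer's forward pass, sketched below in NumPy. The layer dimensions are illustrative, not taken from any particular model.

```python
import numpy as np

rng = np.random.default_rng(0)

# A dense layer's forward pass is one matrix multiplication:
# (batch, in_features) @ (in_features, out_features) -> (batch, out_features)
batch, d_in, d_out = 32, 768, 3072
x = rng.standard_normal((batch, d_in))
w = rng.standard_normal((d_in, d_out))
y = x @ w

# Each output element needs d_in multiplies and ~d_in adds, so one
# layer costs roughly 2 * batch * d_in * d_out operations.
flops = 2 * batch * d_in * d_out
print(y.shape, f"{flops:,} ops")  # (32, 3072) 150,994,944 ops
```

Even this toy layer costs about 150 million operations per forward pass; a large transformer stacks hundreds of such multiplications per token, which is why accelerating matmul is the whole game.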

Matrix multiplication is one of the most fundamental operations in mathematics, yet finding the fastest way to perform it remains an open problem. While researchers have discovered provably faster methods for small matrices, such as 4×4, finding efficient algorithms for the much larger matrices seen in AI requires complex mathematical proofs.
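A classic example of such a faster method is Strassen's 1969 algorithm, which multiplies two 2×2 matrices with seven scalar multiplications instead of the naive eight; applied recursively to blocks, that saving compounds for large matrices. (This is the established textbook algorithm, shown here only to illustrate the kind of formula such research searches for.)

```python
import numpy as np

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices using 7 multiplications
    instead of the naive 8 (Strassen, 1969)."""
    a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    e, f, g, h = B[0, 0], B[0, 1], B[1, 0], B[1, 1]
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    # Recombine the 7 products into the 4 output entries
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
assert np.array_equal(strassen_2x2(A, B), A @ B)  # matches the naive product
```

Trading one multiplication for a handful of cheap additions seems minor at 2×2, but recursion turns it into an asymptotically faster algorithm, which is why discovering such formulae for larger block sizes matters.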

Solving for efficient matrix multiplication can cut down the compute resources required for training and inference. Other methods like quantisation and model shrinking have also proven to cut down on compute, but they sacrifice accuracy. A tech giant creating a state-of-the-art model would rather spend the \$5 million if there is no other way to cut costs. However, DeepMind found a way, and the ball is now in Google’s court.
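The accuracy trade-off of quantisation is easy to demonstrate. Below is a minimal sketch of symmetric int8 weight quantisation, not any particular library's implementation: weights shrink 4x, at the cost of a bounded rounding error on every value.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)  # toy float32 weights

# Symmetric int8 quantisation: map [-max|w|, +max|w|] onto [-127, 127]
scale = np.abs(w).max() / 127
q = np.round(w / scale).astype(np.int8)   # 4x smaller than float32
w_hat = q.astype(np.float32) * scale      # dequantised approximation

max_err = np.abs(w - w_hat).max()
print(q.nbytes, "vs", w.nbytes, "bytes")  # 1024 vs 4096 bytes
# The rounding error is bounded by half a quantisation step
assert max_err <= scale / 2 + 1e-6
```

The memory (and bandwidth) saving is exact; the error is small but nonzero, and it is precisely this loss that efficient matrix multiplication algorithms avoid, since they compute the same product with fewer operations.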

### GPUs vs TPUs

NVIDIA’s GPUs are well suited to matrix multiplication thanks to their hardware architecture, which parallelises the work effectively across many CUDA cores. Training models on GPUs became the status quo for deep learning in 2012, and the industry has never looked back.

Google, meanwhile, launched the first version of its tensor processing unit (TPU) in 2016, a custom ASIC (application-specific integrated circuit) optimised for tensor calculations. In addition to this optimisation, TPUs also work extremely well with Google’s TensorFlow framework, the tool of choice for machine learning engineers at the company. This gives them an edge in other AI compute tasks beyond matrix multiplication, and even allows them to speed up fine-tuning and inference.

In addition to this, researchers at Google’s DeepMind have found a method to discover better algorithms for matrix multiplication itself. Their AI system, AlphaTensor, searches for efficient multiplication formulae, treating the discovery of new algorithms as a game to be won.

While Google’s tech stack and emerging methods for optimising AI compute have yielded good results, competitors like Microsoft have been capitalising on NVIDIA’s industry entrenchment to eke out a competitive advantage. However, as today’s GPUs get more resource-heavy (in terms of power draw and cooling), enterprises are looking at alternatives. Moreover, AI needs compute to get better: one study found that the compute used in the largest AI training runs has been doubling every 3.4 months.

To serve this need, NVIDIA has entered the AI accelerator space. While competitors like AMD and Intel have already created competing products, NVIDIA’s industry know-how and its iron-fisted hold over CUDA have netted it an advantage yet again. With the launch of the NVIDIA DGX, companies were able to deploy a packaged hardware and software solution for any AI task: something that competitors still cannot offer due to a lack of comparable intellectual property.

While other competitors like AWS have also launched AI accelerators (see Trainium and Inferentia), the battlefield seems to be dominated by GPUs and TPUs for the time being. AI accelerators might figure into enterprises’ next upgrade cycle, but existing solutions provide USPs that cannot be replaced so easily.

The general-purpose nature of NVIDIA’s GPUs allows them to accelerate a wide variety of workloads, while the focused design of Google’s TPUs offers the best possible compute for those working within Google’s ecosystem of AI tools. A paradigm shift in this field might let one win over the other, but with Moore’s Law dying, we will have to wait a while before the war is won.
