As ChatGPT and Bard slug it out, two behemoths work in the shadows to keep them running – NVIDIA’s CUDA-powered GPUs (Graphics Processing Units) and Google’s custom-built TPUs (Tensor Processing Units). In other words, it’s no longer about ChatGPT vs Bard, but TPU vs GPU, and how effectively each can do matrix multiplication.
Why models should be optimised
Training costs are one of the biggest barriers to creating a large model. AI compute is generally measured in compute/GPU hours, which represents the time it takes for a model to be trained. Another unit, the petaflop/s-day (pf-day), is also used: one pf-day corresponds to performing 10^15 (a petaflop) operations per second, sustained for a whole day.
For context, the largest version of GPT-3, with 175 billion parameters, took 3,640 pf-days to train. That is the equivalent of sustaining a petaflop of operations per second for almost 10 years! However, with the aid of parallelisation and a little help from Azure’s supercomputing cluster, estimates place the time to train this model at about 34 days. In contrast, training the same model on a single NVIDIA V100 GPU would take over 355 years.
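A quick back-of-the-envelope check shows how these figures hang together. The sustained throughput below is an assumed ballpark for a V100 in mixed precision, not an official benchmark:

```python
# Sanity-checking the GPT-3 training figures (assumed numbers, not benchmarks):
# a V100 sustaining ~28 TFLOPS would take roughly 355+ years to work
# through ~3,640 pf-days of compute on its own.
PF_DAYS = 3640                  # reported training compute for GPT-3 175B
V100_SUSTAINED_PFLOPS = 0.028   # assumed sustained throughput (~28 TFLOPS)

days_on_one_gpu = PF_DAYS / V100_SUSTAINED_PFLOPS
years = days_on_one_gpu / 365
print(f"~{years:.0f} years on a single V100")  # ~356 years
```

The same arithmetic makes clear why parallelising across thousands of GPUs is the only practical route to a 34-day run.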
Even though cutting training time down to 34 days for such a big model is a feat in itself, the problem was solved with brute force. Estimates place the training cost of this model at about $5 million for compute alone. This estimate comes from cloud pricing of these GPUs, which generally cost $1.5 per hour per GPU.
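The $5 million figure can be roughly reproduced from that cloud price. The implied cluster size below is an inference from the 34-day estimate, not a published number:

```python
# Hedged sketch of where the ~$5M estimate comes from: at $1.5/GPU-hour,
# $5M buys about 3.3 million GPU-hours. Spread over a 34-day run, that
# implies a cluster of roughly 4,000 GPUs (an inference, not a
# published cluster size).
PRICE_PER_GPU_HOUR = 1.5
BUDGET = 5_000_000
RUN_DAYS = 34

gpu_hours = BUDGET / PRICE_PER_GPU_HOUR
gpus_implied = gpu_hours / (RUN_DAYS * 24)
print(f"~{gpu_hours:,.0f} GPU-hours, ~{gpus_implied:,.0f} GPUs")
```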
The single largest compute cost for training LLMs comes from matrix multiplication. In deep learning, these multiplications operate on tensors – multidimensional arrays that hold a model’s parameters and activations. Computing products of these tensors and feeding the outputs into the next layer of the algorithm constitutes a large part of training, and consumes a lot of compute due to the sheer scale of the operations.
Matrix multiplication is one of the fundamental operations of mathematics, but multiplying matrices efficiently is a surprisingly hard problem. While researchers have found efficient methods for matrices as small as 4×4, the much larger matrices seen in AI require complex mathematical proofs to multiply with fewer operations.
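The classic example of such an efficiency gain is Strassen’s 1969 algorithm, which multiplies two 2×2 matrices with seven scalar multiplications instead of the naive eight. A minimal sketch:

```python
# Strassen's algorithm for 2x2 matrices: 7 multiplications instead of
# the naive 8. Savings of this kind compound when applied recursively
# to large matrices, which is why finding such decompositions matters.
def strassen_2x2(a, b):
    """Multiply two 2x2 matrices using 7 multiplications (Strassen, 1969)."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Finding analogous decompositions for larger sizes is exactly the search problem that systems like AlphaTensor automate.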
Solving for efficient matrix multiplication can cut down the compute resources required for training and inference. Other methods, like quantisation and model shrinking, also reduce compute, but they sacrifice accuracy. A tech giant building a state-of-the-art model would rather spend the $5 million if there is no other way to cut costs. However, DeepMind found a way, and the ball is now in Google’s court.
GPUs vs TPUs
NVIDIA’s GPUs are well-suited to matrix multiplication because their hardware architecture parallelises the work across thousands of CUDA cores. Training models on GPUs became the status quo for deep learning in 2012, and the industry has never looked back.
Building on this, Google launched the first version of the tensor processing unit (TPU) in 2016 – a custom ASIC (application-specific integrated circuit) optimised for tensor calculations. Beyond this optimisation, TPUs also work extremely well with Google’s TensorFlow framework, the tool of choice for machine learning engineers at the company. This gives them an edge in AI compute tasks beyond matrix multiplication, and even speeds up fine-tuning and inference.
In addition, researchers at Google’s DeepMind have found a method to discover better matrix multiplication algorithms. Termed AlphaTensor, this AI system searches for efficient formulae for multiplying matrices, providing faster procedures that other algorithms can then use.
While Google’s tech stack and emerging methods for optimising AI compute have yielded good results, competitors like Microsoft have been capitalising on NVIDIA’s industry entrenchment to eke out a competitive advantage. However, as today’s GPUs get more resource-heavy (in terms of power draw and cooling), enterprises are looking at alternatives. Moreover, AI needs compute to get better, with one study predicting that AI compute needs will double every 3.4 months.
To serve this need, NVIDIA has entered the AI accelerator space. While competitors like AMD and Intel have already created competing products, NVIDIA’s industry know-how and its iron-fisted hold over CUDA have netted it an advantage yet again. With the launch of the NVIDIA DGX, companies could deploy a packaged hardware-and-software solution for any AI task – something competitors still cannot offer for lack of comparable intellectual property.
While other competitors like AWS have also launched AI accelerators (see Trainium and Inferentia), the battlefield seems to be dominated by GPUs and TPUs for the time being. AI accelerators might figure into enterprises’ next upgrade cycle, but existing solutions provide USPs that cannot be replaced so easily.
NVIDIA GPUs’ general-purpose nature allows them to accelerate a wide variety of workloads, while Google TPUs’ focused nature offers the best possible compute for those working in Google’s ecosystem of AI tools. A paradigm shift in this field might lead to one winning over the other, but the death of Moore’s Law suggests we will have to wait a while before the war is won.