In 2022, a multitude of generative AI models were released by research labs such as Microsoft-backed OpenAI and Alphabet’s DeepMind, among others. While the battle between Microsoft and Google rages on, a clear winner has already emerged: NVIDIA. Nearly all existing state-of-the-art models are trained on NVIDIA GPUs.
NVIDIA’s CUDA API and the suitability of its GPUs for AI workloads have made these chips a must-have for model training. Let’s delve deeper into how NVIDIA is positioned on every front and stands to profit the most from the coming AI wave.
NVIDIA’s command over AI
NVIDIA’s grip on the AI market is evident. For instance, OpenAI reportedly used over 10,000 NVIDIA A100 and H100 GPUs for training ChatGPT, and Stable Diffusion took around 200,000 GPU hours to train on NVIDIA A100 GPUs. Cloud providers such as AWS and Azure have also partnered with NVIDIA to create massive clusters of NVIDIA GPUs for enterprise training workloads. With its latest generation of GPUs, the H100 series, NVIDIA has solidified its lead in the AI training space.
The company is also doubling down on its market-leading position, launching the DGX series of purpose-built AI supercomputers based on the Volta GPU architecture. This series, alongside the Titan V and Quadro GV100, brings AI computation to enterprises and consumers alike. However, even though NVIDIA has the first-mover advantage and has led the market for the past decade, other companies appear to be catching up quickly.
NVIDIA’s long-time rival, AMD, has also made its first strides into the AI accelerator game. At CES 2023, AMD CEO Lisa Su announced the latest iteration of the AMD Instinct AI accelerator. This chip combines AMD’s CPU and GPU technology in one package and is aimed at deployment in enterprise use cases. However, AMD’s consumer GPUs remain far behind NVIDIA’s offerings in terms of usability for AI use cases, mainly due to the lack of a CUDA-equivalent API.
While AMD has dedicated resources to developing ROCm, its open software platform for machine learning applications, it has yet to reach the maturity CUDA boasts after 14 years of continuous development. NVIDIA GPUs are also better integrated into common AI frameworks such as TensorFlow and PyTorch, and typically encounter fewer bugs and issues in production environments.
In addition to AMD, tech giants like Google and Amazon have poured resources into custom silicon for AI workloads. AWS has created Inferentia, a chip targeted at cloud inference workloads, while Google has built the Tensor Processing Unit (TPU) for AI tasks running on TensorFlow. These chips can beat NVIDIA’s GPUs on specific workloads, but NVIDIA’s chips remain better suited to general-purpose tasks, making them usable across a wider range of applications.
GPUs and AI are inseparable
To understand why GPUs are a perfect fit for training large AI models, we must first look at their architecture. GPUs were originally created to render graphics and high-resolution images at a time when CPUs weren’t capable of doing so efficiently. These chips are made up of many small cores, dubbed CUDA cores by NVIDIA, which are grouped into units called streaming multiprocessors. Today, the average GPU contains thousands of these cores, built to execute many tasks in parallel.
Training AI algorithms suits GPUs because most training workloads consist of operations such as matrix multiplication, which are naturally parallel. This idea was first solidified in a 2009 research paper, ‘Large-scale Deep Unsupervised Learning using Graphics Processors’, by Rajat Raina, Anand Madhavan, and Andrew Y. Ng. In hindsight, exploiting the parallel processing capabilities of GPUs at a time when CPU core counts were low seems like an obvious choice.
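To see why matrix multiplication parallelises so well, note that every element of the output matrix is an independent dot product, so entire rows can be computed concurrently with no coordination. The following is a minimal, illustrative Python sketch (CPython’s GIL prevents true CPU parallelism for this arithmetic, but the decomposition into independent tasks is exactly what a GPU’s thousands of cores exploit):

```python
from concurrent.futures import ThreadPoolExecutor

def matmul_parallel(A, B):
    """Multiply matrices A and B, computing each output row as an
    independent task. Each element C[i][j] is a standalone dot product,
    which is why the workload maps naturally onto many parallel cores."""
    cols = list(zip(*B))  # transpose B so its columns are easy to index

    def row(i):
        # Dot product of row i of A with every column of B
        return [sum(a * b for a, b in zip(A[i], col)) for col in cols]

    # Each row is submitted as a separate task to the thread pool
    with ThreadPoolExecutor() as pool:
        return list(pool.map(row, range(len(A))))

# Example: a 2x2 multiplication
print(matmul_parallel([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# → [[19, 22], [43, 50]]
```

The same decomposition scales to the huge matrices in neural network training, where a GPU can assign each output tile to its own group of cores.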
This work sparked a trend of AI researchers using GPUs for training, leading NVIDIA to add more AI capabilities to its chips. It began with updates to CUDA, the API that lets developers directly address GPU cores, followed by the addition of dedicated compute units for AI workloads known as ‘Tensor cores’. These cores are optimised for mixed-precision matrix multiply-accumulate operations, the floating-point workhorse of AI training and inference workloads.
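To make that concrete, the operation a Tensor core accelerates is a fused matrix multiply-accumulate, D = A × B + C, performed on small tiles (4×4 half-precision inputs with higher-precision accumulation on Volta). A plain-Python sketch of the arithmetic, not of the hardware:

```python
def tensor_core_mma(A, B, C):
    """Fused multiply-accumulate D = A @ B + C on a small square tile.
    This is the arithmetic a Tensor core performs in a single hardware
    operation; in silicon the inputs are low-precision (e.g. FP16) and
    the accumulation happens at higher precision (e.g. FP32)."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) + C[i][j]
             for j in range(n)] for i in range(n)]

# Example: multiply by the identity matrix, then accumulate a matrix of ones
print(tensor_core_mma([[1, 0], [0, 1]],
                      [[2, 3], [4, 5]],
                      [[1, 1], [1, 1]]))
# → [[3, 4], [5, 6]]
```

Because a dense layer's forward and backward passes reduce to exactly these tiled multiply-accumulates, hardware that fuses them delivers large speedups for both training and inference.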
Even as custom silicon makes its way into the enterprise market, and eventually to end consumers through cloud services, the availability and software support of NVIDIA’s consumer GPUs have given the company an unshakeable hold on the AI market. The pervasiveness of CUDA, coupled with the Tensor cores on its latest GPUs, has established NVIDIA as the reigning leader of the AI accelerator space. While open-source APIs such as OpenCL have made inroads into CUDA’s dominance, the vertical integration of NVIDIA’s tech stack dissuades competition, at least until rivals catch up with its formidable R&D efforts.