NVIDIA has been at the forefront of the generative AI wave and, thanks to its GPUs, one of its biggest beneficiaries. The chip giant issued a staggering revenue forecast of $11 billion for the next quarter, more than 50% higher than Wall Street’s estimates.
But the boom has also been a double-edged sword for chip manufacturers, because demand has outpaced supply by a huge margin. The Wall Street Journal quoted Sharon Zhou, co-founder and CEO of Lamini, a startup that helps companies build AI chatbots, as saying, “It’s like the toilet paper during the pandemic… because there is a shortage, it’s about who you know.”
The generative AI revolution, along with capacity expansion by big-data and server operators, is causing a shortage. Companies are racing to acquire GPUs for training large language and vision models, since GPUs excel at parallel workloads like matrix multiplication. The scarcity of the advanced chips used in generative AI systems has set off a competition to secure spare capacity and computing power.
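To see why GPUs fit these workloads so well, note that matrix multiplication breaks down into many independent dot products that can all be computed at once. The sketch below uses NumPy purely for illustration (it runs on a CPU; GPU libraries parallelise the same operation across thousands of cores):

```python
import numpy as np

# A neural-network layer is, at its core, a matrix multiplication:
# activations (A) times weights (B). Sizes here are illustrative.
rng = np.random.default_rng(0)
A = rng.random((64, 64))
B = rng.random((64, 64))

# One call computes 64*64 output elements, each an independent
# dot product -- exactly the pattern GPUs accelerate.
C = A @ B

# The explicit loop shows why this parallelises so well: no output
# element depends on any other, so all can run simultaneously.
C_loop = np.empty((64, 64))
for i in range(64):
    for j in range(64):
        C_loop[i, j] = A[i, :] @ B[:, j]

assert np.allclose(C, C_loop)
```

Training a large language model repeats operations like this trillions of times, which is why access to GPUs has become the bottleneck.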
The impact is such that NVIDIA’s GPU inventory tumbled 10.6% in March, its first decline since the fourth quarter of 2019. Elon Musk is said to have acquired a significant portion of Oracle’s available server capacity for X.AI, his OpenAI rival, leaving smaller startups in the dust. He also famously quipped, “GPUs at this point are considerably harder to get than drugs.”
Sam Altman, meanwhile, believes the era of ever-larger AI models is ending, owing to the cost of training and running large language models on powerful GPUs and to diminishing returns. But NVIDIA is looking to save the day and keep the chips flowing. The market leader, having recently crossed a trillion-dollar market cap, is focused on building an AI-powered future that it hopes will deliver its next trillion. CEO Jensen Huang outlined the company’s plans at the Computex conference, where NVIDIA laid out how it aims to diversify its AI portfolio and capitalise on the market.
This is business
NVIDIA CEO Jensen Huang announced that the company will make its powerful supercomputers available for rent to businesses. Through the DGX Cloud rental service, developers can access tens of thousands of NVIDIA chips, including the flagship A100 and H100, to accelerate AI development. Priced at $37,000 per month for an eight-chip instance, the service is aimed at meeting the AI industry’s surging demand. NVIDIA is collaborating with major companies such as Oracle, Microsoft, and Alphabet to offer the supercomputers as a service, further expanding access to its technology.
To make AI products more cost-effective, NVIDIA unveiled new chips and software solutions at the event. It also launched AI Foundations, a service designed to help companies train customised AI models. Additionally, NVIDIA introduced technology to speed up semiconductor design and manufacturing, cutting calculation times from weeks to overnight. Collaborations with industry giants like AT&T, Taiwan Semiconductor Manufacturing Co (TSMC), and ASML Holding are in progress to bring these advancements to market.
Huang emphasised that NVIDIA’s entire data centre product lineup, including the H100, Grace CPU, Grace Hopper Superchip, NVLink, Quantum 400 InfiniBand, and BlueField-3 DPU, is now in production to meet the increasing demand. The company announced that the GeForce RTX 4060 Ti GPU for gamers, HGX H100 GPU server, and GH200 Grace Hopper superchip are also in full production. These developments highlight NVIDIA’s commitment to advancing AI computing and providing high-performance solutions for a range of applications.
In particular, the GH200 Grace Hopper Superchip, which combines the Arm-based NVIDIA Grace CPU and Hopper GPU architectures, is now in full production. The GH200 offers high bandwidth and compute capability, making it suitable for complex AI and high-performance computing workloads. Global hyperscalers, supercomputing centres, and system manufacturers like Cisco, Dell, and Lenovo will have access to GH200-powered systems. NVIDIA’s software stack, including NVIDIA AI, the Omniverse platform, and RTX technology, will be supported across these accelerated systems, with availability expected later this year.
NO competition, whatsoever!
NVIDIA currently dominates the global GPU market with an 88% share, while competitors AMD and Intel split the remaining 12%. Other companies are nonetheless entering the AI accelerator market, including AMD with its Instinct accelerators. AMD’s consumer GPUs have been less well-suited to AI applications than NVIDIA’s, largely because AMD lacked an equivalent to NVIDIA’s CUDA API, though it has since developed the open ROCm software platform for machine learning. CUDA, however, remains more mature and widely integrated, offering better compatibility with popular AI frameworks like TensorFlow and PyTorch.
In an effort to reduce dependence on NVIDIA, major tech giants like Google and Amazon have developed their own custom chips tailored for AI workloads. AWS has introduced Inferentia for inference tasks, while Google built the Tensor Processing Unit (TPU) specifically for TensorFlow. Despite the emergence of these custom chips, NVIDIA has been a key player in the AI space for a long time, since well before GPUs became the standard hardware for training AI algorithms.