Last updated November 27, 2023

NVIDIA Rides High on InfiniBand

“The vast majority of the dedicated large scale AI factories standardise on InfiniBand,” said Jensen Huang during NVIDIA’s Q3 earnings call


Illustration by Nikhil Kumar

Published on November 27, 2023

by Vandana Nair


NVIDIA’s latest Q3 earnings reflect the tech giant’s continued growth. The company reported revenue of $18.12 billion, up 206% YoY and 34% from the previous quarter. It attributed the growth to the continued ramp of its NVIDIA HGX platform along with end-to-end networking via InfiniBand.

NVIDIA also called out the contribution of networking, which has now exceeded a $10 billion annualised revenue run rate, nearly tripling from the previous year. This is attributed to rising demand for InfiniBand, which grew fivefold YoY.
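The growth figures above can be sanity-checked with a little arithmetic. The sketch below backs out the implied year-ago and previous-quarter revenue from the reported percentages, and the quarterly networking revenue implied by the run rate. It assumes that "annualised run rate" means the latest quarter multiplied by four, which is the usual convention but is not spelled out in the earnings figures quoted here; the function and variable names are illustrative.

```python
def implied_prior(current: float, growth_pct: float) -> float:
    """Back out an earlier-period value from the current value
    and the percentage increase reported over that period."""
    return current / (1 + growth_pct / 100)


q3_revenue_bn = 18.12  # reported Q3 revenue, in billions of USD

# A 206% increase means revenue is 3.06x the year-ago figure.
year_ago_bn = implied_prior(q3_revenue_bn, 206)

# A 34% increase means revenue is 1.34x the previous quarter.
prev_quarter_bn = implied_prior(q3_revenue_bn, 34)

# Assumption: annualised run rate = latest quarter x 4, so a
# $10 billion run rate implies at least this much per quarter.
networking_quarter_bn = 10 / 4

print(f"Implied year-ago quarter:   ${year_ago_bn:.2f}B")
print(f"Implied previous quarter:   ${prev_quarter_bn:.2f}B")
print(f"Implied networking/quarter: ${networking_quarter_bn:.2f}B or more")
```

On these numbers, revenue was roughly $5.9 billion a year earlier and roughly $13.5 billion in the previous quarter, with networking contributing upwards of $2.5 billion a quarter.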

A Complete Architecture

InfiniBand is considered critical for achieving the scale and performance needed to train LLMs. Combined with NVIDIA HGX, it forms the foundational architecture for AI supercomputers and data centre infrastructure. InfiniBand is commonly used in supercomputing environments to interconnect servers, and its biggest advantage is low-latency, high-bandwidth communication, which is crucial for parallel processing tasks. For workloads involving extremely large datasets and ultra-fast processing of high-resolution simulations, NVIDIA’s Quantum InfiniBand switches are said to meet these needs with lower cost and complexity.

A few months ago, NVIDIA reached breakthrough performance with its leading H100 chip. The tests were run on 3,584 H100 GPUs connected with InfiniBand, which allowed the GPUs to deliver performance both individually and at scale, proving the chip’s prowess when combined with high-performance networking.

InfiniBand: The Preferred Choice

Speaking about the future of InfiniBand, Jensen Huang said that the vast majority of the dedicated large scale AI factories standardise on InfiniBand, not only because of its data rate and latency but because “the way traffic moves around the network” is important. He also called it a “computing fabric.”

Comparing it to Ethernet, Huang pointed to the huge difference between the two. For a $2 billion investment in AI factory infrastructure, a variance of 20 or 30% in overall effectiveness translates into hundreds of millions of dollars of value, which accumulates into significant costs over the next 4-5 years.
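The arithmetic behind Huang’s point is simple. The sketch below applies a range of effectiveness gaps to the $2 billion figure; it treats the gap as a straight percentage of the investment, which is a simplifying assumption, and the function name and the 25% midpoint are illustrative rather than taken from the call.

```python
INVESTMENT_USD = 2_000_000_000  # the $2 billion infrastructure figure


def value_at_stake(investment_usd: int, gap_pct: int) -> int:
    """Dollar value of an effectiveness gap, assuming the gap applies
    as a simple percentage of the total investment."""
    return investment_usd * gap_pct // 100


# 20% and 30% are the bounds quoted above; 25% is an added midpoint.
for gap_pct in (20, 25, 30):
    dollars = value_at_stake(INVESTMENT_USD, gap_pct)
    print(f"{gap_pct}% gap -> ${dollars / 1_000_000:,.0f} million")
```

Even at the low end of the range, the gap is worth $400 million on a single $2 billion build-out, before accounting for the multi-year horizon.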

Huang calls InfiniBand’s value proposition “undeniable for AI factories.” However, Ethernet is not ruled out. While InfiniBand is used where high bandwidth and low latency are required, Ethernet finds applicability in other scenarios.

Ethernet, a widely used general-purpose networking technology for wired local area networks (LANs), suits a broad range of applications and is geared more towards connecting terminal devices. However, it cannot match InfiniBand’s capabilities for these workloads.

Interestingly, NVIDIA also offers gateway appliances connecting InfiniBand data centres to Ethernet-based infrastructures and storage. NVIDIA will also release Spectrum-X in Q1 next year, an Ethernet offering that is said to achieve 1.6x higher networking performance when compared to other available Ethernet technologies.

As for alternatives, Intel’s Omni-Path Architecture (OPA) was designed for high-speed data transfer and low-latency communication in HPC environments. It was released in 2016 but discontinued in 2019. Cisco, on the other hand, has Ethernet-based switches but nothing in the HPC space.

An Integrated Expansion

With both GPU and networking offerings available, enterprises can now build their entire architecture from NVIDIA products. In addition to speaking about its partnerships with Reliance, Infosys and Tata, NVIDIA mentioned collaborations with organisations adopting InfiniBand for their AI compute needs.

In the earnings call, NVIDIA spoke about its partnership with Scaleway, a French private cloud provider that will build its regional AI cloud on NVIDIA H100, InfiniBand and AI Enterprise software to power AI advancements across Europe.

Furthermore, Jülich, a German supercomputing centre, announced plans to build its next-gen AI supercomputer using close to 24,000 Grace Hopper Superchips and Quantum-2 InfiniBand, making it the world’s most powerful AI supercomputer with over 90 exaflops of AI performance.

Interestingly, Microsoft Azure uses over 29,000 miles of InfiniBand cabling. Microsoft uses InfiniBand-enabled HB- and N-series virtual machines to deliver cost-efficient HPC.

By bundling networking and GPUs, NVIDIA is boosting its growth and its standing in the supercomputer market. Given the lack of alternatives to NVIDIA’s InfiniBand, the company’s dominance looks set to grow further, ultimately making it indispensable for companies looking to use GPUs and networking together.


Vandana Nair

With a rare blend of engineering, MBA, and journalism degrees, Vandana Nair brings a unique combination of technical know-how, business acumen, and storytelling skills to the table. Her insatiable curiosity for all things startups, businesses, and AI technologies ensures that there's always a fresh and insightful perspective to her reporting.