Why NVIDIA’s new H100 GPU is a breakthrough

NVIDIA has highlighted the Transformer Engine for its ability to combine data formats with algorithms to speed up hardware performance.

At its annual GTC conference, NVIDIA made a batch of new announcements, including a new GPU architecture called Hopper and the first data centre product based on it, the H100. The H100 comprises 80 billion transistors and employs a new Transformer Engine. In his keynote address, CEO Jensen Huang noted that data centres were becoming “AI factories” as training AI models required huge computing power. This growing need for computing power has worked to NVIDIA’s advantage, since GPUs are well suited to AI’s deep learning methods.

New transformer engine

NVIDIA has highlighted the Transformer Engine for its ability to combine data formats with algorithms to speed up hardware performance. The Transformer, a machine learning architecture introduced in 2017, quickly became the obvious choice for natural language processing models. OpenAI’s GPT-3 and DeepMind’s AlphaFold are both powered by transformers. Transformer-based language models have grown exponentially in size in recent years. While GPT-3, one of the largest language models, has 175 billion parameters, Google trained a 1.6 trillion-parameter model last year.

“The computing requirements to train large transformer models has been exploding. Training these giant models still takes months. Even on one of the world’s fastest AI supercomputers–Nvidia’s Selene–training the Megatron 530 model [Megatron-Turing NLG 530B, the largest natural language processing training model, at 530 billion parameters] would take one and a half months,” Paresh Kharya, NVIDIA’s senior director of product management and marketing for accelerated computing, said at a press briefing.

Kharya said that the biggest challenge with cutting down on training time was that as the number of GPUs increased, the performance gains decreased. Hopper was built to tackle exactly that.

The Transformer Engine works with 16-bit floating-point precision together with a newly added 8-bit floating-point data format. Floating-point numbers, which can represent fractional values, are the basis of AI training. AI floating-point work is usually done in 16-bit half precision (FP16), 32-bit single precision (FP32) and 64-bit double precision (FP64). The Transformer Engine uses NVIDIA’s fourth-generation Tensor Cores to mix the FP8 and FP16 formats, automatically choosing between the two.
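To make the trade-off between these formats concrete, here is a minimal Python sketch. It is not NVIDIA code and does not use the Transformer Engine; the helper names `round_trip` and `weight_gigabytes` are made up for this illustration. It uses the standard library `struct` module to store a value in FP16, FP32 and FP64 and read it back, and it estimates how much memory the weights of a 175-billion-parameter model would take at each width. FP8 appears only in the size table, because `struct` has no 8-bit floating-point code.

```python
import struct

# struct format codes for the IEEE 754 formats named in the article.
# FP8 has no struct code, so it is listed only in the size table below.
FORMATS = {"FP16": "<e", "FP32": "<f", "FP64": "<d"}
BYTES_PER_VALUE = {"FP8": 1, "FP16": 2, "FP32": 4, "FP64": 8}


def round_trip(value: float, fmt: str) -> float:
    """Store a value in the given format and read it back."""
    code = FORMATS[fmt]
    return struct.unpack(code, struct.pack(code, value))[0]


def weight_gigabytes(num_parameters: int, fmt: str) -> float:
    """Gigabytes (10**9 bytes) needed to hold the weights alone."""
    return num_parameters * BYTES_PER_VALUE[fmt] / 1e9


if __name__ == "__main__":
    # Narrower formats lose more precision when storing the same value.
    for fmt in FORMATS:
        stored = round_trip(0.1, fmt)
        print(f"{fmt}: 0.1 is stored as {stored!r}, error {abs(stored - 0.1):.3e}")

    # Narrower formats also need proportionally less memory per weight.
    for fmt in BYTES_PER_VALUE:
        size = weight_gigabytes(175_000_000_000, fmt)
        print(f"{fmt}: 175 billion weights need {size:,.0f} GB")
```

The sketch shows both sides of the bargain: each halving of the width halves the memory and data movement per value, at the cost of a larger rounding error, which is why choosing the format per operation can pay off.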

Enter Hopper

A successor to NVIDIA’s two-year-old Ampere GPU architecture, Hopper is named after computing pioneer and US Navy Rear Admiral Grace Hopper. At the same event, NVIDIA also unveiled the Grace CPU Superchip, likewise named after her.

The Hopper architecture has other important features such as: 

  • Confidential Computing: This enables customer data and AI models to be protected while they are being processed. Confidential computing is a valuable addition to GPUs, as it was previously offered only on CPUs. It is particularly relevant in fields such as healthcare and finance, where sensitive data is handled.
  • 2nd-Generation Secure Multi-Instance GPU technology: This divides a GPU into up to seven smaller, isolated instances so that each can handle a different task.
  • 4th-Generation NVIDIA NVLink: NVLink can connect up to 256 H100 GPUs at nine times the bandwidth of the previous generation, which used NVIDIA HDR Quantum InfiniBand. The company claims that it can make transformer model networks six times faster than before.
  • DPX Programming Instructions: This is NVIDIA’s first GPU with DPX instructions. Dynamic programming solves problems using recursion and memoization, and NVIDIA says the new instructions accelerate such algorithms by up to seven times compared with previous-generation GPUs.
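For readers unfamiliar with dynamic programming, the following Python sketch illustrates the general technique that the DPX bullet refers to: recursion combined with a cache of already-solved subproblems. It is a plain CPU illustration and does not use DPX instructions or any NVIDIA API. The choice of edit distance as the example problem and the function name `edit_distance` are assumptions made for this sketch.

```python
from functools import lru_cache


def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn string a into string b."""

    @lru_cache(maxsize=None)  # cache each (i, j) subproblem after its first solve
    def solve(i: int, j: int) -> int:
        if i == len(a):              # a is exhausted: insert the rest of b
            return len(b) - j
        if j == len(b):              # b is exhausted: delete the rest of a
            return len(a) - i
        if a[i] == b[j]:             # characters match: no edit needed here
            return solve(i + 1, j + 1)
        return 1 + min(
            solve(i + 1, j),         # delete a[i]
            solve(i, j + 1),         # insert b[j]
            solve(i + 1, j + 1),     # substitute a[i] with b[j]
        )

    return solve(0, 0)


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

Without the cache, the same subproblems would be recomputed many times over; with it, each pair of positions is solved once. The inner step is a minimum over a few neighbouring results plus an addition, a pattern that recurs throughout dynamic programming.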

The H100 is built on TSMC’s 4N manufacturing process and can support external connectivity of nearly 5 terabytes per second. NVIDIA has also claimed that it is the first GPU to support PCIe Gen5 and the first to use HBM3, with 3TBps of memory bandwidth. NVIDIA says the H100 is three times faster than the previous-generation A100. According to the company, twenty H100s could sustain the equivalent of the entire world’s internet traffic.

The company announced that the H100 will be available from the third quarter and is intended for on-premises, cloud, hybrid-cloud and edge data centres. The H100 will also be available in NVIDIA’s fourth-generation DGX system, the DGX H100.

All of the announcements were in keeping with NVIDIA’s supercomputing and AI objectives. At the same event, the company also stated that its new Eos supercomputer would be built using 18 H100 SuperPods, 576 DGX H100 systems and 360 NVLink Switches.

Poulomi Chatterjee
Poulomi is a Technology Journalist with Analytics India Magazine. Her fascination with tech and eagerness to dive into new areas led her to the dynamic world of AI and data analytics.
