At its annual GTC conference, NVIDIA made a batch of announcements, including a new GPU architecture called Hopper and the first data centre product based on it, the H100. The H100 packs 80 billion transistors and introduces a new Transformer Engine. In his keynote address, CEO Jensen Huang noted that data centres were becoming “AI factories” as training AI models requires huge computing power. This growing need for compute has worked to NVIDIA’s advantage, since GPUs are well suited to AI’s deep learning methods.
New transformer engine
NVIDIA has highlighted the Transformer Engine for its ability to combine data formats with algorithms to speed up hardware performance. The Transformer, a machine learning architecture, gained popularity in 2017 and became an obvious choice for natural language processing models. OpenAI’s GPT-3 and DeepMind’s AlphaFold are built on transformers. Transformer-based language models have grown exponentially in size in recent years: GPT-3, one of the largest, has 175 billion parameters, and last year Google trained a 1.6 trillion-parameter model.
“The computing requirements to train large transformer models has been exploding. Training these giant models still takes months. Even on one of the world’s fastest AI supercomputers–Nvidia’s Selene–training the Megatron 530 model [Megatron-Turing NLG 530B, the largest natural language processing training model, at 530 billion parameters] would take one and a half months,” Paresh Kharya, NVIDIA’s senior director of product management and marketing for accelerated computing, said at a press briefing.
Kharya said that the biggest challenge with cutting down on training time was that as the number of GPUs increased, the performance gains decreased. Hopper was built to tackle exactly that.
The Transformer Engine supports 16-bit floating-point precision along with a new 8-bit floating-point data format. Floating-point numbers, which can represent fractional values, are the basis of AI training. AI floating-point work is usually done in 16-bit half-precision (FP16), 32-bit single-precision (FP32) and 64-bit double-precision (FP64). The Transformer Engine uses NVIDIA’s fourth-generation Tensor Cores to mix FP8 and FP16, automatically choosing between the two formats.
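To make the trade-off between these formats concrete, here is a minimal sketch that stores the same value in the 16-, 32- and 64-bit formats mentioned above and reads it back. It runs on the CPU and uses only Python’s standard `struct` module, which has no FP8 format, so the new 8-bit format is not shown. The helper names `round_trip` and `precision_table` are made up for this illustration and have nothing to do with NVIDIA’s software.

```python
import struct

# struct format codes for IEEE 754 half, single and double precision.
FORMATS = {"FP16": "<e", "FP32": "<f", "FP64": "<d"}


def round_trip(value: float, fmt: str) -> float:
    """Store `value` in the given binary format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def precision_table(value: float) -> dict[str, float]:
    """Return the value as it survives storage in each format."""
    return {name: round_trip(value, fmt) for name, fmt in FORMATS.items()}


if __name__ == "__main__":
    original = 0.1
    for name, stored in precision_table(original).items():
        size = struct.calcsize(FORMATS[name])
        error = abs(stored - original)
        print(f"{name} ({size} bytes): {stored!r}, error {error:.3e}")
```

Fewer bits mean less memory and bandwidth per number but a larger rounding error, which is why choosing the format per operation matters.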
A successor to NVIDIA’s two-year-old Ampere GPU architecture, the Hopper architecture is named after computing pioneer and US Navy Rear Admiral Grace Hopper. At the same event, NVIDIA also unveiled the Grace CPU Superchip, likewise named after her.
The Hopper architecture has other important features such as:
- Confidential Computing: This enables customer data and AI models to be protected while they are being processed. Confidential computing is a valuable addition to GPUs, as it was previously offered only on CPUs. It is particularly relevant in fields such as healthcare and finance, where sensitive data is involved.
- 2nd Generation Secure Multi-Instance GPU technology: This divides a GPU into as many as seven smaller, isolated instances so that they can handle different types of tasks.
- 4th-Generation NVIDIA NVLink: The NVLink can connect up to 256 H100 GPUs at nine times the bandwidth of the previous generation, which used NVIDIA HDR Quantum InfiniBand. The company claims that it can make Transformer model networks six times faster than before.
- DPX Programming Instructions: This is NVIDIA’s first GPU with DPX instructions. Dynamic programming solves problems using recursion and memoization, and NVIDIA says the new instructions accelerate such algorithms by up to seven times compared with previous-generation GPUs.
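For readers unfamiliar with the dynamic programming mentioned in the last item, the following is a minimal sketch of recursion with memoization, using edit distance between two strings as the example problem. It is plain Python running on the CPU and does not use DPX instructions or any NVIDIA API. The choice of problem and the function name `edit_distance` are assumptions made for illustration.

```python
from functools import lru_cache


def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn `a` into `b`."""

    @lru_cache(maxsize=None)  # memoization: each (i, j) is solved once
    def solve(i: int, j: int) -> int:
        # Distance between the suffixes a[i:] and b[j:].
        if i == len(a):
            return len(b) - j  # insert the rest of b
        if j == len(b):
            return len(a) - i  # delete the rest of a
        if a[i] == b[j]:
            return solve(i + 1, j + 1)  # characters match, no cost
        return 1 + min(
            solve(i + 1, j),      # delete a[i]
            solve(i, j + 1),      # insert b[j]
            solve(i + 1, j + 1),  # substitute a[i] with b[j]
        )

    return solve(0, 0)


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

Without the cache, the recursion would solve the same subproblems repeatedly. Storing each result once is what makes the approach efficient.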
The H100 is built using TSMC’s 4N (4nm-class) manufacturing process and supports external connectivity of nearly 5 terabytes per second. NVIDIA also claims that it is the first GPU to support PCIe Gen5 and the first to use HBM3, with 3TBps of memory bandwidth. According to NVIDIA, the H100 is three times faster than the previous-generation A100, and twenty H100s can sustain the equivalent of the entire world’s internet traffic.
The company announced that the H100 will be available from the third quarter for on-premises, cloud, hybrid-cloud and edge data centres. It will also be available in NVIDIA’s fourth-generation DGX system, the DGX H100.
All of the announcements were in keeping with NVIDIA’s supercomputing and AI objectives. At the same event, the company also said that its new Eos supercomputer would be built using 18 H100 SuperPods, 576 DGX H100 systems and 360 NVLink Switches.
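As a rough sanity check on the Eos figures, the short sketch below works out how many DGX H100 systems fall in each SuperPod and how many GPUs that implies in total. The SuperPod, DGX and switch counts are taken from the article. The figure of eight H100 GPUs per DGX H100 system is an assumption not stated here, and the function name `eos_breakdown` is illustrative.

```python
# Figures quoted in the article.
SUPERPODS = 18
DGX_SYSTEMS = 576
NVLINK_SWITCHES = 360

# Assumption (not stated in the article): eight H100 GPUs per DGX H100.
GPUS_PER_DGX = 8


def eos_breakdown() -> dict[str, int]:
    """Derive per-SuperPod and total GPU counts from the quoted figures."""
    dgx_per_superpod, remainder = divmod(DGX_SYSTEMS, SUPERPODS)
    if remainder:
        raise ValueError("DGX systems do not divide evenly across SuperPods")
    return {
        "dgx_per_superpod": dgx_per_superpod,
        "total_gpus": DGX_SYSTEMS * GPUS_PER_DGX,
    }


if __name__ == "__main__":
    for key, value in eos_breakdown().items():
        print(f"{key}: {value}")
```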