NVIDIA H100 Vs A100: Which is the best GPU?

NVIDIA CEO Jensen Huang introduced the new NVIDIA H100 Tensor Core GPU at NVIDIA GTC 2022. The GPU is based on NVIDIA’s new Hopper GPU architecture. Its predecessor, the NVIDIA A100, is one of the best GPUs for deep learning. Is the H100 a worthy successor, and how do the two compare? Analytics India Magazine analyses this.

The H100

The NVIDIA H100 is the company’s ninth-generation data centre GPU, packed with 80 billion transistors. NVIDIA calls the Hopper-based chip “the world’s largest and most powerful accelerator”, suited to large-scale AI and HPC models. The company bases that claim on features such as:

  • World’s Most Advanced Chip
  • New Transformer Engine that speeds up networks by up to 6x over the previous generation
  • Confidential Computing
  • 2nd-Generation Secure Multi-Instance GPU (MIG), with MIG capabilities extended by up to 7x over the previous generation
  • 4th-Generation NVIDIA NVLink, connecting up to 256 H100 GPUs at 9x higher bandwidth
  • New DPX Instructions that can accelerate dynamic programming by up to 40x compared with CPUs and up to 7x compared with previous-generation GPUs

NVIDIA promises that the new GPU and the Hopper architecture will power much of the upcoming research in AI. It credits the highly scalable NVLink® interconnect for advancing gigantic AI language models, deep recommender systems, genomics and complex digital twins. The H100 is also pitched at AI inference for real-time and immersive applications that use giant-scale models, such as chatbots capable of real-time conversation based on Megatron 530B, described as the most powerful monolithic transformer language model, and at faster training of massive models.

The A100

The NVIDIA A100 Tensor Core GPU, announced in 2020, was built to power what NVIDIA then called the world’s highest-performing elastic data centres for AI, data analytics and HPC. NVIDIA says the Ampere architecture provides up to 20x higher performance than its predecessor, and an A100 can be divided into as many as seven GPU instances that adjust dynamically to shifting demands. Arriving amid the digital transformation wave of 2020 and the pandemic, the A100 features multi-instance GPU (MIG) virtualisation and GPU partitioning, which is particularly useful for cloud service providers (CSPs). With MIG, the A100 lets CSPs deliver up to 7x more GPU instances. In addition, the GPU includes third-generation Tensor Cores that boost throughput over the V100 and support a range of DL and HPC data types.

Figure: NVIDIA H100 preliminary HPC and AI performance chart, using 8 to 256 H100 GPUs
Figure: A100 vs H100 comparison by NVIDIA



The GPU architecture

NVIDIA Hopper

NVIDIA Hopper, named after computer scientist Grace Hopper, was released two years after the previous NVIDIA Ampere architecture. Hopper extends MIG capabilities by up to 7x over the previous generation and offers secure multitenant configurations in cloud environments across each GPU instance. It introduces features that improve asynchronous execution, allowing memory copies to overlap with computation while minimising synchronisation points. Hopper is also meant to shorten the long training periods of giant models without sacrificing GPU performance: NVIDIA says the architecture accelerates the training of Transformer models on H100 GPUs by up to 6x.
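
To see why overlapping memory copies with computation matters, a toy timing model can help. The sketch below is not NVIDIA's implementation or a measurement of real hardware; it assumes a hypothetical workload split into equal chunks, one copy engine and one compute engine, and made-up per-chunk times.

```python
def serial_time(n_chunks, copy_ms, compute_ms):
    """Total time when every chunk is copied and then computed, with no overlap."""
    return n_chunks * (copy_ms + compute_ms)


def overlapped_time(n_chunks, copy_ms, compute_ms):
    """Total time for a two-stage pipeline: chunk i+1 is copied while chunk i computes.

    The first copy and the last compute cannot be hidden; every other step
    costs only the slower of the two stages.
    """
    if n_chunks == 0:
        return 0.0
    return copy_ms + compute_ms + (n_chunks - 1) * max(copy_ms, compute_ms)


# Hypothetical workload: 8 chunks, 1 ms to copy and 2 ms to compute each.
serial = serial_time(8, 1.0, 2.0)
overlapped = overlapped_time(8, 1.0, 2.0)
print(f"serial: {serial} ms, overlapped: {overlapped} ms")
```

In this model the copies are hidden almost entirely behind the compute stage, so the saving grows with the number of chunks and is largest when copy and compute times are similar.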

Figure: Asynchronous execution concurrency and enhancements in NVIDIA Hopper

NVIDIA Ampere

NVIDIA describes its Ampere architecture as the “heart of the world’s highest-performing, elastic data centres.” Ampere supports elastic computing with high acceleration at every scale. The chip is built with 54 billion transistors, which NVIDIA called the largest 7-nanometre (nm) chip ever built. In addition, it provides L2 cache residency controls to manage which data to keep in or evict from the cache. Ampere also includes NVIDIA’s third generation of NVLink®, which doubles GPU-to-GPU direct bandwidth to 600 gigabytes per second (GB/s) so that applications can scale across multiple GPUs.
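
As a rough illustration of what the doubled NVLink bandwidth means, the sketch below computes an idealised lower bound on transfer time. It assumes the quoted 600 GB/s is sustained as a single peak rate, ignores latency and protocol overhead, takes 1 GB as 10^9 bytes, and uses a hypothetical 40 GB payload; the 300 GB/s figure is simply half of 600, as implied by "doubled".

```python
GB = 1e9  # bytes per gigabyte (decimal), an assumption for this sketch


def ideal_transfer_seconds(payload_bytes, bandwidth_gb_per_s):
    """Best-case transfer time at a given peak bandwidth, ignoring latency."""
    return payload_bytes / (bandwidth_gb_per_s * GB)


payload = 40 * GB  # hypothetical payload size

ampere_s = ideal_transfer_seconds(payload, 600)    # third-generation NVLink
previous_s = ideal_transfer_seconds(payload, 300)  # half of 600 GB/s

print(f"600 GB/s: {ampere_s * 1000:.1f} ms, 300 GB/s: {previous_s * 1000:.1f} ms")
```

Real transfers will be slower than these bounds, but the ratio between the two generations stays the same under this model.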

Feature specifications

H100 

NVIDIA’s H100 is fabricated on TSMC’s 4N process with 80 billion transistors. NVIDIA reports up to 9x faster training than the A100, a figure it quotes for a 395-billion-parameter model. “NVIDIA H100 is the first truly asynchronous GPU”, the team stated. The GPU extends the A100’s ‘global-to-shared asynchronous transfers’ across all address spaces. It also extends the CUDA thread group hierarchy with a new level called the thread block cluster.

The H100 builds upon the A100 Tensor Core GPU SM architecture and has a compute capability of 9.0. With the introduction of FP8, the H100 SM quadruples the A100’s peak per-SM floating-point computational power, and it doubles the A100’s raw SM computational power on all previous Tensor Core, FP32 and FP64 data types, clock-for-clock. As listed by NVIDIA, these are the general specifications of the full H100 GPU.

  • 8 GPCs, 72 TPCs (9 TPCs/GPC), 2 SMs/TPC, 144 SMs per full GPU
  • 128 FP32 CUDA Cores per SM, 18432 FP32 CUDA Cores per full GPU
  • 4 Fourth-Generation Tensor Cores per SM, 576 per full GPU
  • 6 HBM3 or HBM2e stacks, 12 512-bit Memory Controllers
  • 60MB L2 Cache
  • Fourth-Generation NVLink and PCIe Gen 5
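
The per-GPU totals in the list above follow from the per-unit counts by simple multiplication. The sketch below checks that arithmetic; the `FullGpuLayout` class and its field names are illustrative, not part of any NVIDIA tool or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FullGpuLayout:
    """Per-unit counts for a full GPU, as quoted in the specification list."""
    gpcs: int
    tpcs_per_gpc: int
    sms_per_tpc: int
    fp32_cores_per_sm: int
    tensor_cores_per_sm: int

    @property
    def tpcs(self):
        return self.gpcs * self.tpcs_per_gpc

    @property
    def sms(self):
        return self.tpcs * self.sms_per_tpc

    @property
    def fp32_cores(self):
        return self.sms * self.fp32_cores_per_sm

    @property
    def tensor_cores(self):
        return self.sms * self.tensor_cores_per_sm


# Counts taken from the H100 list: 8 GPCs, 9 TPCs/GPC, 2 SMs/TPC,
# 128 FP32 CUDA Cores and 4 Tensor Cores per SM.
h100 = FullGpuLayout(gpcs=8, tpcs_per_gpc=9, sms_per_tpc=2,
                     fp32_cores_per_sm=128, tensor_cores_per_sm=4)

print(h100.tpcs, h100.sms, h100.fp32_cores, h100.tensor_cores)
```

The computed values match the 72 TPCs, 144 SMs, 18432 FP32 CUDA Cores and 576 Tensor Cores listed for the full GPU.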

A100

The A100 is built upon the Ampere architecture and the third-generation NVIDIA high-speed NVLink interconnect. The chip consists of 54 billion transistors, and NVIDIA claims up to a 20x performance leap over its predecessor, Volta; the widely quoted figure of five petaflops refers to an eight-GPU DGX A100 system rather than a single chip. In addition, with a compute capability of 8.0, the A100 includes fine-grained structured sparsity to double the compute throughput for deep neural networks. As listed by NVIDIA, these are the general specifications of the full A100 GPU.

  • 8 GPCs, 8 TPCs/GPC, 2 SMs/TPC, 16 SMs/GPC, 128 SMs per full GPU
  • 64 FP32 CUDA Cores/SM, 8192 FP32 CUDA Cores per full GPU
  • 4 third-generation Tensor Cores/SM, 512 third-generation Tensor Cores per full GPU 
  • 6 HBM2 stacks, 12 512-bit memory controllers 
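
To put the two specification lists side by side, the sketch below recomputes the A100 totals from its per-unit counts and compares them with the full-GPU H100 totals quoted earlier in this article. The function and variable names are illustrative only, and the ratios describe unit counts on the full chips, not measured performance.

```python
def full_gpu_totals(gpcs, tpcs_per_gpc, sms_per_tpc, fp32_per_sm, tensor_per_sm):
    """Multiply per-unit counts up to per-GPU totals."""
    sms = gpcs * tpcs_per_gpc * sms_per_tpc
    return {
        "sms": sms,
        "fp32_cores": sms * fp32_per_sm,
        "tensor_cores": sms * tensor_per_sm,
    }


# A100 counts from the list: 8 GPCs, 8 TPCs/GPC, 2 SMs/TPC,
# 64 FP32 CUDA Cores and 4 Tensor Cores per SM.
a100 = full_gpu_totals(8, 8, 2, 64, 4)

# H100 full-GPU totals as quoted in the H100 specification list.
h100 = {"sms": 144, "fp32_cores": 18432, "tensor_cores": 576}

ratios = {key: h100[key] / a100[key] for key in a100}

print(a100)
print(ratios)
```

On these counts the full H100 has 1.125x the SMs and Tensor Cores of the full A100 and 2.25x the FP32 CUDA Cores, the latter because each H100 SM carries twice as many FP32 cores.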

Avi Gopani
Avi Gopani is a technology journalist who seeks to analyse industry trends and developments from an interdisciplinary perspective at Analytics India Magazine. Her articles chronicle cultural, political and social stories that are curated with a focus on the evolving technologies of artificial intelligence and data analytics.
