NVIDIA H100 Vs A100: Which is the better GPU?

NVIDIA CEO Jensen Huang introduced the new NVIDIA H100 Tensor Core GPU at NVIDIA GTC 2022. The GPU is based on NVIDIA’s new Hopper GPU architecture.

Its predecessor, the NVIDIA A100, is one of the best GPUs for deep learning. Is the H100 a better successor? How do the two compare? Analytics India Magazine analyses this.

The H100

The NVIDIA H100 is the company’s ninth-generation data centre GPU, packed with 80 billion transistors. Based on the Hopper architecture, it is described by NVIDIA as “the world’s largest and most powerful accelerator”, suited to large-scale AI and HPC models. NVIDIA attributes this to features such as:

  • World’s Most Advanced Chip, according to NVIDIA
  • New Transformer Engine that speeds up networks by up to 6x over the previous generation
  • Confidential Computing
  • 2nd-Generation Secure Multi-Instance GPU (MIG), with MIG capabilities extended by up to 7x over the previous generation
  • 4th-Generation NVIDIA NVLink connecting up to 256 H100 GPUs at 9x higher bandwidth than the previous generation
  • New DPX Instructions that can accelerate dynamic programming by up to 40x compared with CPUs and up to 7x compared with previous-generation GPUs
The H100 GPU

NVIDIA promises that the new GPU and the Hopper architecture will power upcoming AI research. It points to the highly scalable NVLink® interconnect for advancing gigantic AI language models, deep recommender systems, genomics and complex digital twins. The H100 is also pitched at enhanced AI inference for real-time and immersive applications that use giant-scale models, such as chatbots capable of real-time conversation based on Megatron 530B, which NVIDIA describes as the most powerful monolithic transformer language model, and at faster training of massive models.

The A100

The NVIDIA A100 Tensor Core GPU, announced in 2020, was billed by NVIDIA as the world’s highest-performing GPU for elastic data centres running AI, data analytics and HPC workloads. Its Ampere architecture provides up to 20x higher performance than its predecessor and can be partitioned into as many as seven GPU instances that adjust dynamically to shifting demands. Arriving amid the wave of digital transformation that accompanied the pandemic in 2020, the A100 features multi-instance GPU (MIG) virtualisation and GPU partitioning, which is particularly useful for cloud service providers (CSPs). With MIG, the A100 lets CSPs deliver up to 7x more GPU instances. In addition, the GPU includes new third-generation Tensor Cores, which boost throughput over the V100 and support DL and HPC data types.

The A100 GPU
NVIDIA H100 GPU HPC and AI preliminary performance chart, using 8 to 256 H100 GPUs
A100 Vs H100 comparison by NVIDIA

The GPU architecture


NVIDIA Hopper, named after the computer scientist Grace Hopper, was released two years after the previous NVIDIA Ampere architecture. Hopper extends MIG capabilities by up to 7x over the previous generation by offering secure multi-tenant configurations in cloud environments across each GPU instance. It introduces features that improve asynchronous execution, allowing memory copies to overlap with computation while minimising synchronisation points. Hopper also targets the long training times of giant models while maintaining GPU performance: NVIDIA says the architecture accelerates the training of Transformer models on H100 GPUs by up to 6x.

Asynchronous execution concurrency and enhancements in NVIDIA Hopper


NVIDIA describes its Ampere architecture as the “heart of the world’s highest-performing, elastic data centres.” Ampere supports elastic computing with high acceleration at every scale. The A100 chip built on it packs 54 billion transistors, which NVIDIA calls the largest 7-nanometre (nm) chip ever built. In addition, it provides L2 cache residency controls to manage which data is kept in or evicted from the cache. Ampere includes NVIDIA’s third generation of NVLink®, which doubles GPU-to-GPU direct bandwidth to 600 gigabytes per second (GB/s) and lets applications scale across multiple GPUs.
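The doubling of NVLink bandwidth can be checked with simple arithmetic. The sketch below assumes link counts and per-link rates taken from NVIDIA's published NVLink figures (6 links on Volta's V100 and 12 on the A100, each at 50 GB/s bidirectional); these numbers are not stated in this article, and the names `NVLinkConfig` and `total_bandwidth_gb_s` are purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NVLinkConfig:
    """Per-GPU NVLink setup; the values used below are assumptions, not from the article."""
    name: str
    links: int            # number of NVLink links per GPU
    gb_s_per_link: float  # bidirectional bandwidth per link, in GB/s

    def total_bandwidth_gb_s(self) -> float:
        # Aggregate GPU-to-GPU bandwidth = number of links x per-link rate.
        return self.links * self.gb_s_per_link


# Assumed figures from NVIDIA's public specifications.
v100 = NVLinkConfig("V100 (second-generation NVLink)", links=6, gb_s_per_link=50.0)
a100 = NVLinkConfig("A100 (third-generation NVLink)", links=12, gb_s_per_link=50.0)

for gpu in (v100, a100):
    print(f"{gpu.name}: {gpu.total_bandwidth_gb_s():.0f} GB/s")

ratio = a100.total_bandwidth_gb_s() / v100.total_bandwidth_gb_s()
print(f"A100 / V100 bandwidth ratio: {ratio:.1f}x")
```

Under these assumptions the A100 total comes to the 600 GB/s quoted above, twice the Volta figure.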

Feature specifications


NVIDIA’s H100 is fabricated on TSMC’s 4N process and has 80 billion transistors. NVIDIA claims up to 9x faster AI training than the A100 on the largest models, citing a mixture-of-experts model with 395 billion parameters. “NVIDIA H100 is the first truly asynchronous GPU”, the team stated. The GPU extends the A100’s ‘global-to-shared asynchronous transfers’ across all address spaces. It also extends the CUDA thread group hierarchy with a new level called the thread block cluster.

The H100 builds upon the A100 Tensor Core GPU SM architecture and has a compute capability of 9.0. With the introduction of FP8, the H100 SM quadruples the A100’s peak per-SM floating-point computational power, and it doubles the A100’s raw SM computational power on all previous Tensor Core, FP32 and FP64 data types, clock-for-clock. As listed by NVIDIA, these are the general specifications of the full H100 GPU.

  • 8 GPCs, 72 TPCs (9 TPCs/GPC), 2 SMs/TPC, 144 SMs per full GPU
  • 128 FP32 CUDA Cores per SM, 18432 FP32 CUDA Cores per full GPU
  • 4 Fourth-Generation Tensor Cores per SM, 576 per full GPU
  • 6 HBM3 or HBM2e stacks, 12 512-bit Memory Controllers
  • 60MB L2 Cache
  • Fourth-Generation NVLink and PCIe Gen 5
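
The full-GPU totals in this list follow directly from the per-level counts. As a minimal sketch, the hypothetical helper `full_gpu_totals` below (not an NVIDIA API) multiplies them out, and also sums the width of the twelve memory controllers.

```python
def full_gpu_totals(gpcs, tpcs_per_gpc, sms_per_tpc, fp32_per_sm, tensor_per_sm):
    """Derive full-GPU unit counts from the per-level counts in the list above."""
    tpcs = gpcs * tpcs_per_gpc
    sms = tpcs * sms_per_tpc
    return {
        "tpcs": tpcs,
        "sms": sms,
        "fp32_cuda_cores": sms * fp32_per_sm,
        "tensor_cores": sms * tensor_per_sm,
    }


# Per-level counts for the full H100 GPU, as listed above.
h100 = full_gpu_totals(
    gpcs=8, tpcs_per_gpc=9, sms_per_tpc=2, fp32_per_sm=128, tensor_per_sm=4
)

# 12 memory controllers, each 512 bits wide.
memory_bus_bits = 12 * 512

print(h100)
print(f"Aggregate memory controller width: {memory_bus_bits} bits")
```

The results match the listed 72 TPCs, 144 SMs, 18432 FP32 CUDA Cores and 576 Tensor Cores. Note that these describe the full chip; shipping products may enable fewer units.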


The A100 is built upon the NVIDIA Ampere architecture and the third-generation NVIDIA high-speed NVLink interconnect. The chip consists of 54 billion transistors, and NVIDIA claims up to a 20x performance leap over its predecessor, Volta. The five petaflops of AI performance often quoted alongside it is the figure for NVIDIA’s eight-GPU DGX A100 system, not for a single chip. In addition, with a compute capability of 8.0, the A100 includes fine-grained structured sparsity to double the compute throughput for deep neural networks. As listed by NVIDIA, these are the general specifications of the full A100 GPU.

  • 8 GPCs, 8 TPCs/GPC, 2 SMs/TPC, 16 SMs/GPC, 128 SMs per full GPU
  • 64 FP32 CUDA Cores/SM, 8192 FP32 CUDA Cores per full GPU
  • 4 third-generation Tensor Cores/SM, 512 third-generation Tensor Cores per full GPU 
  • 6 HBM2 stacks, 12 512-bit memory controllers 
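
To put the two specification lists side by side, the sketch below recomputes the full-GPU totals for both chips from the per-level counts given in this article and prints the H100-to-A100 ratios. The `FullGpu` class is a hypothetical helper for illustration; the ratios compare unit counts only and say nothing about clock speeds or real-world performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FullGpu:
    """Per-level unit counts for a full GPU, as listed in the article."""
    name: str
    gpcs: int
    tpcs_per_gpc: int
    sms_per_tpc: int
    fp32_per_sm: int
    tensor_per_sm: int

    @property
    def sms(self) -> int:
        return self.gpcs * self.tpcs_per_gpc * self.sms_per_tpc

    @property
    def fp32_cuda_cores(self) -> int:
        return self.sms * self.fp32_per_sm

    @property
    def tensor_cores(self) -> int:
        return self.sms * self.tensor_per_sm


a100 = FullGpu("A100", gpcs=8, tpcs_per_gpc=8, sms_per_tpc=2, fp32_per_sm=64, tensor_per_sm=4)
h100 = FullGpu("H100", gpcs=8, tpcs_per_gpc=9, sms_per_tpc=2, fp32_per_sm=128, tensor_per_sm=4)

for field in ("sms", "fp32_cuda_cores", "tensor_cores"):
    old, new = getattr(a100, field), getattr(h100, field)
    print(f"{field}: A100={old}, H100={new}, ratio={new / old:.3f}x")
```

On these counts, the full H100 has 12.5% more SMs and Tensor Cores than the full A100, and 2.25x the FP32 CUDA Cores, since each H100 SM carries twice as many.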

Avi Gopani
Avi Gopani is a technology journalist at Analytics India Magazine who seeks to analyse industry trends and developments from an interdisciplinary perspective. Her articles chronicle cultural, political and social stories, with a focus on the evolving technologies of artificial intelligence and data analytics.
