
Stability AI Claims Intel Gaudi 2 is Faster than NVIDIA H100

But only without TensorRT Optimisation.


In a recent blog post titled “Behind the Compute,” Stability AI unveiled striking findings on the performance of Intel Gaudi 2 accelerators compared to NVIDIA’s H100 in training and inference of its upcoming image generation model, Stable Diffusion 3.

Stability AI’s text-to-image model demonstrated promising results in the performance analysis. Using the 2B-parameter multimodal diffusion transformer (MMDiT) version of the model, Stability AI compared the training speed of Intel Gaudi 2 accelerators with NVIDIA’s A100 and H100.

In a two-node configuration, the Intel Gaudi 2 system processed 927 training images per second, 1.5 times faster than the NVIDIA H100-80GB. Increasing the batch size to 32 per accelerator pushed Gaudi 2’s training rate further, to 1,254 images/sec.
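As a back-of-the-envelope check on those ratios (pure arithmetic on the article’s reported figures, not Stability AI’s benchmark code; the variable names are illustrative):

```python
# Figures reported for the 2-node training run.
gaudi2_imgs_per_sec = 927   # Gaudi 2 throughput, images/sec
speedup_vs_h100 = 1.5       # reported Gaudi 2 advantage over H100-80GB

# The reported speedup implies an H100 rate of roughly 618 images/sec.
implied_h100_rate = gaudi2_imgs_per_sec / speedup_vs_h100
print(f"Implied H100-80GB rate: {implied_h100_rate:.0f} images/sec")  # ~618

# Raising batch size to 32 per accelerator lifted Gaudi 2's throughput
# to 1,254 images/sec, about a 1.35x gain over the baseline run.
batch32_rate = 1_254
print(f"Batch-32 gain: {batch32_rate / gaudi2_imgs_per_sec:.2f}x")  # ~1.35x
```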

In a 32-node configuration, the Gaudi 2 cluster processed over 3x more images per second than NVIDIA A100-80GB GPUs, despite the A100’s highly optimised software stack.

In inference tests with the 8B-parameter Stable Diffusion 3 model, Gaudi 2 chips offered inference speed similar to NVIDIA A100 chips using base PyTorch.

However, Stability AI admitted that with TensorRT optimisation, A100 chips produced images 40% faster than Gaudi 2, though it anticipates Gaudi 2 will outperform the A100 with further optimisation. The contest could tighten further with NVIDIA’s upcoming GH200 processors, which might be announced at GTC 2024 this month.

Source: Stability AI Blog

A few months back, AMD also claimed to have surpassed the NVIDIA H100 on various performance metrics, but NVIDIA later debunked the claim, pointing out that AMD, too, had not included TensorRT optimisation in its tests.

Intel has also launched its Gaudi 3 AI accelerator, which would make this competition even more interesting in the future.

Moreover, Stable Beluga 2.5 70B, Stability AI’s fine-tuned version of LLaMA 2 70B, showcased impressive performance on Intel Gaudi 2 accelerators. Running the PyTorch code out of the box on 256 Gaudi 2 accelerators, Stability AI measured an average throughput of 116,777 tokens/second.
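The aggregate figure can be converted into a per-accelerator rate (again, simple arithmetic on the article’s numbers, not Stability AI’s measurement code):

```python
# Throughput reported for Stable Beluga 2.5 70B on Gaudi 2.
total_tokens_per_sec = 116_777  # aggregate across the whole cluster
num_accelerators = 256          # Gaudi 2 accelerators in the run

# Aggregate throughput divided by accelerator count gives the
# average per-accelerator rate, roughly 456 tokens/sec.
per_accelerator = total_tokens_per_sec / num_accelerators
print(f"{per_accelerator:.1f} tokens/sec per accelerator")  # ~456.2
```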

In inference tests with the 70B language model, Gaudi 2 generated 673 tokens/second per accelerator, a 28% faster performance compared to the NVIDIA A100.


Mohit Pandey

Mohit dives deep into the AI world to bring out information in simple, explainable, and sometimes funny words. He also holds a keen interest in photography, filmmaking, and the gaming industry.