NVIDIA’s flagship H100 chip has blown the competition out of the water yet again, demonstrating the best performance yet on a set of MLPerf training benchmarks. The GPU set new records across the board in a new attempt conducted in conjunction with CoreWeave and Inflection AI.
The tests were run on a cluster of 3,584 H100 GPUs hosted on CoreWeave’s platform. The GPUs were linked together by the InfiniBand interconnect, which NVIDIA says allows them to deliver strong performance both individually and at scale.
The MLPerf benchmarks quantify hardware performance by measuring how long it takes to complete a set of training workloads. The suite spans various LLMs and computer vision models, along with a handful of CNNs and RNNs.
NVIDIA claims that the H100 ‘delivered the highest performance on every benchmark’, with standout results including training GPT-3 in just 11 minutes and completing the ResNet benchmark in 0.18 minutes. Reportedly, the H100 was the only chip able to complete every benchmark in the suite.
As Team Green dominates, NVIDIA’s competitors are left playing catch-up. AMD’s AI accelerator offerings are nowhere close to the H100. The company recently launched the MI300X AI accelerator, which aims to compete against the H100 on the strength of its 192GB of memory.
While benchmark results have not been released for this new chip, the lukewarm response from the market suggests that the product won’t overtake NVIDIA any time soon.
Other notable competitors include Cerebras, which has been shown to be roughly cost-equivalent to NVIDIA when training large language models, and Google’s TPUs, which are more energy-efficient and powerful but restricted to certain workloads. For the foreseeable future, it seems that NVIDIA will continue to ride the AI wave, fuelled by the dual engines of capable compute and a well-integrated software ecosystem.