In 2009, Intel claimed supercomputing dominance, as its chips powered over 80% of the world’s top 500 supercomputers. While those times have passed and Intel has since lost its lead to AMD, it seems redemption is on the horizon. The new Aurora supercomputer, built on Intel’s Sapphire Rapids line of chips and designed for Argonne National Laboratory’s simulation workloads, is poised to take over AMD’s position as the premier supercomputing chip.
While these eternal rivals continue to duke it out in the general-purpose supercomputing world, competition is intensifying as other tech giants pursue bigger plans of their own.
In a bid to reduce dependency on chipmakers and optimise their infrastructure in the process, Google, Meta, Amazon, and others have begun creating their own chips. In a market that’s more competitive than ever before, can Intel expand its foothold and take back the supercomputing crown?
Battle on an exascale
Until the announcement of Aurora, AMD’s Frontier supercomputer, built for Oak Ridge National Laboratory’s scientific research requirements, was the undisputed king of the supercomputer world. A supercomputer capable of executing over one quintillion floating-point operations per second (1 exaflop) is termed an exascale supercomputer, and ever since its launch in 2021, Frontier has been the only machine capable of exascale computing.
According to the Top500, a ranking of the world’s most powerful supercomputers, Frontier is currently the only exascale supercomputer in operation, but Aurora looks ready to change that. Announced at the ISC High Performance Conference, Aurora is poised to be Intel’s answer to AMD’s untouched exascale dominance.
However, with the released spec sheet, it seems that Aurora will not only challenge Frontier, but dominate it.
While Aurora’s journey to the exascale level has been fraught with delays and reworks, it seems that the supercomputer is finally seeing the light of day. First announced in 2015, the project was delayed twice, once in 2017, and again in 2020. This comes as no surprise, as Intel’s Sapphire Rapids line, the chips powering Aurora, has also been delayed multiple times. Now that the chip is finally entering the market, Intel has found a rare opportunity to shine and take a significant title from its core competitor.
However, there is still one problem that Intel cannot move beyond: power consumption. Frontier consumes 21 megawatts (MW) of power, and was, for a while, the most efficient supercomputer in the world. Aurora, on the other hand, is predicted to consume over 60 MW of power.
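The gap matters more in performance per watt than in raw wattage. A rough sketch of the comparison, using illustrative figures rather than official benchmarks (Frontier’s measured performance is commonly cited around 1.1 exaflops at roughly 21 MW; Aurora’s projected peak is often cited around 2 exaflops at roughly 60 MW):

```python
EXAFLOP = 1e18  # floating-point operations per second

def gflops_per_watt(exaflops: float, megawatts: float) -> float:
    """Sustained GFLOPS delivered per watt of power draw."""
    return (exaflops * EXAFLOP / 1e9) / (megawatts * 1e6)

# Illustrative, assumed figures -- not vendor-confirmed numbers:
frontier = gflops_per_watt(1.1, 21)  # ~52 GFLOPS per watt
aurora = gflops_per_watt(2.0, 60)    # ~33 GFLOPS per watt
print(f"Frontier: {frontier:.1f} GFLOPS/W, Aurora: {aurora:.1f} GFLOPS/W")
```

On these assumed numbers, Aurora would deliver more absolute compute but noticeably fewer operations per watt, which is exactly the trade-off that worries data-centre operators.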
Indeed, this is becoming a new problem for older data centres and compute farms. Reports have noted that newer Intel Xeon and AMD EPYC CPUs consume upwards of 350 W each, with NVIDIA’s GPUs being even more power-hungry.
In light of rising energy prices across the world and a greater focus on sustainability, companies are looking for more efficient, specialised chips. These chips can bring a host of improvements beyond just power consumption, sweetening the pot for companies willing to create their own.
Bigger isn’t always better
While supercomputers like Frontier and Aurora are targeted at research on topics like nuclear fusion, low-carbon technologies, cancer, and subatomic particles, big tech firms have trained their sights on a single goal — AI.
Google, Meta, Amazon, and even Microsoft are working towards freeing themselves from Intel, AMD, and NVIDIA for AI compute. Google’s latest TPUv4, specialised for TensorFlow operations, is 1.2x-1.7x faster and 1.3x-1.9x more efficient than NVIDIA’s A100 chips. AWS offers a whole suite of chips for both AI training and inferencing workloads, with the chips’ specialised nature being a natural fit for AI tasks.
Meta recently announced a new chip called the Meta Training and Inference Accelerator (MTIA). As the name suggests, MTIA is a specialised chip for Meta’s internal AI workloads. The chip directly replaces CPUs in the data centre, and is made to work alongside GPUs for greater efficiency.
Microsoft’s AI chip efforts are still shrouded in mystery, but reports have emerged that the company is working on a chip all the same. Codenamed Athena, this specialised chip will be optimised for AI workloads, especially those of OpenAI.
It is worth noting that Intel is also looking to compete in the same market, with Aurora serving as something of a proof of concept for the enterprise success of its GPUs and CPUs. Sapphire Rapids is as yet unproven, and has received a lukewarm response from the market. However, it seems that Intel’s Data Centre GPU Max series might just give NVIDIA a run for its money.
According to Intel, the newest line of GPUs outperforms the NVIDIA H100 by an average of 30%-50%, and the Xeon Max Series CPU beats AMD’s Genoa chips by 65%. With its Gaudi2 series of deep learning accelerators, Intel is looking to cut into NVIDIA’s market share slowly, but surely.
While AMD currently dominates the leaderboard, Intel’s new set of chips is hinging on Aurora’s impact to make a splash in the enterprise world. Even as big tech firms pour R&D money into creating their own chips, it seems likely that Intel might snatch victory from the jaws of defeat in the HPC (high-performance computing) market.