Over the past decade, the enterprise hardware space has watched market leaders such as AMD and Intel push for ever more cores in their CPUs. Citing improvements in scalability and advantages for cloud service providers, the industry has let this trend run more or less unchecked. However, HPC (high-performance computing) customers such as IBM, Atos, Oracle, and HPE now demand more than just cores; they need better CPUs.
To understand why CPU manufacturers are pushing for more cores per socket, we must first understand the requirements of enterprise users. One of the biggest limiting factors for efficiency in data centres is space. Servers are laid out in racks, with each rack holding multiple fully-configured servers alongside supporting infrastructure such as network switches and cooling. A single server board typically holds up to four CPU sockets, putting a hard cap on the amount of computing power that fits in a rack and, by extension, the server farm.
Owing to this limitation, cloud service providers and hyperscalers often ran into problems when scaling their resources, creating a booming demand for CPUs with more cores. AMD and Intel sank considerable R&D resources into new microarchitectures that let them put more cores on every chip. AMD’s Threadripper and Epyc lineups, for example, have benefited greatly from the company’s chiplet architecture.
Since then, chip manufacturers have focused almost exclusively on increasing core counts. One only needs to compare today’s enterprise tech field with that of ten years ago. In 2013, Intel released the Ivy Bridge versions of the Xeon, which topped out at 12 cores per socket on the highest-end chips. In comparison, its new Sapphire Rapids chips go up to 60 cores per socket. The same goes for AMD, which came out of the gate with its Epyc Naples line of chips in 2017 at 32 cores per socket. Now, the latest Epyc chip caps out at a whopping 96 cores per socket.
Hyperscalers is a term for companies like Google, Meta, and Amazon that leverage cloud computing for business growth across multiple verticals.
Beyond Core Counts
While adding more cores provides scalability for cloud computing giants and hyperscalers like Google and Meta, it also means the rest of the market is left behind. Scott Tease, Lenovo’s vice president of HPC and AI, said,
“We’re kind of swimming in cores. We’re seeing very, very few customers that are really looking for 96 cores or 128 cores. It’d be much nicer to have an 8-core or 16-core part at a 4-plus GHz kind of frequency.”
With the focus on packing more cores into CPUs, other metrics like clock speed, memory bandwidth, and power consumption have fallen by the wayside. For the last ten years, enterprise CPUs have largely been stuck at frequencies around 3 GHz. By contrast, AMD’s latest consumer chips can boost their frequencies up to 5.7 GHz, with Intel’s chips going up to 6 GHz.
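The trade-off Tease describes can be sketched with Amdahl’s law: for workloads that are not fully parallel, a few fast cores can beat many slow ones. The core counts, clock speeds, and parallel fraction below are illustrative assumptions, not benchmarks of any real part.

```python
# Illustrative Amdahl's-law sketch: relative throughput of two hypothetical
# CPUs on a workload that is only partly parallelisable.
def speedup(cores: int, ghz: float, parallel_fraction: float) -> float:
    """Per-chip throughput relative to a 1-core, 1 GHz baseline."""
    serial = 1.0 - parallel_fraction
    return ghz / (serial + parallel_fraction / cores)

# Hypothetical parts: 96 cores at 3.0 GHz vs 16 cores at 4.5 GHz,
# on a workload assumed to be 90% parallelisable.
many_slow = speedup(96, 3.0, parallel_fraction=0.90)
few_fast = speedup(16, 4.5, parallel_fraction=0.90)
print(f"96 x 3.0 GHz: {many_slow:.1f}x baseline")
print(f"16 x 4.5 GHz: {few_fast:.1f}x baseline")
```

Under these assumptions the 16-core part edges out the 96-core one (28.8x vs 27.4x the baseline), because the serial 10% of the work bottlenecks on single-core speed.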
The same goes for memory bandwidth: with the rise of DDR5 memory, AMD has pushed its chips up to 460 GB/s. Intel’s Xeon line has adopted high-bandwidth memory (HBM), trading upgradability and cost for 1 TB/s of memory bandwidth. These are only stopgap solutions, however, as CPU architectures must be fundamentally reworked to allow for more memory lanes and bandwidth.
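The squeeze becomes clearer on a per-core basis: even as aggregate bandwidth rises, dividing it across ever more cores leaves each core a modest share. A back-of-the-envelope calculation, pairing the bandwidth and core-count figures quoted above purely for illustration:

```python
# Rough per-core memory bandwidth, combining figures quoted in the article
# (the pairings are illustrative, not exact product configurations).
configs = {
    "96-core Epyc, DDR5 (460 GB/s)": (96, 460.0),
    "60-core Xeon, HBM (1000 GB/s)": (60, 1000.0),
}
for name, (cores, total_gbs) in configs.items():
    per_core = total_gbs / cores
    print(f"{name}: {per_core:.1f} GB/s per core")
```

Roughly 4.8 GB/s per core on the DDR5 configuration: adding cores faster than memory lanes means each core is increasingly starved.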
Power consumption is another concern compounded by rising core counts. The latest Epyc and Xeon processors consume 400 watts and 350 watts per CPU, respectively, which quickly adds up at the scale of a server farm. Operators must also account for the higher heat output of such power-hungry chips, necessitating exotic cooling solutions that add to costs.
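To see how quickly the wattage adds up, here is a back-of-the-envelope estimate for a hypothetical farm. The server count, cooling overhead (PUE), and electricity price are assumptions for illustration; only the 400 W per-CPU figure comes from the text.

```python
# Rough annual electricity cost for a hypothetical farm of dual-socket servers.
SERVERS = 1_000            # assumed farm size
SOCKETS_PER_SERVER = 2
WATTS_PER_CPU = 400        # Epyc figure from the text
PUE = 1.5                  # assumed power-usage-effectiveness (cooling overhead)
PRICE_PER_KWH = 0.10       # assumed electricity price, USD

cpu_kw = SERVERS * SOCKETS_PER_SERVER * WATTS_PER_CPU / 1_000
facility_kw = cpu_kw * PUE
annual_cost = facility_kw * 24 * 365 * PRICE_PER_KWH
print(f"CPU draw: {cpu_kw:.0f} kW, with cooling: {facility_kw:.0f} kW")
print(f"Annual electricity cost (CPUs alone): ${annual_cost:,.0f}")
```

Even this modest hypothetical farm draws 800 kW from CPUs alone, translating to an electricity bill of over a million dollars a year once cooling overhead is included.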
Solving these problems requires a fundamental rework at the silicon level, but hyperscalers do not seem willing to wait. Amazon has already created the Arm-based AWS Graviton as an alternative to x86 chips in the cloud, while Google built the TPU for AI-focused workloads. Microsoft’s Azure is also reportedly working on custom silicon for its cloud workloads.
While demand from cloud service providers and hyperscalers is clearly what drove chip manufacturers to cram more cores into their processors, the industry must now turn towards making better CPUs for the future. Better memory and faster storage can only take CPUs halfway there; AMD and Intel must adapt or lose out to custom silicon.
CPUs in Decline?
When considering the market at the scale of tech giants, the direction is clear. Custom silicon is the way forward, a trend even AMD and Intel have recognised. AMD’s Instinct chips aim to bring CPU and GPU capabilities together, combining them with high-bandwidth memory and the company’s 3D V-Cache technology. Intel is likewise combining its fledgling GPU technology with its CPUs in its upcoming “Falcon Shores” offering.
While these offerings may fill the gap for hyperscale requirements, the HPC industry has already moved towards FPGAs and GPUs. For example, CERN is using FPGAs to accelerate its workloads in the search for dark matter.
Microsoft has also created Project Catapult, an initiative that pairs FPGAs with CPUs to offer programmable silicon in the cloud. Physical simulations, a mainstay of HPC, have matured to the point where they run extremely well on GPUs, relegating CPUs to managing system resources.
Within the next decade, CPU manufacturers must adapt or watch their enterprise market share shrink. With the rise of custom silicon and the capabilities of FPGAs and GPUs, CPUs must provide a solid competitive advantage for HPC or risk dying out.