GPUs Have an Energy Problem

If energy costs for running GPUs are climbing, what is the endgame?

While GPUs are fuelling massive growth for NVIDIA thanks to burgeoning demand from AI companies, the costs incurred by these companies don’t end with the GPUs themselves. The energy cost of running these GPUs in data centres is enormous. A recent study showed that data centres consume approximately 1,000 kWh per square metre, about ten times the power consumption of a typical American household. BLOOM, an LLM, consumed 914 kWh over an 18-day period while running on 16 NVIDIA A100 40GB GPUs, serving an average of 558 requests per hour.
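A quick back-of-envelope calculation turns those BLOOM figures into a per-request energy cost. All the inputs below come straight from the study quoted above; only the arithmetic is added:

```python
# Rough estimate of BLOOM's energy cost per request, using the
# figures quoted above: 914 kWh over 18 days on 16 NVIDIA A100
# 40GB GPUs, averaging 558 requests/hour.

total_energy_kwh = 914
days = 18
requests_per_hour = 558

total_requests = days * 24 * requests_per_hour        # ~241,000 requests
wh_per_request = total_energy_kwh * 1_000 / total_requests

print(f"Requests served: {total_requests:,}")           # 241,056
print(f"Energy per request: {wh_per_request:.1f} Wh")   # ~3.8 Wh
```

Roughly 3.8 Wh per request may sound small, but it compounds quickly across millions of daily queries.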

Climbing Costs

According to an article by Sequoia, for every $1 spent on a GPU, approximately another dollar goes towards the energy required to run it in a data centre, nearly doubling the cost. On top of that, the companies buying this compute need to make a margin of their own.
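A minimal sketch of that rule of thumb, assuming a hypothetical 25% operator margin (the margin rate is a placeholder, not a figure from the Sequoia article):

```python
# Sketch of the ~1:1 GPU-to-energy cost rule of thumb.
# operator_margin is a hypothetical placeholder value.

gpu_spend = 1.00                  # $ spent on the GPU itself
energy_spend = 1.0 * gpu_spend    # ~$1 of energy per $1 of GPU

hardware_plus_energy = gpu_spend + energy_spend    # ~2x the GPU price
operator_margin = 0.25                             # hypothetical 25%
cost_to_end_user = hardware_plus_energy * (1 + operator_margin)

print(f"Hardware + energy per $1 of GPU: ${hardware_plus_energy:.2f}")  # $2.00
print(f"With margin on top:              ${cost_to_end_user:.2f}")      # $2.50
```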

Training AI models within data centres can require up to three times the energy of typical cloud workloads, straining existing infrastructure. For instance, an AI server with GPUs may draw up to 2 kW of power, whereas a standard cloud server requires only 300-500 W.
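To put those per-server figures in perspective, here is a rough annualised comparison; the 24/7 utilisation assumption is a simplification added here, not a claim from the source:

```python
# Annual energy draw of a GPU-equipped AI server vs a standard
# cloud server, using the power figures quoted above. Assumes
# both run flat-out 24/7, which is a simplification.

HOURS_PER_YEAR = 24 * 365          # 8,760 hours

ai_server_kw = 2.0                 # AI server, upper bound from the text
cloud_server_kw = 0.4              # standard server, midpoint of 300-500 W

ai_kwh = ai_server_kw * HOURS_PER_YEAR        # 17,520 kWh/year
cloud_kwh = cloud_server_kw * HOURS_PER_YEAR  #  3,504 kWh/year

print(f"AI server:    {ai_kwh:>8,.0f} kWh/year")
print(f"Cloud server: {cloud_kwh:>8,.0f} kWh/year")
print(f"Ratio:        {ai_kwh / cloud_kwh:.0f}x")   # ~5x on power draw alone
```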


Last year, in Northern Virginia’s Data Center Alley, heavy consumption nearly caused a power outage. The current generation of data centres is also believed to be ill-equipped to handle surging demand from AI-related activities. Furthermore, data centre power demand is expected to surpass 35 GW by 2030.

[Chart source: McKinsey]

As per research, spending on AI data centre server infrastructure, along with operating costs, is projected to exceed $76 billion by 2028. That is more than twice the estimated annual operating cost of AWS, which holds about one-third of the cloud infrastructure services market. Big tech companies are also spending heavily to keep their models running. Earlier this year, The Information estimated that OpenAI spends close to $700,000 a day to run its models. Owing to the massive amounts of computing power required, the infrastructure cost of running AI models is extremely high.
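Annualising The Information’s estimate gives a sense of the scale involved (straight multiplication, with no growth assumptions):

```python
# Annualising the reported ~$700,000/day spend on running OpenAI's models.
daily_cost_usd = 700_000
annual_cost_usd = daily_cost_usd * 365
print(f"~${annual_cost_usd / 1e6:.0f}M per year")   # ~$256M per year
```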

Given the pace at which companies are racing through the generative AI race, a Gartner study projects that within the next two years the exorbitant costs will exceed the value generated, leading about 50% of large enterprises to pull the plug on their large-scale AI model development by 2028.

A user on X who reviews CPU coolers said he would choose an energy-efficient GPU not to avoid high electricity bills, but to avoid all the heat it generates.

Workarounds for Energy Efficiency

Specialised data centres aimed at running generative AI workloads are springing up. Companies are looking at suburban locations away from big markets, where facilities can run on existing electrical networks without overloading them. With quicker connectivity and reduced expenses, these are emerging as viable alternatives.

Innovative technologies to build cooler data centres are also being pursued. Under a government programme called COOLERCHIPS, the US Department of Energy recently awarded $40 million to fund 15 projects. NVIDIA has been granted $5 million to build a data centre with revolutionary cooling systems that boost energy efficiency. The team is building an innovative liquid-cooling system that can efficiently cool a data centre housed in a mobile container, even when it operates in ambient temperatures as high as 40 degrees Celsius while drawing 200 kW of power. The new system is said to run 20% more efficiently than current air-cooled approaches and to cost at least 5% less.

In the near future, renewable energy sources powering data centres are also a real possibility. Going by how big tech leaders are increasingly investing in nuclear energy companies, and with Microsoft posting a job opening for a ‘principal program manager for nuclear technology,’ nothing can be ruled out. These moves might pave the way for energy- and cost-effective alternatives that address the current problem.

With energy consumption slated only to rise, the growing demand for GPUs could also create another scenario: with increased adoption, the cost of GPUs may eventually come down, offsetting part of the overall energy bill.

