
GPU has an Energy Problem

If energy costs for running GPUs are climbing, what is the endgame?


While GPUs are fuelling massive growth for NVIDIA thanks to the burgeoning demand from AI companies, the costs incurred by these companies don’t end with the GPUs themselves. The energy cost of running these GPUs in data centres is enormous. A recent study showed that data centres consume approximately 1,000 kWh per square metre, roughly ten times the consumption of a typical American household. BLOOM, an LLM, consumed 914 kWh over an 18-day period while running on 16 NVIDIA A100 40GB GPUs, handling an average of 558 requests per hour.
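A rough back-of-the-envelope calculation, using only the BLOOM figures above, puts these numbers in more familiar units: average power draw, energy per request, and the electricity bill for the period. The electricity rate used below is an assumed illustrative value, not something reported in the study.

```python
# Back-of-the-envelope estimate using the BLOOM figures cited above.
# The electricity price is an assumed illustrative value, not from the study.

total_energy_kwh = 914        # energy consumed over the deployment
days = 18
gpus = 16                     # NVIDIA A100 40GB GPUs
requests_per_hour = 558       # average served requests

hours = days * 24
avg_power_kw = total_energy_kwh / hours            # average draw of the whole deployment
avg_power_per_gpu_w = avg_power_kw / gpus * 1000   # rough per-GPU share of that draw

total_requests = requests_per_hour * hours
energy_per_request_wh = total_energy_kwh * 1000 / total_requests

price_per_kwh_usd = 0.12      # assumed electricity rate, for illustration only
energy_cost_usd = total_energy_kwh * price_per_kwh_usd

print(f"Average power draw: {avg_power_kw:.2f} kW (~{avg_power_per_gpu_w:.0f} W per GPU)")
print(f"Energy per request: {energy_per_request_wh:.2f} Wh")
print(f"Electricity cost over {days} days at ${price_per_kwh_usd}/kWh: ${energy_cost_usd:.2f}")
```

On these assumptions the deployment averages a little over 2 kW in total, under 4 Wh per request, and a modest electricity bill; the point is less the absolute figures than how quickly they scale once thousands of such GPUs run around the clock.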

Climbing Costs

According to an article by Sequoia, for every $1 spent on a GPU, approximately another dollar is spent on the energy to run it in a data centre; factor in the margin that the companies buying these GPUs need to make, and the costs incurred almost double.

Training AI models in data centres can require up to three times the energy of typical cloud workloads, straining existing infrastructure. For instance, AI servers with GPUs may draw up to 2 kW of power, whereas a standard cloud server draws only 300-500 W.
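To put those power figures in perspective, the minimal sketch below compares the annual electricity use and cost of a 2 kW AI server against a 400 W cloud server. The power draws come from the figures above; the assumption of round-the-clock operation at full draw and the electricity price are illustrative, not from the article.

```python
# Illustrative annual energy comparison: GPU-equipped AI server vs standard cloud server.
# Power draws come from the article; 24x7 utilisation and the electricity price are assumptions.

HOURS_PER_YEAR = 24 * 365
price_per_kwh_usd = 0.10          # assumed data-centre electricity rate

servers = {
    "AI server with GPUs": 2000,    # watts, the article's upper figure
    "Standard cloud server": 400,   # watts, midpoint of the 300-500 W range
}

for name, watts in servers.items():
    annual_kwh = watts / 1000 * HOURS_PER_YEAR   # assumes continuous full draw
    annual_cost_usd = annual_kwh * price_per_kwh_usd
    print(f"{name}: {annual_kwh:,.0f} kWh/year, about ${annual_cost_usd:,.0f} in electricity")
```

Under these assumptions a single AI server consumes roughly four to five times as much electricity per year as a standard cloud server, which is what makes dense GPU deployments so demanding on data centre power budgets.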

Last year, Northern Virginia’s Data Center Alley nearly suffered a power outage owing to heavy consumption. The current generation of data centres is also believed to be ill-equipped to handle surges in demand from AI-related activities. Furthermore, according to McKinsey, data centre power demand is expected to surpass 35 GW by 2030.


As per research, spending on AI data centre server infrastructure, along with operating costs, is expected to cross $76 billion by 2028. That is more than twice the estimated annual operating cost of AWS, which holds about one-third of the cloud infrastructure services market. Big tech companies are also shelling out heavily to run these models. Earlier this year, The Information estimated that OpenAI spends close to $700,000 a day running its models. Owing to the massive amount of computing power required, the infrastructure cost of running AI models is extremely high.

Given the pace at which companies are racing into generative AI, a Gartner study projects that over the next two years the exorbitant costs will exceed the value generated, leading about 50% of large enterprises to pull the plug on their large-scale AI model development by 2028.

A user on X who reviews CPU coolers said he would choose an energy-efficient GPU not to avoid high electricity bills, but to avoid all the heat it generates.

Workaround for Energy Efficiency

Specialised data centres aimed at running generative AI workloads are springing up. Companies are looking at facilities in suburban locations, away from big markets, that can run on existing electrical networks without overloading them. With quicker connectivity and reduced expenses, these are emerging as viable alternatives.

Innovative technology to build cooler data centres is also being pursued. As part of its COOLERCHIPS program, the US Department of Energy recently awarded $40 million to fund 15 projects. NVIDIA has been granted $5 million to build a data centre with a revolutionary cooling system to boost energy efficiency. The team is building an innovative liquid-cooling system that can efficiently cool a data centre housed in a mobile container drawing 200 kW of power, even at temperatures as high as 40 degrees Celsius. The new system is said to run 20% more efficiently than current air-cooled approaches and cost at least 5% less.

Renewable energy sources powering data centres are also a real possibility in the near future. Going by how big tech leaders are increasingly investing in nuclear energy companies, and with Microsoft posting a job opening for a ‘principal program manager for nuclear technology,’ nothing can be ruled out. This might pave the way for energy- and cost-effective alternatives that address the current problem.

With energy consumption slated only to rise, the growing demand for GPUs could also create another scenario: as adoption increases, the cost of GPUs may eventually come down, balancing out the rising energy costs.
