Why Deep Learning Is A Costly Affair

Deep learning models have brought great success to NLP applications, thanks to the untiring efforts of the ML community to improve the accuracy of these models. These improvements, however, come at a cost: the computational resources required, and the time consumed, add up over the repeated tweaking of a model. Large NLP models in particular have become popular, with Microsoft, Google and NVIDIA all releasing them in the past couple of years. But how much it costs to train these models is rarely discussed. In an effort to investigate this, the Israeli company AI21 Labs published a work detailing the costs of model training.

[Chart via AI21 Labs]

The researchers at AI21 Labs quantitatively estimated the costs of training differently sized BERT models on the Wikipedia and Book corpora (15 GB), obtaining both the cost of a single training run and a typical fully-loaded cost. The following figures are based on the experiments carried out at AI21 Labs:

  • 110 million parameters: $2.5k – $50k
  • 340 million parameters: $10k – $200k
  • 1.5 billion parameters: $80k – $1.6m

However, the authors note that costs can come in somewhat lower than those displayed above, for instance by using preemptible instances, though not dramatically so. These figures also assume the use of cloud platforms such as GCP or AWS; on-premise implementations are sometimes cheaper. Still, the figures provide a general sense of the costs.


Based on information released by Google, the researchers estimate that, at list price, training the 11-billion-parameter variant of T5 costs well above $1.3 million for a single run. Assuming 2–3 runs of the large model and hundreds of runs of the smaller variants, the list-price tag for the entire project may have been around $10 million.
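As a sanity check on that headline number, the arithmetic behind it can be sketched as follows. The per-run cost and count for the smaller variants are illustrative assumptions, not figures from the report:

```python
# Rough reconstruction of the ~$10M project estimate (illustrative only).

large_run_cost = 1.3e6   # list-price estimate for one 11B-parameter T5 run
large_runs = 3           # "2-3 runs of the large model"
small_runs = 300         # "hundreds" of smaller-variant runs (assumed count)
small_run_cost = 20_000  # assumed average cost per smaller run (not from the report)

total = large_runs * large_run_cost + small_runs * small_run_cost
print(f"~${total / 1e6:.1f}M")
```

With these assumptions the total lands in the same ballpark as the quoted $10 million; the point is that experimentation volume, not any single run, dominates the bill.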

A similar but more rigorous approach was taken by Emma Strubell and her colleagues in a paper published last year. Their paper tabulates the estimated cost of training several NLP models in terms of CO2 emissions (lbs) and cloud compute cost in USD.

Having established that training deep learning models is indeed an expensive affair, let’s take a look at the factors that contribute to such high costs (according to AI21 Labs):

  • dataset size,
  • model size, and
  • training volume.

The researchers say that an increase in any of these factors results in an increase in the number of FLOPs, and costs usually boil down to the number of FLOPs.

Complicating matters further, there is no simple formula for how many FLOPs a given NLP model needs.
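That said, a widely used rule of thumb from the scaling-laws literature (not from the AI21 report) approximates training compute as roughly 6 FLOPs per parameter per token. A minimal sketch, with illustrative throughput and price assumptions:

```python
# Back-of-envelope training-cost estimate via the ~6 * N * D heuristic.
# Sustained throughput and GPU price below are illustrative assumptions.

def estimate_cost(params, tokens, sustained_flops=1e13, usd_per_device_hour=3.0):
    flops = 6 * params * tokens                    # ~6 FLOPs per parameter per token
    device_hours = flops / sustained_flops / 3600  # seconds -> hours
    return flops, device_hours, device_hours * usd_per_device_hour

# BERT-base scale: 110M parameters; BERT-style pretraining processes roughly
# 131B tokens (1M steps x 256 sequences x 512 tokens).
flops, hours, cost = estimate_cost(110e6, 131e9)
print(f"{flops:.2e} FLOPs, {hours:,.0f} device-hours, ~${cost:,.0f}")
```

Under these assumptions the estimate comes out broadly consistent with the lower end of the per-run ranges quoted above, which is all a heuristic like this can promise.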


Based on their observations, the authors conclude the following:

  • AWS prices have been cut more than 65 times since its launch in 2006, including by as much as 73% between 2014 and 2017, so the authors expect the same trend for AI-oriented compute offerings
  • There needs to be an end to the state-of-the-art race, in which many top players pour all their resources into topping a leaderboard with models that are impractical for others to use
  • Useful as neural networks are, there is a school of thought that holds that statistical ML is necessary but insufficient, and will only take you so far

Smaller organisations do not have the resources to replicate the successes these leaderboard toppers flaunt, so the authors conclude that the Googles of the world should pre-train and publish large language models, while the rest of the world sticks to fine-tuning them, a far more affordable approach.


Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.

