Deep learning models have brought great success to NLP applications, thanks to the ML community's untiring efforts to improve their accuracy. These improvements, however, come at a cost: the computational resources and time consumed add up with every round of tweaking. Large NLP models in particular have become prominent, with Microsoft, Google and NVIDIA all releasing them in the past couple of years. But how much it costs to train these models is rarely talked about. To investigate this, Israeli company AI21 Labs published a paper detailing the costs of model training.
Deep Learning Is A Costly Affair
The researchers at AI21 Labs have quantitatively estimated the costs of training differently sized BERT models on the Wikipedia and Book corpora (15 GB), reporting for each model the cost of one training run and a typical fully-loaded cost. The following figures are based on the experiments carried out at AI21 Labs:
- $2.5k – $50k (110 million parameters)
- $10k – $200k (340 million parameters)
- $80k – $1.6m (1.5 billion parameters)
However, the authors admit that the cost can be lower than the figures above when preemptible versions of the system are used, though not by much. These figures also assume the use of cloud solutions such as GCP or AWS; on-premise implementations are sometimes cheaper. Still, the figures provide a general sense of the costs.
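As a rough way to compare the three ranges listed above, the sketch below tabulates them and derives two simple ratios. It assumes each range reads as (cost of one training run, fully-loaded cost), which is how the text above presents them. The variable and function names are illustrative and do not come from the AI21 Labs paper.

```python
# Figures quoted above from AI21 Labs, in USD.
# Each entry: (parameter count, cost of one training run, fully-loaded cost).
# Assumption: the low end of each range is the single-run cost and the
# high end is the fully-loaded cost.
MODEL_COSTS = [
    (110e6, 2_500, 50_000),
    (340e6, 10_000, 200_000),
    (1.5e9, 80_000, 1_600_000),
]


def loaded_to_single_ratio(single_run, fully_loaded):
    """How many single-run budgets fit into the fully-loaded figure."""
    return fully_loaded / single_run


def cost_per_million_params(params, cost):
    """Single-run cost divided by model size in millions of parameters."""
    return cost / (params / 1e6)


ratios = [loaded_to_single_ratio(s, f) for _, s, f in MODEL_COSTS]
per_million = [cost_per_million_params(p, s) for p, s, _ in MODEL_COSTS]

for (params, single, full), ratio, ppm in zip(MODEL_COSTS, ratios, per_million):
    print(
        f"{params / 1e6:,.0f}M params: ${single:,} to ${full:,} "
        f"(x{ratio:.0f}), about ${ppm:,.2f} per million parameters per run"
    )
```

Under that reading, the fully-loaded figure is the same multiple of the single-run figure for all three models, while the single-run cost per million parameters rises with model size.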
Based on information released by Google, the researchers estimate that, at list-price, training the 11B parameter variant of T5 costs well above $1.3 million for a single run. Assuming 2-3 runs of the large model and hundreds of the small ones, the (list-) price tag for the entire project may have been $10 million.
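The arithmetic behind that project-level figure can be sketched as follows. The sketch treats the $1.3 million single-run figure as a lower bound, since the text says "well above", and treats $10 million as the total; the function name and the split into "large runs" and "remainder" are illustrative assumptions, not a breakdown given by the researchers.

```python
# List-price figures quoted above, in USD.
LARGE_RUN_COST = 1_300_000      # one run of the 11B-parameter T5 (lower bound)
PROJECT_ESTIMATE = 10_000_000   # estimated price tag for the entire project


def large_run_share(num_runs, run_cost=LARGE_RUN_COST, total=PROJECT_ESTIMATE):
    """Return (spend on large runs, remainder of the project estimate)."""
    spent = num_runs * run_cost
    return spent, total - spent


for runs in (2, 3):
    spent, remainder = large_run_share(runs)
    print(f"{runs} large runs: ${spent:,}; left for everything else: ${remainder:,}")
```

With two or three large runs at the lower-bound price, most of the estimate is left for the hundreds of smaller runs and whatever the large runs cost beyond $1.3 million each.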
A similar but more rigorous approach was taken by Emma Strubell and her colleagues in a work published last year. The results of their work can be seen as follows:
The above table contains the estimated cost of training a model in terms of CO2 emissions (lbs) and cloud compute cost in USD.
Now that it has been established that training deep learning models is indeed an expensive affair, let's take a look at the factors that contribute to such high costs (according to AI21 Labs):
- dataset size
- model size
- training volume
The researchers say that an increase in any of the above factors results in an increase in the number of floating-point operations (FLOPs), and the cost usually boils down to the number of FLOPs.
And since there is no proper formula to quantify how many FLOPs a given NLP model needs, things get more complicated.
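Estimating the FLOP count is the hard part. Once a count is assumed, turning it into dollars is simple arithmetic, as the minimal sketch below shows. The function, its parameters and every number in the example are hypothetical placeholders chosen to make the arithmetic easy to follow. None of them are taken from the AI21 Labs paper or from any cloud provider's price list.

```python
def flops_to_cost(total_flops, sustained_flops_per_sec, price_per_device_hour):
    """Dollars needed to execute total_flops at a given sustained rate.

    Assumes perfect scaling, so the result is the same whether the work
    runs on one device or is split evenly across many.
    """
    if sustained_flops_per_sec <= 0:
        raise ValueError("sustained_flops_per_sec must be positive")
    device_hours = total_flops / sustained_flops_per_sec / 3600
    return device_hours * price_per_device_hour


# Placeholder inputs (hypothetical): 3.6e18 FLOPs of training work,
# a device sustaining 1e15 FLOPs per second, rented at $1.00 per hour.
example_cost = flops_to_cost(3.6e18, 1e15, 1.00)
print(f"Estimated cost: ${example_cost:.2f}")
```

Because cost is linear in the FLOP count under these assumptions, doubling the dataset size, the model size or the training volume doubles the bill if it doubles the FLOPs.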
Based on their observations, the authors conclude the following:
- AWS has cut its prices more than 65 times since its launch in 2006, and by as much as 73% between 2014 and 2017, so the authors expect the same trend for AI-oriented compute offerings
- The state-of-the-art race needs to end, as many top players pour all their resources into topping a leaderboard and produce results that are impractical for others to put to use
- Useful as neural networks are, there is a school of thought that holds that statistical ML is necessary but insufficient, and will only get you so far
Smaller organisations do not have the resources to replicate the successes these leaderboard toppers flaunt, so the authors conclude that the Googles of the world should pre-train and publish the large language models while the rest of the world sticks to fine-tuning them, which would be a more affordable approach.