Researchers from MIT recently collaborated with the University of Brasilia and Yonsei University to estimate the computational limits of deep learning (DL). They stated, “The computational needs of deep learning scale so rapidly that they will quickly become burdensome again.”
The researchers analysed 1,058 research papers from the arXiv pre-print repository, along with other benchmark references, to understand how the performance of deep learning techniques depends on computational power in several important application areas.
They stated, “To understand why DL is so computationally expensive, we analyse its statistical as well as computational scaling in theory. We show DL is not computationally expensive by accident, but by design.”
They added, “The same flexibility that makes it excellent at modelling the diverse phenomena as well as outperforming the expert models also makes it more computationally expensive in nature. Despite this, we realised that the actual computational burden of DL models is scaling more rapidly than lower bounds from theory, suggesting that substantial improvements might be possible.”
Finding The Limits
The researchers described the computational demands of deep learning applications in five prominent application areas and showed that progress in all five is strongly reliant on increases in computing power. The five application domains are image classification (ImageNet benchmark), object detection (MS COCO), question answering (SQuAD 1.1), named entity recognition (CoNLL 2003), and machine translation (WMT 2014 En-to-Fr).
The researchers showed that computational requirements have escalated quickly in each of these domains, and that these increases in computing power have been central to the improvements in performance.
They further performed two separate analyses of computational requirements, reflecting the two types of information that were available:
- Computation Per Network Pass: the number of floating-point operations required for a single pass through the network.
- Hardware Burden: the computational capability of the hardware used to train the model, calculated as processors × ComputationRate × time.
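As a rough illustration of the Hardware Burden formula, the product can be computed directly. This is a minimal sketch: the function name, the units (floating-point operations per second, seconds) and the example figures are assumptions chosen for illustration and are not taken from the paper.

```python
def hardware_burden(processors: int, flops_per_second: float, seconds: float) -> float:
    """Return processors x ComputationRate x time.

    With the rate in FLOP/s and the time in seconds (assumed units),
    the result is the total number of floating-point operations the
    hardware could have performed during training.
    """
    return processors * flops_per_second * seconds


# Hypothetical setup: 8 processors, each rated at 1e13 FLOP/s, running for 24 hours.
burden = hardware_burden(processors=8, flops_per_second=1e13, seconds=24 * 3600)
print(f"Hardware burden: {burden:.3e} floating-point operations")
```

Because the measure is built from the rated capability of the hardware, it is best read as an upper bound on the computation actually used, not as an exact count.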
The Computational Requirements
According to the researchers, the relationship between performance, model complexity, and computational requirements in deep learning is still not well understood theoretically. Because of the role of over-parameterisation, deep learning is intrinsically more reliant on computing power than other techniques.
The researchers stated that the challenge of over-parameterisation is that the number of deep learning parameters must grow as the number of data points grows. Since the cost of training a deep learning model scales with the product of the number of parameters and the number of data points, computational requirements grow at least as the square of the number of data points in the over-parameterised setting.
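The quadratic scaling can be written out in a few lines. The symbols below are introduced here for illustration only: n is the number of data points, p the number of parameters, C the training cost, and c a positive constant. The sketch assumes that p grows at least linearly with n, which is one reading of "the number of parameters must grow as the number of data points grows".

```latex
% Assumption: in the over-parameterised setting, p grows at least linearly in n.
\begin{align}
  C &\propto n \cdot p
      && \text{(training cost scales with data points times parameters)} \\
  p &\ge c\,n
      && \text{(over-parameterisation, with } c > 0 \text{)} \\
  \Rightarrow\quad C &\gtrsim c\,n^{2}
      && \text{(cost grows at least quadratically in } n \text{)}
\end{align}
```

Under this assumption, doubling the amount of training data at least quadruples the training cost.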
Methods such as stochastic gradient-based optimisation can provide a regularising effect in the over-parameterised setting. In regression, one of the simplest forms of regularisation is Lasso regression, which penalises the absolute size of the coefficients, pushing many of them to exactly zero and making the model sparser.
Regularisation of this kind allows the model to be much more flexible, but this comes with the much higher computational cost of estimating a large number of parameters.
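To make the sparsity effect of the Lasso concrete, the sketch below uses a well-known special case: when the input features are orthonormal, the Lasso solution is obtained by soft-thresholding each ordinary least-squares coefficient. The coefficient values and the penalty strength are hypothetical, and the example is an illustration only, not part of the researchers' analysis.

```python
def soft_threshold(value: float, penalty: float) -> float:
    """Shrink value toward zero by penalty; values within [-penalty, penalty] become 0.0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


# Hypothetical least-squares coefficients for five features.
ols_coefficients = [2.5, -0.3, 0.1, -1.8, 0.4]
penalty = 0.5  # assumed L1 penalty strength

# For orthonormal features, the Lasso coefficients are the soft-thresholded OLS ones.
lasso_coefficients = [soft_threshold(c, penalty) for c in ols_coefficients]
nonzero = sum(1 for c in lasso_coefficients if c != 0.0)

print(lasso_coefficients)
print(f"{nonzero} of {len(lasso_coefficients)} coefficients remain non-zero")
```

The small coefficients are set exactly to zero while the large ones are only shrunk, which is what makes the resulting model sparse.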
The researchers stated, “By analogy, we can see that deep learning performs well because it uses over-parameterisation to create a highly flexible model and uses (implicit) regularisation to make the sample complexity tractable.”
They added, “At the same time, however, deep learning requires vastly more computation than more efficient models. Thus, the great flexibility of deep learning inherently implies a dependence on large amounts of data and computation.”
Wrapping Up
Deep learning has gained much traction in recent years. It has beaten human players at popular games such as Go and poker. According to the researchers, continued progress in these applications will require dramatically more computationally efficient methods, which will have to come either from changes to deep learning or from moving to other machine learning methods.
You can read the paper here.