Do Convolutional Networks Perform Better With Depth?

“Double descent does not happen through depth.”

The double descent curve shows that increasing model capacity past the interpolation threshold can lead to a decrease in test error. Increasing neural network capacity through width leads to double descent. But what about the depth of the neural network? How does increasing or reducing depth play out? A group of researchers from MIT explored this question in their work titled “Do Deeper Convolutional Networks Perform Better?”.

The Depth And Death Of Complexity 

Double Descent illustration. (Source: OpenAI)

Standard statistical machine learning theory predicts that bigger models should be more prone to overfitting. Contrary to this common notion, Mikhail Belkin and his peers showed in their seminal paper that the standard bias-variance tradeoff breaks down once a model hits the “interpolation threshold”.


Classical statistics tells us that overparameterization leads to overfitting: as models become more complex, their ability to generalise drops. Deep neural networks, on the contrary, have performed well with increasing complexity. The phenomenon called double descent explains this conundrum. Before going further into network depth and double descent, let’s briefly discuss double descent itself. As a concept, double descent was popularised late last year by researchers at OpenAI.

The OpenAI researchers explored how the bias-variance tradeoff holds before the interpolation threshold, where increasing model complexity leads to overfitting and a higher test error. Double descent introduces the interpolation threshold, the point at which a model is just able to fit the training set, and shows that results depend on which side of the threshold the model is on: once the threshold is crossed, test error falls again.
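The shape of this curve can be reproduced in a toy setting. The following is a minimal sketch, not the OpenAI or MIT experiments: it assumes a synthetic linear regression task, fits a minimum-norm least-squares model on the first `p` features, and treats `p` as the model capacity. All names and parameter values (`double_descent_curve`, `n_train`, `noise_std`, and so on) are illustrative assumptions.

```python
import numpy as np


def double_descent_curve(n_train=40, n_test=500, n_features=200,
                         noise_std=0.5, n_trials=20, seed=0):
    """Mean test MSE of minimum-norm least squares as capacity p grows.

    Toy assumption: y = x . beta + noise, and the model only sees the
    first p of the n_features inputs. p == n_train is the interpolation
    threshold, where the model can first fit the training set exactly.
    """
    rng = np.random.default_rng(seed)
    beta = np.ones(n_features) / np.sqrt(n_features)  # true weights, unit norm
    capacities = [5, 10, 20, 30, 40, 50, 80, 120, 200]
    mean_test_mse = {}
    for p in capacities:
        errors = []
        for _ in range(n_trials):
            x_train = rng.standard_normal((n_train, n_features))
            x_test = rng.standard_normal((n_test, n_features))
            y_train = x_train @ beta + noise_std * rng.standard_normal(n_train)
            y_test = x_test @ beta + noise_std * rng.standard_normal(n_test)
            # The pseudo-inverse gives the least-squares fit when p < n_train
            # and the minimum-norm interpolating fit when p >= n_train.
            w = np.linalg.pinv(x_train[:, :p]) @ y_train
            errors.append(np.mean((x_test[:, :p] @ w - y_test) ** 2))
        mean_test_mse[p] = float(np.mean(errors))
    return mean_test_mse


if __name__ == "__main__":
    for p, mse in double_descent_curve().items():
        print(f"capacity p={p:3d}  mean test MSE={mse:.2f}")
```

In this sketch the test error should spike around `p = n_train` and fall again as `p` keeps growing. Capacity here is a number of features, which is closer to width than to depth, so it illustrates only the general phenomenon.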

The work presents two situations in which test error can increase with sample size:

(A): Training error increases with sample size.

(B): Generalization gap increases with sample size.

There can be cases where (A) is true and (B) is false, cases where (B) is true and (A) is false, and cases where both are true.
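One way to see why these are the only two situations is to write test error as training error plus the generalization gap. This is a sketch of the bookkeeping, with $n$ denoting the sample size (notation assumed here, not taken from the paper):

```latex
\mathrm{TestError}(n)
  = \mathrm{TrainError}(n)
  + \underbrace{\bigl(\mathrm{TestError}(n) - \mathrm{TrainError}(n)\bigr)}_{\mathrm{Gap}(n)}
```

If $\mathrm{TestError}(n') > \mathrm{TestError}(n)$ for some $n' > n$, then at least one of $\mathrm{TrainError}$ or $\mathrm{Gap}$ must have increased between $n$ and $n'$, which corresponds to (A), (B), or both.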

Around a critical number of samples, models try very hard to fit the training set, which can destroy the global structure of the model. With fewer samples, the OpenAI researchers stated, models are overparameterized enough to fit the training set while still behaving well on the distribution.

Until last year, double descent behaviour had not been explored, owing to several barriers. To be observed, the double descent curve requires a parametric family of spaces containing functions of arbitrary complexity.

So far, complexity has been defined in terms of the width of the network. So, what role does depth play?

{Note: Width is the number of nodes in each layer, whereas depth is the number of layers.}
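To make the distinction concrete, here is a small sketch that lists the weight-matrix shapes of a fully connected network for a given width and depth. The helper names are hypothetical, and counting depth as the number of hidden layers is an assumption; conventions vary.

```python
def mlp_layer_shapes(input_dim, width, depth, output_dim):
    """Weight shapes for a fully connected net with `depth` hidden layers,
    each containing `width` nodes (assumed convention for this sketch)."""
    dims = [input_dim] + [width] * depth + [output_dim]
    return [(dims[i], dims[i + 1]) for i in range(len(dims) - 1)]


def count_parameters(shapes):
    """Total weights plus one bias per output node of every layer."""
    return sum(rows * cols + cols for rows, cols in shapes)


if __name__ == "__main__":
    wide = mlp_layer_shapes(input_dim=4, width=16, depth=2, output_dim=3)
    deep = mlp_layer_shapes(input_dim=4, width=8, depth=6, output_dim=3)
    print("wide:", wide, count_parameters(wide))
    print("deep:", deep, count_parameters(deep))
```

Both knobs increase the parameter count, which is why either can in principle be used to push a model past the interpolation threshold.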

To understand the role of depth, the researchers at MIT considered linear neural networks. According to the authors, linear neural networks are useful for analysing deep learning phenomena since they represent linear operators but have non-convex optimisation landscapes. Furthermore, the solution learned by a single layer fully connected network is well understood. 
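The statement that a linear network represents a linear operator can be checked numerically. The sketch below uses arbitrary layer sizes and random weights (assumptions chosen for illustration) and shows that stacking linear layers without activations is equivalent to a single matrix, even though training the separate factors is a non-convex problem.

```python
import numpy as np


def forward_linear_network(weights, x):
    """Apply each linear layer in turn, with no activation function."""
    h = x
    for w in weights:
        h = w @ h
    return h


def collapse_linear_network(weights):
    """Multiply the layer matrices into one equivalent linear operator."""
    product = weights[0]
    for w in weights[1:]:
        product = w @ product
    return product


rng = np.random.default_rng(0)
# Three layers mapping 3 -> 5 -> 5 -> 2 (sizes are illustrative).
weights = [
    rng.standard_normal((5, 3)),
    rng.standard_normal((5, 5)),
    rng.standard_normal((2, 5)),
]
x = rng.standard_normal(3)

deep_output = forward_linear_network(weights, x)
single_matrix = collapse_linear_network(weights)
print(np.allclose(deep_output, single_matrix @ x))
```

Adding depth to such a network does not enlarge the set of functions it can represent; it only changes which solution gradient-based training tends to find.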

The authors conclude that their experiments on linear autoencoders and linear convolutional classifiers consistently show test accuracy decreasing with increasing depth once the interpolation threshold is reached.

Key Takeaways

  • Experiments in the classification setting on CIFAR10 and ImageNet32 using ResNets and fully-convolutional networks demonstrate that test performance worsens beyond a critical depth. 
  • The test accuracy of convolutional networks approaches that of fully connected networks as depth increases.
  • Increasing depth leads to poor generalisation.
  • Against conventional wisdom, the authors’ findings indicate that when models are near or past the interpolation threshold (e.g. achieving 100% training accuracy), practitioners should decrease the depth of these networks to improve their performance. 
  • The driving force behind the success of deep learning is not the depth of the models, but rather their width.

Download the original paper here.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
