The beginning of the end for Convolutional Neural Networks?

ConvNets have several flaws that cannot be discounted. Some of these limitations are fundamental, pushing users to prefer other models over ConvNets.

Yann LeCun’s earliest breakthroughs came with the invention of Convolutional Neural Networks (ConvNets). He first introduced them in the 1980s, when he was a postdoctoral research associate at the University of Toronto. Inspired by the earlier work of Japanese computer scientist Kunihiko Fukushima, ConvNets were modelled after the visual cortex, the part of the brain that handles sight.

Over the years, ConvNets’ popularity grew by leaps and bounds. This popularity can be attributed largely to their architecture, effectiveness, and accuracy. They have been widely adopted for a large number of industrial applications, such as recommender systems and natural language processing.

That said, ConvNets have several flaws that cannot be discounted. Some of these limitations are fundamental, pushing users to prefer other models over ConvNets. One such model is the Transformer. Initially used extensively for language processing applications, its scope has since expanded to computer vision and TinyML, among others.

Is it the beginning of the end for ConvNets?

ConvNets and their limitations

ConvNets learn everything end-to-end. They combine evidence and generalise across positions. ConvNets use layers of feature detectors, and each of these feature detectors is local and repeated across space. One of the key challenges in computer vision is the variation in real-world data. The human visual system can recognise objects from different angles, against different backgrounds, and even under different lighting conditions. When objects are partially obstructed, it uses cues to fill in the missing information.

ConvNets are designed well enough to cope with translations, meaning they can recognise an object wherever it appears in the image. The same cannot be said for the effects of changing viewpoints, such as rotation and scaling, which standard ConvNets have no built-in way to handle. In a speech, Geoffrey Hinton said that ConvNets could not deal with handedness detection at all. This means that even if a ConvNet is trained on both left and right shoes, it would not be able to tell the difference between the two.
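The difference between translation and rotation can be illustrated with a small NumPy sketch. It is a toy, not a trained network: the L-shaped `pattern`, the 10x10 canvas, and the helper names `cross_correlate`, `place`, and `peak` are all assumptions made for illustration. A single feature detector is slid over three images: the pattern, a shifted copy, and a rotated copy.

```python
import numpy as np

def cross_correlate(image, kernel):
    """Valid-mode 2D cross-correlation, the core operation of a conv layer."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical L-shaped pattern, used both as the object and as the detector.
pattern = np.array([[1, 0, 0],
                    [1, 0, 0],
                    [1, 1, 1]], dtype=float)

def place(patch, top, left, size=10):
    """Paste a small patch onto an otherwise empty canvas."""
    canvas = np.zeros((size, size))
    h, w = patch.shape
    canvas[top:top + h, left:left + w] = patch
    return canvas

def peak(image):
    """Strongest detector response and where it occurs."""
    response = cross_correlate(image, pattern)
    position = np.unravel_index(np.argmax(response), response.shape)
    return float(response.max()), tuple(int(p) for p in position)

original = place(pattern, 1, 1)
shifted = place(pattern, 4, 5)            # same object, new position
rotated = place(np.rot90(pattern), 1, 1)  # same object, rotated 90 degrees

print("original:", peak(original))
print("shifted: ", peak(shifted))
print("rotated: ", peak(rotated))
```

The shifted copy produces the same peak response as the original, just at a different position, while the rotated copy produces a weaker peak from the same detector. In practice, trained networks partly compensate by learning several detectors and by seeing rotated examples during training.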

According to Hinton, one way to solve this is to use 4D or 6D maps when training AI to perform object detection. This, however, is very expensive. For now, researchers simply gather a lot of images that display the object in various positions. This, again, is not a very efficient method. Hinton said, “We’d like neural nets that generalise to new viewpoints effortlessly. If they learn to recognise something, and you make it ten times as big, and you rotate it 60 degrees, it shouldn’t cause them any problem at all. We know computer graphics is like that, and we’d like to make neural nets more like that.”

Another major disadvantage of ConvNets is their pooling layers. Pooling generalises features and helps the network recognise a feature independent of its location in the image. It is especially useful in image classification tasks, where the user has to detect the presence of a certain object in the image but is not very concerned about its location. Pooling makes the network more efficient and speeds up training, and the resulting invariance to location can improve the network's statistical efficiency.


That said, pooling layers lose valuable information and ignore the larger relationship between the part and the whole. Consider a face detector: features like the mouth, eyes, and nose must all be present at the correct locations for the image to be classified as a face. A ConvNet will classify the image as a face if these features are present, whether or not they are placed at the correct locations.
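The face example can be sketched in a few lines of NumPy. This is a deliberately extreme illustration that assumes global max pooling over one toy feature map per facial part; the part names, coordinates, and the helpers `make_maps` and `global_max_pool` are hypothetical. Real networks pool over small local windows and keep some spatial information.

```python
import numpy as np

def global_max_pool(feature_maps):
    """Collapse each (H, W) feature map to its single strongest activation."""
    return feature_maps.max(axis=(1, 2))

def make_maps(part_positions, size=8):
    """One toy feature map per part, active only where that part was detected."""
    maps = np.zeros((len(part_positions), size, size))
    for channel, (row, col) in enumerate(part_positions.values()):
        maps[channel, row, col] = 1.0
    return maps

# Hypothetical detections: a plausible face layout and a scrambled one.
face = {"left_eye": (2, 2), "right_eye": (2, 5), "nose": (4, 3), "mouth": (6, 3)}
scrambled = {"left_eye": (6, 1), "right_eye": (0, 6), "nose": (1, 1), "mouth": (3, 4)}

pooled_face = global_max_pool(make_maps(face))
pooled_scrambled = global_max_pool(make_maps(scrambled))

# Both layouts collapse to the same vector, so a classifier that only
# sees the pooled values cannot tell them apart.
print(pooled_face, pooled_scrambled)
print("identical after pooling:", np.array_equal(pooled_face, pooled_scrambled))
```

After pooling, every part is reported as present in both layouts, and the information about where each part sat is gone.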

To this end, Hinton and his team filed a patent on Capsule Neural Networks as a replacement for ConvNets. The researchers claimed that capsule networks could replace ConvNets in traditional computer vision applications. Such a model could not only detect a feature but also identify its position in the image.

Not just limited to weak generalisations

ConvNets recognise objects very differently from humans, and the differences are not limited to weak generalisation. Adding even a tiny amount of noise to an image can lead a ConvNet to classify it as something completely different.
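One common intuition for this sensitivity is that many tiny per-pixel changes can add up across a high-dimensional input. The sketch below shows that arithmetic on a toy linear classifier, not on a ConvNet; the weights `w`, the input `x`, and the step size `eps` are made-up values chosen so the numbers are easy to follow.

```python
import numpy as np

n = 1000  # number of "pixels" in the toy input

# Hypothetical linear classifier: predicts class A if w @ x > 0, else class B.
w = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)

# A mid-grey "image" with a faint pattern that the classifier picks up on.
x = 0.5 + 0.001 * w
score = w @ x

# Nudge every pixel by at most eps in the direction that lowers the score.
eps = 0.002
x_adv = x - eps * np.sign(w)
score_adv = w @ x_adv

print("largest pixel change:", np.max(np.abs(x_adv - x)))
print("score before:", score, "-> class", "A" if score > 0 else "B")
print("score after: ", score_adv, "-> class", "A" if score_adv > 0 else "B")
```

No pixel moves by more than 0.002 on a 0-to-1 scale, yet the summed effect over 1,000 pixels is enough to flip the predicted class.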

Given the limitations of ConvNets, other models continue to soar in popularity, most prominently Transformers. After the success of large language models like GPT-2 and GPT-3, Transformers have been successfully deployed for computer vision applications. Vision Transformer, developed by a team at Google, is an image classification model that applies the transformer architecture to patches of the image.
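The patch step can be sketched with NumPy. The helper `image_to_patches` is hypothetical, and the 224x224 image and 16x16 patch size are assumed values for illustration. In the full model, each flattened patch would then be linearly projected and combined with a position embedding before entering the transformer; that part is omitted here.

```python
import numpy as np

def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into non-overlapping, flattened patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # Separate the patch grid from the pixels inside each patch.
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    # Reorder to (grid_row, grid_col, pixel_row, pixel_col, channel).
    patches = patches.transpose(0, 2, 1, 3, 4)
    # One row per patch, read left to right and top to bottom.
    return patches.reshape(rows * cols, patch_size * patch_size * c)

# Stand-in for an RGB image; the values are arbitrary.
image = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
tokens = image_to_patches(image, patch_size=16)

print(tokens.shape)  # (196, 768): 14 x 14 patches, each 16 * 16 * 3 values
```

Each row of `tokens` plays the role a word plays in a language model, which is how an architecture built for text can be applied to images.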

Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at
