
The beginning of the end for Convolutional Neural Networks?

One cannot overlook ConvNets' several flaws. Some of these limitations are fundamental, pushing users to prefer other models over ConvNets.

Yann LeCun’s earliest breakthroughs came with the invention of Convolutional Neural Networks (ConvNets), which he first introduced in the 1980s when he was a postdoctoral research associate at the University of Toronto. Inspired by the earlier work of Japanese computer scientist Kunihiko Fukushima, ConvNets were modelled after the brain’s visual cortex, the part that handles sight.

Over the years, ConvNets’ popularity grew by leaps and bounds, owing largely to their architecture, effectiveness, and accuracy. They have been widely adopted in industrial applications such as recommender systems and natural language processing.

That said, one cannot overlook ConvNets’ several flaws. Some of these limitations are fundamental, pushing users to prefer other models over ConvNets. One such model is the Transformer. Initially used extensively for language processing applications, its scope has since expanded to computer vision, TinyML, and other areas.

Is it the beginning of the end for ConvNets?

ConvNets and their limitations

ConvNets learn everything end to end: they combine evidence and generalise across positions using layers of feature detectors, each of which is local and repeated across space. One of the key challenges in computer vision is data variance in the real world. The human vision system can recognise objects from different angles, against different backgrounds, and under different lighting conditions; when objects are partially occluded, it uses contextual cues to fill in the missing information.
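To make the idea of a local, weight-shared feature detector concrete, here is a toy sketch in NumPy. The `conv2d_valid` helper and the edge kernel are illustrative, not part of any library:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image ('valid' padding).

    The same weights are reused at every position -- this weight
    sharing is what makes the detector local and repeated across
    space, as described above.
    """
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A tiny vertical-edge detector applied to a toy image: the response
# peaks exactly at the dark-to-light boundary, wherever it sits.
image = np.zeros((5, 5))
image[:, 3:] = 1.0                     # right half bright
edge_kernel = np.array([[-1.0, 1.0]])  # 1x2 edge detector
response = conv2d_valid(image, edge_kernel)
print(response[0])  # [0. 0. 1. 0.] -- peak at the boundary
```

A real ConvNet learns hundreds of such kernels per layer rather than hand-crafting them, but the sliding-window weight sharing is the same.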

While ConvNets are designed to cope well with translations, meaning they can correctly identify an object wherever it appears in the image, the same cannot be said for other viewpoint changes such as rotations and scaling, which ConvNets cannot handle at all. In a talk, Geoff Hinton said that ConvNets could not deal with handedness detection: shown a left shoe and a right shoe, which differ only by a mirror reflection, a ConvNet would struggle to tell the two apart.
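The asymmetry between translation and rotation can be checked directly: convolving a shifted image gives a shifted response, but convolving a rotated image does not give a rotated copy of the original response. A minimal NumPy sketch (the `conv` helper is illustrative):

```python
import numpy as np

def conv(img, k):
    """Valid-padding cross-correlation of one kernel over an image."""
    kh, kw = k.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = (img[y:y + kh, x:x + kw] * k).sum()
    return out

rng = np.random.default_rng(0)
img = rng.random((6, 6))
k = rng.random((3, 3))

# Translation: shifting the input one pixel shifts the output one
# pixel -- convolution is translation-equivariant by construction.
shifted = np.roll(img, 1, axis=1)
print(np.allclose(conv(shifted, k)[:, 1:], conv(img, k)[:, :-1]))  # True

# Rotation: rotating the input does NOT yield a rotated copy of the
# original response -- the architecture has no rotation symmetry.
rotated = np.rot90(img)
print(np.allclose(conv(rotated, k), np.rot90(conv(img, k))))  # False
```

This is why rotation robustness has to come from the data (augmentation) rather than from the architecture itself.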

According to Hinton, one way to solve this is by using 4D or 6D maps for training AI to perform object detection. This, however, is computationally expensive. For now, researchers simply gather many images showing the object in various positions, which is also not a very efficient method. Hinton said, “We’d like neural nets that generalise to new viewpoints effortlessly. If they learn to recognise something, and you make it ten times as big, and you rotate it 60 degrees, it shouldn’t cause them any problem at all. We know computer graphics is like that, and we’d like to make neural nets more like that.”
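The brute-force workaround described above, gathering the object in many positions, amounts to viewpoint augmentation. A hypothetical minimal sketch of the idea (restricted to 90-degree rotations for simplicity):

```python
import numpy as np

def augment_rotations(img):
    """Return the image plus its 90/180/270-degree rotations.

    Even this crude augmentation multiplies the training set by 4;
    covering arbitrary angles, scales, and lighting multiplies the
    data (and training cost) far more -- the inefficiency Hinton
    criticises.
    """
    return [np.rot90(img, k) for k in range(4)]

img = np.arange(9).reshape(3, 3)
augmented = augment_rotations(img)
print(len(augmented))  # 4 copies for the network to memorise
```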

Another major disadvantage of ConvNets is the pooling layers. Pooling generalises features, helping the network recognise a feature independent of its location in the image. It is especially useful in image classification, where the user has to detect the presence of a certain object but is not very concerned about its exact location. Pooling increases the efficiency of the network and leads to faster training, and this location invariance can improve the network’s statistical efficiency.
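A toy sketch of how pooling trades location for invariance: two activations that differ only by a small shift become indistinguishable after max pooling (the `max_pool` helper is illustrative):

```python
import numpy as np

def max_pool(fm, size=2):
    """Non-overlapping max pooling: keep the strongest activation in
    each size x size window, discarding exactly where it occurred."""
    H, W = fm.shape
    return (fm[:H - H % size, :W - W % size]
            .reshape(H // size, size, W // size, size)
            .max(axis=(1, 3)))

# The same activation, nudged one pixel within a pooling window.
a = np.zeros((4, 4)); a[0, 0] = 1.0
b = np.zeros((4, 4)); b[1, 1] = 1.0   # shifted, same 2x2 window

print(max_pool(a))  # [[1. 0.] [0. 0.]]
print(np.array_equal(max_pool(a), max_pool(b)))  # True: shift erased
```

The invariance that makes classification cheaper is exactly the information loss criticised below.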


That said, pooling layers lead to a loss of valuable information and ignore the larger relationship between the part and the whole. Consider a face detector: features like a mouth, eyes, and a nose must all be present at the correct locations for an image to qualify as a face. A ConvNet, however, will classify the image as a face whenever these features are present, whether or not they are placed correctly.
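This failure mode can be sketched directly: once location is pooled away per part channel, a correctly arranged face and a scrambled one produce identical features. A toy NumPy example (the part maps are hypothetical):

```python
import numpy as np

def global_max_pool(channels):
    # One number per part channel: "was this part seen anywhere?"
    return channels.max(axis=(1, 2))

# Channel 0: eyes, channel 1: nose, channel 2: mouth (toy 5x5 maps).
correct = np.zeros((3, 5, 5))
correct[0, 1, 1] = correct[0, 1, 3] = 1.0   # eyes at the top
correct[1, 2, 2] = 1.0                      # nose in the middle
correct[2, 3, 2] = 1.0                      # mouth below

scrambled = np.zeros((3, 5, 5))
scrambled[0, 4, 0] = scrambled[0, 0, 4] = 1.0  # eyes in the corners
scrambled[1, 0, 0] = 1.0                       # nose top-left
scrambled[2, 0, 2] = 1.0                       # mouth above the "eyes"

# After pooling away location, both arrangements look like a "face".
print(global_max_pool(correct))  # [1. 1. 1.]
print(np.array_equal(global_max_pool(correct),
                     global_max_pool(scrambled)))  # True
```

Real ConvNets pool locally rather than globally, so some spatial structure survives the early layers, but the same loss accumulates with depth.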

To this end, Hinton and his team filed a patent on the Capsule Neural Network (CapsNet) as a replacement for ConvNets. The researchers claimed it could replace ConvNets in traditional computer vision applications, as the model could not only detect a feature but also identify its position in the image.

Not just limited to weak generalisations

ConvNets recognise objects in a very different way than humans do, and the differences are not limited to weak generalisation. Adding even a tiny amount of carefully crafted noise to an image, often imperceptible to a human, can lead a ConvNet to classify it as something completely different, a failure known as an adversarial example.

Given the limitations of ConvNets, other models continue to soar in popularity, most prominently Transformers. After the success of large language models like GPT-2 and GPT-3, Transformers have been successfully deployed for computer vision applications. Vision Transformer (ViT), developed by a team at Google, is an image classification model that applies the transformer architecture to patches of an image.
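A sketch of the patch-splitting step ViT performs before attention. The sizes follow the common setup of a 224 x 224 RGB input cut into 16 x 16 patches; the `to_patches` helper is illustrative, not ViT's actual implementation:

```python
import numpy as np

def to_patches(img, p):
    """Split an H x W x C image into flattened p x p patches -- the
    'words' a Vision Transformer attends over."""
    H, W, C = img.shape
    return (img.reshape(H // p, p, W // p, p, C)
               .transpose(0, 2, 1, 3, 4)    # group rows/cols of patches
               .reshape(-1, p * p * C))     # one flat vector per patch

img = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
patches = to_patches(img, 16)
print(patches.shape)  # (196, 768): 14 x 14 patches of 16*16*3 values
```

Each of the 196 patch vectors is then linearly projected and fed to a standard transformer encoder, so the image is processed as a short sequence of tokens rather than a grid of pixels.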

PS: The story was written using a keyboard.

Shraddha Goled

I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.