Is Depth In Neural Networks Always Preferable? This Research Says the Contrary

Deep neural networks are defined by their depth. However, greater depth means more sequential processing and higher latency. This raises the question of whether it is possible to construct high-performance “non-deep” neural networks. Researchers from Princeton University and Intel Labs demonstrate that it is.

Characteristics Of Depth 

The fields of machine learning, computer vision, and natural language processing have been transformed by deep neural networks (DNNs). As the name implies, one of the primary characteristics of DNNs is their depth, which can be defined as the length of the longest path from an input neuron to an output neuron. Often, a neural network can be described as a linear sequence of layers, that is, groups of neurons with no intra-group connections. In such cases, the depth of the network is simply its layer count.
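The definition of depth as a longest path can be made concrete with a small sketch. The function name `network_depth` and the example graphs are hypothetical and do not come from the paper; for simplicity, each node stands for a whole layer rather than a single neuron, and the graph is assumed to be acyclic.

```python
from functools import lru_cache


def network_depth(edges, inputs, outputs):
    """Length (in edges) of the longest path from any input to any output of a DAG."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
    output_set = set(outputs)

    @lru_cache(maxsize=None)
    def longest_from(node):
        # Longest path from `node` to an output, or None if no output is reachable.
        best = 0 if node in output_set else None
        for nxt in successors.get(node, []):
            sub = longest_from(nxt)
            if sub is not None:
                best = max(best if best is not None else 0, sub + 1)
        return best

    lengths = [longest_from(node) for node in inputs]
    reachable = [length for length in lengths if length is not None]
    if not reachable:
        raise ValueError("no output is reachable from any input")
    return max(reachable)


# A plain chain of three layers after the input: depth equals the layer count.
chain = [("x", "h1"), ("h1", "h2"), ("h2", "y")]
print(network_depth(chain, ["x"], ["y"]))  # 3

# Two parallel branches of unequal length: the longer branch sets the depth.
branched = [("x", "a1"), ("a1", "a2"), ("a2", "fuse"),
            ("x", "b1"), ("b1", "fuse"), ("fuse", "y")]
print(network_depth(branched, ["x"], ["y"]))  # 4
```

In the second example, adding further branches next to `a` and `b` would add computation but leave the depth at 4, since depth depends only on the longest path.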

It is widely believed that significant depth is required for high-performance networks, as depth increases a network’s representational capacity and helps it learn increasingly abstract features. Indeed, one of the key reasons for the success of ResNets is that they make it possible to train extremely deep networks with up to 1000 layers. As a result, state-of-the-art performance is increasingly attained by training very deep models, and the definition of “deep” has moved from “two or more layers” in the early days of deep learning to “tens or hundreds of layers” in today’s models.


Is Deeper Necessary?

However, is great depth always necessary? This is an important question to ask because great depth does not come without downsides. A deeper network involves more sequential processing and higher latency; it is also more difficult to parallelise and is therefore less suitable for applications that require rapid response times.

The researchers found that, contrary to popular belief, high-performing non-deep networks are indeed possible. They describe a non-deep network design, which they call ParNet (Parallel Networks), that performs competitively with much deeper networks. They show for the first time that a classification network with a depth of just 12 can achieve top-1 accuracy above 80% on ImageNet, 96% on CIFAR10, and 81% on CIFAR100. They also show that a detection network with a low-depth (12) backbone can achieve an AP of 48% on MS-COCO. Beyond addressing the scientific question of whether great depth is necessary, ParNet offers practical benefits: its parallel substructures allow it to be efficiently parallelised across multiple processors.
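To see why depth and latency are linked, consider a deliberately simplified timing model. It is an illustrative assumption rather than the researchers’ measurement method: layers in a chain run one after another, parallel branches run at the same time on separate processors, and communication cost is folded into a single `fusion_time`. The function names and all timings below are hypothetical, in arbitrary units.

```python
def sequential_latency(layer_times):
    """Latency of a chain of layers: each layer waits for the previous one."""
    return sum(layer_times)


def parallel_latency(branch_layer_times, fusion_time=0.0):
    """Latency of branches run concurrently: the slowest branch, plus fusion."""
    return max(sum(branch) for branch in branch_layer_times) + fusion_time


# Hypothetical deep network: 50 layers costing 1.0 unit each.
deep = sequential_latency([1.0] * 50)

# Hypothetical non-deep network: 3 branches of 12 wider (slower) layers each.
shallow = parallel_latency([[1.5] * 12] * 3, fusion_time=0.5)

print(deep, shallow)  # 50.0 18.5
```

Under these assumptions, adding layers to the chain always adds latency, whereas adding branches does not, provided enough processors are available and the fusion cost stays small.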

Research Contributions

To summarise, there are three contributions:

• For the first time, the researchers demonstrate that a neural network with a depth of only 12 can perform well on highly competitive benchmarks (80.7% on ImageNet, 96% on CIFAR10, 81% on CIFAR100).

• The researchers demonstrate how ParNet’s parallel structures can be used for fast, low-latency inference.

• The researchers study the scaling rules for ParNet and demonstrate that it can be scaled effectively while keeping the depth constant and low.

Code is available at Non-Deep Networks

The researchers achieve this by arranging subnetworks in parallel rather than stacking one layer after another, which reduces depth while maintaining high performance. They analyse the design’s scaling rules and demonstrate how to improve performance without changing the network’s depth. Finally, they show that non-deep networks can be used to build low-latency recognition systems.
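The idea of growing a network without deepening it can be illustrated with a toy calculation. This is a sketch under stated assumptions, not ParNet itself: the function `describe` is hypothetical, the branches use fully connected layers for simplicity, and branch outputs are assumed to be concatenated and passed to a single output layer.

```python
def describe(branch_widths, layers_per_branch, in_dim, out_dim):
    """Return (depth, parameter_count) for a toy network of parallel branches.

    Each branch is a stack of `layers_per_branch` fully connected layers of a
    fixed width; branch outputs are concatenated and fed to one output layer.
    """
    params = 0
    for width in branch_widths:
        # First layer of the branch maps the input to the branch width.
        params += in_dim * width + width
        # Remaining layers keep the width fixed (weights plus biases).
        params += (layers_per_branch - 1) * (width * width + width)
    # Output layer acts on the concatenated branch outputs.
    params += sum(branch_widths) * out_dim + out_dim
    # Depth counts the layers on the longest input-to-output path.
    depth = layers_per_branch + 1
    return depth, params


base = describe([64, 64], layers_per_branch=3, in_dim=32, out_dim=10)
wider = describe([128, 128], layers_per_branch=3, in_dim=32, out_dim=10)
more_branches = describe([64, 64, 64], layers_per_branch=3, in_dim=32, out_dim=10)

for name, (depth, params) in [("base", base), ("wider", wider),
                              ("more branches", more_branches)]:
    print(f"{name}: depth={depth}, parameters={params}")
```

Both widening the branches and adding a branch increase the parameter count while the depth stays at 4, which is the sense in which capacity can be scaled at constant depth.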


The researchers provide the first empirical evidence that non-deep networks can compete with deep networks on large-scale visual recognition benchmarks. They show that parallel substructures can be used to build remarkably performant non-deep networks, and they present ways to scale up such networks and improve their performance without increasing depth. The work shows that there are alternative designs in which highly accurate neural networks do not need to be deep. Such designs may be better suited to future multi-chip processors. The researchers hope the work will aid the development of neural networks that are both highly accurate and extremely fast.

Dr. Nivash Jeevanandam
Nivash holds a doctorate in information technology and has been a research associate at a university and a development engineer in the IT industry. Data science and machine learning excite him.
