Deep neural networks are defined by their depth. However, more depth implies more sequential processing and higher latency. This raises the question of whether it is possible to construct high-performance “non-deep” neural networks. Researchers from Princeton University and Intel Labs demonstrate that it is.
Characteristics Of Depth
The fields of machine learning, computer vision, and natural language processing have been transformed by deep neural networks (DNNs). As their name implies, one of the primary characteristics of DNNs is their depth, which can be defined as the length of the longest path from an input neuron to an output neuron. Often, a neural network can be characterised as a linear sequence of layers, that is, groups of neurons with no intra-group connections. In these circumstances, the depth of a network is its layer count.
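To make the definition concrete, here is a minimal Python sketch that treats a network as a directed acyclic graph of layers and counts the layers on the longest path. The function name `network_depth` and both example graphs are illustrative assumptions, not taken from the researchers' code.

```python
from functools import lru_cache


def network_depth(edges):
    """Return the number of layers on the longest path through a DAG of layers.

    `edges` is a list of (source_layer, destination_layer) pairs.
    The graph is assumed to be acyclic; a cycle would recurse forever.
    """
    successors = {}
    nodes = set()
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
        nodes.update((src, dst))

    @lru_cache(maxsize=None)
    def longest_from(node):
        # One layer for the node itself, plus the longest continuation (if any).
        return 1 + max((longest_from(n) for n in successors.get(node, [])), default=0)

    return max((longest_from(n) for n in nodes), default=0)


# Hypothetical example 1: eight layers stacked one after another.
sequential_edges = [(f"l{i}", f"l{i + 1}") for i in range(1, 8)]

# Hypothetical example 2: the same number of layers (eight), arranged as
# three parallel two-layer branches between a shared stem and a fusion layer.
parallel_edges = [
    ("stem", "a1"), ("a1", "a2"), ("a2", "fuse"),
    ("stem", "b1"), ("b1", "b2"), ("b2", "fuse"),
    ("stem", "c1"), ("c1", "c2"), ("c2", "fuse"),
]

print("sequential depth:", network_depth(sequential_edges))
print("parallel depth:", network_depth(parallel_edges))
```

Both toy graphs contain eight layers, but only the sequential one has a depth equal to its layer count; in the branched graph the longest path passes through just four layers.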
It is widely believed that great depth is required for high-performance networks, as depth boosts a network’s representational capability and aids in learning increasingly abstract features. Indeed, one of the key reasons for ResNets’ success is that they enable extremely deep networks with up to 1000 layers. As a result, state-of-the-art performance is increasingly attained by training very deep models, and the definition of “deep” has moved from “two or more layers” in the early days of deep learning to “tens or hundreds of layers” in today’s models.
Is Deeper Necessary?
However, is great depth always necessary? This is an important question to ask because great depth does not come without downsides. For example, a deeper network requires more sequential processing and incurs higher latency; it is also more difficult to parallelise and is therefore less suitable for applications that require rapid response times.
Contrary to this widely held belief, the researchers found that high performance is possible without great depth. They describe a non-deep network design that performs competitively with its deep counterparts, which they call ParNet (Parallel Networks). They demonstrate for the first time that a classification network with a depth of just 12 can achieve higher than 80% accuracy on ImageNet, 96% on CIFAR10, and 81% on CIFAR100. Additionally, they show that a detection network with a low-depth (12) backbone can obtain 48% AP on MS-COCO. ParNet thus helps to address a scientific question about the necessity of great depth and also provides practical benefits: because of its parallel substructures, it can be efficiently parallelised across several processors.
To summarise, there are three contributions:
• For the first time, the researchers demonstrate that a neural network with a depth of only 12 can perform well on extremely competitive benchmarks (80.7% on ImageNet, 96% on CIFAR10, 81% on CIFAR100).
• The researchers demonstrate how ParNet’s parallel structures can be used for fast, low-latency inference.
• The researchers examine how ParNet scales and demonstrate that it can be scaled effectively while its depth stays low and constant.
Code is available at Non-Deep Networks
The researchers achieve this low depth by using parallel subnetworks rather than stacking one layer after another. This reduces depth effectively while maintaining a high level of performance. They analyse the design’s scaling rules and demonstrate how to improve performance without changing the network’s depth. Finally, they demonstrate the feasibility of using non-deep networks to build low-latency recognition systems.
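The contrast between stacking layers and running subnetworks in parallel can be sketched in a few lines of plain Python. This is a toy illustration only, not the ParNet architecture: the helper names (`make_layer`, `run_stack`, `run_parallel`), the elementwise affine-plus-ReLU "layer", and fusion by elementwise summation are all assumptions made for the example.

```python
def make_layer(weight, bias):
    """Build a toy 'layer': an elementwise affine map followed by ReLU."""
    def layer(xs):
        return [max(0.0, weight * x + bias) for x in xs]
    return layer


def run_stack(layers, xs):
    """Apply layers one after another (the conventional deep design)."""
    for layer in layers:
        xs = layer(xs)
    return xs


def run_parallel(branches, xs):
    """Feed the same input to every branch and fuse the outputs.

    Fusion by elementwise sum is an assumption chosen for simplicity.
    The branches do not depend on each other, so in principle each one
    could run on its own processor.
    """
    outputs = [run_stack(branch, xs) for branch in branches]
    return [sum(values) for values in zip(*outputs)]


def stack_depth(layers):
    """Depth of a plain stack is its layer count."""
    return len(layers)


def parallel_depth(branches):
    """Depth of parallel branches is set by the longest single branch."""
    return max(len(branch) for branch in branches)


# Six toy layers arranged as one deep stack...
deep_stack = [make_layer(0.5, 0.1) for _ in range(6)]
# ...versus six toy layers arranged as three parallel branches of two layers.
wide_branches = [[make_layer(0.5, 0.1) for _ in range(2)] for _ in range(3)]

print("deep stack depth:", stack_depth(deep_stack))
print("parallel depth:", parallel_depth(wide_branches))
print("parallel output:", run_parallel(wide_branches, [1.0, -2.0]))
```

Both arrangements use six layers, but the parallel one has a depth of two, and adding more branches would increase capacity without lengthening the longest path.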
The researchers provide the first empirical evidence that non-deep networks can compete with deep networks on large-scale visual recognition benchmarks. They show that parallel substructures can be used to build remarkably performant non-deep networks. Additionally, they demonstrate methods for scaling up and improving the performance of such networks without increasing their depth. The work presents alternative designs for highly accurate neural networks that do not require great depth. Such designs may be better suited to future multi-chip processors. Moreover, the researchers anticipate that the work will aid in the construction of neural networks that are both highly accurate and fast.
Nivash holds a doctorate in information technology and has been a research associate at a university and a development engineer in the IT industry. Data science and machine learning excite him.