Recently, researchers from Amazon and the University of California, Davis introduced ResNeSt, a new variant of ResNet. The researchers claimed that the network preserves the overall ResNet structure, so it can be used directly in downstream tasks without introducing additional computational costs.
Improving models and networks is one of the most common ways to advance technology. Over the years, deep convolutional networks have achieved significant breakthroughs in image classification for computer vision. In a previous article, we discussed how ResNets represented a substantial breakthrough in image processing.
ResNet was introduced by Kaiming He and his colleagues in 2015 in the paper "Deep Residual Learning for Image Recognition". It is a residual learning framework designed to ease the training of networks that are substantially deeper than those used previously. ResNet has become one of the most successful CNN architectures and has been adopted in a wide range of computer vision applications.
The researchers built a ResNet-like network by stacking several Split-Attention blocks and named it ResNeSt, where the S stands for "split".
According to the researchers, the first contribution of this research is a simple architectural modification of ResNet that incorporates feature-map split attention within the individual network blocks. The resulting computational unit, called a Split-Attention block, consists of feature-map group and split-attention operations.
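To make the idea more concrete, below is a minimal NumPy sketch of the split-attention operation inside a single cardinal group, based on our reading of the paper. The R splits are summed element-wise and globally average-pooled. Two dense layers then turn the pooled vector into one weight per split and channel, using a softmax across the splits (or a sigmoid when there is only one split), and the output is the attention-weighted sum of the splits. The function name `split_attention`, the random weights and the tensor sizes are hypothetical, and batch normalization is omitted. A full block would also concatenate several cardinal groups and add a residual shortcut.

```python
import numpy as np


def split_attention(splits, w1, b1, w2, b2):
    """Fuse the R splits of one cardinal group with split attention.

    splits: array of shape (R, C, H, W).
    w1, b1: hypothetical weights of the first dense layer (C -> D).
    w2, b2: hypothetical weights of the second dense layer (D -> R*C).
    Returns the fused feature map (C, H, W) and attention weights (R, C).
    """
    R, C, _, _ = splits.shape
    fused = splits.sum(axis=0)               # element-wise sum over splits: (C, H, W)
    s = fused.mean(axis=(1, 2))              # global average pooling: (C,)
    z = np.maximum(w1 @ s + b1, 0.0)         # dense layer + ReLU: (D,)
    logits = (w2 @ z + b2).reshape(R, C)     # one logit per split and channel
    if R > 1:
        # r-Softmax: softmax across the R splits, separately for each channel
        e = np.exp(logits - logits.max(axis=0, keepdims=True))
        attn = e / e.sum(axis=0, keepdims=True)
    else:
        # with a single split, a sigmoid gate is used instead
        attn = 1.0 / (1.0 + np.exp(-logits))
    # attention-weighted sum of the splits (per-channel weights)
    out = (attn[:, :, None, None] * splits).sum(axis=0)
    return out, attn


# Hypothetical example sizes: 2 splits, 8 channels, 4x4 maps, hidden size 4
rng = np.random.default_rng(0)
R, C, H, W, D = 2, 8, 4, 4, 4
splits = rng.standard_normal((R, C, H, W))
w1, b1 = rng.standard_normal((D, C)), np.zeros(D)
w2, b2 = rng.standard_normal((R * C, D)), np.zeros(R * C)

out, attn = split_attention(splits, w1, b1, w2, b2)
print(out.shape)  # (8, 4, 4)
```

Because the softmax runs across the splits rather than across channels, each channel decides how much to take from each split. With a single split, the operation collapses to a per-channel sigmoid gate.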
By stacking several Split-Attention blocks, the researchers created a ResNet-like network known as ResNeSt. The researchers claimed that the ResNeSt architecture requires less computation than existing ResNet variants and can easily be adopted as a backbone for other vision tasks.
How It Works
ResNeSt generalizes channel-wise attention into a feature-map group representation, which can be modularised and accelerated using unified CNN operators. The researchers evaluated the image classification performance of ResNeSt on the ImageNet 2012 dataset, which has 1.28M training images and 50K validation images from 1,000 classes.
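One way to see why feature-map groups map well onto standard CNN operators: if the channels are split into G = K × R groups (K cardinal groups, each with R splits, following our reading of the paper), all per-group transformations can be computed at once with a single grouped convolution. The minimal NumPy sketch below uses a grouped 1×1 convolution for simplicity (the actual block also uses larger kernels) and checks that it matches G separate per-group convolutions. The function name `grouped_conv1x1` and all sizes are hypothetical.

```python
import numpy as np


def grouped_conv1x1(x, weight, groups):
    """Grouped 1x1 convolution over a (C_in, H, W) feature map.

    weight: shape (groups, C_out_per_group, C_in_per_group).
    Channels are split into `groups` contiguous blocks, and each block
    is mixed only with its own weights.
    """
    c_in, h, w = x.shape
    xg = x.reshape(groups, c_in // groups, h, w)       # (G, Cin/G, H, W)
    yg = np.einsum("goi,gihw->gohw", weight, xg)       # per-group channel mixing
    return yg.reshape(-1, h, w)                        # concatenate group outputs


K, R = 2, 2          # hypothetical cardinality and radix
G = K * R            # total number of feature-map groups
c_in, c_out_per_group, H, W = 16, 3, 5, 5
rng = np.random.default_rng(1)
x = rng.standard_normal((c_in, H, W))
weight = rng.standard_normal((G, c_out_per_group, c_in // G))

y = grouped_conv1x1(x, weight, G)

# Same result computed group by group with G separate 1x1 convolutions
blocks = np.split(x, G, axis=0)
y_loop = np.concatenate(
    [np.einsum("oi,ihw->ohw", weight[g], blocks[g]) for g in range(G)], axis=0
)
print(y.shape, np.allclose(y, y_loop))  # (12, 5, 5) True
```

Running all G transformations as one operator, rather than as a Python-level loop over groups, is what lets frameworks accelerate the block with existing optimized kernels.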
ResNeSt is based on the ResNet-D model. By applying a set of training strategies, the researchers improved the accuracy of ResNet-D-50 from 78.31% to 79.15%. Adding the Split-Attention block to form a ResNeSt-50-fast model further boosted accuracy to 80.64%. ResNeSt also delivered a better accuracy-latency trade-off than models found via neural architecture search (NAS).
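As a quick worked example, the accuracy figures quoted above can be split into the gain from the training strategies and the additional gain from the Split-Attention blocks. This is simple arithmetic on the reported top-1 numbers, not a separate ablation.

```python
# Top-1 ImageNet accuracies (%) reported above
resnetd50_baseline = 78.31
resnetd50_tuned = 79.15     # after the training strategies
resnest50_fast = 80.64      # after adding Split-Attention blocks

training_gain = resnetd50_tuned - resnetd50_baseline
split_attention_gain = resnest50_fast - resnetd50_tuned
total_gain = resnest50_fast - resnetd50_baseline

print(f"{training_gain:.2f} {split_attention_gain:.2f} {total_gain:.2f}")  # 0.84 1.49 2.33
```

Of the roughly 2.3-point improvement over the baseline, the larger share comes from the architectural change rather than from training alone.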
The results showed that ResNeSt outperformed all ResNet variants with a similar number of network parameters and FLOPs, including ResNet, ResNeXt, SENet, ResNet-D and SKNet. According to the researchers, ResNeSt-50 achieves 80.64% top-1 accuracy, making it the first 50-layer ResNet variant to surpass 80% on ImageNet.
In addition to ImageNet, the researchers used datasets such as COCO-2017 and Cityscapes in this paper.
Contributions By The Researchers
Below are some of the contributions mentioned by the researchers:
- The researchers stacked several Split-Attention blocks to create a ResNet-like network called ResNeSt.
- The second contribution of this research is a set of large-scale benchmarks on image classification and transfer learning applications.
According to the researchers, ResNeSt outperformed all existing ResNet variants while keeping the same computational efficiency, and it even achieved better speed-accuracy trade-offs than state-of-the-art CNN models produced via neural architecture search. The network also enables feature-map attention across different feature-map groups.
The researchers also claimed that models using a ResNeSt backbone learn improved feature representations, which boost performance across image classification, object detection, instance segmentation and semantic segmentation. As future work, the researchers plan to augment the search spaces for neural architecture search, potentially improving overall performance.