Amazon’s ResNeSt Surpassed Popular CVPR Award Winner ResNet

Recently, researchers from Amazon and the University of California, Davis introduced ResNeSt, a new variant of ResNet. The researchers claim that the network preserves the overall ResNet structure, so it can be used directly in downstream tasks without introducing additional computational cost.

Improving existing models and networks is one of the most common ways to advance a technology. Over the years, deep convolutional networks have achieved significant breakthroughs in image classification. Previously, in one of our articles, we discussed how ResNets constitute a substantial breakthrough in image processing.

ResNet was introduced by Kaiming He and his co-authors in 2015 in a paper titled Deep Residual Learning for Image Recognition. It is a residual learning framework built to ease the training of networks that are substantially deeper than those used previously. ResNet has become one of the most successful CNN architectures and has been adopted in a wide range of computer vision applications.
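
To make the idea of residual learning concrete, here is a minimal sketch in plain Python. It assumes the standard formulation from the ResNet paper, in which a block outputs F(x) + x. The names residual_block, zero_branch and toy_branch are hypothetical and exist only for this illustration; a real block would use convolutions rather than these toy functions.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, residual_fn: Callable[[Vector], Vector]) -> Vector:
    """Return F(x) + x: the branch models the residual F, the shortcut carries x."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("the residual branch must preserve the input shape")
    return [f + xi for f, xi in zip(fx, x)]


def zero_branch(v: Vector) -> Vector:
    """Hypothetical branch that has learned nothing: F(x) = 0."""
    return [0.0 for _ in v]


def toy_branch(v: Vector) -> Vector:
    """Hypothetical branch standing in for learned layers: F(x) = 0.5 * x."""
    return [0.5 * a for a in v]


if __name__ == "__main__":
    # With a zero residual the block reduces to the identity mapping.
    print(residual_block([1.0, 2.0, 3.0], zero_branch))
    # With a non-zero residual the output is the input plus a correction.
    print(residual_block([2.0, 4.0], toy_branch))
```

Because the shortcut passes the input through unchanged, a block whose branch outputs zeros behaves like the identity, which is one intuition for why very deep stacks of such blocks remain trainable.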

The researchers created a ResNet-like network by stacking several Split-Attention blocks and named it ResNeSt, where the S stands for "split".

Behind ResNeSt

According to the researchers, the first contribution of this research is a simple architectural modification of ResNet that incorporates feature-map split attention within the individual network blocks. The resulting unit, called a Split-Attention block, is a computational unit consisting of a feature-map group and split-attention operations.

By stacking several Split-Attention blocks, the researchers create a ResNet-like network known as ResNeSt. The researchers claim that the ResNeSt architecture requires no more computation than existing ResNet variants and is easy to adopt as a backbone for other vision tasks.
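
The following plain-Python sketch shows one way to read the split-attention idea for a single feature-map group. It is a simplified illustration under stated assumptions, not the researchers' implementation: the splits are fused by summation, pooled globally, turned into per-channel weights with a softmax across the splits, and then recombined. The function name split_attention and the scales argument are hypothetical; scales stands in for the learned dense layers that a real block would use to produce the attention logits.

```python
import math
from typing import List

FeatureMap = List[List[float]]  # indexed as [channel][spatial position]


def split_attention(splits: List[FeatureMap], scales: List[float]) -> FeatureMap:
    """Combine the splits of one feature-map group with per-channel attention."""
    radix = len(splits)
    channels = len(splits[0])
    positions = len(splits[0][0])

    # 1. Fuse the splits with an element-wise sum.
    fused = [
        [sum(splits[r][c][n] for r in range(radix)) for n in range(positions)]
        for c in range(channels)
    ]

    # 2. Global average pooling: one summary value per channel.
    pooled = [sum(row) / positions for row in fused]

    # 3. Attention logits per split and channel.
    #    ASSUMPTION: a single scalar per split replaces the learned dense layers.
    logits = [[scales[r] * pooled[c] for c in range(channels)] for r in range(radix)]

    # 4. Softmax across the splits, computed separately for every channel.
    weights = [[0.0] * channels for _ in range(radix)]
    for c in range(channels):
        peak = max(logits[r][c] for r in range(radix))
        exps = [math.exp(logits[r][c] - peak) for r in range(radix)]
        total = sum(exps)
        for r in range(radix):
            weights[r][c] = exps[r] / total

    # 5. Weighted sum of the original splits.
    return [
        [sum(weights[r][c] * splits[r][c][n] for r in range(radix))
         for n in range(positions)]
        for c in range(channels)
    ]


if __name__ == "__main__":
    # Two splits, one channel, two spatial positions.
    toy_splits = [[[1.0, 3.0]], [[3.0, 5.0]]]
    # Equal scales give equal attention, so the result is the average of the splits.
    print(split_attention(toy_splits, scales=[0.0, 0.0]))
```

Since the attention weights for each channel sum to one, every output value is a convex combination of the corresponding values in the splits, so the output has the same shape as a single split.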

How It Works

ResNeSt generalizes the channel-wise attention into feature-map group representation, which can be modularised and accelerated using unified CNN operators. The researchers studied the image classification performance of ResNeSt on the ImageNet 2012 dataset with 1.28M training images and 50K validation images from 1000 different classes. 

ResNeSt is based on the ResNet-D model. The researchers applied several training strategies that improved the accuracy of ResNetD-50 from 78.31% to 79.15%. Employing the Split-Attention block to form a ResNeSt-50-fast model boosted accuracy further to 80.64%. The new technique also delivers a better trade-off between accuracy and latency than models found via neural architecture search (NAS).
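
As a quick check on how the reported gain splits between the training changes and the new block, the short snippet below works through the arithmetic using only the three accuracy figures quoted above. The function and variable names are our own and are purely illustrative.

```python
def accuracy_gains() -> dict:
    """Break the reported top-1 improvement into its two stages (percentage points)."""
    resnetd_50_baseline = 78.31   # ResNetD-50 before the training changes
    resnetd_50_trained = 79.15    # ResNetD-50 after the training changes
    resnest_50_fast = 80.64       # ResNeSt-50-fast with the Split-Attention block

    return {
        "training": round(resnetd_50_trained - resnetd_50_baseline, 2),
        "split_attention": round(resnest_50_fast - resnetd_50_trained, 2),
        "total": round(resnest_50_fast - resnetd_50_baseline, 2),
    }


if __name__ == "__main__":
    for stage, gain in accuracy_gains().items():
        print(f"{stage}: +{gain:.2f} percentage points")
```

On these figures, the Split-Attention block accounts for 1.49 of the 2.33 percentage points gained, with the remaining 0.84 coming from the training changes.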

The results show that ResNeSt outperformed all ResNet variants with a similar number of network parameters and FLOPs, including ResNet, ResNeXt, SENet, ResNet-D and SKNet. According to the researchers, ResNeSt-50 achieves 80.64% top-1 accuracy, making it the first 50-layer ResNet variant to surpass 80% on ImageNet.

Datasets Used

In this paper, the researchers used datasets including ImageNet, COCO-2017 and Cityscapes.

Contributions By The Researchers

Below are the contributions mentioned by the researchers:

  • The first contribution is the Split-Attention block, several of which the researchers stacked to create a ResNet-like network called ResNeSt.
  • The second contribution is a set of large-scale benchmarks on image classification and transfer learning applications.

Wrapping Up

According to the researchers, ResNeSt outperformed all existing ResNet variants at the same computational efficiency, and it achieves better speed-accuracy trade-offs than state-of-the-art CNN models produced via neural architecture search. The network also enables feature-map attention across different feature-map groups.

The researchers also claimed that models using a ResNeSt backbone learn improved feature representations, which boosts performance across image classification, object detection, instance segmentation and semantic segmentation. For future work, the researchers plan to augment the search spaces for neural architecture search with this approach, which could potentially improve overall performance.
Read the paper here.


Ambika Choudhury

A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.