Top 7 Baselines For Image Recognition

Image classification tasks make up a large share of machine learning experiments. Their critical role in medical diagnosis, digital photography, self-driving cars and many other applications has driven researchers to build models that give near-perfect predictions of the target object.

Here, we have compiled a list of top-performing methods, according to Papers With Code, on the widely popular datasets used for benchmarking image classification models.

Noisy Student [EfficientNet L2]

Dataset: ImageNet


Accuracy: 88.4

ImageNet consists of more than 14 million images spanning classes such as animals, flowers, everyday objects, people and many more. Given the diversity of the data, training a model on ImageNet pushes it towards human-level vision. Noisy Student is a semi-supervised learning method that achieves 88.4% top-1 accuracy on ImageNet (state of the art) along with surprising gains on robustness and adversarial benchmarks. It was introduced by the Google Brain team in collaboration with Carnegie Mellon University.


They first trained an EfficientNet model on labelled ImageNet images and used it as a teacher to generate pseudo labels on 300M unlabeled images. The researchers then trained a larger EfficientNet as a student model on the combination of labelled and pseudo labelled images.
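The teacher–student loop above can be sketched in a few lines. This is a toy illustration only: a nearest-centroid classifier stands in for EfficientNet, and the noise injection the paper relies on (dropout, RandAugment, stochastic depth) is omitted.

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    # "Training" here is just computing one centroid per class.
    return np.array([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict(centroids, X):
    # Assign each point to its nearest class centroid.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

rng = np.random.default_rng(0)
n_classes = 2
# A small labelled set and a larger unlabeled set, mirroring the paper's setup.
X_lab = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(5, 1, (10, 2))])
y_lab = np.array([0] * 10 + [1] * 10)
X_unlab = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

# 1. Train the teacher on labelled data.
teacher = fit_centroids(X_lab, y_lab, n_classes)
# 2. The teacher generates pseudo labels for the unlabeled data.
pseudo = predict(teacher, X_unlab)
# 3. Train a (larger, in the paper) student on labelled + pseudo-labelled data.
X_all = np.vstack([X_lab, X_unlab])
y_all = np.concatenate([y_lab, pseudo])
student = fit_centroids(X_all, y_all, n_classes)
```

In the paper this loop is iterated, with the student becoming the teacher for the next round.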


Big Transfer (BiT)

Dataset: CIFAR-10

Accuracy: 99.3

The CIFAR-10 dataset consists of 60,000 32×32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.

With BiT, the authors revisit the paradigm of pre-training on large supervised datasets and fine-tuning the weights on the target task. Big Transfer (BiT) was created by scaling up pre-training and by combining a few carefully selected components. BiT performs well on a wide range of data regimes — from 10 to 1M labelled examples. This method achieved 99.3% on CIFAR-10.
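The transfer recipe can be sketched as: keep the pretrained backbone, replace the classification head for the target task, then fit the head. Everything here is an illustrative stand-in; BiT itself fine-tunes large ResNet backbones with SGD, not the frozen random features and least-squares fit used below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a "pretrained backbone": a fixed feature projection
# (BiT's backbone would be a ResNet pretrained on a large supervised dataset).
W_backbone = rng.normal(size=(32, 8))

def features(X):
    return np.maximum(X @ W_backbone, 0.0)  # frozen ReLU features

# A fresh head for the 10-class target task (e.g. CIFAR-10).
X_target = rng.normal(size=(200, 32))
y_target = rng.integers(0, 10, size=200)
Y_onehot = np.eye(10)[y_target]

F = features(X_target)
# Fit the new head by least squares (a stand-in for SGD fine-tuning).
W_head, *_ = np.linalg.lstsq(F, Y_onehot, rcond=None)
preds = (F @ W_head).argmax(axis=1)
```

The key design choice BiT argues for is that almost all capacity lives in the pretrained weights; only minimal, carefully chosen hyperparameters change per downstream task.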

CNN + Homogeneous Filter Capsules

Dataset: MNIST

Accuracy: 99.84

The MNIST is a database of handwritten digits with a training set of 60,000 examples and a test set of 10,000 examples. 

In this mixed model of convolutional neural networks and capsule networks, the researchers present a convolutional network design with additional branches after certain convolutions, from which features are extracted.

This method, claim the authors, establishes a new state of the art for the MNIST dataset with an accuracy of 99.84%.


AutoAugment

Dataset: SVHN

% Error: 1.02

SVHN is obtained from house numbers in Google Street View images. SVHN was introduced to develop machine learning and object recognition algorithms with a minimal requirement on data preprocessing and formatting. 

In the AutoAugment paper, the authors define a simple procedure to automatically search for improved data augmentation policies.

The authors use a search algorithm to find the best policy such that the neural network yields the highest validation accuracy on a target dataset. This method achieves state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, and ImageNet (without additional data).
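A toy version of that search looks like the following. The real method trains a controller with reinforcement learning over sub-policies of image operations; here both the candidate operations and the scoring function are hypothetical stand-ins for "train a child model and measure validation accuracy".

```python
import random

OPS = ["rotate", "shear", "translate", "invert", "autocontrast"]

def sample_policy(rng):
    # A policy = (operation, application probability, magnitude) triples,
    # echoing the paper's sub-policy structure.
    return [(rng.choice(OPS), round(rng.random(), 2), rng.randint(0, 9))
            for _ in range(2)]

def validation_score(policy):
    # Stand-in for training a child model with this policy and reading off
    # its validation accuracy -- here just a deterministic toy score.
    return sum(m for _, _, m in policy) + len({op for op, _, _ in policy})

rng = random.Random(0)
candidates = [sample_policy(rng) for _ in range(20)]
# Keep the policy whose "validation accuracy" is highest.
best = max(candidates, key=validation_score)
```

The winning policy is then applied during training of the final model, which is where the reported SVHN and ImageNet gains come from.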


EnAET

Dataset: STL-10

Accuracy: 95.48

The STL-10 dataset is an image recognition dataset, where each class has fewer labelled training examples, but a very large set of unlabeled examples is provided to learn image models prior to supervised training. 

Ensemble of Auto-Encoding Transformations (EnAET) is trained to learn from both labelled and unlabeled data, based on embedded representations, by decoding both spatial and non-spatial transformations. This makes EnAET different from traditional semi-supervised methods, which focus on improving prediction consistency and confidence. EnAET explores the role of self-supervised representations in semi-supervised learning under a rich family of transformations. Experiments on CIFAR-10, CIFAR-100, SVHN and STL-10 demonstrate that EnAET outperforms state-of-the-art semi-supervised methods by significant margins.
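The core self-supervised signal, auto-encoding a transformation, can be illustrated minimally: apply a transformation with known parameters to an input, then recover those parameters from the (original, transformed) pair. A 1-D shift stands in for EnAET's spatial and non-spatial transformation families.

```python
import numpy as np

rng = np.random.default_rng(0)

def transform(x, theta):
    # Toy transformation family: shift by theta
    # (EnAET uses projective, affine, and colour transformations, etc.).
    return x + theta

def decode_theta(x, x_t):
    # A perfect "decoder" for this toy family: recover theta from the pair.
    # In EnAET, a network predicts theta from representations of x and t(x).
    return float(np.mean(x_t - x))

x = rng.normal(size=16)
theta = 2.5
x_t = transform(x, theta)
theta_hat = decode_theta(x, x_t)
# Self-supervised loss: how well the transformation was reconstructed.
loss = (theta_hat - theta) ** 2
```

Minimising this reconstruction loss forces the representation to encode how inputs change under transformation, which is then combined with a standard semi-supervised consistency objective.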


DivideMix

Dataset: Clothing1M

Accuracy: 74.76

Clothing1M contains about one million clothing images whose labels, generated automatically from surrounding text on shopping websites, are noisy. DivideMix leverages semi-supervised learning techniques for learning with such noisy labels. In particular, DivideMix dynamically divides the training data into a labelled set of clean samples and an unlabeled set of noisy samples, and trains the model on both in a semi-supervised manner.
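The dividing step exploits the fact that networks fit clean labels before noisy ones, so clean samples end up with smaller training losses. In this sketch a simple two-means fit on per-sample losses stands in for the two-component Gaussian Mixture Model the paper uses; the losses themselves are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy per-sample losses: most samples low-loss (clean labels), some high (noisy).
losses = np.concatenate([rng.normal(0.2, 0.05, 80), rng.normal(2.0, 0.3, 20)])

def two_means_split(x, iters=10):
    # Crude 1-D two-cluster fit (stand-in for the paper's GMM posterior).
    lo, hi = x.min(), x.max()
    for _ in range(iters):
        assign = np.abs(x - lo) > np.abs(x - hi)  # True -> high-loss cluster
        lo, hi = x[~assign].mean(), x[assign].mean()
    return assign  # True where the sample looks noisy

noisy = two_means_split(losses)
clean_set = losses[~noisy]     # kept with their labels
unlabeled_set = losses[noisy]  # labels discarded, used semi-supervised
```

DivideMix then trains two networks, each on the division produced by the other, so that a network never filters its own errors.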

PreAct-ResNet18 + FMix

Dataset: Fashion MNIST

% Error: 1.02

Fashion-MNIST consists of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28×28 grayscale image, associated with a label from 10 classes. Fashion-MNIST serves as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. 

FMix, claim the authors, improves performance for a number of state-of-the-art models across a range of datasets and problem settings. They analyse MixUp, CutMix and FMix from an information-theoretic perspective, characterising learned models in terms of how they progressively compress the input with depth.
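Of the mixed-sample augmentations compared in the paper, MixUp is the simplest to sketch: blend two images and their labels linearly. FMix differs in using a binary mask obtained by thresholding low-frequency noise rather than a global linear blend; only the MixUp case is shown here.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(x1, y1, x2, y2, alpha=1.0):
    # Sample the mixing coefficient from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1 - lam) * x2      # blended image
    y = lam * y1 + (1 - lam) * y2      # soft label
    return x, y, lam

# Two toy 28x28 "images" (Fashion-MNIST sized) with one-hot labels.
img1, img2 = np.zeros((28, 28)), np.ones((28, 28))
lab1, lab2 = np.eye(10)[3], np.eye(10)[7]
x_mix, y_mix, lam = mixup(img1, lab1, img2, lab2)
```

Training on such mixed pairs regularises the model; the paper's information-theoretic analysis asks what each variant does to how the network compresses its input.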

Plot via Papers With Code

Of all the models, the ones trained on ImageNet and CIFAR happen to be popular with practitioners. The plot above illustrates how the models have improved on ImageNet over the years.


Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.


