Top 7 Baselines For Image Recognition

Image classification tasks account for a large share of machine learning experiments. Their critical role in medical diagnosis, digital photography, self-driving cars and many other applications has driven researchers to build models that give near-perfect predictions of the target object.

Here, we have compiled a list of top-performing methods, according to Papers with Code, on the widely used datasets for benchmarking image classification models.

Noisy Student [EfficientNet L2]

Dataset: ImageNet


Accuracy: 88.4

ImageNet consists of more than 14 million images spanning classes such as animals, flowers, everyday objects, people and many more. Training a model on ImageNet, given the diversity of the data, pushes it towards human-level vision. NoisyStudent is a semi-supervised learning method that achieves 88.4% top-1 accuracy on ImageNet (SOTA) and surprising gains on robustness and adversarial benchmarks. It was introduced by the Google Brain team in collaboration with Carnegie Mellon University.

They first trained an EfficientNet model on labelled ImageNet images and used it as a teacher to generate pseudo labels on 300M unlabeled images. The researchers then trained a larger EfficientNet as a student model on the combination of labelled and pseudo labelled images.
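As a toy sketch of this teacher-student loop, the snippet below substitutes a nearest-centroid classifier for EfficientNet and small Gaussian blobs for the image data; everything here is illustrative, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in data: two Gaussian blobs play the role of labelled
# ImageNet images; a larger unlabeled pool plays the 300M web images.
X_lab = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y_lab = np.array([0] * 50 + [1] * 50)
X_unlab = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])

def fit_centroids(X, y):
    """A nearest-centroid 'model' stands in for EfficientNet."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(centroids, X):
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

# 1. Train the teacher on labelled data only.
teacher = fit_centroids(X_lab, y_lab)

# 2. The teacher generates pseudo labels for the unlabeled pool.
pseudo = predict(teacher, X_unlab)

# 3. Train the student on labelled + pseudo-labelled data (the paper
#    also enlarges the student and injects noise, omitted here).
student = fit_centroids(np.vstack([X_lab, X_unlab]),
                        np.concatenate([y_lab, pseudo]))

acc = (predict(student, X_lab) == y_lab).mean()
print(f"student accuracy on the labelled set: {acc:.2f}")
```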


Big Transfer (BiT)

Dataset: CIFAR-10

Accuracy: 99.3

The CIFAR-10 dataset consists of 60,000 32×32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.

With BiT, the authors revisit the paradigm of pre-training on large supervised datasets and fine-tuning the weights on the target task. Big Transfer (BiT) was created by scaling up pre-training and by combining a few carefully selected components. BiT performs well on a wide range of data regimes — from 10 to 1M labelled examples. This method achieved 99.3% on CIFAR-10.
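A minimal sketch of this pre-train-then-fine-tune recipe, using logistic regression on synthetic data in place of BiT's ResNet backbone (all names and data here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# 'Upstream' data stands in for a large supervised pre-training dataset.
true_w = np.array([1.0, -1.0, 0.5, 0.0])
X_up = rng.normal(size=(500, 4))
y_up = (X_up @ true_w > 0).astype(int)

def train_linear(X, y, w=None, epochs=200, lr=0.1):
    """Logistic regression by gradient descent; pass w to fine-tune."""
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

# Pre-train on the big dataset, then fine-tune on only 10 target examples.
w_pre = train_linear(X_up, y_up)
X_tgt = rng.normal(size=(10, 4))
y_tgt = (X_tgt @ true_w > 0).astype(int)
w_ft = train_linear(X_tgt, y_tgt, w=w_pre.copy(), epochs=20)

X_test = rng.normal(size=(200, 4))
y_test = (X_test @ true_w > 0).astype(int)
acc = ((X_test @ w_ft > 0).astype(int) == y_test).mean()
print(f"fine-tuned accuracy: {acc:.2f}")
```

The point of the sketch is the data regime: the fine-tuning set is tiny, but the pre-trained weights carry most of the signal.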

CNN + Homogeneous Filter Capsules

Dataset: MNIST

Accuracy: 99.84

MNIST is a database of handwritten digits with a training set of 60,000 examples and a test set of 10,000 examples.

In this hybrid of convolutional neural networks and capsule networks, the researchers present a convolutional neural network design with additional branches after certain convolutions, from which features are extracted.

This method, the authors claim, establishes a new state of the art for the MNIST dataset with an accuracy of 99.84%.


AutoAugment

Dataset: SVHN

% Error: 1.02

SVHN is obtained from house numbers in Google Street View images. SVHN was introduced to develop machine learning and object recognition algorithms with a minimal requirement on data preprocessing and formatting. 

In this paper, a simple procedure called AutoAugment is defined to automatically search for improved data augmentation policies. 

The authors use a search algorithm to find the best policy such that the neural network yields the highest validation accuracy on a target dataset. This method achieves state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, and ImageNet (without additional data).
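The policy structure can be sketched as follows. This toy version runs random search over a few numpy image operations and scores policies with a dummy metric, where the paper searches real augmentation policies with a controller trained on validation accuracy:

```python
import numpy as np

rng = np.random.default_rng(2)

# Each sub-policy entry is (operation, probability, magnitude),
# mirroring the (op, prob, magnitude) triples in AutoAugment.
OPS = {
    "translate_x": lambda img, m: np.roll(img, m, axis=1),
    "translate_y": lambda img, m: np.roll(img, m, axis=0),
    "invert":      lambda img, m: 255 - img,
}

def apply_policy(img, policy):
    """Apply each (op, prob, magnitude) triple stochastically."""
    for op, prob, mag in policy:
        if rng.random() < prob:
            img = OPS[op](img, mag)
    return img

def sample_policy():
    """Random search stands in for the RNN controller in the paper."""
    return [(rng.choice(list(OPS)), rng.uniform(0, 1), int(rng.integers(1, 4)))
            for _ in range(2)]

img = rng.integers(0, 256, size=(8, 8))

# Toy 'search': keep the policy scoring best under a dummy validation
# metric (image variance here, purely for illustration).
best = max((sample_policy() for _ in range(20)),
           key=lambda p: apply_policy(img.copy(), p).var())
aug = apply_policy(img.copy(), best)
print("augmented image shape:", aug.shape)
```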


EnAET

Dataset: STL-10

Accuracy: 95.48

The STL-10 dataset is an image recognition dataset, where each class has fewer labelled training examples, but a very large set of unlabeled examples is provided to learn image models prior to supervised training. 

Ensemble of Auto-Encoding Transformations (EnAET) is trained to learn from both labelled and unlabeled data based on the embedded representations by decoding both spatial and non-spatial transformations. This makes EnAET different from traditional semi-supervised methods that focus on improving prediction consistency and confidence. EnAET explores the role of self-supervised representations in semi-supervised learning under a rich family of transformations. Experiment results on CIFAR-10, CIFAR-100, SVHN and STL10 demonstrate that the proposed EnAET outperforms the state-of-the-art semi-supervised methods by significant margins. 
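The transformation-decoding idea can be illustrated with a toy pretext task: apply a random 90-degree rotation to an image and train a simple decoder to recover which rotation was applied. This is only a loose sketch of the self-supervised ingredient, not EnAET itself, and all data here is synthetic:

```python
import numpy as np

rng = np.random.default_rng(5)

# A bright corner makes the applied rotation recoverable from the image.
base = np.zeros((6, 6))
base[:2, :2] = 1.0

def make_pair():
    """Apply a known spatial transformation; keep its id as the target."""
    k = int(rng.integers(0, 4))           # transformation id: rotate k*90 deg
    img = base + rng.normal(0, 0.1, (6, 6))
    return np.rot90(img, k).ravel(), k

X, y = map(np.asarray, zip(*(make_pair() for _ in range(200))))

# A nearest-centroid classifier stands in for the transformation decoder.
centroids = np.stack([X[y == k].mean(axis=0) for k in range(4)])
pred = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
acc = (pred == y).mean()
print(f"transformation-decoding accuracy: {acc:.2f}")
```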


DivideMix

Dataset: Clothing1M

Accuracy: 74.76

Clothing1M contains around one million clothing images in 14 classes, with noisy labels generated from the text surrounding each image.

DivideMix leverages semi-supervised learning techniques for learning with noisy labels. In particular, DivideMix dynamically divides the training data into a labelled set of (probably) clean samples and an unlabeled set of noisy samples, and trains the model on both in a semi-supervised manner.
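The clean/noisy split can be sketched as follows: DivideMix fits a two-component Gaussian mixture to per-sample losses and treats the low-loss mode as clean. The snippet below uses synthetic losses and a tiny EM loop purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Per-sample losses: clean samples cluster low, noisy-label samples high.
losses = np.concatenate([rng.normal(0.2, 0.05, 300),   # clean
                         rng.normal(1.0, 0.2, 100)])   # noisy

def gmm_1d(x, iters=50):
    """Tiny EM for a 2-component 1D Gaussian mixture, as DivideMix uses."""
    mu = np.array([x.min(), x.max()])
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / sigma
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and spreads
        n = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / n
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n) + 1e-9
        pi = n / len(x)
    return r[:, np.argmin(mu)]   # probability of the low-loss (clean) mode

p_clean = gmm_1d(losses)
clean_mask = p_clean > 0.5       # labelled set of (probably) clean samples
print("samples kept as clean:", clean_mask.sum())
```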

PreAct-ResNet18 + FMix

Dataset: Fashion MNIST

% Error: 1.02

Fashion-MNIST consists of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28×28 grayscale image, associated with a label from 10 classes. Fashion-MNIST serves as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. 

FMix, claim the authors, improves performance for a number of state-of-the-art models across a range of data sets and problem settings. They have analysed MixUp, CutMix, and FMix from an information-theoretic perspective, characterising learned models in terms of how they progressively compress the input with depth. 
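For reference, the snippet below sketches plain MixUp alongside a generic mask-based mix; FMix itself samples its masks from low-frequency Fourier noise, which is omitted here, so this is only an illustration of the family:

```python
import numpy as np

rng = np.random.default_rng(4)

def mixup(x1, y1, x2, y2, alpha=1.0):
    """Plain MixUp: convex combination of inputs and one-hot labels."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

def mask_mix(x1, y1, x2, y2, mask):
    """Mask-based mixing (CutMix/FMix family): a binary mask picks pixels
    from x1; labels mix in proportion to the mask area."""
    lam = mask.mean()
    return mask * x1 + (1 - mask) * x2, lam * y1 + (1 - lam) * y2

img1 = np.ones((4, 4))
img2 = np.zeros((4, 4))
y1 = np.array([1.0, 0.0])
y2 = np.array([0.0, 1.0])

xm, ym = mixup(img1, y1, img2, y2)
mask = np.zeros((4, 4))
mask[:, :2] = 1                      # left half comes from img1
xc, yc = mask_mix(img1, y1, img2, y2, mask)
print("masked-mix label:", yc)       # half the area from each image
```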

via paperswithcode

Of all the models, those trained on ImageNet and CIFAR are the most popular with practitioners. The plot above illustrates how models have improved on ImageNet over the years.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
