
Top 7 Baselines For Image Recognition

Ram Sagar

Image classification occupies a large share of machine learning research. Its critical role in medical diagnosis, digital photography, self-driving cars and many other applications has driven researchers to build models that give near-perfect predictions of the target object.



Here, we have compiled a list of top-performing methods, according to Papers with Code, on the widely popular datasets used for benchmarking image classification models.

Noisy Student [EfficientNet L2]

Dataset: ImageNet



Accuracy: 88.4

ImageNet consists of more than 14 million images spanning classes such as animals, flowers, everyday objects and people. Given this diversity of data, training a model on ImageNet pushes it towards human-level vision. Noisy Student is a semi-supervised learning method that achieves 88.4% top-1 accuracy on ImageNet (state of the art) along with surprising gains on robustness and adversarial benchmarks. It was introduced by the Google Brain team in collaboration with Carnegie Mellon University.

They first trained an EfficientNet model on labelled ImageNet images and used it as a teacher to generate pseudo labels for 300M unlabeled images. The researchers then trained a larger EfficientNet as a student model on the combination of labelled and pseudo-labelled images, injecting noise such as dropout, stochastic depth and data augmentation during student training, and iterated the process with the student as the new teacher.
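The teacher–student loop above can be sketched in a few lines. This is a toy stand-in only (the helper names are hypothetical, and a threshold function replaces the large EfficientNets the paper actually trains):

```python
# Toy sketch of the Noisy Student self-training loop.

def pseudo_label(teacher, unlabeled):
    """Teacher assigns labels to the unlabeled pool."""
    return [(x, teacher(x)) for x in unlabeled]

def self_training_round(teacher, labeled, unlabeled, train_student):
    """One iteration: label the unlabeled data, then train a student
    on the combined labelled + pseudo-labelled set."""
    combined = labeled + pseudo_label(teacher, unlabeled)
    return train_student(combined)

# Minimal demonstration with a threshold "model" on 1-D inputs.
teacher = lambda x: int(x > 0.5)
labeled = [(0.1, 0), (0.9, 1)]
unlabeled = [0.2, 0.8]

def train_student(data):
    # Stand-in "training": place the decision threshold midway
    # between the two class means of the training data.
    lo = [x for x, y in data if y == 0]
    hi = [x for x, y in data if y == 1]
    thresh = (sum(lo) / len(lo) + sum(hi) / len(hi)) / 2
    return lambda x: int(x > thresh)

student = self_training_round(teacher, labeled, unlabeled, train_student)
```

In the paper this round is repeated, with the trained (and larger, noised) student taking over as the next teacher.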

BiT-L [ResNet]

Dataset: CIFAR-10

Accuracy: 99.3

The CIFAR-10 dataset consists of 60,000 32×32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.

With BiT, the authors revisit the paradigm of pre-training on large supervised datasets and fine-tuning the weights on the target task. Big Transfer (BiT) was created by scaling up pre-training and by combining a few carefully selected components. BiT performs well on a wide range of data regimes — from 10 to 1M labelled examples. This method achieved 99.3% on CIFAR-10.
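The pre-train-then-adapt recipe can be illustrated with a minimal linear probe: a fixed feature extractor plus a freshly fitted head. All names here are hypothetical toys; BiT itself fine-tunes every weight of a large ResNet using its "BiT-HyperRule" schedule rather than freezing the backbone:

```python
# Minimal linear-probe sketch of transfer learning on toy 1-D data.

def features(x):
    # Stand-in for a frozen, pre-trained backbone.
    return [x, x * x]

def train_head(data, lr=0.1, epochs=50):
    # Simple perceptron head fitted on top of the fixed features.
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:  # y in {0, 1}
            f = features(x)
            pred = 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0
            err = y - pred
            w = [wi + lr * err * fi for wi, fi in zip(w, f)]
            b += lr * err
    return lambda x: 1 if sum(wi * fi for wi, fi in zip(w, features(x))) + b > 0 else 0

# Target task: classify whether x lies above 0.5.
head = train_head([(0.0, 0), (0.2, 0), (0.8, 1), (1.0, 1)])
```

The appeal of the recipe is that only the small head (or, in BiT's case, a short fine-tuning run) has to be trained per target task.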

CNN + Homogeneous Filter Capsules

Dataset: MNIST

Accuracy: 99.84

MNIST is a database of handwritten digits with a training set of 60,000 examples and a test set of 10,000 examples.

In this mixed model of convolutional neural networks and capsule networks, the researchers present a convolutional neural network design with additional branches after certain convolutions, so that features can be extracted at multiple levels.

This method, the authors claim, establishes a new state of the art for the MNIST dataset with an accuracy of 99.84%.
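The general branching pattern can be shown with a toy 1-D example: one shared convolution feeds two independent heads whose outputs are merged. This illustrates only the pattern, not the paper's architecture, which routes such branch outputs into homogeneous filter capsules on 2-D images:

```python
# Toy 1-D sketch of branching after a shared convolution.

def conv1d(x, k):
    # Valid-mode 1-D convolution (no padding).
    n = len(k)
    return [sum(x[i + j] * k[j] for j in range(n)) for i in range(len(x) - n + 1)]

def branched_features(signal):
    shared = conv1d(signal, [1, -1])        # shared early features (edges)
    branch_a = [max(v, 0) for v in shared]  # branch 1: ReLU head
    branch_b = [abs(v) for v in shared]     # branch 2: magnitude head
    return branch_a + branch_b              # merged feature vector
```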

AutoAugment

Dataset: SVHN 

% Error: 1.02

SVHN is obtained from house numbers in Google Street View images. SVHN was introduced to develop machine learning and object recognition algorithms with a minimal requirement on data preprocessing and formatting. 

In this paper, a simple procedure called AutoAugment is described that automatically searches for improved data augmentation policies.

The authors use a search algorithm to find the best policy such that the neural network yields the highest validation accuracy on a target dataset. This method achieves state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, and ImageNet (without additional data).
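The search loop can be sketched as follows. Note the heavy simplification: the paper searches with an RNN controller trained by reinforcement learning, and `score` below is a hypothetical stand-in for the child model's validation accuracy:

```python
# Toy sketch of searching over candidate augmentation policies.

OPS = {
    "identity": lambda x: x,
    "flip":     lambda x: x[::-1],
    "shift":    lambda x: x[1:] + x[:1],
}

def apply_policy(policy, x):
    # A policy is a sequence of named operations applied in order.
    for name in policy:
        x = OPS[name](x)
    return x

def search(policies, score):
    # Keep the candidate policy with the highest validation score.
    return max(policies, key=score)

# Hypothetical score: reward policies that actually change the input.
score = lambda policy: 0 if apply_policy(policy, [1, 2, 3]) == [1, 2, 3] else 1
best = search([("identity",), ("flip", "flip"), ("flip",)], score)
```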

EnAET

Dataset: STL-10


Accuracy: 95.48

The STL-10 dataset is an image recognition dataset, where each class has fewer labelled training examples, but a very large set of unlabeled examples is provided to learn image models prior to supervised training. 

Ensemble of Auto-Encoding Transformations (EnAET) is trained to learn from both labelled and unlabeled data, based on embedded representations obtained by decoding both spatial and non-spatial transformations. This sets EnAET apart from traditional semi-supervised methods, which focus on improving prediction consistency and confidence; EnAET instead explores the role of self-supervised representations in semi-supervised learning under a rich family of transformations. Experiments on CIFAR-10, CIFAR-100, SVHN and STL-10 demonstrate that EnAET outperforms state-of-the-art semi-supervised methods by significant margins.
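The transformation-consistency idea that such methods build on can be written as a tiny loss. This is a hedged toy (hypothetical names, list "images"); the full EnAET additionally trains an ensemble of auto-encoders to recover the applied transformation parameters:

```python
# Toy sketch: predictions on transformed inputs should agree with the
# prediction on the original input.

def consistency_loss(model, x, transforms):
    base = model(x)
    return sum((model(t(x)) - base) ** 2 for t in transforms) / len(transforms)

# Two toy spatial transformations on a list "image".
reverse = lambda x: x[::-1]
rotate = lambda x: x[1:] + x[:1]
```

A model that is invariant to the transformations incurs zero loss; one that is not gets penalised, which is the training signal applied to unlabeled data.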

DivideMix

Dataset: Clothing 1M

Accuracy: 74.76

DivideMix leverages semi-supervised learning techniques for learning with noisy labels. In particular, DivideMix dynamically divides the training data into a labelled set with clean samples and an unlabeled set with noisy samples and trains the model on both the labelled and unlabeled data in a semi-supervised manner. 
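The co-divide step can be sketched as below. The names are hypothetical, and a fixed threshold stands in for the paper's actual mechanism, which fits a two-component Gaussian mixture to the per-sample losses each epoch:

```python
# Toy sketch of DivideMix's division of training data by loss: low-loss
# examples are kept as labelled (probably clean), high-loss examples
# are treated as unlabeled (probably noisy).

def divide(samples, loss_fn, threshold):
    clean, noisy = [], []
    for x, y in samples:
        (clean if loss_fn(x, y) < threshold else noisy).append((x, y))
    return clean, noisy

# Toy loss: distance between a scalar "prediction" x and its label y.
loss = lambda x, y: abs(x - y)
clean, noisy = divide([(0.1, 0), (0.9, 1), (0.9, 0)], loss, 0.5)
```

The semi-supervised learner then trains on `clean` with labels and on `noisy` without them, discarding the suspect labels rather than the images.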

PreAct-ResNet18 + FMix

Dataset: Fashion MNIST

% Error: 1.02

Fashion-MNIST consists of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28×28 grayscale image, associated with a label from 10 classes. Fashion-MNIST serves as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. 

FMix, the authors claim, improves performance for a number of state-of-the-art models across a range of datasets and problem settings. They analyse MixUp, CutMix and FMix from an information-theoretic perspective, characterising learned models in terms of how they progressively compress the input with depth.
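For reference, MixUp, the simplest of the three augmentations analysed, is a convex combination of two inputs and of their one-hot labels. A minimal sketch (FMix itself instead mixes using a binary mask obtained by thresholding low-frequency noise sampled in Fourier space):

```python
# Minimal MixUp sketch on flat feature vectors.

def mixup(x1, y1, x2, y2, lam):
    # lam in [0, 1]; the paper samples it from a Beta distribution.
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y
```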

via Papers with Code

Of all the benchmarks, ImageNet and CIFAR remain the most popular with practitioners. The plot above illustrates how models have improved on ImageNet over the years.


Copyright Analytics India Magazine Pvt Ltd
