Image classification is one of the most common tasks in machine learning experiments. Its critical use in medical diagnosis, digital photography, self-driving cars and many other applications has led researchers to develop models that aim for near-perfect prediction of the target object.
Here, we have compiled a list of the top-performing methods, according to Papers with Code, on the widely used datasets for benchmarking image classification models.
Noisy Student [EfficientNet L2]
ImageNet consists of more than 14 million images spanning classes such as animals, flowers, everyday objects, people and many more. Given the diversity of its data, ImageNet is a demanding test of whether a model can approach human-level visual recognition. Noisy Student is a semi-supervised learning method that achieves 88.4% top-1 accuracy on ImageNet (state of the art at the time of publication), along with surprising gains on robustness and adversarial benchmarks. It was introduced by the Google Brain team in collaboration with Carnegie Mellon University.
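The researchers first trained an EfficientNet model on labelled ImageNet images and used it as a teacher to generate pseudo labels on 300M unlabeled images. They then trained a larger EfficientNet as a student model on the combination of labelled and pseudo-labelled images, injecting noise into the student through dropout, stochastic depth and RandAugment data augmentation. The process is iterated by putting the student back as the new teacher.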
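The teacher/student loop can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' code: a nearest-centroid classifier on 1-D points stands in for EfficientNet, Gaussian input noise stands in for dropout, stochastic depth and RandAugment, the student is the same size as the teacher, hard pseudo labels are used for simplicity, and the names `fit_centroids`, `predict` and `noisy_student` are hypothetical.

```python
import random
from statistics import mean
from typing import Dict, List


def fit_centroids(xs: List[float], ys: List[int]) -> Dict[int, float]:
    """Toy 'model' (assumption): one centroid, the mean input, per class label."""
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in sorted(set(ys))}


def predict(centroids: Dict[int, float], x: float) -> int:
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


def noisy_student(labelled_x: List[float], labelled_y: List[int],
                  unlabelled_x: List[float], rounds: int = 3,
                  noise_std: float = 0.5, seed: int = 0) -> Dict[int, float]:
    rng = random.Random(seed)
    # 1. Train the teacher on labelled data only.
    teacher = fit_centroids(labelled_x, labelled_y)
    for _ in range(rounds):
        # 2. The teacher, without noise, assigns hard pseudo labels.
        pseudo_y = [predict(teacher, x) for x in unlabelled_x]
        # 3. Train the student on labelled + pseudo-labelled data with input
        #    noise (assumed stand-in for dropout, stochastic depth, RandAugment).
        xs = labelled_x + unlabelled_x
        ys = labelled_y + pseudo_y
        student = fit_centroids([x + rng.gauss(0.0, noise_std) for x in xs], ys)
        # 4. Iterate: the student becomes the next teacher.
        teacher = student
    return teacher


# Usage: two well-separated clusters, only one labelled point per class.
data_rng = random.Random(1)
unlabelled = ([data_rng.gauss(-2.0, 0.7) for _ in range(50)]
              + [data_rng.gauss(2.0, 0.7) for _ in range(50)])
model = noisy_student([-2.0, 2.0], [0, 1], unlabelled)
print(predict(model, -1.5), predict(model, 1.5))  # 0 1
```

The asymmetry is the point of the design: the teacher labels the unlabeled data without noise, while the student has to fit those labels under noise.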
Big Transfer (BiT)
The CIFAR-10 dataset consists of 60,000 32×32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.
Big Transfer (BiT) revisits the paradigm of pre-training on large supervised datasets and fine-tuning the weights on the target task. It was created by scaling up pre-training and combining a few carefully selected components. BiT performs well across a wide range of data regimes, from 10 to 1M labelled examples, and achieves 99.3% accuracy on CIFAR-10.
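BiT's fine-tuning recipe, which the paper calls BiT-HyperRule, chooses a few hyperparameters from the size and resolution of the target dataset instead of tuning them per task. The sketch below encodes the thresholds as we recall them from the BiT paper (500, 10k or 20k steps, split at 20k and 500k examples; learning-rate decay by 10× at 30%, 60% and 90% of training; upsampling of images with area below 96×96; MixUp only beyond small tasks). Treat these values as assumptions to check against the paper; `FineTuneConfig` and `bit_hyperrule` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FineTuneConfig:
    steps: int                    # total fine-tuning steps
    lr_decay_at: Tuple[int, ...]  # steps at which the learning rate is divided by 10
    resize: int                   # images are resized to resize x resize ...
    crop: int                     # ... then randomly cropped to crop x crop
    use_mixup: bool
    base_lr: float = 0.003        # assumed initial SGD learning rate


def bit_hyperrule(num_examples: int, height: int, width: int) -> FineTuneConfig:
    """Hypothetical helper; thresholds are assumptions recalled from the BiT paper."""
    # Schedule length grows with the size of the downstream dataset.
    if num_examples < 20_000:
        steps = 500
    elif num_examples < 500_000:
        steps = 10_000
    else:
        steps = 20_000
    # Step decay at 30%, 60% and 90% of training.
    lr_decay_at = tuple(round(steps * frac) for frac in (0.3, 0.6, 0.9))
    # Low-resolution inputs (area below 96x96) are upsampled.
    if height * width < 96 * 96:
        resize, crop = 160, 128
    else:
        resize, crop = 448, 384
    # MixUp is skipped for small tasks.
    return FineTuneConfig(steps, lr_decay_at, resize, crop,
                          use_mixup=num_examples >= 20_000)


# CIFAR-10: 50,000 training images of 32x32 pixels.
print(bit_hyperrule(50_000, 32, 32))
```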
CNN + Homogeneous Filter Capsules
MNIST is a database of 28×28 grayscale images of handwritten digits, with a training set of 60,000 examples and a test set of 10,000 examples.
In this mixed model of convolutional neural networks and capsule networks, the researchers present a convolutional neural network design with additional branches after certain convolutions, so that features can be extracted at several depths of the network.
The authors claim that this method establishes a new state of the art on the MNIST dataset, with an accuracy of 99.84%.
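To put the MNIST figure in perspective: the standard test set has 10,000 images, so an accuracy of 99.84% corresponds to a test error of 0.16%, or about 16 misclassified digits.

```latex
\text{misclassified} = (1 - 0.9984) \times 10\,000 = 0.0016 \times 10\,000 = 16
```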
AutoAugment
Dataset: SVHN
% Error: 1.02
SVHN (Street View House Numbers) is built from house numbers in Google Street View images. It was introduced to support the development of machine learning and object recognition algorithms with minimal requirements for data preprocessing and formatting.
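AutoAugment is a simple procedure for automatically searching for improved data augmentation policies. In its search space, a policy consists of many sub-policies, one of which is chosen at random for each image in each mini-batch; a sub-policy consists of two image processing operations, such as translation, rotation or shearing, each with the probability and magnitude at which it is applied.
The authors use a search algorithm (reinforcement learning, in their implementation) to find the policy with which the neural network yields the highest validation accuracy on a target dataset. The method achieves state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN and ImageNet (without additional data).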
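The structure of an AutoAugment policy can be sketched as follows. In the paper, a policy is a set of sub-policies, one chosen at random per image; each sub-policy applies two operations in sequence, each with a probability and a magnitude (discretised into 11 and 10 values respectively). This sketch makes several simplifying assumptions: the "image" is a flat list of 0-255 intensities, only three simplified operations are implemented, plain random search replaces the paper's reinforcement-learning controller, and `validation_accuracy` is a hypothetical callback that would train a child model with the policy and return its validation accuracy.

```python
import random
from typing import Callable, Dict, List, Tuple

Image = List[int]                   # toy greyscale "image": flat list of 0-255 values
Operation = Tuple[str, float, int]  # (operation name, probability, magnitude 0-9)
SubPolicy = Tuple[Operation, Operation]
Policy = List[SubPolicy]


def _clip(v: int) -> int:
    return max(0, min(255, v))


# Simplified stand-ins (assumption) for three of the paper's operations.
OPS: Dict[str, Callable[[Image, int], Image]] = {
    "Invert": lambda img, m: [255 - p for p in img],
    "Brightness": lambda img, m: [_clip(p + 10 * m) for p in img],
    "Solarize": lambda img, m: [255 - p if p >= 256 - 25 * m else p for p in img],
}


def apply_sub_policy(img: Image, sub_policy: SubPolicy, rng: random.Random) -> Image:
    # Each of the two operations fires independently with its own probability.
    for name, prob, magnitude in sub_policy:
        if rng.random() < prob:
            img = OPS[name](img, magnitude)
    return img


def augment(img: Image, policy: Policy, rng: random.Random) -> Image:
    # One sub-policy is drawn at random for every image.
    return apply_sub_policy(img, rng.choice(policy), rng)


def random_policy(rng: random.Random, num_sub_policies: int = 5) -> Policy:
    def op() -> Operation:
        return (rng.choice(sorted(OPS)),
                rng.choice([i / 10 for i in range(11)]),  # 11 probability values
                rng.randrange(10))                          # 10 magnitude values
    return [(op(), op()) for _ in range(num_sub_policies)]


def search(validation_accuracy: Callable[[Policy], float],
           trials: int = 20, seed: int = 0) -> Tuple[Policy, float]:
    # Random search (assumption) stands in for the reinforcement-learning controller.
    rng = random.Random(seed)
    best_policy: Policy = []
    best_acc = float("-inf")
    for _ in range(trials):
        candidate = random_policy(rng)
        acc = validation_accuracy(candidate)
        if acc > best_acc:
            best_policy, best_acc = candidate, acc
    return best_policy, best_acc


# Usage with a dummy scorer; a real one would train and evaluate a child model.
def dummy_accuracy(policy: Policy) -> float:
    return sum(op[0] == "Brightness" for sub in policy for op in sub) / 10


policy, score = search(dummy_accuracy)
print(score, augment([0, 128, 255], policy, random.Random(42)))
```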
EnAET
The STL-10 dataset is an image recognition dataset of 96×96 colour images in 10 classes. Each class has relatively few labelled training examples (500 per class), but a large set of 100,000 unlabeled images is provided to learn image models prior to supervised training.
Ensemble of Auto-Encoding Transformations (EnAET) is trained on both labelled and unlabeled data, learning embedded representations from which both spatial and non-spatial transformations can be decoded. This sets EnAET apart from traditional semi-supervised methods that focus on improving prediction consistency and confidence. EnAET explores the role of self-supervised representations in semi-supervised learning under a rich family of transformations. Experimental results on CIFAR-10, CIFAR-100, SVHN and STL-10 show that EnAET outperforms state-of-the-art semi-supervised methods by significant margins.
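The core "auto-encoding transformations" idea, decoding which transformation was applied from a representation of the original input and of its transformed version, can be illustrated with a toy. This is not EnAET itself: in EnAET the representations come from a learned encoder network and the decoder is also a network trained jointly, over an ensemble of spatial and non-spatial transformation families. Here, as assumptions for illustration, the "representation" is just a set of 2-D points, the transformation family is rotation plus uniform scaling, and a closed-form estimator plays the role of the decoder. All function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def transform(points: List[Point], theta: float, scale: float) -> List[Point]:
    """Apply a rotation by theta (radians) followed by uniform scaling."""
    c, s = math.cos(theta), math.sin(theta)
    return [(scale * (c * x - s * y), scale * (s * x + c * y)) for x, y in points]


def decode_transform(original: List[Point], transformed: List[Point]) -> Tuple[float, float]:
    """Closed-form stand-in for a learned decoder: estimate rotation and scale."""
    dot = sum(x * u + y * v for (x, y), (u, v) in zip(original, transformed))
    cross = sum(x * v - y * u for (x, y), (u, v) in zip(original, transformed))
    norm_o = sum(x * x + y * y for x, y in original)
    norm_t = sum(u * u + v * v for u, v in transformed)
    return math.atan2(cross, dot), math.sqrt(norm_t / norm_o)


def aet_loss(true: Tuple[float, float], decoded: Tuple[float, float]) -> float:
    """Squared error between true and decoded parameters (angle difference wrapped)."""
    d_theta = math.atan2(math.sin(true[0] - decoded[0]), math.cos(true[0] - decoded[0]))
    return d_theta ** 2 + (true[1] - decoded[1]) ** 2


rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(20)]
decoded = decode_transform(points, transform(points, 0.7, 1.3))
print(decoded, aet_loss((0.7, 1.3), decoded))
```

In the real method this loss is what trains the encoder: a representation that makes the transformation easy to decode has to capture the structure of the image.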
DivideMix
Dataset: Clothing1M
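Clothing1M contains one million clothing images in 14 classes, collected from online shopping websites, with labels derived from the surrounding text and therefore noisy. DivideMix leverages semi-supervised learning techniques for learning with such noisy labels. In particular, it dynamically divides the training data into a labelled set of samples judged clean and an unlabeled set of samples judged noisy (whose labels are discarded), by fitting a two-component Gaussian mixture model to the per-sample training loss. It then trains the model on both the labelled and unlabeled data in a semi-supervised manner.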
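The clean/noisy split can be sketched with a two-component Gaussian mixture fitted to per-sample losses: samples that fall in the low-loss component are kept as labelled, and the rest have their labels discarded. This is a minimal stdlib-only sketch, assuming the losses are min-max normalised first, a small variance floor, and a 0.5 threshold on the clean-component posterior. In DivideMix itself two networks are trained and each one's split is used to train the other, and the clean probability also weights label refinement. `divide_by_loss` is a hypothetical name.

```python
import math
from typing import List, Tuple


def _log_normal(x: float, mu: float, var: float) -> float:
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def _posteriors(x: float, weights: List[float], mus: List[float],
                vars_: List[float]) -> List[float]:
    # Component posteriors computed in log space to avoid underflow.
    logs = [math.log(weights[k]) + _log_normal(x, mus[k], vars_[k]) for k in range(2)]
    top = max(logs)
    exps = [math.exp(v - top) for v in logs]
    return [e / sum(exps) for e in exps]


def divide_by_loss(losses: List[float], threshold: float = 0.5, iters: int = 20,
                   var_floor: float = 5e-4) -> Tuple[List[int], List[int], List[float]]:
    """Split sample indices into (labelled, unlabelled) via a 2-component GMM on losses."""
    lo, hi = min(losses), max(losses)
    span = (hi - lo) or 1.0
    xs = [(v - lo) / span for v in losses]          # assumption: min-max normalise
    mean = sum(xs) / len(xs)
    var0 = sum((x - mean) ** 2 for x in xs) / len(xs) + var_floor
    weights, mus, vars_ = [0.5, 0.5], [0.0, 1.0], [var0, var0]
    for _ in range(iters):                           # EM iterations
        resp = [_posteriors(x, weights, mus, vars_) for x in xs]  # E-step
        for k in range(2):                                          # M-step
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / len(xs)
            mus[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            vars_[k] = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, xs)) / nk + var_floor
    clean = 0 if mus[0] <= mus[1] else 1             # low-loss component = clean
    p_clean = [_posteriors(x, weights, mus, vars_)[clean] for x in xs]
    labelled = [i for i, p in enumerate(p_clean) if p > threshold]
    unlabelled = [i for i, p in enumerate(p_clean) if p <= threshold]
    return labelled, unlabelled, p_clean


losses = [0.05, 0.10, 0.08, 0.12, 0.07, 2.3, 2.5, 2.1]
labelled, unlabelled, p_clean = divide_by_loss(losses)
print(labelled, unlabelled)  # [0, 1, 2, 3, 4] [5, 6, 7]
```

Modelling the loss distribution, rather than applying a fixed loss cut-off, lets the split adapt as the network's losses shrink during training.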
PreAct-ResNet18 + FMix
Dataset: Fashion-MNIST
% Error: 1.02
Fashion-MNIST consists of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28×28 grayscale image, associated with a label from 10 classes. Fashion-MNIST serves as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms.
FMix is a mixed sample data augmentation that combines two images using binary masks obtained by thresholding low-frequency images sampled from Fourier space. The authors claim it improves performance for a number of state-of-the-art models across a range of datasets and problem settings. They also analyse MixUp, CutMix and FMix from an information-theoretic perspective, characterising learned models in terms of how they progressively compress the input with depth.
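The FMix masking step can be sketched in pure Python. In the FMix paper, a random complex spectrum is attenuated as 1/f^δ (a low-pass filter), transformed back into a greyscale image, and the top λ fraction of its pixels is set to 1, with λ drawn from Beta(α, α); the two images are then mixed pixel-wise through the mask and their labels mixed by λ. Assumptions of this sketch: a naive inverse DFT instead of an FFT library, the real part of a full complex spectrum, defaults `alpha=1.0` and `decay_power=3.0`, the fraction of ones actually kept used as the label weight, and hypothetical helper names.

```python
import cmath
import math
import random
from typing import List, Tuple

Grid = List[List[float]]


def fftfreq(n: int) -> List[float]:
    """Sample frequencies in the same order as numpy.fft.fftfreq(n)."""
    return [(k if k <= (n - 1) // 2 else k - n) / n for k in range(n)]


def low_freq_image(size: int, decay_power: float, rng: random.Random) -> Grid:
    """Real part of the inverse DFT of a random spectrum attenuated by 1/f^decay_power."""
    freqs = fftfreq(size)
    spectrum = [[complex(rng.gauss(0, 1), rng.gauss(0, 1))
                 / max(math.hypot(fu, fv), 1.0 / size) ** decay_power
                 for fv in freqs] for fu in freqs]
    # Naive O(size^4) inverse 2-D DFT; fine for the small grids used here.
    return [[sum(spectrum[u][v] * cmath.exp(2j * math.pi * (u * x + v * y) / size)
                 for u in range(size) for v in range(size)).real
             for y in range(size)] for x in range(size)]


def fmix_mask(size: int, alpha: float = 1.0, decay_power: float = 3.0,
              seed: int = 0) -> Tuple[Grid, float]:
    """Binary mask with the top-lambda fraction of the grey image set to 1."""
    rng = random.Random(seed)
    lam = rng.betavariate(alpha, alpha)            # mixing ratio ~ Beta(alpha, alpha)
    grey = low_freq_image(size, decay_power, rng)
    order = sorted(((grey[x][y], x, y) for x in range(size) for y in range(size)),
                   reverse=True)
    num_ones = round(lam * size * size)
    mask = [[0.0] * size for _ in range(size)]
    for _, x, y in order[:num_ones]:
        mask[x][y] = 1.0
    # Assumption: use the fraction of ones actually kept as the label weight.
    return mask, num_ones / (size * size)


def mix(img_a: Grid, img_b: Grid, mask: Grid) -> Grid:
    """Pixel-wise m * a + (1 - m) * b; labels mix as lam * y_a + (1 - lam) * y_b."""
    return [[m * a + (1 - m) * b for a, b, m in zip(ra, rb, rm)]
            for ra, rb, rm in zip(img_a, img_b, mask)]


mask, lam = fmix_mask(16, seed=0)
for row in mask:
    print("".join("#" if m else "." for m in row))
print("label weight for image A:", lam)
```

Because the mask comes from a low-frequency image, it tends to form a few large, irregular blobs rather than the single rectangle used by CutMix.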
Of all these benchmarks, ImageNet and CIFAR are the most popular with practitioners. The plot above illustrates how models have improved on ImageNet over the years.