Last updated February 2, 2021

The Evolution of ImageNet for Deep Learning in Computer Vision

Since 2010, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has been a global annual contest in which software programs (mostly ConvNets) compete on image classification and on the detection of objects and scenes. The algorithm with the lowest top-5 error rate is selected as the winner.

Published on November 13, 2020

by Jayita Bhattacharyya

When working on computer vision problems such as object recognition or action detection, the first thing we think of is acquiring a suitable dataset to train our model on. In the early days of AI, more focus was given to machine learning and deep learning algorithms, but there was a lack of proper datasets to run these algorithms on. As a result, AI was limited to researchers; the business world took little interest in it back then.

In 2006, Fei-Fei Li came up with the idea of running these algorithms on real-world data. ImageNet thus took shape on top of the WordNet hierarchy. ImageNet is one of the largest image datasets, containing more than 14 million images in more than 20000 categories, with 27 high-level subcategories that contain at least 500 images each. All of these images are manually annotated, and over 1 million images include bounding boxes around the object in the picture. SIFT (Scale-Invariant Feature Transform) features are provided for 1.2 million pictures, which gives a lot of information about the features in an image.

Within six years, the top-5 error rate of the winning ILSVRC entries came down from 26% to 2.25%, which is a huge achievement.

YEAR	WINNER	TOP 5 ERROR RATE %
2012	ALEXNET	15.3
2013	ZFNET	11.2
2014	INCEPTION V1 (GoogLeNet)	6.67
2014	VGG NET (Runner up)	7.3
2015	ResNet	3.57
2016	ResNeXt	4.1
2017	SENet	2.251
2018	PNASNet-5	3.8

It was a revolution in the world of AI, and people started taking an interest in it. Researchers say humans have a top-5 error rate of 5.1%, which is more than double that of the best-performing deep learning model trained on ImageNet.
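To make the metric in the table concrete, here is a minimal sketch of how a top-5 error rate can be computed. The function name and the toy scores are our own illustration, not the official ILSVRC evaluation code.

```python
def top5_error_rate(scores, labels, k=5):
    """Fraction of examples whose true label is not among the k highest-scoring classes."""
    misses = 0
    for class_scores, label in zip(scores, labels):
        # Indices of the k largest scores for this example.
        top_k = sorted(range(len(class_scores)),
                       key=lambda c: class_scores[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy example with 6 classes and 2 images (made-up scores).
scores = [
    [0.05, 0.10, 0.40, 0.20, 0.15, 0.10],  # class 0 has the lowest score
    [0.30, 0.25, 0.20, 0.15, 0.05, 0.05],
]
labels = [0, 1]
print(top5_error_rate(scores, labels))  # 0.5
```

The first image counts as an error because its true class is ranked sixth, while the second image is counted as correct even though its true class is only the second guess.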

In today’s article, we will be discussing the ImageNet database and its variants.

ImageNet2012

It was developed by many authors, led by Fei-Fei Li, who started building it. As per the 2015 ILSVRC paper, the other authors include Olga Russakovsky, Jonathan Krause, Aditya Khosla, Michael Bernstein, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Jia Deng, Hao Su, Andrej Karpathy and Alexander C. Berg.

WordNet is a language database. Fei-Fei Li started building ImageNet around the synsets of WordNet (most of which are nouns), following its English-language semantics. At least 1000 images were provided for each synset. The developers used Amazon Mechanical Turk to help them with the image classification. Images have been subsampled to 256×256 to fit into the deep learning models.

Dataset size: 155.84 GiB

Data: train set: 1281167 images, validation set: 50000 images, test set: 100000 images.

Code Snippet:

With TensorFlow (the dataset must be downloaded manually)

import tensorflow_datasets as tfds
# returns one tf.data.Dataset per requested split
train, test = tfds.load('imagenet2012', split=['train', 'test'])

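Using PyTorch (requires the SciPy library; the dataset must be downloaded manually)

from torchvision import transforms, datasets
# torchvision cannot download ImageNet itself; place the official archives in `root` first
root = ''
train = datasets.ImageNet(root, split='train',
                       transform=transforms.Compose([
                           transforms.ToTensor()
                       ]))
# the labelled held-out split is called 'val' in torchvision
test = datasets.ImageNet(root, split='val',
                       transform=transforms.Compose([
                           transforms.ToTensor()
                       ]))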

Mini ImageNet

This dataset was created for few-shot learning, for example with models trained through meta-transfer learning. It contains one hundred classes with 600 samples per class, and images are resized to 84×84.

Performance measures of mini Imagenet:

A GitHub repository is available for generating mini ImageNet from ImageNet.
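Few-shot learning on a dataset like this is usually organised into small "episodes". The sketch below shows one way to draw such an episode. The 5-way, 1-shot, 15-query setting and the stand-in dictionary of image ids are our assumptions for illustration, and the function is not part of any mini ImageNet tooling.

```python
import random


def sample_episode(images_by_class, n_way=5, k_shot=1, n_query=15, rng=None):
    """Draw one few-shot episode: a support set and a query set over n_way classes."""
    rng = rng or random.Random()
    classes = rng.sample(sorted(images_by_class), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        # Pick disjoint support and query images for this class.
        picked = rng.sample(images_by_class[cls], k_shot + n_query)
        support += [(img, episode_label) for img in picked[:k_shot]]
        query += [(img, episode_label) for img in picked[k_shot:]]
    return support, query


# Stand-in for the dataset: 100 classes with 600 image ids each.
images_by_class = {
    f"class_{c:03d}": [f"class_{c:03d}/img_{i:03d}" for i in range(600)]
    for c in range(100)
}
support, query = sample_episode(images_by_class, rng=random.Random(0))
print(len(support), len(query))  # 5 75
```

Each episode relabels its classes from 0 to n_way - 1, so the model has to learn from the support set rather than memorise global class ids.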

ImageNet2012_real

This dataset was developed in 2020 by Xiaohua Zhai, Aaron van den Oord, Alexander Kolesnikov, Lucas Beyer and Olivier J. Henaff, and presented in the paper “Are we done with ImageNet?”. It contains the 50000 validation images of the original ImageNet with reassessed “real” labels. An image can carry several labels, and the annotations are more accurate than the original ImageNet labels.

Dataset Size: 6.25 GiB

Code Snippet:

With TensorFlow (dataset requires to be downloaded manually)

import tensorflow_datasets as tfds
imreal = tfds.load('imagenet2012_real')

An implementation of this dataset is given in this GitHub repository.
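Because an image can have several real labels, accuracy is naturally measured as "is the prediction one of the accepted labels". Here is a minimal sketch of that idea. Skipping images with an empty label set is an assumption of this sketch, and the function name and toy data are ours.

```python
def real_accuracy(predictions, real_labels):
    """Share of images whose predicted class is in that image's set of real labels.

    Images with an empty label set are skipped (an assumption of this sketch).
    """
    correct = 0
    counted = 0
    for pred, labels in zip(predictions, real_labels):
        if not labels:
            continue
        counted += 1
        if pred in labels:
            correct += 1
    return correct / counted if counted else 0.0


predictions = [1, 7, 3, 4]
real_labels = [[1, 2], [5], [], [4, 9]]  # toy multi-label annotations
print(real_accuracy(predictions, real_labels))
```

In this toy example the third image is ignored, and two of the remaining three predictions hit an accepted label.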

ImageNet2012_subset

This dataset was also developed in 2020, by Ting Chen, Simon Kornblith, Mohammad Norouzi and Geoffrey Hinton. As the name suggests, it is a subset of ImageNet2012, available in two configurations that contain 1% and 10% of the training data. It is intended for semi-supervised learning algorithms.

1pct configuration (default):

Dataset size: 7.6 GiB

Data is split into 12811 training images and 50000 validation images.

10pct configuration:

Dataset size: 19.91 GiB

Data is split into 128116 training images and 50000 validation images.

Code Snippet:

With TensorFlow (the dataset must be downloaded manually)

import tensorflow_datasets as tfds
# this dataset provides 'train' and 'validation' splits
train, test = tfds.load('imagenet2012_subset', split=['train', 'validation'])
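In a semi-supervised setup, only a small fraction of the training examples keep their labels and the rest are treated as unlabelled. The sketch below illustrates one way to carve out such a fraction. Sampling the same share from every class is our assumption, and the helper is hypothetical rather than the code used to build this dataset.

```python
import random
from collections import defaultdict


def labelled_subset(labels, fraction, seed=0):
    """Pick roughly `fraction` of the example indices per class to keep labels for."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    keep = []
    for label in sorted(by_class):
        indices = by_class[label]
        # Keep at least one labelled example per class.
        n_keep = max(1, int(len(indices) * fraction))
        keep.extend(rng.sample(indices, n_keep))
    return sorted(keep)


# Toy labels: 10 classes with 200 examples each.
labels = [c for c in range(10) for _ in range(200)]
labelled = labelled_subset(labels, fraction=0.01)
unlabelled = sorted(set(range(len(labels))) - set(labelled))
print(len(labelled), len(unlabelled))  # 20 1980
```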

ImageNet_A and ImageNet_O

These datasets were developed in 2019 by Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt and Dawn Song, and described in their paper “Natural Adversarial Examples”. They contain real-world, unmodified images that ResNet-50 failed to classify correctly. ImageNet-A contains images labelled with classes from the original 1000 ImageNet classes, while ImageNet-O contains images from classes that the model has not seen before.

Dataset Size: 650.87 MiB

Data: 7500 testing images

In the results, black text shows the actual class and red text shows the class predicted by ResNet-50 together with its confidence score.
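A confidence score of this kind is commonly taken to be the softmax probability of the predicted class. The following is a minimal sketch under that assumption, with made-up logits. It does not reproduce the authors' evaluation code.

```python
import math


def predict_with_confidence(logits):
    """Return (predicted class index, softmax probability of that class)."""
    peak = max(logits)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


logits = [2.0, 0.5, 0.1]  # made-up scores for a 3-class toy model
label, confidence = predict_with_confidence(logits)
print(label, round(confidence, 3))
```

A high confidence on a wrong class is exactly the failure these datasets are meant to expose.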

Code Snippet:

With TensorFlow

import tensorflow_datasets as tfds
img_a = tfds.load('imagenet_a')

ImageNet_R

It was developed in 2020 by Dan Hendrycks, Steven Basart, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Norman Mu, Saurav Kadavath, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt and Justin Gilmer. Here ‘R’ stands for Rendition, as the dataset provides renditions of 200 ImageNet classes. It contains art, paintings, patterns, DeviantArt, graffiti, embroidery, sketches, tattoos, cartoons, graphics, origami, plastic objects, plush objects, sculptures, toys, and video game renditions of the original ImageNet classes.

Dataset Size: 2.03 GiB

Data: 30000 images

Code Snippet:

With TensorFlow

import tensorflow_datasets as tfds
img_r = tfds.load('imagenet_r')

An implementation of the above dataset can be found in this GitHub repository.
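Since this dataset covers only 200 of the ImageNet classes, a model trained on all 1000 classes is typically evaluated by looking only at the outputs for those classes. Here is a minimal sketch of that step on a toy 10-class model. The subset indices are made up, and the real class list would come from the dataset's own files.

```python
def predict_within_subset(logits, subset_indices):
    """Return the highest-scoring class, considering only classes in subset_indices."""
    return max(subset_indices, key=lambda c: logits[c])


# Toy 10-class model evaluated on a 4-class subset (indices are made up).
logits = [0.1, 2.3, 0.4, 1.9, 0.0, 3.1, 0.2, 0.7, 1.1, 0.3]
subset_indices = [1, 3, 7, 8]
print(predict_within_subset(logits, subset_indices))  # 1
```

Without the restriction the model would answer class 5, which is not part of the subset and could never be a correct label.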

ImageNet_Resized

Developed in 2017 by Patryk Chrabaszcz, Ilya Loshchilov and Frank Hutter. This dataset contains downsampled versions of the original ImageNet images and is intended as an alternative to the CIFAR datasets.

The data split is the same as in the original ImageNet.

8×8 downsampled images (default):

Dataset Size: 237.11 MiB

16×16 downsampled images:

Dataset Size: 932.34 MiB

32×32 downsampled images:

Dataset Size: 3.46 GiB

64×64 downsampled images:

Dataset Size: 13.13 GiB

Code Snippet:

With TensorFlow

import tensorflow_datasets as tfds
img_resize = tfds.load('imagenet_resized')
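To give a feel for what downsampling does, the sketch below shrinks a square greyscale image by averaging non-overlapping blocks of pixels. Using a simple box average is our assumption, since the article does not say which resampling method was used for this dataset, and the toy gradient image is made up.

```python
def box_downsample(image, out_size):
    """Average non-overlapping blocks of a square greyscale image (list of rows)."""
    in_size = len(image)
    if in_size % out_size != 0:
        raise ValueError("in_size must be a multiple of out_size")
    block = in_size // out_size
    result = []
    for row in range(out_size):
        out_row = []
        for col in range(out_size):
            total = 0
            # Sum every pixel inside the block that maps to this output pixel.
            for y in range(row * block, (row + 1) * block):
                for x in range(col * block, (col + 1) * block):
                    total += image[y][x]
            out_row.append(total / (block * block))
        result.append(out_row)
    return result


# A 16x16 gradient image reduced to 8x8.
image = [[x + y for x in range(16)] for y in range(16)]
small = box_downsample(image, 8)
print(len(small), len(small[0]), small[0][0])  # 8 8 1.0
```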

Conclusion

Some other datasets inspired by ImageNet are ImageNet-V2, Imagenette, Imagewoof and Imagewang. ImageNet has collaborated with PASCAL VOC. ImageNet is under constant development to serve the computer vision community. In 2019, a report identified bias in most images, and ImageNet is working to overcome this bias and other shortcomings. The Tiny ImageNet Visual Recognition Challenge is a project by Stanford that is similar to ILSVRC. The annotation process of ImageNet relies on third parties and crowdsourcing.
