When working on computer vision problems such as object recognition or action detection, the first thing we think of is acquiring a suitable dataset to train our model on. In the early days of AI, most of the focus was on machine learning and deep learning algorithms, but there was a lack of proper datasets to run them on. As a result, the field was largely limited to researchers; the business world showed little interest in AI back then.
In 2006, Fei-Fei Li came up with the idea of building a dataset that would let these algorithms work in the real world. ImageNet thus took shape on top of the WordNet hierarchy. ImageNet is one of the biggest image datasets, containing more than 14 million images across more than 20,000 categories, organised under 27 high-level subcategories that each contain at least 500 images. All of these images were manually annotated, and over 1 million images include bounding boxes around the object in the picture. For 1.2 million pictures, SIFT (Scale-Invariant Feature Transform) features are provided, which give a lot of information about the features in an image.
Since 2010, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) has been held as a global annual contest in which software programs (mostly ConvNets) compete at image classification and the detection of objects and scenes. The top-5 error rate is the fraction of test images for which the correct label is not among the model's five highest-ranked predictions, and the algorithm with the lowest top-5 error rate is selected as the winner.
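To make the metric concrete, here is a minimal sketch of how a top-5 error rate can be computed. The function name `top5_error` and the toy scores are our own illustration, not the official ILSVRC evaluation code.

```python
def top5_error(scores, labels, k=5):
    """Fraction of examples whose true label is not among the k highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores for this example.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy example with 6 classes and 2 images (the scores are made up).
scores = [
    [0.05, 0.10, 0.40, 0.20, 0.15, 0.10],  # class 0 has the lowest score
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],  # class 5 has the lowest score
]
labels = [2, 5]
error = top5_error(scores, labels)
```

In this toy case the first image counts as a hit and the second as a miss, because its true class is the only one left out of the top five.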
Within six years, the top-5 error rate came down from about 26% in 2011 to 2.25% in 2017, which is a huge achievement.
YEAR | WINNER | TOP 5 ERROR RATE % |
2012 | ALEXNET | 15.3 |
2013 | ZFNET | 11.2 |
2014 | INCEPTION V1 (GoogLeNet) | 6.67 |
2014 | VGG NET (Runner up) | 7.3 |
2015 | ResNet | 3.57 |
2016 | ResNeXt | 4.1 |
2017 | SENet | 2.251 |
2018 | PNASNet-5 | 3.8 |
It was a revolution in the world of AI, and people started taking an interest in it. Researchers report that humans have a top-5 error rate of about 5.1%, which is roughly double that of the best-performing deep learning model trained on ImageNet.
In today’s article, we will be discussing the ImageNet database and its variants.
ImageNet2012
Many authors contributed to it, led by Fei-Fei Li, who started building it. As per the 2015 ILSVRC paper, the other authors include Olga Russakovsky, Jonathan Krause, Aditya Khosla, Michael Bernstein, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Jia Deng, Hao Su, Andrej Karpathy and Alexander C. Berg.
WordNet is a lexical database of English. Based on WordNet's semantic structure, Fei-Fei Li started building ImageNet around its synsets (most of which are nouns), aiming to provide on average about 1000 images for each synset. The developers used Amazon Mechanical Turk to help with the image classification. Images are commonly downsampled to 256×256 to fit deep learning models.
Dataset size: 155.84 GiB
Data: train set – 1281167 images, validation set – 50000 images, test set – 100000 images.
Code Snippet:
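With TensorFlow (the dataset must be downloaded manually from here)
import tensorflow_datasets as tfds
# The labelled evaluation split is named 'validation'; labels for the test images are not public
train, val = tfds.load('imagenet2012', split=['train', 'validation'])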
Using PyTorch (requires the SciPy library; the ImageNet archives must already be present in the root directory)
from torchvision import transforms, datasets
# torchvision selects the split with split='train' or split='val' and does not download ImageNet itself
to_tensor = transforms.Compose([transforms.ToTensor()])
train = datasets.ImageNet('', split='train', transform=to_tensor)
val = datasets.ImageNet('', split='val', transform=to_tensor)
Mini ImageNet
This dataset was created for few-shot learning and is used as a benchmark for approaches such as meta-transfer learning. It contains one hundred classes with 600 samples per class, and images are resized to 84×84. Download the dataset from here.
[Figure: performance measures on Mini ImageNet]
A GitHub repository is available for generating Mini ImageNet from ImageNet.
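Few-shot methods are usually trained and evaluated on small "episodes" drawn from such a dataset. The sketch below shows one way to sample an N-way K-shot episode; the function `sample_episode`, its parameters and the toy image ids are hypothetical and only stand in for the real 100-class, 600-images-per-class data.

```python
import random


def sample_episode(images_by_class, n_way=5, k_shot=1, n_query=15, seed=None):
    """Draw one N-way K-shot episode: a support set and a query set."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(images_by_class), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        # Pick k_shot + n_query distinct images of this class.
        picked = rng.sample(images_by_class[cls], k_shot + n_query)
        support += [(img, episode_label) for img in picked[:k_shot]]
        query += [(img, episode_label) for img in picked[k_shot:]]
    return support, query


# Stand-in for Mini ImageNet: 100 classes with 600 image ids each.
toy = {f"class_{c}": [f"class_{c}/img_{i}" for i in range(600)] for c in range(100)}
support, query = sample_episode(toy, n_way=5, k_shot=1, n_query=15, seed=0)
```

A 5-way 1-shot episode with 15 query images per class yields 5 support examples and 75 query examples, and labels are renumbered from 0 to 4 within each episode.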
ImageNet2012_real
Developed in 2020 by Xiaohua Zhai, Aäron van den Oord, Alexander Kolesnikov, Lucas Beyer and Olivier J. Hénaff, and presented in the paper “Are We Done With ImageNet?”. This dataset contains the 50000 validation images of the original ImageNet with reassessed (“ReaL”) labels. It provides multiple labels per image and better annotations than the original ImageNet labels.
Dataset Size: 6.25 GiB
Code Snippet:
With TensorFlow (the dataset must be downloaded manually)
import tensorflow_datasets as tfds
imreal = tfds.load('imagenet2012_real')
An implementation of this dataset is available in this GitHub repository.
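Because each image can carry several valid labels, accuracy has to be scored a little differently. A minimal sketch, assuming a prediction counts as correct when it appears in the image's label set and that images left with no label are skipped; the function name `real_accuracy` and the toy data are ours.

```python
def real_accuracy(predictions, real_labels):
    """Accuracy where a prediction is correct if it is in the image's label set."""
    correct = 0
    counted = 0
    for pred, labels in zip(predictions, real_labels):
        if not labels:
            # Assumption: images with an empty label set are left out of the score.
            continue
        counted += 1
        if pred in labels:
            correct += 1
    return correct / counted


predictions = [3, 7, 1, 4]
real_labels = [{3, 8}, {2}, set(), {4}]
acc = real_accuracy(predictions, real_labels)
```

Here three images are scored (the third has no labels) and two of the three predictions fall inside their label sets.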
ImageNet2012_subset
This dataset was also developed in 2020, by Ting Chen, Simon Kornblith, Mohammad Norouzi and Geoffrey Hinton. As the name suggests, it is a subset of ImageNet2012 and comes in two configurations, containing 1% and 10% of the full training set. It is intended for semi-supervised learning algorithms.
1pct configuration (default):
Dataset size: 7.6 GiB
Data is split into 12811 training images and 50000 validation images.
10pct configuration:
Dataset size: 19.91 GiB
Data is split into 128116 training images and 50000 validation images.
Code Snippet:
With TensorFlow (the dataset must be downloaded manually)
import tensorflow_datasets as tfds
# The subset provides 'train' and 'validation' splits
train, val = tfds.load('imagenet2012_subset', split=['train', 'validation'])
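The snippet above loads the default 1pct configuration. TensorFlow Datasets selects a configuration by appending it to the dataset name, so a small helper can switch between the two. The helper names below are our own, and the size check simply confirms that the training counts quoted above equal 1% and 10% of the 1281167 training images, rounded down.

```python
TRAIN_SIZE = 1281167  # ImageNet2012 training images, as listed earlier


def subset_name(config="1pct"):
    """Build the TFDS name 'imagenet2012_subset/<config>'."""
    if config not in ("1pct", "10pct"):
        raise ValueError(f"unknown config: {config}")
    return f"imagenet2012_subset/{config}"


def expected_train_size(config="1pct"):
    """Training-set size implied by the percentage, rounded down."""
    divisor = {"1pct": 100, "10pct": 10}[config]
    return TRAIN_SIZE // divisor


def load_subset(config="1pct"):
    # Imported lazily so the helpers above work without TensorFlow installed.
    import tensorflow_datasets as tfds
    return tfds.load(subset_name(config), split=["train", "validation"])


name = subset_name("10pct")
```

Calling `load_subset("10pct")` still needs the manually downloaded ImageNet archives, just like the snippet above.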
ImageNet_A and ImageNet_O
Developed in 2019 by Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt and Dawn Song, and described in their paper “Natural Adversarial Examples”. These datasets contain real-world, unmodified images that ResNet-50 failed to classify correctly. ImageNet-A contains images labelled with the original ImageNet labels, so they belong to the same classes as the original ImageNet, while ImageNet-O contains images from classes that the classifier has not seen before.
Dataset Size: 650.87 MiB
Data: 7500 testing images
In the example results, black text shows the actual class and red text shows the class predicted by ResNet-50, along with its confidence score.
Code Snippet:
With TensorFlow
import tensorflow_datasets as tfds
img_a = tfds.load('imagenet_a')
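The core idea behind ImageNet-A, keeping only the images a reference classifier gets wrong, can be sketched in a few lines. This is a simplified illustration and not the authors' pipeline; `keep_misclassified` and the stand-in classifier `always_zero` are hypothetical, where the paper used ResNet-50.

```python
def keep_misclassified(examples, classify):
    """Return only the (image, label) pairs that `classify` gets wrong."""
    return [(image, label) for image, label in examples if classify(image) != label]


# Hypothetical stand-in classifier that always predicts class 0.
def always_zero(image):
    return 0


examples = [("img_a", 0), ("img_b", 3), ("img_c", 0), ("img_d", 7)]
hard = keep_misclassified(examples, always_zero)
```

Only the two images whose labels differ from the classifier's prediction survive the filter.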
ImageNet_R
It was developed in 2020 by Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt and Justin Gilmer. Here ‘R’ stands for Rendition, as the dataset provides renditions of 200 ImageNet classes. It contains art, paintings, patterns, DeviantArt, graffiti, embroidery, sketches, tattoos, cartoons, graphics, origami, plastic objects, plush objects, sculptures, toys, and video game renditions of the original ImageNet classes.
Dataset Size: 2.03 GiB
Data: 30000 images
Code Snippet:
With TensorFlow
import tensorflow_datasets as tfds
img_r = tfds.load('imagenet_r')
An implementation of the above dataset can be found in this GitHub repository.
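Since ImageNet-R covers only 200 of the 1000 classes, one common way to evaluate a 1000-class model on it is to restrict the prediction to those classes. The sketch below assumes this approach; the function `predict_within_subset` and the toy 10-class scores are ours, and the real class indices would come from the dataset's class list.

```python
def predict_within_subset(logits, subset_indices):
    """Pick the highest-scoring class, considering only the classes in subset_indices."""
    return max(subset_indices, key=lambda c: logits[c])


# Toy 10-class model output; only classes 1, 4 and 8 belong to the evaluated subset.
logits = [2.0, 0.5, 9.0, 1.0, 3.5, 0.0, 7.0, 1.5, 3.0, 0.2]
subset = [1, 4, 8]
pred = predict_within_subset(logits, subset)
```

Without the restriction the model would answer class 2, which is not part of the subset; with it, the answer is the best-scoring class among 1, 4 and 8.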
ImageNet_Resized
Developed in 2017 by Patryk Chrabaszcz, Ilya Loshchilov and Frank Hutter. This dataset consists of downsampled versions of the original ImageNet images and was built as an alternative to the CIFAR datasets.
The data split is the same as in the original ImageNet.
8×8 downsampled images (default):
Dataset Size: 237.11 MiB
16×16 downsampled images:
Dataset Size: 932.34 MiB
32×32 downsampled images:
Dataset Size: 3.46 GiB
64×64 downsampled images:
Dataset Size: 13.13 GiB
Code Snippet:
With TensorFlow
import tensorflow_datasets as tfds
img_resize = tfds.load('imagenet_resized')
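To give a feel for what downsampling does to an image, here is a minimal sketch that shrinks a grayscale image by averaging non-overlapping blocks. The article does not say which resampling filter the dataset uses, so plain block averaging is an assumption chosen for illustration, and `box_downsample` is a name of our own.

```python
def box_downsample(image, factor):
    """Shrink a 2D grayscale image by averaging non-overlapping factor x factor blocks."""
    height, width = len(image), len(image[0])
    out = []
    # Rows and columns that do not fill a whole block are dropped.
    for top in range(0, height - height % factor, factor):
        row = []
        for left in range(0, width - width % factor, factor):
            block = [image[y][x]
                     for y in range(top, top + factor)
                     for x in range(left, left + factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


image = [
    [0, 2, 10, 10],
    [2, 4, 10, 10],
    [8, 8, 1, 3],
    [8, 8, 5, 7],
]
small = box_downsample(image, 2)
```

The 4×4 toy image becomes a 2×2 image in which each pixel is the mean of one 2×2 block, which is the same kind of information loss the 8×8 to 64×64 variants trade for speed and size.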
Conclusion
Some other datasets inspired by ImageNet are ImageNet-V2, Imagenette, Imagewoof and Imagewang. ImageNet has also collaborated with PASCAL VOC. ImageNet is under constant development to serve the computer vision community. In 2019, a report found bias in most images, and ImageNet is working to overcome this bias and other shortcomings. The Tiny ImageNet Visual Recognition Challenge is a project by Stanford that is similar to ILSVRC. The annotation process of ImageNet relies on third parties and crowdsourcing.