In computer vision, and in image processing specifically, deep learning algorithms have made many tasks more efficient. To demonstrate the performance of these neural networks, some basic preprocessed datasets were built, namely MNIST and its variants such as KMNIST, QMNIST, EMNIST, binarized MNIST and 3D MNIST. Ever since these datasets were built, they have been popular amongst beginners and researchers.
In today’s article, we’ll be talking about these basic, well-curated datasets used for deep learning in computer vision.
The MNIST (Modified National Institute of Standards and Technology) database contains handwritten digits. It is a subset of a larger dataset available from NIST (National Institute of Standards and Technology). It was developed by Yann LeCun, Corinna Cortes and Christopher J.C. Burges and released in 1999. It is the “hello world” dataset for beginners in deep learning for computer vision: a classification task with ten classes, the digits 0 to 9. The original black and white images from NIST were converted to grayscale images of 28×28 pixels in width and height, making a total of 784 pixels. Pixel values range from 0 to 255, where higher numbers indicate darker pixels and lower numbers lighter ones.
The MNIST database is drawn from two NIST databases: Special Database 1 and Special Database 3. Special Database 1 contains digits written by high school students. Special Database 3 consists of digits written by employees of the United States Census Bureau.
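The pixel layout described above can be illustrated with a small sketch. It does not download MNIST; the `image` array is a hypothetical random stand-in with the same shape and value range as one digit, and scaling to 0.0-1.0 is shown only as a common preprocessing choice, not something the dataset requires.

```python
import numpy as np

# Hypothetical stand-in for one MNIST digit: a 28x28 grid of uint8 values (0-255).
# A real image would come from one of the dataset loaders.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Flatten the 28x28 grid into a single vector of 28 * 28 = 784 pixel values.
flat = image.reshape(-1)

# Scale the 0-255 range to 0.0-1.0 (an assumed, commonly used preprocessing step).
scaled = flat.astype(np.float32) / 255.0

print(flat.shape)  # (784,)
```

The flattened form suits fully connected networks, while convolutional networks usually keep the 28×28 grid.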
Database Size: 21MiB
Data: 70000 images in total, split into a train set of 60000 images and a test set of 10000 images.
Performance: The highest error rate listed on the official website is 12%. The original MNIST paper reported that an SVM (Support Vector Machine) gave an error rate of 0.8%. Over the years, several methods have been applied to reduce the error rate. Some notable results:
- In 2004, a best-case error rate of 0.42% was achieved with LIRA, a neural classifier consisting of three neuron layers.
- Using affine and elastic distortions, an error rate of 0.39% was achieved with a 6-layer deep neural network.
- In 2011, an error rate of 0.27% was achieved using a similar convolutional neural network (CNN) architecture.
- In 2013, an error rate of 0.21% was achieved using regularization with DropConnect.
- In 2018, an error rate of 0.18% was achieved by simultaneously stacking three kinds of neural networks.
- As of February 2020, an error rate of 0.17% has been achieved using data augmentation with CNNs.
MNIST is well suited to beginners because it is a real-world dataset in which the data is already pre-processed, formatted and normalized. Researchers and learners also use it to try out new algorithms. MNIST has also served as the reference for developing other such datasets.
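To make these percentages concrete: on the 10000-image test set, an error rate of 0.17% corresponds to 17 misclassified digits. The sketch below shows how such a figure can be computed from predictions. The `error_rate` helper and the toy labels are hypothetical and for illustration only; they are not taken from any of the cited results.

```python
def error_rate(predictions, labels):
    """Return the percentage of predictions that differ from the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return 100.0 * wrong / len(labels)


# Toy data: 10,000 labels (same size as the MNIST test set) with exactly 17 mistakes.
labels = [i % 10 for i in range(10_000)]
predictions = list(labels)
for i in range(17):
    predictions[i] = (labels[i] + 1) % 10  # force a wrong digit

rate = error_rate(predictions, labels)
print(f"{rate:.2f}%")
```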
# Load MNIST with TensorFlow (Keras)
import tensorflow as tf

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Load MNIST with PyTorch (torchvision)
import torch
import torchvision
from torchvision import transforms, datasets

train = datasets.MNIST('', train=True, download=True,
                       transform=transforms.Compose([transforms.ToTensor()]))
test = datasets.MNIST('', train=False, download=True,
                      transform=transforms.Compose([transforms.ToTensor()]))
Binarized MNIST was developed by Ruslan Salakhutdinov and Iain Murray in 2008 as a binarized version of the original MNIST dataset. Binarization is done by sampling each pixel from a binomial distribution (a single trial per pixel) defined by the pixel value. The dataset was originally used with deep belief networks (DBN) and variational autoencoders (VAE). The images are single-channel, 28×28 pixels.
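The sampling step described above can be sketched as follows. This is a minimal illustration, assuming that a pixel's intensity divided by 255 is used as the probability of it becoming 1. The `binarize` helper and the random input are hypothetical, and this is not claimed to reproduce the exact procedure or random draws behind the published dataset.

```python
import numpy as np


def binarize(images, rng):
    """Stochastically binarize uint8 images: each pixel becomes 1 with
    probability equal to its intensity scaled to [0, 1], otherwise 0."""
    probs = images.astype(np.float32) / 255.0
    return (rng.random(images.shape) < probs).astype(np.uint8)


rng = np.random.default_rng(0)
# Hypothetical stand-in for a small batch of grayscale digits.
gray = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
binary = binarize(gray, rng)

print(np.unique(binary))  # only the values 0 and 1 can appear
```

Because the sampling is random, a fully black or fully white pixel always maps to the same value, while mid-gray pixels can come out differently on each draw.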
Download Size – 104 MiB
Data: train set 50000 images, test set 10000 images and validation set 10000 images
It is used to evaluate generative models for images, so, unlike MNIST, it does not provide labels.
# Load binarized MNIST with TensorFlow Datasets
import tensorflow_datasets as tfds

train, test = tfds.load('binarized_mnist', split=['train', 'test'])

Extended MNIST (EMNIST) was introduced in 2017 by Gregory Cohen, Saeed Afshar, Jonathan Tapson, and André van Schaik as an extension of MNIST to handwritten letters. EMNIST is made from the NIST Special Database 19. The original NIST data is converted to a 28×28 pixel image format, and the dataset structure matches that of the MNIST dataset.
Download Size: 535.75MB
The dataset provides six different splits:
- EMNIST ByClass: 814,255 characters with 62 unbalanced classes.
- EMNIST Balanced: 131,600 characters with 47 balanced classes.
- EMNIST Digits: 280,000 characters with 10 balanced classes.
- EMNIST MNIST: 70,000 characters with 10 balanced classes.
- EMNIST Letters: 145,600 characters with 26 balanced classes.
- EMNIST ByMerge: 814,255 characters with 47 unbalanced classes.
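As a quick sanity check on the figures listed above, a balanced split should divide evenly across its classes. The sketch below works only from the counts in the list. The dictionary keys are illustrative labels chosen here, and the per-class numbers are simple arithmetic rather than values read from the dataset files.

```python
# Split sizes and class counts as listed above (balanced splits only).
balanced_splits = {
    "balanced": (131_600, 47),
    "digits": (280_000, 10),
    "mnist": (70_000, 10),
    "letters": (145_600, 26),
}

per_class = {}
for name, (n_chars, n_classes) in balanced_splits.items():
    # A balanced split should leave no remainder when divided by its class count.
    assert n_chars % n_classes == 0
    per_class[name] = n_chars // n_classes

for name, count in per_class.items():
    print(f"{name}: {count} characters per class")
```

The ByClass and ByMerge splits are left out because they are unbalanced, so a single per-class figure would not describe them.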
# Load EMNIST with TensorFlow Datasets
import tensorflow_datasets as tfds

train, test = tfds.load('emnist', split=['train', 'test'])

# Load EMNIST (ByClass split) with PyTorch (torchvision)
import torch
import torchvision
from torchvision import datasets

train = datasets.EMNIST('', train=True, split='byclass', download=True)
test = datasets.EMNIST('', train=False, split='byclass', download=True)
The Kuzushiji-MNIST (KMNIST) dataset was developed by Tarin Clanuwat, Mikel Bober-Irizar, Asanobu Kitamoto, Alex Lamb, Kazuaki Yamamoto and David Ha as part of their work on deep learning for classical Japanese literature. KMNIST is a drop-in replacement for the MNIST dataset (70,000 grayscale images of 28×28 pixels), provided in both the original MNIST format and NumPy format.
Dataset Size: 31.76 MiB
Download Size: 300MB
Data: train set 60000 images, test set 10000 images
# Load KMNIST with TensorFlow Datasets
import tensorflow_datasets as tfds

train, test = tfds.load('kmnist', split=['train', 'test'])

# Load KMNIST with PyTorch (torchvision)
import torch
import torchvision
from torchvision import datasets

train = datasets.KMNIST('', train=True, download=True)
test = datasets.KMNIST('', train=False, download=True)
QMNIST was developed by Facebook AI Research. The original MNIST test set consisted of only 10000 images, which was not enough; QMNIST was built to provide more test data. It was reconstructed from NIST Special Database 19, keeping the pre-processing as close as possible to that of MNIST and using the Hungarian algorithm to match the reconstructed digits to the MNIST digits. After several iterations and improvements, 50000 additional MNIST-like test digits were generated.
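The matching idea behind the Hungarian algorithm can be sketched on toy data. This is only an illustration under stated assumptions. The images are random stand-ins, squared Euclidean distance is an assumed cost, and SciPy's `linear_sum_assignment` is used as a solver for the same assignment problem the Hungarian algorithm addresses. It is not the QMNIST authors' actual pipeline.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 5 "original" images and shuffled, slightly noisy
# "reconstructions" of them, each flattened to 784 values.
originals = rng.random((5, 28 * 28))
perm = rng.permutation(5)
reconstructed = originals[perm] + 0.01 * rng.standard_normal((5, 28 * 28))

# cost[i, j] = squared distance between reconstruction i and original j.
diff = reconstructed[:, None, :] - originals[None, :, :]
cost = (diff ** 2).sum(axis=2)

# Solve the assignment problem: pair each reconstruction with exactly one
# original so that the total cost is minimal.
rows, cols = linear_sum_assignment(cost)

for r, c in zip(rows, cols):
    print(f"reconstruction {r} -> original {c}")
```

Unlike a greedy nearest-neighbour match, the assignment is one-to-one, so no original image can be claimed by two reconstructions.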
# Load QMNIST with PyTorch (torchvision)
import torch
import torchvision
from torchvision import datasets

train = datasets.QMNIST('', train=True, download=True)
test = datasets.QMNIST('', train=False, download=True)

3D MNIST is a 3D version of the original MNIST images. It includes 5000 training, 1000 validation and 1000 testing point clouds, stored in the HDF5 file format. It was introduced as a starting point for 3D computer vision problems such as 3D shape recognition.
To generate 3D MNIST you can refer to this notebook.