Comprehensive Guide To 9 Most Important Image Datasets For Data Scientists

In this article, we will discuss the various image datasets that are readily available for training machine learning models.

Vision data is one of the most widely used forms of data around us. Almost every industry, from fashion and streaming platforms to medicine, law and finance, uses it for a range of use cases, and social media is one of the biggest examples. AI has transformed what can be done with image data. Machine learning and deep learning models train well when the data is diverse, which makes these algorithms data-hungry. This created a need for better datasets, including ones that address the biases present in these algorithms.

Computer vision is the field in which computers work with digital images represented as pixel values. In other words, the goal is for computers to understand images and videos the way humans do. It covers processing, analyzing, transforming and extracting features from images, among many other operations. Earlier image processing techniques had drawbacks because their hand-crafted features failed to capture high-level information accurately. Deep learning algorithms have largely overcome these problems and have proven much more reliable. Today they are used in almost every kind of vision task, such as object detection, object tracking, image classification, image segmentation and localization, 3D pose estimation and video matting, to name a few.

GANs (generative adversarial networks) are now taking image datasets further. They can increase the size of a dataset by adding synthetic data, and that synthetic data can closely imitate real-world data; deepfakes are one example. GANs have gained a lot of attention in recent years, and much current research and development revolves around them.


MNIST

MNIST is a dataset of handwritten digits. It was released in 1998 by Yann LeCun and other researchers, built from digits in NIST's handwriting databases, and became one of the first widely used benchmarks of its kind. It is a basic dataset for beginners starting deep learning with computer vision, and simple ConvNet architectures handle it easily because it is already preprocessed: 70,000 grayscale images (60,000 in the training set and 10,000 in the test set), each 28×28 pixels and labeled with a digit from 0 to 9.
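
Library loaders hide the file format, but it helps to see how the split and the 28×28 shape are actually stored. The original MNIST files are gzip-compressed IDX files: a 4-byte magic number (two zero bytes, a data-type code and the number of dimensions), then one big-endian 32-bit size per dimension, then the raw bytes. Below is a minimal standard-library sketch of a reader for the unsigned-byte case. `read_idx` and `images_from_idx` are our own illustrative names, and the demo uses a synthetic buffer instead of the real files.

```python
import struct


def read_idx(buf: bytes):
    """Parse an unsigned-byte IDX buffer into (dimension sizes, flat values)."""
    # Magic number: two zero bytes, a data-type code, and the number of dimensions.
    zero1, zero2, dtype, ndim = struct.unpack(">BBBB", buf[:4])
    if zero1 != 0 or zero2 != 0:
        raise ValueError("not an IDX buffer")
    if dtype != 0x08:  # 0x08 = unsigned byte, the type MNIST uses
        raise ValueError("this sketch only handles unsigned-byte data")
    # One big-endian 32-bit size per dimension.
    dims = list(struct.unpack(">" + "I" * ndim, buf[4:4 + 4 * ndim]))
    count = 1
    for d in dims:
        count *= d
    offset = 4 + 4 * ndim
    values = list(buf[offset:offset + count])
    if len(values) != count:
        raise ValueError("buffer is shorter than its header declares")
    return dims, values


def images_from_idx(buf: bytes):
    """Split a 3-D IDX buffer (count, rows, cols) into row-major 2-D images."""
    dims, values = read_idx(buf)
    n, rows, cols = dims
    size = rows * cols
    return [
        [values[i * size + r * cols:i * size + (r + 1) * cols] for r in range(rows)]
        for i in range(n)
    ]


# Demo on a synthetic buffer holding 2 images of 28x28 pixels.
header = struct.pack(">BBBBIII", 0, 0, 0x08, 3, 2, 28, 28)
pixels = bytes(i % 256 for i in range(2 * 28 * 28))
images = images_from_idx(header + pixels)
print(len(images), len(images[0]), len(images[0][0]))  # 2 28 28
```

To read a downloaded file, pass the function `gzip.open(path, "rb").read()`. On the real training-images file the header decodes to dimensions `[60000, 28, 28]`, and the matching labels file (one dimension) to `[60000]`.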

Over the years, several variants of MNIST have been released, namely binarized MNIST, KMNIST, EMNIST, QMNIST and 3D MNIST. Binarized MNIST contains binarized (black-and-white) versions of the original digits. EMNIST, or Extended MNIST, extends the original with more data, adding handwritten letters as well as digits. KMNIST (Kuzushiji-MNIST) contains cursive Japanese Kuzushiji characters and is designed as a drop-in replacement for the original MNIST; it is distributed in the original MNIST format as well as in NumPy format. QMNIST, developed by Facebook AI Research, reconstructs the MNIST test set and adds 50,000 test images to it. 3D MNIST, as the name suggests, contains 3-dimensional representations of digits and is smaller than MNIST. All of these datasets are open source and readily available for training ML models, and ready-made loaders exist for many of them in TensorFlow and PyTorch.
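
Binarized MNIST replaces each grayscale intensity (0 to 255) with 0 or 1. The published versions were produced with specific procedures that this sketch does not try to reproduce; it only shows two common ways to binarize: a fixed threshold (the value 128 is an arbitrary assumption) and stochastic sampling, where a pixel becomes 1 with probability proportional to its intensity. The function names are our own.

```python
import random


def binarize_threshold(pixels, threshold=128):
    """Deterministic binarization: 1 where a 0-255 intensity is >= threshold, else 0."""
    return [1 if p >= threshold else 0 for p in pixels]


def binarize_stochastic(pixels, seed=0):
    """Stochastic binarization: each pixel becomes 1 with probability intensity / 255."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    return [1 if rng.random() < p / 255 else 0 for p in pixels]


row = [0, 64, 127, 128, 200, 255]
print(binarize_threshold(row))  # [0, 0, 0, 1, 1, 1]
print(binarize_stochastic(row))
```

Thresholding always gives the same image, while stochastic binarization treats each intensity as a probability, so fully black pixels always become 0 and fully white pixels always become 1.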

For implementation and other information -> 6 MNIST Image Datasets 

Fashion MNIST

MNIST had become too easy to reveal much about modern deep learning algorithms for computer vision, so Zalando Research released Fashion-MNIST in 2017 as a more challenging drop-in replacement. As the name suggests, it contains ten categories of apparel, namely T-shirt/top, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag and ankle boot, with class labels 0 to 9 as in MNIST. All of its 70,000 images (60,000 training and 10,000 test) are grayscale, 28×28 pixels each. Fashion-MNIST has since become a standard deep learning benchmark, and ready-made loaders make it easy to use for model training. It has also been used to train GANs that generate images of new apparel designs.
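
Because Fashion-MNIST keeps MNIST's image size and label range, code written for MNIST generally works unchanged; only the meaning of the labels differs. A small sketch of turning predicted label indices into class names, using the label order of the official Fashion-MNIST release (the constant and helper names are our own):

```python
# Class names in label order 0-9.
FASHION_MNIST_CLASSES = (
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
)


def decode_labels(labels):
    """Map integer labels 0-9 to Fashion-MNIST class names."""
    names = []
    for label in labels:
        if not 0 <= label < len(FASHION_MNIST_CLASSES):
            raise ValueError(f"label out of range: {label}")
        names.append(FASHION_MNIST_CLASSES[label])
    return names


print(decode_labels([9, 0, 7]))  # ['Ankle boot', 'T-shirt/top', 'Sneaker']
```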

For implementation and other information -> Fashion MNIST

Medical MNIST

Following the MNIST structure, many other datasets have been released for different purposes. With neural networks finding relevance in every field, medical science has many problems still to be covered and addressed, and bioinformatics and medical data science are now active research areas that have produced results on problems left unsolved for years. Several medical MNIST-style datasets have appeared over the years; MedMNIST, released in 2020, is one of the most recent benchmarks among them. It is a collection of 10 open-source medical datasets, namely PathMNIST, ChestMNIST, DermaMNIST, OCTMNIST, PneumoniaMNIST, RetinaMNIST, BreastMNIST and OrganMNIST (axial, coronal and sagittal). Its authors benchmarked these datasets with machine learning models and AutoML tools.

Other medical MNIST-style datasets include Medical MNIST, Skin Cancer MNIST and colorectal histology MNIST. Medical MNIST consists of 6 classes: ChestCT, BreastMRI, CXR, Hand, HeadCT and AbdomenCT. Colorectal histology MNIST is a multiclass texture-analysis dataset with 8 classes of tissue. Skin Cancer MNIST contains 7 classes: melanocytic nevi, melanoma, benign keratosis-like lesions, basal cell carcinoma, actinic keratoses, vascular lesions and dermatofibroma. Various libraries have been built around these datasets, so they can readily be used for medical research projects.

For implementation and other information -> Medical MNIST

Sign Language MNIST

Sign Language MNIST was released to help people with hearing or speech impairments communicate through hand gestures. It follows the structure of the original MNIST, with 28×28 grayscale images. There are 24 classes covering the letters A to Z except J and Z, which require motion. The dataset is distributed in CSV format, with a label and the pixel values on each row, and was developed from an American Sign Language letter database.
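
A minimal standard-library sketch of reading that CSV, assuming (as in the commonly distributed Kaggle version) a header row followed by rows of the form `label,pixel1,...,pixel784`, with labels 0 to 25 indexing the letters A to Z so that 9 (J) and 25 (Z) never appear. The function names are illustrative, and the demo builds two synthetic rows rather than reading the real file.

```python
import csv
import io
import string

SIDE = 28  # Sign Language MNIST images are 28x28 grayscale


def label_to_letter(label):
    """Labels index the alphabet (0 = A, 1 = B, ...); J (9) and Z (25) do not occur."""
    if label in (9, 25) or not 0 <= label < 26:
        raise ValueError(f"unexpected label: {label}")
    return string.ascii_uppercase[label]


def load_sign_mnist_csv(text):
    """Parse CSV text: a header row, then one label plus 784 pixel values per row."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    samples = []
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        label, pixels = int(row[0]), [int(v) for v in row[1:]]
        if len(pixels) != SIDE * SIDE:
            raise ValueError("expected 784 pixel values per row")
        image = [pixels[r * SIDE:(r + 1) * SIDE] for r in range(SIDE)]
        samples.append((label_to_letter(label), image))
    return samples


# Demo with two synthetic rows instead of the real CSV file.
header = "label," + ",".join(f"pixel{i}" for i in range(1, SIDE * SIDE + 1))
rows = [
    "0," + ",".join("0" for _ in range(SIDE * SIDE)),
    "24," + ",".join(str(i % 256) for i in range(SIDE * SIDE)),
]
samples = load_sign_mnist_csv("\n".join([header] + rows) + "\n")
print([letter for letter, _ in samples])  # ['A', 'Y']
```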

For implementation and other information -> Sign Language MNIST

Open Images

Google's Open Images is a huge open-source vision dataset that serves many purposes. Along with roughly 9 million images, it contains image-level labels, object bounding boxes for object detection, relationships between objects, segmentation masks and, most recently, localized narratives. It has gone through six versions, with V6 the current release at the time of writing, and it can be downloaded from the official Open Images website. The images come from Flickr, and the annotations were produced by a mix of automated labeling and human work, including crowdsourced contributions and verification by professional annotators.
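
Open Images bounding boxes are distributed as CSV files whose coordinates are normalized to the range 0 to 1 relative to the image width and height, so they must be scaled before drawing or cropping. The sketch below assumes the columns `ImageID`, `LabelName`, `XMin`, `XMax`, `YMin` and `YMax` (the real files contain additional columns, so check the official format description before relying on these names); `Box`, `read_boxes` and `to_pixels` are hypothetical helpers, and the label value in the demo is a placeholder.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Box:
    label: str
    x_min: float  # all coordinates normalized to [0, 1]
    x_max: float
    y_min: float
    y_max: float


def read_boxes(text):
    """Group boxes by image ID from CSV text (assumed column names, see above)."""
    boxes = {}
    for row in csv.DictReader(io.StringIO(text)):
        box = Box(row["LabelName"], float(row["XMin"]), float(row["XMax"]),
                  float(row["YMin"]), float(row["YMax"]))
        boxes.setdefault(row["ImageID"], []).append(box)
    return boxes


def to_pixels(box, width, height):
    """Scale a normalized box to (left, top, right, bottom) pixel coordinates."""
    return (round(box.x_min * width), round(box.y_min * height),
            round(box.x_max * width), round(box.y_max * height))


sample = "ImageID,LabelName,XMin,XMax,YMin,YMax\nimg1,/m/hypothetical,0.25,0.75,0.1,0.5\n"
box = read_boxes(sample)["img1"][0]
print(to_pixels(box, width=640, height=480))  # (160, 48, 480, 240)
```

Keeping coordinates normalized in storage means the same annotation works at any image resolution; conversion to pixels happens only at the point of use. `LabelName` holds a machine ID rather than a readable class name, and the dataset provides a separate class-descriptions file for that mapping.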

For implementation and other information -> Open Images

ImageNet

ImageNet is one of the greatest achievements in computer vision. With over 14 million images spread across more than 20,000 categories, it is one of the largest publicly available labeled image datasets. From 2010 to 2017, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) was held every year on a 1,000-class subset of the dataset, with different algorithms and models competing to win it. Error rates fell year after year, and remarkably, the best models eventually surpassed the estimated human error rate on the classification task. ImageNet was started by Fei-Fei Li and later enhanced by many other researchers. After the ILSVRC 2012 version (Imagenet2012), many variants and derived datasets appeared, namely Imagenet2012_real, Imagenet2012_subset, Mini Imagenet, Imagenet_A & Imagenet_O, Imagenet_R and Imagenet_resized. These datasets were released along with research papers explaining their relevance, and most of them have ready-made loaders for direct use in model training.
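
ILSVRC classification results, including the comparisons with human performance, were usually reported as top-5 error: a prediction counts as correct if the true class is among the model's five highest-scoring classes. A small pure-Python sketch of the metric (our own helper, not an official evaluation script):

```python
def top_k_error(scores, targets, k=5):
    """Fraction of examples whose true class is not among the k highest-scoring classes."""
    if not targets or len(scores) != len(targets):
        raise ValueError("need one score row per target, and at least one target")
    misses = 0
    for row, target in zip(scores, targets):
        # Class indices sorted by descending score; ties keep index order (stable sort).
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if target not in ranked[:k]:
            misses += 1
    return misses / len(targets)


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
targets = [2, 2]
print(top_k_error(scores, targets, k=1))  # 1.0
print(top_k_error(scores, targets, k=2))  # 0.5
```

With 1,000 classes, top-5 error is more forgiving than top-1 error, which matters for images that plausibly contain several labeled objects.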

For implementation and other information -> Imagenet

CIFAR 10 & 100

CIFAR-10 and CIFAR-100 are labeled subsets of the 80 Million Tiny Images dataset. CIFAR-10 contains 60,000 images in 10 object classes, namely aeroplane, car, bird, cat, deer, dog, frog, horse, ship and truck, split into 50,000 training and 10,000 test images. The images are 32×32-pixel RGB. CIFAR-100 is an extension of CIFAR-10 with the same number and size of images: it contains 100 object classes grouped into 20 superclasses: aquatic mammals, fish, large omnivores and herbivores, medium-sized mammals, flowers, food containers, household electrical devices, fruit and vegetables, household furniture, insects, large carnivores, large man-made outdoor things, large natural outdoor scenes, non-insect invertebrates, people, reptiles, trees, small mammals, vehicles 1 and vehicles 2. Both datasets have implementations in deep learning libraries.
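
If you download the "python version" of CIFAR directly instead of using a library loader, each batch file unpickles (with `pickle.load(f, encoding="bytes")`) into a dict whose `b"data"` entry holds one 3072-value row per image, plus `b"labels"` for CIFAR-10 or `b"fine_labels"` and `b"coarse_labels"` (class and superclass) for CIFAR-100. Each row stores the 1024 red values, then the green, then the blue, each plane in row-major order. A minimal sketch of turning one such row into a pixel grid, shown on a synthetic row (`row_to_hwc` is our own name):

```python
SIDE = 32
PLANE = SIDE * SIDE  # 1024 values per colour channel


def row_to_hwc(row):
    """Convert one 3072-value CIFAR row into image[r][c] == (red, green, blue)."""
    if len(row) != 3 * PLANE:
        raise ValueError("expected 3072 values (3 channels x 32 x 32)")
    return [
        [tuple(row[ch * PLANE + r * SIDE + c] for ch in range(3)) for c in range(SIDE)]
        for r in range(SIDE)
    ]


# Synthetic row: red plane all 10, green plane all 20, blue plane all 30.
row = [10] * PLANE + [20] * PLANE + [30] * PLANE
image = row_to_hwc(row)
print(len(image), len(image[0]), image[5][7])  # 32 32 (10, 20, 30)
```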

For implementation and other information -> CIFAR10 & CIFAR100

STL 10

The STL-10 dataset was inspired by CIFAR-10 and is designed for unsupervised and self-taught learning. It is divided into 10 classes: aeroplane, bird, car, cat, deer, dog, horse, monkey, ship and truck. Images are 96×96-pixel RGB. It contains 13,000 labeled images, divided into 5,000 training and 8,000 test images, plus 100,000 unlabeled images for unsupervised learning. It has implementations in the deep learning libraries TensorFlow and PyTorch.
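
One practical detail when reading the STL-10 binary files directly: according to the dataset's reference reading code, each image is stored as three 96×96 channel planes in column-major order, so decoding it row-major (as one would for CIFAR) gives transposed images. The sketch below assumes that layout; `stl10_to_hwc` is our own illustrative helper, and the demo uses synthetic values.

```python
SIDE = 96
PLANE = SIDE * SIDE  # 9216 values per colour channel


def stl10_to_hwc(flat):
    """Convert one STL-10 image (3 channel planes, each column-major) to image[r][c] == (R, G, B)."""
    if len(flat) != 3 * PLANE:
        raise ValueError("expected 27648 values (3 channels x 96 x 96)")
    # Column-major: consecutive values walk down a column, so the index is col * SIDE + row.
    return [
        [tuple(flat[ch * PLANE + c * SIDE + r] for ch in range(3)) for c in range(SIDE)]
        for r in range(SIDE)
    ]


flat = list(range(3 * PLANE))
image = stl10_to_hwc(flat)
print(image[1][0], image[0][1])  # (1, 9217, 18433) (96, 9312, 18528)
```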

For implementation and other information -> STL10

Caltech

Caltech consists of 4 different datasets: Caltech 101 (101 categories of everyday objects such as fans, cars, boats and lamps, plus a background clutter category), Caltech 256 (an extension of Caltech 101 with 256 categories and a larger background clutter category for testing), Caltech Birds 2010 (200 bird species) and Caltech Birds 2011 (an extended version of Caltech Birds 2010 with roughly twice as many images). Several of these datasets come with annotations such as bounding boxes and other information, and they have implementations in deep learning libraries.
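
Caltech 101 does not come with a single official train/test split. A widely used protocol (our summary, not part of the dataset itself) trains on a fixed number of images per class, commonly 15 or 30, and tests on the remainder. A minimal sketch of such a per-class split; `per_class_split` and the file names in the demo are hypothetical.

```python
import random
from collections import defaultdict


def per_class_split(samples, n_train=30, seed=0):
    """Split (item, label) pairs so each class gets up to n_train training items; the rest are test."""
    by_class = defaultdict(list)
    for item, label in samples:
        by_class[label].append(item)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_class):  # deterministic class order
        items = list(by_class[label])
        rng.shuffle(items)
        train += [(x, label) for x in items[:n_train]]
        test += [(x, label) for x in items[n_train:]]
    return train, test


# Hypothetical file names: 40 "lamp" images and 10 "ketch" images.
samples = ([(f"lamp_{i}.jpg", "lamp") for i in range(40)]
           + [(f"ketch_{i}.jpg", "ketch") for i in range(10)])
train, test = per_class_split(samples, n_train=30)
print(len(train), len(test))  # 40 10
```

A class with fewer than `n_train` images ends up entirely in the training set, so check the smallest class sizes before choosing `n_train`.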

For implementation and other information -> Caltech

Jayita Bhattacharyya
Machine learning and data science enthusiast. Eager to learn new technology advances. A self-taught techie who loves to do cool stuff using technology for fun and worthwhile.
