Guide To MNIST Datasets For Fashion And Medical Applications

Jayita Bhattacharyya

MNIST, the famous dataset of handwritten digits, is where most people get started with computer vision in deep learning, and it remains one of the best-known benchmark datasets across deep learning applications. Taking a step forward, many institutions and researchers have collaborated to create MNIST-like datasets with other kinds of data, such as fashion items, medical images, sign language, colorectal cancer histology and skin cancer.

MNIST alone was not enough to tackle all kinds of computer vision problems. It is so well pre-processed that beginners cannot learn much from it. Even a simple ConvNet architecture can reach more than 90% accuracy, since many pairs of MNIST digits can be told apart by a single pixel value. As a result, many other deep learning algorithms were not put to good use on it. So it was time to move ahead and generate more use cases, and many drop-in replacements for MNIST were created to serve data science practitioners better.

Continuing our dataset discussion, today we’ll be talking about those datasets, which have proven to be very handy for data science practitioners.

FASHION MNIST



Fashion MNIST was developed in 2017 by Kashif Rasul, Han Xiao, and Roland Vollgraf at Zalando Research, from images of Zalando's fashion articles. The images are 28×28 grayscale. The dataset contains 70000 images, of which 60000 are training images and 10000 are testing images. There are 10 classes, labelled from 0 to 9: 0 – T-shirt/top, 1 – Trouser, 2 – Pullover, 3 – Dress, 4 – Coat, 5 – Sandal, 6 – Shirt, 7 – Sneaker, 8 – Bag, 9 – Ankle boot.
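Loaders return only the integer labels, so a small lookup is handy when inspecting predictions. The following is a minimal sketch of the label mapping listed above; the names FASHION_MNIST_CLASSES and label_name are illustrative and not part of any library.

```python
# Class names in label order (list index == label), as listed in the text.
FASHION_MNIST_CLASSES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def label_name(label: int) -> str:
    """Return the class name for an integer label in the range 0-9."""
    if not 0 <= label < len(FASHION_MNIST_CLASSES):
        raise ValueError(f"label must be in 0-9, got {label}")
    return FASHION_MNIST_CLASSES[label]


print(label_name(0))  # T-shirt/top
print(label_name(9))  # Ankle boot
```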

Dataset size: 36.42 MiB

Fashion MNIST was built because there are many modern computer vision problems that MNIST cannot address.

Code Snippet

Using TensorFlow


import tensorflow_datasets as tfds
# Load the predefined training (60000 images) and testing (10000 images) splits.
train, test = tfds.load('fashion_mnist', split=['train', 'test'])

Using PyTorch

import torch
import torchvision
from torchvision import transforms, datasets

# Download the training split; ToTensor converts each image to a float tensor in [0, 1].
train = datasets.FashionMNIST('', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor()
                       ]))
# Download the testing split with the same transform.
test = datasets.FashionMNIST('', train=False, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor()
                       ]))

You can visit this website to check various performance measures of Fashion MNIST.

MedMNIST

MedMNIST is one of the more recent datasets, developed in 2020 by Jiancheng Yang, Rui Shi, Bingbing Ni and Bilian Ke. It is a collection of 10 open medical image datasets. All images are 28 x 28 pixels, which makes them usable with almost any machine learning algorithm, as well as with AutoML tools, for medical image analysis and classification. The ten datasets are PathMNIST, ChestMNIST, DermaMNIST, OCTMNIST, PneumoniaMNIST, RetinaMNIST, BreastMNIST and three OrganMNIST variants (axial, coronal, sagittal). The authors report baselines with ResNet-18 and ResNet-50 models and, for AutoML, with AutoKeras, Auto-sklearn and Google AutoML Vision.

For the entire code by the MedMNIST creators, you can check this GitHub repository.

MEDICAL MNIST

Medical MNIST was developed in 2017 by Arturo Polanco Lozano. It is also known as the MedNIST dataset for radiology and medical imaging. The images have been gathered from several sources: datasets at TCIA, the RSNA Bone Age Challenge, and the NIH Chest X-ray dataset.

The dataset contains 58954 medical images belonging to 6 classes: ChestCT (10000 images), BreastMRI (8954 images), CXR (10000 images), Hand (10000 images), HeadCT (10000 images) and AbdomenCT (10000 images). The images are 64×64 pixels.
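The per-class counts above add up to the stated total, and only BreastMRI is slightly under-represented. As a hedged illustration, the sketch below checks the total and computes inverse-frequency class weights. The weighting formula is a common heuristic chosen here for illustration, not something the dataset authors prescribe, and the names are hypothetical.

```python
# Image counts per class, copied from the paragraph above.
MEDICAL_MNIST_COUNTS = {
    "ChestCT": 10000,
    "BreastMRI": 8954,
    "CXR": 10000,
    "Hand": 10000,
    "HeadCT": 10000,
    "AbdomenCT": 10000,
}


def class_weights(counts):
    """Inverse-frequency weights: total / (number_of_classes * class_count)."""
    n_classes = len(counts)
    n_total = sum(counts.values())
    return {name: n_total / (n_classes * n) for name, n in counts.items()}


total = sum(MEDICAL_MNIST_COUNTS.values())
weights = class_weights(MEDICAL_MNIST_COUNTS)

print(total)  # 58954
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

A perfectly balanced dataset would give every class a weight of exactly 1; here BreastMRI ends up slightly above 1 and the other classes slightly below.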

Dataset size: 75.98 MB

For the entire code by the NVIDIA Deep Learning Institute, you can check this notebook.

SIGN LANGUAGE MNIST

Developed in 2017, this dataset of American Sign Language (ASL) hand gestures follows almost the same format as MNIST, with 28×28 grayscale images. It contains 27,455 training cases and 7,172 testing cases to be classified into 24 classes. Each case carries a label from 0 to 25 that maps one-to-one to an alphabetic letter A-Z; there are no cases for 9=J or 25=Z, because those two letters involve gesture motion, so the labels actually present run from A to Y. The dataset is available on Kaggle in CSV format, with each row storing one image as pixel values in columns pixel1 to pixel784.
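To make the CSV layout concrete, here is a minimal sketch that maps a numeric label to its letter and reshapes one row of 784 pixel values into a 28×28 grid. It assumes the label column is named label, and it reads a synthetic one-row CSV built in memory rather than the real Kaggle file; the helper names are illustrative.

```python
import csv
import io
import string

# Labels run 0-25 (A-Z); 9 (J) and 25 (Z) never occur in the data.
EXCLUDED_LABELS = {9, 25}


def label_to_letter(label: int) -> str:
    """Map a numeric label to its letter, rejecting the unused J and Z."""
    if not 0 <= label <= 25 or label in EXCLUDED_LABELS:
        raise ValueError(f"no static gesture for label {label}")
    return string.ascii_uppercase[label]


def parse_row(row: dict):
    """Turn one CSV row into (letter, 28x28 nested list of pixel values)."""
    letter = label_to_letter(int(row["label"]))
    pixels = [int(row[f"pixel{i}"]) for i in range(1, 785)]
    image = [pixels[r * 28:(r + 1) * 28] for r in range(28)]
    return letter, image


# Synthetic one-row CSV standing in for the real file (made-up pixel data).
header = ["label"] + [f"pixel{i}" for i in range(1, 785)]
values = ["2"] + [str(i % 256) for i in range(784)]
fake_csv = io.StringIO(",".join(header) + "\n" + ",".join(values) + "\n")

for row in csv.DictReader(fake_csv):
    letter, image = parse_row(row)
    print(letter, len(image), len(image[0]))  # C 28 28
```

With the real data, the same parse_row function could be applied to rows read from the downloaded CSV files instead of the in-memory stand-in.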

Dataset Size: 100.9 MB

An implementation using the Keras library on this dataset is available in this notebook.

Colorectal Histology MNIST

Developed in 2016 by Jakob Nikolas Kather, Cleo-Aron Weis, Francesco Bianconi, Susanne M. Melchers, Lothar R. Schad, Timo Gaiser, Alexander Marx and Frank Gerrit Zöllner, this dataset supports multiclass texture analysis in colorectal cancer histology, with 8 classes of tissue. There are two sets: ColorectalHistology contains 5000 RGB images of 150 x 150 x 3, and ColorectalHistologyLarge contains 10 large images of 5000 x 5000 pixels, each showing more than one type of tissue.
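Since the large images mix several tissue types, one plausible way to use them is to cut each into 150 x 150 tiles and classify every tile with a model trained on the small images. The sketch below only computes the tile positions. The non-overlapping grid and the choice to drop partial edge tiles are assumptions made for illustration, and tile_origins is a hypothetical helper.

```python
def tile_origins(image_size: int = 5000, tile_size: int = 150):
    """Yield (row, col) top-left corners of non-overlapping full tiles."""
    per_side = image_size // tile_size  # partial tiles at the edges are dropped
    for r in range(per_side):
        for c in range(per_side):
            yield r * tile_size, c * tile_size


origins = list(tile_origins())
print(len(origins))  # 1089, i.e. 33 x 33 tiles per large image
print(origins[-1])   # (4800, 4800)
```

Because 5000 is not a multiple of 150, a strip of 50 pixels along the right and bottom edges is left uncovered by this grid.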

Dataset Size: 1.14 GB

Code Snippet

Using TensorFlow

For ColorectalHistology,

import tensorflow_datasets as tfds
# The TFDS name is snake_case, and this dataset provides only a 'train' split.
train = tfds.load('colorectal_histology', split='train')

For ColorectalHistologyLarge,

import tensorflow_datasets as tfds
# The large variant provides only a 'test' split containing the 10 large images.
test = tfds.load('colorectal_histology_large', split='test')

Skin Cancer MNIST

Created in 2018 from several sources, this dataset contains dermatoscopic images of pigmented lesions. It was developed by Philipp Tschandl, Noel Codella, Veronica Rotemberg, M. Emre Celebi, Aadi Kalloo, Konstantinos Liopyris, Stephen Dusza, David Gutman, Brian Helba, Michael Marchetti, Harald Kittler and Allan Halpern.

The dataset is released as HAM10000 (“Human Against Machine with 10000 training images”). It contains 10015 dermatoscopic images, which serve as a training set for academic machine learning research and are available in the ISIC archive.

It has 7 diagnostic classes of pigmented skin lesions: 1 – Melanocytic nevi, 2 – Melanoma, 3 – Benign keratosis-like lesions, 4 – Basal cell carcinoma, 5 – Actinic keratoses, 6 – Vascular lesions, 7 – Dermatofibroma.
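For training, the seven classes need integer labels. The sketch below assumes the diagnosis is stored as the short codes nv, mel, bkl, bcc, akiec, vasc and df, as in the metadata file distributed with HAM10000; check these against the copy you download. The indices here are zero-based, whereas the list above counts from 1, and the names are illustrative.

```python
# Short diagnosis codes (assumed, see above) mapped to the class names
# listed in the text, in the same order.
HAM10000_CLASSES = {
    "nv": "Melanocytic nevi",
    "mel": "Melanoma",
    "bkl": "Benign keratosis-like lesions",
    "bcc": "Basal cell carcinoma",
    "akiec": "Actinic keratoses",
    "vasc": "Vascular lesions",
    "df": "Dermatofibroma",
}

# Zero-based integer label for each code, following the order above.
CODE_TO_INDEX = {code: i for i, code in enumerate(HAM10000_CLASSES)}


def describe(code: str) -> str:
    """Return 'index: class name' for a diagnosis code."""
    return f"{CODE_TO_INDEX[code]}: {HAM10000_CLASSES[code]}"


print(describe("mel"))  # 1: Melanoma
```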

Dataset Size: 2.7 GB

An implementation of the above can be found in this notebook.


Copyright Analytics India Magazine Pvt Ltd
