Using MONAI Framework For Medical Imaging Research


Medical imaging is used in many applications across the healthcare industry, and deep learning solutions have excelled at many healthcare tasks involving the detection and diagnosis of abnormalities in medical data. In January 2020, Google’s DeepMind AI outperformed radiologists in detecting breast cancer, according to a publication in Nature.

Data management is one of the most critical steps in any deep learning solution. Healthcare data was projected to reach 2,314 exabytes of new data by 2020, and according to IBM, we generate roughly 2.5 quintillion bytes of data every day, a large share of it healthcare and financial information.

Introduction

MONAI is a community-driven, PyTorch-based framework that has been adopted in many healthcare imaging solutions. It integrates with training and modelling workflows in native PyTorch style and provides deep learning solutions for many stages of medical image training and analysis.


A typical end-to-end workflow of MONAI in the medical deep learning area

Source: https://docs.monai.io/en/latest/_images/end_to_end.png

MONAI Dataset Managers

CacheDataset is a multi-threaded dataset manager that accelerates training by caching the intermediate outputs produced before the first randomised transform in the transform chain, so the deterministic transforms are not recomputed every epoch. It can provide up to a 10x training speed-up.

Code: https://docs.monai.io/en/latest/_modules/monai/data/dataset.html#CacheDataset

Documentation: https://docs.monai.io/en/latest/data.html#monai.data.CacheDataset
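
Below is a minimal, illustrative sketch of how CacheDataset is typically used; the file paths and the specific transforms are placeholders, not taken from the article.

 from monai.data import CacheDataset
 from monai.transforms import Compose, LoadImaged, RandFlipd, ToTensord

 # Hypothetical data list of dict items (image paths are placeholders).
 data = [
     {"image": "subject_01_T1.nii.gz", "label": 0},
     {"image": "subject_02_T1.nii.gz", "label": 1},
 ]

 # The deterministic transform (LoadImaged) is computed once and cached;
 # the randomised transform (RandFlipd) is re-applied to the cached output every epoch.
 transforms = Compose([
     LoadImaged(keys="image"),
     RandFlipd(keys="image", prob=0.5, spatial_axis=0),
     ToTensord(keys="image"),
 ])

 train_ds = CacheDataset(data=data, transform=transforms, cache_rate=1.0, num_workers=2)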

Similar to CacheDataset, PersistentDataset caches the intermediate, non-random transform outputs, but persists them to disk so they can be retrieved rapidly between runs. This is especially useful for hyperparameter tuning, or when the dataset is much larger than the available memory.

Code: https://docs.monai.io/en/latest/_modules/monai/data/dataset.html#PersistentDataset

Documentation: https://docs.monai.io/en/latest/data.html#persistentdataset
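
A minimal sketch of PersistentDataset, assuming the same kind of data list as above; the cache directory name is a placeholder.

 from monai.data import PersistentDataset
 from monai.transforms import Compose, LoadImaged, ToTensord

 # Non-random transform outputs are serialised to cache_dir on the first run
 # and loaded from disk on subsequent runs instead of being recomputed.
 train_ds = PersistentDataset(
     data=data,  # same list of {"image": path, "label": ...} dicts as above
     transform=Compose([LoadImaged(keys="image"), ToTensord(keys="image")]),
     cache_dir="./persistent_cache",  # placeholder directory
 )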

Similar to the SmartCache mechanism in NVIDIA Clara Train SDK, SmartCacheDataset in MONAI replaces a fixed number of cached items with new ones once each epoch completes during training, so the cache contents change across epochs.

Let’s say we have 5 images, [image1, image2, image3, image4, image5], with cache_num = 4 and replace_rate = 0.25.

cache_num: the number of items to cache. The default is sys.maxsize; the effective value is the minimum of (cache_num, data_length x cache_rate, data_length).

During training, the cached images are replaced at every epoch as below (with replace_rate = 0.25, one of the four cached items is swapped out after each epoch):

 epoch 1: [image1, image2, image3, image4]
 epoch 2: [image2, image3, image4, image5]
 epoch 3: [image3, image4, image5, image1]
 epoch 4: [image4, image5, image1, image2]
 epoch N: [image[N % 5] ...]

Code: https://docs.monai.io/en/latest/_modules/monai/data/dataset.html#SmartCacheDataset

Documentation: https://docs.monai.io/en/latest/data.html#smartcachedataset
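
A minimal sketch of SmartCacheDataset using the cache settings from the example above; the data list is assumed to be the five-image list, and the explicit start/update/shutdown calls are only needed when driving the cache from a manual training loop.

 from monai.data import SmartCacheDataset
 from monai.transforms import Compose, LoadImaged, ToTensord

 # With cache_num=4 and replace_rate=0.25, one cached item is replaced after each epoch.
 train_ds = SmartCacheDataset(
     data=data,  # e.g. the five-image list from the example above
     transform=Compose([LoadImaged(keys="image"), ToTensord(keys="image")]),
     cache_num=4,
     replace_rate=0.25,
 )

 train_ds.start()          # launch the background replacement workers
 # ... train one epoch on train_ds ...
 train_ds.update_cache()   # swap in the replacement items for the next epoch
 train_ds.shutdown()       # stop the replacement workers when training is done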

ZipDataset zips multiple PyTorch datasets together and returns their corresponding items as a tuple, which makes it possible to apply complex training transforms across different data sources.

Documentation: https://docs.monai.io/en/latest/data.html#zipdataset

Code: https://docs.monai.io/en/latest/_modules/monai/data/dataset.html#ZipDataset
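
A minimal sketch of ZipDataset with two toy datasets; the Numbers class below is only an illustrative stand-in for real datasets.

 import torch
 from monai.data import ZipDataset

 # A toy PyTorch dataset used only for illustration.
 class Numbers(torch.utils.data.Dataset):
     def __init__(self, values):
         self.values = values
     def __len__(self):
         return len(self.values)
     def __getitem__(self, index):
         return self.values[index]

 # ZipDataset indexes both datasets together and returns the items as one tuple.
 zipped = ZipDataset([Numbers([1, 2, 3]), Numbers([4, 5, 6])])
 print(zipped[0])  # (1, 4)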

You can experiment with the different dataset managers described above here.

Next, let’s look at how to load a public dataset in MONAI. Datasets such as MedNISTDataset and DecathlonDataset can be loaded directly using the MONAI APIs.

from monai.apps import DecathlonDataset, MedNISTDataset

Source: https://docs.monai.io/en/latest/_images/dataset_progress.png
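
As a quick, illustrative sketch, MedNISTDataset can be created in a couple of lines; the root directory below is a temporary placeholder, and download=True fetches and extracts the archive on first use.

 import tempfile
 from monai.apps import MedNISTDataset

 # Placeholder root directory; use a persistent directory to avoid re-downloading.
 root_dir = tempfile.mkdtemp()
 # With the default settings the training split is cached in memory after download.
 train_ds = MedNISTDataset(root_dir=root_dir, section="training", download=True)
 print(len(train_ds), train_ds[0]["label"])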

We can refer to MedNISTDataset or DecathlonDataset as templates to easily create a new Dataset class for other public data.

The main steps are:

  • Use MONAI’s caching mechanism, CacheDataset (e.g. by inheriting from it), to accelerate the training process.
  • Check whether the dataset provides a public access key for sharing and use.
  • Use monai.apps.download_and_extract to download and extract the dataset (see the sketch after this list).
  • Define how to split the dataset into training, validation, and test sets.
  • Create the data list with dict items:

[
{'image': image1_path, 'label': label1_path},
{'image': image2_path, 'label': label2_path},
{'image': image3_path, 'label': label3_path},
... ...
]
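
And a minimal sketch of the download-and-extract step referenced above; the URL and local paths are placeholders, not a real dataset.

 from monai.apps import download_and_extract

 resource = "https://example.com/some_public_dataset.tar"  # placeholder URL
 compressed_file = "./some_public_dataset.tar"
 data_dir = "./some_public_dataset"

 # Downloads the archive (verifying hash_val if provided) and extracts it into data_dir.
 download_and_extract(url=resource, filepath=compressed_file, output_dir=data_dir, hash_val=None)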

Installing MONAI Packages in Python

pip install monai[all]

The example below is taken from the MONAI tutorials on GitHub.

Import the required Python modules and MONAI APIs

 import os
 import sys
 import tempfile
 import matplotlib.pyplot as plt
 from monai.apps import download_and_extract
 from monai.config import print_config
 from monai.data import CacheDataset
 from monai.transforms import (
     Compose,
     LoadImaged,
     Randomizable,
     ToTensord,
 )
 print_config() 

Defining the public dataset and the required transformations

For this example, we are going to use the IXI dataset.

Download size: approximately 4.5 GB.

 class IXIDataset(Randomizable, CacheDataset):
     resource = "http://biomedic.doc.ic.ac.uk/brain-development/downloads/IXI/IXI-T1.tar"
     md5 = "34901a0593b41dd19c1a1f746eac2d58"
     def __init__(
         self,
         root_dir,
         section,
         transform,
         download=False,
         seed=0,
         val_frac=0.2,
         test_frac=0.2,
         cache_num=sys.maxsize,
         cache_rate=1.0,
         num_workers=0,
     ):
         if not os.path.isdir(root_dir):
             raise ValueError("Root directory root_dir must be a directory.")
         self.section = section
         self.val_frac = val_frac
         self.test_frac = test_frac
         self.set_random_state(seed=seed)
         dataset_dir = os.path.join(root_dir, "ixi")
         tarfile_name = f"{dataset_dir}.tar"
         if download:
             download_and_extract(self.resource, tarfile_name, dataset_dir, self.md5) 

         # load a sample of 10 images from the extracted dataset

         self.datalist = [
             {"image": os.path.join(dataset_dir, "IXI314-IOP-0889-T1.nii.gz"), "label": 0},
             {"image": os.path.join(dataset_dir, "IXI249-Guys-1072-T1.nii.gz"), "label": 0},
             {"image": os.path.join(dataset_dir, "IXI609-HH-2600-T1.nii.gz"), "label": 0},
             {"image": os.path.join(dataset_dir, "IXI173-HH-1590-T1.nii.gz"), "label": 1},
             {"image": os.path.join(dataset_dir, "IXI020-Guys-0700-T1.nii.gz"), "label": 0},
             {"image": os.path.join(dataset_dir, "IXI342-Guys-0909-T1.nii.gz"), "label": 0},
             {"image": os.path.join(dataset_dir, "IXI134-Guys-0780-T1.nii.gz"), "label": 0},
             {"image": os.path.join(dataset_dir, "IXI577-HH-2661-T1.nii.gz"), "label": 1},
             {"image": os.path.join(dataset_dir, "IXI066-Guys-0731-T1.nii.gz"), "label": 1},
             {"image": os.path.join(dataset_dir, "IXI130-HH-1528-T1.nii.gz"), "label": 0},
         ]
         data = self._generate_data_list()
         super().__init__(
             data, transform, cache_num=cache_num, cache_rate=cache_rate, num_workers=num_workers,
         )
     def randomize(self, data=None):
         self.rann = self.R.random()
     def _generate_data_list(self):
         data = list()
         for d in self.datalist:
             self.randomize()
             if self.section == "training":
                 if self.rann < self.val_frac + self.test_frac:
                     continue
             elif self.section == "validation":
                 if self.rann >= self.val_frac:
                     continue
             elif self.section == "test":
                 if self.rann < self.val_frac or self.rann >= self.val_frac + self.test_frac:
                     continue
             else:
                 raise ValueError(
                     f"Unsupported section: {self.section}, "
                     "available options are ['training', 'validation', 'test']."
                 )
             data.append(d)
         return data 

Selecting a few images to visualise

 # use MONAI_DATA_DIRECTORY as the dataset root if set, otherwise fall back to a temporary directory
 directory = os.environ.get("MONAI_DATA_DIRECTORY")
 root_dir = directory if directory is not None else tempfile.mkdtemp()

 train_ds = IXIDataset(
    root_dir=root_dir,
    section="training",
    transform=Compose([LoadImaged("image"), ToTensord("image")]),
    download=True,
 )
 plt.figure("check", (18, 6))
 for i in range(3):
    plt.subplot(1, 3, i + 1)
    plt.imshow(train_ds[i]["image"][:, :, 80].detach().cpu(), cmap="gray")
 plt.show() 

Please check the full code here

Conclusion

In this article, we learned about the different dataset managers available in MONAI and how they differ from one another. We also loaded and worked with a public medical imaging dataset using MONAI. Check the full MONAI tutorial here.

Krishna Rastogi
Krishna is currently working as an Associate Director at ADaSci. He has 6+ years of experience in research & development, taking cutting-edge engineering from idea to deployed product. He has expertise in building deep learning computer vision applications using both hardware and software solutions across several domains. His interests lie in distributed learning and Edge AI.
