The New NIH Dataset And AI May Revolutionise Lesion Detection

Artificial intelligence and healthcare have both benefited from their growing overlap. The medical field holds enormous amounts of data that machine learning and AI algorithms can leverage, and AI can become a great asset in improving healthcare services. But medical applications generally need large amounts of data to train intelligent machines, and because large, expert-vetted datasets have been scarce, it was nearly impossible for researchers to build trustworthy detectors. Some of the data resources that did exist were not vetted by medical experts, so using them was risky.

But now that is about to change. The US National Institutes of Health (NIH) Clinical Center has recently created a huge dataset of Computed Tomography (CT) images and made it publicly available. Anyone can download these images without even signing up. This will greatly help the community improve lesion detection accuracy. The scale is what makes the release notable. As NIH puts it, “While most publicly available medical image datasets have less than a thousand lesions, this dataset, named DeepLesion, has over 32,000 annotated lesions identified on CT images.”

The research and the collection process were carried out by Ronald M Summers, MD, PhD, and his associates at the NIH Clinical Center Radiology and Imaging Sciences Department. The research paper, titled DeepLesion: Automated Mining Of Large-scale Lesion Annotations And Universal Lesion Detection With Deep Learning, was published in the Journal of Medical Imaging earlier this year.

The wider economy also stands to benefit. If AI is the engine of growth, then the healthcare AI market is gearing up for a boom as AI technology paves the way for smarter healthcare systems. The most prominent revenue generators among AI applications are virtual assistants, optimisation systems for administrative workflows and robot-assisted surgery.

Many companies are exploring applications of artificial intelligence and machine learning in the healthcare segment – from predicting heart disease to diagnosing illness to monitoring critical care. According to a report, the market for artificial intelligence in healthcare applications is expected to grow at a CAGR of 42 percent till 2021.
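To make the growth figure concrete: a compound annual growth rate (CAGR) compounds multiplicatively, so a market growing at 42 percent a year is multiplied by 1.42 each year. The sketch below is a minimal illustration in Python. The three-year horizon is an assumption chosen for the example, since the report's base year is not given here, and the function names are hypothetical.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Return how many times larger a quantity is after compounding at `cagr` for `years` years."""
    return (1.0 + cagr) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Invert the compounding formula to recover the CAGR from a start and end value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Assumed horizon of three years, for illustration only.
    multiple = growth_multiple(0.42, 3)
    print(f"Growth multiple after 3 years at 42% CAGR: {multiple:.2f}x")
```

Under that assumed horizon, the market would be a little under three times its starting size.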

‘DeepLesion’ By NIH

‘DeepLesion’ is a dataset comprising more than 32,000 CT images with carefully annotated lesions. It gives practitioners in medical AI enough data to build important systems. The dataset comes from 4,400 unique patients and has been annotated by experts.

The procedure according to NIH is as follows:

  1. Once a patient is out of the CT scanner, their images are sent to a radiologist to interpret.
  2. The radiologist then measures and marks important observations with an electronic bookmark tool.
  3. These bookmarks take the form of arrows, lines, diameters, and text that point to the exact location and size of a lesion, so experts can identify growth or new disease.
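The steps above suggest how a bookmark can double as a training label. The sketch below is a minimal illustration, not the NIH pipeline: it assumes each bookmark stores the pixel endpoints of two diameter lines drawn across a lesion, and derives an axis-aligned bounding box by enclosing all four endpoints with a small margin. The `DiameterBookmark` type, the `bookmark_to_box` function, the 5-pixel default padding and the 512 x 512 image size are all hypothetical choices for the example.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass(frozen=True)
class DiameterBookmark:
    """Hypothetical bookmark: two diameter lines, each stored as a pair of endpoints."""
    long_axis: Tuple[Point, Point]
    short_axis: Tuple[Point, Point]


def bookmark_to_box(
    bookmark: DiameterBookmark,
    padding: float = 5.0,                    # assumed margin around the lesion
    image_size: Tuple[int, int] = (512, 512),  # assumed (width, height) of the slice
) -> Box:
    """Enclose all four diameter endpoints in a padded box, clamped to the image."""
    points = [*bookmark.long_axis, *bookmark.short_axis]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    width, height = image_size
    x_min = max(0.0, min(xs) - padding)
    y_min = max(0.0, min(ys) - padding)
    x_max = min(float(width - 1), max(xs) + padding)
    y_max = min(float(height - 1), max(ys) + padding)
    return (x_min, y_min, x_max, y_max)


if __name__ == "__main__":
    bm = DiameterBookmark(
        long_axis=((100.0, 120.0), (140.0, 150.0)),
        short_axis=((110.0, 145.0), (130.0, 125.0)),
    )
    print(bookmark_to_box(bm))  # (95.0, 115.0, 145.0, 155.0)
```

A box derived this way is only as good as the radiologist's measurement, which is one reason expert annotation matters for a dataset like this.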

These bookmarks, rich in clinical observations, are at the core of the DeepLesion dataset. The researchers at the NIH say, “DeepLesion is unlike most lesion medical image datasets currently available, which can only detect one type of lesion. The database has great diversity – it contains all kinds of critical radiology findings from across the body, such as lung nodules, liver tumours, enlarged lymph nodes, and so on.”

According to the NIH researchers, the traditional ways of collecting image annotations cannot be carried over to the medical image domain: medical data has to be annotated by experts and practitioners with extensive clinical experience. The dataset released by NIH, however, is large enough to train a deep learning system. The main aim of releasing it is to enable a universal lesion detector that can operate without constant supervision by medical experts.

The Importance Of Open Source Datasets

Credit for the deep learning revolution is usually given to faster and larger computers and to the availability of larger datasets. But ask researchers and developers in machine learning, and they will point to the real hero of the revolution: open access. The fast, open discovery and dissemination of state-of-the-art knowledge has been central to that progress.

This release will empower researchers. A universal lesion detector, which has been a target of the medical community for many years, now looks achievable. Such a model would help radiologists find all types of lesions, and it could serve as an initial screening tool for patients. After the screening, the results can be sent to other specialist systems trained on specific types of lesions. NIH also states that a system built with the data can be used to mine and study the relationships between different types of lesions. DeepLesion could therefore help to correlate different types of lesions.
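The screen-then-refer workflow described above can be sketched as a simple routing step. This is a hypothetical illustration only: the `Detection` record, the lesion-type labels, the 0.5 score threshold and the specialist names are assumptions made for the example and do not come from NIH. In particular, it assumes the screening model can attach a coarse lesion type to each finding.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Detection:
    """Hypothetical output of a universal screening detector for one finding."""
    lesion_type: str   # assumed coarse label, e.g. "lung nodule"
    score: float       # detector confidence in [0, 1]
    slice_index: int   # CT slice on which the finding was made


# Assumed mapping from lesion type to a downstream specialist system.
SPECIALISTS: Dict[str, str] = {
    "lung nodule": "lung_specialist",
    "liver tumour": "liver_specialist",
    "enlarged lymph node": "lymph_node_specialist",
}


def route_detections(
    detections: List[Detection], threshold: float = 0.5
) -> Dict[str, List[Detection]]:
    """Drop low-confidence findings, then group the rest by specialist system.

    Findings with no matching specialist fall back to manual review.
    """
    routed: Dict[str, List[Detection]] = defaultdict(list)
    for det in detections:
        if det.score < threshold:
            continue  # screened out as unlikely to be a lesion
        target = SPECIALISTS.get(det.lesion_type, "radiologist_review")
        routed[target].append(det)
    return dict(routed)


if __name__ == "__main__":
    findings = [
        Detection("lung nodule", 0.9, 12),
        Detection("liver tumour", 0.3, 40),
        Detection("bone lesion", 0.7, 55),
    ]
    for target, items in route_detections(findings).items():
        print(target, [d.slice_index for d in items])
```

The fallback to manual review reflects a cautious design choice: anything the screening step cannot classify stays with a human reader.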

NIH had also released anonymized chest X-ray images and their corresponding data in 2017. Its statement on DeepLesion underlines its commitment to the project: “In the future, the NIH Clinical Center hopes to keep improving the DeepLesion dataset by collecting more data, thus improving its detection accuracy. The universal lesion detecting capability will become more reliable once researchers are able to leverage 3D and lesion type information. It may be possible to further extend DeepLesion to other image modalities such as MRI and combine data from multiple hospitals, as well.”


Abhijeet Katte
As a thorough data geek, most of Abhijeet's day is spent in building and writing about intelligent systems. He also has deep interests in philosophy, economics and literature.
