How To Overcome The Data Problem In Healthcare


Anup Kumar, senior staff engineer at Stryker, spoke about imaging modalities in orthopaedics, data collection in healthcare, and making the most of the available data during his session at the Machine Learning Developers Summit (MLDS) 2021.

Michigan-headquartered Stryker is one of the world’s leading medical technology companies, with offices across the world, including India. Stryker offers products and services in Orthopaedics, Medical, Surgical, Neurotechnology and Spine-related areas.


Trauma, Fractures & Their Classification

“A trauma situation is a fracture resulting from accidents. Trauma is an emergency situation and not a planned surgery. Hence CT Scans do not come into play. Such patients are directly taken to the operation theatre. Hence modality considerations in such situations have to be in situ,” said Kumar.

Once the fracture has been imaged by X-ray, the next step is to classify it according to the AO format. The AO classification is a system for categorising bone fractures, first introduced in 1987.

To determine the AO class reliably, a few things must be taken into account:

  • Location of the fracture
  • The viewpoint: anterior/posterior (AP) or medial/lateral (ML)
  • Whether any hardware has been inserted

Data Collection

The first step in AO classification is data collection. “We go web scraping across different images and collect samples. There are legal implications for commercial projects because it is very hard to trace the copyrights for these kinds of images. The safest bet for a commercial project is to go to the hospital or build a relationship with the hospital, get the dataset from them with a contract,” said Kumar.

Protected Health Information (PHI) is a legal category covering any information attached to the data that reveals the patient’s identity or the nature of the care given. “As long as we can remove these pieces of information from the images, we are good to go. This is called anonymisation,” said Kumar.
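
As a rough illustration of the idea, the sketch below strips identifying fields from the metadata that accompanies an image. The field names in PHI_KEYS and the sample record are hypothetical and not from the talk. A real project would take the list of fields from its legal review, and would also need to handle any text burned into the image pixels, which this sketch does not touch.

```python
# Hypothetical metadata keys treated as PHI in this sketch; a real project
# would take this list from its legal and compliance review.
PHI_KEYS = {"patient_name", "patient_id", "birth_date", "referring_physician"}


def anonymise(metadata):
    """Return a copy of the metadata with every PHI key removed."""
    return {key: value for key, value in metadata.items() if key not in PHI_KEYS}


record = {
    "patient_name": "Jane Doe",  # made-up example value
    "patient_id": "12345",       # made-up example value
    "view": "AP",
    "modality": "X-ray",
}
clean = anonymise(record)
print(clean)
```

The function returns a new dictionary rather than editing the record in place, so the original stays available inside the hospital's own systems while only the cleaned copy is shared.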

Data Augmentation

Obtaining data is just the tip of the iceberg. The collected data may pose further challenges, such as imbalances in the AO sets and lack of viewpoints and information.
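
A quick way to see such an imbalance is to count the images per class. The sketch below assumes each image already carries an AO label. The class codes and counts are made up for illustration.

```python
from collections import Counter

# Illustrative labels only; the class codes and counts are made up.
labels = ["31-A1", "31-A1", "31-A1", "31-A1", "31-A2", "31-A2", "31-B1"]

counts = Counter(labels)
largest = max(counts.values())
smallest = min(counts.values())
imbalance_ratio = largest / smallest

for ao_class, n in sorted(counts.items()):
    print(f"{ao_class}: {n} images")
print(f"imbalance ratio (largest / smallest): {imbalance_ratio:.1f}")
```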

Kumar said, “Getting information for all the AO classes for the literature is difficult. The next challenge is how to get accuracy with a very small amount of data.”

The solution to this problem is data augmentation, a common strategy used to increase the diversity of the data without having to collect new data. However, Kumar cautioned that even with data augmentation techniques, there might be a few drawbacks such as lack of viewpoint variation (AP and ML), hardware exclusion and inclusion for different AO classes, etc.

Within data augmentation, a technique called Mixup is used to generate weighted combinations of random image pairs from the training data. “Mixup helps in creating data samples during training itself. It helps in making new data samples from the same AO class. However, the challenge is knowing whether you are getting a realistic image that is medically viable,” said Kumar.
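
A minimal sketch of Mixup on two grayscale images, written in plain Python. It assumes both images have the same size and, following the description above, belong to the same AO class, so the label is kept unchanged. In the usual formulation the labels are blended with the same weight as the pixels. The function name, the alpha value, and the tiny sample images are illustrative assumptions.

```python
import random


def mixup(image_a, image_b, alpha=0.4, rng=random):
    """Blend two equally sized grayscale images with a Beta-distributed weight."""
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    mixed = [
        [lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]
    return mixed, lam


# Two tiny made-up "images" assumed to share one AO class, so the label is kept.
xray_a = [[0.0, 0.2], [0.4, 0.6]]
xray_b = [[1.0, 0.8], [0.6, 0.4]]
mixed, lam = mixup(xray_a, xray_b, rng=random.Random(0))
print(lam, mixed)
```

Nothing in this blend checks whether the result is anatomically plausible, which is exactly the concern Kumar raises.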

Data Generation Using GANs

Traditional data generation using GANs has a few shortcomings, including repeated prediction of the same sample (mode collapse), convergence problems, and a missing component number in generated images. GANs must be used carefully for medical images so that the context of the generated image stays under control.

The medical community uses a different approach. “We will take a simulated x-ray. Here we collect CT scans instead of x-rays. CT Scans are converted to simulate x-ray projection classes to create synthetic versions. These are called Digitally Reconstructed Radiographs (DRRs). In this, generally, we use average intensity projection and maximum intensity projection. The advantage with this method is that you can add density, giving you close to real image output,” said Kumar.
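
The two projections Kumar mentions can be sketched as follows. This is a simplified illustration, not the pipeline from the talk: it assumes a CT volume stored as nested lists indexed [depth][row][column] and casts parallel rays along the depth axis, whereas a realistic DRR would model the X-ray source geometry and tissue attenuation. The function name and the sample volume are made up.

```python
def project(volume, mode="average"):
    """Collapse a CT volume indexed [depth][row][col] along depth into a 2D image."""
    depth = len(volume)
    rows = len(volume[0])
    cols = len(volume[0][0])
    image = []
    for r in range(rows):
        line = []
        for c in range(cols):
            ray = [volume[d][r][c] for d in range(depth)]  # voxels along one ray
            if mode == "average":
                line.append(sum(ray) / depth)
            elif mode == "maximum":
                line.append(max(ray))
            else:
                raise ValueError(f"unknown mode: {mode}")
        image.append(line)
    return image


# Made-up 2x2x2 volume of intensity values.
ct = [
    [[0.0, 1.0], [2.0, 3.0]],
    [[4.0, 1.0], [0.0, 5.0]],
]
aip = project(ct, "average")  # average intensity projection
mip = project(ct, "maximum")  # maximum intensity projection
print(aip)
print(mip)
```

Projecting the same volume along a different axis would give a different viewpoint, which is one way simulated images can help with the missing AP or ML views.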

The DRRs are then fed to a CycleGAN, which translates them into realistic-looking X-ray images.

Credit: Anup Kumar

Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at
