From interpreting chest x-rays to identifying eye diseases, transfer learning has become standard practice in a variety of medical imaging tasks. It is therefore important to understand the commonly held assumptions, the challenges and the alternatives within transfer learning.
In transfer learning, the neural network is trained in two stages:
- Pre-training: The network is generally trained on a large-scale benchmark dataset representing a wide range of categories.
- Fine-tuning: The pre-trained network is further trained on the specific target task of interest, which may have fewer labelled examples than the pre-training dataset.
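The two stages can be sketched in a few lines of plain Python. This is only a toy illustration under loud assumptions: a linear logistic-regression model stands in for the neural network, and the synthetic tasks, the `make_data` helper, the learning rates and the epoch counts are all hypothetical choices, not anything taken from the paper.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def make_data(n, true_w, rng):
    # Hypothetical synthetic task: the label is 1 when the point lies
    # on the positive side of the hyperplane defined by true_w.
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in true_w]
        y = 1 if sum(wi * xi for wi, xi in zip(true_w, x)) > 0 else 0
        data.append((x, y))
    return data

def train(weights, data, epochs, lr):
    # Plain SGD on the logistic loss, starting from the given weights.
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i, xi in enumerate(x):
                w[i] -= lr * (p - y) * xi
    return w

def accuracy(w, data):
    hits = 0
    for x, y in data:
        predicted = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
        hits += predicted == y
    return hits / len(data)

rng = random.Random(0)
source = make_data(2000, [1.0, 1.0, 0.0], rng)       # large pre-training set
target = make_data(20, [1.0, 0.8, 0.2], rng)         # small, related target set
target_test = make_data(500, [1.0, 0.8, 0.2], rng)   # held-out target data

# Stage 1: pre-training from a zero initialisation on the large source task.
pretrained = train([0.0, 0.0, 0.0], source, epochs=5, lr=0.1)
# Stage 2: fine-tuning, which starts from the pre-trained weights.
finetuned = train(pretrained, target, epochs=5, lr=0.05)

print("target accuracy after fine-tuning:", accuracy(finetuned, target_test))
```

The only thing that distinguishes fine-tuning from ordinary training here is the starting point: `train` is called with `pretrained` rather than a fresh initialisation.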
What Are The Challenges?
Despite its popularity, a few pressing questions about transfer learning in ML remain open:
- How much of the original task has the model forgotten?
- Why don’t large models change as much as small models?
- Can we make more out of pre-trained weight statistics?
- Are the results similar for other tasks, such as segmentation?
A common practice in medical imaging tasks is to start with a large image of a bodily region of interest and detect diseases from variations in local textures in the image.
For example, in retinal fundus images, small red ‘dots’ indicate the presence of microaneurysms and diabetic retinopathy, and in chest x-rays, local white opaque patches are signs of consolidation and pneumonia.
This is in contrast to natural image datasets like ImageNet, where there is often a clear global subject of the image.
Many questions therefore remain open, including how much reuse of ImageNet features actually helps on medical images.
In a paper titled “Transfusion: Understanding Transfer Learning for Medical Imaging”, researchers at Google AI investigate the central challenges surrounding transfer learning. The paper was published at NeurIPS 2019.
The experiments used standard ImageNet architectures: ResNet50 and Inception-v3, both widely used in medical transfer learning applications. The researchers also designed a new family of simple, smaller convolutional architectures.
Each architecture in this family has four to five repetitions of a basic layer: a convolution followed by batch normalisation and a ReLU activation. This model family is named CBR.
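To get a feel for how small such a network can be, the sketch below counts the trainable parameters in a stack of these layers. It assumes, as the name CBR suggests, that the basic layer is a convolution followed by batch normalisation and ReLU. The channel widths, the 5x5 kernel and the choice to keep a convolution bias are hypothetical values picked for illustration; the article does not give the paper's actual configurations, and no classifier head is counted.

```python
def cbr_block_params(in_channels, out_channels, kernel_size):
    # Trainable parameters in one Conv2d -> BatchNorm -> ReLU block.
    conv = kernel_size * kernel_size * in_channels * out_channels + out_channels  # weights + bias
    batchnorm = 2 * out_channels  # learnable scale and shift per channel
    relu = 0                      # ReLU has no parameters
    return conv + batchnorm + relu

def cbr_stack_params(channels, kernel_size, in_channels=3):
    # Sum the parameters over consecutive blocks; each block's output
    # width becomes the next block's input width.
    total = 0
    for out_channels in channels:
        total += cbr_block_params(in_channels, out_channels, kernel_size)
        in_channels = out_channels
    return total

# Hypothetical widths for a four-block and a five-block variant.
four_blocks = cbr_stack_params([32, 64, 128, 256], kernel_size=5)
five_blocks = cbr_stack_params([32, 64, 128, 256, 512], kernel_size=5)

print("four blocks:", four_blocks)
print("five blocks:", five_blocks)
```

Even with these made-up widths, the stacks come out at a few million parameters at most, well below the tens of millions in ResNet50.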
The figure accompanying the original article is a schematic showing that large models move less during training than smaller networks.
Performance evaluation was done as follows:
- Models trained from random initialisation were compared with models pre-trained on ImageNet and transferred to the same tasks.
- Two large-scale medical imaging tasks were considered: diagnosing diabetic retinopathy from fundus photographs and identifying five different diseases from chest x-rays.
- The architectures assessed were ResNet50 and Inception-v3, along with the CBR family of simple, lightweight convolutional neural networks.
When transfer learning was evaluated in very small data regimes, there was a larger gap in performance between transfer and training from scratch for large models such as ResNet. This was not true for smaller models such as the CBRs. The finding indicates that the large models designed for ImageNet might be too overparameterised for the very small data regime.
The findings from evaluating all of these models can be summarised as follows:
- Models trained from scratch perform nearly as well as standard ImageNet-transferred models; transfer learning does not significantly affect performance on these medical imaging tasks.
- Smaller models perform at a level comparable to the standard ImageNet architectures on many medical imaging tasks.
- However, smaller models perform much worse on ImageNet classification, highlighting that ImageNet performance is not indicative of performance on medical tasks.
- Transfer offers feature-independent benefits to convergence, simply through better weight scaling.
- Using pre-trained weights from the lowest two layers of the network is found to have the biggest effect on convergence.
- Using pre-trained weights was also observed to result in faster convergence, which is attributed in part to feature reuse.
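Two of the ideas in this list, benefiting from weight scaling alone and reusing pre-trained weights for only some layers, can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `mean_var_init` and `partial_transfer`, the representation of each layer as a flat list of floats, and the stand-in "pre-trained" weights are all assumptions made for the example.

```python
import random
import statistics

def mean_var_init(pretrained_layer, rng):
    # Draw fresh i.i.d. Gaussian weights that match only the mean and
    # standard deviation of the pre-trained layer. The learned features
    # are discarded; only the weight scale is kept.
    mu = statistics.fmean(pretrained_layer)
    sigma = statistics.pstdev(pretrained_layer)
    return [rng.gauss(mu, sigma) for _ in pretrained_layer]

def partial_transfer(pretrained, fresh, layers_to_keep):
    # Build an initialisation that copies pre-trained weights for the
    # chosen layer indices and uses fresh weights everywhere else.
    return [
        list(p) if i in layers_to_keep else list(f)
        for i, (p, f) in enumerate(zip(pretrained, fresh))
    ]

rng = random.Random(0)

# Hypothetical stand-in for a pre-trained network: four layers whose
# weight scale grows with depth.
pretrained = [[rng.gauss(0.0, 0.05 * (i + 1)) for _ in range(1000)] for i in range(4)]
# A naive random initialisation with the same shapes.
random_init = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]

# Initialisation that keeps only the per-layer statistics.
scaled_init = [mean_var_init(layer, rng) for layer in pretrained]

# Initialisation that reuses two layers (an arbitrary choice for illustration).
hybrid = partial_transfer(pretrained, random_init, layers_to_keep={0, 1})
```

Training the same architecture from `random_init`, `scaled_init`, `hybrid` and the full `pretrained` weights, and then comparing convergence speed, is the kind of experiment these two helpers are meant to set up.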
AI for medical applications, more than most other domains, needs evaluations of this kind. The outcomes are not trivial recommendations on an e-commerce site but results that can have potentially fatal consequences. From choosing the size of the network to choosing training techniques, this work by the researchers at Google investigates many aspects of transfer learning, answering a few questions while presenting open ones for further research.