Google Reveals “What is being Transferred” in Transfer Learning

“Transfer Learning will be the next driver of Machine Learning Success”

Andrew Ng

Recently, researchers from Google addressed a fundamental question in the machine learning community: what is being transferred in transfer learning? They introduced a set of tools and analyses to answer it.

The ability to transfer knowledge from a domain where a model has been trained to another domain where data is scarce is one of the most desired capabilities in machine learning. Researchers around the globe have used transfer learning in various deep learning applications, including object detection, image classification and medical imaging tasks, among others.

Despite these applications, several researchers have found cases where there is a nontrivial difference in visual form between the source and the target domain, and it has been difficult to understand what enables a successful transfer and which parts of the network are responsible for it.

The Methodology

To investigate transfer learning, the researchers analysed networks in four different cases: the pre-trained network, the network at random initialisation, the network fine-tuned on the target domain after pre-training on the source domain, and the model trained on the target domain from random initialisation.
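As a rough illustration of these four configurations, the sketch below sets them up with a standard torchvision backbone; the choice of ResNet-50 and the ten-class target head are assumptions made for illustration, not details taken from the paper.

```python
import torch
import torchvision.models as models

NUM_TARGET_CLASSES = 10  # assumed size of the target-domain label set

def make_model(pretrained: bool) -> torch.nn.Module:
    """Build a backbone with either ImageNet weights or random initialisation."""
    weights = models.ResNet50_Weights.IMAGENET1K_V1 if pretrained else None
    model = models.resnet50(weights=weights)
    # Replace the classification head to match the target domain.
    model.fc = torch.nn.Linear(model.fc.in_features, NUM_TARGET_CLASSES)
    return model

# 1) The pre-trained network (source-domain weights, before fine-tuning).
pretrained_net = make_model(pretrained=True)

# 2) The network at random initialisation.
random_net = make_model(pretrained=False)

# 3) The network fine-tuned on the target domain after pre-training.
finetuned_net = make_model(pretrained=True)
# ... train finetuned_net on the target-domain dataset ...

# 4) The model trained on the target domain from random initialisation.
scratch_net = make_model(pretrained=False)
# ... train scratch_net on the same target-domain dataset ...
```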

They also used a series of analyses to understand what is being transferred between the models:

  • Firstly, they investigated feature reuse by shuffling blocks of pixels in the input data, which disrupts the visual features in the images (a sketch of this shuffling probe follows this list). This analysis showed the importance of feature reuse, and also that the low-level statistics of the data, which are not disturbed by shuffling, play a role in a successful transfer.
  • Next, they compared the detailed behaviours of trained models by investigating the agreements and disagreements between models trained from pre-trained weights versus from scratch. This experiment showed that two instances of a model trained from pre-trained weights are more similar in feature space than two instances trained from random initialisation.
  • The researchers then investigated the loss landscape of models trained from pre-trained weights and from random initialisation. They observed no performance barrier between two instances of models trained from pre-trained weights, which suggests that the pre-trained weights guide the optimisation to a flat basin of the loss landscape (see the interpolation sketch after this list).
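A minimal sketch of the block-shuffling probe mentioned in the first bullet: the image is cut into a grid of patches and the patches are randomly permuted, which destroys object-level visual structure while preserving low-level pixel statistics. The block size and the NumPy implementation below are assumptions for illustration, not the paper's exact code.

```python
import numpy as np

def shuffle_blocks(image: np.ndarray, block_size: int, rng: np.random.Generator) -> np.ndarray:
    """Cut an HxWxC image into block_size x block_size patches and permute them.

    With block_size == 1 this reduces to shuffling individual pixels (the most
    destructive setting); larger blocks keep more local visual structure.
    """
    h, w, c = image.shape
    assert h % block_size == 0 and w % block_size == 0, "image must tile evenly"
    gh, gw = h // block_size, w // block_size

    # Split the image into a (gh*gw, block_size, block_size, C) stack of patches.
    blocks = (image
              .reshape(gh, block_size, gw, block_size, c)
              .transpose(0, 2, 1, 3, 4)
              .reshape(gh * gw, block_size, block_size, c))

    # Apply one random permutation to the patch order.
    blocks = blocks[rng.permutation(gh * gw)]

    # Reassemble the shuffled patches back into an image.
    return (blocks
            .reshape(gh, gw, block_size, block_size, c)
            .transpose(0, 2, 1, 3, 4)
            .reshape(h, w, c))

# Example: shuffle a 224x224 RGB image in 8x8 blocks.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
shuffled = shuffle_blocks(img, block_size=8, rng=rng)
```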
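The loss-landscape observation in the last bullet is commonly probed by evaluating a model whose weights are linearly interpolated between two trained instances; if accuracy never dips along the path, the two solutions share a basin with no performance barrier between them. In the sketch below, the `evaluate` callback is an assumed placeholder that returns accuracy on a target-domain test set.

```python
import copy
import torch

@torch.no_grad()
def interpolate_and_evaluate(model_a, model_b, evaluate, steps: int = 11):
    """Evaluate a model along the straight line between two sets of trained weights.

    A flat accuracy curve with no dip between the endpoints indicates that the
    two solutions lie in the same basin, with no performance barrier between them.
    """
    state_a, state_b = model_a.state_dict(), model_b.state_dict()
    probe = copy.deepcopy(model_a)
    accuracies = []
    for i in range(steps):
        alpha = i / (steps - 1)
        mixed = {}
        for key in state_a:
            if state_a[key].is_floating_point():
                mixed[key] = (1 - alpha) * state_a[key] + alpha * state_b[key]
            else:
                # Leave integer buffers (e.g. BatchNorm counters) untouched.
                mixed[key] = state_a[key]
        probe.load_state_dict(mixed)
        accuracies.append(evaluate(probe))  # `evaluate` is an assumed accuracy callback
    return accuracies
```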

Datasets Used

The researchers used the CheXpert dataset, a medical imaging dataset of chest X-rays labelled for several diseases. Besides this, they also used the DomainNet dataset, which is specifically designed to probe transfer learning across diverse domains, ranging from real images to sketches, clipart and paintings.

Contributions Of This Research

The researchers made several contributions in this work. They are mentioned below:

  • For a successful transfer, both feature reuse and low-level statistics of the data are important.
  • Two instances of a model trained from pre-trained weights make similar mistakes on the target domain, have similar features, and are surprisingly close to each other in parameter space. They usually sit in the same basin of the loss landscape (see the similarity sketch after this list).
  • Models trained from random initialisation do not live in the same basin; they usually make different mistakes, have different features, and are farther apart in parameter space.
  • Modules in the lower layers are in charge of general features, while modules in the higher layers are more sensitive to perturbation of their parameters.
  • One can fine-tune from earlier checkpoints of the pre-trained model without losing accuracy in the final fine-tuned model; the point from which this becomes possible depends on when the pre-trained model enters its final basin.
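As a hedged sketch of the kind of similarity measurements behind the second and third bullets above, the helpers below compute the fraction of test examples on which two trained instances agree and the Euclidean distance between their flattened parameters; the data `loader` and the two trained models are assumed inputs, not artefacts from the paper.

```python
import torch

@torch.no_grad()
def prediction_agreement(model_a, model_b, loader, device="cpu") -> float:
    """Fraction of examples on which two models predict the same class."""
    model_a.eval()
    model_b.eval()
    agree, total = 0, 0
    for images, _ in loader:
        images = images.to(device)
        pred_a = model_a(images).argmax(dim=1)
        pred_b = model_b(images).argmax(dim=1)
        agree += (pred_a == pred_b).sum().item()
        total += images.size(0)
    return agree / total

def parameter_distance(model_a, model_b) -> float:
    """Euclidean distance between the flattened parameter vectors of two models."""
    vec_a = torch.cat([p.detach().flatten() for p in model_a.parameters()])
    vec_b = torch.cat([p.detach().flatten() for p in model_b.parameters()])
    return torch.norm(vec_a - vec_b).item()
```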

Wrapping Up

In this work, the researchers showed that when a model is trained from pre-trained weights, it stays in the same basin of the loss landscape, and that different instances of such models are similar in feature space and close in parameter space.

They concluded that feature reuse plays a vital role in transfer learning, especially when the downstream task shares similar visual features with the pre-training domain. However, other factors, such as the low-level statistics of the data, also contribute to the benefits of transfer learning, especially to optimisation speed.

On a concluding note, the researchers said, “Our observation of low-level data statistics improving training speed could lead to better network initialisation methods. Using these findings to improve transfer learning is of interest for future work.”

Read the paper here.
