
Google Reveals “What is being Transferred” in Transfer Learning

“Transfer Learning will be the next driver of Machine Learning Success”

Andrew Ng

Researchers from Google recently set out to answer a fundamental question in machine learning: what is being transferred in transfer learning? They present a set of tools and analyses to address it.

The ability to transfer knowledge from the domain a model was trained on to another domain, where data is usually scarce, is one of the most desired capabilities for machines. Researchers around the globe use transfer learning in deep learning applications including object detection, image classification and medical imaging.



Despite this wide use, several researchers have reported cases with a nontrivial difference in visual forms between the source and the target domain. In such cases it is hard to tell what enables a successful transfer and which parts of the network are responsible for it.

The Methodology

To investigate transfer learning, the researchers analyse networks in four different cases: the pre-trained network, the network at random initialisation, the network that is fine-tuned on the target domain after pre-training on the source domain, and the model that is trained on the target domain from random initialisation.

They also ran a series of analyses to understand what is being transferred between the models:

  • First, they investigated feature reuse by shuffling the data. Shuffling blocks of each image disrupts its visual features. This analysis showed the importance of feature reuse, and it also showed that low-level statistics of the data, which are not disturbed by shuffling the pixels, play a role in successful transfer.
  • Next, they compared the detailed behaviour of trained models by investigating the agreements and disagreements between models fine-tuned from pre-trained weights and models trained from scratch. This experiment showed that two instances of models trained from pre-trained weights are more similar in feature space than ones trained from random initialisation.
  • The researchers then investigated the loss landscape of models trained from pre-training and random initialisation weights. They observed that there is no performance barrier between the two instances of models trained from pre-trained weights, which suggests that the pre-trained weights guide the optimisation to a flat basin of the loss landscape.
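The block shuffling used in the first analysis can be sketched in a few lines of NumPy. This is a minimal illustration, not the researchers' code: the function name `shuffle_blocks`, the 224 x 224 image size and the block size of 56 are all assumptions chosen for the example.

```python
import numpy as np

def shuffle_blocks(image, block_size, rng):
    """Cut an H x W x C image into square blocks and permute them at random."""
    h, w, c = image.shape
    if h % block_size or w % block_size:
        raise ValueError("block_size must divide both image height and width")
    rows, cols = h // block_size, w // block_size
    # (rows, block, cols, block, C) -> (rows * cols, block, block, C)
    blocks = (image.reshape(rows, block_size, cols, block_size, c)
                   .swapaxes(1, 2)
                   .reshape(rows * cols, block_size, block_size, c))
    blocks = blocks[rng.permutation(rows * cols)]
    # Reverse the reshaping to stitch the permuted blocks back together.
    return (blocks.reshape(rows, cols, block_size, block_size, c)
                  .swapaxes(1, 2)
                  .reshape(h, w, c))

# Hypothetical random image standing in for a real training example.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
shuffled = shuffle_blocks(image, block_size=56, rng=rng)
```

Smaller blocks destroy more of the visual structure. With a block size of 1 every pixel moves independently, yet the overall pixel statistics of the image are unchanged, which is the kind of low-level information the analysis isolates.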

Dataset Used

The researchers used CheXpert data, which is a medical imaging dataset of chest x-rays considering different diseases. Besides this, they also used the DomainNet dataset that is specifically designed to probe transfer learning in diverse domains. The domains range from real images to sketches, clipart and painting samples.

Contributions Of This Research

The researchers list several contributions, summarised below:

  • For a successful transfer, both feature-reuse and low-level statistics of the data are important.
  • Models trained from pre-trained weights make similar mistakes on the target domain. They also have similar features and are surprisingly close to each other in parameter space. They usually sit in the same basin of the loss landscape.
  • Models trained from random initialisation do not live in the same basin. They usually make different mistakes, have different features and are farther apart in parameter space.
  • Modules in the lower layers are in charge of general features, and modules in higher layers are more sensitive to perturbation of their parameters.
  • One can start from earlier checkpoints of the pre-trained model without losing the accuracy of the fine-tuned model. The point from which this holds depends on when the pre-trained model enters its final basin.
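Two of the comparisons above, shared mistakes and distance in parameter space, are simple to compute once predictions and weights are available. The sketch below is only an illustration under assumptions: the helper names `mistake_overlap` and `parameter_distance` are hypothetical, the toy arrays are made up, and plain L2 distance is used as one reasonable choice of metric.

```python
import numpy as np

def mistake_overlap(preds_a, preds_b, labels):
    """Count the examples each model gets wrong and the ones both get wrong."""
    wrong_a = preds_a != labels
    wrong_b = preds_b != labels
    return int(wrong_a.sum()), int(wrong_b.sum()), int((wrong_a & wrong_b).sum())

def parameter_distance(params_a, params_b):
    """L2 distance between two models, each given as a list of weight arrays."""
    flat_a = np.concatenate([p.ravel() for p in params_a])
    flat_b = np.concatenate([p.ravel() for p in params_b])
    return float(np.linalg.norm(flat_a - flat_b))

# Made-up labels and predictions for two hypothetical models.
labels = np.array([0, 1, 2, 1, 0, 2])
preds_a = np.array([0, 1, 1, 1, 2, 2])  # wrong at positions 2 and 4
preds_b = np.array([0, 2, 1, 1, 2, 2])  # wrong at positions 1, 2 and 4
print(mistake_overlap(preds_a, preds_b, labels))  # (2, 3, 2)

# Made-up weights: the models differ by 1 in four entries.
params_a = [np.zeros((2, 2)), np.zeros(3)]
params_b = [np.ones((2, 2)), np.zeros(3)]
print(parameter_distance(params_a, params_b))  # 2.0
```

A large share of common mistakes together with a small parameter distance is the pattern reported for models fine-tuned from the same pre-trained weights.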

Wrapping Up

In this project, the researchers showed that a model trained from pre-trained weights stays in the same basin of the loss landscape. Different instances of such models are also similar in feature space and close in parameter space.
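One common way to check whether two solutions share a basin is to evaluate the loss along the straight line between their weights and look for a barrier. The following is a minimal sketch of that idea on a hypothetical two-parameter double-well loss (`toy_loss`), not on a real network. The function names and the 11 interpolation points are assumptions for illustration.

```python
import numpy as np

def interpolation_losses(theta_a, theta_b, loss_fn, num_points=11):
    """Evaluate loss_fn at evenly spaced points on the line from theta_a to theta_b."""
    alphas = np.linspace(0.0, 1.0, num_points)
    return np.array([loss_fn((1 - a) * theta_a + a * theta_b) for a in alphas])

def barrier_height(losses):
    """Highest loss on the path minus the higher of the two endpoint losses."""
    return float(losses.max() - max(losses[0], losses[-1]))

def toy_loss(theta):
    # Hypothetical double-well loss with minima at theta[0] = -1 and theta[0] = +1.
    return float((theta[0] ** 2 - 1.0) ** 2 + theta[1] ** 2)

# Two solutions in the same well, and two solutions in different wells.
same_basin = interpolation_losses(np.array([0.9, 0.0]), np.array([1.1, 0.0]), toy_loss)
two_basins = interpolation_losses(np.array([-1.0, 0.0]), np.array([1.0, 0.0]), toy_loss)

print(barrier_height(same_basin))  # no barrier between solutions in one well
print(barrier_height(two_basins))  # a clear barrier between the two wells
```

For a real model, `theta_a` and `theta_b` would be the flattened weights of two trained networks and `loss_fn` would evaluate the network on held-out data.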

They concluded that feature reuse plays a vital role in transfer learning, especially when the downstream task shares similar visual features with the pre-training domain. However, there are certain other factors such as low-level statistics that can lead to significant benefits of transfer learning, especially on optimisation speed. 

Further, on a concluding note, the researchers said, “Our observation of low-level data statistics improving training speed could lead to better network initialisation methods. Using these findings to improve transfer learning is of interest for future work.”

Read the paper here.


Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.
