
How To Do Machine Learning When Data Is Unlabelled


Semi-weakly supervised learning combines the merits of semi-supervised and weakly supervised learning. The goal is to create efficient classification models.

To test this, Facebook AI has used a teacher-student model training paradigm and billion-scale weakly supervised data sets.

An example of a weakly supervised data set can be hashtags associated with publicly available photos. Since Instagram is rich with such data, it was chosen for performing semi-weakly supervised learning.

For the experiments, the team at Facebook AI used “Semi-weakly” supervised (SWSL) ImageNet models that are pre-trained on 940 million public images with 150,000 hashtags.
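These SWSL models were released publicly and can be loaded through torch.hub from the facebookresearch/semi-supervised-ImageNet1K-models repository. A minimal usage sketch, assuming a recent PyTorch install; `resnet18_swsl` is one of the published variants:

```python
import torch

# Load a semi-weakly supervised (SWSL) pretrained model via torch.hub.
# 'resnet18_swsl' is one of the variants published in the
# facebookresearch/semi-supervised-ImageNet1K-models repository.
model = torch.hub.load(
    'facebookresearch/semi-supervised-ImageNet1K-models',
    'resnet18_swsl',
)
model.eval()

# Dummy forward pass: ImageNet models expect 3x224x224 inputs.
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000]) -- ImageNet-1K classes
```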

In this case, the associated hashtags are used only for building a better teacher model. While training the student model, the hashtags are ignored; the student is pre-trained on a subset of images selected by the teacher model from the same 940 million public image dataset.

The results show that this approach has set new benchmarks for image and video classification models.

Training With Unlabeled Data

[Figure: the semi-weakly supervised training framework, via FAIR]

The above figure illustrates how the semi-weakly supervised training framework is used to generate lightweight image and video classification models.

The training procedure is carried out as follows (a code sketch follows the list):

  • First, a large-capacity, highly accurate “teacher” model is trained with all the available labelled data.
  • The teacher model then predicts labels and corresponding softmax scores for all the unlabelled data.
  • The top-scoring examples are used to pre-train the lightweight, computationally efficient “student” classification model.
  • Finally, the student model is fine-tuned with all the available labelled data.
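The middle two steps are easy to express in code. Below is a hypothetical, simplified PyTorch sketch of how a trained teacher could score an unlabelled pool and keep the top-K images per class, so the student's pretraining set stays balanced. The names (`teacher`, `unlabeled_loader`, `K`) are illustrative, not from the released code, and the argmax-based ranking is a simplification of the paper's selection scheme:

```python
import torch
import torch.nn.functional as F
from collections import defaultdict

def select_topk_per_class(teacher, unlabeled_loader, K, device='cuda'):
    """Rank unlabelled images by teacher softmax score and keep the
    top-K per class to build a balanced pretraining set for the student."""
    teacher.eval()
    scores = defaultdict(list)  # class -> [(score, image_index), ...]
    with torch.no_grad():
        for images, indices in unlabeled_loader:
            probs = F.softmax(teacher(images.to(device)), dim=1)
            top_p, top_c = probs.max(dim=1)   # most confident class per image
            for p, c, i in zip(top_p.cpu(), top_c.cpu(), indices):
                scores[int(c)].append((float(p), int(i)))
    selected = {}
    for c, items in scores.items():
        items.sort(reverse=True)              # highest softmax score first
        selected[c] = [i for _, i in items[:K]]
    return selected  # pseudo-labelled image indices per class
```

The student is then pre-trained on these teacher-selected, pseudo-labelled images before the final fine-tuning pass on the labelled data.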

However, semi-supervised learning alone is not sufficient to achieve state-of-the-art results at billion scale. To improve on this model, researchers at Facebook introduced the semi-weak supervision approach.

Researchers used the weakly supervised teacher model to select pretraining examples from the same data set of one billion hashtagged images.

To create highly accurate models, the teacher model predicts labels for the same weakly supervised data set of 65 million publicly available Instagram videos on which it was pre-trained.

For example, consider a tail class like the “African Dwarf Kingfisher” bird. One might have a hard time finding a dataset containing labelled images of this bird, and there may not be a sufficient number of weakly supervised (tagged) examples either. However, plenty of untagged images of this bird are likely to exist in the unlabelled dataset.

As discussed above, the teacher model trained with labels is able to identify enough images of this bird in the unlabelled data and classify them correctly.

The teacher model obtained by pre-training on weakly-supervised data followed by fine-tuning on task-specific data has shown promising results. The student model obtained by training on the data selected by the teacher model is significantly better than the one obtained by training directly on the weakly-supervised data. This particular approach is what has led to achieving state-of-the-art results.

The results show that the weakly supervised teacher model, with 24x greater capacity than the student model, provided 82.8% top-1 accuracy on the validation set. 

Training details (a configuration sketch follows the list):

  • Models are trained using synchronous stochastic gradient descent (SGD) on 64 GPUs across eight machines.
  • Each GPU processes 24 images at a time, and batch normalisation is applied to all convolutional layers on each GPU.
  • The weight decay parameter is set to 0.0001 in all the experiments. 
  • For fine-tuning on ImageNet, the learning rate is set to 0.00025 over 30 epochs.
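A hedged sketch of what that optimiser setup might look like in PyTorch. The distributed wiring is omitted and the momentum value is an assumption (it is not stated above), but the weight decay, learning rate, epoch count, and per-GPU batch size mirror the figures in the list:

```python
import torch

# Hypothetical fine-tuning setup mirroring the reported hyperparameters.
model = torch.hub.load(
    'facebookresearch/semi-supervised-ImageNet1K-models', 'resnet50_swsl')

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.00025,           # fine-tuning learning rate on ImageNet
    momentum=0.9,         # assumption: standard SGD momentum, not stated above
    weight_decay=0.0001,  # weight decay used in all the experiments
)

EPOCHS = 30               # fine-tuning budget reported above
BATCH_PER_GPU = 24        # each GPU processes 24 images at a time
# In the reported setup, training runs as synchronous SGD over 64 GPUs
# (8 machines), e.g. via torch.nn.parallel.DistributedDataParallel.
```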

Key Takeaways

  • The semi-weakly supervised training framework has set new state-of-the-art academic benchmarks for lightweight image and video classification models.
  • It helps reduce the accuracy gap between high-capacity state-of-the-art models and computationally efficient production-grade models.
  • It can be used to create efficient, low-capacity, production-ready models that deliver substantially higher accuracy than was previously possible.

By using a very large dataset of unlabelled images via semi-supervised learning, the researchers were able to improve the quality of CNN models. 
