Semi-weakly supervised learning is a product of combining the merits of semi-supervised and weakly supervised learning. The goal here is to create efficient classification models.
To test this, Facebook AI has used a teacher-student model training paradigm and billion-scale weakly supervised data sets.
An example of a weakly supervised data set can be hashtags associated with publicly available photos. Since Instagram is rich with such data, it was chosen for performing semi-weakly supervised learning.
For the experiments, the team at Facebook AI used “semi-weakly supervised” (SWSL) ImageNet models that are pre-trained on 940 million public images with 1,500 hashtags.
In this case, the associated hashtags are only used for building a better teacher model. While the student model is being trained, those hashtags are ignored, and the student is pre-trained on a subset of images that the teacher model selects from the same data set of 940 million public images.
The results show that this approach has set new benchmarks for image and video classification models.
Training With Unlabeled Data
The above figure illustrates how the semi-supervised training framework is used to generate lightweight image and video classification models.
The training procedure is carried out as follows:
- A larger-capacity, highly accurate “teacher” model is trained first on all available labelled data sets.
- The teacher model predicts the labels and corresponding softmax scores for all the unlabelled data.
- The top-scoring examples are used to pre-train the lightweight, computationally efficient “student” classification model.
- The student model is then fine-tuned on all the available labelled data.
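The selection step in the procedure above can be sketched in a few lines of Python. This is a simplified illustration, not Facebook AI's implementation: it assumes each unlabelled image is assigned only to its single highest-scoring class, and the names `select_top_k_per_class`, `toy_logits` and the image ids are hypothetical.

```python
import math
from collections import defaultdict


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_top_k_per_class(teacher_logits, k):
    """Return {class_index: [example_ids]} with the k highest-scoring examples per class.

    teacher_logits maps a (hypothetical) example id to the teacher's raw scores.
    Assumption: each example is assigned to the class the teacher scores highest.
    """
    ranked = defaultdict(list)
    for example_id, logits in teacher_logits.items():
        probs = softmax(logits)
        label = max(range(len(probs)), key=probs.__getitem__)
        ranked[label].append((probs[label], example_id))
    # Keep only the k most confident examples for every predicted class.
    return {
        label: [example_id for _, example_id in sorted(scored, reverse=True)[:k]]
        for label, scored in ranked.items()
    }


# Toy teacher outputs for five unlabelled images and two classes.
toy_logits = {
    "img_a": [4.0, 0.0],
    "img_b": [1.0, 0.5],
    "img_c": [0.0, 3.0],
    "img_d": [2.0, 0.0],
    "img_e": [0.2, 0.4],
}
pretraining_set = select_top_k_per_class(toy_logits, k=2)
print(pretraining_set)
```

In this toy run, `img_b` is left out because the teacher is less confident about it than about the other two images it assigns to the same class. The student would then be pre-trained on the selected images, using the teacher's predicted classes as labels, before being fine-tuned on the labelled data.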
However, semi-supervised training alone is not sufficient to achieve state-of-the-art results at billion scale. To improve on this model, researchers at Facebook introduced a semi-weak supervision approach.
Researchers used the weakly supervised teacher model to select pretraining examples from the same data set of one billion hashtagged images.
To create highly accurate models, the teacher model predicts labels for the same weakly supervised data set of 65 million publicly available Instagram videos on which it was pre-trained.
For example, consider a tail class like the “African Dwarf Kingfisher” bird. One might have a hard time finding a data set containing labelled images of this bird, and there may not be a sufficient number of weakly supervised (tagged) examples either. However, a lot of untagged images of this bird are likely to exist in the unlabelled data set.
As discussed above, the teacher model trained with labels is able to identify enough images of this bird in the unlabelled data and classify them correctly.
The teacher model obtained by pre-training on weakly-supervised data followed by fine-tuning on task-specific data has shown promising results. The student model obtained by training on the data selected by the teacher model is significantly better than the one obtained by training directly on the weakly-supervised data. This particular approach is what has led to achieving state-of-the-art results.
The results show that the weakly supervised teacher model, with 24x greater capacity than the student model, provided 82.8% top-1 accuracy on the validation set.
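For readers unfamiliar with the metric, top-1 accuracy is the fraction of validation examples for which the model's single highest-scoring class matches the true label. A minimal sketch with made-up scores and labels (the function name `top1_accuracy` and the toy data are hypothetical and unrelated to the reported 82.8% figure):

```python
def top1_accuracy(predicted_scores, true_labels):
    """Fraction of examples whose highest-scoring class equals the true label."""
    correct = 0
    for scores, label in zip(predicted_scores, true_labels):
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += int(predicted == label)
    return correct / len(true_labels)


# Four toy examples over three classes; the last one is misclassified.
toy_scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
toy_labels = [0, 1, 2, 1]
accuracy = top1_accuracy(toy_scores, toy_labels)
print(accuracy)
```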
The training setup is as follows:
- Models are trained using synchronous stochastic gradient descent (SGD) on 64 GPUs across 8 machines.
- Each GPU processes 24 images at a time, and batch normalisation is applied to all convolutional layers on each GPU.
- The weight decay parameter is set to 0.0001 in all the experiments.
- For fine-tuning on ImageNet, the learning rate is set to 0.00025 over 30 epochs.
The key takeaways are as follows:
- The semi-weakly supervised training framework has resulted in a new state-of-the-art academic benchmark for lightweight image and video classification models.
- It helped reduce the accuracy gap between the high-capacity state-of-the-art models and computationally efficient production-grade models.
- It can be used to create efficient, low-capacity production-ready models that deliver substantially higher accuracy than was previously possible.
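The training settings listed earlier (64 GPUs across 8 machines, 24 images per GPU, weight decay of 0.0001, and a fine-tuning learning rate of 0.00025 over 30 epochs) can be collected into a small configuration object. The sketch below only restates those numbers and derives what follows from them; the class and field names are hypothetical and do not come from Facebook AI's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters as reported in the article; field names are illustrative."""
    num_gpus: int = 64
    num_machines: int = 8
    images_per_gpu: int = 24
    weight_decay: float = 0.0001
    finetune_learning_rate: float = 0.00025
    finetune_epochs: int = 30

    @property
    def gpus_per_machine(self) -> int:
        return self.num_gpus // self.num_machines

    @property
    def global_batch_size(self) -> int:
        # With synchronous SGD, every step combines gradients from all GPUs,
        # so the effective batch is the per-GPU batch times the GPU count.
        return self.num_gpus * self.images_per_gpu


config = TrainingConfig()
print(config.gpus_per_machine, config.global_batch_size)
```

Under these settings each machine hosts 8 GPUs and every synchronous step processes 1,536 images in total.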
By using a very large dataset of unlabelled images via semi-supervised learning, the researchers were able to improve the quality of CNN models.