Developers keep extending artificial intelligence capabilities through machine learning methodologies such as supervised, unsupervised, and reinforcement learning. More recently, Facebook reported state-of-the-art image and video classification results with an approach it calls semi-weak supervision, which combines semi-supervised learning and weakly supervised learning to set a new benchmark in image classification.
Facebook claims that the method delivers effective results even when high-quality labelled training data is scarce.
Semi-Weak Supervision
Citing drawbacks of training models with weak supervision, such as inherent label noise and missing or irrelevant tags, Facebook adopted the teacher-student training paradigm. They leveraged billions of unlabelled images along with a relatively small set of task-specific labelled data. First, they trained a teacher model on the labelled data. The teacher model was then used to predict and rank the unlabelled images, and the top-ranked images were kept to build a new, filtered dataset. Next, the new dataset was used to train a student model. Finally, they fine-tuned the student model on the labelled data to limit the effect of potential labelling errors.
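The four steps above can be sketched in a few lines of Python. This is only a toy illustration, not Facebook's implementation: a nearest-centroid classifier stands in for the neural networks, two-dimensional points stand in for images, and every name here (fit_centroids, select_top_k, fine_tune, the blending weight, the value of k) is a hypothetical choice made for the example.

```python
import math
import random


def fit_centroids(points, labels):
    """Train a toy 'model': one mean vector (centroid) per class."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def class_scores(model, point):
    """Softmax over negative squared distances to each centroid."""
    logits = {
        y: -sum((a - b) ** 2 for a, b in zip(point, centroid))
        for y, centroid in model.items()
    }
    top = max(logits.values())
    exps = {y: math.exp(v - top) for y, v in logits.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def predict(model, point):
    scores = class_scores(model, point)
    return max(scores, key=scores.get)


def select_top_k(teacher, unlabelled, k):
    """Step 2: for each class, keep the k points the teacher ranks highest."""
    points, labels = [], []
    for y in teacher:
        ranked = sorted(
            unlabelled,
            key=lambda p: class_scores(teacher, p)[y],
            reverse=True,
        )
        points.extend(ranked[:k])
        labels.extend([y] * len(ranked[:k]))
    return points, labels


def fine_tune(student, points, labels, weight=0.5):
    """Step 4 (toy version): pull student centroids toward the clean labels."""
    clean = fit_centroids(points, labels)
    return {
        y: [(1 - weight) * s + weight * c for s, c in zip(student[y], clean[y])]
        for y in student
    }


def semi_supervised_pipeline(labelled, labels, unlabelled, k):
    teacher = fit_centroids(labelled, labels)                       # step 1
    pseudo_x, pseudo_y = select_top_k(teacher, unlabelled, k)       # step 2
    student = fit_centroids(pseudo_x, pseudo_y)                     # step 3
    return fine_tune(student, labelled, labels)                     # step 4


def blob(cx, cy, n):
    """Synthetic cluster of n points around (cx, cy); purely illustrative."""
    return [[random.gauss(cx, 0.5), random.gauss(cy, 0.5)] for _ in range(n)]


random.seed(0)
labelled = blob(0, 0, 5) + blob(4, 4, 5)          # small labelled set
labels = ["a"] * 5 + ["b"] * 5
unlabelled = blob(0, 0, 200) + blob(4, 4, 200)    # larger unlabelled pool
student = semi_supervised_pipeline(labelled, labels, unlabelled, k=50)
print(predict(student, [0.0, 0.0]), predict(student, [4.0, 4.0]))
```

Ranking per class and keeping only the top k examples is what filters the unlabelled pool: the student never sees points the teacher was unsure about, and the final pass over the labelled data corrects for whatever mistakes the teacher's pseudo-labels still carry.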
This methodology helps developers when supervised datasets are incomplete: the model can select new training data from an unlabelled dataset and still yield an accurate semi-supervised model. Collectively, this resulted in an accuracy of 81.2% on ImageNet using the ResNet-50 model. On Kinetics video action classification, they achieved 74.2%, a 2.7% improvement over the previous top model.
This has enabled Facebook to create efficient, low-capacity, production-ready models that deliver higher accuracy even with less labelled data and without requiring high computing capacity.
What Does It Mean For Firms?
Today, firms often feed colossal amounts of data into their machine learning models to improve accuracy, but the trade-off is a need for high computational resources. They also struggle to gather enough relevant training data, which delays their products' time-to-market. With Facebook’s innovation, however, they can achieve better results with less labelled data and the same computational resources.
Moving away from resource-intensive machine learning models will not only help organisations reduce operational costs but also allow them to improve the quality of their products.
Marching Away From Labelled Dataset
There are many billions of things and objects in the world, and it is impossible to manually label them all. Thus, sooner rather than later, businesses will have to move away from conventional machine learning techniques and adopt new methods such as semi-supervised learning.
For instance, Facebook’s weakly supervised approach used Instagram images to train the model based on image tags. Although it delivered state-of-the-art accuracy, training the weakly supervised model poses many challenges, such as label noise and the need for huge computational power. Consequently, Facebook’s semi-supervised learning is an effective workaround for such problems in machine learning.
Outlook
Although the semi-weakly supervised model outperformed fully supervised and weakly supervised models, one cannot conclude that semi-supervised learning is the best way ahead in image and video classification. Several blue-chip companies have made advances in neural networks that mitigate problems related to high computation and automate randomisation. Case in point: OpenAI’s automatic domain randomisation and Google’s supremacy in quantum computing could be extended to image and video classification in the near future. These innovations have drastically reduced training time and enabled models to make their own decisions in new situations, even ones they were not specifically trained for.
Clearly, Facebook has made its mark in image processing by delivering the best results thus far, and its approach can be considered one of the best ways to perform image and video classification with minimal datasets and computational power.