While researchers keep debating the relative merits of self-supervised learning and reinforcement learning, both fields are clearly making remarkable progress: 2022 saw tremendous innovation in each.
Yann LeCun, the guru of self-supervised learning, said, “Reinforcement learning is like the cherry on a cake, supervised learning is the icing on the cake, and self-supervised learning is the cake.”
Check out this list of the top 10 self-supervised models of 2022.
Meta AI released the data2vec algorithm in January as a single self-supervised method that works across speech, vision, and text. Unlike many earlier approaches, data2vec neither uses contrastive learning nor relies on reconstructing the input example. Instead, the team said, the model is trained by giving it a partial view of the input data and having it predict latent representations of the full input.
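The core recipe, predict a teacher network's latent representations of the full input from a masked view, can be illustrated with a minimal NumPy sketch. The one-layer encoder, fixed mask, and squared-error loss below are toy stand-ins for illustration, not the actual data2vec architecture (which uses a Transformer and a Smooth L1 loss):

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    """Toy one-layer encoder producing latent representations."""
    return np.tanh(x @ W)

# Toy data: a "sequence" of 8 timesteps with 4 features each.
x = rng.normal(size=(8, 4))
W_student = rng.normal(size=(4, 16)) * 0.1
W_teacher = W_student.copy()   # the teacher starts as a copy of the student

# The student sees a masked (partial) view; the teacher sees the full input.
mask = np.array([True, False, True, False, True, False, True, False])
x_masked = np.where(mask[:, None], 0.0, x)

student_out = encode(x_masked, W_student)
teacher_out = encode(x, W_teacher)   # targets: latents of the full input

# Regression loss on the masked positions only.
diff = student_out[mask] - teacher_out[mask]
loss = np.mean(diff ** 2)

# The teacher tracks the student via an exponential moving average (EMA),
# so the targets evolve as training progresses.
tau = 0.999
W_teacher = tau * W_teacher + (1 - tau) * W_student
```

Because the targets are latent representations rather than raw pixels, words, or waveforms, the same objective applies unchanged to all three modalities.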
Dubbed “a ConvNet for the 2020s”, ConvNeXt was proposed by the Meta AI team in March. It is constructed entirely from standard ConvNet modules and is therefore accurate, simple in design, and scalable.
Variance-Invariance-Covariance Regularization (VICReg) combines an invariance term between two views with a variance term and a covariance-based decorrelation mechanism (redundancy reduction) to avoid the collapse problem, in which the encoder outputs constant vectors.
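The three terms are simple enough to write out directly. Here is a minimal NumPy sketch of the VICReg objective; the loss weights follow the paper's defaults, but the toy embeddings and the exact hyperparameter values are illustrative assumptions:

```python
import numpy as np

def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """VICReg: invariance + variance + covariance terms on two embedding batches."""
    n, d = z_a.shape
    # Invariance: mean-squared distance between the two views' embeddings.
    inv = np.mean((z_a - z_b) ** 2)
    # Variance: hinge keeping each dimension's std above 1 (prevents collapse).
    std_a = np.sqrt(z_a.var(axis=0) + eps)
    std_b = np.sqrt(z_b.var(axis=0) + eps)
    var = np.mean(np.maximum(0.0, 1.0 - std_a)) + np.mean(np.maximum(0.0, 1.0 - std_b))
    # Covariance: push off-diagonal covariance entries toward zero (decorrelation).
    def cov_term(z):
        zc = z - z.mean(axis=0)
        cov = (zc.T @ zc) / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return np.sum(off_diag ** 2) / d
    cov = cov_term(z_a) + cov_term(z_b)
    return sim_w * inv + var_w * var + cov_w * cov

rng = np.random.default_rng(1)
z1 = rng.normal(size=(32, 8))
z2 = z1 + 0.1 * rng.normal(size=(32, 8))   # a slightly perturbed second view
loss = vicreg_loss(z1, z2)

# Collapsed embeddings (all constant) are heavily penalised by the variance term.
collapsed_loss = vicreg_loss(np.zeros((32, 8)), np.zeros((32, 8)))
```

Note that, unlike contrastive methods, no negative pairs are needed: the variance and covariance terms alone prevent the trivial constant solution.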
MIT’s Computer Science and AI Lab, together with Microsoft and Cornell University, developed the Self-Supervised Transformer with Energy-based Graph Optimisation (STEGO), which discovers and localises semantically meaningful categories in image corpora without any annotation. It is a semantic segmentation method: it assigns a label to every pixel in an image.
For self-supervised speech representation learning, researchers from the Chinese University of Hong Kong proposed CoBERT (Code BERT). Unlike other self-distillation approaches, their model predicts representations from a different modality: it converts speech into a sequence of discrete codes and performs representation learning on those codes.
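The conversion from continuous speech features to discrete codes can be pictured as nearest-centroid assignment against a codebook. The MFCC-like frames and the random 8-entry codebook below are toy assumptions; the actual model learns its code representations rather than using a fixed quantiser:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "speech features": 100 frames of 13-dim MFCC-like vectors.
frames = rng.normal(size=(100, 13))

# A codebook of 8 centroids stands in for the learned quantiser.
codebook = rng.normal(size=(8, 13))

# Convert continuous frames into a sequence of discrete codes:
# each frame is assigned the index of its nearest centroid.
dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=-1)
codes = dists.argmin(axis=1)   # shape (100,), values in 0..7
```

The resulting code sequence plays the role that word tokens play in BERT-style pre-training, which is what lets a masked-prediction objective transfer to speech.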
FedX, an unsupervised federated learning framework with cross-knowledge distillation proposed by Microsoft, learns unbiased representations from heterogeneous and decentralised local data by employing two-sided knowledge distillation and contrastive learning. It is also an adaptable architecture that can be used as an add-on module for various existing self-supervised algorithms in federated settings.
Hokkaido University in Japan proposed TriBYOL for self-supervised representation learning with small batch sizes. With this method, researchers do not need the heavy computational resources and large batch sizes usually required to learn good representations. TriBYOL combines a triplet network with a triple-view loss, improving efficiency and outperforming several self-supervised algorithms on multiple datasets.
Researchers from Nokia Bell Labs collaborated with Georgia Tech and the University of Cambridge to develop ColloSSL, a collaborative self-supervised learning framework for human activity recognition. Unlabelled sensor data captured simultaneously by multiple devices can be viewed as natural transformations of one another, and these views generate a supervisory signal for representation learning. The paper presents three techniques – device selection, contrastive sampling, and a multi-view contrastive loss.
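The "devices as natural transformations" idea reduces to a contrastive objective in which two devices' embeddings of the same time window are positives and all other windows are negatives. A minimal NumPy sketch of such an InfoNCE-style loss follows; the toy embeddings, the noise model standing in for a second device, and the temperature value are illustrative assumptions, not ColloSSL's exact formulation:

```python
import numpy as np

def info_nce(anchor, positive, temperature=0.1):
    """Contrastive (InfoNCE) loss: row i of `anchor` should match row i of `positive`."""
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positive / np.linalg.norm(positive, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature             # similarity of every anchor/candidate pair
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # matched pairs sit on the diagonal

rng = np.random.default_rng(3)
# Embeddings of the same 16 time windows as seen by two different devices:
# each device's reading is treated as a natural "augmentation" of the other's.
device_a = rng.normal(size=(16, 32))
device_b = device_a + 0.05 * rng.normal(size=(16, 32))

loss = info_nce(device_a, device_b)
# Mismatched devices (no shared signal) give a much higher loss.
random_loss = info_nce(device_a, rng.normal(size=(16, 32)))
```

No augmentation pipeline has to be designed by hand here: the physical placement of the devices supplies the views.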
Sungkyunkwan University proposed a simple self-supervised auxiliary task that predicts localisable rotations (LoRot), with three properties designed to assist the supervised objective. First, it guides the model to learn rich features. Second, the self-supervised transformation does not significantly alter the training distribution. And third, it is a light, generic task with high applicability on top of prior methods.
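The transformation itself is easy to sketch: rotate one local patch of the image by a random multiple of 90 degrees, and train an auxiliary head to predict which rotation was applied. The patch size and single-channel toy image below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def lorot_transform(img, patch=8):
    """Rotate one random local patch by a random multiple of 90 degrees.

    Returns the transformed image and the rotation label (0..3) that a
    small auxiliary head would be trained to predict.
    """
    h, w = img.shape[:2]
    y = int(rng.integers(0, h - patch + 1))
    x = int(rng.integers(0, w - patch + 1))
    k = int(rng.integers(0, 4))           # rotation label: k * 90 degrees
    out = img.copy()
    out[y:y+patch, x:x+patch] = np.rot90(img[y:y+patch, x:x+patch], k)
    return out, k

img = rng.random((32, 32))
rotated, label = lorot_transform(img)
```

Because only a small patch is transformed, the image stays close to the original training distribution, which is the second of the three properties above.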
Microsoft and Peking University presented TS2Vec, a universal framework for learning representations of time series at an arbitrary semantic level. The model performs contrastive learning in a hierarchical manner over augmented context views, enabling robust contextual representations for every timestamp. Results showed significant improvements over state-of-the-art unsupervised time series representation methods.
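The hierarchical part of the recipe can be sketched as: apply a contrastive loss at the timestamp level, max-pool the representations along the time axis, and repeat at each coarser scale. The minimal NumPy version below is a simplification under stated assumptions (random per-timestamp embeddings as stand-ins for encoder outputs, and a plain InfoNCE over time rather than TS2Vec's exact dual temporal/instance losses):

```python
import numpy as np

def temporal_contrast(z_a, z_b):
    """Timestamp-level contrastive loss for two views of one series (InfoNCE over time)."""
    a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = a @ b.T
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def hierarchical_loss(z_a, z_b):
    """Apply the contrastive loss at each scale, max-pooling time by 2 between scales."""
    total, levels = 0.0, 0
    while z_a.shape[0] >= 2:
        total += temporal_contrast(z_a, z_b)
        levels += 1
        # Max-pool pairs of adjacent timestamps to move one semantic level up.
        t = (z_a.shape[0] // 2) * 2
        z_a = z_a[:t].reshape(-1, 2, z_a.shape[1]).max(axis=1)
        z_b = z_b[:t].reshape(-1, 2, z_b.shape[1]).max(axis=1)
    return total / levels

rng = np.random.default_rng(5)
z1 = rng.normal(size=(16, 8))              # per-timestamp representations, view 1
z2 = z1 + 0.05 * rng.normal(size=(16, 8))  # view 2, from an augmented context
loss = hierarchical_loss(z1, z2)
```

Pooling between levels is what lets the same learned representation serve any granularity, from a single timestamp up to the whole series.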