While researchers keep debating the relative merits of self-supervised learning and reinforcement learning, both fields are clearly making remarkable progress: 2022 saw tremendous innovations on each side.
Yann LeCun, the guru of self-supervised learning, said, “Reinforcement learning is like a cherry on a cake, supervised learning is the icing on the cake, and self-supervised learning is the cake.”
Check out this list of the top 10 self-supervised models of 2022.
Data2vec
Meta AI released the data2vec algorithm in January as a single self-supervised method for speech, vision, and text. data2vec does not use contrastive learning or rely on reconstructing the input example; instead, the team said, the model is trained by taking a partial view of the input and predicting the model representations of the full input.
We created data2vec, the first general high-performance self-supervised algorithm for speech, vision, and text. When applied to different modalities, it matches or outperforms the best self-supervised algorithms. Read more and get the code:https://t.co/3x8VCwGI2x pic.twitter.com/Q9TNDg1paj
— Meta AI (@MetaAI) January 20, 2022
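The training idea can be pictured in miniature: a student network regresses, from a masked view of the input, onto the representations a teacher network produces from the full input, and the teacher tracks the student as an exponential moving average. Below is a toy numpy sketch with a linear "encoder" standing in for data2vec's Transformer; all sizes, rates, and the 50% masking are illustrative assumptions, not Meta AI's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes; a linear map stands in for the Transformer encoder.
dim_in, dim_rep = 16, 8
student_W = rng.normal(scale=0.1, size=(dim_in, dim_rep))
teacher_W = rng.normal(scale=0.1, size=(dim_in, dim_rep))
ema_decay, lr = 0.999, 0.1

losses = []
for step in range(500):
    x = rng.normal(size=(8, dim_in))                         # a batch of inputs
    x_masked = np.where(rng.random(x.shape) < 0.5, 0.0, x)   # partial view

    target = x @ teacher_W        # teacher encodes the full input
    pred = x_masked @ student_W   # student predicts from the masked view
    losses.append(float(np.mean((pred - target) ** 2)))

    # (Unnormalised) gradient step on the L2 regression loss for the student.
    grad = x_masked.T @ (pred - target) / len(x)
    student_W -= lr * grad

    # The teacher tracks the student as an exponential moving average.
    teacher_W = ema_decay * teacher_W + (1 - ema_decay) * student_W
```

Because the target is a latent representation rather than the raw input, the same recipe applies unchanged to speech, vision, and text.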
ConvNeXt
Also known as the ConvNet model for the 2020s, ConvNeXt was proposed by the Meta AI team in January. It is constructed entirely from standard ConvNet modules and is therefore accurate, simple in design, and scalable.
It's open source, of course: https://t.co/nWx2KFtl7X
— Yann LeCun (@ylecun) January 12, 2022
VICReg
Variance-Invariance-Covariance Regularisation (VICReg) combines a variance term, which keeps the variance of each embedding dimension above a threshold, with an invariance term and a covariance-based decorrelation mechanism (redundancy reduction). Together these avoid the collapse problem in which the encoder outputs constant vectors.
The code for VICReg is open sourced.
— Yann LeCun (@ylecun) March 3, 2022
"VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning" by Adrien Bardes, Jean Ponce, Yann LeCun
ICLR 2022 paper: https://t.co/H7crDPHCHV
Code: https://t.co/oadSBT61P3
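A minimal numpy sketch of the three terms follows; the loss weights and epsilon here are illustrative defaults, not the authors' tuned coefficients.

```python
import numpy as np

def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """VICReg-style loss over two (batch, dim) batches of embeddings."""
    n, d = z_a.shape

    # Invariance: mean squared distance between the two views.
    inv = np.mean(np.sum((z_a - z_b) ** 2, axis=1))

    # Variance: hinge keeping each dimension's std above 1, which
    # prevents the encoder from collapsing to constant vectors.
    def var_term(z):
        std = np.sqrt(z.var(axis=0) + eps)
        return np.mean(np.maximum(0.0, 1.0 - std))

    # Covariance: push off-diagonal covariance entries toward zero,
    # decorrelating embedding dimensions (redundancy reduction).
    def cov_term(z):
        zc = z - z.mean(axis=0)
        cov = (zc.T @ zc) / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return np.sum(off_diag ** 2) / d

    return (sim_w * inv
            + var_w * (var_term(z_a) + var_term(z_b))
            + cov_w * (cov_term(z_a) + cov_term(z_b)))
```

Note that a collapsed encoder (identical constant embeddings) makes the invariance and covariance terms zero but is heavily penalised by the variance hinge, which is exactly the failure mode VICReg is designed to rule out.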
STEGO
MIT’s Computer Science and AI Lab, together with Microsoft and Cornell University, developed the Self-Supervised Transformer with Energy-based Graph Optimisation (STEGO), which discovers and localises semantically meaningful categories in image corpora without any annotation. It performs semantic segmentation, assigning a label to every pixel in an image.
Today in collaboration with @MIT and @Cornell we announce a new SOTA for unsupervised Semantic Segmentation. “STEGO” discovers the world’s objects and classifies every pixel of input images: https://t.co/kkDQCMQs6g
— Microsoft Research (@MSFTResearch) April 21, 2022
CoBERT
For self-supervised speech representation learning, researchers from the Chinese University of Hong Kong proposed CoBERT (code BERT). Unlike other self-distillation approaches, their model predicts representations from a different modality: it converts speech into a sequence of discrete codes and learns by predicting representations of those codes.
FedX
FedX, an unsupervised federated learning framework proposed by Microsoft, learns unbiased representations from heterogeneous and decentralised local data through two-sided knowledge distillation and contrastive learning. It is also an adaptable architecture that can be used as an add-on module for various existing self-supervised algorithms in federated settings.
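The two-sided distillation idea can be sketched as a symmetric KL term between the temperature-softened predictions of the local and global models, so knowledge flows in both directions. This is a simplified illustration: the symmetric-KL form and the temperature are assumptions, and FedX's full objective also includes contrastive terms.

```python
import numpy as np

def softmax(x, t=1.0):
    """Temperature-softened softmax along the class axis."""
    e = np.exp((x - x.max(axis=1, keepdims=True)) / t)
    return e / e.sum(axis=1, keepdims=True)

def two_sided_kd_loss(local_logits, global_logits, t=2.0):
    """Symmetric (two-sided) distillation between local and global models.

    Each model's softened predictions serve as the other's target.
    Illustrative sketch only; not FedX's exact objective.
    """
    p_l = softmax(local_logits, t)
    p_g = softmax(global_logits, t)

    def kl(p, q):
        # KL(p || q) per example, with a small epsilon for stability.
        return np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=1)

    return float(np.mean(kl(p_g, p_l) + kl(p_l, p_g)))
```

When the two models agree the loss is zero, so the term only transfers knowledge where the local and global views genuinely disagree.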
TriBYOL
Hokkaido University in Japan proposed TriBYOL for self-supervised representation learning with small batch sizes. With this method, researchers do not need the heavy computational resources and large batch sizes usually required to learn good representations. TriBYOL is a triplet network combined with a triple-view loss; it improves efficiency and outperforms several self-supervised algorithms across multiple datasets.
ColloSSL
Researchers from Nokia Bell Labs collaborated with Georgia Tech and the University of Cambridge to develop ColloSSL, a collaborative self-supervised framework for human activity recognition. Unlabelled sensor data captured simultaneously by multiple devices can be viewed as natural transformations of one another, yielding a supervisory signal for representation learning. The paper presents three techniques: device selection, contrastive sampling, and a multi-view contrastive loss.
LoRot
Sungkyunkwan University proposed a simple self-supervised auxiliary task that predicts localisable rotations (LoRot) and has three properties that help the supervised objective. First, it guides the model to learn rich features. Second, its transformations do not significantly alter the training distribution. Third, it is a light, generic task that can be applied broadly on top of prior methods.
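The pretext transform can be pictured as rotating one randomly chosen local patch of the image and asking the network to predict where and by how much. A minimal numpy version follows; the patch size and label encoding are assumptions for illustration, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def lorot_transform(img, patch=8):
    """Rotate one randomly chosen square patch by a multiple of 90 degrees.

    Returns the transformed image and the self-supervision label:
    (patch position index, rotation k in {0, 1, 2, 3} quarter-turns).
    Illustrative sketch of a localisable-rotation pretext task.
    """
    h, w = img.shape[:2]
    rows, cols = h // patch, w // patch
    r, c = rng.integers(rows), rng.integers(cols)
    k = int(rng.integers(4))                 # 0, 90, 180 or 270 degrees
    out = img.copy()
    ys, xs = r * patch, c * patch
    out[ys:ys + patch, xs:xs + patch] = np.rot90(
        img[ys:ys + patch, xs:xs + patch], k)
    return out, (int(r) * cols + int(c), k)
```

Because only one small patch changes, the training distribution of the image as a whole is barely perturbed, which is the second property the authors highlight.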
TS2Vec
Microsoft and Peking University presented TS2Vec, a universal framework for learning representations of time series at arbitrary semantic levels. The model performs contrastive learning hierarchically over augmented context views, producing robust contextual representations for individual timestamps. Results showed significant improvements over state-of-the-art unsupervised time-series representation methods.
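The hierarchical contrasting loop can be sketched as: contrast timestamp-level representations across two augmented views, max-pool along the time axis, and repeat until the series collapses to a single step. The InfoNCE-style loss below is a simplified stand-in for the paper's dual temporal and instance-wise objectives.

```python
import numpy as np

def hierarchical_contrast(z1, z2):
    """TS2Vec-style hierarchical contrasting over two (batch, time, dim) views.

    At each level, the same timestamp of the same series across the two
    views is the positive pair; all other (series, timestamp) pairs are
    negatives. The series is then max-pooled along time and contrasted
    again. Illustrative sketch, not the paper's exact formulation.
    """
    total, levels = 0.0, 0
    while True:
        b, t, d = z1.shape
        sim = np.einsum('btd,csd->btcs', z1, z2)   # all-pairs similarity
        pos = np.einsum('btd,btd->bt', z1, z2)     # positive-pair similarity
        denom = np.log(np.exp(sim).reshape(b, t, -1).sum(axis=2))
        total += np.mean(denom - pos)              # InfoNCE-style term
        levels += 1
        if t == 1:
            break
        # Max-pool along the time axis, halving the sequence length.
        t2 = t // 2
        z1 = np.maximum(z1[:, 0:2 * t2:2], z1[:, 1:2 * t2:2])
        z2 = np.maximum(z2[:, 0:2 * t2:2], z2[:, 1:2 * t2:2])
    return float(total / levels)
```

Pooling and re-contrasting is what lets the learned representations describe the series at multiple semantic levels, from individual timestamps up to the whole sequence.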