
Contrastive vs non-contrastive self-supervised learning techniques 

In contrast to the collapse seen in contrastive methods, FAIR identified that non-contrastive methods suffer from a lesser collapse problem of a different nature.

Today, the quantity of data being generated and the complexity of annotating it are increasing tremendously. Self-supervised learning methods address the annotation problem by learning directly from raw, unlabelled data, making them one of the most important areas of AI research today. There are several ways to train machines without annotated data. Meta's Chief AI Scientist, Yann LeCun, recently tweeted his preference for non-contrastive learning. Analytics India Magazine has analysed this long-standing debate: contrastive learning or non-contrastive learning?

Contrastive learning

Contrastive learning is a machine learning approach in which an algorithm learns which pieces of data are similar and which are dissimilar. It can also be viewed as a classification setup in which data is grouped by similarity and dissimilarity. Contrastive methods learn representations by contrasting positive and negative examples. Past research has shown great empirical success with contrastive pre-training on computer vision tasks. For instance, Hénaff et al. (2019) evaluated contrastive methods trained on unlabelled ImageNet data with a linear classifier and found they surpassed the accuracy of supervised AlexNet. Similarly, He et al. (2019) found that contrastive pre-training on ImageNet transfers effectively to other downstream tasks and outperforms supervised pre-training counterparts.

The contrastive method learns representations by minimising the distance between two views of the same data point and maximising the distance between views of different data points. In other words, it pulls positive pairs as close together as possible and pushes negative pairs as far apart as possible.
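A minimal sketch of this idea is a simplified InfoNCE/NT-Xent-style loss in PyTorch. The function and tensor names here are illustrative, not taken from any of the papers cited in this article:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, temperature=0.1):
    """Simplified NT-Xent-style contrastive loss.

    z1, z2: [batch, dim] embeddings of two augmented views of the same images.
    Each pair (z1[i], z2[i]) is a positive; every other sample in the batch
    acts as a negative.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)

    # Cosine similarity between every view-1 embedding and every view-2 embedding.
    logits = z1 @ z2.t() / temperature          # [batch, batch]

    # The matching index is the positive; all off-diagonal entries are negatives.
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

# Usage (illustrative): given an encoder and two augmented batches x1, x2
# loss = contrastive_loss(encoder(x1), encoder(x2))
```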

For example, suppose the model has to differentiate between a cat and a dog. It does so by identifying which data points are similar and which are different. Programmers apply combinations of augmentations to the training data to produce different versions of the same image as positive pairs. These images are then mapped to vector representations, and the model is trained to produce similar representations for similar images, so it can differentiate a cat from a dog. As illustrated in this post, it should recognise that cats have pointy ears while dogs have droopy ears.
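As a rough illustration of the augmentation step, a common recipe generates two independently randomised views of each training image. The specific transforms below (using torchvision) are an assumption for illustration, not the pipeline used in the cited work:

```python
from torchvision import transforms

# Hypothetical augmentation pipeline: each call yields a different random view.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.RandomGrayscale(p=0.2),
    transforms.ToTensor(),
])

def two_views(pil_image):
    # Two independently augmented versions of the same image form a positive pair.
    return augment(pil_image), augment(pil_image)
```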

Contrastive learning in self-supervised vs supervised models (Source: Google AI)

What is dimensional collapse in contrastive learning?

Google AI explained the role of positives and negatives in contrastive learning: “These contrastive learning approaches typically teach a model to pull together the representations of a target image (a.k.a., the “anchor”) and a matching (“positive”) image in embedding space, while also pushing apart the anchor from many non-matching (“negative”) images.” Since labels are unavailable, the positive can be an augmentation of the anchor, and the negatives are chosen from the other samples in the training minibatch. Because of this random sampling, false negatives can degrade representation quality. Facebook AI Research described the resulting loss function of contrastive learning in similar terms: “(It) is intuitively simple: minimise the distance in representation space between positive sample pairs while maximising the distance between negative sample pairs,” the team said.

In a paper co-authored by Yann LeCun, the researchers showed that contrastive learning can lead to dimensional collapse, “whereby the embedding vectors end up spanning a lower-dimensional subspace instead of the entire available embedding space”, the study explained. While, in theory, the repulsion between negative pairs in the contrastive approach should prevent dimensional collapse, the research proved otherwise. In contrastive learning, the embedding vectors fall into a lower-dimensional subspace instead of the entire available embedding space because of two main mechanisms (a simple way to check for this collapse is sketched after the list below):

* strong augmentation along feature dimensions 

* implicit regularisation driving models toward low-rank solutions 
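One hedged way to inspect dimensional collapse is to look at the singular-value spectrum of a batch of embeddings: if many singular values are near zero, the representations occupy only a low-dimensional subspace. This is a generic diagnostic sketch, not the exact procedure from the paper:

```python
import torch

def embedding_spectrum(embeddings):
    """Singular-value spectrum of a batch of embeddings.

    embeddings: [num_samples, dim]. Many near-zero trailing singular values
    indicate the representations span only a lower-dimensional subspace,
    i.e. they have (partially) collapsed.
    """
    z = embeddings - embeddings.mean(dim=0, keepdim=True)   # centre the data
    return torch.linalg.svdvals(z)

# Usage (illustrative):
# spectrum = embedding_spectrum(encoder(images).detach())
# print(spectrum / spectrum.max())   # a sharp drop-off signals collapse
```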

Lack of collapse in non-contrastive self-supervised techniques

In contrast to the collapse in contrastive methods, FAIR identified that non-contrastive methods suffer from a lesser collapse problem of a different nature. The study cited alternative approaches such as Grill et al. (2020) and Chen & He (2020), who used a stop-gradient and an extra predictor to prevent collapse without negative pairs, and Caron et al. (2018; 2020), who added a clustering step to their process. Unlike contrastive methods, which rely heavily on a large number of negative samples, non-contrastive methods do not directly rely on explicit negative samples. Instead, the dynamics of the alignment of eigenspaces between the predictor and its input correlation matrix play a key role in preventing complete collapse.

What are non-contrastive self-supervised techniques?

The non-contrastive approach relies only on positive sample pairs. For instance, FAIR illustrated this with training data containing two versions of a cat picture: the original in colour and another in black and white. There are no negative examples, such as an unrelated photo of a mountain. While this might seem counterintuitive, and a model trained only on positive samples might be expected to collapse, FAIR found that such models learn good representations despite the lack of negative examples. “We’ve found the training of a non-contrastive self-supervised learning framework converges to a useful local minimum but not the global trivial one. Our work attempts to show why this is,” the team stated.

The non-contrastive approach uses an extra predictor and a stop-gradient operation. Two popular non-contrastive methods, BYOL and SimSiam, have demonstrated that the predictor and stop-gradient are needed to prevent representational collapse in the model.
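A minimal sketch of this SimSiam-style recipe, with the predictor head and stop-gradient made explicit, is shown below. Module names and layer sizes are illustrative assumptions; the architectures in the actual papers are larger:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonContrastiveModel(nn.Module):
    """Toy SimSiam-style model: encoder + predictor, no negative pairs."""

    def __init__(self, encoder, dim=2048, pred_dim=512):
        super().__init__()
        self.encoder = encoder                       # e.g. a CNN backbone + projection MLP
        self.predictor = nn.Sequential(              # the extra predictor head
            nn.Linear(dim, pred_dim), nn.ReLU(inplace=True),
            nn.Linear(pred_dim, dim),
        )

    def forward(self, x1, x2):
        z1, z2 = self.encoder(x1), self.encoder(x2)  # two views of the same images
        p1, p2 = self.predictor(z1), self.predictor(z2)

        # Stop-gradient (.detach) on the target branch is what prevents a trivial collapse.
        loss = -(F.cosine_similarity(p1, z2.detach(), dim=-1).mean()
                 + F.cosine_similarity(p2, z1.detach(), dim=-1).mean()) / 2
        return loss
```

Note how the loss uses only positive pairs: each predictor output is pulled towards the detached embedding of the other view, with no negatives anywhere in the objective.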

Unlike the contrastive approach, the non-contrastive approach is simpler: it optimises a CNN to extract similar feature vectors for similar images, learning representations by minimising the distance between two views of the same image. In the cat example, the algorithm would detect characteristics such as eyeballs, fur, paws and whiskers and relate them to the cat.

PS: The story was written using a keyboard.

Avi Gopani

Avi Gopani is a technology journalist who analyses industry trends and developments from an interdisciplinary perspective at Analytics India Magazine. Her articles chronicle cultural, political and social stories, curated with a focus on the evolving technologies of artificial intelligence and data analytics.