
Contrastive vs non-contrastive self-supervised learning techniques 

In contrast to the collapse seen in contrastive methods, FAIR identified that non-contrastive methods suffer from a lesser collapse problem of a different nature.

Today, the quantity of data being generated and the complexity of annotating it are increasing tremendously. Self-supervised learning methods address the annotation problem by learning directly from raw, unlabelled data, making them one of the most important areas of AI research today. There are several ways to train machines without annotated data. Meta's Chief AI Scientist, Yann LeCun, recently tweeted his preference for non-contrastive learning. Analytics India Magazine has analysed this long-standing debate: contrastive learning or non-contrastive learning?

Contrastive learning

Contrastive learning is a machine learning approach in which an algorithm learns which pieces of data are similar and which are dissimilar. It can also be viewed as a classification setup in which data is grouped by similarity and dissimilarity. Contrastive methods learn representations by contrasting positive and negative examples. Past research has shown great empirical success with contrastive pre-training on computer vision tasks. For instance, Hénaff et al. (2019) evaluated contrastive methods trained on unlabelled ImageNet data with a linear classifier and found they surpassed the accuracy of supervised AlexNet. Similarly, He et al. (2019) found that contrastive pre-training on ImageNet transfers effectively to other downstream tasks and outperforms supervised pre-training counterparts.

The contrastive method learns representations by minimising the distance between two views of the same data point and maximising the distance between views of different data points. In other words, it pulls positive pairs as close together as possible and pushes negative pairs as far apart as possible.
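A minimal sketch of this idea is a simplified InfoNCE/NT-Xent-style loss in PyTorch. The function and tensor names here are illustrative, not taken from any of the papers cited in this article:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, temperature=0.1):
    """Simplified NT-Xent-style contrastive loss.

    z1, z2: [batch, dim] embeddings of two augmented views of the same images.
    Each pair (z1[i], z2[i]) is a positive; every other sample in the batch
    acts as a negative.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)

    # Cosine similarity between every view-1 embedding and every view-2 embedding.
    logits = z1 @ z2.t() / temperature          # [batch, batch]

    # The matching index is the positive; all off-diagonal entries are negatives.
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

# Usage (illustrative): given an encoder and two augmented batches x1, x2
# loss = contrastive_loss(encoder(x1), encoder(x2))
```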

For example, suppose the model has to differentiate between a cat and a dog. It does so by identifying which data points are similar and which are different. Programmers apply combinations of augmentations to the training data to produce different versions of the same image as positive pairs. These images are then mapped to vector representations, and the model is trained to produce similar representations for similar images, so it can differentiate a cat from a dog. As illustrated in this post, it should recognise that cats have pointy ears while dogs have droopy ears.
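As a rough illustration of the augmentation step, a common recipe generates two independently randomised views of each training image. The specific transforms below (using torchvision) are an assumption for illustration, not the pipeline used in the cited work:

```python
from torchvision import transforms

# Hypothetical augmentation pipeline: each call yields a different random view.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.RandomGrayscale(p=0.2),
    transforms.ToTensor(),
])

def two_views(pil_image):
    # Two independently augmented versions of the same image form a positive pair.
    return augment(pil_image), augment(pil_image)
```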

Contrastive learning in self-supervised vs supervised models (Source: Google AI)

What is dimensional collapse in contrastive learning?

Google AI explained the role of positives and negatives in contrastive learning: “These contrastive learning approaches typically teach a model to pull together the representations of a target image (a.k.a., the “anchor”) and a matching (“positive”) image in embedding space, while also pushing apart the anchor from many non-matching (“negative”) images.” Since labels are unavailable, the positive can be an augmentation of the anchor, and the negatives are chosen from the other samples in the training minibatch. Because of this random sampling, false negatives can degrade representation quality. Facebook AI Research described the resulting loss function of contrastive learning in similar terms: “(It) is intuitively simple: minimise the distance in representation space between positive sample pairs while maximising the distance between negative sample pairs,” the team said.

In a paper co-authored by Yann LeCun, the researchers showed that contrastive learning can lead to dimensional collapse, “whereby the embedding vectors end up spanning a lower-dimensional subspace instead of the entire available embedding space”, the study explained. While, in theory, the repulsion between negative pairs in the contrastive approach should prevent dimensional collapse, the research proved otherwise. In contrastive learning, the embedding vectors fall into a lower-dimensional subspace instead of the entire available embedding space because of two main mechanisms (a simple way to check for this collapse is sketched after the list below):

* strong augmentation along feature dimensions 

* implicit regularisation driving models toward low-rank solutions 
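One hedged way to inspect dimensional collapse is to look at the singular-value spectrum of a batch of embeddings: if many singular values are near zero, the representations occupy only a low-dimensional subspace. This is a generic diagnostic sketch, not the exact procedure from the paper:

```python
import torch

def embedding_spectrum(embeddings):
    """Singular-value spectrum of a batch of embeddings.

    embeddings: [num_samples, dim]. Many near-zero trailing singular values
    indicate the representations span only a lower-dimensional subspace,
    i.e. they have (partially) collapsed.
    """
    z = embeddings - embeddings.mean(dim=0, keepdim=True)   # centre the data
    return torch.linalg.svdvals(z)

# Usage (illustrative):
# spectrum = embedding_spectrum(encoder(images).detach())
# print(spectrum / spectrum.max())   # a sharp drop-off signals collapse
```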

Lack of collapse in non-contrastive self-supervised techniques

In contrast to the collapse in contrastive methods, FAIR identified that non-contrastive methods suffer from a lesser collapse problem of a different nature. The study cited alternative approaches such as Grill et al. (2020) and Chen & He (2020), who used a stop-gradient and an extra predictor to prevent collapse without negative pairs, and Caron et al. (2018; 2020), who added a clustering step to their process. Unlike contrastive methods, which rely heavily on a large number of negative samples, non-contrastive methods do not directly rely on explicit negative samples. Instead, the dynamics of the alignment of eigenspaces between the predictor and its input correlation matrix play a key role in preventing complete collapse.

What are non-contrastive self-supervised techniques?

The non-contrastive approach relies only on positive sample pairs. For instance, FAIR illustrated this with training data containing two versions of a cat picture: the original in colour and another in black and white. There are no negative examples, such as an unrelated photo of a mountain. While this might seem counterintuitive, and a model trained only on positive samples might be expected to collapse, FAIR found that such models learn good representations despite the lack of negative examples. “We’ve found the training of a non-contrastive self-supervised learning framework converges to a useful local minimum but not the global trivial one. Our work attempts to show why this is,” the team stated.

The non-contrastive approach uses an extra predictor and a stop-gradient operation. Two popular non-contrastive methods, BYOL and SimSiam, have demonstrated that the predictor and stop-gradient are needed to prevent representational collapse in the model.
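A minimal sketch of this SimSiam-style recipe, with the predictor head and stop-gradient made explicit, is shown below. Module names and layer sizes are illustrative assumptions; the architectures in the actual papers are larger:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonContrastiveModel(nn.Module):
    """Toy SimSiam-style model: encoder + predictor, no negative pairs."""

    def __init__(self, encoder, dim=2048, pred_dim=512):
        super().__init__()
        self.encoder = encoder                       # e.g. a CNN backbone + projection MLP
        self.predictor = nn.Sequential(              # the extra predictor head
            nn.Linear(dim, pred_dim), nn.ReLU(inplace=True),
            nn.Linear(pred_dim, dim),
        )

    def forward(self, x1, x2):
        z1, z2 = self.encoder(x1), self.encoder(x2)  # two views of the same images
        p1, p2 = self.predictor(z1), self.predictor(z2)

        # Stop-gradient (.detach) on the target branch is what prevents a trivial collapse.
        loss = -(F.cosine_similarity(p1, z2.detach(), dim=-1).mean()
                 + F.cosine_similarity(p2, z1.detach(), dim=-1).mean()) / 2
        return loss
```

Note how the loss uses only positive pairs: each predictor output is pulled towards the detached embedding of the other view, with no negatives anywhere in the objective.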

Unlike the contrastive approach, the non-contrastive approach is simpler: it optimises a CNN to extract similar feature vectors for similar images, learning representations by minimising the distance between two views of the same image. In the cat example, the algorithm would detect characteristics such as eyeballs, fur, paws and whiskers and relate them to the cat.

PS: The story was written using a keyboard.

Avi Gopani

Avi Gopani is a technology journalist who analyses industry trends and developments from an interdisciplinary perspective at Analytics India Magazine. Her articles chronicle cultural, political and social stories, curated with a focus on the evolving technologies of artificial intelligence and data analytics.