Facebook’s New Billion-Parameter Model Might Just Change Computer Vision Forever

Published on March 8, 2021

by Shraddha Goled

Loosely inspired by the human brain, deep learning uses neural networks for object detection, speech recognition, translation, decision-making, and more. However, deep learning, a subset of machine learning, needs massive amounts of data to work well. Reducing this data dependency is one of the top priorities of AI researchers.

Yann LeCun, Facebook's vice president and chief AI scientist, considered one of the godfathers of deep learning, presented his blueprint for self-supervised learning at the AAAI conference in 2020. In a recent blog post, LeCun wrote: “Practically speaking, it’s impossible to label everything in the world. There are also some tasks for which there’s simply not enough labeled data, such as training translation systems for low-resource languages. If AI systems can glean a deeper, more nuanced understanding of reality beyond what’s specified in the training data set, they’ll be more useful and ultimately bring AI closer to human-level intelligence”.

In self-supervised learning, systems don’t rely on labelled data sets to train and perform tasks. Instead, they learn directly from the raw information fed to them, such as text or images. This approach has already been used in NLP, where self-supervised pretraining of huge models has led to breakthroughs in machine translation, natural language inference, and question-answering.
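
To make the idea concrete, here is a minimal Python sketch of one classic self-supervised pretext task, rotation prediction. It is not what SEER uses (SEER relies on SwAV, discussed below); it only illustrates how training labels can be generated from the data itself. Each unlabelled image, here a toy 2-D grid of pixel values, is rotated by a random multiple of 90 degrees, and the number of rotations becomes the label a model would learn to predict. The function names are hypothetical.

```python
import random


def rotate90(img):
    """Rotate a 2-D image (a list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rotate(img, k):
    """Apply k clockwise 90-degree rotations (k is taken modulo 4)."""
    for _ in range(k % 4):
        img = rotate90(img)
    return img


def make_rotation_examples(images, seed=0):
    """Turn unlabelled images into (input, label) training pairs.

    The label k in {0, 1, 2, 3} is the number of clockwise 90-degree
    rotations applied, so it is derived from the data, not from a human.
    """
    rng = random.Random(seed)
    examples = []
    for img in images:
        k = rng.randrange(4)
        examples.append((rotate(img, k), k))
    return examples


if __name__ == "__main__":
    unlabelled = [[[1, 2], [3, 4]], [[5, 6, 7], [8, 9, 10]]]
    for x, k in make_rotation_examples(unlabelled):
        print(f"label={k} input={x}")
```

A network trained to recover k from the rotated input has to pick up on object shape and orientation, which is the kind of label-free signal the paragraph above describes.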

Now, with SEER (SElf-supERvised), Facebook has brought this approach to computer vision. SEER is a billion-parameter self-supervised computer vision model that can learn from any random group of images on the internet, without the curation and labelling that are otherwise a prerequisite for most computer vision training.

What Is SEER?

Self-supervised NLP models are trained with billions, and in some cases trillions, of parameters on massive datasets. More data generally yields a better model.

In NLP, semantic concepts can be broken down into discrete words, but computer vision is a lot trickier. Mapping pixels to the concepts they depict is hard, because a model must see many images to understand the variation within a single concept.

To efficiently scale models to work with complex, high-dimensional image data, two components are needed:

An algorithm that can learn from a large number of random images without any metadata or annotations
A convolutional network (ConvNet) large enough to capture and learn every visual concept from that data

For the first component, the team at Facebook adopted SwAV, an algorithm that uses online clustering to rapidly group images with similar visual concepts and exploit their similarities. With SwAV, the researchers surpassed the previous state of the art in self-supervised learning with six times less training time.
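
The following is a heavily simplified, pure-Python sketch of the swapped-prediction idea behind SwAV, written for illustration rather than taken from Facebook's code. Two augmented views of the same images are scored against a set of prototype vectors (cluster centres); the Sinkhorn-Knopp algorithm turns one view's scores into balanced soft cluster assignments ("codes"), and the other view is trained to predict those codes. The settings eps=0.05, temp=0.1 and three Sinkhorn iterations are assumptions meant to resemble commonly cited SwAV defaults; the toy embeddings stand in for ConvNet outputs, and multi-crop augmentation, learned prototypes and gradients are all omitted. All function names are hypothetical.

```python
import math
import random


def sinkhorn(scores, eps=0.05, n_iters=3):
    """Turn a B x K score matrix into soft cluster codes (Sinkhorn-Knopp).

    Rows of the result sum to 1; the alternating normalisation pushes every
    prototype to receive roughly equal mass across the batch, which keeps
    all images from collapsing into a single cluster.
    """
    B, K = len(scores), len(scores[0])
    top = max(max(row) for row in scores)  # shift for numerical stability
    Q = [[math.exp((s - top) / eps) for s in row] for row in scores]
    total = sum(map(sum, Q))
    Q = [[q / total for q in row] for row in Q]
    for _ in range(n_iters):
        # Give each prototype (column) total mass 1/K.
        col = [sum(Q[b][k] for b in range(B)) for k in range(K)]
        Q = [[Q[b][k] / (col[k] * K) for k in range(K)] for b in range(B)]
        # Give each image (row) total mass 1/B.
        for b in range(B):
            row_sum = sum(Q[b])
            Q[b] = [q / (row_sum * B) for q in Q[b]]
    return [[q * B for q in row] for row in Q]


def softmax(row, temp):
    top = max(row)
    exps = [math.exp((x - top) / temp) for x in row]
    z = sum(exps)
    return [e / z for e in exps]


def swapped_loss(scores_a, scores_b, temp=0.1):
    """SwAV-style swapped prediction between two augmented views.

    Codes from view A are the targets for view B's predictions and vice
    versa. In real training the codes are treated as constants (no gradient).
    """
    q_a, q_b = sinkhorn(scores_a), sinkhorn(scores_b)
    loss = 0.0
    for i in range(len(scores_a)):
        p_a = softmax(scores_a[i], temp)
        p_b = softmax(scores_b[i], temp)
        loss -= sum(q * math.log(p) for q, p in zip(q_b[i], p_a))
        loss -= sum(q * math.log(p) for q, p in zip(q_a[i], p_b))
    return loss / (2 * len(scores_a))


def cosine_scores(embeddings, prototypes):
    """Dot products between L2-normalised embeddings and prototypes."""
    def unit(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]
    E, C = [unit(e) for e in embeddings], [unit(c) for c in prototypes]
    return [[sum(x * y for x, y in zip(e, c)) for c in C] for e in E]


if __name__ == "__main__":
    rng = random.Random(0)
    prototypes = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
    view_a = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(6)]
    # A second "augmented view": the same embeddings plus small noise.
    view_b = [[x + 0.1 * rng.gauss(0, 1) for x in v] for v in view_a]
    print(swapped_loss(cosine_scores(view_a, prototypes),
                       cosine_scores(view_b, prototypes)))
```

Because the targets come from clustering rather than from pairwise comparisons between every image, this style of objective avoids the large sets of negative pairs that contrastive methods rely on, which is part of why it scales to huge collections of random images.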

For the second, to train the model at such a large scale, the researchers used RegNet, a family of ConvNet architectures that can scale to billions, and potentially trillions, of parameters.
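
RegNet models are not designed block by block: in the paper that introduced them ("Designing Network Design Spaces", 2020), a handful of parameters generate the whole network. The sketch below follows that paper's width rule as we understand it. Block widths start out linear in the block index, u_j = w_0 + w_a·j, are snapped to w_0 times the nearest power of a multiplier w_m, and are rounded to a multiple of 8; consecutive blocks with equal width form a stage. The example parameters are assumed to match the small RegNetX-200MF configuration, which is far smaller than the model SEER trained; group-width adjustments from the reference implementation are omitted, and the function names are hypothetical.

```python
import math
from itertools import groupby


def regnet_widths(w_0, w_a, w_m, depth, q=8):
    """Per-block widths from RegNet's quantised linear width rule."""
    widths = []
    for j in range(depth):
        u = w_0 + w_a * j                                # linear width
        s = round(math.log(u / w_0) / math.log(w_m))     # nearest power of w_m
        w = w_0 * w_m ** s                               # snapped width
        widths.append(int(round(w / q)) * q)             # multiple of q
    return widths


def to_stages(widths):
    """Group consecutive equal widths into (width, number_of_blocks) stages."""
    return [(w, len(list(blocks))) for w, blocks in groupby(widths)]


if __name__ == "__main__":
    # Assumed RegNetX-200MF-like parameters: depth 13, w_0=24, w_a=36.44, w_m=2.49.
    widths = regnet_widths(w_0=24, w_a=36.44, w_m=2.49, depth=13)
    print(to_stages(widths))  # [(24, 1), (56, 1), (152, 4), (368, 7)]
```

Scaling up then means turning a few knobs (depth, w_0, w_a, w_m) rather than redesigning the architecture, which is what makes the family convenient for billion-parameter models.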


An All-Purpose Library For Self-Supervised Learning

Facebook also open-sourced an all-purpose library for self-supervised learning called VISSL (VIsion library for state-of-the-art Self-Supervised Learning). It is a PyTorch-based library that supports self-supervised learning at both small and large scale. VISSL includes a benchmark suite and a model zoo with more than 60 pretrained models for comparing modern self-supervised learning methods.

VISSL has the following features:

Mixed precision from the NVIDIA Apex library, which reduces memory requirements
Gradient checkpointing from PyTorch, which helps train models with large batch sizes
The sharded optimiser from the FairScale library, which reduces memory usage
Optimisations specific to online self-supervised training
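
Why do these features matter at this scale? The back-of-the-envelope Python sketch below uses a deliberately simplified memory model, which is an assumption and not how VISSL, Apex or FairScale actually account for memory: fp32 weights and gradients cost 8 bytes per parameter; an Adam-style optimiser adds two fp32 moments (8 bytes per parameter), which a sharded optimiser splits across GPUs; stored activations cost 4 bytes in fp32 or 2 bytes under mixed precision; and gradient checkpointing keeps only a fraction of the activations, recomputing the rest during the backward pass. The activation count, GPU count and keep fraction in the example are made up for illustration, and the function name is hypothetical.

```python
def per_gpu_memory_gb(n_params, n_activations, n_gpus=1, mixed_precision=False,
                      shard_optimizer=False, keep_fraction=1.0):
    """Back-of-the-envelope training memory per GPU, in GB.

    Assumed accounting (illustrative only):
      * fp32 weights + fp32 gradients: 8 bytes per parameter
      * Adam-style optimiser state (two fp32 moments): 8 bytes per parameter,
        split evenly across GPUs when the optimiser is sharded
      * stored activations: 4 bytes each in fp32, 2 bytes in fp16 (mixed
        precision); gradient checkpointing keeps only `keep_fraction` of them
    Ignores extra fp16 weight copies, buffers, workspaces and fragmentation.
    """
    weights_and_grads = 8 * n_params
    optimizer_state = 8 * n_params
    if shard_optimizer:
        optimizer_state /= n_gpus  # each GPU holds only its shard
    bytes_per_activation = 2 if mixed_precision else 4
    activations = bytes_per_activation * n_activations * keep_fraction
    return (weights_and_grads + optimizer_state + activations) / 1e9


if __name__ == "__main__":
    N_PARAMS = 1_000_000_000  # a "billion-parameter" model
    N_ACTS = 2_000_000_000    # hypothetical number of stored activations
    configs = [
        ("fp32 baseline", {}),
        ("+ mixed precision", {"mixed_precision": True}),
        ("+ sharded optimiser", {"mixed_precision": True, "shard_optimizer": True}),
        ("+ checkpointing", {"mixed_precision": True, "shard_optimizer": True,
                             "keep_fraction": 0.25}),
    ]
    for label, options in configs:
        gb = per_gpu_memory_gb(N_PARAMS, N_ACTS, n_gpus=8, **options)
        print(f"{label:<22}{gb:5.1f} GB")
```

With these invented inputs, the estimate falls from 24 GB per GPU for the fp32 baseline to 20 GB with mixed precision, 13 GB once the optimiser state is sharded over eight GPUs, and 10 GB with checkpointing. The exact figures mean little; the point is that each feature attacks a different term in the memory budget.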

Wrapping Up

Self-supervised learning eliminates the need for human annotations and metadata. Other advantages include:

It enables the computer vision community to work with larger and more diverse data sets
It lets models learn from random, unlabelled images
It can mitigate biases that may creep in during data curation
It can help specialise models in domains with limited data, such as medical imaging
Because no labelling is needed up front, models can be built and deployed faster, enabling quicker and more accurate responses to rapidly evolving situations

Shraddha Goled

I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.
