
Facebook’s New Billion-Parameter Model Might Just Change Computer Vision Forever


Modelled loosely on the human brain, deep learning uses neural networks for object detection, speech recognition, translation, decision-making, and more. However, for deep learning, a subset of machine learning, to work optimally, a massive amount of data is required. Reducing the data dependency of deep learning is one of the top priorities of AI researchers.

Facebook vice president Yann LeCun, considered one of the godfathers of deep learning, presented the blueprint for self-supervised learning at the AAAI conference in 2020. In a recent blog post, LeCun wrote: “Practically speaking, it’s impossible to label everything in the world. There are also some tasks for which there’s simply not enough labeled data, such as training translation systems for low-resource languages. If AI systems can glean a deeper, more nuanced understanding of reality beyond what’s specified in the training data set, they’ll be more useful and ultimately bring AI closer to human-level intelligence.”

In self-supervised learning, systems don’t rely on labelled data sets to train and perform tasks. Instead, they learn directly from the raw information fed to them: text, images and so on. This approach has already proved itself in NLP, where self-supervised pretraining of huge models has led to breakthroughs in machine translation, natural language inference, and question-answering.
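The “self-supervision” in these NLP systems comes from pretext tasks that manufacture labels out of the raw text itself, for instance by hiding tokens and asking the model to recover them. The labelling step can be sketched in a few lines of Python (a toy illustration; the predictor itself is omitted, and the function name is ours, not any library’s):

```python
import random

def make_mlm_example(tokens, mask_rate=0.15, seed=0):
    """Turn raw text into a (masked input, targets) training pair.

    The 'labels' are simply the original tokens that were hidden,
    so no human annotation is needed: the data supervises itself.
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append("[MASK]")
            targets[i] = tok  # the model must recover this token
        else:
            masked.append(tok)
    return masked, targets

tokens = "self supervised learning creates labels from the data itself".split()
masked, targets = make_mlm_example(tokens, mask_rate=0.3)
print(masked)   # input with some tokens hidden
print(targets)  # position -> original token, generated for free
```

Scaled up from this toy to billions of documents, the same trick yields effectively unlimited training signal without any labelling effort.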

Now, with SEER (SElf-supERvised), Facebook has brought this approach to computer vision. SEER is a billion-parameter self-supervised computer vision model that can learn from any random group of images on the internet. These images need not be curated and labelled, steps that are otherwise a prerequisite for most computer vision training.

What Is SEER?

Self-supervised learning in NLP relies on models with up to trillions of parameters trained on enormous datasets; more data generally yields a stronger model.

In NLP, semantic concepts can be broken down into discrete words, but computer vision is a lot trickier: mapping pixels to the concepts they represent is difficult, because many images must be assessed to capture the variation around a single concept.

To efficiently scale models to work with complex and high-dimensional image data, two components are needed:

  • An algorithm that can learn from a large number of random images without metadata or annotations.
  • A convolutional network large enough to capture and learn every visual concept from the given data.

To overcome these challenges, the team at Facebook adopted SwAV, an algorithm that clusters images depicting similar concepts. With SwAV, the researchers surpassed the previous state of the art while using one-sixth of the training time.
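SwAV’s central trick is assigning each batch of image features to a set of learnable prototype vectors under an equal-partition constraint, enforced online with Sinkhorn-Knopp normalisation. A minimal NumPy sketch of that assignment step, with random vectors standing in for a real network’s embeddings and prototypes:

```python
import numpy as np

def sinkhorn(scores, eps=0.05, n_iters=3):
    """Balanced soft cluster assignments, as in SwAV's online clustering.

    Rows = images in the batch, columns = prototypes (clusters).
    Alternating row/column normalisation spreads the batch evenly
    across clusters, which prevents all images collapsing into one.
    """
    Q = np.exp(scores / eps)
    Q /= Q.sum()
    B, K = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(axis=0, keepdims=True)  # balance mass across clusters
        Q /= K
        Q /= Q.sum(axis=1, keepdims=True)  # one unit of mass per image
        Q /= B
    return Q * B  # each row now sums to 1

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16))
features /= np.linalg.norm(features, axis=1, keepdims=True)
prototypes = rng.normal(size=(4, 16))
prototypes /= np.linalg.norm(prototypes, axis=1, keepdims=True)

Q = sinkhorn(features @ prototypes.T)
print(Q.round(2))     # each row: soft assignment over the 4 prototypes
print(Q.sum(axis=1))  # rows sum to 1
```

In the full method, each image’s assignment computed from one augmented view is used as the prediction target for a different view of the same image (the “swapped prediction” loss), so the clusters themselves are never manually defined.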

Further, to train the model at this scale, the researchers used RegNet, a family of convolutional network architectures capable of scaling to billions or even trillions of parameters.
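What makes RegNets scale systematically is a simple design rule from the RegNet paper: block widths follow a linear schedule that is then quantised onto a geometric grid, producing a handful of clean stage widths. A sketch of that rule, with illustrative parameters rather than SEER’s actual configuration:

```python
import numpy as np

def regnet_widths(depth, w0, wa, wm, q=8):
    """Generate per-block widths following the RegNet design rule.

    Widths grow linearly (u_j = w0 + wa * j), then are snapped to a
    geometric grid with ratio wm and rounded to multiples of q so the
    resulting stage widths are hardware-friendly. The parameters used
    below (w0=24, wa=36, wm=2.5) are illustrative only.
    """
    u = w0 + wa * np.arange(depth)             # linear width schedule
    s = np.round(np.log(u / w0) / np.log(wm))  # nearest power of wm
    w = w0 * np.power(wm, s)                   # snap onto the grid
    return (np.round(w / q) * q).astype(int)   # round to multiples of q

widths = regnet_widths(depth=16, w0=24, wa=36, wm=2.5)
print(widths)             # non-decreasing, grouped into a few stages
print(len(set(widths)))   # number of distinct stage widths
```

Because the whole family is described by a few scalar parameters, scaling a RegNet up or down is a matter of turning those dials rather than redesigning the network by hand.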


All-Purpose Library For SEER

Facebook also open-sourced an all-purpose library for self-supervised learning called VISSL (VIsion library for state-of-the-art Self-Supervised Learning). It is a PyTorch-based library that allows self-supervised learning at both small and large scale. VISSL contains a benchmark suite and a model zoo with over 60 pre-trained models for comparing modern self-supervised learning methods.

VISSL has the following features:

  • Mixed precision from the NVIDIA Apex library, which reduces memory requirements.
  • PyTorch gradient checkpointing, which helps train models with large batch sizes.
  • The sharded optimiser from the FairScale library, which reduces memory usage.
  • Optimisations for online self-supervised learning.
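The memory saving behind mixed precision, and the reason it needs care (such as loss scaling) alongside it, can be illustrated without Apex itself. This is a NumPy illustration of the general idea, not Apex’s implementation:

```python
import numpy as np

# A batch of 256 "activations" of size 1024, standing in for the
# intermediate tensors a network must keep around during training.
acts_fp32 = np.ones((256, 1024), dtype=np.float32)
acts_fp16 = acts_fp32.astype(np.float16)

print(acts_fp32.nbytes // 1024, "KiB in fp32")  # 1024 KiB
print(acts_fp16.nbytes // 1024, "KiB in fp16")  # 512 KiB: half the memory

# The catch: fp16 has a much narrower range, so very small gradients
# flush to zero. Mixed-precision training therefore keeps a master
# fp32 copy of the weights and scales the loss up before backprop.
tiny_grad = np.float16(1e-8)
print(tiny_grad)  # underflows to 0.0 in fp16
```

Gradient checkpointing trades in the other direction: instead of storing every activation, it recomputes them during the backward pass, spending extra compute to fit larger batches in the same memory.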

Wrapping Up

Self-supervised learning eliminates the need for human annotations and metadata. Other advantages include:

  • It enables the computer vision community to work with larger and more diverse data sets.
  • It allows learning from unlabelled, random images.
  • It mitigates biases that may creep in with data curation.
  • In domains such as medical imaging, where labelled datasets are limited, SEER can help specialise models.
  • It enables faster and more accurate responses to rapid innovations in the field of computer vision.

Shraddha Goled

I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.