Facebook’s New Billion-Parameter Model Might Just Change Computer Vision Forever


Much like the human brain, deep learning uses neural networks for object detection, speech recognition, translation, decision-making, and more. However, deep learning, a subset of machine learning, requires massive amounts of data to work well. Reducing this data dependency is one of the top priorities of AI researchers.

Facebook vice president Yann LeCun, considered one of the godfathers of deep learning, presented the blueprint for self-supervised learning at the AAAI conference in 2020. In a recent blog, LeCun wrote: “Practically speaking, it’s impossible to label everything in the world. There are also some tasks for which there’s simply not enough labeled data, such as training translation systems for low-resource languages. If AI systems can glean a deeper, more nuanced understanding of reality beyond what’s specified in the training data set, they’ll be more useful and ultimately bring AI closer to human-level intelligence”.

In self-supervised learning, systems don't rely on labelled datasets for training. Instead, they learn from the information fed to them: text, images, and so on. This approach has already proved itself in NLP, where self-supervised pretraining of huge models has led to breakthroughs in machine translation, natural language inference, and question answering.


Now, with SEER (SElf-supERvised), Facebook has brought this approach to computer vision. SEER is a billion-parameter self-supervised computer vision model that can learn from any collection of images on the internet. These images need not be curated and labelled, otherwise a prerequisite for most computer vision training.

What Is SEER?

Self-supervised learning in NLP involves training huge models on massive text datasets; in general, more data yields a superior model.

In NLP, semantic concepts can be broken down into discrete words, but computer vision is a lot trickier. Matching pixels to their corresponding concepts is hard because many images must be assessed to understand the variation around a single concept.

To efficiently scale models to work with complex and high-dimensional image data, two components are needed:

  • An algorithm that can learn from a large number of random images without metadata or annotations
  • A convolutional network large enough to capture and learn every visual concept from the given data

To overcome these challenges, the team at Facebook adopted SwAV, an algorithm that groups images associated with similar concepts. With SwAV, the researchers surpassed the previous state-of-the-art performance while requiring six times less training time.
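The core idea behind SwAV is "swapped prediction": images are softly assigned to a set of learnable cluster prototypes, and each augmented view of an image must predict the cluster assignment computed from the other view. The toy NumPy sketch below illustrates that idea only; it is not Facebook's implementation, and the real algorithm uses a Sinkhorn equal-partition step (approximated here by a plain softmax) plus gradient training of the prototypes.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy setup: 8 images, 16-dim embeddings, 4 prototype clusters.
B, D, K = 8, 16, 4
prototypes = l2_normalize(rng.normal(size=(K, D)))  # learnable cluster centres

# Two augmented "views" of the same batch (here: noisy copies of one embedding).
base = rng.normal(size=(B, D))
view1 = l2_normalize(base + 0.1 * rng.normal(size=(B, D)))
view2 = l2_normalize(base + 0.1 * rng.normal(size=(B, D)))

# Scores: similarity of each embedding to each prototype.
s1 = view1 @ prototypes.T
s2 = view2 @ prototypes.T

# "Codes" (soft cluster assignments) per view; real SwAV computes these with
# a Sinkhorn step that spreads images evenly across clusters.
q1 = softmax(s1 / 0.05)
q2 = softmax(s2 / 0.05)

# Swapped prediction: predict view2's code from view1's scores and vice versa.
p1 = softmax(s1 / 0.1)
p2 = softmax(s2 / 0.1)
loss = -0.5 * (np.sum(q2 * np.log(p1)) + np.sum(q1 * np.log(p2))) / B
print(round(float(loss), 4))
```

Minimising this loss pulls both views of the same image toward the same cluster, which is how SwAV groups images with similar concepts without ever seeing a label.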

Further, to train the model at this scale, the researchers used RegNet, a family of convolutional network architectures capable of scaling up to trillions of parameters.
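What makes RegNets easy to scale is that an entire network is generated from a handful of numbers: per-block widths follow a simple linear schedule that is then quantised, as described in the original "Designing Network Design Spaces" paper. The sketch below computes such a width schedule; the example parameters are illustrative and are not SEER's actual configuration.

```python
import numpy as np

def regnet_widths(d, w_0, w_a, w_m, q=8):
    """Per-block widths from the RegNet linear parameterisation
    (Radosavovic et al., 'Designing Network Design Spaces')."""
    j = np.arange(d)
    u = w_0 + w_a * j                             # linear width schedule
    s = np.round(np.log(u / w_0) / np.log(w_m))   # quantise to powers of w_m
    w = w_0 * w_m ** s
    return (q * np.round(w / q)).astype(int)      # snap to multiples of q

# Illustrative parameters (not SEER's actual configuration):
widths = regnet_widths(d=16, w_0=48, w_a=36.0, w_m=2.5)
print(widths)
```

Because capacity is controlled by these few scalars, scaling a RegNet up or down is a matter of turning dials rather than redesigning the architecture by hand.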


All-Purpose Library For SEER

Facebook also open-sourced an all-purpose library for self-supervised learning called VISSL (VIsion library for state-of-the-art Self-Supervised Learning). It is a PyTorch-based library that allows self-supervised learning at both small and large scale. VISSL contains a benchmark suite and a model zoo with over 60 pre-trained models for comparing modern self-supervised learning methods.

VISSL has the following features:

  • Mixed precision from the NVIDIA Apex library, which reduces memory requirements
  • PyTorch gradient checkpointing, which enables training with larger batch sizes
  • A sharded optimiser from the FairScale library, which reduces memory usage
  • Optimisations for online self-supervised learning

Wrapping Up

Self-supervised learning eliminates the need for human annotations and metadata. Other advantages include:

  • It enables the computer vision community to work with larger and more diverse datasets
  • Models can learn from random, unlabelled images
  • It mitigates biases that may creep in during data curation
  • In fields such as medical imaging, where labelled datasets are scarce, SEER can help specialise models
  • It enables faster and more accurate responses to rapid innovation in computer vision
Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.
