
Tech Behind DINO, Facebook’s Open-Source ML Model For Computer Vision


Facebook AI has launched DINO, a computer vision system that segments arbitrary unlabelled images and videos without supervision. The open-source PyTorch implementation and pre-trained models for DINO are currently available on GitHub.

DINO stands for self DIstillation with NO labels. Facebook has developed the new model in collaboration with researchers at INRIA, Sorbonne University, MILA and McGill University. DINO has set a new state of the art among self-supervised methods.

Self-supervised vision transformers (ViTs), a type of machine learning model, carry explicit information about the semantic segmentation of an image and perform better than supervised ViTs and convolutional neural networks (CNNs).

Unlike traditional supervised learning, DINO doesn’t require large volumes of annotated/labelled data. With DINO, highly accurate segmentation can be achieved using self-supervised learning and a suitable architecture.

Interestingly, DINO focuses on the near object even in highly ambiguous situations. 

Source: Facebook (The original image, followed by the output of a supervised model and of the unsupervised model, DINO)

“Our model can discover and segment objects in an image or a video with absolutely no supervision,” said researchers at Facebook AI, pointing to visuals that compare the original videos with the output of DINO (a self-supervised vision transformer).

Source: Facebook (Self-attention maps of neural network on videos of a puppy, a horse, a BMX rider, and a fishing boat, using DINO)

Facebook’s CTO Mike Schroepfer, while explaining the nuances of the new computer vision system, said the DINO self-supervised model is inspired by how young children learn language, physics, and more without formal instruction.

More with less

Facebook AI claimed that its new methods let users train models using limited computing resources. Besides launching DINO, the company has also introduced PAWS (predicting view assignments with support samples), a new model-training approach that delivers accurate results with much less compute. The open-source PyTorch implementation of PAWS is also available on GitHub.

“When pretraining a standard ResNet-50 model with PAWS using just 1 percent of the labels in ImageNet, we get state-of-the-art accuracy while doing 10x fewer pretraining steps,” claimed Facebook researchers.

DINO, for instance, trains a student network to match the output of a teacher network over different views of the same image.

Source: Facebook 
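The student-teacher matching described above can be illustrated with a small, framework-free sketch. It is not Facebook's implementation. The function names, the temperatures, the centering term and the momentum update of the teacher are assumptions that this article does not state, and the numbers are illustrative only. The idea shown is that the loss is small when the student's prediction for one view agrees with the teacher's prediction for another view of the same image.

```python
import math

def log_softmax(logits, temperature):
    """Temperature-scaled log-softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    log_total = math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - peak - log_total for x in scaled]

def distillation_loss(student_logits, teacher_logits, center,
                      student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy from the teacher distribution to the student distribution.

    The teacher output is centered and sharpened with a lower temperature
    (assumed values); the teacher is treated as a fixed target.
    """
    centered = [t - c for t, c in zip(teacher_logits, center)]
    teacher_probs = [math.exp(v) for v in log_softmax(centered, teacher_temp)]
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * lq for p, lq in zip(teacher_probs, student_log_probs))

def ema_update(teacher_weights, student_weights, momentum=0.996):
    """Move teacher weights slowly towards the student (assumed update rule)."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_weights, student_weights)]

# Hypothetical network outputs for two views of the same image.
teacher_view_a = [2.0, 0.5, -1.0]
student_view_b_agrees = [2.0, 0.5, -1.0]
student_view_b_disagrees = [-1.0, 0.5, 2.0]
center = [0.0, 0.0, 0.0]

loss_agree = distillation_loss(student_view_b_agrees, teacher_view_a, center)
loss_disagree = distillation_loss(student_view_b_disagrees, teacher_view_a, center)
print(loss_agree < loss_disagree)
```

In a real training loop only the student would be updated by gradient descent on this loss, while the teacher would follow it through something like `ema_update`; that division of labour is again an assumption of this sketch.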

Similarly, in another example where the model was asked to recognise duplicate images, Facebook’s DINO outperformed existing models, even though it was not trained to solve that particular problem. DINO was more accurate than both a ViT trained on ImageNet and MultiGrain, which had so far been considered the most accurate approaches for duplicate detection.

Source: Facebook (The image showcases how DINO can recognise near-duplicate images taken from the Flickr dataset, where red and green outlined images indicate false and true positives)  
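To make the duplicate-detection idea concrete, here is a minimal sketch of how near-duplicates could be flagged once each image has been turned into a feature vector. The article does not say how matches are scored, so cosine similarity with a fixed threshold is an assumption. The file names, the three-dimensional vectors and the `find_near_duplicates` helper are hypothetical; real features would come from a pre-trained model and have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def find_near_duplicates(features, threshold=0.9):
    """Return name pairs whose features have cosine similarity >= threshold."""
    names = sorted(features)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if cosine_similarity(features[first], features[second]) >= threshold:
                pairs.append((first, second))
    return pairs

# Hypothetical features: two shots of the same beach and one unrelated image.
features = {
    "beach.jpg": [0.9, 0.1, 0.2],
    "beach_cropped.jpg": [0.88, 0.12, 0.25],
    "city.jpg": [0.1, 0.95, 0.3],
}

duplicates = find_near_duplicates(features)
print(duplicates)
```

The pairwise loop is quadratic in the number of images, which is fine for a demonstration; a large collection would need an approximate nearest-neighbour index instead.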

Additionally, Facebook’s new model discovers object parts and shared characteristics across images, and learns to categorise and structure images into groups based on visual properties, for example separating animal species in a way that resembles biological taxonomy.

Source: Facebook (Feature representation of unlabelled data)

Lately, Facebook has been bullish on open-source frameworks, tools, libraries and models for researchers and developers to deploy in large-scale production.

Last month, Facebook launched Flashlight, an open-source machine learning library that lets researchers execute AI applications seamlessly using C++. The library, written entirely in C++, is currently available on GitHub.

A year before that, Facebook had launched an open-source graph transformer networks (GTN) framework for effectively training graph-based learning models.


Amit Raja Naik

Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.