Tech Behind DINO, Facebook’s Open-Source ML Model For Computer Vision

Facebook AI has launched a computer vision system called DINO that segments unlabelled, random images and videos without supervision. The open-source PyTorch implementation and pre-trained models for DINO are currently available on GitHub.

DINO stands for self-DIstillation with NO labels. Facebook developed the new model in collaboration with researchers at INRIA, Sorbonne University, MILA and McGill University. According to Facebook, DINO has set a new state of the art among self-supervised methods.

Self-supervised vision transformers (ViTs), a type of machine learning model, carry explicit information about the semantic segmentation of an image and perform better than supervised ViTs and convolutional neural networks (CNNs).

Unlike traditional supervised learning, DINO doesn’t require large volumes of annotated/labelled data. With DINO, highly accurate segmentation can be achieved with self-supervised learning and a suitable architecture.

Interestingly, DINO focuses on the foreground object even in highly ambiguous situations.

Source: Facebook (The original image, followed by the output of a supervised model and of the unsupervised model, DINO)

“Our model can discover and segment objects in an image or a video with absolutely no supervision,” said researchers at Facebook AI, pointing to visuals that compare the original video with the output of DINO (a self-supervised vision transformer).

Source: Facebook (Self-attention maps of neural network on videos of a puppy, a horse, a BMX rider, and a fishing boat, using DINO)

Facebook’s CTO Mike Schroepfer, while explaining the nuances of the new computer vision system, said the DINO self-supervised model is inspired by how young children learn language, physics, and more without formal instruction.

More with less

Facebook AI also claimed that its latest work lets users train models using limited computing resources. Besides launching DINO, the company has introduced PAWS (predicting view assignments with support samples), a new model-training approach that delivers accurate results with much less compute. The open-source PyTorch implementation of PAWS is also available on GitHub.

“When pretraining a standard ResNet-50 model with PAWS using just 1 percent of the labels in ImageNet, we get state-of-the-art accuracy while doing 10x fewer pretraining steps,” claimed Facebook researchers.
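The name PAWS hints at how a small labelled set can be used: an unlabelled view is assigned a soft label based on its similarity to a handful of labelled "support" samples. The Python sketch below illustrates only that idea. The function names, the toy 2-D embeddings and the temperature of 0.1 are assumptions made for illustration, and the code is not taken from Facebook's implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def soft_pseudo_label(view, support_embeddings, support_labels,
                      num_classes, temperature=0.1):
    """Similarity-weighted average of the support samples' labels.

    Similarities are turned into weights with a temperature-scaled softmax,
    then each weight is added to the class of its support sample.
    """
    scores = [cosine(view, s) / temperature for s in support_embeddings]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    pseudo = [0.0] * num_classes
    for weight, label in zip(weights, support_labels):
        pseudo[label] += weight / total
    return pseudo

# Hypothetical 2-D embeddings: two labelled support samples per class.
support_embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
support_labels = [0, 0, 1, 1]

# An unlabelled view that lies close to the class-0 support samples.
pseudo = soft_pseudo_label([0.8, 0.3], support_embeddings, support_labels,
                           num_classes=2)
```

In a full training loop, the soft label of one view of an image would serve as the target for another view of the same image; that step is left out here to keep the sketch short.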

DINO, for its part, trains a student network by matching its output to that of a teacher network over different views of the same image.
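The student-teacher matching can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not Facebook's code: the centering of the teacher output, the two temperatures (0.1 and 0.04) and the momentum value (0.996) are assumptions drawn from the published description of the method rather than from this article, and the function names are hypothetical.

```python
import math

def log_softmax(logits, temperature):
    """Temperature-scaled log-softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    log_total = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_total for x in scaled]

def distillation_loss(student_logits, teacher_logits, center,
                      student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between the teacher and student distributions.

    The teacher output is centered and sharpened with a low temperature.
    It acts as a fixed target, so only the student would receive gradients.
    """
    centered = [t - c for t, c in zip(teacher_logits, center)]
    teacher_probs = [math.exp(v) for v in log_softmax(centered, teacher_temp)]
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * lq for p, lq in zip(teacher_probs, student_log_probs))

def ema_update(teacher_weights, student_weights, momentum=0.996):
    """Move the teacher weights toward the student (exponential moving average)."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_weights, student_weights)]

# Toy outputs for two different views of the same image.
student_view = [2.0, 0.5, -1.0]
teacher_view = [1.8, 0.4, -0.9]
center = [0.0, 0.0, 0.0]
loss = distillation_loss(student_view, teacher_view, center)
```

The loss is small when the student assigns high probability to the output the teacher favours and grows when the two disagree. No labels appear anywhere in the computation.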

Source: Facebook 

Similarly, in another example where the model was asked to recognise duplicate images, Facebook’s DINO outperformed existing models, even though it was not trained to solve that particular problem. DINO achieved higher accuracy than a ViT trained on ImageNet and MultiGrain, which had so far been considered the most accurate for duplicate detection.
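A common way to use learned features for duplicate detection is to compare the feature vectors of two images and flag pairs whose similarity is high. The Python sketch below assumes cosine similarity and a threshold of 0.9; both are illustrative choices, and the file names and three-dimensional vectors are hypothetical stand-ins for the embeddings a pre-trained model would produce.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def find_near_duplicates(features, threshold=0.9):
    """Return (name_a, name_b, similarity) for every pair at or above the threshold."""
    names = sorted(features)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = cosine_similarity(features[first], features[second])
            if score >= threshold:
                pairs.append((first, second, score))
    return pairs

# Hypothetical feature vectors standing in for a model's image embeddings.
features = {
    "beach_original.jpg": [0.9, 0.1, 0.3],
    "beach_cropped.jpg": [0.85, 0.15, 0.35],
    "city_night.jpg": [0.1, 0.9, 0.2],
}
duplicates = find_near_duplicates(features)
```

Here only the two beach images are flagged as near-duplicates. The pairwise loop is quadratic in the number of images, so a large collection would need an approximate nearest-neighbour index instead.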

Source: Facebook (The image showcases how DINO can recognise near-duplicate images taken from the Flickr dataset, where red and green outlined images indicate false and true positives)  

Additionally, Facebook’s new model discovers object parts and shared characteristics across images, and learns to categorise and structure images into groups based on properties such as animal species, in a way that resembles biological taxonomy.

Source: Facebook (Feature representation of unlabelled data)

Lately, Facebook has been bullish on open-source frameworks, tools, libraries and models for researchers and developers to deploy in large-scale production.

Last month, Facebook launched an open-source machine learning library called Flashlight that lets researchers execute AI applications seamlessly using C++. The library, written entirely in C++, is currently available on GitHub.

A year before that, Facebook had launched an open-source graph transformer networks (GTN) framework for effectively training graph-based learning models.

Amit Raja Naik
Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.