Tech Behind DINO, Facebook’s Open-Source ML Model For Computer Vision

Facebook AI has launched a computer vision system called DINO that segments arbitrary, unlabelled images and videos without supervision. The open-source PyTorch implementation and pre-trained models for DINO are currently available on GitHub.

DINO stands for self DIstillation with NO labels. Facebook has developed the new model in collaboration with researchers at INRIA, Sorbonne University, MILA and McGill University. DINO has set a new state of the art among self-supervised methods.

Self-supervised vision transformers (ViT), a type of machine learning model, carry explicit information about the semantic segmentation of an image and perform better than supervised ViTs and convolutional neural networks (CNNs).
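The segmentation signal comes from the model's self-attention maps: the [CLS] token's attention over patch tokens tends to concentrate on the foreground object. The toy sketch below illustrates that mechanism in pure NumPy; the token construction and scales are illustrative, and the learned query/key projections of a real ViT head are omitted for clarity.

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 8

# A toy transformer sequence: one [CLS] token plus 9 patch tokens (a 3x3 grid).
cls_tok = rng.normal(size=dim)
cls_tok *= 3.0 / np.linalg.norm(cls_tok)                     # fix the scale
object_patches = cls_tok + 0.1 * rng.normal(size=(3, dim))   # patches on the object
background = rng.normal(size=(6, dim))                       # unrelated patches
tokens = np.vstack([cls_tok, object_patches, background])

# Self-attention of the [CLS] query over all tokens (a real ViT head would
# first apply learned Q/K projections; skipped here to keep the idea visible).
scores = tokens @ cls_tok / np.sqrt(dim)
attn = np.exp(scores - scores.max())
attn /= attn.sum()

obj_attn = attn[1:4].mean()   # average attention on object patches
bg_attn = attn[4:].mean()     # average attention on background patches
print(obj_attn > bg_attn)     # prints True: the map highlights the object
```

Reshaping such per-patch attention weights back onto the image grid is what produces the segmentation-like heatmaps shown in Facebook's visuals.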



Unlike traditional supervised learning, DINO doesn’t require large volumes of annotated/labelled data. With DINO, highly accurate segmentation can be achieved through self-supervised learning and a suitable architecture.

Interestingly, DINO focuses on the salient object even in highly ambiguous situations.

Source: Facebook (Visual representation of the original image, followed by the outputs of a supervised model and the unsupervised model, DINO)

“Our model can discover and segment objects in an image or a video with absolutely no supervision,” said researchers at Facebook AI, pointing to visuals comparing the original videos with the output of DINO (self-supervised vision transformers).

Source: Facebook (Self-attention maps of neural network on videos of a puppy, a horse, a BMX rider, and a fishing boat, using DINO)

Facebook’s CTO Mike Schroepfer, while explaining the nuances of its new computer vision system, said the DINO self-supervised model is inspired by how young children learn language, physics, and more without formal instruction.

More with less

Facebook AI claims DINO lets users train models with limited computing resources. Besides launching DINO, the company has also introduced PAWS, a new model-training approach that delivers accurate results with much less compute. The open-source PyTorch implementation of PAWS (predicting view assignments with support samples) is also available on GitHub.

“When pretraining a standard ResNet-50 model with PAWS using just 1 percent of the labels in ImageNet, we get state-of-the-art accuracy while doing 10x fewer pretraining steps,” claimed Facebook researchers.
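The core idea behind PAWS, as its name suggests, is to pseudo-label augmented views of unlabeled images by comparing them against a small labeled "support" set. A minimal NumPy sketch of that view-assignment step follows; the dimensions, class structure, and temperature are all illustrative, not the paper's actual settings.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Labeled "support" embeddings: a few samples per class, standing in for the
# ~1 percent of ImageNet that carries labels.
n_classes, per_class, dim = 3, 4, 16
class_means = rng.normal(size=(n_classes, dim)) * 3
support = np.concatenate([m + rng.normal(size=(per_class, dim)) for m in class_means])
support_labels = np.eye(n_classes)[np.repeat(np.arange(n_classes), per_class)]

def assign_views(views, temp=0.1):
    """Soft nearest-neighbour classification against the support set."""
    v = views / np.linalg.norm(views, axis=-1, keepdims=True)
    s = support / np.linalg.norm(support, axis=-1, keepdims=True)
    sims = v @ s.T                    # cosine similarity to each support sample
    weights = softmax(sims / temp)    # attention over support samples
    return weights @ support_labels   # pseudo-label: weighted mix of labels

# Two augmented views of the same unlabeled image should get similar assignments;
# training then pushes their assignments to agree.
img = class_means[0]
view1 = img + 0.1 * rng.normal(size=dim)
view2 = img + 0.1 * rng.normal(size=dim)
p1, p2 = assign_views(view1[None]), assign_views(view2[None])
print(p1.argmax(), p2.argmax())   # both point to class 0
```

The training loss (not shown) simply encourages the assignments of different views of the same image to match, which is where the "predicting view assignments" in the name comes from.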

For instance, the self-supervised computer vision system trains a student network to match the output of a teacher network on different views of the same image.
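That student–teacher matching can be sketched as a toy self-distillation loop in pure NumPy. Linear projections stand in for the actual ViT backbones, and the temperatures and learning rate here are chosen for stability of the toy, not taken from the paper; the centering term and the exponential-moving-average teacher are the parts that mirror DINO's recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, temp):
    z = z / temp
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy "networks": linear maps standing in for the student and teacher ViTs.
dim_in, dim_out = 8, 4
student_w = rng.normal(size=(dim_in, dim_out))
teacher_w = student_w.copy()      # teacher starts as a copy of the student
center = np.zeros(dim_out)        # running center on teacher outputs (anti-collapse)

def dino_step(view_a, view_b, lr=0.1, ema=0.996, t_student=0.5, t_teacher=0.2):
    global student_w, teacher_w, center
    # Teacher sees one view; its sharpened, centered output is the target.
    t_out = softmax(view_a @ teacher_w - center, t_teacher)
    # Student sees another view and matches the teacher via cross-entropy.
    s_out = softmax(view_b @ student_w, t_student)
    loss = -(t_out * np.log(s_out + 1e-9)).sum(axis=-1).mean()
    # Gradient of softmax cross-entropy w.r.t. the student weights.
    grad = view_b.T @ (s_out - t_out) / (len(view_b) * t_student)
    student_w -= lr * grad
    # Teacher receives no gradients: it is an EMA of the student.
    teacher_w = ema * teacher_w + (1 - ema) * student_w
    center = 0.9 * center + 0.1 * (view_a @ teacher_w).mean(axis=0)
    return loss

x = rng.normal(size=(16, dim_in))                  # a batch of "images"
view_a = x + 0.05 * rng.normal(size=x.shape)       # two augmented views
view_b = x + 0.05 * rng.normal(size=x.shape)
losses = [dino_step(view_a, view_b) for _ in range(50)]
print(losses[0] > losses[-1])                      # the student converges to the teacher
```

The asymmetry (sharper teacher temperature, centering, and no teacher gradients) is what lets the method learn without labels instead of collapsing to a trivial constant output.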

Source: Facebook 

Similarly, in another example where the model was asked to recognise duplicate images, Facebook’s DINO outperformed existing models, even though it was not trained to solve that particular problem. DINO achieved higher accuracy than ViT trained on ImageNet and MultiGrain, which until now were considered to have the highest accuracy for duplicate detection.
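Copy detection with a frozen backbone reduces to nearest-neighbour search in feature space: embed every image once, then retrieve the stored image whose feature is most similar to the query's. The sketch below is a minimal illustration of that retrieval step; the index, feature dimension, and perturbation are hypothetical stand-ins, not Facebook's pipeline.

```python
import numpy as np

rng = np.random.default_rng(3)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical precomputed features for an image index (in practice these
# would come from a frozen DINO backbone, one vector per image).
index = {f"img_{i}": rng.normal(size=32) for i in range(100)}

# A near-duplicate query: the stored feature plus a small perturbation,
# standing in for a crop or re-encode of the same photo.
query = index["img_7"] + 0.05 * rng.normal(size=32)

best = max(index, key=lambda k: cosine(index[k], query))
print(best)  # -> img_7
```

Because near-duplicates land very close together in the embedding space while unrelated images are nearly orthogonal, a simple cosine-similarity threshold separates true from false positives.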

Source: Facebook (The image showcases how DINO can recognise near-duplicate images taken from the Flickr dataset, where red and green outlined images indicate false and true positives)  

Additionally, Facebook’s new model discovers object parts and shared characteristics across images, and learns to categorise and structure images into groups based on properties such as animal species or biological taxonomy.

Source: Facebook (Feature representation of unlabelled data)

Lately, Facebook has been bullish on open-source frameworks, tools, libraries and models for research and developers to deploy in large-scale production. 

Last month, Facebook launched Flashlight, an open-source machine learning library written entirely in C++ that lets researchers run AI applications seamlessly. It is currently available on GitHub.

A year before that, Facebook had launched an open-source graph transformer networks (GTN) framework for effectively training graph-based learning models.

Amit Raja Naik
Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.
