TensorFlow Launches A New Library To Train Similarity Models

TensorFlow recently released the first version of TensorFlow Similarity, a library for similarity learning, also known as metric learning and contrastive learning. It offers SOTA algorithms for metric learning and all the necessary components to research, train, evaluate, and serve similarity-based models. 

Check out the source code for TensorFlow Similarity here

The ability to search for related objects has many real-world applications, from finding similar-looking clothes and identifying the genre of a song currently playing to helping rescue missing pets. In addition, quickly searching for related items is a vital part of many core information systems, including recommendation engines, multimedia search, and clustering pipelines.


Thanks to TensorFlow Similarity, you can now train and serve models that find similar items (such as images) in a large corpus of samples. For example, as shown below, you can train a similarity model to find and cluster similar-looking images of dogs and cats from the Oxford-IIIT Pet Dataset by training on only a few of its classes.

Check out this notebook to train your own similarity model. 

Examples of nearest neighbour searches performed on the embeddings generated by a similarity model trained on the Oxford-IIIT Pet Dataset. (Source: TensorFlow)

How is TensorFlow Similarity different? 

Metric learning differs from traditional classification in its objective: the model learns to minimise the distance between similar examples and maximise the distance between dissimilar examples, in a supervised or self-supervised manner. TensorFlow Similarity provides the necessary losses, metrics, samplers, visualisers, and indexing sub-systems to make this quick and easy.

Similarity models learn to produce embeddings that project items into a metric space, where similar objects sit close together and dissimilar objects sit far apart.

(Source: TensorFlow)
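To make the idea concrete, here is a minimal sketch in plain Python. The 2-D vectors are made up purely for illustration; in practice a trained similarity model would produce the embeddings.

```python
import math

# Hypothetical 2-D embeddings for two cat images and one dog image.
cat_a = (0.9, 0.1)
cat_b = (0.8, 0.2)
dog_a = (0.1, 0.9)

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

# Similar items sit close together; dissimilar items sit far apart.
print(euclidean(cat_a, cat_b) < euclidean(cat_a, dog_a))  # True
```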

Behind the scenes, many of these systems are powered by deep learning models trained using contrastive learning, which teaches the model to learn an embedding space in which similar objects are close while dissimilar ones are far apart. For example, images belonging to the same class or breed are pulled together, while distinct classes are pushed apart (as shown in the image below).

Images belonging to the same animal breed are pulled together while different breeds are pushed apart. (Source: TensorFlow)
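This pull/push behaviour is what a contrastive loss encodes. As a simplified sketch of the classic pairwise form (not TensorFlow Similarity's actual API, which ships its own loss classes):

```python
def contrastive_loss(distance, same_class, margin=1.0):
    """Classic pairwise contrastive loss: same-class pairs are penalised for
    being far apart (pulled together); different-class pairs are penalised
    only while they are closer than the margin (pushed apart)."""
    if same_class:
        return distance ** 2
    return max(0.0, margin - distance) ** 2

# A same-class pair that is far apart incurs a loss, pulling it together.
print(contrastive_loss(0.9, same_class=True) > 0)   # True
# A different-class pair already beyond the margin incurs no loss.
print(contrastive_loss(1.2, same_class=False))      # 0.0
```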

Here’s how it works

When applying TensorFlow Similarity to an entire dataset, contrastive losses allow the model to learn how to project items into the embedding space. The distances between embeddings represent how similar the input examples are. In the end, you will have a well-clustered space where the distance between dissimilar items is large and the distance between similar items is small.

Once the model is trained, the next step involves building an index that contains the embeddings of the various items you want to make searchable. At query time, TensorFlow Similarity uses fast approximate nearest neighbour search (ANN) to retrieve the closest matching objects from the index in sub-linear time. 
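The query step can be sketched with a toy in-memory index and a brute-force scan; a real deployment would use an ANN backend rather than this exhaustive search, and the labels and vectors here are invented for illustration.

```python
import math

# Toy index mapping item labels to their (made-up) embeddings.
index = {
    "beagle_01": (0.10, 0.90),
    "beagle_02": (0.15, 0.85),
    "tabby_01": (0.90, 0.10),
}

def lookup(query, index, k=2):
    """Return the k labels whose embeddings are closest to the query."""
    return sorted(index, key=lambda label: math.dist(query, index[label]))[:k]

# A query embedding near the beagles retrieves the two beagles first.
print(lookup((0.12, 0.88), index))  # ['beagle_01', 'beagle_02']
```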

TensorFlow Similarity learns a ‘metric embedding space’ in which the distance between ‘embedded points’ is given by a valid distance metric. These distance metrics satisfy the triangle inequality, making the space amenable to ANN search and leading to high retrieval accuracy.
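As a quick sanity check, Euclidean distance, one common choice for such spaces, obeys the triangle inequality d(a, c) ≤ d(a, b) + d(b, c) on any set of points:

```python
import itertools
import math
import random

# Sample some random 2-D points standing in for embeddings and verify the
# triangle inequality on every ordered triple (small slack for float error).
random.seed(0)
points = [(random.random(), random.random()) for _ in range(8)]

ok = all(
    math.dist(a, c) <= math.dist(a, b) + math.dist(b, c) + 1e-12
    for a, b, c in itertools.permutations(points, 3)
)
print(ok)  # True
```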

Other approaches, such as using standard model feature extraction, require an exact nearest-neighbour search to find related objects and may not be as accurate as a trained similarity model. Exact search also prevents scaling, as it requires time quadratic in the size of the search index. In comparison, TensorFlow Similarity’s built-in ANN indexing system, which relies on the Non-Metric Space Library (NMSLIB), makes it possible to search millions of indexed items and retrieve the top-K similar matches far faster.

(Source: TensorFlow)

Wrapping up 

Besides accuracy and retrieval speed, another major benefit of similarity models is that they allow you to add an unlimited number of new classes to the index without retraining. Instead, you only need to compute embeddings for representative items of the new classes and add them to the index.

This is particularly useful for tackling problems where the set of items is unknown, constantly changing, or extremely large, such as letting users discover newly released music similar to songs they have liked in the past.
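Extending the index with a brand-new class can be sketched as follows; `embed` is a hypothetical stand-in for a frozen, trained similarity model, and all items and vectors are invented for illustration.

```python
def embed(item):
    """Stand-in for a trained model mapping a raw item to its embedding."""
    table = {"husky photo": (0.20, 0.80), "siamese photo": (0.85, 0.15)}
    return table[item]

# Existing searchable items.
index = {"beagle_01": (0.10, 0.90)}

# Register a new class, with no retraining, by indexing the embedding of
# one representative item.
index["husky_01"] = embed("husky photo")
print(sorted(index))  # ['beagle_01', 'husky_01']
```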

Currently, TensorFlow Similarity is still in beta and supports supervised training. In the coming months, the team plans to add support for semi-supervised and self-supervised learning techniques such as BYOL, SwAV, and SimCLR.

Amit Raja Naik
Amit Raja Naik is a seasoned technology journalist who covers everything from data science to machine learning and artificial intelligence for Analytics India Magazine, where he examines the trends, challenges, ideas, and transformations across the industry.
