TensorFlow adds a new library for on-device text-to-image search

It works by using a model to embed the search query into a high-dimensional vector representing the semantic meaning of the query.
Listen to this story

TensorFlow has announced a new on-device embedding-based search library feature that allows one to quickly find similar images, text or audio from millions of data samples in a few milliseconds. 

It works by using a model to embed the search query into a high-dimensional vector representing the semantic meaning of the query. Then it uses ScaNN (Scalable Nearest Neighbors) to search for similar items from a predefined database. In order to apply it to your dataset, you need to use Model Maker Searcher API (tutorial) to build a custom TFLite Searcher model, and then deploy it onto devices using Task Library Searcher API (vision/text)


Sign up for your weekly dose of what's up in emerging technology.

How to build this feature?

Given below is a walkthrough of an end-to-end example of building a text-to-image search feature (retrieve the images given textual queries) using the new TensorFlow Lite Searcher Library. Here are the major steps:

  1. Train a dual encoder model for image and text query encoding using the COCO dataset.
  2. Create a text-to-image Searcher model using the Model Maker Searcher API.
  3. Retrieve images with text queries using the Task Library Searcher API.
  1. Training a Dual Encoder Model

The dual encoder model consists of an image encoder and a text encoder. The two encoders map the images and text, respectively, to embeddings in a high-dimensional space. The model computes the dot product between the image and text embeddings, and the loss encourages relevant image and text to have larger dot product (closer), and unrelated ones to have smaller dot product (farther apart).

Source: tensorflow.org

The training procedure is inspired by the CLIP paper and this Keras example. The image encoder is based on a pre-trained EfficientNet model and the text encoder is based on a pre-trained Universal Sentence Encoder model. The outputs from both encoders are then projected to a 128 dimensional space and are L2 normalized. For the dataset, we chose to use COCO, as its train and validation splits have human generated captions for each image. Please take a look at the companion Colab notebook for the details of the training process. 

The dual encoder model makes it possible to retrieve images from a database without captions because once trained, the image embedder can directly extract the semantic meaning from the image without any need for human-generated captions.

  1. Creating the text-to-image Searcher model using Model Maker

Once the dual encoder model is trained, we can use it to create the TFLite Searcher model that searches for the most relevant images from an image dataset based on the text queries. This can be done by the following three steps:

Source: tensorflow.org

  • Generate the embeddings of the image dataset using the TensorFlow image encoder. ScaNN is capable of searching through a very large dataset, hence we combined the train and validation splits of COCO 2014 totalling 123K+ images to demonstrate its capabilities. See the code here.
  • Convert the TensorFlow text encoder model into TFLite format. See the code here.
  • Use ModelMaker to create the TFLite Searcher model from the TFLite text encoder and the image embeddings.
  1. Running inference using Task Library

Try the code from the Colab. Also, see more information on how to integrate the model using the Task Library Java and C++ API, especially on Android. Each query in general takes only 6 milliseconds on Pixel 6.
For more details, click here

More Great AIM Stories

Kartik Wali
A writer by passion, Kartik strives to get a deep understanding of AI, Data analytics and its implementation on all walks of life. As a Senior Technology Journalist, Kartik looks forward to writing about the latest technological trends that transform the way of life!

Our Upcoming Events

Masterclass, Virtual
How to achieve real-time AI inference on your CPU
7th Jul

Masterclass, Virtual
How to power applications for the data-driven economy
20th Jul

Conference, in-person (Bangalore)
Cypher 2022
21-23rd Sep

Conference, Virtual
Deep Learning DevCon 2022
29th Oct

3 Ways to Join our Community

Discord Server

Stay Connected with a larger ecosystem of data science and ML Professionals

Telegram Channel

Discover special offers, top stories, upcoming events, and more.

Subscribe to our newsletter

Get the latest updates from AIM

What can SEBI learn from casinos?

It is said that casino AI technology comes with superior risk management systems compared to traditional data analytics that regulators are currently using.

Will Tesla Make (it) in India?

Tesla has struggled with optimising their production because Musk has been intent on manufacturing all the car’s parts independent of other suppliers since 2017.

Now Reliance wants to conquer the AI space

Many believe that Reliance is aggressively scouting for AI and NLP companies in the digital space in a bid to create an Indian equivalent of FAANG – Facebook, Apple, Amazon, Netflix, and Google.