Build your first text-to-image searcher with TensorFlow Lite Model Maker

On-device machine learning uses a simplified version of cloud-based machine learning.

TensorFlow has introduced an on-device, embedding-based search package that can run in Android, iOS and web applications. It relies on Edge ML techniques and lets users search images, text or audio almost instantly. In this article, we will learn how to implement on-device text-to-image search with TensorFlow Lite. The following topics will be covered.

Table of contents

  1. What is Edge ML?
  2. What is an on-device search?
  3. What is the TensorFlow Lite model maker?
  4. Building on-device text-to-image search model

Let’s start with understanding Edge ML.


What is Edge ML?

The growth of IoT brought an explosion of Smart Devices linked to the Cloud, but the network was not yet prepared to handle this spike in demand. Cloud networks were overburdened, and businesses ignored critical Cloud computing problems such as security. Edge ML is the solution.

Edge ML is a technology that allows smart devices to analyse data locally (on local servers or on the device itself) using machine learning and deep learning algorithms, reducing dependency on Cloud networks. The word "edge" refers to processing that happens at the device or local level, closest to the components gathering the data.

Edge devices continue to transmit data to the Cloud as necessary, but the ability to process certain data locally allows the data sent to the Cloud to be screened, while also enabling real-time data processing and reaction.

What is an on-device search?

Machine learning is usually performed on cloud services because it requires a large amount of GPU and TPU compute. TensorFlow therefore launched an on-device package that uses the device’s own capabilities to perform machine learning. These devices can be mobile phones, laptops or embedded systems such as a Raspberry Pi or a digital watch. On-device machine learning models can be thought of as simplified versions of the complex cloud-based models.

The machine learning model uses the local GPU and CPU to process the user’s query and run the search locally. This means the model can operate without an internet connection. A further benefit is that user data is not uploaded to any server, which helps preserve data privacy.

For example, you may have seen or used applications with Augmented Reality (AR) filters, which help users visualise a product and understand it better.

Another example is a language translation application such as Google Translate, which can use on-device machine learning to translate on the local device, so it works even when the device is not connected to the internet.

What is the TensorFlow Lite model maker?

TensorFlow Lite uses TensorFlow models that have been converted into a smaller, more efficient machine learning (ML) model format. It allows you to use pre-trained models, modify existing models, or create your own TensorFlow models and then convert them to the TensorFlow Lite format. The TensorFlow Lite Model Maker library streamlines the process of adapting and converting a TensorFlow neural-network model to particular input data when deploying it for on-device ML applications.

TensorFlow Lite models can handle many of the tasks that a conventional TensorFlow model can, with a variety of input data types such as photos, video, audio and text.

Building on-device text-to-image search model

The whole process of building an on-device text-to-image search model can be divided into the three parts listed below.

  1. Train an encoder model that encodes both images and text queries. The COCO dataset is used for training.
  2. Create a Searcher model that can search images according to a text description. The Model Maker Searcher API is used for this.
  3. Run the search query and retrieve the matching images for display. The Task Library Searcher API is used for this.

Training the encoder

The encoder used in this implementation is a dual encoder, which is trained on images and text simultaneously. The image and text encoders may not produce embeddings with the same number of dimensions, so their outputs must be projected into the same embedding space.

To embed the text and images in the same space, projection functions are created using TensorFlow’s ReLU (Rectified Linear Unit) activation, which returns 0 if the input is negative and returns the input unchanged if it is positive. The projected values are then L2-normalized so that every embedding has unit length, which makes the embeddings directly comparable when training the dual encoder and at retrieval time.
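
The projection step can be sketched in a few lines. This is a minimal, dependency-free illustration rather than the actual training code: the weights, dimensions and helper names (`relu`, `project`, `l2_normalize`) are hypothetical, and a real dual encoder would learn its projection weights with TensorFlow layers. It shows one practical effect of L2 normalization: once both embeddings have unit length, their dot product equals their cosine similarity.

```python
import math

def relu(values):
    # ReLU: negative inputs become 0, positive inputs pass through.
    return [max(0.0, v) for v in values]

def project(vector, weights):
    # Linear projection: one output value per row of `weights`.
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def l2_normalize(vector):
    # Scale the vector to unit Euclidean length (left as-is if all zeros).
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)

# Hypothetical encoder outputs with different sizes (3-d image, 4-d text).
image_embedding = [0.5, -1.0, 2.0]
text_embedding = [1.0, 0.0, -0.5, 0.25]

# Hypothetical projection weights mapping both into a shared 2-d space.
image_weights = [[1.0, 0.0, 0.5], [0.0, 1.0, 1.0]]
text_weights = [[1.0, 0.0, 0.0, 2.0], [0.5, 1.0, 0.0, 0.0]]

image_vec = l2_normalize(relu(project(image_embedding, image_weights)))
text_vec = l2_normalize(relu(project(text_embedding, text_weights)))

# For unit-length vectors the dot product is the cosine similarity.
similarity = sum(a * b for a, b in zip(image_vec, text_vec))
print(len(image_vec), len(text_vec), round(similarity, 4))
```

Both vectors end up with the same number of dimensions, so a single similarity score can compare an image with a text query.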

Create a Searcher model

This model searches the images in the COCO dataset according to a text description. For this task, Model Maker uses ScaNN (Scalable Nearest Neighbors), configured through ScaNNOptions, which is an embedding-based search algorithm.

Embedding-based search is an excellent strategy for answering questions that rely on semantic understanding rather than simply indexable attributes. Machine learning models are trained in this approach to map queries and database objects to a shared vector embedding space so that the distance between embeddings contains semantic significance, i.e., comparable things are closer together.
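
The idea of embedding-based search can be illustrated with a tiny exhaustive search. This is only a sketch under stated assumptions: the file names and 2-d vectors are made up, and the `search` helper scores every item, whereas ScaNN uses approximate techniques to avoid comparing the query against the whole index.

```python
def dot(a, b):
    # Dot product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))

def search(query_vec, index, k=2):
    # Score every item, then return the k best matches (highest dot product).
    scored = [(dot(query_vec, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]

# Hypothetical unit-length embeddings in a shared 2-d space.
index = {
    'dog_on_chair.jpg': [0.8, 0.6],
    'cat_on_sofa.jpg': [0.6, 0.8],
    'city_at_night.jpg': [-1.0, 0.0],
}

# Pretend this is the embedding of the text "a dog sitting on a chair".
query = [0.8, 0.6]

results = search(query, index, k=2)
for name, score in results:
    print(name, round(score, 2))
```

Items whose embeddings point in a similar direction to the query rank first, which is what "closer together" means in the shared embedding space.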

Retrieving the images

The query is computed and the result is stored in a variable. The result contains the five nearest neighbours of the search query, ranked by distance. Each image can then be viewed through its Flickr URL.

Training the dual encoder and the search model takes time. Due to time constraints, this article uses a model pre-trained on the COCO dataset.

Let’s install the TFLite Support package to unpack the model, along with the TensorFlow Text and audio packages.

!pip install -q -U tflite-support
!pip install -q -U tensorflow-text==2.10.0b2
!sudo apt-get -qq install libportaudio2 

If you are using a Google Colab notebook, also install this dependency.

! pip install tf-estimator-nightly==2.8.0.dev2021122109

Importing the necessary libraries.

import tensorflow as tf
import tensorflow_hub as hub
import pandas as pd
import matplotlib.pyplot as plt
from tflite_support.task import text
from tflite_support.task import core

We are all set to unpack the ScaNN model.

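# Placeholder file name: the original value was lost, so use the name of your downloaded model.
options = text.TextSearcherOptions(
    base_options=core.BaseOptions(file_name='searcher.tflite'))
# Return the five nearest results.
options.search_options.max_results = 5
tflite_searcher = text.TextSearcher.create_from_options(options)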

The pre-trained model can be downloaded from here and should be stored in the current directory under the same name as used above. The number of results is limited to five here, but it can be set to any value the user wishes.

Let’s see the raw output of the model.

tflite_searcher.search('A dog sitting on chair')

As can be observed from the output above, the model calculates the nearest-neighbour distances in the embedding space. The closest matches are returned as the output of the embedding-based search.

Now the metadata needs to be extracted from the results so that a Flickr URL can be generated.

Let’s create a function to extract and display the image from the metadata.

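import requests  # added: used below to download each image

def text_to_image_searcher(query_str, show_images=False):
  # Run the query against the on-device index.
  neighbors = tflite_searcher.search(query_str)
  for i, neighbor in enumerate(neighbors.nearest_neighbors):
    # The metadata holds the Flickr id, an underscore, then the image path.
    metadata = neighbor.metadata.decode('utf-8').split('_')
    flickr_id = metadata[0]
    # The URL prefix was lost from the original listing, so only the id is printed.
    print('Flickr id for %d: %s' % (i + 1, flickr_id))
  if show_images:
    plt.figure(figsize=(20, 13))
    for i, neighbor in enumerate(neighbors.nearest_neighbors):
      ax = plt.subplot(2, 3, i + 1)
      # Negate the distance so that larger values mean more similar.
      ax.set_title('%d: Similarity: %.05f' % (i + 1, -neighbor.distance))
      metadata = neighbor.metadata.decode('utf-8').split('_')
      image_path = '_'.join(metadata[1:])
      # Assumption: image_path is an HTTP URL; the original loading call was lost.
      image = tf.image.decode_jpeg(
          requests.get(image_path).content, channels=3) / 255
      plt.imshow(image)
    plt.show()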
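
The metadata handling in the function can be tried in isolation. The sketch below assumes, as the function does, that each metadata value holds the Flickr id, an underscore, and then the image path. The `parse_metadata` helper and the sample value are hypothetical and only illustrate why the remaining fields are joined back together with underscores.

```python
def parse_metadata(raw):
    # Split on underscores: the first field is the id, the rest is the path.
    fields = raw.decode('utf-8').split('_')
    # The path may itself contain underscores, so re-join the remaining fields.
    return fields[0], '_'.join(fields[1:])

# Hypothetical metadata value; real entries come from neighbor.metadata.
sample = b'12345_http://example.com/images/dog_on_chair.jpg'

flickr_id, image_path = parse_metadata(sample)
print(flickr_id)
print(image_path)
```

Taking only the second field would cut the path at its first underscore, which is why everything after the id is joined again.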

Let’s use the function with a sample query.

text_to_image_searcher('A dog sitting on chair')
Similarity scores of the five retrieved images:
1. Similarity = 0.6987
2. Similarity = 0.6910
3. Similarity = 0.6848
4. Similarity = 0.6737
5. Similarity = 0.6693

The model performed well in finding these images in the COCO dataset, and the execution time was less than 0.2 seconds.
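
One way to measure such an execution time is a small timing wrapper. This is a sketch only: `timed` and `fake_search` are hypothetical names, and `fake_search` stands in for the real `text_to_image_searcher` call so that the example runs without the model file.

```python
import time

def timed(fn, *args):
    # Return the function's result together with the elapsed seconds.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

def fake_search(query):
    # Stand-in for the real search call; it just echoes the query in upper case.
    return [query.upper()]

result, seconds = timed(fake_search, 'a dog sitting on chair')
print('search took %.4f s' % seconds)
```

Measured times depend on the device, so a figure such as 0.2 seconds is best read as indicative rather than guaranteed.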


On-device machine learning uses a simplified version of cloud-based machine learning. It uses the local device to perform the necessary computation, which reduces latency and improves the user’s data privacy, and inference can run in a matter of milliseconds. With this hands-on article, we have covered on-device ML and used it to build a text-to-image search model with TensorFlow Lite.


Sourabh Mehta
Sourabh has worked as a full-time data scientist for an ISP organisation, experienced in analysing patterns and their implementation in product development. He has a keen interest in developing solutions for real-time problems with the help of data both in this universe and metaverse.
