Guide To Universal Sentence Encoder With TensorFlow

Universal sentence encoder models encode textual data into high-dimensional vectors which can be used for various NLP tasks. They were introduced by Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope and Ray Kurzweil (researchers at Google Research) in April 2018 (research paper).

The encoders in such models must capture the meaning of word sequences rather than of individual words. Besides single words, the models are trained and optimized for longer text such as phrases, sentences or short paragraphs.


Major variants of universal sentence encoder 

There are two main variations of the model encoders coded in TensorFlow – one of them uses transformer architecture while the other is a deep averaging network (DAN).

When fed with variable-length English text, these models output a fixed-dimensional embedding representation of the input strings. They take a lowercased, PTB-tokenized string as input and output a 512-dimensional sentence embedding.
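
To make the input format concrete, here is a rough sketch of lowercasing and tokenizing a sentence. The function name rough_ptb_tokenize and its regular expression are assumptions made for illustration: they approximate only a small part of PTB tokenization (splitting off punctuation and the possessive 's) and are not the preprocessing used inside the published models.

```python
import re

def rough_ptb_tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens.

    Hypothetical helper: a coarse approximation of PTB-style tokenization,
    not the full rule set.
    """
    # Try the possessive "'s" first, then runs of word characters,
    # then any single non-space, non-word character (punctuation).
    return re.findall(r"'s|\w+|[^\w\s]", text.lower())

tokens = rough_ptb_tokenize("Tiger is India's national animal.")
print(" ".join(tokens))
```

Whatever the number of tokens, the encoder maps the whole string to a single vector of the same size.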

  1. Transformer-based model

This variant builds sentence embeddings using the encoding sub-graph of the transformer architecture. The sub-graph computes a context-aware representation of each word in the input sentence, taking into account both the identity and the order of all the other words. These word representations are then summed element-wise across word positions to produce a fixed-length sentence encoding vector.

  2. Deep Averaging Network (DAN)

In the variant employing DAN, input embeddings for words and bi-grams are averaged and fed to a feedforward DNN (Deep Neural Network) resulting in sentence embeddings. It is found that such DANs perform quite well on text classification tasks.
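
The two pooling strategies described above can be contrasted with a small sketch. All vectors, weights and function names below are made-up toy values, not parameters of the real models. The division by the square root of the sentence length follows the research paper's description of the transformer variant, and a single ReLU layer stands in for the DAN's deeper feedforward network.

```python
import math

def sum_pool(token_vectors):
    """Transformer-style pooling: element-wise sum over word positions,
    scaled by 1 / sqrt(sentence length)."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[d] for vec in token_vectors) / math.sqrt(n) for d in range(dim)]

def average_pool(vectors):
    """DAN-style pooling: element-wise mean of the input embeddings."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(vec[d] for vec in vectors) / n for d in range(dim)]

def dense_relu(x, weights, bias):
    """One feedforward layer: relu(W x + b)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

# Toy context-aware vectors for a 4-word sentence (transformer variant).
context_vectors = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [0.0, 1.0]]
transformer_embedding = sum_pool(context_vectors)

# Toy embeddings for 3 words plus their 2 bi-grams (DAN variant).
word_vectors = [[1.0, 0.0], [0.0, 2.0], [2.0, 1.0]]
bigram_vectors = [[1.0, 1.0], [1.0, 1.0]]
averaged = average_pool(word_vectors + bigram_vectors)

# Hypothetical weights for a single feedforward layer.
weights = [[1.0, -1.0], [0.5, 0.5]]
bias = [0.0, 0.1]
dan_embedding = dense_relu(averaged, weights, bias)

print(transformer_embedding)
print(averaged)
print(dan_embedding)
```

In both cases the output has a fixed size regardless of how many words the sentence contains; the variants differ in how the per-word vectors are produced before pooling.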

Comparison of the variants

Using the output (sentence embeddings) of either variant for transfer learning gives better results than several baselines that use no transfer learning or only word-level transfer learning.

However, comparing the two variants reveals a trade-off between the accuracy of the results and the computational resources required. The transformer-based model aims for high accuracy, but at the cost of greater model complexity and heavier resource use: its memory usage and computation time rise sharply with sentence length. In contrast, the computation time of the DAN-based model increases linearly with sentence length. In the research paper, the transformer model’s time complexity is given as O(n^2), while that of the DAN model is O(n), where ‘n’ denotes the sentence length. The DAN variant aims at efficient inference at the cost of slightly reduced accuracy.
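
The difference between the two growth rates can be illustrated by counting operations. The sketch below is a simplified cost model, not a measurement of the real encoders: it assumes that self-attention scores every pair of word positions and that averaging touches each word once (bi-grams are ignored for brevity). The function names are hypothetical.

```python
def count_attention_scores(tokens):
    """Count pairwise scores when every position attends to every position."""
    scores = 0
    for _query in tokens:
        for _key in tokens:
            scores += 1
    return scores

def count_averaging_steps(tokens):
    """Count additions when each word vector is added to a running sum once."""
    steps = 0
    for _token in tokens:
        steps += 1
    return steps

for n in (8, 16, 32):
    sentence = ["word"] * n
    print(n, count_attention_scores(sentence), count_averaging_steps(sentence))
```

Doubling the sentence length doubles the averaging work but quadruples the number of attention scores.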

Universal sentence encoder family

Several versions of universal sentence encoder models can be found here. They differ in whether they are multilingual, which NLP tasks they suit best, and which metric they prioritise (size, performance, etc.).

Practical implementation

Here’s a demonstration of using a DAN-based universal sentence encoder model for the sentence similarity task. The implementation was written in Google Colab using Python 3.7.10. A step-by-step explanation of the code follows:

  1. Import required libraries
 from absl import logging
 import tensorflow as tf
 import tensorflow_hub as hub
 import matplotlib.pyplot as plt
 import numpy as np
 import os
 import pandas as pd
 import re    #module for regular expression operations
 import seaborn as sns 
  2. Load the TF Hub module of the universal sentence encoder
url = "" #@param ["", ""]

In Colab, the #@param annotation renders a drop-down list that lets you switch between the URLs.

model = hub.load(url) #Load the module from selected URL

  3. Define a function for computing sentence embedding of input string
 def embed(texts):
   return model(texts)
  4. Illustrate how sentence embedding is computed for a word, sentence and paragraph
 word = "Anaconda"
 sen = "Tiger is India's national animal."  #sentence
 para = (
     "Universal Sentence Encoder embeddings also support short paragraphs. "
     "There is no hard limit on how long the paragraph is.")
 msgs = [word, sen, para]
  5. Reduce logging output
 logging.set_verbosity(logging.ERROR)  #assumed threshold: log errors only

set_verbosity() sets the threshold for which messages will be logged.

  6. Embed the defined word, sentence and paragraph using the embed() function defined in step (3).

 message_emb = embed(msgs)

  7. Compute and print sentence embeddings
 for i, embedding in enumerate(np.array(message_emb).tolist()):
   print("Msg: {}".format(msgs[i]))     #print the message
   #print size of the embedding
   print("Embedding size: {}".format(len(embedding)))
   #print the first three values of the embedding
   msg_emb_snippet = ", ".join(
       (str(x) for x in embedding[:3]))
   print("Embedding: [{}, ...]\n".format(msg_emb_snippet))


  8. Define a function to find semantic text similarity between sentences
 def plot_similarity(labels, features, rotation):
   #compute inner product of the encodings
   corr = np.inner(features, features)
   g = sns.heatmap(  #plot heatmap
       corr,  #computed inner product
       xticklabels=labels, #label the axes with input sentences
       yticklabels=labels,
       #vmin and vmax are values to anchor the colormap
       vmin=0,
       vmax=1,
       cmap="YlOrRd") #matplotlib colormap name (here Yellow-Orange-Red)
   g.set_xticklabels(labels, rotation=rotation)
   g.set_title("Semantic Textual Similarity")
  9. Define a function to feed the message embeddings for plotting the heatmap
 def run_and_plot(msgs):
   message_embeddings_ = embed(msgs)
   plot_similarity(msgs, message_embeddings_, 90)
 #labels rotated by 90 degrees 
  10. Define the input sentences
 messages = [
     # Smartphones
     "I like my phone",
     "My phone is not good.",
     "Your cellphone looks great.",
     # Weather
     "Will it snow tomorrow?",
     "Recently a lot of hurricanes have hit the US",
     "Global warming is real",
     # Food and health
     "An apple a day, keeps the doctors away",
     "Eating strawberries is healthy",
     "Is paleo better than keto?",
     # Asking about age
     "How old are you?",
     "what is your age?",
 ]
  11. Pass the input messages to run_and_plot() defined in step (9)
 run_and_plot(messages)
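
The heatmap in this walkthrough is built from pairwise inner products of the embeddings. As a minimal, self-contained sketch of that computation, the example below uses hypothetical 3-dimensional toy vectors in place of real 512-dimensional encoder output, and plain Python in place of np.inner. It assumes the vectors have unit length, in which case the inner product equals cosine similarity. The helper names are illustrative only.

```python
import math

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def inner_product_matrix(features):
    """Pairwise inner products of the rows of a 2-D list,
    the same quantity np.inner(features, features) computes."""
    return [[sum(a * b for a, b in zip(u, v)) for v in features]
            for u in features]

def most_similar(index, sim):
    """Index of the row most similar to `index`, excluding itself."""
    candidates = [j for j in range(len(sim)) if j != index]
    return max(candidates, key=lambda j: sim[index][j])

labels = ["I like my phone", "Your cellphone looks great.", "Will it snow tomorrow?"]
# Toy stand-ins for embeddings: the first two point in similar directions.
toy_vectors = [normalize(v) for v in
               ([3.0, 4.0, 0.0], [4.0, 3.0, 0.0], [0.0, 0.0, 5.0])]

sim = inner_product_matrix(toy_vectors)
for label, row in zip(labels, sim):
    print(label, [round(value, 2) for value in row])
print("Closest to the first sentence:", labels[most_similar(0, sim)])
```

Each cell of the heatmap corresponds to one entry of such a matrix, so related sentences show up as blocks of high values.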




To get an in-depth understanding of universal sentence encoder, refer to the following sources:

Nikita Shiledarbaxi
A zealous learner aspiring to advance in the domain of AI/ML. Eager to grasp emerging techniques to get insights from data and hence explore realistic Data Science applications as well.
