How to accelerate TensorFlow models with the XLA compiler

XLA is a compiler that speeds up the training of TensorFlow models and reduces their memory consumption.

XLA stands for Accelerated Linear Algebra. It is a compiler designed to speed up the convergence of TensorFlow models by compiling them into an optimized sequence of operations, reducing both training time and memory consumption. In this article, let us focus on XLA and try to understand how it can be used as a compiler to accelerate TensorFlow models.

Table of Contents

  1. Introduction to XLA
  2. Why was XLA built?
  3. Working of XLA
  4. Case study of XLA
  5. Summary

Introduction to XLA

Accelerated Linear Algebra (XLA) is a compiler designed to speed up the training of TensorFlow models and reduce their overall memory consumption. TensorFlow operations are normally dispatched as individual units, each backed by a precompiled GPU kernel for faster convergence. However, these GPU kernels may not be available on every platform, depending on accelerator constraints.


Consider a model designed to carry out a series of mathematical operations. Standard TensorFlow launches a separate kernel for each operation, which delays the retrieval of results. This is where XLA fuses the mathematical operations onto a single kernel (for example, a single GPU kernel) and speeds up execution. Because the operations are fused, intermediate results never have to be written out to memory, which reduces overall memory consumption; the lower memory traffic in turn speeds up model convergence.
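The fusion behaviour described above can be tried directly in TensorFlow 2.x through the `jit_compile` flag of `tf.function`, which asks XLA to compile (and fuse) the traced computation. This is a minimal sketch; the function and tensor shapes here are illustrative, not from the article.

```python
import tensorflow as tf

# Without XLA, each element-wise op below would typically launch its own
# kernel; with jit_compile=True, XLA fuses them into a single kernel.
@tf.function(jit_compile=True)
def fused_ops(x, y):
    return tf.reduce_sum(x * y + tf.nn.relu(x))

x = tf.ones((1024,))
y = tf.ones((1024,)) * 2.0
print(float(fused_ops(x, y)))  # per element: 1*2 + 1 = 3, so 3 * 1024 = 3072.0
```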


XLA also provides external compilation hooks that can be used for various compilation tasks. Through this external compilation, selected parts of a model can be compiled first or as requirements dictate.

Why was XLA built?

Four main reasons led to the development of XLA as a compiler that can be used on top of TensorFlow models:

i) Improved execution speed is one of the top reasons for developing the XLA compiler. XLA improves execution speed by fusing operations onto a single GPU kernel, so results are retrieved faster because a separate kernel no longer has to be launched for each operation.

ii) Reduced memory consumption is another major advantage of XLA: because computations are fused into single clusters, intermediate values need not be written out to memory buffers, so the accelerated kernels consume far less memory.

iii) Reduced dependency on custom operations: XLA replaces custom operations with simpler, lower-level operations, which execute faster and carry fewer dependencies.

iv) Easy portability: TensorFlow models compiled and executed with XLA are portable across various platforms, with little rework needed to target a new backend.
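Beyond enabling XLA in code, TensorFlow also supports "auto-clustering", where eligible operations are automatically grouped and compiled by XLA. It is switched on through an environment variable, so no model code changes are needed; `train.py` below is a hypothetical training script.

```shell
# Enable XLA auto-clustering for supported ops (GPU)
export TF_XLA_FLAGS=--tf_xla_auto_jit=2
python train.py

# On CPU, an extra flag is required as well:
export TF_XLA_FLAGS="--tf_xla_auto_jit=2 --tf_xla_cpu_global_jit"
python train.py
```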

Working of XLA

XLA is a compiler designed to accelerate the compilation and execution of TensorFlow models, so let us try to understand how it works. The input to XLA is a graph of operations expressed in an intermediate representation called HLO (High Level Operations) in XLA terminology. XLA compiles these HLO graphs into machine instructions for various architectures. The compiler ships with a range of optimization and analysis passes, some of which are specific to the target hardware.

On the whole, the compiler can be thought of as an integrated cluster of a front end and a back end. The front-end component is responsible for the target-independent optimizations and analyses, while the back-end component takes up the target-dependent optimizations and analyses.
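To see the HLO described here, TensorFlow exposes an experimental API on compiled `tf.function`s that returns the intermediate representation XLA's front end produces. This is a sketch using that experimental API; it may change between TensorFlow versions, and the function `f` is illustrative.

```python
import tensorflow as tf

@tf.function(jit_compile=True)
def f(x):
    return x * x + 1.0

x = tf.constant([1.0, 2.0, 3.0])

# Returns the HLO text that XLA's front end generated for this function
hlo_text = f.experimental_get_compiler_ir(x)(stage="hlo")
print(hlo_text[:300])
```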

Let us understand the XLA compiler better through a case study.

Case study of XLA

Let us understand the major advantages of using XLA through a case study. First, we will build a simple deep learning model and measure the time it takes to fit for 50 epochs; then we will enable XLA and measure the time the same model architecture takes to converge and fit for the same number of epochs.

import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras.layers import Dense, MaxPooling2D, Conv2D, Flatten
from tensorflow.keras.models import Sequential
from tensorflow.keras import regularizers
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img

train_path = '/content/drive/MyDrive/Colab notebooks/Kernel Regularizers with NN/train'

# Visualise one sample image from each class
img = load_img(train_path + "/African/af_tr109.jpg")
plt.imshow(img)
plt.title("African Elephant Image")
plt.show()

img = load_img(train_path + "/Asian/as_tr114.jpg")
plt.imshow(img)
plt.title("Asian Elephant Image")
plt.show()

Here we are using an elephant classification dataset, where the task is to classify elephants as African or Asian. Let us now build a model for this data, fit it for 50 epochs, and evaluate the time taken to fit the data.
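The article fits `model1` without showing how it was defined. A plausible definition, consistent with the layers imported earlier, might look like the sketch below; the layer sizes and input shape are assumptions, not the author's exact architecture.

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense, MaxPooling2D, Conv2D, Flatten
from tensorflow.keras.models import Sequential

# Hypothetical binary classifier (African vs Asian elephants);
# the article does not show the exact layer configuration.
model1 = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid'),   # single sigmoid unit for two classes
])
model1.compile(optimizer='adam', loss='binary_crossentropy',
               metrics=['accuracy'])
model1.summary()
```

The `train_set`/`test_set` generators used below would typically come from `ImageDataGenerator(...).flow_from_directory(...)` over the train and test folders, presumably with `batch_size=64` and `class_mode='binary'` given the step counts in the fit call.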

# Note: fit_generator is deprecated in recent TensorFlow versions; model.fit accepts generators directly
model1_res = %time model1.fit_generator(train_set, steps_per_epoch=840//64, epochs=50, validation_data=test_set, validation_steps=188//64)

Here, the model with the layers above took 16 minutes and 57 seconds to fit the data. Now let us fit the same model architecture for the same number of epochs with the XLA compiler enabled in the same working environment, and evaluate the wall time again.

Before enabling the XLA compiler it is good practice to clear any other active sessions in the background. XLA can then be enabled in the working environment as shown below.

tf.keras.backend.clear_session() ## to clear other sessions in the environment
tf.config.optimizer.set_jit(True) ## enabling XLA

Now let us fit the same model architecture with XLA enabled and observe the wall time taken to fit the model with the same configuration as before.

model2_res=%time model2.fit_generator(train_set,steps_per_epoch=840//64,epochs=50,validation_data=test_set,validation_steps=188//64)

Here we can see that, after fitting the same model with the same configuration, the wall time has dropped significantly: in this run, enabling the XLA compiler reduced the wall time by about 50%. Since XLA reduces wall time so noticeably for TensorFlow models, let us clarify what wall time means.

What is wall time?

Imagine starting a stopwatch, or watching the wall clock, the moment you start fitting the model. Wall time is the elapsed real-world time the model takes to fit and converge for the given number of iterations, and it is commonly used as a metric for how long training takes on a given dataset.

The magnitude of wall time varies with hardware and platform specifications: the higher the wall time, the longer the computation and convergence took; the lower the wall time, the faster they were.
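Wall time can also be measured directly with Python's standard library, outside of IPython's `%time` magic; the sum below is just a stand-in for a call like `model.fit(...)`.

```python
import time

start = time.perf_counter()                    # wall-clock start
total = sum(i * i for i in range(1_000_000))   # stand-in for model.fit(...)
elapsed = time.perf_counter() - start          # elapsed wall time in seconds
print(f"Wall time: {elapsed:.3f} s")
```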

XLA is one such compiler for TensorFlow models, designed to reduce wall time and speed up the training process.


Summary

Heavy TensorFlow models generally take a long time to train because the computation is split across many separate kernels. This is where the XLA compiler finds its major advantage: it fuses multiple operations into single accelerated kernels and speeds up the training process. By significantly reducing wall time, XLA greatly reduces training time, and because it can be used across various platforms, it helps deep learning engineers and researchers accelerate the training of large TensorFlow models.


Darshan M
Darshan holds a Master's degree in Data Science and Machine Learning and is an everyday learner of the latest trends in the field. He is always keen to learn new things, implement them, and curate rich content on Data Science, Machine Learning, NLP and AI.
