XLA stands for Accelerated Linear Algebra. It is a compiler designed to speed up the execution of TensorFlow models: it compiles TensorFlow computations into an optimized sequence of kernels and reduces memory consumption. In this article, let us focus on XLA and try to understand how it can be used as a compiler to accelerate TensorFlow models.
Table of Contents
- Introduction to XLA
- Why was XLA built?
- Working of XLA
- Case study of XLA
Introduction to XLA
Accelerated Linear Algebra (XLA) is a compiler designed to accelerate TensorFlow models, speeding up training and reducing overall memory consumption. Without XLA, TensorFlow executes a model operation by operation, and each operation has a precompiled GPU kernel that is launched for it. These precompiled kernels may not be available, or may not perform well, on every platform, depending on the constraints of the accelerator.
Consider a model that carries out a chain of mathematical operations. Standard TensorFlow launches a separate kernel for each operation and writes each intermediate result to memory, which delays the final result. XLA instead fuses the operations into a single kernel (for example, a single GPU kernel). Because the fused kernel does not need to write intermediate results to memory and read them back, both the time to obtain the result and the memory consumption are reduced. Lower memory traffic, in turn, speeds up training.
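The idea of fusion can be illustrated with a conceptual sketch in plain Python. This is not XLA itself, and the function names (`multiply`, `add`, `reduce_sum`, `unfused`, `fused`) are hypothetical. It only contrasts three separate passes, each producing an intermediate buffer, with a single fused pass over the same data.

```python
# Conceptual sketch only: plain Python, not XLA. It contrasts three separate
# "kernels" (each writing an intermediate buffer) with one fused pass.

def multiply(a, b):
    # "Kernel" 1: materializes a full intermediate list.
    return [p * q for p, q in zip(a, b)]

def add(a, b):
    # "Kernel" 2: materializes a second intermediate list.
    return [p + q for p, q in zip(a, b)]

def reduce_sum(a):
    # "Kernel" 3: reads the intermediate list back to produce a scalar.
    return sum(a)

def unfused(x, y, z):
    # Three separate steps and two intermediate buffers as large as the inputs.
    return reduce_sum(add(x, multiply(y, z)))

def fused(x, y, z):
    # One pass over the data; intermediates live only in local variables.
    total = 0.0
    for xi, yi, zi in zip(x, y, z):
        total += xi + yi * zi
    return total

x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
z = [0.5, 0.5, 0.5]
print(unfused(x, y, z), fused(x, y, z))  # prints: 13.5 13.5
```

Both versions compute the same value. The fused version simply avoids allocating the two intermediate lists, which is the kind of saving a fused kernel aims for.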
XLA also provides compilation options that can be used for different compilation tasks. With these, selected parts of a model can be compiled on priority or as required.
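One way to compile only a selected part of a program in TensorFlow 2.x is to mark a single function with `tf.function(jit_compile=True)`. Whether this is the mechanism the paragraph above has in mind is an assumption. The function name `scaled_sum` and the input values are hypothetical.

```python
import tensorflow as tf

# Ask TensorFlow to compile just this function with XLA.
@tf.function(jit_compile=True)
def scaled_sum(x, y, z):
    # The multiply, add and reduction are candidates for fusion into one kernel.
    return tf.reduce_sum(x + y * z)

x = tf.constant([1.0, 2.0, 3.0])
y = tf.constant([4.0, 5.0, 6.0])
z = tf.constant([0.5, 0.5, 0.5])

result = scaled_sum(x, y, z)
print(float(result))
```

Only `scaled_sum` is compiled here. The rest of the program continues to run through the regular TensorFlow runtime.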
Why was XLA built?
There are four main reasons that led to the development of XLA as a compiler that can be used on top of TensorFlow models.
i) Improved execution speed is one of the top reasons that led to the development of XLA. The compiler fuses operations into a single GPU kernel, so results are obtained faster than when each operation runs in its own kernel.
ii) Reduced memory consumption is another major advantage of XLA. Because computations are fused into single clusters, the fused GPU kernels do not need many of the intermediate memory buffers that separate kernels would require.
iii) Reduced dependency on custom operations. XLA replaces custom operations with simpler, lower-level operations that it fuses automatically, which keeps execution fast and reduces dependencies.
iv) Easy portability. TensorFlow models compiled and executed using XLA are portable across various platforms, which reduces the need to rewrite code for other platforms.
Working of XLA
XLA is a compiler designed to accelerate TensorFlow model compilation and execution, so let us try to understand how it works in an easy way. The input to XLA is a graph of operations expressed in an intermediate representation called HLO. XLA compiles these HLO graphs into machine instructions for various architectures. The compiler bundles a number of optimization and analysis passes, some of which are specific to the target.
On the whole, the compiler can be thought of as a combination of a front end and a back end. The front end is responsible for target-independent optimizations and analysis, while the back end handles target-dependent optimizations and analysis.
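To get a feel for what HLO looks like, the following sketch dumps the HLO that TensorFlow generates for a small XLA-compiled function, using `experimental_get_compiler_ir`. The function `scaled_sum` and its inputs are hypothetical. Reading the `"hlo"` stage as the compiler's input and the `"optimized_hlo"` stage as the result of XLA's optimization passes for the current device is an interpretation offered here, not something stated in this article. The API is experimental and may change between TensorFlow versions.

```python
import tensorflow as tf

@tf.function(jit_compile=True)
def scaled_sum(x, y, z):
    return tf.reduce_sum(x + y * z)

x = tf.constant([1.0, 2.0, 3.0])
y = tf.constant([4.0, 5.0, 6.0])
z = tf.constant([0.5, 0.5, 0.5])

# HLO text as handed to XLA, before XLA's optimization passes run.
hlo_text = scaled_sum.experimental_get_compiler_ir(x, y, z)(stage="hlo")

# HLO text after XLA has optimized the graph for the current device.
optimized_text = scaled_sum.experimental_get_compiler_ir(x, y, z)(stage="optimized_hlo")

print(hlo_text[:300])
```

Comparing the two dumps for a larger function is a simple way to see which operations XLA decided to fuse.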
Let us understand the XLA compiler better through a case study.
Case study of XLA
Let us understand the major advantages of using XLA through a case study. First, we will build a simple deep learning model and measure the time it takes to fit for 50 epochs. Then we will enable XLA and measure the time taken by the same model architecture to fit for the same number of epochs.
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras.layers import Dense, MaxPooling2D, Conv2D, Flatten
from tensorflow.keras.models import Sequential
from tensorflow.keras import regularizers
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img

train_path = '/content/drive/MyDrive/Colab notebooks/Kernel Regularizers with NN/train'

# Show one sample image of an African elephant
plt.figure(figsize=(15, 5))
img = load_img(train_path + "/African/af_tr109.jpg")
plt.imshow(img)
plt.axis("off")
plt.title("African Elephant Image")
plt.show()

# Show one sample image of an Asian elephant
plt.figure()
img = load_img(train_path + "/Asian/as_tr114.jpg")
plt.imshow(img)
plt.axis("off")
plt.title("Asian Elephant Image")
plt.show()
Here we are using an elephant classification dataset, for which we have to build a model that classifies elephants as African or Asian. Let us now build a model for this data, fit it for 50 epochs, and measure the time taken to fit the data.
img_row = 150
img_col = 150

# Three convolution + pooling stages followed by a dense classifier
model1 = Sequential()
model1.add(Conv2D(64, (5, 5), activation='relu', input_shape=(img_row, img_col, 3)))
model1.add(MaxPooling2D(pool_size=(2, 2)))
model1.add(Conv2D(32, (5, 5), activation='relu'))
model1.add(MaxPooling2D(pool_size=(2, 2)))
model1.add(Conv2D(16, (5, 5), activation='relu'))
model1.add(MaxPooling2D(pool_size=(2, 2)))
model1.add(Flatten())
model1.add(Dense(126, activation='relu'))
model1.add(Dense(52, activation='relu'))
model1.add(Dense(1, activation='sigmoid'))  # binary output: African vs Asian
model1.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Augment the training images; only rescale the test images
train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)

# Note: test_path is not defined in this listing; it is assumed to point to the test images, like train_path.
train_set = train_datagen.flow_from_directory(train_path, target_size=(img_row, img_col), batch_size=64, class_mode='binary')
test_set = test_datagen.flow_from_directory(test_path, target_size=(img_row, img_col), batch_size=64, class_mode='binary')

# %time is an IPython/Jupyter magic that reports CPU times and wall time.
# fit() is used in place of fit_generator(), which is deprecated in TensorFlow 2.x.
model1_res = %time model1.fit(train_set, steps_per_epoch=840//64, epochs=50, validation_data=test_set, validation_steps=188//64)
Here, the model with the layers listed above took 16 minutes and 57 seconds to fit the data. Now let us fit the same model architecture for the same number of epochs using the XLA compiler in the same working environment, and measure the wall time again.
Before using the XLA compiler in the working environment, it is good practice to clear any other active sessions in the background. Let us enable the XLA compiler in the working environment, as shown below.
tf.keras.backend.clear_session()   # clear other sessions in the environment
tf.config.optimizer.set_jit(True)  # enable XLA
Now let us fit the same model architecture using XLA and observe the wall time taken to fit the model with the same set of configurations used before.
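A minimal sketch of how such a comparison could be timed is given here. It assumes a small synthetic dataset and a tiny dense model in place of the elephant images and the convolutional model. The helper names `build_model` and `timed_fit` are hypothetical, and the timings it prints are not the results reported in this article.

```python
import time

import numpy as np
import tensorflow as tf

def build_model():
    # Small stand-in model; not the elephant classifier from the article.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

def timed_fit(use_xla, features, labels, epochs=2):
    # Start from a clean state, then switch XLA auto-clustering on or off.
    tf.keras.backend.clear_session()
    tf.config.optimizer.set_jit(use_xla)
    model = build_model()
    start = time.perf_counter()  # wall-clock timer
    model.fit(features, labels, batch_size=64, epochs=epochs, verbose=0)
    return time.perf_counter() - start

# Synthetic binary-classification data (an assumption made for this sketch).
rng = np.random.default_rng(0)
features = rng.random((512, 20)).astype("float32")
labels = rng.integers(0, 2, size=(512, 1)).astype("float32")

baseline_seconds = timed_fit(False, features, labels)
xla_seconds = timed_fit(True, features, labels)
print(f"without XLA: {baseline_seconds:.2f}s, with XLA: {xla_seconds:.2f}s")
```

On a model this small, or on a CPU-only machine, the two timings may be close, and compilation overhead can even make the XLA run slower. Any speed-up is more likely to show on larger models running on a GPU.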
After fitting the same model with the same set of configurations, the wall time has been reduced: it dropped by a significant 50% after enabling the XLA compiler in the working environment. Having seen that XLA significantly reduces the wall-clock time for TensorFlow models, let us understand the meaning of wall time.
What is wall time?
Imagine that you start a wristwatch, or note the time on a wall clock, when model fitting begins and check it again when fitting ends. Wall time is the real-world time that elapsed in between, that is, the time taken by the model to fit for the specified number of iterations. It is therefore used as a metric to estimate how long the model takes to fit on the data.
The magnitude of wall time varies with hardware and platform specifications. The higher the wall time, the longer the computation takes; the lower the wall time, the less time the computation takes.
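The `%time` magic used in the case study reports both CPU times and wall time. The difference between the two can be seen with a short sketch using Python's standard `time` module. The helper name `measure` is hypothetical.

```python
import time

def measure(fn):
    # perf_counter tracks elapsed real ("wall clock") time, including waits.
    wall_start = time.perf_counter()
    # process_time counts only the CPU time used by this process.
    cpu_start = time.process_time()
    fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu

# Sleeping lets the clock on the wall advance while using almost no CPU.
wall, cpu = measure(lambda: time.sleep(0.2))
print(f"wall: {wall:.2f}s, cpu: {cpu:.2f}s")
```

For a sleeping process the wall time is about 0.2 seconds while the CPU time stays close to zero, which is why wall time is the figure that matters when asking how long training actually took.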
So XLA is one such compiler for TensorFlow models that aims to reduce the wall time and speed up the training process.
Heavy TensorFlow models generally take a long time to train because TensorFlow splits the work into many separate kernels. This is where the XLA compiler finds its major advantage: it fuses multiple operations into single accelerated kernels and speeds up the training process. XLA can significantly reduce the wall time, which directly shortens training time. Because XLA can be used across various platforms, it helps deep learning engineers and researchers reduce the training time of large TensorFlow models.