How Is TensorFlow Probability Used in Neural Networks?

Neural networks often need probabilistic models and techniques. TensorFlow Probability meets this need with probability layers that can be added directly to a network.

TensorFlow offers a wide range of models, layers, and modelling techniques that make building neural networks easy and efficient. In this article, we discuss TensorFlow Probability layers and how to use them in a neural network. The major points to be discussed are listed below.

Table of Contents

  1. What are TensorFlow Probability (TFP) Layers?
    1. Installation of TensorFlow Probability
  2. What are Variational Autoencoders?
  3. Implementing the TensorFlow Probability

Let’s begin with understanding the TensorFlow probability layers.

What are TensorFlow Probability (TFP) Layers?

As discussed in the introduction, TensorFlow provides various layers for building neural networks. TensorFlow Probability (TFP) is a companion library from the TensorFlow team for probabilistic reasoning and statistical analysis. It can be used with deep learning models as well as with other machine learning models outside neural networks.

Beyond this, the library provides low-level building blocks such as distributions and bijectors, and higher-level constructs such as Markov Chain Monte Carlo, probabilistic layers, structural time series, and generalized linear models.

The library offers a large collection of distributions and bijectors that can be used to integrate probabilistic methods with neural networks. This article uses two of the distributions, Normal and Bernoulli, along with the Independent wrapper.

It also includes probabilistic layers, inference methods such as Markov Chain Monte Carlo and variational inference, and optimizers.
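To make the idea of a bijector more concrete, here is a minimal sketch in plain Python rather than TFP itself. It pushes a standard normal variable through y = exp(x) and computes the density of y with the change-of-variables rule, which is the bookkeeping a bijector performs. All function names below are hypothetical and chosen for illustration; in TFP the corresponding objects would be `tfd.Normal`, `tfp.bijectors.Exp`, and `tfd.TransformedDistribution`.

```python
import math

def normal_log_prob(x, loc=0.0, scale=1.0):
    """Log density of a univariate Normal(loc, scale) at x."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)

def exp_forward(x):
    """Forward map of the bijector: y = exp(x)."""
    return math.exp(x)

def exp_inverse(y):
    """Inverse map of the bijector: x = log(y)."""
    return math.log(y)

def exp_inverse_log_det_jacobian(y):
    """log |dx/dy| for x = log(y), which is log(1 / y)."""
    return -math.log(y)

def transformed_log_prob(y):
    """Change of variables: log p_Y(y) = log p_X(x) + log |dx/dy|."""
    return normal_log_prob(exp_inverse(y)) + exp_inverse_log_det_jacobian(y)

def lognormal_log_prob(y):
    """Closed-form log density of a standard log-normal, for comparison."""
    return -math.log(y) - 0.5 * math.log(2.0 * math.pi) - 0.5 * math.log(y) ** 2

y = exp_forward(0.3)
print(transformed_log_prob(y), lognormal_log_prob(y))
```

The two printed values agree, which shows that a base distribution plus an invertible transform is enough to define a new distribution with a tractable density.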

Installation of TensorFlow Probability  

We can install the library using the following commands.

!python -m pip install --upgrade --user pip

!python -m pip install --upgrade --user tensorflow tensorflow_probability

After installation, we are ready to apply probabilistic methods to neural networks. We start with variational autoencoders (VAEs), which are used in tasks such as collaborative filtering, image compression, and reinforcement learning.

Variational Autoencoders

Besides the tasks mentioned above, VAE models can also generate data. Here we will generate digits like those in the MNIST dataset. Generation takes two steps:

  • Sample a latent representation from a prior distribution.
  • Conditioned on that sample, draw the actual representation (the image).
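The two steps can be sketched with NumPy alone. This is only an illustration of the sampling procedure: the "decoder" here is a hypothetical fixed random linear map, not a trained network, and the names `sample_latent` and `decode_to_pixels` are made up for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

latent_size = 16          # matches encoded_size used later in the article
image_shape = (28, 28)    # MNIST image height and width

# Hypothetical stand-in for a trained decoder: a fixed random linear map
# from the latent vector to one logit per pixel.
decoder_weights = rng.normal(size=(latent_size, image_shape[0] * image_shape[1]))

def sample_latent(num_samples):
    """Step 1: draw latent codes z from a standard normal prior."""
    return rng.normal(size=(num_samples, latent_size))

def decode_to_pixels(z):
    """Step 2: map z to Bernoulli pixel probabilities and sample an image."""
    logits = z @ decoder_weights
    probs = 1.0 / (1.0 + np.exp(-logits))
    pixels = rng.uniform(size=probs.shape) < probs
    return pixels.reshape((-1,) + image_shape)

z = sample_latent(4)
images = decode_to_pixels(z)
print(z.shape, images.shape)
```

With an untrained map the images are noise; training the decoder is what makes the second step produce digit-like pixels.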

In digit generation, we can think of the latent representation as capturing something like the class identity of the digit, while the remaining variation between digits in the dataset is noise in the signal. With the VAE model, we try to separate this noise from the signal.

To train the model toward this objective, we maximize the evidence lower bound (ELBO):

ELBO(x) = ∫ q(z|x) log p(x|z) dz − ∫ q(z|x) log [q(z|x) / p(z)] dz

Here q(z|x) is the encoder, p(x|z) is the decoder, and p(z) is the prior. The ELBO is a lower bound on log p(x), the log probability of an observed data point. The first integral is the reconstruction term and the second integral is the Kullback–Leibler (KL) divergence term, which measures how close the encoder is to the prior. This term can be thought of as keeping the encoder network honest. Let’s move on to the implementation, which will make the process clearer.
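The KL divergence term can be checked numerically. The sketch below is a simplification under stated assumptions: it uses a diagonal Gaussian encoder (the model later in this article uses a full lower-triangular scale), a standard normal prior, and a made-up reconstruction value. It compares the closed-form KL with a Monte Carlo average of log q(z) − log p(z).

```python
import numpy as np

rng = np.random.default_rng(0)

def kl_diag_gaussian_to_standard_normal(mean, std):
    """Closed-form KL( N(mean, diag(std^2)) || N(0, I) )."""
    return 0.5 * np.sum(std ** 2 + mean ** 2 - 1.0 - 2.0 * np.log(std))

def kl_monte_carlo(mean, std, num_samples=200_000):
    """Estimate the same KL as the average of log q(z) - log p(z), z ~ q."""
    z = mean + std * rng.normal(size=(num_samples, mean.size))
    log_q = np.sum(-0.5 * ((z - mean) / std) ** 2 - np.log(std), axis=1)
    log_p = np.sum(-0.5 * z ** 2, axis=1)
    # The -0.5 * log(2 * pi) constants cancel in the difference.
    return np.mean(log_q - log_p)

def elbo(reconstruction_log_prob, kl):
    """ELBO = expected reconstruction log-likelihood minus the KL term."""
    return reconstruction_log_prob - kl

mean = np.array([0.5, -1.0])   # hypothetical encoder mean for one input
std = np.array([0.8, 1.5])     # hypothetical encoder standard deviations
kl_exact = kl_diag_gaussian_to_standard_normal(mean, std)
kl_estimate = kl_monte_carlo(mean, std)
print(kl_exact, kl_estimate)

# -90.0 is a made-up reconstruction log-likelihood, used only to show the sign.
print(elbo(reconstruction_log_prob=-90.0, kl=kl_exact))
```

The KL is zero only when the encoder output matches the prior exactly, so maximizing the ELBO penalizes an encoder that drifts far from the prior.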

Let’s start with importing the libraries.

import numpy as np
import tensorflow.compat.v2 as tf
import tensorflow_datasets as tfds
import tensorflow_probability as tfp
tfk = tf.keras
tfkl = tf.keras.layers
tfpl = tfp.layers
tfd = tfp.distributions

To speed up training, we can use a GPU. Since this code is run in Google Colab, the GPU can be enabled from the runtime menu as follows.

"Runtime" -> "Change runtime type" -> "Hardware accelerator" -> "GPU".

Now we can import the dataset as: 

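# Arguments after name='mnist', the batch size, and the shuffle buffer are assumed values.
datasets, datasets_info = tfds.load(name='mnist', with_info=True, as_supervised=False)

def _preprocess(sample):
  image = tf.cast(sample['image'], tf.float32) / 255.  # Scale to the unit interval.
  image = image < tf.random.uniform(tf.shape(image))   # Randomly binarize.
  return image, image

train_dataset = datasets['train'].map(_preprocess).shuffle(10_000).batch(256)
eval_dataset = datasets['test'].map(_preprocess).batch(256)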
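The binarization step is easy to misread, so here is a small NumPy sketch of the same comparison (the helper name `binarize` is hypothetical). Comparing `image < uniform` yields True whenever the uniform draw exceeds the pixel intensity, so a pixel becomes True with probability 1 minus its intensity. The result is a boolean array, which is why the encoder later casts its input back to float32.

```python
import numpy as np

rng = np.random.default_rng(0)

def binarize(image):
    """Mirror `image < uniform`: compare each pixel with a uniform draw."""
    return image < rng.uniform(size=image.shape)

# A flat test image whose every pixel has intensity 0.2 on the [0, 1] scale.
image = np.full((200, 200), 0.2)
binary = binarize(image)

# The comparison is True whenever the uniform draw exceeds the intensity,
# so the fraction of True pixels approaches 1 - 0.2 = 0.8.
print(binary.dtype, binary.mean())
```

Because a fresh uniform draw is used each time the function is mapped, every epoch sees a slightly different binary version of the same digit.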


Building the Model

Let’s specify the model.

input_shape = datasets_info.features['image'].shape
encoded_size = 16
base_depth = 32

We can use the isotropic Gaussian prior for the VAE model.

We define the prior as:

prior = tfd.Independent(tfd.Normal(loc=tf.zeros(encoded_size), scale=1),
                        reinterpreted_batch_ndims=1)
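The role of `reinterpreted_batch_ndims=1` can be illustrated without TFP. In this NumPy sketch (function names are hypothetical), a plain Normal returns one log density per latent dimension, while the Independent-style version sums over the last axis so that each latent vector gets a single log density.

```python
import numpy as np

def normal_log_prob(x, loc=0.0, scale=1.0):
    """Elementwise log density of Normal(loc, scale)."""
    z = (x - loc) / scale
    return -0.5 * z ** 2 - np.log(scale) - 0.5 * np.log(2.0 * np.pi)

def independent_log_prob(x):
    """Sum the per-dimension log densities over the last axis."""
    return np.sum(normal_log_prob(x), axis=-1)

encoded_size = 16
z = np.zeros((3, encoded_size))          # a batch of three latent vectors

per_dimension = normal_log_prob(z)       # shape (3, 16): 16 separate scalars
per_vector = independent_log_prob(z)     # shape (3,): one value per vector
print(per_dimension.shape, per_vector.shape)
```

Treating the 16 dimensions as one event is what lets the prior be compared against the encoder's 16-dimensional output distribution in the KL term.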

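Now we can build the model.

First, we build the encoder network:

encoder = tfk.Sequential([
    tfkl.InputLayer(input_shape=input_shape),
    tfkl.Lambda(lambda x: tf.cast(x, tf.float32) - 0.5),
    tfkl.Conv2D(base_depth, 5, strides=1,
                padding='same', activation=tf.nn.leaky_relu),
    tfkl.Conv2D(base_depth, 5, strides=2,
                padding='same', activation=tf.nn.leaky_relu),
    # strides=1 here, so the feature map is 7x7 when it reaches the 7x7 'valid' layer.
    tfkl.Conv2D(2 * base_depth, 5, strides=1,
                padding='same', activation=tf.nn.leaky_relu),
    tfkl.Conv2D(2 * base_depth, 5, strides=2,
                padding='same', activation=tf.nn.leaky_relu),
    tfkl.Conv2D(4 * encoded_size, 7, strides=1,
                padding='valid', activation=tf.nn.leaky_relu),
    tfkl.Flatten(),
    tfkl.Dense(tfpl.MultivariateNormalTriL.params_size(encoded_size),
               activation=None),
    tfpl.MultivariateNormalTriL(
        encoded_size,
        activity_regularizer=tfpl.KLDivergenceRegularizer(prior)),
])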

This is a simple sequential model whose convolutional and dense layers feed a MultivariateNormalTriL layer, which is a TFP layer. The helper MultivariateNormalTriL.params_size() gives the number of activations the preceding dense layer must output. The activity_regularizer makes the distribution contribute a regularization term to the final loss: the KL divergence, which measures the closeness between the encoder and the prior.
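The number of activations needed by the distribution layer can be counted by hand: a mean vector of length d plus the d(d+1)/2 entries of a lower-triangular scale matrix. The function below is a hypothetical re-derivation of that count for illustration; in TFP the value comes from `tfpl.MultivariateNormalTriL.params_size(encoded_size)`.

```python
def mvn_tril_params_size(event_size):
    """Count parameters: a mean vector plus a lower-triangular scale matrix."""
    num_mean = event_size
    num_tril = event_size * (event_size + 1) // 2
    return num_mean + num_tril

encoded_size = 16
print(mvn_tril_params_size(encoded_size))  # 152
```

So with a 16-dimensional latent space, the dense layer in front of the distribution layer outputs 152 values per image.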


decoder = tfk.Sequential([
    tfkl.InputLayer(input_shape=[encoded_size]),
    tfkl.Reshape([1, 1, encoded_size]),
    tfkl.Conv2DTranspose(2 * base_depth, 7, strides=1,
                         padding='valid', activation=tf.nn.leaky_relu),
    tfkl.Conv2DTranspose(2 * base_depth, 5, strides=1,
                         padding='same', activation=tf.nn.leaky_relu),
    tfkl.Conv2DTranspose(2 * base_depth, 5, strides=2,
                         padding='same', activation=tf.nn.leaky_relu),
    tfkl.Conv2DTranspose(base_depth, 5, strides=1,
                         padding='same', activation=tf.nn.leaky_relu),
    tfkl.Conv2DTranspose(base_depth, 5, strides=2,
                         padding='same', activation=tf.nn.leaky_relu),
    tfkl.Conv2D(filters=1, kernel_size=5, strides=1,
                padding='same', activation=None),
    tfpl.IndependentBernoulli(input_shape, tfd.Bernoulli.logits),
])

The decoder network decodes the latent representation back into images. It is also a sequential model, made of transposed convolutional layers that take the latent representation produced by the encoder, and it ends in an IndependentBernoulli layer that outputs a distribution over pixels.
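It can help to trace how the spatial size grows through the transposed convolutions. The helper below is a hypothetical sketch of the Keras output-size rule for Conv2DTranspose with 'same' and 'valid' padding (no output padding or dilation), applied to the five layers in the decoder.

```python
def conv_transpose_output_size(input_size, kernel_size, stride, padding):
    """Spatial output size of a Keras-style transposed convolution."""
    if padding == 'same':
        return input_size * stride
    if padding == 'valid':
        return input_size * stride + max(kernel_size - stride, 0)
    raise ValueError(f'unknown padding: {padding}')

# (kernel_size, stride, padding) for the five Conv2DTranspose layers.
decoder_layers = [
    (7, 1, 'valid'),
    (5, 1, 'same'),
    (5, 2, 'same'),
    (5, 1, 'same'),
    (5, 2, 'same'),
]

size = 1  # spatial size after Reshape([1, 1, encoded_size])
sizes = [size]
for kernel_size, stride, padding in decoder_layers:
    size = conv_transpose_output_size(size, kernel_size, stride, padding)
    sizes.append(size)
print(sizes)  # [1, 7, 7, 14, 14, 28]
```

The final Conv2D layer uses 'same' padding with stride 1, so it keeps the 28x28 size and only reduces the channels to one logit per pixel.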

Now we can combine the encoder and decoder into a single model:

vae = tfk.Model(inputs=encoder.inputs,
                outputs=decoder(encoder.outputs[0]))

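Now we can compile the model and fit it on the data.

negloglik = lambda x, rv_x: -rv_x.log_prob(x)
# The optimizer, learning rate, and epoch count below are assumed values.
vae.compile(optimizer=tfk.optimizers.Adam(learning_rate=1e-3), loss=negloglik)
_ = vae.fit(train_dataset, epochs=15, validation_data=eval_dataset)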
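The loss above is the negative log-likelihood of the input image under the distribution returned by the decoder. For a Bernoulli distribution over pixels this is the familiar binary cross-entropy summed over the image. The NumPy sketch below spells that out; the function names and the tiny 2x2 "image" are hypothetical and only for illustration.

```python
import numpy as np

def bernoulli_log_prob(x, logits):
    """Log-probability of binary pixels x under Bernoulli(logits), summed per image."""
    # log p(x) = x * log(sigmoid(l)) + (1 - x) * log(1 - sigmoid(l)),
    # written in a numerically stable form.
    per_pixel = x * logits - np.logaddexp(0.0, logits)
    return per_pixel.sum(axis=(-3, -2, -1))

def negloglik(x, logits):
    """Negative log-likelihood, the quantity minimized during training."""
    return -bernoulli_log_prob(x, logits)

x = np.array([1.0, 0.0, 1.0, 1.0]).reshape(1, 2, 2, 1)        # one 2x2 binary image
logits = np.array([2.0, -1.0, 0.0, 3.0]).reshape(1, 2, 2, 1)  # hypothetical decoder output
print(negloglik(x, logits))
```

Together with the KL term added by the activity regularizer on the encoder, minimizing this loss corresponds to maximizing the ELBO.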


Plotting the Results 

Now we can pass a few images from the evaluation data through the model:

x = next(iter(eval_dataset))[0][:10]
xhat = vae(x)
assert isinstance(xhat, tfd.Distribution)

Plotting samples from the data:


print('Decoded Random Samples:')

print('Decoded Modes:')

print('Decoded Means:')
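The three headings above refer to three different ways of turning the decoder's Bernoulli distribution into an image. The NumPy sketch below shows the difference on four hypothetical pixel logits; the function names are made up for illustration, and on a TFP distribution object the corresponding methods are `sample()`, `mode()`, and `mean()`.

```python
import numpy as np

rng = np.random.default_rng(0)

def bernoulli_mean(logits):
    """Mean of Bernoulli(logits): the per-pixel probability of a 1."""
    return 1.0 / (1.0 + np.exp(-logits))

def bernoulli_mode(logits):
    """Mode: the more likely of the two outcomes for each pixel."""
    return (logits > 0).astype(np.int32)

def bernoulli_sample(logits):
    """Sample: a random 0/1 draw per pixel with probability given by the mean."""
    return (rng.uniform(size=logits.shape) < bernoulli_mean(logits)).astype(np.int32)

logits = np.array([-4.0, -0.5, 0.5, 4.0])  # hypothetical decoder logits for four pixels
print(bernoulli_mean(logits))
print(bernoulli_mode(logits))
print(bernoulli_sample(logits))
```

Means give smooth grayscale images, modes give clean binary images, and samples give binary images with pixel-level noise.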


Let’s generate random samples using the model:

z = prior.sample(10)
xtilde = decoder(z)
assert isinstance(xtilde, tfd.Distribution)

Plotting the generated samples:

print('Randomly Generated Samples:')

print('Randomly Generated Modes:')

print('Randomly Generated Means:')


Here we can see random samples of images generated by a VAE model trained on the MNIST dataset, built with functions and layers from the TensorFlow Probability library.

Final Words 

In this article, we have seen how to combine neural networks with the TensorFlow Probability library. The resulting model can generate new images from what it learned on existing data, here the MNIST dataset loaded through TensorFlow Datasets.



Yugesh Verma
Yugesh is a graduate in automobile engineering and worked as a data analyst intern. He completed several Data Science projects. He has a strong interest in Deep Learning and writing blogs on data science and machine learning.
