What Are Normalising Flows And Why Should We Care

Machine learning developers and researchers are constantly looking for well-defined probabilistic models that correctly describe the processes that produce data. A central need across machine learning is to develop the tools and theory for building better-specified models, which in turn lead to better insights into data.

One such attempt was made by Danilo Rezende in the form of normalising flows. Today, building probability distributions with normalising flows is an active area of ML research.

Normalizing flows operate by pushing an initial density through a series of invertible transformations to produce a richer, potentially multimodal distribution, much like a fluid flowing through a set of tubes. Flows can also be used for joint generative and predictive modelling by making them the core component of a hybrid model.

Significance Of Normalising Flows

[Image via Andrej Karpathy]

Normalizing flows provide a general way of constructing flexible probability distributions over continuous random variables. 

Let x be a D-dimensional real vector, and suppose we would like to define a joint distribution over x. The main idea of flow-based modelling is to express x as a transformation T of a real vector u sampled from a simple base distribution p_u(u), where T is invertible and both T and its inverse are differentiable.
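For reference, the change-of-variables formula behind this idea can be written as follows, with p_u denoting the base distribution and J a Jacobian matrix; as is standard, it assumes T is invertible and differentiable with a differentiable inverse:

```latex
x = T(u), \qquad u \sim p_u(u)

p_x(x) = p_u(u)\,\lvert \det J_T(u) \rvert^{-1}, \qquad u = T^{-1}(x)

\log p_x(x) = \log p_u\big(T^{-1}(x)\big) + \log \lvert \det J_{T^{-1}}(x) \rvert
```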

According to the Google Brain team, the key idea behind normalising flows can be summarised as follows:

  • Take some distribution X whose log density log p(x) is easy to compute.
  • Learn an invertible function f, so that sampling from the new distribution Y amounts to drawing x from X and computing y = f(x).
  • Use its inverse f⁻¹(y) to map points in Y back to the domain of X.
  • Evaluate densities with the change-of-variables formula log p(y) = log p(x) + log |det J_f⁻¹(y)|, where x = f⁻¹(y); this log-likelihood can be maximised with stochastic gradient descent.
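A minimal one-dimensional sketch of these steps, under loud assumptions: the base distribution X is a standard normal and f(x) = exp(x) is a hypothetical fixed choice, so nothing is actually learned and the resulting Y is log-normal. The helper names are illustrative, not taken from any library.

```python
import math
import random

# Assumption: X ~ N(0, 1) and the hypothetical fixed map f(x) = exp(x).
# Nothing is learned here; a real flow would parameterise f with a network.

def base_log_prob(x: float) -> float:
    """log p(x) for a standard normal X."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

def f(x: float) -> float:
    """Forward map used for sampling: y = f(x)."""
    return math.exp(x)

def f_inv(y: float) -> float:
    """Inverse map taking points in Y back to the domain of X."""
    return math.log(y)

def log_abs_det_jac_f_inv(y: float) -> float:
    """log |d f^-1 / dy| = log(1 / y) = -log(y), valid for y > 0."""
    return -math.log(y)

def sample(rng: random.Random) -> float:
    """Sampling: draw x from X, then push it through f."""
    return f(rng.gauss(0.0, 1.0))

def log_prob(y: float) -> float:
    """Density evaluation: log p(y) = log p(f^-1(y)) + log |det J_{f^-1}(y)|."""
    return base_log_prob(f_inv(y)) + log_abs_det_jac_f_inv(y)

rng = random.Random(0)
ys = [sample(rng) for _ in range(5)]
print([round(log_prob(y), 3) for y in ys])
```

In a trained flow, f would be a parameterised network and the same log_prob would be maximised over the training data.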

In a hybrid model, the flow can be thought of as a feature extractor whose last layer is a (generalised) linear model operating on the learned features, while the flow's density over those features can be viewed as a regulariser on the feature space.

Normalizing flows, due to their ability to be expressive while still allowing exact likelihood calculations, are often used for probabilistic modelling of data. They have two primitive operations: density calculation and sampling. In turn, flows are effective in any application requiring a probabilistic model with either of those capabilities.

For example, invertible ResNets have been explored for classification in the form of residual flows. A first major benefit can be a much smaller memory footprint: because each layer is invertible, activations can be recomputed during backpropagation instead of being stored.
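To make the invertibility concrete, here is a small sketch of how a residual block y = x + g(x) can be inverted by fixed-point iteration, the approach used by invertible ResNets. Assumptions: a one-dimensional input and a hypothetical fixed branch g(x) = 0.5 sin(x), whose Lipschitz constant of 0.5 (below 1) guarantees invertibility; real models enforce this constraint on a learned network.

```python
import math

# Assumption: g is a hypothetical fixed residual branch with Lipschitz
# constant 0.5 < 1; invertible ResNets enforce such a bound on a learned net.

def g(x: float) -> float:
    return 0.5 * math.sin(x)

def forward(x: float) -> float:
    """Residual block: y = x + g(x)."""
    return x + g(x)

def inverse(y: float, n_iter: int = 100) -> float:
    """Recover x from y with the fixed-point iteration x <- y - g(x).

    The map x -> y - g(x) is a contraction because Lip(g) < 1,
    so the iteration converges to the unique preimage of y.
    """
    x = y  # initial guess
    for _ in range(n_iter):
        x = y - g(x)
    return x

# The input is recoverable from the output, which is what lets a
# memory-saving backward pass recompute activations instead of storing them.
x = 1.234
y = forward(x)
print(x, inverse(y))
```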

Because such networks are invertible and therefore discard no information about their input, they may also help us understand to what degree discarding information is crucial to deep learning’s success.

Normalizing flows allow us to control the complexity of the approximate posterior (in variational inference) at run-time simply by increasing the flow length, that is, the number of transformations in the sequence.
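A sketch of what increasing the flow length means in practice, assuming the planar transformations f(z) = z + u tanh(wᵀz + b) introduced by Rezende and Mohamed (2015), in two dimensions and with random rather than learned parameters. Each layer contributes log |1 + uᵀψ(z)| to the log-determinant, so a longer flow only adds more terms to the sum.

```python
import math
import random
from typing import List, Tuple

Vec = Tuple[float, float]
Layer = Tuple[Vec, Vec, float]  # (u, w, b) for one planar transformation

def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1]

def planar(z: Vec, u: Vec, w: Vec, b: float) -> Tuple[Vec, float]:
    """One planar layer: returns f(z) and log |det J_f(z)| = log |1 + u . psi(z)|."""
    a = math.tanh(dot(w, z) + b)
    new_z = (z[0] + u[0] * a, z[1] + u[1] * a)
    psi = ((1.0 - a * a) * w[0], (1.0 - a * a) * w[1])  # h'(w.z + b) * w
    return new_z, math.log(abs(1.0 + dot(u, psi)))

def make_flow(k: int, seed: int = 0) -> List[Layer]:
    """Random (untrained) parameters for a flow of length k."""
    rng = random.Random(seed)
    layers = []
    for _ in range(k):
        w = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        u = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        if dot(w, u) < -1.0:
            # Flip u so that w . u >= -1, a sufficient condition for invertibility.
            u = (-u[0], -u[1])
        layers.append((u, w, rng.uniform(-1, 1)))
    return layers

def run_flow(z: Vec, layers: List[Layer]) -> Tuple[Vec, float]:
    """Push z through every layer, summing the per-layer log-determinants."""
    total = 0.0
    for u, w, b in layers:
        z, log_det = planar(z, u, w, b)
        total += log_det
    return z, total

# A longer flow is the same code with a larger k.
for k in (1, 4, 16):
    z_k, log_det = run_flow((0.3, -0.7), make_flow(k))
    print(k, z_k, round(log_det, 4))
```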

Rippel and Adams (2013) were the first to recognise that parameterising flows with deep neural networks could result in quite general and expressive classes of distributions.

As with deep neural networks, normalizing the intermediate representations is crucial for maintaining stable gradients throughout the flow.
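One well-known way to normalise intermediate representations inside a flow is the activation normalisation (ActNorm) layer introduced with Glow. The sketch below is a simplified version on plain Python lists, with assumptions stated up front: a per-feature affine map whose scale and bias are initialised from a single batch so that outputs start with zero mean and unit variance, and no training loop. Because the map is affine, its log-determinant is just the sum of the log-scales.

```python
import math
import statistics
from typing import List, Optional, Tuple

class ActNorm:
    """Simplified ActNorm-style layer: y = exp(log_scale) * (x - loc), per feature."""

    def __init__(self) -> None:
        self.loc: Optional[List[float]] = None
        self.log_scale: Optional[List[float]] = None

    def initialise(self, batch: List[List[float]]) -> None:
        """Data-dependent init so the batch's outputs have zero mean, unit variance."""
        cols = list(zip(*batch))
        self.loc = [statistics.fmean(c) for c in cols]
        # Small epsilon guards against a zero standard deviation.
        self.log_scale = [-math.log(statistics.pstdev(c) + 1e-6) for c in cols]

    def forward(self, x: List[float]) -> Tuple[List[float], float]:
        """Return the normalised vector and log |det J| = sum(log_scale)."""
        y = [math.exp(s) * (xi - m) for xi, m, s in zip(x, self.loc, self.log_scale)]
        return y, sum(self.log_scale)

    def inverse(self, y: List[float]) -> List[float]:
        return [yi * math.exp(-s) + m for yi, m, s in zip(y, self.loc, self.log_scale)]

# Usage: two features on very different scales.
batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
layer = ActNorm()
layer.initialise(batch)
outputs = [layer.forward(x)[0] for x in batch]
print(outputs)
```

After initialisation, the scale and bias would normally be trained like any other parameter.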

Normalizing flows can also be integrated into traditional Markov chain Monte Carlo (MCMC) sampling by using the flow to reparameterize the target distribution. Since the efficiency of Monte Carlo methods depends drastically on the shape of the target distribution, a flow that maps it to a simpler one makes it easier to explore.
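A minimal sketch of flow-reparameterised MCMC, assuming a hypothetical, badly scaled target N(10, 0.01²) and a flow T(u) = 10 + 0.01u that we pretend has already been trained to match it. Random-walk Metropolis runs in u-space on the pulled-back density p(T(u)) |dT/du|, where a unit step size mixes well, and each state is pushed through T to give a sample of x.

```python
import math
import random
from typing import List

# Assumptions: hypothetical target N(10, 0.01^2) and a pretend-trained
# affine flow T(u) = 10 + 0.01 * u. Both are illustrative only.

def target_log_prob(x: float) -> float:
    """Unnormalised log density of N(10, 0.01^2)."""
    return -0.5 * ((x - 10.0) / 0.01) ** 2

def T(u: float) -> float:
    return 10.0 + 0.01 * u

LOG_ABS_DT_DU = math.log(0.01)  # constant Jacobian term for this affine flow

def latent_log_prob(u: float) -> float:
    """Target density pulled back to u-space via the change of variables."""
    return target_log_prob(T(u)) + LOG_ABS_DT_DU

def metropolis(n: int, step: float, seed: int = 0) -> List[float]:
    """Random-walk Metropolis in u-space; returns the chain mapped to x-space."""
    rng = random.Random(seed)
    u, lp = 0.0, latent_log_prob(0.0)
    xs = []
    for _ in range(n):
        prop = u + step * rng.gauss(0.0, 1.0)
        lp_prop = latent_log_prob(prop)
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            u, lp = prop, lp_prop
        xs.append(T(u))
    return xs

xs = metropolis(20000, step=1.0)
print(sum(xs) / len(xs))
```

Running the same sampler directly on x with a unit step would almost never accept a move; the flow absorbs the target's scale so the sampler does not have to.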

Normalizing flows can be thought of as implementing a ‘generalised reparameterization trick’, as they leverage a transformation of a fixed distribution to draw samples from a distribution of interest.
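The ordinary reparameterization trick, which flows generalise, can be sketched with the simplest possible "flow": x = μ + σε with ε ~ N(0, 1). Assumptions: a hypothetical objective E[x²], whose exact gradients are 2μ with respect to μ and 2σ with respect to σ, and derivatives written out by hand instead of computed with an autodiff library.

```python
import random
from typing import Tuple

# Assumption: hypothetical objective f(x) = x^2, so E[x^2] = mu^2 + sigma^2,
# with exact gradients 2*mu (w.r.t. mu) and 2*sigma (w.r.t. sigma).

def reparam_grads(mu: float, sigma: float, n: int, seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo estimate of the gradient of E[f(mu + sigma * eps)]."""
    rng = random.Random(seed)
    g_mu = g_sigma = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)   # sample from the fixed base distribution
        x = mu + sigma * eps        # transform it into a sample of interest
        df_dx = 2.0 * x             # derivative of f(x) = x^2
        g_mu += df_dx * 1.0         # chain rule: dx/dmu = 1
        g_sigma += df_dx * eps      # chain rule: dx/dsigma = eps
    return g_mu / n, g_sigma / n

print(reparam_grads(mu=1.5, sigma=0.5, n=100_000))
```

A flow replaces this affine map with a richer invertible transformation, but the gradient estimator keeps the same form: sample from the base distribution, transform, and differentiate through the transformation.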

Generative modelling has been one of the most popular applications of flows in machine learning. Some examples:

  • Image generation has received serious effort since the earliest work on flows. Dinh et al. (2017) increased the capacity of their model by including scale transformations (instead of just translations) and were the first to demonstrate that flows could produce sharp, visually compelling full-colour images.
  • For text, the most direct way to apply normalizing flows is to define a discrete flow over characters or a vocabulary.
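The scale-and-translate idea from Dinh et al. (2017) is usually implemented as an affine coupling layer. A two-dimensional sketch, assuming hypothetical fixed functions s() and t() in place of the neural networks used in practice: the first coordinate passes through unchanged and parameterises a scale and shift of the second, so the Jacobian is triangular and its log-determinant is simply s(x1).

```python
import math
from typing import Tuple

# Assumption: s() and t() are hypothetical fixed stand-ins for neural networks.
# They can be arbitrary functions because they are never inverted.

def s(x1: float) -> float:
    """Log-scale 'network'."""
    return 0.5 * math.tanh(x1)

def t(x1: float) -> float:
    """Translation 'network'."""
    return x1 ** 2 - 1.0

def coupling_forward(x1: float, x2: float) -> Tuple[Tuple[float, float], float]:
    """Return (y1, y2) and log |det J|, which for this layer is s(x1)."""
    y1 = x1
    y2 = x2 * math.exp(s(x1)) + t(x1)
    return (y1, y2), s(x1)

def coupling_inverse(y1: float, y2: float) -> Tuple[float, float]:
    """Exact inverse: undo the shift, then the scale."""
    x1 = y1
    x2 = (y2 - t(y1)) * math.exp(-s(y1))
    return x1, x2

y, log_det = coupling_forward(0.8, -1.2)
print(y, log_det, coupling_inverse(*y))
```

Setting s to zero leaves a translation-only coupling with a log-determinant of zero, the more restricted kind of layer that adding the scale term improved on.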

Future Direction

In a paper titled Normalizing Flows for Probabilistic Modeling and Inference, researchers from DeepMind examined the state of flow models in detail.

They describe the kinds of flow models in use, how they have evolved, and their significance in domains such as reinforcement learning, imitation learning, image, audio and text modelling, classification, and many more.

The authors also acknowledge that many flow designs and specific implementations will inevitably become out of date as work on normalizing flows continues; in their words, “we have attempted to isolate foundational ideas that will continue to guide the field well into the future.”

Adopting normalising flows at scale in place of conventional probabilistic models is attractive because, unlike other probabilistic models that require approximate inference as they scale, flows usually admit analytical calculations and exact sampling even in high dimensions.

However, the obstacles currently preventing wider application of normalizing flows are similar to those faced by any probabilistic model. Given how quickly research is progressing, the team at DeepMind is optimistic about the future of flow models.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.