A beginner’s guide to Bayesian CNN

Applying Bayesian inference to neural networks is one way of controlling overfitting. The same idea can be applied to a CNN to reduce overfitting, and the resulting model is called a Bayesian CNN.

Convolutional neural networks (CNNs) are among the most effective tools for computer vision problems such as image classification, object localization and detection, and image segmentation. The main reason behind this capability is that a CNN can model sets of non-linear data points well. In practice, however, the available datasets are often small, and a standard CNN needs a large amount of data to avoid overfitting. A Bayesian CNN is a variant of CNN that can reduce the chances of overfitting when training on small datasets. In this article, we are going to discuss Bayesian CNN. The major points to be discussed in the article are listed below.

Table of contents

  1. What are Bayesian neural networks?
  2. Problem with CNN
  3. What is Bayesian CNN?
  4. The architecture of Bayesian CNN
  5. How Does Bayesian CNN Work?
  6. Applications of Bayesian CNN

Let’s first understand how the Bayesian approach is used in a neural network.

What are Bayesian neural networks?

We can think of a Bayesian neural network as an extension of a standard network with posterior inference, added so that the network can deal with overfitting. A standard network is trained to perform a given task on data without any prior knowledge about the task. To do so, it finds a single optimal value (a point estimate) for every weight. Applying the Bayesian approach instead means using statistical methods to place a probability distribution over the parameters of the network, such as its weights and biases.
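The difference between a point estimate and a distribution over parameters can be sketched with a single linear neuron. This is only an illustration under stated assumptions: the Gaussian means and standard deviations below are made-up values, and `bayesian_neuron` is a hypothetical helper, not part of any library.

```python
import random


def bayesian_neuron(x, w_mean, w_std, b_mean, b_std, rng):
    """One forward pass of a linear neuron whose weight and bias are
    sampled from Gaussian distributions instead of being fixed."""
    w = rng.gauss(w_mean, w_std)  # sample the weight
    b = rng.gauss(b_mean, b_std)  # sample the bias
    return w * x + b


rng = random.Random(0)
x = 2.0

# Standard network: fixed point estimates, so the output is always the same.
point_output = 0.5 * x + 0.1

# Bayesian network (illustrative values): every pass samples new parameters,
# so repeated passes on the same input give a distribution of outputs.
samples = [bayesian_neuron(x, 0.5, 0.2, 0.1, 0.05, rng) for _ in range(2000)]
mean_output = sum(samples) / len(samples)
var_output = sum((s - mean_output) ** 2 for s in samples) / len(samples)

print("point estimate output:", point_output)
print("mean of sampled outputs:", round(mean_output, 3))
print("variance of sampled outputs:", round(var_output, 3))
```

The mean of the sampled outputs stays close to the point estimate, while the variance gives a rough measure of how uncertain the neuron is about that output.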

In the Bayesian view, historical data represents prior knowledge about the overall behaviour, and each variable is described by statistical properties that can also vary with time. Suppose X is a random variable with a normal distribution: every time we draw a value of X we get a different result, and those results depend on the probability distribution of X. A Bayesian network treats its parameters in the same way, so instead of a single value it learns the nature and shape of the distribution of each parameter. The motives behind applying the Bayesian approach to neural networks are as follows:

  • Standard networks tend to overfit on small datasets.
  • The Bayesian approach can be applied to any network.
  • Applying it can make the network give better results across a wide range of tasks.
  • It helps in estimating the uncertainty of a prediction.

One of the motives listed above is that applying the Bayesian approach reduces the problem of overfitting. Convolutional neural networks are best known for dealing with image data, and to work properly they need a large amount of training data. In situations where the available data is limited, applying the Bayesian approach to a CNN can be useful. Let’s discuss the problem with CNN before moving on to Bayesian CNN.

Problem with CNN

Convolutional neural networks are one of the most important variants of deep neural networks and are mainly used for image data. Image data can be considered a set of non-linear data points that requires a large amount of modelling, yet it is often available only in small quantities. Training a CNN requires a large amount of data to reduce the chances of overfitting.

So in general we can say that training a CNN on a small dataset leads to overfitting. Such a model can fit the small training set but cannot predict accurately on new data. We have discussed above that applying Bayesian statistics to a network reduces the chances of overfitting. The same can be done with a CNN: applying the Bayesian approach to a CNN helps us approximate the uncertainty and also regularizes the predictions.

What is Bayesian CNN?

In the above, we have seen that applying the Bayesian approach to neural networks is a method of controlling overfitting. We can apply it to a CNN as well, and we call the result a Bayesian CNN. One way to do so is to model a distribution over the kernels (filters) of the CNN, which requires us to infer the model posterior. Because the exact posterior is usually intractable, it is commonly approximated with variational inference, where the posterior is modelled with a variational distribution such as a Gaussian. The major objective is to fit the parameters of this distribution so that it is as close as possible to the true posterior. This objective can be fulfilled by minimizing the divergence, typically the Kullback-Leibler (KL) divergence, between the variational distribution and the posterior.
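To make "minimizing the divergence" concrete, here is a minimal sketch of the closed-form KL divergence between two univariate Gaussians. The target distribution below is a made-up stand-in for the posterior, used only for illustration; in real variational inference the true posterior is unknown and the divergence is minimized indirectly through the evidence lower bound (ELBO).

```python
import math


def kl_gaussian(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) for univariate Gaussians q = N(mu_q, sigma_q^2)
    and p = N(mu_p, sigma_p^2), using the closed-form expression."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
            - 0.5)


# Hypothetical target standing in for the posterior over one filter weight.
target_mu, target_sigma = 1.0, 0.5

# Candidate variational distributions, moving closer to the target.
candidates = [(0.0, 1.0), (0.5, 0.75), (1.0, 0.5)]
kls = [kl_gaussian(mu, sigma, target_mu, target_sigma)
       for mu, sigma in candidates]

for (mu, sigma), kl in zip(candidates, kls):
    print(f"q = N({mu}, {sigma}^2)  KL(q || p) = {kl:.4f}")
```

The divergence shrinks as the variational parameters approach those of the target and reaches zero when the two distributions coincide.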

We can form a Bayesian CNN by approximating the true posterior probability distribution with a variational probability distribution built from simple distributions such as Gaussians. The resulting distribution is called the variational posterior, and it expresses an estimate of the uncertainty in the model’s parameters. Some studies of Bayesian CNNs report that they give richer predictions through cheap model averaging.

We can use a Bayesian CNN in areas such as image super-resolution and generative adversarial networks. In this section, we have looked at how we can model a Bayesian CNN. Let’s now look at the architecture found in implementations of Bayesian CNN.

The architecture of Bayesian CNN

This section covers the basic architecture of a Bayesian CNN. In this architecture we need to add variational inference to a CNN, so it can have the following three main components:

(Each filter weight has a distribution in a Bayesian CNN)

  • Layers: this part of the network can include a module wrapper, a flatten layer, linear layers, and convolutional layers.
  • Bayesian models: this part should contain some of the standard Bayesian models such as BBBLeNet, BBBAlexNet, and BBB3Conv3FC.
  • Convolutional models: this part of the architecture can hold standard CNNs such as LeNet and AlexNet.
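The idea that each filter weight has a distribution can be sketched as a tiny convolution whose kernel is sampled on every forward pass. This is a simplified illustration, not the code of any particular library. It assumes a Gaussian per weight parameterized by a mean `mu` and a value `rho` mapped to a standard deviation with softplus, as is common in Bayes by Backprop style layers; all function names and values here are hypothetical.

```python
import math
import random


def softplus(rho):
    """Map an unconstrained value to a positive standard deviation."""
    return math.log1p(math.exp(rho))


def sample_kernel(mu, rho, rng):
    """Reparameterization: w = mu + softplus(rho) * eps, with eps ~ N(0, 1),
    drawn independently for every filter weight."""
    return [[m + softplus(r) * rng.gauss(0.0, 1.0)
             for m, r in zip(mu_row, rho_row)]
            for mu_row, rho_row in zip(mu, rho)]


def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation without padding (single channel)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


rng = random.Random(42)
image = [[1.0, 2.0, 0.0, 1.0],
         [0.0, 1.0, 3.0, 1.0],
         [2.0, 1.0, 0.0, 0.0],
         [1.0, 0.0, 1.0, 2.0]]
mu = [[0.5, -0.5], [0.25, 0.0]]      # illustrative filter means
rho = [[-3.0, -3.0], [-3.0, -3.0]]   # illustrative spread parameters

# Two forward passes sample two different kernels, so the feature maps differ.
out_a = conv2d_valid(image, sample_kernel(mu, rho, rng))
out_b = conv2d_valid(image, sample_kernel(mu, rho, rng))

# Using the means directly recovers an ordinary deterministic convolution.
mean_out = conv2d_valid(image, mu)
```

During training, `mu` and `rho` would be the learned parameters. The sampling step is what separates a Bayesian convolutional layer from a standard one, which would store only the fixed kernel.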

How Does Bayesian CNN Work?

With a Bayesian CNN we mainly focus on estimating uncertainty, which can be of two types: one that measures the variation of the data and one that measures the variation of the model. To estimate the uncertainty, we can add a final layer to the architecture given above that acts as an estimator of predictive uncertainty, using the mathematical expression below:

[Equation image not reproduced]
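One widely used form of such a decomposition is given here as an assumption, not as a transcription of the article’s figure. It considers T stochastic forward passes with softmax outputs \hat{p}_t and their mean \bar{p}, and splits the predictive variance into a data term and a model term:

```latex
\widehat{\operatorname{Var}}(y^{*}) \approx
\underbrace{\frac{1}{T}\sum_{t=1}^{T}
  \left(\operatorname{diag}(\hat{p}_t) - \hat{p}_t\,\hat{p}_t^{\top}\right)}_{\text{data (aleatoric) uncertainty}}
+
\underbrace{\frac{1}{T}\sum_{t=1}^{T}
  (\hat{p}_t - \bar{p})(\hat{p}_t - \bar{p})^{\top}}_{\text{model (epistemic) uncertainty}},
\qquad
\bar{p} = \frac{1}{T}\sum_{t=1}^{T}\hat{p}_t
```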

The first term in the equation measures the variation of the data and the second measures the variation of the model. Together they give the variability of the predictive probability. The approach also involves extra sampling steps to produce results, and it can reduce the number of weights, which also helps in reducing overfitting. After calculating these two terms, we can also convert them into probabilities to output the results.
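The two terms can be sketched from a set of sampled softmax outputs. The sketch assumes a common formulation in which, for each class, the data term is the average of p(1 - p) over the sampled passes and the model term is the variance of p across the passes. The function name and the example probabilities are hypothetical.

```python
def uncertainty_terms(prob_samples):
    """Given T softmax vectors from T stochastic forward passes, return
    per-class (mean, data_term, model_term)."""
    t = len(prob_samples)
    k = len(prob_samples[0])
    mean = [sum(p[c] for p in prob_samples) / t for c in range(k)]
    # Data (aleatoric) term: average of p * (1 - p) over the passes.
    data_term = [sum(p[c] - p[c] ** 2 for p in prob_samples) / t
                 for c in range(k)]
    # Model (epistemic) term: variance of p across the passes.
    model_term = [sum((p[c] - mean[c]) ** 2 for p in prob_samples) / t
                  for c in range(k)]
    return mean, data_term, model_term


# Passes that agree: the model term is zero, only the data term remains.
agree = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
mean_a, data_a, model_a = uncertainty_terms(agree)

# Passes that disagree: the model term becomes large.
disagree = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
mean_d, data_d, model_d = uncertainty_terms(disagree)

print("agreeing passes:   ", data_a, model_a)
print("disagreeing passes:", data_d, model_d)
```

For each class the two terms add up to mean * (1 - mean), so the split shows how much of the total predictive variability comes from the data and how much from the model.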

Applications of Bayesian CNN

Some of the notable applications of Bayesian convolutional networks are as follows:

Final words

In this article, we have discussed Bayesian neural networks and the Bayesian CNN. Overfitting is a major problem with CNNs, and applying Bayesian statistics to them can make them more capable and accurate while reducing the chances of overfitting. Along with this, we have seen examples of work that can help us build our own projects on Bayesian CNN.

Yugesh Verma
Yugesh is a graduate in automobile engineering and worked as a data analyst intern. He completed several Data Science projects. He has a strong interest in Deep Learning and writing blogs on data science and machine learning.
