# Hands-on Guide to Bayesian Neural Network in Classification

Consider an example where you are trying to classify a car and a bike. If an image of a truck is shown to the network, it ideally should not predict anything. But, because of the softmax function, it assigns a high probability to one of the classes and the network wrongly, though confidently predicts it to be a car. In order to avoid this, we use the Bayesian Neural Network (BNN).

Deep neural networks learn to form relationships with the given data without having prior exposure to the dataset. They perform very well on non-linear data and hence require large amounts of data for training. Although more information is better for the network, it leads to problems like overfitting

The neural network will work really well with training data but underperforms when it is fed unseen data. This makes the network blind to the uncertainties in the training data and tends to be overly confident in its wrong predictions. Consider an example where you are trying to classify a car and a bike. If an image of a truck is shown to the network, it ideally should not predict anything. But, because of the softmax function, it assigns a high probability to one of the classes and the network wrongly, though confidently predicts it to be a car. In order to avoid this, we use the Bayesian Neural Network (BNN).

• Why do we need to use BNN?
• How a Bayesian network works
• Implementing a BNN

### Why do we need to use the Bayesian Neural Network (BNN)?

As discussed above, we need to make sure our model does not overfit. You may be wondering that there are other methods to do this like Batch Normalisation, Dropout etc. But all these methods do not solve the problem of identifying uncertainty in the model. Hence the need for BNNs. Bayesian Networks work well on small datasets and are robust for avoiding overfitting. They also come with additional features like uncertainty estimation, probability distributions etc.

### How Does a Bayesian Neural Network work?

The motto behind a BNN is pretty simple — every entity is associated with a probability distribution, including weights and biases. There are values called ‘random variables’ in the bayesian world which provides a different value every time it is accessed. Let X be a random variable which represents the normal distribution, every time you access X, it’ll have a different value. This process of obtaining a different value on each access is called Sampling. What value comes out of each sample depends on the probability distribution. The wider the probability distribution, the more the uncertainty.

In a traditional neural network, each layer has fixed weights and biases that determine the output. But, a Bayesian neural network will have a probability distribution attached to each layer as shown below.

For a classification problem, you perform multiple forward passes each time with new samples of weights and biases. There is one output provided for each forward pass. The uncertainty will be high if the input image is something the network has never seen for all output classes.

### Implementation

We will now implement the BNN on a small dataset from sklearn called iris dataset. This dataset has 4 attributes and around 150 data points.

```import numpy as np
from sklearn import datasets
import torch
import torch.nn as nn
import torch.optim as optim
import torchbnn as bnn
import matplotlib.pyplot as plt

### Splitting the dataset into data and target and converting them to tensors

```data = dataset.data
target = dataset.target
data_tensor=torch.from_numpy(data).float()
target_tensor=torch.from_numpy(target).long()```

### Defining a simple Bayesian model

```model = nn.Sequential(
bnn.BayesLinear(prior_mu=0, prior_sigma=0.1, in_features=4, out_features=100),
nn.ReLU(),
bnn.BayesLinear(prior_mu=0, prior_sigma=0.1, in_features=100, out_features=3),
)

prior_mu (Float) is the mean of prior normal distribution.
prior_sigma (Float) is the sigma of prior normal distribution.```

### Defining loss function

The two-loss functions used here are cross-entropy loss and the BKL loss which is used to compute the KL (Kullback–Leibler) divergence of the network.

```cross_entropy_loss = nn.CrossEntropyLoss()
klloss = bnn.BKLLoss(reduction='mean', last_layer_only=False)
klweight = 0.01

### Training the model

The model is trained for 3000 steps(this would have lead to overfitting for a traditional network)

```for step in range(3000):
models = model(data_tensor)
cross_entropy = cross_entropy_loss(models, target)
kl = klloss(model)
total_cost = cross_entropy + klweight*kl

total_cost.backward()
optimizer.step()

_, predicted = torch.max(models.data, 1)
final = target_tensor.size(0)
correct = (predicted == target_tensor).sum()
print('- Accuracy: %f %%' % (100 * float(correct) / final))
print('- CE : %2.2f, KL : %2.2f' % (cross_entropy.item(), kl.item()))

Output : - Accuracy: 98.000000 %
- CE : 0.05, KL : 2.81```

### Visualisation

Let us now visualise the model and see how it has performed. To understand how sampling works, run the model multiple times and plot the graphs. You will notice minor changes with each iteration.

```def draw_graph(predicted) :
fig = plt.figure(figsize = (16, 8))

z1_plot = fig_1.scatter(data[:, 0], data[:, 1], c = target,marker=’v’)
z2_plot = fig_2.scatter(data[:, 0], data[:, 1], c = predicted)

plt.colorbar(z1_plot,ax=fig_1)
plt.colorbar(z2_plot,ax=fig_2)

fig_1.set_title("REAL")
fig_2.set_title("PREDICT")

plt.show()

Run 1 :
models = model(data)
_, predicted = torch.max(models.data, 1)
draw_graph(predicted)

Run 2:
models = model(data)
_, predicted = torch.max(models.data, 1)
draw_graph(predicted)

```

The above graph indicates the difference between the scattering of data points in the actual dataset versus the scattering of points in the predicted bayesian network. Since there are 3 output classes, each of them are indicated in a different color. Each time the data is sampled, the network assigns a probability distribution to the entire input data. On close observation near coordinates (2.3,6.2) of the predicted graph, there is a different prediction made because of the change in probability.

### Conclusion

The above implementation makes it clear that not only the problem of overfitting is much more robustly solved with the help of Bayesian Networks but also determines the uncertainty involved in the input.

## Our Upcoming Events

### Telegram group

Discover special offers, top stories, upcoming events, and more.

### Discord Server

Stay Connected with a larger ecosystem of data science and ML Professionals

### No, the Infamous 6-Month AI Pause Letter Wasn’t a Failure

Though the AI pause wasn’t adopted, the letter’s impact is evident, says signatory

### AI in Programming is to Collaborate, Not Eliminate

While the potential of AI is unquestionable, a deeper look into its current capabilities suggests that a complete or even a partial AI takeover in programming is unlikely

### Apple Should be Scared of Windows Copilot

Copilot will start its early rollout as part of the free Windows 11 update, beginning on September 26

### Top 5 Libraries in C/C++ for ML in 2023

There are tons of libraries in C/C++ for ML, such as TensorFlow, Caffe, and mlpack

### Tesla Optimus Finally Learns Yoga, Performs Vrikshasana

Jim Fan, senior AI scientist at NVIDIA, has come forward with insights on how exactly Optimus functions with such brilliance

### NVIDIA’s Dominance Set to Surge Further

NVIDIA’s Meteoric Rise in 2023: On Track to Surpass \$50 Billion Revenue, Achieves \$1 Trillion Market Cap, and Forges Global Partnerships for AI Dominance.

### 6 Brilliant JavaScript Frameworks for Every Developer

Although Python and R are more famous for machine learning, Java can serve this purpose effectively, especially if you’re already familiar with it

### Meet the Researcher Curing the Healthcare System with ML

Ziad Obermeyer is bringing the long-delayed impact of ML in healthcare

### Why Focus on Future AI Regulations When Deepfake Crimes Persist?

With discussions on AI regulations happening on one side, and deepfake crimes increasing on the other, shouldn’t the present be checked before moving to the future?