Search

Understanding Naive Bayes Classifier From Scratch

Naive Bayes classifiers is a highly scalable probabilistic classifiers that is built upon the Bayes theorem. This article goes through the Bayes theorem, ‘make some assumptions’ and then implement a naive Bayes classifier from scratch.

Naive Bayes classifier belongs to a family of probabilistic classifiers that are built upon the Bayes theorem. In naive Bayes classifiers, the number of model parameters increases linearly with the number of features. Moreover, it’s trained by evaluating a closed-form expression, i.e., a mathematical expression that can be evaluated using finite steps and has one definite solution. This means that naive Bayes classifiers train in linear time compared to the quadratic or cubic time of other iterative approximation based approaches. These two factors make naive Bayes classifiers highly scalable. In this article, we’ll go through the Bayes theorem, ‘make some assumptions’ and then implement a naive Bayes classifier from scratch.

Bayes Theorem

Bayes theorem is one of the most important formulas in all probability. It’s an essential tool for scientific discovery and for creating AI systems; it has also been used to find century-old treasures. It is formulated as

But what does that mean? Let’s understand it with the help of a puzzle taken from Kahneman and Tversky’s Experiments:

Join our editors every weekday evening as they steer you through the most significant news of the day, introduce you to fresh perspectives, and provide unexpected moments of joy

“Steve is very shy and withdrawn, invariably helpful but with very little interest in people or in the world of reality. A meek and tidy soul, he has a need for order and structure and a passion for detail.”

Given the above description, do you think Steve is more likely to be a librarian or a farmer? The majority of people immediately conclude that Steve must be a librarian since he fits their idea of a librarian. However, as we see the whole picture, we see that there are twenty times as many farmers as librarians (in the United States). Most people aren’t aware of this statistic and hence can’t make an accurate prediction, and that’s okay. Also, that’s beside the point of this article. However, if you want to learn why we act irrationally and make assumptions like this, I wholeheartedly recommend reading Kahneman’s Thinking Fast and Slow.

Back to Bayes theorem, to model this puzzle more accurately, let’s start by creating create a representative sample of 420 people, 20 librarians and 400 farmers. And let’s say your intuition is that roughly 50% of librarians would fit that description, and 10% of farmers would. So the probability of a random person fitting this description being a librarian becomes 0.2 (10/50). So even if you think a librarian is five times as likely as a farmer to fit this description, that’s not enough to overcome the fact that there are way more farmers.

This new evidence doesn’t necessarily overrule your past belief but rather updates it. And this is precisely what the Bayes theorem models.  The first relevant number is the probability that your beliefs hold true before considering the new evidence. Using the ratio of farmers to librarians in the general population, this came out to be 1/5 in our example. This is known as the prior P(H). In addition to this, we need to consider the proportion of librarians that fit this description; the probability we would see the evidence given that the hypothesis is true, P(E|H).  In the context of the Bayes theorem, this value is called the ‘likelihood’. This represents a limited view of your initial hypothesis.

Similarly, we need to consider how much of the farmer’s side of the sample space make up the evidence; the probability of seeing the evidence given that your beliefs don’t hold true P(E|H’). Using these notations, the accurate probability of your beliefs being right given the evidence, P(H|E), also called the posterior probability, can be formulated as:

Where N is the sample size. In our example, this expression evaluates to 1/5. The sample size gets cancelled out; this gives us:

This gets further simplified to:

This is the original Bayes theorem that we started with. I hope this illustrated the core point of the Bayes theorem representing a changing belief system, not just a bunch of independent probabilities.

Assumptions Made by Naive Bayes Classifiers

The naive Bayes classifier is called ‘naive’ because it makes the assumption that all features are independent of each other. Another assumption that it makes is that the values of the features are normally (Gaussian) distributed. Using these assumptions, the original Bayes theorem is modified and transformed into a simpler form that is relevant for solving learning problems. We start with

where,

And because the features are considered independent, we can breakdown P(X|y)

For classification tasks, we want to select the class with the highest probability, so we choose y, which is

And P(X) is common to all classes and will not change their final ranks; hence it can be excluded from the equation.

A lot of these probabilities will be small numbers; multiplying them can cause overflow problems, so we apply the log function.

And that’s the equation naive Bayes classifiers use to make predictions, now let’s implement one.

Implementing a Naive Bayes Classifier from Scratch

Create a function that calculates the prior probability, P(H), mean and variance of each class. The mean and variance are later used to calculate the likelihood, P(E|H), using the Gaussian distribution.

``` import numpy as np
class GaussianNaiveBayes:
def fit(self, X, y):
n_samples, n_features = X.shape
self._classes = np.unique(y)
n_classes = len(self._classes)
self._mean = np.zeros((n_classes, n_features), dtype=np.float64)
self._var = np.zeros((n_classes, n_features), dtype=np.float64)
self._priors =  np.zeros(n_classes, dtype=np.float64)

# calculating the mean, variance and prior P(H) for each class
for i, c in enumerate(self._classes):
X_for_class_c = X[y==c]
self._mean[i, :] = X_for_class_c.mean(axis=0)
self._var[i, :] = X_for_class_c.var(axis=0)
self._priors[i] = X_for_class_c.shape[0] / float(n_samples) ```

Write a function for calculating the likelihood, P(E|H), of data X given the mean and variance

```     def _calculate_likelihood(self, class_idx, x):
mean = self._mean[class_idx]
var = self._var[class_idx]
num = np.exp(- (x-mean)**2 / (2 * var))
denom = np.sqrt(2 * np.pi * var)
return num / denom ```

And finally, write the methods for making classifications by calculating the posterior probability, P(H|E), of the classes

```     def predict(self, X):
y_pred = [self._classify_sample(x) for x in X]
return np.array(y_pred)

def _classify_sample(self, x):
posteriors = []
# calculating posterior probability for each class
for i, c in enumerate(self._classes):
prior = np.log(self._priors[i])
posterior = np.sum(np.log(self._calculate_likelihood(i, x)))
posterior = prior + posterior
posteriors.append(posterior)
# return the class with highest posterior probability
return self._classes[np.argmax(posteriors)] ```

Now let’s test our naive Bayes classifier and compare it with scikit-learn’s implementation.

``` from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn import datasets
import time

X, y = datasets.make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

start = time.perf_counter()
nb = GaussianNaiveBayes()
nb.fit(X_train, y_train)
predictions = nb.predict(X_test)
end = time.perf_counter()
print(f"NumPy Naive Bayes accuracy: {accuracy_score(y_test, predictions)}")
print(f'Finished in {round(end-start, 3)} second(s)')  ```
``` NumPy Naive Bayes accuracy: 0.796
Finished in 0.013 second(s) ```
``` from sklearn.naive_bayes import GaussianNB
start = time.perf_counter()
sk_nb = GaussianNB()
sk_nb.fit(X_train, y_train).predict(X_test)
sk_predictions = sk_nb.predict(X_test)
end = time.perf_counter()
print(f"scikit-learn Naive Bayes accuracy: {accuracy_score(y_test, sk_predictions)}")
print(f'Finished in {round(end-start, 3)} second(s)')  ```
``` scikit-learn Naive Bayes accuracy: 0.96
Finished in 0.009 second(s) ```

You can find the above implementation in a Colab notebook here.

What to see naive Bayes classifiers in action check out these articles:

A machine learning enthusiast with a knack for finding patterns. In my free time, I like to delve into the world of non-fiction books and video essays.

Telegram group

Discover special offers, top stories, upcoming events, and more.

Discord Server

Stay Connected with a larger ecosystem of data science and ML Professionals

NVIDIA Planning Big Expansions in Japan

Prime Minister Fumio Kishida has extended billions of dollars in financial support to bolster TSMC

Runway Partners with Getty to Build Video Generation Model for Enterprises

Runway enterprise users can refine RGM with their proprietary datasets, benefiting various industries like Hollywood,

Modular Announces Partnership with AWS, NVIDIA

ModCon 2023 was Modular’s first developer conference.

Air India Closes Historic Data Centres, Migrates to the Cloud

The closure of the data centres will further result in nearly a million dollar net

India’s Fab Ambitions: Tall Promises or Work in Progress?

Despite numerous claims, India still does not have a fab.

Extropic AI is All About e/acc of Generative AI

The startup wants to open source its technology, which the founder calls is “AI Manhattan

AWS’ Multi-Layered Approach to Tech and Trust

For 17 years Amazon Web Services has followed a set of principles to reach the

AssemblyAI Raises \$50 Mn for Superhuman Speech Recognition

The new model the company is building is trained on more than 10 million hours

Top 6 Generative AI Jobs in India

As the year concludes, we’ve compiled a list of promising generative AI jobs for those

AIM Top Ranked PG Data Science Programs (Full Time On-Campus) – 2023

A good course has high and well-balanced scores across all the parameters, making it at