Last updated February 18, 2021

What Is A Naive Bayes Classifier And What Significance Does It Have In ML

Published on April 1, 2019

by Ram Sagar

Classifiers are most widely used for email spam filtering, collaborative filtering in recommendation engines, and sentiment analysis. Machine learning models are good at separating data into groups based on patterns found across large data sets.

The naive Bayes classifier is based on Bayes’ theorem and is one of the oldest approaches to classification problems.

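Bayes’ theorem can be put in simple terms as:

P(A|B) = P(B|A) × P(A) / P(B)

Here P(A|B) is the probability of event A given that B has happened, P(B|A) is the probability of B given A, and P(A) and P(B) are the probabilities of A and B on their own. The objective is to determine the probability of an event A happening given that B has happened.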
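As a small worked example of Bayes’ theorem, P(A|B) = P(B|A) × P(A) / P(B), the sketch below computes the probability that an email is spam given that it contains a certain word. The function name `posterior` and all the numbers are made up for illustration; P(B) is expanded using the law of total probability.

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return P(A|B) using Bayes' theorem.

    P(B) is expanded with the law of total probability:
    P(B) = P(B|A) * P(A) + P(B|not A) * P(not A)
    """
    evidence = (likelihood_b_given_a * prior_a
                + likelihood_b_given_not_a * (1.0 - prior_a))
    return likelihood_b_given_a * prior_a / evidence


# Hypothetical numbers: 20% of mail is spam, the word "offer" appears
# in 60% of spam messages and in 5% of legitimate messages.
p_spam_given_offer = posterior(0.2, 0.6, 0.05)
print(round(p_spam_given_offer, 3))  # 0.75
```

With these assumed numbers, seeing the word raises the probability of spam from the 20% prior to 75%.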

The naive Bayes classifier combines this probability model with a decision rule. The most common rule is to pick the hypothesis (class) that is most probable given the observed features.

Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features given the value of the class variable.

It was initially introduced for text categorisation tasks and is still used as a benchmark there.

Over the years, many other methods such as support vector machines (SVMs) and k-nearest neighbours (KNN) have offered more flexible ways of solving classification problems. But the naive Bayes classifier can still be competitive given enough pre-processed data, and it has shown good results in medical applications, where classification is crucial to diagnosis.

How Good Is NB Classifier For ML

The core assumption of a naive Bayes classifier is that, given the class, the value of a particular feature is independent of the value of any other feature. This means that the interdependencies within the data are deliberately ignored, hence the name ‘naive’.

A naive Bayes classifier treats every feature as contributing independently to the probability of a class, irrespective of any correlations between the features.
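To make the independence assumption concrete, here is a minimal from-scratch sketch on a made-up toy data set (the feature values, labels and the helper `class_scores` are all hypothetical). Each class is scored as its prior multiplied by one factor per feature, and the highest score wins.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (outlook, windy) -> play
rows = [
    (("sunny", "no"), "yes"),
    (("sunny", "yes"), "no"),
    (("rainy", "yes"), "no"),
    (("rainy", "no"), "yes"),
    (("sunny", "no"), "yes"),
    (("rainy", "yes"), "no"),
]

class_counts = Counter(label for _, label in rows)
# feature_counts[label][feature_index][value] = number of occurrences
feature_counts = defaultdict(lambda: defaultdict(Counter))
for features, label in rows:
    for i, value in enumerate(features):
        feature_counts[label][i][value] += 1


def class_scores(features):
    """Unnormalised score per class: P(class) * product of P(feature_i | class)."""
    scores = {}
    for label, n in class_counts.items():
        score = n / len(rows)  # prior P(class)
        for i, value in enumerate(features):
            # Each feature contributes its own factor, independently of the others.
            score *= feature_counts[label][i][value] / n
        scores[label] = score
    return scores


scores = class_scores(("sunny", "no"))
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

This sketch uses raw counts, so a feature value never seen with a class drives that class’s score to zero. Practical implementations usually add smoothing to avoid this.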

In many practical applications, the parameters of a naive Bayes model are estimated by maximum likelihood, so the model can be used without adopting any fully Bayesian methods. Naive Bayes classifiers can also be trained very efficiently in a supervised setting.

In a Gaussian naive Bayes classifier, continuous feature values are assumed to follow a Gaussian distribution within each class. In other words, the likelihood of each feature given the class is assumed to be Gaussian.

Importing the Gaussian NB classifier in Python using scikit-learn:

from sklearn.naive_bayes import GaussianNB
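Once imported, the classifier follows the usual scikit-learn fit/predict pattern. The sketch below is one possible usage, assuming the built-in iris data set and a 70/30 train/test split; neither choice comes from the article.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumption: iris is used here only because it ships with scikit-learn
# and has continuous features.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = GaussianNB()
model.fit(X_train, y_train)  # estimates a mean and variance per class and feature

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```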

A multinomial naive Bayes classifier works on feature vectors that represent the frequencies with which certain events have been generated by a multinomial distribution.

In the Bernoulli naive Bayes approach, by contrast, features are independent booleans, which makes it suitable for binary-valued inputs.

For example, in document classification tasks, Multinomial NB can be used when the feature is the number of times a word appears in the document (its frequency), and Bernoulli NB when the feature is whether a word appears or not (a binary YES or NO).
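The difference between the two can be sketched with scikit-learn’s `CountVectorizer`, which produces word counts by default and presence/absence flags with `binary=True`. The tiny corpus and its labels below are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical toy corpus; labels: 1 = sports, 0 = cooking
docs = [
    "the team won the game and the game was close",
    "a great goal won the match",
    "bake the cake and frost the cake",
    "stir the soup and add salt",
]
labels = [1, 1, 0, 0]

# Word counts (frequencies) for Multinomial NB
count_vec = CountVectorizer()
X_counts = count_vec.fit_transform(docs)
multinomial = MultinomialNB().fit(X_counts, labels)

# Presence or absence (1 or 0) of each word for Bernoulli NB
binary_vec = CountVectorizer(binary=True)
X_binary = binary_vec.fit_transform(docs)
bernoulli = BernoulliNB().fit(X_binary, labels)

new_doc = ["the game was a great game"]
pred_multinomial = multinomial.predict(count_vec.transform(new_doc))
pred_bernoulli = bernoulli.predict(binary_vec.transform(new_doc))
print(pred_multinomial, pred_bernoulli)
```

Note that Bernoulli NB also takes the absence of a word as evidence, while Multinomial NB only uses the words that actually occur.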

NB classifiers are usually pitted against support vector machines (SVMs). In many cases, SVMs outperform naive Bayes. Unlike naive Bayes, SVMs can capture dependencies between features when a non-linear kernel such as the Gaussian kernel, also known as the radial basis function (RBF) kernel, is used.

Even though naive Bayes is criticized for the inaccuracies that come with assuming independence across features, it does fairly well in practice because the class-conditional feature distributions are decoupled. This decoupling allows each feature’s distribution to be estimated independently as a one-dimensional distribution, which avoids some of the challenges of high dimensionality, such as the need for data sets that grow exponentially with the number of features.
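The decoupling can be illustrated with a from-scratch Gaussian naive Bayes in NumPy. This is a sketch on synthetic data rather than a reference implementation; the data, the seed and the helper `predict` are assumptions made for the example. The model stores only one mean and one variance per class and feature, and a prediction adds up the one-dimensional log-likelihoods.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 2 classes, 3 features, different means per class
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(200, 3)),
    rng.normal(loc=3.0, scale=1.0, size=(200, 3)),
])
y = np.array([0] * 200 + [1] * 200)

classes = np.unique(y)
# One mean and one variance per (class, feature) pair: each is a 1-D estimate.
means = np.array([X[y == c].mean(axis=0) for c in classes])
variances = np.array([X[y == c].var(axis=0) for c in classes])
log_priors = np.log(np.array([(y == c).mean() for c in classes]))


def predict(samples):
    """Pick the class with the highest log prior plus summed 1-D log-likelihoods."""
    samples = np.atleast_2d(samples)
    # Broadcast to shape (n_samples, n_classes, n_features)
    diff = samples[:, None, :] - means[None, :, :]
    log_lik = -0.5 * (np.log(2 * np.pi * variances) + diff ** 2 / variances)
    scores = log_priors + log_lik.sum(axis=2)
    return classes[np.argmax(scores, axis=1)]


print(means.shape, variances.shape)  # (2, 3) (2, 3)
print(predict([[0.1, -0.2, 0.3], [2.9, 3.1, 3.2]]))
```

The number of parameters grows linearly with the number of features, because no joint distribution over the features is ever estimated.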

NB classifiers can be tuned for better results, especially for document classification or word identification, using the following techniques:

By removing stop words from a sentence, as they are not significant to the classification task. For a classifier, ‘We won the game’ is as good as ‘We won the game by a very close margin’. Here common words such as ‘by’, ‘a’ and ‘very’ are stop words, and removing them wouldn’t change the result.
By lemmatizing words, so that different inflected forms of the same word are grouped together and counted as a single term instead of incrementing separate word frequency counters. For example, ‘game’ and ‘games’ will be grouped.
By checking the significance of a word with term frequency-inverse document frequency (TF-IDF). This technique weighs how important a word is to a document in text mining tasks, and it can also be used for stop-word filtering. Using the TF-IDF value as a threshold, words can be penalised when they occur with high frequency across documents.
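Two of these techniques, stop-word removal and TF-IDF weighting, can be sketched as a scikit-learn pipeline in front of Multinomial NB. The corpus and labels are invented for illustration, and the built-in English stop-word list is only one possible choice.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus; labels: 1 = sports, 0 = cooking
docs = [
    "we won the game by a very close margin",
    "the team lost the games this season",
    "bake the cake for a very long time",
    "add the salt to the soup",
]
labels = [1, 1, 0, 0]

# stop_words="english" drops common words before counting; TF-IDF then
# down-weights terms that appear in many documents.
model = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
model.fit(docs, labels)

# make_pipeline names each step after its lowercased class name
vectorizer = model.named_steps["tfidfvectorizer"]
vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)

prediction = model.predict(["we won the game"])
print(prediction)
```

Lemmatization is not part of this sketch, so ‘game’ and ‘games’ remain separate terms here. Grouping them would need an additional text-processing step before vectorizing.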
