# Naive Bayes – Why Is It Favoured For Text Related Tasks?

We receive a large number of emails every day, a mix of spam and legitimate (non-spam) messages, so automated filtering that separates the two is an important application. Another common text application is sentiment analysis: determining the sentiment expressed in a body of text in order to gauge how the general public perceives a product or an event. Twitter sentiment analysis, for example, is widely used to measure consumer reaction to a recent event or a product launch.

The applications mentioned above are binary or multiclass classification problems, i.e. each input can fall into one of two or more classes. This points us towards classification algorithms such as Logistic Regression, tree-based algorithms, Support Vector Machines and Naive Bayes. In practice, Naive Bayes often delivers surprisingly strong results on these tasks, where the data consists mostly of text such as reviews or the contents of an email.

The Naive Bayes algorithm is a simple classification algorithm built on probabilities of events. It is based on Bayes' Theorem, combined with the assumption that there is no interdependence amongst the features. For example, a fruit may be classified as a banana if it is yellow or green in colour, elongated in shape and roughly 1-2 cm in radius. Each of these properties contributes independently to the probability that the fruit is a banana, which is why the independence assumption is called "naive". Because of this assumption, a Naive Bayes classifier can be trained on relatively little data and is fairly robust to some mislabeled examples.

Bayes' Theorem is given by the following formula:

P(A|B) = P(A) × P(B|A) / P(B)

Here we are calculating the posterior probability of class A given predictor B, i.e. P(A|B). P(A) is the prior probability of the class, P(B|A) is the likelihood of predictor B given class A, and P(B) is the prior probability of the predictor. In text classification, these quantities are estimated from the counts of words in the training text.
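To make the formula concrete, here is a small worked example with made-up numbers: suppose 30% of all emails are spam, the word "free" appears in 60% of spam emails and in only 5% of non-spam emails, and we want the probability that an email containing "free" is spam.

```python
# Hypothetical numbers for illustration:
# P(spam) = 0.3, P("free"|spam) = 0.6, P("free"|ham) = 0.05
p_spam = 0.3
p_free_given_spam = 0.6
p_free_given_ham = 0.05

# P(B): total probability of seeing "free", summed over both classes
p_free = p_spam * p_free_given_spam + (1 - p_spam) * p_free_given_ham

# Bayes' Theorem: P(spam | "free") = P(spam) * P("free" | spam) / P("free")
p_spam_given_free = p_spam * p_free_given_spam / p_free
print(round(p_spam_given_free, 3))  # 0.837
```

Even though only 30% of emails are spam, observing the single word "free" pushes the posterior probability of spam up to about 84%.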

Bayesian statistics differs from frequentist statistics in that frequentist probability is defined in terms of the long-run frequency of random events over repeated trials, while Bayesian statistics works with prior and posterior probabilities. This gives Bayesian methods the flexibility to update probabilities as evidence arrives, before and after a certain event is observed.

The Naive Bayes classifier performs the following steps:

• Create a frequency table based on the words
• Calculate the likelihood for each of the classes based on the frequency table
• Calculate the posterior probability for each class
• The highest posterior probability is the outcome of the prediction experiment
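The steps above can be sketched in plain Python. The corpus, words and labels below are hypothetical; the sketch uses log-probabilities and add-one (Laplace) smoothing, a standard detail needed so that an unseen word does not zero out the whole product.

```python
import math
from collections import Counter

# Tiny hypothetical labeled corpus
train = [
    ("win cash prize now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch meeting on monday", "ham"),
]

# Step 1: frequency table of words per class
freq = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    freq[label].update(text.split())

vocab = set(w for counts in freq.values() for w in counts)

def posterior_log_scores(text):
    """Steps 2-3: log prior + sum of log likelihoods (Laplace smoothing)."""
    scores = {}
    for label in freq:
        total = sum(freq[label].values())
        score = math.log(class_counts[label] / sum(class_counts.values()))
        for word in text.split():
            # Add-one smoothing keeps unseen words from giving zero probability
            score += math.log((freq[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return scores

def predict(text):
    """Step 4: the class with the highest posterior wins."""
    scores = posterior_log_scores(text)
    return max(scores, key=scores.get)

print(predict("claim your free prize"))   # spam
print(predict("monday meeting agenda"))   # ham
```

Working in log space is the usual trick here: multiplying many small probabilities underflows quickly, while summing their logarithms does not change which class scores highest.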

All these probabilities are calculated using Bayes' Theorem. Because of the "naive" independence assumption, each probability is cheap to compute, so Naive Bayes trains and predicts much faster than algorithms like Logistic Regression or tree-based methods, and on text data it is often just as accurate.

There are several Naive Bayes implementations:

• Gaussian Naive Bayes

This variant is used when all features are continuous valued and assumed to follow a normal (Gaussian) distribution within each class. The assumption is that there is no covariance between the independent features.
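A minimal sketch of the Gaussian variant, assuming per-class means and variances have already been estimated for one hypothetical continuous feature (say, email length in characters); the likelihood of each class is read off the normal density:

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of x under a normal distribution with the given mean/variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical per-class (mean, variance) for the feature, plus class priors
stats = {"spam": (120.0, 400.0), "ham": (300.0, 2500.0)}
priors = {"spam": 0.3, "ham": 0.7}

def predict(x):
    # Posterior ∝ prior × Gaussian likelihood; the highest score wins
    scores = {c: priors[c] * gaussian_pdf(x, m, v) for c, (m, v) in stats.items()}
    return max(scores, key=scores.get)

print(predict(130))  # short email -> "spam" under these made-up statistics
print(predict(290))  # longer email -> "ham"
```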

• Multinomial Naive Bayes

It is generally used where there are discrete features (for example, word counts in a text classification problem). It works with the integer counts generated as the frequency of each word, and the features are assumed to follow a multinomial distribution. TF-IDF (Term Frequency-Inverse Document Frequency) weights also work well in this setting.
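The count features themselves are simple to build; the short sketch below (with hypothetical documents) shows raw term frequencies and the TF-IDF weighting mentioned above, which down-weights words that appear in many documents:

```python
import math
from collections import Counter

# Hypothetical mini-corpus
docs = [
    "free prize win",
    "meeting agenda monday",
    "free meeting invite",
]

# Term frequency: raw word counts per document
tf = [Counter(d.split()) for d in docs]

# Inverse document frequency: words found in fewer documents get higher weight
vocab = set(w for d in docs for w in d.split())
idf = {w: math.log(len(docs) / sum(1 for d in docs if w in d.split()))
       for w in vocab}

# TF-IDF weight for each word in each document
tfidf = [{w: count * idf[w] for w, count in counts.items()} for counts in tf]

print(round(tfidf[0]["prize"], 3))  # in 1 of 3 docs -> log(3) ≈ 1.099
print(round(tfidf[0]["free"], 3))   # in 2 of 3 docs -> log(1.5) ≈ 0.405
```

Note that libraries differ on the exact IDF formula (smoothing terms, normalization); this is the textbook version.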

• Bernoulli Naive Bayes

This classifier also works with discrete data. The major difference is that Multinomial Naive Bayes works with occurrence counts, while Bernoulli Naive Bayes works with binary/boolean features: values of the form true/false, yes/no or 1/0 that record only whether each word is present at all.
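A sketch of the Bernoulli scoring rule, with hypothetical per-class word-presence probabilities. The distinctive detail is that every vocabulary word contributes to the score, whether it is present (probability p) or absent (probability 1 − p):

```python
import math

# Hypothetical probability that each vocabulary word is PRESENT, per class
p_word = {
    "spam": {"free": 0.7, "prize": 0.6, "meeting": 0.1},
    "ham":  {"free": 0.1, "prize": 0.05, "meeting": 0.6},
}
priors = {"spam": 0.5, "ham": 0.5}

def predict(text):
    present = set(text.split())
    scores = {}
    for label, probs in p_word.items():
        score = math.log(priors[label])
        # Bernoulli NB scores every vocabulary word, present OR absent
        for word, p in probs.items():
            score += math.log(p if word in present else 1 - p)
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("free prize inside"))  # "spam"
print(predict("meeting notes"))      # "ham"
```

The explicit penalty for *absent* words is what separates this from the multinomial model, and it is why Bernoulli Naive Bayes can work well on short documents.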

All variations of the Naive Bayes classifier rest on the same assumption of feature independence, and by design they work very well on text-related problems. Document classification is one example of a text classification problem that can be solved with either Multinomial or Bernoulli Naive Bayes. The cheap probability calculations are the major reason this algorithm is so well suited to text classification and so widely popular. It is also used for real-time predictions and, alongside collaborative filtering, in recommendation systems.

An engineer at the core, I am passionate about data science. I have a Masters in Data Science from NMIMS and have worked on machine learning, image classification and reinforcement learning problems. I enjoy solving complex problems and finding simple solutions, and I am an avid reader and writer.
