
# Beginner’s Guide To Decision Trees: Why Are They Crucial For Data Science Applications

The decision tree is a simple but powerful machine learning algorithm in which a tree or graph-like structure is constructed to map out decisions and their possible consequences for a problem statement. It is a predictive modelling tool built by an algorithmic approach that splits the dataset based on various conditions. Among the classification methods of supervised learning, it is widely used in practice and can be applied to both classification and regression problems. In this general form, the algorithm is referred to as Classification and Regression Trees (CART).

### How It Works

The tree is conventionally drawn upside down, i.e. with the root at the top. At each node, the samples are split into two or more homogeneous sets based on the most significant differentiator among the input variables.


For instance, let’s say we have a dataset of a population with two classes: diabetic and non-diabetic patients. To build a model that predicts who is diabetic, the tree is traversed from the root towards the leaves, splitting at each node until the stopping criteria are fulfilled.

Some important points to keep in mind when creating a decision tree model:

• The root node represents the whole training set
• Nodes that do not split further are called leaf nodes
• A node that is divided into sub-nodes is called a parent node, and the sub-nodes are called child nodes
• Records are distributed recursively based on the attribute values

The model used here has three attributes and two classes. The attributes are minimum systolic blood pressure, age and blood glucose level, and the classes are diabetic and non-diabetic.
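This setup can be sketched with scikit-learn. The dataset below is synthetic (the article does not supply one), and the labelling rule is a toy assumption used only so the tree has a clean pattern to learn:

```python
# A minimal sketch of the diabetes example using scikit-learn.
# The data and the labelling rule are synthetic, for illustration only.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 200
# Three attributes: minimum systolic blood pressure, age, blood glucose level
X = np.column_stack([
    rng.normal(120, 15, n),   # systolic BP (mmHg)
    rng.integers(20, 80, n),  # age (years)
    rng.normal(100, 30, n),   # blood glucose (mg/dL)
])
# Toy rule: label a patient diabetic (1) when glucose is above 125
y = (X[:, 2] > 125).astype(int)

clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(X, y)
print(clf.predict([[130, 55, 160]]))  # glucose well above the toy threshold
```

Because the synthetic labels depend on a single glucose threshold, the fitted tree recovers that split almost immediately.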

While modelling a decision tree, we can use two methods to avoid over-fitting:

Pruning: This process removes branches that contain features of low importance to the model. It can be done either near the root or at the leaves, but the simplest approach is to prune at the leaves.
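One concrete way to prune is scikit-learn's cost-complexity pruning via the `ccp_alpha` parameter; the sketch below (on synthetic, noisy data) compares the size of an unpruned tree with a pruned one:

```python
# Sketch: post-pruning with scikit-learn's cost-complexity pruning (ccp_alpha).
# Data is synthetic and noisy, so the unpruned tree grows many spurious branches.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = (X[:, 0] + 0.3 * rng.normal(size=300) > 0).astype(int)  # noisy labels

full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X, y)
print(full.tree_.node_count, pruned.tree_.node_count)  # pruned tree is smaller
```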

Recursive Binary Splitting: All the features are considered in this process, and different split points are tried and tested. A cost function scores each candidate split, and the algorithm chooses the split that produces the most homogeneous branches, i.e. branches with similar responses.
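The cost function can be sketched in plain Python. CART commonly scores a split by the weighted Gini impurity of the two branches; the threshold with the lowest cost wins. The data here is a toy glucose example, not from the article:

```python
# Sketch of recursive binary splitting's cost function: score each candidate
# threshold by the weighted Gini impurity of the two branches and keep the
# most homogeneous split. Toy data for illustration.
from collections import Counter

def gini(labels):
    """Gini impurity: 0 for a pure branch, higher for mixed branches."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_cost(values, labels, threshold):
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

glucose = [90, 95, 100, 130, 140, 150]
label   = [0,  0,  0,   1,   1,   1]
# Candidate thresholds are the midpoints between consecutive values
best = min({(a + b) / 2 for a, b in zip(glucose, glucose[1:])},
           key=lambda t: split_cost(glucose, label, t))
print(best)  # 115.0 separates the two classes perfectly (cost 0)
```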

### Learning The Decision Tree Algorithm

Step 1: Create The Root Node

Let us assume a sample S. The initial step is to calculate the Entropy H(S) of the current state of the sample S. Entropy is a measure of the amount of uncertainty in the data. For classes occurring with probabilities p(x), it is given by H(S) = − Σ p(x) log₂ p(x).

The value of Entropy is zero if all the members belong to the same class, and one (in a two-class problem) when fifty percent of the members belong to one class and the other fifty percent to the other.
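A minimal entropy function makes both boundary values concrete (the labels below are illustrative):

```python
# Entropy of a set of class labels: H(S) = -sum over classes of p * log2(p).
# Demonstrates the two boundary values: 0 for a pure sample, 1 for a 50/50 split.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

print(entropy(["diabetic"] * 8))                         # pure sample: entropy 0
print(entropy(["diabetic"] * 4 + ["non-diabetic"] * 4))  # 50/50 split: entropy 1
```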

The following step is to select the attribute that gives the highest Information Gain; this attribute becomes the root node. Information Gain, denoted IG(S, A) for a sample S, is the expected reduction in entropy after splitting on a particular attribute A. It evaluates the relative change in entropy with respect to the independent variables. The formula is: IG(S, A) = H(S) − Σ P(x) H(x)

Here, x ranges over the possible values of the attribute A, H(S) is the Entropy of the whole sample, P(x) is the probability of value x, and H(x) is the entropy of the subset of samples taking that value.
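The formula translates directly into code for a categorical attribute; the toy glucose/diabetic data below is illustrative:

```python
# Information gain for a categorical attribute, following the formula:
# IG(S, A) = H(S) - sum over values x of P(x) * H(subset where A = x).
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(attribute, labels):
    n = len(labels)
    gain = entropy(labels)  # H(S)
    for x in set(attribute):
        subset = [l for a, l in zip(attribute, labels) if a == x]
        gain -= (len(subset) / n) * entropy(subset)  # - P(x) * H(x)
    return gain

# Toy data: a glucose bucket that fully determines the diabetic label
glucose_level = ["high", "high", "low", "low", "high", "low"]
diabetic      = ["yes",  "yes",  "no",  "no",  "yes",  "no"]
print(information_gain(glucose_level, diabetic))
```

Because the attribute fully determines the label here, the gain equals the full entropy of the sample (1.0 for this 50/50 split).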

Step 2: If all the instances are positive, return a leaf node labelled “positive”.

Step 3: Else, if all the instances are negative, return a leaf node labelled “negative”.

Step 4: Otherwise, split on the attribute that yields the highest Information Gain, then remove it from the group of candidate attributes.

Step 5: Repeat the process until no attributes remain or every branch of the decision tree ends in a leaf node.
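The five steps above amount to an ID3-style recursion. The sketch below implements them on categorical data; the helper names and the toy dataset are illustrative, not from the article:

```python
# Compact ID3-style sketch of the steps above, for categorical attributes.
from collections import Counter
from math import log2

def entropy(rows, target):
    n = len(rows)
    counts = Counter(r[target] for r in rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def info_gain(rows, attr, target):
    n = len(rows)
    remainder = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == v]
        remainder += (len(subset) / n) * entropy(subset, target)
    return entropy(rows, target) - remainder

def id3(rows, attrs, target):
    labels = {r[target] for r in rows}
    if len(labels) == 1:                   # steps 2-3: pure node -> leaf
        return labels.pop()
    if not attrs:                          # step 5: no attributes left
        return Counter(r[target] for r in rows).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, a, target))  # step 1
    rest = [a for a in attrs if a != best]                       # step 4
    return {best: {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in {r[best] for r in rows}}}

data = [
    {"glucose": "high", "age": "old",   "diabetic": "yes"},
    {"glucose": "high", "age": "young", "diabetic": "yes"},
    {"glucose": "low",  "age": "old",   "diabetic": "no"},
    {"glucose": "low",  "age": "young", "diabetic": "no"},
]
tree = id3(data, ["glucose", "age"], "diabetic")
print(tree)  # the tree splits on glucose, which has the higher gain
```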

### Decision Tree In Real-Life Applications

• Business Management: The algorithm is broadly applied to customer relationship management and to fraud detection.
• Customer Relationship Management: Segment customers by observing their use of online services.
• Fraudulent Statement Detection: Detection of Fraudulent Financial Statements (FFS), which create problems for government tax revenue.
• Engineering: Decision trees are used for energy consumption analysis and fault detection.
• Energy Consumption: Helps companies address problems of energy consumption.
• Fault Diagnosis: Fault detection is a broadly used application in the field of engineering.
• Healthcare Management: Increasingly used in healthcare systems as the field advances.

### Advantages Of Decision Trees

1. Domain knowledge is not required for decision tree construction
2. The inexactness of complex decisions is minimised, allowing exact values to be assigned to the outcomes of various actions
3. Easy to interpret, and copes with both numerical and categorical data
4. Performs classification without much computation
5. Handles both continuous and categorical variables

### Disadvantages Of Decision Trees

1. A decision tree is restricted to one output attribute
2. The decision tree is an unstable classifier
3. The algorithm generates categorical outputs
4. Decision trees are less suitable for estimation tasks where a continuous value must be predicted
