Gini impurity is a function that measures how good a decision tree split is. Basically, it helps us determine which split is best so that we can build a pure decision tree. Gini impurity ranges from 0 to 0.5. It is one of the methods for selecting the best split; another popular method is entropy, which ranges from 0 to 1. In this article, we will look at the mathematical concept behind the Gini impurity method for splitting a decision tree. We will take sample data and work through the concept from the very basics. Below is a list of points that we will cover in this article.

**Table of contents**

- What is a decision tree?
- The math behind the Gini impurity
- Constructing the decision tree using Gini impurity

Let’s start with a quick introduction to decision trees.

**What is a decision tree?**

A decision tree is a supervised machine learning algorithm. As the name suggests, when we visualize it, it appears like a tree. It is a powerful algorithm that can perform well on complex data, and it is also versatile because it can perform both classification and regression.


A decision tree has a root node, internal nodes, and leaf nodes, where:

- Root node: the node from which the tree originates
- Internal nodes: nodes that evaluate conditions on the feature values
- Leaf nodes: nodes at which the final decision is made

Now, we will go through the mathematics behind the Gini impurity by implementing a decision tree on a given dataset.

**The math behind the Gini impurity**

Let’s have a look at the formula of Gini impurity. The formula of Gini impurity is given as:

**Gini = 1 – ((P_1)^2 + (P_2)^2 + … + (P_j)^2)**

Where,

- *j* represents the number of classes in the label, and
- *P_i* represents the ratio of samples belonging to class *i* at the node.

Gini impurity has a maximum value of 0.5, which corresponds to the worst possible split, and a minimum value of 0, which corresponds to a perfectly pure node.
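As a quick check, the formula can be translated directly into a few lines of Python (a minimal sketch, not code from the original article):

```python
def gini(class_counts):
    """Gini impurity of a node, given the number of samples per class."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

print(gini([5, 5]))   # 0.5 -> worst case: classes are evenly mixed
print(gini([10, 0]))  # 0.0 -> best case: the node is pure
```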

**Constructing the decision tree using Gini impurity**

We will use the banknote dataset to implement a decision tree. The dataset comprises details of whether a banknote is genuine or not. There are two features, and the label takes two values, 0 or 1 (0 for authentic, 1 for inauthentic).

We have taken only 10 rows as a sample for easier plotting of a tree and a better understanding of Gini impurity.

Let’s construct the decision tree for the given data. Since our features are continuous, we would have to calculate the Gini impurity for every candidate value of each feature. It is not feasible to show each value one by one here, so we will take one threshold value for each feature.

Now, we will calculate the Gini impurity scores for the different combinations of values for each of the variables.
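One way to carry out this search programmatically is to try every candidate threshold of every feature and keep the split with the lowest weighted Gini impurity. A minimal sketch, using a hypothetical one-feature toy sample rather than the article's data:

```python
def gini(labels):
    """Gini impurity of a node from its list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Return (weighted_gini, feature_index, threshold) of the best split."""
    n = len(rows)
    best = None
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [labels[i] for i, row in enumerate(rows) if row[f] <= t]
            right = [labels[i] for i, row in enumerate(rows) if row[f] > t]
            if not left or not right:
                continue  # a split must leave samples on both sides
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

# Hypothetical toy data: one feature that cleanly separates the classes at 2.0
rows = [[1.0], [2.0], [3.0], [4.0]]
labels = [0, 0, 1, 1]
print(best_split(rows, labels))  # (0.0, 0, 2.0)
```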

**Variance of Wavelet**

For values <= 0.9, we have 8 records out of 10, and 2 records with values > 0.9.

- For **Variance of Wavelet** <= 0.9 and class == 0, there are 6 records, so the ratio is 6/8.
- For **Variance of Wavelet** <= 0.9 and class == 1, there are 2 records, so the ratio is 2/8.

**Gini = 1 – ((6/8)^2 + (2/8)^2) = 0.375** (using the formula above)

- For **Variance of Wavelet** > 0.9 and class == 0, there are 2 records, so the ratio is 2/2.
- For **Variance of Wavelet** > 0.9 and class == 1, there are 0 records, so the ratio is 0/2.

**Gini = 1 – ((2/2)^2 + (0/2)^2) = 0**

**Gini = 0 means it is the purest node and it is a leaf.**
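These two branch values can be verified with a short snippet (a minimal sketch of the same arithmetic):

```python
def gini(class_counts):
    """Gini impurity from per-class sample counts at a node."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

# Variance of Wavelet <= 0.9: 6 samples of class 0, 2 of class 1
print(gini([6, 2]))  # 0.375
# Variance of Wavelet > 0.9: 2 samples of class 0, 0 of class 1
print(gini([2, 0]))  # 0.0 -> a pure node, i.e. a leaf
```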

**Skewness of wavelet**

For values <= 7.161, we have 3 records out of 10, and 7 records with values > 7.161.

- For **Skewness of wavelet** <= 7.161 and class == 0, there are 3 records, so the ratio is 3/3.
- For **Skewness of wavelet** <= 7.161 and class == 1, there are 0 records, so the ratio is 0/3.

**Gini = 1 – ((3/3)^2 + (0/3)^2) = 0**

**Gini = 0 means it is the purest node and it is a leaf.**

- For **Skewness of wavelet** > 7.161 and class == 0, there is 1 record, so the ratio is 1/7.
- For **Skewness of wavelet** > 7.161 and class == 1, there are 6 records, so the ratio is 6/7.

**Gini = 1 – ((1/7)^2 + (6/7)^2) = 0.2449**
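The same check works for the skewness split; note that 1 – ((1/7)^2 + (6/7)^2) works out to 12/49 ≈ 0.2449:

```python
def gini(class_counts):
    """Gini impurity from per-class sample counts at a node."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

# Skewness of wavelet <= 7.161: 3 samples of class 0, 0 of class 1
print(gini([3, 0]))            # 0.0 -> a pure node, i.e. a leaf
# Skewness of wavelet > 7.161: 1 sample of class 0, 6 of class 1
print(round(gini([1, 6]), 4))  # 0.2449
```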

We will take the **Skewness of wavelet** as the root node because its split has a lower weighted impurity than the **Variance of Wavelet** split. After calculating the above values, we get this decision tree:
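To make this comparison concrete: the impurity of a split is the average of its children's Gini values, weighted by the share of samples that lands in each child. A minimal sketch, using the class counts derived above:

```python
def gini(counts):
    """Gini impurity from per-class sample counts at a node."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def weighted_gini(children):
    """Average of the children's impurities, weighted by their sample counts."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * gini(child) for child in children)

# Skewness split: children [3 class-0, 0 class-1] and [1 class-0, 6 class-1]
skew = weighted_gini([[3, 0], [1, 6]])
# Variance split: children [6 class-0, 2 class-1] and [2 class-0, 0 class-1]
var = weighted_gini([[6, 2], [2, 0]])
print(round(skew, 4), round(var, 4))  # 0.1714 0.3 -> skewness wins (lower is better)
```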

We can see the root node is the skewness of the wavelet. It has a total of 10 samples, of which 4 belong to class ‘0’ and 6 to class ‘1’, so it is labelled as class 1 because the majority of its samples belong to class 1. Since it is not a leaf node, we cannot make a decision at it. For the condition <= 7.161 being TRUE, we get the leaf node with Gini = 0, which means it is the purest node and cannot be divided further.

For the condition <= 7.161 being FALSE (> 7.161), we calculated Gini = **0.2449**, so this node has to be split further, and the best value to split it on is **Variance of Wavelet <= 0.811**. Note that a feature can be reused at different nodes of the tree, but it does not have to be.

Now we will calculate the Gini impurity for the impure child node of the root node.

**Variance of Wavelet**

For values <= 0.811, we have 6 records out of 7, and 1 record with values > 0.811.

- For **Variance of Wavelet** <= 0.811 and class == 0, there are 0 records, so the ratio is 0/6.
- For **Variance of Wavelet** <= 0.811 and class == 1, there are 6 records, so the ratio is 6/6.

**Gini = 1 – ((0/6)^2 + (6/6)^2) = 0**

**Gini = 0 means it is the purest node and it is a leaf.**

- For **Variance of Wavelet** > 0.811 and class == 0, there is 1 record, so the ratio is 1/1.
- For **Variance of Wavelet** > 0.811 and class == 1, there are 0 records, so the ratio is 0/1.

**Gini = 1 – ((1/1)^2 + (0/1)^2) = 0**

**Gini = 0 means it is the purest node and it is a leaf.**

Above, we have calculated the leaf nodes of this subtree. Continuing the same process, the final tree looks like this:
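Putting all the pieces together, the whole procedure (repeatedly choosing the split with the lowest weighted Gini until every node is pure) can be sketched as a small recursive builder. The data below are a hypothetical two-feature sample, not the article's banknote rows:

```python
def gini(labels):
    """Gini impurity of a node from its list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def build_tree(rows, labels):
    """Recursively split until each node is pure, choosing the split
    with the lowest weighted Gini impurity at every step."""
    if gini(labels) == 0.0:
        return {"leaf": labels[0]}  # pure node -> leaf predicting its class
    n, best = len(rows), None
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            li = [i for i, row in enumerate(rows) if row[f] <= t]
            ri = [i for i, row in enumerate(rows) if row[f] > t]
            if not li or not ri:
                continue  # a split must leave samples on both sides
            score = (len(li) * gini([labels[i] for i in li])
                     + len(ri) * gini([labels[i] for i in ri])) / n
            if best is None or score < best[0]:
                best = (score, f, t, li, ri)
    _, f, t, li, ri = best
    return {"feature": f, "threshold": t,
            "left": build_tree([rows[i] for i in li], [labels[i] for i in li]),
            "right": build_tree([rows[i] for i in ri], [labels[i] for i in ri])}

def predict(tree, row):
    """Walk the tree from the root down to a leaf."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]

# Hypothetical two-feature sample
rows = [[0.5, 5.0], [0.3, 6.0], [0.7, 9.0], [0.9, 10.0], [1.5, 11.0]]
labels = [0, 0, 1, 1, 0]
tree = build_tree(rows, labels)
print([predict(tree, r) for r in rows])  # [0, 0, 1, 1, 0]
```

Real implementations add stopping criteria such as a maximum depth or a minimum number of samples per node; this sketch keeps splitting until purity for simplicity.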

**Final words**

In this article, we learned about decision trees and implemented a decision tree for binary classification. To construct the decision tree, we used the Gini impurity method for splitting, which helped us understand the mathematical concept behind the approach in detail.