
Guide To Ensemble Methods: Bagging vs Boosting


Bagging and Boosting are Ensemble Learning methods widely used to increase model performance. While there are plenty of articles focusing on the coding and implementation of these models, the purpose of this article is to:

  • Appreciate these models by highlighting their significance, and 
  • Understand the circumstances when a particular method would be the ideal choice for implementation.

Ensemble Learning:

Building a highly accurate prediction model is certainly a difficult task.

Our main challenge is the Prediction Error, which can be broken down into:

Bias – The average distance between the predictions and the target.

Variance – Variability in the predictions.

Noise – The irreducible error, i.e. the part of the target value that no model is able to predict or explain.
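For squared-error loss these three components add up to the expected prediction error, which is usually written as:

Prediction Error = Bias² + Variance + Noise (irreducible error)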

Since it is impossible to reduce the noise (hence the term irreducible error), we shift our focus to reducing Bias and Variance.

Ensemble learning methods reduce the Bias and Variance of a model by using multiple models together (hence the term Ensemble) instead of a single model, in order to achieve better predictive performance.

There are a number of Ensemble methods; in this article I will discuss the two most widely used ones, Bagging and Boosting.

BOOSTING

When we apply the same (usually simple) learning algorithm multiple times, each time focusing on the examples that earlier rounds predicted poorly, and combine the resulting predictions.

BAGGING

When we generate multiple versions of a predictor from bootstrap samples of the training data and aggregate them into a single model.

BAGGING (Bootstrap Aggregating): 

The procedure of applying Bootstrap sampling to the training dataset and then Aggregating the resulting predictors (by averaging when estimating a numerical outcome, and by plurality vote when predicting a class) is called BAGGING.

MATH of Bagging algorithm:

As we rarely have the luxury of many independent training sets, we take repeated Bootstrap samples S_B (drawn with replacement) from the single training set S = {(y_n, x_n), n = 1, ..., N}, where y is either a class label or a numerical target response.

The main objective is to use the bootstrap samples {S_B} to get a better predictor than the single training set predictor φ(x, S).

When y is numerical, we take the average of the φ(x, S_B):

φ_B(x) = avg_B φ(x, S_B)

When y is a class label, we take a plurality vote of the φ(x, S_B) to form φ_B(x).
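A minimal NumPy sketch of this aggregation step (the prediction values below are made up purely for illustration; in practice they would come from models trained on each bootstrap sample S_B):

```python
import numpy as np

# Predictions phi(x, S_B) from B bootstrap predictors for one query point x.
# (Illustrative values only; in practice each comes from a model trained on S_B.)
numeric_preds = np.array([2.1, 1.8, 2.4, 2.0, 1.9])   # y is numerical
class_preds   = np.array([1, 0, 1, 1, 0])              # y is a class label

# Numerical target: phi_B(x) = avg_B phi(x, S_B)
phi_B_regression = numeric_preds.mean()

# Class label: phi_B(x) = plurality vote over the bootstrap predictors
values, counts = np.unique(class_preds, return_counts=True)
phi_B_classification = values[np.argmax(counts)]

print(phi_B_regression)      # 2.04
print(phi_B_classification)  # 1
```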

When we want to reduce the Variance in the model, we can use Bagging.

According to Leo Breiman, Bagging can push a good but unstable classification algorithm, such as Classification and Regression Trees (CART), C4.5 or Neural Networks, a significant step towards optimality, since it reduces the prediction error by reducing the variance without affecting the bias.

On the other hand, applying Bagging to stable algorithms like k-Nearest Neighbours, Discriminant Analysis and Naïve Bayes tends to degrade their performance: each bootstrap sample contains only about 63% of the distinct records of the original data, so every base predictor is trained with roughly 37% of the original observations missing, and for stable methods there is little variance to reduce to compensate for this loss.
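The roughly 63% figure is a standard property of sampling with replacement, and can be checked with a few lines of NumPy (the sample size of 10,000 below is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000                       # size of the original training set
indices = np.arange(n)

# One bootstrap sample: n draws with replacement from the original indices.
bootstrap = rng.choice(indices, size=n, replace=True)

unique_fraction = np.unique(bootstrap).size / n
print(f"Fraction of original records present: {unique_fraction:.3f}")
# ≈ 0.632, i.e. about 63% present and about 37% left out,
# matching the limit 1 - 1/e as n grows.
```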

The Bagging Algorithm:

Step 1: Bootstrap samples of the same size as the training data set are repeatedly drawn from it (with replacement), so that each record has an equal probability of being selected.

Step 2: A classification or estimation model is trained on each bootstrap sample drawn in Step 1, and a prediction is recorded for each sample.

Step 3: The bagging ensemble prediction is then defined to be the class with the most votes in Step 2 (for classification models) or the average of the predictions made in Step 2 (for estimation models).
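For reference, these three steps are essentially what scikit-learn's BaggingClassifier automates. A minimal usage sketch on synthetic data (the dataset and hyperparameters below are placeholders, and the estimator parameter name assumes a recent scikit-learn release):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, standing in for a real training set.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Steps 1 & 2: bootstrap=True draws samples with replacement and fits one
# decision tree per sample; Step 3: predictions are combined by voting.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # an unstable base learner
    n_estimators=100,
    bootstrap=True,
    random_state=42,
)
bagging.fit(X_train, y_train)
print("Bagged accuracy:", bagging.score(X_test, y_test))
```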

Boosting (AdaBoost):

The idea behind Boosting is that instead of searching for a single highly accurate prediction rule, we combine multiple simple learning algorithms, none of which is strong enough to make accurate predictions alone because its error rate is too high.

Each simple algorithm is known as a weak learner. The Boosting algorithm calls this weak learner multiple times, each time feeding it a different weighting (or subset) of the training samples, so that on every call the base learning algorithm generates a new weak prediction rule. It is in this sense that Boosting is adaptive.

The Boosting algorithm then combines these weak rules into a single prediction rule that reduces the bias and variance of the individual rules and is much more accurate than any one of the weak rules on its own.

Two fundamental decisions for an effective implementation of a Boosting algorithm:

  • How to choose the subset (or weighting) of the training dataset for each iteration:
    • To increase the efficiency of the base learner's predictions, higher weight is placed on the examples that were misclassified by the earlier weak learners.
  • How to combine the weak learners together:
    • By taking a weighted majority vote of their predictions.

[Figure: illustration of the Boosting procedure. By Sirakorn – Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=85888769]

MATH of Boosting algorithm:

Given: (x_1, y_1), ..., (x_m, y_m) where x_i ∈ X, y_i ∈ Y = {−1, +1}

Initialize D_1(i) = 1/m for i = 1, ..., m.

For t = 1, ..., T:

  • Train the base learner using distribution D_t.
  • Get a base classifier h_t : X → {−1, +1}.
    • The base learner should choose h_t so as to minimize the weighted error

ε_t = Pr_{i ~ D_t} [ h_t(x_i) ≠ y_i ]

  • Choose α_t = ½ ln( (1 − ε_t) / ε_t ).
  • Update: D_{t+1}(i) = D_t(i) · exp( −α_t y_i h_t(x_i) ) / Z_t

where Z_t is a normalization factor chosen so that D_{t+1} is a distribution.

Output of the final classifier: 

H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) )
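A compact NumPy sketch that follows these update rules literally, using single-feature threshold "stumps" as the base classifiers h_t (the stump search, the toy data and the choice of T are my own illustrative simplifications, not part of the original formulation):

```python
import numpy as np

def best_stump(X, y, D):
    """Pick the threshold stump h_t minimizing the weighted error under distribution D."""
    best = (np.inf, None)  # (weighted error, (feature, threshold, polarity))
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)
                err = D[pred != y].sum()     # eps_t = Pr_{i~D_t}[h_t(x_i) != y_i]
                if err < best[0]:
                    best = (err, (j, thr, polarity))
    return best

def adaboost(X, y, T=20):
    m = len(y)
    D = np.full(m, 1.0 / m)                  # D_1(i) = 1/m
    stumps, alphas = [], []
    for t in range(T):
        eps, (j, thr, pol) = best_stump(X, y, D)
        eps = max(eps, 1e-10)                # guard against division by zero
        alpha = 0.5 * np.log((1 - eps) / eps)    # alpha_t = 1/2 ln((1-eps_t)/eps_t)
        h = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
        D = D * np.exp(-alpha * y * h)       # D_{t+1}(i) ∝ D_t(i) exp(-alpha_t y_i h_t(x_i))
        D /= D.sum()                         # divide by Z_t (normalization)
        stumps.append((j, thr, pol))
        alphas.append(alpha)
    return stumps, alphas

def predict(X, stumps, alphas):
    # H(x) = sign( sum_t alpha_t h_t(x) )
    agg = np.zeros(len(X))
    for (j, thr, pol), alpha in zip(stumps, alphas):
        agg += alpha * np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
    return np.sign(agg)

# Toy dataset with labels in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
stumps, alphas = adaboost(X, y, T=10)
print("Training accuracy:", (predict(X, stumps, alphas) == y).mean())
```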

Boosting can help us reduce both Bias and Variance. It must be noted that, in general, as the Bias of a model decreases its Variance tends to increase and vice versa; hence we must find an optimal point of balance, which is called the Bias-Variance Trade-off.

The Boosting Algorithm:

Step 1: All observations have equal weight in the original training data set D1. An initial "base" classifier h1 is determined.

Step 2: The observations that were incorrectly classified by the previous base classifier have their weights increased, while the observations that were correctly classified have their weights decreased. This gives the data distribution Dm, m = 2, ..., M. A new base classifier hm is determined based on the new weights. This step is repeated until the desired number of iterations M is reached.

Step 3: The final boosted classifier is the weighted sum of the M base classifiers.
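In practice these steps come packaged in scikit-learn's AdaBoostClassifier. A minimal usage sketch on synthetic data (the dataset and hyperparameters are placeholders, not tuned values):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic data, standing in for a real training set.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Steps 1-3: the default base classifier is a depth-1 decision tree ("stump");
# each round reweights the observations, and the final classifier is the
# weighted combination of the M (= n_estimators) base classifiers.
boosting = AdaBoostClassifier(n_estimators=100, random_state=42)
boosting.fit(X_train, y_train)
print("Boosted accuracy:", boosting.score(X_test, y_test))
```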

Generalization & Test Error:

Generalization error refers to the performance on examples that were not seen during training; it is the probability that the classifier misclassifies a new example.

Test error is the fraction of mistakes made on a newly sampled test set.

Bagging vs. Boosting:

The choice between Bagging and Boosting depends on the available data, the problem objective and the rest of the modelling set-up. Both methods increase accuracy by significantly reducing the variability of the estimates, so the aggregated results are more stable than those of any individual model.

  • Bagging will not improve the bias of a model that is already performing poorly. Boosting, on the other hand, concentrates on strengthening the individual models and correcting their weaknesses, so it can deliver output with lower error when a single model underfits.
  • When the challenge with a single model is overfitting, Bagging outperforms Boosting, since the latter is itself prone to over-fitting.

References:

Bagging Predictors, by Leo Breiman

The Boosting Approach to Machine Learning, by Robert E. Schapire

Data Mining and Predictive Analytics, by Daniel T. Larose and Chantal D. Larose

https://en.wikipedia.org/wiki/Ensemble_learning

https://blog.paperspace.com/bagging-ensemble-methods/

https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205


Mohammed Rizwan Nagauri

Aspiring Data Scientist - PG Program in Data Science at Praxis Business School, Bangalore - India. Always excited to learn and explore