Hands-On Guide To Bootstrap Sampling For ML Performance Evaluation

In this article, we take a closer look at bootstrap sampling and how it works. We first explain the idea behind bootstrapping and then implement it on a data set.

In machine learning, after we build a predictive model we usually evaluate it with metrics such as the accuracy score or the confusion matrix. We also use cross-validation to understand how well the model generalizes to unseen data. In cross-validation, we randomly divide the data set into a chosen number of folds, hold out each fold in turn for validation and train on the rest. The scores from all folds are then averaged (summed and divided by the number of folds) to estimate how the model will perform on production data. Bootstrapping is a similar technique for analyzing model performance. In bootstrapping, several random data sets are generated, and for each one a model is fitted on the training portion and evaluated on the testing portion.
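
To make the averaging step concrete, here is a minimal sketch of 5-fold cross-validation. It assumes scikit-learn's bundled copy of the Iris data (`load_iris`), a decision tree classifier and fixed seeds; none of these choices come from the article, they only keep the example self-contained.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's built-in Iris data, used so the sketch is self-contained.
X, y = load_iris(return_X_y=True)

# Shuffle before splitting because the Iris rows are ordered by class.
folds = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=folds)

# The cross-validation estimate is the sum of the fold scores
# divided by the number of folds.
cv_estimate = fold_scores.sum() / len(fold_scores)
print("Fold accuracies:", fold_scores)
print("Mean accuracy:", cv_estimate)
```

Each fold contributes one accuracy score, and the single number reported at the end is their mean.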

For this experiment, we will use the Iris data set, which can be downloaded from Kaggle.

What will we learn from this article?

  • How can we evaluate model performance?
  • What is bootstrap sampling, and how is it used?
  • How can bootstrap sampling be used to evaluate model performance?

  1. What is Bootstrap Sampling? How does it work to check model performance?

While building a predictive model, we usually divide the data set into training and testing sets. The model is trained on the training data and then makes predictions on the testing data. These predictions are evaluated with metrics such as the accuracy score or the confusion matrix, which tell us how well the model has predicted on the testing data. But what if we want an estimate of how the model will perform on unseen or production data? One technique for this is cross-validation. Another, similar technique is bootstrap sampling. It draws random samples from the data, with replacement, to generate new training and testing data. Just as we define the number of folds in cross-validation, here we define the number of iterations, which decides how many data sets we generate.

Apart from the number of iterations, we also define the size, which decides how many data points each new data set contains. If we do not define it, each generated data set has the same number of points as the original data. Let us now see how bootstrapping works in practice. We first import the required libraries and define the data, and then generate 10 data sets. Refer to the code below.

  2. Practical Implementation of Bootstrap Sampling

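from sklearn.utils import resample
import numpy as np

data = [1,2,3,4,5,6,7,8,9,10]
n_iterations = 10                # number of bootstrap data sets to generate
n_size = int(len(data) * 1)      # size of each data set (100% of the original)
for i in range(n_iterations):
    # draw n_size points at random, with replacement
    train = resample(data, n_samples=n_size)
    # points that were never drawn form the test set
    test = np.array([x for x in data if x not in train])
    print("Train_data ->", train, " ", "Test_data ->", test)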

As the output shows, we have generated 10 different data sets. In each one, the training data contains some points more than once, while the testing data contains all the points that do not appear in that training data. The generated data sets differ from each other, but they overlap, because every one is drawn from the same original data by sampling with replacement.
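
The effect of sampling with replacement can also be checked numerically. The sketch below is an illustration and not part of the original experiment: it assumes a hypothetical data set of 1,000 integer IDs and an arbitrary fixed seed, draws one bootstrap sample with NumPy, and measures how many distinct points land in the training sample. For a large data set, roughly 63% of the points are expected to appear in the sample, which leaves about 37% for testing.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # fixed seed (an arbitrary choice) for repeatability
data = np.arange(1000)                 # hypothetical data set of 1,000 point IDs

# Draw a bootstrap sample: same size as the data, sampled with replacement.
train = rng.choice(data, size=len(data), replace=True)

# Points that were never drawn form the test (out-of-bag) set.
test = np.setdiff1d(data, train)

unique_fraction = len(np.unique(train)) / len(data)
expected_fraction = 1 - (1 - 1 / len(data)) ** len(data)   # tends to 1 - 1/e

print("Distinct points in train:", unique_fraction)
print("Expected fraction:", round(expected_fraction, 3))
print("Test set size:", len(test))
```

This is why a bootstrap test set is never empty in practice, even though the training sample is as large as the original data.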

  3. Implementation of Bootstrap Sampling on the Iris Data Set

Now we will apply bootstrap sampling to a real data set and compute a range that indicates how the model may perform on unseen or production data. We first import the libraries and load the data. Use the code below.

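from sklearn.utils import resample
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from matplotlib import pyplot
import numpy as np
import pandas as pd

# Iris.csv is the file downloaded from Kaggle
# if the file has an Id column, consider dropping it so that it is not used as a feature
data = pd.read_csv('Iris.csv')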

Now we will define the data values, the number of iterations and the size of each data set. Then we create each new data set using bootstrap sampling, treating the last column as the label and the remaining columns as features. We use a random forest classifier as the model. The predictions from each iteration are evaluated with the accuracy score, and the scores are collected in the scores variable. Refer to the code below.

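values = data.values
n_iterations = 10                      # number of bootstrap data sets
n_size = int(len(data) * 1)            # each sample is as large as the original data
scores = list()
for i in range(n_iterations):
    # training set: rows drawn at random, with replacement
    train = resample(values, n_samples=n_size)
    train_rows = train.tolist()
    # test set: rows that do not appear in the training set
    test = np.array([x for x in values if x.tolist() not in train_rows])
    # fit on the features (all columns but the last) and the label (last column)
    model = RandomForestClassifier()
    model.fit(train[:,:-1], train[:,-1])
    predictions = model.predict(test[:,:-1])
    score = accuracy_score(test[:,-1], predictions)
    scores.append(score)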
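
The loop above builds the test set by comparing row values, so a row that has an identical twin in the training sample is also left out of the test set. One possible alternative, sketched here under assumptions that are not in the original code, is to resample row indices and treat the indices that were never drawn as the test set. The sketch uses scikit-learn's built-in Iris data instead of `Iris.csv`, and the helper name `bootstrap_scores` and the fixed seeds are hypothetical choices made for repeatability.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.utils import resample


def bootstrap_scores(X, y, n_iterations=10, seed=0):
    """Return one out-of-bag accuracy score per bootstrap iteration."""
    indices = np.arange(len(X))
    scores = []
    for i in range(n_iterations):
        # Sample row indices with replacement to form the training set.
        train_idx = resample(indices, n_samples=len(indices), random_state=seed + i)
        # Indices that were never drawn form the test set.
        test_idx = np.setdiff1d(indices, train_idx)
        model = RandomForestClassifier(random_state=seed + i)
        model.fit(X[train_idx], y[train_idx])
        scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))
    return scores


# Assumption: scikit-learn's bundled Iris data stands in for Iris.csv.
X, y = load_iris(return_X_y=True)
scores = bootstrap_scores(X, y, n_iterations=10)
print(scores)
```

Working with indices also avoids converting every row to a list on each comparison, which matters on larger data sets.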

Now that we have all the scores, we can look at the range of the accuracy with a histogram. We then use a 95% confidence level and report the corresponding percentiles of the scores. Use the code below.

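# histogram of the accuracy scores
pyplot.hist(scores)
pyplot.show()

alpha = 0.95                                  # confidence level
p = ((1.0-alpha)/2.0) * 100                   # lower percentile (2.5)
lower = max(0.0, np.percentile(scores, p))
p = (alpha+((1.0-alpha)/2.0)) * 100           # upper percentile (97.5)
upper = min(1.0, np.percentile(scores, p))
print('%.1f%% confidence interval: %.1f%% to %.1f%%' % (alpha*100, lower*100, upper*100))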
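
The percentile calculation can be wrapped in a small helper and tried on known numbers. This is only a sketch: the function name `percentile_interval` and the ten accuracy values are made up for illustration and are not results from the Iris experiment.

```python
import numpy as np


def percentile_interval(scores, alpha=0.95):
    """Return the (lower, upper) percentile bounds that cover `alpha` of the scores."""
    tail = (1.0 - alpha) / 2.0 * 100          # 2.5 for a 95% interval
    lower = max(0.0, np.percentile(scores, tail))
    upper = min(1.0, np.percentile(scores, 100 - tail))
    return lower, upper


# Hypothetical accuracy scores, not output from the Iris experiment.
example_scores = [0.90, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99]
lower, upper = percentile_interval(example_scores)
print('95%% interval: %.1f%% to %.1f%%' % (lower * 100, upper * 100))
```

With only 10 iterations the percentile bounds are rough. Increasing the number of iterations gives a more stable interval at the cost of more model fits.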


In this article, we explored bootstrap sampling as a technique for evaluating model performance. We first discussed what it is and how it works. We then applied it to the Iris data set: we generated different data sets, built a random forest model on each one and computed an accuracy score for each. These scores were then used to find the range of accuracy at a 95% confidence level. The key difference between cross-validation and bootstrap sampling is that bootstrap sampling draws its data sets with replacement; otherwise the two work in a similar way.

Rohit Dwivedi