In machine learning, once we have built a predictive model, we evaluate it with metrics such as the accuracy score or the confusion matrix. We also use cross-validation to understand how well the model generalizes to unseen data. In cross-validation, we randomly divide the data set into a chosen number of folds, hold out each fold in turn for validation, and train on the rest. The scores from all folds are then averaged, and this average serves as an estimate of how the model will perform on production data. Bootstrapping is a similar technique for analyzing model performance. In bootstrapping, random data sets are generated by sampling from the original data; on each one, a model is fitted on the training portion and evaluated on the testing portion.

In this article, we will look at bootstrap sampling in more detail. We will first understand how it works and then implement it on a data set. For this experiment, we will use the Iris data set, which can be downloaded from Kaggle.

**What will we learn from this article?**

- How can we evaluate the model performance?
- What is Bootstrap Sampling? How do we use it?
- How can Bootstrap Sampling be used to evaluate model performance?

**What is Bootstrap Sampling? How does it work to check model performance?**

When building a predictive model, we usually split the data set into training and testing sets. The model is trained on the training data and then makes predictions on the testing data. These predictions are evaluated with metrics such as the accuracy score or the confusion matrix, which tell us how well the model predicted on the testing data. But what if we want an estimate of how the model will perform on unseen or production data? One technique for this is cross-validation. Another, similar technique is bootstrap sampling. It draws random samples from the data, with replacement, to generate new training and testing sets. Just as we choose the number of folds in cross-validation, here we choose the number of iterations, which determines how many data sets are generated.
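To make the comparison concrete, here is a minimal sketch of how the two schemes generate train and test indices. It uses only the Python standard library, and the helper names `k_fold_splits` and `bootstrap_splits` are hypothetical, introduced purely for illustration rather than taken from any library.

```python
import random

def k_fold_splits(n_points, n_folds, seed=0):
    """Yield (train_idx, test_idx); every index lands in a test set exactly once."""
    rng = random.Random(seed)
    indices = list(range(n_points))
    rng.shuffle(indices)
    # Deal the shuffled indices into n_folds roughly equal folds
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx

def bootstrap_splits(n_points, n_iterations, seed=0):
    """Yield (train_idx, test_idx); train is drawn with replacement."""
    rng = random.Random(seed)
    for _ in range(n_iterations):
        # Sampling with replacement: the same index may be drawn several times
        train_idx = rng.choices(range(n_points), k=n_points)
        chosen = set(train_idx)
        # Indices that were never drawn become the test set
        test_idx = [i for i in range(n_points) if i not in chosen]
        yield train_idx, test_idx

for train_idx, test_idx in k_fold_splits(10, 5):
    print("k-fold    train:", sorted(train_idx), "test:", sorted(test_idx))
for train_idx, test_idx in bootstrap_splits(10, 3):
    print("bootstrap train:", sorted(train_idx), "test:", test_idx)
```

In the k-fold output, the test sets never overlap and together cover the whole data set. In the bootstrap output, the training indices contain repeats and the test sets of different iterations may overlap.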

Apart from the number of iterations, we also define the sample size, which sets how many data points each new data set contains. If we do not define it, each new data set has the same number of points as the original data. Let us now see how bootstrapping works in practice. We will first import the required libraries and define the data, and then generate 10 data sets. Refer to the code below.

**Practical Implementation of Bootstrap Sampling**

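```python
from sklearn.utils import resample
import numpy as np

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
n_iterations = 10             # number of bootstrap data sets to generate
n_size = int(len(data) * 1)   # size of each sample (100% of the original)

for i in range(n_iterations):
    # Draw the training set with replacement
    train = resample(data, n_samples=n_size)
    # Points that were never drawn form the test set
    test = np.array([x for x in data if x not in train])
    print("Train_data ->", train, " ", "Test_data ->", test)
```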

The output of this code shows 10 different data sets. In each one, the training data contains some repeated data points, and the testing data contains all the points that are not present in that training data. The generated data sets differ from each other, but not completely: every one is drawn from the same original data, and because the points are sampled with replacement, the same point can appear several times within a data set and across data sets.
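A useful consequence of sampling with replacement is that a bootstrap sample of size n contains, on average, a fraction 1 - (1 - 1/n)^n of the distinct original points. This approaches 1 - 1/e, or about 63.2%, as n grows, so roughly a third of the data is left over for testing. The sketch below checks this by simulation with the standard library only; `unique_fraction` is a hypothetical helper name, and n = 150 is simply an illustrative data set size.

```python
import random

def unique_fraction(n_points, n_iterations=200, seed=0):
    """Average fraction of distinct points across bootstrap samples of size n_points."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_iterations):
        # One bootstrap sample: n_points draws with replacement
        sample = rng.choices(range(n_points), k=n_points)
        total += len(set(sample)) / n_points
    return total / n_iterations

n = 150  # illustrative data set size
theoretical = 1 - (1 - 1 / n) ** n
simulated = unique_fraction(n)
print("theoretical fraction of distinct points:", round(theoretical, 3))
print("simulated fraction of distinct points:  ", round(simulated, 3))
```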

**Implementation of Bootstrap Sampling On Iris Data Set**

Now we will apply bootstrap sampling to a real data set and compute a range of accuracy scores to estimate the model's performance on unseen or production data. We will first import the libraries and then load the data. Use the code below.

```python
from sklearn.utils import resample
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from matplotlib import pyplot
import numpy as np
import pandas as pd

# Load the Iris data set downloaded from Kaggle
data = pd.read_csv('Iris.csv')
print(data)
```

Now we will define the data values, the number of iterations, and the size of each data set. Then, in each iteration, we will create a new data set using bootstrap sampling and fit a random forest classifier on it. The predictions made by the model are evaluated using the accuracy score, and each score is stored in the `scores` list. Refer to the code below.

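```python
# Assumes the last column of the CSV is the class label. If your copy of the
# file has an Id column, consider dropping it so it is not used as a feature.
values = data.values
n_iterations = 10
n_size = int(len(data) * 1)
scores = list()

for i in range(n_iterations):
    # Training set: rows drawn with replacement
    train = resample(values, n_samples=n_size)
    train_rows = train.tolist()
    # Test set: rows that were never drawn
    test = np.array([x for x in values if x.tolist() not in train_rows])
    rfcl = RandomForestClassifier()
    rfcl.fit(train[:, :-1], train[:, -1])
    predictions = rfcl.predict(test[:, :-1])
    score = accuracy_score(test[:, -1], predictions)
    print(score)
    scores.append(score)
```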

Now that we have all the scores, we will visualize their distribution with a histogram and then compute the range of accuracy at a 95% confidence level, taken from the 2.5th and 97.5th percentiles of the scores. Use the code below.

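```python
# Plot the distribution of the bootstrap accuracy scores
pyplot.hist(scores)
pyplot.show()

# Percentile-based confidence interval
alpha = 0.95
p = ((1.0 - alpha) / 2.0) * 100             # lower percentile (2.5)
lower = max(0.0, np.percentile(scores, p))
p = (alpha + ((1.0 - alpha) / 2.0)) * 100   # upper percentile (97.5)
upper = min(1.0, np.percentile(scores, p))
print('%.1f%% confidence interval: %.1f%% to %.1f%%' % (alpha * 100, lower * 100, upper * 100))
```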
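To show what the percentile step is doing, here is a small library-free sketch of the same calculation. It uses linear interpolation between the sorted scores, which is also the default behaviour of `np.percentile`. The helper names `percentile` and `percentile_interval` and the example scores are made up for illustration and are not results from the Iris experiment.

```python
import math

def percentile(sorted_values, p):
    """Linearly interpolated percentile of an ascending list, with p in [0, 100]."""
    rank = (p / 100) * (len(sorted_values) - 1)
    low = math.floor(rank)
    high = math.ceil(rank)
    frac = rank - low
    return sorted_values[low] + frac * (sorted_values[high] - sorted_values[low])

def percentile_interval(scores, alpha=0.95):
    """Return (lower, upper) bounds covering the central alpha share of the scores."""
    ordered = sorted(scores)
    tail = (1.0 - alpha) / 2.0 * 100   # 2.5 when alpha is 0.95
    lower = max(0.0, percentile(ordered, tail))
    upper = min(1.0, percentile(ordered, 100 - tail))
    return lower, upper

# Made-up accuracy scores, for illustration only
example_scores = [0.90, 0.92, 0.94, 0.96, 0.98]
lower, upper = percentile_interval(example_scores)
print('95%% interval: %.1f%% to %.1f%%' % (lower * 100, upper * 100))
```

With only a handful of scores, the bounds are mostly determined by the smallest and largest values, so a larger number of iterations generally gives a more stable interval.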

**Conclusion**

In this article, we explored bootstrap sampling, a technique for estimating model performance. We first discussed what it is and how it works. We then applied it to the Iris data set: we generated several resampled data sets, fitted a random forest model on each, and computed an accuracy score for each. These scores were then used to find the range of accuracy at a 95% confidence level. The main difference between the two techniques is how the data is sampled: cross-validation partitions the data without replacement, whereas bootstrap sampling draws the training set with replacement and tests on the points that were not drawn. Otherwise, both work in a similar way.