Guide To Hyperparameters Tuning Using GridSearchCV And RandomizedSearchCV

Hyperparameter Tuning

While building a machine learning model we always deal with two kinds of values: model parameters and model hyperparameters. Model parameters are internal to the model and their values are computed automatically from the data during training, like the support vectors in a support vector machine. Hyperparameters, on the other hand, are values the programmer can manipulate to improve the performance of the model, like the learning rate of a deep learning model. They control how the algorithm learns and are passed to the model before training begins, for example as arguments to the estimator's constructor.
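
As a small illustration of the difference (a minimal sketch on toy data, not part of the article's experiment), C and epsilon below are hyperparameters we choose before training, while the support vectors are model parameters computed during fit:

import numpy as np
from sklearn.svm import SVR

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.4, 3.1, 4.4, 6.2, 7.8])

# Hyperparameters: chosen by the programmer before training
model = SVR(kernel='rbf', C=10.0, epsilon=0.1)

# Model parameters: computed automatically from the data during fit()
model.fit(X, y)
print(model.support_vectors_)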

In this article, we will explore hyperparameter tuning. We will see what it involves and how it is done using two different approaches: GridSearchCV and RandomizedSearchCV. For this experiment, we will use the Boston Housing Dataset, which can be downloaded from Kaggle. We will first build the model using default parameters, then build the same model using a hyperparameter tuning approach, and finally compare the performance of the models.

What We Will Learn From This Article?

  1. What is Hyperparameter Tuning?
  2. What steps to follow for Hyperparameter Tuning?
  3. Implementation of Regression Model
  4. Implementation of Model using GridSearchCV 
  5. Implementation of Model using RandomizedSearchCV 
  6. Comparison of Different Models

1. What Is Hyperparameter Tuning?

Hyperparameter tuning is the process of tuning the parameters that we pass to a machine learning model when we build it. These parameters are defined by us and can be manipulated as the programmer wishes; machine learning algorithms never learn them from the data. They are tuned so that the model gives good performance. Hyperparameter tuning aims to find the combination of values for which the model's performance is highest and the error rate is lowest. The default hyperparameters of a random forest regressor model are shown below. These parameters are tuned and the results are checked.


RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',
                      max_depth=None, max_features='auto', max_leaf_nodes=None,
                      max_samples=None, min_impurity_decrease=0.0,
                      min_impurity_split=None, min_samples_leaf=1,
                      min_samples_split=2, min_weight_fraction_leaf=0.0,
                      n_estimators=100, n_jobs=None, oob_score=False,
                      random_state=None, verbose=0, warm_start=False)
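
If you want to list these knobs programmatically rather than reading the repr above, every scikit-learn estimator exposes them through get_params() (a small sketch; the exact defaults printed depend on your scikit-learn version):

from sklearn.ensemble import RandomForestRegressor

# get_params() returns a dict mapping each hyperparameter name to its current value
rfr = RandomForestRegressor()
for name, value in rfr.get_params().items():
    print(name, '=', value)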




2. What Steps To Follow For Hyperparameter Tuning?

  • Select the type of model we want to use, like RandomForestClassifier, a regressor, or any other model
  • Check what the parameters of the model are
  • Select the method for searching the hyperparameters
  • Select the cross-validation approach
  • Evaluate the model using the score
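
The five steps above map onto scikit-learn roughly as follows (a minimal sketch; the model, grid values, and data names here are placeholders rather than the article's final settings):

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Step 1: select the type of model
model = RandomForestRegressor(random_state=1)

# Step 2: check what the parameters of the model are
print(model.get_params().keys())

# Steps 3 and 4: choose the search method, the values to try, and the CV approach
search = GridSearchCV(estimator=model,
                      param_grid={'n_estimators': [10, 50], 'max_depth': [3, None]},
                      cv=3)

# Step 5: fit the search and evaluate using the score
# (X_train, X_test, y_train, y_test are assumed to come from a train/test split)
# search.fit(X_train, y_train)
# print(search.score(X_test, y_test))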

3. Implementation of Regression Model

First, we will import all the required libraries, load the dataset, and do some basic EDA to understand the data. Use the below code to do the same.

import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv('Boston.csv')
print(df)

Output:

print(df.shape)

Output:

print(df.isnull().sum())

Output:

print(df.info())

Output:

There are a total of 506 rows and 14 columns in the dataset, all the columns have float64 or int64 data types, and there are no missing values. Now we will define the dependent and independent variables, y and X respectively. We will then split the dataset into training and testing sets, pass the training data to a decision tree regression model, and compute the score on the testing data. Refer to the below code for the same.

y = df['medv']
X = df.drop('medv', axis=1)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

dtr = DecisionTreeRegressor()
dtr.fit(X_train, y_train)
print(dtr.score(X_test, y_test))

Output:

4. Implementation of Model using GridSearchCV

First, we will import the library required for grid search and then define all the parameters, or the combinations, that we want to test on the model. We have taken only four hyperparameters, whereas you can define as many as you want. If you increase the number of combinations, the time complexity will increase. Use the below code to do the same.

from sklearn.model_selection import GridSearchCV

param_grid = {'bootstrap': [True],
              'max_depth': [5, 10, None],
              'max_features': ['auto', 'log2'],
              'n_estimators': [5, 6, 7, 8, 9, 10, 11, 12, 13, 15]}
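
As a quick way to see how the time complexity grows, you can count how many parameter combinations the grid contains before running the search; with cv=3 each combination is fitted three times (a small sketch reusing the param_grid defined above):

from sklearn.model_selection import ParameterGrid

# 1 bootstrap value * 3 depths * 2 feature settings * 10 n_estimators values = 60 combinations
print(len(ParameterGrid(param_grid)))  # 60; with cv=3 this means 180 model fits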

Now we will define the type of model we want to build, a random forest regression model in this case, and initialize GridSearchCV over this model with the above-defined parameters.

rfr = RandomForestRegressor(random_state=1)

g_search = GridSearchCV(estimator=rfr, param_grid=param_grid,
                        cv=3, n_jobs=1, verbose=0, return_train_score=True)

We have defined the estimator to be the random forest regression model, param_grid to be all the parameters we want to check, and cross-validation to 3 folds. We will now train this model by passing the training data and then check the score on the testing data. Use the below code to do the same.

g_search.fit(X_train, y_train)
print(g_search.best_params_)

Output:

We can check the best parameters found by the search using the best_params_ attribute, as shown above. GridSearchCV also refits the model on the best parameters; this best model is available through best_estimator_, and we use it to score the test data.

best_grid = g_search.best_estimator_
print(best_grid.score(X_test, y_test))

Output:

5. Implementation of Model using RandomizedSearchCV

First, we will import the library required for random search and then define all the parameters, or the combinations, that we want to test on the model. Similar to grid search, we have taken only four hyperparameters, whereas you can define as many as you want. We have then defined the random grid. Use the below code to do the same.

import numpy as np
from sklearn.model_selection import RandomizedSearchCV

n_estimators = [int(x) for x in np.linspace(start=5, stop=15, num=10)]  # returns 10 numbers
max_features = ['auto', 'log2']
max_depth = [int(x) for x in np.linspace(5, 10, num=2)]
max_depth.append(None)
bootstrap = [True, False]

r_grid = {'n_estimators': n_estimators,
          'max_features': max_features,
          'max_depth': max_depth,
          'bootstrap': bootstrap}

print(r_grid)

Output: 


We will now define the random search, passing the random forest model and the grid of hyperparameters to sample from, and then train it. After this, we will check the score. Use the below code to do the same.

rfr_random = RandomizedSearchCV(estimator=rfr, param_distributions=r_grid, n_iter=20,
                                scoring='neg_mean_absolute_error', cv=3, verbose=2,
                                random_state=42, n_jobs=-1, return_train_score=True)

rfr_random.fit(X_train, y_train)


print(rfr_random.best_params_)

Output:


best_random = rfr_random.best_estimator_
print(best_random.score(X_test, y_test))

Output:

6. Comparison of Different Models

Models                                               Scores
Regression Model (Without Hyperparameter Search)     80.34
Regression Model using GridSearchCV                  88.98
Regression Model using RandomizedSearchCV            90.17
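
These percentages appear to be the R² values returned by each model's score() method, scaled by 100 (a small sketch, assuming the dtr, best_grid, and best_random estimators fitted above):

# score() of a regressor returns R^2 on the given data; multiply by 100 for the table above
print(round(dtr.score(X_test, y_test) * 100, 2))
print(round(best_grid.score(X_test, y_test) * 100, 2))
print(round(best_random.score(X_test, y_test) * 100, 2))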

Conclusion 

Hyperparameter tuning is very useful for enhancing the performance of a machine learning model. We have discussed both approaches to tuning, GridSearchCV and RandomizedSearchCV. The main difference between the two is that in grid search we define the combinations and train the model on every one of them, whereas RandomizedSearchCV samples the combinations randomly. Both are very effective ways of tuning the parameters and increasing the model's generalizability.
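
To make this difference concrete, RandomizedSearchCV draws only n_iter candidate settings from the distributions instead of enumerating every combination the way GridSearchCV does (a small sketch reusing the r_grid defined earlier; ParameterSampler is the helper RandomizedSearchCV uses internally):

from sklearn.model_selection import ParameterSampler

# Only n_iter parameter settings are drawn at random from the distributions
sampled = list(ParameterSampler(r_grid, n_iter=20, random_state=42))
print(len(sampled))   # 20
print(sampled[0])     # one randomly drawn combination of hyperparameters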

Rohit Dwivedi
I am currently enrolled in a Post Graduate Program In Artificial Intelligence and Machine Learning. Data Science Enthusiast who likes to draw insights from the data. Always amazed by the intelligence of AI. It's really fascinating teaching a machine to see and understand images. Also, the interest gets doubled when the machine can tell you what it just saw. This is where I say I am highly interested in Computer Vision and Natural Language Processing. I love exploring different use cases that can be built with the power of AI. I am the person who first develops something and then explains it to the whole community with my writings.
