Guide To Hyperparameters Tuning Using GridSearchCV And RandomizedSearchCV

Rohit Dwivedi

While building a machine learning model, we deal with two kinds of values: model parameters and model hyperparameters. Model parameters are internal to the model, and their values are learned automatically from the data, like the support vectors in a support vector machine. Hyperparameters, on the other hand, are set by the programmer before training to improve the performance of the model, like the learning rate of a deep learning model. They control how the algorithm learns and are passed as arguments when the model is created.
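
To make the distinction concrete, here is a minimal sketch on synthetic data (the data and variable names are illustrative assumptions, not part of the experiment in this article). `max_depth` is a hyperparameter chosen before training, while the structure of the tree is learned during `fit`.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, used only for illustration
rng = np.random.RandomState(0)
X = rng.rand(200, 3)
y = 4 * X[:, 0] + rng.normal(scale=0.1, size=200)

# Hyperparameter: chosen by us before training
model = DecisionTreeRegressor(max_depth=3, random_state=0)

# Model parameters: split features and thresholds are learned from the data
model.fit(X, y)

print(model.get_params()["max_depth"])  # the value we set
print(model.get_depth())                # depth actually grown, at most max_depth
print(model.tree_.node_count)           # number of nodes learned from the data
```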

In this article, we will explore hyperparameter tuning. We will see what hyperparameter tuning involves and how it is done using two different approaches: GridSearchCV and RandomizedSearchCV. For this experiment, we will use the Boston Housing Dataset, which can be downloaded from Kaggle. We will first build a model using the default hyperparameters, then build the model again using each tuning approach, and finally compare the performance of the models.

What We Will Learn From This Article?

  1. What is Hyperparameter Tuning?
  2. What steps to follow to do Hyperparameter Tuning?
  3. Implementation of Regression Model
  4. Implementation of Model using GridSearchCV 
  5. Implementation of Model using RandomizedSearchCV 
  6. Comparison of Different Models

1. What Is Hyperparameter Tuning?

Hyperparameter tuning is the process of choosing the values of the arguments we pass when we build a machine learning model. These values are defined by us and can be changed as the programmer wishes; the learning algorithm never learns them from the data. The aim is to find the values for which the model performs best, that is, where the error is lowest. The hyperparameters of a random forest regressor and their default values are shown below. In manual tuning, these values are changed by trial and error and the results are checked.



RandomForestRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse', max_depth=None, max_features='auto', max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=None, oob_score=False,  random_state=None, verbose=0, warm_start=False)
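
The exact list of hyperparameters and their defaults depends on the installed scikit-learn version (the output above comes from an older release). One way to inspect them for whatever version you have is `get_params()`; the snippet below is a small sketch of that, and the values passed to `set_params()` are arbitrary.

```python
from sklearn.ensemble import RandomForestRegressor

rfr = RandomForestRegressor()

# get_params() returns a dict mapping each hyperparameter name to its current value
params = rfr.get_params()
for name in sorted(params):
    print(name, "=", params[name])

# set_params() changes hyperparameters on an existing estimator
rfr.set_params(n_estimators=10, max_depth=5)
```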

2. What Steps To Follow For Hyperparameter Tuning?

  • Select the type of model we want to use, such as RandomForestClassifier, a regressor or any other model
  • Check which hyperparameters the model exposes
  • Select the method for searching the hyperparameter values
  • Select the cross-validation approach
  • Evaluate the model using the score
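
The steps above can be sketched as follows. This is a minimal illustration on synthetic data from `make_regression`; the model, grid values and fold count are arbitrary assumptions chosen to keep the run short.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data standing in for a real dataset
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Step 1: select the model
model = DecisionTreeRegressor(random_state=0)

# Step 2: check which hyperparameters it exposes
print(sorted(model.get_params()))

# Step 3: select the search method and the candidate values
grid = {"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5]}

# Step 4: select the cross-validation approach (3-fold here)
search = GridSearchCV(model, grid, cv=3)
search.fit(X_train, y_train)

# Step 5: evaluate using the score
print(search.best_params_)
print(search.score(X_test, y_test))
```
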
3. Implementation of Regression Model

First, we will import the required libraries and the dataset and do some basic EDA to understand the data. Use the code below to do so.

import pandas as pd

import numpy as np

from sklearn.tree import DecisionTreeRegressor

from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv('Boston.csv')

print(df) 

Output:

print(df.shape)

Output:

print(df.isnull().sum())

Output:

df.info()

Output:

There are a total of 506 rows and 14 columns in the data set. All the columns hold float64 or int64 values, and there are no missing values. Now we will define the dependent and independent variables, y and X respectively. We will then split the dataset into training and testing sets. The training data is then passed to the decision tree regression model, and the score on the testing data is computed. Refer to the code below.

y = df['medv']

X = df.drop('medv', axis=1)

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size= .30, random_state=1)

from sklearn.tree import DecisionTreeRegressor

dtr = DecisionTreeRegressor()

dtr.fit(X_train, y_train)

print(dtr.score(X_test,y_test))

Output:
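
For regressors in scikit-learn, `score` returns the coefficient of determination R² on the data passed to it, so values closer to 1 are better. The sketch below checks this on synthetic data (an assumption made so that the snippet runs without `Boston.csv`).

```python
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, used only for illustration
X, y = make_regression(n_samples=300, n_features=5, noise=15.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

dtr = DecisionTreeRegressor(random_state=1)
dtr.fit(X_train, y_train)

# score() and r2_score() on the predictions give the same number
score = dtr.score(X_test, y_test)
r2 = r2_score(y_test, dtr.predict(X_test))
print(score, r2)
```
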

4. Implementation of Model using GridSearchCV

First, we will import the class required for grid search and then define the hyperparameter values that we want to test on the model. We have taken only four hyperparameters here, but you can define as many as you want. Keep in mind that grid search trains a model for every combination, so the running time grows with the number of combinations. Use the code below to do so.

from sklearn.model_selection import GridSearchCV

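# On scikit-learn 1.3 and later, 'auto' is no longer accepted for max_features; 1.0 (all features) is the equivalent for regressors

param_grid = {  'bootstrap': [True], 'max_depth': [5, 10, None], 'max_features': ['auto', 'log2'], 'n_estimators': [5, 6, 7, 8, 9, 10, 11, 12, 13, 15]}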
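
To see how quickly the cost grows, note that the number of candidates is the product of the list lengths, and each candidate is fitted once per cross-validation fold. Here is a rough count for the grid above in plain Python, assuming 3-fold cross-validation; the values are only counted, never passed to a model.

```python
from itertools import product

param_grid = {
    "bootstrap": [True],
    "max_depth": [5, 10, None],
    "max_features": ["auto", "log2"],
    "n_estimators": [5, 6, 7, 8, 9, 10, 11, 12, 13, 15],
}

# Every combination of values is one candidate
candidates = list(product(*param_grid.values()))
n_folds = 3  # assumed number of cross-validation folds

print(len(candidates))            # 1 * 3 * 2 * 10 = 60 candidates
print(len(candidates) * n_folds)  # 180 model fits, plus one final refit
```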

Now we will define the type of model we want to build, a random forest regression model in this case, and initialize GridSearchCV over this model with the parameters defined above.

rfr = RandomForestRegressor(random_state = 1)

g_search = GridSearchCV(estimator = rfr, param_grid = param_grid, 

                          cv = 3, n_jobs = 1, verbose = 0, return_train_score=True)

We have set the estimator to the random forest regression model, param_grid to the hyperparameter values we want to check, and cross-validation to 3 folds. We will now train this model by passing the training data and then check the score on the testing data. Use the code below to do so.

g_search.fit(X_train, y_train);

print(g_search.best_params_)

Output:

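We can check the best combination of hyperparameters using the best_params_ attribute, as shown above. The model refitted with that combination is available as best_estimator_, and we score it on the testing data.

best_grid = g_search.best_estimator_

print(best_grid.score(X_test, y_test))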

Output:
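
By default, `GridSearchCV` refits the best combination on the whole training set and exposes it as `best_estimator_`; calling `score` on the search object uses that refitted model. A small sketch on synthetic data (the data and grid values are illustrative assumptions):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data, used only for illustration
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

search = GridSearchCV(
    RandomForestRegressor(random_state=1),
    param_grid={"max_depth": [5, None], "n_estimators": [5, 10]},
    cv=3,
)
search.fit(X_train, y_train)

print(search.best_params_)  # best combination found
print(search.best_score_)   # its mean cross-validated score

# The refitted best model and the search object give the same test score
best_model = search.best_estimator_
print(best_model.score(X_test, y_test), search.score(X_test, y_test))
```
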

5. Implementation of Model using RandomizedSearchCV

First, we will import the class required for random search and then define the hyperparameter values that we want to sample from. As with grid search, we have taken only four hyperparameters, but you can define as many as you want. We then define the random grid. Use the code below to do so.


import numpy as np

from sklearn.model_selection import RandomizedSearchCV

n_estimators = [int(x) for x in np.linspace(start = 5 , stop = 15, num = 10)] # returns 10 numbers 

max_features = ['auto', 'log2']

max_depth = [int(x) for x in np.linspace(5, 10, num = 2)] 

max_depth.append(None)

bootstrap = [True, False]

r_grid = {'n_estimators': n_estimators,

               'max_features': max_features,

               'max_depth': max_depth,

               'bootstrap': bootstrap}

print(r_grid)

Output: 
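
Unlike grid search, randomized search evaluates only a fixed number of combinations drawn from this grid. The pure-Python sketch below imitates that idea with the standard `random` module; it is not how scikit-learn is implemented internally, and the sample size of 20 is an assumption that mirrors an `n_iter` of 20.

```python
import random
from itertools import product

r_grid = {
    "n_estimators": [5, 6, 7, 8, 9, 10, 11, 12, 13, 15],
    "max_features": ["auto", "log2"],
    "max_depth": [5, 10, None],
    "bootstrap": [True, False],
}

# All possible combinations: 10 * 2 * 3 * 2 = 120
all_combos = [dict(zip(r_grid, values)) for values in product(*r_grid.values())]

# Draw 20 distinct combinations at random
random.seed(42)
sampled = random.sample(all_combos, k=20)

print(len(all_combos), len(sampled))
for combo in sampled[:3]:
    print(combo)
```

With 3-fold cross-validation this means 60 model fits instead of the 360 that an exhaustive search over the same grid would need.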


We will now define the random search, passing it the random forest model and the grid to sample hyperparameters from, and then train it. After this, we will check the score. Use the code below to do so.

rfr_random = RandomizedSearchCV(estimator=rfr, param_distributions=r_grid, n_iter = 20, scoring='neg_mean_absolute_error', cv = 3, verbose=2, random_state=42, n_jobs=-1, return_train_score=True)

rfr_random.fit(X_train, y_train);


print(rfr_random.best_params_)

Output:


best_random = rfr_random.best_estimator_

print(best_random.score(X_test, y_test))

Output:
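
`RandomizedSearchCV` also accepts distributions instead of fixed lists, so each iteration can draw a fresh value. The sketch below uses `scipy.stats.randint` on synthetic data; the ranges, `n_iter` and the data itself are illustrative assumptions. Note that when `scoring` is set, `best_score_` is reported in that metric (here the negated mean absolute error), while the estimator's own `score` method still returns R².

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic data, used only for illustration
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

param_distributions = {
    "n_estimators": randint(5, 16),  # integers from 5 to 15 inclusive
    "max_depth": [5, 10, None],      # lists are sampled uniformly
    "bootstrap": [True, False],
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=1),
    param_distributions=param_distributions,
    n_iter=5,
    scoring="neg_mean_absolute_error",
    cv=3,
    random_state=42,
)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.best_score_)  # negated MAE, so it is never positive
print(search.best_estimator_.score(X_test, y_test))  # R² on the test set
```
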

6. Comparison of Different Models

Models                                              Scores
Regression Model (Without Hyperparameter Search)    80.34
Regression Model using GridSearchCV                 88.98
Regression Model using RandomizedSearchCV           90.17

Conclusion 

Hyperparameter tuning is very useful for improving the performance of a machine learning model. We have discussed two approaches to tuning: GridSearchCV and RandomizedSearchCV. The main difference between them is that grid search trains a model for every combination of the values we define, whereas RandomizedSearchCV trains on only a fixed number of combinations sampled at random from those values. Both are effective ways of tuning hyperparameters and can improve how well the model generalizes.


Copyright Analytics India Magazine Pvt Ltd
