Last updated August 8, 2019

How To Solve The Never-Ending Pursuit Of Perfect Hyperparameters

Published on August 8, 2019

by Ram Sagar

The goal of hyperparameter exploration is to search across various hyperparameter configurations and find a configuration that results in the best performance. Typically, the hyperparameter exploration process is painstakingly manual, given that the search space is vast and evaluation of each configuration can be expensive.

Hyperparameters help answer questions like:

How deep should a decision tree be?
How many trees does a random forest need?
How many layers should a neural network have?
What learning rate should gradient descent use?

Hyperparameters are adjustable parameters, chosen before training, that govern the training process itself. For example, to train a deep neural network, you decide the number of hidden layers in the network and the number of nodes in each layer prior to training the model. These values usually stay constant during the training process.
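The distinction can be made concrete with a toy example. The following is a minimal sketch, not taken from any library: the function name train and the tiny dataset are assumptions made for illustration. The learning rate and the number of steps are fixed before training starts, while the weight w is what training actually learns.

```python
# Toy illustration: fit y = w * x by gradient descent.
# learning_rate and n_steps are hyperparameters (fixed before training);
# w is a model parameter (updated during training).

def train(learning_rate, n_steps, data):
    """Return the weight w learned by gradient descent on mean squared error."""
    w = 0.0
    for _ in range(n_steps):
        # Gradient of mean((w * x - y) ** 2) with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * grad  # the learning rate never changes here
    return w

# Hypothetical data generated by y = 2x, so the ideal weight is 2.0
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

for lr in (0.001, 0.01, 0.1):
    w = train(learning_rate=lr, n_steps=100, data=data)
    print(f"learning_rate={lr}: learned w = {w:.4f}")
```

With the same data and the same number of steps, the smallest learning rate stops well short of the ideal weight, which is why the choice has to be searched over rather than guessed.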

To narrow down the values, there are a few methods for searching the parameter space to find the values that align with the objective of the model being trained.

While defining the architecture of a machine learning model, an optimal configuration is usually not obvious, because there is no single recipe for tuning hyperparameters to reduce the loss; it comes down, more or less, to trial-and-error experimentation.

Techniques At Disposal

For architectures such as Long Short-Term Memory (LSTM) networks in particular, the learning rate and the size of the network are the prime hyperparameters.

In reinforcement learning algorithms, measuring sensitivity to the choice of hyperparameters requires a larger number of data points, because high variance means that performance is not adequately captured with fewer points.
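The effect of high variance on hyperparameter comparisons can be illustrated with a small simulation. This is a sketch under stated assumptions: the two configurations, their true scores (1.0 and 1.1) and the Gaussian noise are all invented for the example and do not come from any real reinforcement learning benchmark.

```python
import random

def noisy_score(true_mean, rng, noise_sd=1.0):
    """One simulated training run: true performance plus run-to-run noise."""
    return rng.gauss(true_mean, noise_sd)

def wrong_ranking_rate(n_runs, n_trials=2000, seed=0):
    """Fraction of trials in which the worse configuration looks better
    after averaging n_runs noisy scores per configuration."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n_trials):
        mean_a = sum(noisy_score(1.0, rng) for _ in range(n_runs)) / n_runs
        mean_b = sum(noisy_score(1.1, rng) for _ in range(n_runs)) / n_runs
        if mean_a > mean_b:  # configuration B is truly better
            wrong += 1
    return wrong / n_trials

rate_few = wrong_ranking_rate(n_runs=3)
rate_many = wrong_ranking_rate(n_runs=300)
print(f"wrong ranking with 3 runs each:   {rate_few:.2f}")
print(f"wrong ranking with 300 runs each: {rate_many:.2f}")
```

When the noise is large relative to the gap between configurations, a handful of runs picks the wrong configuration far more often than a large number of runs does.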

There are three main methods for performing high-dimensional non-convex optimisation. They are as follows:

Grid search is a very common and often advocated approach in which you lay a grid over the space of possible hyperparameters and evaluate each point on the grid; the hyperparameters with the best objective value are then used in production.
Random search evaluates n uniformly random points in the hyperparameter space and selects the one producing the best performance. However, this method has its own disadvantages, such as high variance, so a more intelligent alternative is Bayesian optimisation.
Bayesian optimisation builds a surrogate for the objective, quantifies the uncertainty in that surrogate using a Bayesian machine learning technique (Gaussian process regression), and then uses an acquisition function defined from this surrogate to decide where to sample.
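The first two methods can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the objective function, its optimum and the search ranges are hypothetical stand-ins for an expensive training run. Bayesian optimisation is left out because it needs a surrogate model, which does not fit in a short sketch.

```python
import itertools
import random

def objective(learning_rate, depth):
    """Hypothetical validation loss (lower is better), smallest at
    learning_rate=0.1 and depth=6. Stands in for an expensive training run."""
    return 100 * (learning_rate - 0.1) ** 2 + 0.05 * (depth - 6) ** 2

def grid_search(grid):
    """Evaluate every combination on the grid; return the best config and loss."""
    names = list(grid)
    best_cfg, best_loss = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        cfg = dict(zip(names, values))
        loss = objective(**cfg)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss

def random_search(n_points, seed=0):
    """Evaluate n_points uniformly random configs; return the best config and loss."""
    rng = random.Random(seed)
    best_cfg, best_loss = None, float("inf")
    for _ in range(n_points):
        cfg = {
            "learning_rate": rng.uniform(0.001, 0.5),
            "depth": rng.randint(2, 10),  # inclusive on both ends
        }
        loss = objective(**cfg)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss

grid = {"learning_rate": [0.001, 0.01, 0.1, 0.5], "depth": [2, 4, 6, 8, 10]}
grid_cfg, grid_loss = grid_search(grid)           # 4 * 5 = 20 evaluations
rand_cfg, rand_loss = random_search(n_points=20)  # same evaluation budget

print("grid search:  ", grid_cfg, grid_loss)
print("random search:", rand_cfg, rand_loss)
```

Here the grid happens to contain the optimum, which flatters grid search; on a real problem the best values rarely sit exactly on a grid point, and the random search result changes with the seed, which is the variance mentioned above.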

Apart from the above conventional methods, one can also make use of graph-based systems for hyperparameter tuning.

To optimise and automate hyperparameter selection, Google introduced Watch Your Step, an approach that formulates a model for the performance of embedding methods. In short, it lets the embedding process concentrate on the significant direct neighbours in the graph. Here, the “Auto” portion corresponds to learning the graph hyperparameters by backpropagation.

Tools At Disposal

In this age of information abundance, especially in the world of AI, where a new tool is released and a new paper gets published every other day, it becomes highly impractical for a practising machine learning engineer to keep track of which libraries work and which hyperparameters are best.

It is always great to have a toolbox that can automatically save and learn from experiment results, leading to long-term, persistent optimization that remembers all tests. A toolbox named Hyperparameter Hunter was released recently, which does exactly that. The creators call this tool a personal machine learning toolbox/assistant.

Hyperparameter Hunter lets users run all of their benchmark and one-off experiments through it and, unlike other libraries, it does not start optimization from scratch. It takes into account all the experiments and optimization rounds that have already been run through it. The creators say that Hyperparameter Hunter gives better results with increased usage.
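The idea of remembering past experiments can be sketched independently of the library. The code below is not Hyperparameter Hunter's implementation or API: the ExperimentLog class, the evaluate helper and the toy scoring function are all hypothetical, and only show how saved results let a later run skip work that has already been done.

```python
import json
import os
import tempfile

class ExperimentLog:
    """Hypothetical JSON-lines log of (hyperparameters, score) records."""

    def __init__(self, path):
        self.path = path

    def record(self, params, score):
        """Append one finished experiment to the log file."""
        with open(self.path, "a") as f:
            f.write(json.dumps({"params": params, "score": score}) + "\n")

    def load(self):
        """Return all saved experiments, or an empty list if none exist."""
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            return [json.loads(line) for line in f if line.strip()]

    def lookup(self, params):
        """Return the saved score for params, or None if never tried."""
        for entry in self.load():
            if entry["params"] == params:
                return entry["score"]
        return None

def evaluate(params, log, score_fn):
    """Return (score, reused): reuse a saved score when one exists."""
    cached = log.lookup(params)
    if cached is not None:
        return cached, True
    score = score_fn(params)
    log.record(params, score)
    return score, False

def toy_score(params):
    """Hypothetical stand-in for training and scoring a model."""
    return -(params["max_depth"] - 3) ** 2

log = ExperimentLog(os.path.join(tempfile.mkdtemp(), "experiments.jsonl"))
first = evaluate({"max_depth": 5}, log, toy_score)   # computed and saved
second = evaluate({"max_depth": 5}, log, toy_score)  # read back from the log
print(first, second)
```

An optimiser built on such a log could also seed its search with every saved record rather than only skipping exact repeats, which is closer to the behaviour the creators describe.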

Key Features Include

Stop worrying about keeping track of hyperparameters, scores, or re-running the same Experiments
Automatically reads the Experiment files to find the ones that fit, and it learns from them
Eliminates boilerplate code for cross-validation loops, predicting, and scoring
Have predictions ready to go when it’s time for ensembling, meta-learning, and finalizing the models.

Dependencies: Dill, NumPy, Pandas, SciPy, Scikit-Learn, Scikit-Optimize, SimpleJSON

Here’s a quick guide to get started with hyperparameter_hunter:

Installation

pip install hyperparameter_hunter

Setting Up Environment

from hyperparameter_hunter import Environment, CVExperiment
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold
from xgboost import XGBClassifier

Performing Optimization

from hyperparameter_hunter import BayesianOptPro, Real, Integer, Categorical