The performance of a machine learning algorithm depends heavily on choosing a good set of hyperparameters. The process of finding the best set of hyperparameters for a machine learning or deep learning application is known as hyperparameter tuning, and Keras Tuner is a package that helps with that search. Hyperband is a hyperparameter tuning framework that speeds up the tuning process. This article focuses on understanding the Hyperband framework. The following topics are covered.
Table of contents
- About HPO approaches
- What is a Hyperband?
- Bayesian optimization vs Hyperband
- Working of hyperband
Hyperparameters are not model parameters and cannot be learned directly from data. Model parameters are learned during training, when we optimize a loss function with a method such as gradient descent; hyperparameters have to be set before training begins. Let’s talk about Hyperband and try to understand why it was created.
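To make the distinction concrete, here is a minimal sketch in plain Python. The function name `fit_slope` and the toy data are made up for illustration. The slope `w` is a model parameter learned by gradient descent, while `learning_rate` and `epochs` are hyperparameters fixed before training starts.

```python
def fit_slope(xs, ys, learning_rate, epochs):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0  # model parameter: learned from the data
    n = len(xs)
    for _ in range(epochs):
        # Gradient of mean((w * x - y) ** 2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # toy data generated from y = 2x

# Same data and same number of epochs, different hyperparameter values.
good = fit_slope(xs, ys, learning_rate=0.05, epochs=200)
slow = fit_slope(xs, ys, learning_rate=0.0001, epochs=200)
print(good, slow)
```

With the larger learning rate the learned slope ends up very close to 2, while the very small learning rate leaves it far from 2 after the same number of epochs. Hyperparameter tuning is the search for settings like the first one.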
About HPO approaches
The practice of tuning the hyperparameters of machine learning algorithms is known as hyperparameter optimization (HPO). State-of-the-art machine learning algorithms have many diverse and complicated hyperparameters, which together produce a massive search space. Deep learning underpins the products of many start-ups, and the search space for deep learning methods is considerably larger than for typical ML algorithms. Tuning over such a large search space is difficult, and manual approaches are not practical at this scale, so data-driven strategies are needed to tackle HPO.
What is a Hyperband?
Hyperband is a configuration evaluation technique that was devised by framing hyperparameter optimization as a pure-exploration, adaptive resource allocation problem: how should resources be distributed among randomly sampled hyperparameter configurations? It allocates resources using a principled early-stopping strategy, which allows it to evaluate orders of magnitude more configurations than black-box procedures such as Bayesian optimization methods. Unlike earlier configuration evaluation approaches, Hyperband is a general-purpose technique that makes few assumptions.
In their theoretical analysis, the developers showed that Hyperband adapts to unknown convergence rates and to the behaviour of validation losses as a function of the hyperparameters. Furthermore, they report that on a range of deep-learning and kernel-based learning problems, Hyperband is 5 to 30 times faster than popular Bayesian optimization techniques. Hyperband can be seen as one solution to a non-stochastic version of the pure-exploration, infinite-armed bandit problem.
The need for Hyperband
Hyperparameters are inputs to a machine learning algorithm that govern how well the algorithm generalizes to unseen data. As the number of tuning parameters in modern models grows, they become difficult to set with standard optimization techniques.
In an effort to develop more efficient search methods, Bayesian optimization approaches, which focus on optimizing how hyperparameter configurations are selected, have recently dominated the field of hyperparameter optimization. By picking configurations adaptively, these approaches aim to find good configurations faster than standard baselines such as random search. However, they have to solve the fundamentally difficult problem of fitting and optimizing a high-dimensional, non-convex function with unknown smoothness and possibly noisy evaluations.
An orthogonal approach to hyperparameter optimization aims to speed up configuration evaluation instead. These methods are adaptive in how they spend computation: they give more resources to promising hyperparameter configurations and quickly eliminate poor ones. The resource can be, for example, the size of the training set, the number of features, or the number of iterations for an iterative algorithm.
By doing so, these techniques aim to evaluate orders of magnitude more hyperparameter configurations than approaches that train every configuration to completion with the same budget, and therefore to find good hyperparameters quickly. Hyperband is designed to accelerate random search, providing a simple and theoretically sound starting point.
Bayesian optimization vs Hyperband
|Bayesian optimization|Hyperband|
|---|---|
|A probability-based model|A bandit-based model|
|Learns an expensive objective function from past observations.|In each setting, the goal is to reduce the simple regret, defined as the distance from the best choice, as quickly as possible.|
|Standard formulations are designed for continuous hyperparameters; categorical ones need special handling.|Works for both continuous and categorical hyperparameters.|
Working of hyperband
Hyperband uses the Successive Halving technique, which was introduced for hyperparameter optimization, as a subroutine and extends it. The original Successive Halving method takes its name from the idea behind it: distribute a budget uniformly across a set of hyperparameter configurations, evaluate the performance of all configurations, discard the worst half, and repeat until only one configuration remains. In this way the algorithm gives exponentially more resources to the more promising configurations.
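The loop described above can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: `successive_halving`, `toy_loss`, and the `lr` search space are hypothetical names, and `toy_loss` is only a stand-in for training a model with a given resource and returning its validation loss.

```python
import math
import random


def successive_halving(configs, evaluate, min_resource=1, eta=2):
    """Return the last surviving configuration.

    Each round evaluates every survivor with the current resource,
    keeps the best 1/eta of them, and multiplies the resource by eta.
    """
    survivors = list(configs)
    resource = min_resource
    while len(survivors) > 1:
        # Lower loss is better, so sort ascending and keep the front.
        ranked = sorted(survivors, key=lambda c: evaluate(c, resource))
        survivors = ranked[: max(1, len(ranked) // eta)]
        resource *= eta
    return survivors[0]


def toy_loss(config, resource):
    """Hypothetical stand-in for training with `resource` epochs."""
    distance = abs(math.log10(config["lr"]) + 2)  # best learning rate is 1e-2
    return distance + 1.0 / resource  # more resource gives a lower loss


rng = random.Random(0)
configs = [{"lr": 10 ** rng.uniform(-4, 0)} for _ in range(16)]
best = successive_halving(configs, toy_loss)
print(best)
```

With 16 starting configurations and `eta=2`, the rounds keep 8, 4, 2, and finally 1 configuration while the resource per configuration doubles each time, so every round spends about the same total budget.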
The Hyperband algorithm is made up of two parts.
- The inner loop runs Successive Halving for a fixed number of configurations and fixed resource levels.
- The outer loop iterates over different numbers of configurations and resource levels.
Each run of Successive Halving within Hyperband is referred to as a “bracket.” Each bracket is designed to consume roughly the same share of the total resource budget and corresponds to a different tradeoff between n, the number of configurations, and B/n, the average resource per configuration, where B is the budget of the bracket. As a result, a single Hyperband execution has a finite budget. Hyperband requires two inputs.
- The maximum amount of resource, R, that can be allocated to a single configuration
- An input that controls the proportion of configurations discarded in each round of Successive Halving
The two inputs determine how many distinct brackets are considered, that is, how many different values of the number of configurations are tried. Hyperband starts with the most aggressive bracket, which sets the number of configurations to maximize exploration, subject to the requirement that at least one configuration is allotted R resources. Each subsequent bracket reduces the number of configurations by a roughly constant factor until the last bracket, which allocates R resources to every configuration. As a result, Hyperband performs a geometric search over the average budget per configuration. This removes the need to choose the number of configurations for a fixed budget, at the cost of doing more total work than a single run of Successive Halving, by a factor roughly equal to the number of brackets.
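The bracket structure can be made concrete by printing the schedule for given inputs. The sketch below is written under assumptions: the formulas for the number of brackets and for the starting number of configurations are not given in this article, and follow the commonly cited formulation of the algorithm. The names `hyperband_schedule`, `max_resource` (the first input), and `eta` (the second input) are illustrative, and rounding conventions differ between implementations, so exact counts may not match a particular library.

```python
def hyperband_schedule(max_resource, eta=3):
    """List each bracket as [(n_configs, resource_per_config), ...]."""
    # s_max is the largest s with eta ** s <= max_resource. Integer
    # arithmetic avoids floating-point surprises in the logarithm.
    s_max = 0
    while eta ** (s_max + 1) <= max_resource:
        s_max += 1

    brackets = []
    for s in range(s_max, -1, -1):  # most aggressive bracket first
        # Starting number of configurations (assumed formula):
        # ceil((s_max + 1) * eta ** s / (s + 1)).
        total = (s_max + 1) * eta ** s
        n = (total + s) // (s + 1)
        # Starting resource per configuration in this bracket.
        r = max_resource / eta ** s
        # Successive Halving rounds: fewer configurations, more resource.
        rounds = [(n // eta ** i, r * eta ** i) for i in range(s + 1)]
        brackets.append(rounds)
    return brackets


for rounds in hyperband_schedule(max_resource=81, eta=3):
    print(rounds)
```

For `max_resource=81` and `eta=3` this yields five brackets. The first starts many configurations on a small resource, and the last trains a handful of configurations with the full resource from the start, which is plain random search.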
In Keras Tuner, the Hyperband tuner takes the following arguments.
- hypermodel: An instance of the Keras Tuner HyperModel class (or a callable that takes hyperparameters and returns a model) that defines the model over a searchable space.
- objective: The quantity to optimize for the model described in the hypermodel, such as ‘mse’ or ‘val_loss’. It can be a string, a keras_tuner.Objective instance, or a list of these. If it is a string, the optimization direction (minimum or maximum) is inferred. If it is a list of objectives, the tuner minimizes the sum of all the objectives to minimize minus the sum of all the objectives to maximize.
- max_epochs: Integer, the maximum number of epochs to train a single model. It is advisable to set this to a value slightly higher than the expected epochs to convergence for your largest model and to use early stopping during training. The default value is 100.
- factor: Integer, the reduction factor for the number of epochs and number of models for each bracket. Defaults to 3.
- hyperband_iterations: Integer, the number of times to iterate over the full Hyperband algorithm. Across all trials, one iteration runs approximately max_epochs * (math.log(max_epochs, factor) ** 2) cumulative epochs. Set this to the highest value that fits within your resource budget. The default value is 1.
- seed: An optional integer that serves as the random seed.
- hyperparameters: An optional HyperParameters instance. It can be used to override (or pre-register) hyperparameters in the search space.
- tune_new_entries: Boolean indicating whether hyperparameter entries that are requested by the hypermodel but not specified in hyperparameters should be added to the search space. If not, the default values for these parameters are used. True is the default value.
- allow_new_entries: Boolean indicating whether the hypermodel is allowed to request hyperparameter entries that are not listed in hyperparameters. True is the default value.
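The estimate given for hyperband_iterations can be turned into a quick budget check before launching a search. The helper below is a small sketch that uses only the Python standard library; `estimated_epochs` is a hypothetical name, and multiplying the per-iteration estimate by the number of iterations is an assumption made for illustration.

```python
import math


def estimated_epochs(max_epochs, factor=3, hyperband_iterations=1):
    """Approximate cumulative epochs across all trials."""
    # Per-iteration estimate quoted in the argument list above.
    per_iteration = max_epochs * (math.log(max_epochs, factor) ** 2)
    # Assumption: each extra iteration repeats roughly the same work.
    return hyperband_iterations * per_iteration


for iterations in (1, 2, 3):
    total = estimated_epochs(81, factor=3, hyperband_iterations=iterations)
    print(iterations, round(total))
```

For example, with max_epochs set to 81 and factor set to 3, the logarithm is 4, so one iteration costs about 81 * 4 ** 2 = 1296 epochs in total. Comparing this figure with the time one epoch takes gives a rough idea of how many iterations fit in a given budget.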
Because the arms are independent and sampled at random, Hyperband lends itself to parallelization. The simplest parallelization approach is to distribute individual Successive Halving brackets to separate machines.
In this article, we looked at a bandit-based hyperparameter tuning algorithm and how it differs from Bayesian optimization.