BoTorch is a library built on top of PyTorch for Bayesian Optimization. It combines Monte-Carlo (MC) acquisition functions, a novel sample average approximation optimization approach, auto-differentiation, and variance reduction techniques.
Here are the salient features of Botorch according to the Readme of it’s repository
- Provides a modular and easily extensible interface for composing Bayesian optimization primitives, including probabilistic models, acquisition functions, and optimizers.
- Harnesses the power of PyTorch, including auto-differentiation, native support for highly parallelized modern hardware (e.g. GPUs) using device-agnostic code, and a dynamic computation graph.
- Supports Monte Carlo-based acquisition functions via the reparameterization trick, making it straightforward to implement new ideas without imposing restrictive assumptions about the underlying model.
- Enables seamless integration with deep and/or convolutional architectures in PyTorch.
- Has first-class support for state-of-the art probabilistic models in GPyTorch, including support for multi-task Gaussian Processes (GPs) deep kernel learning, deep GPs, and approximate inference.
In this article, let’s grasp a high-level overview of Bayesian optimization and try to use it to constrain optimization.
Optimization
Optimization is the task of minimizing or maximizing an objective function by choosing parameters from a set of valid parameters. Optimization problems arise in all quantitative disciplines from computer science and engineering to operations research and economics.
Optimization can become extremely challenging if we don’t have access to the function we are trying to optimize.Our instinct to solve these kinds of problems is to do an exhaustive grid search of all the parameters and find out the best parameters.
If x is continuous with many dimensions and the observations y are noisy approximations of f(x) and the function is computationally expensive, then grid searching for best parameters is almost impossible. We can try doing random search but it won’t guarantee the optimum solution.
Bayesian Optimization
Bayesian optimization is a principled technique to solve this blackbox optimization problem.
It helps us to find the best parameters in fewer iterations by adaptively selecting “test points”.
It achieves a good Exploration/Exploitation Tradeoff i.e it balances calculating the function in regions of good performance vs regions of high uncertainty.
Bayesian Optimization works by building a surrogate function.Surrogate function is a smooth probabilistic model of the original objective function.This surrogate functions must provide predictions for the value of function at parameters we haven’t yet tested. They should also quantify the amount of uncertainty at these parameters.
Now these predictions and uncertainties from the surrogate model are used to generate an acquisition function.This acquisition function is used to select the parameters for next observation. And once a new observation is made we update the surrogate model.
This process is repeated till convergence or the expected gains are very low.Following visualization by ax.dev summarizes this process beautifully
Example
!pip install botorch can be used to do a quick install of botorch.
Let’s see how to optimize the following function with added constraint of ∥x∥−3≤0
x∈[0,1]6
Following is the implementation of enforcing constraints on the above hartman function.
from botorch.test_functions import Hartmann neg_hartmann6 = Hartmann(negate=True) def outcome_constraint(X): """L1 constraint; feasible if less than or equal to zero.""" return X.sum(dim=-1) - 3 def weighted_obj(X): """Feasibility weighted objective; zero if not feasible.""" return neg_hartmann6(X) * (outcome_constraint(X) <= 0).type_as(X)
Now we need to initialize the surrogate model. MultiOutput Gaussian Process is used for this as we have objective function and a constraint.
from botorch.models import FixedNoiseGP, ModelListGP from gpytorch.mlls.sum_marginal_log_likelihood import SumMarginalLogLikelihood NOISE_SE = 0.5 train_yvar = torch.tensor(NOISE_SE**2, device=device, dtype=dtype) def generate_initial_data(n=10): # generate training data train_x = torch.rand(10, 6, device=device, dtype=dtype) exact_obj = neg_hartmann6(train_x).unsqueeze(-1) # add output dimension exact_con = outcome_constraint(train_x).unsqueeze(-1) # add output dimension train_obj = exact_obj + NOISE_SE * torch.randn_like(exact_obj) train_con = exact_con + NOISE_SE * torch.randn_like(exact_con) best_observed_value = weighted_obj(train_x).max().item() return train_x, train_obj, train_con, best_observed_value def initialize_model(train_x, train_obj, train_con, state_dict=None): # define models for objective and constraint model_obj = FixedNoiseGP(train_x, train_obj, train_yvar.expand_as(train_obj)).to(train_x) model_con = FixedNoiseGP(train_x, train_con, train_yvar.expand_as(train_con)).to(train_x) # combine into a multi-output GP model model = ModelListGP(model_obj, model_con) mll = SumMarginalLogLikelihood(model.likelihood, model) # load state dict if it is passed if state_dict is not None: model.load_state_dict(state_dict) return mll, model
Now we need a function to optimize the acquisition function and give us next observation.
def optimize_acqf_and_get_observation(acq_func): """Optimizes the acquisition function, and returns a new candidate and a noisy observation.""" # optimize candidates, _ = optimize_acqf( acq_function=acq_func, bounds=bounds, q=BATCH_SIZE, num_restarts=NUM_RESTARTS, raw_samples=RAW_SAMPLES, # used for intialization heuristic options={"batch_limit": 5, "maxiter": 200}, ) # observe new values new_x = candidates.detach() exact_obj = neg_hartmann6(new_x).unsqueeze(-1) # add output dimension exact_con = outcome_constraint(new_x).unsqueeze(-1) # add output dimension new_obj = exact_obj + NOISE_SE * torch.randn_like(exact_obj) new_con = exact_con + NOISE_SE * torch.randn_like(exact_con) return new_x, new_obj, new_con
Now that all initialization steps are done we can run the bayesian optimization.
from botorch import fit_gpytorch_model from botorch.acquisition.monte_carlo import qNoisyExpectedImprovement from botorch.sampling.samplers import SobolQMCNormalSampler N_TRIALS = 3 N_BATCH = 20 MC_SAMPLES = 256 best_observed_all_nei = [] # average over multiple trials for trial in range(1, N_TRIALS + 1): best_observed_nei= [] # call helper functions to generate initial training data and initialize model train_x_nei, train_obj_nei, train_con_nei, best_observed_value_nei = generate_initial_data(n=10) mll_nei, model_nei = initialize_model(train_x_nei, train_obj_nei, train_con_nei) best_observed_nei.append(best_observed_value_nei) # run N_BATCH rounds of BayesOpt after the initial random batch for iteration in range(1, N_BATCH + 1): fit_gpytorch_model(mll_nei) # define the qNEI acquisition modules using a QMC sampler qmc_sampler = SobolQMCNormalSampler(num_samples=MC_SAMPLES) # for best_f, we use the best observed noisy values as an approximation qNEI = qNoisyExpectedImprovement( model=model_nei, X_baseline=train_x_nei, sampler=qmc_sampler, objective=constrained_obj, ) # optimize and get new observation new_x_nei, new_obj_nei, new_con_nei = optimize_acqf_and_get_observation(qNEI) # update training points train_x_nei = torch.cat([train_x_nei, new_x_nei]) train_obj_nei = torch.cat([train_obj_nei, new_obj_nei]) train_con_nei = torch.cat([train_con_nei, new_con_nei]) # update progress best_value_nei = weighted_obj(train_x_nei).max().item() best_observed_nei.append(best_value_nei) # reinitialize the models so they are ready for fitting on next iteration # use the current state dict to speed up fitting mll_nei, model_nei = initialize_model( train_x_nei, train_obj_nei, train_con_nei, model_nei.state_dict(), ) best_observed_all_nei.append(best_observed_nei)
Although observed value is close to true optimum, Variance across the trails is significant. To alleviate this issue we need to increase the number of trails.