 ###### Guide to Bayesian Optimization Using BoTorch # Guide to Bayesian Optimization Using BoTorch BoTorch is a library built on top of PyTorch for Bayesian Optimization. It combines Monte-Carlo (MC) acquisition functions, a novel sample average approximation optimization approach, auto-differentiation, and variance reduction techniques.

Here are the salient features of Botorch according to the Readme of it’s repository

`Register for our Workshop on How To Start Your Career In Data Science?`
• Provides a modular and easily extensible interface for composing Bayesian optimization primitives, including probabilistic models, acquisition functions, and optimizers.
• Harnesses the power of PyTorch, including auto-differentiation, native support for highly parallelized modern hardware (e.g. GPUs) using device-agnostic code, and a dynamic computation graph.
• Supports Monte Carlo-based acquisition functions via the reparameterization trick, making it straightforward to implement new ideas without imposing restrictive assumptions about the underlying model.
• Enables seamless integration with deep and/or convolutional architectures in PyTorch.
• Has first-class support for state-of-the art probabilistic models in GPyTorch, including support for multi-task Gaussian Processes (GPs) deep kernel learning, deep GPs, and approximate inference.

In this article, let’s grasp a high-level overview of Bayesian optimization and try to use it to constrain optimization.

# Optimization

Optimization is the task of minimizing or maximizing an objective function by choosing parameters from a set of valid parameters. Optimization problems arise in all quantitative disciplines from computer science and engineering to operations research and economics.

Optimization can become extremely challenging if we don’t have access to the function we are trying to optimize.Our instinct to solve these kinds of problems is to do an exhaustive grid search of all the parameters and find out the best parameters.

If x is continuous with many dimensions and the observations y are noisy approximations of f(x) and the function is computationally expensive, then grid searching for best parameters is almost impossible. We can try doing random search but it won’t guarantee the optimum solution.

# Bayesian Optimization

Bayesian optimization is a principled technique to solve this blackbox optimization problem.

It helps us to find the best parameters in fewer iterations by adaptively selecting “test points”.

It achieves a good Exploration/Exploitation Tradeoff i.e it balances calculating the function in regions of good performance vs regions of high uncertainty.

Bayesian Optimization works by building a surrogate function.Surrogate function is a smooth probabilistic model of the original objective function.This surrogate functions must provide predictions for the value of function at parameters we haven’t yet tested. They should also quantify the amount of uncertainty at these parameters.

Now these predictions and uncertainties from the surrogate model are used to generate an acquisition function.This acquisition function is used to select the parameters for next observation. And once a new observation is made we update the surrogate model.

This process is repeated till convergence or the expected gains are very low.Following visualization by ax.dev summarizes this process beautifully Bayesian Optimization using Gaussian Process(GP) as the surrogate function and Expected Improvement(EI) as the acquisition function.

# Example

!pip install botorch can be used to do a quick install of botorch.

Let’s see how to optimize the following function with added constraint of  ∥x∥−3≤0

x∈[0,1]6

Following is the implementation of enforcing constraints on the above hartman function.

``` from botorch.test_functions import Hartmann
neg_hartmann6 = Hartmann(negate=True)
def outcome_constraint(X):
"""L1 constraint; feasible if less than or equal to zero."""
return X.sum(dim=-1) - 3
def weighted_obj(X):
"""Feasibility weighted objective; zero if not feasible."""
return neg_hartmann6(X) * (outcome_constraint(X) <= 0).type_as(X) ```

Now we need to initialize the surrogate model. MultiOutput Gaussian Process is used for this as we have objective function and a constraint.

###### What is ClassSR And How It Helps In Super-Resolution Networks?

``` from botorch.models import FixedNoiseGP, ModelListGP
from gpytorch.mlls.sum_marginal_log_likelihood import SumMarginalLogLikelihood
NOISE_SE = 0.5
train_yvar = torch.tensor(NOISE_SE**2, device=device, dtype=dtype)
def generate_initial_data(n=10):
# generate training data
train_x = torch.rand(10, 6, device=device, dtype=dtype)
exact_obj = neg_hartmann6(train_x).unsqueeze(-1)  # add output dimension
exact_con = outcome_constraint(train_x).unsqueeze(-1)  # add output dimension
train_obj = exact_obj + NOISE_SE * torch.randn_like(exact_obj)
train_con = exact_con + NOISE_SE * torch.randn_like(exact_con)
best_observed_value = weighted_obj(train_x).max().item()
return train_x, train_obj, train_con, best_observed_value
def initialize_model(train_x, train_obj, train_con, state_dict=None):
# define models for objective and constraint
model_obj = FixedNoiseGP(train_x, train_obj, train_yvar.expand_as(train_obj)).to(train_x)
model_con = FixedNoiseGP(train_x, train_con, train_yvar.expand_as(train_con)).to(train_x)
# combine into a multi-output GP model
model = ModelListGP(model_obj, model_con)
mll = SumMarginalLogLikelihood(model.likelihood, model)
# load state dict if it is passed
if state_dict is not None:
return mll, model ```

Now we need a function to optimize the acquisition function and give us next observation.

``` def optimize_acqf_and_get_observation(acq_func):
"""Optimizes the acquisition function, and returns a new candidate and a noisy observation."""
# optimize
candidates, _ = optimize_acqf(
acq_function=acq_func,
bounds=bounds,
q=BATCH_SIZE,
num_restarts=NUM_RESTARTS,
raw_samples=RAW_SAMPLES,  # used for intialization heuristic
options={"batch_limit": 5, "maxiter": 200},
)
# observe new values
new_x = candidates.detach()
exact_obj = neg_hartmann6(new_x).unsqueeze(-1)  # add output dimension
exact_con = outcome_constraint(new_x).unsqueeze(-1)  # add output dimension
new_obj = exact_obj + NOISE_SE * torch.randn_like(exact_obj)
new_con = exact_con + NOISE_SE * torch.randn_like(exact_con)
return new_x, new_obj, new_con ```

Now that all initialization steps are done we can run the bayesian optimization.

``` from botorch import fit_gpytorch_model
from botorch.acquisition.monte_carlo import qNoisyExpectedImprovement
from botorch.sampling.samplers import SobolQMCNormalSampler
N_TRIALS = 3
N_BATCH = 20
MC_SAMPLES = 256
best_observed_all_nei =  []
# average over multiple trials
for trial in range(1, N_TRIALS + 1):
best_observed_nei= []
# call helper functions to generate initial training data and initialize model
train_x_nei, train_obj_nei, train_con_nei, best_observed_value_nei = generate_initial_data(n=10)
mll_nei, model_nei = initialize_model(train_x_nei, train_obj_nei, train_con_nei)
best_observed_nei.append(best_observed_value_nei)
# run N_BATCH rounds of BayesOpt after the initial random batch
for iteration in range(1, N_BATCH + 1):
fit_gpytorch_model(mll_nei)
# define the qNEI acquisition modules using a QMC sampler
qmc_sampler = SobolQMCNormalSampler(num_samples=MC_SAMPLES)
# for best_f, we use the best observed noisy values as an approximation
qNEI = qNoisyExpectedImprovement(
model=model_nei,
X_baseline=train_x_nei,
sampler=qmc_sampler,
objective=constrained_obj,
)
# optimize and get new observation
new_x_nei, new_obj_nei, new_con_nei = optimize_acqf_and_get_observation(qNEI)
# update training points
train_x_nei = torch.cat([train_x_nei, new_x_nei])
train_obj_nei = torch.cat([train_obj_nei, new_obj_nei])
train_con_nei = torch.cat([train_con_nei, new_con_nei])
# update progress
best_value_nei = weighted_obj(train_x_nei).max().item()
best_observed_nei.append(best_value_nei)
# reinitialize the models so they are ready for fitting on next iteration
# use the current state dict to speed up fitting
mll_nei, model_nei = initialize_model(
train_x_nei,
train_obj_nei,
train_con_nei,
model_nei.state_dict(),
)
best_observed_all_nei.append(best_observed_nei) ```

Although observed value is close to true optimum, Variance across the trails is significant. To alleviate this issue we need to increase the number of trails.

# References

Paper

Github Repository

Tutorials

Colab Notebook

`Join our Telegram Group. Be part of an engaging community`