Guide to Bayesian Optimization Using BoTorch

BoTorch is a library built on top of PyTorch for Bayesian Optimization. It combines Monte-Carlo (MC) acquisition functions, a novel sample average approximation optimization approach, auto-differentiation, and variance reduction techniques.

BoTorch is a library built on top of PyTorch for Bayesian Optimization. It combines Monte-Carlo (MC) acquisition functions, a novel sample average approximation optimization approach, auto-differentiation, and variance reduction techniques.

Here are the salient features of Botorch according to the Readme of it’s repository


Sign up for your weekly dose of what's up in emerging technology.
  • Provides a modular and easily extensible interface for composing Bayesian optimization primitives, including probabilistic models, acquisition functions, and optimizers.
  • Harnesses the power of PyTorch, including auto-differentiation, native support for highly parallelized modern hardware (e.g. GPUs) using device-agnostic code, and a dynamic computation graph.
  • Supports Monte Carlo-based acquisition functions via the reparameterization trick, making it straightforward to implement new ideas without imposing restrictive assumptions about the underlying model.
  • Enables seamless integration with deep and/or convolutional architectures in PyTorch.
  • Has first-class support for state-of-the art probabilistic models in GPyTorch, including support for multi-task Gaussian Processes (GPs) deep kernel learning, deep GPs, and approximate inference.

In this article, let’s grasp a high-level overview of Bayesian optimization and try to use it to constrain optimization.


Optimization is the task of minimizing or maximizing an objective function by choosing parameters from a set of valid parameters. Optimization problems arise in all quantitative disciplines from computer science and engineering to operations research and economics. 

Blackbox function to optimize

Optimization can become extremely challenging if we don’t have access to the function we are trying to optimize.Our instinct to solve these kinds of problems is to do an exhaustive grid search of all the parameters and find out the best parameters.

If x is continuous with many dimensions and the observations y are noisy approximations of f(x) and the function is computationally expensive, then grid searching for best parameters is almost impossible. We can try doing random search but it won’t guarantee the optimum solution.

Bayesian Optimization

Bayesian optimization is a principled technique to solve this blackbox optimization problem.

It helps us to find the best parameters in fewer iterations by adaptively selecting “test points”.

It achieves a good Exploration/Exploitation Tradeoff i.e it balances calculating the function in regions of good performance vs regions of high uncertainty.

Bayesian Optimization works by building a surrogate function.Surrogate function is a smooth probabilistic model of the original objective function.This surrogate functions must provide predictions for the value of function at parameters we haven’t yet tested. They should also quantify the amount of uncertainty at these parameters.

Now these predictions and uncertainties from the surrogate model are used to generate an acquisition function.This acquisition function is used to select the parameters for next observation. And once a new observation is made we update the surrogate model.

This process is repeated till convergence or the expected gains are very low.Following visualization by summarizes this process beautifully

Bayesian Optimization using Gaussian Process(GP) as the surrogate function and Expected Improvement(EI) as the acquisition function.


!pip install botorch can be used to do a quick install of botorch.

Let’s see how to optimize the following function with added constraint of  ∥x∥−3≤0


Following is the implementation of enforcing constraints on the above hartman function.

 from botorch.test_functions import Hartmann
 neg_hartmann6 = Hartmann(negate=True)
 def outcome_constraint(X):
     """L1 constraint; feasible if less than or equal to zero."""
     return X.sum(dim=-1) - 3
 def weighted_obj(X):
     """Feasibility weighted objective; zero if not feasible."""
     return neg_hartmann6(X) * (outcome_constraint(X) <= 0).type_as(X) 

Now we need to initialize the surrogate model. MultiOutput Gaussian Process is used for this as we have objective function and a constraint.

 from botorch.models import FixedNoiseGP, ModelListGP
 from gpytorch.mlls.sum_marginal_log_likelihood import SumMarginalLogLikelihood
 NOISE_SE = 0.5
 train_yvar = torch.tensor(NOISE_SE**2, device=device, dtype=dtype)
 def generate_initial_data(n=10):
     # generate training data
     train_x = torch.rand(10, 6, device=device, dtype=dtype)
     exact_obj = neg_hartmann6(train_x).unsqueeze(-1)  # add output dimension
     exact_con = outcome_constraint(train_x).unsqueeze(-1)  # add output dimension
     train_obj = exact_obj + NOISE_SE * torch.randn_like(exact_obj)
     train_con = exact_con + NOISE_SE * torch.randn_like(exact_con)
     best_observed_value = weighted_obj(train_x).max().item()
     return train_x, train_obj, train_con, best_observed_value
 def initialize_model(train_x, train_obj, train_con, state_dict=None):
     # define models for objective and constraint
     model_obj = FixedNoiseGP(train_x, train_obj, train_yvar.expand_as(train_obj)).to(train_x)
     model_con = FixedNoiseGP(train_x, train_con, train_yvar.expand_as(train_con)).to(train_x)
     # combine into a multi-output GP model
     model = ModelListGP(model_obj, model_con)
     mll = SumMarginalLogLikelihood(model.likelihood, model)
     # load state dict if it is passed
     if state_dict is not None:
     return mll, model 

Now we need a function to optimize the acquisition function and give us next observation.

 def optimize_acqf_and_get_observation(acq_func):
     """Optimizes the acquisition function, and returns a new candidate and a noisy observation."""
     # optimize
     candidates, _ = optimize_acqf(
         raw_samples=RAW_SAMPLES,  # used for intialization heuristic
         options={"batch_limit": 5, "maxiter": 200},
     # observe new values 
     new_x = candidates.detach()
     exact_obj = neg_hartmann6(new_x).unsqueeze(-1)  # add output dimension
     exact_con = outcome_constraint(new_x).unsqueeze(-1)  # add output dimension
     new_obj = exact_obj + NOISE_SE * torch.randn_like(exact_obj)
     new_con = exact_con + NOISE_SE * torch.randn_like(exact_con)
     return new_x, new_obj, new_con 

Now that all initialization steps are done we can run the bayesian optimization.

 from botorch import fit_gpytorch_model
 from botorch.acquisition.monte_carlo import qNoisyExpectedImprovement
 from botorch.sampling.samplers import SobolQMCNormalSampler
 N_TRIALS = 3 
 N_BATCH = 20 
 MC_SAMPLES = 256 
 best_observed_all_nei =  []
 # average over multiple trials
 for trial in range(1, N_TRIALS + 1):
     best_observed_nei= []
     # call helper functions to generate initial training data and initialize model
     train_x_nei, train_obj_nei, train_con_nei, best_observed_value_nei = generate_initial_data(n=10)
     mll_nei, model_nei = initialize_model(train_x_nei, train_obj_nei, train_con_nei)
     # run N_BATCH rounds of BayesOpt after the initial random batch
     for iteration in range(1, N_BATCH + 1):    
         # define the qNEI acquisition modules using a QMC sampler
         qmc_sampler = SobolQMCNormalSampler(num_samples=MC_SAMPLES)
         # for best_f, we use the best observed noisy values as an approximation
         qNEI = qNoisyExpectedImprovement(
         # optimize and get new observation
         new_x_nei, new_obj_nei, new_con_nei = optimize_acqf_and_get_observation(qNEI)
         # update training points
         train_x_nei =[train_x_nei, new_x_nei])
         train_obj_nei =[train_obj_nei, new_obj_nei])
         train_con_nei =[train_con_nei, new_con_nei])
         # update progress
         best_value_nei = weighted_obj(train_x_nei).max().item()
         # reinitialize the models so they are ready for fitting on next iteration
         # use the current state dict to speed up fitting
         mll_nei, model_nei = initialize_model(


Although observed value is close to true optimum, Variance across the trails is significant. To alleviate this issue we need to increase the number of trails.



Github Repository


Colab Notebook

More Great AIM Stories

Pavan Kandru
AI enthusiast with a flair for NLP. I love playing with exotic data.

Our Upcoming Events

Conference, in-person (Bangalore)
Machine Learning Developers Summit (MLDS) 2023
19-20th Jan, 2023

Conference, in-person (Bangalore)
Rising 2023 | Women in Tech Conference
16-17th Mar, 2023

Conference, in-person (Bangalore)
Data Engineering Summit (DES) 2023
27-28th Apr, 2023

Conference, in-person (Bangalore)
MachineCon 2023
23rd Jun, 2023

3 Ways to Join our Community

Discord Server

Stay Connected with a larger ecosystem of data science and ML Professionals

Telegram Channel

Discover special offers, top stories, upcoming events, and more.

Subscribe to our newsletter

Get the latest updates from AIM