# Guide To GPyTorch: A Python Library For Gaussian Process Models

GPyTorch is a PyTorch-based library designed for implementing Gaussian processes. It was introduced by Jacob R. Gardner, Geoff Pleiss, David Bindel, Kilian Q. Weinberger and Andrew Gordon Wilson, researchers at Cornell University (research paper).

Before going into the details of GPyTorch, let us first understand what a Gaussian process means, in short.

## Gaussian Process

In probability theory and statistics, a Gaussian process is a stochastic process, i.e. a collection of random variables indexed by time or space such that every finite subset of those random variables has a multivariate normal distribution (equivalently, every finite linear combination of the variables is normally distributed).

You might have heard of statistical inference techniques such as Bayesian inference, which can represent uncertainty over numeric values like the outcome of a die roll or the height of a person. A Gaussian process, by contrast, is a probability distribution over possible functions.
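To make "a distribution over functions" concrete, the following sketch draws a few sample functions from a zero-mean GP prior on a grid of inputs, using plain PyTorch. The RBF kernel, the lengthscale of 0.2 and the jitter value are illustrative assumptions chosen for this example, not anything prescribed by GPyTorch.

```python
import torch

def rbf_kernel(x1, x2, lengthscale=0.2):
    """Squared-exponential (RBF) covariance between two sets of 1-D inputs."""
    sq_dist = (x1.unsqueeze(-1) - x2.unsqueeze(-2)) ** 2
    return torch.exp(-0.5 * sq_dist / lengthscale ** 2)

torch.manual_seed(0)
n = 50
x = torch.linspace(0, 1, n, dtype=torch.float64)

# Any finite set of inputs yields a multivariate normal N(0, K).
# A small jitter on the diagonal keeps K numerically positive definite.
K = rbf_kernel(x, x) + 1e-6 * torch.eye(n, dtype=torch.float64)
prior = torch.distributions.MultivariateNormal(
    torch.zeros(n, dtype=torch.float64), covariance_matrix=K
)

# Each row is one function drawn from the prior, evaluated at the 50 inputs
samples = prior.sample((3,))
print(samples.shape)
```

Plotting each row of `samples` against `x` gives three smooth random curves; a shorter lengthscale would make them wigglier.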

## Overview of GPyTorch

GPyTorch enables easy creation of flexible, scalable and modular Gaussian process models. It is implemented using PyTorch. It performs GP inference via Blackbox Matrix-Matrix multiplication (BBMM).

Pros of GPyTorch

1. Scalability: It enables training of GPs with millions of data points
2. Modular design: It has the capability of easily integrating GPs with deep neural networks
3. Speed: It can utilize state-of-the-art inference algorithms (such as SKI/KISS-GP, stochastic Lanczos expansions, LOVE, SKIP and stochastic variational deep kernel learning) and hardware acceleration using GPUs

## Practical implementation

Here’s a demonstration of training an RBF kernel Gaussian process on the following function:

y = sin(2πx) + ε             …(i)

ε ~ N(0, 0.04)

(where 0 is the mean of the normal distribution and 0.04 is its variance)

The code has been implemented in Google Colab with Python 3.7.10 and GPyTorch 1.4.0. A step-wise explanation of the code follows:

1. Install the GPyTorch library

`!pip install gpytorch`

1. Import required libraries
```python
import math
import torch
import gpytorch
from matplotlib import pyplot as plt
# Render visualization plots inline in the notebook
%matplotlib inline
```
1. Prepare training data
```python
# Choose 100 regularly spaced points from the interval [0, 1]
x_train = torch.linspace(0, 1, 100)
# Compute labels as sin(2*pi*x) plus Gaussian noise, as described by eq. (i) above
y_train = torch.sin(x_train * (2 * math.pi)) + torch.randn(x_train.size()) * math.sqrt(0.04)
```
1. Define the GP model

We have used exact inference – the simplest form of GP model

```python
class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, x_train, y_train, likelihood):
        super(ExactGPModel, self).__init__(x_train, y_train, likelihood)
        # Prior mean
        self.mean_module = gpytorch.means.ConstantMean()
        # Prior covariance: an RBF kernel wrapped in a learnable output scale
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
```

For most GP regression models, the following objects should be constructed:

• A ‘GP Model’ which handles most of the inference
• A ‘Likelihood’
• A ‘Mean’ defining prior mean of GP
• A ‘Kernel’ defining covariance of GP
• A Multivariate Normal Distribution

The two methods defined above are components of the Exact (non-variational) GP model.

The `__init__` method takes the training data and a likelihood. It then constructs the objects, such as the mean module and the kernel module, that the `forward` method of the model needs. The `forward` method takes in some data x and returns a multivariate normal distribution with the prior mean and covariance computed at x.

1. Initialize likelihood

`lkh = gpytorch.likelihoods.GaussianLikelihood()`

Initialize the GP model

`model = ExactGPModel(x_train, y_train, lkh)`

1. Find optimal hyperparameters of the model
```python
model.train()
lkh.train()
```

Output:

```
GaussianLikelihood(
  (noise_covar): HomoskedasticNoise(
    (raw_noise_constraint): GreaterThan(1.000E-04)
  )
)
```

`opt = torch.optim.Adam(model.parameters(), lr=0.1)`

1. Define loss for GP

`l = gpytorch.mlls.ExactMarginalLogLikelihood(lkh, model)`
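For intuition about what this loss measures: with a Gaussian likelihood, the exact marginal log likelihood is the log density of the training targets under a multivariate normal whose covariance is the kernel matrix plus the noise variance on the diagonal. The sketch below evaluates that closed form in plain PyTorch and cross-checks it against `torch.distributions`. The kernel function and the fixed hyperparameter values are illustrative assumptions; GPyTorch's own implementation uses different numerical routines and, as far as we are aware, additionally divides the result by the number of training points.

```python
import math
import torch

def rbf_kernel(x1, x2, lengthscale=0.3, outputscale=1.0):
    """Scaled RBF covariance between two sets of 1-D inputs."""
    sq_dist = (x1.unsqueeze(-1) - x2.unsqueeze(-2)) ** 2
    return outputscale * torch.exp(-0.5 * sq_dist / lengthscale ** 2)

torch.manual_seed(0)
n = 20
x = torch.linspace(0, 1, n, dtype=torch.float64)
y = torch.sin(2 * math.pi * x) + math.sqrt(0.04) * torch.randn(n, dtype=torch.float64)

# Covariance of the noisy targets: kernel matrix plus noise variance
noise_var = 0.04
K = rbf_kernel(x, x) + noise_var * torch.eye(n, dtype=torch.float64)

# log p(y) = -1/2 y^T K^{-1} y - 1/2 log|K| - n/2 log(2*pi), zero prior mean
L = torch.linalg.cholesky(K)
alpha = torch.cholesky_solve(y.unsqueeze(-1), L).squeeze(-1)  # K^{-1} y
log_det = 2.0 * torch.log(torch.diagonal(L)).sum()
mll = -0.5 * (y @ alpha) - 0.5 * log_det - 0.5 * n * math.log(2 * math.pi)

# Cross-check against PyTorch's multivariate normal log density
reference = torch.distributions.MultivariateNormal(
    torch.zeros(n, dtype=torch.float64), covariance_matrix=K
).log_prob(y)
print(float(mll), float(reference))
```

Training minimizes the negative of this quantity with respect to the kernel hyperparameters and the noise variance.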

1. Compute loss, length scale (i.e. length of twists and turns in the function) and noise for each iteration of the GP
```python
for i in range(20):
    # Zero gradients from previous iteration
    opt.zero_grad()
    # Store output from the model
    op = model(x_train)
    # Compute loss (negative marginal log likelihood) and backpropagate gradients
    loss = -l(op, y_train)
    loss.backward()
    # Print the loss, length scale and noise for each of the 20 iterations
    print('Iter %d/%d - Loss: %.3f   lengthscale: %.3f   noise: %.3f' % (
        i + 1, 20, loss.item(),  # iteration number and loss
        model.covar_module.base_kernel.lengthscale.item(),  # length scale
        model.likelihood.noise.item()  # noise
    ))
    # Update the hyperparameters using the computed gradients
    opt.step()
```
1. Switch the model and likelihood to evaluation mode
```python
# Evaluation (predictive posterior) mode
model.eval()
lkh.eval()
```
1. Make predictions by feeding the model output through the likelihood
```python
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    # 51 equally spaced test points in [0, 1]
    x_test = torch.linspace(0, 1, 51)
    # The likelihood is the `lkh` object created earlier
    observed_pred = lkh(model(x_test))
```
1. Plot the fitted model
```python
with torch.no_grad():  # disable gradient computation
    # Initialize plot
    fig, axis = plt.subplots(1, 1, figsize=(4, 3))
    # Upper and lower confidence bounds
    lower, upper = observed_pred.confidence_region()
    # Plot training data as black stars
    axis.plot(x_train.numpy(), y_train.numpy(), 'k*')
    # Plot predictive means as blue line
    axis.plot(x_test.numpy(), observed_pred.mean.numpy(), 'b')
    # Shade the area between the lower and upper confidence bounds
    axis.fill_between(x_test.numpy(), lower.numpy(), upper.numpy(), alpha=0.5)
    axis.set_ylim([-3, 3])  # y-axis limits
    axis.legend(['Observed Data', 'Mean', 'Confidence'])
```

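Output plot: the training data appear as black stars, the predictive mean as a blue line, and the confidence region as a shaded band.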

