Building robust models with learning rate schedulers in PyTorch?

The PyTorch learning rate scheduler helps find a good learning rate for various models by considering the model architecture and parameters.
The learning rate is an important hyperparameter in any model and must be set with care. It largely determines how quickly, and how well, a model converges to an optimal solution. Finding a good learning rate can be a tedious task that depends on the model architecture. The learning rate scheduler in PyTorch helps iterate over different learning rate values and observe how the loss behaves across batches. In this article, let us try to understand the learning rate scheduler in PyTorch in this context.

Table of Contents

  1. Introduction to learning rate scheduler in PyTorch
  2. Different learning rate schedulers in PyTorch
  3. The necessity of learning rate schedulers
  4. How to use learning rate schedulers in PyTorch?
  5. Summary

Introduction to learning rate scheduler in PyTorch

The learning rate scheduler in PyTorch is available through the standard torch.optim package, which implements various optimization algorithms. Most commonly used optimization techniques are supported, and the package provides the tools needed to find a good learning rate for a wide range of model architectures and use cases.

Now let us take a look at the learning rate scheduler in PyTorch in a little more detail. To use a learning rate scheduler, an optimizer object must first be created in the working environment. The optimizer holds the current state of the model and is responsible for updating the parameters based on the computed gradients.

Let us now try to understand the meaning of gradient with respect to the learning rate. The gradient gives the direction of steepest change of the loss, and the learning rate scales the step the algorithm takes along that direction towards the optimal solution. Now let us look at a sample of how to create the optimizer instance in the working environment.

Creating optimizer instance

The optimizer instance is created in the working environment using the required optimizer. Commonly used optimizers are Stochastic Gradient Descent (SGD) and Adam. The code below creates an SGD optimizer instance in the working environment.

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

Here we pass the model parameters, an initial learning rate, and a momentum value to the SGD optimizer instance. An optimizer can also manage multiple parameter groups, each specified as a dictionary with its own options. This way the model can either use a single default learning rate or a separate learning rate for each group of model parameters. Now let us look into the different learning rate schedulers available in PyTorch.
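As a sketch of the parameter-group idea (the small two-layer model and the learning rate values here are purely illustrative), each dictionary below defines its own group, one falling back on the optimizer-wide default learning rate and the other overriding it:

```python
import torch.nn as nn
import torch.optim as optim

# A small illustrative model; the layer sizes are arbitrary
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Two parameter groups: the first uses the optimizer-wide default lr,
# the second overrides it with its own value
optimizer = optim.SGD(
    [
        {"params": model[0].parameters()},
        {"params": model[2].parameters(), "lr": 0.001},
    ],
    lr=0.01,
    momentum=0.9,
)

for i, group in enumerate(optimizer.param_groups):
    print(f"group {i}: lr = {group['lr']}")
```

A scheduler attached to this optimizer adjusts every group's learning rate, each starting from its own initial value.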

Different learning rate schedulers in PyTorch

PyTorch makes various learning rate schedulers available in the torch.optim.lr_scheduler package, where lr_scheduler stands for learning rate scheduler. In this section, let us look at the different learning rate schedulers in PyTorch and understand the functionality of each of them.

i) lr_scheduler.LambdaLR is used to set the learning rate of each parameter group to the initial learning rate times a multiplicative factor computed by a user-supplied function of the epoch.

ii) lr_scheduler.MultiplicativeLR is used to multiply the learning rate of each parameter group by the factor returned by a user-supplied function at every epoch.

iii) lr_scheduler.StepLR is used to decay the learning rate of each parameter group by a factor gamma once every step_size epochs.

iv) lr_scheduler.MultiStepLR is used to decay the learning rate of each parameter group by gamma whenever the epoch count reaches one of the specified milestones.

v) lr_scheduler.ConstantLR is used to scale the learning rate of each parameter group by a constant factor until the total number of iterations (total_iters) is reached.

vi) lr_scheduler.LinearLR is used to decay the learning rate of each parameter group by a linearly changing factor until the total number of iterations is reached.

vii) lr_scheduler.ExponentialLR is used to decay the learning rate of each parameter group by gamma every epoch.

Most of the schedulers above adjust the learning rate until a specified number of epochs or iterations is reached. PyTorch also provides schedulers based on cosine annealing, such as CosineAnnealingLR, as well as utilities that chain a list of schedulers into a single learning rate scheduling instance.
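To make the difference between the decay styles concrete, here is a minimal sketch (the dummy parameter and the gamma/step_size values are made up for illustration) that records the learning rate per epoch for StepLR and ExponentialLR:

```python
import torch
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR, ExponentialLR

# A dummy parameter so the optimizer has something to manage
param = torch.nn.Parameter(torch.zeros(1))

def lr_trajectory(scheduler_cls, epochs=6, **kwargs):
    """Record the learning rate at the start of each epoch."""
    opt = optim.SGD([param], lr=1.0)
    scheduler = scheduler_cls(opt, **kwargs)
    lrs = []
    for _ in range(epochs):
        lrs.append(opt.param_groups[0]["lr"])
        opt.step()        # the real training step would happen here
        scheduler.step()  # advance the schedule once per epoch
    return lrs

# StepLR halves the lr every 2 epochs; ExponentialLR halves it every epoch
print(lr_trajectory(StepLR, step_size=2, gamma=0.5))
print(lr_trajectory(ExponentialLR, gamma=0.5))
```

StepLR yields [1.0, 1.0, 0.5, 0.5, 0.25, 0.25] here, while ExponentialLR yields [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125].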

The necessity of learning rate schedulers

Learning rate schedulers in PyTorch come as a ready-to-use package with various built-in functionalities. An appropriate learning rate is essential for obtaining better solutions and well-converged models. So by using learning rate schedulers while modeling, the loss value can be tracked until the total number of iterations is reached.

By using learning rate schedulers, the learning rate at each iteration can be inspected, and the value that yields the lowest loss can be used to drive the model towards the optimal solution. Learning rate schedulers also offer various decay strategies, which lets us choose a different scheduler for each task. By reporting the testing loss and accuracy for each iteration together with the decaying learning rate values, the best learning rate for the model architecture at hand can be identified.
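As one way to carry out that validation, the sketch below (the dummy parameter, milestones, and gamma are invented for illustration) logs the scheduler's learning rate per epoch via get_last_lr(), so each epoch's loss can later be matched to the rate that produced it:

```python
import torch
import torch.optim as optim
from torch.optim.lr_scheduler import MultiStepLR

param = torch.nn.Parameter(torch.zeros(1))
opt = optim.SGD([param], lr=0.1)
# Drop the lr by 10x once epoch 2 and epoch 4 are reached
scheduler = MultiStepLR(opt, milestones=[2, 4], gamma=0.1)

history = []
for epoch in range(5):
    opt.step()  # training and evaluation for the epoch would go here
    history.append((epoch, scheduler.get_last_lr()[0]))
    scheduler.step()

print(history)
```

Pairing each logged rate with that epoch's testing loss makes it easy to see which learning rate range the model responds to best.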

Let us understand the usage and functionality of learning rate schedulers in PyTorch better through a case study.

How to use learning rate schedulers in PyTorch?

For the case study, let us use the Fashion-MNIST data available in the datasets module of torchvision and apply certain transform operations in the working environment. This section presents a complete case study using the ExponentialLR learning rate scheduler.

Step-1: Importing the required libraries

As this case study is carried out in PyTorch, let us import some of the torch libraries.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import matplotlib.pyplot as plt
from torchvision import datasets,transforms
from torch.optim.lr_scheduler import ExponentialLR

Step-2: Validating device configuration

The case study is carried out in Colab, so the device configuration is validated to check for the availability of any accelerators.

device=torch.device("cuda" if torch.cuda.is_available() else "cpu")

Step-3: Transforming the data

The data is transformed into tensors using the Compose function of PyTorch, as shown in the below code.

transform=transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.1307,),(0.3081,))])

Step-4: PyTorch data processing and data loading

Let us use the Fashion-MNIST data, perform certain data processing, and load the data in the working environment.

train_df=datasets.FashionMNIST('./data',train=True,download=True,transform=transform)
test_df=datasets.FashionMNIST('./data',train=False,transform=transform)
batch_size=64  # batch size was not defined in the original snippet; 64 is a reasonable choice
train_df_load=torch.utils.data.DataLoader(train_df,batch_size=batch_size,shuffle=True)
test_df_load=torch.utils.data.DataLoader(test_df,batch_size=batch_size)

Step-5: Creating a PyTorch model architecture

A PyTorch model architecture is created in the working environment, as shown below. Here basically, layers and forward propagation parameters are being declared.

### Creating a Model architecture
 
class FashionMNIST_Net(nn.Module):
   def __init__(self):
       super(FashionMNIST_Net, self).__init__()
       self.conv1 = nn.Conv2d(1, 32, 3, 1)
       self.conv2 = nn.Conv2d(32, 64, 3, 1)
       self.dropout1 = nn.Dropout(0.25)
       self.dropout2 = nn.Dropout(0.5)
       self.fc1 = nn.Linear(9216, 128)
       self.fc2 = nn.Linear(128, 10)
 
   def forward(self, x):
       x = self.conv1(x)
       x = F.relu(x)
       x = self.conv2(x)
       x = F.relu(x)
       x = F.max_pool2d(x, 2)
       x = self.dropout1(x)
       x = torch.flatten(x, 1)
       x = self.fc1(x)
       x = F.relu(x)
       x = self.dropout2(x)
       x = self.fc2(x)
       output = F.log_softmax(x, dim=1)
       return output

Step-6: Creating the training function

A user-defined function is used to train the model with required optimizers and iterations. The structure of the user-defined function is shown below.

def train(model,optimizer,epoch,log_interval):
   model.train()
   for batch_idx, (data, target) in enumerate(train_df_load):
       data, target = data.to(device), target.to(device)
       optimizer.zero_grad()
      
       output = model(data)
       loss = F.nll_loss(output, target)
       loss.backward()
       optimizer.step()
      
       lr=optimizer.param_groups[0]["lr"]
      
       if batch_idx % log_interval == 0:
           print('Train Epoch: {} batch-{}\tLoss: {:.6f} Learning Rate: {}'.format(epoch, batch_idx ,loss.item(),lr))
   lrs.append(lr)

Step-7: Creating the testing function

A user-defined function is used to test the model where the model loss and predictions are being evaluated.

def test(model):
   model.eval()
   test_loss = 0
   correct = 0
   with torch.no_grad():
       for data, target in test_df_load:
           data, target = data.to(device), target.to(device)
           output = model(data)
           test_loss += F.nll_loss(output, target, reduction='sum').item()
           pred = output.argmax(dim=1, keepdim=True) 
           correct += pred.eq(target.view_as(pred)).sum().item()
 
   test_loss /= len(test_df_load.dataset)
 
   print('\nTest set: Average loss: {:.4f}, Accuracy: {:.4f}\n'.format(
       test_loss,correct / len(test_df_load.dataset)))

Step-8: Using the learning rate scheduler

To use the learning rate scheduler, the model is built and then iterated through the training and testing user-defined functions, tracking the learning rate until the mentioned final number of epochs is reached.

model=FashionMNIST_Net().to(device)
optimizer=optim.Adam(model.parameters(),lr=0.01)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer,gamma=0.1)
epochs=10
lrs=[]
for epoch in range(1,epochs+1):
 train(model,optimizer,epoch,1000)
 test(model)
 scheduler.step()

Here we can see that in each epoch the model iterates over the data in the mentioned batch size, and after each epoch the testing loss and accuracy are reported. By validating the testing loss and accuracy, the optimal learning rate can be chosen for the model.

So this is how the ExponentialLR learning rate scheduler is used in PyTorch.
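Since the training loop above collects the learning rate per epoch in the lrs list, a quick plot makes the exponential decay visible. The sketch below uses stand-in values computed from lr=0.01 and gamma=0.1 rather than the list from an actual run:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

# Stand-in for the collected lrs list: ExponentialLR gives lr * gamma ** epoch
lrs = [0.01 * (0.1 ** i) for i in range(10)]

plt.plot(range(1, len(lrs) + 1), lrs, marker="o")
plt.yscale("log")  # log scale turns exponential decay into a straight line
plt.xlabel("Epoch")
plt.ylabel("Learning rate")
plt.title("ExponentialLR decay (gamma=0.1)")
plt.savefig("lr_schedule.png")
```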

Summary

The learning rate is a very important hyperparameter that determines how well the model converges to the optimal solution. PyTorch offers a variety of learning rate schedulers with diverse properties that can be used with different model architectures to help the model converge to its best solution. Various decay methods can be applied, and the model can be validated at each iteration and batch to find the optimal learning rate for the model architecture in use.

Darshan M
Darshan holds a Master's degree in Data Science and Machine Learning and is an everyday learner of the latest trends in the field. He is keenly interested in learning new things, implementing them, and curating rich content for Data Science, Machine Learning, NLP and AI.
