The learning rate is an important hyperparameter that has to be chosen with care, because it determines how quickly, and how well, a model converges to a good solution. Finding a suitable learning rate is often tedious, and the best value can change as training progresses. Learning rate schedulers in PyTorch address this by adjusting the learning rate during training according to a predefined rule, so that the loss can be observed as the learning rate changes. In this article, let us try to understand learning rate schedulers in PyTorch in this context.
Table of Contents
- Introduction to learning rate scheduler in PyTorch
- Different learning rate schedulers in PyTorch
- The necessity of learning rate schedulers
- How to use learning rate schedulers in PyTorch?
- Summary
Introduction to learning rate scheduler in PyTorch
Learning rate schedulers in PyTorch are available in torch.optim.lr_scheduler, a submodule of the standard torch.optim package. torch.optim implements the most commonly used optimization algorithms, and the schedulers work on top of its optimizers to adjust the learning rate as training proceeds.
Now let us look at the learning rate scheduler in a little more detail. A scheduler is attached to an optimizer, so an optimizer object has to be created first. The optimizer holds the current state of the model parameters and updates them based on the computed gradients.
Let us now try to understand the meaning of the gradient with respect to the learning rate. The gradient gives the direction in which the loss changes fastest, and the learning rate scales the size of the step taken against that direction on the way to the optimal solution. Now let us look at sample code for creating the optimizer instance in the working environment.
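Before moving on to the optimizer, the role of the learning rate as a step size can be illustrated with a minimal sketch in plain Python. The function f(w) = (w - 3)^2 and the helper names gradient and descend are assumptions made for illustration only; they are not part of PyTorch.

```python
# Minimal sketch: gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3.
# The function and helper names are hypothetical, chosen only for illustration.

def gradient(w):
    # Derivative of (w - 3)^2 with respect to w.
    return 2.0 * (w - 3.0)

def descend(lr, steps=20, w=0.0):
    # Repeatedly step against the gradient; lr scales the size of each step.
    for _ in range(steps):
        w = w - lr * gradient(w)
    return w

small_lr_result = descend(lr=0.01)  # steps are tiny, so w is still far from 3
good_lr_result = descend(lr=0.1)    # w ends up close to 3
large_lr_result = descend(lr=1.1)   # each step overshoots, so w moves away from 3

print(small_lr_result, good_lr_result, large_lr_result)
```

A learning rate that is too small converges slowly and one that is too large diverges, which is the motivation for changing the learning rate over the course of training.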
Creating optimizer instance
The optimizer instance is created by choosing the required optimizer. Commonly used optimizers are Stochastic Gradient Descent (SGD) and Adam. The code below creates an SGD optimizer instance in the working environment.
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
Here we pass the model parameters, an initial learning rate, and a momentum value to the SGD optimizer. An optimizer can also be given per-parameter options by passing a list of dictionaries, where each dictionary defines a parameter group with its own settings. This way, a group either uses the default learning rate or a separate learning rate of its own. Now let us look into the different learning rate schedulers available in PyTorch.
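As a brief aside on the dictionary-based option mentioned above, the following sketch shows one way to give two parts of a model different learning rates. The two-layer model is a hypothetical stand-in, and the specific learning rate values are assumptions chosen for illustration.

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical two-layer model used only for illustration.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

optimizer = optim.SGD(
    [
        # No "lr" key: this group falls back to the default lr given below.
        {"params": model[0].parameters()},
        # This group overrides the default with its own learning rate.
        {"params": model[2].parameters(), "lr": 0.001},
    ],
    lr=0.01,
    momentum=0.9,
)

# Each dictionary becomes one entry in optimizer.param_groups.
group_lrs = [group["lr"] for group in optimizer.param_groups]
print(group_lrs)
```

A scheduler attached to this optimizer would then adjust the learning rate of each parameter group, starting from that group's own initial value.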
Different learning rate schedulers in PyTorch
PyTorch provides various learning rate schedulers in the torch.optim.lr_scheduler package, where lr_scheduler stands for learning rate scheduler. Each scheduler wraps an optimizer and updates its learning rate whenever the scheduler's step() method is called. In this section, let us look into the different learning rate schedulers in PyTorch and understand the functionality of each of them.
i) lr_scheduler.LambdaLR sets the learning rate of each parameter group to the initial learning rate multiplied by the value of a user-supplied function of the epoch.
ii) lr_scheduler.MultiplicativeLR multiplies the current learning rate of each parameter group by the factor returned by a user-supplied function of the epoch.
iii) lr_scheduler.StepLR decays the learning rate of each parameter group by a factor gamma every step_size epochs.
iv) lr_scheduler.MultiStepLR decays the learning rate of each parameter group by a factor gamma each time the number of epochs reaches one of the given milestones.
v) lr_scheduler.ConstantLR multiplies the learning rate of each parameter group by a small constant factor until the number of epochs reaches total_iters, after which the original learning rate is used again.
vi) lr_scheduler.LinearLR changes the multiplicative factor applied to the learning rate linearly from a start factor to an end factor until the number of epochs reaches total_iters.
vii) lr_scheduler.ExponentialLR decays the learning rate of each parameter group by a factor gamma every epoch.
The schedulers above change the learning rate each time scheduler.step() is called, which is typically once per epoch. There are also schedulers that follow a cosine curve, such as CosineAnnealingLR, and schedulers such as ChainedScheduler and SequentialLR that combine a list of schedulers into a single learning rate schedule.
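To make the differences between some of these schedulers concrete, the sketch below records the learning rate over a few epochs for StepLR, MultiStepLR and ExponentialLR. The helper lr_trace, the dummy parameter, and the chosen values of gamma, step_size and milestones are assumptions made for illustration.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR, MultiStepLR, ExponentialLR

def lr_trace(make_scheduler, epochs=6, base_lr=0.1):
    # Hypothetical helper: returns the learning rate used in each epoch.
    # A single dummy parameter is enough to build an optimizer.
    param = torch.nn.Parameter(torch.zeros(1))
    optimizer = SGD([param], lr=base_lr)
    scheduler = make_scheduler(optimizer)
    trace = []
    for _ in range(epochs):
        trace.append(optimizer.param_groups[0]["lr"])
        optimizer.step()   # a real loop would compute a loss and call backward() first
        scheduler.step()   # advance the schedule once per epoch
    return trace

# Halve the learning rate every 2 epochs: roughly 0.1, 0.1, 0.05, 0.05, 0.025, 0.025
step_trace = lr_trace(lambda opt: StepLR(opt, step_size=2, gamma=0.5))

# Divide by 10 at epochs 1 and 4: roughly 0.1, 0.01, 0.01, 0.01, 0.001, 0.001
multi_trace = lr_trace(lambda opt: MultiStepLR(opt, milestones=[1, 4], gamma=0.1))

# Halve the learning rate every epoch: roughly 0.1, 0.05, 0.025, 0.0125, ...
exp_trace = lr_trace(lambda opt: ExponentialLR(opt, gamma=0.5))

for name, trace in [("StepLR", step_trace), ("MultiStepLR", multi_trace), ("ExponentialLR", exp_trace)]:
    print(name, trace)
```

The values in the comments are approximate, since the schedulers accumulate small floating-point rounding differences.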
The necessity of learning rate schedulers
Learning rate schedulers in PyTorch come as a ready-to-use, built-in package with various functionalities. A suitable learning rate is necessary for the model to converge to a good solution. By using a learning rate scheduler during training, the loss can be tracked as the learning rate changes until the total number of iterations is reached.
With a scheduler, the learning rate used at each epoch can be checked against the loss, and the range of learning rates that yields a low loss can be used to bring the model to the optimal solution. The schedulers also implement different decay methods, so a different scheduler can be chosen for each task. In practice, the training loop reports the testing loss and accuracy for each epoch alongside the decaying learning rate, so these can be compared and the best learning rate value chosen for the model architecture.
Let us understand the usage and functionality of learning rate schedulers in PyTorch better through a case study.
How to use learning rate schedulers in PyTorch?
For the case study let us extract the Fashion MNIST data that is available in the datasets module of torchvision and perform certain transform operations in the working environment. In this section, a complete case study of using the ExponentialLR learning rate scheduler is shown.
Step-1: Importing the required libraries
As this case study is carried out in PyTorch, let us import some of the torch libraries.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import matplotlib.pyplot as plt
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import ExponentialLR
Step-2: Validating device configuration
The case study is carried out in Colab, so the device configuration is checked for the availability of an accelerator, falling back to the CPU when none is found.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
Step-3: Transforming the data
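The data is converted into tensors and normalized using the Compose function of torchvision's transforms module, as shown in the code below.
# Mean and standard deviation are passed as one-element tuples (one value per channel).
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])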
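The effect of this transform on a single pixel can be sketched in plain Python. ToTensor scales pixel values to the range 0 to 1, and Normalize then subtracts the mean and divides by the standard deviation. The helper name normalize_pixel is an assumption made for illustration.

```python
MEAN = 0.1307  # mean passed to transforms.Normalize in the case study
STD = 0.3081   # standard deviation passed to transforms.Normalize

def normalize_pixel(value):
    # Hypothetical helper mirroring Normalize for one pixel already scaled to [0, 1].
    return (value - MEAN) / STD

darkest = normalize_pixel(0.0)    # a black pixel maps to about -0.42
brightest = normalize_pixel(1.0)  # a white pixel maps to about 2.82
print(darkest, brightest)
```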
Step-4: PyTorch data processing and data loading
Let us use the FashionMNIST data, apply the transform defined above, and load the data in the working environment. The original code does not define batch_size, so a value of 64 is assumed here.
batch_size = 64  # assumed value; not specified in the original code

train_df = datasets.FashionMNIST('./data', train=True, download=True, transform=transform)
test_df = datasets.FashionMNIST('./data', train=False, transform=transform)

train_df_load = torch.utils.data.DataLoader(train_df, batch_size=batch_size, shuffle=True)
test_df_load = torch.utils.data.DataLoader(test_df, batch_size=batch_size)
Step-5: Creating a PyTorch model architecture
A PyTorch model architecture is created in the working environment, as shown below. The layers are declared in __init__ and the forward pass through them is defined in forward.
### Creating a Model architecture
class FashionMNIST_Net(nn.Module):
    def __init__(self):
        super(FashionMNIST_Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output
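The input size of 9216 for the first fully connected layer follows from the shape of the feature maps after the two convolutions and the pooling step. The short calculation below is a sketch of that arithmetic in plain Python; the helper name conv_out is an assumption made for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Output length of a convolution or pooling window along one spatial dimension.
    return (size + 2 * padding - kernel) // stride + 1

side = 28                                   # Fashion MNIST images are 28x28
side = conv_out(side, kernel=3)             # after conv1 (3x3, stride 1): 26
side = conv_out(side, kernel=3)             # after conv2 (3x3, stride 1): 24
side = conv_out(side, kernel=2, stride=2)   # after max_pool2d(x, 2): 12

flat_features = 64 * side * side            # conv2 produces 64 channels
print(flat_features)  # 9216
```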
Step-6: Creating the training function
A user-defined function is used to train the model with required optimizers and iterations. The structure of the user-defined function is shown below.
def train(model, optimizer, epoch, log_interval):
    model.train()
    for batch_idx, (data, target) in enumerate(train_df_load):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()              # clear gradients from the previous batch
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()                    # compute gradients
        optimizer.step()                   # update the parameters
        lr = optimizer.param_groups[0]["lr"]
        if batch_idx % log_interval == 0:
            print('Train Epoch: {} batch-{}\tLoss: {:.6f} Learning Rate: {}'.format(
                epoch, batch_idx, loss.item(), lr))
    # Record the learning rate used in this epoch (lrs is a global list).
    lrs.append(lr)
Step-7: Creating the testing function
A user-defined function is used to test the model where the model loss and predictions are being evaluated.
def test(model):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():                  # no gradients are needed for evaluation
        for data, target in test_df_load:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # Sum the per-sample losses so they can be averaged over the whole dataset.
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_df_load.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {:.4f}\n'.format(
        test_loss, correct / len(test_df_load.dataset)))
Step-8: Using the learning rate scheduler
To use the learning rate scheduler, the model and optimizer are created and the scheduler is attached to the optimizer. The model is then run through the user-defined training and testing functions for the given number of epochs, and scheduler.step() is called at the end of each epoch to decay the learning rate.
model = FashionMNIST_Net().to(device)
optimizer = optim.Adam(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.1)
epochs = 10
lrs = []
for epoch in range(1, epochs + 1):
    train(model, optimizer, epoch, 1000)
    test(model)
    scheduler.step()
Here we can see that in each epoch the model iterates over the training data in batches, and after each epoch the testing loss and accuracy are reported. By comparing the testing loss and accuracy across epochs, a suitable learning rate can be chosen for the model.
So this is how the ExponentialLR learning rate scheduler is used in PyTorch.
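The learning rates that this schedule should produce can be estimated in closed form, since ExponentialLR multiplies the learning rate by gamma once per epoch. The sketch below uses the initial learning rate of 0.01 and gamma of 0.1 from the case study; the variable name expected_lrs is an assumption, and the values PyTorch reports may differ slightly due to floating-point rounding.

```python
base_lr = 0.01   # initial learning rate passed to Adam in the case study
gamma = 0.1      # decay factor passed to ExponentialLR
epochs = 10

# Learning rate used during each epoch: base_lr * gamma^(epoch - 1).
expected_lrs = [base_lr * gamma ** (epoch - 1) for epoch in range(1, epochs + 1)]

for epoch, lr in enumerate(expected_lrs, start=1):
    print(f"epoch {epoch}: lr = {lr:.1e}")
```

With gamma set to 0.1 the learning rate shrinks by a factor of ten every epoch and is around 1e-11 by the tenth epoch, so later epochs barely change the weights. A gamma closer to 1 would decay more gently.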
Summary
The learning rate is a very important hyperparameter that governs how well the model converges to the optimal solution. PyTorch offers a variety of learning rate schedulers with different properties that can be used along with different model architectures to help the model converge. Various decay methods can be used, and the model can be evaluated at each epoch to find a suitable learning rate for the model architecture in use.