Have you ever wondered how humans have come so far? We learn from our mistakes and keep improving on the basis of them. The same holds for machines: like humans, they can learn from their mistakes. But how? In neural networks and AI, we let the algorithm search for the best prediction, but it cannot improve without a measure of how wrong its previous predictions were. This is where the loss function comes into the picture.
A loss function quantifies the mistakes a machine makes: the further the prediction of a machine learning algorithm is from the ground truth, the larger the loss, and the machine improves its outputs by decreasing that loss. Earlier, we had to implement loss functions by hand for each problem, but libraries like PyTorch now let us call a loss function in one line of code.
Today we will discuss the major PyTorch loss functions that are used extensively across machine learning tasks, with Python implementations inside a Jupyter notebook. Different problems, such as regression or classification, call for different loss functions, and PyTorch provides almost 19 of them.
Table of contents
- Loss function
- Getting started
- 1. Mean Absolute Error (nn.L1Loss)
- 2. Mean Squared Error (nn.MSELoss)
- 3. Binary Cross Entropy(nn.BCELoss)
- 4. BCEWithLogitsLoss(nn.BCEWithLogitsLoss)
- 5. Negative Log-Likelihood Loss(nn.NLLLoss)
- 6. PoissonNLLLoss (nn.PoissonNLLLoss)
- 7. Cross-Entropy Loss(nn.CrossEntropyLoss)
- 8. Hinge Embedding Loss(nn.HingeEmbeddingLoss)
- 9. Margin Ranking Loss (nn.MarginRankingLoss)
- 10. Smooth L1 Loss (nn.SmoothL1Loss)
- 11. Triplet Margin Loss Function(nn.TripletMarginLoss)
- 12. Kullback-Leibler divergence(nn.KLDivLoss)
- Wrapping Up
A loss function or cost function is a function that maps an event or values of one or more variables onto a real number intuitively representing some “cost” associated with the event. An optimization problem seeks to minimize a loss function. An objective function is either a loss function or its negative (in specific domains, variously called a reward function, a profit function, a utility function, a fitness function, etc.), in which case it is to be maximized. (Wikipedia)
You can try the tutorial below in Google Colab, which comes with the major data science packages, including PyTorch, preinstalled.
import torch
loss = torch.nn.L1Loss()
To run PyTorch locally on your machine, you can download it from here according to your build: https://pytorch.org/get-started/locally/
Torch is a tensor library like NumPy, with strong GPU support. torch.nn is a package inside the PyTorch library that helps us create and train neural networks. Read more about torch.nn here
Jump straight to the Jupyter Notebook here
1. Mean Absolute Error (nn.L1Loss)
It is the simplest form of error metric. Mean Absolute Error (MAE) measures the numerical distance between the predicted and true values by taking the absolute value of each difference, summing them, and dividing by the total number of data points. MAE is a linear score: every error contributes in proportion to its size. Let’s see how to calculate it without using the PyTorch module.
Algorithmic way of finding the loss function without the PyTorch module
import numpy as np

y_pred = np.array([0.000, 0.100, 0.200])
y_true = np.array([0.000, 0.200, 0.250])

# Defining Mean Absolute Error loss function
def mae(pred, true):
    # Find absolute difference
    differences = pred - true
    absolute_differences = np.absolute(differences)
    # Find the absolute mean
    mean_absolute_error = absolute_differences.mean()
    return mean_absolute_error

mae_value = mae(y_pred, y_true)
print("MAE error is: " + str(mae_value))
With PyTorch module(nn.L1Loss)
import torch

mae_loss = torch.nn.L1Loss()
# y_pred and y_true are the NumPy arrays defined above
input = torch.tensor(y_pred)
target = torch.tensor(y_true)
output = mae_loss(input, target)
print(output)
2. Mean Squared Error (nn.MSELoss)
Like Mean Absolute Error (MAE), Mean Squared Error (MSE) works on the paired differences between ground truth and prediction, but it squares each difference before summing them and dividing by the number of pairs.
The MSE loss function is generally used when larger errors should be penalized more heavily, since squaring magnifies them. A drawback is that it also squares the units of the data, so the loss is not directly comparable with the original quantities.
Mean-Squared Error using PyTorch
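import torch
import torch.nn as nn

input = torch.randn(3, 4, requires_grad=True)  # stand-in predictions
target = torch.randn(3, 4)
mse_loss = nn.MSELoss()
output = mse_loss(input, target)
output.backward()
print('input -: ', input)
print('target -: ', target)
print('output -: ', output)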
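For comparison with the MAE section, MSE can also be computed by hand. The following is a minimal sketch, assuming the same small arrays used in the MAE example; the helper name `mse` is introduced here only for illustration.

```python
import numpy as np
import torch

y_pred = np.array([0.000, 0.100, 0.200])
y_true = np.array([0.000, 0.200, 0.250])

def mse(pred, true):
    # Square each paired difference, then average over all pairs
    squared_differences = (pred - true) ** 2
    return squared_differences.mean()

mse_value = mse(y_pred, y_true)

# The same quantity computed through PyTorch
mse_torch = torch.nn.MSELoss()(torch.tensor(y_pred), torch.tensor(y_true))

print("MSE by hand:", mse_value)
print("MSE via nn.MSELoss:", mse_torch.item())
```

Both values should agree up to floating-point rounding.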
3. Binary Cross Entropy(nn.BCELoss)
This loss creates a criterion that measures the binary cross-entropy (BCE) between the target and the output. nn.BCELoss expects its inputs to be probabilities, so it is used together with the Sigmoid activation function, which works as a squashing function and limits the output to a range between 0 and 1.
Using Binary Cross Entropy loss function without Module
import numpy as np

y_pred = np.array([0.1580, 0.4137, 0.2285])
y_true = np.array([0.0, 1.0, 0.0])  # 2 labels: (0, 1)

def BCE(y_pred, y_true):
    total_bce_loss = np.sum(-y_true * np.log(y_pred) - (1 - y_true) * np.log(1 - y_pred))
    # Getting the mean BCE loss
    num_of_samples = y_pred.shape[0]
    mean_bce_loss = total_bce_loss / num_of_samples
    return mean_bce_loss

bce_value = BCE(y_pred, y_true)
print("BCE error is: " + str(bce_value))
Binary Cross Entropy(BCELoss) using PyTorch
import torch

bce_loss = torch.nn.BCELoss()
# y_pred already holds probabilities between 0 and 1;
# raw model outputs would first go through torch.nn.Sigmoid()
input = torch.tensor(y_pred)
target = torch.tensor(y_true)
output = bce_loss(input, target)
print(output)
4. BCEWithLogitsLoss(nn.BCEWithLogitsLoss)
It combines a Sigmoid layer and the BCELoss in one single class. Because the two operations are fused, it can use the log-sum-exp trick, which makes it more numerically stable than a plain Sigmoid followed by a BCELoss.
import torch

target = torch.ones([10, 64], dtype=torch.float32)  # 64 classes, batch size = 10
output = torch.full([10, 64], 1.5)  # A prediction (logit)
pos_weight = torch.ones([64])  # All weights are equal to 1
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
criterion(output, target)  # -log(sigmoid(1.5))
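To see that the fused class computes the same quantity as a Sigmoid followed by BCELoss, the two can be compared on moderate logits. This is a small sketch with made-up random data; the variable names are chosen here for illustration.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(4, 3)                     # raw scores, any real value
target = torch.randint(0, 2, (4, 3)).float()   # 0/1 labels

# Fused version: sigmoid and BCE in one step
fused = torch.nn.BCEWithLogitsLoss()(logits, target)

# Two-step version: explicit sigmoid, then BCELoss
two_step = torch.nn.BCELoss()(torch.sigmoid(logits), target)

print(fused.item(), two_step.item())
```

For logits of ordinary size the two results match; the fused version is the safer choice when logits can become very large in magnitude.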
5. Negative Log-Likelihood Loss(nn.NLLLoss)
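The negative log-likelihood loss is mostly used in classification problems. Here, likelihood refers to how probable the known data is under the parameters of the model, so minimizing the negative log-likelihood pushes the model to assign a high probability to the correct class.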
import torch
import torch.nn as nn

input = torch.randn(3, 5, requires_grad=True)
# every element in target should have value (0 <= value < C)
target = torch.tensor([1, 0, 4])
m = nn.LogSoftmax(dim=1)
nll_loss = nn.NLLLoss()
output = nll_loss(m(input), target)
output.backward()
print('input -: ', input)
print('target -: ', target)
print('output -: ', output)
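What nn.NLLLoss does with its default settings can be reproduced by hand: pick the log-probability of the correct class in each row, negate it, and average. The sketch below assumes random scores and the default mean reduction; `manual` and `builtin` are illustrative names.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
scores = torch.randn(3, 5)             # 3 samples, 5 classes
target = torch.tensor([1, 0, 4])       # correct class index per sample

log_probs = nn.LogSoftmax(dim=1)(scores)

# Select the log-probability of the correct class in each row, negate, average
manual = -log_probs[torch.arange(3), target].mean()
builtin = nn.NLLLoss()(log_probs, target)

print(manual.item(), builtin.item())
```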
6. PoissonNLLLoss (nn.PoissonNLLLoss)
This loss represents the Negative log likelihood loss with Poisson distribution of target, below is the formula for PoissonNLLLoss.
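Assuming the default log_input=True and leaving out the optional Stirling approximation term, the per-element loss can be written as follows, where x denotes the input (the log of the predicted rate) and y the target count; the symbol names are chosen here for illustration.

```latex
\ell(x, y) = e^{x} - y \, x
```

The result is then averaged over all elements under the default mean reduction.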
import torch
import torch.nn as nn

loss = nn.PoissonNLLLoss()
log_input = torch.randn(5, 2, requires_grad=True)
target = torch.randn(5, 2)
output = loss(log_input, target)
output.backward()
print(output)
7. Cross-Entropy Loss(nn.CrossEntropyLoss)
Cross-Entropy loss, or Categorical Cross-Entropy (CCE), is a combination of the Log Softmax and Negative Log-Likelihood loss functions. It is used for tasks with more than two classes, such as classifying a vehicle as a car, motorcycle, truck, etc.
The above formula is just the generalization of binary cross-entropy with an additional summation over all classes j.
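Written out for N samples and C classes, with one-hot targets and softmax probabilities computed from the raw scores, the categorical cross-entropy takes the following form. The symbols N, C, x, y, and p are notation assumed here for illustration.

```latex
L_{\mathrm{CE}} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} y_{ij} \log p_{ij},
\qquad
p_{ij} = \frac{e^{x_{ij}}}{\sum_{k=1}^{C} e^{x_{ik}}}
```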
import torch
import torch.nn as nn

input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
cross_entropy_loss = nn.CrossEntropyLoss()
output = cross_entropy_loss(input, target)
output.backward()
print('input: ', input)
print('target: ', target)
print('output: ', output)
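The relationship between Cross-Entropy, Log Softmax, and Negative Log-Likelihood can be checked directly. This is a short sketch on random logits with default settings; the variable names are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(3, 5)             # raw scores for 3 samples, 5 classes
target = torch.tensor([2, 0, 4])

# One call on raw logits
ce = nn.CrossEntropyLoss()(logits, target)

# The same thing in two steps: LogSoftmax, then NLLLoss
composed = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), target)

print(ce.item(), composed.item())
```

This is also why nn.CrossEntropyLoss should receive raw scores rather than softmax outputs.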
8. Hinge Embedding Loss(nn.HingeEmbeddingLoss)
Hinge Embedding loss computes the loss for an input tensor x and a label tensor y whose values are 1 or -1. The input is typically a distance between two samples, so the loss works as a binary, similar-versus-dissimilar objective and is commonly used when learning embeddings.
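import torch
import torch.nn as nn

input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5).sign()  # labels must be 1 or -1
hinge_loss = nn.HingeEmbeddingLoss()
output = hinge_loss(input, target)
output.backward()
print('input -: ', input)
print('target -: ', target)
print('output -: ', output)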
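The rule behind the hinge embedding loss is short enough to write by hand: for y = 1 the loss is x itself, and for y = -1 it is max(0, margin - x). The sketch below uses made-up distances and the default margin of 1.0 to compare that rule with the PyTorch module.

```python
import torch
import torch.nn as nn

x = torch.tensor([0.2, 1.5, 0.7, 2.0])      # e.g. distances between pairs
y = torch.tensor([1.0, 1.0, -1.0, -1.0])    # 1 = similar, -1 = dissimilar
margin = 1.0

# y == 1: penalize the distance itself
# y == -1: penalize only when the pair is closer than the margin
per_element = torch.where(y == 1, x, torch.clamp(margin - x, min=0))
manual = per_element.mean()
builtin = nn.HingeEmbeddingLoss(margin=margin)(x, y)

print(manual.item(), builtin.item())
```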
9. Margin Ranking Loss (nn.MarginRankingLoss)
Margin Ranking Loss computes a criterion on the relative ordering of two inputs rather than on their absolute values, which makes it very different from loss functions like MSE or Cross-Entropy.
This function calculates the loss from two inputs, x1 and x2, and a label tensor y containing 1 or -1. When y is 1, the first input is assumed to be the larger value and should be ranked higher than the second input. Similarly, if y = -1, the second input should be ranked higher. It is mostly used in ranking problems.
import torch
import torch.nn as nn

first_input = torch.randn(3, requires_grad=True)
second_input = torch.randn(3, requires_grad=True)
target = torch.randn(3).sign()
ranking_loss = nn.MarginRankingLoss()
output = ranking_loss(first_input, second_input, target)
output.backward()
print('input one: ', first_input)
print('input two: ', second_input)
print('target: ', target)
print('output: ', output)
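The ranking rule can be written as max(0, -y * (x1 - x2) + margin), averaged over the batch. The following sketch checks that expression against the module on hand-picked values, assuming the default margin of 0; the names `manual` and `builtin` are illustrative.

```python
import torch
import torch.nn as nn

x1 = torch.tensor([0.8, 0.1, 0.5])
x2 = torch.tensor([0.3, 0.6, 0.9])
y = torch.tensor([1.0, 1.0, -1.0])   # 1: x1 should rank higher, -1: x2 should
margin = 0.0

# A pair contributes loss only when it is ordered the wrong way round
manual = torch.clamp(-y * (x1 - x2) + margin, min=0).mean()
builtin = nn.MarginRankingLoss(margin=margin)(x1, x2, y)

print(manual.item(), builtin.item())
```

Only the second pair is mis-ordered here, so it is the only one that contributes to the loss.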
10. Smooth L1 Loss (nn.SmoothL1Loss)
It is closely related to Huber loss: it uses a squared term if the absolute error falls below 1, and an absolute term otherwise. Smooth L1 loss is less sensitive to outliers than loss functions like mean squared error, and in some cases it can also prevent exploding gradients.
# Assumes `dataset`, `model`, and the index `i` are defined elsewhere
sample, target = dataset[i]
target_predicted = model(sample)
loss = torch.nn.SmoothL1Loss()
loss_value = loss(target_predicted, target)  # (prediction, target)
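The piecewise definition can be checked on a few hand-picked values. The sketch below assumes the default threshold of 1 (the `beta` argument of nn.SmoothL1Loss) and includes one outlier to show how much smaller the loss is than MSE on the same data.

```python
import torch
import torch.nn as nn

pred = torch.tensor([0.0, 0.5, 3.0])    # the last value is an outlier
target = torch.tensor([0.0, 0.0, 0.0])

diff = (pred - target).abs()
# Squared term for small errors, absolute term for large ones
per_element = torch.where(diff < 1, 0.5 * diff ** 2, diff - 0.5)
manual = per_element.mean()

builtin = nn.SmoothL1Loss()(pred, target)
mse = nn.MSELoss()(pred, target)

print(manual.item(), builtin.item(), mse.item())
```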
11. Triplet Margin Loss Function(nn.TripletMarginLoss)
The Triplet Margin Loss function measures the relative similarity between samples, and it is used in content-based retrieval problems.
This function calculates the loss from three input tensors, x1, x2, and x3, and a margin with a value greater than zero. A triplet consists of an anchor a, a positive example p, and a negative example n; the loss pushes the anchor to be closer to the positive than to the negative by at least the margin.
import torch
import torch.nn as nn

anchor = torch.randn(100, 128, requires_grad=True)
positive = torch.randn(100, 128, requires_grad=True)
negative = torch.randn(100, 128, requires_grad=True)
triplet_margin_loss = nn.TripletMarginLoss(margin=1.0, p=2)
output = triplet_margin_loss(anchor, positive, negative)
output.backward()
print('anchors -: ', anchor)
print('positive -: ', positive)
print('negative -: ', negative)
print('output -: ', output)
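In formula form, the triplet loss is max(d(a, p) - d(a, n) + margin, 0), averaged over the batch. The sketch below reproduces it with F.pairwise_distance on small random tensors; the shapes and variable names are chosen here only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
anchor = torch.randn(4, 8)
positive = torch.randn(4, 8)
negative = torch.randn(4, 8)
margin = 1.0

# Euclidean distance from the anchor to the positive and to the negative
d_ap = F.pairwise_distance(anchor, positive, p=2)
d_an = F.pairwise_distance(anchor, negative, p=2)

# Loss is zero once the negative is further away than the positive by the margin
manual = torch.clamp(d_ap - d_an + margin, min=0).mean()
builtin = nn.TripletMarginLoss(margin=margin, p=2)(anchor, positive, negative)

print(manual.item(), builtin.item())
```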
12. Kullback-Leibler divergence(nn.KLDivLoss)
The Kullback-Leibler divergence, also known as the KL divergence, measures the amount of information lost when the predicted distribution is used to approximate the target distribution.
It measures how close two probability distributions are. If the value of the loss function is zero, the two probability distributions are the same.
KL divergence behaves much like the Cross-Entropy loss function. The difference is that cross-entropy also includes the entropy of the target distribution, while KL divergence measures only the extra cost of using the prediction in place of the target; when the targets are fixed, minimizing either one leads to the same model.
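import torch
import torch.nn as nn
import torch.nn.functional as F

# KLDivLoss expects log-probabilities as input and probabilities as target
input = F.log_softmax(torch.randn(2, 3, requires_grad=True), dim=1)
target = F.softmax(torch.randn(2, 3), dim=1)
kld_loss = nn.KLDivLoss(reduction='batchmean')
output = kld_loss(input, target)
output.backward()
print('input tensor: ', input)
print('target tensor: ', target)
print('Loss: ', output)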
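The batch-mean KL divergence can be computed by hand as the sum of target * (log(target) - log(prediction)) divided by the batch size. The following sketch assumes that nn.KLDivLoss receives log-probabilities as input and probabilities as target, and it also checks that identical distributions give zero loss; the names `p` and `log_q` are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
log_q = F.log_softmax(torch.randn(2, 3), dim=1)   # predicted log-probabilities
p = F.softmax(torch.randn(2, 3), dim=1)           # target probabilities

kld = nn.KLDivLoss(reduction='batchmean')

# Sum over classes and samples, then divide by the batch size
manual = (p * (p.log() - log_q)).sum() / p.shape[0]
builtin = kld(log_q, p)

# Identical distributions: the divergence is zero
same = kld(p.log(), p)

print(manual.item(), builtin.item(), same.item())
```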
That’s it: we covered the major PyTorch loss functions, their mathematical definitions, algorithm implementations, and hands-on use of the PyTorch API in Python.
The working notebook for the above guide is available here. You can find the full source code behind all these PyTorch loss function classes here. For the loss functions we didn’t cover in this tutorial, you can learn more about their usage from the references below: