# Loss Functions in Deep Learning: An Overview

Neural Network uses optimising strategies like stochastic gradient descent to minimize the error in the algorithm. The way we actually compute this error is by using a Loss Function. It is used to quantify how good or bad the model is performing. These are divided into two categories i.e.Regression loss and Classification Loss.

Neural Network uses optimising strategies like stochastic gradient descent to minimize the error in the algorithm. The way we actually compute this error is by using a Loss Function. It is used to quantify how good or bad the model is performing. These are divided into two categories i.e.Regression loss and Classification Loss.

In this article, we will cover some of the loss functions used in deep learning and implement each one of them by using Keras and python.

### Regression Loss Function

Regression Loss is used when we are predicting continuous values like the price of a house or sales of a company.

### 1.Mean Squared Error

Mean Squared Error is the mean of squared differences between the actual and predicted value. If the difference is large the model will penalize it as we are computing the squared difference.

Practical Implementation

from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from matplotlib import pyplot
# generate regression dataset
X, y = make_regression(n_samples=5000, n_features=20, noise=0.1, random_state=1)
# standardize dataset
X = StandardScaler().fit_transform(X)
y = StandardScaler().fit_transform(y.reshape(len(y),1))[:,0]
# split into train and test
train1 = 2500
trainX, testX = X[:train1, :], X[train1:, :]
trainy, testy = y[:train1], y[train1:]
# define model
model = Sequential()
opt = SGD(lr=0.01, momentum=0.9)
model.compile(loss='mean_squared_error', optimizer=opt)
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=50, verbose=0)
# evaluate the model
train_mse = model.evaluate(trainX, trainy, verbose=0)
test_mse = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_mse, test_mse))
# plot loss during training
pyplot.title('Mean Squared Error')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()

### 2.Mean Squared Logarithmic Error Loss

Suppose we want to reduce the difference between the actual and predicted variable we can take the natural logarithm of the predicted variable then take the mean squared error. This will overcome the problem possessed by the Mean Square Error Method. The model will now penalize less in comparison to the earlier method.

# define model
model = Sequential()
opt = SGD(lr=0.01, momentum=0.9)
model.compile(loss='mean_squared_logarithmic_error', optimizer=opt, metrics=['mse'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=50, verbose=0)
# evaluate the model
train_mse = model.evaluate(trainX, trainy, verbose=0)
test_mse = model.evaluate(testX, testy, verbose=0)
# plot loss during training
pyplot.title('Mean Squared Logarithmic Error Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()

### 3.Mean Absolute Error Loss

Sometimes there may be some data points which far away from rest of the points i.e outliers, in case of cases Mean Absolute Error Loss will be appropriate to use as it calculates the average of the absolute difference between the actual and predicted values.

# define model
model = Sequential()
opt = SGD(lr=0.01, momentum=0.9)
model.compile(loss='mean_absolute_error', optimizer=opt, metrics=['mse'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=50, verbose=0)
# evaluate the model
train_mse = model.evaluate(trainX, trainy, verbose=0)
test_mse = model.evaluate(testX, testy, verbose=0)
# plot loss during training
pyplot.title('Mean Absolute Error Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()

### Binary Classification Loss Function

Suppose we are dealing with a Yes/No situation like “a person has diabetes or not”, in this kind of scenario Binary Classification Loss Function is used.

### 1.Binary Cross Entropy Loss

It gives the probability value between 0 and 1 for a classification task. Cross-Entropy calculates the average difference between the predicted and actual probabilities.

# Cross entropy loss
from sklearn.datasets import make_circles
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
from matplotlib import pyplot
# generate 2d classification dataset
X, y = make_circles(n_samples=5000, noise=0.1, random_state=1)
# split into train and test
train1 = 2500
trainX, testX = X[:train1, :], X[train1:, :]
trainy, testy = y[:train1], y[train1:]
# define model
model = Sequential()
opt = SGD(lr=0.01, momentum=0.9)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=50, verbose=0)
# evaluate the model
train_acc = model.evaluate(trainX, trainy, verbose=0)
test_acc = model.evaluate(testX, testy, verbose=0)
# plot loss during training
pyplot.title('Binary Cross Entropy Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()

### 2.Hinge Loss

This type of loss is used when the target variable has 1 or -1 as class labels. It penalizes the model when there is a difference in the sign between the actual and predicted class values.

These are particularly used in SVM models.

# Hinge Loss
model = Sequential()
opt = SGD(lr=0.01, momentum=0.9)
model.compile(loss='hinge', optimizer=opt, metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=50, verbose=0)
# evaluate the model
train_acc = model.evaluate(trainX, trainy, verbose=0)
test_acc = model.evaluate(testX, testy, verbose=0)
# plot loss during training
pyplot.title('Hinge Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()

### Multi-Class Classification Loss Function

If we take a dataset like Iris where we need to predict the three-class labels: Setosa, Versicolor and Virginia, in such cases where the target variable has more than two classes Multi-Class Classification Loss function is used.

### 1.Categorical Cross Entropy Loss

These are similar to binary classification cross-entropy, used for multi-class classification problems.

# Multi-class Cross-entropy loss
from sklearn.datasets import make_blobs
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import SGD
from keras.utils import to_categorical
from matplotlib import pyplot
# generate 2d classification dataset
X, y = make_blobs(n_samples=5000, centers=3, n_features=2, cluster_std=2, random_state=2)
# one hot encode output variable
y = to_categorical(y)
# split into train and test
train1 = 500
trainX, testX = X[:train1, :], X[train1:, :]
trainy, testy = y[:train1], y[train1:]
# define model
model = Sequential()
# compile model
opt = SGD(lr=0.01, momentum=0.9)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=50, verbose=0)
# evaluate the model
train_acc = model.evaluate(trainX, trainy, verbose=0)
test_acc = model.evaluate(testX, testy, verbose=0)
# plot loss during training
pyplot.title('Categorical Cross Entropy')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()

### 2. Kullback Leibler Divergence Loss

Kullback Leibler Divergence Loss calculates how much a given distribution is away from the true distribution. These are used to carry out complex operations like autoencoder where there is a need to learn the dense feature representation.

# KL Divergence
model = Sequential()
# compile model
opt = SGD(lr=0.01, momentum=0.9)
model.compile(loss='kullback_leibler_divergence', optimizer=opt, metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=50, verbose=0)
# evaluate the model
train_acc = model.evaluate(trainX, trainy, verbose=0)
test_acc = model.evaluate(testX, testy, verbose=0)
# plot loss during training
pyplot.title('KL Divergence Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()

### Conclusion

In this blog, we have covered most of the loss functions that are used in deep learning for regression and classification problem. Further, we can experiment with this loss function and check which is suitable for a particular problem. Hope this blog is useful to you.

The complete code of the above implementation is available at the AIM’s GitHub repository. Please visit this link to find the notebook of this code.

## More Great AIM Stories

### Virtual Real Estate: Stuff Science Fictions Are Made Of

A data analyst with expertise in statistical analysis, data visualization ready to serve the industry using various analytical platforms. I look forward to having in-depth knowledge of machine learning and data science. Outside work, you can find me as a fun-loving person with hobbies such as sports and music.

## Our Upcoming Events

Conference, in-person (Bangalore)
MachineCon 2022
24th Jun

Conference, Virtual
Deep Learning DevCon 2022
30th Jul

Conference, in-person (Bangalore)
Cypher 2022
21-23rd Sep

### Discord Server

Stay Connected with a larger ecosystem of data science and ML Professionals

### Telegram Channel

Discover special offers, top stories, upcoming events, and more.

##### Why the high accuracy in classification is not always correct?

The high accuracy of classification model could be misleading.

LTI and Mindtree both play in Analytics services businesses, just like most other large IT/ITes service providers. But, what would the analytics services business of the merged entity look like?

##### GitHub now offers math support in markdown

GitHub’s math rendering capability uses MathJax; an open-source, JavaScript-based display engine.

Meta recently organised messaging event called ‘Conversations.’

##### Wipro announces 40,000 sq.ft. Innovation Studio in Texas

The studio will leverage Wipro’s deep reservoir of IPs, patents, and innovation DNA.

##### Google’s facial recognition tech to replace smart cards in Bengaluru metro trains￼

BMRCL plans to introduce the technology at its automatic fare collection gates.

##### Data science hiring process at DealShare

In the next few months, DealShare looks to grow its data science team by 15-20 members.

##### DeepMind’s AlphaFold 2 is half of the story

The idea was if I give you a sequence of amino acids, can you predict what will be the structure or the shape that it will take in the 3D space?

##### Lenskart invests USD 2 Mn in location intelligence platform GeoIQ

GeoIQ’s AI-based location tool will help Lenskart with its aggressive store rollout strategy.