Ultimate Guide To Loss Functions In TensorFlow Keras API With Python Implementation

We covered the PyTorch loss function implementations in a previous article. This article moves on to TensorFlow, another widely used deep learning library, and walks through the loss functions it supports. TensorFlow ships about 15 different loss functions, and they are available in two formats: you can use each one either as a class or as a plain function.

All the loss functions are available under the Keras module, much as the PyTorch loss functions are available under the torch package. You can access the TensorFlow loss functions through the tf.keras.losses module.

TensorFlow Keras Loss Functions

Remember, Keras is a deep learning API written in Python that runs on top of TensorFlow. Don't confuse the two: Keras and TensorFlow each have their own documentation of the loss functions, but both describe the same code.

You can refer to either one, as they are integrated with each other.

All losses are available both via a class handle and via a function handle. The class handles enable you to pass configuration arguments to the constructor (e.g. loss_fn = BinaryCrossentropy(from_logits=True)), and they perform reduction by default when used standalone. The function handles return the unreduced, per-sample loss values.
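
To make "reduction" concrete, here is a minimal NumPy sketch of what happens to the per-sample losses. It is not TensorFlow's code: the helper name reduce_loss and the sample values are hypothetical, and it only mimics the 'sum_over_batch_size', 'sum' and 'none' reduction modes.

```python
import numpy as np

def reduce_loss(per_sample, reduction="sum_over_batch_size"):
    """Hypothetical helper mimicking how a class handle reduces per-sample losses."""
    per_sample = np.asarray(per_sample, dtype=float)
    if reduction == "sum_over_batch_size":
        return per_sample.sum() / per_sample.size  # mean over the batch
    if reduction == "sum":
        return per_sample.sum()
    if reduction == "none":
        return per_sample  # unreduced, like the output of a function handle
    raise ValueError(f"unknown reduction: {reduction}")

# Illustrative per-sample losses for a batch of three examples
per_sample = [0.2, 0.4, 0.9]
batch_mean = reduce_loss(per_sample)        # default reduction
batch_sum = reduce_loss(per_sample, "sum")
print(batch_mean, batch_sum)
```

With the default reduction you get one scalar for the whole batch, which is what the standalone examples in this article print.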

The loss functions fall into three major categories:

  1. Probabilistic Losses
    1. Binary Crossentropy
    2. Categorical Crossentropy
    3. Sparse Categorical Crossentropy
    4. Poisson
    5. Kullback-Leibler Divergence
  2. Regression losses
    1. Mean Squared Error
    2. Mean Absolute Error
    3. Mean Absolute Percentage Error
    4. Mean Squared Logarithmic Error
    5. Cosine Similarity
    6. Huber
    7. Log Cosh
  3. Hinge losses for “maximum-margin” classification
    1. Hinge
    2. Squared Hinge
    3. Categorical Hinge


You can use a loss function by instantiating it from tf.keras.losses, as shown in the command below. We also import NumPy for the upcoming sample usage of the loss functions:

import tensorflow as tf
import numpy as np
bce_loss = tf.keras.losses.BinaryCrossentropy()

1. Binary Cross-Entropy(BCE) loss

BCE computes the cross-entropy between the true labels and the predicted outputs. It is mainly used for problems with only two label classes, such as dog vs. cat classification (0 or 1). For each example, it expects a single floating-point value per prediction.

Here is the standalone usage of binary cross-entropy loss with sample y_true and y_pred data points:

y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.5, 0.4], [0.4, 0.5]]

# Using 'auto'/'sum_over_batch_size' reduction type
bce_loss = tf.keras.losses.BinaryCrossentropy()
bce_loss(y_true, y_pred).numpy()

You can also weight individual samples by passing sample_weight, as in the command below:

bce_loss(y_true, y_pred, sample_weight=[1, 0]).numpy()
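
To see where these numbers come from, here is a minimal NumPy sketch that recomputes binary cross-entropy by hand for the same data. It is an illustration rather than TensorFlow's implementation, and the clipping epsilon of 1e-7 is an assumption made for this sketch.

```python
import numpy as np

y_true = np.array([[0., 1.], [0., 0.]])
y_pred = np.array([[0.5, 0.4], [0.4, 0.5]])

# Clip predictions to avoid log(0); epsilon is an assumed value
eps = 1e-7
p = np.clip(y_pred, eps, 1 - eps)

# Element-wise cross-entropy, averaged over the last axis: one value per sample
per_sample = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p), axis=-1)

# 'sum_over_batch_size' reduction: mean over the batch (about 0.703)
bce = per_sample.mean()

# sample_weight=[1, 0] scales each sample before the same reduction (about 0.402)
bce_weighted = (per_sample * np.array([1., 0.])).sum() / len(per_sample)
print(bce, bce_weighted)
```

Note that the weighted sum is still divided by the batch size, not by the sum of the weights.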

2. Categorical Crossentropy loss

The categorical cross-entropy loss function computes the loss between labels and predictions. It is used when there are two or more label classes, as in animal classification: cat, dog, elephant, horse, etc. It expects the labels in a one-hot representation. If you want to provide labels as integers, use the SparseCategoricalCrossentropy loss instead.

The example below shows the standalone usage. The shape of both y_pred and y_true is [batch_size, num_classes].

# inputs
y_true = [[0, 1, 0], [0, 0, 1]]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]

# Using 'auto'/'sum_over_batch_size' reduction type.
cce_loss = tf.keras.losses.CategoricalCrossentropy()
cce_loss(y_true, y_pred).numpy()
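
As a sanity check, the same value can be recomputed by hand. The following is a minimal NumPy sketch, not TensorFlow's implementation; the clipping epsilon is an assumption that keeps log(0) out of the computation.

```python
import numpy as np

y_true = np.array([[0, 1, 0], [0, 0, 1]], dtype=float)
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])

eps = 1e-7  # assumed value, only used to avoid log(0)
p = np.clip(y_pred, eps, 1 - eps)

# Only the probability assigned to the true class contributes to each sample
per_sample = -np.sum(y_true * np.log(p), axis=-1)
cce = per_sample.mean()  # about 1.177
print(per_sample, cce)
```

The second sample dominates the result because the model assigned only 0.1 to its true class.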

3. Sparse Categorical Crossentropy loss

It is used when there are two or more label classes in the problem and the labels are provided as integers. If you want to provide labels using the one-hot encoding method, you should use the CategoricalCrossentropy loss above.

Here, y_true holds a single value (the class index) per example, and y_pred holds # classes floating-point values per example.

y_true = [1, 2]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
assert loss.shape == (2,)
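
The relationship between the sparse and the one-hot variant can be sketched in NumPy as follows. This is an illustration under the assumption that y_pred already contains probabilities; it is not the library's code.

```python
import numpy as np

y_true = np.array([1, 2])  # integer class indices
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])

# Pick the predicted probability of the true class in each row
picked = y_pred[np.arange(len(y_true)), y_true]
sparse_loss = -np.log(picked)

# Same result via one-hot labels, the format CategoricalCrossentropy expects
one_hot = np.eye(y_pred.shape[1])[y_true]
dense_loss = -np.log((one_hot * y_pred).sum(axis=-1))
print(sparse_loss, dense_loss)
```

Both paths give one loss value per sample, which matches the shape check in the snippet above.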

4. Poisson loss

Computes the Poisson loss between y_true and y_pred. The Poisson loss is the mean of the elements of the tensor y_pred - y_true * log(y_pred).

y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [0., 0.]]

# Using 'auto'/'sum_over_batch_size' reduction type.
p = tf.keras.losses.Poisson()
p(y_true, y_pred).numpy()
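
The formula above can be checked by hand with a minimal NumPy sketch. The small epsilon added inside the logarithm is an assumption of this sketch, used to keep log(0) finite.

```python
import numpy as np

y_true = np.array([[0., 1.], [0., 0.]])
y_pred = np.array([[1., 1.], [0., 0.]])

eps = 1e-7  # assumed value; keeps log() finite when y_pred is 0
per_sample = np.mean(y_pred - y_true * np.log(y_pred + eps), axis=-1)
poisson = per_sample.mean()  # about 0.5
print(per_sample, poisson)
```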

5. Kullback-Leibler Divergence loss

KL(P || Q) = - sum over x in X of P(x) * log(Q(x) / P(x))

For each event x, KL divergence multiplies the probability P(x) by the log of the ratio Q(x) / P(x), and then takes the negative sum of these terms over all events. Equivalently, it is the sum of P(x) * log(P(x) / Q(x)).

The KLDivergence loss function computes this divergence between y_true and y_pred as loss = y_true * log(y_true / y_pred), summed over the last axis:

y_true = [[0, 1], [0, 0]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
# Using 'auto'/'sum_over_batch_size' reduction type.
kl = tf.keras.losses.KLDivergence()
kl(y_true, y_pred).numpy()
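
Here is a minimal NumPy sketch of the same computation. It assumes both tensors are clipped to [1e-7, 1] before taking the logarithm; the epsilon value is an assumption of this sketch, not something stated in the article.

```python
import numpy as np

y_true = np.array([[0., 1.], [0., 0.]])
y_pred = np.array([[0.6, 0.4], [0.4, 0.6]])

eps = 1e-7  # assumed clipping value
t = np.clip(y_true, eps, 1.0)
p = np.clip(y_pred, eps, 1.0)

# sum of P(x) * log(P(x) / Q(x)) over the last axis, one value per sample
per_sample = np.sum(t * np.log(t / p), axis=-1)
kl_value = per_sample.mean()  # about 0.458
print(per_sample, kl_value)
```

Only the entry where y_true is 1 contributes noticeably; the clipped zeros add a negligible amount.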

Learn more: https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence

6. Mean Squared Error(MSE) 

Computes the mean of squares of errors between labels and predictions.

MSE tells you how close a regression line is to a set of points. It does this by taking the distances from the points to the regression line (the errors) and squaring them. The squaring removes the negative signs and also gives more weight to larger differences. Averaging these squared errors gives the mean squared error.

y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [1., 0.]]
# Using 'auto'/'sum_over_batch_size' reduction type.
mse = tf.keras.losses.MeanSquaredError()
mse(y_true, y_pred).numpy()

7. Mean Absolute Error(MAE)

Computes the mean of the absolute difference between labels and predictions.

The absolute error is the magnitude of the difference between the measured value and the "true" value. For example, if a scale states 80 kg but you know your true weight is 79 kg, then the scale has an absolute error of 80 kg - 79 kg = 1 kg.

y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [1., 0.]]
# Using 'auto'/'sum_over_batch_size' reduction type.
mae = tf.keras.losses.MeanAbsoluteError()
mae(y_true, y_pred).numpy()
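
The difference between squaring and taking absolute values is easiest to see side by side. The following minimal NumPy sketch recomputes MSE and MAE for the sample data and then for a hypothetical set of errors containing one outlier (the outlier values are made up for illustration).

```python
import numpy as np

y_true = np.array([[0., 1.], [0., 0.]])
y_pred = np.array([[1., 1.], [1., 0.]])

# Per-sample mean over the last axis, then mean over the batch
mse_value = np.mean((y_true - y_pred) ** 2, axis=-1).mean()
mae_value = np.mean(np.abs(y_true - y_pred), axis=-1).mean()

# Illustrative errors with one outlier
errors = np.array([1., 1., 10.])
mse_outlier = np.mean(errors ** 2)     # dominated by the outlier
mae_outlier = np.mean(np.abs(errors))  # grows only linearly with it
print(mse_value, mae_value, mse_outlier, mae_outlier)
```

On the sample data both losses agree because every error is 0 or 1; with the outlier, MSE is several times larger than MAE.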

8. Mean Absolute Percentage Error(MAPE)

Also known as mean absolute percentage deviation (MAPD), MAPE is a measure of the prediction accuracy of a forecasting method in statistics, for example in trend estimation. It is also used as a loss function for regression problems in machine learning. It usually expresses the accuracy as a ratio, scaled to a percentage, defined by the formula:

loss = 100 * mean(|(y_true - y_pred) / y_true|)

The loss computes the mean absolute percentage error between the y_true and y_pred data points, as shown in the standalone usage below:

y_true = [[2., 1.], [2., 3.]]
y_pred = [[1., 1.], [1., 0.]]
# Using 'auto'/'sum_over_batch_size' reduction type.
mape = tf.keras.losses.MeanAbsolutePercentageError()
mape(y_true, y_pred).numpy()
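
A minimal NumPy sketch of the same computation follows. The epsilon guard in the denominator is an assumption of this sketch, there to avoid dividing by zero when a true value is 0.

```python
import numpy as np

y_true = np.array([[2., 1.], [2., 3.]])
y_pred = np.array([[1., 1.], [1., 0.]])

eps = 1e-7  # assumed guard against division by zero
ratio = np.abs(y_true - y_pred) / np.maximum(np.abs(y_true), eps)

per_sample = 100.0 * np.mean(ratio, axis=-1)  # percentage error per sample
mape_value = per_sample.mean()
print(per_sample, mape_value)
```

The first sample is off by 25% on average and the second by 75%, so the batch value is 50.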

9. Mean Squared Logarithmic Error(MSLE)

MSLE is a measure of the ratio between the true and predicted values.

Mean squared logarithmic error is, as the name suggests, a variation of the mean squared error that only cares about the relative (percentage) difference. That means MSLE treats a small difference between small true and predicted values roughly the same as a big difference between large true and predicted values.

y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [1., 0.]]
# Using 'auto'/'sum_over_batch_size' reduction type.
msle = tf.keras.losses.MeanSquaredLogarithmicError()
msle(y_true, y_pred).numpy()
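
The "relative difference" behaviour can be sketched in NumPy. The helper below is a hypothetical re-implementation that compares log(1 + y_true) with log(1 + y_pred); the 10 vs. 20 and 1000 vs. 2000 values are made up for illustration.

```python
import numpy as np

def msle(y_true, y_pred):
    """Mean of squared differences between log(1 + y_true) and log(1 + y_pred)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)

# Same data as the snippet above (about 0.240)
msle_value = msle([[0., 1.], [0., 0.]], [[1., 1.], [1., 0.]])

# Both predictions are off by a factor of two, at very different scales
small = msle([10.], [20.])
large = msle([1000.], [2000.])
print(msle_value, small, large)
```

The two factor-of-two errors produce similar MSLE values, whereas their squared errors would differ by a factor of 10,000.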

10. CosineSimilarity loss

Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space. This loss function computes the cosine similarity between labels and predictions and returns its negative, so that minimizing the loss maximizes the similarity.

  • The loss is just a number between -1 and 1.
  • When it is a negative number between -1 and 0, then
    • 0 indicates orthogonality,
    • and values closer to -1 indicate greater similarity.
  • Values closer to 1 indicate greater dissimilarity.

y_true = [[0., 1.], [1., 1.]]
y_pred = [[1., 0.], [1., 1.]]

# Using 'auto'/'sum_over_batch_size' reduction type.
cosine_loss = tf.keras.losses.CosineSimilarity(axis=1)
cosine_loss(y_true, y_pred).numpy()
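
The sign convention is easy to verify by hand. This minimal NumPy sketch assumes the loss is the negative dot product of the L2-normalized vectors; the helper name l2_normalize is hypothetical.

```python
import numpy as np

y_true = np.array([[0., 1.], [1., 1.]])
y_pred = np.array([[1., 0.], [1., 1.]])

def l2_normalize(x, axis=1):
    """Scale each row to unit length (rows must be non-zero)."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Negative cosine similarity per sample
per_sample = -np.sum(l2_normalize(y_true) * l2_normalize(y_pred), axis=1)
cosine_value = per_sample.mean()
print(per_sample, cosine_value)
```

The first pair of vectors is orthogonal (loss 0), the second pair is identical (loss -1), so the batch mean is -0.5.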

11. Huber loss

This function is quadratic for small values of the error and linear for large values. It computes the Huber loss between y_true and y_pred.

For each value x in error = y_true - y_pred:

loss = 0.5 * x^2                  if |x| <= d
loss = 0.5 * d^2 + d * (|x| - d)  if |x| > d

Here d is delta.

y_true = [[0, 1], [0, 0]]
y_pred = [[0.5, 0.4], [0.4, 0.5]]

# Using 'auto'/'sum_over_batch_size' reduction type.
hub_loss = tf.keras.losses.Huber()
hub_loss(y_true, y_pred).numpy()
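
The piecewise definition above translates almost directly into NumPy. This is a minimal sketch with a hypothetical helper named huber and delta assumed to be 1.0; the error of 3.0 is a made-up value chosen to hit the linear branch.

```python
import numpy as np

def huber(y_true, y_pred, delta=1.0):
    """Element-wise Huber loss: quadratic for |x| <= delta, linear otherwise."""
    x = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    abs_x = np.abs(x)
    quadratic = 0.5 * x ** 2
    linear = 0.5 * delta ** 2 + delta * (abs_x - delta)
    return np.where(abs_x <= delta, quadratic, linear)

# Same data as above: every error is below delta, so only the quadratic branch is used
elementwise = huber([[0, 1], [0, 0]], [[0.5, 0.4], [0.4, 0.5]])
huber_value = elementwise.mean(axis=-1).mean()

# An error of 3.0 falls in the linear branch: 0.5 + 1 * (3 - 1) = 2.5
large_error = huber([0.0], [3.0])
print(huber_value, large_error)
```

For comparison, a purely quadratic loss would charge 0.5 * 3^2 = 4.5 for the same large error.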

12. LogCosh loss

Computes the logarithm of the hyperbolic cosine of the prediction error.

y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [0., 0.]]

# Using 'auto'/'sum_over_batch_size' reduction type.
logcosh_loss = tf.keras.losses.LogCosh()
logcosh_loss(y_true, y_pred).numpy()
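
The definition is short enough to recompute by hand. A minimal NumPy sketch, assuming the error is taken as y_pred - y_true (the sign does not matter, since cosh is symmetric):

```python
import numpy as np

y_true = np.array([[0., 1.], [0., 0.]])
y_pred = np.array([[1., 1.], [0., 0.]])

x = y_pred - y_true
per_sample = np.mean(np.log(np.cosh(x)), axis=-1)
logcosh_value = per_sample.mean()  # about 0.108
print(per_sample, logcosh_value)
```

Only one element has a non-zero error here, so the result is log(cosh(1)) divided by the four elements.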

13. Hinge loss

In machine learning and deep learning applications, the hinge loss is a loss function used for training classifiers. It is used for "maximum-margin" classification problems, most notably with support vector machines (SVMs).

Here y_true values are expected to be -1 or 1. If binary labels (0 or 1) are provided, they are converted to -1 or 1.

y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.5, 0.4], [0.4, 0.5]]

# Using 'auto'/'sum_over_batch_size' reduction type.
h_loss = tf.keras.losses.Hinge()
h_loss(y_true, y_pred).numpy()
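
The label conversion and the margin computation can be sketched in NumPy as follows. This assumes the loss is mean(max(1 - y_true * y_pred, 0)) after the labels have been mapped to -1 or 1; it is an illustration, not the library's code.

```python
import numpy as np

y_true = np.array([[0., 1.], [0., 0.]])
y_pred = np.array([[0.5, 0.4], [0.4, 0.5]])

# Map binary 0/1 labels to -1/1
y_signed = 2.0 * y_true - 1.0

# Penalize predictions whose margin y * y_pred is below 1
per_sample = np.mean(np.maximum(1.0 - y_signed * y_pred, 0.0), axis=-1)
hinge_value = per_sample.mean()
print(per_sample, hinge_value)
```

Every prediction here sits inside the margin, so every element contributes a positive penalty.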

14. Squared Hinge loss

Similarly, squared hinge loss squares each hinge term, max(1 - y_true * y_pred, 0), before averaging:

y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]

# Using 'auto'/'sum_over_batch_size' reduction type.  
h = tf.keras.losses.SquaredHinge()
h(y_true, y_pred).numpy()
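
Note that the squaring is applied element by element, not to the final hinge value. A minimal NumPy sketch under the same assumptions as for hinge loss (binary labels mapped to -1 or 1) shows the difference:

```python
import numpy as np

y_true = np.array([[0., 1.], [0., 0.]])
y_pred = np.array([[0.6, 0.4], [0.4, 0.6]])

y_signed = 2.0 * y_true - 1.0
margins = np.maximum(1.0 - y_signed * y_pred, 0.0)

# Squared hinge: square each term, then average
squared_hinge_value = np.mean(margins ** 2, axis=-1).mean()

# Not the same as squaring the averaged hinge loss
hinge_then_square = np.mean(margins, axis=-1).mean() ** 2
print(squared_hinge_value, hinge_then_square)
```

For this data the squared hinge loss is about 1.86, while squaring the plain hinge loss would give about 1.69.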

15. CategoricalHinge loss

Computes the categorical hinge loss between y_true and y_pred.

y_true = [[0, 1], [0, 0]]
y_pred = [[0.5, 0.4], [0.4, 0.5]]

# Using 'auto'/'sum_over_batch_size' reduction type.
h = tf.keras.losses.CategoricalHinge()
h(y_true, y_pred).numpy()
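
A minimal NumPy sketch of the computation, assuming the loss is max(neg - pos + 1, 0), where pos is the score of the true class and neg is the highest score among the other classes:

```python
import numpy as np

y_true = np.array([[0., 1.], [0., 0.]])
y_pred = np.array([[0.5, 0.4], [0.4, 0.5]])

pos = np.sum(y_true * y_pred, axis=-1)          # score of the true class
neg = np.max((1.0 - y_true) * y_pred, axis=-1)  # best score among the other classes

per_sample = np.maximum(neg - pos + 1.0, 0.0)
cat_hinge_value = per_sample.mean()
print(per_sample, cat_hinge_value)
```

The second sample has no true class in y_true, so pos is 0 and the loss is driven by the largest prediction alone.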


We have discussed almost all the major loss functions supported by the TensorFlow Keras API, and we covered the PyTorch loss functions previously. For more details, you can follow the official documentation.

Mohit Maithani
Mohit is a Data & Technology Enthusiast with good exposure to solving real-world problems in various avenues of IT and Deep learning domain. He believes in solving human's daily problems with the help of technology.