
We covered the PyTorch loss function implementations in a previous article. Now we move on to another widely used library: today we discuss the loss functions supported by **TensorFlow**. TensorFlow supports about 15 different loss functions, and most of them are available in both class and function form, so you can either instantiate them as a class or call them as a function.

The class handles let you pass configuration arguments to the constructor (e.g. **loss_fn = CategoricalCrossentropy(from_logits=True)**), and they perform reduction by default when used standalone. All the loss functions live under the Keras module, just as the PyTorch loss functions live under the torch module. You can access the TensorFlow loss functions through the **tf.keras.losses** module.

## Table of contents

- Tensorflow Keras Loss functions
- Implementation
- 1. Binary Cross-Entropy(BCE) loss
- 2. Categorical Crossentropy loss
- 3. Sparse Categorical Crossentropy loss
- 4. Poisson loss
- 5. Kullback-Leibler Divergence loss
- 6. Mean Squared Error(MSE)
- 7. MeanAbsoluteError
- 8. Mean Absolute Percentage Error(MAPE)
- 9. Mean Squared Logarithmic Error(MSLE)
- 10. CosineSimilarity loss
- 11. Huber loss
- 12. LogCosh loss
- 13. Hinge loss
- 14. Squared Hinge loss
- 15. CategoricalHinge loss
- Conclusion

## Tensorflow Keras Loss functions

Remember, **Keras** is a deep learning API written in Python that runs on top of **TensorFlow**. Don't confuse the two: Keras and TensorFlow each have their own documentation of the loss functions, but the code is the same. You can check them out here:

- Keras documentation
- Tensorflow Documentation

You can refer to either one, as the two are integrated with each other.

All losses are available both via a class handle and via a function handle. The class handles let you pass configuration arguments to the constructor (e.g. **loss_fn = BinaryCrossentropy(from_logits=True)**), and they perform reduction by default when used standalone.

Now we have three major categories of Loss functions:

- Probabilistic losses
  - Binary Crossentropy
  - Categorical Crossentropy
  - Sparse Categorical Crossentropy
  - Poisson
  - Kullback-Leibler Divergence
- Regression losses
  - Mean Squared Error
  - Mean Absolute Error
  - Mean Absolute Percentage Error
  - Mean Squared Logarithmic Error
  - Cosine Similarity
  - Huber
  - Log Cosh
- Hinge losses for “maximum-margin” classification
  - Hinge
  - Squared Hinge
  - Categorical Hinge

## Implementation

You can use a loss function by simply calling it from **tf.keras.losses**, as shown in the command below. We also import NumPy for the sample usages of the loss functions that follow:

```
import tensorflow as tf
import numpy as np
bce_loss = tf.keras.losses.BinaryCrossentropy()
```

## 1. Binary Cross-Entropy(BCE) loss

BCE computes the cross-entropy between the true labels and the predicted outputs. It is mainly used for problems with only two label classes, such as dog vs. cat classification (0 or 1). For each example, it expects a single floating-point value per prediction.

Here is a standalone usage of binary cross-entropy loss on sample *y_true* and *y_pred* data points:


```
# inputs
y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.5, 0.4], [0.4, 0.5]]
# Using 'auto'/'sum_over_batch_size' reduction type.
bce_loss = tf.keras.losses.BinaryCrossentropy()
bce_loss(y_true, y_pred).numpy()
```

You can also weight each sample by passing a sample weight when calling the loss:

```
bce_loss(y_true, y_pred, sample_weight=[1, 0]).numpy()
```
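To make the arithmetic behind these two calls visible, here is a minimal sketch in plain Python that recomputes the same numbers by hand. It assumes the default settings (probabilities rather than logits, averaging over the batch); the helper name `binary_crossentropy` and the clipping constant `eps` are illustrative choices, not the library's internals.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample BCE: mean over the last axis of -(y*log(p) + (1-y)*log(1-p))."""
    losses = []
    for t_row, p_row in zip(y_true, y_pred):
        terms = []
        for t, p in zip(t_row, p_row):
            p = min(max(p, eps), 1 - eps)  # clip to avoid log(0); eps is an assumption
            terms.append(-(t * math.log(p) + (1 - t) * math.log(1 - p)))
        losses.append(sum(terms) / len(terms))
    return losses

y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.5, 0.4], [0.4, 0.5]]

per_sample = binary_crossentropy(y_true, y_pred)   # roughly [0.8047, 0.6020]
bce_mean = sum(per_sample) / len(per_sample)       # roughly 0.7034

# With sample_weight=[1, 0] the second sample contributes nothing, but the
# weighted sum is still divided by the batch size (2), not by the weight total.
weights = [1, 0]
bce_weighted = sum(w * l for w, l in zip(weights, per_sample)) / len(per_sample)

print(per_sample, bce_mean, bce_weighted)
```

The weighted result is half of the first sample's loss, which is why a zero weight lowers the reported value instead of simply dropping the sample from the average.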

## 2. Categorical Crossentropy loss

The categorical cross-entropy loss function computes the loss between labels and predictions. It is used when there are two or more label classes in the problem, such as animal classification: cat, dog, elephant, horse, etc. Labels are expected in one-hot representation; if you want to provide labels as integers, use the sparse categorical cross-entropy loss instead.

The example below shows standalone usage. Both **y_pred** and **y_true** have shape *[batch_size, num_classes]*.

```
# inputs
y_true = [[0, 1, 0], [0, 0, 1]]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
# Using 'auto'/'sum_over_batch_size' reduction type.
cce_loss = tf.keras.losses.CategoricalCrossentropy()
cce_loss(y_true, y_pred).numpy()
```

## 3. Sparse Categorical Crossentropy loss

It is used when there are two or more label classes in the problem and the labels are provided as integers. If you want to provide labels using the **one-hot** encoding method, you should use the previous loss, i.e. *CategoricalCrossentropy*.

Here y_true holds a single integer class index per example, while y_pred holds one floating-point value per class (#classes values) per example.

```
y_true = [1, 2]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
assert loss.shape == (2,)
loss.numpy()
```
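The relationship between the sparse and one-hot variants can be sketched in plain Python. This is an illustration under the assumption that y_pred already contains probabilities; both helper functions are hypothetical re-implementations written for this example, and the `eps` clipping is an assumption to keep `log(0)` out of the picture.

```python
import math

def sparse_categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # For each sample, take -log of the probability predicted for the integer label.
    return [-math.log(max(probs[label], eps)) for label, probs in zip(y_true, y_pred)]

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Same quantity, but the label is a one-hot row instead of an index.
    return [
        -sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row))
        for t_row, p_row in zip(y_true, y_pred)
    ]

y_true = [1, 2]
y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]

sparse_losses = sparse_categorical_crossentropy(y_true, y_pred)

# Build the equivalent one-hot labels: [[0, 1, 0], [0, 0, 1]]
num_classes = len(y_pred[0])
one_hot = [[1.0 if i == label else 0.0 for i in range(num_classes)] for label in y_true]
dense_losses = categorical_crossentropy(one_hot, y_pred)

print(sparse_losses)  # roughly [0.0513, 2.3026]
print(dense_losses)   # same values as above
```

Both calls yield the same per-sample values, so the choice between the two losses is only about how the labels are stored.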

## 4. Poisson loss

Computes the Poisson loss between y_true and y_pred. The Poisson loss is the **mean** of the **elements** of the tensor **y_pred - y_true * log(y_pred)**.

```
y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [0., 0.]]
# Using 'auto'/'sum_over_batch_size' reduction type.
p = tf.keras.losses.Poisson()
p(y_true, y_pred).numpy()
```
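As a rough check of the formula, the sketch below evaluates `y_pred - y_true * log(y_pred)` element by element in plain Python. The helper name `poisson` and the small `eps` added inside the logarithm (so that a prediction of exactly 0 does not raise an error) are assumptions made for this illustration.

```python
import math

def poisson(y_true, y_pred, eps=1e-7):
    """Per-sample Poisson loss: mean over the last axis of p - t * log(p + eps)."""
    losses = []
    for t_row, p_row in zip(y_true, y_pred):
        terms = [p - t * math.log(p + eps) for t, p in zip(t_row, p_row)]
        losses.append(sum(terms) / len(terms))
    return losses

y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [0., 0.]]

poisson_per_sample = poisson(y_true, y_pred)                      # roughly [1.0, 0.0]
poisson_mean = sum(poisson_per_sample) / len(poisson_per_sample)  # roughly 0.5
print(poisson_per_sample, poisson_mean)
```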

## 5. Kullback-Leibler Divergence loss

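$$KL(P \,\|\, Q) = -\sum_{x \in X} P(x) \log\frac{Q(x)}{P(x)}$$

KL divergence is the negative sum, over every event x, of the probability of the event under P multiplied by the log of the ratio between its probability under Q and its probability under P. It measures how much the distribution Q differs from the reference distribution P.

The KLDivergence loss function computes this quantity between y_true (in the role of P) and y_pred (in the role of Q), i.e. loss = y_true * log(y_true / y_pred) summed over the last axis: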

```
y_true = [[0, 1], [0, 0]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
# Using 'auto'/'sum_over_batch_size' reduction type.
kl = tf.keras.losses.KLDivergence()
kl(y_true, y_pred).numpy()
```
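The same value can be reproduced by hand. The following plain-Python sketch assumes that both inputs are clipped to the range [eps, 1] before the logarithm is taken, which avoids `log(0)` for the zero entries; the helper name `kl_divergence` and the value of `eps` are illustrative assumptions.

```python
import math

def kl_divergence(y_true, y_pred, eps=1e-7):
    """Per-sample KL divergence: sum over the last axis of t * log(t / p)."""
    losses = []
    for t_row, p_row in zip(y_true, y_pred):
        total = 0.0
        for t, p in zip(t_row, p_row):
            t = min(max(t, eps), 1.0)  # clip so that log(0) never occurs
            p = min(max(p, eps), 1.0)
            total += t * math.log(t / p)
        losses.append(total)
    return losses

y_true = [[0, 1], [0, 0]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]

kl_per_sample = kl_divergence(y_true, y_pred)
kl_mean = sum(kl_per_sample) / len(kl_per_sample)  # roughly 0.458
print(kl_per_sample, kl_mean)
```

Only the entry where y_true is 1 contributes noticeably: log(1 / 0.4) is about 0.916, and averaging over the two samples gives about 0.458.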

Learn more: https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence

## 6. Mean Squared Error(MSE)

Computes the mean of squares of errors between labels and predictions.

MSE tells you how close a regression line is to a set of points. It does this by taking the distances from the points to the regression line (the errors) and squaring them. Squaring removes the negative signs and also gives more weight to larger differences. The mean of these squared errors is the mean squared error.

```
y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [1., 0.]]
# Using 'auto'/'sum_over_batch_size' reduction type.
mse = tf.keras.losses.MeanSquaredError()
mse(y_true, y_pred).numpy()
```

## 7. MeanAbsoluteError

Computes the mean of the absolute difference between labels and predictions.

The absolute error is the difference between the measured value and the “true” value. For example, if a scale states 80 kg but you know your true weight is 79 kg, then the scale has an absolute error of 80 kg - 79 kg = 1 kg.

```
y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [1., 0.]]
# Using 'auto'/'sum_over_batch_size' reduction type.
mae = tf.keras.losses.MeanAbsoluteError()
mae(y_true, y_pred).numpy()
```
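To see how differently MSE and MAE react to a single large error, the sketch below compares them on made-up data (the `small_errors` and `one_outlier` lists are hypothetical values chosen for this illustration, and the two helper functions are simple re-implementations, not library calls).

```python
def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [0., 0., 0., 0.]
small_errors = [1., 1., 1., 1.]   # every prediction is off by 1
one_outlier = [1., 1., 1., 5.]    # one prediction is off by 5

mse_small = mean_squared_error(y_true, small_errors)    # 1.0
mae_small = mean_absolute_error(y_true, small_errors)   # 1.0
mse_outlier = mean_squared_error(y_true, one_outlier)   # 7.0
mae_outlier = mean_absolute_error(y_true, one_outlier)  # 2.0

print(mse_small, mae_small, mse_outlier, mae_outlier)
```

The single outlier doubles the MAE but multiplies the MSE by seven, which is the practical meaning of "more weight to larger differences".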

## 8. Mean Absolute Percentage Error(MAPE)

MAPE, also known as mean absolute percentage deviation (MAPD), is a measure of the prediction accuracy of a forecasting method in statistics, for example in trend estimation. It is also used as a loss function for regression problems in machine learning. It usually expresses the accuracy as a ratio defined by the formula:

$$\text{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{y_{\text{true},i} - y_{\text{pred},i}}{y_{\text{true},i}} \right|$$

The loss computes the mean absolute percentage error between the y_true and y_pred data points, as shown in the standalone usage below:

```
y_true = [[2., 1.], [2., 3.]]
y_pred = [[1., 1.], [1., 0.]]
# Using 'auto'/'sum_over_batch_size' reduction type.
mape = tf.keras.losses.MeanAbsolutePercentageError()
mape(y_true, y_pred).numpy()
```
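Worked out by hand, the example above looks as follows. This plain-Python sketch is an illustration only: the helper name is hypothetical, and the `eps` guard against division by zero is an assumption.

```python
def mean_absolute_percentage_error(y_true, y_pred, eps=1e-7):
    """Per-sample MAPE: 100 * mean over the last axis of |(t - p) / t|."""
    losses = []
    for t_row, p_row in zip(y_true, y_pred):
        terms = [abs((t - p) / max(abs(t), eps)) for t, p in zip(t_row, p_row)]
        losses.append(100.0 * sum(terms) / len(terms))
    return losses

y_true = [[2., 1.], [2., 3.]]
y_pred = [[1., 1.], [1., 0.]]

mape_per_sample = mean_absolute_percentage_error(y_true, y_pred)  # [25.0, 75.0]
mape_mean = sum(mape_per_sample) / len(mape_per_sample)           # 50.0
print(mape_per_sample, mape_mean)
```

In the first sample the predictions are off by 50% and 0%, in the second by 50% and 100%, so the batch average is 50.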

## 9. Mean Squared Logarithmic Error(MSLE)

MSLE is a measure of the ratio between the true and predicted values.

Mean squared logarithmic error is, as the name suggests, a variation of the mean squared error that mostly cares about the relative (percentage) difference. This means MSLE treats a small difference between small true and predicted values roughly the same as a big difference between large true and predicted values.

```
y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [1., 0.]]
# Using 'auto'/'sum_over_batch_size' reduction type.
msle = tf.keras.losses.MeanSquaredLogarithmicError()
msle(y_true, y_pred).numpy()
```
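The sketch below recomputes the example by hand and then contrasts MSLE with a plain squared error on small and large values. It is a plain-Python illustration under the assumption that the loss is the mean of `(log(1 + y_true) - log(1 + y_pred))^2`; the helper names and the `eps` floor are hypothetical.

```python
import math

def squared_log_error(t, p, eps=1e-7):
    # eps keeps the argument of log strictly positive for inputs of 0 or below
    return (math.log(max(t, eps) + 1.0) - math.log(max(p, eps) + 1.0)) ** 2

def mean_squared_logarithmic_error(y_true, y_pred):
    per_sample = [
        sum(squared_log_error(t, p) for t, p in zip(t_row, p_row)) / len(t_row)
        for t_row, p_row in zip(y_true, y_pred)
    ]
    return sum(per_sample) / len(per_sample)

y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [1., 0.]]
msle_value = mean_squared_logarithmic_error(y_true, y_pred)  # roughly 0.2402

# Doubling a small value and doubling a large value:
sq_small, sq_large = (1 - 2) ** 2, (1000 - 2000) ** 2        # 1 vs 1,000,000
log_small = squared_log_error(1., 2.)
log_large = squared_log_error(1000., 2000.)
print(msle_value, sq_small, sq_large, log_small, log_large)
```

The squared errors differ by a factor of a million, while the two squared log errors stay within the same order of magnitude.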

## 10. CosineSimilarity loss

Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space. This loss function computes the negated cosine similarity between labels and predictions, so minimizing the loss maximizes the similarity.

- The loss is a number between -1 and 1.
- 0 indicates orthogonality.
- Values closer to -1 indicate greater similarity.
- Values closer to 1 indicate greater dissimilarity.

```
y_true = [[0., 1.], [1., 1.]]
y_pred = [[1., 0.], [1., 1.]]
# Using 'auto'/'sum_over_batch_size' reduction type.
cosine_loss = tf.keras.losses.CosineSimilarity(axis=1)
cosine_loss(y_true, y_pred).numpy()
```
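The sign convention is easier to see when the value is computed by hand. The plain-Python sketch below assumes the loss is the negated dot product of the two L2-normalized vectors; the helper name and the choice to return 0 for a zero vector are assumptions made for this illustration.

```python
import math

def cosine_similarity_loss(y_true, y_pred):
    """Per-sample loss: -(t . p) / (|t| * |p|), computed along the last axis."""
    losses = []
    for t_row, p_row in zip(y_true, y_pred):
        dot = sum(t * p for t, p in zip(t_row, p_row))
        t_norm = math.sqrt(sum(t * t for t in t_row))
        p_norm = math.sqrt(sum(p * p for p in p_row))
        if t_norm == 0.0 or p_norm == 0.0:
            losses.append(0.0)  # assumption: a zero vector yields a loss of 0
        else:
            losses.append(-dot / (t_norm * p_norm))
    return losses

y_true = [[0., 1.], [1., 1.]]
y_pred = [[1., 0.], [1., 1.]]

cos_per_sample = cosine_similarity_loss(y_true, y_pred)
cos_mean = sum(cos_per_sample) / len(cos_per_sample)  # roughly -0.5
print(cos_per_sample, cos_mean)
```

The first pair of vectors is orthogonal (loss 0) and the second pair points in the same direction (loss close to -1), so the batch average is about -0.5.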

## 11. Huber loss

This function is quadratic for small values of the error and linear for large values. It computes the Huber loss between **y_true** and **y_pred**.

For each value x in **error = y_true - y_pred**:

$$\text{loss} = \begin{cases} 0.5\,x^2 & \text{if } |x| \le d \\ 0.5\,d^2 + d\,(|x| - d) & \text{if } |x| > d \end{cases}$$

Here d is delta, which defaults to 1.0.

```
y_true = [[0, 1], [0, 0]]
y_pred = [[0.5, 0.4], [0.4, 0.5]]
# Using 'auto'/'sum_over_batch_size' reduction type.
hub_loss = tf.keras.losses.Huber()
hub_loss(y_true, y_pred).numpy()
```
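The two branches of the definition can be written directly in plain Python. This is a sketch with hypothetical helper names (`huber_element`, `huber`), assuming delta = 1.0 and averaging over all elements of the batch.

```python
def huber_element(x, delta=1.0):
    """Huber loss of a single error value x."""
    abs_x = abs(x)
    if abs_x <= delta:
        return 0.5 * x * x                              # quadratic branch
    return 0.5 * delta ** 2 + delta * (abs_x - delta)   # linear branch

def huber(y_true, y_pred, delta=1.0):
    per_sample = [
        sum(huber_element(t - p, delta) for t, p in zip(t_row, p_row)) / len(t_row)
        for t_row, p_row in zip(y_true, y_pred)
    ]
    return sum(per_sample) / len(per_sample)

y_true = [[0, 1], [0, 0]]
y_pred = [[0.5, 0.4], [0.4, 0.5]]
huber_value = huber(y_true, y_pred)   # roughly 0.1275, all errors are below delta

# A large error falls on the linear branch and is penalized less than 0.5 * x^2:
large_error = huber_element(3.0)      # 0.5 + 1 * (3 - 1) = 2.5
quadratic_only = 0.5 * 3.0 ** 2       # 4.5
print(huber_value, large_error, quadratic_only)
```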

## 12. LogCosh loss

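Computes the logarithm of the hyperbolic cosine of the prediction error. log(cosh(x)) is approximately x^2 / 2 for small x and |x| - log(2) for large x, so it behaves mostly like the mean squared error but is less affected by an occasional wildly incorrect prediction.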

```
y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [0., 0.]]
# Using 'auto'/'sum_over_batch_size' reduction type.
l = tf.keras.losses.LogCosh()
l(y_true, y_pred).numpy()
```
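A hand computation of the same example, written as a plain-Python sketch (the helper name `log_cosh` is hypothetical; the error is taken as y_pred - y_true and averaged over the batch):

```python
import math

def log_cosh(y_true, y_pred):
    per_sample = [
        sum(math.log(math.cosh(p - t)) for t, p in zip(t_row, p_row)) / len(t_row)
        for t_row, p_row in zip(y_true, y_pred)
    ]
    return sum(per_sample) / len(per_sample)

y_true = [[0., 1.], [0., 0.]]
y_pred = [[1., 1.], [0., 0.]]

logcosh_value = log_cosh(y_true, y_pred)  # roughly 0.1084
print(logcosh_value)

# Only one of the four elements has a non-zero error (1.0), and
# log(cosh(1.0)) is about 0.4338, so the mean over four elements is about 0.1084.
single_term = math.log(math.cosh(1.0))
print(single_term)
```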

## 13. Hinge loss

In machine learning and deep learning applications, the hinge loss is a loss function used for training classifiers. It is used for “maximum-margin” classification problems, most notably with support vector machines (SVMs).

Here y_true values are expected to be -1 or 1. If binary (0 or 1) labels are provided, they are converted to -1 or 1.

```
y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.5, 0.4], [0.4, 0.5]]
# Using 'auto'/'sum_over_batch_size' reduction type.
h_loss = tf.keras.losses.Hinge()
h_loss(y_true, y_pred).numpy()
```
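The label conversion and the margin term can be sketched in plain Python. The helper below is a hypothetical re-implementation for illustration, assuming the loss is the mean of `max(1 - y_true * y_pred, 0)` and that 0/1 labels are mapped to -1/1 via `2 * y - 1`; the `squared` flag shows the squared variant covered in the next section on the same data.

```python
def hinge(y_true, y_pred, squared=False):
    # Convert binary 0/1 labels to -1/1 (assumed to happen only if all labels are 0 or 1).
    if all(t in (0, 1) for row in y_true for t in row):
        y_true = [[2 * t - 1 for t in row] for row in y_true]
    per_sample = []
    for t_row, p_row in zip(y_true, y_pred):
        terms = [max(1 - t * p, 0.0) for t, p in zip(t_row, p_row)]
        if squared:
            terms = [term ** 2 for term in terms]  # square each term before averaging
        per_sample.append(sum(terms) / len(terms))
    return sum(per_sample) / len(per_sample)

y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.5, 0.4], [0.4, 0.5]]

hinge_value = hinge(y_true, y_pred)                         # roughly 1.25
squared_hinge_value = hinge(y_true, y_pred, squared=True)   # roughly 1.705
print(hinge_value, squared_hinge_value)
```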

## 14. Squared Hinge loss

Similarly, the squared hinge loss squares each hinge term before averaging, i.e. it is the mean of square(maximum(1 - y_true * y_pred, 0)):

```
y_true = [[0., 1.], [0., 0.]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]
# Using 'auto'/'sum_over_batch_size' reduction type.
h = tf.keras.losses.SquaredHinge()
h(y_true, y_pred).numpy()
```

## 15. CategoricalHinge loss

Computes the categorical hinge loss between y_true and y_pred.

```
y_true = [[0, 1], [0, 0]]
y_pred = [[0.5, 0.4], [0.4, 0.5]]
# Using 'auto'/'sum_over_batch_size' reduction type.
h = tf.keras.losses.CategoricalHinge()
h(y_true, y_pred).numpy()
```
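For reference, the categorical hinge loss is commonly defined as `max(neg - pos + 1, 0)`, where `pos = sum(y_true * y_pred)` and `neg = max((1 - y_true) * y_pred)`. The plain-Python sketch below applies that definition to the example above; the helper name is hypothetical and the code is an illustration, not the library's implementation.

```python
def categorical_hinge(y_true, y_pred):
    losses = []
    for t_row, p_row in zip(y_true, y_pred):
        pos = sum(t * p for t, p in zip(t_row, p_row))         # score of the true class
        neg = max((1 - t) * p for t, p in zip(t_row, p_row))   # best score among other classes
        losses.append(max(neg - pos + 1.0, 0.0))
    return losses

y_true = [[0, 1], [0, 0]]
y_pred = [[0.5, 0.4], [0.4, 0.5]]

ch_per_sample = categorical_hinge(y_true, y_pred)     # roughly [1.1, 1.5]
ch_mean = sum(ch_per_sample) / len(ch_per_sample)     # roughly 1.3
print(ch_per_sample, ch_mean)
```

The loss only reaches 0 when the true class outscores every other class by a margin of at least 1.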

## Conclusion

We have discussed almost all the major loss functions supported by the TensorFlow Keras API, and we covered the PyTorch loss functions previously. For more details you can follow the official documentation. Some sources you can use to try out these functions:

- Keras documentation
- Tensorflow Documentation
- Working Jupyter Notebook


Mohit is a Data & Technology Enthusiast with good exposure to solving real-world problems in various avenues of IT and Deep learning domain. He believes in solving human's daily problems with the help of technology.