Precision and Recall are two of the most important metrics for evaluating an imbalanced classification model. Recall tells us what fraction of the actual positives were classified correctly, while Precision tells us what fraction of the outputs classified as positive were actually positive. In this article, we will explore the significance of these metrics, and of classification thresholds, in detail.

The topics that we will discuss in this article are the following:

## Table of Contents

- What are Precision and Recall?
- Why do we need a Precision-Recall curve when the ROC curve is there?
- How to read a PR curve
- Baseline of PR curve
- PR Curve of a No Skill Model
- PR Curve of a Perfect Model
- PR Curve of a Good Model
- PR Curve of a Bad Model
- Finding optimal threshold from PR Curve

## What are Precision and Recall?

Precision is defined as the fraction of retrieved items that are actually relevant. Recall is defined as the fraction of relevant items that are successfully retrieved.

To understand from an example, let’s imagine we’re casting a fishing net in a lake and hoping to catch some fish. However, there are some stones as well in the lake and it’s likely that our net will end up catching some stones as well. Now there can be a few different ways to look at this situation:

- We may want to catch as many fish as possible from the lake, no matter how many stones we catch along with them.
- We may want to catch only the fish and minimize the number of stones that we catch, no matter how few fish we catch.
- Or, we may want to catch most of the fish present in the lake and at the same time minimize the number of stones caught.

Think of the fishing net as the **model** which gives out some **outputs** (preferably, fish). However, just *like everything in life*, no model is perfect and hence, our net catches some stones as well along with fish. The fish present in the lake are the **relevant outputs**. The **retrieved outputs** contain some fish and some stones.

Precision = fraction of fish among the retrieved stuff

Recall = fraction of fish retrieved from the lake

In Case 1, we want to maximize Recall and ignore Precision.

In Case 2, we want to maximize Precision and ignore Recall.

In Case 3, we want to keep a balance between Precision and Recall and try to maximize both at the same time. This is an ideal situation.

Now we can formally define Precision and Recall as follows:

Precision = TP / (TP + FP)

Recall = TP / (TP + FN)

where TP, FP, and FN are the counts of true positives, false positives, and false negatives respectively.
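These definitions can be sanity-checked on a tiny hypothetical sample with scikit-learn (the labels below are made up purely for illustration):

```python
# A quick check of the Precision and Recall definitions using scikit-learn.
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # 4 fish, 6 stones in the lake
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]  # what the net actually caught

# Here TP = 3, FP = 2, FN = 1
print(precision_score(y_true, y_pred))  # 3 / (3 + 2) = 0.6
print(recall_score(y_true, y_pred))     # 3 / (3 + 1) = 0.75
```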

## Why do we need a Precision-Recall curve when the ROC curve is there?

To classify an output into either class (0 or 1), we need to apply a threshold filter (just like the fishing net). For example, a default threshold of 0.5 is typically used: any output >= 0.5 is assigned to class 1. I have talked about how different thresholds might be useful for different kinds of problems here.
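As a minimal sketch, applying the default 0.5 threshold to some hypothetical model probabilities looks like this:

```python
import numpy as np

# Hypothetical predicted probabilities from a model
probs = np.array([0.1, 0.4, 0.35, 0.8, 0.65])

# Applying the default threshold of 0.5: anything >= 0.5 becomes class 1
labels = (probs >= 0.5).astype(int)
print(labels)  # [0 0 0 1 1]
```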

Coming to the question of why we need a PR curve when we already have the ROC curve (which plots the True Positive Rate against the False Positive Rate at different thresholds), here are 2 reasons:

- The ROC curve paints an overly optimistic picture of performance on imbalanced classification problems, compared to the PR curve.
- When the class distribution changes, the ROC curve doesn't change, whereas the PR curve reflects the change.
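A rough way to see the first point in action, assuming scikit-learn and a synthetic 10%-positive dataset: the ROC-AUC usually comes out much closer to 1 than the PR summary (average precision) does. The dataset and model below are illustrative, and the exact numbers will vary.

```python
# Sketch: ROC-AUC vs average precision on an imbalanced synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score

# ~10% positive class, mirroring the datasets discussed below
X, y = make_classification(n_samples=5000, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
print(f"ROC-AUC:           {roc_auc_score(y_te, scores):.3f}")
print(f"Average precision: {average_precision_score(y_te, scores):.3f}")
```

On imbalanced data like this, the ROC-AUC typically lands noticeably higher than the average precision, which is the optimism the bullet above refers to.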

## How to read a PR Curve

Fig 1

On the curves, each point corresponds to a different threshold, and its location corresponds to the resulting Precision and Recall when we choose that threshold.

Some important pointers on the curve:

- Point 1 corresponds to the threshold of 1
- Point 3 corresponds to the threshold of 0
- Point 4 corresponds to the threshold somewhere in the range (0, 1)
- Point 2 corresponds to a Perfect model (along with Point 3)

The Area under the PR Curve (AUC) is a metric that helps us compare two similar-looking curves: the higher the AUC, the better the performance.
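One way to compute this area with scikit-learn is to trace the curve and integrate it (the labels and scores below are made up for illustration):

```python
# Computing the area under a PR curve for hypothetical scores.
from sklearn.metrics import precision_recall_curve, auc

y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

# precision_recall_curve returns one (precision, recall) pair per threshold
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
print(f"PR AUC: {auc(recall, precision):.3f}")
```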

## The Baseline of a PR Curve

Fig 2

Fig 3

The baseline of a PR curve changes with class imbalance, unlike that of the ROC curve. This is because the Precision of a no-skill model (which gives a score of 0.5 for every output) directly depends on the class imbalance.

Fig 2 shows the baseline corresponding to a balanced dataset, whereas, Fig 3 shows the baseline corresponding to a dataset having a 10% positive class.

**From here on, we will stick to a 10% positive class dataset (as is the case for so many real-life datasets).**

## PR Curve of a ‘No Skill’ Model

Fig 4

The PR curve of a no-skill model (which gives 0.5 output for every data point) consists of 2 points:

- Point 1 corresponds to threshold = 0.5
- Point 2 corresponds to threshold ∈ [0, 0.5)

- Precision isn’t defined for thresholds ∈ (0.5, 1] for a no-skill model, due to division by zero (nothing is classified as positive).
- You may notice that Precision is a constant here, i.e., 0.1 (= the positive-class fraction)
- AUC = 0.1
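A quick numerical check of the constant-precision claim, assuming a synthetic ~10%-positive dataset and a scorer that outputs 0.5 everywhere:

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = (rng.random(1000) < 0.1).astype(int)  # ~10% positive class
scores = np.full(1000, 0.5)                    # a no-skill model

# Any threshold <= 0.5 labels everything as positive
y_pred = (scores >= 0.5).astype(int)
tp = np.sum((y_pred == 1) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
print(tp / (tp + fp))  # precision equals the positive-class fraction
```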

## PR Curve of a Perfect Model

Fig 5

The PR curve of a perfect model also consists of 2 points:

- Point 1 corresponds to threshold ∈ (0, 1]
- Point 2 corresponds to threshold = 0

- AUC = 1

## PR Curve of a Good Model

Fig 6

The PR curve of a good model has as many points as there are thresholds that produce a distinct (precision, recall) pair for the dataset:

- Point 1 corresponds to threshold = 1
- Point 3 corresponds to threshold = 0
- Point 4 corresponds to threshold ∈ (0, 1)

- AUC ∈ (0.1, 1)
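As a sketch (with an illustrative scikit-learn model, not any particular one from this article), we can enumerate the curve’s points and confirm that the AUC lands between the 0.1 baseline and 1:

```python
# Tracing a reasonably good model's PR curve point by point.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, auc

# ~10% positive class, matching the running example in the article
X, y = make_classification(n_samples=2000, weights=[0.9], random_state=42)
scores = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]

precision, recall, thresholds = precision_recall_curve(y, scores)
print(len(thresholds), "thresholds traced")
print(f"PR AUC: {auc(recall, precision):.3f}")  # should lie in (0.1, 1)
```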

## PR Curve of a Bad Model

Fig 7

The PR curve of a bad model goes even below the baseline. The curve indicates that the model performs even worse than a no-skill model.

- An obvious way to improve the performance of this model, without tweaking anything, is to simply invert its output (swap Class 0 and Class 1). This automatically results in higher-than-baseline performance.
- Usually, a PR curve like this indicates that there’s definitely something wrong in the pipeline.
- It can be the case that the data is too random.
- Or, the model just can’t grasp ANY trend from the data (in other words, the model is too simple for the data). An example can be a basic linear model trying to fit on a complex non-linear dataset.

- AUC ∈ (0, 0.1)
- There might be a hybrid case where a bad model performs better than the baseline for certain thresholds.
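A small simulation of the label-flipping trick, with a deliberately inverted (hypothetical) scorer on a ~10%-positive dataset:

```python
import numpy as np
from sklearn.metrics import average_precision_score

# A hypothetical "bad" scorer that tends to rank positives BELOW negatives
rng = np.random.default_rng(1)
y_true = (rng.random(2000) < 0.1).astype(int)  # ~10% positive class
raw = rng.random(2000)
bad_scores = np.where(y_true == 1, raw * 0.6, raw * 0.6 + 0.4)

print(average_precision_score(y_true, bad_scores))      # below the 0.1 baseline
print(average_precision_score(y_true, 1 - bad_scores))  # above the baseline
```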

## Finding optimal threshold from PR Curve

Fig 8

Since we now know that a good model’s PR curve approaches the perfect-model point (i.e., Point 2), it is quite intuitive that the optimal threshold for our model corresponds to the point on the curve which is closest to Point 2.

Here are 2 ways to find the optimal threshold:

- **Minimum distance from (1, 1):** Find the Euclidean distance from (1, 1) of every point on the curve, where each point is denoted by (recall, precision) for a corresponding threshold. Pick the point, and the corresponding threshold, for which the distance is minimum.
- **Maximum F1 score:** Find the F1 score for each point (recall, precision); the point with the maximum F1 score is the desired optimal point. You may *recall* (pun intended) that the F1 score is the harmonic mean of Precision and Recall, so it peaks where the two are balanced.
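Both recipes can be sketched with scikit-learn’s `precision_recall_curve` (the labels and scores below are illustrative):

```python
# Finding the optimal threshold two ways: closest-to-(1,1) and max F1.
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.6]

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
# Drop the final (precision=1, recall=0) point, which has no threshold attached
p, r = precision[:-1], recall[:-1]

# Method 1: point closest to the perfect corner (recall=1, precision=1)
dist = np.sqrt((1 - r) ** 2 + (1 - p) ** 2)
print("By distance:", thresholds[np.argmin(dist)])

# Method 2: point with the highest F1 score
f1 = 2 * p * r / (p + r)
print("By F1:     ", thresholds[np.argmax(f1)])
```

On this toy sample both methods agree on the same threshold; on real data they can differ slightly, so it is worth checking both.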

## Conclusion

Some key pointers worth noting:

- Recall of a No-Skill model lies in the set {0, 1} irrespective of the class imbalance: it is 1 whenever everything is classified positive, and 0 otherwise.
- Precision of a No-Skill model is equal to the fraction of positive class in the dataset.
- It is possible to get a model which performs worse than a no-skill model, especially when the data is too complex for the model.
- Just like the ROC curve, a PR curve can also be used to find the optimal threshold.
- PR curve works better than ROC curve in cases of imbalanced data.

Today, we’ve learnt the basics of the PR curve, how it is read, and its utility in classification problems. I hope you won’t have any trouble working with PR curves in your future classification models.

Also Read:

Complete Guide to Understanding ROC Curves

How to Use ROC Curves and Precision-Recall Curves for Classification in Python


Data Scientist at Piramal Capital & Housing Finance | IIT Bombay