Beginner's Guide To Understanding The ROC Curve: How To Find The Perfect Probability Threshold

In the Machine Learning spectrum, Binary Classification is one of the simplest and most widely encountered problems across domains. From identifying fraudulent bank transactions to diagnosing diseases, Binary Classifiers have been in use since the inception of Machine Learning.

Many classification algorithms, such as Logistic Regression, assign each sample a probability of belonging to the positive class and compare it against a probability threshold, which in most cases defaults to 0.5. This means that the algorithm classifies a sample as positive if the probability of that sample being positive is above 0.5 (50%) and as negative if that probability is below 0.5 (50%).
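As a minimal sketch of this thresholding step, the snippet below turns positive-class probabilities into class labels. The function name `classify` and the sample probabilities are hypothetical, and treating a probability exactly equal to the threshold as positive is an assumption, since the text only covers values above and below it.

```python
def classify(probabilities, threshold=0.5):
    """Label a sample 1 (positive) if its positive-class probability is at
    or above the threshold, otherwise 0 (negative).

    Assumption: a probability exactly equal to the threshold counts as positive.
    """
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical positive-class probabilities produced by some classifier.
probs = [0.92, 0.48, 0.35, 0.71, 0.10]

default_labels = classify(probs)        # default threshold of 0.5
lenient_labels = classify(probs, 0.3)   # lower threshold flags more samples

print(default_labels)  # [1, 0, 0, 1, 0]
print(lenient_labels)  # [1, 1, 1, 1, 0]
```

Lowering the threshold from 0.5 to 0.3 turns two of the negative predictions into positive ones, which is the kind of trade-off the rest of the article examines.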

This default threshold may not suit every problem. When diagnosing a disease, for example, it may be wise to choose a lower probability threshold to reduce the chance of the disease being missed, at the cost of more false alarms. The classification of critical data thus demands a custom threshold that meets specific requirements. This is where the Receiver Operating Characteristic curve, known as the ROC curve for short, comes into play. It illustrates the diagnostic ability of a binary classifier.

In layman’s terms, the ROC curve visualises how the performance of a binary classifier, measured by how many samples it flags correctly and incorrectly, changes as the probability threshold is varied.

The Confusion Matrix

The simplest way to analyse classification efficiency is to create a Confusion Matrix from the predicted outcomes. The Confusion Matrix is also known as the Error Matrix, as it shows how many predictions were right and how many were wrong for each class.

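A Confusion Matrix for a binary classifier is a 2×2 table that sets the actual classes against the predicted classes:

                     Predicted Positive     Predicted Negative
  Actual Positive    True Positive (TP)     False Negative (FN)
  Actual Negative    False Positive (FP)    True Negative (TN)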

Consider a simple classification problem where the objective is to separate fraudulent from non-fraudulent transactions. Class 1 will be the positives, the fraudulent transactions, and Class 2 will be the negatives, the non-fraudulent transactions.

Then,

  • True Positive(TP) is the number of predicted fraudulent transactions that are actually fraudulent transactions.
  • True Negative(TN) is the number of predicted non-fraudulent transactions that are actually non-fraudulent transactions.
  • False Positive(FP) is the number of non-fraudulent transactions which were incorrectly classified as fraudulent transactions.
  • False Negative(FN) is the number of fraudulent transactions that were incorrectly predicted or classified as non-fraudulent transactions.
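The four counts defined above can be tallied directly from the actual and predicted labels. The following is a minimal sketch: the function name `confusion_counts` and the sample data are hypothetical, and the encoding 1 = fraudulent (positive), 0 = non-fraudulent (negative) is an assumption made for illustration.

```python
def confusion_counts(actual, predicted):
    """Return (TP, TN, FP, FN) for binary labels.

    Assumed encoding: 1 = fraudulent (positive), 0 = non-fraudulent (negative).
    """
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1   # fraud correctly flagged
        elif a == 0 and p == 0:
            tn += 1   # legitimate transaction correctly passed
        elif a == 0 and p == 1:
            fp += 1   # legitimate transaction wrongly flagged
        else:
            fn += 1   # fraud that slipped through
    return tp, tn, fp, fn


# Hypothetical ground truth and predictions for eight transactions.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn)  # 3 3 1 1
```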

Plotting The ROC Curve

The ROC curve can be plotted using the Confusion matrix as follows.

The ROC curve is created by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR) of a classification problem for various probability thresholds.

Let us understand each of these terms.

False Positive Rate

False Positive Rate (FPR), also called fall-out, is the fraction of negative samples that are incorrectly classified as positive.

For example, in the bank transaction example stated above, the False Positive Rate is the fraction of all non-fraudulent transactions that were incorrectly classified as fraudulent.

It can be calculated as follows:

FPR = FP / (FP + TN)

True Positive Rate

True Positive Rate (TPR), also called sensitivity, recall or hit rate, is the fraction of positive samples that are correctly classified. It can be calculated as follows:

TPR = TP / (TP + FN)

In the bank transaction example, the True Positive Rate is the fraction of all fraudulent transactions that were correctly classified as fraudulent by the classifier.
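Both rates follow directly from the confusion matrix counts, using the standard definitions FPR = FP / (FP + TN) and TPR = TP / (TP + FN). The sketch below works through one example. The function names and the transaction counts are hypothetical, and returning 0.0 when a class has no samples is an assumption made to avoid division by zero.

```python
def true_positive_rate(tp, fn):
    """TPR = TP / (TP + FN). Returns 0.0 if there are no positive samples
    (an assumption made here to avoid dividing by zero)."""
    positives = tp + fn
    return tp / positives if positives else 0.0


def false_positive_rate(fp, tn):
    """FPR = FP / (FP + TN). Returns 0.0 if there are no negative samples
    (an assumption made here to avoid dividing by zero)."""
    negatives = fp + tn
    return fp / negatives if negatives else 0.0


# Hypothetical counts: 100 fraudulent and 900 non-fraudulent transactions.
tp, fn = 80, 20    # 80 frauds caught, 20 missed
fp, tn = 45, 855   # 45 legitimate transactions wrongly flagged

tpr = true_positive_rate(tp, fn)    # 80 / 100
fpr = false_positive_rate(fp, tn)   # 45 / 900

print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")  # TPR = 0.80, FPR = 0.05
```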

We plot the (FPR, TPR) points for different values of the probability threshold, with FPR on the horizontal axis and TPR on the vertical axis, to obtain the ROC curve.

  • In the above figure, the red line marks the points where the TPR and FPR are equal. Any point on this line means that the fraction of correctly classified positives is the same as the fraction of incorrectly classified negatives, which is only as good as a random classifier.
  • Any point above the red line is better than a random classifier.
  • Any point to the left of the red line indicates that the True Positive Rate is greater than the False Positive Rate. For a given TPR, the further left a point lies, the lower its FPR and the better the probability threshold.
  • The blue, orange and green lines represent ROC curves. Since the blue line lies further away from the random-classifier line, it represents a better classifier in terms of efficiency.
  • The blue dot represents a perfect classifier. At coordinate (0,1) the FPR is 0 and the TPR is 1, meaning that all the positives are correctly classified and no negatives are incorrectly classified. Simply put, there are no incorrect predictions at all.
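The threshold sweep that produces these curves can be sketched in a few lines. The function below recomputes the FPR and TPR at each candidate threshold and returns the resulting points. The name `roc_points`, the sample labels and the scores are hypothetical; using every distinct score as a candidate threshold, and counting a score equal to the threshold as positive, are assumptions rather than anything prescribed by the article.

```python
def roc_points(actual, scores):
    """Return (threshold, FPR, TPR) tuples, ordered from the strictest
    threshold to the most lenient one.

    Assumptions: labels are 1 (positive) / 0 (negative), and a sample is
    predicted positive when its score is at or above the threshold.
    """
    positives = sum(1 for a in actual if a == 1)
    negatives = len(actual) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")

    # Start above every score (nothing is flagged), then try each distinct
    # score in descending order.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)

    points = []
    for t in thresholds:
        tp = sum(1 for a, s in zip(actual, scores) if s >= t and a == 1)
        fp = sum(1 for a, s in zip(actual, scores) if s >= t and a == 0)
        points.append((t, fp / negatives, tp / positives))
    return points


# Hypothetical labels and positive-class probabilities for six samples.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

for threshold, fpr, tpr in roc_points(actual, scores):
    print(f"threshold={threshold:<4} FPR={fpr:.2f} TPR={tpr:.2f}")
```

The sweep starts at (0, 0), where nothing is flagged, and ends at (1, 1), where everything is flagged. Plotting the FPR values on the horizontal axis against the TPR values on the vertical axis gives the ROC curve for this classifier.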

Closing Note

The ROC graph thus summarises all the Confusion Matrices generated with different probability thresholds and helps us choose the right probability threshold based on the required or acceptable rate of False Positives.
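One way to turn an acceptable False Positive Rate into a concrete threshold is sketched below. This is only one possible selection rule, not a method prescribed by the article: among the thresholds whose FPR stays within a given limit, it picks the one with the highest TPR. The function name `pick_threshold`, the `max_fpr` parameter and the sample data are all hypothetical.

```python
def pick_threshold(actual, scores, max_fpr):
    """Return the (threshold, FPR, TPR) with the highest TPR among candidate
    thresholds whose FPR does not exceed max_fpr, or None if none qualifies.

    Assumptions: labels are 1 (positive) / 0 (negative), a sample is predicted
    positive when its score is at or above the threshold, and the candidate
    thresholds are the distinct scores.
    """
    positives = sum(1 for a in actual if a == 1)
    negatives = len(actual) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")

    best = None
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if s >= t and a == 1)
        fp = sum(1 for a, s in zip(actual, scores) if s >= t and a == 0)
        fpr, tpr = fp / negatives, tp / positives
        if fpr > max_fpr:
            break  # FPR never decreases as the threshold is lowered
        if best is None or tpr > best[2]:
            best = (t, fpr, tpr)
    return best


# Hypothetical labels and positive-class probabilities for six samples.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

strict = pick_threshold(actual, scores, max_fpr=0.0)    # no false positives allowed
relaxed = pick_threshold(actual, scores, max_fpr=0.34)  # up to about a third allowed

print(strict[0], relaxed[0])  # 0.8 0.4
```

With no false positives allowed, the threshold stays at 0.8 and one fraudulent sample is missed. Accepting an FPR of up to about one third lets the threshold drop to 0.4, at which point every positive sample is caught.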

Amal Nair
A Computer Science Engineer turned Data Scientist who is passionate about AI and all related technologies. Contact: amal.nair@analyticsindiamag.com
