
How does Mean Average Precision (mAP) work in Object Detection?

Mean average precision is the mean of the per-class average precision, which summarises the precision-recall tradeoff

Object detection models attempt to detect the presence of relevant objects in images and to classify those objects. A metric known as Average Precision (AP), and its mean across classes, the mean Average Precision (mAP), is used to assess how well such a model recognises and localises objects. Note that average precision is not simply the average of the precision scores across classes; it is computed from precision over recall values ranging from 0 to 1. This article focuses on explaining mean average precision in the context of object detection. Following are the topics to be covered in this article.

Table of contents

  1. What is the Precision-Recall curve?
  2. What is Average precision (AP)?
  3. Using mAP as an evaluation metric

Let’s start with the basis of mAP, which is the precision-recall curve.

What is the Precision-Recall curve?

The ROC curve (receiver operating characteristic curve) depicts a classification model’s performance across all classification thresholds. Similarly, at a given IoU threshold, the precision-recall curve depicts the tradeoff between precision and recall.

Precision is a measure of “how often is your model right when it makes a prediction?” Recall is a metric that asks, “has your model found everything it should have?” Consider the following picture, which contains ten fruits. A model with great precision but poor recall might discover just one of these ten fruits but correctly identify it as “fruit”: precision is 1/1 = 1.0, while recall is only 1/10 = 0.1, since only one of the ten fruits has been found.

Models that output a confidence score can trade precision for recall by altering the amount of confidence required to make a prediction. In other words, if the model is in a situation where avoiding false positives (saying a fruit is present when it is actually a vegetable) is more important than avoiding false negatives, it can raise the confidence threshold so that it only produces high-precision predictions, at the expense of its coverage (recall).

The precision-recall curve visualises the model’s precision and recall as a function of the model’s confidence threshold. It slopes downward because, as confidence falls, more predictions are made, which helps recall, but more of those predictions are wrong, which reduces precision.
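
To make the tradeoff concrete, here is a minimal sketch using made-up labels and confidence scores (not the data used later in this article): raising the threshold increases precision and lowers recall.

import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical ground-truth labels (1 = object present) and confidence scores
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1])
scores = np.array([0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.3, 0.2])

for threshold in (0.3, 0.7):
    y_pred = (scores >= threshold).astype(int)
    print(f"threshold={threshold}: "
          f"precision={precision_score(y_true, y_pred):.2f}, "
          f"recall={recall_score(y_true, y_pred):.2f}")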


The primary distinction between ROC curves and precision-recall curves is that true negatives are not taken into account when computing a precision-recall curve.
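
This difference can be seen in code. In the minimal sketch below (made-up labels and scores), adding extra easy negatives, which only contribute true negatives, inflates the ROC AUC while the average precision, which summarises the precision-recall curve, stays the same.

import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

# Hypothetical labels and scores (illustrative only)
y_true = np.array([1, 1, 1, 0, 0])
scores = np.array([0.9, 0.8, 0.3, 0.7, 0.2])
print("ROC AUC:", round(roc_auc_score(y_true, scores), 3))
print("Average precision:", round(average_precision_score(y_true, scores), 3))

# Add five easy negatives, correctly scored very low (pure true negatives)
y_true_more = np.concatenate([y_true, np.zeros(5, dtype=int)])
scores_more = np.concatenate([scores, np.full(5, 0.05)])
print("ROC AUC with extra negatives:", round(roc_auc_score(y_true_more, scores_more), 3))
print("Average precision with extra negatives:", round(average_precision_score(y_true_more, scores_more), 3))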



What is Average precision (AP)?

The goal of Average Precision (AP) is to evaluate the detector’s precision throughout the whole recall domain. As a result, it favours detectors that keep their precision high over the entire recall range, that is, detectors whose recall-precision curves lie closer to the top-right corner.

In other words, AP compares the detectors’ overall performance rather than their peak capability. Other quantities such as IoU, the confusion-matrix counts (true positives, false positives, false negatives), precision, and recall are used to construct AP.

If the class label of the predicted bounding box and the class label of the ground-truth bounding box are the same, and the IoU between them is larger than a threshold value, the prediction is considered correct.

Based on the IoU threshold and the class labels of the ground-truth and predicted bounding boxes, we compute the following counts (a minimal IoU sketch follows the list below).

  • True Positive: The model predicted a bounding box at a specific place (positive), and a matching ground-truth box of the same class exists there, so the prediction is correct (true).
  • False Positive: The model predicted a bounding box at a specific place (positive), but no matching ground-truth box exists there, so the prediction is incorrect (false).
  • False Negative: The model failed to predict a bounding box (negative) at a place where a ground-truth box exists, so the omission is incorrect (false).
  • True Negative: The model correctly (true) made no prediction (negative) where there is no object. This corresponds to the background, the region without bounding boxes, and is not used in the final metric calculation.
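
The sketch below shows the IoU-based decision for a single prediction; the iou helper, the box coordinates, and the 0.5 threshold are illustrative assumptions rather than values from this article.

def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return intersection / (area_a + area_b - intersection)

# Hypothetical ground-truth and predicted boxes with class labels
gt_box, gt_label = (30, 30, 120, 120), "fruit"
pred_box, pred_label = (40, 35, 130, 125), "fruit"

iou_value = iou(pred_box, gt_box)
if pred_label == gt_label and iou_value > 0.5:  # 0.5 is a commonly used IoU threshold
    print(f"True positive (IoU = {iou_value:.2f})")
else:
    print(f"False positive (IoU = {iou_value:.2f})")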

Mathematics behind mAP

The area under the precision-recall curve is used to determine the AP. The mean average precision (mAP) is a common metric used to assess the accuracy of an object detection model; it is the average of the APs computed over all classes. The average precision is mathematically expressed as:

AP = ∫₀¹ p(r) dr

where p is precision, r is recall, and p(r) is the precision-recall curve.

Similarly, the mean average precision over N classes is:

mAP = (1/N) Σᵢ₌₁ᴺ APᵢ
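
In practice the integral is approximated numerically. The sketch below shows the rectangle-sum approximation that the code later in this article uses; the precision and recall values are made up for illustration, with a final (precision = 1, recall = 0) point appended just as the walkthrough below does.

import numpy as np

# Hypothetical precision/recall pairs, ordered from the lowest to the highest threshold
precisions = np.array([0.65, 0.72, 0.80, 0.86, 1.00])
recalls = np.array([1.00, 0.89, 0.78, 0.67, 0.00])

# AP is the sum of precision times the drop in recall between consecutive points
ap = np.sum((recalls[:-1] - recalls[1:]) * precisions[:-1])
print("Average precision:", round(ap, 3))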

Using mAP as an evaluation metric

This article uses custom data containing the prediction scores and ground-truth labels of two classes, as produced by a machine learning model. Consider the two classes to be “positive” and “negative”. A custom precision-recall curve will be built to visualise the precision-recall tradeoff.

Let’s import the necessary libraries.

import numpy as np
import pandas as pd
from sklearn.metrics import precision_score, recall_score
import matplotlib.pyplot as plt

Creating the data

As discussed, create two datasets containing binary ground-truth labels and the corresponding prediction scores.

y_true_class1 = ["positive", "negative", "positive", "negative", "positive", "positive", "positive", "negative", "positive", "negative","positive", "negative", "positive", "positive"]
pred_scores_class1 = [0.72, 0.38, 0.55, 0.64, 0.55, 0.91, 0.75, 0.28, 0.88, 0.35, 0.76, 0.8, 0.47, 0.68]
y_true_class2 = ["negative", "positive", "positive", "negative", "negative", "positive", "positive", "positive", "negative", "positive", "negative", "positive", "negative", "negative"]
pred_scores_class2 = [0.32, 0.97, 0.45, 0.17, 0.25, 0.99, 0.55, 0.33, 0.35, 0.85, 0.62, 0.54, 0.39, 0.69]

Set threshold values ranging from 0.2 up to (but not including) 0.9, with a step of 0.05.

thresholds = np.arange(start=0.2, stop=0.9, step=0.05)
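
Note that np.arange with a floating-point step can occasionally include or drop the endpoint because of rounding; if exact endpoints matter, np.linspace is a safer alternative. A quick, optional check of the generated values:

# Inspect the generated thresholds; np.linspace gives explicit control over the endpoint
print(np.round(thresholds, 2))
print(np.round(np.linspace(start=0.2, stop=0.85, num=14), 2))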

Define precision-recall curve

Because both precision and recall are important, the precision-recall curve displays the tradeoff between the precision and recall values at different thresholds. This curve assists in determining a threshold that optimises both measures.

The precision-recall curve requires the following inputs:

  • The labels of ground truth.
  • The prediction scores.
  • The threshold values for converting prediction scores into labels.
def precision_recall_curve(y_true, pred_scores, thresholds):
    precisions = []
    recalls = []
    
    for threshold in thresholds:
        y_pred = ["positive" if score >= threshold else "negative" for score in pred_scores]
 
        precision = precision_score(y_true=y_true, y_pred=y_pred, pos_label="positive")
        recall = recall_score(y_true=y_true, y_pred=y_pred, pos_label="positive")
        
        precisions.append(precision)
        recalls.append(recall)
 
    return precisions, recalls

Calculate the average precision score for the first dataset.

precisions, recalls = precision_recall_curve(y_true=y_true_class1, 
                                             pred_scores=pred_scores_class1, 
                                             thresholds=thresholds)
 
plt.plot(recalls, precisions, linewidth=2, color="red", zorder=0)
 
plt.xlabel("Recall", fontsize=12, fontweight='bold')
plt.ylabel("Precision", fontsize=12, fontweight='bold')
plt.title("Precision-Recall Curve", fontsize=15, fontweight="bold")
plt.show()
 
precisions.append(1)
recalls.append(0)
 
precisions = np.array(precisions)
recalls = np.array(recalls)
 
avg_precision_class1 = np.sum((recalls[:-1] - recalls[1:]) * precisions[:-1])
print('============================================')
print('Average precision score:',np.round(avg_precision_class1,2))
[Output: precision-recall curve for the first class; average precision score: 0.86]

Here the average precision score is 0.86, which is good. In the precision-recall curve it can be observed that at the highest thresholds the recall is low while the precision is high, and over the remaining thresholds the precision does not exceed roughly 0.86. From this it can be concluded that the tradeoff is balanced at about a precision of 0.86 and a recall of 0.68.

Calculate the average precision score for the second dataset.

precisions_2, recalls_2 = precision_recall_curve(y_true=y_true_class2, 
                                             pred_scores=pred_scores_class2, 
                                             thresholds=thresholds)
 
plt.plot(recalls_2, precisions_2, linewidth=2, color="red", zorder=0)
 
plt.xlabel("Recall", fontsize=12, fontweight='bold')
plt.ylabel("Precision", fontsize=12, fontweight='bold')
plt.title("Precision-Recall Curve", fontsize=15, fontweight="bold")
plt.show()
 
precisions_2.append(1)
recalls_2.append(0)
 
precisions_2 = np.array(precisions_2)
recalls_2 = np.array(recalls_2)
 
avg_precision_class2 = np.sum((recalls_2[:-1] - recalls_2[1:]) * precisions_2[:-1])
print('============================================')
print('Average precision score:',np.round(avg_precision_class2,2))
[Output: precision-recall curve for the second class; average precision score: 0.83]

Here the average precision score is 0.83, which is also good. In the precision-recall curve it can be observed that at the highest thresholds the recall is low while the precision is high; as the threshold drops, the recall increases sharply while the precision remains relatively stable.

Calculating the mAP

num_labels= 2
mAP = (avg_precision_class2 + avg_precision_class1)/num_labels
print('Mean average Precision score:',np.round(mAP,3))
[Output: mean average precision score: 0.836]

The mean average precision score for this dataset of binary labels is 0.836, which is obtained by taking the mean of the two per-class average precision scores.
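
As a cross-check, scikit-learn’s average_precision_score can compute per-class AP directly from the scores. It evaluates every distinct score as a threshold rather than the fixed 0.2 to 0.85 grid used above, so its numbers may differ slightly from those reported here; a minimal sketch, reusing the lists defined earlier:

from sklearn.metrics import average_precision_score

# Binarise the string labels ("positive" -> 1, "negative" -> 0)
y1 = [1 if label == "positive" else 0 for label in y_true_class1]
y2 = [1 if label == "positive" else 0 for label in y_true_class2]

ap1 = average_precision_score(y1, pred_scores_class1)
ap2 = average_precision_score(y2, pred_scores_class2)
print("sklearn AP (class 1):", round(ap1, 3))
print("sklearn AP (class 2):", round(ap2, 3))
print("sklearn mAP:", round((ap1 + ap2) / 2, 3))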

Conclusion

The Intersection over Union (IoU) threshold influences the AP metric, and when AP is used as an evaluation measure for object detection, selecting that threshold is a somewhat arbitrary choice. With this article, we have understood the average precision and mean average precision evaluation metrics for object detection.




Sourabh Mehta

Sourabh has worked as a full-time data scientist for an ISP organisation, experienced in analysing patterns and their implementation in product development. He has a keen interest in developing solutions for real-time problems with the help of data both in this universe and metaverse.
