Should accuracy be the only benchmark for a classification task? Many beginners in this field assume that a classifier with good accuracy is a perfect model that classifies every instance correctly. In practice, accuracy alone is a reliable benchmark only when the classes are roughly balanced and every kind of error costs about the same.
For a better understanding, let’s take a well-known example that every Data Science enthusiast comes across: Diabetes Prediction. Here both classes, whether a person has diabetes or not, matter under different conditions. Say you have trained your model on 200K samples, with 180K samples in the negative class and 20K samples in the positive class, and you have achieved an accuracy greater than 95%. Sounds good? Hold on! A model that always predicts the negative class would already reach 90% accuracy on this data without finding a single diabetic patient, so accuracy is clearly not a proper metric for this problem.
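To make the point concrete, here is a minimal sketch in plain Python. It uses only the class counts from the example above; the "model" is a hypothetical baseline that always predicts the negative class, not anything trained in this article.

```python
# Class counts taken from the example above: 180K negatives, 20K positives.
n_negative, n_positive = 180_000, 20_000

# Ground-truth labels: 0 = no diabetes, 1 = diabetes.
y_true = [0] * n_negative + [1] * n_positive

# Hypothetical baseline that ignores its input and always predicts "negative".
y_pred = [0] * len(y_true)

# Accuracy: share of predictions that match the ground truth.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall for the positive class: detected positives / all actual positives.
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall_positive = true_positives / n_positive

print(f"accuracy        = {accuracy:.2f}")
print(f"positive recall = {recall_positive:.2f}")
```

The baseline scores 90% accuracy while its recall on the positive class is zero, which is exactly the gap that accuracy hides on imbalanced data.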
Diabetes detection and similar problems are mostly imbalanced classification tasks, where most data points belong to the negative class and the positive class is greatly outnumbered. This is fairly common in classification tasks. In such cases, accuracy alone is not a correct measure of the model's performance. This is where Precision and Recall come into the picture: they give a clear insight into the performance on each class.
Let’s quickly jump to the coding, where we can check these things practically.
Code Implementation for Precision-Recall Tradeoff:
The popular Heart Disease dataset from the UCI repository is used to predict whether a patient is suffering from heart illness. You can download the dataset from here.
Importing all libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.metrics import precision_recall_curve
Selecting input features, target variable, train test split:
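df = pd.read_csv('heart.csv')
df.head()

# Input features are every column except the label
x = df.drop(['target'], axis=1)
y = df.target
# Hold out 33% of the rows for testing; an integer seed keeps the split reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=1)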
Training the model and Classification Report:
Logistic Regression is used to fit the model and to get clear information about the precision and recall values. The solver is changed from ‘lbfgs’ to ‘newton-cg’ to avoid model convergence issues.
model = LogisticRegression(solver='newton-cg').fit(x_train, y_train)
y_predict = model.predict(x_test)
print(classification_report(y_test, y_predict))
Deep dive into the Precision-Recall Tradeoff:
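To proceed further, one should know the confusion matrix, which counts the true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) of a model.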
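As a quick refresher, the four cells of a binary confusion matrix can be counted by hand. The sketch below is plain Python with made-up labels (not the heart dataset), and `confusion_counts` is a hypothetical helper written for this illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 is the positive class."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1   # positive correctly flagged
        elif actual == 0 and predicted == 1:
            fp += 1   # negative wrongly flagged as positive
        elif actual == 1 and predicted == 0:
            fn += 1   # positive that the model missed
        else:
            tn += 1   # negative correctly left alone
    return tn, fp, fn, tp

# Made-up labels for illustration only.
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true_demo, y_pred_demo)
print("            predicted 0  predicted 1")
print(f"actual 0    {tn:^11}  {fp:^11}")
print(f"actual 1    {fn:^11}  {tp:^11}")
```

With scikit-learn, `confusion_matrix(y_test, y_predict)` returns the same layout for labels 0 and 1: rows are the actual classes and columns are the predicted ones.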
Precision is the factor that tells the exactness of the model. In our example, out of all the patients the model predicted as having heart illness, how many actually have it: Precision = TP / (TP + FP).
From the classification report, the precision is 0.77, meaning 77% of the model's positive predictions agree with the actual ground truth.
Recall is the factor that tells the completeness of the model. In other words, out of all the patients who actually have heart illness, how many the model correctly identifies as positive: Recall = TP / (TP + FN).
In our case, the recall for the positive class is 0.81. Recall tells how much of the relevant data our model is able to find. Recall is also referred to as Sensitivity or the True Positive Rate.
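Both metrics follow directly from the confusion-matrix counts. Here is a minimal sketch; the counts are hypothetical and chosen only to keep the arithmetic easy, and returning 0.0 when a denominator is zero is an assumption made for this example.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall for the positive class from raw counts."""
    # Precision: of everything predicted positive, how much really is positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything that really is positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical counts, not taken from the heart dataset.
precision, recall = precision_recall(tp=30, fp=10, fn=20)
print(f"precision = {precision:.2f}")  # 30 / (30 + 10)
print(f"recall    = {recall:.2f}")     # 30 / (30 + 20)
```

Note that true negatives appear in neither formula, which is why both metrics stay informative when the negative class dominates.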
For any problem, we have to focus on one of the classes or on both. In our example, the model should aim for high recall, which means a low number of false negatives. So if the model predicts that a person does not have heart illness, then that person should indeed not have it.
If instead your main focus is to be confident that a person flagged with heart illness really has it, your model should have high precision, which means you have to lower the number of false positives.
Unfortunately, you usually can’t push both precision and recall to their maximum. For a given model, increasing precision tends to reduce recall and vice versa. This is called the precision/recall tradeoff.
A classifier performs differently for different threshold values, which means the positive and negative predictions can be changed by moving the threshold. Scikit-learn's predict() does not let you set the threshold directly, but the library gives access to the decision score used in the backend to make predictions. You can find here how to use a decision score to change precision and recall for your model and to find a tradeoff point.
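The effect of moving the threshold can be sketched without any library. In the example below, the scores and labels are made up for illustration; with the model from this article, the scores would instead come from `model.decision_function(x_test)`. `precision_recall_at` is a hypothetical helper, and treating precision as 0.0 when nothing is predicted positive is an assumption.

```python
def precision_recall_at(y_true, scores, threshold):
    """Label a sample positive when its score exceeds the threshold,
    then compute precision and recall for the positive class."""
    y_pred = [1 if s > threshold else 0 for s in scores]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up decision scores and ground-truth labels (illustration only).
y_true = [0, 0, 1, 0, 1, 0, 1, 1]
scores = [-2.0, -1.0, -0.5, 0.2, 0.4, 0.8, 1.5, 2.3]

# A threshold of 0.0 corresponds to what predict() does for a binary
# logistic regression; lower and higher values shift the balance.
results = {thr: precision_recall_at(y_true, scores, thr) for thr in (-1.5, 0.0, 1.0)}
for thr, (p, r) in results.items():
    print(f"threshold {thr:+.1f}: precision = {p:.2f}, recall = {r:.2f}")
```

On this toy data, raising the threshold increases precision and lowers recall. On real data precision does not always rise at every step, but the overall trend is usually the same.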
Here we are using a graphical method to detect tradeoffs between precision and recall.
# Decision scores for the test set, as used internally by predict()
y_decision_function = model.decision_function(x_test)
precision, recall, threshold = precision_recall_curve(y_test, y_decision_function)
plt.plot(recall, precision)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision Recall Tradeoff')
plt.show()
From the above graph, note the trend: for precision to reach 100%, recall drops to roughly 40%. You might choose a tradeoff point where precision is nearly 87% and recall is around 70%. Again, the right choice depends on your problem and on which priority best satisfies its actual needs.
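Instead of reading the point off the graph, one possible approach is to pick it programmatically: keep the thresholds that meet a minimum precision and take the one with the highest recall. The sketch below assumes arrays shaped like the output of `precision_recall_curve`, where precision and recall have one more entry than the thresholds. The numbers are made up and only loosely echo the graph, and `pick_threshold` is a hypothetical helper.

```python
def pick_threshold(precision, recall, thresholds, min_precision):
    """Return (threshold, precision, recall) with the highest recall among
    points whose precision is at least min_precision, or None if none qualify."""
    best = None
    # zip() stops at the shortest input, so the final (precision=1, recall=0)
    # point, which has no matching threshold, is skipped automatically.
    for p, r, t in zip(precision, recall, thresholds):
        if p >= min_precision and (best is None or r > best[2]):
            best = (t, p, r)
    return best

# Made-up curve values for illustration, not computed from the heart dataset.
precision_demo = [0.55, 0.60, 0.75, 0.87, 1.00, 1.00]
recall_demo = [1.00, 0.90, 0.80, 0.70, 0.40, 0.00]
thresholds_demo = [-1.5, -0.5, 0.0, 0.6, 1.8]

choice = pick_threshold(precision_demo, recall_demo, thresholds_demo, min_precision=0.85)
print(choice)
```

If recall is the priority instead, the same idea works the other way round: require a minimum recall and take the point with the highest precision.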
So this is all about the Precision-Recall Tradeoff, where we covered the terminology in detail with practical examples. Whenever you solve classification problems, imbalance in the outcome variable plays a very important role. In my previous article, a classification problem with nearly balanced outcome classes resulted in overall high precision and recall. So make sure you maintain a balance between the classes; this helps to get good overall results.