# Accuracy Isn’t the Best Metric for Imbalanced Data

A model may report 99% accuracy, but is that a success? Reliability, not a high percentage, should be the focus.

What is accuracy? It is the degree of closeness between a measured result and the true value, or at least that is what our mathematics teachers taught us. But is accuracy always the right metric for evaluating a model? In ML classification, it often is not.

In classification, the model predicts a label for each example in the dataset, and those predictions are compared against the known labels. Accuracy is then: correct predictions ÷ total predictions.
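As a minimal sketch, the comparison can be written directly in Python (the labels and predictions below are made up for illustration):

```python
def accuracy(predictions, labels):
    # Count how many predicted labels match the known labels.
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical example: 1 = On-time, 0 = Late.
labels      = [1, 1, 1, 0, 1, 0, 1, 1]
predictions = [1, 1, 1, 1, 1, 0, 1, 1]  # one 'Late' train misclassified

print(accuracy(predictions, labels))  # 7 of 8 correct -> 0.875
```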

Let’s take an example. Label trains that run on time as ‘On-time (1)’ and delayed trains as ‘Late (0)’. A naive model that simply predicts every train will be on time scores 100% accuracy on the ‘On-time’ examples and 0% on the ‘Late’ ones.

Because accuracy is easy to understand and compute, it is one of the most widely used metrics. However, it has a serious weakness: it is unreliable on an imbalanced dataset.

A dataset is balanced when both classes are represented roughly equally: for example, 50% ‘On-time (1)’ incidents and 50% ‘Late (0)’ incidents. A dataset is imbalanced when one class dominates, say ‘On-time (1)’ covering 99% or more and ‘Late (0)’ covering the rest.

This is where the problem arises. In ML classification, accuracy is a poor way to evaluate a model on imbalanced data. Suppose there are 20,000 examples of ‘On-time (1)’ trains and only 200 of ‘Late (0)’ ones. A model can predict that every train is ‘On-time’ and still score about 99% accuracy, while having learned almost nothing from the minority class.
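To see the failure mode concretely, here is a sketch using the 20,000 : 200 split from the text and a deliberately degenerate model that always predicts ‘On-time’:

```python
# 20,000 on-time trains (1) and 200 late trains (0).
labels = [1] * 20_000 + [0] * 200

# A degenerate model that predicts 'On-time' for every train.
predictions = [1] * len(labels)

correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)
print(f"{accuracy:.2%}")  # about 99%, despite never catching a single late train
```

The model never identifies a late train, yet accuracy alone gives no hint of that.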

Mild imbalance such as 1:5 or 1:20 does not distort the result as much as extreme imbalance such as 1:10,000. Since ML models aim for high accuracy, heavy imbalance teaches the model to treat the majority class as ‘normal’ and the minority class as ‘abnormal’, and it will ignore the smaller class in order to keep accuracy high.

In the example above, this creates real problems. A system that concludes 100% of trains are ‘On-time’, having ignored the small dataset of ‘Late’ trains, will fail the passengers who rely on it.

To avoid such blunders, it is advisable to use the F1 score or the Matthews correlation coefficient (MCC). Though the point is debated, MCC is often considered more trustworthy than the F1 score. Both are single-value metrics that summarize the confusion matrix; however, the F1 score ignores true negatives, while MCC takes all four entries into account.

But first, what is the confusion matrix? As discussed earlier, the model predicts labels for the examples in a test sample, and those predictions are matched against the known labels. Tabulating the four possible outcomes (true positives TP, false positives FP, true negatives TN, and false negatives FN) gives the confusion matrix.

Now let’s take a look at the formulas for the F1 score and MCC:

F1 = 2TP / (2TP + FP + FN)

MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))
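Assuming TP, FP, TN, and FN are the four confusion-matrix counts, the two formulas can be sketched in Python as follows (returning 0 when a denominator vanishes is a common implementation convention, not part of the formula itself):

```python
import math

def f1_score(tp, fp, tn, fn):
    # F1 = 2TP / (2TP + FP + FN); note that TN never appears.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, tn, fn):
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0  # convention: 0 when undefined
```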

As we can see, MCC takes all four entries of the confusion matrix into account and hence gives a more honest result. For example, consider the confusion matrix given on MCC’s Wikipedia page:

TP = 90, FP = 4, TN = 1, FN = 5

Here the accuracy comes out to 91% and the F1 score to about 95%, which at first glance looks excellent. But MCC for the same matrix is only about 0.14, indicating that the model is performing poorly: it recovers almost none of the negative class, scoring just 1 true negative against 5 false negatives.
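These numbers can be checked directly; the snippet below recomputes accuracy, F1, and MCC from the four counts using plain arithmetic:

```python
import math

tp, fp, tn, fn = 90, 4, 1, 5
total = tp + fp + tn + fn

accuracy = (tp + tn) / total                 # (90 + 1) / 100
f1 = 2 * tp / (2 * tp + fp + fn)             # 180 / 189
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(round(accuracy, 2), round(f1, 2), round(mcc, 2))  # 0.91 0.95 0.14
```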

Similarly, if we take TP = 0, FP = 0, TN = 5, FN = 95, the F1 score comes out to 0%. MCC, by contrast, is undefined here because its denominator becomes zero; implementations conventionally report 0 in that case, the score of a random guess, making it explicit that the classifier is uninformative.

“For these reasons, we strongly push for the evaluation of each test performance through the Matthews correlation coefficient (MCC), instead of the accuracy and the F1 score, for any binary classification problem,” says Davide Chicco, author of Ten Quick Tips for Machine Learning in Computational Biology.
