Reality Of Metrics: Is Machine Learning Success Overhyped?

In one of the most revealing research papers of recent times, researchers from Cornell Tech and Facebook AI deflate the hype around the success of machine learning. They argue, and go on to demonstrate, that the reported progress appears to be overstated: so-called cutting-edge or benchmark-setting methods perform similarly to one another even when they are a decade apart. In other words, the authors believe that metric learning algorithms have not made spectacular progress.

In this work, the authors demonstrate why algorithms need to be assessed more diligently and how a few practices can help reported ML success reflect reality.

Where Do Things Go Wrong

Over the past decade, deep convolutional networks have made tremendous progress. Their applications in computer vision are almost everywhere, from classification to segmentation to object detection and even generative models. But has the evaluation used to track this progress been leakproof? Were the techniques being evaluated unaffected by the general improvement in deep learning methods?

The goal of metric learning is to map data to an embedding space where similar data are close together and dissimilar data are far apart. The authors start from the notion that deep networks have had a similar effect on metric learning; the combination of the two is known as deep metric learning.
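To make the "close together, far apart" idea concrete, here is a minimal sketch of a pairwise loss of this kind. It uses the classic contrastive loss purely as an illustration; the function names and the margin value are assumptions, not details taken from the paper.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(a, b, same_class, margin=1.0):
    """Pairwise contrastive loss (illustrative; margin is an assumed value).

    Pairs from the same class are penalised by their squared distance,
    so training pulls them together. Pairs from different classes are
    penalised only while they sit closer than `margin`, so training
    pushes them apart until the margin is reached.
    """
    d = euclidean(a, b)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

A different-class pair that is already further apart than the margin contributes zero loss, which is why the margin is itself a hyperparameter that needs tuning.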

The authors examine flaws in current research papers, including unfair comparisons and the weaknesses of commonly used accuracy metrics. They then propose a training and evaluation protocol that addresses these flaws and run experiments on a variety of loss functions.

For instance, the authors write, one benchmark paper from 2017 used ResNet50 and then claimed huge performance gains, while the competing methods used GoogLeNet, which has significantly lower initial accuracies. The authors therefore conclude that much of the performance gain likely came from the choice of network architecture and not from the proposed method. Practices such as these can put ML in the headlines, but when we look at how many of these state-of-the-art models are actually deployed, the reality is less impressive.

The authors underline the importance of keeping all other parameters constant when trying to show that a new algorithm outperforms its contemporaries.
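One way to enforce this in practice is to compare the experiment configurations before comparing the results. The sketch below is a hypothetical helper, not something from the paper: the setting names (`backbone`, `embedding_size`, `optimizer`, `loss`) and the two configurations are assumptions chosen to mirror the ResNet50 versus GoogLeNet example.

```python
_MISSING = object()  # marks a key that is absent from one configuration

def differing_keys(cfg_a, cfg_b):
    """Return the sorted setting names whose values differ between two runs."""
    keys = set(cfg_a) | set(cfg_b)
    return sorted(
        k for k in keys
        if cfg_a.get(k, _MISSING) != cfg_b.get(k, _MISSING)
    )

def is_fair_comparison(cfg_a, cfg_b, allowed=("loss",)):
    """True when the two runs differ only in the settings listed in `allowed`."""
    return set(differing_keys(cfg_a, cfg_b)) <= set(allowed)

# Hypothetical configurations; only the loss is supposed to change.
baseline = {"backbone": "GoogLeNet", "embedding_size": 128,
            "optimizer": "sgd", "loss": "contrastive"}
proposed = {"backbone": "ResNet50", "embedding_size": 128,
            "optimizer": "sgd", "loss": "new_loss"}

print(differing_keys(baseline, proposed))
print(is_fair_comparison(baseline, proposed))
```

Here the check flags the comparison because the backbone changed along with the loss, so any accuracy gap cannot be attributed to the loss alone.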

To carry out the evaluations, the authors introduce settings that cover the following:

  • Fair comparisons and reproducibility
  • Hyperparameter search via cross-validation
  • Informative accuracy metrics
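The second point, hyperparameter search via cross-validation, can be sketched roughly as follows. This is a generic outline rather than the authors' protocol: it assumes folds are formed over class labels, and `train_and_score` is a hypothetical callback that trains on the training classes and returns a validation score. The test set is never consulted during the search.

```python
def class_folds(class_ids, k):
    """Split class ids into k disjoint folds (round-robin over sorted ids)."""
    ordered = sorted(class_ids)
    return [ordered[i::k] for i in range(k)]

def cross_validate(candidates, class_ids, k, train_and_score):
    """Pick the candidate with the best mean validation score over k folds.

    `candidates` must be hashable (for example, numeric margins).
    `train_and_score(params, train_classes, val_classes)` is supplied by
    the caller and must return a number where higher is better.
    """
    folds = class_folds(class_ids, k)
    results = {}
    for params in candidates:
        scores = []
        for i, val_classes in enumerate(folds):
            # Train on every fold except the one held out for validation.
            train_classes = [c for j, fold in enumerate(folds)
                             if j != i for c in fold]
            scores.append(train_and_score(params, train_classes, val_classes))
        results[params] = sum(scores) / len(scores)
    best = max(results, key=results.get)
    return best, results
```

Because every method is tuned with the same procedure and the same budget, differences in the final test accuracy are easier to attribute to the method itself.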

As shown in the above plot, the actual trends are not far from those of earlier related work, which suggests that papers claiming dramatic improvements may not have been fair in their evaluation.

If a paper attempts to explain the performance gains of its proposed method, and it turns out that those performance gains are non-existent, then that explanation must be invalid as well.

The results show that when hyperparameters are properly tuned via cross-validation, most methods perform similarly to one another. The authors believe this work will lead to more investigation into the relationship between hyperparameters and datasets, and into the factors related to particular dataset/architecture combinations.

Key Findings

According to the authors, this work exposes the following:

  • Changes in network architecture, embedding size, image augmentation method, and optimiser lead to unfair comparisons
  • The accuracy metrics in common use are either misleading or do not provide a complete picture of the embedding space
  • Papers have been inconsistent in their choice of optimiser, and most papers do not present confidence intervals for their results
  • Some papers do not check performance at regular intervals and instead report accuracy after training for a predetermined number of iterations
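The point about confidence intervals is easy to act on when a method is trained several times with different seeds. The sketch below is one simple way to do it, assuming a normal approximation to the sampling distribution of the mean; the accuracy values are made-up numbers for illustration, not results from the paper.

```python
import math
import statistics

def confidence_interval(accuracies, level=0.95):
    """Normal-approximation confidence interval for the mean accuracy."""
    n = len(accuracies)
    mean = statistics.mean(accuracies)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(accuracies) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * sem, mean + z * sem

def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical accuracies from five runs of two methods.
method_a = [0.60, 0.62, 0.61, 0.63, 0.59]
method_b = [0.61, 0.60, 0.63, 0.62, 0.60]

ci_a = confidence_interval(method_a)
ci_b = confidence_interval(method_b)
print(ci_a, ci_b, intervals_overlap(ci_a, ci_b))
```

When the intervals of two methods overlap substantially, a small gap between their mean accuracies is weak evidence that one is better. With only a handful of runs, a Student's t interval would be wider than this normal approximation.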

The authors conclude that if proper machine learning practices are followed, the results of metric learning papers will better reflect reality and can lead to better work in impactful domains such as self-supervised learning.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.