Reality Of Metrics: Is Machine Learning Success Overhyped?

In one of the most revealing research papers written in recent times, researchers from Cornell Tech and Facebook AI question the hype around the success of machine learning. They argue, and even demonstrate, that the reported progress is overstated: so-called cutting-edge methods and benchmark works perform similarly to one another even when they are a decade apart. In short, the authors believe that metric learning algorithms have not made the spectacular progress they are credited with.

In this work, the authors demonstrate the significance of assessing algorithms more diligently, and show how a few careful practices can make reported ML success better reflect reality.

Where Do Things Go Wrong

Over the past decade, deep convolutional networks have made tremendous progress. They are applied almost everywhere in computer vision: classification, segmentation, object detection and even generative models. But has the metric evaluation used to track this progress been leakproof? Were the evaluation techniques themselves unaffected by the improvement in deep learning methods?


The goal of metric learning is to map data to an embedding space where similar data are close together and dissimilar data are far apart. So the authors begin with the notion that deep networks have had a similar effect on metric learning; the combination of the two is known as deep metric learning.
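As a rough illustration of that goal (not the paper's code), the triplet loss is one common deep metric learning objective: it pulls an anchor embedding towards a positive of the same class and pushes it away from a negative of a different class. A minimal NumPy sketch, with all names hypothetical:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on embedding vectors.

    Encourages the anchor-positive distance to be smaller than the
    anchor-negative distance by at least `margin`.
    """
    d_pos = np.linalg.norm(anchor - positive)   # distance to same-class sample
    d_neg = np.linalg.norm(anchor - negative)   # distance to different-class sample
    return max(d_pos - d_neg + margin, 0.0)

# A well-separated triplet incurs no loss; a confused one does.
a = np.array([1.0, 0.0])
p = np.array([1.1, 0.0])   # close to the anchor
n = np.array([-1.0, 0.0])  # far from the anchor
print(triplet_loss(a, p, n))  # 0.0 — this embedding already satisfies the margin
```

Training a network to minimise such a loss over many triplets is what shapes the embedding space the authors evaluate.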

The authors then examine flaws in current research papers, including the problem of unfair comparisons and the weaknesses of commonly used accuracy metrics. They propose a training and evaluation protocol that addresses these flaws, and run experiments on a variety of loss functions.

For instance, the authors note, one benchmark paper from 2017 used ResNet50 and claimed huge performance gains, while the competing methods it compared against used GoogLeNet, which has significantly lower initial accuracy. The authors conclude that much of the performance gain likely came from the choice of network architecture, not the proposed method. Practices such as these can put ML in the headlines, but when we look at how many of these state-of-the-art models are actually deployed, the reality is less impressive.

The authors underline the importance of keeping parameters constant if one is to prove that a new algorithm outperforms its contemporaries.

To carry out the evaluations, the authors introduce settings that cover the following:

  • Fair comparisons and reproducibility
  • Hyperparameter search via cross-validation
  • Informative accuracy metrics
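The cross-validation step above can be sketched roughly as follows. This is a generic grid search over hyperparameters, not the authors' exact protocol; `train_and_eval` and all other names are hypothetical:

```python
import numpy as np
from itertools import product

def cross_val_score(train_and_eval, data, labels, params, n_folds=4):
    """Average validation score of one hyperparameter setting over folds."""
    folds = np.array_split(np.arange(len(data)), n_folds)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        scores.append(train_and_eval(data[train_idx], labels[train_idx],
                                     data[val_idx], labels[val_idx], **params))
    return float(np.mean(scores))

def grid_search(train_and_eval, data, labels, grid):
    """Pick the hyperparameter setting with the best cross-validated score."""
    best = None
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = cross_val_score(train_and_eval, data, labels, params)
        if best is None or score > best[1]:
            best = (params, score)
    return best
```

Tuning every method's hyperparameters this way, rather than reusing values from prior papers, is what makes the final comparison fair.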

As the resulting plots show, the trends in reality aren't far from previous related work, which indicates that papers claiming dramatic improvements may not have evaluated fairly.

If a paper attempts to explain the performance gains of its proposed method, and those gains turn out to be non-existent, then the explanation must be invalid as well.

The results show that when hyperparameters are properly tuned via cross-validation, most methods perform similarly to one another. This work, the authors believe, will lead to more investigation into the relationship between hyperparameters and datasets, and the factors relevant to particular dataset/architecture combinations.

Key Findings

According to the authors, this work exposes the following:

  • Changes in network architecture, embedding size, image augmentation method, and optimiser lead to unfair comparisons
  • The accuracy metrics in use are either misleading or do not provide a complete picture of the embedding space
  • Papers have been inconsistent in their choice of optimiser, and most do not present confidence intervals for their results
  • Papers do not check performance at regular intervals; instead they report accuracy after training for a predetermined number of iterations
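For context on the metrics point: accuracy in metric learning papers is commonly reported as recall@k, i.e. whether a query's k nearest neighbours in the embedding space contain a sample of the same class. A minimal sketch (not the authors' code; all names hypothetical) shows how coarse this can be:

```python
import numpy as np

def recall_at_k(embeddings, labels, k=1):
    """Fraction of queries whose k nearest neighbours (excluding the
    query itself) contain at least one sample of the same class."""
    dists = np.linalg.norm(embeddings[:, None] - embeddings[None, :], axis=-1)
    np.fill_diagonal(dists, np.inf)        # never match a point with itself
    hits = 0
    for i in range(len(embeddings)):
        nearest = np.argsort(dists[i])[:k]
        hits += int(np.any(labels[nearest] == labels[i]))
    return hits / len(embeddings)
```

Because a single correct neighbour counts as a full hit, this metric can saturate and hide differences in how well the rest of the embedding space is organised, which is the kind of incompleteness the authors criticise.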

The authors conclude that if proper machine learning practices are followed, the results of metric learning papers will better reflect reality, and can lead to better work in impactful domains such as self-supervised learning.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
