Reality Of Metrics: Is Machine Learning Success Overhyped?

In one of the most revealing research papers written in recent times, researchers from Cornell Tech and Facebook AI quash the hype around the success of machine learning, arguing and even demonstrating that the reported progress is overstated. The so-called cutting-edge benchmark methods perform similarly to one another even when they are a decade apart; in other words, the authors believe that metric learning algorithms have not made the spectacular progress the literature suggests.

In this work, the authors demonstrate the importance of assessing algorithms more diligently, and how a few good practices can make reported ML success better reflect reality.

Where Do Things Go Wrong

Over the past decade, deep convolutional networks have made tremendous progress. They are applied almost everywhere in computer vision, from classification to segmentation to object detection and even generative models. But has the metric evaluation carried out to track this progress been leakproof? Were the evaluation techniques themselves unaffected by the improvements in deep learning methods?

The goal of metric learning is to map data to an embedding space in which similar data points are close together and dissimilar ones are far apart. The authors begin with the notion that deep networks have had a similarly transformative effect on metric learning; the combination of the two is known as deep metric learning.
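
To make the idea concrete, here is a minimal sketch of a deep metric learning setup using a triplet loss in PyTorch. The network, embedding size and margin below are illustrative assumptions, not the configuration of any particular paper.

```python
import torch
import torch.nn as nn

# A small embedding network: maps flattened inputs to a 128-dimensional embedding space.
# The architecture and embedding size are arbitrary choices for illustration.
embedder = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256),
    nn.ReLU(),
    nn.Linear(256, 128),
)

# The triplet loss pulls an anchor close to a "positive" (same class) and pushes it
# away from a "negative" (different class) by at least a margin.
triplet_loss = nn.TripletMarginLoss(margin=0.2)

# Dummy batch of anchor/positive/negative images (e.g. 28x28 grayscale).
anchor = torch.randn(32, 1, 28, 28)
positive = torch.randn(32, 1, 28, 28)
negative = torch.randn(32, 1, 28, 28)

loss = triplet_loss(embedder(anchor), embedder(positive), embedder(negative))
loss.backward()  # gradients shape the embedding space during training
```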

The authors examine flaws in current research papers, including the problem of unfair comparisons and the weaknesses of commonly used accuracy metrics. They then propose a training and evaluation protocol that addresses these flaws, and run experiments on a variety of loss functions.

For instance, the authors note that one benchmark paper from 2017 used ResNet50 and claimed huge performance gains, while the competing methods it compared against used GoogLeNet, which has significantly lower initial accuracy. The authors conclude that much of the performance gain likely came from the choice of network architecture, not the proposed method. Practices such as these can put ML in the headlines, but when we look at how many of these state-of-the-art models are actually deployed, the reality is not that impressive.

The authors underline the importance of keeping such parameters constant if one wants to prove that a new algorithm genuinely outperforms its contemporaries.
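
Concretely, this means fixing the backbone, embedding size, optimiser, augmentation and batch size, and varying only the component being studied. Here is a rough sketch of what such an experiment grid might look like; the specific values are illustrative, not the authors' actual protocol.

```python
# All runs share the same settings; only the loss function changes.
fixed_config = {
    "backbone": "bn_inception",   # any fixed backbone works; this is an illustrative pick
    "embedding_size": 128,
    "optimizer": "adam",
    "learning_rate": 1e-4,
    "batch_size": 32,
    "augmentation": "random_resized_crop + horizontal_flip",
}

losses_to_compare = ["contrastive", "triplet", "multi_similarity"]

# One experiment per loss, everything else held constant, so any accuracy gap
# can be attributed to the loss rather than a stronger network or better-tuned optimiser.
experiments = [dict(fixed_config, loss=loss_name) for loss_name in losses_to_compare]
```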

To carry out the evaluations, the authors introduce settings that cover the following:

  • Fair comparisons and reproducibility
  • Hyperparameter search via cross-validation (see the sketch after this list)
  • Informative accuracy metrics
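
The cross-validation step can be sketched roughly as follows: partition the training classes into folds, score each hyperparameter setting on held-out folds, and only afterwards evaluate once on the test split. The fold structure and search space below are assumptions made for illustration.

```python
import itertools
import numpy as np

def cross_validated_score(train_fn, eval_fn, class_folds, params):
    """Average validation accuracy of one hyperparameter setting across folds."""
    scores = []
    for i, val_classes in enumerate(class_folds):
        train_classes = [c for j, fold in enumerate(class_folds) if j != i for c in fold]
        model = train_fn(train_classes, **params)
        scores.append(eval_fn(model, val_classes))
    return float(np.mean(scores))

def grid_search(train_fn, eval_fn, class_folds, grid):
    """Return the hyperparameters with the best cross-validated score."""
    keys = sorted(grid)
    candidates = [dict(zip(keys, values))
                  for values in itertools.product(*(grid[k] for k in keys))]
    return max(candidates,
               key=lambda params: cross_validated_score(train_fn, eval_fn, class_folds, params))

# Illustrative search space; the real one depends on the loss being tuned.
grid = {"margin": [0.05, 0.1, 0.2], "learning_rate": [1e-5, 1e-4]}
```

Only after this search is finished does the best setting get a single run on the test split, which keeps test accuracy from leaking into the tuning process.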

As the paper's plots of benchmark accuracy over time show, the trends in reality aren't far from those of previous related works, which indicates that those who claim a dramatic improvement might not have been fair in their evaluation.

If a paper attempts to explain the performance gains of its proposed method, and it turns out that those performance gains are non-existent, then its explanation must be invalid as well.

The results show that when hyperparameters are properly tuned via cross-validation, most methods perform similarly to one another. The authors believe this work will lead to more investigation into the relationship between hyperparameters and datasets, and into the factors relevant to particular dataset/architecture combinations.

Key Findings

According to the authors, this work exposes the following:

  • Changes in network architecture, embedding size, image augmentation method, and optimiser lead to unfair comparisons
  • The accuracy metrics in use are either misleading or do not provide a complete picture of the embedding space (see the sketch after this list)
  • Papers have been inconsistent in their choice of the optimiser, and most papers do not present confidence intervals for their results
  • Papers do not check performance at regular intervals during training; instead, they report accuracy after training for a predetermined number of iterations
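
To see how a single nearest-neighbour score can hide problems in the embedding space, the sketch below compares precision@1 with R-precision, a ranking-based metric computed over each query's full same-class neighbourhood. The metric choices and toy data are illustrative, not the paper's exact evaluation code.

```python
import numpy as np

def precision_at_1(embeddings, labels):
    """Fraction of queries whose single nearest neighbour shares their label."""
    dists = np.linalg.norm(embeddings[:, None] - embeddings[None, :], axis=-1)
    np.fill_diagonal(dists, np.inf)          # a query cannot retrieve itself
    nearest = dists.argmin(axis=1)
    return float((labels[nearest] == labels).mean())

def r_precision(embeddings, labels):
    """For a query with R other same-class items, precision among its R nearest neighbours."""
    dists = np.linalg.norm(embeddings[:, None] - embeddings[None, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    scores = []
    for i, label in enumerate(labels):
        r = int((labels == label).sum()) - 1   # other items of the same class
        neighbours = np.argsort(dists[i])[:r]
        scores.append((labels[neighbours] == label).mean())
    return float(np.mean(scores))

# Toy 2-D embeddings: each class is split into two far-apart clumps.
embeddings = np.array([
    [0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0],   # class 0
    [2.0, 2.0], [2.1, 2.0], [7.0, 7.0], [7.1, 7.0],   # class 1
])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

print(precision_at_1(embeddings, labels))  # 1.0 -- looks perfect
print(r_precision(embeddings, labels))     # ~0.33 -- exposes the fragmented clusters
```

On this toy data, precision@1 is perfect even though every class is split into two distant clumps, which the R-precision score exposes.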

The authors conclude that if proper machine learning practices are followed, the results of metric learning papers will better reflect reality, and can lead to better work in highly impactful domains such as self-supervised learning.

Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.