Reality Of Metrics: Is Machine Learning Success Overhyped?

In one of the most revealing research papers written in recent times, researchers from Cornell Tech and Facebook AI quash the hype around the success of machine learning. They argue, and even demonstrate, that the reported progress is overstated: so-called cutting-edge benchmark methods perform similarly to one another even when they are a decade apart. In short, the authors believe that metric learning algorithms have not made spectacular progress.

In this work, the authors demonstrate the importance of assessing algorithms more diligently, and show how a few good practices can make reported ML success better reflect reality.

Where Do Things Go Wrong?

Over the past decade, deep convolutional networks have made tremendous progress. Their applications in computer vision are almost everywhere: from classification to segmentation to object detection and even generative models. But has the metric evaluation carried out to track this progress been leakproof? And were the reported gains really due to the proposed techniques, rather than to incidental improvements in deep learning methods?

The goal of metric learning is to map data to an embedding space where similar data are close together and dissimilar data are far apart. The authors begin with the notion that deep networks have had a similar effect on metric learning; the combination of the two is known as deep metric learning.
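To make the idea concrete, here is a minimal sketch of a deep metric learning setup in PyTorch: a small network maps images to L2-normalised embeddings, and a triplet loss pulls an anchor towards a similar example and away from a dissimilar one. The architecture, embedding size, and margin here are illustrative assumptions, not the paper's settings.

```python
# Minimal deep metric learning sketch (illustrative, not the paper's setup).
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    def __init__(self, embedding_size=128):
        super().__init__()
        self.backbone = nn.Sequential(               # stand-in for a CNN trunk
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, embedding_size)

    def forward(self, x):
        # L2-normalise so that distances in the embedding space are comparable
        return F.normalize(self.head(self.backbone(x)), dim=1)

model = EmbeddingNet()
loss_fn = nn.TripletMarginLoss(margin=0.2)

# Dummy batch: anchor and positive should end up close, negative far away
anchor, positive, negative = (torch.randn(8, 3, 64, 64) for _ in range(3))
loss = loss_fn(model(anchor), model(positive), model(negative))
loss.backward()
```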

The authors then examine flaws in current research papers, including the problem of unfair comparisons and the weaknesses of commonly used accuracy metrics. They propose a training and evaluation protocol that addresses these flaws, and run experiments on a variety of loss functions.


For instance, the authors write, one benchmark paper from 2017 used ResNet50 and then claimed huge performance gains, while the competing methods used GoogLeNet, which has significantly lower initial accuracy. The authors therefore conclude that much of the performance gain likely came from the choice of network architecture, and not from the proposed method. Practices such as these can put ML in the headlines, but when we look at how many of these state-of-the-art models are actually deployed, the reality is not that impressive.

The authors underline the importance of keeping parameters constant if one is to prove that a new algorithm outperforms its contemporaries.
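As a hedged illustration of what such a controlled comparison might look like, the sketch below holds the backbone, embedding size, random seed, and optimiser fixed and varies only the loss function. All specific settings are assumptions for illustration, not the paper's protocol.

```python
# Fair-comparison sketch: everything except the loss function is held fixed,
# so any accuracy difference can be attributed to the loss itself.
import torch
import torchvision.models as models

def make_trunk(embedding_size=128):
    # Same backbone and embedding size for every method under comparison
    return models.resnet50(num_classes=embedding_size)

for loss_name in ["contrastive", "triplet", "margin"]:
    torch.manual_seed(0)                                      # same initialisation
    trunk = make_trunk()
    optimizer = torch.optim.SGD(trunk.parameters(), lr=0.01)  # same optimiser
    # ... train with only `loss_name` varying, then evaluate identically ...
```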

To carry out the evaluations, the authors introduce settings that cover the following:

  • Fair comparisons and reproducibility
  • Hyperparameter search via cross-validation (see the sketch after this list)
  • Informative accuracy metrics
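
The cross-validation step could look roughly like the following. The search space, fold count, and helper function are hypothetical stand-ins, since the article does not spell out the protocol's exact settings.

```python
# Hedged sketch of hyperparameter search via cross-validation: tune on
# validation splits only, then report test accuracy once at the end.
from itertools import product
import numpy as np
from sklearn.model_selection import KFold

def train_and_score(train_idx, val_idx, lr, margin):
    # Placeholder for: train the embedding on train_idx with (lr, margin)
    # and return validation accuracy on val_idx.
    rng = np.random.default_rng(hash((lr, margin)) % 2**32)
    return rng.uniform(0.5, 0.9)

data_indices = np.arange(1000)
kf = KFold(n_splits=4, shuffle=True, random_state=0)

best_params, best_score = None, -np.inf
for lr, margin in product([1e-2, 1e-3], [0.1, 0.2]):
    # Average validation accuracy across folds for this setting
    scores = [train_and_score(tr, va, lr, margin) for tr, va in kf.split(data_indices)]
    if np.mean(scores) > best_score:
        best_params, best_score = (lr, margin), np.mean(scores)

print("selected hyperparameters:", best_params)
# Final step: retrain with best_params and evaluate once on the held-out test set.
```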

As the paper's results plot shows, the trends are, in reality, not far from previous related work, which indicates that those claiming dramatic improvements might not have been fair in their evaluations.

If a paper attempts to explain the performance gains of its proposed method, and those performance gains turn out to be non-existent, then its explanation must be invalid as well.

The results show that when hyperparameters are properly tuned via cross-validation, most methods perform similarly to one another. The authors believe this work will lead to more investigation into the relationship between hyperparameters and datasets, and into the factors behind particular dataset/architecture combinations.

Key Findings

According to the authors, this work exposes the following:

  • Changes in network architecture, embedding size, image augmentation method, and optimiser lead to unfair comparisons
  • Commonly used accuracy metrics are either misleading or do not provide a complete picture of the embedding space
  • Papers have been inconsistent in their choice of optimiser, and most do not present confidence intervals for their results (see the sketch after this list)
  • Papers do not check performance at regular intervals, and instead report accuracy after training for a predetermined number of iterations
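
As a hedged sketch of the confidence-interval point, one could report the mean accuracy over several training runs (for example, several random seeds) together with a 95% interval, rather than a single number. The run accuracies below are made-up placeholders.

```python
# Sketch: report mean accuracy with a 95% confidence interval over runs.
import numpy as np
from scipy import stats

run_accuracies = np.array([0.712, 0.705, 0.718, 0.709, 0.715])  # e.g. 5 seeds

mean = run_accuracies.mean()
sem = stats.sem(run_accuracies)                    # standard error of the mean
ci_low, ci_high = stats.t.interval(0.95, df=len(run_accuracies) - 1,
                                   loc=mean, scale=sem)
print(f"accuracy = {mean:.3f} (95% CI: {ci_low:.3f}-{ci_high:.3f})")
```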

The authors conclude that if proper machine learning practices are followed, the results of metric learning papers will better reflect reality, and can lead to better work in impactful domains like self-supervised learning.
