How To Encourage Reproducibility Within ML Community

The longevity of any scientific domain relies on its openness to being falsifiable. Machine learning (ML), a relative latecomer to the scientific community, lacks the culture of replicability found in other scientific fields. Today, with thousands of ML papers published every week, even verifying a single claim is a daunting task.

To establish an ecosystem that encourages ML researchers to make their claimed results reproducible, the organizers of NeurIPS 2019 introduced new policies into their paper submission guidelines.

A report on the results of deploying these components was recently published, in which the authors – researchers from top institutes and organizations – discuss their findings in detail.



Facilitating Reproducibility In ML

This renewed interest in the replicability of results was kickstarted at last year’s NeurIPS conference, the premier international conference for ML research, which introduced a reproducibility program designed to raise standards across the community and improve the evaluation of ML research.

As part of the paper submission process, the new program contained three components:


  • a code submission policy, 
  • a community-wide reproducibility challenge, and
  • a Machine Learning Reproducibility checklist

According to the authors, the results of this reproducibility experiment at NeurIPS 2019 could be summarized as follows:

  • Indicating the success of the code submission policy, NeurIPS witnessed a rise in the share of authors willingly submitting code – from less than 50% a year ago to nearly 75%.
  • According to the authors, the number of participants in the reproducibility challenge continues to increase, suggesting growing support for the movement.

The increase in code submissions can also be attributed to the NeurIPS 2019 policy, which states that it “expects code only for accepted papers”. Code submission is thus not mandatory, and the code is not expected to be used during the review process to decide on the soundness of the work.

Challenges To Reproducibility

Reproducibility is essential for the widespread adoption of any scientific method. In ML, however, the process is far from straightforward, and the black-box nature of ML models does not help. There is also overwhelming hype around AI, which can nudge researchers into inflating their results for various personal reasons. Overclaiming is a major problem, and one of the reasons so many headline-grabbing results never materialize beyond the breaking news.

Take the example of neural ODEs, which garnered accolades for their breakthrough results. The paper won the best paper award at NeurIPS 2018, only to have one of its main authors, David Duvenaud, pick apart its flaws a year later – coincidentally, at NeurIPS 2019.

While the neural ODE paper inspired other breakthrough work, Duvenaud admitted that it contained many inaccuracies. To the dismay of his audience, he even went on to explain how the authors chose a ‘cool-sounding name’ for the paper to attract more eyeballs.

This clearly shows the perils of hype in a nascent field. Fortunately for the community, Duvenaud came clean and set a precedent for candour that has so far been hit and miss.

That said, there are a few immediate challenges to reproducibility, and these can be summarized as follows:

  • The same training data might not be accessible
  • Misspecified training procedures in the paper
  • Missing or erroneous code
  • Leniency with the metrics
  • Improper statistical testing, or using the wrong statistical tests
  • Overclaiming of results
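On the statistical-testing point, a common failure is declaring one model "better" from a single run of each. A more defensible approach is to run both models over several seeds and test whether the difference in mean scores is significant. The sketch below uses hypothetical accuracy scores and a simple permutation test (one of several valid choices) purely for illustration:

```python
import random
import statistics

# Hypothetical accuracy scores for two models, each trained with 8 random seeds
model_a = [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.81, 0.80]
model_b = [0.78, 0.80, 0.77, 0.79, 0.76, 0.78, 0.77, 0.79]

def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference of means.

    Under the null hypothesis, the group labels are exchangeable, so we
    shuffle the pooled scores and count how often a difference at least
    as large as the observed one arises by chance.
    """
    rng = random.Random(seed)
    observed = statistics.mean(a) - statistics.mean(b)
    pooled = a + b
    count = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            count += 1
    return count / n_resamples

p_value = permutation_test(model_a, model_b)
print(f"p-value for the difference in mean accuracy: {p_value:.4f}")
```

The score values, run counts, and choice of test here are all illustrative assumptions; the point is simply that a claimed improvement should survive a test over multiple runs, not a single cherry-picked comparison.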

When it comes to code submission, the report lists the following commonly cited objections:

  1. Dataset confidentiality
  2. Proprietary software
  3. Computation infrastructure

In an interview published by Nature, Joelle Pineau – who is also one of the authors – drew the attention of the whole ML community to reproducibility.

In reinforcement learning, said Pineau, two runs of the same algorithm with different initial random seeds can produce very different results. And if you do many runs, you can choose to report only the best ones.
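Pineau's point can be sketched in a few lines. The "training" function below is a toy stand-in (not any real RL algorithm) whose outcome depends heavily on the seed; reporting only the best seed overstates performance, while the mean and standard deviation over all seeds give the honest picture:

```python
import random
import statistics

def train_toy_agent(seed, episodes=200):
    """Hypothetical stand-in for an RL training run: the final return
    depends strongly on the initial random seed (toy model only)."""
    rng = random.Random(seed)
    score = 0.0
    for _ in range(episodes):
        # Noisy, seed-dependent 'learning' step
        score += rng.uniform(-1.0, 2.0)
    return score

# Run the same 'algorithm' with several different seeds
results = [train_toy_agent(seed) for seed in range(10)]

best = max(results)
mean = statistics.mean(results)
std = statistics.stdev(results)

# Reporting only `best` cherry-picks the luckiest run; mean ± std
# across all seeds is the reproducible summary.
print(f"best: {best:.1f}, mean: {mean:.1f} ± {std:.1f} over {len(results)} runs")
```

Everything here (the agent, episode count, seed range) is an assumption for illustration; the takeaway is that a result quoted from a single run may not replicate under a different seed.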

Pineau and her peers have provided much-needed impetus to a movement that has already shown promising results and, hopefully, will translate into widespread fairness and transparency across the community.


Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.

