Any decent research paper should cover the following:
- How the work was done, in enough detail that it can be repeated
- What to do to obtain the same results when the experiments are run afresh
The same holds true for machine learning research papers. However, the last decade has seen a sharp rise in the number of publications per year. A few hundred papers are published every day, and keeping track of them has itself become a challenge.
Checking these papers, even just those that claim state-of-the-art status, is a tedious job. Edward Raff, a machine learning researcher at Booz Allen Hamilton, took on the gigantic task of testing papers for reproducibility.
His study, the culmination of eight years of effort, covered 255 papers published between 1984 and 2017. Raff compiled his findings in a paper presented at the prestigious NeurIPS conference last year and also released it on a popular portal for machine learning research.
Overview Of Raff’s Experiment
Here are a few excerpts from Raff's paper describing how papers were selected and how reproducibility was judged:
- Papers selected for checking reproducibility had to propose at least one new algorithm or method, which was the subject of reproduction.
- A paper was excluded from the analysis if its available source code had been used before the paper was successfully reproduced.
- Any paper was excluded if the paper’s authors had any significant relationship with the reproducers (e.g., academic advisor, coworker, close friends, etc.) because the ability to have more regular communication could bias results.
- A paper was regarded as reproducible if the majority (75% or more) of its claims could be confirmed with independently written code.
This left Raff with 255 papers, of which 162 (63.5%) were successfully reproduced and 93 were not.
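Raff's decision rule and the headline tally can be written out in a few lines. This is an illustrative sketch only: the helper name `is_reproducible` is hypothetical, it assumes a paper's claims can simply be counted as confirmed or not, and it reads "majority (75% or more)" as a threshold of at least 75%.

```python
def is_reproducible(confirmed_claims: int, total_claims: int, threshold: float = 0.75) -> bool:
    """Hypothetical helper: a paper counts as reproducible if at least
    `threshold` of its claims were confirmed by independently written code."""
    if total_claims <= 0:
        raise ValueError("a paper must make at least one claim")
    return confirmed_claims / total_claims >= threshold


# Headline tally reported in the article: 162 of 255 papers reproduced.
reproduced, attempted = 162, 255
rate = reproduced / attempted
print(f"{reproduced}/{attempted} = {rate:.1%} reproduced, {attempted - reproduced} not")
# prints "162/255 = 63.5% reproduced, 93 not"
```

Under this reading, a paper with 3 of 4 claims confirmed passes (exactly 75%), while one with 2 of 3 confirmed (about 67%) does not.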
Importance of Being Independently Reproducible
Before we go further, we need to understand what reproducibility in the context of machine learning really means.
A work is said to be reproducible when a reader follows the procedure described in the paper and obtains the same results as the original work. Many machine learning papers nowadays come with code, which lets developers readily validate a paper's results.
However, the heavy emphasis on papers with code has also raised an essential question: why should the code be needed at all? A paper that is appropriately explained should be enough for a reader to devise a procedure that gives the same results. Work that can be reproduced in this way, from the paper alone, is called independently reproducible.
Independent reproducibility has another advantage: reimplementing a method from its description can also turn up a more efficient solution to the same problem.
According to Raff, his findings can be summarised as follows:
- More math in a paper is associated with lower reproducibility.
- Open-sourcing of code is at best a weak indicator of reproducibility.
- Too many simplifications and analogies can hamper reproducibility.
Science Of Doing Meta Science
Late last year, machine learning researcher Joelle Pineau drew the whole ML community's attention to reproducibility. In an interview published by Nature, she discussed the issue in detail.
In reinforcement learning, says Pineau, if you do two runs of some algorithms with different initial random settings, you can get very different results. And if you do a lot of runs, you’re able to report only the best ones.
Results from the people with more computing power to do more runs will look better.
“Papers don’t always say how many runs were performed. But it makes a big difference to the conclusions you draw,” added Pineau.
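Pineau's point about reporting only the best runs can be illustrated with a small simulation. This is a toy sketch, not her analysis: it assumes each run of the same algorithm scores a fixed true value plus Gaussian noise that depends on the random seed, and the names `simulate_runs` and `reported_best`, as well as the values 100 and 15, are hypothetical. It compares the average score a researcher would report if they performed N runs and published only the best one.

```python
import random
import statistics


def simulate_runs(n_runs: int, true_score: float = 100.0,
                  noise_sd: float = 15.0, seed: int = 0) -> list[float]:
    """Hypothetical model: every run uses the same algorithm, but a different
    random seed adds Gaussian noise around the same true score."""
    rng = random.Random(seed)
    return [rng.gauss(true_score, noise_sd) for _ in range(n_runs)]


def reported_best(n_runs: int, trials: int = 2000) -> float:
    """Average over many trials of the single best score among `n_runs` runs,
    i.e. what gets reported if only the best run is published."""
    return statistics.mean(max(simulate_runs(n_runs, seed=t)) for t in range(trials))


for n in (1, 5, 20):
    print(f"best of {n:>2} runs: {reported_best(n):.1f}")
```

Under these assumptions the underlying algorithm never changes, yet the reported number climbs as N grows, so a lab that can afford more runs will appear to do better. This is why stating how many runs were performed matters.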
This new focus on reproducibility did not sit well with some researchers, such as Misha Denil of DeepMind, who responded to Nature's interview with a critical tweet.
As noted earlier, replicating a paper with its original code might increase the paper's credibility, but it does not encourage new solutions. If machine learning is to be treated as a science, independent reproducibility will be crucial going forward.