Thousands of machine learning papers get published every week, and it is almost impossible to find the most useful ones in this vast and growing list. A paper typically gets credit when it finds a real-world application, is applauded by top researchers in the community, or is accepted at a prestigious AI conference such as NeurIPS, ICML, or ICLR. These conferences usually act as platforms to promote research.
The acceptance guidelines for these top conferences vary, but all of them are stringent. The reviewers who skim through papers use rules of thumb, such as the availability of code and the replicability of results, to judge a paper. However, every year a few unlucky papers that seem good get discarded. This may be because reviewers are burdened with submissions that are little more than misguided, misleading text written to inflate publication counts.
In response, Papers with Code has introduced a checklist that promotes machine learning code completeness. These recommendations have also been accepted by NeurIPS and will be implemented in this year’s conference.
The ML Code Completeness Checklist assesses a code repository based on the scripts and artefacts that have been provided within it. It checks a code repository for:
- dependencies,
- training scripts,
- evaluation scripts,
- pretrained models, and
- results
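As a rough illustration of what such an assessment could look like, the sketch below scores a local repository against the five items of the published checklist (dependencies, training scripts, evaluation scripts, pretrained models, results). The function names and the file-name heuristics are assumptions made for this example. They are not the procedure Papers with Code uses, and the sketch ignores cases such as pretrained models that are only linked from the README.

```python
from pathlib import Path

# Hypothetical file-name heuristics for four of the checklist items.
# These patterns are assumptions for illustration only.
CHECKLIST_PATTERNS = {
    "dependencies": ["requirements.txt", "environment.yml", "setup.py", "Dockerfile"],
    "training_scripts": ["train*.py", "train*.sh"],
    "evaluation_scripts": ["eval*.py", "eval*.sh"],
    "pretrained_models": ["*.pt", "*.pth", "*.ckpt", "*.h5"],
}

# Assumed keywords suggesting that the README reports results.
README_RESULT_KEYWORDS = ("results", "accuracy", "benchmark")


def assess_repository(repo_dir):
    """Return a dict mapping each checklist item to True or False."""
    root = Path(repo_dir)
    report = {}
    for item, patterns in CHECKLIST_PATTERNS.items():
        # rglob searches the repository recursively for each pattern;
        # one match for any pattern is enough to tick the item.
        report[item] = any(
            next(root.rglob(pattern), None) is not None for pattern in patterns
        )

    # "Results" is approximated by scanning top-level README files for keywords.
    readme_text = ""
    for readme in root.glob("README*"):
        if readme.is_file():
            readme_text += readme.read_text(encoding="utf-8", errors="ignore").lower()
    report["results"] = any(word in readme_text for word in README_RESULT_KEYWORDS)
    return report


def completeness_score(report):
    """Count how many of the five checklist items are satisfied."""
    return sum(report.values())
```

Calling `assess_repository(".")` from the root of a project returns one boolean per checklist item, and `completeness_score` turns that report into a number between 0 and 5. A real assessment would need a human reading of the repository, since file names alone say nothing about whether the scripts actually reproduce the paper's numbers.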
This renewed interest in the replicability of results began when the organizers of NeurIPS 2019 introduced new policies into their paper submission guidelines, aiming to establish an ecosystem that encourages ML researchers to make their claimed results reproducible.
As part of the paper submission process, the new program contained three components:
- a code submission policy,
- a community-wide reproducibility challenge, and
- a Machine Learning Reproducibility checklist
These recommendations from Papers with Code are a follow-up to the Machine Learning Reproducibility Checklist, which was required as part of the NeurIPS 2019 paper submission process and was the focus of the conference’s inaugural Reproducibility Challenge.
One thing common to all top papers is the availability of complete code. The goal of the checklist mentioned above is to enhance reproducibility and to promote best practices and code repository assessments, so that future work need not be built from scratch every time.
Establishing An Effective ML Ecosystem
Figure: Reproducibility of papers at NeurIPS 2019 (via FAIR)
The above plots compare the reproducibility of papers at NeurIPS 2019. Approximately 75% of accepted papers at NeurIPS 2019 included code, compared with 50% the previous year. There were 173 papers submitted as part of the challenge, a 92% increase over the number submitted for a similar challenge at ICLR 2019. These results support the idea of promoting reproducibility in the ML community, and they suggest that the recommendations of Papers with Code are well placed.
For the last couple of years, Papers with Code has been presenting the community with a curated list of papers that have code and beat the benchmark. It is a free, community-driven resource, and it has recently joined Facebook AI.
There is little doubt now that reproducibility is an essential characteristic of any scientific community. In machine learning, however, achieving it is not straightforward because of the black-box nature of the models that produce the results. Added to this is the overwhelming hype around AI, which can nudge researchers into inflating their results for personal reasons.
While the efforts to establish reproducibility have gained traction, PyTorch creator Soumith Chintala has urged the community to go one step further and introduce initiatives that would incentivize researchers to add understandable code to their papers.
That said, there have also been instances of papers being rejected inexplicably, which have left researchers fuming.
So, while the conferences have issued best-practice recommendations for researchers, there is also a need for policies that guard against inconsistencies among reviewers.