
Why Is Recreating A ML Research Paper So Difficult & How To Do It Effectively

If you have ever tried to replicate a state-of-the-art (or any reasonably complex) machine learning paper, you have probably run into package and library issues, version mismatches, hardware constraints and many other challenges, suggesting that reproducibility in ML is a serious problem.

Joelle Pineau, an ML researcher, brought the whole community’s attention to reproducibility. In an interview published by Nature last year, Pineau discussed these issues in detail.

A reproducibility program was even introduced at NeurIPS 2019, which asked researchers to consider the following:

  • a code submission policy, 
  • a community-wide reproducibility challenge, and
  • a Machine Learning Reproducibility checklist

Recently, computer scientist Grigori Fursin published a checklist of points researchers should keep in mind if they care about reproducibility.

In a recent talk, Fursin shared his experience of reproducing 150+ systems and ML papers during artifact evaluation at ASPLOS, MLSys, CGO, PPoPP and Supercomputing. “Our long-term goal is to help researchers share their new ML techniques as production-ready packages along with published papers and participate in collaborative and reproducible benchmarking, co-design and comparison of efficient ML/software/hardware stacks,” he said.

About Artifact

To make reviewers’ work easier, how the artifact can be accessed should be stated clearly, for example:

  • Whether the repository should be cloned from GitHub, GitLab, BitBucket or a similar service
  • Whether a package should be downloaded from a public or private website
  • Whether access is provided via a private machine with pre-installed software, when rare hardware or proprietary software is required

Fursin also advises stating the approximate disk space required after unpacking the artifact, and avoiding adding unnecessary software packages to VM images.
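As a rough illustration (not part of Fursin’s checklist itself), a small Python script like the one below could compute the unpacked size of an artifact directory so the figure can be quoted in the README; the "artifact/" path is a placeholder.

import os

def dir_size_mb(path: str) -> float:
    """Walk a directory tree and return its total size in megabytes."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fpath = os.path.join(root, name)
            if os.path.isfile(fpath):
                total += os.path.getsize(fpath)
    return total / (1024 * 1024)

if __name__ == "__main__":
    # "artifact/" is a placeholder for the unpacked artifact directory
    print(f"Unpacked artifact size: {dir_size_mb('artifact/'):.1f} MB")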

Dependencies

Changing Anything Changes Everything, or the CACE principle, is one of the heuristics for managing ML software: every change made to a pipeline can affect its behaviour elsewhere. Researchers should therefore describe any specific hardware and software features required to evaluate the artifact, such as vendor, CPU/GPU/FPGA, number of processors/cores, interconnect, memory, hardware counters, OS and software packages.

“This is particularly important if you share your source code and it must be compiled or if you rely on some proprietary software that you can not include to your package. In such a case, we strongly suggest you describe how to obtain and to install all third-party software, data sets and models,” wrote Fursin.
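One way to make these dependencies explicit, sketched here under the assumption that a simple JSON file shipped with the artifact is acceptable, is to record the OS, Python version, CPU count and key package versions automatically; the field names, the package list and the "environment.json" filename are illustrative choices, not a prescribed format.

import json
import os
import platform
import sys
from importlib import metadata

def capture_environment(packages=("numpy", "torch")):
    """Collect basic OS, CPU and package-version information for the artifact README."""
    env = {
        "os": platform.platform(),
        "python": sys.version,
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),
        "packages": {},
    }
    for pkg in packages:
        try:
            env["packages"][pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            env["packages"][pkg] = "not installed"
    return env

if __name__ == "__main__":
    # Write the captured environment next to the artifact sources
    with open("environment.json", "w") as f:
        json.dump(capture_environment(), f, indent=2)
    print("Wrote environment.json")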

Datasets, Models And Installation

If the datasets are large or proprietary, it is advisable to explain how to download them. If a dataset is proprietary, reviewers should be given a public alternative subset for evaluation. The same goes for models: if third-party models are not included in the package (for example, because they are very large or proprietary), provide details on how to download and install them, and describe the setup procedure for the artifact.
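A minimal sketch of this fallback idea, assuming hypothetical URLs for the proprietary dataset and the public subset, might look like the following.

import urllib.error
import urllib.request

# Placeholder URLs, not real resources
FULL_DATASET_URL = "https://example.org/private/full_dataset.tar.gz"   # proprietary, may be inaccessible
PUBLIC_SUBSET_URL = "https://example.org/public/subset.tar.gz"         # open alternative for reviewers

def fetch(url: str, dest: str) -> bool:
    """Try to download url to dest; return True on success, False on any network error."""
    try:
        urllib.request.urlretrieve(url, dest)
        return True
    except urllib.error.URLError:
        return False

if __name__ == "__main__":
    if fetch(FULL_DATASET_URL, "dataset.tar.gz"):
        print("Downloaded full dataset")
    elif fetch(PUBLIC_SUBSET_URL, "dataset.tar.gz"):
        print("Full dataset unavailable; using public subset")
    else:
        print("Could not download any dataset; see README for manual steps")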

Experiment Workflow And Evaluation


Describe the experimental workflow and how it is implemented, invoked and customised (if needed), e.g. via OS scripts, an IPython/Jupyter notebook, a portable CK workflow, etc. Also describe all the steps necessary to evaluate the artifact using this workflow, along with the expected results and the maximum allowable variation of empirical results (particularly important for performance numbers and speed-ups).
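To make the “maximum allowable variation” concrete, here is a hedged sketch that compares a measured speed-up against the value reported in a paper; the run_benchmark() stub and all the numbers are placeholders for the artifact’s real evaluation.

def run_benchmark() -> float:
    """Placeholder for the artifact's real benchmark; returns a speed-up figure."""
    return 1.92  # pretend measurement

EXPECTED_SPEEDUP = 2.0      # value reported in the paper (illustrative)
ALLOWED_VARIATION = 0.15    # maximum acceptable relative deviation (illustrative)

if __name__ == "__main__":
    measured = run_benchmark()
    deviation = abs(measured - EXPECTED_SPEEDUP) / EXPECTED_SPEEDUP
    if deviation <= ALLOWED_VARIATION:
        print(f"OK: {measured:.2f}x is within {ALLOWED_VARIATION:.0%} of {EXPECTED_SPEEDUP:.2f}x")
    else:
        print(f"FAIL: {measured:.2f}x deviates by {deviation:.0%} (limit {ALLOWED_VARIATION:.0%})")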

Experiment Customisation

Customisation is optional, but it is far from unimportant. If possible, describe how to customise the workflow, i.e. whether it is possible to use different datasets, benchmarks, real applications, predictive models, software environments (compilers, libraries, run-time systems), hardware, etc. Also describe whether the workflow can be parameterised (wherever applicable, e.g. changing the number of threads, applying different optimisations, CPU/GPU frequency, autotuning scenario, model topology, etc.).
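As an illustration of such parameterisation, the following sketch exposes the dataset, thread count, model topology and autotuning option as command-line flags; the flag names and defaults are assumptions made for the example, not part of any particular artifact.

import argparse

def parse_args():
    """Expose the main workflow parameters as command-line flags."""
    parser = argparse.ArgumentParser(description="Customisable experiment workflow")
    parser.add_argument("--dataset", default="public-subset",
                        help="which dataset to evaluate on")
    parser.add_argument("--threads", type=int, default=4,
                        help="number of CPU threads to use")
    parser.add_argument("--model", default="resnet18",
                        help="model topology to benchmark")
    parser.add_argument("--autotune", action="store_true",
                        help="enable the autotuning scenario")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(f"Running {args.model} on {args.dataset} with {args.threads} threads "
          f"(autotune={args.autotune})")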
