The AI/ML community has been grappling with the replication crisis for a while now. A replication crisis occurs when scientific studies are difficult or impossible to reproduce. ML researchers often compare their work against benchmark research previously published by others in the field. However, issues arise because, in many cases, the source code of the benchmark research isn’t published.
“In my opinion, the ‘ML Replication Crisis’ goes against the ethical principles research was founded upon (i.e., reliability, validity, trustworthiness, replicability. . .), but is also a reflection of what research has become,” said Chantel Perry, Senior Data Scientist at Microsoft.
Earlier in 2022, researchers from Princeton University published a paper titled ‘Leakage and the Reproducibility Crisis in ML-based Science’. The study notes that research can be termed reproducible only if the code and data used by the researchers are made available.
In 2020, Google Health published a paper in Nature that described how AI was leveraged to look for signs of breast cancer in medical images. However, as noteworthy as the innovation was, Google was criticised for providing little to no information about its code or how it was tested.
The replication crisis is not only a matter of researchers finding earlier work hard to replicate. Chantel Perry believes it may have several causes, including:
- proprietary reasons,
- limitations in project scope,
- lack of a third-party review process,
- lack of rigour when utilising existing ML frameworks,
- pressure to publish perfect studies rather than listing the details of a study (i.e., limitations, biases, recommendations and future studies),
- lack of a rigorous editorial and peer review process, among other reasons.
Perry further adds that technology research is no longer just a medium to share knowledge, findings, and mistakes to help other researchers and improve society.
“In some cases, technology research is a source of profit for organisations, which isn’t necessarily a problem, except for the fact that this could put pressure on researchers to move quickly and result in more errors. I think the ‘ML Replication Crisis’ is just a symbol of how technology research is no longer a primary tool for sharing innovation and societal improvement but is being prioritised more for capital gain.”—Chantel Perry, Senior Data Scientist – Microsoft.
The blame game
Several recent developments in AI and ML have come from large enterprises like Google, Microsoft, and Meta. The resources that researchers in these organisations have at their disposal are profoundly abundant. In contrast, researchers from a university might not have access to the same resources.
Additionally, these organisations often don’t open source the code for their algorithms. For example, DALL-E 2, the text-to-image generator from OpenAI (a company in which Microsoft invested nearly USD 1 billion), has been a revelation, but it’s not open source.
Earlier in 2022, a Reddit user published a post titled, ‘I don’t really trust papers out of “Top Labs” anymore’. In the post, the user asked: ‘Why should the AI community trust these papers published by a handful of corporations and the occasional universities? Why should I trust that your ideas are even any good? I can’t check them; I can’t apply them to my own projects.’
Responding to the post, Jathan Sadowski, a senior fellow at Emerging Tech Lab, said, “AI/ML research at places like Google and OpenAI is based on spending absurd amounts of money, compute, and electricity to brute force arbitrary improvements. The inequality, the trade-offs, the waste—all for incremental progress toward a bad future.”
Benjamin Haibe-Kains, one of the researchers who publicly criticised the Google Health study, described the research announcement as an advertisement for cool technology with no practical use.
However, in recent years, these large organisations have addressed several community concerns. They are adopting a more open approach, sharing the code for some of the algorithms they have developed.
Chantel Perry also believes these large corporations are not solely the issue with the replication crisis. “I’ve read a number of papers which lack any limitations, biases, or future recommendations and improvements sections in the ML space.”
Perry concluded, “One could blame the corporation, researcher, editor, journal publication, other researchers citing non-replicable studies, or even readers who don’t challenge the studies. Regardless, research projects are just like any other project with a limited scope of time, money, and resources.”
How to overcome the crisis?
The researchers from Princeton University believe that data leakage often leads to severe reproducibility failures.
“Data leakage has long been recognised as a leading cause of errors in ML applications. Through a survey of literature in research communities that adopted ML methods, we find 17 fields where errors have been found, collectively affecting 329 papers and in some cases leading to wildly overoptimistic conclusions,” stated the research paper.
The researchers believe that fundamental methodological changes to ML-based science could help prevent leakage before publication.
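To make the idea of leakage concrete, here is a minimal sketch of one commonly cited form: computing preprocessing statistics on the full dataset, test rows included, before evaluating. It is an illustration only and is not taken from the Princeton paper. The toy data and the helper names `split_train_test`, `fit_scaler` and `apply_scaler` are hypothetical.

```python
import random
import statistics


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle with a fixed seed and split, so the split itself is repeatable."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(values):
    """Return (mean, std) computed from the given values only."""
    return statistics.mean(values), statistics.pstdev(values)


def apply_scaler(values, mean, std):
    """Standardise values using statistics fitted elsewhere."""
    return [(v - mean) / std for v in values]


# Toy data (hypothetical): the numbers 1.0 to 20.0.
data = [float(x) for x in range(1, 21)]
train, test = split_train_test(data)

# Leaky: statistics are computed on every row, so information about
# the test rows influences the preprocessing step.
leaky_mean, leaky_std = fit_scaler(data)

# Leak-free: statistics come from the training rows only and are then
# reused, unchanged, on the test rows.
clean_mean, clean_std = fit_scaler(train)
train_scaled = apply_scaler(train, clean_mean, clean_std)
test_scaled = apply_scaler(test, clean_mean, clean_std)

print("mean fitted on all rows:", leaky_mean)
print("mean fitted on training rows only:", clean_mean)
```

The fixed seed matters as much as the scaler: without it, another researcher rerunning the script would get a different split and could not reproduce the reported numbers.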
Often, replication also fails because researchers do not have the necessary hardware or compute power. Resolving this would go a long way in overcoming the replication crisis in this field.
BigScience, a collaboration of over 1,000 independent, academic, and industry researchers, is working towards solving this problem.
“BigScience aims to make meaningful progress toward solving complex issues and create tools and processes to help a greater diversity of participants,” said Giada Pistilli, Ethicist at Hugging Face. Researchers at Hugging Face have also released ‘BLOOM’, an open-source large language model. To help solve the hardware problem, the team will also publish smaller, less hardware-intensive versions. In addition, a distributed system would be created to allow labs to share the model across their servers.
Further, Hugging Face will release a web application that will enable anyone to query BLOOM without downloading.
Chantel Perry emphasises that a few solutions already established as research practices could help solve the reproducibility issue:
“For instance, discussing time constraints, drawbacks of the study, challenging our own findings and following frameworks to minimise error, discussing safeguards for reliability and validity of results, and/or providing more supplemental resources are not new to technology research.”
However, she also notes that these processes take time and would slow down the research and publication process considerably.
Further, she states that it might be helpful to enlist a third-party expert entity that reviews, replicates, and identifies risks within the study. Perry also advocates a system for flagging inaccurate studies, to help researchers identify which studies have issues, the nature of those issues, and the provisions for amending a study once such issues are identified.
Lastly, Chantel Perry also believes that researchers should be required to provide supplemental resources to prove the replicability of published studies.
In this context, the Conference and Workshop on Neural Information Processing Systems (NeurIPS) has begun mandating that authors produce a ‘reproducibility checklist’ along with their submissions. This checklist consists of information such as the number of models trained, computing power used, and links to code and datasets.
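As a rough illustration, the kinds of items mentioned above could be tracked as a small structured record that reports which entries are still missing. This is a sketch only, not the official NeurIPS checklist. The class name, field names and example values are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ReproducibilityChecklist:
    """Hypothetical record of the items a submission might need to report."""

    models_trained: Optional[int] = None
    compute_description: Optional[str] = None
    code_url: Optional[str] = None
    dataset_url: Optional[str] = None

    def missing_items(self) -> List[str]:
        """Return the names of fields that are still unset or empty."""
        return [
            f.name
            for f in fields(self)
            if getattr(self, f.name) in (None, "")
        ]


# Example values are made up for illustration.
draft = ReproducibilityChecklist(
    models_trained=12,
    compute_description="8 GPUs for 3 days",
)
print(draft.missing_items())  # prints ['code_url', 'dataset_url']
```

A check like this could run before submission, so that missing links to code or data are caught by the authors rather than by reviewers.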
Likewise, another initiative, the ‘Papers with Code’ project, was founded with a mission to create a free and open resource with ML papers, code, and evaluation tables.