Lack Of Transparency & Replicability Is Harming Research In AI


Just a month back, more than a dozen researchers from around the world wrote a scathing article voicing their concerns about the lack of transparency and replicability of AI code and research. The article referred, in particular, to a Google Health study by McKinney et al. that claimed an AI system could beat human radiologists in robustness and speed in breast cancer screening. The research described successful trials of AI in breast cancer detection but, according to the critics, the team provided so little information about the code and the study in general that its scientific value was undermined.

This is not the first time that the issue of transparency and replication has been raised. The scientific and research community strongly believes that withholding important aspects of studies, especially in domains where the larger public good and societal well-being are concerned, does a great disservice.

Grave Issues With Transparency & Replicability

In the article, published in Nature, the researchers noted that the scientific progress of a community depends on the following factors:

  • The ability to scrutinise research results
  • The ability to reproduce the main results of a study using its materials
  • The ability to enhance or build upon the existing research

The authors argued that the study by McKinney et al. lacked critical information on data processing and training pipelines and suffered from a lack of access to the data from which the model was derived.

Regarding the absence of key details and sufficient description of the research, the authors wrote, “For scientific efforts where a clinical application is envisioned, and human lives would be at stake, we argue that the bar of transparency should be set even higher. If data cannot be shared with the entire scientific community, because of licensing or other insurmountable issues, at a minimum a mechanism should be set so that some highly-trained, independent investigators can access the data and verify the analyses. This would allow a truly adequate peer-review of the study and its evidence before moving into clinical implementation.”

In August this year, similar criticism was directed at OpenAI for locking away the GPT-3 algorithm. Incidentally, in the past, the company has released its algorithms to the public, including GPT-2. In its defence, the company says that GPT-3, which has 175 billion parameters (as opposed to the 1.5 billion parameters of GPT-2), is ‘too large’ for most people to run. The GPT-3 algorithm has therefore been put behind a paywall, which allows OpenAI to monetise the research. Currently, Microsoft holds the licence for the exclusive use of GPT-3. While third-party users can still use the public API to receive output, the rights to the source code belong exclusively to Microsoft.

Closely related to this topic, a recent study, published in a paper titled ‘The De-democratisation of AI: Deep Learning and the Compute Divide in Artificial Intelligence Research’, found that the concentration of AI research resources and datasets in a few elite hands is leading to problems with the bias and fairness of the technology.

Possible Solutions

Fortunately, there have been a few pathways to tackling this problem in research. Joelle Pineau, a computer science professor at McGill, is a strong advocate for the reproducibility of AI research. It was due to her endeavour that the premier artificial intelligence conference NeurIPS now asks authors to submit a ‘reproducibility checklist’ along with their papers. This checklist covers information such as the number of models trained, the computing power used, and links to code and datasets.
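To make the idea concrete, here is a minimal sketch of how such a checklist could be validated programmatically. The field names below are illustrative assumptions, not the official NeurIPS checklist items:

```python
# Hypothetical reproducibility checklist: field names are illustrative
# and do not reproduce the official NeurIPS checklist.
REQUIRED_FIELDS = [
    "num_models_trained",
    "compute_used",
    "code_url",
    "dataset_url",
]

def missing_fields(submission: dict) -> list:
    """Return the checklist fields the submission has left empty or absent."""
    return [f for f in REQUIRED_FIELDS if not submission.get(f)]

submission = {
    "num_models_trained": 5,
    "compute_used": "8x V100 GPUs, ~200 GPU-hours",
    "code_url": "https://github.com/example/repo",  # placeholder URL
    "dataset_url": "",  # not disclosed
}

print(missing_fields(submission))  # -> ['dataset_url']
```

A check like this could flag undisclosed items (here, the dataset link) before a paper is accepted, which is precisely the kind of gap the critics of the McKinney et al. study pointed to.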

Another initiative, the Papers with Code project, was started with a mission to create a free and open resource of machine learning papers, code and evaluation tables. It has collaborated with the popular preprint server arXiv: under this collaboration, papers on arXiv come with a Papers with Code section that links to any code the authors choose to make available.

Wrapping Up

As discussed above, transparency, replicability and reproducibility in research are not new issues. In fact, in a 2016 Nature survey of 1,576 scientists, 52% said that there is a ‘significant’ reproducibility crisis. One solution, apart from the steps mentioned above, could be incentivising the sharing of datasets and research code; this could push more researchers to be transparent with their research for the larger good.


Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at
