Lack Of Transparency & Replicability Is Harming Research In AI

Just a month back, more than a dozen researchers from around the world wrote a scathing article voicing their concerns about the lack of transparency and replicability in AI code and research. The article referred in particular to a Google Health study by McKinney et al., which claimed that an AI system could outperform human radiologists in robustness and speed in breast cancer screening. The study described successful trials of AI in breast cancer detection, but according to the critics, the team provided so little information about the code and the research in general that it undermined the study's scientific value.

This is not the first time that the issue of transparency and replication has been raised. The scientific and research community strongly believes that withholding important aspects of studies, especially in domains where the larger public good and societal well-being are concerned, does a great disservice.


Grave Issues With Transparency & Replicability

In the article, which was published in the journal Nature, the researchers noted that the scientific progress of a community depends on the following factors:

  • The ability to scrutinise research results
  • The ability to reproduce the main results of a study using its materials
  • The ability to enhance or build upon the existing research

The authors argued that the study by McKinney et al. lacked critical information on data processing and training pipelines, and that the data from which the model was derived was not accessible.

On the absence of key details and a sufficient description of the research, the authors wrote, “For scientific efforts where a clinical application is envisioned, and human lives would be at stake, we argue that the bar of transparency should be set even higher. If data cannot be shared with the entire scientific community, because of licensing or other insurmountable issues, at a minimum a mechanism should be set so that some highly-trained, independent investigators can access the data and verify the analyses. This would allow a truly adequate peer-review of the study and its evidence before moving into clinical implementation.”

In August this year, similar criticism was directed at OpenAI for locking away the GPT-3 model. In the past, the company has released its models to the public, including GPT-2. In its defence, the company says that GPT-3, which has 175 billion parameters (as opposed to the 1.5 billion parameters of GPT-2), is ‘too large’ for most people to run. GPT-3 has therefore been put behind a paywall, which allows OpenAI to monetise the research. Currently, Microsoft holds the license for the exclusive use of GPT-3. Third-party users can still use the public API to receive output, but the rights to the source code belong exclusively to Microsoft.

Closely related to this topic is a recent paper titled ‘The De-democratisation of AI: Deep Learning and the Compute Divide in Artificial Intelligence Research’, which found that the concentration of AI research resources and datasets in a few elite hands is leading to problems with the bias and fairness of the technology.

Possible Solutions

Fortunately, there have been a few efforts to address this problem in research. Joelle Pineau, a computer science professor at McGill, is a strong advocate for the reproducibility of AI research. It was due to her efforts that the premier artificial intelligence conference NeurIPS now asks authors to submit a ‘reproducibility checklist’ along with their papers. This checklist asks for information such as the number of models trained, the computing power used, and links to code and datasets.

Another initiative, the Papers with Code project, was started with a mission to create a free and open resource of machine learning papers, code and evaluation tables. It has collaborated with the popular preprint server arXiv. Under this collaboration, papers on arXiv come with a Papers with Code section that provides links to the code that the author wants to make available.

Wrapping Up

As discussed above, transparency, replicability and reproducibility in research are not new issues. In fact, in a 2016 survey by the journal Nature, 52% of the 1,576 scientists surveyed said that the reproducibility crisis is ‘significant’. One solution, apart from the steps mentioned above, could be to incentivise the sharing of datasets and research code; this could push more researchers to be transparent with their research for the larger good.

Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies.
