Lack Of Transparency & Replicability Is Harming Research In AI


Just a month back, more than a dozen researchers from around the world wrote a scathing article voicing their concerns about the lack of transparency and replicability in AI code and research. The article referred in particular to a Google Health study by McKinney et al., which claimed that an AI system could outperform human radiologists in both robustness and speed in breast cancer screening. The research described successful trials of AI in breast cancer detection, but according to the critics, the team provided so little information about the code and the research in general that it undermined its scientific value.

This is not the first time that the issue of transparency and replication has been raised. The scientific and research community strongly believes that withholding important aspects of studies, especially in domains where the larger public good and societal well-being are concerned, does a great disservice.

Grave Issues With Transparency & Replicability

In the article, which was published in the journal Nature, the researchers noted that a community's scientific progress depends on the following factors:

  • The ability to scrutinise research results
  • The ability to reproduce the main results of a study using its materials
  • The ability to enhance or build upon existing research

The authors argued that the study by McKinney et al. lacked critical information on the data processing and training pipelines, and that they had no access to the data from which the model was derived.

Regarding the absence of key details and a sufficient description of the research, the authors wrote, “For scientific efforts where a clinical application is envisioned, and human lives would be at stake, we argue that the bar of transparency should be set even higher. If data cannot be shared with the entire scientific community, because of licensing or other insurmountable issues, at a minimum a mechanism should be set so that some highly-trained, independent investigators can access the data and verify the analyses. This would allow a truly adequate peer-review of the study and its evidence before moving into clinical implementation.”

In August this year, similar criticism was directed at OpenAI for locking away the GPT-3 algorithm. Incidentally, the company has released its algorithms to the public in the past, including GPT-2. In its defence, the company says that GPT-3, which has 175 billion parameters (as opposed to the 1.5 billion parameters of GPT-2), is ‘too large’ for most people to run. Hence, the GPT-3 algorithm has now been put behind a paywall, which allows OpenAI to monetise the research. Currently, Microsoft holds the licence for the exclusive use of GPT-3. While third-party users can still use the public API to receive outputs, the rights to the source code rest exclusively with Microsoft.
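For context, this is roughly what that access looks like in practice. Below is a minimal sketch of querying the hosted model through the public API, assuming the original openai Python SDK's Completion interface; the API key and prompt are placeholders. The point is that callers only ever receive generated text, while the model weights and source code stay on OpenAI's servers.

```python
import openai

# Hypothetical key; third parties obtain access through OpenAI's sign-up process.
openai.api_key = "YOUR_API_KEY"

# Send a prompt to the hosted model. Only the generated continuation is
# returned; the 175-billion-parameter model itself is never exposed.
response = openai.Completion.create(
    engine="davinci",  # the largest GPT-3 engine exposed by the API
    prompt="AI research should be transparent because",
    max_tokens=50,
)

print(response["choices"][0]["text"])
```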

Closely related to this topic, a recent study, ‘The De-democratisation of AI: Deep Learning and the Compute Divide in Artificial Intelligence Research’, found that the concentration of compute resources and datasets for AI research in a few elite hands is leading to problems with the bias and fairness of the technology.

Possible Solutions

Fortunately, there have been a few pathways to tackling this problem in research. Joelle Pineau, a computer science professor at McGill University, is a strong advocate for the reproducibility of AI research. It was due to her endeavour that the premier artificial intelligence conference NeurIPS now asks authors to submit a ‘reproducibility checklist’ along with their papers. This checklist covers information such as the number of models trained, the computing power used, and links to code and datasets.
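As an illustration, here is a minimal sketch of the kind of record such a checklist amounts to, written out programmatically alongside a training run. Every field name and value below is hypothetical, not NeurIPS's actual format.

```python
import json

# Illustrative record of the details a reproducibility checklist asks for:
# how many models were trained, what compute was used, and where the code
# and data can be found. All field names and values are hypothetical.
checklist = {
    "models_trained": 5,  # total runs, including discarded seeds
    "random_seeds": [0, 1, 2, 3, 4],
    "compute": "8x V100 GPUs, ~72 hours total",
    "code_url": "https://github.com/example/project",  # placeholder
    "dataset_url": "https://example.com/dataset",      # placeholder
    "hyperparameters": {"learning_rate": 3e-4, "batch_size": 64},
}

# Ship the record with the paper submission or the code release.
with open("reproducibility_checklist.json", "w") as f:
    json.dump(checklist, f, indent=2)
```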

There is another initiative, the Papers with Code project, which was started with a mission to create a free and open resource of machine learning papers, code and evaluation tables. It has collaborated with the popular preprint server arXiv: under this collaboration, papers published on arXiv come with a Papers with Code section that links to any code the authors want to make available.

Wrapping Up

As discussed above, transparency, replicability and reproducibility in research are not new issues. In fact, in a 2016 Nature survey of 1,576 scientists, 52% agreed that there is a ‘significant’ reproducibility crisis. One solution, apart from the steps mentioned above, could be incentivising the sharing of datasets and research code; this could push more researchers to be more transparent with their research for the larger good.


