Just a month back, more than a dozen researchers from around the world wrote a scathing article voicing their concerns about lack of transparency and replicability of AI code and research. This article, in particular, referred to a Google Health study by McKinney et al. that claimed that an AI system could beat human radiologists in achieving higher robustness and speed in breast cancer screening. This research described successful trials of AI in breast cancer detection, but as per the critics, the team provided such little information about the code and the research in general that it undermined its scientific value.
This is not the first time that the issue of transparency and replication has been raised. The scientific and research community strongly believes that withholding important aspects of studies, especially in domains where larger public good and societal well-being is concerned does a great disservice.
Grave Issues With Transparency & Replicability
In the article that was published in the Nature journal, the researchers noted that the scientific progress of a community depends on the following factors:
- The ability to scrutinise research results
- The ability to reproduce the main results of a study using its materials
- Enhance or build upon the existing research.
The authors argued that the study by McKinney et al. lacked critical information on data processing and training pipelines and suffered from a lack of access to data from which the model was derived from.
The authors of the post wrote regarding the absence of key details and sufficient description of the research, “For scientific efforts where a clinical application is envisioned, and human lives would be at stake, we argue that the bar of transparency should be set even higher. If data cannot be shared with the entire scientific community, because of licensing or other insurmountable issues, at a minimum a mechanism should be set so that some highly-trained, independent investigators can access the data and verify the analyses. This would allow a truly adequate peer-review of the study and its evidence before moving into clinical implementation.”
In August this year, similar criticism was directed at OpenAI, for locking away the GPT-3 algorithm. Incidentally, in the past, the company has released its algorithm to the public, including GPT-2. In its defence, the company says that GPT-3, which has 175 billion parameters (as opposed to the 1.5 million parameters of GPT-2), is ‘too large’ for most people to run. Hence, now the GPT-3 algorithm is put behind a paywall which allows OpenAI to monetise the research. Currently, Microsoft holds the license for the exclusive use of GPT-3. Even as third party users can still use the public API to receive output, the rights to source code belong exclusively with Microsoft.
Closely related to this topic is the latest study by the researchers published in a paper titled, ‘The De-democratisation of AI: Deep Learning and the Compute Divide in Artificial Intelligence Research’, found that concentration of resources and datasets related to AI research in a few elite hands is leading to problems with bias and fairness of technology.
Fortunately, there have been a few pathways to breaking this problem in research. Joelle Pineau, a computer science professor at McGill, is a strong advocate for reproducibility of AI research. It was due to her endeavour that premium artificial intelligence conference NeurIPS now asks authors/researchers to produce ‘reproducibility checklist’ along with their submissions. This checklist consists of information such as the number of models trained, computing power used, and links to code and datasets.
There is another initiative called the Papers with Code project that was started with a mission to create a free and open-source with machine learning papers, code and evaluation tables. It has collaborated with the popular preprint server, arXiv. Under this collaboration, all papers published with arXiv come with a Papers with Code section that provides links to the code that the author wants to make available.
As discussed above, transparency, replicability and reproducibility in research is not a new issue. In fact, as 2016 The Nature Journal survey, of the 1576 scientists interviewed, 52% admitted that the reproducibility crisis is ‘significant’. One of the solutions, apart from the steps mentioned above, could be incentivising sharing of datasets and research code; this could push more researchers to be more transparent with their research for the larger good.