The role of scientific research in pushing the frontiers of artificial intelligence cannot be overstated. The researchers working at MIT’s Computer Science and Artificial Intelligence Laboratory, Stanford Artificial Intelligence Laboratory, Oxford University and many other top labs are shaping the future of humanity. In addition, most top AI labs, even the private players such as DeepMind and OpenAI, publish on preprint servers to democratise and share knowledge.
But how useful are these papers for the community at large?
Are top AI labs trustworthy?
Recently, a Reddit user published a post titled ‘I don’t really trust papers out of “Top Labs” anymore’. In the post, the user asked: Why should the AI community trust papers published by a handful of corporations and the occasional university? Why should I trust that your ideas are even any good? I can’t check them; I can’t apply them to my own projects.
Citing the research paper titled ‘An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems’, the Reddit user said, “It’s 18 pages of talking through this pretty convoluted evolutionary and multitask learning algorithm; it’s pretty interesting, solves a bunch of problems. But two notes. One, the big number they cite as the success metric is 99.43 on CIFAR-10, against a SotA of 99.40.”
The Reddit user also referred to a chart towards the end of the paper that details how many TPU core-hours were used for just the training regimens that resulted in the final results.
“The total is 17,810 core-hours. Let’s assume that for someone who doesn’t work at Google, you’d have to use on-demand pricing of USD3.22 per hour. This means that these trained models cost USD57,348.
“Strictly speaking, throwing enough compute at a general enough genetic algorithm will eventually produce arbitrarily good performance,” he said. The paper, he added, is still worth reading for its ideas on using genetic algorithms for multitask learning, where each new task leverages the learned weights of previous tasks by defining modifications to a subset of components of a pre-existing model, but hardly anyone outside Google can afford to verify or build on the result.
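To make that idea concrete, here is a deliberately tiny toy sketch of the scheme the user describes; it is a hypothetical illustration in plain NumPy, not the paper’s actual algorithm. A “model” is just a dictionary of named weight blocks, and each new task is tackled by mutating a small subset of blocks in copies of the best model evolved so far and keeping the fittest copy.

```python
import numpy as np

rng = np.random.default_rng(0)

# A "model" is just a dict of named weight blocks (stand-ins for network components).
parent = {f"block_{i}": rng.normal(size=8) for i in range(6)}

def fitness(model, task_seed):
    # Toy stand-in for task accuracy: closeness of the flattened weights to a
    # task-specific random target (higher is better).
    flat = np.concatenate([model[k] for k in sorted(model)])
    target = np.random.default_rng(task_seed).normal(size=flat.shape)
    return -float(np.linalg.norm(flat - target))

def mutate(model, n_blocks=2, scale=0.3):
    # Copy the parent and perturb only a small subset of its components,
    # mirroring "define modifications to a subset of a pre-existing model".
    child = {name: w.copy() for name, w in model.items()}
    for name in rng.choice(sorted(child), size=n_blocks, replace=False):
        child[name] = child[name] + rng.normal(scale=scale, size=child[name].shape)
    return child

# Introduce tasks one after another; each new task starts from the weights
# evolved for the previous ones instead of training from scratch.
for task_seed in range(3):
    for _ in range(50):  # a few elitist (1 + 15) evolution steps per task
        population = [parent] + [mutate(parent) for _ in range(15)]
        parent = max(population, key=lambda m: fitness(m, task_seed))
    print(f"task {task_seed}: fitness {fitness(parent, task_seed):.2f}")
```

Even in this toy, the main lever for better fitness is simply running more mutation-and-selection rounds, which is the compute-heavy property the Reddit user is objecting to.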
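The cost figure quoted above is easy to sanity-check; note that the USD 3.22 rate is the Reddit user’s assumption about on-demand pricing, not an official quote.

```python
# Back-of-the-envelope check of the figure quoted above; the USD 3.22 rate is
# the Reddit user's assumed on-demand price per TPU core-hour.
tpu_core_hours = 17_810
usd_per_core_hour = 3.22
print(f"Estimated training cost: USD {tpu_core_hours * usd_per_core_hour:,.0f}")  # USD 57,348
```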
Jathan Sadowski, a senior fellow at Emerging Tech Lab, responded: “AI/ML research at places like Google and OpenAI is based on spending absurd amounts of money, compute, and electricity to brute force arbitrary improvements. The inequality, the trade-offs, the waste—all for incremental progress toward a bad future.”
The Reddit post sparked much debate on social media. Many suggested a new journal for papers whose results can be replicated in under eight hours on a single GPU.
Findings that can’t be replicated are intrinsically less reliable, and the fact that the ML community is maturing towards decent scientific practices instead of anecdotes is a positive sign, said Leon Derczynski, associate professor at the IT University of Copenhagen.
Replication crisis
A replication crisis occurs when scientific studies are difficult or impossible to reproduce. The crisis has gripped the wider scientific community for years, and the AI domain is grappling with it too, mostly because researchers often don’t share their source code.
According to a 2016 Nature survey, more than 70 percent of researchers have tried and failed to reproduce another scientist’s experiments. Further, more than 50 percent of them have failed to reproduce their own experiments.
Reproducibility is the basis of quality assurance in science as it enables past findings to be independently verified.
The research community broadly agrees that withholding important details of studies does science a great disservice, especially in domains where the public good and societal well-being are at stake.
According to the 2020 State of AI report, only 15 percent of AI studies share their code, and industry researchers are often the culprits. The report criticises OpenAI and DeepMind, two of the world’s best AI research labs, for not open-sourcing their code.
Hypothetical question. Some people have access to GPT-3 and others do not. What happens when we start seeing papers in which GPT-3 is used by non-OpenAI researchers to achieve SOTA results? Here’s the real problem, tho: is OpenAI picking research winners and losers?
— Mark Riedl (@mark_riedl) October 3, 2020
In 2020, Google Health published a paper in Nature describing how AI was used to look for signs of breast cancer in medical images. But Google drew flak for providing little information about its code and how it was tested. Many questioned the validity of the paper, and a group of 31 researchers published a response in Nature titled ‘Transparency and reproducibility in artificial intelligence’. Benjamin Haibe-Kains, one of its authors, called Google’s paper an advertisement for cool technology with no practical use.
However, things are changing. NeurIPS now asks authors to submit a ‘reproducibility checklist’ along with their papers, covering information such as the number of models trained, the computing power used, and links to code and datasets. Another initiative, the ‘Papers with Code’ project, was started with the mission of creating a free and open resource of ML papers, code and evaluation tables.
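For illustration, the kind of information such a checklist collects could be recorded as below; the field names and values are hypothetical and do not follow the official NeurIPS checklist schema.

```python
# Hypothetical example of the information a reproducibility checklist asks for;
# field names and values are illustrative, not the official NeurIPS checklist.
reproducibility_checklist = {
    "models_trained": 16,                          # every run, not just the best one
    "compute_used": "17,810 TPU core-hours",       # hardware type and total budget
    "code_url": "https://example.org/paper-code",  # placeholder link to the source code
    "dataset_urls": ["https://example.org/data"],  # placeholder links to datasets
    "random_seeds": [0, 1, 2],                     # seeds needed to rerun the experiments
}
```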