Why is ML research disconnected from reality?


In the last two years, a large number of machine learning research papers have been published on COVID-19. However, only a small fraction of them actually helped solve ground-level problems. One study analysed 232 algorithms developed for diagnosing the disease or predicting its spread and found that none was fit for clinical use; just two of the 232 tools were deemed promising enough for further testing.

This is a good indication of how much machine learning and AI research is disconnected from ground realities and the real challenges at hand. Christoph Molnar, data scientist and author of Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, made the same point in a recent tweet: “A lot of machine learning research has detached itself from solving real problems, and created their own ‘benchmark-islands’.”


He further added that some of these papers attain pioneer status and get published in ‘good journals’, making it easier for others to write follow-up papers. “It also means that the next generation does not have the burden to establish the necessity of the research topic,” he wrote. This compounds the existing problem.

Machine learning research

In 2012, NASA computer scientist Kiri Wagstaff wrote a paper titled ‘Machine Learning that Matters’, arguing that much of machine learning research had lost its connection to the problems of science and society. She wrote, “Many machine learning problems are phrased in terms of an objective function to be optimised. It is time for us to ask a question of larger scope: what is the field’s objective function? Do we seek to maximise performance on isolated data sets? Or can we characterise progress in a more meaningful way that measures the concrete impact of machine learning innovations?”

A lot of research work in machine learning and AI aims at coming up with a novel approach, tool or algorithm to push the frontier. As Molnar noted in his tweet, it takes just one ‘new’ paper with a novel approach; what follows is a barrage of papers that treat this approach as a benchmark and report marginal or incremental improvements.

Since newer papers make only minimal changes to the base ‘novel’ research, studies have found an increasing concentration on fewer datasets in most task communities, and the majority of these papers use datasets originally created for other tasks. Certain datasets become benchmarks, and the corresponding models come to be referred to as ‘state-of-the-art’. Critics like Molnar believe that, in the bargain, predictive performance becomes the sole measure of progress, even as the actual improvements grow smaller. As newer work branches away from the base study, the original problem is often forgotten. Molnar describes this as a bait-and-switch strategy applied in the ML research domain.

The problem compounds

Termed ‘flawed scholarship’, this practice may mislead readers, including students, journalists, and policy-makers, and further compromise the field’s intellectual foundations. As early as 1976, Drew McDermott, a former computer science professor at Yale University, said in the context of the AI community, “If we can’t criticise ourselves, someone else will save us the trouble”. This holds true even today.

AI pioneer Yoshua Bengio recently said that the research landscape has shifted to a ‘conference publication model’ over the last few years. Increased competition has led researchers to rush to put out their ideas first. He wrote that a PhD student today publishes at least 50 per cent more papers than one did 20-30 years ago.

The current system seems to incentivise incremental work and puts a lot of pressure on researchers to submit papers by deadlines. Machine learning is seeing such massive growth now because it was built on a large body of rigorous research; if this momentum is to continue, the community needs to inculcate clear scientific thinking and communication.

Shraddha Goled
I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.