
Why is ML research disconnected from reality?

"A lot of machine learning research has detached itself from solving real problems, and created their own "benchmark-islands"."



In the last two years, a large number of machine learning research papers have been published on COVID-19. However, only a very small fraction of them actually helped solve ground-level problems. A study analysed 232 algorithms developed for diagnosing the disease or predicting its spread and found that none of them was fit for clinical use; just two of the 232 tools were deemed promising enough for future testing.

This is a good indication of how much machine learning and AI research is disconnected from ground realities and the real challenges at hand. Christoph Molnar, data scientist and author of Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, made the same point in a recent tweet: "A lot of machine learning research has detached itself from solving real problems, and created their own 'benchmark-islands'."

He further added that some of these papers attain pioneer status and get published in the 'good journals', making it easier for others to write follow-up papers. "It also means that the next generation does not have the burden to establish the necessity of the research topic," he wrote. This compounds the existing problem.

Machine learning research

In 2012, NASA computer scientist Kiri Wagstaff wrote a paper titled 'Machine Learning that Matters'. She argued that much of the current machine learning research had lost its connection to the problems of science and society: "Many machine learning problems are phrased in terms of an objective function to be optimised. It is time for us to ask a question of larger scope: what is the field's objective function? Do we seek to maximise performance on isolated data sets? Or can we characterise progress in a more meaningful way that measures the concrete impact of machine learning innovations?"

A lot of research work in machine learning and AI is about coming up with a novel approach, tool or algorithm to push the frontier. As Molnar noted in his tweet, it takes just one 'new' paper with a novel approach; what follows is a barrage of papers that take this approach as a benchmark and report marginal or incremental improvements.

Since the newer papers make only minimal changes to the base 'novel' research, studies have found an increasing concentration on fewer datasets in most task communities, and a majority of these papers use datasets that were originally created for other tasks. Certain datasets become benchmarks, and the corresponding models come to be referred to as 'state-of-the-art'. Critics like Molnar believe that, in the bargain, predictive performance becomes the sole measure of progress, even as the actual improvements grow smaller. As newer work branches away from the base study, the original problem is often forgotten. Molnar describes this as a bait-and-switch strategy applied in the ML research domain.

The problem compounds

Termed 'flawed scholarship', this practice may mislead readers, including students, journalists, and policy-makers, and further compromise the field's intellectual foundations. As early as 1976, Drew McDermott, a former computer science professor at Yale University, said in the context of the AI community, "If we can't criticise ourselves, someone else will save us the trouble." This holds true even today.

AI pioneer Yoshua Bengio recently said that the research landscape has shifted to a 'conference publication model' in the last few years. Increased competition has led researchers to rush to put out their ideas first. He wrote that a PhD student today publishes at least 50 per cent more papers than one did 20-30 years ago.

The current system seems to incentivise incremental work and puts a lot of pressure on researchers to submit papers by deadline. Machine learning is seeing such massive growth now because it is built on a large body of rigorous research. If this momentum is to continue, the community needs to inculcate clear scientific thinking and communication.

