Are Too Many Data Scientists Trying To Predict COVID-19 Outcomes In Futility?

Data scientists have built many tools that help answer significant questions about COVID-19. One example is dashboards that track COVID-19 cases around the globe. These show active cases, cases still in the testing phase, information on patient history and more, providing a window into the overall state of the pandemic.

There have also been many challenges and hackathons in response to COVID-19, and several data companies are providing free data resources. Kaggle has thousands of posts related to COVID-19. 

The COVID-19 Open Research Dataset Challenge (CORD-19) dataset on Kaggle contains over 44,000 scholarly articles, and one Kaggle expert, Daniel Wolffram, has created several widgets that help navigate the current COVID-19 research literature. There are also geospatial trackers of multiple government initiatives, built from the work of data scientists, which serve as valuable tools during the pandemic.

Explaining The Issue

Using hundreds of metrics, data scientists have been trying to predict the course of the COVID-19 outbreak. But are the predictions accurate, given that the pandemic is a black swan event with few epidemiological records in the research literature? Even information relating to the virus's genome sequencing is new.

While data scientists are using geographical case data to predict how COVID-19 will pan out, some professionals and data scientists on social media think the work is not accurate.

The issue is that epidemiologists have been tracking and predicting the spread of pathogens for decades, long before machine learning professionals and data scientists took up the problem. Data scientists may also lack expertise in the highly complex biological aspects of predicting viral outbreaks.

There may also be an issue with the datasets that are being used to create predictive models. “Existing datasets (on COVID-19) are incredibly biased. For example, when calculating the mortality rate, normally we look at the deaths per confirmed case. However, the underlying assumption is that we have captured all of the confirmed cases, which is not true, since we are bottlenecked by the number of tests and only the sickest are diagnosed. For a place like New York, an exponential increase in the availability of testing can also generate an exponential growth curve,” according to Neil Cheng, Senior Data Scientist at Akamai.
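The bias Cheng describes can be illustrated with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: the function name `naive_fatality_rate`, the `detection_rate` parameter and all of the numbers are hypothetical, not real COVID-19 data, and it assumes every death is counted while only a fraction of infections is confirmed by testing.

```python
def naive_fatality_rate(deaths: int, true_infections: int, detection_rate: float) -> float:
    """Deaths per *confirmed* case when only a fraction of infections is detected.

    detection_rate is the (hypothetical) share of true infections that
    end up as confirmed cases because testing is limited.
    """
    confirmed = round(true_infections * detection_rate)
    if confirmed <= 0:
        raise ValueError("no confirmed cases; the rate is undefined")
    return deaths / confirmed


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    deaths = 100
    true_infections = 10_000

    for detection_rate in (0.1, 0.5, 1.0):
        rate = naive_fatality_rate(deaths, true_infections, detection_rate)
        print(f"{detection_rate:.0%} of infections confirmed -> naive rate {rate:.1%}")
```

With these made-up inputs, the naive rate falls from 10% to 1% as the share of detected infections rises from 10% to 100%, even though the number of deaths and true infections never changes. The apparent mortality rate depends heavily on how much testing is done.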

Wherever There Is Data, There Is Room For Data Science

Can data scientists have a key role in predicting all aspects of the global pandemic, regardless of their experience in biology? One argument in favour is that most microbiologists and epidemiologists have had little or no training in data analysis, which is where data scientists can add value.

Indeed, there are only a handful of epidemiologists who are also good data scientists with backgrounds in mathematics, computer science, and machine learning. Here, pure data scientists can certainly collaborate with microbiologists and epidemiologists to create better predictive models. The problem arises when such models are created by pure data scientists alone, who in most cases cannot judge whether a dataset is even helpful or accurate.

Wherever there is data, there is scope for data science to make an impact. Of course, data scientists should have some level of domain knowledge so they can effectively analyse and interpret the data.

It is not merely about predicting the COVID-19 outbreak. Instead of forecasting something as complex as a global pandemic, data scientists could help create better models for optimizing hospital infrastructure, the medical supply chain, and the manufacturing of medical equipment such as ventilators and masks.

By leveraging advanced machine learning techniques, data scientists may uncover unique patterns that are valuable to domain experts. But any findings would need to be peer-reviewed, validated, and accepted by medical and epidemiological experts. Yet, as many point out, the majority of people downloading COVID-19 datasets may be unqualified to contribute in a way that meaningfully helps save lives, given the complexity of microbiology and epidemiology.


Vishal Chawla

Vishal Chawla is a senior tech journalist at Analytics India Magazine and writes about AI, data analytics, cybersecurity, cloud computing, and blockchain. Vishal also hosts AIM's video podcast, Simulated Reality, featuring tech leaders, AI experts, and innovative startups of India.