Why Data Is Critical In The Fight Against Coronavirus


As the AI community goes into overdrive to help combat coronavirus, it must pause to ask a critical question: how much has AI really helped in tackling the current outbreak?

While some ground has been covered in finding ways to predict infectious disease risks and issue early warnings, these predictions become less accurate as the scale of the epidemic grows. This is largely because the kind of reliable data that AI needs has been hard to get hold of.

Machine learning (ML) capabilities can also help optimise logistics and coordinate responses during a massive outbreak like this. But without sufficient information about the novel coronavirus, AI-driven efforts to fight the virus may come to nought, at least this time.

AI could still prove critical when the next pandemic emerges. But some things will need to change if we want to realise its full potential next time, starting with smarter data collection and coordination.

AI Amid Coronavirus

Some tech companies and health surveillance firms have deployed AI to forecast the spread of coronavirus and better understand the disease. In fact, some spotted reasonably accurate signs of a possible outbreak as early as the end of December 2019 by analysing news reports and information circulated through social media to offer early warnings.

Companies like BlueDot, Metabiota and Stratifyd have been using ML to monitor outbreaks of infectious diseases around the world.

BlueDot automates this monitoring and contextualises the risk using geographical, socioeconomic, environmental and other factors. Metabiota uses natural language processing (NLP) algorithms to track news developments and official healthcare reports around the world to identify, quantify and mitigate the risks posed by such diseases. Much like Metabiota, data analytics company Stratifyd flags social media mentions of such diseases and cross-references them with descriptions of diseases taken from official sources.
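To make the idea of cross-referencing concrete, here is a minimal sketch in Python of how a social media post could be compared against official disease descriptions by counting shared terms. It is an illustration only and not how any of the companies named above actually work; the descriptions, the stopword list, the `flag_post` function and the `min_overlap` threshold are all assumptions made up for this example.

```python
import re

# Hypothetical symptom descriptions, standing in for text from official sources.
OFFICIAL_DESCRIPTIONS = {
    "covid-19": "fever dry cough shortness of breath fatigue pneumonia",
    "influenza": "fever chills muscle aches cough sore throat runny nose",
}

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"of", "a", "the", "and", "in", "with", "has", "have", "is", "my", "i"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword word tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def flag_post(post, min_overlap=2):
    """Return (disease, shared_terms) pairs for every description that
    shares at least `min_overlap` terms with the post, best match first."""
    post_terms = tokenize(post)
    matches = []
    for disease, description in OFFICIAL_DESCRIPTIONS.items():
        shared = post_terms & tokenize(description)
        if len(shared) >= min_overlap:
            matches.append((disease, sorted(shared)))
    # Rank by number of shared terms, largest overlap first.
    return sorted(matches, key=lambda m: len(m[1]), reverse=True)


post = "Several people at the market have fever and a dry cough, some with pneumonia"
for disease, terms in flag_post(post):
    print(disease, terms)
```

Even this toy version shows the weakness discussed later in the article: the post matches both descriptions because "fever" and "cough" are common to many illnesses, so the quality of the flag depends entirely on the quality of the text being fed in.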

While developments by these companies are impressive and show how far data science, ML and AI have advanced in recent years, they may not be enough. These tools cannot eliminate epidemics entirely, but they can be used to minimise their impact. Even this becomes a challenge with scale, since predicting how a larger epidemic will spread is much harder.

Data Problem With Disease Predictions

The success of a data science project on an infectious disease outbreak depends on the ability of the software to account for a wide range of sources. However, information from news outlets, and particularly from social media channels, offers inconsistent accounts and is often picked up when it is already too late to act on.

During the critical response time in the early stages of Covid-19 especially, there was widespread confusion over symptoms and how the virus could pass between people. Furthermore, information remains vague on the habits people are adopting and on what is being done at homes or hospitals to contain the virus, so applying ML to potentially incorrect data will not generate effective solutions.

However, while diseases can spread fast, verified data and knowledge can spread even faster, and that is where the focus needs to be.

Challenges To AI’s Data Problem

One of the main reasons why AI leans heavily on information found online is that public health data is strictly protected by government agencies in many countries.

To make better predictions, ML models need more data from reliable sources. For instance, patients’ medical records can be used to extract life-saving information and gather valuable insights, which can, in turn, be used to identify individuals who are most at risk.

The problem is that AI companies are not allowed to access these records without first working through some critical privacy issues. Different countries have different privacy regulations when it comes to medical data, and a lot of people may not be comfortable sharing private information with third parties in the first place.

Furthermore, international agreements may also block critical information from crossing borders, ruling out the possibility of feeding large volumes of verified data into machine-learning models at a global scale.

To this end, a fitting solution may be to pool resources and create a database of vetted and verified information that can be leveraged to fight outbreaks like Covid-19. As of now, records are split across multiple databases and countries and managed by various health services, making them hard to analyse.

While reaching a common agreement on international standards will take a long time, giving patients more control over their health data, along with the option of sharing it securely with researchers, is one way to address this challenge.


Lack of training data has limited AI’s usefulness in fighting the coronavirus outbreak, but stronger data-collection efforts could prove critical in the next one by helping detect the early stages of the disease.

AI has great potential to combat epidemics like coronavirus, and can fundamentally change how quickly we can react to such outbreaks.

Anu Thomas
Anu is a writer who stews in existential angst and actively seeks what’s broken. Lover of avant-garde films and BoJack Horseman fan theories, she has previously worked for Economic Times. Contact: anu.thomas@analyticsindiamag.com
