A Machine Learning Approach For Monitoring COVID19 Indicators


Gaurav Dhooper

As per McKinsey's COVID-19: Briefing Materials (Global Health and Crisis Response), business leaders are asking the following three questions about this upheaval:

  1. How deep are the demand reductions?
  2. How long could the disruption last?
  3. What shape could recovery take?

Since the pandemic has a visible, direct impact on the depth and length of the disruption and on the shape of the recovery curve, a machine learning (ML) approach could help monitor the indicators behind these questions in a reasoned, data-driven way.

1. In order to understand the depth of disruption, the following indicators may need to be monitored:

  • Time to implement social distancing after community transmission is confirmed: Since this pandemic spreads through community and local transmission, it is crucial to monitor the time taken to implement social distancing, which can be done using time series analysis. Time series forecasting generally has two purposes: to understand or model the stochastic mechanisms that give rise to an observed series, and to predict future values of the series from its history.
  • Number of cases (absolute): Classification algorithms can be used to monitor the absolute number of active COVID-19 cases, for example by assigning each region to a low, medium or high case-count band.
  • Geographic distribution of cases relative to economic contribution: Clustering algorithms can help monitor this indicator, as they group a set of objects so that objects in the same group (a cluster) are more similar, in some sense, to each other than to those in other groups. For example, Maharashtra, the state with the highest number of COVID-19 cases, contributes a considerable portion of the Indian entertainment industry and economy. As per the India Brand Equity Foundation, the Indian media and entertainment industry is expected to reach around Rs 307,000 crore (US$ 43.93 billion) by 2024, a projection that may take an adverse hit in the present scenario.
  • Cuts in spending on durable goods: Due to reduced supply and a shortage of components, promotional offers and discounts on finished products are also being cut, which is leading to lower spending on durable goods such as refrigerators, air conditioners and LCDs. This indicator can be monitored with regression algorithms, where the “outcome variable” (the fall in demand) is analyzed against the “input features” (the cuts in promotional offers and discounts).
  • Extent of behavior shift: Sentiment analysis is used to understand changes in behavior and is increasingly applied to social media monitoring, brand monitoring, the voice of the customer (VoC), customer service and market research. In the context of the COVID-19 lockdown, it is crucial to analyze how spending on socializing, such as eating out at restaurants and entertainment, shifts in the post-COVID-19 period.
  • Extent of travel reduction: The post-COVID-19 situation needs to be analyzed for the extent of travel reduction with the help of time series analysis and deep learning models. Both tourist and business travel will be affected by one or more independent variables such as employment stability, travel alternatives, urgency of travel and travel cost.
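
As one illustration of the regression idea in the durable-goods indicator above, the following is a minimal sketch of a one-feature least-squares fit in plain Python. The function names (`fit_line`, `predict`) and the numbers are hypothetical and made up for illustration; they are not real market data, and a real analysis would likely use several input features and a statistics library.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one input feature; returns (slope, intercept)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of the input feature around its mean.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("input feature has no variation")
    # Co-variation of the input feature and the outcome variable.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations (assumed values, not real data):
# input feature = cut in promotional discount, in percentage points
# outcome variable = fall in durable-goods demand, in percent
discount_cut = [0.0, 5.0, 10.0, 15.0, 20.0]
demand_drop = [1.0, 4.0, 7.0, 10.0, 13.0]

slope, intercept = fit_line(discount_cut, demand_drop)


def predict(cut):
    """Estimated fall in demand for a given discount cut, using the fitted line."""
    return intercept + slope * cut
```

With these assumed numbers, the slope can be read as the estimated fall in demand per percentage point of discount removed.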

2. In order to understand the length of disruption, the following indicators may need to be monitored:

  • Rate of change of cases: Factors such as the absence of community and local transmission, self-quarantine and self-isolation will allow the chain of transmission to break. A time series analysis will help in understanding the rate of change of COVID-19 cases.
  • Evidence of virus seasonality: Time series analysis can detect COVID-19 seasonality, that is, a linear or nonlinear pattern that repeats at regular or irregular intervals. It has also been suggested that the impact and spread of the virus will decrease as temperatures rise, but this is still unproven. A time series analysis will surface the seasonality in the data and identify the patterns, if any are observed.
  • % of cases treated at home: This indicator involves classifying structured data and can be handled by classification algorithms.
  • % utilization of hospital beds: Utilization forecasting uses linear regression models to extrapolate from existing data and make predictions. This supports efforts to flatten the curve so that active cases remain below the threshold capacity of hospitals to treat infected people.
  • Availability of therapies: The availability of therapies, based on infection severity and spread, can be handled by simple binary classification algorithms, which should also allow future cases to be predicted from such medical diagnosis details.
  • Case fatality ratio vs. other countries: The case fatality rate is the proportion of deaths from a disease relative to the total number of people diagnosed with it over a given period. Time series and logistic regression algorithms can be used to monitor this indicator, bearing in mind the caution from the Indian Council of Medical Research (ICMR): “Till we see a significant number of cases to indicate community transmission, let us not overinterpret things”.
  • Late payment/credit defaults: A naive Bayes model has been reported as the best-performing model for detecting defaulting credit card customers.
  • Stock market & volatility indexes: Since modeling stock market and volatility indexes requires hypotheses about the data, the k-nearest neighbors (k-NN) algorithm, which is used for both classification and regression, is a candidate. A useful refinement is to weight the contributions of the neighbors so that nearer neighbors contribute more to the average than more distant ones.
  • Purchasing managers index: It is an indicator of the economic health of the manufacturing and service sectors. Forecasts made with neural networks have been reported to be better than those produced by linear regression models.
  • Initial claims for unemployment: The Division of Employment Security has announced that the first payments for unemployment claims connected to the coronavirus in North Carolina will start going out this week. The government of India has made a similar announcement of a Rs 1.7 lakh crore relief package. The time taken to disburse the relief package will influence how long the disruption lasts. Predictive ML time series models will help in faster claims fulfillment.
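
The distance-weighted k-NN idea mentioned for stock market and volatility indexes above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the function name `weighted_knn_predict`, the choice of inverse-distance weights, and the training values are all hypothetical, and the single feature stands in for whatever inputs a real model would use.

```python
import math


def weighted_knn_predict(train, query, k=3):
    """Predict a numeric target as the inverse-distance-weighted mean
    of the k nearest training points.

    train: list of (features, target) pairs, features being a tuple of floats.
    query: tuple of floats with the same length as each features tuple.
    """
    if k < 1 or k > len(train):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training points by Euclidean distance to the query.
    by_distance = sorted(
        ((math.dist(features, query), target) for features, target in train),
        key=lambda pair: pair[0],
    )
    nearest = by_distance[:k]
    # If the query coincides with training points, average those directly
    # to avoid dividing by a zero distance.
    exact = [target for distance, target in nearest if distance == 0.0]
    if exact:
        return sum(exact) / len(exact)
    # Nearer neighbors get larger weights (1 / distance).
    weights = [1.0 / distance for distance, _ in nearest]
    weighted_sum = sum(w * target for w, (_, target) in zip(weights, nearest))
    return weighted_sum / sum(weights)


# Hypothetical training data (assumed values, not real market data):
# feature = a volatility index level, target = size of the next-day index move.
train = [((1.0,), 10.0), ((2.0,), 20.0), ((4.0,), 40.0), ((10.0,), 100.0)]

# The two nearest neighbors of 3.5 are 4.0 (distance 0.5) and 2.0 (distance 1.5),
# so the nearer one carries three times the weight of the farther one.
estimate = weighted_knn_predict(train, (3.5,), k=2)
```

An unweighted average of the same two neighbors would sit midway between their targets; the weighting pulls the estimate toward the closer point.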

3. In order to understand the shape of recovery, the following indicators will lay the foundation for a positive surge in the economy:

  • Effective integration of public health measures with economic activity: Economic activity can resume only when public health measures are effectively integrated with it. For example, regular sanitization of the workplace and disease prevention steps in a manufacturing unit may need certification from an authorized body before business operations start, which may in turn require sustainable economic models. Correlation and regression models will help in analyzing this indicator.
  • Potential for different disease characteristics over time: The potential threat from changing disease characteristics can be analyzed with a predictive ML model based on time series forecasting, which will help in risk mitigation and business continuity planning. A random forest model, which constructs a multitude of decision trees, can also help with this type of monitoring.
  • Bounce-back in economic activity: This is again a predictive indicator based on time series analysis, aimed at reviving the sectors that have slowed down. It will depend on the depth and length of the disruption and on economic policies.
  • Various epidemiological and economic indicators: As per the report of the WHO-China Joint Mission on COVID-19, there has been a relentless focus on improving key performance indicators, for example, constantly enhancing the speed of case detection, isolation and early treatment. It is important to understand the epidemiological indicators for areas without active cases, areas with sporadic cases (non-linear regression), areas with community clusters (clustering) and areas with community transmission (classification). The economic indicators, comprising leading indicators (such as money supply and interest rate spread), lagging indicators (such as average duration of unemployment and change in the Consumer Price Index) and coincident indicators (such as Gross Domestic Product and industrial production), will have to be analyzed with ML models that encompass all of them.
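
For the correlation analysis suggested above for integrating public health measures with economic activity, a minimal sketch of the Pearson correlation coefficient in plain Python follows. The function name `pearson_r`, the two series and their meanings are hypothetical, chosen only to show the calculation; correlation on its own would not establish that one series drives the other.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly series (assumed values, not real data):
# a score for compliance with workplace health measures,
# and an index of output for the same manufacturing units.
compliance_score = [1.0, 2.0, 3.0, 4.0, 5.0]
output_index = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(compliance_score, output_index)
```

A value of `r` near +1 or -1 would indicate a strong linear relationship, and a value near 0 a weak one.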


It is evident that the response strategy to this crisis can enable a faster recovery and build confidence. As per the World Economic Forum, the UN’s trade and development agency says the slowdown in the global economy caused by the coronavirus outbreak is likely to cost at least $1 trillion.


This impact calls for econometric ML models that provide empirical analysis of economic relationships and enable data-driven decisions. Although specific models are suggested above, a conclusive approach may require testing different algorithms, since changes in the training data may call for an evolutionary approach to selecting an interpretable model or a model-agnostic method.
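
Testing different algorithms on the same data can be sketched as follows for a time series indicator. This is a minimal illustration, not a recommended model set: it compares two deliberately simple forecasters by one-step-ahead error on the most recent observations. All function names and the daily case counts are hypothetical and made up for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def naive_forecast(history):
    """Predict that the next value equals the last observed value."""
    return history[-1]


def moving_average_forecast(history, window=3):
    """Predict the next value as the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def walk_forward_error(series, forecaster, n_test):
    """Forecast each of the last n_test points from the points before it
    and return the mean absolute error of those one-step-ahead forecasts."""
    split = len(series) - n_test
    actual, predicted = [], []
    for i in range(split, len(series)):
        predicted.append(forecaster(series[:i]))
        actual.append(series[i])
    return mean_absolute_error(actual, predicted)


# Hypothetical daily new-case counts (assumed values, not real data).
daily_cases = [10, 12, 15, 19, 24, 30, 37, 45, 54, 64]

candidates = {
    "naive": naive_forecast,
    "moving_average": moving_average_forecast,
}
errors = {
    name: walk_forward_error(daily_cases, forecaster, n_test=4)
    for name, forecaster in candidates.items()
}
# The candidate with the lowest hold-out error on this particular series.
best = min(errors, key=errors.get)
```

On a steadily rising series like this one, the moving average lags behind the trend, which is why the comparison needs to be repeated whenever the training data changes.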



Views expressed in this article are my own and may not necessarily be of my employer.


Copyright Analytics India Magazine Pvt Ltd
