“The pre-covid forecasting models failed to see this massive drop in demand and these models were rendered useless for any type of practical applications.” – Sandip Bhattacharjee
From simple spreadsheets to complex financial planning software, modern-day companies have many tools for building forecasts from time series data. From traditional time series forecasting to models that use deep learning techniques, there are many solutions. But deployment is not straightforward. The real world has many variables that influence model outcomes, and a few anomalies can topple even the best of algorithms. The COVID-19 pandemic, which forced many companies out of business for months, is a good case in point. The pandemic has left many gaps in data collection. Oscillating workforce capacity, inconsistent supply chains and intermittent production breaks still pose a great challenge.
For instance, food and retail businesses need to simultaneously manage in-stock availability of fresh produce while minimising wastage. To balance these competing priorities, companies build and deploy demand forecasting and automated ordering systems that produce very granular forecasts at the store-item-day level.
Consider Swiggy, India’s largest on-demand hyperlocal marketplace for urban consumers, which operates in 500+ cities, partners with over 130K restaurants and stores, and runs an on-demand fleet of 200K delivery partners.
To stay afloat, it is essential for Swiggy to react quickly to changes in key business metrics, which are segmented spatially (e.g., zone within a city) and temporally (e.g., time-of-day). Predicting changes in key business metrics, such as cost per delivery, helps the company manage associated costs.
A standard practice for IT infrastructure teams is to monitor surges and dips in demand and provide a forecast to maintain adequate inventory. But the industry has seen a trend unlike anything in recent times. While some businesses were shut down forever, others have seen a tremendous rise in usage (e.g., online delivery). It goes without saying that time series forecasting models were put to the test. So, to get an industry insider’s perspective, we got in touch with Sandip Bhattacharjee, Head of AI/ML at Tabsquare.ai.
AIM: How did the pandemic impact time-series models in production?
Sandip: The pandemic presented a major disruption in the demand for a wide variety of products and services. For some sectors this was a short-term disruption, with demand returning to older levels in 6-12 months (e.g., digital food ordering), whereas for others it was more of a structural change (think: hand sanitisers, masks), with a long-term impact in the form of a sustained surge or drop in demand. At the pandemic’s peak, most of the time-series models in production failed to see the sudden surge or drop in demand. On one hand, some product and service categories saw >10X demand due to stockpiling by end customers, leading to complete chaos in the supply chain. On the other extreme, there were industries like airlines, which saw a >60% drop in demand due to Covid. Most of the pre-pandemic time series models didn’t have any data that could mimic the scale of the Covid-19 pandemic, and this, in turn, meant that the models were completely blind to the effects of the pandemic.
“Travel restrictions and lockdowns were quite novel in nature and were not part of the exogenous factors.”
Many government decisions during the pandemic were not part of the exogenous factors which the models could have known a priori and woven into the model training process.
In a nutshell, most forecasting models trained before the pandemic were of little use during it, signalling the need to retrain them with a fresh perspective that caters to governments’ policy changes and the public’s reaction to them.
AIM: How did your organisation adjust to these effects?
Sandip: We operate in the F&B industry, and the disruptive effect was quite large but short-lived. By April 2020, we had witnessed a >75% drop in orders compared to January 2020. Naturally, the pre-covid forecasting models failed to see this massive drop in demand and these models were rendered useless for any type of practical application. However, the F&B industry in South East Asia has seen a ‘U’-shaped recovery, with the number of orders back to pre-covid levels by the end of August. To adjust to these effects, we had to relook at our forecasting models. Our forecasting models now use a multi-pronged approach that accounts for a wide variety of external factors, like government policy changes and their subsequent impact, Covid-19 infection spread rates and an increased preference for digital solutions (in addition to analysing past demand signals and the impact of trend, seasonality and other macroeconomic factors).
AIM: What are some good practices while productionizing time series models?
Sandip: In my opinion, there are five main aspects to productionizing time series models:
- Procuring and preparing the right training data – this is often the most crucial aspect of any forecasting work.
- Granularity: Deciding the right forecasting granularity is the underlying question here, and that often defines the choice of explanatory features you put in your model. For a retail sales forecasting use case, one may ask: should we forecast at the item level, the category level or the promoted product group (often called PPG) level? This also decides the feature transformations you put in your model. Keep in mind that feature transformations that make sense at one granularity may not make sense at another.
- Right cross validation scheme – getting the right cross validation (CV) scheme is crucial to getting an accurate model. The standard k-fold CV scheme doesn’t work well for forecasting models, mainly because it introduces a fallacy where one could be using future values to predict past values. One of the methods I use quite often is walk-forward CV with varying forecasting horizon weights. One can find an unweighted walk-forward CV in sklearn under TimeSeriesSplit. Another important factor while creating the right CV scheme is to ensure that the validation folds get to see the entire spectrum of variance in target values (otherwise one may run into the risk of underfitting/overfitting). The right CV scheme also plays an important role in creating the final ensemble (in case you are using an ensemble of models in production). Thus, one may end up creating a bespoke CV scheme that combines these two aspects.
- Forecast reconciliation – most organisations have multi-level forecasts that they use for a wide variety of use cases. There may be an aggregate-level forecast used for macro-level decisions like budget planning, and there can also be micro-level forecasts, such as planning inventories for specific categories. The last thing you want is a scenario where the combination of all micro-level forecasts tells a completely different story than the macro-level forecast. Finding the right balance is often a combination of science and business knowledge.
- Model maintenance – the journey does not end with building a great model. The main challenge is to maintain the model and adapt it as needed. Typically, one should invest time in defining the bounds of the accuracy metric(s) within which the model setup is deemed usable and doesn’t need any tweaks, whether to hyperparameters, the feature set or the CV scheme.
This forms the basis of model health reports, which need to be monitored closely. There should also be a system to generate alerts when model health deteriorates beyond acceptable norms. At that point, you need timely intervention to check whether the deterioration is due to a shift in the underlying data-generating process, typically observed via changes in distribution, or due to temporary shocks to the system.
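The walk-forward CV mentioned above can be sketched with sklearn’s TimeSeriesSplit. The toy demand series, the naive last-value forecaster and the recency weights below are illustrative assumptions, not part of the interview:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy daily demand series (simulated, for illustration only)
rng = np.random.default_rng(0)
y = 100 + np.cumsum(rng.normal(0, 2, 120))
X = np.arange(len(y)).reshape(-1, 1)

# Each fold trains on the past and validates on the next 14-day horizon
tscv = TimeSeriesSplit(n_splits=4, test_size=14)

fold_errors = []
for train_idx, test_idx in tscv.split(X):
    # Naive "repeat the last observed value" forecaster stands in for a real model
    forecast = np.repeat(y[train_idx][-1], len(test_idx))
    fold_errors.append(np.mean(np.abs(y[test_idx] - forecast)))

# Weight later folds more heavily, since they best reflect recent dynamics
weights = np.arange(1, len(fold_errors) + 1, dtype=float)
weighted_cv_score = np.average(fold_errors, weights=weights)
print(f"Weighted walk-forward MAE: {weighted_cv_score:.2f}")
```

The weighting scheme here is one simple choice; in practice the horizon weights would reflect which forecast horizons matter most to the business.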
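The forecast reconciliation point can be sketched with a simple proportional top-down adjustment: rescale the micro-level forecasts so they sum to the macro figure while preserving their proportions. The numbers below are made up, and more sophisticated reconciliation methods (e.g., trace minimisation) exist:

```python
import numpy as np

# Hypothetical category-level (micro) forecasts and an independently
# produced aggregate (macro) forecast that disagree on the total
micro = np.array([120.0, 80.0, 50.0])  # sums to 250
macro = 275.0

# Proportional top-down reconciliation: keep relative shares,
# force the micro forecasts to add up to the macro total
reconciled = micro * macro / micro.sum()
print(reconciled, reconciled.sum())
```

This guarantees the micro forecasts and the macro forecast tell the same aggregate story, at the cost of trusting the macro total over the micro sum.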
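The model health report and alerting idea can be sketched as a rolling-error check against a predefined bound. The error series, window length and MAPE bound below are hypothetical placeholders for values an organisation would set per use case:

```python
import numpy as np

# Hypothetical daily absolute percentage errors from a deployed model
ape = np.array([0.04, 0.05, 0.03, 0.06, 0.05, 0.12, 0.15, 0.18])

WINDOW = 5          # rolling window for the health check
MAPE_BOUND = 0.08   # assumed acceptable error bound

# Rolling mean absolute percentage error over the window
rolling_mape = np.convolve(ape, np.ones(WINDOW) / WINDOW, mode="valid")
alerts = rolling_mape > MAPE_BOUND
if alerts.any():
    print("Model health alert: rolling MAPE exceeded bound:", rolling_mape[alerts])
```

In production, an alert like this would trigger the investigation described above: is the distribution of the inputs shifting, or is this a temporary shock?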
A Wake-Up Call For ML-Based Forecasting
In machine learning, large mathematical models are fit to curated observations whose complexity is hard to delineate otherwise, such as images, speech, biological systems or economies. When deployed in the real world, these models are influenced by many external variables. The components of a time series are as complex and sophisticated as the data itself. As time passes, more data accumulates; more data doesn’t always mean more information, but larger samples do reduce the error that arises from random sampling. Anomalies like COVID-19 help build better models for the future. But the current deep learning assistance to traditional time-series modelling is still underwhelming.
A few challenges still persist within the realm of ML (h/t Oracle blogs):
- Many machine learning algorithms do not have the capability of extrapolating patterns outside of the domain of training data.
- Most machine learning models do not have the ability to derive confidence intervals.
- Most ML models are not based on statistical distributions. Confidence intervals can be estimated, but they may not be as accurate.
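As a common workaround for the missing confidence intervals noted above, practitioners attach empirical prediction intervals built from the quantiles of in-sample residuals. The residuals and point forecast below are simulated for illustration:

```python
import numpy as np

# Simulated in-sample residuals from some point-forecast model
rng = np.random.default_rng(1)
residuals = rng.normal(0, 5, 500)

point_forecast = 200.0  # hypothetical model output

# Empirical 90% prediction band from the 5th and 95th residual quantiles
lo, hi = np.quantile(residuals, [0.05, 0.95])
interval = (point_forecast + lo, point_forecast + hi)
print(interval)
```

As the Oracle points note, such intervals are only as good as the assumption that future errors resemble past ones, which the pandemic violated badly.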
That said, there are many complex models and approaches, like Generalised Autoregressive Conditional Heteroskedasticity (GARCH), Bayesian models, VAR, or deep learning-based ones such as recurrent neural networks, LSTMs and GRUs. But many deep learning-based models lack interpretability, which is crucial to business leaders who want to make data-driven decisions. Forecasting was not the only area where models took a hit from the pandemic. Pre-covid computer vision models developed for facial recognition were deemed unfit: according to a preliminary study by the National Institute of Standards and Technology (NIST), even the best commercial facial recognition algorithms showed high error rates when matching photos with digitally applied face masks to photos of the same person without a mask.
Although deep learning has gained popularity in recent years, many organisations still use logistic regression or support vector machines (SVMs). Though model-agnostic techniques can be used to explain traditional models, they are computationally expensive and can lead to poorly approximated explanations. The pandemic has served as a decent litmus test for models in production and exposed their vulnerabilities, which in turn can be used to build better models.