Many business processes depend on data science implementations such as machine learning models, time series models and other AI solutions. These models learn from historical data and past trends. In the pre-COVID era, most of them coped reasonably well with a gradually changing environment, and their predictions served the tasks they were built for. Then, in 2020, COVID-19 caused a major disruption to the behaviour these models had learned.
COVID has affected every industry in a different way. For example, lockdowns in major parts of the world led consumers to panic buy and hoard supplies. As a result, the consumer goods industry saw a sharp increase in demand in March and April. In western countries such as the UK and the US, eating at home increased substantially, as restricted movement shifted people away from eating out.
Sales of most food products went up as a result. This sudden spike has been very beneficial from a performance standpoint, but it creates a modelling problem: traditional models trained on pre-COVID history cannot capture the unusual spikes of the past two to three months, so their sales predictions can no longer be relied on as they are. This is just one use case. Similar situations are likely to arise in every industry.
Solving this kind of complex situation calls for careful analysis by analysts and data scientists. Because spikes like the ones above are infrequent values, they are classified as outliers, so outlier detection and outlier handling are the areas to look into for a solution. Detection is the easy part in this scenario, since the spikes are clearly visible on a graph. The major challenge is handling these outliers in the most efficient way.
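When the spikes need to be flagged programmatically rather than read off a chart, a median-based rule is one simple option. The sketch below is an illustration, not a method from any specific project: the function name `flag_outliers`, the cutoff of 3.5 and the sample figures are all assumptions. It uses the modified z-score, which is built on the median and the median absolute deviation (MAD), so the spikes do not distort the baseline they are compared against.

```python
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Return the indices of points whose modified z-score exceeds threshold.

    The threshold of 3.5 is a commonly quoted rule of thumb, used here
    as an assumption; tune it for your own data.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the points equal the median; the score is undefined.
        return []
    # 0.6745 rescales the MAD so it is comparable to a standard deviation
    # for normally distributed data.
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]

# Hypothetical weekly sales with a two-week spike.
weekly_sales = [100, 102, 98, 101, 99, 103, 180, 175, 100, 97]
print(flag_outliers(weekly_sales))  # prints [6, 7]
```

A mean and standard deviation rule would also work on clean data, but both statistics are pulled towards the spikes themselves, which is why the median-based version is used here.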
There are multiple ways to approach this. You can design a case-by-case solution built on different assumptions about the COVID scenario. The first scenario would assume COVID never happened. Subsequent scenarios can make different assumptions about how long COVID will last. The usual time series or linear regression models can still be used, but with some smoothing applied so that the predictions stay in line with expected sales and do not overshoot. Which outlier handling technique to use depends on the kind of adjustment you want to make.
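As a rough illustration of the scenario idea, the sketch below compares a forecast built on the raw history with one built on a "no COVID" history in which the spike is capped. Simple exponential smoothing stands in for "the usual time series model". The helper names, the smoothing factor `alpha=0.3`, the cap of 105 and the sales figures are all assumptions made for the example.

```python
def exp_smooth_forecast(series, alpha=0.3):
    """One-step-ahead forecast from simple exponential smoothing.

    alpha close to 1 reacts quickly to recent values; alpha close to 0
    smooths heavily and reacts slowly.
    """
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

def cap_series(series, cap):
    """Clip every value at `cap` to build a 'no spike' version of history."""
    return [min(y, cap) for y in series]

# Hypothetical monthly sales; the last two months contain the spike.
sales = [100, 102, 98, 101, 160, 170]

with_spike = exp_smooth_forecast(sales)                   # scenario: spike persists
no_covid = exp_smooth_forecast(cap_series(sales, 105))    # scenario: no COVID

print(f"forecast with spike: {with_spike:.1f}")
print(f"forecast, capped history: {no_covid:.1f}")
```

The gap between the two numbers gives a sense of how much of the forecast is driven by the spike. Intermediate scenarios, for example COVID lasting another quarter, could be sketched by blending the two.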
I have recently come across a few techniques that can help with outlier handling –
- Boosting methods – These are ensemble methods that improve model performance by combining many weak learners. How well they cope with outliers depends on the variant and the loss function: gradient boosting with a robust loss such as Huber is generally less affected by extreme values than a single least-squares model, whereas AdaBoost can be sensitive to them. Python implementations include XGBoost and AdaBoost.
- Generalized Estimating Equations – This method is used when observations are possibly correlated within a cluster but uncorrelated across clusters. Accounting for that correlation can help when the unusual observations arrive in groups, for example several consecutive weeks for the same store. It has an implementation in the statsmodels library in Python.
- M-Estimation Method – This is similar to linear regression, but it replaces the squared-error term in the objective with another function that grows more slowly for large residuals, so outliers have less influence on the fit. It has an implementation in the statsmodels library in Python.
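To make the M-estimation idea concrete, here is a minimal pure-Python sketch of a straight-line fit with Huber weights, solved by iteratively reweighted least squares. It illustrates the technique and is not the statsmodels implementation. The function name `huber_line_fit`, the tuning constant `k=1.345`, the MAD-based scale estimate and the sample data are assumptions chosen for the example.

```python
from statistics import median

def huber_line_fit(x, y, k=1.345, iterations=50):
    """Fit y = intercept + slope * x using Huber weights (IRLS).

    With iterations=1 the result is the ordinary least-squares fit,
    because the first pass uses equal weights.
    """
    weights = [1.0] * len(x)
    intercept = slope = 0.0
    for _ in range(iterations):
        # Weighted least squares in closed form for a single predictor.
        sw = sum(weights)
        mx = sum(w * xi for w, xi in zip(weights, x)) / sw
        my = sum(w * yi for w, yi in zip(weights, y)) / sw
        sxx = sum(w * (xi - mx) ** 2 for w, xi in zip(weights, x))
        sxy = sum(w * (xi - mx) * (yi - my) for w, xi, yi in zip(weights, x, y))
        slope = sxy / sxx
        intercept = my - slope * mx

        residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        # Robust scale estimate: median absolute residual, rescaled.
        scale = median([abs(r) for r in residuals]) / 0.6745
        if scale == 0:
            break  # at least half the points are fitted exactly
        # Huber weights: 1 for small residuals, shrinking for large ones.
        weights = [
            1.0 if abs(r) <= k * scale else k * scale / abs(r)
            for r in residuals
        ]
    return intercept, slope

# Hypothetical data: roughly y = 10 + 2x, with a spike at x = 6 and x = 7.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [10.0, 12.5, 13.8, 16.2, 18.1, 19.7, 72.0, 74.3, 26.1, 27.9]

print("least squares:", huber_line_fit(x, y, iterations=1))
print("Huber M-estimate:", huber_line_fit(x, y))
```

On data like this, the least-squares slope is pulled upwards by the two spike points, while the Huber fit stays much closer to the underlying trend. In practice you would more likely reach for a library routine such as the robust linear model in statsmodels than maintain a hand-written loop.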
These are a few of the methods that could be used for handling outliers, depending on the business scenario. The data points will behave differently from one use case to another. Human behaviour changes with the situation outside (lockdown, self-isolation and so on), and when the patterns a model has learned stop holding because of such changes, this is referred to as "concept drift". Concept drift is affecting all kinds of industries. For example, fraud prediction cannot be implemented in the same way as before.
Models that have been deployed to production keep working on the same old features and historical data. They do not account for new variables or factor in evolving real-world trends. As circumstances change quickly, these models will no longer be able to make predictions that fit the current scenario. They will have to be changed to reflect the new environment so that they can provide the desired results and add value to the business.
Models need to be more adaptive and able to support business strategies in the best possible way. Models that are not updated will become obsolete. There will have to be mechanisms in place for tracking trends and the errors in model predictions. Data science needs to adapt quickly to the fast-paced changes that the pandemic is causing around the world.
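One lightweight way to track prediction errors is to compute an error metric over a trailing window and raise a flag when it crosses a limit. The sketch below uses the mean absolute percentage error (MAPE). The function names, the four-period window, the 15% threshold and the figures are assumptions for illustration, and the code assumes that no actual value is zero.

```python
def rolling_mape(actuals, predictions, window=4):
    """MAPE over each trailing window of `window` periods.

    Assumes every actual value is non-zero.
    """
    errors = [abs(a - p) / abs(a) for a, p in zip(actuals, predictions)]
    return [
        sum(errors[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(errors))
    ]

def drift_alerts(actuals, predictions, window=4, threshold=0.15):
    """Return the period indices where the trailing MAPE exceeds threshold."""
    scores = rolling_mape(actuals, predictions, window)
    return [i for i, m in enumerate(scores, start=window - 1) if m > threshold]

# Hypothetical case: the model keeps predicting 100 while demand jumps to 150.
actuals = [100, 100, 100, 100, 150, 150, 150, 150]
predictions = [100] * 8

print(drift_alerts(actuals, predictions))  # prints [5, 6, 7]
```

An alert like this does not say why the model is off, only that it is. It is a prompt to investigate and, if the shift looks lasting, to retrain or adjust the model.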
Companies already have the right kind of data in place. Right now it is all about the modifications you make and how well you leverage that data. Models will have to be agile and able to adapt to sudden emergencies like COVID. Data science teams will need to make models dynamic and monitor them so that they can assess the situation as it changes.