The Buddhist philosophy of karma revolves around the law of moral causation. It is captured in the Buddha’s statement, “We are the heirs of our own actions.” This principle is very apt for the world of predictive analytics.
I was once attending a presentation of a cross-sell modelling assignment by an analytics consultant. The project had started with the usual activity of clustering followed by association analysis. By this stage, the team had divided the customer base into homogeneous clusters and identified the most commonly bought combinations of products. The next step was to build a purchase propensity model for the second (target) product given that the customer had bought the first product.
The consultant stated that he had not been able to build a reliable purchase propensity model across the products. While he was explaining the various efforts expended in this exercise, I could make out his growing frustration. He was almost at a loss to explain why the predictive models did not work.
At this point, I requested that he stay with this activity and walk everyone through the tasks he had followed to build the predictive model. It turned out that the team had forgotten to consider whether the customer had bought the second product under the influence of a campaign, mass or personalized, or on their own. The focus had been only on the customers’ demography and behavior. Apparently, this was affecting the accuracy of the model.
I proposed that he rebuild the data set and this time consider only those customers who had been targeted in past campaigns. This was because the objective of the exercise was to identify targets for the campaign being planned. So the hypothesis should be to build a purchase propensity model for the second product given that the customer had bought the first product and had been under the influence of a campaign to purchase the second product.
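To make the rebuilt data set concrete, here is a minimal Python sketch of the filtering step. It is only an illustration: the `Customer` record, its field names and the sample rows are all hypothetical and are not taken from the project. The idea is to keep only customers who bought the first product and were exposed to a campaign for the second, and to use the purchase of the second product as the modelling target.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical fields, chosen only for this illustration.
    customer_id: int
    bought_first: bool    # owns the first (anchor) product
    was_targeted: bool    # exposed to a past campaign for the second product
    bought_second: bool   # target label: bought the second product


def build_training_rows(customers):
    """Keep customers who own the first product AND were campaign-targeted."""
    return [c for c in customers if c.bought_first and c.was_targeted]


def response_rate(rows):
    """Share of the given rows that bought the second product."""
    if not rows:
        return 0.0
    return sum(c.bought_second for c in rows) / len(rows)


# Made-up sample data.
customers = [
    Customer(1, True, True, True),
    Customer(2, True, True, False),
    Customer(3, True, False, True),    # bought the second product unprompted
    Customer(4, True, False, False),
    Customer(5, False, True, False),   # never bought the first product
    Customer(6, True, True, False),
]

rows = build_training_rows(customers)
print([c.customer_id for c in rows])
print(f"campaign response rate: {response_rate(rows):.2f}")
```

Customer 3, who bought the second product without any campaign, is left out of the training rows. Mixing such customers with campaign responders is the kind of blending that was blurring the original model.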
The business scenario defines the customer behavior. Ignoring these scenarios eliminates a key trigger in understanding customer behavior. In this case, the presence or absence of some stimulus was the key trigger to the customer behavior.
I have often been asked why predictive models fall in accuracy over time, sometimes in as short a period as three months. Predictive models are built with a certain objective. In the earlier example, a cross-sell model is built with the objective of finding customers who are likely to buy an additional product. The business then plans to leverage this additional knowledge and proactively influence customers to purchase the target product. This could be via a campaign, a communication or some other customer interaction.
The basic premise of predictive analytics is that the past will continue into the future. The algorithms used in predictive model building extrapolate the past into the future.
Based on the predictive scores from the cross-sell model, the business decides to run some campaigns. This is an act which did not happen in the past. Hence, it is an event which breaks the continuity of past trends. As days pass, the updated data fed into the model is influenced by the campaigns conducted by the business. This is where models lose their robustness over time. The business actions did not happen in the period on which the model was trained. They occurred in the recent past, and their effects are fed into a model which did not see this event in its training. Thus, the model is trying to predict the future on the assumption that the particular business action did not occur (i.e. the campaign was not executed). This is obviously not the case, and hence the model's predictions go wrong. The decision to run the campaign has affected our present and subsequently rendered the model ineffectual.
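A toy Python sketch shows the effect in numbers. Everything in it is assumed for illustration: the outcome lists are made up, the "model" is nothing more than the historical conversion rate, and the 5-point tolerance in the hypothetical `needs_retraining` helper is arbitrary. It only shows how a rate learned before the campaign stops matching what is observed after it.

```python
def conversion_rate(outcomes):
    """Fraction of customers who bought the second product (1 = bought)."""
    return sum(outcomes) / len(outcomes)


def needs_retraining(predicted, observed, tolerance=0.05):
    """Flag the model when observed behavior departs from its expectation.

    The tolerance is an arbitrary illustrative threshold.
    """
    return abs(observed - predicted) > tolerance


# Made-up outcomes: the training window had no campaign,
# the scoring window followed one.
pre_campaign = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
post_campaign = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

# The "model" here simply extrapolates the past rate into the future.
predicted_rate = conversion_rate(pre_campaign)
observed_rate = conversion_rate(post_campaign)
drift = observed_rate - predicted_rate

print(f"predicted {predicted_rate:.2f}, observed {observed_rate:.2f}, drift {drift:.2f}")
print("retrain?", needs_retraining(predicted_rate, observed_rate))
```

In practice the check would be run on the model's scores against actual outcomes for each period, but the principle is the same: once the business acts on the model, the data no longer looks like the training period.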
Dan Brown’s quote in his book Inferno appropriately summarizes this state of analytics.