Machine learning models are subject to entropy. Model drift is the decay in a model's predictive power caused by changes in the environment. The world is dynamic, and data changes constantly. A model's performance is only as good as the data it is trained on, so a model becomes obsolete when the real-world data it was built on shifts and it loses predictive power. Such environmental changes can also alter the relationships between the model's variables. Models therefore need to be monitored regularly in real time.
Two major aspects of machine learning are the training data and the desired outcome. Accordingly, there are two types of model drift: data drift and concept drift.
Simply put, data drift occurs when the input data a model was trained on changes. A change in the input data, or independent variables, degrades the model's performance. Microsoft has cited data drift as one of the top reasons model accuracy degrades over time.
Data drift is often a consequence of seasonal changes or shifts in consumer preferences over time. For instance, educational data collected before Covid shows a weaker preference for online learning than data collected afterwards. Similarly, demand for lipsticks dropped considerably after Covid, while face masks became the norm. Models trained on the earlier data become useless: because the input data has changed, the variables follow a different distribution, which confuses the model.
Mathematically, data drift can be defined as P_t1(X) ≠ P_t2(X), where X is the set of input features and t1, t2 are two points in time.
Data drift can also occur when a model is trained on narrowly scoped data but exposed to a wider range of data in production. Spam detection is a good case in point: if the training data included few examples of spam emails, the model is more likely to misclassify spam as legitimate mail once deployed.
Sequential analysis methods, model-based methods, and time-distribution-based methods are key to identifying and overcoming data drift. In sequential analysis, the Drift Detection Method (DDM) and Early Drift Detection Method (EDDM) track the model's error rate. Model-based methods train a custom model to identify the drift. Time-distribution-based methods use statistical distance to measure the drift between probability distributions.
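As an illustration of the time-distribution-based approach, the Population Stability Index (PSI) is one commonly used statistical-distance measure. The sketch below is our own minimal implementation (the function name, bin count and threshold convention are illustrative, not taken from a specific library); it compares a training-time feature sample against a production sample:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Statistical distance between two samples of one feature.
    Values above ~0.2 are conventionally read as significant drift."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # floor empty bins so the log term stays defined
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

rng = np.random.default_rng(42)
training_feature = rng.normal(loc=0.0, scale=1.0, size=5000)
production_feature = rng.normal(loc=1.0, scale=1.0, size=5000)  # mean has shifted

psi = population_stability_index(training_feature, production_feature)
```

Note that production values falling outside the training histogram's range are simply dropped here; a production-grade implementation would widen the edge bins to catch them.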
Contrary to data drift, where the input data changes, concept drift occurs when the model's predicted target, or its statistical properties, change over time. During training, the model learns a function that maps inputs to the target variable; over time, that mapping stops holding, and the learned patterns no longer apply in the new environment. For instance, since the definition of spam has evolved, models have to adjust. Concept drift can occur suddenly, gradually or seasonally. For example, the shift in consumer behaviour after the Covid pandemic was a sudden drift, while changes in fashion trends are gradual.
Mathematically, concept drift can be defined as P_t1(Y|X) ≠ P_t2(Y|X), where Y is the target and X the input features.
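A minimal synthetic sketch of this distinction (data and model are illustrative only): the input distribution P(X) stays the same, but the relationship P(Y|X) flips, so a previously accurate model fails even though a drift check on X alone would find nothing.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Time t1: the label depends on the sign of the first feature.
X_t1 = rng.normal(size=(2000, 2))
y_t1 = (X_t1[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X_t1, y_t1)

# Time t2: P(X) is unchanged, but the concept P(Y|X) has flipped.
X_t2 = rng.normal(size=(2000, 2))
y_t2 = (X_t2[:, 0] < 0).astype(int)

accuracy_t1 = model.score(X_t1, y_t1)  # high: the concept still holds
accuracy_t2 = model.score(X_t2, y_t2)  # collapses under the new concept
```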
Concept drift can be detected by continuously monitoring incoming data and identifying changes in the relationships within the dataset. Popular concept drift detection methods include ADWIN (ADaptive WINdowing) for streaming data, and the Kolmogorov–Smirnov test, the chi-squared test or adversarial validation for batched data. These are applied to the model's labels, predictions and input features to identify drift.
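For the batched case, SciPy's two-sample Kolmogorov–Smirnov test can be applied directly to two batches of model outputs. In this sketch the two score distributions are made up for illustration:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(7)
reference_scores = rng.beta(2, 5, size=2000)  # prediction scores at validation time
current_scores = rng.beta(5, 2, size=2000)    # scores observed in production

# Null hypothesis: both batches come from the same distribution.
statistic, p_value = ks_2samp(reference_scores, current_scores)
drift_detected = p_value < 0.01
```

The same call works on labels or individual features; only the inputs change.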
Overcoming model drift
Concept and data drift are responses to statistical changes in the data. Hence, approaches that monitor the model's statistical properties, its predictions and their correlation with other factors help identify drift. But several steps need to be taken after identification to keep the model accurate.
Two popular approaches are online machine learning and periodic retraining. Online learning updates the model in real time: the data arrives sequentially, and the learner takes in batches of samples as they come, optimising on each batch in one go. Because such models operate on a continuous data stream, they pick up new patterns in the data as they emerge. Periodic retraining is equally critical: since an ML model degrades roughly every three months on average, retraining at regular intervals can stop drift in its tracks.