Continuing the series on model governance, this article looks in detail at how to unbox the black box and assess its contents.
Here we pass data to the black box and measure its performance. What we measure matters: the technical metrics and the business metrics must be chosen based on the problem being solved. The council needs to check that the technical measures identified actually fit the problem statement. Investigating the baseline is equally important; where no baseline exists, random prediction can serve as one. The council also ensures that the metrics are consistent and do not change across different cuts of the dataset. In summary, the council focuses on:
- Technical and business metrics.
- What is the baseline, and how is it measured?
- Is there a model lift over the baseline?
- Does the model remain stable across different samples of the dataset?
- Does the model remain stable across samples drawn from different time periods?
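As a minimal sketch of the checks above, assuming a binary classifier that outputs scores and ROC AUC as the technical metric (both are assumptions for illustration), the snippet below compares the model with a random-prediction baseline and then recomputes the metric per slice of data, here per calendar month. The helper names and the toy records are hypothetical; in practice a library implementation of the metric would normally be used.

```python
import random
from statistics import mean, pstdev


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Computed by brute-force pairwise comparison (Mann-Whitney U); ties count as half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def random_baseline_auc(y_true, repeats=20, seed=0):
    """Average AUC of uniformly random scores: the no-skill baseline (about 0.5)."""
    rng = random.Random(seed)
    return mean(roc_auc(y_true, [rng.random() for _ in y_true]) for _ in range(repeats))


def auc_by_slice(records):
    """records: iterable of (slice_key, y_true, score); returns {slice_key: AUC}."""
    groups = {}
    for key, y, s in records:
        ys, ss = groups.setdefault(key, ([], []))
        ys.append(y)
        ss.append(s)
    return {key: roc_auc(ys, ss) for key, (ys, ss) in groups.items()}


# Hypothetical scored hold-out data: (month, label, model score).
records = [
    ("2023-01", 0, 0.10), ("2023-01", 0, 0.40), ("2023-01", 1, 0.35), ("2023-01", 1, 0.80),
    ("2023-02", 0, 0.20), ("2023-02", 0, 0.30), ("2023-02", 1, 0.70), ("2023-02", 1, 0.90),
]
labels = [y for _, y, _ in records]
scores = [s for _, _, s in records]

model_auc = roc_auc(labels, scores)
lift = model_auc - random_baseline_auc(labels)
per_month = auc_by_slice(records)
print(f"model AUC {model_auc:.3f}, lift over random baseline {lift:+.3f}")
print(per_month, "spread:", round(pstdev(list(per_month.values())), 3))
```

A lift close to zero, or a large spread between slices (random samples or time periods), is the kind of signal the council would want explained before accepting the model.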
Peeping inside the “box”
Here we open the box and check the wiring and the circuit. The validations at this stage fall into several phases: problem feasibility, technical problem definition, feature and dependent-variable identification, algorithm, model metrics and model explainability. Each is briefly covered below.
- Problem feasibility: The council needs to check whether the underlying business problem warrants a machine learning approach at all. For example, if the marketing budget is INR 10 lakh and making offers to all prospects will cost only INR 8 lakh, it is better to target everyone; no model is needed to select the prospects with a high probability of purchase. The council needs to ensure that the problem is genuinely worth an ML solution and cannot be solved by simple heuristics.
- Technical Problem: Once the need for an ML model is established, the technical problem needs to be validated. The council should check that the technical formulation is valid and represents the business problem. For example, suppose a company wants to predict whether a user will buy a product when given an offer. If it has collected past user-behaviour data, framing the problem as predicting the propensity to purchase may not be the right fit; what needs to be predicted is the user's propensity to purchase when given an offer. The council is responsible for asking deeper questions about the business context and for validating the technical solution proposed by the data scientists.
- Features and Dependent Variables: The next step is to evaluate whether the right features are being used. One needs to check for “leakage” in the features: a feature that only becomes known after the predicted event, or that is required for the event to happen, leaks the outcome into the model. For example, the “Add to Cart” event is a leaky feature for predicting propensity to purchase, since every purchase requires adding items to the cart. Feature time lag also needs to be checked. For instance, when predicting rainfall ten days ahead, every feature in the training dataset needs a lag of ten days. Each feature should also be computable at the time the model is scored. The council also pays attention to how missing values and outliers are treated. Sometimes a missing value indicates an absence and should be fed to the algorithm as information; in other cases it should be imputed. Understanding how these variables were explored, and how bivariate analyses were done to establish feature importance, is critical.
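The feasibility check in the first item above is simple arithmetic. A minimal sketch using the article's budget figures; the function name and the split of the INR 8 lakh cost into 40,000 prospects at INR 20 per offer are hypothetical assumptions:

```python
def affordable_fraction(budget_inr, cost_per_offer_inr, n_prospects):
    """Share of prospects the budget can cover; 1.0 means everyone can be targeted."""
    total_cost = cost_per_offer_inr * n_prospects
    return min(1.0, budget_inr / total_cost)


# Budget of INR 10 lakh; offers to all prospects cost INR 8 lakh in total.
# Hypothetical split: 40,000 prospects at INR 20 per offer.
fraction = affordable_fraction(budget_inr=1_000_000, cost_per_offer_inr=20, n_prospects=40_000)
if fraction < 1.0:
    print(f"budget covers {fraction:.0%} of prospects: a selection model may be worth building")
else:
    print("budget covers everyone: target all prospects, no selection model needed")
```

Only when the fraction falls below 1.0 does ranking prospects by purchase probability start to pay off.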
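Two of the feature checks above can be partly automated. The sketch below assumes each feature has a known availability timestamp (an assumption; many pipelines do not record this) and uses hypothetical feature names. It flags features whose values only become available at or after the prediction time, and shifts a daily series so that a model forecasting ten days ahead only sees values from at least ten days earlier. A timestamp check like this catches only the temporal form of leakage; a feature that is merely required for the outcome, such as adding items to the cart, still needs a domain review.

```python
from datetime import datetime


def leaky_features(available_at, prediction_time):
    """Names of features whose values only become known at or after prediction_time."""
    return sorted(name for name, ts in available_at.items() if ts >= prediction_time)


def lagged(series, horizon):
    """Shift a daily series so row t only sees the value from day t - horizon.

    Rows without enough history are filled with None.
    """
    if horizon < 0:
        raise ValueError("horizon must be non-negative")
    n = len(series)
    h = min(horizon, n)
    return [None] * h + list(series[: n - h])


# Hypothetical availability timestamps for the features of one training row.
prediction_time = datetime(2023, 1, 10, 9, 0)
available_at = {
    "visits_last_30_days": datetime(2023, 1, 9, 23, 59),
    "add_to_cart_in_session": datetime(2023, 1, 10, 9, 5),  # logged during the purchase session
}
print(leaky_features(available_at, prediction_time))

# Rainfall example: forecasting 10 days ahead, so every feature is lagged by 10 days.
humidity = [71, 68, 75, 80, 77, 74, 69, 72, 78, 81, 83, 79]  # hypothetical daily readings
humidity_lag10 = lagged(humidity, 10)
print(humidity_lag10)
```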
Validating that the right dependent variable is used in the model is equally important. Say a food delivery company is building dish recommendations. The question then arises whether the “dish” itself should be the dependent variable, or whether it should be the “Dish Category” to avoid sparsity in the target. The exact “dish” could then be chosen as the most frequently ordered “dish” in the predicted category at that location.
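A minimal sketch of the dish-category approach, assuming order history is available as (location, category, dish) records; the data and helper names are hypothetical. It compares the number of distinct target labels at each granularity and builds the lookup that turns a predicted category into a concrete dish.

```python
from collections import Counter, defaultdict


def label_cardinality(orders):
    """Distinct dishes vs. distinct categories: a rough signal of how sparse each target is."""
    dishes = {dish for _, _, dish in orders}
    categories = {category for _, category, _ in orders}
    return len(dishes), len(categories)


def most_frequent_dish(orders):
    """Map (location, category) to the dish ordered most often there."""
    counts = defaultdict(Counter)
    for location, category, dish in orders:
        counts[(location, category)][dish] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in counts.items()}


# Hypothetical order history.
orders = [
    ("Bengaluru", "Biryani", "Chicken Biryani"),
    ("Bengaluru", "Biryani", "Chicken Biryani"),
    ("Bengaluru", "Biryani", "Veg Biryani"),
    ("Bengaluru", "South Indian", "Masala Dosa"),
    ("Mumbai", "Biryani", "Mutton Biryani"),
]
n_dishes, n_categories = label_cardinality(orders)
lookup = most_frequent_dish(orders)

# Suppose a category-level model predicts "Biryani" for a user in Bengaluru.
print(n_dishes, n_categories, lookup[("Bengaluru", "Biryani")])
```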
- Algorithm: The council then deep-dives into the algorithm. Feature processing from the previous step and the algorithm should be investigated together. For instance, some Random Forest implementations can handle missing values natively, whereas neural networks generally cannot. Likewise, some algorithms are sensitive to correlated features (the coefficients of linear and logistic regression, for example, become unstable under multicollinearity), while others, such as tree-based models, are less affected. If the underlying algorithm is sensitive to correlation, only non-correlated features should be included in the modelling exercise. The team should also understand the complexity of the algorithm, how much of that complexity the problem at hand actually requires, and the trade-off against the impact it delivers. A few sample questions that could be looked at:
- Can a smaller set of features explain the model? What is the trade-off between model performance and the number of features?
- How much do non-linear models improve performance compared with their linear equivalents?
- How does model performance change when categorical features are encoded in different ways? Categorical features with many levels can induce heavy sparsity, so testing this is important.
- How stable is the model across multi-fold cross-validation?
- Do different algorithms give significantly different performance? If so, why?
- What is the impact of class weights on model performance? What if the class distribution changes in the future?
- Model Metrics: The council compiles an exhaustive list of model metrics and validates the model across all the relevant technical metrics. It should be conscious of the underlying distribution of the dependent variable and recommend metrics accordingly. For example, for extremely “imbalanced” datasets, the area under the precision-recall curve (AUC-PR) should be examined rather than only the ROC AUC, because a large pool of easy negatives can make ROC AUC look strong even when precision on the rare class is poor. The choice of metrics depends first on the type of learning (supervised or unsupervised) and then on the specific ML algorithm. Various charts, such as the KS chart (for classification), lift charts and error-versus-epoch curves, should also be analysed to understand how the algorithm is behaving and to check for red flags.
- Model Explainability: This step matters even if the ML model is not built for explanation purposes. The top features give an intuitive understanding of how the model is fitting, and this should be validated against the business context. One should also check whether a single variable explains most of the variation; if it does, that may indicate “leakage”. A further check is to drop the variable and rebuild the model: if performance drops substantially, the variable needs a deeper investigation. Some explainability packages also show the direction of a variable's impact, and we need to check that this direction makes sense in the context at hand. The inter-relations among the top variables should also be analysed to check that they make intuitive sense in explaining the model.
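Several of the algorithm questions above, notably fold-to-fold stability and whether different algorithms perform significantly differently, come down to running each candidate under the same k-fold split. The sketch below assumes accuracy as the metric and uses two deliberately simple stand-in "algorithms" (a majority-class predictor and a one-feature threshold rule) on hypothetical synthetic data; in a real review the candidates would be the actual linear and non-linear models, usually run through a library's cross-validation utilities.

```python
import random
from statistics import mean, pstdev


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 once, then deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit, k=5, seed=0):
    """Return (mean, std) accuracy over k folds; fit(X_train, y_train) returns a predict(x) callable."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [j for j in range(len(X)) if j not in held_out]
        predict = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct = sum(predict(X[j]) == y[j] for j in test_idx)
        scores.append(correct / len(test_idx))
    return mean(scores), pstdev(scores)


def fit_majority(X_train, y_train):
    """Stand-in baseline: always predict the most frequent training class."""
    majority = max(set(y_train), key=y_train.count)
    return lambda x: majority


def fit_threshold(X_train, y_train):
    """Stand-in model: threshold one feature halfway between the 0/1 class means."""
    m0 = mean(x for x, t in zip(X_train, y_train) if t == 0)
    m1 = mean(x for x, t in zip(X_train, y_train) if t == 1)
    cut = (m0 + m1) / 2
    return lambda x: int((x > cut) == (m1 > m0))


# Hypothetical synthetic data: one feature whose mean shifts with the label.
rng = random.Random(42)
y = [rng.randint(0, 1) for _ in range(400)]
X = [rng.gauss(2.0 * t, 1.0) for t in y]

for name, fit in [("majority", fit_majority), ("threshold", fit_threshold)]:
    m, s = cross_validate(X, y, fit)
    print(f"{name:>9}: accuracy {m:.3f} +/- {s:.3f}")
```

Reporting the spread alongside the mean is the point of the exercise: a gap between two algorithms that is small relative to the fold-to-fold spread is weak evidence that one is genuinely better.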
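To make the imbalanced-data point under Model Metrics concrete, the sketch below scores a hypothetical ranking of 100 examples containing only 2 positives, ranked 5th and 10th. It computes ROC AUC, average precision (a common summary of the area under the precision-recall curve) and the KS statistic by hand so the definitions are visible; the helper names are illustrative, and the average-precision helper assumes there are no tied scores.

```python
def roc_auc(y_true, scores):
    """ROC AUC by pairwise comparison: share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of precision@k over the ranks k at which positives appear (assumes no tied scores)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


def ks_statistic(y_true, scores):
    """Largest gap between the cumulative score distributions of positives and negatives."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Hypothetical ranking: 100 examples, only 2 positives, ranked 5th and 10th.
scores = [1 - r / 100 for r in range(100)]
y = [1 if r in (4, 9) else 0 for r in range(100)]

print(f"ROC AUC            {roc_auc(y, scores):.3f}")  # 0.939
print(f"average precision  {average_precision(y, scores):.3f}")  # 0.200
print(f"KS statistic       {ks_statistic(y, scores):.3f}")  # 0.918
```

ROC AUC comes out at about 0.94 while average precision is only 0.2: the many easy negatives flatter ROC AUC, whereas the precision-based metric exposes that most of the top-ranked examples are negatives.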
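The drop-and-rebuild check described under Model Explainability can be sketched as drop-column importance: rebuild the model once per feature with that feature removed, and record how much the validation score falls. The nearest-centroid model, the 70/30 split and the synthetic data, in which one hypothetical feature is a near-copy of the label (simulating leakage), are all assumptions chosen for illustration.

```python
import random


def nearest_centroid_accuracy(X, y, train_frac=0.7):
    """Fit a nearest-centroid classifier on the first train_frac of rows; return hold-out accuracy."""
    n_train = int(len(X) * train_frac)
    X_tr, y_tr, X_va, y_va = X[:n_train], y[:n_train], X[n_train:], y[n_train:]
    centroids = {}
    for cls in set(y_tr):
        rows = [x for x, t in zip(X_tr, y_tr) if t == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        # Assign the class whose centroid is closest in squared Euclidean distance.
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return sum(predict(x) == t for x, t in zip(X_va, y_va)) / len(y_va)


def drop_column_importance(X, y, names, fit_and_score):
    """Score drop when each feature is removed and the model is rebuilt from scratch."""
    full = fit_and_score(X, y)
    drops = {}
    for j, name in enumerate(names):
        X_without = [row[:j] + row[j + 1:] for row in X]
        drops[name] = full - fit_and_score(X_without, y)
    return full, drops


# Synthetic data: one feature nearly copies the label (simulated leakage), one is pure noise.
rng = random.Random(0)
y = [rng.randint(0, 1) for _ in range(600)]
X = [[t + rng.gauss(0, 0.1), rng.gauss(0, 1)] for t in y]

full, drops = drop_column_importance(X, y, ["leaky_feature", "noise_feature"], nearest_centroid_accuracy)
print(f"full accuracy {full:.3f}")
for name, drop in sorted(drops.items(), key=lambda kv: -kv[1]):
    print(f"{name:>14}: accuracy drop {drop:+.3f}")
```

A single feature whose removal collapses accuracy, as the simulated leaky feature does here, is exactly the case the council would deep-dive.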
Who should be in the governance council
Having seen what the governance process involves, we need to identify who should sit on the council. The following criteria may help; they are only a guideline and should be adapted to the size and context of the company. Council members should:
- Be experts in data science.
- Have sufficient knowledge of data governance regulations.
- Have the authority to reject models.
- Understand the business context.
- Care about the welfare of the organisation.
“There are no solutions, there are only trade-offs. You try to get the best trade-off you can get, that’s all you can hope for.” — Dr. Thomas Sowell
The key thing to remember is that we need a governance process but, at the same time, innovation and experimentation must not be stifled. Finding a process that ensures robust and reliable data science models while encouraging learning and out-of-the-box thinking is at the heart of model governance.
This article is written by a member of the AIM Leaders Council. AIM Leaders Council is an invitation-only forum of senior executives in the Data Science and Analytics industry.