
A guide to XAutoML: A tool for visual analytics 

XAutoML is an analytics tool that explains the AutoML optimization procedures and machine learning pipelines constructed by AutoML.


To make the AutoML process transparent and understandable, XAutoML combines interactive visualizations with established techniques from explainable AI (XAI). Integrating XAutoML into a business can help experts focus on model explainability and produce more trustworthy models, so that stakeholders can believe in the solution. In this post, we discuss XAutoML and how businesses can leverage this technology. The following are the major points to be discussed.

Table of contents

  1. Introduction to XAutoML
  2. Workflow of XAutoML
  3. How it can be used

Introduction to XAutoML

Machine learning (ML) has become a critical component in many aspects of everyday life. Yet, building well-performing ML applications is a difficult and time-consuming challenge that calls for highly specialized data scientists and domain experts.

AutoML aims to improve the current process of building ML applications in two ways: 

  1. ML professionals can save time by automating tedious tasks, such as hyperparameter optimization (HPO), which leads to better performance and frees attention for harder tasks; and 
  2. Domain experts can build ML pipelines without relying on a machine learning expert.
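Hyperparameter optimization is the canonical example of such a tedious task. Below is a minimal sketch of what AutoML automates, using scikit-learn's `GridSearchCV` on a toy dataset; the grid and dataset here are purely illustrative, and a real AutoML system searches a far larger space of pipelines and hyperparameters.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A small, hand-written search grid -- exactly the kind of manual
# work an AutoML system performs automatically at a much larger scale.
grid = {"max_depth": [2, 3, 5], "min_samples_leaf": [1, 5]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Even this tiny grid already requires choosing the parameters, ranges, and validation scheme by hand, which is precisely the burden AutoML removes.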

AutoML generates a large number of candidate ML pipelines that can solve a given task; however, AutoML systems create many kinds of candidates with no significant difference in their performance. Validating and selecting a model from such a collection is a time-consuming job for an ML developer, and for domain experts with no experience in ML it is almost impossible. 

That is why researchers developed a new visual analytics tool called eXplainable Automated Machine Learning (XAutoML) for analyzing and understanding the pipelines generated by AutoML systems.

XAutoML’s goal is to empower all AutoML users, such as data scientists, domain experts, and AutoML researchers, by

  1. Making the internal optimization procedure and search space of AutoML systems transparent 
  2. Providing sufficient information to validate and select automatically created ML models.


Workflow of XAutoML

The XAutoML workflow blends into the common data science workflow, which makes the visual analytics tool easier to use.

[Image: the data science workflow proposed by Wang et al., with the AutoML optimization procedure and visual analytics highlighted]

This data science workflow was proposed by Wang et al. As the image shows, the workflow is divided into three stages: preparation, modelling, and deployment. These stages are further divided into ten steps, from data acquisition to runtime monitoring and model improvement. AutoML systems intend to automate the steps from feature engineering to ensembling in this workflow, and the visual analytics component attached to the AutoML optimization procedure in the figure is XAutoML. 

There are three major goals of the visualization: efficient validation of models; understanding and diagnosing AutoML methods; and search space refinement, that is, adapting the underlying search space. Furthermore, its design goals are to align with the target audience of AutoML, blend with the usual data science workflow, and provide detailed information.

How it can be used

Right now XAutoML works only with a limited set of automated machine learning frameworks: auto-sklearn, dswizard, FLAML, Optuna, and scikit-learn. The developers plan to add more AutoML systems in the future.

It is very simple to use. With a few lines of code, you can access the visualization, which is built into Jupyter. The image below shows the output of XAutoML.

[Image: the XAutoML visualization in Jupyter, divided into sections A–D]
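The "few lines of code" can be sketched roughly as below. This follows the shape of the project's README for an auto-sklearn run; the exact module paths (`xautoml.main`, `xautoml.adapter`) and function names are assumptions that may differ between versions, so treat this as a sketch rather than a definitive recipe.

```python
def show_xautoml(automl, X_test, y_test):
    """Render the XAutoML visualization for a finished auto-sklearn run.

    `automl` is a fitted auto-sklearn estimator; `X_test`/`y_test` are
    held-out data used by the views. Imports are deferred so this sketch
    can be defined even where the `xautoml` package is not installed.
    """
    from xautoml.main import XAutoML
    from xautoml.adapter import import_auto_sklearn

    run_history = import_auto_sklearn(automl)     # convert the optimizer's run history
    XAutoML(run_history, X_test, y_test).explain()  # renders in the notebook cell
```

Adapters for the other supported frameworks (dswizard, FLAML, Optuna) follow the same pattern of converting a run history before handing it to the visualization.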

To make the visualization easier to understand, it is divided into the sections shown in the figure above. 

  • A: the code used to launch the visualization 
  • B: the optimization overview on the left shows basic statistics about the optimization run 
  • B1: the accuracy of all candidates over time 
  • B2: ROC curves of the selected candidates 
  • C: the leaderboard view provides an overview of all evaluated candidates; users can expand individual candidates for details
  • C2: the performance details view shows performance metrics for the selected candidates, including the confusion matrix, class report, and ROC curve 
  • D: these tabs give users access to search space and ensemble inspection. A Jupyter logo appears next to each piece of information because users can export it to Jupyter for further analysis

In the figure above, the pieces of information in section C are compiled into cards with a short description that can be expanded by selecting them. We now discuss these cards briefly.

The performance details view: 

This view, shown in the figure below, analyzes and visualizes the basic performance metrics: the training and validation performance, the training duration, and the prediction duration. For each target class, the class report provides precision and recall, and a standard confusion matrix and ROC curve are displayed.

[Image: the performance details view]
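The metrics this view compiles can all be reproduced with plain scikit-learn; here is a minimal sketch on a toy binary task (dataset and model chosen only for illustration).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)

# Confusion matrix, per-class precision/recall, and ROC AUC --
# the same information the performance details view shows per candidate.
cm = confusion_matrix(y_te, model.predict(X_te))
report = classification_report(y_te, model.predict(X_te))
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(cm)
print(report)
print(round(auc, 3))
```

XAutoML's contribution is not these metrics themselves but computing and laying them out for every candidate automatically.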

Global surrogate view:

It fits a decision tree to the model's predictions. Specifying a maximum number of leaf nodes allows users to interactively adjust the size of the decision tree, and a fidelity bar indicates how closely the decision tree matches the actual model. For further analysis, users can export the fitted decision tree to Jupyter.

[Image: the global surrogate view]
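The idea behind a global surrogate can be sketched in a few lines: fit a small decision tree to the predictions of a more complex model, and report how often the two agree (the fidelity). The models and dataset below are illustrative stand-ins, not XAutoML's internals.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# "Black-box" model whose behaviour we want to approximate.
black_box = RandomForestClassifier(random_state=0).fit(X, y)
y_hat = black_box.predict(X)

# Surrogate: a small tree trained on the black box's *predictions*,
# with max_leaf_nodes controlling its size, like XAutoML's slider.
surrogate = DecisionTreeClassifier(max_leaf_nodes=8, random_state=0).fit(X, y_hat)

# Fidelity: fraction of inputs on which surrogate and black box agree.
fidelity = (surrogate.predict(X) == y_hat).mean()
print(round(fidelity, 3))
```

A smaller `max_leaf_nodes` gives a tree that is easier to read but typically lowers the fidelity, which is exactly the trade-off the view lets users explore interactively.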

Dataset preview:

In the dataset preview, users can inspect the dataset of the currently selected pipeline and see how each step in the pipeline modifies the input data, providing data transparency. Since there is a lot to visualize, users can export the data to Jupyter for further analysis by simply clicking on the Jupyter logo.

[Image: the dataset preview]
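With a plain scikit-learn `Pipeline`, the intermediate data a step produces can be inspected by slicing the pipeline, which is essentially the information the dataset preview surfaces for each step. A minimal sketch:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=2)),
    ("clf", LogisticRegression(max_iter=1000)),
]).fit(X, y)

# Slice off the final estimator to see the data exactly as the
# classifier receives it, after scaling and dimensionality reduction.
X_after_pca = pipe[:-1].transform(X)
print(X_after_pca.shape)
```

XAutoML does this per step for the selected candidate, so users can follow the data from raw input to the final model's input.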

Configuration view:

This view lists the hyperparameters of each step in the pipeline. The hyperparameters are plotted on a conditional parallel coordinates (CPC) plot over the search space, with the selected value of each hyperparameter highlighted.

[Image: the configuration view]
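The raw data behind such a view, the selected value of every hyperparameter per pipeline step, can be pulled from a scikit-learn pipeline via `get_params()`. A sketch with an illustrative two-step pipeline:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", SVC(C=10.0, kernel="rbf"))])

# Collect the configured value of every hyperparameter, step by step --
# the kind of per-step configuration the view lays out graphically.
config = {name: step.get_params() for name, step in pipe.named_steps.items()}
print(config["clf"]["C"], config["clf"]["kernel"])
```

The configuration view additionally places these values in the context of the whole search space, so users can see where the chosen configuration sits relative to the alternatives.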

An ML model’s performance can be greatly influenced by its hyperparameters, but in reality only a few hyperparameters have a significant effect on performance. To identify them, the hyperparameter importance view provides a visual representation of the importance of each hyperparameter and of the interactions between pairs of them.

fANOVA, a method from AutoML research for evaluating how important an algorithm’s parameters are, is used to calculate the importance of hyperparameters. When a user selects a hyperparameter, a detailed breakdown of well- and poorly-performing regions of the search space is displayed.

[Image: the hyperparameter importance view]
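The intuition behind fANOVA can be illustrated with a much simpler variance decomposition over a run history: group the evaluated configurations by one hyperparameter's value and measure how much of the performance variance that grouping explains. This is only a toy stand-in for the real method (fANOVA fits a random forest over the whole search space), and the run history below is fabricated so that `max_depth` drives performance while the other parameter is noise.

```python
import numpy as np

# Fabricated toy run history: (max_depth, learning-rate bucket, accuracy).
rng = np.random.default_rng(0)
max_depth = rng.integers(1, 5, size=200)
lr_bucket = rng.integers(0, 4, size=200)
accuracy = 0.6 + 0.08 * max_depth + rng.normal(0, 0.01, size=200)

def explained_variance(param, score):
    """Share of the score variance explained by grouping on one parameter."""
    values = np.unique(param)
    group_means = np.array([score[param == v].mean() for v in values])
    counts = np.array([(param == v).sum() for v in values])
    between = np.average((group_means - score.mean()) ** 2, weights=counts)
    return between / score.var()

print(round(explained_variance(max_depth, accuracy), 2))  # close to 1: important
print(round(explained_variance(lr_bucket, accuracy), 2))  # close to 0: irrelevant
```

A hyperparameter whose value buckets separate good from bad runs explains a large share of the variance; one that does not is safe to deprioritize, which is what the importance view makes visible at a glance.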

Feature importance view:

Lastly, the feature importance view shows the importance of the input features. The impact of each feature on the predictive power of the ML model is measured using permutation feature importance, and users can view a ranking of all features as well as PDP and ICE plots, including when performing a multiclass classification task.

[Image: the feature importance view]

As you can see in the figure, ‘duration’ is the most important feature, followed by ‘savings_status’, and so on.
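Permutation importance, which this view relies on, is available directly in scikit-learn; here is a minimal sketch on an illustrative dataset.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(data.data, data.target, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Shuffle each feature in turn and measure the drop in score:
# the bigger the drop, the more the model depends on that feature.
result = permutation_importance(model, X_te, y_te, n_repeats=5, random_state=0)
ranking = sorted(zip(data.feature_names, result.importances_mean),
                 key=lambda kv: kv[1], reverse=True)
print(ranking[:3])
```

The resulting ranking is exactly the kind of ordered list the feature importance view presents, with PDP and ICE plots adding detail per feature.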

Final words

There is a lot to cover in XAutoML and one article is not enough, but I have covered almost all of its major functions. We saw how XAutoML integrates with Jupyter to produce its visualizations, discussed how AutoML alone lacks detailed information about the models it creates, and finally went through each piece of information XAutoML provides.



Waqqas Ansari

Waqqas Ansari is a data science guy with a math background. He likes solving challenging business problems through predictive modelling, descriptive modelling, and machine learning algorithms. He is fascinated by new technologies, especially those relating to machine learning.