Automate Your ML Pipelines With EvalML

EvalML

EvalML is an open-source Python library created by folks at Alteryx, the people behind Featuretools, that facilitates automated machine learning (AutoML) and model understanding. It abstracts multiple modelling libraries and provides a simple, unified API for building machine learning models. EvalML supports a wide range of supervised learning problems such as regression, binary classification and multiclass classification. 

The pipelines created by EvalML’s AutoMLSearch include preprocessing and feature engineering out of the box. The user only has to identify the target attribute; AutoML then runs a search algorithm that trains and scores several models for the problem type. The user can select one of the models based on its score and use it to generate predictions or run further analysis. EvalML also supports custom, problem-specific objective functions, so users can specify exactly what makes a model valuable for their use case.

Not only do these custom objectives help steer the AutoML search towards models with higher impact, but they are also used to tune the classification thresholds of binary classification models. EvalML’s documentation includes an example of a custom objective function built for credit card fraud detection. Additionally, EvalML has a collection of tools for model understanding. It currently supports feature importance and permutation importance, partial dependence, precision-recall curves, confusion matrices, ROC curves, prediction explanations, and binary classifier threshold optimization.
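To make the threshold-tuning idea concrete, here is a minimal sketch in plain Python. It does not use EvalML's API: the `fraud_cost` function, its cost values, and the toy labels and probabilities are all hypothetical, chosen only to show how a cost-based objective can favour a threshold other than the usual 0.5.

```python
# Illustrative only: a hand-rolled cost objective and threshold sweep.
# This is not EvalML's implementation; names and costs are hypothetical.

def fraud_cost(y_true, y_pred, cost_missed_fraud=100.0, cost_false_alarm=5.0):
    """Average cost per transaction; lower is better."""
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 0:
            total += cost_missed_fraud   # fraud slipped through
        elif actual == 0 and predicted == 1:
            total += cost_false_alarm    # legitimate transaction flagged
    return total / len(y_true)


def best_threshold(y_true, y_proba, objective, candidates):
    """Return the (threshold, score) pair that minimizes the objective."""
    scored = []
    for t in candidates:
        y_pred = [1 if p >= t else 0 for p in y_proba]
        scored.append((t, objective(y_true, y_pred)))
    return min(scored, key=lambda pair: pair[1])


# Toy data: true labels and predicted fraud probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_proba = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.70, 0.90]
candidates = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9

threshold, cost = best_threshold(y_true, y_proba, fraud_cost, candidates)
print(f"best threshold: {threshold}, average cost: {cost:.3f}")
```

Because a missed fraud is assumed to cost far more than a false alarm, the sweep settles on a threshold below 0.5, accepting one extra false alarm in exchange for catching the borderline fraud case.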

Furthermore, EvalML provides data checks that can be used to catch common problems with data before modelling. This helps prevent model quality problems, ambiguous bugs and stack traces. Currently EvalML includes the following data checks:

  • Detection of target leakage, where the model is given information during training that won’t be available at prediction time
  • Detection of invalid datatypes 
  • Checking for class imbalance
  • Looking for redundant features like highly null columns, constant columns, and columns which are probably an ID and not useful for modelling.
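
The sketch below shows, in plain Python, what a few of these checks might look like in spirit. It is not EvalML's data check API: the function names, the thresholds, and the toy columns are hypothetical, and the target leakage and datatype checks are left out for brevity.

```python
from collections import Counter

# Simplified stand-ins for some of the checks listed above.
# Function names and thresholds are hypothetical, not EvalML's API.

def highly_null_columns(columns, threshold=0.95):
    """Columns whose fraction of missing (None) values meets the threshold."""
    return [name for name, values in columns.items()
            if sum(v is None for v in values) / len(values) >= threshold]


def constant_columns(columns):
    """Columns with exactly one distinct non-missing value."""
    return [name for name, values in columns.items()
            if len({v for v in values if v is not None}) == 1]


def likely_id_columns(columns):
    """Columns where every value is present and unique, as an ID would be."""
    return [name for name, values in columns.items()
            if None not in values and len(set(values)) == len(values)]


def imbalanced_classes(target, min_fraction=0.1):
    """Classes that make up less than min_fraction of the target."""
    counts = Counter(target)
    return [label for label, n in counts.items() if n / len(target) < min_fraction]


n_rows = 20
columns = {
    "customer_id": list(range(101, 101 + n_rows)),  # unique per row
    "country": ["IN"] * n_rows,                     # never varies
    "notes": [None] * n_rows,                       # entirely missing
    "amount": [10.0, 20.0] * (n_rows // 2),         # ordinary feature
}
target = [0] * 19 + [1]

print("highly null:", highly_null_columns(columns))
print("constant:", constant_columns(columns))
print("likely IDs:", likely_id_columns(columns))
print("imbalanced classes:", imbalanced_classes(target))
```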

Using EvalML’s AutoML to search for the best Classification Algorithm

  1. Install EvalML from PyPI.
pip install evalml
  1. Load the breast cancer dataset and split it.
import evalml
from evalml import AutoMLSearch
X, y = evalml.demos.load_breast_cancer()
X_train, X_test, y_train, y_test = evalml.preprocessing.split_data(X, y, problem_type='binary') 
  1. Run the search for the best classification model.
automl = AutoMLSearch(X_train=X_train, y_train=y_train, problem_type='binary')
automl.search() 

This uses the default objective function, binary log loss. 
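
For readers unfamiliar with the metric, binary log loss is the mean negative log-likelihood of the true labels under the predicted probabilities. The following is a minimal standalone sketch of that formula, not EvalML's own implementation; the `eps` clipping value and the toy predictions are assumptions made for illustration.

```python
import math


def binary_log_loss(y_true, y_proba, eps=1e-15):
    """Mean negative log-likelihood of the true labels; lower is better."""
    total = 0.0
    for y, p in zip(y_true, y_proba):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]  # correct and sure of itself
hesitant = [0.6, 0.4, 0.6, 0.4]   # correct but close to 0.5

loss_confident = binary_log_loss(y_true, confident)
loss_hesitant = binary_log_loss(y_true, hesitant)
print(f"confident: {loss_confident:.4f}, hesitant: {loss_hesitant:.4f}")
```

Both sets of predictions classify every example correctly at a 0.5 threshold, yet the confident model scores a lower loss. Log loss rewards well-calibrated probabilities, not just correct labels.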


  1. Print model rankings and get the best pipeline.
automl.rankings
automl.describe_pipeline(automl.rankings.iloc[0]["id"])
  1. Logistic Regression is the best model for the binary log-loss objective. Let’s change the objective to the area under the ROC curve (AUC) and see how that impacts the best model.
automl_auc = AutoMLSearch(X_train=X_train, y_train=y_train,
                          problem_type='binary',
                          objective='auc',
                          additional_objectives=['f1', 'precision'],
                          optimize_thresholds=True)
automl_auc.search()
  1. Print model rankings and get the best pipeline.
automl_auc.rankings
automl_auc.describe_pipeline(automl_auc.rankings.iloc[0]["id"])
  1. The optimal model has now changed to ExtraTreesClassifier. This model can be used to make predictions on the validation/test data or saved for use later.
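best_model = automl_auc.best_pipeline
best_model.save("model.pkl")
# Pipelines are loaded with the static PipelineBase.load, the counterpart of save()
old_model = evalml.pipelines.PipelineBase.load("model.pkl")
# .to_dataframe() applies to EvalML releases that return Woodwork DataTables;
# on releases that return a pandas DataFrame directly, drop the call
old_model.predict_proba(X_test).to_dataframe()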
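
As background on the AUC objective used in the second search: ROC AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison in plain Python. It is an illustration of the metric under that definition, not the routine EvalML calls internally, and the toy scores are made up.

```python
def roc_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_score)
print(f"AUC: {auc}")
```

Since AUC depends only on how examples are ranked, not on the probability values themselves or on any single threshold, it can prefer a different model than log loss does. That is consistent with the change in the top-ranked pipeline seen above.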

Last Epoch

This article introduced EvalML, a Python library for automating machine learning. In addition to automating the search for the best model for a particular task, EvalML supports automated data quality checks, custom objectives, automated feature engineering and some rudimentary tools for understanding machine learning models. Combined with Alteryx’s existing solutions, Featuretools and Compose, EvalML enables users to combine different tables and data sources, create transformed and aggregated features, and then use these features to search for the best machine learning models.

To learn more about EvalML, refer to its official documentation and GitHub repository.
