Automate Your ML Pipelines With EvalML

EvalML is an open-source Python library that facilitates automated machine learning (AutoML) and model understanding.

EvalML is an open-source Python library created by the team at Alteryx (the people behind Featuretools) that facilitates automated machine learning (AutoML) and model understanding. It wraps multiple modelling libraries behind a simple, unified API for building machine learning models. EvalML supports a range of supervised learning problems, including regression, binary classification and multiclass classification.

The pipelines created by EvalML’s AutoMLSearch include preprocessing and feature engineering out of the box. The user only has to specify the target attribute and the problem type; AutoMLSearch then runs a search algorithm that trains and scores several pipelines. The user can pick a pipeline based on its scores and use it to generate predictions or for further analysis. EvalML also supports custom, problem-specific objective functions, which let users specify exactly what makes a model valuable for their use case.
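The train-score-rank loop described above can be illustrated with scikit-learn. This is a conceptual sketch, not how EvalML is implemented: the three candidate pipelines, the 3-fold cross-validation and the helper name `rank_candidates` are assumptions chosen for illustration, and EvalML's real search also builds the preprocessing automatically and tunes hyperparameters.

```python
# A conceptual sketch of an AutoML-style search, NOT EvalML's implementation:
# train several candidate pipelines and rank them by one objective.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def rank_candidates(X, y, scoring="neg_log_loss", cv=3):
    """Cross-validate each candidate pipeline; return (name, score) pairs, best first.

    scikit-learn scorers are 'greater is better', which is why log loss is
    exposed as the negated 'neg_log_loss'.
    """
    candidates = {
        # Each hypothetical candidate bundles a preprocessing step with an estimator.
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "Random Forest": make_pipeline(
            StandardScaler(), RandomForestClassifier(n_estimators=100, random_state=0)
        ),
        "Extra Trees": make_pipeline(
            StandardScaler(), ExtraTreesClassifier(n_estimators=100, random_state=0)
        ),
    }
    scores = {
        name: cross_val_score(pipe, X, y, scoring=scoring, cv=cv).mean()
        for name, pipe in candidates.items()
    }
    # Highest (best) score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


X, y = load_breast_cancer(return_X_y=True)
for name, score in rank_candidates(X, y):
    print(f"{name}: {score:.4f}")
```

Passing `scoring="roc_auc"` ranks the same candidates by AUC instead, which is the kind of objective switch that can change which pipeline comes out on top.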

Not only do these custom objectives steer the AutoML search towards models with higher impact, they are also used to tune the classification thresholds of binary classification models. An example of a custom objective function built for credit card fraud detection is available in the EvalML documentation. EvalML also has a collection of models and tools for model understanding: it currently supports feature importance and permutation importance, partial dependence, precision-recall curves, confusion matrices, ROC curves, prediction explanations, and binary classifier threshold optimization.
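To make the idea of tuning a threshold against a business objective concrete, here is a minimal sketch in plain NumPy. It is not EvalML's API: the cost function `fraud_cost`, the helper `tune_threshold`, the review fee and the transaction amounts are all hypothetical toy values. It assumes a binary model that outputs fraud probabilities and simply sweeps a grid of thresholds, keeping the one with the lowest cost.

```python
import numpy as np


def fraud_cost(y_true, y_pred, amounts, review_fee=10.0):
    """Hypothetical business cost for a fraud model (lower is better).

    A missed fraud (false negative) costs the transaction amount; every
    flagged transaction costs a fixed manual-review fee.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    amounts = np.asarray(amounts, dtype=float)
    missed_fraud = amounts[y_true & ~y_pred].sum()
    review_cost = review_fee * y_pred.sum()
    return float(missed_fraud + review_cost)


def tune_threshold(y_true, proba, amounts, thresholds=None):
    """Pick the decision threshold that minimises fraud_cost on held-out data."""
    proba = np.asarray(proba, dtype=float)
    if thresholds is None:
        thresholds = np.linspace(0.05, 0.95, 19)  # 0.05, 0.10, ..., 0.95
    costs = [fraud_cost(y_true, proba >= t, amounts) for t in thresholds]
    best = int(np.argmin(costs))  # first threshold with the lowest cost
    return float(thresholds[best]), costs[best]


# Toy held-out data: true labels, predicted fraud probabilities, amounts.
y_true = [0, 0, 1, 1, 0, 1]
proba = [0.12, 0.42, 0.33, 0.81, 0.22, 0.63]
amounts = [20, 50, 500, 300, 15, 100]
print(tune_threshold(y_true, proba, amounts))
```

With these toy numbers the sweep settles on a threshold of about 0.25 with a cost of 40: lowering it only adds review fees, while raising it to 0.35 misses the 500-unit fraud. A grid sweep is the simplest possible optimizer; it is used here only for clarity.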


Furthermore, EvalML provides data checks that catch common problems with the data before modelling. This helps prevent model quality problems as well as confusing bugs and stack traces later on. Currently, EvalML includes the following data checks:

  • Detection of target leakage, where the model is given information during training that won’t be available at prediction time
  • Detection of invalid datatypes
  • Checking for class imbalance
  • Detection of uninformative features, such as highly null columns, constant columns, and columns that are likely IDs and not useful for modelling
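To show in simplified form what a few of these checks look at, here is a small pandas sketch. It is not EvalML's data-check API: the function `simple_data_checks`, its thresholds and its ID heuristic are hypothetical choices made for illustration.

```python
import pandas as pd


def simple_data_checks(X, y, null_threshold=0.9, imbalance_threshold=0.1):
    """Hypothetical, simplified versions of some of the checks listed above.

    Returns a list of human-readable issues; an empty list means no issues.
    """
    issues = []
    for col in X.columns:
        series = X[col]
        # Highly null columns.
        null_frac = series.isna().mean()
        if null_frac >= null_threshold:
            issues.append(f"{col}: {null_frac:.0%} null")
        # Constant columns (NaN counts as a value, so all-null is also constant).
        if series.nunique(dropna=False) == 1:
            issues.append(f"{col}: constant")
        # Crude ID heuristic: every value unique and an ID-like column name.
        name = str(col).lower()
        if series.is_unique and (name == "id" or name.endswith("_id")):
            issues.append(f"{col}: likely an ID")
    # Class imbalance: share of the smallest class in the target.
    minority_share = y.value_counts(normalize=True).min()
    if minority_share < imbalance_threshold:
        issues.append(f"target: minority class is {minority_share:.0%} of rows")
    return issues


X = pd.DataFrame({
    "customer_id": range(20),
    "country": ["IN"] * 20,
    "notes": [None] * 19 + ["vip"],
    "amount": [float(i % 7) for i in range(20)],
})
y = pd.Series([0] * 19 + [1])
for issue in simple_data_checks(X, y):
    print(issue)
```

Every column except `amount` trips one check, and the 19-to-1 target is flagged as imbalanced. Real data checks are more careful (for example, ID detection in practice needs more than a column-name rule), so treat this only as an illustration of the ideas.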

Using EvalML’s AutoML to Search for the Best Classification Algorithm

  1. Install EvalML from PyPI.
pip install evalml
  2. Load the breast cancer dataset and split it into training and test sets.
import evalml
from evalml import AutoMLSearch
X, y = evalml.demos.load_breast_cancer()
X_train, X_test, y_train, y_test = evalml.preprocessing.split_data(X, y, problem_type='binary')
  3. Run the search for the best classification model.
automl = AutoMLSearch(X_train=X_train, y_train=y_train, problem_type='binary')
automl.search()

This uses the default objective function for binary classification, binary log loss.

  4. Print the model rankings with automl.rankings and get the best pipeline with automl.best_pipeline.
  5. Logistic Regression is the best model for the binary log loss objective. Let’s change the objective to the area under the ROC curve (AUC) and see how that affects the best model.
automl_auc = AutoMLSearch(X_train=X_train, y_train=y_train, problem_type='binary',
                          objective='auc', additional_objectives=['f1', 'precision'])
automl_auc.search()
  6. Print the new model rankings with automl_auc.rankings and get the best pipeline again.
  7. The optimal model has now changed to ExtraTreesClassifier. This pipeline can be used to make predictions on the validation/test data, or saved for later use.
best_model = automl_auc.best_pipeline
best_model.save("model.pkl")
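Why can changing the objective change the winning model? Binary log loss scores the predicted probabilities themselves and heavily penalizes confident mistakes, while AUC only measures how well a model orders positives above negatives. The from-scratch NumPy sketch below (not EvalML code; the two "models" are hypothetical probability vectors) shows a case where each metric prefers a different model. It assumes the area-under-the-curve objective is the ROC variant, which as far as we know is what EvalML's `auc` objective computes.

```python
import numpy as np


def binary_log_loss(y_true, proba, eps=1e-15):
    """Mean negative log-likelihood of the true labels (lower is better)."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(proba, dtype=float), eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def roc_auc(y_true, proba):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    y = np.asarray(y_true, dtype=bool)
    p = np.asarray(proba, dtype=float)
    pos, neg = p[y], p[~y]
    # Compare every positive score with every negative score via broadcasting.
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return float(wins / (len(pos) * len(neg)))


y_true = [0, 0, 1, 1]
model_a = [0.2, 0.6, 0.5, 0.8]      # reasonably calibrated, one mis-ordered pair
model_b = [0.90, 0.95, 0.96, 0.99]  # perfect ordering, badly over-confident

print("A:", binary_log_loss(y_true, model_a), roc_auc(y_true, model_a))
print("B:", binary_log_loss(y_true, model_b), roc_auc(y_true, model_b))
```

Model A has the lower log loss (about 0.51 against 1.34), while model B wins on AUC (1.0 against 0.75): B orders every example correctly but assigns high fraud-like probabilities to both negatives. A search ranked by one of these objectives can therefore pick a different pipeline than a search ranked by the other.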

Last Epoch

This article introduced EvalML, a Python library for automating machine learning. In addition to automatically searching for the best model for a particular task, EvalML supports automated data quality checks, custom objectives, automated feature engineering and some basic tools for understanding machine learning models. Combined with Alteryx’s other open-source libraries, Featuretools and Compose, EvalML lets users combine different tables and data sources, create transformed and aggregated features, and then use these features to search for the best machine learning model.

To learn more about EvalML, refer to the following resources:

Aditya Singh
A machine learning enthusiast with a knack for finding patterns. In my free time, I like to delve into the world of non-fiction books and video essays.
