
Generating Suitable ML Models Using LazyPredict Python Tool


While building machine learning models, we are often unsure which algorithm will work well with a given dataset, so we end up trying many models and iterating until we reach acceptable accuracy. Have you ever thought about running all the basic algorithms at once to gauge model performance?

LazyPredict is a module built for exactly this purpose. LazyPredict fits all the basic machine learning algorithms on your dataset and reports their performance. Along with the accuracy score, LazyPredict provides several other evaluation metrics and the time taken by each model.

LazyPredict is an open-source Python package created by Shankar Rao Pandala. Development and contributions to it are still ongoing.

Properties of LazyPredict:

  1. As of now, it is based only on supervised learning algorithms (regression and classification).
  2. Compatible with Python 3.6 and above.
  3. Can be run from the command-line interface (CLI).
  4. Fast in producing results, as the performance of all the basic models on the dataset is reported at once.
  5. Has an inbuilt pipeline that scales and transforms the data, handles missing values, and converts categorical data to numeric (a sketch of such a pipeline follows this list).
  6. Provides evaluation metrics for each individual model.
  7. Shows the time taken to build each model.
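To make property 5 concrete, here is a minimal sketch, written with plain scikit-learn, of the kind of preprocessing pipeline LazyPredict applies internally. This is an illustration of the idea, not LazyPredict's exact code:

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

def make_preprocessor(df: pd.DataFrame) -> ColumnTransformer:
    # Split columns by dtype, the way an automated tool has to
    numeric_cols = df.select_dtypes(include=[np.number]).columns
    categorical_cols = df.select_dtypes(include=["object", "category"]).columns

    numeric_pipe = Pipeline([
        ("impute", SimpleImputer(strategy="mean")),   # handle missing values
        ("scale", StandardScaler()),                  # scale the data
    ])
    categorical_pipe = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),  # categorical -> numeric
    ])
    return ColumnTransformer([
        ("num", numeric_pipe, numeric_cols),
        ("cat", categorical_pipe, categorical_cols),
    ])

# Usage: preprocessor = make_preprocessor(df); X_ready = preprocessor.fit_transform(df)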

In this article, I’ll be discussing how to implement LazyPredict for regression and classification models with just a few lines of code.

Installing LazyPredict:

Installation is very simple using the pip command:

pip install lazypredict

LazyPredict for Regression

I’ll be using the Mercedes dataset from Kaggle, which poses a regression problem: predicting the time a car will spend on the test bench given its feature configuration.

Dataset Link: https://www.kaggle.com/c/mercedes-benz-greener-manufacturing/overview

The dataset contains custom car features (X0 to X385) associated with a unique ID, and the target variable y is the time (in seconds) the car took to pass testing.

We import the LazyRegressor class from the lazypredict.Supervised module.

import pandas as pd
import lazypredict
from sklearn.model_selection import train_test_split
from lazypredict.Supervised import LazyRegressor

# Load the Mercedes-Benz dataset
df = pd.read_csv('/content/drive/My Drive/datasets/mercedes.csv')
df.head()

# Separate the independent variables (X) from the target (Y)
X = df.drop(['y'], axis=1)
Y = df['y']

# 80/20 train-test split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

# Fit all the basic regression models at once
reg = LazyRegressor(verbose=0, ignore_warnings=True, custom_metric=None, predictions=True)
models, pred = reg.fit(X_train, X_test, y_train, y_test)

The dataset is split into independent and dependent variables: the independent variables are stored in X and the dependent variable in Y. The data is then split 80% for training and 20% for testing.

The models variable contains all the fitted models along with their metric values, and pred contains each model’s predictions.

Parameters used in LazyRegressor():

  • verbose – 0 by default.
  • ignore_warnings – True by default, to suppress warning messages about any discrepancies while generating the models.
  • custom_metric – None by default; can be set to a user-defined metric function (see the sketch after the note below).
  • predictions – False by default; if set to True, the predictions of each model are returned.
  • random_state – 42 by default.

Note that all of these parameters are optional; if not defined, they take their default values.
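As an example of custom_metric, a minimal sketch, assuming the metric function is invoked by LazyPredict as f(y_true, y_pred). Here mean absolute error (not reported by default) is passed in, reusing the train/test split from above:

from sklearn.metrics import mean_absolute_error
from lazypredict.Supervised import LazyRegressor

def mae(y_true, y_pred):
    # Assumed call convention: true values first, predictions second
    return mean_absolute_error(y_true, y_pred)

reg = LazyRegressor(verbose=0, ignore_warnings=True,
                    custom_metric=mae, predictions=True)
models, pred = reg.fit(X_train, X_test, y_train, y_test)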

models output:

Total of 39 models.

For regression models, LazyPredict provides two evaluation metrics: RMSE (Root Mean Squared Error) and R-Squared, with models listed from best to worst fit. The time taken to build each model is given in seconds. Predictions for regression are returned in a data frame.

predictions output:
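Both outputs are ordinary pandas DataFrames, so they can be inspected directly. A short sketch, assuming the fit above and the column names ("Time Taken") as they appear in lazypredict's results tables:

# models is indexed by model name; pred (with predictions=True) holds one
# column of test-set predictions per model
print(models.head(10))                 # the ten best-ranked models
print(models["Time Taken"].idxmin())   # the fastest model to build
print(pred.head())                     # per-model predictions on X_test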

LazyPredict for Classification

For this demonstration, I’ve taken the wine recognition dataset from scikit-learn, which is a multiclass classification problem (class 0, class 1, class 2) with 13 features – Alcohol, Malic acid, Ash, Alkalinity of ash, Magnesium, Total phenols, Flavanoids, Nonflavanoid phenols, Proanthocyanins, Color intensity, Hue, OD280/OD315 of diluted wines, and Proline. All of these features are numeric.

       0     1     2     3    4     5     6     7     8     9    10    11    12
0  14.23  1.71  2.43  15.6  127  2.80  3.06  0.28  2.29  5.64  1.04  3.92  1065
1  13.20  1.78  2.14  11.2  100  2.65  2.76  0.26  1.28  4.38  1.05  3.40  1050

First 2 rows of dataset
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from lazypredict.Supervised import LazyClassifier

# Load the wine recognition dataset
data = load_wine()
X = data.data
y = data.target

# 80/20 train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit all the basic classification models at once
classifier = LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=None, predictions=True)
models, predictions = classifier.fit(X_train, X_test, y_train, y_test)

The dataset is loaded and separated into two variables: the features are stored in X and the target values in y. The data is then split 80% for training and 20% for testing.

The LazyClassifier parameters are the same as those of LazyRegressor. Lastly, the models are fitted.

models output:

Total of 30 models

For classification, we import the LazyClassifier class from lazypredict.Supervised. The available evaluation metrics are accuracy, balanced accuracy, F1 score, and ROC AUC.

Predictions of each model:
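Because the results are again a plain DataFrame, the classifiers can be re-ranked by whichever metric matters most. A sketch, assuming "F1 Score" is the column name as shown in lazypredict's results table:

# Re-rank the classifiers by F1 score instead of the default ordering
best_by_f1 = models.sort_values("F1 Score", ascending=False)
print(best_by_f1.head())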

Conclusion

LazyPredict is very handy for selecting the most accurate model for the dataset at hand from a variety of models, along with their evaluation metrics, within seconds. The best model can then be tuned further with a hyperparameter search, as sketched below. It is easy to implement and use, as it performs all the preprocessing itself.
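A minimal sketch of that follow-up step, assuming (for illustration only) that RandomForestClassifier ranked near the top on the wine data; the parameter grid below is likewise a hypothetical choice:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Tune the shortlisted model family with cross-validated grid search
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, scoring="f1_macro")
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)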

The complete code of the above implementation is available in AIM’s GitHub repository. Please visit this link to find the notebook for this code.


Jayita Bhattacharyya

Machine learning and data science enthusiast. Eager to learn new technology advances. A self-taught techie who loves to do cool stuff using technology for fun and worthwhile.