We live in a period of rapid computational and technological progress, in which computing has moved from large mainframes with tangled wires to PCs and now to the cloud. The world around us is changing quickly, and what makes it even more remarkable is not what has happened so far but what is yet to come. It is an exciting time to be alive: new tools and techniques are being developed alongside a major boost in computing power, in what can truly be called the world of Data Science! Machine Learning, or ML for short, has proven to be one of the most game-changing technological advancements of the current decade. In an increasingly competitive corporate world, ML enables organizations to fast-track digital transformation and move swiftly into an age of automation. AI and ML are here to stay because of the demand for them in everyday life, in applications such as digital payments and fraud detection in banking or product recommendations for customers.
Machine learning algorithms, and the resources for learning them, are well documented and readily accessible, and companies across industry verticals are moving to adopt machine learning at scale. A large share of the apps and software available today use machine learning in some way. Machine learning has become a go-to approach for companies looking to solve problems with data. Today, one can use machine learning to process current and past data in order to predict future outcomes. Other real-world applications range from finding the shortest route on a map to identifying types of cancer cells.
Developing a machine learning model can be complicated, and the model must be constructed so that it fits the problem at hand well. Wherever data is present, machine learning is used either to solve a problem or to offer insights that can lead to better decision making. However, no single method can solve all problems. Distinct types of algorithms have been developed to tackle different problems using entirely different techniques, and for each type the inputs supplied, the task performed, and the results produced differ considerably.
Some of the major types are Supervised Learning, Unsupervised Learning and Reinforcement Learning. Supervised learning is a type of machine learning in which a model, or function, is trained to map inputs from the training data to their respective outputs. Here, the training dataset is a data bank of labelled examples, and the test data is a set of inputs whose labels are withheld from the model. Unsupervised learning is a type of machine learning that draws inferences from a dataset without labels. In reinforcement learning, an agent performs tasks in a particular simulation environment and receives either a reward or a punishment for each action it takes. Unlike the other approaches, the algorithm is not told which actions are correct and instead learns by trial and error.
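To make the supervised setting concrete, here is a minimal sketch of a one-nearest-neighbour classifier in plain Python. The tiny dataset, its labels, and the helper name `predict_1nn` are made up for illustration; they are not part of LazyPredict or scikit-learn.

```python
import math

# Hypothetical labelled training data: (features, label)
train = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]

def predict_1nn(train_rows, point):
    """Return the label of the training row closest to `point`."""
    nearest = min(train_rows, key=lambda row: math.dist(row[0], point))
    return nearest[1]

# Test inputs arrive without labels; the model has to supply them
test_inputs = [(1.2, 1.4), (8.5, 9.2)]
predictions = [predict_1nn(train, p) for p in test_inputs]
print(predictions)  # ['small', 'large']
```

The model here is simply the stored training set plus a distance rule, which is about the smallest possible example of learning a mapping from labelled inputs to outputs.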
Machine learning methods also depend on the type of task and can be further categorized as Classification models, Regression models, Clustering and so on. Classification is the task of predicting the type or class of an object from a finite number of options, so its output variable is categorical. Regression, by contrast, addresses problems whose output variable is continuous. Predicting the price of airline tickets, for example, is a classic regression task. Clustering, on the other hand, is the challenge of grouping items that are related in some way. It helps identify similar items automatically, without the need for human intervention.
What is LazyPredict?
LazyPredict is an open-source Python library that helps you semi-automate your machine learning tasks. It can build multiple models with very little code and helps you understand which models work better for a given dataset without any parameter tuning. Using LazyPredict, one can fit all the available models on the dataset and compare how each basic model performs. Here, a basic model means a model trained with its default parameters, without any hyperparameter tuning. Once the accuracy of every model is available, one can pick the top models, say the top 5, and then apply hyperparameter tuning to them. The library comes with a LazyClassifier for classification problems and a LazyRegressor for regression problems.
While building machine learning models, one cannot be sure in advance which algorithm will work well on a given dataset; hence, one ends up trying many models and iterating until a satisfactory accuracy is reached. LazyPredict comes to the rescue in such cases by reporting how all the basic machine learning algorithms perform on your dataset. Along with the accuracy score, LazyPredict also provides other evaluation metrics and reports the time taken by each model.
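The "pick the top 5, then tune" step can be sketched in a few lines of plain Python. The model names and scores below are placeholders invented for this example, not real LazyPredict results, and `top_k` is a hypothetical helper.

```python
# Placeholder scores, NOT real results: model name -> accuracy
scores = {
    "model_a": 0.91,
    "model_b": 0.97,
    "model_c": 0.88,
    "model_d": 0.95,
    "model_e": 0.93,
    "model_f": 0.85,
    "model_g": 0.96,
}

def top_k(score_by_model, k=5):
    """Return the names of the k highest-scoring models, best first."""
    ranked = sorted(score_by_model.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]

shortlist = top_k(scores, k=5)
print(shortlist)  # ['model_b', 'model_g', 'model_d', 'model_e', 'model_a']
```

With the pandas DataFrame of scores that LazyPredict returns, the same idea could probably be written as `models.sort_values("Accuracy", ascending=False).head(5)`, where `models` stands for that DataFrame; only the shortlisted models would then go on to hyperparameter tuning.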
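The manual loop that a tool like LazyPredict automates looks roughly like the sketch below: fit each candidate, score it on held-out data, and record the time taken. The two toy classifiers, the data, and the `compare` helper are all invented for illustration and say nothing about how LazyPredict is implemented internally.

```python
import time
from collections import Counter

class MajorityClass:
    """Baseline that always predicts the most common training label."""
    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]

class NearestNeighbour:
    """1-nearest-neighbour on a single numeric feature."""
    def fit(self, X, y):
        self.rows_ = list(zip(X, y))
        return self

    def predict(self, X):
        return [min(self.rows_, key=lambda r: abs(r[0] - x))[1] for x in X]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def compare(models, X_train, y_train, X_test, y_test):
    """Fit every model, then return (name, accuracy, seconds) sorted by accuracy."""
    results = []
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        elapsed = time.perf_counter() - start
        results.append((name, accuracy(y_test, y_pred), elapsed))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Toy data: one numeric feature, two classes
X_train, y_train = [1, 2, 3, 10, 11], [0, 0, 0, 1, 1]
X_test, y_test = [2.5, 9, 12], [0, 1, 1]

results = compare(
    {"majority": MajorityClass(), "1-nn": NearestNeighbour()},
    X_train, y_train, X_test, y_test,
)
for name, acc, seconds in results:
    print(f"{name}: accuracy={acc:.2f}, time={seconds:.6f}s")
```

Even with two models the bookkeeping is noticeable; with dozens of estimators, having a library produce the table of scores and timings saves a lot of repetitive code.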
Getting Started with the Code
In this article, we will use the LazyPredict library to find the best-suited models for Classification and Regression on our datasets, along with each model’s score. In addition, we will visualize the scores to compare the models and choose the best-suited one for the dataset being processed. The following code is inspired by the documentation provided by the creators of LazyPredict.
Installing The Library
The first step is to install the LazyPredict library; you can use the following code to do so:
!pip install lazypredict
!pip install scipy==1.7.1
We also pin SciPy to a specific version (1.7.1) rather than the latest release, so that the environment matches the one this code was written for.
# Cloning the LazyPredict repository
!git clone https://github.com/shankarpandala/lazypredict.git
Performing Classification: Importing Dependencies
Next, we import the dependencies required for our model:
# Import libraries
import lazypredict
from lazypredict.Supervised import LazyClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
Loading The Dataset
Let us now load our dataset; we use scikit-learn’s built-in Breast Cancer dataset for our first problem.
# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target
Next, we split the dataset into training and test sets:
# Splitting Dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=42)
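To show what `test_size=.2` and `random_state=42` control, here is a rough plain-Python sketch of a shuffled 80/20 split. It is not scikit-learn's implementation, and `simple_split` is a hypothetical helper; the point is only that a fixed seed makes the shuffle, and therefore the split, reproducible.

```python
import math
import random

def simple_split(X, y, test_size=0.2, seed=42):
    """Shuffle row indices with a fixed seed, then cut off a test portion."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(len(X) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Ten toy rows with alternating labels
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = simple_split(X, y)
print(len(X_train), len(X_test))  # 8 2
```

Calling `simple_split` again with the same seed returns the same rows in each part, which is what makes results comparable between runs.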
With the dataset loaded and everything else set up, now let’s perform our classification task. For this, we will be setting up our classification pipeline using the LazyClassifier.
# Defining parameters for LazyClassifier
clf = LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=None)

# Scores on the training data itself (fitted and evaluated on X_train)
models_train, predictions_train = clf.fit(X_train, X_train, y_train, y_train)
# Scores on the held-out test data
models_test, predictions_test = clf.fit(X_train, X_test, y_train, y_test)

# Printing all the model performances
models_train
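As you can see, the LazyClassifier has provided us with scores for every classifier it could fit on our dataset! Note that models_train holds scores computed on the training data itself, while models_test holds the scores on the held-out test set, which are the better guide to how a model will generalize.
Let us now visualize the scores to get a better understanding: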
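The classifier above was created with `custom_metric=None`. As far as I understand the library, this argument can instead be given a function that takes the true and predicted labels and returns a number, which is then reported alongside the built-in metrics. The sketch below defines one such function; `malignant_recall` is a hypothetical name, and it assumes label 0 marks the malignant class, as it does in scikit-learn's Breast Cancer dataset.

```python
def malignant_recall(y_true, y_pred, malignant_label=0):
    """Fraction of truly malignant samples that were predicted as malignant."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t == malignant_label]
    if not pairs:
        return 0.0  # no malignant samples present, so recall is undefined
    return sum(p == malignant_label for _, p in pairs) / len(pairs)

# Tiny hand-made check: three malignant samples, two of them caught
y_true_demo = [0, 0, 1, 1, 0]
y_pred_demo = [0, 1, 1, 1, 0]
print(malignant_recall(y_true_demo, y_pred_demo))

# Assumed usage with LazyPredict (not executed here):
# clf = LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=malignant_recall)
```

For a medical dataset like this one, a metric that focuses on missed malignant cases can matter more than overall accuracy when choosing which models to shortlist.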
# Plotting the accuracy scores
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10, 5))
sns.set_theme(style="whitegrid")
ax = sns.barplot(x=models_train.index, y="Accuracy", data=models_train)
plt.xticks(rotation=90)
# The same plot with horizontal bars, which keeps the model names readable
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(5, 10))
sns.set_theme(style="whitegrid")
ax = sns.barplot(y=models_train.index, x="Accuracy", data=models_train)
Using such visualizations, we can easily see which models achieve the highest accuracy on this dataset.
We can also perform Regression using the LazyRegressor module on another dataset!
# Importing the libraries
from lazypredict.Supervised import LazyRegressor
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
import numpy as np

# Loading the Boston dataset
# Note: load_boston was removed in scikit-learn 1.2, so this call needs an older release
boston = datasets.load_boston()
X, y = shuffle(boston.data, boston.target, random_state=42)

# Splitting Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=42)

# Building the pipeline
reg = LazyRegressor(verbose=0, ignore_warnings=False, custom_metric=None)
# Scores on the training data itself
models_train, predictions_train = reg.fit(X_train, X_train, y_train, y_train)
# Scores on the held-out test data
models_test, predictions_test = reg.fit(X_train, X_test, y_train, y_test)

# Printing all model performances
models_train
We can also create a visualization for a single column, such as the R-squared scores only:
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10, 5))
sns.set_theme(style="whitegrid")
ax = sns.barplot(x=models_train.index, y="R-Squared", data=models_train)
ax.set(ylim=(0, 1))
plt.xticks(rotation=90)
import matplotlib.pyplot as plt
import seaborn as sns

# Clamp negative R-squared values to 0 so every bar fits on the 0 to 1 axis
# (the column is selected by name rather than by position)
models_train["R-Squared"] = models_train["R-Squared"].clip(lower=0)

plt.figure(figsize=(5, 10))
sns.set_theme(style="whitegrid")
ax = sns.barplot(y=models_train.index, x="R-Squared", data=models_train)
ax.set(xlim=(0, 1))
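The plotting code above sets negative R-squared values to 0 before drawing. The short sketch below, using made-up numbers and a hypothetical `r_squared` helper, shows why negative values can occur at all: R-squared compares a model's squared error with that of always predicting the mean, so a model that does worse than the mean scores below zero.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up targets and two sets of made-up predictions
y_true_demo = [1.0, 2.0, 3.0, 4.0]
good_pred = [1.1, 1.9, 3.2, 3.8]   # close to the targets
bad_pred = [4.0, 3.0, 2.0, 1.0]    # worse than predicting the mean

good_score = r_squared(y_true_demo, good_pred)
bad_score = r_squared(y_true_demo, bad_pred)
print(round(good_score, 2), bad_score)  # 0.98 -3.0

# Clamping for display only, as in the plot above
print(max(0.0, bad_score))  # 0.0
```

Clamping is a presentation choice: the bar for such a model is drawn at zero, while the underlying score still says the model is worse than a constant prediction.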
In this article, we learnt about the different types of machine learning models and their uses. We also used the LazyPredict library to compare many models at once and identify the best-suited ones for our dataset. The implementation above is also available as a Colab notebook.
Victor is an aspiring Data Scientist and holds a Master of Science in Data Science & Big Data Analytics. He is a researcher, a Data Science influencer and a former university football player. A keen learner of new developments in Data Science and Artificial Intelligence, he is committed to growing the Data Science community.