
Complete Guide to Using AutoSklearn – Tool For Faster Machine Learning Implementations

In this article, we will see how to make use of autosklearn for classification and regression problems.


Automated machine learning can be a huge time saver, especially when the dataset is large or the task is a standard classification or regression problem. One open-source AutoML tool is AutoSklearn. The popular sklearn library is very widely used for building machine learning models, but with sklearn it is up to the user to choose the algorithm and tune its hyperparameters. With autosklearn, these steps are automated. Along with data preparation and model building, it also learns from models that have been used on similar datasets and can automatically build ensemble models for better accuracy.


Installing the package

Before we build models with autosklearn, we need to install the package in our working environment. On a Linux operating system, we can do this with the pip command.

pip3 install auto-sklearn

However, if you are making use of Colab you will need to install the following:

!sudo apt-get install build-essential swig
!curl https://raw.githubusercontent.com/automl/auto-sklearn/master/requirements.txt | xargs -n 1 -L 1 pip install
!pip install auto-sklearn

This will install the library and we can move to the next step. 
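A quick way to confirm that the installation succeeded is to check whether the package can be located on the import path. The sketch below uses only the Python standard library; the helper name `is_installed` is hypothetical, and `autosklearn` is the import name used by the code later in this article.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# "autosklearn" is the import name; "auto-sklearn" is the pip package name.
if is_installed("autosklearn"):
    print("auto-sklearn is available")
else:
    print("auto-sklearn is missing; rerun the install commands above")
```

Checking with `find_spec` avoids actually importing the package, so the check itself stays fast even when the library is present.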

AutoSklearn for classification problems

Now that we have everything needed to start we can build a model using autosklearn on a classification type problem. For these types of problems, we need to configure the method called AutoSklearnClassifier. Let us first select the dataset and then proceed with the model. 

The dataset

I will use the red wine quality dataset from the UCI repository. The code below reads a CSV copy of it directly from a GitHub URL, so no separate download is needed. Now let us load the dataset.

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd
from autosklearn.classification import AutoSklearnClassifier
# read_csv lives in pandas, so it has to be called through the pd alias
wine = pd.read_csv('https://raw.githubusercontent.com/sharmaroshan/Wine-Quality-Predictions/master/winequality-red.csv')
wine

Splitting the dataset

Now, let us separate the features from the target (the last column) and then split the data into training and test sets.

dataset = wine.values
ft, target = dataset[:, :-1], dataset[:, -1]
X_train, X_test, y_train, y_test = train_test_split(ft, target, test_size=0.2, random_state=1)

Building the classification model

Since we are using auto-sklearn, we need not specify the name of the algorithm or the parameters. These are done automatically for us and the final result is displayed. 

# search for models for at most two minutes (the budget is given in seconds)
autosk = AutoSklearnClassifier(time_left_for_this_task=60*2)
autosk.fit(X_train, y_train)
print(autosk.sprint_statistics())

time_left_for_this_task is the total time, in seconds, that the search across candidate models is allowed to take. I have allowed the search to run for two minutes, but you can choose any budget you wish.
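The overall budget can also be paired with a cap on each individual model fit, so that one slow candidate cannot consume the whole search. A minimal sketch, assuming you want every fit limited to a fixed share of the total: `time_left_for_this_task` and `per_run_time_limit` are constructor parameters of `AutoSklearnClassifier` (both in seconds), while the helper `make_budget` and the one-tenth share are illustrative choices, not settings used in the run above.

```python
def make_budget(total_minutes: float, per_run_divisor: int = 10) -> dict:
    """Build auto-sklearn time-budget keyword arguments, in seconds.

    Hypothetical helper: the divisor is an illustrative choice.
    """
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if per_run_divisor < 1:
        raise ValueError("per_run_divisor must be at least 1")
    total_seconds = int(total_minutes * 60)
    # Never let the per-fit limit drop below one second.
    per_run_seconds = max(1, total_seconds // per_run_divisor)
    return {
        "time_left_for_this_task": total_seconds,
        "per_run_time_limit": per_run_seconds,
    }


budget = make_budget(2)  # the two-minute search used in this article
print(budget)
```

With auto-sklearn installed, the dictionary can be unpacked into the constructor as `AutoSklearnClassifier(**budget)`.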


Now we have the statistics of the search, which show that 21 algorithms were checked. Let us now see the accuracy of the model.

pred = autosk.predict(X_test)
# accuracy_score was imported directly from sklearn.metrics above
print("Accuracy score", accuracy_score(y_test, pred))

This is a good score considering that we have not scaled or pre-processed the data and have allowed the search to run for only 2 minutes. Thus, we have built a classification model using autosklearn.
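Whether an accuracy figure is good depends on how the classes are distributed, so it helps to compare it against a baseline that always predicts the most frequent class. The sketch below uses made-up labels purely for illustration (they are not results from the wine data), and the helper names are hypothetical; in practice you would pass `y_train`, `y_test`, and `pred` from the code above.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot score empty inputs")
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def majority_baseline_accuracy(y_train, y_test):
    """Accuracy of always predicting the most common training label."""
    most_common_label, _ = Counter(y_train).most_common(1)[0]
    return accuracy(y_test, [most_common_label] * len(y_test))


# Toy labels for illustration only, not taken from the wine dataset.
toy_train = [5, 5, 6, 5, 6, 7, 5, 6]
toy_test = [5, 6, 5, 7]
toy_pred = [5, 6, 6, 7]

print("model accuracy:", accuracy(toy_test, toy_pred))
print("baseline accuracy:", majority_baseline_accuracy(toy_train, toy_test))
```

A model that only matches the baseline has learned little beyond the class frequencies, however high the raw number looks.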

Autosklearn for regression

We have already seen how autosklearn works for classification type of models. Next, let us implement this for a regression problem and check the results. 

The dataset

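For this, I will use the built-in sklearn dataset called the Boston housing dataset. The task here is to predict the price of houses in Boston using the features given. Note that load_boston was removed in scikit-learn 1.2, so the code below requires an older scikit-learn version. Let us now load the dataset.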

from sklearn.datasets import load_boston
import pandas as pd
boston_data=load_boston()
features=pd.DataFrame(boston_data.data,columns=boston_data.feature_names)
target=pd.DataFrame(boston_data.target,columns=['TARGET'])
dataset=pd.concat([features,target],axis=1)

Splitting the dataset

Let us split this dataset into train and test data using the train_test_split function of sklearn.

xtrain,xtest,ytrain,ytest=train_test_split(features,target,test_size=0.2)

Model building

Just as we used AutoSklearnClassifier for classification, we will use AutoSklearnRegressor for regression models.

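from autosklearn.regression import AutoSklearnRegressor
# search for models for at most five minutes (the budget is given in seconds)
regressor = AutoSklearnRegressor(time_left_for_this_task=60*5)
regressor.fit(xtrain, ytrain)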

Here I have given the time as 5 minutes to see the impact on the results. 

Now, let us see the statistics of the model along with the error rate. Since this is a regression problem we will use the mean absolute error as the metric. 

from sklearn.metrics import mean_absolute_error
print(regressor.sprint_statistics())
# predict with the fitted regressor defined above
pred = regressor.predict(xtest)
mae = mean_absolute_error(ytest, pred)
print("MAE:", mae)

This shows that the error is low, which means the predictions are close to the actual prices and the model has performed well. The statistics also report a validation score of 0.86. For a regressor this is the value of the optimization metric (R² by default in auto-sklearn) and not a classification accuracy. The search evaluated 57 algorithms in the 5 minutes.
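Mean absolute error is easier to interpret next to a naive reference, such as always predicting the mean of the training targets. A minimal sketch with made-up prices (not Boston results) and hypothetical helper names; in practice you would pass the values from `ytrain`, `ytest`, and `pred`.

```python
from statistics import mean


def mean_absolute_error_manual(y_true, y_pred):
    """Average of |true - predicted| over all samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot score empty inputs")
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def mean_baseline_mae(y_train, y_test):
    """MAE of always predicting the mean of the training targets."""
    constant = mean(y_train)
    return mean_absolute_error_manual(y_test, [constant] * len(y_test))


# Toy prices for illustration only, not taken from the Boston dataset.
toy_train = [20.0, 22.0, 24.0, 30.0]
toy_test = [21.0, 25.0, 28.0]
toy_pred = [22.0, 24.0, 27.0]

print("model MAE:", mean_absolute_error_manual(toy_test, toy_pred))
print("baseline MAE:", mean_baseline_mae(toy_train, toy_test))
```

The further the model's MAE falls below the baseline's, the more of the variation in prices the model is actually explaining.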

Conclusion

In this article, we saw how to use autosklearn and build both classification and regression models without having to specify the name of the algorithm. We achieved good results in both of these models. AutoSklearn can be really useful in business analytics and research to build faster and better models. 


Bhoomika Madhukar

I am an aspiring data scientist with a passion for teaching. I am a computer science graduate from Dayananda Sagar Institute. I have experience in building models in deep learning and reinforcement learning. My goal is to use AI in the field of education to make learning meaningful for everyone.