Complete Guide to Using AutoSklearn – Tool For Faster Machine Learning Implementations

In this article, we will see how to make use of autosklearn for classification and regression problems.

Automated machine learning (AutoML) can be a huge time saver, especially when the dataset is large or the task is a standard classification or regression problem. One open-source AutoML tool is AutoSklearn. The popular sklearn library is widely used for building machine learning models, but with sklearn it is up to the user to choose the algorithm and tune its hyperparameters. Autosklearn automates these steps. Along with data preparation and model building, it also learns from models that have been used on similar datasets and can automatically build ensemble models for better accuracy.

Installing the package

Before we build models with autosklearn, we need to install the package in our working environment. On a Linux operating system, we can do this with pip.

pip3 install auto-sklearn

However, if you are making use of Colab you will need to install the following:

!sudo apt-get install build-essential swig
!curl https://raw.githubusercontent.com/automl/auto-sklearn/master/requirements.txt | xargs -n 1 -L 1 pip install
!pip install auto-sklearn

This will install the library and we can move to the next step. 
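
Before moving on, it can help to confirm that the package is visible to Python. The snippet below is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name, and the check only looks for the module on the import path without importing it.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# auto-sklearn is imported under the module name "autosklearn"
if is_installed("autosklearn"):
    print("auto-sklearn is available")
else:
    print("auto-sklearn was not found; re-run the install step")
```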

AutoSklearn for classification problems

Now that we have everything we need, we can build a model using autosklearn on a classification problem. For these problems, we use the class called AutoSklearnClassifier. Let us first select the dataset and then proceed with the model.

The dataset

I will use a simple wine quality dataset from the UCI repository. To use the same dataset, you can read it directly from the URL in the code below. Now let us load the dataset.

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd
from autosklearn.classification import AutoSklearnClassifier
# Load the red wine quality data into a DataFrame
wine = pd.read_csv('https://raw.githubusercontent.com/sharmaroshan/Wine-Quality-Predictions/master/winequality-red.csv')
wine

Splitting the dataset

Now, let us split the dataset into training and test sets and also split the dataset into features and targets respectively. 

dataset = wine.values
ft, target = dataset[:, :-1], dataset[:, -1]
X_train, X_test, y_train, y_test = train_test_split(ft, target, test_size=0.2, random_state=1)
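
To make the slicing and the 80/20 split concrete, here is a small pure-Python sketch of the same idea on toy rows. It is not scikit-learn's implementation: `split_rows` is a hypothetical helper, the shuffle order will differ from `train_test_split`, and it assumes the target sits in the last column, as in the wine data.

```python
import math
import random


def split_rows(rows, test_size=0.2, seed=1):
    """Hold out a fraction of rows and split each row into features and target.

    Hypothetical helper for illustration; it assumes the target is the last
    column and does not reproduce scikit-learn's exact shuffling.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable

    n_test = math.ceil(len(shuffled) * test_size)
    test_rows, train_rows = shuffled[:n_test], shuffled[n_test:]

    # Same idea as dataset[:, :-1] and dataset[:, -1] on a NumPy array
    X_train = [row[:-1] for row in train_rows]
    y_train = [row[-1] for row in train_rows]
    X_test = [row[:-1] for row in test_rows]
    y_test = [row[-1] for row in test_rows]
    return X_train, X_test, y_train, y_test


# Ten toy rows: two feature columns followed by a target column
rows = [[i, i * 2, i % 3] for i in range(10)]
X_train, X_test, y_train, y_test = split_rows(rows)
print(len(X_train), len(X_test))
```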

Building the classification model

Since we are using auto-sklearn, we need not specify the name of the algorithm or the parameters. These are done automatically for us and the final result is displayed. 

# Search for models for two minutes (the budget is given in seconds)
autosk = AutoSklearnClassifier(time_left_for_this_task=60*2)
autosk.fit(X_train, y_train)
print(autosk.sprint_statistics())

time_left_for_this_task is the total time budget, in seconds, that the user allows for the model search. I have allowed the search to run for two minutes, but you can choose any budget you wish.
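
Because the budget is expressed in seconds, it can be convenient to compute it from minutes. The sketch below is only an illustration: `search_budget` is a hypothetical helper, and it assumes you also want to set `per_run_time_limit` (the cap for a single model fit, which as far as I know defaults to one tenth of the total budget in auto-sklearn) explicitly.

```python
def search_budget(minutes, slices=10):
    """Build keyword arguments for an auto-sklearn estimator.

    Hypothetical helper: converts a budget in minutes to seconds and caps
    each individual model run at one `slices`-th of the total.
    """
    total_seconds = int(minutes * 60)
    per_run_seconds = max(1, total_seconds // slices)
    return {
        "time_left_for_this_task": total_seconds,
        "per_run_time_limit": per_run_seconds,
    }


budget = search_budget(2)
print(budget)

# With auto-sklearn installed, the dictionary can be unpacked into the estimator:
# autosk = AutoSklearnClassifier(**budget)
```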

We now have the statistics of the search, which show that 21 algorithm runs were evaluated. Let us now see the accuracy of the model.

# Predict on the held-out test set and compare with the true labels
pred = autosk.predict(X_test)
print("Accuracy score", accuracy_score(y_test, pred))

This is a good score since we have not scaled or pre-processed the data and we have allowed the model to run only for 2 minutes. Thus, we have built a classification model using autosklearn. 
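
For readers who want to see what the accuracy score measures, here is a minimal pure-Python sketch: the fraction of predictions that match the true labels. The function name `accuracy` and the toy labels are made up for illustration; in practice `accuracy_score` from sklearn does this for you.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy wine quality labels: three of the four predictions are correct
print(accuracy([5, 6, 5, 7], [5, 6, 6, 7]))
```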

Autosklearn for regression

We have already seen how autosklearn works for classification type of models. Next, let us implement this for a regression problem and check the results. 

The dataset

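For this, I will use the built-in sklearn dataset called the Boston housing dataset. The task here is to predict the price of houses in Boston using the features given. Note that load_boston was removed in scikit-learn 1.2, so the code below needs an older scikit-learn release. Let us now load the dataset.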

from sklearn.datasets import load_boston
import pandas as pd
boston_data=load_boston()
features=pd.DataFrame(boston_data.data,columns=boston_data.feature_names)
target=pd.DataFrame(boston_data.target,columns=['TARGET'])
dataset=pd.concat([features,target],axis=1)

Splitting the dataset

Let us split this dataset into train and test data using the train_test_split function of sklearn.

xtrain,xtest,ytrain,ytest=train_test_split(features,target,test_size=0.2)

Model building

Just as we used AutoSklearnClassifier for classification, we will use AutoSklearnRegressor for regression models.

import autosklearn.regression

regressor = autosklearn.regression.AutoSklearnRegressor(time_left_for_this_task=60*5)
regressor.fit(xtrain, ytrain)

Here I have given the time as 5 minutes to see the impact on the results. 

Now, let us see the statistics of the model along with the error rate. Since this is a regression problem we will use the mean absolute error as the metric. 

from sklearn.metrics import mean_absolute_error

print(regressor.sprint_statistics())
pred = regressor.predict(xtest)
mae = mean_absolute_error(ytest, pred)
print("MAE:", mae)

This shows that the error is low, which means the model has performed well. The validation score is 0.86; for regression, auto-sklearn reports R² by default rather than accuracy, so this indicates a good fit. The search evaluated 57 algorithm runs in the 5 minutes.
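
The two numbers discussed above can be computed by hand, which makes them easier to interpret. The sketch below assumes the validation score is R² (to my knowledge the default regression metric in auto-sklearn); the function names `mae` and `r2` and the toy prices are made up for illustration.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total variance."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy house prices and predictions
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]
print("MAE:", mae(y_true, y_pred))
print("R2:", r2(y_true, y_pred))
```

A lower MAE is better, while R² is better the closer it gets to 1; an R² of 0 means the model does no better than always predicting the mean.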

Conclusion

In this article, we saw how to use autosklearn and build both classification and regression models without having to specify the name of the algorithm. We achieved good results in both of these models. AutoSklearn can be really useful in business analytics and research to build faster and better models. 

Bhoomika Madhukar
I am an aspiring data scientist with a passion for teaching. I am a computer science graduate from Dayananda Sagar Institute. I have experience in building models in deep learning and reinforcement learning. My goal is to use AI in the field of education to make learning meaningful for everyone.
