Guide To Microsoft’s FLAML

FLAML

The current data science landscape raises a big question: which machine learning model should we select to get the best predictions? To choose the best-fit parameters, we usually rely on conventional hyperparameter tuning techniques such as grid search (GridSearchCV) and random search. These techniques help a lot, but they are time-consuming, computationally expensive, and put a heavy load on our working environment.
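To see why exhaustive search gets expensive, the sketch below counts the model fits a grid search performs. The parameter values are hypothetical, chosen only for illustration. The point is that the number of fits is the product of the grid sizes times the number of cross-validation folds.

```python
from itertools import product
from math import prod

# Hypothetical search space for a tree ensemble (illustrative values only)
param_grid = {
    "n_estimators": [10, 50, 100, 200],
    "max_features": [0.25, 0.5, 0.75, 1.0],
    "max_depth": [None, 5, 10],
}
n_folds = 5  # k in k-fold cross-validation

# Grid search evaluates every combination of values...
candidates = list(product(*param_grid.values()))
# ...and trains one model per combination per fold
n_fits = len(candidates) * n_folds

# Same count from the formula: product of grid sizes times folds
assert n_fits == prod(len(values) for values in param_grid.values()) * n_folds
print(f"{len(candidates)} configurations x {n_folds} folds = {n_fits} fits")
```

Adding one more value to any parameter multiplies this count. Random search caps the number of sampled candidates instead, but each candidate still costs `n_folds` fits.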

What if a lightweight tool could tell us the best-fit model and the best parameters for our dataset? Imagine how much research and time that would save. This is where Microsoft FLAML comes into the picture.

Introduction to FLAML 

FLAML is a Python package that finds the best-fit machine learning model at low computational cost. It thus removes the burden of manually choosing the best model and its best parameters.

Nowadays, many businesses have started building applications with embedded machine learning, and selecting a single model from the wide variety of available machine learning models is costly. After choosing a model, selecting the best parameters for every dataset is also time-consuming. To solve this problem, Microsoft built an AutoML system that mainly focuses on:

  • Model selection
  • Hyperparameter tuning
  • Feature engineering
  • Neural architecture search
  • Model compression 

This is an ongoing project covering model selection, feature engineering, and hyperparameter tuning, and Microsoft has released it as the FLAML library, written in Python. In the next section, we will start with the basics of Microsoft FLAML (a Fast and Lightweight AutoML library).

Let’s get started with the Microsoft FLAML:

Setting up the environment in Google Colab

Requirements: Python 3.6 or above, scikit-learn 0.24.2 or above, xgboost 0.90 or above, catboost 0.26 or above, lightgbm 3.2.1 or above, and threadpoolctl 2.1.0 or above.

We can install FLAML with pip (in Colab, the leading ! runs the line as a shell command):

!pip install flaml
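To confirm that an environment meets the minimum versions listed above, a small standard-library check like the following can help. It is a sketch under a few assumptions: the helper names are hypothetical, it relies on `importlib.metadata` (Python 3.8+, newer than the 3.6 minimum above), and its simple parser only understands plain dotted release numbers such as `0.24.2`.

```python
from importlib import metadata

# Minimum versions from the requirements listed above
MINIMUMS = {
    "scikit-learn": "0.24.2",
    "xgboost": "0.90",
    "catboost": "0.26",
    "lightgbm": "3.2.1",
    "threadpoolctl": "2.1.0",
}

def parse_version(text):
    """'0.24.2' -> (0, 24, 2); stops at the first non-numeric part (e.g. '1rc1')."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

def meets_minimum(installed, minimum):
    """Compare two dotted versions, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b

def check_requirements(minimums=MINIMUMS):
    """Map each package name to its installed version and a status."""
    report = {}
    for package, minimum in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "not installed"
            continue
        status = "ok" if meets_minimum(installed, minimum) else f"older than {minimum}"
        report[package] = f"{installed} ({status})"
    return report

for package, status in check_requirements().items():
    print(f"{package}: {status}")
```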

Importing required libraries

 import pandas as pd 
 import numpy as np 
 from sklearn.model_selection import train_test_split
 from flaml import AutoML 

We are going to build a classification model on the iris dataset, which ships with the sklearn.datasets module. More details about the dataset are available in the scikit-learn documentation.

Loading the iris data set 

 from sklearn.datasets import load_iris
 dataset = load_iris()
 dataset 

The output shows the feature and target columns of the dataset; the target class names are setosa, versicolor and virginica. Next, we split the dataset into training and testing sets with the following code:

 x, y = dataset.data, dataset.target
 x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=123) 
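For the 150-sample iris data, `test_size=0.2` leaves 30 rows for testing and 120 for training. The sketch below mimics a shuffled hold-out split with the standard library only. It assumes the test size is rounded up (as scikit-learn does for a float `test_size`); the shuffle uses Python's `random` module, so the rows it picks will not match `train_test_split` even with the same seed.

```python
import math
import random

def holdout_split(n_samples, test_size=0.2, seed=123):
    """Return (train_idx, test_idx) for a shuffled hold-out split."""
    n_test = math.ceil(test_size * n_samples)  # size of the test set
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # seeded, so repeatable
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = holdout_split(150)
print(len(train_idx), len(test_idx))  # 120 30
```

Fixing the seed (like `random_state=123` above) makes the split reproducible across runs.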

In this section, we use the AutoML engine to find the best classification model. It is important to pass "classification" as the value of the task parameter.

 automl_clf = AutoML()
 automl_clf.fit(x_train, y_train, task="classification") 

Output:

As the image shows, AutoML iterates over different models and evaluates them with stratified k-fold cross-validation.
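Stratified k-fold keeps each class's proportion roughly the same in every fold, which matters for classification. Below is a from-scratch sketch of the idea (deal each class's samples round-robin into k folds, without shuffling); it is not FLAML's or scikit-learn's implementation, and the toy labels are an assumption mimicking three balanced classes.

```python
from collections import Counter, defaultdict

def stratified_folds(labels, k=5):
    """Assign each sample index to one of k folds, class by class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal this class's samples across the folds like cards
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

# Toy labels: 3 classes x 40 samples each
labels = [0] * 40 + [1] * 40 + [2] * 40
folds = stratified_folds(labels, k=5)
for i, fold in enumerate(folds):
    print(i, Counter(labels[j] for j in fold))
```

In each of the k rounds, one fold is held out for validation and the remaining k-1 folds are used for training.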

The image also shows the result of the automl_clf model, telling us which model and parameters fit the dataset best. The model is already fitted and ready to predict after just 60.0 seconds, which matches FLAML's default time budget (time_budget=60 seconds) and is much less time than conventional hyperparameter tuning usually takes. Finally, let's check the accuracy score of the automl_clf model.
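FLAML's search runs under a time budget (the `time_budget` argument of `AutoML.fit`, in seconds): it keeps trying configurations until the budget runs out and then keeps the best one found. The sketch below shows only that budgeted loop, using plain random search over a hypothetical search space and a made-up objective. FLAML's actual search strategies are more sophisticated and cost-aware, so this is an illustration of the time-budget idea, not FLAML's algorithm.

```python
import random
import time

def budgeted_random_search(objective, space, time_budget=0.5, max_trials=200, seed=0):
    """Sample configs from `space` until the time budget or trial cap is hit.

    `space` maps each parameter name to a (low, high) range; integer bounds
    are sampled as integers, float bounds as floats.
    """
    rng = random.Random(seed)
    deadline = time.monotonic() + time_budget
    best_config, best_loss, trials = None, float("inf"), 0
    while time.monotonic() < deadline and trials < max_trials:
        config = {
            name: rng.randint(lo, hi) if isinstance(lo, int) else rng.uniform(lo, hi)
            for name, (lo, hi) in space.items()
        }
        loss = objective(config)
        trials += 1
        if loss < best_loss:  # keep the best configuration seen so far
            best_config, best_loss = config, loss
    return best_config, best_loss, trials

# Hypothetical objective: pretend validation loss is lowest
# near n_estimators=100 and max_features=0.7
def toy_loss(config):
    return abs(config["n_estimators"] - 100) / 100 + abs(config["max_features"] - 0.7)

space = {"n_estimators": (4, 500), "max_features": (0.1, 1.0)}
config, loss, trials = budgeted_random_search(toy_loss, space)
print(f"best {config} with loss {loss:.3f} after {trials} trials")
```

Whatever the search strategy, a larger budget allows more trials and usually a better result, while a smaller one returns sooner.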

Next, we predict the labels of x_test:

y_pred = automl_clf.predict(x_test)

Now we call accuracy_score to compare the predictions y_pred with the true labels y_test.

 from sklearn.metrics import accuracy_score
 accuracy_score(y_test,y_pred) 

Output:

This is a satisfactory result. Now let's try the model that AutoML suggested on its own: an ExtraTreesClassifier with specific parameters, which we import from the sklearn.ensemble module.

Importing and defining the model:

 from sklearn.ensemble import ExtraTreesClassifier
 # Parameters suggested by AutoML for this dataset
 sug_clf = ExtraTreesClassifier(max_features=0.987503868840176, n_estimators=4, n_jobs=-1)
 # Fit the ExtraTreesClassifier on the training data and predict on the test set
 sug_clf.fit(x_train, y_train)
 y_pred = sug_clf.predict(x_test)

Let's check the accuracy score with the following command:

accuracy_score(y_test,y_pred)

Output:

0.9333333333333333
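Accuracy is simply the fraction of test samples predicted correctly. With `test_size=0.2`, the iris test set holds 30 samples, so the 0.9333 above corresponds to 28 correct predictions out of 30. A quick check in plain Python, where the label lists are made up only to reproduce that count:

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the true label."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)

# Hypothetical labels: 30 test samples, 2 of them misclassified
y_true = [0] * 10 + [1] * 10 + [2] * 10
y_pred = y_true.copy()
y_pred[12] = 2  # a versicolor predicted as virginica
y_pred[25] = 1  # a virginica predicted as versicolor

print(accuracy(y_true, y_pred))  # 0.9333333333333333
```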

We got the same accuracy with this model too. It is impressive that, without spending much time or effort, we obtained a decent model in only a few lines of code: AutoML searched for a good model, tuned its parameters, and returned the result.

Next, we try the same thing with a regression model. Again we use a dataset from the sklearn.datasets module, this time the Boston Housing dataset. To learn more about it, print the DESCR attribute of the loaded dataset.

Loading the data set

 # Note: load_boston was removed in scikit-learn 1.2, so this needs scikit-learn < 1.2
 from sklearn.datasets import load_boston
 dataset = load_boston()
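`load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so the cell above only runs on older versions. A small standard-library guard like the one below (helper names hypothetical) can tell you up front whether the installed scikit-learn still ships it; on newer versions you would need to substitute another regression dataset, and the numbers in this section would change.

```python
import re
from importlib import metadata

def predates_1_2(version):
    """True if a version string like '0.24.2' or '1.1.3' is older than 1.2."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return (int(match.group(1)), int(match.group(2))) < (1, 2)

def sklearn_has_load_boston():
    """Check the installed scikit-learn; False if it is not installed at all."""
    try:
        return predates_1_2(metadata.version("scikit-learn"))
    except metadata.PackageNotFoundError:
        return False

print("load_boston available:", sklearn_has_load_boston())
```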

Dividing the dataset into predictor variables (x) and the target variable (y), then into training and testing sets:

 x, y = dataset.data, dataset.target
 x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=123) 

Now we fit AutoML on x_train and y_train. This time we want a regression model, so we pass "regression" as the value of the task parameter:

 automl_reg = AutoML()
 automl_reg.fit(x_train, y_train, task="regression") 

As the output shows, AutoML starts searching for the best-fit model and evaluates candidates with repeated k-fold (RepeatedKFold) cross-validation.
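Repeated k-fold runs k-fold cross-validation several times, reshuffling the data before each repetition, so the score is averaged over more train/validation splits. Below is a standard-library sketch of the index generation; it is not FLAML's or scikit-learn's code, and the defaults are arbitrary. The 404 rows correspond to the Boston training split (506 samples with `test_size=0.2`).

```python
import random

def repeated_kfold(n_samples, n_splits=5, n_repeats=2, seed=0):
    """Yield (train_idx, val_idx) pairs; each repeat reshuffles the data."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for fold in range(n_splits):
            val_idx = indices[fold::n_splits]  # every n_splits-th shuffled index
            val_set = set(val_idx)
            train_idx = [i for i in indices if i not in val_set]
            yield train_idx, val_idx

splits = list(repeated_kfold(404, n_splits=5, n_repeats=2))
print(len(splits), "train/validation splits")
```

Within each repeat, the validation folds partition the data, so every sample is validated exactly once per repeat.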

In the final results, AutoML picked ExtraTreesRegressor as the best-fit model, along with a set of parameters. We will refit that model with the same parameters shortly; first, let's compute the max error, mean absolute error, mean squared error, mean squared log error and R² score between the test targets and AutoML's predictions.

 y_pred=automl_reg.predict(x_test)
 from sklearn.metrics import max_error, mean_absolute_error,mean_squared_log_error, mean_squared_error, r2_score
 print('max error value :',max_error(y_test,y_pred))
 print('mean absolute error value :',mean_absolute_error(y_test,y_pred))
 print('mean squared error :', mean_squared_error(y_test,y_pred))
 print("mean squared log error :", mean_squared_log_error(y_test,y_pred))
 print("r2 score :" ,r2_score(y_test,y_pred)) 
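For reference, here are plain-Python versions of the five metrics printed above, written from their standard definitions: max error is the largest absolute residual, MAE and MSE average the absolute and squared residuals, MSLE averages squared differences of log(1 + value), and R² is 1 minus the residual sum of squares over the total sum of squares. The prices are made up for illustration.

```python
import math

def regression_report(y_true, y_pred):
    """Compute the five regression metrics used in this section."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "max error": max(abs(e) for e in errors),
        "mean absolute error": sum(abs(e) for e in errors) / n,
        "mean squared error": ss_res / n,
        # log1p(x) = log(1 + x); meant for non-negative targets such as prices
        "mean squared log error": sum(
            (math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)
        ) / n,
        "r2 score": 1 - ss_res / ss_tot,
    }

# Made-up house prices (in $1000s) and predictions, for illustration only
y_true = [24.0, 21.5, 34.5, 33.5]
y_pred = [25.0, 20.5, 32.5, 33.5]
for name, value in regression_report(y_true, y_pred).items():
    print(f"{name}: {value:.4f}")
```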

Now let's see what result the ExtraTreesRegressor suggested by AutoML gives on its own. We import the model from sklearn.ensemble, pass it the parameters AutoML found, and compute the same error metrics.

 from sklearn.ensemble import ExtraTreesRegressor
 # Parameters suggested by AutoML for this dataset
 sug_reg = ExtraTreesRegressor(max_features=0.7408696786456366, n_estimators=10, n_jobs=-1)
 # Fit the ExtraTreesRegressor on the training data
 sug_reg.fit(x_train, y_train)

Making the prediction:

y_pred=sug_reg.predict(x_test)

Measuring the error values and the R² score between y_test and y_pred:

 print('max error value :',max_error(y_test,y_pred))
 print('mean absolute error value :',mean_absolute_error(y_test,y_pred))
 print('mean squared error :', mean_squared_error(y_test,y_pred))
 print("mean squared log error :", mean_squared_log_error(y_test,y_pred))
 print("r2 score :" ,r2_score(y_test,y_pred)) 

Here we see a slight difference between the scores of the AutoML model and our refitted ExtraTreesRegressor, although both are satisfactory. Since no random_state was set, the refitted trees are not identical to the ones AutoML trained, so small differences are expected.

AutoML can also be used for hyperparameter tuning in a few simple steps. So far, we have seen it try different algorithms, vary their parameters, and return the best fit among all the models. But what if we want the best-fit parameters of a single algorithm? For example, what if we want the best-fit parameters of a random forest regressor for the Boston Housing dataset?

Let's go straight to hyperparameter tuning with FLAML's AutoML class.

We can restrict automl_reg to a single learner by passing estimator_list=['rf'], which turns it into a hyperparameter tuning tool for random forest regression:

automl_reg.fit(x_train, y_train, task="regression", estimator_list=['rf'])
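The effect of `estimator_list` on the final choice can be pictured with a toy selection step. This is a hypothetical sketch, not FLAML's internals: the loss values are invented, and only the learner short names (`lgbm`, `rf`, `xgboost`, `extra_tree`) follow FLAML's naming.

```python
def select_learner(results, estimator_list=None):
    """Pick the learner with the lowest validation loss.

    `results` maps a learner name to its best validation loss; passing
    `estimator_list` restricts the choice to the named learners.
    """
    allowed = results if estimator_list is None else {
        name: loss for name, loss in results.items() if name in estimator_list
    }
    if not allowed:
        raise ValueError("no learner left after applying estimator_list")
    return min(allowed, key=allowed.get)

# Invented validation losses, for illustration only
losses = {"lgbm": 0.12, "rf": 0.15, "extra_tree": 0.10, "xgboost": 0.13}
print(select_learner(losses))          # extra_tree
print(select_learner(losses, ["rf"]))  # rf
```

In FLAML, the restriction applies during the search itself, so the whole time budget goes to tuning the one allowed learner; this toy only shows the selection step.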

Output:

This is how AutoML, used as a hyperparameter tuning tool, finds the best-fit parameters for a random forest regressor. Next, let's look at the error values and R² score of this model's predictions.

The code is as follows:

 y_pred=automl_reg.predict(x_test)
 print('max error value :',max_error(y_test,y_pred))
 print('mean absolute error value :',mean_absolute_error(y_test,y_pred))
 print('mean squared error :', mean_squared_error(y_test,y_pred))
 print("mean squared log error :", mean_squared_log_error(y_test,y_pred))
 print("r2 score :" ,r2_score(y_test,y_pred)) 

Output:

In this case, the error values and R² score improve compared to the previous two models. With these basics, we have learned how to use Microsoft FLAML's AutoML class, and so far I am personally impressed by this package.

Some of the advantages of using this package are:

  • We get satisfactory results in very little time.
  • It offers a variety of options to choose from.
  • It is lightweight, as its name (Fast and Lightweight AutoML) suggests.
  • It finds accurate machine learning models automatically, efficiently and economically.

Let’s sum it all up into the steps we have followed in this article :

  • First, we installed the FLAML package in Google Colab.
  • We searched for a best-fit classification model for the iris dataset and found one that gave good results in very little time: it took only 60.0 seconds to find the best model.
  • We then searched for a best-fit regression model for the Boston Housing dataset and again got one quickly; the suggested model also gave satisfactory results.
  • Finally, we used FLAML's AutoML as a hyperparameter tuning tool.

That is how to get started with Microsoft FLAML. We found it interesting and useful for small classification and predictive analytics use cases. It is still a developing Microsoft project and the results are already impressive, so I hope it becomes even more capable in the future.


Yugesh Verma
Yugesh is a graduate in automobile engineering and worked as a data analyst intern. He completed several Data Science projects. He has a strong interest in Deep Learning and writing blogs on data science and machine learning.
