Guide To Hummingbird – Microsoft's Library For Expediting Traditional Machine Learning Models


Conventional machine learning algorithms such as linear regression, logistic regression and decision trees are widely used in real-world applications. Several ML libraries and toolkits, such as scikit-learn, h2o and ML.NET, make these algorithms easy to use. However, they are designed mainly to run on CPUs.

Although the performance of these traditional ML frameworks can be improved significantly with multi-core parallel processing, they cannot take advantage of hardware accelerators. In contrast, recent advances in deep learning have made it possible to run neural networks on hardware accelerators such as GPUs and TPUs. To let traditional ML models benefit from the hardware acceleration and optimizations built for neural networks, without re-engineering the models, Microsoft released a library named Hummingbird.

Widely used deep learning frameworks such as PyTorch, ONNX Runtime and TensorFlow use a single abstraction, the tensor, as the basic unit of computation. Traditional ML libraries lack such a common abstraction, so achieving hardware acceleration would require m × n separate implementations, where m is the number of operators and n is the number of hardware backends. As a result, traditional ML models miss out on the hardware-specific optimizations that make neural network inference fast. Hummingbird addresses this by converting conventional ML pipelines into tensor computations, so that the models can run on deep learning runtimes and their hardware backends and serve predictions faster.
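To make "tensor-oriented computation" concrete, here is a minimal sketch that evaluates a tiny decision tree for a whole batch of rows using only matrix multiplications and comparisons, in the spirit of the GEMM strategy described by the Hummingbird authors. It is a hand-built illustration, not Hummingbird's code: the tree, the matrices A to E and the function names are hypothetical, and NumPy stands in for a tensor library. Because every step is a dense tensor operation, the same computation could run on any backend that supports matrix multiplication.

```python
import numpy as np

# Hypothetical toy tree, hand-built for illustration:
#   node 0: feature 0 < 0.5 ? go to node 1 : leaf 2
#   node 1: feature 1 < 2.0 ? leaf 0 : leaf 1
# Leaves 0, 1, 2 predict classes 0, 1, 1.

# A[f, i] = 1 if internal node i tests feature f
A = np.array([[1, 0],
              [0, 1]], dtype=np.float32)
# B[i] = threshold used at internal node i
B = np.array([0.5, 2.0], dtype=np.float32)
# C[i, l] = +1 if leaf l is in the left subtree of node i,
#           -1 if it is in the right subtree, 0 otherwise
C = np.array([[1, 1, -1],
              [1, -1, 0]], dtype=np.float32)
# D[l] = number of left turns on the path from the root to leaf l
D = np.array([2, 1, 0], dtype=np.float32)
# E[l, k] = 1 if leaf l predicts class k
E = np.array([[1, 0],
              [0, 1],
              [0, 1]], dtype=np.float32)


def predict_gemm(X):
    """Evaluate the toy tree for a whole batch using only tensor operations."""
    T = ((X @ A) < B).astype(np.float32)           # 1 where a node's test holds (go left)
    leaf_hit = ((T @ C) == D).astype(np.float32)   # one-hot: the leaf each row reaches
    scores = leaf_hit @ E                           # map each reached leaf to its class
    return scores.argmax(axis=1)


def predict_loop(x):
    """Reference: ordinary branch-by-branch traversal of the same tree."""
    if x[0] < 0.5:
        return 0 if x[1] < 2.0 else 1
    return 1


X = np.array([[0.2, 1.0],
              [0.2, 3.0],
              [0.9, 1.0]], dtype=np.float32)
print(predict_gemm(X))               # [0 1 1]
print([predict_loop(x) for x in X])  # [0, 1, 1]
```

For the third row, for example, the first test fails (0.9 is not below 0.5), so only leaf 2, whose path has no left turns, satisfies `T @ C == D`. The design evaluates every node for every row, trading some redundant arithmetic for fully parallel dense operations that accelerators handle well.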

Pros of Hummingbird 

  • It lets conventional ML models run on the same runtimes used for neural networks, providing a single platform for both.
  • It can speed up inference for traditional ML models by running them on hardware accelerators.
  • It needs only a few lines of code and has a simple syntax.
  • It does not require traditional ML models to be re-engineered to run on hardware accelerators.

Cons of Hummingbird 

At the time of writing, Hummingbird can convert ML models to PyTorch, ONNX, TVM and TorchScript, but not to Keras, a widely used open-source deep learning library built on top of TensorFlow.

Practical implementation

The following code was run in a Google Colab notebook with a GPU runtime and Python 3.6. The winequality-red dataset used in the code is available on Kaggle. The classification task is to label each wine as good or bad depending on whether its quality score is above 6.5.

Installation

Hummingbird can be installed with pip:

pip install hummingbird-ml

Import required libraries

 import numpy as np
 import pandas as pd
 import matplotlib.pyplot as plt
 from sklearn.tree import DecisionTreeClassifier
 from sklearn.ensemble import RandomForestClassifier
 import seaborn as sns
 from hummingbird.ml import convert
 from sklearn.model_selection import train_test_split 

Load the dataset

data=pd.read_csv('winequality-red.csv')

Separate the data into features (x) and labels (y)

 x=data.iloc[:,:-1]
 y=data.iloc[:,-1] 
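As listed, y holds the raw quality score from the last column, so the classifier trains on the score itself rather than on the good/bad label described earlier. A minimal sketch of that labelling step follows, assuming the quality score is the last column as in the code above; `binarize_quality` is a hypothetical helper name. Applying it would turn the task into binary classification, so timings could differ from those reported for the code as listed.

```python
import numpy as np


def binarize_quality(quality, threshold=6.5):
    """Map wine quality scores to 1 ("good", above threshold) or 0 ("bad")."""
    # np.asarray accepts a list, a NumPy array or a pandas Series
    return (np.asarray(quality) > threshold).astype(int)


# Toy scores for illustration; in the notebook one could write
# y = binarize_quality(data.iloc[:, -1]) instead of using the raw score.
print(binarize_quality([5, 6, 7, 8]))  # [0 0 1 1]
```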

Split the data into training and test sets in a 3:1 ratio, i.e. 75% training data and 25% test data

x_train,x_test,y_train,y_test=train_test_split(x,y,train_size=0.75,random_state=42)

Instantiate a random forest classifier with 300 trees

model=RandomForestClassifier(n_estimators=300)

Fit the model to the training data

model.fit(x_train,y_train)

Measure the time taken to predict the test data without Hummingbird (the %%time magic must be the first line of the notebook cell)

%%time
# Prediction of labels for test data
y_pred=model.predict(np.array(x_test))

The output of the above cell is as follows:

CPU times: user 60.6 ms, sys: 23 µs, total: 60.7 ms
Wall time: 64.2 ms
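A single %%time run like the one above can vary noticeably between executions, and the first call on a GPU may include one-off start-up costs. A minimal sketch of a more robust measurement, using only the Python standard library, repeats the call after a few warm-up runs and reports the median. `median_predict_time_ms` is a hypothetical helper, not part of Hummingbird or scikit-learn.

```python
import statistics
import time


def median_predict_time_ms(predict_fn, X, repeats=10, warmup=2):
    """Return the median wall-clock time of predict_fn(X) in milliseconds.

    Warm-up calls run first and are discarded, so one-off costs
    (such as lazy initialisation) do not skew the result.
    """
    for _ in range(warmup):
        predict_fn(X)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict_fn(X)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


# Stand-in workload so the sketch runs anywhere
t = median_predict_time_ms(lambda rows: [sum(r) for r in rows], [[1.0, 2.0]] * 1000)
print(f"median: {t:.3f} ms")
```

In the notebook, `median_predict_time_ms(model.predict, np.array(x_test))` and `median_predict_time_ms(model_torch.predict, np.array(x_test))` would give directly comparable figures for the two models.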

Convert the model into a PyTorch model using the Hummingbird library

model_torch=convert(model,'pytorch')

Move the converted model to the GPU using NVIDIA CUDA

model_torch.to('cuda')

Measure the time taken to predict the test data with the converted model

%%time
# Prediction of labels for test data
y_pred_torch=model_torch.predict(np.array(x_test))

With the Hummingbird-converted model on the GPU, the output is as follows:

CPU times: user 9.41 ms, sys: 944 µs, total: 10.4 ms
Wall time: 14.8 ms
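Because Hummingbird re-implements the trees with tensor operations, it is worth checking that the converted model predicts the same labels as the original before comparing speed; results may differ slightly in rare cases due to floating-point precision. A minimal sketch with NumPy; `prediction_agreement` is a hypothetical helper name.

```python
import numpy as np


def prediction_agreement(y_ref, y_new):
    """Fraction of rows on which two sets of predicted labels agree."""
    y_ref = np.asarray(y_ref)
    y_new = np.asarray(y_new)
    if y_ref.shape != y_new.shape:
        raise ValueError(f"shape mismatch: {y_ref.shape} vs {y_new.shape}")
    return float(np.mean(y_ref == y_new))


# Toy labels for illustration; in the notebook this would be
# prediction_agreement(y_pred, y_pred_torch).
print(prediction_agreement([5, 6, 6, 7], [5, 6, 5, 7]))  # 0.75
```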

Observation and conclusion

In this run, the total CPU time for prediction dropped from 60.7 ms without Hummingbird to 10.4 ms with it, and the wall time dropped from 64.2 ms to 14.8 ms. Converting the computations into tensors with Hummingbird noticeably reduced the execution time. This suggests that Hummingbird can bring hardware acceleration to conventional ML models, letting them deliver predictions faster on the same hardware used by neural networks.
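The relative speed-up implied by these numbers can be computed directly. The sketch below uses only the figures reported above, each from a single run, so the ratios are indicative rather than definitive.

```python
# Prediction timings reported above, in milliseconds (single run each)
before_ms = {"CPU total": 60.7, "wall": 64.2}  # scikit-learn model on CPU
after_ms = {"CPU total": 10.4, "wall": 14.8}   # Hummingbird PyTorch model on CUDA


def speedup(before, after):
    """How many times faster the 'after' timing is than the 'before' timing."""
    return before / after


for metric in before_ms:
    ratio = speedup(before_ms[metric], after_ms[metric])
    print(f"{metric}: {before_ms[metric]} ms -> {after_ms[metric]} ms ({ratio:.1f}x)")
# CPU total: 60.7 ms -> 10.4 ms (5.8x)
# wall: 64.2 ms -> 14.8 ms (4.3x)
```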

End note

Microsoft is working on extending the Hummingbird library to support more ML models, neural network backends and operators, so the scope of this useful library is expected to widen in the near future.

Refer to the following sources for detailed information on the library, and to the Jupyter notebook for the code explained above.


Copyright Analytics India Magazine Pvt Ltd
