Conventional machine learning algorithms such as linear regression, logistic regression and decision trees are used extensively in real-world applications. Several ML libraries and toolkits, such as scikit-learn, H2O and ML.NET, make these algorithms easy to implement. However, they run only in CPU environments.
Although the performance of these traditional ML frameworks can be significantly improved with multi-core parallel processing, they cannot take advantage of hardware acceleration. In contrast, recent advances in deep learning have made it possible to use hardware accelerators such as GPUs and TPUs for implementing neural networks. To let traditional ML libraries benefit from the hardware acceleration and optimizations built for neural networks, without restructuring the model, Microsoft launched a library named Hummingbird.
Widely used deep learning frameworks such as PyTorch, ONNX Runtime and TensorFlow use a single abstraction, the tensor, as the basic unit of any computation. Lacking such a common abstraction, traditional ML libraries require on the order of m × n implementations to achieve hardware acceleration, where m is the number of operators in the computation and n is the number of hardware backends available. This makes the libraries computationally more expensive to use than the optimized neural networks. Hummingbird resolves this issue by converting conventional ML pipelines into tensor-oriented computations. The traditional ML models can then run faster in real-time deployments.
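To make "tensor-oriented computation" concrete, here is a minimal NumPy sketch of how one small decision tree can be evaluated with matrix products and comparisons instead of if/else branching. It is written in the spirit of the matrix-multiplication strategy described by the Hummingbird authors, but it is not Hummingbird's actual implementation; the toy tree, its thresholds and the matrix names A to E are made up for illustration.

```python
import numpy as np

# Toy tree, invented for this sketch:
#   node 0: x[0] < 0.5 ? go to node 1 : leaf L2 (class 0)
#   node 1: x[1] < 2.0 ? leaf L0 (class 0) : leaf L1 (class 1)

# A[f, n] = 1 if internal node n tests feature f
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
# B[n] = threshold of internal node n
B = np.array([0.5, 2.0])
# C[n, l] = 1 if leaf l is in the left subtree of node n,
#          -1 if it is in the right subtree, 0 if node n is not an ancestor
C = np.array([[1.0, 1.0, -1.0],
              [1.0, -1.0, 0.0]])
# D[l] = number of ancestors of leaf l that are reached through a left branch
D = np.array([2.0, 1.0, 0.0])
# E[l, c] = 1 if leaf l predicts class c
E = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0]])


def predict_tensor(X):
    """Evaluate the tree for all rows of X using only tensor operations."""
    decisions = (X @ A < B).astype(float)       # outcome of every node test
    leaf_hit = (decisions @ C == D).astype(float)  # one-hot: which leaf is reached
    return (leaf_hit @ E).argmax(axis=1)        # class of the reached leaf


def predict_branching(X):
    """Reference implementation with ordinary branching, for comparison."""
    out = []
    for x in X:
        if x[0] < 0.5:
            out.append(0 if x[1] < 2.0 else 1)
        else:
            out.append(0)
    return np.array(out)


X = np.array([[0.2, 1.0], [0.2, 3.0], [0.9, 1.0], [0.9, 3.0]])
tensor_preds = predict_tensor(X)
branch_preds = predict_branching(X)
print(tensor_preds, branch_preds)
```

Because the tensor version consists only of matrix products and element-wise comparisons, any backend that accelerates those operations can run it for a whole batch of rows at once.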
Pros of Hummingbird
- It provides a single platform for implementing both conventional ML models and neural networks.
- It speeds up traditional ML models by running them on hardware accelerators.
- It needs only a few lines of code and has a clear syntax.
- It does not require traditional ML models to be re-engineered in order to run on hardware accelerators.
Cons of Hummingbird
- It supports only Python 3.5 or higher.
- To date it works only on tree-based traditional classification and regression models, namely scikit-learn’s Decision Tree and Random Forest as well as LightGBM and XGBoost regressors and classifiers.
The following code was run with a GPU and Python 3.6 in a Google Colab notebook. The winequality_red dataset used in the code is available on Kaggle. The classification task is to label each wine as good or bad depending on whether its quality score is above 6.5 or not.
Hummingbird can be installed with pip:
pip install hummingbird-ml
Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
import seaborn as sns
from hummingbird.ml import convert
from sklearn.model_selection import train_test_split
Load the dataset
Separate data into features(x) and labels(y)
Perform train-test split of the data by keeping train:test ratio as 3:1 i.e. 75% training data and 25% test data
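The three data-preparation steps above could look roughly like the sketch below. The file name `winequality-red.csv`, the label column name `quality`, the helper names `load_wine` and `make_split`, and the random seed are assumptions made for illustration, not values taken from the original notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def load_wine(path="winequality-red.csv"):
    """Load the dataset. The file name is an assumption; adjust it
    (and the separator, if needed) to match the downloaded file."""
    return pd.read_csv(path)


def make_split(df, threshold=6.5, seed=0):
    """Separate features and labels, then split 75% / 25%.

    Assumes the score is stored in a column named 'quality'.
    A wine is labelled 1 (good) if its quality is above the threshold,
    otherwise 0 (bad).
    """
    x = df.drop(columns="quality")
    y = (df["quality"] > threshold).astype(int)
    return train_test_split(x, y, test_size=0.25, random_state=seed)
```

With the CSV file in the working directory, the split would be obtained with `x_train, x_test, y_train, y_test = make_split(load_wine())`.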
Instantiate Random Forest classifier
Fit the model to the training data
Calculation of time for predicting test data without using Hummingbird
Prediction of labels for test data
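A minimal sketch of the steps above (instantiate, fit, and time the prediction) is given below. So that it runs on its own, it uses synthetic stand-in data from `make_classification` instead of the wine dataset, and the hyperparameters `n_estimators=10` and `max_depth=10` are assumptions rather than the notebook's settings. The timings quoted in this article have the format printed by IPython's `%%time` magic; `time.perf_counter` is used here so the snippet also works as a plain script.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the wine dataset), only to keep the sketch runnable.
X, y = make_classification(n_samples=1600, n_features=11, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Instantiate the Random Forest classifier (hyperparameters are assumptions).
model = RandomForestClassifier(n_estimators=10, max_depth=10, random_state=0)

# Fit the model to the training data.
model.fit(x_train, y_train)

# Time the prediction on the test data with plain scikit-learn.
start = time.perf_counter()
skl_pred = model.predict(x_test)
skl_ms = (time.perf_counter() - start) * 1000
print(f"scikit-learn predict: {skl_ms:.1f} ms")
```

The measured time depends on the machine and the data, so it will not match the figures reported in this article.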
The output noted after running the above lines of code is as follows:
CPU times: user 60.6 ms, sys: 23 µs, total: 60.7 ms
Wall time: 64.2 ms
Convert the model into PyTorch model using Hummingbird library
Move the converted model to Nvidia’s CUDA GPU
Prediction of labels for test data
After using the Hummingbird library, the output noted is as follows:
CPU times: user 9.41 ms, sys: 944 µs, total: 10.4 ms
Wall time: 14.8 ms
Observation and conclusion
The total CPU time fell from 60.7 ms before using the Hummingbird library to 10.4 ms after using it, a reduction of about 5.8 times, and the wall time fell from 64.2 ms to 14.8 ms, about 4.3 times. The conversion of computations into tensors done by Hummingbird therefore noticeably reduced the prediction time in this experiment. Thus the Hummingbird library can hardware-accelerate conventional ML models and let them deliver results at a pace closer to that of neural network systems.
Microsoft is working to extend the Hummingbird library so that it supports more ML models, neural network backends and operators. The scope of this useful library is thus expected to widen in the near future.
Refer to the following sources for detailed information on the library and for the Jupyter notebook containing the code explained above.