MLOps has quickly become one of the most important components of data science, with the market expected to grow by almost $4 billion by 2025. It is already being leveraged heavily by companies such as Amazon, Google, Microsoft, IBM, H2O, Domino, DataRobot and Grid.ai for pipeline automation, monitoring, lifecycle management and governance. More and more MLOps tools are being developed to address different parts of the workflow, with two dominating the space: Kubeflow and MLflow.
Given their open-source nature, Kubeflow and MLflow are both popular choices at leading tech companies. However, their capabilities and offerings differ considerably: Kubeflow is pipeline-focused, while MLflow centres on experiment tracking. We have explored these differences to help you choose the right tool for your use case.
1. Kubeflow

Kubeflow aims to make ML deployment on Kubernetes simple, portable and scalable. This cloud-native framework was built by Google, based on TensorFlow Extended (TFX), its internal system for deploying TensorFlow models. After its initial release, tech companies including Arrikto, Cisco, IBM, Red Hat and CaiCloud contributed to the project on GitHub.
Kubeflow provides components for each stage in the ML lifecycle, including exploration, training and deployment, and helps scale machine learning models and move them to production. Three core components of Kubeflow are:
- Notebooks: Kubeflow lets users create and manage interactive Jupyter notebooks, and customise their notebook containers and pods.
- Pipelines: Kubeflow is best known for its pipelines, which allow users to build and deploy scalable, portable ML workflows.
- Training: Developers can use Kubeflow training to train their ML models on frameworks such as PyTorch, MXNet, Chainer and TensorFlow, among others.
Additionally, since Kubeflow supports TensorFlow Serving containers, trained TensorFlow models can be served on Kubernetes. It also integrates with Seldon Core, an open-source framework for deploying machine learning models at scale on Kubernetes; the NVIDIA Triton Inference Server, which maximises GPU utilisation during model serving; and BentoML, which builds production API endpoints for ML models. Kubeflow runs on any Kubernetes cluster, including managed offerings on AWS, GCP and Azure.
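For illustration, a minimal Seldon Core deployment manifest looks roughly like the fragment below; the deployment name and model URI are assumptions drawn from Seldon's own examples, not from the article:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris-demo                          # hypothetical deployment name
spec:
  predictors:
    - name: default
      replicas: 1
      graph:
        name: classifier
        implementation: SKLEARN_SERVER     # one of Seldon's pre-packaged servers
        modelUri: gs://seldon-models/sklearn/iris   # example model location
```

Applied with `kubectl apply -f`, a manifest like this exposes the model behind REST/gRPC endpoints on the cluster.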
2. MLflow

MLflow is an open-source platform for managing the entire machine learning lifecycle with enterprise reliability, security and scale. Created by Databricks, the platform is used by big tech companies including Facebook, Accenture, Microsoft and Booking.com. MLflow is library-agnostic, making it compatible with any ML library, including TensorFlow, PyTorch, Keras, Pandas and more.
MLflow supports experimentation, reproducibility, deployment and a central model registry. Developers can create, track and deploy models while the platform handles the back-end processes of model management, data versioning and experiment tracking.
The four components of MLflow are:
- Tracking: MLflow Tracking is an API and UI for logging parameters, code versions, metrics and output files, so users can record and compare experiments.
- Projects: An MLflow Project is a standard format for packaging reusable data science code, data, configuration and dependencies.
- Models: An MLflow Model is a standard format for packaging models so they can be used in a variety of downstream tools.
- Registry: A centralised model store, the MLflow Registry provides APIs and a UI to manage the lifecycle of a model.
3. Key differences

Kubeflow is a container orchestration system, so all processing happens within the Kubernetes infrastructure. Because it manages the orchestration itself, Kubeflow is considered more complex; at the same time, this makes its workflows more reproducible.
MLflow, in contrast, is a Python program, so training can be done however the developer prefers. It can be set up on a single server and adopted easily within an existing ML project.
Kubeflow tracks metadata within the platform itself, which requires more technical knowledge from the developer. MLflow, however, lets developers work locally and log runs to a remote tracking server.
With Kubeflow, models can be deployed through Kubeflow Pipelines independently of the platform's other components; the pipelines emphasise model deployment and continuous integration.
MLflow leverages its model registry and the accompanying APIs/UI to give organisations a central place to collaborate, manage the model lifecycle and deploy models.
4. Use case examples
Kubeflow use cases include:
– Deploying and managing a complex ML system at scale
– Experimentation with training an ML model
– End-to-end hybrid and multi-cloud ML workloads
– Tuning the model hyperparameters during training
– Continuous integration and deployment (CI/CD) for ML
MLflow use cases include:
– Tracking experiments locally on a data scientist’s machine
– Setting up an MLflow Tracking server to record and compare the results of multiple people working on the same project
– Deploying models from different ML libraries, storing them as files in a preferred management system, and tracking which run a model came from