The machine learning lifecycle can be framed in various ways, but it typically includes ideation, data acquisition and exploratory data analysis, followed by R&D and validation, and finally deployment and monitoring. The process is iterative. Monitoring may periodically send you back to the first step to try different models and features or to update your training dataset, and any step in the lifecycle can send you back to an earlier one.
Platforms like MLflow have emerged as a go-to option for many data scientists because they make the machine learning lifecycle easier to manage. MLflow is currently one of the most popular open-source platforms for managing the ML lifecycle, covering experimentation, reproducibility, deployment, and a central model registry.
MLflow is used by companies such as Facebook, Databricks, Microsoft, Accenture, and Booking.com, among others. The platform is library-agnostic. It offers a set of lightweight APIs that can be used with any existing machine learning application or library, such as TensorFlow, PyTorch, or XGBoost, and it can run in notebooks, in standalone applications, or in the cloud.
MLflow currently provides four components:
- MLflow Tracking: Tracks experiments to record and compare parameters and results.
- MLflow Projects: Packages machine learning code in a reusable, reproducible form to share with other data scientists or transfer to production.
- MLflow Models: Manages and deploys models from various machine learning libraries to a variety of model serving and inference platforms.
- MLflow Model Registry: Provides a central model store to collaboratively manage the full lifecycle of an MLflow model, including stage transitions, model versioning, and annotations.
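To make the tracking idea concrete, here is a minimal, dependency-free sketch of what an experiment tracker records: the parameters and metrics of each run, plus a way to compare runs. It is not MLflow's API, although MLflow exposes similarly named functions such as `mlflow.log_param` and `mlflow.log_metric`. The `Tracker` and `Run` classes are hypothetical, and the accuracy values are made-up toy numbers.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One training run: its hyperparameters and the metrics it produced."""
    run_id: int
    params: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)


class Tracker:
    """Hypothetical in-memory tracker; real tools persist runs to a backend store."""

    def __init__(self):
        self.runs = []

    def start_run(self):
        run = Run(run_id=len(self.runs))
        self.runs.append(run)
        return run

    def log_param(self, run, key, value):
        run.params[key] = value

    def log_metric(self, run, key, value):
        # Keep every logged value so the metric history can be inspected later.
        run.metrics.setdefault(key, []).append(value)

    def best_run(self, metric, higher_is_better=True):
        # Compare runs on the last recorded value of the chosen metric.
        scored = [r for r in self.runs if metric in r.metrics]
        key = lambda r: r.metrics[metric][-1]
        return max(scored, key=key) if higher_is_better else min(scored, key=key)


tracker = Tracker()
# Toy (learning rate, accuracy) pairs, for illustration only.
for lr, acc in [(0.1, 0.81), (0.01, 0.87), (0.001, 0.84)]:
    run = tracker.start_run()
    tracker.log_param(run, "learning_rate", lr)
    tracker.log_metric(run, "accuracy", acc)

best = tracker.best_run("accuracy")
print(best.params)  # {'learning_rate': 0.01}
```

A production tracker would also capture artifacts, code versions, and timestamps, and store everything outside the process so runs survive restarts.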
This article explores top alternatives to MLflow and discusses their features and specifications to help you and your team choose the right platform for managing your machine learning lifecycle.
- Experiment tracking: Log, display, organise, and compare machine learning experiments
- Model registry: Version, store, manage, and query trained models and model-building metadata
- Live monitoring of machine learning runs: Record and monitor model training, evaluation, or production runs as they happen
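At its core, a model registry is a versioned store keyed by model name. The sketch below illustrates versioning, metadata, and querying, assuming an in-memory dictionary in place of a real database or artifact store. The `ModelRegistry` class and the `s3://bucket/...` URIs are hypothetical placeholders and do not mirror any particular product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    """An immutable record of one registered model version."""
    name: str
    version: int
    artifact_uri: str
    metadata: dict


class ModelRegistry:
    """Hypothetical registry: each name maps to an append-only list of versions."""

    def __init__(self):
        self._models = {}

    def register(self, name, artifact_uri, **metadata):
        versions = self._models.setdefault(name, [])
        # Versions are numbered 1, 2, 3, ... in registration order.
        mv = ModelVersion(name, len(versions) + 1, artifact_uri, dict(metadata))
        versions.append(mv)
        return mv

    def latest(self, name):
        return self._models[name][-1]

    def query(self, name, **filters):
        # Return every version whose metadata matches all given key/value pairs.
        return [
            mv for mv in self._models.get(name, [])
            if all(mv.metadata.get(k) == v for k, v in filters.items())
        ]


registry = ModelRegistry()
registry.register("churn-model", "s3://bucket/churn/1", framework="xgboost")
registry.register("churn-model", "s3://bucket/churn/2", framework="pytorch")
print(registry.latest("churn-model").version)  # 2
print([mv.version for mv in registry.query("churn-model", framework="xgboost")])  # [1]
```

Keeping versions append-only means an older model can always be retrieved by number, which is what makes rollbacks and audits possible.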
Kubeflow makes deploying machine learning (ML) workflows on Kubernetes simple, portable, and scalable. It offers a straightforward way to deploy best-of-breed open-source ML systems to diverse infrastructures. In other words, it is an ML toolkit for Kubernetes.
Aim is an open-source comparison tool for AI/ML experiments. It helps users compare thousands of training runs at once through its framework-agnostic Python SDK and performant UI. It also gives users the flexibility to:
- Use multiple sessions in one training script to store multiple runs at once. If no session is initialised explicitly, a default session is created.
- Use experiments to group related runs together. If no experiment is specified, an experiment named default is created.
- Use integrations to automate tracking.
Comet offers a self-hosted and cloud-based meta machine learning platform, allowing data scientists to track, compare, explain, and optimise experiments and models. Backed by users and Fortune 100 companies like Uber, Autodesk, Boeing, Hugging Face, AssemblyAI and others, Comet provides data and insights to build better, more accurate AI/ML models while improving productivity, collaboration and visibility access teams.
Guild AI is an open-source ML/AI experiment tracking, pipeline automation, and hyperparameter tuning platform. It offers several integrated tools, namely Guild Compare, Guild View, TensorBoard and Guild Diff.
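Comparing runs largely comes down to lining up their hyperparameters (which Guild calls flags) and result scalars side by side and highlighting what changed. The following minimal sketch assumes each run is represented as a plain dictionary. `diff_runs` is a hypothetical helper, not part of Guild AI, and the run values are toy numbers.

```python
def diff_runs(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ.

    A key missing from one run is reported with None on that side.
    """
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Toy runs: flags plus one result scalar.
run_1 = {"lr": 0.01, "epochs": 10, "batch_size": 32, "loss": 0.42}
run_2 = {"lr": 0.001, "epochs": 10, "batch_size": 32, "loss": 0.35}
print(diff_runs(run_1, run_2))
# {'loss': (0.42, 0.35), 'lr': (0.01, 0.001)}
```

Showing only the differing keys keeps the comparison readable when runs share dozens of identical settings.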
- Track experiments: Automatically track code, training data, hyperparameters, weights, metrics, etc.
- Go back in time: You can get back the code and weights from any checkpoint if you need to replicate your results or commit to Git after the fact.
- Version your models: Model weights are stored in an Amazon S3 or Google Cloud Storage bucket, which makes it easier to feed them into production systems.
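Going back in time depends on saving both a snapshot of the code and the weights at each checkpoint, so that any step can be restored later. The sketch below keeps checkpoints in memory and identifies each one by a truncated SHA-256 hash of its contents. In the tool described above, checkpoints would live in a cloud bucket instead. `CheckpointStore` and its methods are hypothetical, and the code string and weights are placeholders.

```python
import hashlib
import json


class CheckpointStore:
    """Hypothetical store mapping a content hash to a (code, weights) snapshot."""

    def __init__(self):
        self._blobs = {}

    def save(self, step, code, weights):
        # Serialise deterministically so identical snapshots hash identically.
        payload = json.dumps({"step": step, "code": code, "weights": weights}, sort_keys=True)
        checkpoint_id = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
        self._blobs[checkpoint_id] = payload
        return checkpoint_id

    def restore(self, checkpoint_id):
        # Deserialise a fresh copy so later training cannot mutate the saved snapshot.
        return json.loads(self._blobs[checkpoint_id])


store = CheckpointStore()
code = "model = train(lr=0.01)"  # placeholder source snapshot
ids = [store.save(step, code, {"w": [0.1 * step, 0.2 * step]}) for step in range(3)]

snapshot = store.restore(ids[1])
print(snapshot["step"], snapshot["weights"])  # 1 {'w': [0.1, 0.2]}
```

Because the code is stored next to the weights, a restored checkpoint can be committed to Git after the fact, as the feature list describes.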
ModelDB is an open-source ML model versioning, metadata, and experiment management platform. It helps make your ML models reproducible, manage your ML experiments, build performance dashboards, and share reports. It also tracks models across their lifecycle, including development, deployment, and live monitoring.
Sacred is a tool that helps you configure, organise, log, and reproduce experiments. It is designed to handle the tedious overhead work surrounding your actual experiments, so that you can:
- Keep track of all the parameters of your experiments
- Easily run your experiments for different settings/scenarios
- Save configurations for an individual run in a database
- Reproduce results
It achieves this through several mechanisms: config scopes, config injection, a command-line interface, observers, and automatic seeding.
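Two of these mechanisms, config injection and automatic seeding, can be illustrated without Sacred itself. The sketch below assumes a plain dictionary as the config and uses `inspect.signature` to match config keys to a function's parameter names. It then picks and records a seed when none is given, so the run can be replayed exactly. This mimics the idea only. `inject_config`, `run_experiment`, and `train` are hypothetical helpers and are not Sacred's API, which is built around decorators on an `Experiment` object.

```python
import inspect
import random


def inject_config(func, config):
    """Call func, filling each parameter whose name appears in config.

    Parameters not present in config fall back to the function's own defaults.
    """
    params = inspect.signature(func).parameters
    kwargs = {name: config[name] for name in params if name in config}
    return func(**kwargs)


def run_experiment(func, config):
    config = dict(config)
    # Automatic seeding: pick a seed if none was given and record it
    # in the config so this exact run can be reproduced later.
    config.setdefault("seed", random.randrange(2**32))
    random.seed(config["seed"])
    return inject_config(func, config), config


def train(learning_rate, epochs=5, seed=None):
    # Any randomness here now depends only on the recorded seed.
    noise = random.random()
    return {"lr": learning_rate, "epochs": epochs, "noise": noise, "seed": seed}


result, used_config = run_experiment(train, {"learning_rate": 0.01})
replay, _ = run_experiment(train, used_config)
print(result == replay)  # True: same recorded seed, same outcome
```

Storing the generated seed in the config, rather than only using it, is what lets an observer save a run to a database and have it reproduced later.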