One of the major concerns in machine learning is deployment. Juggling a large number of deployment tools and environments while migrating a model to production can be extremely challenging.
There are countless independent software tools covering every stage of the machine learning life cycle, from data preparation to model training. Machine learning developers often need to use and deploy dozens of libraries in a production environment. There is no standard way to move a model from any given library to any of these tools, so every new deployment creates new risks.
What Are The Challenges With ML Workflow?
Experimental results are difficult to reproduce. Algorithm scripts are hard to rerun for many reasons, such as changes in the code version, the parameters used in the past, and the operating environment. Without detailed tracking, teams often struggle to get the same results from the same code.
Whether you are a data scientist handing training code to engineers for production or going back to earlier code to fix a bug, being able to reproduce the steps of the machine learning workflow is critical. We have heard many horror stories, such as a model performing worse in production than it did in training, or one team being unable to reproduce another team’s results. Whether you work alone or in a team, it isn’t easy to track which parameters, code and data in each experiment produced a given model.
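To make the tracking problem concrete, here is a minimal sketch of the information a run would need to record before it can be reproduced: parameters, a fingerprint of the code and data, and the environment. It uses only the Python standard library and is not MLflow's API; the function names `describe_run` and `same_setup` are hypothetical and chosen for illustration.

```python
import hashlib
import json
import platform
import sys


def describe_run(params, source_code, data_bytes):
    """Build a record of the inputs that determined a run's result."""
    return {
        # Sort keys so the order in which parameters were passed is irrelevant.
        "params": dict(sorted(params.items())),
        # Hashes act as fingerprints of the exact code and data used.
        "code_sha256": hashlib.sha256(source_code.encode("utf-8")).hexdigest(),
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        # A coarse description of the operating environment.
        "python_version": platform.python_version(),
        "platform": sys.platform,
    }


def same_setup(record_a, record_b):
    """Return True only if every recorded field matches between two runs."""
    return json.dumps(record_a, sort_keys=True) == json.dumps(record_b, sort_keys=True)
```

If two records differ in any field, a difference in results can no longer be attributed to chance alone, which is the kind of question a team cannot answer without such a record.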
These workflow challenges surrounding the ML lifecycle are usually the biggest obstacle to using machine learning in a production environment and scaling it within an organisation. To meet these challenges, many companies have begun to build internal machine learning platforms that can automate some of these steps.
For example, Uber and Facebook have built Michelangelo and FBLearner Flow, respectively, to manage data preparation, model training and deployment. However, even an internal platform has its limitations: a typical machine learning platform supports only a small set of options with limited customisation (whatever the engineering team has built), and it is tied to the company’s own infrastructure.
Where Does MLflow Come In?
At the 2018 Spark + AI Summit, Databricks introduced MLflow, a new open-source project for building an open machine learning platform. In addition to being open-source, MLflow is also open in the sense that anyone in the organisation or the open-source community can add new features to it (such as a new training algorithm or a new deployment tool), and these features automatically work with the other parts of MLflow. MLflow offers a way to simplify and scale the deployment of machine learning within an organisation by tracking, reproducing, managing and deploying models, much as is done in software development.
What Problem Does MLflow Solve?
Machine learning is not a one-way pipeline but an iterative loop. It includes four parts: data preprocessing, model training, model deployment and data update. Among them, preprocessing and model training involve parameter tuning, while the entire ML process requires coordination between the stages. A machine learning process therefore involves a lot of communication, code rewriting and environment configuration.
To solve the coordination problems between these stages, MLflow introduces two concepts, the MLflow Project and the MLflow Model, each of which defines a set of conventions. As long as your project or model follows these conventions, MLflow can reproduce the project and deploy the model in a single step, much like one-click online deployment, with no need to rewrite the project's code or reconfigure the environment.
How Is It Designed?
MLflow is designed to solve these workflow challenges through a set of APIs and tools that you can use with any existing machine learning library and code repository. In the current alpha version, MLflow provides three main components:
MLflow tracking module: It is an API and UI for recording parameters, code versions, performance metrics and output files when running machine learning code, so that they can be visualised later. With a few simple lines of code, you can track parameters, metrics and “artifacts”.
MLflow project module: It is a code packaging format for reproducible runs. By packaging your code as an MLflow project, you can specify its dependencies and allow any other user to rerun it later and reliably reproduce the results.
MLflow model module: It is a simple model packaging format that allows you to deploy the model to many tools. For example, if you can encapsulate the model as a Python function, the MLflow model can be deployed to Docker or Azure ML for online services, to Apache Spark for batch scoring, and so on.
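As an illustration of the tracking module described above, the following sketch records parameters, metrics and one text artifact for a single run. It relies only on MLflow calls that exist in the public Python API (`start_run`, `log_param`, `log_metric`, `log_artifact`). The function name `log_training_run` and its `tracker` argument are hypothetical additions for this example, not part of MLflow.

```python
import os
import tempfile


def log_training_run(params, metrics, notes, tracker=None):
    """Record one training run: parameters, metrics and a text artifact."""
    if tracker is None:
        # Deferred import: the real MLflow module is only needed when no
        # stand-in tracker object is supplied (assumes `pip install mlflow`).
        import mlflow
        tracker = mlflow

    # Everything logged inside this block is attached to the same run.
    with tracker.start_run():
        for name, value in params.items():
            tracker.log_param(name, value)
        for name, value in metrics.items():
            tracker.log_metric(name, value)

        # Artifacts are files, so write the notes to disk before logging them.
        with tempfile.TemporaryDirectory() as tmp_dir:
            notes_path = os.path.join(tmp_dir, "notes.txt")
            with open(notes_path, "w", encoding="utf-8") as handle:
                handle.write(notes)
            tracker.log_artifact(notes_path)
```

With MLflow installed, a call such as `log_training_run({"alpha": 0.5}, {"rmse": 0.78}, "baseline run")` records the run under the current tracking URI, where it can be inspected in the MLflow UI. The optional `tracker` argument is a design choice made here so the function can be exercised with a stand-in object when no MLflow installation is available.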
MLflow On Azure Databricks
The MLflow community has been growing fast, with hundreds of contributors from many companies adding code to the open-source project. For example, because MLflow is integrated into Databricks, Microsoft offers it on the Azure platform.
Azure Databricks provides a fully managed and hosted version of MLflow, integrated with other Azure Databricks workspace features such as experiment and run management and notebook revision capture. MLflow on Azure Databricks offers an integrated experience for tracking and securing machine learning model training runs and for running ML projects.
MLflow’s tracking URI and logging API, together known as MLflow tracking, can be used to connect MLflow experiments to Azure Machine Learning. Doing so enables users to track and log experiment metrics and artifacts in an Azure Machine Learning workspace.
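The connection described above can be sketched as follows. The example assumes the `azureml-core` SDK, whose `Workspace` object exposes `get_mlflow_tracking_uri()`, together with MLflow's `set_tracking_uri` and `set_experiment` functions. The helper name `use_azure_ml_tracking` and its `tracker` argument are hypothetical and exist only for this illustration.

```python
def use_azure_ml_tracking(workspace, experiment_name, tracker=None):
    """Point MLflow logging at an Azure Machine Learning workspace."""
    if tracker is None:
        # Deferred import so the helper can be tried with stand-in objects
        # when MLflow is not installed (assumes `pip install mlflow`).
        import mlflow
        tracker = mlflow

    # The workspace supplies the URI of its MLflow-compatible tracking store.
    tracking_uri = workspace.get_mlflow_tracking_uri()
    tracker.set_tracking_uri(tracking_uri)

    # Subsequent runs are grouped under this experiment in the workspace.
    tracker.set_experiment(experiment_name)
    return tracking_uri
```

In practice, `workspace` would typically be obtained with `Workspace.from_config()` from `azureml.core`, with the `azureml-mlflow` package installed. After the helper has run, ordinary MLflow logging calls send their metrics and artifacts to the Azure Machine Learning workspace.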
Users can deploy their MLflow experiments as an Azure Machine Learning web service. By deploying as a web service, they can apply the Azure Machine Learning monitoring and data drift detection functionalities to their production models.
Although the individual components of MLflow are simple, you can combine them in powerful ways, whether you are using machine learning alone or collaborating with people in a large team. For example, when developing a model on your laptop, you can use MLflow to record and visualise the code, data, parameters and performance metrics.
You can package your code as MLflow projects to run it at scale in a cloud environment for hyperparameter search. You can also share algorithms, feature extraction steps and models as MLflow projects or MLflow models. Finally, you can deploy the same model for both batch and real-time processing without developing separate code for two different tools. MLflow is open-source and can be easily installed using pip install mlflow. To start using MLflow, follow the instructions in the MLflow documentation, or view the code on GitHub.