There is growing demand for implementing and automating continuous integration, continuous delivery, and continuous training for ML systems. Known as MLOps, this ML engineering practice aims to consolidate and automate ML system pipelines.
Technology innovation leaders are keen to apply DevOps principles to AI and ML projects. Implementing MLOps means automation and monitoring at every step of building an ML system. Analysts say the real challenge isn’t building an ML model; the challenge is building an integrated ML system and running it continuously in production.
But Why Will There Be New ML Pipelines In The First Place?
“One of the important things that we need to understand is that there is a differentiation between the current pipelines that we are aware of and how they are slightly more different from what the pipelines are going to be in the future of data science. The current machine learning pipelines are not going to be sustainable, and you have to future-proof yourself in order to become more capable of handling the kind of issues that we will get a few years down the line,” said Lavi Nigam, Data Scientist at Gartner, at AIM’s recent plugin virtual conference.
According to him, there is a difference between doing data science at a local scale and at cloud scale. Most of the data science that happens today is located on individual systems or laptops, or within a local server. As we move ahead, scalable machine learning and AI will shift from local infrastructure to the cloud, according to analysts. This means that the components of the pipeline will also change across different cloud platforms.
“Right now, with the kind of pipelines we have, there are many loose components, which are not talking to each other and sitting in silos. But, MLOps is different from the actual data science we do today, and it can facilitate those communications between the different components in the ML pipeline,” Lavi said.
Need For Manual To Automated ML Processes
Today, data analysis, data preparation, model training, and validation need manual effort at each step and manual shift from one step to another. This whole ML process is fundamentally run with iterative code that is written and executed in notebooks by data scientists until a functional prototype model is produced.
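The manual workflow described above can be sketched as a chain of hand-run steps. This is an illustrative toy, not any particular team's code: each stage is a separate function, and a data scientist manually passes the output of one into the next, with nothing triggering automatically.

```python
# A minimal sketch of the manual, step-by-step workflow: each stage is
# run by hand and its output passed to the next. All names and the
# trivial "model" are illustrative assumptions.

def prepare_data(raw_rows):
    """Data preparation: drop incomplete records, split features and labels."""
    clean = [r for r in raw_rows if None not in r]
    features = [r[:-1] for r in clean]
    labels = [r[-1] for r in clean]
    return features, labels

def train_model(features, labels):
    """'Training': here just a threshold on the mean of the first feature."""
    threshold = sum(f[0] for f in features) / len(features)
    return {"threshold": threshold}

def validate_model(model, features, labels):
    """Validation: accuracy of the threshold rule."""
    preds = [1 if f[0] > model["threshold"] else 0 for f in features]
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)

# Manual hand-off from step to step -- nothing here is automated.
raw = [(0.2, 0), (0.9, 1), (0.8, 1), (0.1, 0), (None, 1)]
X, y = prepare_data(raw)
model = train_model(X, y)
accuracy = validate_model(model, X, y)
print(accuracy)  # prints 1.0 on this toy data
```

In a notebook, each of these cells is re-run and tweaked by hand until the prototype works, which is exactly the iterative loop the article describes.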
Many data scientists and ML researchers can create state-of-the-art models, but their process for building and deploying ML models is completely manual and does not factor in continuous changes in future data. The current process assumes that the data science team manages a few models that don’t frequently change with new data. A fresh model version is deployed only a few times each year. The current ML scenario will be further disrupted by automated tools like AutoML.
There is also a lack of model performance monitoring: the ML processes don’t track or log model predictions and actions, which are needed to detect model performance degradation and other behavioural drifts.
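The missing monitoring step can be sketched as logging predictions alongside their eventual ground truth and flagging degradation when rolling accuracy falls below the validation baseline. The class name, window size, and tolerance below are illustrative assumptions, not part of any specific monitoring tool.

```python
from collections import deque

# A hedged sketch of prediction logging and degradation detection:
# keep a rolling log of (prediction, actual) pairs and raise a flag
# when accuracy drops well below the offline validation baseline.
class PerformanceMonitor:
    def __init__(self, baseline_accuracy, window=100, tolerance=0.10):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.log = deque(maxlen=window)  # rolling log of (prediction, actual)

    def record(self, prediction, actual):
        self.log.append((prediction, actual))

    def rolling_accuracy(self):
        if not self.log:
            return None
        return sum(p == a for p, a in self.log) / len(self.log)

    def degraded(self):
        acc = self.rolling_accuracy()
        return acc is not None and acc < self.baseline - self.tolerance

monitor = PerformanceMonitor(baseline_accuracy=0.90)
for pred, actual in [(1, 1), (0, 1), (1, 0), (0, 1), (1, 1)]:
    monitor.record(pred, actual)
print(monitor.rolling_accuracy(), monitor.degraded())  # 0.4 True
```

Without this kind of logging, a model can silently decay in production for months before anyone notices.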
“The current process goes in a very streamlined fashion, where you have a business understanding. Then you do the data acquisition, data sourcing, pipeline, wrangling, and exploration. Once that is done, you go to modelling. Then you do your standard things like feature engineering, model training, and model evaluation. Eventually, the model is deployed. The final output is in the form of an enterprise-ready API. But, is that enough that you just have to create an API, and everything works well?” said Lavi Nigam.
This is where MLOps is going to rise, says Lavi. The goal of MLOps is to perform continuous training of the model by consolidating and automating the ML pipeline. It also automates retraining models on new data in production by introducing automated data and model validation steps into the pipeline.
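The continuous-training loop described above can be sketched as three automated gates: validate incoming data, retrain, and promote the candidate model only if it beats the current production model on a held-out set. Everything here (function names, the trivial model, thresholds) is an illustrative assumption, not a specific MLOps framework.

```python
# A minimal sketch of one continuous-training step with automated
# data validation and model validation gates. All names are illustrative.

def validate_data(batch):
    """Data validation gate: reject empty batches or rows with missing values."""
    return bool(batch) and all(None not in row for row in batch)

def train(batch):
    """'Training': the model is just the mean label of the batch."""
    labels = [row[-1] for row in batch]
    return {"mean_label": sum(labels) / len(labels)}

def evaluate(model, holdout):
    """Model validation gate: accuracy of rounding the mean label."""
    pred = round(model["mean_label"])
    return sum(pred == row[-1] for row in holdout) / len(holdout)

def continuous_training_step(new_batch, holdout, production_score):
    if not validate_data(new_batch):       # automated data validation
        return "rejected: bad data", production_score
    candidate = train(new_batch)           # automated retraining
    score = evaluate(candidate, holdout)   # automated model validation
    if score > production_score:           # promote only on improvement
        return "deployed", score
    return "kept old model", production_score

status, score = continuous_training_step(
    new_batch=[(0.3, 1), (0.7, 1), (0.9, 1)],
    holdout=[(0.5, 1), (0.6, 1), (0.4, 0)],
    production_score=0.5,
)
print(status, score)
```

The point of the sketch is the control flow, not the model: once these gates exist, the pipeline can retrain on fresh data without a human manually re-running notebook cells.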