Machine Learning (ML) model metrics are designed to monitor performance. But when a model goes into production, many factors influence its performance, and traditional checkpoints may no longer help as organisations look to scale these models (think: scaling from a million to a billion credit card users). This is why experts advocate for MLOps, a discipline that brings DevOps practices to machine learning.
Though many experts regard MLOps as the best solution available right now, the field is still beset by ambiguities. To address these, Deeplearning.ai recently hosted a panel of MLOps experts to draw insights on the most important aspects of production machine learning and what MLOps looks like inside companies. Hosted by Ryan Keenan of Deeplearning.ai, the panel consisted of Andrew Ng, Robert Crowe, Lawrence Moroney, Chip Huyen and Rajat Monga.
The panel of experts began by addressing the significance of MLOps in today's world. Chip Huyen, who teaches ML at Stanford, considers model training to be a small part of the problem; according to her, the real problem is retraining. Once a model is out in the open, data drift can set in, and models deliver varying degrees of performance in real-world settings. So how does one keep updating the model to compensate for these variations?
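To make the data-drift concern concrete, here is a minimal, hypothetical sketch of one common way teams flag drift between the feature distribution a model was trained on and the distribution it sees in production: the Population Stability Index (PSI). The function name, bin count and the 0.2 alert threshold are illustrative assumptions, not anything the panellists prescribed.

```python
import math
import random

def psi(expected, actual, bins=10):
    """Population Stability Index between two 1-D samples.

    Both samples are binned over their combined range; the index sums
    (actual% - expected%) * ln(actual% / expected%) over the bins.
    Zero-count bins are floored at a tiny fraction to keep the log finite.
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a degenerate range

    def frac(sample, i):
        count = sum(1 for x in sample
                    if lo + i * width <= x < lo + (i + 1) * width)
        if i == bins - 1:  # include the right edge in the last bin
            count += sum(1 for x in sample if x == hi)
        return max(count / len(sample), 1e-6)

    total = 0.0
    for i in range(bins):
        e, a = frac(expected, i), frac(actual, i)
        total += (a - e) * math.log(a / e)
    return total

# Illustrative usage: live data whose mean has shifted relative to training.
random.seed(0)
train = [random.gauss(0.0, 1.0) for _ in range(5000)]
live = [random.gauss(1.0, 1.0) for _ in range(5000)]  # drifted mean
drifted = psi(train, live) > 0.2  # 0.2 is a common rule-of-thumb threshold
```

In practice, a monitoring job would compute such a statistic per feature on a schedule and trigger retraining or an alert when the threshold is crossed; dedicated tooling (for example, data-validation libraries) typically replaces a hand-rolled function like this.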
What’s in a name?
Robert Crowe, who is part of Google's TensorFlow team, played down the MLOps hype, saying he is not a fan of the nomenclature. He is more interested in the idea of making a product out of a model and addressing all the issues, such as model drift, privacy and resource optimisation, that surface in a production ML setting. "You don't have these issues when you're doing research or in academia. For me the focus of MLOps is making it possible to create and sustain a product or service responsibly," explained Crowe.
A couple of years ago, the ML community's focus was more on building models, tuning hyperparameters or picking the right architecture. Today, the industry has made huge leaps in bringing ML and AI to the general populace, and the benefits of DevOps regimes in an ML setting can only be figured out as we go along. Lawrence Moroney, who oversees AI advocacy at Google, expressed his excitement about the burgeoning conversation around MLOps and models in production. Even Andrew Ng, the founder of Deeplearning.ai, admitted that deep learning has come a long way in the last decade. He stressed that MLOps will help everyone through the entire life cycle of machine learning projects: from scoping, to collecting and managing data, to training the model and improving the data, to deployment, monitoring and data-driven model maintenance. "I think MLOps is an exciting nascent discipline solving the entire life cycle of machine learning projects. MLOps and machine learning production is very much on the cutting edge," he said.
Ex-Googler Rajat Monga underlined that data is not static; since models represent data, they have to change with it. Be it the markets or other domains, the world around us is changing, and the data generated is immense. Traditionally, software would be hard-coded with objective functions; given the dynamic nature of data, Monga observes, it is now being replaced with predictive models. "We're not going to get too many new species of flowers in the next year or two, but on the other hand, in a business where you're relying on customer data or anything like that, things change all the time and you want those models to be updated all the time," he said.
On MLOps in practice
For more than a year now, Google has been proactively contributing to the conversations around MLOps. But can the principles applied at a company like Google work elsewhere? At Google, continued Moroney, ML teams focus on scaling and trying to answer questions such as: "How do we make sure we can focus on billions of users instead of thousands of users, and what is the serving infrastructure that has to be in place for scaling?" Next comes the challenge of building a decent monitoring infrastructure to make sure these models run inference at the required parameters at the required speed. "There are so many people and there are so many moving parts to be able to keep all of these working together. We want to make sure that we have the flexibility and we generally design our infrastructures that way."
When it comes to startups, Monga suggests one should refrain from building individual elements from scratch the way companies that operate at scale do. A lot of tools are already available to solve most of the problems small organisations are trying to solve with ML.
"MLOps tools right now actually depend a lot on the company size, use case and maturity." – Chip Huyen
When asked about the state of MLOps tools, Andrew Ng said there is a big gap today in the ability of tools to engineer data in such a way that, when it is fed to the code, performance improves. Crowe, meanwhile, sees people coming from mathematical or statistical backgrounds, with strong theoretical understanding of ML, facing difficulties in creating production-level code and systems. Ten years ago, when the community was coming to terms with deep learning, Ng had no idea how many tens of thousands of novel inventions and research papers it would take to reach where we are today. But the disbelief of those initial days has lifted. "Then came TensorFlow and other frameworks that laid the foundations. Today, as we think about MLOps and data-centric AI, I think there are easily tens of thousands of ideas yet to be invented."
At the end of the day, any ML enterprise will have to care about what the customer wants in a product; it's all about business. On building MLOps teams, Ng recommends a solid principle: ask the teams to take a long, hard look to ensure consistently high-quality data throughout the entire life cycle. "You won't see that in a job description, but don't let that fool you. Lots of companies are trying to hire people who know how to build and deploy machine learning systems. Even in job interviews, candidates are asked about ML deployment. That's basically an MLOps question even though the word MLOps doesn't appear in the job description. So I think this is an important skill for people to be learning today," said Andrew Ng.