MLOps has recently emerged as the missing link in the ML lifecycle. MLOps engineers ensure that the pipeline is glued together and works seamlessly. Since 2018, MLOps functions have come under the spotlight and, with time, have been cemented as a vital mechanism.
We caught up with Praveen Nair, Manager at Tredence, to learn what the future holds for MLOps teams, how the current fast-paced AI landscape will impact MLOps and what it takes to be an MLOps engineer.
AIM: How do you think MLOps teams will be able to address issues related to ethical concerns like data privacy and security?
Praveen: This is actually something that is not spoken about as often as it should be. MLOps can really help with the implementation of ethical frameworks when countries and governments come up with new data laws around security concerns.
Say, a bank has to deny or approve loans based on the background, financial features and behaviour of a particular person. When models for these processes are trained, you have to make sure that your system is not biased towards any demographic or gender. The model is expected to be fair and banks must offer equal opportunities to all. One of the biggest solutions for this is to have more explainability in models. MLOps can help by putting processes in place to make the model more transparent. Data scientists can also assess this with bias monitoring systems in place.
There have already been open-source contributions in this regard. A bunch of cloud platforms have already implemented some of these preventive measures. However, adoption remains slow. Now, with laws coming into the picture, companies will be forced to adopt measures sooner rather than later.
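As an illustration of the kind of bias monitoring Praveen describes, here is a minimal sketch that compares loan approval rates across groups. The function names, the sample data and the alert threshold are all hypothetical; real monitoring systems typically track several fairness metrics, not just this one.

```python
from collections import defaultdict

def approval_rates(decisions, groups):
    """Return the share of approved loans (decision == 1) for each group."""
    approved = defaultdict(int)
    total = defaultdict(int)
    for decision, group in zip(decisions, groups):
        total[group] += 1
        approved[group] += decision
    return {g: approved[g] / total[g] for g in total}

def demographic_parity_gap(decisions, groups):
    """Difference between the highest and lowest group approval rates."""
    rates = approval_rates(decisions, groups)
    return max(rates.values()) - min(rates.values())

# Hypothetical model outputs: 1 = approved, 0 = denied.
decisions = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

ALERT_THRESHOLD = 0.2  # assumed value; would be set by policy in practice
gap = demographic_parity_gap(decisions, groups)
if gap > ALERT_THRESHOLD:
    print(f"Bias alert: approval-rate gap of {gap:.2f} between groups")
```

A check like this could run on every batch of predictions, so that a widening gap is flagged before it reaches customers.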
In the context of data privacy, you wouldn’t want users’ personal details to be exposed in databases and available for people to see. To reiterate, there are a number of open-source tools available to detect if your model or related pipelines are using this type of Personally Identifiable Information (PII). So, having robust MLOps processes can take care of these risks.
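To make the idea of a PII check concrete, here is a rough sketch of a scan that a pipeline could run on records before training. The two regular expressions and the `find_pii` helper are illustrative assumptions, not the behaviour of any particular open-source tool, and they would miss many real-world formats.

```python
import re

# Hypothetical, deliberately simple patterns for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def find_pii(record):
    """Return {field: [pii types found]} for the string fields of a record."""
    hits = {}
    for field, value in record.items():
        if not isinstance(value, str):
            continue  # only free-text fields are scanned in this sketch
        kinds = [name for name, pattern in PII_PATTERNS.items()
                 if pattern.search(value)]
        if kinds:
            hits[field] = kinds
    return hits

sample = {
    "income": 52000,
    "notes": "Contact at jane.doe@example.com or 555-123-4567",
}
pii_hits = find_pii(sample)
if pii_hits:
    print(f"PII detected, blocking record: {pii_hits}")
```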
Having an MLOps team will also help address concerns around external security and adversarial attacks. For instance, if a company has a small data science team, they can focus only on the innovation part and the MLOps teams can protect the data security layer and take care of the boundaries of what you can and can’t do.
AIM: Federated learning has a number of advantages, but what are the challenges around using it in MLOps?
Praveen: Federated learning is something that is relatively new in terms of the number of applications that are coming out. A major part of the current MLOps processes is still not prepared enough to take care of what’s coming in the future. For example, with GPT-4, Sam Altman has said that the model works on self-supervised learning, which means that it is building a brain based on inputs from users. When a tool like this arrives on the landscape, new issues around privacy are bound to crop up as well. You wouldn’t want a user to know what other users have used the model for. This is one of the biggest challenges since there are millions of users on it.
In federated learning, everyone’s data is decentralised and stays with its owner. The goal is to train a shared model together without diverting your data to one central server. Rather, the model is trained on your device, and only that learned understanding is sent back and aggregated elsewhere. The other challenge is that stringent laws are emerging against bringing data from different regions together in a central place.
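The train-locally-then-aggregate loop described above can be sketched with a toy version of federated averaging. This is a simplified illustration under stated assumptions: a one-parameter linear model, made-up client data, and hypothetical function names. Production systems add secure aggregation, client sampling and much more.

```python
def local_update(weight, data, lr=0.01, epochs=5):
    """Run a few gradient steps for y ~ weight * x on one client's own data."""
    for _ in range(epochs):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight

def federated_round(global_weight, clients):
    """Each client trains locally; the server only sees the resulting weights."""
    updates = [(local_update(global_weight, data), len(data)) for data in clients]
    total = sum(n for _, n in updates)
    # Average the client weights, weighted by how many samples each client has.
    return sum(w * n for w, n in updates) / total

# Hypothetical private datasets, all following y = 2 * x. They never leave
# the "device"; only the locally trained weight is shared with the server.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(1.5, 3.0), (2.5, 5.0), (0.5, 1.0)],
]

weight = 0.0
for _ in range(50):
    weight = federated_round(weight, clients)
print(f"Learned weight after 50 rounds: {weight:.4f}")
```

The raw `(x, y)` pairs are only ever read inside `local_update`, which is the property that makes this approach attractive when regional laws restrict moving data to a central place.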
AIM: What are the differences and similarities between MLOps and AutoML?
Praveen: One of the new things in terms of monitoring is to understand which behaviours have changed, like what we saw after the COVID-19 pandemic. Certain segments like e-commerce witnessed a significant burst in activity in the aftermath of the pandemic compared to pre-pandemic levels, primarily due to a change in customer behaviour. The models that were trained before the pandemic did not really work during or after it. So, this shift implies that these models have to be retrained.
To do this continuously is the tough part because customer behaviours change too often. In this scenario, AutoML is a good process to automate this portion of the workflow.
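One way to decide when such retraining is due is to measure how far live data has drifted from the training data. The sketch below uses the population stability index (PSI) as an example metric. The bucket edges, the sample order counts and the 0.2 threshold (a common rule of thumb) are assumptions made for illustration.

```python
import math

def psi(expected, actual, edges, eps=1e-6):
    """Population stability index between two samples over fixed buckets."""
    def fractions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            bucket = sum(v >= e for e in edges)  # index of the bucket v falls in
            counts[bucket] += 1
        # Floor at eps so empty buckets do not cause log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    exp_frac, act_frac = fractions(expected), fractions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(exp_frac, act_frac))

# Hypothetical weekly orders per customer, before and after a behaviour shift.
pre_shift = [1, 2, 2, 3, 1, 2, 3, 2]
post_shift = [4, 5, 6, 5, 7, 4, 6, 5]

RETRAIN_THRESHOLD = 0.2  # assumed rule-of-thumb value
drift_score = psi(pre_shift, post_shift, edges=[2, 4, 6])
needs_retraining = drift_score > RETRAIN_THRESHOLD
print(f"PSI = {drift_score:.2f}, retrain: {needs_retraining}")
```

A monitoring job could compute this score on a schedule and kick off an automated training run, for instance through an AutoML service, whenever the threshold is crossed.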
AutoML can train models much faster as it abstracts away much of the training effort. All the data scientist has to do is make the model transparent and ensure the model is bias-free. I don’t think AutoML and MLOps are comparable; rather, I’d say that AutoML is a good tool to support MLOps adoption across organisations. The challenge is in getting state-of-the-art models using AutoML, but you still get a baseline model churned out much faster.
AIM: How will the emergence of generative AI further push the implementation of MLOps?
Praveen: Building a GPT model is not easy because these models are extremely complex, use a lot of computing power and take a lot of time. Now, since people have begun to find business value in it, GPT-4 will definitely be used across industries.
However, even a small piece of misinformation can have a large negative impact, especially in data-sensitive fields like medicine. MLOps can set standard processes for security in this regard. Even when such massive LLMs are being trained, we need huge quantities of quality data, which the MLOps and Data Engineering disciplines can ensure. You wouldn’t want your generative AI model to be political or inappropriate. So, there are a number of valid concerns here because generative models usually use large amounts of data and the responses need to be validated continuously to check for accuracy and areas for improvement.
Generative AI will definitely push the implementation of MLOps and introduce different types of MLOps processes in the future. Large Language MLOps is already becoming a thing, and that’s great because ethical concerns and necessary guardrails can then be addressed more quickly.
AIM: What are the main skills that prove essential to become an MLOps engineer?
Praveen: The biggest misconception about MLOps is that it is merely the deployment of ML models. While this is partially true, the breadth of what an MLOps engineer does includes more than model deployment. The first thing needed is soft skills since MLOps engineers have to talk to data scientists who will always defend their models or the business teams who want better ROI. MLOps managers are often stuck in the middle trying to make both sides happy. So, it’s also important to have a sense of collaboration.
The second skill is proficiency in ML and deep learning. The usual idea is that knowing DevOps is enough, which is not true. We also need to optimise these models in different ways, like trying to understand what the algorithm is doing and if it’s been trained in the right way. This requires a deeper understanding of the algorithms themselves.
Thirdly, we need to understand data and store the right kind of data. You need to be able to interpret the data you have and make it useful by coming up with more metrics around quality & accuracy.
In tandem, you need to have an architectural understanding, which means recognising that every client requires something different. There is no one-size-fits-all solution. Depending on the client’s environment, you have to determine what the most cost-efficient and optimal approach is. For this, you need to have an architectural understanding of what MLOps is trying to achieve. So, as an MLOps engineer, you need to have different variations of the same architecture ready in your pocket, and that ultimately makes the biggest difference. The last skill would be the most obvious one: great programming skills.