How to use MLOps with COMET for better ML lifecycle management?

MLOps refers to a set of approaches and tools for deploying and monitoring ML models.

Machine learning operations (MLOps) is becoming vital to implementing business data science projects successfully. It helps companies and business leaders generate long-term value and lower the risk associated with data science, machine learning, and artificial intelligence efforts. MLOps refers to a set of approaches and tools for deploying ML models in production. This article focuses on performing ML operations with Comet. The following topics are covered.

Table of contents

  1. Snippet about MLOps
  2. ML operations with COMET
    1. Install Comet
    2. Connect to the server
    3. Import libraries
    4. Reading and preprocessing the data
    5. Building and testing the model
    6. Registering the model

MLOps refers to the standardisation and simplification of machine learning life cycle management. Let’s start with a high-level overview of MLOps.

Snippet about MLOps

MLOps is a hybrid of DevOps and Machine Learning. DevOps refers to a collection of procedures aimed at lowering the time required for product delivery and closing the gap between software development and operations. Continuous Integration (CI) and Continuous Delivery (CD) are the two primary DevOps concepts.

  • Continuous integration is the practice of regularly merging the code written by developer teams into a shared codebase. Each merge is tested, and small fixes are made for the faults and vulnerabilities the tests uncover. As a result, the software development cycle is shortened.
  • Continuous delivery is a method in which each new version of the software under development is continuously deployed for testing, assessment, and eventually production. With this method, the updates and new features produced by continuous integration reach end users considerably more quickly.

MLOps applies DevOps concepts and methodologies to automate machine learning operations. Although this sounds easy, it is not, because a machine learning model is not self-contained but part of a larger software system that includes data as well as code. Because the data keeps changing, the model has to be retrained on fresh data regularly.

As a result, MLOps introduces a new practice, Continuous Training (CT), in addition to CI and CD, to automatically retrain the model as needed. It follows that MLOps is more involved than DevOps and includes extra operations on data and models.
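
To make the idea of Continuous Training more concrete, here is a minimal sketch of a retraining trigger. It is an illustration only: the function name `should_retrain`, the baseline score, the tolerance and the weekly scores are all assumptions made for this example and do not come from Comet or any other tool.

```python
def should_retrain(baseline_score, recent_scores, tolerance=0.05):
    """Return True when the average recent score falls more than
    `tolerance` below the baseline recorded at deployment time."""
    if not recent_scores:
        return False  # nothing observed yet, so no evidence of degradation
    recent_average = sum(recent_scores) / len(recent_scores)
    return (baseline_score - recent_average) > tolerance


# Hypothetical weekly F1 scores measured on freshly labelled data.
stable_weeks = [0.90, 0.89, 0.91]
drifting_weeks = [0.84, 0.80, 0.78]

print(should_retrain(0.90, stable_weeks))    # False
print(should_retrain(0.90, drifting_weeks))  # True
```

In a real pipeline, a check like this would typically run on a schedule and start a training job when it returns True.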

ML operations with COMET

Comet is an MLOps platform that provides full-stack observability for machine learning models and lets you set production performance baselines based on model performance. The Comet connector collects machine learning metrics, which allows continuous monitoring throughout the machine learning lifecycle. This helps keep model performance on track and, in turn, supports better business results.

In this article, we will train, test and register a machine learning model which will classify the type of migraine a patient is having based on various features.

Once you have registered on the Comet website, you are all set to connect to the Comet server. After registration is completed, you are redirected to the project page.

A new project can be created by clicking the ‘New project’ button and entering a name and a description for the project. A project can also be created from a Python notebook; that process is demonstrated later in the article.

Install comet

! pip install comet_ml

Connect to the server

To connect to the server, generate an API key from the settings under your profile on the Comet web page. In the second block of the account settings there is an option to generate an API key and copy it to the clipboard.

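from comet_ml import Experiment
 
# Create the experiment; replace the placeholders with your own values.
experiment = Experiment(
    api_key="Paste-your-own-api-key",
    project_name="migraine", 
    workspace="Write-your-own-username",
    auto_metric_logging=True,   # log metrics automatically
    auto_param_logging=True,    # log hyperparameters automatically
    log_graph=True,             # log the model graph when available
    auto_metric_step_rate=10,   # integer step interval for batch metrics (10 is the SDK default)
    parse_args=True,            # log command-line arguments
)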

The logging options that the project needs can be switched on while connecting to the server by passing them as arguments to ‘Experiment’.
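
Pasting the API key into a notebook makes it easy to leak when the notebook is shared. As a hedged alternative, the sketch below reads the settings from environment variables. The helper `comet_settings_from_env` is a hypothetical name introduced for this example; the variable names `COMET_API_KEY`, `COMET_PROJECT_NAME` and `COMET_WORKSPACE` are the ones the Comet Python SDK recognises, to the best of our knowledge.

```python
import os


def comet_settings_from_env(environ=os.environ):
    """Collect Comet connection settings from environment variables.

    Raises a clear error when the API key is missing instead of
    falling back to a hard-coded value.
    """
    api_key = environ.get("COMET_API_KEY")
    if not api_key:
        raise RuntimeError("Set the COMET_API_KEY environment variable first.")
    return {
        "api_key": api_key,
        "project_name": environ.get("COMET_PROJECT_NAME", "migraine"),
        "workspace": environ.get("COMET_WORKSPACE"),
    }


# Usage with the article's code (requires comet_ml and a real key):
# from comet_ml import Experiment
# experiment = Experiment(**comet_settings_from_env())
```

The other keyword arguments, such as the automatic logging options, can be passed to ‘Experiment’ alongside the unpacked dictionary.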

Import necessary libraries

import numpy as np
import pandas as pd
 
import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt
 
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score, confusion_matrix

Reading and preprocessing the data

data=pd.read_csv('migraine.csv')
data[:5]

There are no missing values in the data, and every column is already label encoded except the target variable, ‘Type’. Before moving on to preprocessing, let’s look at the target variable and its classes.

def pie_chart (data_val,title):
  fig = px.pie(data, values=data_val, names='Type', 
              title=title,
              hole=0.5,
              color_discrete_sequence=px.colors.sequential.RdBu)
  return fig.show()
 
pie_chart('Vomit','Patients with vomiting problem categorized by the type of migraine')

According to the chart, almost 47% of the patients who vomit during a migraine fall under the aura with migraine type. More interesting facts can be found in the same way; that is left to you.

The target variable needs to be encoded. A label encoder is used here rather than one-hot encoding or dummy variables because this is a multiclass target, and the classifier takes the target as a single column of class labels.

encoder=LabelEncoder()
data['Type_encode']=encoder.fit_transform(data['Type'])
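
Label encoding simply maps each distinct class name to an integer, and scikit-learn's LabelEncoder assigns those integers in sorted order of the class names. The sketch below reproduces that mapping in plain Python so it can be inspected. The helper `fit_label_mapping` and the class names are placeholders made up for this example; they are not the actual values of the ‘Type’ column.

```python
def fit_label_mapping(labels):
    """Map each distinct label to an integer, in sorted order,
    which is the ordering scikit-learn's LabelEncoder uses."""
    classes = sorted(set(labels))
    return {label: index for index, label in enumerate(classes)}


# Placeholder class names for illustration only.
types = ["Migraine without aura", "Typical aura with migraine",
         "Migraine without aura", "Other"]

mapping = fit_label_mapping(types)
encoded = [mapping[t] for t in types]

# Reverse lookup, comparable to encoder.inverse_transform
inverse = {index: label for label, index in mapping.items()}
decoded = [inverse[i] for i in encoded]

print(mapping)
print(encoded)   # [0, 2, 0, 1]
```

With the fitted encoder from the article, the same mapping can be read from `encoder.classes_`, and `encoder.inverse_transform` turns predicted integers back into class names.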

Split the data into training and test sets in a 70:30 ratio.

X=data.drop(['Type','Type_encode'],axis=1)
y=data['Type_encode']
 
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.30, random_state=42)

Building the model

For this article, a random forest classifier would do the work. 

rf = RandomForestClassifier()
rf.fit(X_train,y_train)
prediction=rf.predict(X_test)
f1 = f1_score(y_test, prediction,average='weighted')
precision = precision_score(y_test, prediction,average='weighted')
recall = recall_score(y_test, prediction,average='weighted')
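
The `average='weighted'` argument used above computes the score for each class separately and then averages those scores, weighting each class by its share of the true labels. The following plain-Python sketch shows the calculation for the F1 score. The function `weighted_f1` and the sample labels are assumptions made for illustration, not part of the article's data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's
    share of the true labels (its support)."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = count - tp
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        score += f1 * count / total
    return score


# Made-up labels: three samples of class 0, two of class 1, one of class 2.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.8333
```

Weighting by support keeps a rare class from dominating the score, which matters when the migraine types are unevenly represented.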

We will now log these metrics to the comet server 

# Values describing this run; train_test_split above is called without stratify.
params={"random_state":42,
        "model_type":"random forest classifier",
        "stratify":False
}
 
metrics = {"f1":f1,
"recall":recall,
"precision":precision
}
 
experiment.log_dataset_hash(X_train)
experiment.log_parameters(params)
experiment.log_metrics(metrics)
# Pass the true and predicted labels as plain lists of integers.
experiment.log_confusion_matrix(y_test.tolist(), prediction.tolist())

There are two ways these values can be logged:

  1. Store the values in variables, build a dictionary from them, and pass that dictionary to ‘experiment.log_metrics()’.
  2. Pass the values directly to the experiment logging function.

Both methods are illustrated above. The F1 score, precision and recall are logged with the first method, and the confusion matrix is logged with the second.
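
The two styles can also be compared side by side for scalar metrics. In the sketch below, `RecordingExperiment` is a stand-in class written for this example so that the code runs without a Comet account; it only mimics the `log_metrics(dict)` and `log_metric(name, value)` methods of a Comet experiment. The helper `log_scores` and the score values are likewise placeholders.

```python
class RecordingExperiment:
    """Stand-in for a Comet experiment so the sketch runs offline.
    It only mimics the two logging methods used here."""

    def __init__(self):
        self.logged = {}

    def log_metric(self, name, value):
        self.logged[name] = value

    def log_metrics(self, dic):
        for name, value in dic.items():
            self.log_metric(name, value)


def log_scores(experiment, f1, precision, recall):
    # Way 1: collect the values in a dictionary and log them in one call.
    experiment.log_metrics({"f1": f1, "precision": precision})
    # Way 2: pass a single value straight to the logging function.
    experiment.log_metric("recall", recall)


demo = RecordingExperiment()
log_scores(demo, f1=0.91, precision=0.92, recall=0.90)
print(demo.logged)
```

Passing the real `experiment` object instead of `demo` should send the same three values to the Comet server.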

Once the values are logged, the run can be viewed on the Comet web page. Open the project; under the experiment panel, all the currently running experiments are listed.

The dashboard could be viewed in the notebook itself just by using the following code.

experiment.display()

One can also save charts on the server. Let’s have a chart describing the feature importance according to the random forest classifier.

imp_feature = pd.Series(rf.feature_importances_, index=X.columns)
plt.figure(figsize=(15, 8))
plt.title('Feature Importance of Random Forest')
imp_feature.sort_values(ascending=False).plot(kind='barh')
plt.savefig('/content/drive/MyDrive/Datasets/feature_importance.jpeg')
plt.show()

The chart needs to be saved on the drive and then uploaded to the server by using the logging method. It could be viewed under the graphics section of the dashboard.

experiment.log_image('/content/drive/MyDrive/Datasets/feature_importance.jpeg')

Registering the model

Save the trained model and register the model under the registry section. This article will use the pickle package to save the model.

import pickle
filename = 'random_forest_model.sav'
# Use a context manager so the file is closed after writing.
with open(filename, 'wb') as f:
    pickle.dump(rf, f)

The dump function saves the model under the given file name at the given location.
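
Before uploading, it can be worth checking that the saved file loads back correctly. The sketch below shows the save and load round trip with pickle. To keep it self-contained, a small dictionary stands in for the trained classifier and the file is written to a temporary directory; both are assumptions made for this example.

```python
import os
import pickle
import tempfile

# Placeholder object standing in for the trained classifier.
model = {"name": "random forest classifier", "n_estimators": 100}

path = os.path.join(tempfile.mkdtemp(), "random_forest_model.sav")

# Save: the context manager closes the file even if dump() fails.
with open(path, "wb") as f:
    pickle.dump(model, f)

# Load the object back and confirm it matches what was saved.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model)  # True
```

The same `pickle.load` call restores the random forest from its file. Only unpickle files from a source you trust, because loading a pickle can execute arbitrary code.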

Once the model is saved, upload it to the comet server.

experiment.log_model('Random forest classifier v1',"/content/random_forest_model.sav")

Then go to the ‘Assets and Artifacts’ section in the experiment panel. The saved model appears under models in the assets section. To register the model, click on the plus icon and register it. The model can now be monitored.

Finally, end the experiment and close the connection to the server with this code.

experiment.end()

Conclusions

MLOps is an effective way of bringing ML models into production. Beyond putting models into production, a mature MLOps system with continuous training can keep them more efficient and realistic over time. In this article, we have seen how to train, log and register a machine learning model with Comet.

Sourabh Mehta
Sourabh has worked as a full-time data scientist for an ISP organisation, experienced in analysing patterns and their implementation in product development. He has a keen interest in developing solutions for real-time problems with the help of data both in this universe and metaverse.
