Machine learning operations (MLOps) is increasingly vital to the successful implementation of business data science projects. It is a practice that helps companies and business executives generate long-term value and lower the risk associated with data science, machine learning, and artificial intelligence efforts. MLOps refers to a set of approaches and tools for deploying ML models in production. This article focuses on performing ML operations with Comet. The following topics will be covered.
Table of contents
- Snippet about MLOps
- ML operations with COMET
- Install Comet
- Connect to the server
- Import libraries
- Reading and preprocessing the data
- Building and testing the model
- Registry model
MLOps refers to the standardisation and simplification of machine learning life cycle management. Let's start with a high-level overview of MLOps.
Snippet about MLOps
MLOps is a hybrid of DevOps and Machine Learning. DevOps refers to a collection of procedures aimed at lowering the time required for product delivery and closing the gap between software development and operations. Continuous Integration (CI) and Continuous Delivery (CD) are the two primary DevOps concepts.
- Continuous integration is the practice of regularly merging the code produced by developer teams. The integrated code is tested frequently, and small adjustments are made based on the faults and vulnerabilities discovered during those tests. As a result, the software development cycle is shortened.
- Continuous delivery is a method in which new versions of the software under development are continuously deployed for testing, assessment and, eventually, production. With this method, software updates produced by continuous integration, along with upgrades and new features, reach end users considerably faster.
MLOps uses DevOps concepts and methodologies to automate machine learning operations. Although this sounds simple, it is not, because a machine learning model is not self-contained; it is part of a larger software system that includes not just code but also data. Since the data keeps changing, the model must continually be retrained on fresh data.
As a result, MLOps adds a third practice, Continuous Training (CT), alongside CI and CD, to automatically retrain the model as needed. It is therefore evident that MLOps is more involved than DevOps and includes extra operations around data and models.
A detailed comparative analysis of MLOps and DevOps can be read here.
A detailed guide to getting started with MLOps and understanding its effect on the data science life cycle can be read here.
ML operations with COMET
Comet is an MLOps platform that provides full-stack observability for machine learning models and lets teams set production performance baselines based on model performance. It collects machine learning metrics, which allows continuous monitoring of data and models throughout the machine learning lifecycle. This helps keep model performance optimal and leads to better commercial results.
In this article, we will train, test and register a machine learning model which will classify the type of migraine a patient is having based on various features.
Once you have registered on the Comet website, you are all set to connect to the Comet server. After registration is completed, the page you are redirected to looks like this.

A new project can be created by clicking on the 'New project' button and defining the project's name and description. The project can also be created from a Python notebook; that process is demonstrated later in the article.
Install Comet
! pip install comet_ml
Connect to the server
To connect to the server, generate an API key by going to the settings under your profile on the Comet web page. In the second block of the account settings, there is an option to generate an API key and copy it to the clipboard.
from comet_ml import Experiment

experiment = Experiment(
    api_key="Paste-your-own-api-key",
    project_name="migraine",
    workspace="Write-your-own-username",
    auto_metric_logging=True,
    auto_param_logging=True,
    log_graph=True,
    auto_metric_step_rate=True,
    parse_args=True,
)

The logs required by the project can be turned on while connecting to the server, simply by mentioning them in the 'Experiment' constructor.
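For example, individual auto-logging options can be switched on or off through the same constructor arguments. A minimal sketch, with illustrative flag values rather than the settings used in this article:

# Illustrative: selectively enable or disable Comet auto-logging options
experiment = Experiment(
    api_key="Paste-your-own-api-key",
    project_name="migraine",
    workspace="Write-your-own-username",
    auto_param_logging=False,   # skip automatic hyperparameter logging
    auto_metric_logging=True,   # keep automatic metric logging
    log_graph=False,            # do not log the computation graph
)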
Import necessary libraries
import numpy as np
import pandas as pd
import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score, confusion_matrix
Reading and preprocessing the data
data = pd.read_csv('migraine.csv')
data[:5]

There are no missing values in the data, and the features have already been label encoded, except for the target variable, 'Type'. Before moving on to preprocessing, let's have a look at the target variable and its classes.
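As a quick sanity check (an illustrative addition, not part of the original workflow), the missing-value count and the class distribution of the target can be verified with a couple of pandas calls:

# Illustrative sanity check: confirm there are no missing values
print(data.isnull().sum())

# Inspect the classes of the target variable 'Type'
print(data['Type'].value_counts())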
def pie_chart(data_val, title):
    fig = px.pie(data, values=data_val, names='Type', title=title,
                 hole=0.5, color_discrete_sequence=px.colors.sequential.RdBu)
    return fig.show()

pie_chart('Vomit', 'Patients with vomiting problem categorized by the type of migraine')

The data shows that almost 47% of the patients who vomit during a migraine belong to the type categorized as aura with migraine. More interesting facts like this can be found in the data; that is left to you.
The target variable needs to be encoded. Here a label encoder is used rather than one-hot encoding or dummy variables, because this is a multiclass problem and the classifier expects the target as a single column of integer class labels.
encoder = LabelEncoder()
data['Type_encode'] = encoder.fit_transform(data['Type'])
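To see which integer corresponds to which migraine type, the encoder's fitted classes can be inspected; a small illustrative addition, not part of the original code:

# Illustrative: map each migraine type to its encoded integer label
label_mapping = dict(zip(encoder.classes_, encoder.transform(encoder.classes_)))
print(label_mapping)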
Split the data into training and testing sets in a 70:30 ratio.
X = data.drop(['Type', 'Type_encode'], axis=1)
y = data['Type_encode']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
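Since the parameters logged later mention stratification, the split could also be stratified on the encoded target so that each migraine type keeps the same proportion in the train and test sets. A minimal variant, not part of the original code:

# Optional variant: stratify the split on the encoded target so class
# proportions are preserved in both the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y
)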
Building the model
For this article, a random forest classifier would do the work.
rf = RandomForestClassifier()
rf.fit(X_train, y_train)
prediction = rf.predict(X_test)

f1 = f1_score(y_test, prediction, average='weighted')
precision = precision_score(y_test, prediction, average='weighted')
recall = recall_score(y_test, prediction, average='weighted')
We will now log these metrics to the Comet server.
params={"random_state":42, "model_type":"random forest classifier", "stratify":True } metrics = {"f1":f1, "recall":recall, "precision":precision } experiment.log_dataset_hash(X_train) experiment.log_parameters(params) experiment.log_metrics(metrics) experiment.log_confusion_matrix(convert_y_test,convert_prediction)
There are two ways these metrics can be logged:
- By storing the values in variables, creating a dictionary from them, and passing that dictionary to 'experiment.log_metrics()'.
- By passing the values directly to the experiment logging function.
Both of these methods are illustrated above: the f1 score, precision and recall are logged with the first method, and the confusion matrix is logged with the second method.
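For illustration, a single metric can also be logged directly without building a dictionary; a minimal sketch using Comet's log_metric call (the metric name here is just an example):

# Log a single metric value directly (second method)
experiment.log_metric("f1_weighted", f1)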
Once the values are logged, the process can be viewed on the web page. Go to the project, and under the experiments panel all the currently running experiments can be viewed.


The dashboard could be viewed in the notebook itself just by using the following code.
experiment.display()
Charts can also be saved on the server. Let's build a chart describing the feature importances according to the random forest classifier.
imp_feature = pd.Series(rf.feature_importances_, index=X.columns)
plt.figure(figsize=(15, 8))
plt.title('Feature Importance of Random Forest')
imp_feature.sort_values(ascending=False).plot(kind='barh')
plt.savefig('/content/drive/MyDrive/Datasets/feature_importance.jpeg')
plt.show()

The chart is first saved to the drive and then uploaded to the server using the image logging method. It can then be viewed under the graphics section of the dashboard.
experiment.log_image('/content/drive/MyDrive/Datasets/feature_importance.jpeg')
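Alternatively, the matplotlib figure can be logged directly without saving it to disk first; a minimal sketch using Comet's log_figure (the figure name is just an example, and the call should be made while the figure still holds the plot, i.e. before it is closed):

# Alternative: log the current matplotlib figure directly to Comet
experiment.log_figure(figure_name="feature_importance", figure=plt.gcf())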
Registry model
Save the trained model and register it under the registry section. This article uses the pickle package to save the model.
import pickle

filename = 'random_forest_model.sav'
pickle.dump(rf, open(filename, 'wb'))
The dump function saves the model with the given file name at the given location.
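To confirm the serialized file works, the model can be loaded back and used for prediction; a small illustrative check, not part of the original article:

# Illustrative check: reload the pickled model and reuse it for prediction
with open(filename, 'rb') as f:
    loaded_rf = pickle.load(f)
print(loaded_rf.predict(X_test[:5]))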
Once the model is saved, upload it to the Comet server.
experiment.log_model('Random forest classifier v1',"/content/random_forest_model.sav")
Then go to the ‘Assets and Artifacts’ section in the experiment panel. Under the assets section, in models, the saved model will be present. To register the model, click on the plus icon and register it. The model can now be monitored.
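Depending on the comet_ml version, registration can also be done from code rather than the UI; a hedged sketch using Experiment.register_model, assuming the model has already been logged under the same name as above:

# Possible alternative (recent comet_ml versions): register the logged model
# in the Comet model registry directly from code
experiment.register_model("Random forest classifier v1")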


At last, exit the server and end all the tasks by using this code.
experiment.end()

Conclusions
MLOps is the most effective method for incorporating ML models into production. Beyond simply deploying models, a fully mature MLOps system with continuous training can lead to more efficient and realistic ML models. With this article, we have understood how to build and monitor ML operations with Comet.