Version Control For ML Models, Explained

Version Control

Version control is the part of software configuration management that keeps track of changes to documents, computer programs, websites, and other files.

For example, version control keeps track of changes to source code. When mistakes slip in, which usually happens when more than one person works on the same project, it protects the code from the unintended consequences of human oversight, because every change is recorded and can be reverted.

While building a machine learning model, a developer must be able to answer questions such as which dataset was used to train the model, which hyperparameters were set, which pipeline created the model, and which version of the model was last deployed. This calls for applying version control to machine learning models.
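As a minimal sketch of what recording those answers could look like, the snippet below builds one record per training run using only the Python standard library. The function names, field names, and sample values (`build_run_record`, `pipeline-v3`, `model-v7`, and so on) are hypothetical and chosen for illustration; they are not taken from any particular tool.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest that identifies this exact content."""
    return hashlib.sha256(data).hexdigest()


def build_run_record(dataset: bytes, hyperparameters: dict,
                     pipeline_version: str, deployed_version: str) -> dict:
    """Collect the facts a developer may later be asked about a model."""
    return {
        "dataset_sha256": fingerprint(dataset),   # which data trained it
        "hyperparameters": hyperparameters,       # how it was configured
        "pipeline_version": pipeline_version,     # what code produced it
        "deployed_version": deployed_version,     # what is currently live
    }


# Hypothetical values, for illustration only.
record = build_run_record(
    dataset=b"feature,label\n0.1,0\n0.9,1\n",
    hyperparameters={"learning_rate": 0.01, "epochs": 20},
    pipeline_version="pipeline-v3",
    deployed_version="model-v7",
)

# sort_keys gives a stable text form, so diffs between runs stay readable.
print(json.dumps(record, indent=2, sort_keys=True))
```

Because the record is plain JSON, it can be committed next to the training code, and two runs can be compared with an ordinary text diff.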

Version control frameworks allow developers to look at the records, identify differences, and merge changes wherever necessary. Versioning helps in monitoring applications and ensuring quality. It also makes it easy for new team members to download the current version and follow its progress.

Why Version Control 

  • The accuracy of a model varies as you update and tinker with different parts of it. With versioning, developers can compare candidates, pick the best model, and understand its tradeoffs.
  • A machine learning model can fall flat for several reasons, for example after adding more data or incorporating performance improvement measures. In case of such failures, model versioning helps in quickly reverting to the previous working version.
  • Machine learning models can be very complex. Factors such as datasets, training and testing, and frameworks, among others, account for a model’s success. Version control helps in tracking these dependencies.
  • Major updates to machine learning models are not usually rolled out at once. To ensure better performance and fault tolerance, ML models are released in phases. Versioning allows the deployment of the right versions at the right time.
  • Model versioning is an essential component of AI/ML governance for organisations to control access, implement policy, and track model activity.
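Several of these points (comparing versions, reverting after a failure, releasing in phases) can be sketched with a toy in-memory registry. This is an illustrative sketch under simple assumptions, not the API of any real model registry: `ModelRegistry`, `route`, and the version names and metric values are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ModelRegistry:
    """Toy registry; a real one would persist artifacts and metadata."""
    versions: dict = field(default_factory=dict)  # version -> metrics
    history: list = field(default_factory=list)   # deployments, oldest first

    def register(self, version: str, metrics: dict) -> None:
        if version in self.versions:
            raise ValueError(f"version {version!r} already registered")
        self.versions[version] = metrics

    def deploy(self, version: str) -> None:
        if version not in self.versions:
            raise KeyError(version)
        self.history.append(version)

    @property
    def live(self):
        """Currently deployed version, or None if nothing is deployed."""
        return self.history[-1] if self.history else None

    def rollback(self) -> str:
        """Drop the latest deployment and return the one before it."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier deployment to roll back to")
        self.history.pop()
        return self.history[-1]

    def best(self, metric: str) -> str:
        """Return the registered version with the highest value of metric."""
        return max(self.versions, key=lambda v: self.versions[v][metric])


def route(user_id: str, canary: str, stable: str, canary_percent: int) -> str:
    """Phased rollout: send a fixed share of users to the canary version."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return canary if bucket < canary_percent else stable


registry = ModelRegistry()
registry.register("v1", {"accuracy": 0.91})  # hypothetical metric values
registry.register("v2", {"accuracy": 0.88})
registry.deploy("v1")
registry.deploy("v2")
registry.rollback()                # v2 underperformed, so revert
print(registry.live)               # v1
print(registry.best("accuracy"))   # v1
```

Hashing the user ID in `route` keeps the assignment deterministic, so a given user sees the same model version for the whole phase of the rollout.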

Tools 

Git: Git is the standard version control system used across the board to track software development and deployment. Git tracks changes made to the code and helps in implementing, storing, and merging changes.

That said, Git also comes with a few drawbacks for machine learning work. It is a challenge to keep all the folders in sync in Git, and model checkpoints and data take up the bulk of the space. As an alternative, many users store datasets on cloud storage such as Amazon S3, keep reproducible code in Git, and generate models on the fly. But working with multiple datasets breeds confusion. Further, poorly documented data changes and upgrades can leave a model without its context.

DVC: Data Version Control (DVC) is built to work alongside Git. It combines Git with ML-specific functionality for data management. DVC can run on top of any Git repository and is compatible with any standard Git server or provider. DVC also offers the advantages of a distributed version control system, such as lock-free operation, local branching, and versioning.
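The general idea of keeping large data out of Git while still versioning it can be sketched as follows: the data file goes into a content-addressed cache, and only a small pointer file is committed. This is a simplified illustration of the concept, not DVC's actual implementation or file format; `track`, `restore`, and the `.pointer.json` suffix are hypothetical names.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into the cache under its hash and write a pointer."""
    digest = hashlib.sha256(data_file.read_bytes()).hexdigest()
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / digest
    if not cached.exists():            # identical content is stored once
        shutil.copy2(data_file, cached)
    pointer = data_file.with_name(data_file.name + ".pointer.json")
    pointer.write_text(json.dumps({"sha256": digest}))
    return pointer                     # small file, suitable for Git


def restore(pointer: Path, cache_dir: Path, target: Path) -> None:
    """Recreate the data file that the pointer refers to."""
    meta = json.loads(pointer.read_text())
    shutil.copy2(cache_dir / meta["sha256"], target)


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    data = workdir / "train.csv"
    data.write_text("x,y\n1,2\n")
    pointer = track(data, workdir / "cache")

    data.write_text("x,y\n9,9\n")      # simulate an accidental change
    restore(pointer, workdir / "cache", data)
    restored_text = data.read_text()

print(restored_text == "x,y\n1,2\n")   # True
```

In this sketch the cache is a local folder. Pointing it at shared or cloud storage would address the situation described above, where datasets live outside Git while the code stays in it.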

Pachyderm: Pachyderm brings robust data versioning and data lineage to the machine learning loop. It also provides a flexible pipeline system that can use any tool or framework in the transformation steps. Pachyderm uses containers to execute the different pipeline steps and solves data provenance issues by tracking data commits and optimising the pipeline.

Machine learning metadata (MLMD): It is a recently introduced library from the TensorFlow team for tracking the full lineage of the entire ML workflow. The complete lineage includes steps such as data ingestion, preprocessing, validation, training, and deployment. MLMD can be used to trace bad models back to the datasets they were built from.
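To illustrate what tracing a model back to its datasets involves, here is a minimal lineage sketch in plain Python. It deliberately does not use the MLMD API; the graph, the artifact names, and the `raw:` prefix convention are all assumptions made up for this example.

```python
# Each artifact maps to the artifacts it was derived from.
# All names here are hypothetical.
lineage = {
    "model:v7": ["features:v2", "config:lr-0.01"],
    "features:v2": ["raw:clicks", "raw:purchases"],
    "raw:clicks": [],
    "raw:purchases": [],
    "config:lr-0.01": [],
}


def upstream(artifact: str, graph: dict) -> set:
    """Return every artifact that the given one depends on, at any depth."""
    seen = set()
    stack = [artifact]
    while stack:
        current = stack.pop()
        for parent in graph.get(current, []):
            if parent not in seen:     # visit each artifact only once
                seen.add(parent)
                stack.append(parent)
    return seen


def source_datasets(model: str, graph: dict) -> list:
    """Return the raw datasets that fed into a model, sorted by name."""
    return sorted(a for a in upstream(model, graph) if a.startswith("raw:"))


print(source_datasets("model:v7", lineage))
# ['raw:clicks', 'raw:purchases']
```

Once every step of the workflow records its inputs and outputs like this, finding the data behind a misbehaving model becomes a graph walk rather than guesswork.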

Shraddha Goled

I am a technology journalist with AIM. I write stories focused on the AI landscape in India and around the world with a special interest in analysing its long term impact on individuals and societies. Reach out to me at shraddha.goled@analyticsindiamag.com.
