How GitHub Got MLOps Right

“DevOps is not a product that you can buy and install.”

Pulkit Agarwal, GitHub.

After a productive and informative Day 1, ADaSci’s Deep Learning Developers Conference is live again. Day 2 of DLDC2020 also had an interesting lineup of speakers, along with a full-day workshop on deep learning with Keras. In an hour-long talk, speakers Pulkit Agarwal and Vinod Joshi of GitHub discussed the various challenges of setting up an ML pipeline.

Pulkit, who is part of the product team at GitHub, began by defining what MLOps is really about and what makes it challenging even for organisations that have already figured out how to work with DevOps.


MLOps comes with the additional challenge of automating the machine learning lifecycle. Usually, more emphasis is placed on models, but Pulkit likened model-building to a small cog in the wheel. For instance, a small local system is not sufficient for training at scale; remote VMs or Spark clusters are essential. Pulkit listed four key challenges one might face while setting up an ML pipeline:

  • Collaboration on code
  • Remote training
  • Model bookkeeping
  • Managing data, code and updates

Model bookkeeping, for example, can cost a project dearly. Developers can lose track of file versions, and deployment becomes chaos. There can be other instances where someone doesn’t know how to write a controller file. Organisations might run into this trivial-sounding yet serious problem sooner or later if attention is not paid to the details.
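To make the bookkeeping problem concrete, here is a minimal sketch of one way to tie a model file to the data and parameters that produced it. It is not something shown in the talk; the function names (`fingerprint`, `make_record`, `same_inputs`) and the record layout are hypothetical, chosen only for illustration.

```python
import hashlib
import json


def fingerprint(payload: bytes) -> str:
    """Return a short, stable SHA-256 digest for some bytes."""
    return hashlib.sha256(payload).hexdigest()[:12]


def make_record(model_bytes: bytes, data_bytes: bytes, params: dict) -> dict:
    """Build one bookkeeping entry tying a model to its data and parameters."""
    # sort_keys makes the digest independent of dictionary ordering
    params_blob = json.dumps(params, sort_keys=True).encode("utf-8")
    return {
        "model": fingerprint(model_bytes),
        "data": fingerprint(data_bytes),
        "params": fingerprint(params_blob),
    }


def same_inputs(a: dict, b: dict) -> bool:
    """True when two records share the same data and parameter digests."""
    return a["data"] == b["data"] and a["params"] == b["params"]
```

Storing such a record next to every trained model would, under these assumptions, let a team answer "which data and settings produced this file?" without relying on file names.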

So how does GitHub get MLOps right? Although Pulkit admits that “easy” in MLOps is a very ambitious goal, the team at GitHub tries their best by incorporating three important components:

  1. ML-optimised compute
  2. Source control
  3. ML-aware CI/CD

For example, the job of the ML-aware CI/CD component is to alert the system whenever the code changes or other updates arrive. While the first half of the talk covered how GitHub made MLOps easy-ish, the second half, helmed by Vinod Joshi, was about how these principles were put to use in building models that increase developer productivity. Vinod elaborated on the various aspects of the ML lifecycle and the importance of building and rebuilding models whenever the data distribution changes.
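The idea of rebuilding a model when the data distribution changes can be sketched with a very simple check. The snippet below is an illustrative assumption, not GitHub's method: it compares the mean of an incoming batch against a reference sample and flags a rebuild when the gap exceeds a chosen number of standard errors. The function name `needs_rebuild` and the default threshold of 3.0 are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence


def needs_rebuild(
    reference: Sequence[float],
    incoming: Sequence[float],
    threshold: float = 3.0,
) -> bool:
    """Flag a rebuild when the incoming mean drifts far from the reference mean."""
    ref_mean = mean(reference)
    ref_std = stdev(reference)  # needs at least two reference points
    if ref_std == 0:
        # A constant reference: any change in the mean counts as drift
        return mean(incoming) != ref_mean
    # Standard error of the incoming batch mean under the reference spread
    std_err = ref_std / (len(incoming) ** 0.5)
    z = abs(mean(incoming) - ref_mean) / std_err
    return z > threshold
```

A real pipeline would likely use a richer drift test over many features; this one-feature version only shows where such a check could sit in a CI/CD step.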

Vinod continued his talk by dissecting a use case in which he and his team worked on a model that tracks developers' coding time. The whole process can be looked at through the lens of a Markov process, where coding and non-coding are the states and the commits are the observations made between them. Because the states themselves are hidden and only the commits are observed, this becomes more of a hidden Markov model.
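As a rough illustration of the hidden Markov framing, the sketch below decodes the most likely sequence of coding and non-coding states from a series of time slots, each marked as having a commit or not, using the standard Viterbi algorithm. All probabilities, the slot-based observation scheme, and the names `STATES`, `START`, `TRANS` and `EMIT` are made-up assumptions for illustration; the talk did not give the team's actual model or parameters.

```python
STATES = ("coding", "not_coding")

# Hypothetical parameters, chosen only to make the example concrete
START = {"coding": 0.5, "not_coding": 0.5}
TRANS = {
    "coding": {"coding": 0.8, "not_coding": 0.2},
    "not_coding": {"coding": 0.3, "not_coding": 0.7},
}
EMIT = {
    "coding": {"commit": 0.6, "no_commit": 0.4},
    "not_coding": {"commit": 0.1, "no_commit": 0.9},
}


def viterbi(observations):
    """Return the most likely hidden state sequence for the observations."""
    if not observations:
        return []
    # best[s] = (probability of the best path ending in s, that path)
    best = {s: (START[s] * EMIT[s][observations[0]], [s]) for s in STATES}
    for obs in observations[1:]:
        step = {}
        for s in STATES:
            # Pick the predecessor state that maximises the path probability
            prob, path = max(
                ((best[p][0] * TRANS[p][s], best[p][1]) for p in STATES),
                key=lambda item: item[0],
            )
            step[s] = (prob * EMIT[s][obs], path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]
```

Calling `viterbi(["commit", "commit", "no_commit"])` returns one state label per time slot. For long sequences, a practical version would work with log probabilities to avoid numerical underflow.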

So what are the implications of such an experiment? 

The notion here was to identify the patterns between commit intervals and productivity. In large organisations, continued Vinod, developers don’t get enough time to code because of other activities such as meetings. For a developer, time spent on coding reflects their job satisfaction, so the insights gathered from this model can have multiple applications within the organisation. Another key application is identifying the right time for code review. If a developer’s coding time is tracked and their peak working hours are identified, code reviews can be scheduled within the time windows when they are most productive.

Team GitHub, in this talk, gave us a glimpse of what it takes to make MLOps easy for an organisation. According to Pulkit, MLOps or DevOps is not just any software but more of a value, a union of people and processes. DevOps is not just any product that one buys and installs. This sums up the ethos that underlies GitHub’s success with ML.

Ram Sagar
I have a master's degree in Robotics and I write about machine learning advancements.
