GitHub Open-Sources A Series Of GitHub Actions For Automating ML Workflow

GitHub recently announced that developers can now use GitHub Actions for machine learning operations (MLOps) and data science. The software development platform has released a series of GitHub Actions that integrate parts of the data science and machine learning process into a software development workflow.

MLOps is a practice in which data scientists and operations professionals collaborate to manage the production life cycle of machine learning and deep learning models, automating testing, lineage tracking, versioning and the recording of historical information.

Because MLOps is still at a nascent stage, developers and data scientists often have to build these tools from scratch or rely on disparate tools that are decoupled from their code, which leads to poor debugging and reproducibility. The new GitHub Actions are intended to mitigate these issues.

A number of GitHub Actions are currently available for MLOps and data science. Some of them are listed below:

Orchestrating Machine Learning Pipelines:

  • Submit Argo Workflows – This action allows a developer to orchestrate machine learning pipelines that run on Kubernetes.
  • Publish Kubeflow Pipelines to GKE – Kubeflow Pipelines is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers. This action automates the deployment of Kubeflow Pipelines on Google Cloud Platform (GCP).

Jupyter Notebooks:

  • Run Parameterised Notebooks – This action parameterises a Jupyter notebook using papermill, runs it, and lets a developer upload the resulting output as an artifact with the upload-artifact action.
  • Repo2Docker Action – This action builds a Jupyter-enabled Docker image from a GitHub repository and pushes the image to a Docker registry of choice.
  • fastpages – fastpages uses GitHub Actions to simplify the process of creating Jekyll blog posts on GitHub Pages from a variety of input formats. Its features include collapsible code cells that can be open or closed by default, automatically added links to Colab and GitHub, built-in search, and the ability to create posts, including formatting and images, directly from Microsoft Word documents, among others.
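The least obvious step in the parameterised-notebook workflow is the parameterisation itself. papermill looks for a notebook cell tagged `parameters` and injects a new cell, tagged `injected-parameters`, immediately after it, so the supplied values override the defaults before the rest of the notebook runs. The sketch below reproduces that idea with the standard library only, on an in-memory notebook. It is not papermill's implementation: the helper names `inject_parameters` and `run_code_cells`, the `exec`-based runner standing in for a Jupyter kernel, and the example parameters are all hypothetical.

```python
import json
from typing import Any


def inject_parameters(notebook: dict[str, Any], params: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `notebook` with a new code cell of assignments placed
    directly after the first cell tagged "parameters" (papermill-style sketch)."""
    nb = json.loads(json.dumps(notebook))  # deep copy via a JSON round trip
    # repr() is a simplification; papermill uses per-language translators.
    assignments = [f"{name} = {value!r}\n" for name, value in params.items()]
    injected = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["injected-parameters"]},
        "outputs": [],
        "source": ["# Parameters\n", *assignments],
    }
    cells = nb["cells"]
    for i, cell in enumerate(cells):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            cells.insert(i + 1, injected)
            break
    else:
        cells.insert(0, injected)  # no tagged cell found: put parameters first
    return nb


def run_code_cells(nb: dict[str, Any]) -> dict[str, Any]:
    """Execute every code cell in order in one shared namespace (a stand-in
    for a Jupyter kernel) and return that namespace."""
    namespace: dict[str, Any] = {}
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            exec("".join(cell["source"]), namespace)
    return namespace


# A tiny in-memory notebook with hypothetical defaults in a "parameters" cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "execution_count": None, "outputs": [],
         "metadata": {"tags": ["parameters"]},
         "source": ["epochs = 1\n", "batch_size = 32\n"]},
        {"cell_type": "code", "execution_count": None, "outputs": [],
         "metadata": {}, "source": ["result = epochs * batch_size\n"]},
    ],
}

parameterised = inject_parameters(notebook, {"epochs": 5, "batch_size": 64})
executed = run_code_cells(parameterised)
print(executed["result"])  # 320
```

Because the injected cell runs after the defaults, a notebook keeps sensible values for interactive use while a CI run supplies overrides. The sketch ignores details papermill handles, such as replacing a previously injected cell on a second run.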

End-To-End Workflow Orchestration:

  • Examples and templates for using Azure Machine Learning from GitHub Actions – The templates demonstrate the extensive capabilities of combining GitHub Actions with Azure Machine Learning and help manage a machine learning project with automated training and deployment.

Experiment Tracking:

  • Fetch runs from Weights & Biases – Weights & Biases is an experiment tracking and logging system for machine learning that is free for open-source projects; this action fetches experiment runs from it.

In a blog post, Hamel Husain, a machine learning engineer at the code hosting platform, illustrated how developers and data scientists can easily orchestrate a machine learning pipeline to run on their infrastructure, and how an experiment tracking system can be integrated with GitHub Actions to enable MLOps.

Wrapping Up

In November last year, the code hosting platform announced the launch of GitHub Actions and GitHub Packages, which make it easy for developers to automate their software workflows. Following the series of MLOps actions, GitHub also announced the GitHub Super Linter.

The Super Linter is a source code repository packaged as a Docker container and invoked by GitHub Actions. It helps keep code and documentation consistent, making communication and collaboration among developers more productive.

Key features of the Super Linter include:

  • It helps prevent broken code from being uploaded to master branches
  • It helps establish coding best practices across multiple programming languages
  • It builds guidelines for code layout and format
  • It automates the process to help streamline code reviews
  • With these baseline criteria in place, developers can ship better, cleaner and more stable code internally as well as to customers and partners

Ambika Choudhury
A Technical Journalist who loves writing about Machine Learning and Artificial Intelligence. A lover of music, writing and learning something out of the box.