A scalable machine learning workflow involves several steps and complex computations, including data preparation and preprocessing, training and evaluating models, deploying those models and much more. Prototyping a machine learning model can be fairly simple, but tracking every process in an ad hoc manner eventually becomes hard.
To simplify the development of machine learning models, Google has launched the beta version of Cloud AI Platform Pipelines, which helps deploy robust, repeatable machine learning pipelines with monitoring, auditing, version tracking, and reproducibility. It delivers an enterprise-ready, easy-to-install, secure execution environment for machine learning workflows.
Cloud AI Platform
AI Platform in Google Cloud is a code-based data science development environment that helps machine learning developers, data scientists and data engineers deploy ML models quickly and cost-effectively.
The core tech stack of AI Platform Pipelines supports two SDKs for authoring machine learning pipelines: the Kubeflow Pipelines SDK and the TFX SDK. The Kubeflow Pipelines SDK is the lower-level of the two, enabling direct Kubernetes resource control and simple sharing of containerised components, while the TFX SDK provides a higher-level abstraction with prescriptive but customisable components and predefined ML types. Thus, with AI Platform Pipelines, one can specify a pipeline using the Kubeflow Pipelines (KFP) SDK, or by customising the TensorFlow Extended (TFX) Pipeline template with the TFX SDK.
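To make the idea of a pipeline assembled from components more concrete, here is a minimal, library-free sketch in plain Python. It does not use the Kubeflow Pipelines or TFX APIs. The `Step` class, the `run_pipeline` helper and the step names are hypothetical, chosen only to illustrate how a pipeline can be described as steps with declared dependencies and then run in dependency order.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List


@dataclass
class Step:
    """A hypothetical pipeline step: a name, a function, and upstream step names."""
    name: str
    func: Callable[[Dict[str, object]], object]
    depends_on: List[str] = field(default_factory=list)


def run_pipeline(steps: List[Step]) -> Dict[str, object]:
    """Run steps in dependency order; each step can read earlier steps' outputs."""
    by_name = {s.name: s for s in steps}
    # TopologicalSorter expects a mapping of node -> its predecessors.
    graph = {s.name: set(s.depends_on) for s in steps}
    outputs: Dict[str, object] = {}
    for name in TopologicalSorter(graph).static_order():
        outputs[name] = by_name[name].func(outputs)
    return outputs


# Illustrative steps only: prepare data, "train" a trivial model (the mean),
# then evaluate it (largest absolute error against the data).
steps = [
    Step("prepare", lambda out: [1.0, 2.0, 3.0, 4.0]),
    Step("train", lambda out: sum(out["prepare"]) / len(out["prepare"]), ["prepare"]),
    Step(
        "evaluate",
        lambda out: max(abs(x - out["train"]) for x in out["prepare"]),
        ["prepare", "train"],
    ),
]

results = run_pipeline(steps)
print(results["train"], results["evaluate"])  # prints: 2.5 1.5
```

In a real pipeline each step would typically run as its own containerised component rather than as an in-process function. The sketch only captures the dependency structure.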
There are two main benefits of using AI Platform Pipelines:
- Easy Installation and Management: One can access AI Platform Pipelines by visiting the AI Platform panel in the Cloud Console.
- Easy Authenticated Access: AI Platform Pipelines provides secure and authenticated access to the Pipelines UI via the Cloud AI Platform UI without the need to set up port-forwarding.
AI Platform Pipelines Beta
AI Platform Pipelines includes enterprise features for running machine learning workloads, such as pipeline versioning, automatic metadata tracking of artefacts and executions, cloud logging, visualisation tools, and more. It integrates seamlessly with Google Cloud managed services such as BigQuery, Dataflow, AI Platform Training and Serving, Cloud Functions, and others.
AI Platform Pipelines has two major parts:
- The enterprise-ready infrastructure, integrated with GCP services, for deploying and running structured ML workflows.
- The pipeline tools for building, debugging and sharing pipelines and components.
The beta launch of AI Platform Pipelines adds several new features, including support for template-based pipeline construction, versioning, and automatic artefact and lineage tracking.
These features are described below:
- Build ML Pipeline with TFX Templates: To make it easier for developers to write ML pipeline code, the TFX SDK provides templates, or scaffolds, with step-by-step guidance on building a production ML pipeline for their data. With this feature, one can easily add components to the pipeline and iterate on them.
- Pipelines Versioning: This feature lets a developer manage semantically related workflows together by uploading multiple versions of the same pipeline and grouping them in the UI.
- Artefact and Lineage Tracking: AI Platform Pipelines supports automatic artefact and lineage tracking, powered by ML Metadata, which makes it easy to keep track of the artefacts of an ML pipeline. With lineage tracking, one can see the history and versions of ML models, data and more.
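As a rough illustration of what lineage tracking records, the following toy sketch keeps an in-memory map from each artefact to the artefacts it was derived from. It is not the ML Metadata API. The `LineageStore` class, its methods and the artefact names are all hypothetical, and a real metadata store would also persist executions, types and properties.

```python
from collections import defaultdict
from typing import Dict, List, Set


class LineageStore:
    """Toy in-memory record of which artefacts each execution consumed and produced."""

    def __init__(self) -> None:
        # Maps each artefact to the artefacts it was directly derived from.
        self._parents: Dict[str, Set[str]] = defaultdict(set)

    def record_execution(self, inputs: List[str], outputs: List[str]) -> None:
        """Record one execution: every output is derived from every input."""
        for out in outputs:
            self._parents[out].update(inputs)

    def lineage(self, artefact: str) -> Set[str]:
        """Return all upstream artefacts that contributed to `artefact`."""
        seen: Set[str] = set()
        stack = [artefact]
        while stack:
            current = stack.pop()
            for parent in self._parents.get(current, set()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Hypothetical artefact names for a three-step run.
store = LineageStore()
store.record_execution(["raw_data_v1"], ["examples_v1"])
store.record_execution(["examples_v1"], ["model_v1"])
store.record_execution(["examples_v1", "model_v1"], ["eval_v1"])

print(sorted(store.lineage("model_v1")))  # prints: ['examples_v1', 'raw_data_v1']
```

Walking the map backwards from a model answers the question "which data and intermediate artefacts produced this model?", which is the kind of history the lineage view is described as exposing.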
In a blog post, Anusha Ramesh, Product Manager, TFX, and Amy Unruh, Staff Developer Advocate, further mentioned that more Pipelines features are coming soon, including support for multi-user isolation, workload identity, UI-based setup of off-cluster storage of backend data, easy cluster upgrades and more templates for authoring ML workflows.