IBM recently launched an end-to-end machine learning pipeline starter kit to help developers and data scientists build machine learning applications and deploy them quickly in a cloud-native environment.
The starter kit is part of the IBM Cloud-Native Toolkit, an open-source collection of assets that provide an environment for developing cloud-native applications for deployment within Kubernetes and Red Hat OpenShift. Assets created with the Cloud-Native Toolkit can be deployed in any cloud or hybrid cloud environment.
The Cloud-Native Toolkit, developed by the IBM Garage, provides a set of accelerators that apply end-to-end open-source patterns, including GitOps, to any coding pattern, enabling developers, administrators, and site reliability engineers to deliver business applications across the entire software development life cycle (SDLC).
According to IBM, the toolkit incorporates best practices that increase a developer’s ability to deliver business value, with an aim to:
- Speed up time to business value
- Reduce risk through consistent delivery of ML models from development to production
- Quickly ramp up development teams on Kubernetes and Red Hat OpenShift
The image below depicts the components of a Cloud-Native Toolkit environment. The environment consists of a Kubernetes or Red Hat OpenShift service deployment cluster, a collection of continuous delivery tools deployed into the cluster, and a set of backend services.
Why an ML starter kit?
According to MarketsandMarkets, the AI infrastructure market is expected to grow from $23.7 billion in 2021 to $79.3 billion by 2026, at a CAGR of 27.3 percent. The increasing need for high computing power, growing adoption of cloud machine learning platforms, and increasingly large and complex datasets are key factors driving the AI infrastructure market.
Integrating machine learning technologies with cloud-native environments is an increasingly common scenario, driven by the use of microservices and the need to scale rapidly. Developers face the dual challenge of building machine learning applications and ensuring they run well in production in cloud-native and hybrid environments.
IBM said moving an application from a Jupyter Notebook to production requires numerous components. These components span a wide range of tasks that developers and administrators have to manage, including microservices frameworks, code analysis support, monitoring and logging, continuous integration, secure access to service credentials, DevOps pipelines, and Kubernetes YAML files.
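To make the "Kubernetes YAML files" task concrete, a deployment manifest for a model-serving microservice might look like the following minimal sketch. The name, image, labels, and port here are illustrative placeholders, not part of the starter kit.

```yaml
# Illustrative Kubernetes Deployment for a model-serving microservice.
# The image reference, labels, and port are placeholders for this example.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: model-service
  template:
    metadata:
      labels:
        app: model-service
    spec:
      containers:
        - name: model-service
          image: registry.example.com/model-service:1.0.0  # hypothetical image
          ports:
            - containerPort: 8080
          readinessProbe:              # health check, as the starter kit emphasises
            httpGet:
              path: /health
              port: 8080
```

Manifests like this are exactly the boilerplate the starter kit aims to generate and manage for you, alongside the CI/CD pipeline definitions.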
The starter kit aligns with popular open-source projects. It enables developers to package a model as a microservice using the MAX Framework and MAX Skeleton, and to build and deploy it on Red Hat OpenShift with continuous integration and continuous delivery, code analysis, logging, API support, and health checks.
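The model-as-a-microservice pattern can be sketched in a few lines. The following is a minimal illustration, not the MAX Framework itself: it assumes Flask is available and uses a placeholder `predict` function in place of a real model, but it shows the two endpoints the pattern relies on, a prediction route and a health check that Kubernetes probes can poll.

```python
# Minimal sketch of serving a model as a microservice with a health check.
# Illustrative only: `predict` is a stand-in for real model inference,
# and the MAX Framework provides a far fuller implementation of this pattern.
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(features):
    # Placeholder for a real model's inference call.
    return {"label": "positive" if sum(features) > 0 else "negative"}

@app.route("/health", methods=["GET"])
def health():
    # Kubernetes liveness/readiness probes can poll this endpoint.
    return jsonify({"status": "ok"})

@app.route("/predict", methods=["POST"])
def predict_route():
    payload = request.get_json(force=True)
    return jsonify(predict(payload["features"]))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

Wrapping the model this way is what lets the rest of the toolkit (CI/CD, logging, health reporting) treat it like any other cloud-native service.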
Steps to use IBM’s ML starter kit
- Set up the environment and create a pipeline for the application as outlined here.
- To verify the pipeline, open the OpenShift web console and select ‘Pipelines.’
OpenShift web console pipeline view (Source: IBM)
- From the deployed pipeline, you can access the object detector application, view the code analysis report, access the artifact repository and container image registry, and view the health report of the application.
- Further, to access the deployed application, select the Developer perspective in the OpenShift console, select your project, select Topology, and verify the application is running. The deployed application will look like the image shown below.
IBM’s starter kit facilitates operationalising and industrialising AI-powered applications, making them production-ready, using Red Hat OpenShift and open-source technologies. It expedites development, deployment, and innovation with a set of opinionated approaches and tools.
Last month, AWS launched a beginner’s guide to using Amazon SageMaker to build, train, and deploy a machine learning (ML) model with the XGBoost ML algorithm. As a fully managed service, Amazon SageMaker offers developers and data scientists the ability to build, train, and deploy machine learning models quickly.
Similarly, Microsoft has launched the Azure Machine Learning SDK, with which developers can use ML pipelines to create a workflow that stitches together various ML phases, and later publish that pipeline for others to access or share. Check out Tutorial: Build an Azure Machine Learning pipeline for batch scoring or Use automated ML in an Azure Machine Learning pipeline in Python.
Google has released TensorFlow Extended (TFX), an open-source project. TFX offers components that you can use, via the TFX SDK, to ingest and transform data, train and evaluate a model, deploy a trained model for inference, and more. To get started building TFX pipelines, check out the TFX pipeline on Google Cloud guide.