
Build your own orchestration components on Watson Pipelines

Learn how to bring your own code and build custom components on Watson Pipelines

By Tommy Li

Archived content

Archive date: 2024-12-10

This content is no longer being updated or maintained. The content is provided “as is.” Given the rapid evolution of technology, some content, steps, or illustrations may have changed.

Watson Pipelines is an orchestration platform that lets data scientists automate and share their machine learning lifecycles. It is based on the open-source project Kubeflow Pipelines on Tekton and runs on top of Red Hat OpenShift, providing a secure and scalable machine learning orchestration platform for data scientists and enterprises.

Watson Pipelines supports a wide range of data sources, enabling teams to streamline their workflows, and integrates natively with many watsonx services. In addition, it lets data scientists incorporate complex programmatic logic, such as conditions, loops, and sub-graphs, into a single pipeline. As a result, data scientists can develop, automate, manage, and monitor their entire machine learning lifecycle in the same interface.

In the previous article, "Advance machine learning workflows with IBM Watson Pipelines," we explained the core concepts of Watson Pipelines, such as using programmatic logic, sharing global objects, and integrating with pre-defined watsonx and IBM Cloud services. However, the fields of artificial intelligence and machine learning are evolving quickly, and users often need to experiment with cutting-edge technology. Therefore, it's important to let data scientists bring their own code and integrate with the latest technology on a reliable cloud platform.

Enterprises can use IBM watsonx to deploy and embed AI across their business, manage all data sources, and accelerate responsible AI workflows all in one platform. Watson Pipelines is part of the Watson Studio service offering, but it is also now available as part of the new watsonx.ai platform.

Watson Pipelines now provides various ways for users to bring their own code and create custom integrations with all the new watsonx services. In this article, we will describe the different ways of bringing your own code and building custom components on Watson Pipelines.

Bash Scripts

Users can use the Run Bash Script node in Watson Pipelines to run an inline Bash script that automates a function or process in the pipeline. Bash scripts are great for fast, simple tasks and for running operating system commands. They can be used in many scenarios, such as combining task inputs into a single string or using ssh to connect to a remote machine.
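
For example, a minimal sketch of an inline script for this node might combine two upstream task inputs into a single string. The variable names model_name and model_version are hypothetical pipeline parameters, not part of the node's built-in environment:

    #!/bin/bash
    # Combine two upstream task inputs into a single string.
    # model_name and model_version are hypothetical pipeline parameters
    # that would be mapped to environment variables in the node configuration.
    combined="${model_name}-v${model_version}"
    echo "Combined asset label: ${combined}"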

The Run Bash Script node also comes with the cpdctl tool, which lets users call the watsonx platform API to, for example, look up assets, create deployment spaces, and publish models. As a result, users can integrate any watsonx service through the cpdctl tool. For example, the following image shows cpdctl searching for all notebooks with a given set of tags and aggregating the results into a JSON list as the node output. That output can then be passed to other Watson Pipelines nodes for further analysis.

Watson Pipelines Bash Script
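
A rough sketch of such a script, assuming cpdctl is already authenticated against the project, might look like the following; the tag value churn-model is purely illustrative, and the exact flags may vary by cpdctl version:

    #!/bin/bash
    # Search for notebook assets carrying a given tag and emit the
    # matches as JSON, which the node can expose as its output.
    # The tag value "churn-model" is illustrative only, and the
    # flags may differ across cpdctl versions.
    cpdctl asset search \
        --type-name notebook \
        --query 'asset.tags:churn-model' \
        --output json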

Notebooks

Watson Pipelines provides the Run notebook job node to let users run any notebook that is stored on the watsonx platform. Jupyter notebooks are a popular tool for data scientists experimenting with ML models, and they can leverage many popular data and ML libraries such as NumPy, PyTorch, and Spark.

On the watsonx platform, users can seamlessly import their notebooks and run them on demand using Watson Pipelines, bringing their own experimental code without any changes. Furthermore, users can pick the notebook environment, such as Python, R, Spark, or Watson NLP, and use GPU resources billed at a capacity unit-hour (CUH) rate. This can save users significant money on idle GPU resources because a notebook job is charged only while it is running. The image below shows how users can select a runtime environment with two P100 GPU cards in the node configuration for each notebook job.

Watson Pipelines GPU notebook job
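
Inside a notebook that runs as a pipeline job, results can also be handed back to downstream nodes. Below is a minimal sketch using the ibm-watson-studio-pipelines helper library; the API key placeholder, the accuracy value, and the output name are illustrative, so check the library's documentation for the exact usage in your environment:

    from ibm_watson_studio_pipelines import WSPipelines

    # In practice, the API key would come from a secure source
    # rather than being hard-coded; this placeholder is illustrative.
    API_KEY = "<your-api-key>"

    accuracy = 0.92  # illustrative value computed earlier in the notebook

    # Hand the value back to the pipeline so that downstream
    # nodes can consume it as this notebook job's output.
    client = WSPipelines.from_apikey(API_KEY)
    client.store_results({"accuracy": accuracy})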

Custom Components

Users can create their own custom components using the Watson Pipelines SDK and share them with other users within the project. Custom components are very lightweight Python components that are much cheaper and faster to run than notebook jobs because they don't require the Jupyter Notebook packages. This makes custom components best suited for tasks that finish in seconds, since they consume far less capacity unit-hour (CUH) starting up the environment. The image below shows how a custom component that adds two numbers can be easily imported and reused between pipelines.

Watson Pipelines custom components
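
As a rough sketch of the pattern, the add-two-numbers component from the image might be defined and published as follows. The publish call and its parameters follow the general shape of the Watson Pipelines SDK and may differ by SDK version, and the API key and project ID are placeholders:

    from ibm_watson_studio_pipelines import WSPipelines

    # The component body: a plain Python function with type hints,
    # which pipelines can import and reuse.
    def add_two_numbers(a: int, b: int) -> int:
        return a + b

    # Publish the function as a reusable component in the project.
    # The API key and project ID are placeholders, and the exact
    # publish call may differ by SDK version.
    client = WSPipelines.from_apikey("<your-api-key>")
    client.publish_component(
        name="Add two numbers",
        func=add_two_numbers,
        project_id="<your-project-id>",
        overwrite=True,
    )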

Summary and next steps

In this article, we went over the major ways for users to bring their own code into Watson Pipelines. These custom components not only give end users more flexibility to experiment with cutting-edge technology, but also let them orchestrate any watsonx service in Watson Pipelines and complete the end-to-end machine learning lifecycle all in one place.

If you want to learn more about how to run production-level pipeline services for AI and large language models (LLMs), check out the Watson Pipelines documentation.

Watsonx.ai provides new generative AI capabilities, powered by foundation models, together with traditional machine learning capabilities in a powerful platform that spans the AI lifecycle.

Try watsonx.ai, the next-generation studio for AI builders. Explore more articles and tutorials about watsonx on IBM Developer.