Data privacy using watsonx.data with IBM Knowledge Catalog - IBM Developer

Tutorial

Data privacy using watsonx.data with IBM Knowledge Catalog

Integrate on-premises watsonx.data with IBM Knowledge Catalog so that you can test data protection rules and governance policies on your own.

By

Dinesh Kulkarni

IBM Cloud Pak for Data provides a data fabric solution for faster, trusted AI outcomes by connecting the right data, at the right time, to the right people, from anywhere it’s needed.

It's a single, unified platform that spans hybrid and multicloud environments to ingest, explore, prepare, manage, govern, and serve petabyte-scale data for business-ready AI.

IBM watsonx.data is an open, hybrid, and governed data store that enables you to scale analytics and AI with all of your data, wherever it resides, through:

  • Open formats to access all of your data through a single point of entry and share a single copy of data across your organization and workloads, without needing to migrate or re-catalog, reducing the extract, transform, and load (ETL) process and data duplication
  • Integrated vectorized embedding capabilities to prepare your data for retrieval-augmented generation (RAG) or other machine learning and generative AI use cases (in tech preview)
  • Generative AI-powered conversational interfaces to easily find, augment, and visualize data and unlock new data insights, with no SQL required (in tech preview)
  • Integration with existing databases, tools, and modern data stacks
  • Hybrid deployment options as fully managed SaaS on IBM Cloud and Amazon Web Services (AWS) or self-managed containerized software on premises

IBM Knowledge Catalog is data governance software that provides a data catalog to automate data discovery, data quality management, data lineage, and data protection. The cloud-based enterprise metadata repository activates information for AI, machine learning, and deep learning supported by active metadata. Access, curate, categorize, and share data, knowledge assets, and their relationships wherever they reside. You can use IBM Knowledge Catalog for IBM Cloud Pak for Data to deliver business-ready data to feed AI and analytics projects.

This tutorial explains how you can integrate on-premises IBM Cloud Pak for Data with on-premises IBM watsonx.data to demonstrate data protection and data governance on personally identifiable information (PII).

Prerequisites

To follow this tutorial, you need:

Estimated time

It should take you approximately 60 minutes to complete this tutorial.

Solution architecture

The following image shows an example of the solution architecture.

Solution architecture

Steps

Step 1. Create catalog, business terms, categories, and rules

To begin:

  1. Log in to the IBM Cloud Pak for Data homepage with the admin_id/xxxx credentials.

    All Catalogs

  2. Navigate to Catalogs > All catalogs while remaining in the admin role.

  3. Click Create Catalog.

    Your catalogs dashboard

  4. Enter a name for your catalog (for example, IKC – WXD Integration).

  5. Select Enforce data protection and data location rules.
  6. Select Update original assets.

    New catalog

    After the catalog is created, proceed to the next steps. You will reference the same catalog when publishing the data assets.

    For the business glossary for the financial domain, you use the IBM Knowledge Accelerator for Financial Services, which is a comprehensive resource that encompasses a rich array of terms and concepts. It provides a nuanced understanding of the intricate nature of the business information that is handled by financial institutions in their daily operations. You can create additional categories when required, based on your data platform.

  7. Navigate to Governance > Categories.

    Categories

    You can view the populated business glossary for all of the required categories.

    All Categories

  8. Navigate to Governance > Business terms.

    Business terms

  9. Navigate to Governance > Classifications.

    Classifications

  10. Navigate to Governance > Rules.

    Rules

Step 2. Connect and import the table metadata from the lakehouse to IBM Knowledge Catalog

  1. Log in to IBM Cloud Pak for Data using the admin user name and password.

    IBM Cloud Pak for Data log in screen

  2. Click Catalogs > All catalogs in the left pane.

    All catalogs

    The screen lists the available catalogs.

  3. Click New catalog in the upper right to create a new catalog.

    Creating a new catalog

  4. Enter a catalog name, a catalog description, and select Enforce data protection rules in the New Catalog creation screen. Then, click Create.

    New catalog

    The catalog is created, and you are taken to the catalog page.

  5. Add a new Presto connection to the catalog by clicking Add to catalog, then selecting Connection.

    Adding new Presto connection

  6. Search for and select the watsonx.data connection type from the options.

    Selecting watsonx.data connection

  7. Enter the required details for the Presto connection (shown in Step 3), then test the connection.

Step 3. Get the connection parameters for the IBM Knowledge Catalog watsonx.data connector

This section describes how to get the connection parameters for the IBM Knowledge Catalog watsonx.data connector to connect to an on-premises lakehouse instance. You'll enter the following parameters:

  • Hostname: URL of IBM Cloud Pak for Data
  • Port number: 443
  • instance_id: Take the instance ID from the on-premises watsonx.data lakehouse console
  • instance_name: lakehouse
  • CRN: Same value as the instance ID (for on-premises)
  • Select the connect to watsonx cpd checkbox
  • Username: Lakehouse user name
  • Password: Lakehouse password
  • Clear the validate SSL cert checkbox
  • Select the engine is SSL enabled checkbox
  • Engine hostname: Get the value from the lh-console UI
  • Engine id: Get the value from the lh-console UI
  • Engine Port: 443
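
It can help to collect the parameters above in one place before you fill in the connection form. The following is a minimal Python sketch, not part of the connector: the class name, field names, helper function, and example values are all hypothetical, and the checks only reflect the fixed values that this tutorial states (port 443, and a CRN equal to the instance ID).

```python
from dataclasses import dataclass
from urllib.parse import urlparse


def hostname_from_url(url: str) -> str:
    """Return the host part of a URL, such as the Cloud Pak for Data console URL."""
    host = urlparse(url).hostname
    if host is None:
        raise ValueError(f"no hostname found in {url!r}")
    return host


@dataclass
class WxdConnectionParams:
    """Values to type into the connection form (illustrative names only)."""
    hostname: str
    instance_id: str
    username: str
    password: str
    engine_hostname: str
    engine_id: str
    port: int = 443
    engine_port: int = 443
    instance_name: str = "lakehouse"
    validate_ssl_cert: bool = False  # tutorial: validate SSL cert is unchecked
    engine_ssl_enabled: bool = True  # tutorial: engine is SSL enabled is checked

    @property
    def crn(self) -> str:
        # For on-premises, the tutorial uses the instance ID as the CRN.
        return self.instance_id

    def problems(self) -> list[str]:
        """Return readable issues; an empty list means the values look complete."""
        issues = []
        for name in ("hostname", "instance_id", "username", "password",
                     "engine_hostname", "engine_id"):
            if not getattr(self, name).strip():
                issues.append(f"{name} is empty")
        if self.port != 443 or self.engine_port != 443:
            issues.append("this tutorial uses port 443 for the platform and the engine")
        return issues


# Placeholder values; replace them with the ones gathered in the steps that follow.
params = WxdConnectionParams(
    hostname=hostname_from_url("https://cpd.example.com/zen/#/homepage"),
    instance_id="1234567890",
    username="lakehouse_user",
    password="change-me",
    engine_hostname="engine.example.com",
    engine_id="presto-01",
)
print(params.problems())
```

The sketch performs no network calls. The Test connection button in the connection form remains the real check.
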
  1. Get the hostname from the on-premises lakehouse URL.

    URL and hostname

  2. Use 443 for the port number.

  3. Get the Instance ID from the on-premises lakehouse URL. Click the Instance ID icon.

    Instance_id1

  4. Copy the Instance ID.

    Instance-ID2

  5. Get engine details.

    Engine details

  6. Get engine ID and URL.

    Engine ID

  7. Download the certificate from the lakehouse console using a web browser, and enter it in the Engine SSL Certificate field.

  8. Return to the Catalog, and create the connection with all the details that were gathered in the previous steps.

    Entering details

  9. Test the connection by clicking Test Connection, then click Create in the lower-right corner.

    Testing the connection

  10. The connection is successfully added to the Catalog. For simplicity, the connection is renamed to wd-wKC-connection.

    D-WKC-CONNECTION

  11. Click Add to Catalog in the upper right, and select Connected asset.

    Clicking connected asset

  12. In the connected asset, click Select source.

    Selecting the source

  13. Navigate to the table that you want to import by following the path Catalog > schema > table. Select the table, and click Add.

    Adding the table

  14. On the asset page, verify the Data Class against each column assigned during the metadata import. Note that you can define rules in IBM Knowledge Catalog based on these data classes.

    Verify data class

  15. Click the profile tab to change a Data Class if needed. Changing the data class is not mandatory. Do it only if the populated Data Class is wrong or needs to be changed.

    Verify data profile
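
Because rules are defined against data classes, it can be useful to list which columns a rule would touch before you write it. The following minimal Python sketch assumes a hypothetical table: the column names and data class labels are invented for illustration and are not output from IBM Knowledge Catalog.

```python
def columns_in_scope(column_classes: dict[str, str], rule_classes: set[str]) -> list[str]:
    """Return the columns whose assigned data class is targeted by a rule, in table order."""
    return [col for col, data_class in column_classes.items() if data_class in rule_classes]


# Hypothetical profile of an imported table: column name -> assigned data class.
profile = {
    "customer_id": "Identifier",
    "email": "Email Address",
    "ssn": "US Social Security Number",
    "balance": "Quantity",
}

# Data classes that a hypothetical masking rule targets.
print(columns_in_scope(profile, {"Email Address", "US Social Security Number"}))
```

If a column that you expect to be protected is missing from the result, its data class is likely wrong, which is the case where changing it on the profile tab is warranted.
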

Step 4. Create a new user in IBM Knowledge Catalog and assign the user as the owner of the asset or table

When data protection or data governance rules apply to an asset or table, only the owner of the asset sees the unmasked data. The data is masked for all other users. To create a user in IBM Knowledge Catalog and assign the ownership of the asset to that user, complete the following steps.

  1. Open the IBM Cloud Pak for Data URL.

    Login screen

  2. From the homepage, click the Access control option under Administration in the left menu.

    Access control

  3. Add the user and provide all the details.

    Adding users

  4. Add the user to the Catalog.

    All catalogs

  5. Go to the Catalog, and select Add user.

    Add user to catalog
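
The ownership behavior described at the start of this step can be illustrated with a small Python sketch. It is a conceptual model only, not how IBM Knowledge Catalog or watsonx.data implement masking. The function name, the user names, and the fixed redaction string are assumptions.

```python
REDACTED = "XXXXXXXX"  # assumed placeholder; the real masked value depends on the rule


def view_row(row: dict[str, str], protected: set[str], user: str, owner: str) -> dict[str, str]:
    """Return the row as `user` would see it.

    The asset owner sees the raw values. Every other user sees the
    protected columns replaced by a redaction placeholder.
    """
    if user == owner:
        return dict(row)
    return {col: (REDACTED if col in protected else value) for col, value in row.items()}


row = {"name": "Ada", "email": "ada@example.com"}
print(view_row(row, {"email"}, user="asset_owner", owner="asset_owner"))
print(view_row(row, {"email"}, user="analyst", owner="asset_owner"))
```
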

Step 5. Configure IBM Knowledge Catalog in watsonx.data UI

  1. Log in to IBM Cloud Pak for Data by using the admin user name and password.

    Login screen

  2. From the service instances, open the instance for the lakehouse.

    Lakehouse instance

  3. Open the lakehouse console UI, and select Access control in the left pane.

    Lakehouse URL

  4. Click the Integration tab, then click Integrate service.

    Integrate service

  5. Create a Zen API Key.

  6. The Integration screen opens.

    Integration screen

The following image shows a completed integration screen.

Completed integration screen
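
For reference, Cloud Pak for Data REST calls that authenticate with a platform API key typically send the base64 encoding of `username:api_key` in an `Authorization: ZenApiKey <token>` header. The following Python sketch builds that value under the assumption that this scheme applies to your release. The user name and key are placeholders, and you should confirm in the integration dialog whether it expects the raw key or the encoded value.

```python
import base64


def zen_api_key_token(username: str, api_key: str) -> str:
    """Base64-encode 'username:api_key' (assumed ZenApiKey token format)."""
    raw = f"{username}:{api_key}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def zen_auth_header(username: str, api_key: str) -> dict[str, str]:
    """Build the Authorization header for a REST call that uses the token."""
    return {"Authorization": f"ZenApiKey {zen_api_key_token(username, api_key)}"}


# Placeholder credentials for illustration only.
print(zen_auth_header("admin", "abc123"))
```
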

Step 6. Verify the masking functions per the rules in IBM Knowledge Catalog

  1. Log in to IBM Cloud Pak for Data using the admin user name and password.

    CP4D login screen

    Service instances

  2. From the homepage, click Rules in the left menu, and verify that rules exist for the fields that you want to check.

    Rules

  3. Log in to the Lakehouse Engine SQL Editor as a user who is not the owner of the asset in IBM Knowledge Catalog.

    Lakehouse query screenshot:

    Lakehouse query non-admin user

    Catalog screenshot as non-admin user:

    Asset non-admin user

  4. Validate with the admin user.

    Catalog data using admin:

    Asset with admin user

    watsonx.data data using admin for the same table:

    Lakehouse query admin user
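
The comparison in this step can also be done programmatically after you export the two query results. The following Python sketch assumes that both users ran the same query with the same row order (for example, with an ORDER BY clause) and that each result is a list of dictionaries keyed by column name. The sample rows and the masked value are invented for illustration.

```python
def masked_columns(owner_rows: list[dict], other_rows: list[dict]) -> list[str]:
    """Return the columns whose values differ in at least one row between the two results."""
    if len(owner_rows) != len(other_rows):
        raise ValueError("both result sets must have the same number of rows")
    differing = set()
    for owner_row, other_row in zip(owner_rows, other_rows):
        for col, value in owner_row.items():
            if other_row.get(col) != value:
                differing.add(col)
    return sorted(differing)


# Hypothetical results for the same table as seen by the owner and by another user.
as_owner = [{"id": 1, "email": "ada@example.com"}, {"id": 2, "email": "bob@example.com"}]
as_other = [{"id": 1, "email": "XXXXXXXX"}, {"id": 2, "email": "XXXXXXXX"}]

print(masked_columns(as_owner, as_other))
```

An empty result means that no column differs, which suggests that the rules are not being applied to the second user.
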

Summary

In this tutorial, you learned how to integrate on-premises watsonx.data with IBM Knowledge Catalog, a crucial step that lets you test data protection rules and governance policies on your own. The hands-on experience helps you grasp the details of lakehouse technology more effectively and explore its potential applications within your specific context.

Learn more about watsonx.data. You can also start your free watsonx.data trial.