This is a cache of https://developer.ibm.com/articles/datacap-watsonxai-custom-actions/. It is a snapshot of the page as it appeared on 2025-11-24T08:03:12.514+0000.
Boosting Document Intelligence - IBM Developer

Article

Boosting Document Intelligence

Integrating IBM Datacap with watsonx.ai using custom actions

By

Ela Dixit,

Deepa Amasar

In the age of AI-driven automation, intelligent document processing is a critical enabler for digital transformation. While IBM Datacap is a powerful tool for capturing and extracting structured and semi-structured data, integrating it with IBM watsonx.ai unlocks an even deeper understanding of unstructured content.

In this tutorial, we will walk you through how to integrate IBM Datacap with watsonx.ai foundation models using a custom action. This integration enables flexible and intelligent data extraction from complex documents like invoices, ID cards, or contracts.

Why integrate Datacap and watsonx?

Datacap excels at document capture, classification, and rules-based extraction. But when documents vary in layout or contain nuanced information (such as handwritten notes or context-dependent fields), traditional approaches hit a limit.

That’s where the large language models (LLMs) of watsonx.ai come in. These models allow natural language prompts to interpret document content, identify relevant data, and return structured insights.

By combining these two products, you get:

  • Zero-template extraction using generative AI
  • Multi-format understanding (for example, images and scanned PDFs)
  • Rapid adaptability to new document types

Integration architecture

This integration uses a simple architecture:

  1. Datacap captures the document image.
  2. A custom action calls a Python script.
  3. The script sends the image to watsonx.ai using an API.
  4. Watsonx.ai returns extracted data based on a prompt.
  5. Datacap stores the extracted results into its data fields.

Here is the flow explained:

integration flow

To build this solution, you need these prerequisites:

Python script to call watsonx.ai

Here’s a sample Python script that encodes the image captured by Datacap and sends it to watsonx.ai for field extraction.

  1. Connect to watsonx.ai using credentials and project info.
  2. Read images from a folder path, where Datacap images are stored
  3. Process each image via watsonx.ai (through InvoiceProcessor Class)
  4. Converts results to a DataFrame (df_invoice).
  5. Saves extracted data to a text file.

python script

In this script, you would replace the endpoint and API key with your own watsonx.ai deployment details, prompt query is also mentioned, to extract the information from the invoice document.

InvoiceProcessor class

The model id defines which foundation model to use for generating the response to the prompt (extraction of the invoice information).

prompt template

IBM watsonx.ai provides access to several powerful models. You’re not limited to one model—this architecture is flexible. You can change the model id based on your document type and processing needs:

  • Use Granite models when:

    • You already have clean OCR output from Datacap
    • The document is primarily text-based
    • You want structured field extraction or summaries
  • Use meta-llama when:

    • You need powerful general-purpose reasoning
    • You’re processing mixed or complex language input
    • You want multilingual support or advanced prompt logic
  • Use Vision LLaMA when:

    • The document includes handwriting, images, or scanned forms
    • OCR isn’t reliable or loses formatting
    • You need to extract data from visual layout, such as: Photos on IDs, Table positions, Signatures, or stamps

Be sure the model you choose is enabled in your watsonx.ai project, and adjust your prompt style accordingly.

Custom actions in Datacap

Watsonx.ai returns the data and saves it as a structured .txt file in the dataframe format. A custom action is written to these fields:

  1. Set Invoice Number, Date, and PO Number, Net Amount to Page level fields. Custom action invokes the python script which will extract the information from the invoices and then sends the information back to Datacap for assigning values back to fields.

    datacap fields 1

  2. Datacap custom action also creates the line items dynamically based on the invoice line items and assigns values to the fields as shown in the image.

    datacap fields 2

Workflow integration

Attach the custom action rule to the document level in the application workflow. Attach it after the scanning step.

workflow

Summary and next steps

By integrating IBM Datacap with watsonx.ai using a custom action, we unlock a new level of intelligence and flexibility in document processing. This approach not only simplifies data extraction from complex and variable document layouts but also enables dynamic behaviors, such as populating line items from structured AI output, that were previously difficult to implement without rigid templates.

You can modernize your document workflows with minimal effort and maximum impact by using lightweight Datacap custom actions, the flexible watsonx.ai LLM prompt-response interface, and the powerful Datacap field and table manipulation capabilities.

This solution is adaptable across use cases—invoices, claims, contracts, and more—making it a foundational pattern for enterprises embracing AI-first automation.