Tutorial

Develop an AI-powered regulatory compliance assistant with Quarkus, Docling, and LangChain4j

Integrating embedding models, vector stores, and AI prompts for regulatory impact analysis

Modern financial institutions operate in one of the most heavily regulated environments in the world. Every month, regulators such as the SEC, FINRA, FCA, ESMA, and OCC publish Regulatory Change Bulletins announcing new rules, amendments, reporting obligations, and enforcement guidance. These bulletins are dense, technical, and filled with cross-references to prior regulations, effective dates, and jurisdiction-specific requirements.

Today, compliance teams must manually read dozens of these documents, determine which changes apply to their organization, track deadlines, and assess operational impact. This process is slow, error-prone, and costly. Missing a single requirement can result in fines, audit findings, or reputational damage.

This tutorial shows how to transform this manual workflow into an AI-assisted experience by building an enterprise RAG application, a Regulatory Change Impact Assistant. This assistant can instantly retrieve, summarize, and reason over regulatory bulletins with audit-grade traceability.

RAG is a technique that enhances LLM responses by retrieving relevant information from a knowledge base before answers are generated. Unlike traditional chatbots that rely solely on their training data, RAG-based assistants can query documents that were added after the model was trained, reference specific sources (which is crucial for regulatory compliance), reduce hallucinations by grounding responses in actual document content, and maintain audit trails by tracking which documents informed each answer.

The IBM Granite family of foundation models provides state-of-the-art capabilities for both text generation and embeddings, which makes them a great fit for enterprise-grade RAG applications like this Regulatory Change Assistant. The Granite models are built for enterprise workloads, which suits regulatory and compliance tasks well.

In this tutorial, you'll learn about:

  • How to build a RAG application with Quarkus
  • Integrating LangChain4j for AI/LLM capabilities
  • Using the IBM Granite family of foundation models locally through Ollama for privacy and enterprise model fidelity
  • Using pgvector for vector similarity search
  • Processing regulatory documents with Docling
  • Building REST APIs for document-based AI assistants

Prerequisites

Before you start, ensure that you have:

  1. Java 21+ installed:

    java -version
  2. Quarkus CLI 3.30.6 installed, or use the included mvnw wrapper:

    quarkus --version

    To learn more about using Quarkus, check out the Quarkus Basics learning path.

  3. (Optional) Podman (or Docker), for Quarkus Dev Services

    The application uses Quarkus Dev Services, which automatically starts Docling and PostgreSQL in containers. It can also start your model in a llama.cpp container, although some developers find it easier to run a native Ollama installation locally.

  4. (Optional) Ollama

    Downloadable from the Ollama site.

    To pull the required models for the app:

    ollama pull granite4:latest
    ollama pull granite-embedding:latest

    Learn more about the IBM Granite models.

Architecture Overview

The RAG application workflow is simple: when a user asks a question, the system:

  1. Converts the question to a vector embedding
  2. Searches for similar content in the document database
  3. Retrieves the most relevant document segments
  4. Provides these segments to the LLM along with the original question to generate a contextual, cited response.

The application follows a layered architecture:

Architecture diagram

The application provides:

  • Vector Search: Semantic search over regulatory document embeddings by using PostgreSQL with pgvector
  • Document Processing: Automatic ingestion of Regulatory Change Bulletins (PDF, DOCX, HTML) by using Docling
  • Intelligent Analysis: AI-powered answers about regulatory changes, compliance requirements, and impact assessment

Key Components

  1. RegulatoryChangeImpactAssistant: AI service interface that handles user queries about regulatory changes
  2. DocumentRetriever: Performs vector similarity search on regulatory document embeddings
  3. DocumentLoader: Processes and ingests Regulatory Change Bulletins (PDF, DOCX, HTML) on startup
  4. DoclingConverter: Converts regulatory documents to text segments with metadata

Data Flow

Understanding how data moves through the system helps clarify each component's role:

  1. Document Ingestion (Startup):

    • DocumentLoader scans src/main/resources/documents for PDF, DOCX, and HTML files
    • DoclingConverter processes each document, extracting text and preserving page structure
    • Each page becomes a TextSegment with metadata (document ID, page number, type)
    • The embedding model converts each segment into a vector (numerical representation)
    • Vectors and segments are stored in PostgreSQL with pgvector
  2. Query Processing (Runtime):

    • User submits a question via REST API
    • RegulatoryChangeResource forwards the question to RegulatoryChangeImpactAssistant
    • The Assistant triggers DocumentRetriever to find relevant content
    • Query text is converted to an embedding vector
    • Vector similarity search finds the top 5 most relevant document segments (similarity score ≥ 0.7)
    • Retrieved segments are enriched with metadata and formatted by CustomContentInjector
    • The LLM receives the question plus retrieved context and generates a response with citations
  3. Response Generation:

    • The LLM synthesizes the retrieved regulatory content into a coherent answer
    • Citations are automatically included by using the document metadata
    • The response is returned to the user via the REST API

This architecture ensures that every answer is grounded in actual regulatory documents, making it suitable for compliance-critical applications.

Set up the project

You can either follow along with this tutorial or check out the source code project from the GitHub repo.

Step 1. Create the Quarkus project

Create a new Quarkus project with the required extensions:

quarkus create app com.ibm:regulatory-change-assistant \
  --no-code \
  --extensions=rest-jackson,langchain4j-pgvector,langchain4j-ollama,io.quarkiverse.docling:quarkus-docling,hibernate-orm-panache,jdbc-postgresql

This command creates a new Quarkus project with:

  • rest-jackson: Provides RESTEasy Reactive framework with Jackson for JSON serialization. This extension enables building fast, non-blocking REST APIs that can handle concurrent requests efficiently.
  • langchain4j-pgvector: Integrates LangChain4j with PostgreSQL's pgvector extension. This extension allows storing and searching document embeddings directly in your database, eliminating the need for a separate vector database.
  • langchain4j-ollama: Connects LangChain4j to Ollama, enabling you to use local LLMs and embedding models. This extension keeps your data private and reduces API costs while maintaining enterprise-grade model quality.
  • io.quarkiverse.docling:quarkus-docling: Provides document processing capabilities through IBM's Docling service. Docling excels at extracting structured content from PDFs, DOCX files, and HTML while preserving document structure, tables, and metadata.
  • hibernate-orm-panache: Simplifies database operations with an active record pattern. While we don't use it extensively in this tutorial, it's included for potential future enhancements like storing document metadata or user queries.
  • jdbc-postgresql: PostgreSQL JDBC driver required for database connectivity. Quarkus Dev Services will automatically start a PostgreSQL container with pgvector enabled.

After creating the Quarkus project, navigate to it:

cd regulatory-change-assistant

Step 2. Review the project structure

Over the course of this tutorial, you will build the following project structure:

src/main/java/com/ibm/
├── ai/
│   ├── CustomContentInjector.java            # Injects retrieved results into the prompt
│   ├── RetrievalAugmentorSupplier.java       # Enriches the user query
│   └── RegulatoryChangeImpactAssistant.java  # AI service interface
├── api/
│   └── RegulatoryChangeResource.java       # REST endpoint
├── ingest/
│   ├── DocumentLoader.java                  # Document ingestion
│   └── DoclingConverter.java                # Document conversion
└── retrieval/
    └── DocumentRetriever.java              # Vector search retriever

Step 3. Configure the application properties

Create or update src/main/resources/application.properties:

# Database Configuration is optional in development. In production you'd set the following properties:
# quarkus.datasource.db-kind=postgresql
# quarkus.datasource.username=quarkus
# quarkus.datasource.password=quarkus
# quarkus.datasource.jdbc.url=jdbc:postgresql://localhost:5432/regulatory_rag

# Hibernate Configuration
quarkus.hibernate-orm.schema-management.strategy=drop-and-create

# LangChain4j - Ollama Configuration
# Quarkus auto-detects or starts Ollama for you. If you are running it on a different host/port, change it here:
# quarkus.langchain4j.ollama.base-url=http://localhost:11434
quarkus.langchain4j.ollama.chat-model.model-id=granite4:latest
quarkus.langchain4j.ollama.embedding-model.model-id=granite-embedding:latest
quarkus.langchain4j.ollama.timeout=PT60S

# Model temperature should be 0.1-0.3 for regulatory work
quarkus.langchain4j.ollama.chat-model.temperature=0.2

# LangChain4j - pgvector Configuration
# The granite-embedding model produces 384-dimensional vectors, so the store dimension must match
quarkus.langchain4j.pgvector.dimension=384

# Docling Configuration
# Quarkus spins up Docling for you. If you are running it on a different host/port, change it here:
# quarkus.docling.service.url=http://localhost:8081

Code Walkthrough

Now, let's walk through the code for the Regulatory Change Impact Assistant.

Retrieval Augmentor

First, create the retrieval augmentor supplier that connects the document retriever to the AI service.

The RetrievalAugmentorSupplier:

  • Implements Supplier<RetrievalAugmentor> to provide the retrieval augmentor
  • Injects the DocumentRetriever to use for content retrieval
  • Builds a CustomContentInjector with a PromptTemplate that formats the retrieved findings for the LLM.

Create src/main/java/com/ibm/ai/RetrievalAugmentorSupplier.java:

package com.ibm.ai;

import static java.util.Arrays.asList;

import java.util.List;
import java.util.function.Supplier;

import com.ibm.retrieval.DocumentRetriever;

import dev.langchain4j.model.input.PromptTemplate;
import dev.langchain4j.rag.DefaultRetrievalAugmentor;
import dev.langchain4j.rag.RetrievalAugmentor;
import dev.langchain4j.rag.content.injector.ContentInjector;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class RetrievalAugmentorSupplier implements Supplier<RetrievalAugmentor> {

    @Inject
    DocumentRetriever documentRetriever;

    @Override
    public RetrievalAugmentor get() {

        PromptTemplate promptTemplate = PromptTemplate.from(
                """
                        {{userMessage}}

                        Answer using the following information:
                        {{contents}}

                        When citing sources, use the Document Information provided with each content block.
                        Format citations as: [Document: doc_id, Page: page_number]""");

        List<String> metadataKeys = asList(
                "doc_id",
                "page_number",
                "document_type",
                "file_name",
                "retrieval_method",
                "similarity_score",
                "retrieval_timestamp");

        ContentInjector contentInjector = new CustomContentInjector(promptTemplate, metadataKeys);

        return DefaultRetrievalAugmentor.builder()
                .contentRetriever(documentRetriever)
                .contentInjector(contentInjector)
                .build();
    }
}

Next, we need to create the custom ContentInjector that extends DefaultContentInjector to control metadata formatting. The PromptTemplate only receives {{userMessage}} and {{contents}}; metadata formatting happens in the format() method of the CustomContentInjector.

Create src/main/java/com/ibm/ai/CustomContentInjector.java:

package com.ibm.ai;

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import dev.langchain4j.data.document.Metadata;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.input.PromptTemplate;
import dev.langchain4j.rag.content.Content;
import dev.langchain4j.rag.content.injector.DefaultContentInjector;

/**
 * Custom ContentInjector that provides enhanced metadata formatting
 * with better structure for document citations and retrieval information.
 */
public class CustomContentInjector extends DefaultContentInjector {

    private final List<String> metadataKeysToInclude;

    public CustomContentInjector(PromptTemplate promptTemplate, List<String> metadataKeysToInclude) {
        super(promptTemplate, metadataKeysToInclude);
        this.metadataKeysToInclude = metadataKeysToInclude;
    }

    @Override
    protected String format(List<Content> contents) {
        if (contents.isEmpty()) {
            return "";
        }

        // Format each content block with clear separation and numbering
        String separator = "\n\n" + "=".repeat(80) + "\n\n";
        return IntStream.range(0, contents.size())
                .mapToObj(i -> formatContentBlock(contents.get(i), i + 1))
                .collect(Collectors.joining(separator));
    }

    /**
     * Formats a single content block with metadata at the top for clarity.
     */
    private String formatContentBlock(Content content, int blockNumber) {
        TextSegment segment = content.textSegment();
        Metadata metadata = segment.metadata();

        if (metadataKeysToInclude == null || metadataKeysToInclude.isEmpty()) {
            return String.format("--- Content Block %d ---\n%s", blockNumber, segment.text());
        }

        StringBuilder formatted = new StringBuilder();

        // Add block number and separator
        formatted.append("--- Content Block ").append(blockNumber).append(" ---\n");

        // Format document metadata FIRST (for citations) - most important
        String docMetadata = formatDocumentMetadata(metadata);
        if (!docMetadata.isEmpty()) {
            formatted.append("Document Information:\n");
            formatted.append(docMetadata);
            formatted.append("\n");
        }

        // Add the content text
        formatted.append("\nContent:\n");
        formatted.append(segment.text());

        // Format retrieval metadata (for transparency) - less prominent
        String retrievalMetadata = formatRetrievalMetadata(metadata);
        if (!retrievalMetadata.isEmpty()) {
            formatted.append("\n\nRetrieval Information:\n");
            formatted.append(retrievalMetadata);
        }

        return formatted.toString();
    }

    /**
     * Formats document-related metadata (doc_id, page_number, document_type, file_name)
     * in a structured way for citations.
     */
    private String formatDocumentMetadata(Metadata metadata) {
        StringBuilder docInfo = new StringBuilder();

        // Document identification
        if (metadataKeysToInclude.contains("doc_id") && metadata.getString("doc_id") != null) {
            docInfo.append("  Document ID: ").append(metadata.getString("doc_id"));
        }

        if (metadataKeysToInclude.contains("file_name") && metadata.getString("file_name") != null) {
            if (docInfo.length() > 0) docInfo.append("\n");
            docInfo.append("  File Name: ").append(metadata.getString("file_name"));
        }

        if (metadataKeysToInclude.contains("page_number") && metadata.getString("page_number") != null) {
            if (docInfo.length() > 0) docInfo.append("\n");
            docInfo.append("  Page: ").append(metadata.getString("page_number"));
        }

        if (metadataKeysToInclude.contains("document_type") && metadata.getString("document_type") != null) {
            if (docInfo.length() > 0) docInfo.append("\n");
            docInfo.append("  Type: ").append(metadata.getString("document_type"));
        }

        return docInfo.toString();
    }

    /**
     * Formats retrieval-related metadata (retrieval_method, similarity_score, retrieval_timestamp)
     * for transparency about how the content was found.
     */
    private String formatRetrievalMetadata(Metadata metadata) {
        StringBuilder retrievalInfo = new StringBuilder();

        if (metadataKeysToInclude.contains("retrieval_method") && metadata.getString("retrieval_method") != null) {
            retrievalInfo.append("  Method: ").append(metadata.getString("retrieval_method"));
        }

        if (metadataKeysToInclude.contains("similarity_score") && metadata.getString("similarity_score") != null) {
            if (retrievalInfo.length() > 0) retrievalInfo.append("\n");
            retrievalInfo.append("  Similarity Score: ").append(metadata.getString("similarity_score"));
        }

        if (metadataKeysToInclude.contains("retrieval_timestamp") && metadata.getString("retrieval_timestamp") != null) {
            if (retrievalInfo.length() > 0) retrievalInfo.append("\n");
            retrievalInfo.append("  Retrieved At: ").append(metadata.getString("retrieval_timestamp"));
        }

        return retrievalInfo.toString();
    }
}

Now, we control the exact format of the content that is injected from retrieval. For each document we hand over a structure similar to this:

--- Content Block 1 ---
Document Information:
  Document ID: filename.pdf
  File Name: filename.pdf
  Page: 1
  Type: PDF

Content:
[content text]

Retrieval Information:
  Method: vector_search
  Similarity Score: 0.85
  Retrieved At: 2024-01-09T...

AI service interface

Now, it's time to implement the AI service itself.

The RegulatoryChangeImpactAssistant AI service includes these components:

  • @RegisterAiService: Quarkus extension that generates the implementation.
  • retrievalAugmentor: Specifies which retrieval strategy to use (vector search in this case).
  • @SystemMessage: Defines the assistant's role, scope, and response format.
  • @UserMessage: Marks the user's question parameter.

Create src/main/java/com/ibm/ai/RegulatoryChangeImpactAssistant.java:

package com.ibm.ai;

import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService(retrievalAugmentor = RetrievalAugmentorSupplier.class)
public interface RegulatoryChangeImpactAssistant {

    @SystemMessage("""
            You are a specialized Regulatory Change Impact Assistant for financial institutions.

            Your responsibilities:
            - Answer questions about regulatory changes and compliance requirements from Regulatory Change Bulletins
            - Analyze the impact of regulatory bulletins on business processes and operations
            - Provide guidance on compliance obligations, deadlines, and required actions
            - Explain complex regulatory language in clear, actionable terms
            - Identify cross-references to other regulations and related requirements

            Response format:
            1. Direct answer with regulatory context
            2. Supporting evidence from regulatory bulletins (with citations including document ID and page number)
            3. Impact assessment and compliance recommendations
            4. Relevant deadlines or effective dates if mentioned

            Important guidelines:
            - Always cite your sources using the format: [Document: doc_id, Page: page_number]
            - If you cannot find relevant information in the provided bulletins, clearly state that
            - Refuse questions about topics outside regulatory compliance (e.g., general business advice, product recommendations)
            - Be precise about regulatory requirements and avoid speculation
            - Highlight any jurisdiction-specific requirements or exceptions
            """)
    String chat(@UserMessage String userQuestion);
}

Exercise 1

Let's review the system message. Notice how it:

  • Defines allowed topics (regulatory changes, compliance requirements, impact assessment)
  • Sets strict boundaries (what to refuse)
  • Provides guidance on analyzing regulatory bulletins
  • Structures response format with citations

REST endpoint

The RegulatoryChangeResource REST endpoint:

  • Does not require authentication (it's a demo application)
  • Handles dependency injection of the AI service
  • Is a simple GET endpoint with query parameter
  • Has basic input validation

Create src/main/java/com/ibm/api/RegulatoryChangeResource.java:

package com.ibm.api;

import com.ibm.ai.RegulatoryChangeImpactAssistant;

import jakarta.inject.Inject;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Produces;
import jakarta.ws.rs.QueryParam;
import jakarta.ws.rs.core.MediaType;

@Path("/assistant")
public class RegulatoryChangeResource {

    @Inject
    RegulatoryChangeImpactAssistant assistant;

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public ChatResponse ask(@QueryParam("q") String question) {
        if (question == null || question.trim().isEmpty()) {
            return new ChatResponse("Please provide a question about regulatory changes.");
        }

        String answer = assistant.chat(question);
        return new ChatResponse(answer);
    }

    public static record ChatResponse(String answer) {
    }
}

Exercise 2

Try modifying the endpoint to accept POST requests with a JSON body instead of query parameters.
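
One possible solution adds a POST variant alongside the existing GET method. A minimal sketch, with a hypothetical ChatRequest record for the JSON body:

import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.POST;

// Inside RegulatoryChangeResource:

public static record ChatRequest(String question) {
}

@POST
@Consumes(MediaType.APPLICATION_JSON)
@Produces(MediaType.APPLICATION_JSON)
public ChatResponse askPost(ChatRequest request) {
    if (request == null || request.question() == null || request.question().trim().isEmpty()) {
        return new ChatResponse("Please provide a question about regulatory changes.");
    }
    return new ChatResponse(assistant.chat(request.question()));
}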

Vector search retriever

The DocumentRetriever:

  • Implements ContentRetriever interface from LangChain4j
  • Converts query text to an embedding vector using the embedding model
  • Searches embedding store for similar documents
  • Filters by minimum similarity score (0.7)
  • Enriches results with metadata (retrieval method, similarity score, timestamp)

Create src/main/java/com/ibm/retrieval/DocumentRetriever.java:

package com.ibm.retrieval;

import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.rag.content.Content;
import dev.langchain4j.rag.query.Query;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
import dev.langchain4j.store.embedding.EmbeddingStore;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class DocumentRetriever implements dev.langchain4j.rag.content.retriever.ContentRetriever {

    @Inject
    EmbeddingModel embeddingModel;

    @Inject
    EmbeddingStore<TextSegment> embeddingStore;

    private static final int MAX_RESULTS = 5;
    private static final double MIN_SCORE = 0.7;

    @Override
    public List<Content> retrieve(Query query) {
        String queryText = query.text();
        // Convert query text to embedding vector
        Embedding queryEmbedding = embeddingModel.embed(queryText).content();

        // Search embedding store for similar documents
        EmbeddingSearchRequest searchRequest = EmbeddingSearchRequest.builder()
                .queryEmbedding(queryEmbedding)
                .maxResults(MAX_RESULTS)
                .minScore(MIN_SCORE)
                .build();

        EmbeddingSearchResult<TextSegment> results = embeddingStore.search(searchRequest);

        // Enrich with metadata and return
        return results.matches().stream()
                .map(match -> {
                    TextSegment segment = match.embedded();

                    // Add retrieval metadata
                    dev.langchain4j.data.document.Metadata enriched = dev.langchain4j.data.document.Metadata
                            .from(segment.metadata().toMap());
                    enriched.put("retrieval_method", "vector_search");
                    enriched.put("similarity_score", String.valueOf(match.score()));
                    enriched.put("retrieval_timestamp", Instant.now().toString());

                    TextSegment enrichedSegment = TextSegment.from(segment.text(), enriched);
                    // Return the full segment so the enriched metadata reaches the content injector
                    return Content.from(enrichedSegment);
                })
                .collect(Collectors.toList());
    }
}

Design Decisions for the DocumentRetriever

  • Why page-level segmentation? Regulatory bulletins often reference specific pages, and compliance teams need to verify information at the source. Page-level granularity enables precise citations while keeping context intact.

  • Why the similarity score threshold of 0.7? This threshold balances relevance with recall. Lower values (0.5-0.6) return more results but might include irrelevant content. Higher values (0.8-0.9) ensure high precision but might miss relevant documents. For regulatory work, precision is critical, so 0.7 provides a good balance.

  • Why max 5 results? Regulatory questions often benefit from multiple perspectives, but too many results can confuse the LLM or include irrelevant information. Five segments typically provide sufficient context without overwhelming the model's context window.

  • Why enrich metadata at retrieval time? Adding retrieval metadata (similarity score, timestamp) at query time rather than ingestion time allows you to track how well each query performed and enables future analytics on retrieval quality.

Exercise 3

Modify the MAX_RESULTS and MIN_SCORE constants to see how they affect retrieval quality.
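
To experiment without recompiling, you could bind both values to configuration instead of constants. A minimal sketch using MicroProfile Config (the property names are assumptions):

import org.eclipse.microprofile.config.inject.ConfigProperty;

// In DocumentRetriever, replace the constants with injected, overridable fields:
@ConfigProperty(name = "retrieval.max-results", defaultValue = "5")
int maxResults;

@ConfigProperty(name = "retrieval.min-score", defaultValue = "0.7")
double minScore;

You can then override the values in application.properties or on the command line, for example with -Dretrieval.min-score=0.6 in dev mode.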

Document ingestion

The DocumentLoader:

  • The @PostConstruct runs on application startup after dependency injection
  • Processes all Regulatory Change Bulletins in src/main/resources/documents
  • Supports PDF, DOCX, and HTML file formats (filtered by extension)
  • Uses Docling to extract page-level content
  • Generates embeddings and stores in pgvector
  • Includes comprehensive logging and error handling

Create src/main/java/com/ibm/ingest/DocumentLoader.java:

package com.ibm.ingest;

import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

import org.jboss.logging.Logger;

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import io.quarkus.runtime.Startup;
import jakarta.annotation.PostConstruct;
import jakarta.inject.Inject;
import jakarta.inject.Singleton;

@Startup
@Singleton
public class DocumentLoader {

    private static final Logger LOG = Logger.getLogger(DocumentLoader.class.getName());
    private static final Set<String> SUPPORTED_EXTENSIONS = Set.of(".pdf", ".docx", ".html");

    @Inject
    DoclingConverter doclingConverter;

    @Inject
    EmbeddingModel embeddingModel;

    @Inject
    EmbeddingStore<TextSegment> embeddingStore;

    @PostConstruct
    void loadDocuments() {
        try {
            Path documentsPath = Paths.get("src/main/resources/documents");

            if (!Files.exists(documentsPath)) {
                LOG.warnf("Documents directory not found: %s", documentsPath);
                return;
            }

            int successCount = 0;
            int failureCount = 0;
            int skippedCount = 0;
            int totalPages = 0;

            try (Stream<Path> paths = Files.list(documentsPath)) {
                for (Path filePath : paths.toList()) {
                    if (!Files.isRegularFile(filePath)) {
                        continue;
                    }

                    String fileName = filePath.getFileName().toString();
                    int dotIndex = fileName.lastIndexOf('.');
                    // Guard against files without an extension
                    String extension = dotIndex >= 0 ? fileName.substring(dotIndex).toLowerCase() : "";

                    if (!SUPPORTED_EXTENSIONS.contains(extension)) {
                        LOG.info("Skipping unsupported file: " + fileName);
                        skippedCount++;
                        continue;
                    }

                    try {
                        LOG.info("Processing document: " + fileName);
                        File sourceFile = filePath.toFile();

                        // Extract pages using Docling
                        List<TextSegment> segments = doclingConverter.extractPages(sourceFile);

                        // Generate embeddings and store
                        for (TextSegment segment : segments) {
                            Embedding embedding = embeddingModel.embed(segment).content();
                            embeddingStore.add(embedding, segment);
                            totalPages++;
                        }

                        successCount++;
                        LOG.info("Successfully processed: " + fileName + " (" + segments.size() + " pages)");
                    } catch (Exception e) {
                        LOG.errorf(e, "Failed to process document: %s", fileName);
                        failureCount++;
                    }
                }
            }

            LOG.info(String.format(
                    "Document loading completed. Success: %d, Failures: %d, Skipped: %d, Total pages: %d",
                    successCount, failureCount, skippedCount, totalPages));
        } catch (Exception e) {
            LOG.error("Error loading documents", e);
        }
    }
}

Exercise 4

Add support for processing documents from a remote URL or S3 bucket.
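
For the remote URL case, one starting point is to download the file to a temporary path and reuse the existing Docling pipeline. A minimal sketch (the URL is a placeholder):

import java.io.InputStream;
import java.net.URI;
import java.nio.file.StandardCopyOption;

// Inside DocumentLoader, alongside the local file processing:
Path tempFile = Files.createTempFile("bulletin-", ".pdf");
try (InputStream in = URI.create("https://example.org/bulletin.pdf").toURL().openStream()) {
    Files.copy(in, tempFile, StandardCopyOption.REPLACE_EXISTING);
}
List<TextSegment> segments = doclingConverter.extractPages(tempFile.toFile());
// ...then embed and store each segment exactly as in loadDocuments()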

Document conversion

The DoclingConverter:

  • Uses Docling's hybrid chunking for better document understanding
  • Preserves page-level metadata for regulatory bulletins
  • Handles multi-page documents (PDF, DOCX, HTML)
  • Extracts document metadata (document type, file name, page number)
  • Groups chunks by page number to maintain document structure

Create src/main/java/com/ibm/ingest/DoclingConverter.java:

package com.ibm.ingest;

import java.io.File;
import java.io.IOException;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import ai.docling.serve.api.chunk.response.Chunk;
import ai.docling.serve.api.chunk.response.ChunkDocumentResponse;
import ai.docling.serve.api.convert.request.options.OutputFormat;
import dev.langchain4j.data.document.Metadata;
import dev.langchain4j.data.segment.TextSegment;
import io.quarkiverse.docling.runtime.client.DoclingService;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class DoclingConverter {

    @Inject
    DoclingService doclingService;

    public List<TextSegment> extractPages(File sourceFile) throws IOException {
        Path filePath = sourceFile.toPath();
        String fileName = sourceFile.getName();

        // Use Docling to chunk the document
        ChunkDocumentResponse chunkResponse = doclingService.chunkFileHybrid(filePath, OutputFormat.MARKDOWN);

        List<Chunk> chunks = chunkResponse.getChunks();

        // Group chunks by page number
        Map<Integer, StringBuilder> pageTextMap = new HashMap<>();
        for (Chunk chunk : chunks) {
            List<Integer> pageNumbers = chunk.getPageNumbers();
            if (pageNumbers != null && !pageNumbers.isEmpty()) {
                for (Integer pageNumber : pageNumbers) {
                    pageTextMap.computeIfAbsent(pageNumber, k -> new StringBuilder())
                            .append(chunk.getText())
                            .append("\n\n");
                }
            } else {
                // If no page numbers, assign to page 1
                pageTextMap.computeIfAbsent(1, k -> new StringBuilder())
                        .append(chunk.getText())
                        .append("\n\n");
            }
        }

        // Create TextSegments with metadata
        return pageTextMap.entrySet().stream()
                .map(entry -> {
                    int pageNumber = entry.getKey();
                    String text = entry.getValue().toString().trim();

                    // Extract file extension to determine document type
                    String extension = fileName.substring(fileName.lastIndexOf('.') + 1).toLowerCase();
                    String documentType = switch (extension) {
                        case "pdf" -> "PDF";
                        case "docx" -> "DOCX";
                        case "html", "htm" -> "HTML";
                        default -> "UNKNOWN";
                    };

                    Metadata metadata = Metadata.from(Map.of(
                            "doc_id", fileName,
                            "page_number", String.valueOf(pageNumber),
                            "document_type", documentType,
                            "file_name", fileName));

                    return TextSegment.from(text, metadata);
                })
                .collect(Collectors.toList());
    }
}

Running the application

Now that we understand the code for the demo application, let's run it.

1. Prepare the documents to ingest

You can use practically any PDF, DOCX, or HTML files. Because the prompt is designed to answer questions about regulatory change bulletins, it makes sense to use real data. For the demo, we picked three examples. Download them and place them in src/main/resources/documents.

2. Start Ollama (if not using Dev Services)

ollama serve

In another terminal, verify that the models are available:

ollama list

3. Run the application

./mvnw quarkus:dev

The application will:

  • Start on http://localhost:8080.
  • Automatically start Docling and PostgreSQL Dev Services (as Podman or Docker containers).
  • Load documents from src/main/resources/documents.
  • Create database tables and indexes.

Here's what's happening behind the scenes. When you run ./mvnw quarkus:dev, Quarkus does the following:

  1. Starts Dev Services: If Podman/Docker is available, Quarkus automatically:

    • Starts a PostgreSQL container with pgvector extension
    • Starts a Docling service container for document processing
    • Configures the application to connect to these services
    • Tears down the containers when you stop the application
  2. Initializes the Database:

    • Creates the pgvector extension
    • Sets up embedding storage tables
    • Creates vector indexes for fast similarity search
  3. Loads Documents:

    • DocumentLoader runs on startup via @PostConstruct
    • Each document is processed sequentially
    • Embeddings are generated using the Granite embedding model
    • Progress is logged for each document
  4. Starts the Application:

    • REST endpoints become available at http://localhost:8080
    • The AI service is ready to process queries
    • Hot reload is enabled for development

The first startup might take several minutes depending on:

  • Number and size of documents
  • Speed of embedding generation
  • Whether models need to be downloaded
  • System resources (CPU, memory)

The database is recreated each time in development mode (drop-and-create). For production, you will need a separate document ingestion pipeline.
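
For example, a production profile could validate the existing schema instead of recreating it, while a separate pipeline loads new documents. A minimal sketch (the datasource URL is a placeholder for your environment):

%prod.quarkus.hibernate-orm.schema-management.strategy=validate
%prod.quarkus.datasource.jdbc.url=jdbc:postgresql://db-host:5432/regulatory_rag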

4. Verify startup

Check the logs for:

Processing document: regulatory-notice-24-01.pdf
Successfully processed: regulatory-notice-24-01.pdf (3 pages)
Processing document: 33-11216-fact-sheet.pdf
Successfully processed: 33-11216-fact-sheet.pdf (2 pages)
Processing document: ps23-4.pdf
Successfully processed: ps23-4.pdf (65 pages)

Document loading completed. Success: 3, Failures: 0, Skipped: 0, Total pages: 70

Testing the application

Now that the application is running, let's verify that it works as expected.

1. Test the REST endpoint

curl "http://localhost:8080/assistant?q=Briefly+list+the+5+most+important+regulatory+changes+in+the+latest+bulletin."

2. Test regulatory impact analysis

curl "http://localhost:8080/assistant?q=What+is+the+impact+of+the+new+data+privacy+regulation+on+our+compliance+requirements?"

3. Test document-specific queries

curl "http://localhost:8080/assistant?q=What+does+the+SEC+bulletin+say+about+disclosure+requirements?"

Interpreting the results

When you test the application, here's what to look for:

Good responses should:

  • Directly answer the question with regulatory context
  • Include citations in the format [Document: filename.pdf, Page: X]
  • Reference specific requirements, deadlines, or obligations
  • Acknowledge when information isn't found in the bulletins
  • Provide actionable compliance guidance

Signs of quality retrieval:

  • Citations match the question's topic
  • Multiple relevant documents are cited (if applicable)
  • Page numbers are accurate and the content is relevant
  • The LLM synthesizes information rather than just copying text

Common Issues:

  • No citations: Check whether documents were loaded successfully
  • Irrelevant citations: The similarity threshold might be too low
  • Generic answers: The retrieved content might lack specificity
  • Missing information: The question might require documents that were not yet ingested

Testing Strategy:

  1. Start with broad questions to test general retrieval.
  2. Progress to specific regulatory requirements.
  3. Test edge cases (questions about topics not in documents).
  4. Verify citation accuracy by checking source documents.
  5. Test with different document types (PDF, DOCX, HTML).

Best practices when building RAG-based assistants

Building an effective RAG-based assistant requires careful attention to document quality, query design, and system performance. The following best practices will help you optimize your assistant's accuracy, reliability, and user experience. These guidelines cover the entire lifecycle from document preparation through query optimization and performance tuning.

Document preparation

Proper document preparation is the foundation of accurate retrieval. Well-organized, high-quality documents with descriptive naming conventions ensure that your RAG system can effectively index and retrieve relevant information.

File naming:

  • Use descriptive filenames: SEC-2024-01-disclosure-requirements.pdf
  • Include dates: FINRA-2024-03-15-trading-rules.pdf
  • Avoid special characters that might cause issues

Document quality:

  • Use text-based PDFs (not scanned images) when possible
  • Ensure that documents are complete and not corrupted
  • Remove password protection before ingestion
  • For HTML files, ensure they're well formed

Organization:

  • Keep documents in src/main/resources/documents
  • Organize by regulator or date if needed
  • Remove outdated documents to avoid confusion

Query optimization

The quality of responses from your RAG assistant depends heavily on how questions are formulated. Well-crafted queries that are specific, contextual, and appropriately scoped will yield more accurate and actionable results.

Effective questions:

  • Be specific: "What are the SEC disclosure requirements for Q1 2024?" vs. "Tell me about SEC"
  • Include context: "What data retention requirements apply to broker-dealers?"
  • Ask for impact: "How does the new FINRA rule affect existing compliance procedures?"

Avoid:

  • Overly broad questions that return too many results
  • Questions about topics not in your document set
  • Multi-part questions (ask one at a time for better results)

Performance considerations

Understanding the performance characteristics of your RAG system helps you optimize resource usage and response times. Consider these factors when scaling your application or working with large document collections.

Embedding generation:

  • Document processing happens at startup, so the first load takes time
  • Consider processing documents in batches for large collections
  • Monitor memory usage with many large documents

Query performance:

  • Vector search is fast with proper indexing
  • Response time depends on LLM generation speed
  • Consider caching frequent queries in production (see the sketch below)
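
One way to cache frequent queries is the quarkus-cache extension, which memoizes method results by argument. A minimal sketch, assuming the extension is added and the assistant is wrapped in a dedicated bean:

import com.ibm.ai.RegulatoryChangeImpactAssistant;

import io.quarkus.cache.CacheResult;
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;

@ApplicationScoped
public class CachedAssistant {

    @Inject
    RegulatoryChangeImpactAssistant assistant;

    // Identical questions are answered from the cache instead of re-running retrieval and generation
    @CacheResult(cacheName = "assistant-answers")
    public String chat(String question) {
        return assistant.chat(question);
    }
}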

Database optimization:

  • pgvector indexes (HNSW) are created automatically
  • Monitor index size as document count grows
  • Consider archiving old documents to maintain performance

Extending the application

The basic RAG assistant provides a solid foundation, but you can enhance it with additional features to better serve specific use cases. The following extensions demonstrate how to add document filtering, improve metadata handling, customize AI behavior, implement streaming responses, optimize performance through caching, and add observability. Each extension builds on the core architecture while maintaining the application's modularity and maintainability. Think of them as advanced exercises that you can tackle if you want to push the boundaries of the basic application.

Add document type filtering

Allow users to narrow their searches to specific document types (for example, SEC bulletins, FINRA notices, internal policies). This improves retrieval precision when dealing with diverse document collections.

Enhance DocumentRetriever to filter by document type:

// Static import: dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey

@ApplicationScoped
public class DocumentRetriever implements ContentRetriever {
    @Override
    public List<Content> retrieve(Query query) {
        // Extract document type from the query if specified
        // (extractDocumentType is a hypothetical helper you would implement)
        String docType = extractDocumentType(query.text());

        // Embed the query text as in the original retriever, then build the request
        var requestBuilder = EmbeddingSearchRequest.builder()
            .queryEmbedding(queryEmbedding)
            .maxResults(5)
            .minScore(0.7);

        // Add a metadata filter if a document type was specified
        if (docType != null) {
            requestBuilder.filter(metadataKey("document_type").isEqualTo(docType));
        }

        // Perform search...
    }
}

Add regulation type metadata

Enrich document metadata with regulation classifications (e.g., "data privacy", "trading rules", "disclosure requirements"). This enables more sophisticated filtering and helps users find relevant regulatory guidance faster.

Enhance DoclingConverter to extract regulation types:

public List<TextSegment> extractPages(File sourceFile) throws IOException {
    // Extract regulation type from the document
    // (extractRegulationType is a hypothetical helper, e.g. keyword matching on the first page)
    String regulationType = extractRegulationType(sourceFile);

    // Create TextSegments with regulation metadata
    // (pageTextMap, pageNumber, text, and fileName come from the full implementation above)
    return pageTextMap.entrySet().stream()
        .map(entry -> {
            Metadata metadata = Metadata.from(Map.of(
                "doc_id", fileName,
                "page_number", String.valueOf(pageNumber),
                "regulation_type", regulationType
            ));
            return TextSegment.from(text, metadata);
        })
        .collect(Collectors.toList());
}

Customize the system prompt

Tailor the AI assistant's behavior, tone, and response format to match your organization's specific needs and compliance requirements. A well-crafted system prompt ensures consistent, professional responses aligned with your business context.

Edit RegulatoryChangeImpactAssistant.java:

@SystemMessage("""
    You are a specialized Regulatory Change Impact Assistant for [YOUR ORGANIZATION].

    Your responsibilities:
    - Answer questions about regulatory changes and compliance requirements
    - Analyze the impact of regulatory bulletins on business processes
    - Provide guidance on compliance obligations
    - Refuse questions about topics outside regulatory compliance

    Response format:
    1. Direct answer with regulatory context
    2. Supporting evidence from regulatory bulletins (with citations)
    3. Impact assessment and compliance recommendations
    """)

Add response streaming

Implement real-time response streaming to provide immediate feedback to users as the AI generates answers. This significantly improves perceived performance and user experience, especially for complex queries that require longer processing times.

Modify RegulatoryChangeResource to support Server-Sent Events:

// First add a streaming method to the AI service interface; quarkus-langchain4j
// streams tokens when an AI service method returns Multi<String>:
//   Multi<String> chatStreaming(@UserMessage String userQuestion);

@GET
@Path("/stream")
@Produces(MediaType.SERVER_SENT_EVENTS)
public Multi<String> askStreaming(@QueryParam("q") String question) {
    return assistant.chatStreaming(question);
}

Add monitoring and metrics

Gain visibility into your RAG system's performance with metrics tracking. Monitor retrieval times, embedding generation speed, and query patterns to identify bottlenecks and optimize system performance.

Use Micrometer to track retrieval performance:

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;

@Inject
MeterRegistry registry;

public List<Content> retrieve(Query query) {
    Timer.Sample sample = Timer.start(registry);
    try {
        // Perform retrieval as in the original implementation
        return results;
    } finally {
        sample.stop(registry.timer("retrieval.duration", "type", "vector"));
    }
}

Add query logging and analytics

Build intelligence about user behavior and system effectiveness by logging queries and their results. This data helps identify knowledge gaps, improve document coverage, and refine retrieval strategies over time.

Track which questions are asked and how well the system responds:

@ApplicationScoped
public class QueryAnalytics {

    // QueryLog is assumed to be a Panache entity you define; calculateAvgSimilarity
    // is a small helper that averages the similarity_score metadata values
    public void logQuery(String question, List<Content> retrieved, String answer) {
        QueryLog log = new QueryLog();
        log.question = question;
        log.retrievedCount = retrieved.size();
        log.avgSimilarity = calculateAvgSimilarity(retrieved);
        log.timestamp = Instant.now();
        log.persist();
    }
}

Add document expiration tracking

Ensure compliance accuracy by tracking document lifecycle events. Automatically identify outdated regulations and prevent the assistant from citing expired or superseded bulletins in its responses.

Track when regulatory bulletins expire or are superseded:

@Entity
public class RegulatoryBulletin extends PanacheEntity {
    public String docId;
    public LocalDate effectiveDate;
    public LocalDate expirationDate;
    public boolean superseded;
    public String supersededBy;

    public static List<RegulatoryBulletin> findActive(LocalDate date) {
        return find("effectiveDate <= ?1 AND (expirationDate IS NULL OR expirationDate >= ?1) AND superseded = false", date).list();
    }
}

Implement query rewriting

Enhance retrieval recall by automatically expanding user queries with domain-specific synonyms and related terminology. This helps capture relevant documents even when users phrase questions differently than the source material.

Improve retrieval by expanding queries with synonyms or related terms:

public class QueryRewriter {
    private static final Map<String, List<String>> REGULATORY_SYNONYMS = Map.of(
        "disclosure", List.of("reporting", "filing", "notification"),
        "retention", List.of("storage", "maintenance", "preservation")
    );

    public String expandQuery(String original) {
        // Append synonyms for any regulatory terms found in the query
        StringBuilder expanded = new StringBuilder(original);
        String lowerCase = original.toLowerCase();
        REGULATORY_SYNONYMS.forEach((term, synonyms) -> {
            if (lowerCase.contains(term)) {
                expanded.append(' ').append(String.join(" ", synonyms));
            }
        });
        return expanded.toString();
    }
}

Conclusion

You've now learned how to build an enterprise-grade RAG application for regulatory compliance with:

  • Vector search over regulatory document embeddings using pgvector
  • Metadata enrichment for traceability and citation
  • Document processing with page-level granularity for Regulatory Change Bulletins (PDF, DOCX, HTML)
  • Production-ready architecture with Quarkus
  • Custom content injection for structured, citable responses

Key takeaways

  • RAG enables precise, cited answers - Every response can be traced back to source documents
  • Embeddings capture semantic meaning - Find relevant content even when wording differs
  • Metadata is crucial - Enables citations, filtering, and audit trails
  • Quarkus simplifies integration - Dev Services handle infrastructure automatically
  • Local models provide privacy - Granite models via Ollama keep data on-premises

Real-world applications

This foundation can be extended to support:

  • Impact Assessment: Analyze how new regulations affect existing processes
  • Compliance Monitoring: Track regulatory changes and alert relevant teams
  • Audit Preparation: Quickly find and cite relevant regulations for audits
  • Training: Help new compliance officers understand regulatory requirements
  • Multi-Regulator Support: Extend to handle regulations from multiple jurisdictions

Production considerations

Before you deploy this assistant to production:

  • Add authentication and authorization
  • Implement rate limiting
  • Set up monitoring and alerting
  • Configure proper database backups
  • Add document versioning and update mechanisms
  • Implement query logging and analytics
  • Set up CI/CD pipelines
  • Add comprehensive error handling
  • Configure logging and observability

Continue learning

More community resources are available on my Substack, The Main Thread.