Accelerating Gen AI on IBM Cloud: Deployable Architectures

When building and delivering generative AI (Gen AI) solutions, clients need to consider not only the AI aspects but also the infrastructure, the data layers, and the non-functional requirements of a complete enterprise solution, such as security, compliance, and manageability.

Deployable Architectures (DAs) can address these needs. DAs are automated reference architectures and solutions that allow clients to use IBM Cloud for faster deployment, enhanced security, continuous compliance, and seamless integration.

Retrieval Augmented Generation Pattern (RAG DA)

The Retrieval Augmented Generation (RAG) Pattern DA is available in the IBM Cloud catalog. It accelerates Gen AI solutions based on watsonx on IBM Cloud by reducing deployment effort from weeks to hours, lowering the cost and risk of misconfiguration, and accelerating the discovery of security vulnerabilities. With the RAG DA, you can use your enterprise data to achieve productivity gains in question-and-answer conversations, content search, summarization, and content generation. The RAG pattern can be deployed in multiple configurations and applies to a range of industry use cases and solutions.

Using the RAG DA, you can quickly set up a watsonx solution on IBM Cloud, providing a secure and trusted environment to deploy and govern generative AI applications throughout their lifecycle. The most recent release (V2) includes many enhancements to support multiple RAG scenarios, including:

Support for deploying watsonx.data and watsonx Orchestrate, in addition to watsonx.ai, watsonx.governance, and watsonx Assistant
Enablement of IBM Cloud Databases (ICD) for Elasticsearch and watsonx.data Milvus as vector databases
watsonx Assistant Conversational Search with RAG, using ICD Elasticsearch or Watson Discovery
watsonx.ai chat with documents using ICD Elasticsearch and Milvus
Code Engine or a Red Hat OpenShift on IBM Cloud cluster for the Gen AI application workload
New and updated IBM Cloud surround DAs, such as Cloud Logs and Event Notifications
Enhanced automated deployment, flexibility, and configuration
Improved security and compliance posture with the SCC AI Guardrails V2 profile

This tutorial outlines the steps needed to deploy the Retrieval Augmented Generation Pattern for watsonx on IBM Cloud Deployable Architecture (DA).

The following sections also provide an overview of key capabilities included in the solution, such as DevSecOps toolchains, the Code Engine workload, launching and interacting with the sample application, and using the Security and Compliance Center (SCC) dashboard to monitor the application for compliance and security. For a deeper understanding of the deployed architecture, see the reference architecture.

Deployment of the Deployable Architecture

Before You Begin

Before deploying, ensure that you have completed the prerequisite configuration steps detailed in the GitHub repository.

Steps

Navigate to the Catalog in your IBM Cloud account.
Change the catalog to Community registry. Under Type on the left-hand side, filter for Deployable Architecture.
Select Retrieval Augmented Generation (RAG) Pattern. You can also go to the RAG DA tile directly here.

Select the variation that is most appropriate for your scenario.

Variation - Basic (Deploy on Code Engine)

This variation enables:

Code Engine for containerized and serverless workloads
Elasticsearch Enterprise for building and storing dense vector indexes or keyword search indexes
watsonx.ai in-memory vector store for RAG trial and exploration
watsonx.ai UI to upload documents
watsonx.ai Prompt Lab for inferencing and Prompt Templates
watsonx Assistant Conversational Search with embedded LLM
Build your own data processing, ingestion pipeline, and indexes

Variation - Standard (Deploy on Red Hat OpenShift)

This variation enables:

Red Hat OpenShift cluster for microservices workloads
Elasticsearch Platinum for building and storing sparse vectors, dense vector indexes, or keyword search indexes
- watsonx.ai use of Elasticsearch ELSER2 vector index for RAG
- watsonx Assistant Conversational Search with UI features for uploading documents to create or use Elasticsearch ELSER2 vector index for RAG
watsonx.ai in-memory vector store for RAG trial and exploration
watsonx.ai UI to upload documents
watsonx.ai Prompt Lab for inferencing and Prompt Templates
watsonx Assistant Conversational Search with embedded LLM
Build your own data processing, ingestion pipeline, and indexes
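The variations differ in which kinds of indexes Elasticsearch builds: keyword search indexes, dense vector indexes, and, in the Standard variation, sparse vectors from ELSER. The following minimal Python sketch contrasts how each kind of index scores a document. It is an illustration under simplifying assumptions, not Elasticsearch's implementation: the documents, term weights, and embeddings are hypothetical, keyword scoring is a plain term count rather than BM25, and the sparse weights stand in for the learned term weights a model such as ELSER would produce.

```python
import math

# Toy corpus; the documents and all weights used below are hypothetical.
DOCS = {
    "faq-1": "a conventional loan is not insured by a government agency",
    "faq-2": "an adjustable rate mortgage has an interest rate that changes",
}


def keyword_score(query: str, text: str) -> int:
    """Keyword index: count query terms present in the document (no BM25 weighting)."""
    terms = set(text.split())
    return sum(1 for t in query.split() if t in terms)


def sparse_score(query_weights: dict[str, float], doc_weights: dict[str, float]) -> float:
    """Sparse vectors (ELSER-style): dot product over the terms both vectors share."""
    return sum(w * doc_weights[t] for t, w in query_weights.items() if t in doc_weights)


def dense_score(q: list[float], d: list[float]) -> float:
    """Dense vectors: cosine similarity between fixed-length embeddings."""
    dot = sum(a * b for a, b in zip(q, d))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    query = "conventional loan"
    # Rank document IDs by keyword score, highest first.
    ranked = sorted(DOCS, key=lambda d: keyword_score(query, DOCS[d]), reverse=True)
    print(ranked)
```

Keyword scoring only matches exact terms, while sparse and dense vectors can also score documents that use related vocabulary, which is why the vector-based options are typically preferred for semantic search in RAG.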

Note: The following steps apply to the Basic variation with Code Engine. Additional tutorials will cover the Standard variation and other knowledge base or vector database configurations with Watson Discovery and watsonx.data Milvus.

When the stack is successfully deployed, open the Navigation Menu, hover over Code Engine, and select Projects.
Select your project.
Select Applications, then select Open URL.
Interact with the chatbot.

Overview of key elements in the RAG DA

The RAG DA comprises multiple components, providing a comprehensive generative AI enterprise solution on IBM Cloud.

RAG DA

The top layer features a RAG sample application and its dependencies, showcasing the complete stack. In a client deployment, the client’s own Gen AI application would replace this sample application. This layer also includes Elasticsearch, which can serve as the vector database in the RAG pattern.
The middle layer includes the watsonx SaaS components on IBM Cloud.
The bottom layer consists of essential IBM Cloud components for an enterprise cloud solution that follows best practices. These foundational cloud services establish core security and observability out of the box, providing end-to-end security for data at rest and in transit, as well as logging and monitoring for audit purposes. The core security services include:
- IBM Key Protect: Manages the encryption keys used to protect data.
- IBM Secrets Manager: Stores credentials and certificates.
- IBM Security and Compliance Center: Scans and monitors the environment for compliance gaps.

Toolchains

The DevSecOps Application Lifecycle Management (ALM) framework includes Continuous Integration (CI), Continuous Deployment (CD), and Continuous Compliance (CC) pipelines. These pipelines are key components of modern software delivery: they automate application deployment and check that security and compliance standards are met throughout the development process.

Within this framework, several compliance features are instrumental in maintaining the integrity and security of the application:

Vulnerability scans: Scan the codebase comprehensively to identify security issues so they can be resolved before deployment.
Signed build artifacts: Digitally sign executable artifacts to ensure code integrity and guard against unauthorized alterations.
Evidence gathering: Collect and store evidence, such as commit logs, for traceability and accountability.
Evidence locker: Store critical development data securely in a centralized repository to streamline auditing and track how the code evolves.
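The signing and evidence-locker ideas above can be sketched in a few lines of Python. This is a hedged illustration, not how the DevSecOps toolchain implements them: it swaps in an HMAC over a SHA-256 digest as the signature (a real pipeline would typically use an asymmetric signing key kept in a secrets store), and the `EvidenceLocker` class and its hash chain are hypothetical, included only to show why chaining entries makes tampering detectable.

```python
import hashlib
import hmac
import json

# Hypothetical shared key for illustration only; a real pipeline would use an
# asymmetric signing key held in a secrets store, never a hard-coded secret.
SIGNING_KEY = b"example-signing-key"
GENESIS = "0" * 64  # placeholder "previous hash" for the first evidence entry


def sign_artifact(artifact: bytes) -> str:
    """Return an HMAC-SHA256 signature over the artifact's SHA-256 digest."""
    digest = hashlib.sha256(artifact).digest()
    return hmac.new(SIGNING_KEY, digest, hashlib.sha256).hexdigest()


def verify_artifact(artifact: bytes, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    return hmac.compare_digest(sign_artifact(artifact), signature)


def _entry_hash(prev: str, evidence: dict) -> str:
    payload = json.dumps(evidence, sort_keys=True)
    return hashlib.sha256((prev + payload).encode()).hexdigest()


class EvidenceLocker:
    """Hypothetical append-only evidence log; each entry hashes the previous one."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def add(self, evidence: dict) -> None:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        self.entries.append({"evidence": evidence, "prev": prev, "hash": _entry_hash(prev, evidence)})

    def is_intact(self) -> bool:
        """Walk the chain and recompute every hash; any edit breaks the chain."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != _entry_hash(prev, entry["evidence"]):
                return False
            prev = entry["hash"]
        return True
```

If any stored evidence is edited after the fact, `is_intact()` returns `False`, because each entry's hash covers both its own content and the previous entry's hash.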

IBM Code Engine

Deploying the Gen AI application on IBM Code Engine is one of the options with the RAG DA. It is best suited for containerized and serverless workloads. IBM Code Engine is a fully managed, serverless platform that enables developers to build, deploy, and scale containerized applications and batch jobs without managing the underlying infrastructure. It automatically handles tasks such as scaling, monitoring, and maintenance, allowing developers to focus on writing code. IBM Code Engine supports multiple programming languages and containerization technologies, making it versatile for various application development needs.
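To run on a serverless platform like Code Engine, an application typically only needs to be a stateless HTTP server packaged in a container. The following Python sketch uses only the standard library. It assumes the platform passes the listening port in the `PORT` environment variable and falls back to 8080; the `/healthz` and `/` routes are hypothetical placeholders, not endpoints of the RAG sample application.

```python
import json
import os
from collections.abc import Mapping
from http.server import BaseHTTPRequestHandler, HTTPServer


def listen_port(env: Mapping[str, str] = os.environ) -> int:
    """Port to bind. Assumption: the platform injects PORT; default to 8080."""
    return int(env.get("PORT", "8080"))


def build_response(path: str) -> tuple[int, dict]:
    """Hypothetical routes: a health check and a placeholder root endpoint."""
    if path == "/healthz":
        return 200, {"status": "ok"}
    if path == "/":
        return 200, {"message": "Gen AI app placeholder"}
    return 404, {"error": "not found"}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Ignore any query string when routing.
        status, body = build_response(self.path.split("?", 1)[0])
        data = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


if __name__ == "__main__":
    # Bind on all interfaces so the container's port is reachable.
    HTTPServer(("0.0.0.0", listen_port()), Handler).serve_forever()
```

Because the process keeps no state between requests, the platform can add or remove instances as traffic changes without coordinating with the application.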

Red Hat OpenShift

Deploying the Gen AI application on Red Hat OpenShift is another option with the RAG DA, best suited for microservices-based workloads.

The RAG DA can provision a Red Hat OpenShift cluster for this purpose. Red Hat OpenShift on IBM Cloud offers a fully managed Kubernetes platform for building and running containerized applications on IBM's infrastructure. It combines a user interface, built-in security features, and sophisticated management tools to support high availability and seamless deployment of your applications in the public cloud.

Sample Gen AI application: RAG Pattern

This sample Gen AI application illustrates an example customer scenario: a generative AI application for customer care that can work with different client data, such as bank loans or insurance policies (sample data for both is included in the DA), or with any other client data once it is uploaded to the vector database.

In the Bank Loan sample, imagine you are a prospective customer looking for a loan. You visit the bank’s website and start asking the bank’s virtual agent questions. Depending on whether your question is general or pertains to a specific loan product offered by the bank, the responses are generated using a generative AI foundation model and the bank’s own data.

The sample application uses the Retrieval Augmented Generation (RAG) pattern. First, an AI-powered natural language search query (semantic search) retrieves relevant content from the bank’s data. Then, a foundation model processes prompts to generate a summary response from the retrieved content. The bank’s content, such as FAQs and blog knowledge-base, is ingested, indexed, and stored in IBM Watson Discovery. Descriptions of the bank’s loan products are used in the IBM Cloud watsonx Granite foundation model prompts.
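The retrieve-then-generate flow described above can be sketched in Python as follows. This is a simplified stand-in, not the sample application's code: bag-of-words cosine similarity replaces semantic search in Watson Discovery, the FAQ passages and `PRODUCTS` descriptions are hypothetical, products are matched by a plain keyword in the question, and `generate` is a placeholder where a call to a foundation model (for example, a Granite model on watsonx.ai) would go.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base and product descriptions (not the DA's sample data).
FAQ = [
    "A conventional loan is a mortgage that is not backed by a government agency.",
    "An adjustable-rate mortgage (ARM) has an interest rate that can change over time.",
    "Closing costs typically include appraisal, origination, and title fees.",
]
PRODUCTS = {
    "jumbo": "Jumbo loan: financing for properties above conforming loan limits.",
}


def tokens(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for a semantic embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, k: int = 2) -> list[str]:
    """Step 1: return the k passages most similar to the question."""
    q = tokens(question)
    return sorted(FAQ, key=lambda p: cosine(q, tokens(p)), reverse=True)[:k]


def build_prompt(question: str) -> str:
    """Step 2: ground the prompt in retrieved passages plus matching product descriptions."""
    context = retrieve(question)
    context += [desc for name, desc in PRODUCTS.items() if name in question.lower()]
    passages = "\n".join(f"- {p}" for p in context)
    return (
        "Answer the customer's question using only the context below.\n"
        f"Context:\n{passages}\n"
        f"Question: {question}\nAnswer:"
    )


def generate(prompt: str) -> str:
    """Step 3 (placeholder): send the prompt to a foundation model."""
    raise NotImplementedError("Call your model inference endpoint here.")
```

Keeping retrieval and prompt construction separate from the model call makes it straightforward to swap the retriever (for example, for an Elasticsearch or Milvus query) without changing how the prompt is grounded.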


Here are some example questions to ask the virtual agent:

What is a conventional loan?
What is ARM?
Is ARM a good choice?
Which loan should I get for an expensive property?
How much does it cost to get a bank loan?

Security and Compliance Center (SCC)

The Security and Compliance Center (SCC) enforces policies as code, deploys secure data and workload controls, and evaluates your security and compliance posture. Two distinct profiles support this: the IBM Cloud Framework for Financial Services and AI ICT Guardrails. These profiles contain controls that help ensure compliance with industry-specific or regulatory requirements.

To configure the SCC instance created by the stack and to run the first scan, complete the following steps:

Open the Navigation Menu and select Overview under Security and Compliance.
Ensure you are on the correct instance if you have multiple instances in your account, and then select Attachments and click Create. Provide a name for your attachment and then click Next.
Select the AI ICT Guardrails profile from the drop-down menu and then click Next.
Under the Scope drop-down menu, select the resource group in which the RAG pattern is located and then click Next.
Schedule the scans, for example “Every Day”, and click Next.
Review the configuration and click Create.
Click the overflow menu (three dots) on the right side of your attachment and select Run scan.
When the scan completes, return to this menu and select View scan result.
Review the dashboard.

When you open the dashboard, three tabs present your scan data:
- The Overview tab shows your success rate and any drift in results over your chosen timeframe.
- The Controls tab provides a concise overview of each control's compliance status at the time of scanning.
- The Resources tab offers detailed results for each resource evaluated.

This streamlined view of your scan data helps you make informed decisions to enhance the security and compliance of your environment with SCC.

Note: With the current configuration, Watson resources will not be scanned and will appear as "Unable to perform". To achieve a 100% scan pass rate, additional steps are required to include these resources in the scan.
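The success rate and drift shown on the Overview tab come down to simple arithmetic over per-check results. The following Python sketch assumes a hypothetical result format (one record per evaluated check with status `pass`, `fail`, or `unable_to_perform`) and assumes that "Unable to perform" results count against the rate, which is consistent with the note that extra steps are needed to reach 100%. SCC's exact formula may differ.

```python
from collections import Counter


def success_rate(results: list[dict], count_unable: bool = True) -> float:
    """Percentage of passing results.

    Assumption: with count_unable=True, "unable_to_perform" results stay in the
    denominator, so unscanned resources keep the rate below 100%.
    """
    counts = Counter(r["status"] for r in results)
    total = counts["pass"] + counts["fail"]
    if count_unable:
        total += counts["unable_to_perform"]
    return 100.0 * counts["pass"] / total if total else 0.0


def drift(previous: float, current: float) -> float:
    """Change in success rate between two scans, in percentage points."""
    return current - previous


if __name__ == "__main__":
    scan = [
        {"status": "pass"},
        {"status": "pass"},
        {"status": "fail"},
        {"status": "unable_to_perform"},  # e.g. a Watson resource not yet in scope
    ]
    print(success_rate(scan), success_rate(scan, count_unable=False))
```

Comparing the two rates shows how much of the gap to 100% comes from resources that could not be evaluated rather than from failed controls.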

Summary and next steps

Gen AI Deployable Architectures deliver trust and faster return on investment for the enterprise. These architectures enable clients to dramatically improve their deployment velocity, and to run and manage Gen AI solutions and their dependent components (surround services) with security and compliance on IBM Cloud.

Give the RAG DA on IBM Cloud a try! It's ideal for your RAG solutions. And watch this space for additional capabilities and deployable architectures in this area, including configurations for other Gen AI patterns.