Combine Groq, Anthropic, and Granite 4.0 Nano models in multi-agent workflows with watsonx Orchestrate
A hands-on guide for building multi-agent workflows that integrate high-speed inference, advanced reasoning, and summarization using Groq, Anthropic, and Granite 4.0 Nano models
No single model solves every problem, and that is where the real power of orchestration begins. By orchestrating specialized AI agents, the watsonx Orchestrate platform enables developers to combine the strengths of multiple models and achieve optimal performance across diverse tasks.
The preceding overview diagram shows three key components that you will use in this tutorial: two model families and one fast, low-cost AI inference provider.
IBM Granite 4.0 Nano models provide lightweight, on-device inferencing optimized for CPU execution. These models offer fast and efficient summarization and workflow automation even on everyday devices. In this tutorial, you will use a 350-million-parameter model.
Together, these technologies demonstrate how hybrid model orchestration delivers faster, smarter, and more sustainable AI agent behavior.
In this tutorial, learn how to:
Create an agent that uses the GPT OSS 120B model on Groq LPU infrastructure through watsonx Orchestrate to answer queries.
Create an agent that uses the Claude Sonnet 4.5 model to enable watsonx Orchestrate agents to perform context-aware inferencing and sophisticated multi-step reasoning.
Architecture of Groq, Anthropic, and Granite 4.0 Nano models in multi-agent workflows
The following image shows the architecture of the sample multi-agent system for this tutorial.
The following sequence describes how the agentic workflow processes a user query from start to finish:
The user enters a question through the watsonx Orchestrate web chat interface. For example, What is the net increase in price for the streaming platforms?
The supervisor agent (oic_cost_insight_agent) uses the GPT-OSS-120B model running on Groq LPU hardware to generate the initial reasoning.
The supervisor agent selects the collaborator agent (oic_cost_inflation_analysis_agent) for detailed cost-impact reasoning. No external tools are invoked at this stage.
The collaborator agent (oic_cost_inflation_analysis_agent) uses the Claude Sonnet 4.5 model to support advanced reasoning to conduct detailed cost-impact analysis, identify pricing patterns, compute numerical differences, and produce reasoning outputs.
The collaborator agent performs retrieval augmented generation (RAG) on the Excel dataset by invoking the RAG tool (oic_excel_rag_tool). The RAG tool embeds the dataset, retrieves the most relevant rows, constructs context blocks, and might call the large language model (LLM) for retrieval-guided reasoning. The RAG tool returns a detailed, data-grounded analysis that includes both raw reasoning and retrieved context.
The supervisor agent receives the output from the collaborator agent and passes it to the tool (oic_granite_summary_tool). The summary tool invokes the Granite 4.0 Nano model that is running on Red Hat OpenShift with vLLM to convert the long reasoning output into a concise and user-friendly summary.
The supervisor agent returns the Granite model generated summary to the watsonx Orchestrate web chat interface.
Note: Preconfigured connections such as anthropic_credentials and oic_llm_creds route the model calls. The Granite model runs on Red Hat OpenShift with vLLM for fast summarization. The GPT-OSS-120B model executes on Groq LPU infrastructure. The Claude Sonnet model is available through the Anthropic connection whenever the workflow requires advanced reasoning.
Note: For this tutorial, you will use the GPT OSS 120B model available in watsonx Orchestrate on Groq LPU infrastructure.
Step 1. Import the Claude Sonnet model using the AI Gateway
By integrating the Claude Sonnet model from Anthropic into watsonx Orchestrate, you equip your AI agents with nuanced decision-making capabilities and expert-level inferencing.
Open the repository in Visual Studio Code or any editor of your choice, and navigate to the i-oic-cost-inflation-analysis-agent folder. This folder will serve as the working directory for the next steps.
Log in to the Anthropic Console and create an API key for accessing the Anthropic models. If you need to add credit to the account, the console prompts you to enter payment details. Then, you can proceed to create the API key.
a. Click the API Keys option in the navigation.
b. Click Create Key.
c. Enter a name for the new API key and click Add.
d. Click Copy Key to copy the generated API key.
Make a note of the API key that you created in the previous step.
Replace the value of ANTHROPIC_API_KEY in the following command with the API key that you copied and then run the commands to import the claude_connections.yaml file and validate the connections that watsonx Orchestrate creates in both the draft environment and the live environment.
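The exact flags can vary by ADK version; the following is a minimal sketch that assumes the standard ADK connection workflow, a connection named anthropic_credentials (as referenced in the architecture note), and a key_value entry named api_key:
# Import the connection definition from the working directory
orchestrate connections import --file claude_connections.yaml
# Store the Anthropic API key in the draft and live environments
orchestrate connections set-credentials --app-id anthropic_credentials --env draft -e "api_key=ANTHROPIC_API_KEY"
orchestrate connections set-credentials --app-id anthropic_credentials --env live -e "api_key=ANTHROPIC_API_KEY"
# Confirm that the connection exists in both environments
orchestrate connections list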
Review the anthropic-claude.yaml file. Ensure that the app_id value in the YAML file matches the name of the credentials you specified in step 5.
Enter the following commands to import the model and to confirm that the model has been successfully imported into the watsonx Orchestrate Agent Development Kit (ADK).
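A minimal sketch of these commands, assuming the ADK's AI gateway model import flow (the connection binding comes from the app_id value inside the YAML file):
# Import the Claude Sonnet model definition through the AI gateway
orchestrate models import --file anthropic-claude.yaml
# Verify that the imported model appears in the model list
orchestrate models list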
The Claude Sonnet 4.5 model (claude-sonnet-4-5-20250929) is successfully imported into the watsonx Orchestrate environment.
Step 2. Deploy the Granite 4.0 Nano model on Red Hat OpenShift using vLLM
The Granite 4.0 Nano 350-million-parameter model acts as the summarization engine for your AI agents, providing fast, on-demand summaries throughout the tutorial.
Log in to your Red Hat OpenShift cluster in the terminal.
oc login
Create a namespace for the vLLM deployment and switch to it:
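For example (the namespace name vllm-granite is illustrative; oc new-project creates the namespace and switches to it in one step):
oc new-project vllm-granite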
Review the vllm.yaml file and update the image reference if necessary. This change ensures that the deployment uses the most recent vLLM image, which supports serving the Granite 4.0 Nano model.
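The repository supplies the actual vllm.yaml; the following is only an illustrative sketch of its general shape, where the image tag, port, and labels are assumptions and the sai-vllm name is taken from the route command used later in this step:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sai-vllm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sai-vllm
  template:
    metadata:
      labels:
        app: sai-vllm
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest  # use a recent tag that supports Granite 4.0
          args: ["--model", "ibm-granite/granite-4.0-350m"]
          ports:
            - containerPort: 8000  # vLLM's default OpenAI-compatible API port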
Deploy the vLLM server on the Red Hat OpenShift cluster and wait for the pod to reach the Running state.
oc apply -f vllm.yaml
oc get po
Apply the service and the route configuration so that watsonx Orchestrate can access the vLLM endpoint.
oc apply -f svc.yaml
oc apply -f route.yaml
Export the Red Hat OpenShift route to an environment variable.
export VLLM_ROUTE_OCP="$(oc get route sai-vllm -o jsonpath='{.spec.host}')"
Display the value of VLLM_ROUTE_OCP and copy it for use in the next step.
echo "$VLLM_ROUTE_OCP"
Test the Granite 4.0 Nano model by sending a prompt to the vLLM endpoint.
curl --request POST \
  --url "https://${VLLM_ROUTE_OCP}/v1/chat/completions" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "ibm-granite/granite-4.0-350m",
    "messages": [
      {
        "role": "user",
        "content": "Generate a one sentence summary for this. Based on the data, here are insights about youtube Premium pricing:\n\nPrice Trend (Brazil - Individual Plan):\n- 2022: 20.90\n- 2023: 25.90 (23.92% increase)\n- 2024: 28.90 (11.58% increase)\n- 2025: 34.90 (20.76% increase)\n\nKey Insights:\n1. youtube Premium prices have increased 67% over 3 years (from 20.90 to 34.90)\n2. User sentiment has declined from positive to negative, with users considering cancellation\n3. Users mention considering ad blockers as an alternative due to continuous price increases\n4. The comment \"Prices keep climbing, might cancel\"\n\nFor Future Planning:\n- Expect continued annual price increases\n- Budget for potential annual increases\n- Monitor your budget closely as streaming costs compete with essential expenses like rent"
      }
    ],
    "stream": false
  }'
The following image shows the values that need to be changed.
Step 3. Create a Python tool that uses the Granite 4.0 Nano model
Create a tool that uses the Granite 4.0 Nano model to summarize the answer provided as input. The model is deployed in a Red Hat OpenShift cluster and is accessible through an API endpoint.
Open the oic_granite_summary_tool.py file. This summary tool makes an API call to the Granite model and returns the summarized output.
When you build Python tools that use external connections, you must declare the required connection in the expected_credentials array in the @tool decorator; a sketch follows at the end of this step.
a. Use MY_APP_ID="oic_llm_creds" as the application identifier.
b. The credentials are of type Key Value. The credential set must include a key named url_name that contains the hostname of the application where the Granite 4.0 Nano model (granite-4.0-350m) is deployed.
Note: The Granite 4.0 Nano model is not currently supported on the watsonx.ai platform. In the future, if out-of-the-box support becomes available, you can reference the model directly in the agent specification YAML file.
c. Import the summary tool by using the watsonx Orchestrate Agent Development Kit (ADK).
d. Log in to the watsonx Orchestrate user interface to verify that the oic_granite_summary_tool tool appears in the tool catalog.
i. Click Manage agents.
ii. Click All tools. You should see the oic_granite_summary_tool entry in the list.
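To make items a and b concrete, here is a minimal sketch of the connection declaration and the Granite call. The import paths follow the ADK's documented pattern but may vary by ADK version, and everything other than oic_llm_creds, url_name, and the model name is illustrative; the repository file contains the full implementation.
import requests
from ibm_watsonx_orchestrate.agent_builder.tools import tool
from ibm_watsonx_orchestrate.agent_builder.connections import ConnectionType, ExpectedCredentials
from ibm_watsonx_orchestrate.run import connections

MY_APP_ID = "oic_llm_creds"

@tool(expected_credentials=[ExpectedCredentials(app_id=MY_APP_ID, type=ConnectionType.KEY_VALUE)])
def oic_granite_summary_tool(text: str) -> str:
    """Summarize the given analysis text with the Granite 4.0 Nano model."""
    creds = connections.key_value(MY_APP_ID)
    host = creds["url_name"]  # hostname of the vLLM route from step 2
    response = requests.post(
        f"https://{host}/v1/chat/completions",
        json={
            "model": "ibm-granite/granite-4.0-350m",
            "messages": [{"role": "user", "content": f"Generate a one sentence summary for this. {text}"}],
            "stream": False,
        },
        timeout=60,
    )
    return response.json()["choices"][0]["message"]["content"]
The import in item c typically takes this shape, with the --app-id flag binding the tool to its connection:
orchestrate tools import -k python -f oic_granite_summary_tool.py --app-id oic_llm_creds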
Step 4. Build a Python tool that applies RAG over an Excel dataset
Create a Python tool that uses the watsonx Orchestrate Agent Development Kit (ADK) framework to perform retrieval augmented generation (RAG) over an Excel dataset. The tool embeds the Excel data, retrieves the most relevant rows based on the user query, and produces a context-aware answer. This RAG workflow provides precise, data-grounded responses by combining Excel-based retrieval with large language model reasoning.
Open the oic_excel_rag_tool.py file. This RAG tool sends an API request to the watsonx model and returns the generated output.
The oic_excel_rag_tool.py file uses the watsonx.ai slate embedding models. Therefore, the tool requires additional credentials, WATSONX_APIKEY and PROJECT_ID, which you can retrieve from watsonx.ai. To get the PROJECT_ID, complete the following procedure:
a. Log in to IBM Cloud, navigate to the Resource list, and click the watsonx.ai service.
b. Launch the watsonx.ai workspace.
c. Open the Project, navigate to the Project Details page, and copy the value of Project ID.
For this tutorial, you will use the default embedding model named ibm/slate-30m-english-rtrvr-v2.
You will use streaming_cost_inflation.xlsx as the dataset for the RAG tool. The current tool implementation supports Excel files. You can modify the implementation to support PDF files, CSV files, and other formats.
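The repository file contains the full implementation; the following is only an illustrative sketch of the retrieval half of that flow. It assumes the ibm-watsonx-ai and pandas packages (with openpyxl for .xlsx support), the Dallas region endpoint, and a hypothetical helper name retrieve_relevant_rows.
import os
import numpy as np
import pandas as pd
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import Embeddings

def retrieve_relevant_rows(query: str, xlsx_path: str, top_k: int = 5) -> str:
    """Return the top_k dataset rows most similar to the query."""
    # Flatten each Excel row into one text chunk for embedding
    df = pd.read_excel(xlsx_path)
    rows = [" | ".join(f"{col}={val}" for col, val in row.items()) for _, row in df.iterrows()]

    embedder = Embeddings(
        model_id="ibm/slate-30m-english-rtrvr-v2",
        credentials=Credentials(
            api_key=os.environ["WATSONX_APIKEY"],
            url="https://us-south.ml.cloud.ibm.com",  # assumption: Dallas region
        ),
        project_id=os.environ["PROJECT_ID"],
    )

    # Embed the rows and the query, then rank rows by cosine similarity
    doc_vecs = np.array(embedder.embed_documents(texts=rows))
    query_vec = np.array(embedder.embed_query(text=query))
    scores = doc_vecs @ query_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
    top = np.argsort(scores)[::-1][:top_k]

    # The selected rows become the context block passed to the LLM
    return "\n".join(rows[i] for i in top)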
Ensure that you are in the tools directory. Run the following commands in the VS Code terminal, and then verify that the oic_excel_rag_tool has been imported into the watsonx Orchestrate tool catalog.
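A sketch of the import, assuming the tool's Python dependencies are listed in a requirements.txt file next to it:
# Import the RAG tool together with its dependencies
orchestrate tools import -k python -f oic_excel_rag_tool.py -r requirements.txt
# Confirm that the tool is registered
orchestrate tools list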
Log in to the watsonx Orchestrate user interface to view the oic_excel_rag_tool in the tool catalog.
a. Click Manage agents.
b. Click All tools, and you should see the oic_excel_rag_tool.
Step 5. Import the agents to watsonx Orchestrate
Import the following two agents into watsonx Orchestrate to demonstrate the usage of Groq models and Anthropic models.
oic_cost_insights_supervisor_agent (Supervisor Agent): This agent orchestrates the complete cost insight workflow. The agent uses the GPT OSS 120B model running on Groq LPU infrastructure, which is integrated into watsonx Orchestrate, to perform real-time reasoning, aggregate insights from collaborator agents, and generate concise executive summaries on cost trends, cost drivers, and cost anomalies.
oic_cost_inflation_analysis_agent (Collaborator Agent): This agent specializes in cost inflation diagnostics using the Claude Sonnet 4.5 for detailed inferencing and contextual analysis. The agent examines historical and regional price fluctuations, identifies abnormal inflation patterns, and explains the contributing factors in natural language for downstream summarization.
Import both agents using the watsonx Orchestrate ADK, as shown in the sketch that follows. Import the collaborator agent first because the supervisor agent references it as its collaborator.
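A sketch of the imports (the YAML file names are assumptions derived from the agent names above):
# Import the collaborator agent first, then the supervisor that references it
orchestrate agents import -f oic_cost_inflation_analysis_agent.yaml
orchestrate agents import -f oic_cost_insights_supervisor_agent.yaml
# Confirm that both agents appear in the catalog
orchestrate agents list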
Log in to the watsonx Orchestrate user interface to view the recently imported agents.
Next, test and validate the imported agents.
Review the configuration of the collaborator agent in the watsonx Orchestrate user interface.
a. Click oic_cost_inflation_analysis_agent in the Agent Catalog.
b. Validate each section of the agent. You can update or modify the agent instructions as needed.
c. Add the following Quick Starter prompts:
Show the price history for Netflix, Disney+, and HBO subscriptions.
Compare Disney+ price changes with changes in consumer sentiment.
d. On the right-side panel, you can preview the changes, including the starter prompts that you added.
e. Validate that the model selected at the top of the page is Claude Sonnet 4.5, which you imported in the previous steps.
Note: A legal notice about the usage of third-party LLMs is shown. No action is required because this notice is for your information only.
f. Scroll down to the Channels section and toggle the Home Page option to Off. This agent functions as a collaborator agent and should not appear in the chat interface.
g. Test the agent by entering the instruction, Compare Disney+ price changes with changes in consumer sentiment, in the chat window.
h. Review the response generated by the agent. After verification, click Deploy.
Review the configuration of the supervisor agent in the watsonx Orchestrate user interface.
a. Search for the oic_cost_insight_agent supervisor agent in the Agent Catalog (Manage Agents → Search). Click the agent entry to open it in the Agent Builder.
b. Update the configuration to use the model deployed on Groq for inference.
i. Validate the AI Model field. It should display Groq OpenAI GPT OSS 120B in the menu. This model is used for inference for all incoming conversations.
ii. Scroll down to the Toolset section. You should see the oic_granite_summary_tool mapped as a tool and the oic_cost_inflation_analysis_agent mapped as a collaborator for this agent.
iii. Update the starter prompts. These prompts will be used for the final testing of the agent.
iv. In the Channels section, ensure that the Home Screen option is enabled. This setting makes the agent visible on the landing screen in the agent selection list.
v. If you make any configuration changes, deploy the agent. Then, open the chat interface and select the agent from the list.
Step 6. Validate all components in watsonx Orchestrate
Validate the oic_cost_insight_agent supervisor agent to confirm that all components of the end-to-end orchestration execute as expected. You will test how multiple AI models collaborate to deliver actionable insights.
Run test queries
In the chat interface, enter the following test queries in the conversation pane:
What is the net increase in price for the streaming platforms?
What is the net price change for youtube as a streaming platform?
What are the customer sentiments for Netflix and Disney+?
Review the detailed, step-by-step reasoning chain for the query What is the net increase in price for the streaming platforms? to understand how the system processes the request and generates the final response.
a. In the first step, the oic_cost_insight_agent supervisor agent routes the query to its oic_cost_inflation_analysis_agent collaborator agent.
Note: For this query, the agent uses the GPT OSS 120B (via Groq) model to perform deep reasoning and cost-impact analysis. This large-scale model identifies pricing patterns, computes changes across streaming platforms, and generates actionable insights from both structured and unstructured data.
b. In the next step, the oic_cost_inflation_analysis_agent collaborator agent evaluates the tools that are required to process the query. The agent then calls the oic_excel_rag_tool to access the knowledge base, perform reasoning, and generate a detailed response.
Note: The Anthropic Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) model contributes to context comprehension and natural language refinement. For this query, the collaborator agent reuses the response from the initial reasoning step instead of making a new reasoning call. This approach improves processing efficiency and reduces unnecessary model invocations.
c. In the final step, the oic_cost_insight_agent supervisor agent receives the completed response from the oic_cost_inflation_analysis_agent collaborator agent. The supervisor agent then calls the oic_granite_summary_tool to generate a summarized output. These steps are defined in the agent’s behavior and can be customized as needed.
Note: The Granite 4.0 Nano model performs both summarization and sentiment analysis. It condenses complex reasoning outputs into clear, concise summaries and interprets tone, emotion, and user intent in customer feedback, while maintaining high processing speed and cost efficiency.
Note: The oic_granite_summary_tool summarization tool uses the Granite 4.0 Nano model to interpret and summarize customer sentiments in multiple languages. The tool provides a balanced and language-agnostic evaluation of customer feedback. The original dataset contains comments in multiple languages, and the tool automatically handles this multilingual data.
Summary and next steps
In this tutorial, you created a multi-agent system that demonstrates how combining the GPT OSS 120B model running on Groq LPU infrastructure with the Claude Sonnet 4.5 model delivers powerful intelligent workflows. The Granite 4.0 Nano model complements these models by summarizing outputs and generating clear, actionable insights.
Together, these models illustrate the synergy between high-speed inference, advanced reasoning, and collaborative agent workflows in an agentic AI system.