Introducing watsonx.ai on Microsoft Azure
IBM Developer

IBM watsonx.ai can now run on Azure as a customer-managed solution

By Maryam Ashoori, Amiyo Basak, Leon Harris, Amit Banik

Artificial intelligence (AI) is revolutionizing industries by enabling advanced analytics, automation, and personalized experiences. Enterprises have reported a 30% productivity gain in application modernization after implementing generative AI workloads. Organizations are taking an intentional design approach to hybrid cloud and AI to realize desired returns on their cloud investments. They're focusing on business outcomes to drive technology decisions and enable adoption of generative AI to significantly increase business value, productivity, and cost reductions, thus accelerating business growth.

For IBM and Microsoft, it’s about client choice: providing the joint solutions companies seek to improve costs, productivity, and resilience by accelerating their hybrid cloud and AI journeys. We know a hybrid cloud approach brings an average of 2.5 times higher ROI versus public cloud alone. And trailblazers investing in AI are seeing a revenue uplift of 3 to 15 percent and a sales ROI uplift of 10 to 20 percent.

"Achieving success as a modern enterprise takes the right type of partner, and approach. Together IBM and Microsoft have a proven track record of delivering meaningful innovation and industry firsts through the combination of our aligned technology portfolios and consulting services."
– IBM and Microsoft

IBM defines a foundation model as an AI model that can be adapted to a wide range of downstream tasks. An added benefit is the ability to fine-tune the model for a specific use case, often with increased accuracy.

Organizations are accelerating their adoption and use of foundation models and leveraging them for a wide range of purposes, including:

  • Content summarization
  • Extraction
  • Classification
  • Content generation (including text, images, video, code, or molecules)
  • Sophisticated question answering
  • Software code generation

At IBM THINK 2024, we announced that IBM watsonx.ai is now supported by IBM to run on Microsoft Azure and available to purchase through IBM and our business partner ecosystem as a customer-managed solution on Azure Red Hat OpenShift (ARO). Organizations are also able to transact on Azure Marketplace to procure watsonx.ai to build AI applications in an enterprise-secure, trusted environment. Organizations can leverage the power of IBM’s hybrid cloud approach to run watsonx.ai anywhere via ARO and complement with Azure native services to meet business goals and objectives.

Open, trusted, governed, and targeted

We’re seeing organizations transform their business operations with foundation models and generative AI across departments from IT to HR, sales, and marketing. As they ramp up, they are looking to use open source frameworks, AI models from multiple providers, and integrated tools that bring AI builders together to create, train, and tune foundation models for their business with minimal data. From there, they want to effectively manage and monitor the AI lifecycle, from model development to deployment, in one place.

As clients move from exploration and investigation to production and scale with generative AI, they are looking for the right model choices, a robust platform to infuse AI into applications, and a reliable partner who can help scale and operationalize AI with minimal risk. IBM has four core principles describing the vital characteristics of generative AI and foundation models used for business. Specifically, they should be:

  • Open: IBM watsonx.ai is a hybrid cloud-native studio built on Red Hat OpenShift. This allows watsonx.ai to integrate easily across environments. IBM is also committed to open innovation, supporting and contributing to open source communities, including Hugging Face.
  • Trusted: IBM has a long history of building secure AI and data platforms. This focus is carried over to watsonx.ai to ensure IBM’s suite of foundation models – the Granite model series – are built on the principles of trust and transparency, making them ready for business applications. IBM’s Granite models are trained in accordance with the IBM AI Ethics code and integrated governance approach. Further, clients benefit from IP indemnification for the Granite models.
  • Targeted: watsonx.ai is designed for targeted business use cases to unlock new value for clients. For instance, the Granite models are trained on domain-specific, enterprise-relevant data and perform on par with models 3-5x larger on accuracy measures, at lower latencies.
  • Empowering: Generative AI is not just about foundation models. It must also provide a platform that empowers enterprises to bring their own data to tune, train, and deploy generative AI models. As such, watsonx.ai is designed for AI value creators, not just users. IBM empowers AI builders to scale generative AI faster with tools for training, validating, and tuning models, plus bring-your-own-model (BYOM) support.

IBM watsonx.ai is an enterprise-ready studio for AI builders to train, tune and deploy trusted models and AI applications at scale. Part of the IBM watsonx AI and data platform, watsonx.ai brings together new generative AI capabilities powered by foundation models and traditional machine learning (ML) into a powerful studio spanning the AI lifecycle. With watsonx.ai, organizations get access to multi-model and multilingual model variety, with the added flexibility to integrate and deploy your AI workloads across hybrid, multi-cloud environments.

Our Granite models are being developed in different sizes to fit unique business needs of organizations. For example, businesses can use the models for:

  • Summarization: Condense a lengthy piece of text into a shorter, coherent version while retaining its essential information and meaning.
  • Insight extraction and classification: Identify and categorize valuable information or knowledge from unstructured data, often in the form of text. Example: Sentiment determination.
  • Retrieval-Augmented Generation (RAG): Combining retrieval of information from a knowledge base with text generation to provide context-aware, high-quality responses. For example, an HR chatbot to inquire about maternity leave.

IBM’s standard intellectual property protection, similar to what it provides for hardware and software products, applies to IBM-developed watsonx models. IBM provides an IP indemnity (contractual protection) for its foundation models, so clients can be more confident AI creators when using their own data, which is the source of competitive advantage in generative AI. Organizations can develop AI applications using their own data along with the client protections, accuracy, and trust afforded by IBM foundation models. Here is a selection of the foundation models currently available in the watsonx.ai library:

IBM watsonx.ai library foundation models

Architecture on Microsoft Azure

Clients are able to deploy watsonx.ai as customer-managed software deployed onto Red Hat OpenShift on Microsoft Azure. Clients can choose either a self-managed instance, OpenShift Container Platform (OCP), or a managed instance, Azure Red Hat OpenShift (ARO). Licenses can be purchased directly from an IBM seller or Business Partner. The self-managed OCP option includes OpenShift licenses. Instructions on how to configure and deploy onto either option are available on request.

Figure 1: watsonx.ai reference architecture on Azure

The OpenShift cluster for both managed and self-managed options runs on Microsoft Azure virtual machines with Red Hat CoreOS. The virtual machines are split into different sets. The control plane manages cluster functions such as workload placement, scaling, and availability. The IBM Fusion plane hosts the persistent storage functions and includes managed disks for the storage cluster. The worker nodes run the watsonx.ai components outside of the foundation models, such as the IBM Cloud Pak foundation services. Lastly, the GPU VMs are worker nodes specifically configured to run the foundation models. These can include the Standard_NC24ads_A100_v4, a 24-vCPU, 220 GiB memory virtual machine with a single NVIDIA A100 GPU, which can host models such as IBM’s Granite 13B model, or the Standard_NC96ads_A100_v4 virtual machine with 96 vCPUs, 1.9 TB of memory, and four NVIDIA A100 80 GB GPUs, able to handle the Llama 3 70B model.

The following diagram shows a typical infrastructure deployment on Microsoft Azure. The quantity and size of virtual machines, together with the external connectivity approach and database connectors, will vary depending on specific requirements.

Figure 2: Typical watsonx.ai for Azure infrastructure architecture

IBM watsonx.ai on Azure reference architecture

Enterprises are choosing to deploy watsonx.ai on Red Hat OpenShift to accelerate generative AI and ML use cases, including knowledge management, content creation, insight extraction, and AI forecasting, with a consistent, streamlined, and automated experience for handling the workload and performance demands of AI projects across the hybrid cloud. As a turnkey solution, Azure Red Hat OpenShift (ARO) further accelerates a customer’s time to value by delivering an Azure-native, comprehensive application platform that is jointly engineered, operated, and supported by Red Hat and Microsoft. With integrated tools and services out of the box on a fully managed platform, and a consistent hybrid cloud experience anywhere, organizations can simplify operations, deploy AI applications quickly, and focus on differentiating AI and ML innovations at enterprise scale.

Use cases for watsonx.ai on Azure:

  1. Knowledge management
  2. Extract insights and discover trends
  3. Generate synthetic tabular data
  4. Generate content, technology, and code

Knowledge management

Enhance accuracy by using RAG (retrieval-augmented generation) for contextual information guidance to analyze multiple documents and data inputs, provide effective responses based on real-time information feeds, and improve documentation quality with feeds from internal or external knowledge bases. This use case can be applied in a number of scenarios, such as using AI to help users navigate a user interface, suggesting common follow-up actions, and helping users execute multi-step tasks.

You can expand this method of integrating context into prompts by leveraging information from a knowledge base. The retrieval-augmented generation pattern comprises three fundamental steps:

  1. Search for relevant content in your knowledge base: Identify and retrieve pertinent information from the knowledge base that aligns with the desired context.
  2. Pull the most relevant content into your prompt as context: Integrate the extracted relevant content into the prompt text to provide context for the model.
  3. Send the combined prompt text to the model to generate output: Present the amalgamated prompt text, encompassing both user-provided input and the context from the knowledge base, to the model for producing accurate and contextually informed output.
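The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the watsonx.ai API: the toy `retrieve` function ranks knowledge-base passages by word overlap, and `generate` stands in for a call to a hosted foundation model.

```python
# Minimal sketch of the retrieval-augmented generation (RAG) pattern.
# The knowledge base, retrieval scoring, and generate() stub are
# illustrative stand-ins, not the watsonx.ai API.

KNOWLEDGE_BASE = [
    "Maternity leave: employees receive 16 weeks of paid leave.",
    "Remote work: employees may work remotely up to 3 days per week.",
    "Expenses: claims must be filed within 30 days of purchase.",
]

def retrieve(question: str, docs: list[str], top_k: int = 1) -> list[str]:
    """Step 1: rank documents by naive word overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_prompt(question: str, context: list[str]) -> str:
    """Step 2: pull the most relevant content into the prompt as context."""
    ctx = "\n".join(context)
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{ctx}\n\nQuestion: {question}\nAnswer:")

def generate(prompt: str) -> str:
    """Step 3: placeholder for a call to a hosted foundation model."""
    return "<model output>"

question = "How long is maternity leave?"
prompt = build_prompt(question, retrieve(question, KNOWLEDGE_BASE))
print(prompt)
```

In production the naive overlap scoring is replaced by vector-database similarity search, and `generate` by a real model call, but the three-step shape stays the same.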

This tutorial includes a sample notebook that demonstrates the retrieval-augmented generation pattern, showing step by step how the method can improve the accuracy of the generated output.

Extract insights and discover trends

Analyze large amounts of data to identify and extract insights or facts from documents, reports, customer interactions, or security and IT incidents. Use generative AI to discover patterns, trends, or anomalies within the data. This tutorial walks you through extracting details from a complaint.

While this is a common traditional machine learning use case, a foundation model makes it far easier to achieve. Instead of cleaning and tagging large volumes of data to train a model, a foundation model needs only a small amount of data to extract insights.
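The "small amount of data" point can be made concrete with a few-shot prompt: a handful of labeled examples placed directly in the prompt replaces the large tagged training set a traditional classifier needs. The complaint texts and labels below are illustrative.

```python
# Sketch: few-shot prompt for sentiment classification of customer
# complaints. With a foundation model, a handful of in-prompt examples
# replaces the large labeled dataset a traditional classifier needs.
# The example texts and labels here are illustrative.

FEW_SHOT_EXAMPLES = [
    ("The agent resolved my issue in minutes.", "positive"),
    ("I was charged twice and nobody responded.", "negative"),
]

def build_classification_prompt(text: str) -> str:
    """Assemble an instruction, labeled examples, and the new input."""
    lines = ["Classify the sentiment of each complaint as positive or negative."]
    for example, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Complaint: {example}\nSentiment: {label}")
    lines.append(f"Complaint: {text}\nSentiment:")
    return "\n\n".join(lines)

prompt = build_classification_prompt("My card was blocked without warning.")
print(prompt)
```

The assembled prompt ends with an open `Sentiment:` slot, so the model's completion is the classification; the same template extends to entity extraction by swapping the instruction and examples.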

Generate synthetic tabular data

IBM watsonx.ai includes a Synthetic Data Generator tool that helps you generate synthetic tabular data. The benefit of synthetic data is that you can procure it on demand, customize it to fit your use case, and produce it in large quantities. This tutorial guides you through generating synthetic tabular data based on production data or a custom data schema using visual flows and modeling algorithms.
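A schema-driven generator like the one below captures the idea behind the custom-schema path. This standalone sketch samples random rows from a hand-written schema; the real tool additionally offers visual flows and modeling algorithms that can mimic the statistics of production data.

```python
import random

# Sketch: generate synthetic tabular rows from a custom data schema.
# This stands in for watsonx.ai's Synthetic Data Generator; the schema
# and column names below are illustrative.

SCHEMA = {
    "age": lambda rng: rng.randint(18, 90),
    "income": lambda rng: round(rng.uniform(20_000, 150_000), 2),
    "segment": lambda rng: rng.choice(["retail", "business", "premium"]),
}

def generate_rows(schema: dict, n: int, seed: int = 42) -> list[dict]:
    """Sample n rows, one generator function per column, seeded for
    reproducibility."""
    rng = random.Random(seed)
    return [{col: gen(rng) for col, gen in schema.items()} for _ in range(n)]

rows = generate_rows(SCHEMA, n=5)
for row in rows:
    print(row)
```

Because each column is just a callable, the schema can be extended with correlated or conditional columns without changing the generator.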

Generate content, technology, and code

Create new technology, content, and code through the power of generative AI to support developer and business user productivity across a range of business domains. This tutorial walks you through the steps to create code from a set of instructions.

A common scenario with this use case is to assist developers by suggesting the next section of code for a particular task in a given program language.
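A code-generation request to watsonx.ai is ultimately a text-generation call with coding instructions as the input. The sketch below assembles, but does not send, such a request payload; the endpoint path, field names, and model ID follow IBM's published REST API at the time of writing and should be verified against the current documentation, and the project ID is a placeholder.

```python
import json

# Sketch: assemble (but do not send) a watsonx.ai text-generation
# request that asks the model to write code from instructions.
# Endpoint path, field names, and model ID reflect IBM's published
# REST API at the time of writing; verify against current docs.

ENDPOINT = "https://us-south.ml.cloud.ibm.com/ml/v1/text/generation"

payload = {
    "model_id": "ibm/granite-13b-instruct-v2",  # assumed model ID
    "project_id": "<your-project-id>",          # placeholder
    "input": (
        "Write a Python function that returns the n-th Fibonacci number "
        "using iteration, with a docstring and type hints."
    ),
    "parameters": {
        "decoding_method": "greedy",  # deterministic output for code
        "max_new_tokens": 300,
    },
}

print(json.dumps(payload, indent=2))
```

Sending the payload requires an IAM bearer token in the request headers; greedy decoding is a common choice for code generation because sampling can introduce syntax errors.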

Get started with IBM watsonx.ai on Azure

For more information about watsonx.ai, IBM’s next-generation enterprise studio for AI builders to train, validate, tune and deploy generative AI and ML models on Azure, see the following resources: