
LLM Observability with Elastic’s Azure AI Foundry Integration

Gain comprehensive visibility into your generative AI workloads on Azure AI Foundry. Monitor token usage, latency, and cost, while leveraging built-in content filters to ensure safe and compliant application behavior, all with out-of-the-box observability powered by Elastic.


Introduction

As organizations increasingly adopt LLMs for AI-powered applications such as content creation, Retrieval-Augmented Generation (RAG), and data analysis, SREs and developers face new challenges. Tasks like monitoring workflows, analyzing input and output, managing query latency, and controlling costs become critical. LLM Observability helps address these issues by providing clear insights into how these models perform, allowing teams to quickly identify bottlenecks, optimize configurations, and improve reliability. With better observability, SREs can confidently scale LLM applications, especially on platforms like Azure AI Foundry, while minimizing downtime and keeping costs in check.

Elastic is expanding its support for LLM observability with Elastic Observability's new Azure AI Foundry integration, now available as a tech preview on Elastic Cloud. The integration provides comprehensive visibility into the performance and usage of foundation models such as GPT-4, Mistral, and Llama, along with thousands of others from leading AI companies and from Azure, available through Azure AI Foundry. It offers an out-of-the-box experience by simplifying the collection of metrics and logs, making it easier to gain actionable insights and manage your models effectively. The integration is simple to set up and ships with pre-built dashboards. With real-time insights, SREs can now monitor, optimize, and troubleshoot LLM applications that use Azure AI Foundry.

This blog walks through the features available to SREs, such as monitoring invocations, errors, and latency across various models, along with the usage and performance of LLM requests. It also shows how easy the integration is to set up and what insights you can gain from Elastic for LLM observability.

Prerequisites

To get started with the Azure AI Foundry integration, you will need:

  • An account on Elastic Cloud and a deployed stack in Azure (see instructions here). Ensure you are using version 9.0.0 or higher.
  • An Azure account with permissions to pull the necessary data from Azure and Azure AI Foundry. See details in our documentation.

Configuring Azure AI Foundry Integration

To collect logs and metrics from Azure AI Foundry, configure Azure metrics and logs collection as described in the following links:

  • Configure to receive Azure Metrics - the integration collects Azure AI Foundry metrics directly from the service. Ensure you have the client ID, subscription ID, and tenant ID from Azure AI Foundry on hand to collect metrics.

  • Configure to receive Azure Logs - in particular, configure an Azure Event Hub so that Elastic can ingest logs. You will need the Event Hub details to configure the logs section of the Azure AI Foundry integration. Once both inputs are set up, you can confirm that data is flowing with the verification sketch below.
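
A quick way to confirm that data is arriving is to query the data streams directly. The snippet below is a minimal verification sketch using the official Elasticsearch Python client; the endpoint, API key, and the `metrics-azure*`/`logs-azure*` index patterns are assumptions you should adjust to your own deployment.

```python
# Minimal verification sketch, assuming the integration writes into data
# streams matching "metrics-azure*" and "logs-azure*" and that your API key
# has read access. Endpoint and key are placeholders.
from elasticsearch import Elasticsearch

es = Elasticsearch(
    "https://my-deployment.es.eastus.azure.elastic-cloud.com",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

# Count Azure metric documents ingested in the last 15 minutes.
metrics = es.count(
    index="metrics-azure*",
    query={"range": {"@timestamp": {"gte": "now-15m"}}},
)
print(f"Metric docs (last 15m): {metrics['count']}")

# Fetch a few recent log entries to confirm Event Hub logs are flowing.
logs = es.search(
    index="logs-azure*",
    size=3,
    sort=[{"@timestamp": "desc"}],
    query={"range": {"@timestamp": {"gte": "now-15m"}}},
)
for hit in logs["hits"]["hits"]:
    print(hit["_source"].get("@timestamp"), hit["_index"])
```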

Maximize Visibility with Out-of-the-Box Dashboards

The Azure AI Foundry integration offers rich out-of-the-box visibility into the performance and usage of models in Azure AI Foundry, including text and image models. Several dashboards are currently available, with more coming as the integration moves toward general availability (GA).

  • The Azure AI Foundry Overview dashboard provides a summarized view of invocations, errors, and latency across various models.
  • The Azure AI Foundry Billing dashboard provides total costs and daily usage costs from Azure Cognitive Services.
  • The Azure AI Foundry Advanced Monitoring dashboard focuses on logs generated by the Azure AI Foundry service when connected through the API Management service, and shows request rate, error rate, model usage, latency, LLM prompt input, and response completion.

Each dashboard provides specific insights important to SREs. Here is a quick overview of some of these insights:

  • Model Usage and Token Trends – Visualize token consumption and completion counts by model, endpoint, and time window.

  • Latency Metrics – Monitor average and percentile latency per prompt and per endpoint, and correlate with prompt types or user IDs (see the query sketch after this list).

  • Cost Estimation – Estimate API usage cost based on token consumption and model pricing (a worked example follows this list).

  • Prompt/Completion Logging – View prompt-response pairs for debugging and quality monitoring.

  • Content Filtering and Guardrails – See which prompts or completions are being filtered, and why.

You can drill into specific users or sessions, slice by model type or region, and export reports for usage reviews or compliance.
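
Two of the insights above lend themselves to quick worked examples. First, percentile latency: the query below is a minimal sketch of computing p50/p95/p99 latency per model with the Elasticsearch Python client. The field names (`azure.dimensions.model_name` and the latency metric field) are assumptions; check the integration's exported fields for the exact names in your deployment.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(
    "https://my-deployment.es.eastus.azure.elastic-cloud.com",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

resp = es.search(
    index="metrics-azure*",
    size=0,  # aggregations only, no raw hits
    query={"range": {"@timestamp": {"gte": "now-24h"}}},
    aggs={
        "by_model": {
            # Assumed dimension field; verify against the integration's field list.
            "terms": {"field": "azure.dimensions.model_name"},
            "aggs": {
                "latency_pcts": {
                    "percentiles": {
                        "field": "azure.ai_foundry.latency",  # assumed metric field
                        "percents": [50, 95, 99],
                    }
                }
            },
        }
    },
)
for bucket in resp["aggregations"]["by_model"]["buckets"]:
    print(bucket["key"], bucket["latency_pcts"]["values"])
```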
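
Second, cost estimation. Azure bills per token, so a per-1K-token calculation yields a usable estimate. The prices below are hypothetical placeholders, not Azure's actual rates; substitute the pricing for your models and region.

```python
# Hypothetical per-1K-token prices (USD); substitute your actual Azure pricing.
PRICES_PER_1K = {
    "gpt-4": {"input": 0.03, "output": 0.06},
    "mistral-large": {"input": 0.004, "output": 0.012},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD from token counts and per-1K-token prices."""
    price = PRICES_PER_1K[model]
    return (input_tokens / 1000) * price["input"] + (output_tokens / 1000) * price["output"]

# Example: 12,000 prompt tokens and 3,500 completion tokens on gpt-4:
# 12 x 0.03 + 3.5 x 0.06 = 0.36 + 0.21 = 0.57 USD
print(f"${estimate_cost('gpt-4', 12_000, 3_500):.2f}")  # -> $0.57
```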


Try it out today

The Azure AI Foundry integration is currently available in Elastic Cloud (both serverless and hosted options). Start a 7-day trial by signing up for Elastic Cloud directly or through Azure Marketplace. Alternatively, you can deploy a cluster on our Elasticsearch Service, download the Elastic Stack, or run Elastic from Azure Marketplace. Then spin up the new technical preview of the Azure AI Foundry integration, open the curated dashboards in Kibana, and start monitoring your Azure AI Foundry service!
