
Transforming Industries and the Critical Role of LLM Observability: How to use Elastic's LLM integrations in real-world scenarios

This blog explores four industry-specific use cases that use Large Language Models (LLMs) and highlights how Elastic's LLM observability integrations provide insights into cost, performance, reliability, and the prompt and response exchanges with the LLM.


In today's tech-centric world, Large Language Models (LLMs) are transforming sectors from finance and healthcare to research, and they are starting to underpin products and services across the spectrum. Take, for example, the recent coding advances in Google's Gemini 2.5, which can use its reasoning capabilities to create a video game by producing executable code from a short prompt. Or new ways to interact with Amazon's Alexa: send it a picture of a live music schedule and have it add the details to your calendar. And let's not forget Microsoft's personalization of Copilot, which remembers what you talk about, learning your likes and dislikes and details about your life: the name of your dog, that tricky project at work, what keeps you motivated to stick to your new workout routine.

Despite their widespread utility, deploying these sophisticated tools in real-world scenarios poses distinct challenges, especially in managing their complex behaviors. For users such as Site Reliability Engineers (SREs), DevOps teams, and AI/ML engineers, ensuring the reliability, performance, and compliance of these models introduces an additional layer of complexity. This is where the concept of LLM observability becomes essential. It offers crucial insights into the performance of these models, ensuring that these advanced AI systems operate both effectively and ethically.

Why LLM Observability Matters and How Elastic Makes It Easy

LLMs are not just another piece of software; they are sophisticated systems with human-like capabilities such as text generation, comprehension, and even coding. But with great power comes a greater need for oversight. The opaque nature of these models can obscure how decisions are made and content is generated, which makes it all the more critical to implement robust observability to monitor and troubleshoot issues such as hallucinations, inappropriate content, cost overruns, errors, and performance degradation. By monitoring these models closely, we can safeguard against unexpected outcomes and maintain user trust.

Real-World Scenarios

Let's explore real-world scenarios where companies leverage LLM-powered applications to enhance productivity and user experience, and how Elastic's LLM observability solutions monitor critical aspects of these models.

1. Generative AI for Customer Support

Companies are increasingly leveraging LLMs and generative AI to enhance customer support, using platforms like Google Vertex AI to host these models efficiently. With the introduction of advanced AI models such as Google's Gemini, which is integrated into Vertex AI, businesses can deploy sophisticated chatbots that manage customer inquiries, from basic questions to complex issues, in real time. These AI systems understand and respond in natural language, offering instant support for issues such as product troubleshooting or managing orders, thus reducing wait times. They also learn from each interaction to continuously improve accuracy. This boosts customer satisfaction and allows human agents to focus on complex tasks, enhancing overall efficiency. AI tools can further empower customer care agents with real-time analytics, sentiment detection, and conversation summarization.

To support use cases like the AI-powered customer support described above, Elastic recently launched LLM observability integrations, including support for LLMs hosted on GCP Vertex AI. Customers who wish to monitor foundation models such as Gemini and Imagen hosted on Google Vertex AI can use Elastic's Vertex AI integration to get a deeper understanding of model behavior and performance, and to ensure that their AI-driven tools are not only effective but also reliable. Customers get an out-of-the-box experience, ingesting a curated set of metrics from Vertex AI along with a pre-configured dashboard.

By continuously tracking these metrics, customers can proactively manage their AI resources, optimize operations, and ultimately enhance the overall customer experience.

Let's look at some of the metrics you get from the Google Vertex AI integration that are helpful in the context of using generative AI for customer support; a short query sketch follows the list.

  1. Prediction Latency: Measures the time taken to complete predictions, critical for real-time customer interactions.
  2. Error Rate: Tracks errors in predictions, which is vital for maintaining the accuracy and reliability of AI-driven customer support.
  3. Prediction Count: Counts the number of predictions made, helping assess the scale of AI usage in customer interactions.
  4. Model Usage: Tracks how frequently the AI models are accessed by both virtual assistants and customer support tools.
  5. Total Invocations: Measures the total number of times the AI services are used, providing insights into user engagement and dependency on these tools.
  6. CPU and Memory Utilization: By observing CPU and memory usage, users can optimize resource allocation, ensuring that the AI tools are running efficiently without overloading the system.
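To make these metrics concrete, here is a minimal sketch of querying the ingested Vertex AI metrics from Elasticsearch with the Python client, for example to check average prediction latency and prediction volume over the last hour. The data stream pattern and field names are illustrative assumptions; confirm the exact names against the integration's fields reference and OOTB dashboard.

```python
# Minimal sketch: aggregate Vertex AI metrics ingested by the integration.
# The data stream pattern and field names are assumptions -- verify them
# against the integration's documentation and pre-configured dashboard.
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")

resp = es.search(
    index="metrics-gcp.vertexai*",  # assumed data stream pattern
    size=0,
    query={"range": {"@timestamp": {"gte": "now-1h"}}},
    aggs={
        "avg_prediction_latency": {"avg": {"field": "gcp.vertexai.prediction.latency"}},  # assumed field
        "prediction_count": {"sum": {"field": "gcp.vertexai.prediction.count"}},  # assumed field
    },
)

print("Avg prediction latency:", resp["aggregations"]["avg_prediction_latency"]["value"])
print("Predictions in the last hour:", resp["aggregations"]["prediction_count"]["value"])
```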

To learn more about how Elastic's Google Vertex AI integration can augment your LLM observability, have a quick read of this blog.

2. Transforming Healthcare with Generative AI

The healthcare industry is embracing generative AI to enhance patient interactions and streamline operational workflows. By leveraging platforms like Amazon Bedrock, healthcare organizations deploy advanced large language models (LLMs) to power tools that convert doctor-patient conversations into structured medical notes, reducing administrative overhead and allowing clinicians to prioritize diagnosis and treatment. These AI-driven solutions provide real-time insights, enabling informed decision-making and improving patient outcomes. Additionally, patient-facing applications powered by LLMs offer secure access to health records, empowering individuals to manage their care proactively.

Robust observability is essential to maintain the reliability and performance of these generative AI applications in healthcare. Elastic's Amazon Bedrock integration equips providers with tools to monitor LLM behavior, capturing critical metrics like invocation latency, error rates, token usage, and guardrail invocations. Pre-configured dashboards provide visibility into prompt and completion text, enabling teams to verify the accuracy of AI-generated outputs, such as medical notes, and detect issues like hallucinations.

Additionally, customers who configure Guardrails for Amazon Bedrock to filter harmful content like hate speech, personal insults, and other inappropriate topics, can use the Bedrock Integration to observe the prompts and responses that caused the guardrail to filter them out. This helps application developers take proactive actions to maintain a safe and positive user experience.
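To illustrate how guardrails come into play on the application side (outside of Elastic itself), here is a rough sketch of invoking a Bedrock-hosted model with a guardrail attached using boto3's Converse API. The guardrail identifier is a placeholder and the model ID is only an example; parameter names may differ across SDK versions, so treat this as a sketch rather than a reference implementation.

```python
# Rough sketch: call a Bedrock model with a guardrail attached so that harmful
# prompts and responses are filtered and traced. IDs below are placeholders.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    messages=[{"role": "user", "content": [{"text": "Summarize my visit notes."}]}],
    guardrailConfig={
        "guardrailIdentifier": "YOUR_GUARDRAIL_ID",  # placeholder guardrail ID
        "guardrailVersion": "1",
        "trace": "enabled",  # emit guardrail trace details for observability
    },
)

# If the guardrail intervenes, the stop reason reflects it (e.g. "guardrail_intervened").
print(response.get("stopReason"))
print(response["output"]["message"]["content"][0]["text"])
```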

Some of the logs and metrics that can be helpful for customers using LLMs hosted on Amazon Bedrock include the following:

  1. Invocation Details: The integration records invocation latency, count, and throttles. These metrics are critical for ensuring that generative AI models respond quickly and accurately to patient queries or appointment scheduling tasks, maintaining a seamless user experience.
  2. Error Rates: Tracking error rates ensures that AI tools, such as patient query assistants or appointment systems, consistently deliver accurate and reliable results. By identifying and addressing issues early, healthcare providers can maintain trust in AI systems and prevent disruptions in critical patient interactions.
  3. Token Usage: In healthcare, tracking token usage helps identify resource-intensive queries, such as detailed patient record summaries or complex symptom analyses, ensuring efficient model operation. By monitoring token usage, healthcare providers can optimize costs for AI-powered tools while maintaining scalability to handle growing patient interactions.
  4. Prompt and Completion Text: Capturing prompt and completion text allows healthcare providers to analyze how AI models respond to specific patient queries or administrative tasks, ensuring meaningful and contextually accurate interactions. This insight helps refine prompts to improve the AI's understanding and ensures that generated responses, such as appointment details or treatment explanations, meet the quality standards expected in healthcare.
  5. Prompt and response where guardrails intervened: Being able to track requests and responses that were deemed inappropriate by guardrails helps healthcare providers monitor what information patients are asking for. With this information users can make continuous adjustments to the LLMs to ensure appropriate responses, balancing flexibility and rich communication on the one hand, and on the other, privacy protection, hallucination prevention, and harmful content filtering.

Amazon Bedrock Guardrails OOTB dashboard
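Beyond the OOTB dashboard, the same ingested documents can be queried directly. Below is a minimal sketch that pulls recent prompts and completions where a guardrail intervened; the data stream, field names, and the intervention value are illustrative assumptions, so check them against the fields backing the dashboard.

```python
# Minimal sketch: list recent Bedrock invocations where a guardrail intervened.
# Data stream, field names, and values are assumptions -- adjust to the integration's actual fields.
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")

resp = es.search(
    index="logs-aws_bedrock.invocation*",  # assumed data stream
    size=20,
    query={
        "bool": {
            "filter": [
                {"range": {"@timestamp": {"gte": "now-24h"}}},
                {"term": {"gen_ai.guardrail.action": "INTERVENED"}},  # assumed field and value
            ]
        }
    },
    fields=["gen_ai.prompt", "gen_ai.completion"],  # assumed prompt/response fields
    source=False,
    sort=[{"@timestamp": "desc"}],
)

for hit in resp["hits"]["hits"]:
    flds = hit.get("fields", {})
    print(flds.get("gen_ai.prompt"), "->", flds.get("gen_ai.completion"))
```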

To learn about the Amazon Bedrock Integration, read this blog. To dive deeper into how the integration can help with observability of Guardrails for Amazon Bedrock, take a look at this blog.

3. Enhancing Telco Efficiency with GenAI

The telecommunications industry can leverage services like Azure OpenAI to transform customer interactions, optimize operations, and enhance service delivery. By integrating advanced generative AI models, telcos can offer highly personalized and responsive customer experiences across multiple channels. AI-powered virtual assistants streamline customer support by automating routine queries and providing accurate, context-aware responses, reducing the workload on human agents and enabling them to focus on complex issues while improving efficiency and satisfaction. Additionally, AI-driven insights help telcos understand customer preferences, anticipate needs, and deliver tailored offerings that boost customer loyalty. Operationally, LLM services such as Azure OpenAI enhance internal processes by enabling smarter knowledge management and faster access to critical information.

Elastic's LLM observability integrations like the Azure OpenAI integration can provide visibility into AI performance and costs, empowering telecom providers to make data-driven decisions and enhance customer engagement. It can help optimize resource allocation by analyzing call patterns, predicting service demands, and identifying trends, enabling telcos to scale their AI operations efficiently while maintaining high service quality.

Some of the key metrics and logs from the Azure OpenAI integration that can provide insights are:

  1. Error Counts: It provides critical insights into failed requests and incomplete transactions, enabling telecom providers to proactively identify and resolve issues in AI-powered applications.
  2. Prompt Input and Completion Text: This captures the input queries provided to AI systems and the corresponding AI-generated outputs. These fields allow telecom providers to analyze customer queries, monitor response quality, and refine AI training datasets to improve relevance and accuracy.
  3. Response Latency: It measures the time taken by AI models to generate responses, ensuring that virtual assistants and automated systems deliver quick and efficient replies to customer queries.
  4. Token Usage: It tracks the number of input and output tokens processed by the AI model, offering insights into resource consumption and cost efficiency. This data helps telecom providers monitor AI usage patterns, optimize configurations, and scale resources effectively.
  5. Content Filter Results: In Azure OpenAI, this plays a crucial role in handling sensitive inputs provided by customers, ensuring compliance, safety, and responsible AI usage. This feature identifies and flags potentially inappropriate or harmful queries and responses in real time, enabling telecom providers to address sensitive topics with care and accuracy.

The Azure OpenAI content filtering OOTB dashboard
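As with the other integrations, the content filter results behind this dashboard can also be explored ad hoc. Here is a minimal sketch that counts how often each filter category has fired over the past week; the index pattern and field name are assumptions to be checked against the integration's actual fields.

```python
# Minimal sketch: break down Azure OpenAI content-filter hits by category.
# Index pattern and field name are assumptions -- verify against the integration.
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="YOUR_API_KEY")

resp = es.search(
    index="logs-azure_openai*",  # assumed data stream
    size=0,
    query={"range": {"@timestamp": {"gte": "now-7d"}}},
    aggs={
        "filter_categories": {
            "terms": {"field": "azure_openai.content_filter.category"}  # assumed keyword field
        }
    },
)

for bucket in resp["aggregations"]["filter_categories"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```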

You can learn more about Elastic's Azure OpenAI integration from these two blogs - Part 1 and Part 2.

4. OpenAI Integration for Generative AI Applications

As AI-powered solutions become integral to modern workflows, OpenAI's sophisticated models, including language models like GPT-4o and GPT-3.5 Turbo, image generation models like DALL·E, and audio processing models like Whisper, drive innovation across applications such as virtual assistants, content creation, and speech-to-text systems. With growing complexity and scale, ensuring these models perform reliably, remain cost-efficient, and adhere to ethical guidelines is paramount. Elastic's OpenAI integration provides a robust solution, offering deep visibility into model behavior to support seamless and responsible AI deployments.

By tapping into the OpenAI Usage API, Elastic's integration delivers actionable insights through intuitive, pre-configured dashboards, enabling Site Reliability Engineers (SREs) and DevOps teams to monitor performance and optimize resource usage across OpenAI's diverse model portfolio. This unified observability approach empowers organizations to track critical metrics, identify inefficiencies, and maintain high-quality AI-driven experiences. The following key metrics from Elastic's OpenAI integration help organizations achieve effective oversight:

  1. Request Latency: Measures the time taken for OpenAI models to process requests, ensuring responsive performance for real-time applications like chatbots or transcription services.
  2. Invocation Rates: Tracks the frequency of API calls across models, providing insights into usage patterns and helping identify high-demand workloads.
  3. Token Usage: Monitors input and output tokens (e.g., prompt, completion, cached tokens) to optimize costs and fine-tune prompts for efficient resource consumption.
  4. Error Counts: Captures failed requests or incomplete transactions, enabling proactive issue resolution to maintain application reliability.
  5. Image Generation Metrics: Tracks invocation rates and output dimensions for models like DALL·E, helping assess costs and usage trends in image-based applications.
  6. Audio Transcription Metrics: Monitors invocation rates and transcribed seconds for audio models like Whisper, supporting cost optimization in speech-to-text workflows.
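For a sense of what the integration polls under the hood, here is a rough sketch of calling the OpenAI Usage API directly to pull hourly token counts for the last day. The endpoint path, parameters, and response shape shown here are assumptions; check OpenAI's API reference, and note that the usage endpoints typically require an admin-scoped key.

```python
# Rough sketch: pull hourly completion-token usage from the OpenAI Usage API,
# the same source the integration polls. Endpoint, params, and response shape are assumptions.
import os
import time

import requests

headers = {"Authorization": f"Bearer {os.environ['OPENAI_ADMIN_KEY']}"}
params = {
    "start_time": int(time.time()) - 24 * 3600,  # last 24 hours (unix seconds)
    "bucket_width": "1h",  # assumed supported bucket size
}

resp = requests.get(
    "https://api.openai.com/v1/organization/usage/completions",  # assumed endpoint
    headers=headers,
    params=params,
    timeout=30,
)
resp.raise_for_status()

for bucket in resp.json().get("data", []):
    for result in bucket.get("results", []):
        print(bucket.get("start_time"), result.get("input_tokens"), result.get("output_tokens"))
```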

To learn more about Elastic's OpenAI integration, read this blog.

Actionable LLM Observability

Elastic's LLM observability integrations empower users to take proactive control of their AI operations through actionable insights and real-time alerts. For instance, by setting a predefined threshold for token count, Elastic can trigger automated alerts when usage exceeds this limit, notifying Site Reliability Engineers (SREs) or DevOps teams via email, Slack, or other preferred channels. This ensures prompt awareness of potential cost overruns or resource-intensive queries, enabling teams to adjust model configurations or scale resources swiftly to maintain operational efficiency.

In the example below, the rule is set to alert the user if token_count crosses a threshold of 500.
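The same rule can also be created programmatically. Below is a minimal sketch using Kibana's alerting API with the built-in index threshold rule type; the data stream and token-count field are illustrative assumptions, and in practice you would attach an email or Slack connector under actions.

```python
# Minimal sketch: create a token-count threshold rule via Kibana's alerting API.
# Index pattern and field name are assumptions -- the same rule can be built in the Kibana UI.
import requests

KIBANA_URL = "https://localhost:5601"
HEADERS = {"kbn-xsrf": "true", "Authorization": "ApiKey YOUR_API_KEY"}

rule = {
    "name": "LLM token count above 500",
    "rule_type_id": ".index-threshold",  # built-in threshold rule type
    "consumer": "alerts",
    "schedule": {"interval": "1m"},
    "params": {
        "index": ["logs-aws_bedrock.invocation*"],  # assumed data stream to watch
        "timeField": "@timestamp",
        "aggType": "max",
        "aggField": "token_count",  # token-count field referenced in the example above
        "groupBy": "all",
        "timeWindowSize": 5,
        "timeWindowUnit": "m",
        "thresholdComparator": ">",
        "threshold": [500],
    },
    "actions": [],  # attach email/Slack connectors here
}

resp = requests.post(f"{KIBANA_URL}/api/alerting/rule", json=rule, headers=HEADERS, timeout=30)
resp.raise_for_status()
print("Created rule:", resp.json().get("id"))
```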

The alert is triggered when the token count exceeds this threshold.

Another example is tracking invocation spikes, such as when the number of predictions or API calls surpasses a defined Service Level Objective (SLO). For example, if a Bedrock AI-hosted model experiences a sudden surge in invocations due to increased customer interactions, Elastic can alert teams to investigate potential anomalies or scale infrastructure accordingly. These proactive measures help maintain the reliability and cost-effectiveness of LLM-powered applications.

By providing pre-configured dashboards and customizable alerts, Elastic ensures that organizations can respond to critical events in real time, keeping their AI systems aligned with cost and performance goals as well as standards for content safety and reliability.

Conclusion

LLMs are transforming industries, but their complexity requires effective observability and oversight to ensure their reliability and safe use. Elastic's LLM observability integrations provide a comprehensive solution, empowering businesses to monitor performance, manage resources, and address challenges like hallucinations and content safety. As LLMs become increasingly integral to various sectors, robust observability tools like those offered by Elastic ensure that these AI-driven innovations remain dependable, cost-effective, and aligned with ethical and safety standards.
