In an era where AI-driven applications are becoming ubiquitous, understanding and managing language model usage is crucial. OpenAI has been at the forefront of developing advanced language models that power a multitude of applications, from chatbots to code generation. As applications grow in complexity and scale, however, observing the metrics that ensure optimal performance and cost-effectiveness becomes essential. Performance and reliability monitoring and cost management, in particular, are pivotal for getting the most out of language models.
As organizations adopt OpenAI's diverse AI models, including language models like GPT-4o and GPT-3.5 Turbo, image models like DALL·E, and audio models like Whisper, comprehensive usage monitoring is crucial to track and optimize the performance, reliability, usage, and cost of each model.
Elastic's new OpenAI integration offers a solution to the challenges faced by developers and businesses using these models. It is designed to provide a unified view of your OpenAI usage across all model types.
Key benefits of the OpenAI integration
OpenAI's usage-based pricing model applies across all these services, making it essential to track consumption and identify which models are being used to control costs and optimize deployments. The new OpenAI integration by Elastic utilizes the OpenAI Usage API to track consumption and identify specific models being used. It offers an out-of-the-box experience with pre-built dashboards, simplifying the process of monitoring your usage patterns.
Continue reading to learn about what you will get with the integration. We'll also show you the setup process, how to leverage the pre-built dashboards, and what insights you can gain from Elastic for LLM Observability.
Setting up the OpenAI Integration
Prerequisites
To follow along with this blog, you will need:
- An Elastic cloud account (version 8.16.3 or higher). Alternatively, you can use Elastic Cloud Serverless, a fully managed solution that eliminates infrastructure management, automatically scales based on usage, and lets you focus entirely on extracting value from your data.
- An OpenAI account with an Admin API key.
- Applications that use the OpenAI APIs.
Generating sample OpenAI usage data
If you're new to OpenAI and eager to try this integration, you can quickly set it up and populate your dashboards with sample data. You'll just need to generate some usage by interacting with the OpenAI API. If you don't have an OpenAI API key, you can create one here. For more information on authentication, refer to the OpenAI documentation.
The OpenAI documentation provides detailed examples for each of their API endpoints. Here are direct links to the relevant sections for generating sample usage data:
- Language models (completions): Use the Chat Completions API to generate text. See the examples here.
- Audio models (text-to-speech): Generate audio from text using the Speech API. See the examples here.
- Audio models (speech-to-text): Transcribe audio to text using the Transcriptions API. See the examples here.
- Embeddings: Generate vector representations of text using the Embeddings API. See the examples here.
- Image models: Create images from text prompts using the Image generation API. See the examples here.
- Moderation: Check whether text is potentially harmful using the Moderation API. See the examples here.
There are more endpoints that you can explore to generate sample usage data.
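If you'd rather script this than run the examples by hand, here is a minimal sketch (Python, standard library only) that posts one request each to the Chat Completions and Embeddings endpoints. The model names are illustrative assumptions; your account may have access to different ones.

```python
import json
import os
from urllib import request

# Base URL of the OpenAI REST API; model names below are examples
# and may differ from what your account has access to.
API_BASE = "https://api.openai.com/v1"

def build_request(path, payload, api_key):
    """Build an authenticated JSON POST request for an OpenAI endpoint."""
    return request.Request(
        f"{API_BASE}{path}",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# One payload each for the Chat Completions and Embeddings endpoints
chat_payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}
embedding_payload = {
    "model": "text-embedding-3-small",
    "input": "Elastic OpenAI integration test",
}

# Only send real traffic when an API key is configured in the environment
api_key = os.environ.get("OPENAI_API_KEY")
if api_key:
    for path, payload in [("/chat/completions", chat_payload),
                          ("/embeddings", embedding_payload)]:
        with request.urlopen(build_request(path, payload, api_key)) as resp:
            print(path, "->", resp.status)
```

Running this a handful of times with your API key exported as OPENAI_API_KEY generates enough usage to populate the completions and embeddings panels of the dashboard.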
After running these examples (using your API key), remember that the OpenAI Usage API has a delay. It may take some time (usually a few minutes) for the usage data to appear in your dashboard.
Configuration
To connect the OpenAI integration to your OpenAI account, you'll need your OpenAI Admin API key. The integration will use this key to periodically retrieve usage data from the OpenAI Usage API.
The integration supports eight distinct data streams, corresponding to different categories of OpenAI API usage:
- Audio speeches (text-to-speech)
- Audio transcriptions (speech-to-text)
- Code interpreter sessions
- Completions (language models)
- Embeddings
- Images
- Moderations
- Vector stores
By default, all data streams are enabled. However, you can disable any data streams that are not relevant to your usage. All enabled data streams are visualized in a single, comprehensive dashboard, providing a unified view of your usage.
For advanced users, the integration offers additional configuration options, including setting the bucket width and initial interval. These options are documented in detail in the official integration documentation.
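To see what the integration retrieves under the hood, you can call the Usage API yourself. The sketch below builds a request against the completions usage endpoint with an explicit bucket width; the parameter values are illustrative, so check the OpenAI Usage API reference for the options available to your account.

```python
import json
import os
import time
from urllib import parse, request

# The integration polls endpoints under /v1/organization/usage/*;
# this sketch queries the completions usage endpoint directly.
USAGE_URL = "https://api.openai.com/v1/organization/usage/completions"

def build_usage_request(admin_key, start_time, bucket_width="1d", limit=7):
    """Build a GET request for bucketed completions usage data."""
    params = parse.urlencode({
        "start_time": start_time,      # Unix timestamp: start of the window
        "bucket_width": bucket_width,  # bucket size, e.g. "1m", "1h", "1d"
        "limit": limit,                # number of buckets to return
    })
    return request.Request(
        f"{USAGE_URL}?{params}",
        headers={"Authorization": f"Bearer {admin_key}"},
    )

# Only query the API when an Admin key is configured in the environment
admin_key = os.environ.get("OPENAI_ADMIN_KEY")
if admin_key:
    seven_days_ago = int(time.time()) - 7 * 24 * 3600
    with request.urlopen(build_usage_request(admin_key, seven_days_ago)) as resp:
        print(json.dumps(json.load(resp), indent=2))
```

The bucket_width here corresponds to the integration's bucket width setting: smaller buckets give finer-grained dashboard panels at the cost of more documents ingested.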
Maximize visibility with the out-of-the-box dashboard
You can access the OpenAI dashboard in two ways:
- Navigate to the Dashboards menu in the left side panel and search for "OpenAI". In the search results, select [Metrics OpenAI] OpenAI Usage Overview to open the dashboard.
- Alternatively, open the Integrations menu under the Management section in Elastic, select OpenAI, go to the Assets tab, and choose [Metrics OpenAI] OpenAI Usage Overview from the dashboard assets.
Understanding the pre-configured dashboard for OpenAI
The pre-built dashboard provides a structured view of OpenAI's API consumption, displaying key metrics such as token usage, API call distribution, and model-wise invocation counts. It highlights top-performing projects, users, and API keys, along with breakdowns of image generation, audio transcription, and text-to-speech usage. By analyzing these insights, users can track usage patterns and optimize AI-driven applications.
OpenAI usage metrics overview
This dashboard section shows key usage metrics from OpenAI, including invocation rates, token usage, and the top-performing models. It also highlights the total numbers of invocations and tokens, as well as invocation counts by object type. Understanding these insights can help users optimize model usage, reduce costs, and enhance efficiency when integrating AI models into their applications.
Top performing Project, User, and API Key IDs
Here, you can analyze the top Project IDs, User IDs, and API Key IDs based on invocation counts. This data provides valuable insights to help organizations track usage patterns across different projects and applications.
Token metrics
In this dashboard section, you can see token usage trends across various models. This can help you analyze trends across input types (e.g., audio, embeddings, moderations), output types (e.g., audio), and cached input tokens. This information can help developers fine-tune their prompts and optimize token consumption.
Image generation metrics
AI-generated images are becoming increasingly popular across industries. This section provides an overview of image generation metrics, including invocation rates by model and the most common output dimensions. These insights help assess invocation costs and analyze image generation usage.
Audio transcription metrics
OpenAI's AI-powered transcription services make speech-to-text conversion easier than ever. This section tracks audio transcription metrics, including invocation rates and total transcribed seconds per model. Understanding these trends can help businesses optimize costs when building audio transcription-based applications.
Audio speech metrics
OpenAI's text-to-speech (TTS) models deliver realistic voice synthesis for applications such as accessibility tools and virtual assistants. This section explores TTS invocation rates and the number of characters synthesized per model, offering insights into the adoption of AI-driven voice synthesis.
Creating Alerts and SLOs to monitor OpenAI
As with every other Elastic integration, all the logs and metrics are fully available to every capability in Elastic Observability, including SLOs, alerting, custom dashboards, and in-depth logs exploration.
To proactively manage your OpenAI token usage and avoid unexpected costs, create a custom threshold rule in Observability Alerts.
Example: Target the relevant data stream, and configure the rule to sum the related tokens field (along with other token-related fields, if applicable). Set a threshold representing your desired usage limit, and the alert will notify you if this limit is exceeded within a specified timeframe, such as daily or hourly.
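As a rough illustration of the logic such a rule evaluates, the sketch below sums token fields over the last 24 hours and compares the total to a limit. The data stream and field names are assumptions for illustration only; consult the integration's reference documentation for the exact names shipped with your version.

```python
# Hypothetical data stream and field names -- check the integration's
# reference documentation for the names your version actually ships.
DATA_STREAM = "metrics-openai.completions-default"
TOKEN_FIELDS = [
    "openai.completions.input_tokens",
    "openai.completions.output_tokens",
]
DAILY_LIMIT = 1_000_000  # your chosen daily usage threshold

# Aggregation body a custom threshold rule would evaluate against the
# data stream: sum each token field over the last 24 hours.
query = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "now-24h"}}},
    "aggs": {f.rsplit(".", 1)[-1]: {"sum": {"field": f}} for f in TOKEN_FIELDS},
}

def total_tokens(aggregations):
    """Sum the per-field totals from an Elasticsearch aggregation response."""
    return sum(bucket["value"] for bucket in aggregations.values())

# A response fragment shaped like an Elasticsearch aggregation result
sample = {"input_tokens": {"value": 800_000.0},
          "output_tokens": {"value": 300_000.0}}
print(total_tokens(sample) > DAILY_LIMIT)  # True: the alert would fire
```

In practice you configure this in the Observability Alerts UI rather than by hand, but the rule's condition reduces to exactly this comparison.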
When an alert condition is met, the Alert Details view linked from the notification provides detailed insights into the violation, such as when it started, its current status, and the history of similar violations, enabling proactive issue resolution and improving system resilience.
Example: To create an SLO that monitors model distribution in OpenAI, start by defining a custom metric SLI, with good events defined as requests served by the cost-efficient models and total events as all requests.
You can now track the distribution of requests across different OpenAI models by project and user. This example demonstrates how Elastic's OpenAI integration helps you optimize costs. By monitoring the percentage of requests handled by cost-efficient GPT-3.5 models (the SLI) against the 80% target defined in the SLO, you can quickly identify which specific projects or users are driving up costs through excessive usage of models like GPT-4 Turbo and GPT-4o. This visibility enables targeted optimization strategies, ensuring your AI initiatives remain cost-effective while still leveraging advanced capabilities.
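The SLI arithmetic behind this example can be sketched in a few lines. The per-model invocation counts below are made up for illustration; in practice they come from the integration's completions data stream.

```python
TARGET = 80.0  # SLO target: at least 80% of requests on GPT-3.5 models

def model_distribution_sli(good_requests, total_requests):
    """SLI: percentage of requests served by cost-efficient models."""
    return 100.0 * good_requests / total_requests

# Hypothetical per-model invocation counts for one project
counts = {"gpt-3.5-turbo": 450, "gpt-4o": 90, "gpt-4-turbo": 60}
good = counts["gpt-3.5-turbo"]  # good events
total = sum(counts.values())    # total events

sli = model_distribution_sli(good, total)
print(f"SLI: {sli:.1f}% (target {TARGET}%)")  # SLI: 75.0% (target 80.0%)
```

Here the SLI falls below the target, so the SLO would burn error budget and point you at the projects shifting traffic to the more expensive models.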
Conclusion, next steps and further reading
Elastic's OpenAI integration is an essential tool for anyone relying on OpenAI's models to power their applications. With a comprehensive, customizable dashboard, it empowers SREs and developers to effectively monitor performance, manage costs, and optimize their AI systems. Now it's your turn: follow the instructions in this blog and start monitoring your OpenAI usage! We'd love to hear how you get on and always welcome ideas for enhancements.
To learn how to set up Application Performance Monitoring (APM) tracing of OpenAI-powered applications, read this blog. For further reading and more LLM observability use cases, explore Elastic's observability lab blogs here.