In a previous blog we showed you how to set up observability for your models hosted on Azure OpenAI using Elastic’s integration. We’ve expanded the integration to also include Azure OpenAI content filtering and cost analysis. If you previously onboarded the Azure OpenAI integration, just upgrade it and you will automatically get all the new features we discuss in this blog. The enhanced integration now provides multiple dashboards, including a general Azure OpenAI overview, an Azure Provisioned Throughput Unit dashboard, an Azure OpenAI content filtering dashboard, and an Azure OpenAI billing dashboard.
In this blog we will cover how to use Azure OpenAI content filtering and how to track Azure OpenAI usage costs. Let’s first review what these two capabilities from Azure OpenAI enable you to do:
Azure OpenAI Content Filtering: Enhancing AI Safety
Content filtering for Azure OpenAI plays a critical role in addressing AI safety challenges by helping to mitigate the risks associated with harmful or inappropriate content generated by AI models. By implementing robust content filtering mechanisms, organizations can proactively identify and filter out potentially harmful content, such as hate speech, misinformation, or violent imagery, before it is disseminated to users. This helps prevent the spread of harmful content and reduces the potential negative impact on individuals and communities.
Monitoring Azure OpenAI content filtering is essential for staying proactive in addressing emerging content moderation challenges. By closely monitoring the system, businesses can quickly detect any new types of harmful content or patterns of misuse that may arise. This enables organizations to stay ahead of potential content moderation issues and take timely action to protect their users and uphold their brand reputation.
Tracking Azure OpenAI Usage Costs
Monitoring Azure OpenAI model usage costs is crucial for managing budget and resource allocation effectively. By keeping track of usage costs, organizations can optimize their operations to avoid unnecessary expenses and ensure that they are getting the best value from their investment in AI technologies. Additionally, it helps in forecasting future expenses and aids in scaling resources according to the demand without compromising performance or incurring excessive costs. Effective monitoring also allows for transparency and accountability, enabling better decision-making in terms of AI deployment and utilization within Azure environments.
As we walk through this blog, we will provide you with prerequisites to set up and use the pre-configured dashboards for both of these capabilities, which are part of the Azure OpenAI integration.
Prerequisites
To follow along with this blog, you will need to:

- Set up and install the Azure billing integration to monitor usage costs. Once the integration is installed, you can track usage in the enhanced Azure OpenAI Billing dashboard.
- Enable the Azure API Management service to access the Azure OpenAI models.
How to Use Azure API Management with Azure OpenAI:
- Provision an Azure OpenAI resource: Create an Azure OpenAI resource and select a model for your application.
- Create an API Management instance: Establish an Azure API Management instance to manage the Azure OpenAI APIs.
- Import the Azure OpenAI API: Import the Azure OpenAI API into your API Management instance using its OpenAPI specification.
- Configure Policies: Implement policies in API Management to manage request authentication, rate limiting, traffic shaping, and more.
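Once the API is imported, applications call Azure OpenAI through the API Management gateway rather than hitting the service directly. The sketch below only builds the request such a gateway expects; the gateway URL, deployment name, and subscription key are placeholder assumptions, not values from this integration:

```python
# Sketch: building a chat completions request routed through an Azure API
# Management gateway. The URL, deployment, and key below are illustrative
# placeholders -- substitute the values from your own APIM instance.
import json
import urllib.request

def build_apim_request(gateway_url, deployment, subscription_key,
                       messages, api_version="2024-02-01"):
    """Build the HTTP request that APIM forwards to Azure OpenAI."""
    url = (f"{gateway_url}/openai/deployments/{deployment}"
           f"/chat/completions?api-version={api_version}")
    headers = {
        "Content-Type": "application/json",
        # APIM authenticates callers with a subscription key header.
        "Ocp-Apim-Subscription-Key": subscription_key,
    }
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers)

req = build_apim_request(
    "https://my-apim-instance.azure-api.net", "gpt-4o-mini", "<key>",
    [{"role": "user", "content": "Hello"}],
)
print(req.full_url)
```

With the request in hand, `urllib.request.urlopen(req)` (or any HTTP client) sends it through the gateway, where the policies you configured above are applied.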
Steps to create a content filter for Azure OpenAI
Before you set up observability for content filtering, ensure that you have configured Azure content filtering for your model. Follow the steps below to create an Azure OpenAI content filter:
- Access the Azure OpenAI service console: Sign in to the Azure Console with the appropriate permissions and navigate to the Azure OpenAI service console.
- Navigate to Safety + security: From the left-hand menu, select Safety + security.
- Create a new content filter: Select Create content filter.
- Configure the content filter policies, including the following:
  - Set input filter: Content will be annotated by category and blocked according to the threshold you set for prompts.
  - Set output filter: Content will be annotated by category and blocked according to the threshold you set for response output.
  - Blocklists: Define specific words or phrases to block.
  - Deployments: Apply filters to model deployments.
- Review and create: Review your settings and select Create to finalize the content filter configuration.
Customers can also configure content filters and create custom safety policies that are tailored to their use case requirements. The configurability feature allows customers to adjust the settings, separately for prompts and completions, to filter content for each content category at different severity levels.
Content filter types
- Content filtering categories: hate, sexual, violence, self-harm.
- Other optional classification models aimed at detecting jailbreak risk and known content for text and code.
- Severity levels within each content filter category: low, medium, high.
- Content detected at the 'safe' severity level is labeled in annotations but isn't subject to filtering and isn't configurable.
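Azure OpenAI attaches these category and severity annotations to each choice in the API response as `content_filter_results`, with a `filtered` flag and a `severity` value per category. A minimal sketch of reading them (the sample payload below is illustrative, but it follows the shape the API returns):

```python
# Sketch: inspecting the per-category annotations Azure OpenAI attaches
# to a response choice. Each category carries a "filtered" flag and a
# "severity" of safe/low/medium/high.
def summarize_filter_results(content_filter_results):
    """Return the categories that triggered filtering and their severity."""
    return {
        category: result["severity"]
        for category, result in content_filter_results.items()
        if result.get("filtered")
    }

sample = {
    "hate": {"filtered": False, "severity": "safe"},
    "sexual": {"filtered": False, "severity": "safe"},
    "violence": {"filtered": True, "severity": "medium"},
    "self_harm": {"filtered": False, "severity": "safe"},
}
print(summarize_filter_results(sample))  # {'violence': 'medium'}
```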
Understanding the pre-configured dashboard for Azure OpenAI Content Filtering
Now that you have set up the filter, you can see what is being filtered in Elastic through the Azure OpenAI content filtering dashboard.
- Navigate to the Dashboard Menu – Select the Dashboard menu option in Elastic and search for [Azure OpenAI] Content Filtering Overview to open the dashboard.
- Navigate to the Integrations Menu – Open the Integrations menu in Elastic, select Azure OpenAI, go to the Assets tab, and choose [Azure OpenAI] Content Filtering Overview from the dashboard assets.
The Azure OpenAI Content Filtering Overview dashboard in the Elastic integration provides insights into blocked requests, API latency, and error rates. It also provides a detailed breakdown of the content being filtered by the content filtering policy.
Content Filter overview
When the content filtering system detects harmful content, you receive either an error on the API call if the prompt was deemed inappropriate, or the finish_reason on the response will be content_filter to signify that some of the completion was filtered.
This can be summarized as follows:

- Prompt filters: Prompt content that is classified in a filtered category returns an HTTP 400 error.
- Non-streaming completion: When the content is filtered, non-streaming completions calls won't return any content. In rare cases with longer responses, a partial result can be returned; in these cases, the finish_reason is updated.
- Streaming completion: For streaming completions calls, segments are returned to the user as they're completed. The service continues streaming until it reaches a stop token or the length limit, or until content classified at a filtered category and severity level is detected.
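The cases above can be sketched as a small decision helper. The status code and `finish_reason` values mirror what an Azure OpenAI chat completions call returns; the function itself is an illustrative assumption, not part of any SDK:

```python
# Minimal sketch mapping the API signals described above to the filtering
# outcomes. `classify_filter_outcome` is a hypothetical helper; the signal
# names (HTTP 400, finish_reason == "content_filter") come from the API.
def classify_filter_outcome(status_code, finish_reason=None):
    if status_code == 400:
        # A prompt classified in a filtered category is rejected outright.
        return "prompt_filtered"
    if finish_reason == "content_filter":
        # Part or all of the completion was withheld by the output filter.
        return "completion_filtered"
    return "not_filtered"

print(classify_filter_outcome(400))                    # prompt_filtered
print(classify_filter_outcome(200, "content_filter"))  # completion_filtered
print(classify_filter_outcome(200, "stop"))            # not_filtered
```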
Prompt and response where content has been blocked
This dashboard section displays the original LLM prompt, inputs from various sources (API calls, applications, or chat interfaces), and the corresponding completion response. The panel below gives a view on the responses after applying content filtering policy for prompts and completions.
You can use the following code snippet to start integrating your current prompt and settings into your application to test the content filter:
```python
chat_prompt = [
    {
        "role": "user",
        "content": "How to kill a mocking bird?"
    }
]
```
After running the code, you will see the content filtered under the violence category at the medium severity level.
Content filtered by content source (Input & Output)
The content filtering system helps monitor and moderate different categories of content based on severity levels. The categories typically include things like adult content, offensive language, hate speech, violence, and more. The severity levels indicate the degree of sensitivity or potential harm associated with the content. This panel helps the user to effectively monitor and filter out inappropriate or harmful content to maintain a safe environment.
These metrics can be categorized into the following groups:
- Blocked requests by category: Provides insights into the total blocked requests by category.
- Severity distribution by categories: Monitors the blocked requests by categories and severity distribution. The severity distribution may be either low, medium or high.
- Content filtered categories: Provides insights into the content filtered categories over time.
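These groupings can be reproduced from raw filter events. The records below are hypothetical stand-ins for the filtering documents the integration collects; the aggregation logic is a minimal sketch of what the dashboard panels compute:

```python
# Sketch: the two groupings behind the panels above, computed from a list
# of hypothetical filter-event records (category + severity per event).
from collections import Counter

events = [
    {"category": "violence", "severity": "medium"},
    {"category": "hate", "severity": "low"},
    {"category": "violence", "severity": "high"},
]

# Blocked requests by category
by_category = Counter(e["category"] for e in events)

# Severity distribution by category
by_category_severity = Counter((e["category"], e["severity"]) for e in events)

print(by_category["violence"])                     # 2
print(by_category_severity[("violence", "high")])  # 1
```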
Reviewing the Azure OpenAI Billing dashboard
You can now look at what you are spending on Azure OpenAI.
Here is what you see on this dashboard:
- Total costs: This measures the total usage cost across all the model deployments.
- Overall Usage by model: This tracks the total usage costs broken down by model.
- Daily usage: Monitors usage costs on a daily basis.
- Daily usage costs by model: Monitors daily usage costs broken down by model deployments.
Conclusion
The Azure OpenAI integration makes it easy for you to collect a curated set of metrics and logs for your LLM-powered applications using Azure OpenAI, along with content-filtered responses. It comes with out-of-the-box dashboards, which you can further customize for your specific needs.
Deploy a cluster on our Elasticsearch Service or download the stack, spin up the new Azure OpenAI integration, open the curated dashboards in Kibana and start monitoring your Azure OpenAI service!