Most AI-driven applications today focus on increasing the value an end user, such as an SRE, gets from AI. The main use case is the creation of various chatbots. These chatbots not only use large language models (LLMs), but also frameworks such as LangChain, plus search to improve the contextual information available during a conversation (Retrieval Augmented Generation, or RAG). Elastic’s sample RAG-based chatbot application showcases how to use Elasticsearch with local data and embeddings, enabling search to pull out the most relevant context when a query is made to a chatbot connected to an LLM of your choice. It's a great example of how to build out a RAG-based application with Elasticsearch. However, what about monitoring the application?
Elastic provides the ability to ingest OpenTelemetry data with native OTel SDKs, the off-the-shelf OTel Collector, or Elastic Distributions of OpenTelemetry (EDOT). EDOT lets you bring in logs, metrics, and traces for your GenAI application and for K8s. However, you will also generally need libraries to trace specific components in your application. For tracing GenAI applications, you can pick from a large set of libraries:
- OpenTelemetry OpenAI Instrumentation-v2 - allows tracing LLM requests and logging of messages made by the OpenAI Python API library. (Note: v2 is built by OpenTelemetry; the non-v2 version is from a specific vendor, not OpenTelemetry.) A minimal usage sketch appears below.
- OpenTelemetry VertexAI Instrumentation - allows tracing LLM requests and logging of messages made by the VertexAI Python API library.
- Langtrace - a commercially available library that supports all LLMs in one library, with all traces being OTel native.
- Elastic’s EDOT - which recently added tracing. See the blog.
As you can see, OpenTelemetry is converging as the de facto mechanism for collecting and ingesting this telemetry. OpenTelemetry's support here is growing, but it is still early days.
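As a quick illustration of how lightweight these libraries are, here is a minimal sketch of enabling the OpenTelemetry OpenAI Instrumentation-v2 mentioned above. The package name (opentelemetry-instrumentation-openai-v2) and module path follow the upstream OpenTelemetry Python contrib project; verify them against the version you install.
# pip install openai opentelemetry-instrumentation-openai-v2
from openai import OpenAI
from opentelemetry.instrumentation.openai_v2 import OpenAIInstrumentor

# Patch the OpenAI client so every LLM request produces an OTel span
OpenAIInstrumentor().instrument()

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is our vacation policy?"}],
)
print(response.choices[0].message.content)
The spans are exported wherever your OTel SDK is configured to send them, for example the OTLP endpoint used later in this blog.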
In this blog, we will walk through how to observe a RAG-based chatbot application with minimal code, using Langtrace for tracing. We previously covered Langtrace in a blog highlighting tracing for LangChain. Here we use Langtrace, which supports OpenAI, Amazon Bedrock, Cohere, and others in a single library.
Prerequisites:
To follow along, you will need:
- An Elastic Cloud account — sign up now, and become familiar with Elastic’s OpenTelemetry configuration. With Serverless, no specific version is required; with regular Elastic Cloud, use 8.17 at a minimum.
- Git clone the RAG-based chatbot application and go through its tutorial to bring it up and become familiar with it.
- An account on your favorite LLM provider (OpenAI, Azure OpenAI, etc.), with API keys
- Be familiar with EDOT to understand how we bring in logs, metrics, and traces from the application through the OTel Collector
- Kubernetes cluster - I’ll be using Amazon EKS
- Review the Langtrace documentation as well.
Application OpenTelemetry output in Elastic
Chatbot-rag-app
The first item you will need to get up and running is the chatbot app itself; once it's up, you should see the following:
As you select some of the questions, you will see a response based on the index that was created in Elasticsearch when the app initialized. Additionally, queries will be made to the LLM.
Traces, logs, and metrics from EDOT in Elastic
Once you have the OTel Collector with the EDOT configuration running on your K8s cluster, and Elastic Cloud up and running, you should see the following:
Logs:
In Discover you will see logs from the chatbot app and can analyze the application logs, specific log patterns (which saves you time in analysis), and logs from K8s.
Traces:
In Elastic Observability APM, you can also see the chatbot details, which include transactions, dependencies, logs, errors, and more.
When you look at traces, you will be able to see the chatbot interactions, including:
- The end-to-end HTTP call
- Individual calls to Elasticsearch
- Specific calls such as invoke actions and calls to the LLM
You can also drill into individual trace details and look at the logs and metrics related to that trace.
Metrics:
In addition to logs and traces, any instrumented metrics will also be ingested into Elastic.
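If you want application-specific metrics alongside what the SDKs emit automatically, you can use the OpenTelemetry metrics API directly. The snippet below is a sketch, not part of the chatbot-rag-app; it assumes an OTel MeterProvider is configured (for example via the SDK's environment-variable configuration) so the counter is exported to the same OTLP endpoint as everything else.
from opentelemetry import metrics

# Obtain a meter for the application
meter = metrics.get_meter("chatbot-rag-app")

# Hypothetical counter tracking chat requests, tagged by LLM provider
chat_requests = meter.create_counter(
    "chatbot.requests",
    description="Number of chat requests handled",
)

def record_chat_request(llm_type: str) -> None:
    chat_requests.add(1, {"llm.type": llm_type})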
Setting it all up
To properly set up the chatbot app on K8s with telemetry sent to Elastic, a few things must be done:
- Git clone the chatbot-rag-app and modify one of the Python files.
- Create a Docker container that can be used in Kubernetes. The Docker build in the chatbot-rag-app is good to use.
- Collect all needed environment variables. In this example we are using OpenAI, but the files can be modified for any of the LLMs, so you will have to load a few environment variables into the cluster. In the GitHub repo there is an env.example for Docker; pick and choose what is and isn't needed and adjust appropriately in the K8s files below.
- Set up your K8s cluster, and then install the OpenTelemetry Collector with the appropriate YAML file and credentials. This will also collect K8s cluster logs and metrics.
- Use the two YAML files listed below to run the app on Kubernetes:
  - init-index-job.yaml - initializes the index in Elasticsearch with the local corporate information
  - k8s-deployment-chatbot-rag-app.yaml - initializes the application frontend and backend
- Open the app on the load balancer URL exposed by the chatbot-app service in K8s.
- Go to Elasticsearch and look at Discover for logs, then go to APM, find your chatbot-app, and review the traces.
Modify the code for tracing with Langtrace
Once you curl the app and untar it, go to the chatbot-rag-app directory:
curl https://codeload.github.com/elastic/elasticsearch-labs/tar.gz/main |
tar -xz --strip=2 elasticsearch-labs-main/example-apps/chatbot-rag-app
cd chatbot-rag-app
Next, open the api/app.py file and add the following:
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from langtrace_python_sdk import langtrace
langtrace.init(batch=False)
FlaskInstrumentor().instrument_app(app)
so the resulting code looks like this:
import os
import sys
from uuid import uuid4

from chat import ask_question
from flask import Flask, Response, jsonify, request
from flask_cors import CORS
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from langtrace_python_sdk import langtrace

langtrace.init(batch=False)

app = Flask(__name__, static_folder="../frontend/build", static_url_path="/")
CORS(app)
FlaskInstrumentor().instrument_app(app)

@app.route("/")
The newly added lines (the OpenTelemetry Flask import, the Langtrace import and init, and the FlaskInstrumentor call) bring in the Langtrace library and the OpenTelemetry Flask instrumentation. This combination provides an end-to-end trace for the HTTP call all the way down to the calls to Elasticsearch and to OpenAI (or other LLMs).
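Optionally, Langtrace's SDK also documents a with_langtrace_root_span decorator for grouping a unit of work, such as a single question, under one named span in addition to the Flask request span. The sketch below is illustrative only; the answer_question wrapper is hypothetical, so adapt it to the actual functions in the app and verify the decorator against the Langtrace version you install.
from langtrace_python_sdk import with_langtrace_root_span

# Hypothetical wrapper: make each chat turn easy to find as a named span in APM
@with_langtrace_root_span("chat-turn")
def answer_question(question: str, session_id: str):
    return ask_question(question, session_id)  # ask_question comes from the app's chat.py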
Create the docker container
Use the Dockerfile in the chatbot-rag-app directory as is, and add the following line (shown in context below) into the Dockerfile:
COPY requirements.txt ./requirements.txt
RUN pip3 install -r ./requirements.txt
RUN pip3 install --no-cache-dir langtrace-python-sdk
COPY api ./api
COPY data ./data
EXPOSE 4000
This ensures the Langtrace SDK is installed in the container image.
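To build and publish the image, something like the following works; the image and registry names here are placeholders, so substitute your own repository (for example in ECR or Docker Hub) and reference that image in the Kubernetes deployment below.
# Build the image from the chatbot-rag-app directory (Dockerfile shown above)
docker build -t genai-chatbot-langtrace:latest .

# Tag and push to your registry (placeholder shown)
docker tag genai-chatbot-langtrace:latest <your-registry>/genai-chatbot-langtrace:latest
docker push <your-registry>/genai-chatbot-langtrace:latest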
Collecting the proper env variables:
First, collect the environment variables from Elastic:
Envs for index initialization in Elastic:
ELASTICSEARCH_URL=https://aws.us-west-2.aws.found.io
ELASTICSEARCH_USER=elastic
ELASTICSEARCH_PASSWORD=elastic
# The name of the Elasticsearch indexes
ES_INDEX=workplace-app-docs
ES_INDEX_CHAT_HISTORY=workplace-app-docs-chat-history
For sending the OTel instrumentation data, you will need the following envs:
OTEL_EXPORTER_OTLP_ENDPOINT="https://123456789.apm.us-west-2.aws.cloud.es.io:443"
OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer xxxxx"
These credentials are found in Elastic under the APM integration, under OpenTelemetry.
Envs for LLMs
In this example we’re using OpenAI, hence only three variables are needed.
LLM_TYPE=openai
OPENAI_API_KEY=XXXX
CHAT_MODEL=gpt-4o-mini
All these variables will be needed in the Kubernetes YAML files in the next steps.
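Optionally, before wiring these into Kubernetes, you can smoke-test the image locally. This assumes you put the variables above into a .env file and built the image as described earlier:
# Run the container locally with the variables above in a .env file
docker run --rm -p 4000:4000 --env-file .env genai-chatbot-langtrace:latest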
Setup K8s cluster and load up OTel Collector with EDOT
This step is outlined in the following blog. It’s a simple three-step process that brings in all the K8s cluster logs and metrics and sets up the OTel Collector.
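For reference, the shape of that process is roughly the following. This is only a sketch: it assumes the upstream OpenTelemetry kube-stack Helm chart and an EDOT values file; the exact chart, values file, and Elastic credentials are covered in the referenced blog.
# Add the upstream OpenTelemetry Helm repo
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update

# Install the collectors with the EDOT values file and your Elastic endpoint/API key
helm install opentelemetry-kube-stack open-telemetry/opentelemetry-kube-stack \
  --namespace opentelemetry-operator-system --create-namespace \
  --values <edot-collector-values.yaml>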
Setup secrets, initialize indices, and start the app
Now that the cluster is up and you have your environment variables, you will need to:
- Install and run the k8s-deployment.yaml with the variables
- Initialize the index with init-index-job.yaml
Essentially run the following:
kubectl create -f k8s-deployment.yaml
kubectl create -f init-index-job.yaml
Here are the two YAML files you should use. They can also be found here.
k8s-deployment.yaml
apiVersion: v1
kind: Secret
metadata:
  name: genai-chatbot-langtrace-secrets
type: Opaque
stringData:
  OTEL_EXPORTER_OTLP_HEADERS: "Authorization=Bearer%20xxxx"
  OTEL_EXPORTER_OTLP_ENDPOINT: "https://1234567.apm.us-west-2.aws.cloud.es.io:443"
  ELASTICSEARCH_URL: "YOUR_ELASTIC_SEARCH_URL"
  ELASTICSEARCH_USER: "elastic"
  ELASTICSEARCH_PASSWORD: "elastic"
  OPENAI_API_KEY: "XXXXXXX"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: genai-chatbot-langtrace
spec:
  replicas: 2
  selector:
    matchLabels:
      app: genai-chatbot-langtrace
  template:
    metadata:
      labels:
        app: genai-chatbot-langtrace
    spec:
      containers:
        - name: genai-chatbot-langtrace
          image: 65765.amazonaws.com/genai-chatbot-langtrace2:latest
          ports:
            - containerPort: 4000
          env:
            - name: LLM_TYPE
              value: "openai"
            - name: CHAT_MODEL
              value: "gpt-4o-mini"
            - name: OTEL_SDK_DISABLED
              value: "false"
            - name: OTEL_RESOURCE_ATTRIBUTES
              value: "service.name=genai-chatbot-langtrace,service.version=0.0.1,deployment.environment=dev"
            - name: OTEL_EXPORTER_OTLP_PROTOCOL
              value: "http/protobuf"
          envFrom:
            - secretRef:
                name: genai-chatbot-langtrace-secrets
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
  name: genai-chatbot-langtrace-service
spec:
  selector:
    app: genai-chatbot-langtrace
  ports:
    - port: 80
      targetPort: 4000
  type: LoadBalancer
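If you prefer not to keep credentials in the YAML, the same Secret can be created from the command line instead of the Secret block above; this is an equivalent alternative, not an extra required step.
kubectl create secret generic genai-chatbot-langtrace-secrets \
  --from-literal=OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer xxxx" \
  --from-literal=OTEL_EXPORTER_OTLP_ENDPOINT="https://1234567.apm.us-west-2.aws.cloud.es.io:443" \
  --from-literal=ELASTICSEARCH_URL="YOUR_ELASTIC_SEARCH_URL" \
  --from-literal=ELASTICSEARCH_USER="elastic" \
  --from-literal=ELASTICSEARCH_PASSWORD="elastic" \
  --from-literal=OPENAI_API_KEY="XXXXXXX"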
Init-index-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: init-elasticsearch-index-test
spec:
  template:
    spec:
      containers:
        - name: init-index
          # update with your image location for the chatbot-rag-app
          image: your-image-location:latest
          workingDir: /app/api
          command: ["python3", "-m", "flask", "--app", "app", "create-index"]
          env:
            - name: FLASK_APP
              value: "app"
            - name: LLM_TYPE
              value: "openai"
            - name: CHAT_MODEL
              value: "gpt-4o-mini"
            - name: ES_INDEX
              value: "workplace-app-docs"
            - name: ES_INDEX_CHAT_HISTORY
              value: "workplace-app-docs-chat-history"
            - name: ELASTICSEARCH_URL
              valueFrom:
                secretKeyRef:
                  name: genai-chatbot-langtrace-secrets
                  key: ELASTICSEARCH_URL
            - name: ELASTICSEARCH_USER
              valueFrom:
                secretKeyRef:
                  name: genai-chatbot-langtrace-secrets
                  key: ELASTICSEARCH_USER
            - name: ELASTICSEARCH_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: genai-chatbot-langtrace-secrets
                  key: ELASTICSEARCH_PASSWORD
          envFrom:
            - secretRef:
                name: genai-chatbot-langtrace-secrets
      restartPolicy: Never
  backoffLimit: 4
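Once both manifests are applied, it's worth confirming that the index job completed and the app pods are healthy before opening the UI. The names below match the manifests above:
# Confirm the index initialization job finished
kubectl get jobs init-elasticsearch-index-test
kubectl logs job/init-elasticsearch-index-test

# Confirm the chatbot pods are running
kubectl get pods -l app=genai-chatbot-langtrace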
Open App with LoadBalancer URL
Run the kubectl get services command and get the URL for the chatbot app:
% kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
chatbot-langtrace-service LoadBalancer 10.100.130.44 xxxxxxxxx-1515488226.us-west-2.elb.amazonaws.com 80:30748/TCP 6d23h
Play with app and review telemetry in Elastic
Once you go to the URL, you should see all the screens we described at the beginning of this blog.
Conclusion
With Elastic's chatbot-rag-app you have an example of how to build out an OpenAI-driven, RAG-based chat application. However, you still need to understand how well it performs, whether it's working properly, and so on. Using OTel, Elastic’s EDOT, and Langtrace gives you the ability to achieve this. Additionally, you will generally run this application on Kubernetes. Hopefully this blog provides an outline of how to achieve that.
Here are the other tracing blogs:
- App Observability with LLM (Tracing)
- LLM Observability