Our vision is clear: to support OpenTelemetry within Elastic. A key aspect of this transition is integrations: how can we seamlessly adapt all existing integrations to fit the OpenTelemetry model?
Elastic integrations are designed to simplify observability by providing tools to ingest application data, process it through Ingest pipelines, and deliver prebuilt dashboards for visualization. With OpenTelemetry support, data collection and processing will transition to the OpenTelemetry Collector, while dashboards will need to adopt the OpenTelemetry data structure.
From a Log to an Integration
Although the concept of an OpenTelemetry Integration has not yet been officially defined, we envision it as a structured collection of artifacts that enables users to start monitoring an application from scratch. Each artifact has a specific role; for example, an OpenTelemetry Collector configuration file, which must be integrated into the main Collector setup. This bundled configuration instructs the Collector on how to gather and process data from the relevant application.
In the OpenTelemetry Collector, data collection is handled by the receivers component. Some receivers are tailored for specific applications, such as Kafka or MySQL, while others are designed to support general data collection methods. The specialized receivers combine data gathering and transformation within a single component. For the more generic receivers, however, additional components are needed to refine and transform the incoming data into a more application-specific format. Let’s take a look at how we can build an integration for monitoring an NGINX Ingress Controller.
Ingress NGINX is an Ingress controller for Kubernetes that uses NGINX as a reverse proxy and load balancer. Widely adopted, it plays a crucial role in directing external traffic into Kubernetes services, making its usage, performance, and health essential to observe. How can we start observing the external requests made to our Ingress controller? Fortunately, the NGINX Ingress Controller generates a structured log entry for each processed request. Every entry follows the same consistent structure, which makes it straightforward to parse and produce predictable output.
log_format upstreaminfo '$remote_addr - $remote_user [$time_local] "$request" '
    '$status $body_bytes_sent "$http_referer" "$http_user_agent" '
    '$request_length $request_time [$proxy_upstream_name] [$proxy_alternative_upstream_name] $upstream_addr '
    '$upstream_response_length $upstream_response_time $upstream_status $req_id';
All the field definitions can be found here.
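For illustration only, a hypothetical access log entry produced by this format could look like the following (all values are made up):

10.0.0.5 - - [22/Feb/2025:10:15:32 +0000] "GET /products HTTP/1.1" 200 1024 "-" "Mozilla/5.0" 340 0.004 [default-frontend-8080] [] 10.244.1.12:8080 1024 0.004 200 5f1c3a2b9d8e4f6a7b8c9d0e1f2a3b4c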
The OpenTelemetry Contrib Collector does not include a receiver capable of reading and parsing all fields in an NGINX Ingress log. There are two primary reasons for this:
- Application Diversity: The landscape of applications is vast, with each generating logs in unique formats. Developing and maintaining a dedicated receiver for every application would be resource-intensive and difficult to scale.
- Data Source Flexibility: Receivers are typically designed to collect data from a specific source, like an HTTP endpoint. However, in some cases, we may want to parse logs from an alternate source, such as an NGINX Ingress log file stored in an AWS S3 bucket.
These challenges can be addressed by combining receivers and processors. Receivers handle the collection of raw data, while processors can extract specific values when a known data structure is detected. Do we need a dedicated processor to parse NGINX logs? Not necessarily. The transform processor can handle this by modifying telemetry data according to a specified configuration. This configuration is written in the OpenTelemetry Transformation Language (OTTL), a language for transforming OpenTelemetry data based on the OpenTelemetry Collector Processing Exploration.
The concept of processors in OpenTelemetry is quite similar to the Ingest pipeline strategy currently used in Elastic integrations. The main challenge, therefore, lies in migrating Ingest pipeline configurations to OpenTelemetry Collector configurations. For a deeper dive into the challenges of such migrations, check out this article.
For reference, you can view the current Elastic NGINX Ingress Controller Ingest pipeline configuration in the following link: Elastic NGINX Ingress Controller Ingest Pipeline.
Let’s start with the data collection. By default, the NGINX Ingress Controller logs to stdout, and Kubernetes captures and stores these logs in a file. Assuming that the OpenTelemetry Collector running the following configuration has access to the Kubernetes Pod logs, we can use the filelog receiver to read the controller logs:
receivers:
  filelog/nginx:
    include_file_path: true
    include: [/var/log/pods/*nginx-ingress-nginx-controller*/controller/*.log]
    operators:
      - id: container-parser
        type: container
This configuration is designed to exclusively read the controller's Pod logs, focusing on their default file path within a Kubernetes node. Furthermore, since the Ingress controller does not inherently have access to its associated Kubernetes metadata, the container parser operator enriches each log entry with Kubernetes metadata, such as the Pod name, namespace, and container name, derived from the log file path.
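For reference, the container operator typically attaches attributes similar to the following, all derived from the log file path (the attribute names follow the upstream filelog receiver documentation and may vary slightly by Collector version; the Pod name below is made up):

attributes:
  k8s.namespace.name: ingress-nginx
  k8s.pod.name: nginx-ingress-nginx-controller-6c9d7f8b5d-abcde
  k8s.container.name: controller
  k8s.container.restart_count: "0"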
Avoiding duplicated logs
The configuration outlined in this blog is designed for Kubernetes environments, where the collector runs as a Kubernetes Pod. In such setups, handling Pod restarts properly is crucial. By default, the filelog receiver keeps track of which files it has read, and how far, only in memory; when the collector Pod restarts, that state is lost and previously ingested log lines may be read and exported again.
To make the configuration resilient to restarts, you can use a storage extension to track file offsets. These offsets allow the filelog receiver to resume reading each file from where it left off after a restart, avoiding duplicated logs:
extensions:
  file_storage:
    directory: /var/lib/otelcol

receivers:
  filelog/nginx:
    storage: file_storage
    ...
Important: The /var/lib/otelcol directory must be mounted as part of a Kubernetes persistent volume to ensure the stored offsets persist across Pod restarts.
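A minimal sketch of such a mount when the Collector runs as a DaemonSet is shown below; a node-local hostPath is used here for simplicity, and the volume name otelcol-state is illustrative (a PersistentVolumeClaim-backed volume works the same way):

# Illustrative DaemonSet snippet: persist the file_storage directory across Pod restarts
spec:
  template:
    spec:
      containers:
        - name: otel-collector
          volumeMounts:
            - name: otelcol-state
              mountPath: /var/lib/otelcol
      volumes:
        - name: otelcol-state
          hostPath:
            path: /var/lib/otelcol
            type: DirectoryOrCreate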
Data transformation with OpenTelemetry processors
Now it’s time to parse the structured log fields and transform them into queryable OpenTelemetry fields. Initially, we considered using regular expressions with the extract_patterns function available in the OpenTelemetry Transformation Language (OTTL). However, Elastic recently contributed a new OTTL function, ExtractGrokPatterns, based on Grok, a regular-expression dialect that supports reusable, aliased expressions. The function’s underlying library, Elastic Go-Grok, ships with numerous predefined grok patterns that simplify pattern matching, such as IP, NUMBER, and GREEDYDATA.
Each Ingress Controller log entry begins with the client's source IP address (which may be a single IP or a list of IPs) and the username provided via Basic authentication, represented as “$remote_addr - $remote_user”. The Grok IP alias can be used to parse either an IPv4 or IPv6 address from the remote_addr field, while the GREEDYDATA alias captures the optional remote_user value.
For example, the following OTTL configuration transforms an unstructured body message into a structured one with two fields:
- It parses a single IP address and assigns it to the source.address key.
- Delimited by a “-”, it captures the optional value of the authenticated username in the user.name key.
transform/parse_nginx_ingress_access/log:
  log_statements:
    - context: log
      statements:
        - set(body, ExtractGrokPatterns(body, "%{IP:source.address} - (-|%{GREEDYDATA:user.name})", true))
The screenshot below illustrates the transformation process, showing the original input data alongside the resulting structured format (diff):
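In textual form, the effect on a minimal, made-up body value is roughly the following:

# before: body is a plain string
body: "10.12.0.1 - alice"

# after: body is a structured map
body:
  source.address: "10.12.0.1"
  user.name: "alice"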
In real-world scenarios, NGINX Ingress Controller logs may begin with a list of IP addresses or, at times, a domain name. These variations can be handled with an extended Grok pattern. Similarly, we can use Grok to parse an HTTP UserAgent and URL strings, but additional OTTL functions, such as URL or UserAgent, are required to extract meaningful data from these fields.
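As a hedged sketch of that second step, the statements below show how the UserAgent and URL converters could be chained after the grok extraction. The body keys user_agent.original and url.original are assumptions about what an extended grok pattern would produce, not part of the minimal example above:

transform/parse_nginx_ingress_access/log:
  log_statements:
    - context: log
      statements:
        # Assumed keys: the grok step is expected to have stored the raw user agent
        # and request URL under these body fields.
        - merge_maps(attributes, UserAgent(body["user_agent.original"]), "upsert") where body["user_agent.original"] != nil
        - merge_maps(attributes, URL(body["url.original"]), "upsert") where body["url.original"] != nil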
The complete configuration is available in the documentation for Elastic’s OpenTelemetry NGINX Ingress Controller integration: Integration Documentation.
Usage
The Elastic OpenTelemetry NGINX Ingress Controller integration is currently in Technical preview. To access it, you must enable the "Display beta integrations" toggle in the Integrations menu within Kibana.
Installing the Elastic OpenTelemetry NGINX Ingress Controller integration makes two dashboards available in Kibana. The first provides insights into access events for the controller, displaying information such as HTTP response status codes over time, request volume per URL, distribution of incoming requests by browser, top requested pages, and more. The screenshot below shows the NGINX Ingress Controller Access Logs dashboard, displaying data from a controller routing requests to an OpenTelemetry Demo deployment:
The second dashboard focuses on errors within the NGINX Ingress Controller, highlighting the volume of error events generated over time:
To start gathering and processing controller logs, we recommend incorporating the OpenTelemetry Collector pipeline outlined in the integration’s documentation into your collector configuration: Integration Documentation. Keep in mind that this configuration requires access to the Kubernetes node's Pod logs, typically stored in /var/log/pods on each node.
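If you manage the Collector DaemonSet yourself, that directory typically needs to be mounted read-only into the Collector container, roughly as sketched in this Pod spec fragment (the volume name varlogpods is illustrative):

# Illustrative Pod spec fragment: expose the node's Pod logs to the Collector
containers:
  - name: otel-collector
    volumeMounts:
      - name: varlogpods
        mountPath: /var/log/pods
        readOnly: true
volumes:
  - name: varlogpods
    hostPath:
      path: /var/log/pods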
The OpenTelemetry Collector service pipeline should then include a configuration similar to the following:
service:
  extensions: [file_storage]
  pipelines:
    logs/nginx_ingress_controller:
      receivers:
        - filelog/nginx
      processors:
        - transform/parse_nginx_ingress_access/log
        - transform/parse_nginx_ingress_error/log
        - resourcedetection/system
      exporters:
        - elasticsearch
Adding GeoIP Metadata
As an optional enhancement, the OpenTelemetry Collector GeoIP processor can be configured and added to the pipeline to enrich each NGINX Ingress Controller log with geographical attributes, such as the request’s originating country, region, and city, enabling geo maps in Kibana to visualize traffic distribution and geographic patterns.
While the OpenTelemetry GeoIP processor is similar to Elastic's GeoIP processor, it requires users to provide their own local GeoLite2 database. The following configuration extends the integration’s configuration to include the GeoIP processor with a MaxMind database.
processors:
  geoip:
    context: record
    providers:
      maxmind:
        database_path: /tmp/GeoLite2-City.mmdb

service:
  extensions: [file_storage]
  pipelines:
    logs/nginx_ingress_controller:
      receivers:
        - filelog/nginx
      processors:
        - transform/parse_nginx_ingress_access/log
        - transform/parse_nginx_ingress_error/log
        - resourcedetection/system
        - geoip
      exporters:
        - elasticsearch
Sample Kibana Map with the OpenTelemetry NGINX Ingress Controller integration:
Next steps
OpenTelemetry Log Event
A closer look at the integration’s OTTL statements reveals that the raw log message is replaced by the parsed fields. In other words, the configuration transforms the body log field* from a string into a structured map of key-value pairs, as seen in “set(body, ExtractGrokPatterns(body, ...))”. This approach treats each NGINX Ingress Controller log entry as an OpenTelemetry Event, a specialized type of LogRecord. Events are OpenTelemetry’s standardized semantic format for LogRecords and carry an “event.name” attribute that defines the structure of the body field. An NGINX Ingress Controller log record aligns well with the OpenTelemetry Event data model: it follows a structured format and clearly distinguishes between two event types, access logs and error logs. There is an ongoing PR to incorporate the NGINX Ingress Controller log into the OpenTelemetry semantic conventions: https://github.com/open-telemetry/semantic-conventions/pull/982
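For illustration, tagging the record as an event could look like the following additional statement in the same transform processor; the event name nginx_ingress.access is hypothetical, since the official value depends on the outcome of the semantic-conventions PR referenced above:

transform/parse_nginx_ingress_access/log:
  log_statements:
    - context: log
      statements:
        # Hypothetical event name, pending the semantic-conventions PR referenced above
        - set(attributes["event.name"], "nginx_ingress.access")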
Operating system breakdown
Each controller log contains the source UserAgent, from which the integration extracts the browser that originated the request. This information is valuable for understanding user access patterns, as it provides insights into the types of browsers commonly interacting with your services. Additionally, an ongoing pull request into OTTL aims to extend this functionality by extracting operating system (OS) details as well, providing even deeper insights into the environments interacting with the NGINX Ingress Controller.
Configuration encapsulation
Setting up the configuration for the NGINX Ingress Controller integration can be somewhat tedious, as it involves adding several complex processor configurations to the existing collector pipelines. This can quickly become cumbersome, especially for non-expert users or when the collector configuration is already quite complex. In an ideal scenario, users would simply reference a pre-defined integration configuration, and the collector would automatically "unwrap" all the necessary components into the corresponding pipelines. This would significantly simplify the setup process, making it more accessible and reducing the risk of misconfigurations. To address this, there is an RFC (Request for Comments) proposing support for shareable, modular configurations within the OpenTelemetry Collector. This feature would allow users to easily collect signals from specific services or applications by referencing modular configurations, streamlining setup and enhancing usability in complex scenarios.
*The OpenTelemetry community is currently discussing whether structured body-extracted information should be stored in the attributes or body field. For details, see this ongoing issue.
This product includes GeoLite2 data created by MaxMind, available from https://www.maxmind.com