We are thrilled to announce that Cross-Cluster Search (CCS) with ES|QL is now generally available as an Enterprise feature. You can now use the advanced analytical power and simplified syntax of ES|QL to seamlessly query data across your entire fleet of Elasticsearch clusters. Whether you distribute clusters by geographic region, environment, or business unit, ES|QL for CCS empowers you to get answers from all your data with a single, elegant query.
One query to rule them all
Querying with ES|QL across multiple clusters is incredibly intuitive. As with the existing cross-cluster search via the _search DSL API, simply prefix your index name with the alias of your configured remote cluster. That’s it.
Imagine you want to find the top 10 client IPs with the highest average response time, and your data lives in your local cluster plus two remote clusters: cluster-one and cluster-two. With ES|QL, your query is this simple:
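A minimal sketch of what that query might look like (the logs-* index pattern and the client_ip and response_time field names are assumptions for illustration):

```esql
FROM logs-*,cluster-one:logs-*,cluster-two:logs-*
| STATS avg_response_time = AVG(response_time) BY client_ip
| SORT avg_response_time DESC
| LIMIT 10
```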
This single query securely and efficiently gathers data from all specified clusters and computes the final result, giving you a unified view without moving or re-indexing a single byte of data. All other cross-cluster index pattern syntax features, such as wildcards and exclusions, are also supported, for example: FROM logs-*,cluster-*:logs-*,-cluster-three:*
The remote query processing works like this:

- The user sends the query to the local Elasticsearch cluster. This query is routed to one of the cluster nodes, further referred to as the coordinator node. This node is responsible for executing the query, aggregating the results, and sending them to the user. The coordinator node parses the query, validates it, and generates a query plan that describes the exact manner of query execution. The coordinator then figures out which other local nodes and remote clusters need to be contacted to execute the query.
- The coordinator node then sends out the query plan to other local nodes and to the remote clusters. Since the remote clusters each handle their own nodes, the coordinator node only communicates with a single coordinator node on the remote cluster. This works the same way as “minimize roundtrips” mode in cross-cluster DSL search.
- The local nodes execute the query plan. Note that some commands, such as SORT, STATS, and LIMIT, cannot be fully executed remotely, since they need access to the full data set. The remote nodes only perform part of the work, deferring the rest to the later aggregation stage.
- The local cluster’s nodes return their results to the local cluster’s coordinator node, while the remote nodes return their results to the remote cluster’s coordinator node, which pre-aggregates the data if necessary and forwards it to the local cluster’s coordinator node.
- The local cluster’s coordinator node runs the final aggregation stages for commands like SORT, STATS, and LIMIT, which need the full data set before they can be executed (see the annotated sketch below). Since part of the work may already have been done remotely (such as sorting each node’s output or counting rows for STATS), the coordinator node usually needs to do far less work than processing the entire resulting data set from scratch.
- Once all the stages are finished, the coordinator node formats the response and returns it to the user.
Resilient by design
Running queries in a distributed environment requires robust and predictable failure handling mechanisms. We've built a multi-layered resilience model into ES|QL CCS.
- Handling unavailable clusters and shards: By default, ES|QL CCS will not fail the whole query if a subset of the remote clusters or shards is unavailable. This is controlled by two settings:
  - skip_unavailable: true (default): This remote cluster setting ensures that if a remote cluster is unreachable or has no matching indices, it is gracefully skipped.
  - allow_partial_results: true (default): At the shard level, this query setting ensures that if a shard fails mid-query (e.g., during a rolling restart), the query succeeds with results from all available shards. The engine will even automatically retry the failed shard first, maximizing the chance of a complete response.
- Transparent failure detection: When a query returns partial results, it's critical for client applications and users to know. The query response includes metadata for this purpose (when requested with the include_ccs_metadata query setting):
  - The is_partial flag is set to true.
  - The _clusters object provides a detailed breakdown of the status of each cluster (successful, skipped, or failed), so you can programmatically identify the source of the issue.
  - In Kibana, the query inspector UI also displays this metadata.
- Client-side timeout management: To prevent resource exhaustion from long-running queries, clients like Kibana can enforce timeouts. If an ES|QL query exceeds the Kibana-configured timeout (search:timeout, default 10 minutes), Kibana will automatically stop the query by calling the stop command and return any results that have accumulated up to that point. This functionality is also available to other clients that use asynchronous querying.
Data enrichment - local and remote
The ES|QL ENRICH command allows adding data from an enrich policy to the query result. This capability is available across clusters and supports three modes.
By default, the query engine decides by itself, based on performance and query semantics, on which clusters and nodes the data merging (ENRICH) operation runs. For consistency, this requires the enrich policies to contain the same data on all clusters.
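For example, a minimal sketch, assuming an enrich policy named hosts_policy keyed on host_ip that adds a host_name field (all names here are illustrative):

```esql
FROM logs-*,cluster-one:logs-*,cluster-two:logs-*
| ENRICH hosts_policy ON host_ip WITH host_name
| STATS requests = COUNT(*) BY host_name
```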
The default mode will essentially choose either the coordinator or the remote mode, depending on the nature of the query.
The coordinator mode makes the local coordinating node execute the ENRICH command, thus eliminating the need for the remote clusters to maintain their own enrich policies. However, this can have a performance cost, as it requires bringing all the remote data to the local coordinating node before it can be processed further.
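A sketch of the same query pinned to coordinator mode by prefixing the policy name with _coordinator (policy and field names are again illustrative):

```esql
FROM logs-*,cluster-one:logs-*,cluster-two:logs-*
| ENRICH _coordinator:hosts_policy ON host_ip WITH host_name
| STATS requests = COUNT(*) BY host_name
```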
The data flow in this case looks like this:

Notice that in the coordinator mode, all processing of the ENRICH data and everything that follows must be executed on the local cluster’s coordinator node, thus requiring more data to be transferred and not benefiting from the processing capacity of the remote clusters.
The remote mode, by contrast, makes each remote cluster execute the ENRICH command. This mode does not require the policies to hold the same data, so each cluster can have its own policy. It is recommended when each cluster holds its own localized data, such as mappings from IP addresses to hostnames in different data centers that each use overlapping private IP ranges.
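And a sketch pinned to remote mode via the _remote prefix, so each cluster applies its own local copy of the policy (names are again illustrative):

```esql
FROM logs-*,cluster-one:logs-*,cluster-two:logs-*
| ENRICH _remote:hosts_policy ON host_ip WITH host_name
| STATS requests = COUNT(*) BY host_name
```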
The data flow in this case looks like this:

Notice that in this case, the ENRICH processing and part of the following processing remain on the remote side, thus taking advantage of the distributed processing and requiring less data to be sent to the coordinating node.
Performance at scale
Performance is not an afterthought; it’s a core feature. We have optimized the coordination layer to ensure your cross-cluster queries are fast. Our benchmark analysis shows that for most queries, the overhead of coordinating across clusters grows only logarithmically with the number of clusters. This means that even as you add more clusters, the local execution time on each cluster remains the dominant factor in overall query latency.
We achieved a particularly significant optimization with cluster-level reductions. For common SORT queries, which are essential for "top-N" style analyses, this optimization provides a speedup of approximately 50%. By pushing down more of the reduction work to the data clusters, we reduce the coordination overhead, letting you scale your analytical workloads further than ever before.
Below is a sample of benchmark results for various queries, depending on the number of remote clusters connected to the local cluster, from local-only to 250 connected clusters. You can see that while the computation becomes somewhat slower when working with more clusters, due to the inevitable cost of network traffic, the increase is much less than linear in the number of clusters.

Monitoring your clusters
To help you gain visibility into your cross-cluster search usage, the _cluster/stats data has been extended with ES|QL remote query monitoring data. In the ccs/_esql section, you can find data about the cross-cluster ES|QL queries that have been executed, per-cluster performance statistics, how many queries succeeded and failed, and more. Example:
What's next?
The journey doesn’t stop here. We are already working on bringing more capabilities to ES|QL CCS, with support for LOOKUP JOINs across remote clusters next on our roadmap, allowing users to perform data enrichment without the need to create and maintain an enrich policy.
Cross-cluster search with ES|QL is now available as an Enterprise feature. Upgrade to the latest version of Elasticsearch to unify your data and supercharge your analytics.
To learn more, check out the documentation. We can't wait to see what you build with it!
The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.