Ice, Ice, Maybe: Measuring Searchable Snapshots Performance - Elasticsearch Labs

Ice, Ice, Maybe: Measuring Searchable Snapshots Performance

Learn how Elastic’s searchable snapshots enable the frozen tier to perform on par with the hot tier, demonstrating latency consistency and reducing costs.

For 99.9% of queries, the frozen tier performs on par with the hot tier, showcasing exceptional latency consistency.

The frozen data tier achieves both low cost and good performance by leveraging Elastic's Searchable Snapshots, which offer a compelling way to manage vast amounts of data while keeping that data searchable and performant within a budget.

In this article, we delve into a benchmark of Elastic's hot and frozen data tiers by running sample queries on 105 terabytes of logs spanning more than 90 days. These queries replicate common tasks within Kibana, including search with highlighting, total hits, date histogram aggregation, and terms aggregation. The results reveal that Elastic's frozen data tier is quick and delivers latency comparable to its hot tier, with only the first query to the object store being slower; subsequent queries are fast.

We replicated the way a typical user would interact with a hot-frozen deployment through Kibana's Discover, its main interface for interacting with indexed documents.

When a user issues a search using Discover's search bar, three tasks are typically executed in parallel:

  • a search and highlight operation on 500 docs that doesn't track the total number of hits (referred to as discover_search in the results)
  • a search that tracks the total hits (discover_search_total in the results)
  • a date histogram aggregation to construct the bar chart (referred to as discover_date_histogram)

and also

  • a terms aggregation (referred to as discover_terms_agg) when/if the user clicks the left sidebar
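
As a rough illustration of the four tasks above, the following Python sketch builds one request body per task. The bool filter mirrors the sample Rally query shown later in this article; everything else (the function name, the highlight settings, the histogram interval, and the `host.name` terms field) is an assumption made for illustration, not the exact requests Kibana sends.

```python
from typing import Any, Dict


def discover_request_bodies(
    query_text: str,
    gte: str,
    lt: str,
    terms_field: str = "host.name",  # hypothetical field, not from the benchmark
) -> Dict[str, Dict[str, Any]]:
    """Build one illustrative search body per Discover task."""

    def base_query() -> Dict[str, Any]:
        # Same filter shape as the sample Rally operation in this article.
        return {
            "bool": {
                "filter": [
                    {"multi_match": {"type": "best_fields",
                                     "query": query_text,
                                     "lenient": True}},
                    {"range": {"@timestamp": {
                        "format": "strict_date_optional_time",
                        "gte": gte,
                        "lt": lt}}},
                ]
            }
        }

    return {
        # Fetch and highlight 500 docs without counting all hits.
        "discover_search": {
            "size": 500,
            "track_total_hits": False,
            "query": base_query(),
            "highlight": {"fields": {"*": {}}},  # assumed highlight config
        },
        # Count all hits, return no documents.
        "discover_search_total": {
            "size": 0,
            "track_total_hits": True,
            "query": base_query(),
        },
        # Buckets for the bar chart; the 30m interval is an assumption.
        "discover_date_histogram": {
            "size": 0,
            "query": base_query(),
            "aggs": {"over_time": {"date_histogram": {
                "field": "@timestamp", "fixed_interval": "30m"}}},
        },
        # Top values of one field, as shown in the sidebar.
        "discover_terms_agg": {
            "size": 0,
            "query": base_query(),
            "aggs": {"top_values": {"terms": {
                "field": terms_field, "size": 10}}},
        },
    }


bodies = discover_request_bodies(
    "elk lion rider", "2024-02-15T00:00:00", "2024-02-16T00:00:00")
```

Each body could then be submitted as a separate async search, so the four tasks run independently of one another.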

Data tiers in Elastic

Some types of data decrease in value over time. Application logs are a natural example: the most recent records are usually the ones queried most frequently, and the ones that need the fastest possible response time. There are several other examples of such data, including medical records (detailed patient histories, diagnoses and physician notes); legal documents (contracts, court rulings, case files, etc.); and bank records (transaction records including descriptions of purchases and merchant names), to cite just three. All contain unstructured or semi-structured text that requires efficient search capabilities to extract relevant information. As these records age, their immediate relevance may diminish, but they still hold significant value for historical analysis, compliance, and reference purposes.

Elastic's data tiers (Hot, Warm, Cold, and Frozen) provide the ideal balance of speed and cost, ensuring you maximize the value of these types of data as they age without sacrificing usability. Through both Kibana and Elasticsearch's search API, the use of the underlying data tiers is always automatic and transparent: users don't need to issue search queries in a different way to retrieve data from any specific tier (there is no need to manually restore, or "rehydrate", the data).

In this blog we keep it simple by using solely the Hot and Frozen tiers, in what is commonly called a hot-frozen scenario.

How the Frozen Tier Works

In a hot-frozen scenario, data begins its journey in the hot tier, where it is actively ingested and queried. The hot tier is optimized for high-speed read and write operations, making it ideal for handling the most recent and frequently accessed data. As data ages and becomes less frequently accessed, it is transitioned to the frozen tier to optimize storage costs and resource utilization.

The transition from the hot tier to the frozen tier involves converting the data into searchable snapshots. Searchable snapshots leverage the snapshot mechanism used for backups, allowing the data to be stored in a cost-effective manner while still being searchable. This eliminates the need for replica shards, significantly reducing the local storage requirements.

Once the data is in the frozen tier, it is managed by nodes specifically designated for this purpose. These nodes do not need to have enough disk space to store full copies of all indices. Instead, they utilize an on-disk Least Frequently Used (LFU) cache. This cache stores only portions of the index data that are downloaded from the blob store as needed to serve queries. The on-disk cache functions similarly to an operating system's page cache, enhancing access speed to frequently requested parts of the data.

When a query is executed in the frozen tier, the process involves several steps to ensure efficient data retrieval and caching:

1. Read Requests Mapping: At the Lucene level, read requests are mapped to the local cache. This mapping determines whether the requested data is already present in the cache.

2. Cache Miss Handling: If the required data is not available in the local cache (a cache miss), Elasticsearch handles this by downloading a larger region of the Lucene file from the blob store. Typically, this region is a 16MB chunk, a size that balances minimizing the number of fetches against limiting the amount of data transferred.

3. Adding Data to Cache: The downloaded chunk is then added to the local cache. This process ensures that subsequent read requests for the same region can be served directly from the local cache, significantly improving query performance by reducing the need to repeatedly fetch data from the blob store.

4. Cache Configuration Options:

  • Shared Cache Size: This setting accepts either a percentage of the total disk space or an absolute byte value. For dedicated frozen tier nodes, the default is 90% of the total disk space.
    xpack.searchable.snapshot.shared_cache.size: 4TB
  • Max Headroom: Defines the maximum headroom to maintain. If not explicitly set, it defaults to 100GB for dedicated frozen tier nodes.
    xpack.searchable.snapshot.shared_cache.size.max_headroom: 100GB

5. Eviction Policy: The node-level shared cache uses an LFU policy to manage its contents. This policy ensures that frequently accessed data remains in the cache, while less frequently accessed data is evicted to make room for new data. This dynamic management of the cache helps maintain efficient use of disk space and quick access to the most relevant data.

6. Lucene Index Management: To further optimize resource usage, the Lucene index is opened only on demand, when there is an active search. This approach allows a large number of indices to be managed on a single frozen tier node without consuming excessive memory.
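
The read path in steps 1 to 3 and the eviction policy in step 5 can be sketched with a toy model. The Python below is a simplified illustration, not Elasticsearch's implementation: the class name, the `fetch_region` callback standing in for the blob store, and the plain access counter used for LFU are all assumptions, and the real cache is on disk rather than in memory.

```python
from collections import Counter
from typing import Callable, Dict, Tuple

REGION_SIZE = 16 * 1024 * 1024  # 16MB regions, as described above

RegionKey = Tuple[str, int]  # (file name, region index)


class ToyRegionCache:
    """Toy region cache with least-frequently-used eviction."""

    def __init__(self, capacity_regions: int,
                 fetch_region: Callable[[str, int], bytes],
                 region_size: int = REGION_SIZE) -> None:
        self.capacity = capacity_regions
        self.fetch_region = fetch_region  # hypothetical blob store download
        self.region_size = region_size
        self.regions: Dict[RegionKey, bytes] = {}
        self.freq: Counter = Counter()    # access count per cached region
        self.blob_fetches = 0

    def _get_region(self, file_name: str, index: int) -> bytes:
        key = (file_name, index)
        if key not in self.regions:                 # cache miss
            if len(self.regions) >= self.capacity:  # evict the LFU region
                victim = min(self.regions, key=lambda k: self.freq[k])
                del self.regions[victim]
                del self.freq[victim]
            self.regions[key] = self.fetch_region(file_name, index)
            self.blob_fetches += 1
        self.freq[key] += 1
        return self.regions[key]

    def read(self, file_name: str, offset: int, length: int) -> bytes:
        """Map a byte-range read onto whole regions, fetching on a miss."""
        out = bytearray()
        pos, end = offset, offset + length
        while pos < end:
            index = pos // self.region_size
            start = pos - index * self.region_size
            take = min(end - pos, self.region_size - start)
            out += self._get_region(file_name, index)[start:start + take]
            pos += take
        return bytes(out)


# Usage with a tiny fake blob and 8-byte regions to keep the demo small.
blob = bytes(range(32))
cache = ToyRegionCache(
    capacity_regions=2,
    fetch_region=lambda name, i: blob[i * 8:(i + 1) * 8],
    region_size=8,
)
first = cache.read("segment.file", 2, 4)   # miss: downloads region 0
second = cache.read("segment.file", 3, 2)  # hit: served from the cache
```

In this model only the first read of a region pays the blob store round trip, which is consistent with the article's observation that the first query is the slow one.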

Methodology

We ran the tests on a six-node cluster in Elastic Cloud, hosted on Google Cloud Platform using N2 family nodes:

  • 3 x gcp.es.datahot.n2.68x10x45 - Storage-optimized Elasticsearch instances for hot data.
  • 3 x gcp.es.datafrozen.n2.68x10x90 - Storage-optimized (dense) Elasticsearch instances serving as a cache tier for frozen data.

We measured spans of 1, 7, 14, 30, 60, and 90 days, which also equate to terabytes in size, since we indexed one terabyte per day.

We used Rally to run the tests. Below is a sample test for an uncached search on one day of frozen data (discover_search_total-1d-frozen-nocache). Iterations refers to the number of times the entire set of operations is repeated, which in this case is 10. Each operation defines a specific task or set of tasks to be performed; in this example, it is a composite operation. Within this operation, multiple requests specify the actions to be taken, such as clearing the frozen cache by issuing a POST request. A stream within a request groups a sequence of related actions; here the search query is submitted in a stream, and its results are then retrieved and deleted.

{
  "iterations": 10,
  "operation": {
    "operation-type": "composite",
    "name": "discover_search_total-1d-frozen-nocache",
    "requests": [
      {
        "name": "clear-frozen-cache",
        "operation-type": "raw-request",
        "path": "/_searchable_snapshots/cache/clear",
        "method": "POST"
      },
      {
        "stream": [
          {
            "name": "async-search",
            "operation-type": "submit-async-search",
            "index": "hotfrozenlogs",
            "request-params": {
              "track_total_hits": "true"
            },
            "body": {
              "_source": false,
              "size": 0,
              "query": {
                "bool": {
                  "filter": [
                    {
                      "multi_match": {
                        "type": "best_fields",
                        "query": "elk lion rider",
                        "lenient": true
                      }
                    },
                    {
                      "range": {
                        "@timestamp": {
                          "format": "strict_date_optional_time",
                          "gte": "2024-02-15T00:00:00",
                          "lt": "2024-02-16T00:00:00"
                        }
                      }
                    }
                  ]
                }
              },
              "stored_fields": [
                "*"
              ],
              "fields": [
                {
                  "field": "@timestamp",
                  "format": "date_time"
                }
              ]
            }
          }
        ]
      },
      {
        "operation-type": "get-async-search",
        "retrieve-results-for": [
          "async-search"
        ]
      },
      {
        "operation-type": "delete-async-search",
        "delete-results-for": [
          "async-search"
        ]
      }
    ]
  }
}

Each test ran 10 times per benchmark run, and we performed 500 benchmark runs across several days, so the sample for each task is 5,000 measurements. A large number of measurements is essential for statistically significant and reliable results. This large sample size helps to smooth out anomalies and provides a more accurate representation of performance, allowing us to draw meaningful conclusions from the data.

Results

The detailed results are outlined below. The "tip of the candle" represents the max (or p100) value observed across all the requests for a specific operation, and the results are grouped by tier. The green value represents the p99.9, the value below which 99.9% of the requests fall.
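
To make the p99.9 and p100 definitions concrete, here is a minimal Python sketch that computes both from a list of latencies. It uses the nearest-rank method, which is an assumption on our part; Rally's own percentile calculation may interpolate differently. The function name and the synthetic sample are illustrative only, not benchmark data.

```python
import math
from fractions import Fraction
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], p: str) -> float:
    """Return the smallest sample such that at least p% of samples are <= it.

    p is passed as a string (e.g. "99.9") so the rank is computed exactly,
    avoiding floating-point rounding at the boundary.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(Fraction(p) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Synthetic latencies in ms: 5,000 values, matching the sample size per task.
latencies_ms = list(range(1, 5001))

p999 = percentile_nearest_rank(latencies_ms, "99.9")
p100 = percentile_nearest_rank(latencies_ms, "100")  # same as max()
slower_than_p999 = sum(1 for v in latencies_ms if v > p999)
```

With 5,000 measurements per task, the 0.1% of requests above the p99.9 corresponds to just 5 requests, which is why the max values reported below can sit far from the p99.9.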

Because Kibana interacts with Elasticsearch via async searches, a more logical way of representing the time is by using horizontal bar charts as below. Since the requests are asynchronous and parallel, they complete at different times. You don't have to wait for all of them to finish before you start seeing query results, and this is how we read the benchmark results.

1 Day Span / 1 Terabyte

For 99.9% of requests, hot latency was between 543ms and 2 seconds, while frozen latency was between 558ms and 11 seconds when cached, and between 1.8 and 14 seconds when not cached.

The remaining 0.1% of requests were slower, with a maximum observed latency of 28 seconds on frozen.

7 Days Span / 7 Terabytes

For 99.9% of requests, hot latency was between 552ms and 791ms, while frozen latency was between 1 and 12 seconds when cached, and between 2.3 and 14.5 seconds when not cached.

The remaining 0.1% of requests were slower, with a maximum observed latency of 33 seconds on frozen.

14 Days Span / 14 Terabytes

For 99.9% of requests, hot latency was between 550ms and 608ms, while frozen latency was between 1 and 12 seconds when cached, and between 2.3 and 14.5 seconds when not cached.

The remaining 0.1% of requests were slower, with a maximum observed latency of 31 seconds on frozen.

30 Days Span / 30 Terabytes

We did not use hot data past 14 days in this test. For 99.9% of requests, frozen latency was between 1 and 11 seconds when cached, and between 2 and 12 seconds when not cached.

The remaining 0.1% of requests were slower, with a maximum observed latency of 68 seconds on frozen.

60 Days Span / 60 Terabytes

For 99.9% of requests, frozen latency was between 1 and 11 seconds when cached, and between 2 and 13 seconds when not cached.

The remaining 0.1% of requests were slower, with a maximum observed latency of 240 seconds on frozen.

90 Days Span / 90 Terabytes

For 99.9% of requests, frozen latency was between 1 and 11 seconds when cached, and between 2.5 and 13 seconds when not cached.

The remaining 0.1% of requests were slower, with a maximum observed latency of 304 seconds on frozen.

Use Elastic's frozen data tier to cool down the cost of data storage

Elastic's frozen data tier redefines what's possible in data storage and retrieval. Benchmark results show that it delivers performance comparable to the hot tier for 99.9% of queries, efficiently handling typical user tasks. While rare outliers with higher latency may occur (0.1% of requests, with a maximum of 304 seconds on the 90-day span), Elastic's searchable snapshots provide a robust and cost-effective solution for managing large datasets. Whether you're searching through years of security data for advanced persistent threats or analyzing historical seasonal trends from logs and metrics, searchable snapshots and the frozen tier deliver strong value and performance. By adopting the frozen tier, organizations can optimize storage strategies, maintain responsiveness, keep data searchable, and stay within budget.

To learn more, see how to set up hot and frozen data tiers for your Elastic Cloud deployment.
