
Advanced integration tests with real Elasticsearch

Mastering advanced Elasticsearch integration testing: Faster, smarter, and optimized.

In the previous post on integration testing, we covered shortening the execution time of integration tests relying on real Elasticsearch by changing the approach to data initialization strategies. In this installment, we're about to shorten the test suite duration even further, this time by applying advanced techniques to the Docker container running Elasticsearch and Elasticsearch itself.

Please note that the techniques described below can often be cherry-picked: you can choose what makes the most sense in your specific case.

Here Be Dragons: The Trade-offs

Before we delve into the ins and outs of various approaches in pursuit of performance, it's important to understand that not every optimization should always be applied. While they tend to improve things, they can also make the setup more obscure, especially to an untrained eye. In other words, in the following sections, we're not going to change anything within the tests; only the "infrastructure code around" is going to be redesigned. These changes can make the code more difficult to understand for less-experienced team members. Using the techniques described below is not rocket science, but some caution is advised, and experience is recommended.

Snapshots

When we left off our demo code, we were still initializing Elasticsearch with data for every test. This approach has some advantages, especially if our dataset differs between test cases, e.g., we index somewhat different documents sometimes. However, if all our test cases can rely on the same dataset, we can use the snapshot-and-restore approach.

It's helpful to understand how snapshot and restore work in Elasticsearch, which is explained in the official documentation.

In our approach, instead of handling this via the CLI or the DevOps method, we will integrate it into the setup code around our tests. This ensures smooth test execution on developer machines as well as in CI/CD.

The idea is quite simple: instead of deleting indices and recreating them from scratch before each test, we:

  • Create a snapshot in the container's local file system (if it doesn't already exist, as this will become necessary later).
  • Restore the snapshot before each test.

Prepare Snapshot Location

One important thing to note – which makes Elasticsearch different from many relational databases – is that before we send a request to create a snapshot, we first need to register a location where the snapshots can be stored, the so-called repository. There are many storage options available (which is very handy for cloud deployments); in our case, it's enough to keep them in a local directory inside the container.

Note:

The /tmp/... location used here is suitable only for volatile integration tests and should never be used in a production environment. In production, always store snapshots in a location that is safe and reliable for backups.

To avoid the temptation of storing backups in an unsafe location, we first add this to our test:
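
A minimal sketch of what that could look like (the exact directory under /tmp is illustrative):

```java
// Any directory works, as long as it matches the path.repo setting configured
// on the container below. Suitable for throwaway test containers only.
private static final String REPO_LOCATION = "/tmp/snapshot-repo";
```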

Next, we configure the ElasticsearchContainer to ensure it can use this location as a backup location:
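
A sketch of the declaration, assuming the container field from the previous post (the image version is a placeholder):

```java
// path.repo tells Elasticsearch which directories may be registered
// as snapshot repositories.
private static final ElasticsearchContainer elasticsearch =
        new ElasticsearchContainer("docker.elastic.co/elasticsearch/elasticsearch:8.17.0")
                .withEnv("path.repo", REPO_LOCATION);
```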

Change the Setup

Now we're ready to append the following logic to our @BeforeAll method:
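
One way the appended logic could look; snapshotExists and createSnapshot are illustrative helper names, sketched further below:

```java
// Appended at the end of the existing @BeforeAll: build the dataset and
// snapshot it only if no snapshot exists in this container yet.
if (!snapshotExists()) {
    setupDataInContainer();
    createSnapshot();
}
```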

And our @BeforeEach method should start with:
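
For example (restoreSnapshot is again an illustrative helper, shown further down):

```java
@BeforeEach
void restoreIndexFromSnapshot() throws IOException, InterruptedException {
    restoreSnapshot();
    // ...any remaining per-test setup follows here...
}
```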

Checking if the snapshot exists can be done by verifying that the REPO_LOCATION directory exists and contains some files:
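
A sketch of such a check, using a small shell one-liner inside the container:

```java
private static boolean snapshotExists() throws IOException, InterruptedException {
    // A snapshot is assumed to exist if the repository directory is non-empty.
    var result = elasticsearch.execInContainer(
            "sh", "-c", "ls -A " + REPO_LOCATION + " 2>/dev/null | grep -q .");
    return result.getExitCode() == 0;
}
```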

The setupDataInContainer() method has minor changes: it's no longer called in @BeforeEach (we execute it on demand when needed), and the DELETE books request can be removed (as it is no longer necessary).

To create a snapshot, we first need to register a snapshot location and then store any number of snapshots there (although we'll keep only one, as the tests don't require more):
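
A sketch of both steps in a single execInContainer call; the repository and snapshot names are illustrative, and the curl credentials and -k flag assume the secured single-node setup from the previous post:

```java
private static void createSnapshot() throws IOException, InterruptedException {
    elasticsearch.execInContainer("sh", "-c",
            // 1. Register a file-system repository pointing at REPO_LOCATION.
            "curl -s -k -u elastic:changeme -X PUT 'https://localhost:9200/_snapshot/test-repo' "
                    + "-H 'Content-Type: application/json' "
                    + "-d '{\"type\":\"fs\",\"settings\":{\"location\":\"" + REPO_LOCATION + "\"}}' "
            // 2. Store a single snapshot in it and wait until it is complete.
            + "&& curl -s -k -u elastic:changeme -X PUT "
                    + "'https://localhost:9200/_snapshot/test-repo/books-snapshot?wait_for_completion=true'");
}
```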

Once the snapshot is created, we can restore it before each test as follows:
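
A sketch under the same assumptions; chaining both commands in one small script keeps it to a single execInContainer call:

```java
private static void restoreSnapshot() throws IOException, InterruptedException {
    elasticsearch.execInContainer("sh", "-c",
            // The index must not exist when it is restored, hence the DELETE first.
            "curl -s -k -u elastic:changeme -X DELETE 'https://localhost:9200/books' "
            + "&& curl -s -k -u elastic:changeme -X POST "
                    + "'https://localhost:9200/_snapshot/test-repo/books-snapshot/_restore?wait_for_completion=true'");
}
```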

Please note the following:

  • An index must not exist at the time it is restored, so we delete it first.
  • If you need to delete multiple indices, you can do so in a single curl call, e.g., "https://localhost:9200/indexA,indexB".
  • To chain several commands in a container, you don't need to wrap them in separate execInContainer calls; running a simple script can improve readability (and reduce some network round-trips).

In the example project, this technique shortened my build time to 26 seconds. While this might not seem like a significant gain at first glance, the approach is a universal technique that can be applied before, or even instead of, switching to _bulk ingestion (discussed in the previous post). In other words, you can prepare data for your tests in @BeforeAll in any way and then make a snapshot of it to use in @BeforeEach. If you want to maximize efficiency, you can even copy the snapshot back to the testing machine using elasticsearch.copyFileFromContainer(...), allowing it to serve as a form of cache that is only purged when you need to update the dataset (e.g., for new features to test). For a complete example, check out the tag snapshots.

RAM the Data

Sometimes, our test cases are noticeably data-heavy, which can negatively impact performance, especially if the underlying storage is slow. If your tests need to read and write large amounts of data, and the SSD or even hard drive is painfully slow, you can instruct the container to keep the data in RAM – provided you have enough memory available.

This is essentially a one-liner, requiring the addition of .withTmpFs(Map.of("/usr/share/elasticsearch/data", "rw")) to your container definition. The container setup will look like this:
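
Assuming the snapshot setup from the previous sections, the declaration might look like this (the image version is a placeholder):

```java
// The same declaration as before, with the data directory kept in RAM.
private static final ElasticsearchContainer elasticsearch =
        new ElasticsearchContainer("docker.elastic.co/elasticsearch/elasticsearch:8.17.0")
                .withEnv("path.repo", REPO_LOCATION)
                .withTmpFs(Map.of("/usr/share/elasticsearch/data", "rw"));
```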

The slower your storage is, the more significant the performance improvement will be, as Elasticsearch will now write to and read from a temporary file system in RAM.

Note:

As the name implies, this is a temporary file system, meaning it is not persistent. Therefore, this solution is suitable only for tests. Do not use this in production, as it could lead to data loss.

To assess how much this solution can improve performance on your hardware, you can try the tag tmpfs.

More Work, Same Time

The size of a product's codebase grows the most during the active development phase. Then, when it moves into a maintenance phase (if applicable), it usually involves just bug fixes. However, the size of the test base grows continuously, as both features and bugs need to be covered by tests to prevent regressions. Ideally, a bug fix should always be accompanied by a test to prevent the bug from reappearing. This means that even when development is not particularly active, the number of tests will keep growing. The approach described in this section provides hints on how to manage a growing test base without significantly increasing test suite duration, provided sufficient resources are available to enable parallelization.

Let's assume, for simplicity, that the number of test cases in our example has doubled (rather than writing additional tests, we will copy the existing ones for this demo).

In the simplest approach, we could add three more @Test methods to the BookSearcherIntTest class. We can then observe CPU and memory consumption by using, in a somewhat unorthodox way, one of Java's profilers: Java Flight Recorder. Since we added it to our POM, after running the tests, we can open recording-1.jfr in the main directory. The results may look like this in Environment -> Processes:

As you can see, running six tests in a single class doubled the time required. Additionally, the predominant color in the CPU usage chart above is... no color at all, as CPU utilization barely reaches 20% during peak moments. Underutilizing your CPU is wasteful when you’re paying for usage time (whether to cloud providers or in terms of your own wall clock time to get meaningful feedback).

Chances are, the CPU you’re using has more than one core. The optimization here is to split the workload into two parts, which should roughly halve the duration. To achieve this, we move the newly added tests into another class called BookSearcherAnotherIntTest and instruct Maven to run two forks for testing using -DforkCount=2. The full command becomes:
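
The command could look like the following; the exact goal (test vs. verify) depends on how the project binds its integration tests:

```bash
mvn verify -DforkCount=2
```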

With this change, and using JFR and Java Mission Control, we observe the following:

Here, the CPU is utilized much more effectively.

This example should not be interpreted with a focus on exact numbers. Instead, what matters is the general trend, which applies not only to Java:

  • Check whether your CPU is being properly utilized during tests.
  • If not, try to parallelize your tests as much as possible (though other resources might sometimes limit you).
  • Keep in mind that different environments may require different parallelization factors (e.g., -DforkCount=N in Maven). It’s better to avoid hardcoding these factors in the build script and instead tune them per project and environment:
  • This can be skipped for developer machines if only a single test class is being run.
  • A lower number might suffice for less powerful CI environments.
  • A higher number might work well for more powerful CI setups.

For Java, it’s important to avoid having one large class and instead divide tests into smaller classes as much as it makes sense. Different parallelization techniques and parameters apply to other technology stacks, but the overarching goal remains to fully utilize your hardware resources.

To refine things further, avoid duplicating setup code across test classes. Keep the tests themselves separate from infrastructure/setup code. For instance, configuration elements like the image version declaration should be maintained in one place. In Testcontainers for Java, we can use (or slightly repurpose) inheritance to ensure that the class containing infrastructure code is loaded (and executed) before the tests. The structure would look like this:
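
A possible shape, with an illustrative base-class name; the test class names come from the earlier sections:

```java
// Base class: owns the container, the snapshot helpers, and the shared
// lifecycle methods. Its static initializer runs before any subclass's tests.
abstract class IntegrationTestBase {

    static final String REPO_LOCATION = "/tmp/snapshot-repo";

    static final ElasticsearchContainer elasticsearch =
            new ElasticsearchContainer("docker.elastic.co/elasticsearch/elasticsearch:8.17.0")
                    .withEnv("path.repo", REPO_LOCATION)
                    .withTmpFs(Map.of("/usr/share/elasticsearch/data", "rw"));

    // ...@BeforeAll / @BeforeEach snapshot handling lives here...
}

// Test classes contain only @Test methods and inherit the infrastructure.
class BookSearcherIntTest extends IntegrationTestBase { /* @Test methods */ }

class BookSearcherAnotherIntTest extends IntegrationTestBase { /* @Test methods */ }
```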

For a complete demo, refer again to the example project on GitHub.

Reuse - Start Once and Once Only

The final technique described in this post is particularly useful for developer machines. It may not be suitable for traditional CIs (e.g., Jenkins hosted in-house) and is generally unnecessary for ephemeral CI environments (like cloud-based CIs, where build machines are single-use and decommissioned after each build). This technique relies on a preview feature of Testcontainers, known as reuse.

Typically, containers are cleaned up automatically after the test suite finishes. This default behavior is highly convenient, especially in long-running CIs, as it ensures no leftover containers regardless of the test results. However, in certain scenarios, we can keep a container running between tests so that subsequent tests don’t waste time starting it again. This approach is especially beneficial for developers working on a feature or bug fix over an extended period (sometimes days), where the same test (class) is run repeatedly.

How to Enable Reuse

Enabling reuse is a two-step process:

1. Mark the container as reusable when declaring it:
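
For example, carrying over the settings from the earlier sections (the image version is a placeholder):

```java
// The same declaration as before, now marked as a reusable container.
static final ElasticsearchContainer elasticsearch =
        new ElasticsearchContainer("docker.elastic.co/elasticsearch/elasticsearch:8.17.0")
                .withEnv("path.repo", REPO_LOCATION)
                .withTmpFs(Map.of("/usr/share/elasticsearch/data", "rw"))
                .withReuse(true);
```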

2. Opt-in to enable the reuse feature in the environments where it makes sense (e.g., on your development machine). The simplest and most persistent way to do this on a developer workstation is by ensuring that the configuration file in your $HOME directory has the proper content. In ~/.testcontainers.properties, include the following line:
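
That is the standard Testcontainers opt-in property:

```properties
testcontainers.reuse.enable=true
```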

That’s all! On first use, tests won’t be any faster because the container still needs to start. However, after the initial test:

  • Running docker ps will show the container still running (this is now a feature, not a bug).
  • Subsequent tests will be faster.

Note:

Once reuse is enabled, stopping the containers manually becomes your responsibility.

Leveraging Reuse with Snapshots or Init Data

The reuse feature works particularly well in combination with techniques like copying initialization data files to the container only once or using snapshots. With reuse enabled, there’s no need to recreate snapshots for subsequent tests, saving even more time. All the pieces of optimization start falling into place.

Reuse Forked Containers

While reuse works well in many scenarios, issues arise when combining reuse with multiple forks during the second run. This can result in errors or gibberish output related to containers or Elasticsearch being in an improper state. If you wish to use both improvements simultaneously (e.g., running many integration tests on a powerful workstation before submitting a PR), you’ll need to make an additional adjustment.

The Problem

The issue may manifest itself in errors like the following:

This happens due to how Testcontainers identifies containers for reuse.

When both forks start and no Elasticsearch containers are running, each fork initializes its own container. Upon restarting, however, each fork looks for a reusable container and finds one. Because all containers look identical to Testcontainers, both forks may select the same container. This results in a race condition, where more than one fork tries to use the same Elasticsearch instance. For example, one fork may be restoring a snapshot while the other is attempting to do the same, leading to errors like the one above.

The Solution

To resolve this, we need to introduce differentiation between containers and ensure that forks select containers deterministically based on these differences.

Step 1: Update pom.xml

Modify the Surefire configuration in your pom.xml to include the following:
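
A sketch of such a configuration; the variable name (TC_FORK_ID here) is illustrative, while fork_${surefire.forkNumber} uses Surefire's per-fork placeholder:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <environmentVariables>
      <!-- TC_FORK_ID is an illustrative variable name -->
      <TC_FORK_ID>fork_${surefire.forkNumber}</TC_FORK_ID>
    </environmentVariables>
  </configuration>
</plugin>
```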

This adds a unique identifier (fork_${surefire.forkNumber}) for each fork as an environment variable.

Step 2: Modify Container Declaration

Adjust the Elasticsearch container declaration in your code to include a label based on the fork identifier:
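
A sketch, matching the illustrative TC_FORK_ID variable from the previous step:

```java
// Each fork labels "its" container with the fork identifier, so a reusable
// container is only matched by the fork that created it.
static final ElasticsearchContainer elasticsearch =
        new ElasticsearchContainer("docker.elastic.co/elasticsearch/elasticsearch:8.17.0")
                .withReuse(true)
                .withLabel("test.fork", System.getenv().getOrDefault("TC_FORK_ID", "fork_local"));
```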

The Effect

These changes ensure that each fork creates and uses its own container. The containers are slightly different due to the unique labels, allowing Testcontainers to assign them deterministically to specific forks.

This approach eliminates the race condition, as no two forks will attempt to reuse the same container. Importantly, the functionality of Elasticsearch within the containers remains identical, and tests can be distributed between the forks dynamically without affecting the outcome.

Was It Really Worth It?

As warned at the beginning of this post, the improvements introduced here should be applied with caution, as they make the setup code of our tests less intuitive. What are the benefits?

We started this post with three integration tests taking around 25 seconds on my machine. After applying all the improvements together and doubling the number of actual tests to six, the execution time on my laptop dropped to 8 seconds. Doubled the tests; shortened the build by two-thirds. It's up to you to decide if it makes sense for your case. ;-)

It Doesn't Stop Here

This miniseries on testing with real Elasticsearch ends here. In part one, we discussed when it makes sense to mock the Elasticsearch index and when it's better to go for integration tests. In part two, we addressed the most common mistakes that make integration tests slow. This third part went the extra mile to make integration tests run even faster, in seconds instead of minutes.

There are more ways to optimize your experience and reduce costs associated with integration tests of systems using Elasticsearch. Don’t hesitate to explore these possibilities and experiment with your tech stack.

If your case involves any of the techniques mentioned above, or if you have any questions, feel free to reach out on our Discuss forums or community Slack channel.
