In the past, we discussed some of the challenges of having to search multiple HNSW graphs and how we were able to mitigate them. At that time we alluded to some further improvements we had planned. This post is the culmination of that work.
You might ask, why use multiple graphs at all? This is a side effect of an architectural choice in Lucene: immutable segments. As with most architectural choices there are pros and cons. For example, we’ve recently GA’d Serverless Elasticsearch. In this context, we’ve gained very significant benefits from immutable segments including efficient index replication and the ability to decouple index and query compute and autoscale them independently. For vector quantization, segment merges give us the opportunity to update parameters to adapt them to data characteristics. Along these lines, we think the chance to measure data characteristics and revisit indexing choices at merge time affords other advantages as well.
In this post we will discuss the work we’ve been doing to significantly reduce the overhead of building multiple HNSW graphs and in particular to reduce the cost of merging graphs.
Background
In order to maintain a manageable number of segments Lucene periodically checks to see if it should merge segments. This amounts to checking if the current segment count exceeds a target segment count, which is determined by the base segment size and the merge policy. If the count is exceeded, Lucene merges groups of segments while the constraint is violated. This process has been described in detail elsewhere.
Lucene elects to merge similar sized segments because this achieves logarithmic growth in the write amplification. In the case of a vector index, write amplification is the number of times a vector will be inserted into a graph. Lucene will try to merge segments in groups of approximately 10. Consequently, vectors are inserted into a graph roughly $\frac{9}{10}\log_{10}\!\left(\frac{n}{n_0}\right)$ times, where $n$ is the index vector count and $n_0$ is the expected base segment vector count. Because of the logarithmic growth, write amplification is single digits even for huge indices. However, the total time spent merging graphs is linearly proportional to the write amplification.
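To put rough numbers on this estimate, here is a small illustrative Python sketch. The base segment size is a made-up figure, and the helper simply restates the approximation above rather than Lucene's actual merge policy.

```python
import math

def write_amplification(index_vectors: int, base_segment_vectors: int, merge_factor: int = 10) -> float:
    """Rough estimate of how many times each vector is inserted into a graph.

    Merging ~merge_factor similar-sized segments at a time gives logarithmic
    growth; keeping the largest graph during each merge contributes the
    (merge_factor - 1) / merge_factor factor (explained in the next paragraph).
    """
    merge_rounds = math.log(index_vectors / base_segment_vectors, merge_factor)
    return (merge_factor - 1) / merge_factor * merge_rounds

# Hypothetical sizes: write amplification stays in single digits even for huge indices.
for n in (1_000_000, 100_000_000, 10_000_000_000):
    print(f"{n:>14,} vectors -> ~{write_amplification(n, base_segment_vectors=10_000):.1f} insertions per vector")
```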
When merging HNSW graphs we already make a small optimization: retaining the graph for the largest segment and inserting vectors from the other segments into it. This is the reason for the 9/10 factor above. Below we show how we are able to do significantly better by using information from all the graphs we are merging.
Graph Merging
Previously we retained the largest graph and inserted vectors from the others, ignoring the graphs that contain them. The key insight we make use of below is that each graph we discard contains important proximity information about the vectors it contains. We would like to use this information to accelerate inserting at least some of the vectors.
We focus on the problem of inserting a smaller graph $G_s$ into a larger graph $G_l$, since this is an atomic operation we can use to build any merge policy.
The strategy is to find a subset $J$ of the vertices of $G_s$ to insert into the large graph. We then use the connectivity of these vertices in the small graph to accelerate inserting the remaining vertices $G_s \setminus J$. In the following, we use $N_s(v)$ and $N_l(v)$ to denote the neighbors of a vertex $v$ in the small and large graph, respectively. Schematically the process is as follows.
MERGE-HNSW
Inputs: the small graph $G_s$ and the large graph $G_l$
1  Find $J \subseteq G_s$ to insert into $G_l$ using COMPUTE-JOIN-SET
2  Insert each vertex in $J$ into $G_l$
3  for $v \in G_s \setminus J$ do
4      $W \leftarrow N_s(v) \cap J$
5      $W \leftarrow W \cup \bigcup_{u \in W} N_l(u)$
6      $W \leftarrow$ FAST-SEARCH-LAYER$(v, W)$
7      $N_l(v) \leftarrow$ SELECT-NEIGHBORS-HEURISTIC$(v, W)$
8      $J \leftarrow J \cup \{v\}$
We compute the set $J$ using a procedure we discuss below (line 1). Then we insert every vertex in $J$ into the large graph using the standard HNSW insertion procedure (line 2). For each vertex we haven’t inserted we find its neighbors that we have inserted and their neighbors in the large graph (lines 4 and 5). We use a FAST-SEARCH-LAYER procedure seeded with this set (line 6) to find the candidates for the SELECT-NEIGHBORS-HEURISTIC from the HNSW paper (line 7). In effect, we’re replacing SEARCH-LAYER to find the candidate set in the INSERT method (Algorithm 1 from the paper), which is otherwise unchanged. Finally, we add the vertex we just inserted into $J$ (line 8).
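As a rough single-layer illustration of lines 3 to 8, the following Python sketch shows how the candidate set for each remaining vertex could be seeded. The graph accessors, `fast_search_layer` and `select_neighbors_heuristic` are placeholders rather than Lucene APIs, and layer assignment and back-edges are omitted.

```python
def merge_remaining(small_graph, large_graph, join_set,
                    fast_search_layer, select_neighbors_heuristic):
    """Insert the vertices of the small graph that are not in the join set J.

    Single-layer sketch: each remaining vertex seeds its candidate search with
    its already-inserted small-graph neighbors plus their large-graph
    neighbors, then runs a cheap local search instead of a full SEARCH-LAYER.
    """
    inserted = set(join_set)                                   # vertices already in the large graph
    for v in small_graph.vertices():
        if v in inserted:
            continue
        seeds = {u for u in small_graph.neighbors(v) if u in inserted}       # line 4
        seeds |= {w for u in seeds for w in large_graph.neighbors(u)}        # line 5
        candidates = fast_search_layer(large_graph, v, seeds)                # line 6
        large_graph.set_neighbors(v, select_neighbors_heuristic(v, candidates))  # line 7
        inserted.add(v)                                                      # line 8
```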
It's clear that for this to work every vertex in $G_s \setminus J$ must have at least one neighbor in $J$. In fact, we require that $|N_s(v) \cap J| \geq k$ for every vertex $v$ in $G_s \setminus J$, for some $k \leq M$, the maximum layer connectivity. We observe that in real HNSW graphs we see quite a spread of vertex degrees. The figure below shows a typical cumulative density function of vertex degree for the bottom layer of a Lucene HNSW graph.

Example vertex degree distribution
We explored using a fixed value for $k$ as well as making it a function of the vertex degree. This second choice leads to larger speedups with minimal impact on graph quality, so we went with a threshold $k(v)$ that grows with the vertex degree $|N_s(v)|$, subject to a lower limit of two.
Note that $|N_s(v)|$ is equal to the degree of the vertex $v$ in the small graph by definition. Having a lower limit of two means that we will insert every vertex whose degree is less than two directly into the large graph.
A simple counting argument suggests that if we choose $J$ carefully we need only insert around $\frac{k}{\bar{d}+k}|G_s|$ vertices into $G_l$ directly. Specifically, we color an edge of the small graph if we insert exactly one of its end vertices into $J$. Then we know that for every vertex in $G_s \setminus J$ to have at least $k$ neighbors in $J$ we need to color at least $k\,(|G_s| - |J|)$ edges. Furthermore, we expect that the number of edges we color is at most
$$\sum_{v \in J} |N_s(v)| \approx \bar{d}\,|J|.$$
Here, $\bar{d}$ is the average vertex degree in the small graph. For each vertex we add to $J$ we color at most $|N_s(v)|$ edges, so the total number of edges we expect to color is at most $\bar{d}\,|J|$. We hope that by choosing $J$ carefully we will color close to this number of edges, and so in order to cover all the vertices $J$ needs to satisfy
$$\bar{d}\,|J| \gtrsim k\,(|G_s| - |J|).$$
This implies that $|J| \gtrsim \frac{k}{\bar{d}+k}\,|G_s|$.
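To put hypothetical numbers on this (purely for illustration, not measured values): if the average degree were $\bar{d} = 16$ and we required $k = 8$ covered neighbors, the bound gives $|J| \gtrsim \frac{8}{16+8}|G_s| = \frac{1}{3}|G_s|$, so only around a third of the vectors would need a full graph insertion.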
Provided SEARCH-LAYER dominates the runtime, this suggests we could achieve up to a $\frac{|G_s|}{|J|} \approx \frac{\bar{d}+k}{k}$ speedup in merge time. Given the logarithmic growth of the write amplification, this means even for very large indices we would typically only double the build time compared to building one graph.
The risk in this strategy is that we damage the graph quality. We initially tried with a no-op FAST-SEARCH-LAYER. We found this degrades graph quality to the extent that recall as a function of latency was impacted, particularly when merging down to a single segment. We then explored various alternatives using a limited search of the graph. In the end, the most effective choice was the simplest: use SEARCH-LAYER but with a low ef_construction. With this parameterisation we were able to achieve excellent quality graphs and still decrease the merge time by a little over 30% on average.
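For intuition, here is a sketch of what such a seeded layer search could look like: the standard HNSW SEARCH-LAYER beam search, except that it starts from the supplied seed set and keeps the beam (`ef`) small. The graph accessors and `distance` function are placeholders, and this is not the Lucene implementation; it could serve as the `fast_search_layer` passed to the earlier merge sketch.

```python
import heapq

def fast_search_layer(large_graph, query, seeds, ef, distance):
    """Seeded best-first search of one graph layer, truncated at a small ef.

    Same structure as the HNSW SEARCH-LAYER routine, except the beam starts
    from the supplied seed set rather than descending from the entry point.
    """
    if not seeds:
        return []
    visited = set(seeds)
    # Min-heap of candidates to expand and max-heap (negated distances) of best results.
    candidates = [(distance(query, large_graph.vector(s)), s) for s in seeds]
    heapq.heapify(candidates)
    results = [(-d, s) for d, s in candidates]
    heapq.heapify(results)
    while len(results) > ef:
        heapq.heappop(results)                      # keep only the ef closest seeds
    while candidates:
        d, v = heapq.heappop(candidates)
        if d > -results[0][0]:
            break                                   # nearest unexpanded vertex is too far
        for u in large_graph.neighbors(v):
            if u in visited:
                continue
            visited.add(u)
            du = distance(query, large_graph.vector(u))
            if len(results) < ef or du < -results[0][0]:
                heapq.heappush(candidates, (du, u))
                heapq.heappush(results, (-du, u))
                if len(results) > ef:
                    heapq.heappop(results)          # keep only the ef closest
    return [u for _, u in sorted((-nd, u) for nd, u in results)]
```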
Computing the Join Set
Finding a good join set can be formulated as a graph covering problem, and greedy selection is a simple and effective heuristic for approximating optimal graph covers. The approach we take picks vertices one at a time to add to $J$ in decreasing gain order. The gain of a vertex $v$ is defined as follows:
$$g(v) = \max\bigl(0,\, k(v) - c(v)\bigr) + \sum_{u \in N_s(v) \setminus J} \mathbb{1}\bigl(c(u) < k(u)\bigr)$$
Here, $c(u)$ denotes the count of the neighbors of a vertex $u$ in $J$ and $\mathbb{1}(\cdot)$ is the indicator function. The gain includes the change in the coverage requirement of the vertex we add to $J$ itself, i.e. $\max(0, k(v) - c(v))$, since we get closer to our goal by adding a less covered vertex. The gain calculation is illustrated in the figure below for the central orange vertex.

Vertex gain to add to join set J
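In code, and assuming the gain definition written above (with `coverage_count[u]` playing the role of $c(u)$ and `k` the per-vertex coverage requirement), the computation for a single vertex might look like this illustrative sketch:

```python
def gain(v, small_graph, join_set, coverage_count, k):
    """Gain from adding vertex v to the join set J (illustrative sketch).

    coverage_count[u] is the number of u's neighbors already in J and k(u)
    is the number of covered neighbors u needs (see the definition above).
    """
    # Adding v to J removes v's own remaining coverage deficit...
    g = max(0, k(v) - coverage_count[v]) if v not in join_set else 0
    # ...and gives each not-yet-covered neighbor one more covered neighbor.
    g += sum(
        1
        for u in small_graph.neighbors(v)
        if u not in join_set and coverage_count[u] < k(u)
    )
    return g
```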
We maintain the following state for each vertex $v$:
- Whether it is stale,
- Its gain $g(v)$,
- The count of its adjacent vertices in $J$, denoted $c(v)$,
- A random number in the range [0,1] that is used for tie breaking.
The pseudo code for computing the join set is as follows.
COMPUTE-JOIN-SET
Inputs: the small graph $G_s$
1  $J \leftarrow \emptyset$
2  $P \leftarrow$ empty max priority queue keyed on gain
3  for $v \in G_s$ do
4      initialize the state of $v$: not stale, $c(v) \leftarrow 0$, gain $g(v)$, tie-breaker $r(v) \sim U[0,1]$
5      add $v$ to $P$
6  $G \leftarrow 0$
7  while $G < \sum_{v \in G_s} k(v)$ do
8      $v \leftarrow$ maximum gain vertex in $P$
9      Remove the state for $v$ from $P$
10     if $v$ is not stale then
11         $J \leftarrow J \cup \{v\}$
12         $G \leftarrow G + g(v)$
13         for $u \in N_s(v)$ do
14             mark $u$ as stale if $c(u) < k(u)$
15             mark neighbors of $u$ stale if $c(u) + 1 = k(u)$
16             $c(u) \leftarrow c(u) + 1$
17     else
18         recompute $g(v)$
19         if $g(v) > 0$ then
20             reinsert $v$ into $P$
21 return $J$
We first initialize the state in lines 1-5.
In each iteration of the main loop we initially extract the maximum gain vertex (line 8), breaking ties at random. Before making any change, we need to check if the vertex’s gain is stale. In particular, each time we add a vertex into $J$ we affect the gain of other vertices:
- Since all its neighbors have an additional neighbor in $J$ their gains can change (line 14)
- If any of its neighbors are now fully covered all their neighbors’ gains can change (lines 14-16)
We recompute gains in a lazy fashion, so we only recompute the gain of a vertex if we want to insert it into $J$ (lines 18-20). Since gains only ever decrease we can never miss a vertex we should insert.
Note that we simply need to keep track of the total gain of vertices we’ve added to $J$ to determine when to exit. Furthermore, whilst some vertices remain uncovered at least one vertex will have non-zero gain, so we always make progress.
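Putting the pieces together, here is an illustrative Python sketch of the lazy greedy selection using a heap of negated gains. It reuses the `gain` helper from the earlier sketch, assumes vertices are hashable (e.g. integer ids), and mirrors the pseudocode above; it is not the Lucene implementation.

```python
import heapq
import random

def compute_join_set(small_graph, k):
    """Lazy greedy selection of the join set J (illustrative sketch).

    k(v) is the number of covered neighbors vertex v needs. Exits once the
    accumulated gain implies every vertex is either in J or has at least
    k(v) neighbors in J.
    """
    vertices = list(small_graph.vertices())
    join_set = set()
    coverage = {v: 0 for v in vertices}      # c(v): number of v's neighbors in J
    stale = {v: False for v in vertices}
    heap = []                                # max-heap via negated gains
    for v in vertices:
        g = gain(v, small_graph, join_set, coverage, k)
        heapq.heappush(heap, (-g, random.random(), v))     # random tie-breaker
    total_gain, target = 0, sum(k(v) for v in vertices)
    while total_gain < target and heap:
        neg_g, _, v = heapq.heappop(heap)
        if not stale[v]:
            join_set.add(v)
            total_gain += -neg_g
            for u in small_graph.neighbors(v):
                if coverage[u] < k(u):
                    stale[u] = True                         # u's own gain changed
                    if coverage[u] + 1 == k(u):             # u just became covered, so
                        for w in small_graph.neighbors(u):  # its neighbors' gains changed
                            stale[w] = True
                coverage[u] += 1
        else:
            stale[v] = False
            g = gain(v, small_graph, join_set, coverage, k)
            if g > 0:
                heapq.heappush(heap, (-g, random.random(), v))
    return join_set
```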
Results
We ran experiments on four datasets that together cover our three supported distance metrics (Euclidean, cosine and inner product):
- quora-E5-small: 522931 documents, 384 dimensions and uses cosine similarity,
- cohere-wikipedia-v2: 1M documents, 768 dimensions and uses cosine similarity,
- gist: 1M documents, 960 dimensions and uses Euclidean distance, and
- cohere-wikipedia-v3: 1M documents, 1024 dimensions and uses maximum inner product.
For each dataset we evaluate two quantization levels:
- int8 – which uses a 1-byte integer per dimension and
- BBQ – which uses a single bit per dimension.
Finally, for each experiment we evaluated search quality at two retrieval depths, both after building the index and after force merging to a single segment.
In summary, we achieve consistent, substantial speedups in indexing and merging while maintaining graph quality, and hence search performance, in all cases.
Experiment 1: int8 quantization
The average speedups from the baseline to the candidate (the proposed changes) are:
- Index time speedup: 1.28
- Force merge speedup: 1.72
This corresponds to the following breakdown in run times

Index and merge times for the baseline and candidate merge strategies
For completeness the exact times are
Dataset | Index (baseline) | Index (candidate) | Merge (baseline) | Merge (candidate)
---|---|---|---|---
quora-E5-small | 112.41s | 81.55s | 113.81s | 70.87s
wiki-cohere-v2 | 158.1s | 122.95s | 425.20s | 239.28s
gist | 141.82s | 119.26s | 536.07s | 279.05s
wiki-cohere-v3 | 211.86s | 168.22s | 654.97s | 414.12s
Below we show the recall vs latency graphs that compare the candidate (dashed lines) to the baseline at two retrieval depths: recall@10 and recall@100 for indices with multiple segments (the final result of our default merge strategy after indexing all vectors) and after force merging to a single segment. A curve that is higher and further to the left is better, which means higher recall at lower latency.
As you can see, for multiple segment indices the candidate is better for the Cohere v3 dataset and slightly worse, but almost comparable, for all other datasets. After merging to a single segment recall curves are almost identical for all cases.

Recall @10 and @100 vs latency after building the index

Recall @10 and @100 vs latency after merging to a single segment
Experiment 2: BBQ quantization
The average speedups from the baseline to the candidate are:
- Index time speedup: 1.33
- Force merge speedup: 1.34
This corresponds to the following breakdown in run times

Index and merge time for the baseline and candidate merge strategies
For completeness the exact times are
Dataset | Index (baseline) | Index (candidate) | Merge (baseline) | Merge (candidate)
---|---|---|---|---
quora-E5-small | 70.71s | 58.25s | 59.38s | 40.15s
wiki-cohere-v2 | 203.08s | 142.27s | 107.27s | 85.68s
gist | 110.35s | 105.52s | 323.66s | 202.2s
wiki-cohere-v3 | 313.43s | 190.63s | 165.98s | 159.95s
For multiple segment indices the candidate is better for almost all datasets, except cohere v2 where the baseline is slightly better. For the single segment indices recall curves are almost identical for all cases.

Recall @10 and @100 vs latency after building the index

Recall @10 and @100 vs latency having merged to a single segment
Conclusion
The algorithm discussed in this blog will be available in the upcoming Lucene 10.2, and in the Elasticsearch release that is based on it. Users will be able to take advantage of the improved merge performance and reduced index build time in these new versions. This change is a part of our continuous effort to make Lucene and Elasticsearch fast and efficient for vector and hybrid search.
Try out vector search for yourself using this self-paced hands-on learning for Search AI. You can start a free cloud trial or try Elastic on your local machine now.