
How to set up vector search in Elasticsearch

Learn how to set up vector search and execute k-NN searches in Elasticsearch.

This article is the second in a series of three that dives into the intricacies of vector search, also known as semantic search, and how it is implemented in Elasticsearch.

The first part was focused on providing a general introduction to the basics of embeddings (aka vectors) and how vector search works under the hood.

Armed with all the vector search knowledge learned in the first article, this second part will guide you through the meanders of how to set up vector search and execute k-NN searches in Elasticsearch.

In the third part, we will leverage what we learned in the first two parts and build upon that knowledge by delving into how to craft powerful hybrid search queries in Elasticsearch.

Some background first

Even though Elasticsearch did not support vector search until version 8.0, which introduced the technical preview of the _knn_search API endpoint, it has been possible to store vectors using the dense_vector field type since the 7.0 release. At that point, vectors were simply stored as binary doc values but not indexed using any of the algorithms that we presented in our first article. Those dense vectors laid the groundwork for the upcoming vector search features in Elasticsearch.

If you’re interested in diving deeper into the discussions that led to the current implementation of vector search in Elasticsearch, you can refer to this issue, which details all the hurdles that Elastic had to jump over in order to bring this feature to market. Very briefly, since Elasticsearch already made heavy use of Lucene as its underlying search engine, we decided to utilize the same technology as our vector engine, and we explained the rationale behind that decision in a very transparent way.

With history matters out of the way, let’s now get to work.

How to set up k-NN

Vector search is available natively in Elasticsearch, and there’s nothing specific to install. We only need to create an index that defines at least one field of type dense_vector, which is where your vector data will be stored and/or indexed.

The mapping below shows a dense_vector field called title_vector with 3 dimensions. Dense vectors stored and indexed in this field will use the dot_product similarity function that we introduced in the first article of this series. It is worth noting that up to 8.11, hnsw (i.e., Hierarchical Navigable Small Worlds) was the only algorithm supported by Apache Lucene for indexing dense vectors. Other algorithms have been added since then, and Elasticsearch might provide additional methods for indexing and searching dense vectors in the future, but since it fully relies on Apache Lucene, that will depend on what unfolds on that front.
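A minimal version of such a mapping could look like the sketch below; the my-index index name is illustrative:

```
PUT my-index
{
  "mappings": {
    "properties": {
      "title_vector": {
        "type": "dense_vector",
        "dims": 3,
        "index": true,
        "similarity": "dot_product"
      }
    }
  }
}
```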

The table below summarizes all available configuration parameters for the dense_vector field type provided by Elasticsearch:

Table 1: The different configuration parameters for dense vectors

| Parameter | Required | Description |
|-----------|----------|-------------|
| `dims` | Yes (<8.11), No (8.11+) | The number of vector dimensions, which can’t exceed 1024 until 8.9.2, 2048 as of 8.10.0, and 4096 as of 8.11.0. As of 8.11, this parameter is no longer required and defaults to the dimension of the first indexed vector. |
| `element_type` | No | The data type of the vector element values. If unspecified, the default type is `float` (4 bytes); `byte` (1 byte) and `bit` are also available. |
| `index` | No | Indicates whether to index vectors (if `true`) in a dedicated and optimized data structure or simply store them as binary doc values (if `false`). Until 8.10, the default value was `false`. As of 8.11, the default value is `true`. |
| `similarity` | Yes (<8.11), No (8.11+) | Until 8.10, this parameter is required if `index` is `true` and defines the vector similarity metric to use for k-NN search. The available metrics are: a) `l2_norm` (L2 distance), b) `dot_product` (dot product similarity), c) `cosine` (cosine similarity), and d) `max_inner_product` (maximum inner product similarity). Note that `dot_product` should be used only if your vectors are already normalized (i.e., they are unit vectors with magnitude 1); otherwise, use `cosine` or `max_inner_product`. As of 8.11, if not specified, this parameter defaults to `l2_norm` if `element_type` is `bit` and to `cosine` otherwise. |
| `index_options` | No | The possible values for the `type` parameter depend on the version: a) up until 8.11, only `hnsw` was supported; b) 8.12 added the scalar-quantized `int8_hnsw`; c) 8.13 added `flat` along with its scalar-quantized `int8_flat` sibling; d) 8.15 added `int4_hnsw` and `int4_flat`; e) 8.18 added the binary-quantized `bbq_hnsw` and `bbq_flat`. You can check the official documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html#dense-vector-params) for detailed descriptions and how each algorithm can be configured. |

As we can see in the above table, since version 8.11 the definition of vector fields has been drastically simplified, as dims, index, and similarity now all have sensible defaults.
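For instance, on 8.11+ a minimal field definition boils down to the sketch below, with the dimension inferred from the first indexed vector, index defaulting to true, and similarity defaulting to cosine:

```
PUT my-index
{
  "mappings": {
    "properties": {
      "title_vector": {
        "type": "dense_vector"
      }
    }
  }
}
```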

Regarding the support for scalar quantization added in 8.12, remember that we discussed this compression technique in the first part of this series. We won’t dig deeper into it in this article, but you can learn more about how it was implemented in Lucene in another Elastic Search Labs article. Similarly, we won’t dive into Better Binary Quantization (BBQ), added in 8.18, and we invite you to learn more about that new groundbreaking algorithm in this article.

That’s all there is to it! By simply defining and configuring a dense_vector field, we can now index vector data in order to run vector search queries in Elasticsearch using either the knn search option or the knn DSL query (introduced in 8.12). Elasticsearch supports two different vector search modes: 1) exact search using the script_score query and 2) approximate nearest neighbor search using the knn search option or the knn query (8.12+). We’re going to describe both modes next.

Exact search

If you recall from the first article of this series, where we reviewed the vector search landscape, an exact vector search simply boils down to performing a linear, or brute-force, search across the full vector space. Basically, the query vector is measured against each stored vector in order to find the closest neighbors. In this mode, the vectors do not need to be indexed in an HNSW graph but are simply stored as binary doc values, and the similarity computation is run by a custom Painless script.

First, we need to define the vector field mapping so that the vectors are not indexed, which can be done by specifying index: false and no similarity metric in the mapping, along the lines of the sketch below:
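```
# Sketch: vectors are stored as binary doc values but not indexed
# (the my-index index and title_vector field names are illustrative)
PUT my-index
{
  "mappings": {
    "properties": {
      "title_vector": {
        "type": "dense_vector",
        "dims": 3,
        "index": false
      },
      "price": {
        "type": "integer"
      }
    }
  }
}
```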

The advantage of this approach is that vectors do not need to be indexed, which drastically lowers the ingestion time since there’s no need to build the underlying HNSW graph. However, depending on the size of the data set and your hardware, search queries can slow down pretty quickly as your data volume grows, since the more vectors you add, the more time is needed to visit each one of them (i.e., linear search has an O(n) complexity).

Once the index has been created and some documents have been loaded, we can run an exact search using a script_score query along the lines of the following sketch (the query vector is an illustrative stand-in for a real embedding):
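```
# cosineSimilarity returns values in [-1, 1], hence the +1.0
# to keep the script score non-negative
POST my-index/_search
{
  "query": {
    "script_score": {
      "query": {
        "range": {
          "price": {
            "gte": 100
          }
        }
      },
      "script": {
        "source": "cosineSimilarity(params.queryVector, 'title_vector') + 1.0",
        "params": {
          "queryVector": [0.1, 3.2, 2.1]
        }
      }
    }
  }
}
```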

As you can see, the script_score query is composed of two main elements, namely the query and the script. In the above example, the query part specifies a filter (i.e., price >= 100), which restricts the document set against which the script will be executed. If no query was specified, it would be equivalent to using a match_all query, in which case the script would be executed against all vectors stored in the index. Depending on the number of vectors, the search latency can increase substantially.

Since vectors are not indexed, there’s no built-in algorithm that will measure the similarity of the query vector with the stored ones; this has to be done through a script. Luckily for us, Painless provides most of the similarity functions that we’ve learned so far, such as:

  • l1norm(vector, field): L1 distance (Manhattan distance)
  • l2norm(vector, field): L2 distance (Euclidean distance)
  • hamming(vector, field): Hamming distance
  • cosineSimilarity(vector, field): cosine similarity
  • dotProduct(vector, field): dot product similarity

Since we’re writing a script, it is also possible to build our own similarity algorithm. Painless makes this possible by providing access to doc[<field>].vectorValue, which allows iterating over the vector array, and doc[<field>].magnitude, which returns the length of the vector.
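For instance, a sketch of a hand-rolled scorer using those accessors could look like this (the query vector is illustrative, and script_score requires a non-negative return value):

```
POST my-index/_search
{
  "query": {
    "script_score": {
      "query": { "match_all": {} },
      "script": {
        "source": """
          // manually compute the dot product between the query vector
          // and each stored vector, then scale by the stored vector's length
          double dotProduct = 0;
          for (int i = 0; i < doc['title_vector'].vectorValue.length; i++) {
            dotProduct += doc['title_vector'].vectorValue[i] * params.queryVector[i];
          }
          return dotProduct / doc['title_vector'].magnitude;
        """,
        "params": {
          "queryVector": [0.1, 3.2, 2.1]
        }
      }
    }
  }
}
```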

To sum up, even though exact search doesn’t scale well, it might still be suitable for certain very small use cases, but if you know your data volume will grow over time, you need to consider resorting to k-NN search instead. That’s what we’re going to present next.

Approximate k-NN search

Most of the time, this is the mode you’re going to pick if you have a substantial amount of data and need to implement vector search using Elasticsearch. Indexing latency is a bit higher since Lucene needs to build the underlying HNSW graph to store and index all vectors. It is also a bit more demanding in terms of memory requirements at search time, and the reason it’s called “approximate” is that the accuracy can never be 100% as with exact search. Despite all this, approximate k-NN offers much lower search latency and allows us to scale to millions, or even billions, of vectors, provided your cluster is sized appropriately.

Let’s see how this works. First, let’s create a sample index with an adequate vector field mapping to index the vector data (i.e., index: true plus a specific similarity) and load it with some data. The two requests below sketch this; the vector values are illustrative, while the prices match the sample data summarized later in Table 2:
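```
PUT my-index
{
  "mappings": {
    "properties": {
      "title_vector": {
        "type": "dense_vector",
        "dims": 3,
        "index": true,
        "similarity": "cosine",
        "index_options": {
          "type": "int8_hnsw"
        }
      },
      "price": {
        "type": "integer"
      }
    }
  }
}

POST my-index/_bulk
{ "index": { "_id": "1" } }
{ "title_vector": [2.2, 4.3, 1.8], "price": 23 }
{ "index": { "_id": "2" } }
{ "title_vector": [3.1, 0.7, 8.2], "price": 9 }
{ "index": { "_id": "3" } }
{ "title_vector": [1.4, 5.6, 3.9], "price": 124 }
{ "index": { "_id": "4" } }
{ "title_vector": [1.1, 4.4, 2.9], "price": 1457 }
```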

After running these two commands, our vector data is now properly indexed in a scalar-quantized HNSW graph and ready to be searched. Up until 8.11, the only way to run a simple k-NN search was by using the knn search option, located at the same level as the query section you’re used to, as shown in the sketch below (again with an illustrative query vector):
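```
POST my-index/_search
{
  "knn": {
    "field": "title_vector",
    "query_vector": [0.1, 3.2, 2.1],
    "k": 2,
    "num_candidates": 100
  }
}
```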


In the above search payload, we can see that there is no query section like for lexical searches, but a knn section instead. We are searching for the two (k: 2) nearest neighboring vectors to the specified query vector.

From 8.12 onwards, a new knn search query has been introduced to allow for more advanced hybrid search use cases, a topic we are going to handle in our next article. Unless you have the required expertise to combine k-NN queries with other queries, Elastic recommends sticking to the knn search option, which is easier to use. The first thing to note is that the new knn search query doesn’t have a k parameter; instead, it uses the size parameter like any other query. The second notable thing is that the new knn query enables post-filtering of k-NN search results by leveraging a bool query that combines one or more filters with the knn search query, as shown in the sketch below:
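```
POST my-index/_search
{
  "size": 3,
  "query": {
    "bool": {
      "must": {
        "knn": {
          "field": "title_vector",
          "query_vector": [0.1, 3.2, 2.1],
          "num_candidates": 100
        }
      },
      "filter": {
        "range": {
          "price": {
            "gte": 100
          }
        }
      }
    }
  }
}
```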

The above query first retrieves the top 3 documents having the nearest neighboring vectors and then filters out the ones whose price is smaller than 100. It is worth noting that with this kind of post-filtering, you might end up with no results at all if your filters are too aggressive. Also note that this behavior is different from the usual boolean full-text queries, where the filter part is executed first in order to reduce the document set that needs to be scored. If you are interested in learning more about the differences between the knn top-level search option and the new knn search query, you can head over to another great Search Labs article for more details.

Let’s now move on and learn more about the num_candidates parameter. The role of num_candidates is to increase or decrease the likelihood of finding the true nearest neighbor candidates. The higher that number, the slower the search, but also the more likely the real nearest neighbors will be found. As many as num_candidates vectors will be considered on each shard, and the top k ones will be returned to the coordinator node, which will merge all shard-local results and return the top k vectors from the global results, as illustrated in Figure 1 below.

Figure 1: Vectors id4 and id2 are the k local nearest neighbors on the first shard, and id5 and id7 on the second shard. After merging and reranking them, the coordinator node returns id4 and id5 as the two global nearest neighbors for the search query.

Several k-NN searches

If you have several vector fields in your index, it is possible to send several k-NN searches at once, as the knn section also accepts an array of queries, as shown in the sketch below (the content_vector field and the query vectors are illustrative):
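```
POST my-index/_search
{
  "knn": [
    {
      "field": "title_vector",
      "query_vector": [0.1, 3.2, 2.1],
      "k": 2,
      "num_candidates": 100,
      "boost": 0.4
    },
    {
      "field": "content_vector",
      "query_vector": [0.5, 1.2, 4.3],
      "k": 5,
      "num_candidates": 100,
      "boost": 0.6
    }
  ]
}
```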

As we can see, each query can take a different value of k as well as a different boost factor. The boost factor is equivalent to a weight, and the total score will be a weighted average of both scores.

Similarly to what we saw with the script_score query earlier, the knn section also accepts the specification of a filter in order to reduce the vector space on which the approximate search should run. For instance, in the k-NN search below, we’re restricting the search to only documents whose price is greater than or equal to 100.
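A sketch of such a filtered k-NN search, reusing the illustrative query vector from earlier, might look like this:

```
POST my-index/_search
{
  "knn": {
    "field": "title_vector",
    "query_vector": [0.1, 3.2, 2.1],
    "k": 2,
    "num_candidates": 100,
    "filter": {
      "range": {
        "price": {
          "gte": 100
        }
      }
    }
  }
}
```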

Now, you might wonder whether the data set is first filtered by price and then the k-NN search is run on the filtered data set (pre-filtering), or the other way around, i.e., the nearest neighbors are first retrieved and then filtered by price (post-filtering). It’s a bit of both, actually. The problem with pre-filtering is that if the filter is too aggressive, the k-NN search would have to run on a very small and potentially sparse vector space and would not return very accurate results. Post-filtering, on the other hand, would probably weed out a lot of high-quality nearest neighbors.

So, even though the knn section filter is considered a pre-filter, it works during the k-NN search in such a way as to make sure that at least k neighbors can be returned. If you’re interested in the details of how this works, you can check out the following Lucene issue dealing with this matter.

Filtered k-NN search with expected similarity

In the previous section, we learned that when specifying a filter, we can reduce the search latency, but we also run the risk of drastically reducing the vector space to vectors that are partly or mostly dissimilar to the query vector. In order to alleviate this problem, k-NN search makes it possible to also specify a minimum similarity value that all returned vectors are expected to have. Reusing the previous query, it would look like this:
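```
# Sketch: same filtered k-NN search, now with a minimum expected similarity
POST my-index/_search
{
  "knn": {
    "field": "title_vector",
    "query_vector": [0.1, 3.2, 2.1],
    "k": 2,
    "num_candidates": 100,
    "filter": {
      "range": {
        "price": {
          "gte": 100
        }
      }
    },
    "similarity": 0.975
  }
}
```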

Basically, the way it works is that the vector space will be explored by skipping any vector that either doesn’t match the provided filter or has a lower similarity than the specified one, until the k nearest neighbors are found. If the algorithm can’t honor at least k results (either because of a too-restrictive filter or an expected similarity that is too low), a brute-force search is attempted instead so that at least k nearest neighbors can be returned.

A quick word concerning how to determine that minimum expected similarity. It depends on which similarity metric you’ve chosen in your vector field mapping. If you have picked l2_norm, which is a distance function (i.e., similarity decreases as distance grows), you will want to set the maximum expected distance in your k-NN query, that is, the maximum distance that you consider acceptable. In other words, a vector having a distance between 0 and that maximum expected distance with the query vector will be considered “close” enough to be similar.

If you have picked dot_product or cosine instead, which are similarity functions (i.e., similarity decreases as the vector angle gets wider), you will want to set a minimum expected similarity. A vector having a similarity between that minimum expected similarity and 1 with the query vector will be considered “close” enough to be similar.

Applied to the sample filtered query above and the sample data set that we indexed earlier, Table 2, below, summarizes the cosine similarities between the query vector and each indexed vector. As we can see, vectors 3 and 4 are selected by the filter (price >= 100), but only vector 3 reaches the minimum expected similarity (i.e., 0.975) to be selected.

Table 2: Sample filtered search with expected similarity

| Vector | Cosine similarity | Price |
|--------|-------------------|-------|
| 1      | 0.8473            | 23    |
| 2      | 0.5193            | 9     |
| 3      | 0.9844            | 124   |
| 4      | 0.9683            | 1457  |

Limitations of k-NN

Now that we have reviewed all the capabilities of k-NN searches in Elasticsearch, let’s see the few limitations that you need to be aware of:

  • Up until 8.11, k-NN searches could not be run on vector fields located inside nested documents. From 8.12 onwards, this limitation has been lifted. However, such nested knn queries do not support the specification of a filter.
  • The search_type is always set to dfs_query_then_fetch, and it is not possible to change it dynamically.
  • The ccs_minimize_roundtrips option is not supported when searching across different clusters with cross-cluster search.
  • This has been mentioned a few times already, but due to the nature of the HNSW algorithm used by Lucene (as well as any other approximate nearest neighbors search algorithms for that matter), “approximate” really means that the k nearest neighbors being returned are not always the true ones.

Tuning k-NN

As you can imagine, there are quite a few options that you can use in order to optimize the indexing and search performance of k-NN searches. We are not going to review them in this article, but we really urge you to check them out in the official documentation if you are serious about implementing k-NN searches in your Elasticsearch cluster.

Beyond k-NN

Everything we have seen so far leverages dense vector models (hence the dense_vector field type), in which vectors are mostly composed of non-zero values. Elasticsearch also provides an alternative way of performing semantic search using sparse vector models.

Elastic has created a sparse NLP vector model called Elastic Learned Sparse EncodeR, or ELSER for short, which is an out-of-domain (i.e., not trained on a specific domain) sparse vector model that does not require any fine-tuning. It was pre-trained on a vocabulary of approximately 30,000 terms and, being a sparse model, produces vectors with one value per vocabulary term, most of which are zero.

The way it works is pretty simple. At indexing time, the sparse vectors (term/weight pairs) are generated using the inference ingest processor and stored in fields of type sparse_vector, which is the sparse counterpart to the dense_vector field type. At query time, a specific DSL query, also called sparse_vector, will replace the original query terms with terms available in the ELSER model vocabulary that are known to be the most similar to them given their weights.
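As a quick sketch, a sparse_vector query referencing a hypothetical ELSER inference endpoint could look like the following (the content_embedding field and my-elser-endpoint names are illustrative):

```
POST my-index/_search
{
  "query": {
    "sparse_vector": {
      "field": "content_embedding",
      "inference_id": "my-elser-endpoint",
      "query": "How to set up vector search in Elasticsearch?"
    }
  }
}
```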

We won’t dive deeper into ELSER in this article, but if you’re eager to discover how this works, you can check out this seminal article as well as the official documentation, which explains the topic in great detail.

Elasticsearch also supports combining lexical search and vector search, and that will be the subject of the next and final article of this series.

So far, we’ve had to generate the embedding vectors outside of Elasticsearch and pass them explicitly in all our queries. Would it be possible to just provide the query text and have a model generate the embeddings on the fly? Well, the good news is that this is possible with Elasticsearch, either by leveraging a construct called query_vector_builder (for dense vectors) or by using the new semantic_text field type and semantic DSL query (for sparse vectors), and you can learn more about these techniques in this article.
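For instance, a sketch of a k-NN search using query_vector_builder, assuming a hypothetical text embedding model deployed in the cluster, could look like this:

```
POST my-index/_search
{
  "knn": {
    "field": "title_vector",
    "query_vector_builder": {
      "text_embedding": {
        "model_id": "my-text-embedding-model",
        "model_text": "How to set up vector search in Elasticsearch?"
      }
    },
    "k": 2,
    "num_candidates": 100
  }
}
```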

Let’s conclude

In this article, we delved deeply into Elasticsearch vector search support. We first shared some background on Elastic’s quest to provide accurate vector search and why we decided to use Apache Lucene as our vector indexing and search engine.

We then introduced the two main ways to perform vector search in Elasticsearch: either by leveraging the script_score query in order to run an exact brute-force search, or by resorting to approximate nearest neighbor search via the knn search option or the knn search query introduced in 8.12.

We showed how to run a simple k-NN search and, following up on that, we reviewed all the possible ways of configuring the knn search option and query using filters and expected similarity and how to run multiple k-NN searches at the same time.

To wrap up, we listed some of the current limitations of k-NN searches and what to be aware of. We also invited you to check out all the possible options that can be used to optimize your k-NN searches.

If you like what you’re reading, make sure to check out the other parts of this series.
