This article is the second in a series of three that dives into the intricacies of vector search, also known as semantic search, and how it is implemented in Elasticsearch.
The first part was focused on providing a general introduction to the basics of embeddings (aka vectors) and how vector search works under the hood.
Armed with all the vector search knowledge learned in the first article, this second part will guide you through the meanders of how to set up vector search and execute k-NN searches in Elasticsearch.
In the third part, we will leverage what we learned in the first two parts and build upon that knowledge by delving into how to craft powerful hybrid search queries in Elasticsearch.
Some background first
Even though Elasticsearch did not support vector search until version 8.0, which shipped a technical preview of the `_knn_search` API endpoint, it has been possible to store vectors using the `dense_vector` field type since the 7.0 release. At that point, vectors were simply stored as binary doc values but not indexed using any of the algorithms that we presented in our first article. Those dense vectors laid the groundwork for the vector search features to come in Elasticsearch.
If you’re interested in diving deeper into the discussions that led to the current implementation of vector search in Elasticsearch, you can refer to this issue, which details all the hurdles that Elastic had to jump over in order to bring this feature to market. Very briefly, since Elasticsearch already made heavy use of Lucene as its underlying search engine, we decided to utilize the same technology as our vector engine, and we explained the rationale behind that decision in a very transparent way.
With history matters out of the way, let’s now get to work.
How to set up k-NN
Vector search is available natively in Elasticsearch, and there’s nothing specific to install. We only need to create an index that defines at least one field of type `dense_vector`, which is where your vector data will be stored and/or indexed.
The mapping below shows a `dense_vector` field called `title_vector` of dimension 3. Dense vectors stored and indexed in this field will use the `dot_product` similarity function that we introduced in the first article of this series. It is worth noting that up to 8.11, `hnsw` (i.e., Hierarchical Navigable Small World) was the only algorithm supported by Apache Lucene for indexing dense vectors. Since then, other algorithms have been added, and in the future, Elasticsearch might provide additional methods for indexing and searching dense vectors, but since it fully relies on Apache Lucene, that will depend on what unfolds on that front.
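Here is a minimal sketch of such a mapping (the index name `my-index` is a placeholder):

```json
PUT my-index
{
  "mappings": {
    "properties": {
      "title_vector": {
        "type": "dense_vector",
        "dims": 3,
        "index": true,
        "similarity": "dot_product"
      }
    }
  }
}
```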
The table below summarizes all the available configuration parameters for the `dense_vector` field type provided by Elasticsearch:
Table 1: The different configuration parameters for dense vectors
| Parameter | Required | Description |
|---|---|---|
| `dims` | Yes (<8.11), No (8.11+) | The number of vector dimensions, which can’t exceed 1024 until 8.9.2, 2048 since 8.10.0, and 4096 since 8.11.0. Also, as of 8.11, this parameter is no longer required and defaults to the dimension of the first indexed vector. |
| `element_type` | No | The data type of the vector element values. If unspecified, the default type is `float` (4 bytes); `byte` (1 byte) and `bit` are also available. |
| `index` | No | Indicates whether to index vectors (if `true`) in a dedicated and optimized data structure or simply store them as binary doc values (if `false`). Until 8.10, the default value was `false` if not specified. As of 8.11, the default value is `true` if not specified. |
| `similarity` | Yes (<8.11), No (8.11+) | Until 8.10, this parameter is required if `index` is `true` and defines the vector similarity metric to use for k-NN search. The available metrics are: a) `l2_norm`: L2 distance, b) `dot_product`: dot product similarity, c) `cosine`: cosine similarity, d) `max_inner_product`: maximum inner product similarity. Also note that `dot_product` should be used only if your vectors are already normalized (i.e., they are unit vectors with magnitude 1); otherwise, use `cosine` or `max_inner_product`. As of 8.11, if not specified, this parameter defaults to `l2_norm` if `element_type` is `bit` and to `cosine` otherwise. |
| `index_options` | No | Here are the possible values for the `type` parameter depending on the version: a) up until 8.11, only `hnsw` was supported; b) in 8.12, scalar quantization enabled `int8_hnsw`; c) in 8.13, `flat` was added along with its scalar-quantized `int8_flat` sibling; d) in 8.15, `int4_hnsw` and `int4_flat` were added; e) in 8.18, binary quantization enabled `bbq_hnsw` and `bbq_flat`. You can check the [official documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/dense-vector.html#dense-vector-params) for their detailed descriptions and how each algorithm can be configured. |
As we can see in the above table, since version 8.11 the definition of vector fields has been drastically simplified, as illustrated below.

Regarding the support for scalar quantization added in 8.12, remember that we talked about this compression technique in the first part of this series. We won’t dig deeper into it in this article, but you can learn more about how it was implemented in Lucene in another Elastic Search Labs article. Similarly, we won’t dive into Better Binary Quantization (BBQ), added in 8.18, and we invite you to learn more about that groundbreaking new algorithm in this article.
That’s all there is to it! By simply defining and configuring a `dense_vector` field, we can now index vector data in order to run vector search queries in Elasticsearch using either the `knn` search option or the `knn` DSL query (introduced in 8.12). Elasticsearch supports two different vector search modes: 1) exact search using the `script_score` query and 2) approximate nearest neighbor search using the `knn` search option or the `knn` query (8.12+). We’re going to describe both modes next.
Exact search
If you recall from the first article of this series, where we reviewed the vector search landscape, an exact vector search simply boils down to performing a linear search, or brute-force search, across the full vector space. Basically, the query vector will be measured against each stored vector in order to find the closest neighbors. In this mode, the vectors do not need to be indexed in an HNSW graph but simply stored as binary doc values, and the similarity computation is run by a custom Painless script.
First, we need to define the vector field mapping in a way that the vectors are not indexed, which can be done by specifying `index: false` and no `similarity` metric in the mapping.
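For example (using placeholder index and field names, plus a `price` field that the filtering example further down relies on):

```json
PUT my-index
{
  "mappings": {
    "properties": {
      "title_vector": {
        "type": "dense_vector",
        "dims": 3,
        "index": false
      },
      "price": {
        "type": "integer"
      }
    }
  }
}
```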
The advantage of this approach is that vectors do not need to be indexed, which drastically lowers the ingestion time since there’s no need to build the underlying HNSW graph. However, depending on the size of the data set and your hardware, search queries can slow down pretty quickly as your data volume grows, since the more vectors you add, the more time is needed to visit each one of them (i.e., linear search has an O(n) complexity).
With the index created and the data loaded, we can now run an exact search using a `script_score` query.
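Here is a sketch of what such a query could look like, assuming a few documents with `title_vector` and `price` fields have been indexed (the query vector values are placeholders):

```json
POST my-index/_search
{
  "query": {
    "script_score": {
      "query": {
        "range": {
          "price": {
            "gte": 100
          }
        }
      },
      "script": {
        "source": "cosineSimilarity(params.queryVector, 'title_vector') + 1.0",
        "params": {
          "queryVector": [0.1, 3.2, 2.1]
        }
      }
    }
  }
}
```

Note that `script_score` requires scores to be non-negative, which is why we add 1.0 to the cosine similarity (whose value lies between -1 and 1).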
As you can see, the `script_score` query is composed of two main elements, namely the `query` and the `script`. In the above example, the `query` part specifies a filter (i.e., `price >= 100`), which restrains the document set against which the `script` will be executed. If no query were specified, it would be equivalent to using a `match_all` query, in which case the script would be executed against all vectors stored in the index. Depending on the number of vectors, the search latency can increase substantially.
Since vectors are not indexed, there’s no built-in algorithm that will measure the similarity of the query vector with the stored ones; this has to be done through a script. Luckily for us, Painless provides most of the similarity functions that we’ve learned so far:

- `l1norm(vector, field)`: L1 distance (Manhattan distance)
- `l2norm(vector, field)`: L2 distance (Euclidean distance)
- `hamming(vector, field)`: Hamming distance
- `cosineSimilarity(vector, field)`: Cosine similarity
- `dotProduct(vector, field)`: Dot product similarity
Since we’re writing a script, it is also possible to build our own similarity algorithm. Painless makes this possible by providing access to `doc[<field>].vectorValue`, which allows iterating over the vector array, and `doc[<field>].magnitude`, which returns the length of the vector.
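As an illustration, here is a sketch of a manually computed cosine similarity using these two primitives (it simply reimplements what `cosineSimilarity` already offers, purely for demonstration; the query vector is a placeholder):

```json
POST my-index/_search
{
  "query": {
    "script_score": {
      "query": { "match_all": {} },
      "script": {
        "source": """
          // cache the doc-values access to the stored vector
          float[] v = doc['title_vector'].vectorValue;
          double dotProduct = 0;
          double queryMagnitude = 0;
          for (int i = 0; i < v.length; i++) {
            dotProduct += v[i] * params.queryVector[i];
            queryMagnitude += params.queryVector[i] * params.queryVector[i];
          }
          // normalize by both magnitudes to obtain the cosine, then shift by 1
          // because script_score requires non-negative scores
          return dotProduct / (doc['title_vector'].magnitude * Math.sqrt(queryMagnitude)) + 1.0;
        """,
        "params": {
          "queryVector": [0.1, 3.2, 2.1]
        }
      }
    }
  }
}
```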
To sum up, even though exact search doesn’t scale well, it might still be suitable for certain very small use cases; but if you know your data volume will grow over time, you should consider resorting to approximate k-NN search instead. That’s what we’re going to present next.
Approximate k-NN search
Most of the time, this is the mode you’re going to pick if you have a substantial amount of data and need to implement vector search using Elasticsearch. Indexing latency is a bit higher since Lucene needs to build the underlying HNSW graph to store and index all vectors. It is also a bit more demanding in terms of memory requirements at search time, and the reason it’s called “approximate” is that the accuracy can never be 100%, as it can be with exact search. Despite all this, approximate k-NN offers a much lower search latency and allows us to scale to millions, or even billions, of vectors, provided your cluster is sized appropriately.
Let’s see how this works. First, let’s create a sample index with an adequate vector field mapping to index the vector data (i.e., `index: true` + a specific `similarity`) and load it with some data.
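The two commands below sketch what this could look like. The vector values are placeholders, while the prices match the sample data used in Table 2 further down; the explicit `int8_hnsw` index option is an assumption made here to match the scalar-quantized graph mentioned next:

```json
PUT my-index
{
  "mappings": {
    "properties": {
      "title_vector": {
        "type": "dense_vector",
        "dims": 3,
        "index": true,
        "similarity": "cosine",
        "index_options": {
          "type": "int8_hnsw"
        }
      },
      "price": {
        "type": "integer"
      }
    }
  }
}

POST my-index/_bulk
{ "index": { "_id": "1" } }
{ "title_vector": [2.2, 4.3, 1.8], "price": 23 }
{ "index": { "_id": "2" } }
{ "title_vector": [3.1, 0.7, 8.2], "price": 9 }
{ "index": { "_id": "3" } }
{ "title_vector": [1.4, 5.6, 3.9], "price": 124 }
{ "index": { "_id": "4" } }
{ "title_vector": [1.1, 4.4, 2.9], "price": 1457 }
```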
Simple k-NN search
After running these two commands, our vector data is now properly indexed in a scalar-quantized HNSW graph and ready to be searched. Up until 8.11, the only way to run a simple k-NN search was by using the `knn` search option, located at the same level as the `query` section you’re used to, as shown in the query below.
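Here is a sketch of such a query (the query vector is a placeholder):

```json
POST my-index/_search
{
  "knn": {
    "field": "title_vector",
    "query_vector": [0.1, 3.2, 2.1],
    "k": 2,
    "num_candidates": 100
  }
}
```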
In the above search payload, we can see that there is no `query` section like for lexical searches, but a `knn` section instead. We are searching for the two (`k: 2`) nearest neighboring vectors to the specified query vector.
From 8.12 onwards, a new `knn` search query has been introduced to allow for more advanced hybrid search use cases, which is a topic we are going to handle in our next article. Unless you have the required expertise to combine k-NN queries with other queries, Elastic recommends sticking to the `knn` search option, which is easier to use. The first thing to note is that the new `knn` search query doesn’t have a `k` parameter; instead, it uses the `size` parameter like any other query. The second notable thing is that the new `knn` query makes it possible to post-filter k-NN search results by leveraging a `bool` query that combines one or more filters with the `knn` search query, as shown in the code below.
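Below is a sketch of what this could look like (vector values are placeholders):

```json
POST my-index/_search
{
  "size": 3,
  "query": {
    "bool": {
      "must": {
        "knn": {
          "field": "title_vector",
          "query_vector": [0.1, 3.2, 2.1],
          "num_candidates": 100
        }
      },
      "filter": {
        "range": {
          "price": {
            "gte": 100
          }
        }
      }
    }
  }
}
```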
The above query first retrieves the top 3 documents having the nearest neighboring vectors and then filters out the ones whose price is smaller than 100. It is worth noting that with this kind of post-filtering, you might end up with no results at all if your filters are too aggressive. Also note that this behavior is different from the usual boolean full-text queries, where the filter part is executed first in order to reduce the document set that needs to be scored. If you are interested in learning more about the differences between the `knn` top-level search option and the new `knn` search query, you can head over to another great Search Labs article for more details.
Let’s now move on and learn more about the `num_candidates` parameter. The role of `num_candidates` is to increase or decrease the likelihood of finding the true nearest neighbor candidates. The higher that number, the slower the search, but also the more likely the real nearest neighbors will be found. As many as `num_candidates` vectors will be considered on each shard, and the top k ones will be returned to the coordinator node, which will merge all shard-local results and return the top k vectors from the global results, as illustrated in Figure 1 below:

Figure 1: Nearest neighbor search accuracy using `num_candidates`

Vectors `id4` and `id2` are the k local nearest neighbors on the first shard, and `id5` and `id7` on the second shard. After merging and reranking them, the coordinator node returns `id4` and `id5` as the two global nearest neighbors for the search query.
Several k-NN searches
If you have several vector fields in your index, it is possible to send several k-NN searches at once, since the `knn` section also accepts an array of queries, as shown in the sketch below.
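The sketch below assumes a second, hypothetical vector field called `content_vector` (vector values are placeholders):

```json
POST my-index/_search
{
  "knn": [
    {
      "field": "title_vector",
      "query_vector": [0.1, 3.2, 2.1],
      "k": 2,
      "num_candidates": 100,
      "boost": 0.4
    },
    {
      "field": "content_vector",
      "query_vector": [0.1, 3.2, 2.1],
      "k": 5,
      "num_candidates": 100,
      "boost": 0.6
    }
  ]
}
```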
As we can see, each query can take a different value of `k` as well as a different boost factor. The boost factor is equivalent to a weight, and the total score will be a weighted average of both scores.
Filtered k-NN search
Similarly to what we saw with the `script_score` query earlier, the `knn` section also accepts the specification of a filter in order to reduce the vector space on which the approximate search should run. For instance, in the k-NN search below, we’re restricting the search to only documents whose price is greater than or equal to 100.
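A sketch of such a filtered query (with placeholder vector values):

```json
POST my-index/_search
{
  "knn": {
    "field": "title_vector",
    "query_vector": [0.1, 3.2, 2.1],
    "k": 2,
    "num_candidates": 100,
    "filter": {
      "range": {
        "price": {
          "gte": 100
        }
      }
    }
  }
}
```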
Now, you might wonder whether the data set is first filtered by price and then the k-NN search is run on the filtered data set (pre-filtering), or the other way around, i.e., the nearest neighbors are first retrieved and then filtered by price (post-filtering). It’s a bit of both, actually. The problem with pre-filtering is that if the filter is too aggressive, the k-NN search would have to run on a very small and potentially sparse vector space and would not return very accurate results, whereas post-filtering would probably weed out a lot of high-quality nearest neighbors.
So, even though the `knn` section’s `filter` is considered a pre-filter, it is applied during the k-NN search in such a way as to make sure that at least k neighbors can be returned. If you’re interested in the details of how this works, you can check out the following Lucene issue dealing with this matter.
Filtered k-NN search with expected similarity
In the previous section, we learned that when specifying a filter, we can reduce the search latency, but we also run the risk of drastically reducing the vector space to vectors that are partly or mostly dissimilar to the query vector. In order to alleviate this problem, k-NN search makes it possible to also specify a minimum similarity value that all returned vectors are expected to have.
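Reusing the previous filtered query, a sketch of this could look as follows (0.975 is the minimum expected similarity that the Table 2 example further down relies on):

```json
POST my-index/_search
{
  "knn": {
    "field": "title_vector",
    "query_vector": [0.1, 3.2, 2.1],
    "k": 2,
    "num_candidates": 100,
    "similarity": 0.975,
    "filter": {
      "range": {
        "price": {
          "gte": 100
        }
      }
    }
  }
}
```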
Basically, the way it works is that the vector space is explored by skipping any vector that either doesn’t match the provided filter or has a lower similarity than the specified one, until the k nearest neighbors are found. If the algorithm can’t honor at least k results (either because the filter is too restrictive or the expected similarity is too high), a brute-force search is attempted instead so that at least k nearest neighbors can be returned.
A quick word concerning how to determine that minimum expected similarity. It depends on which similarity metric you’ve chosen in your vector field mapping. If you have picked `l2_norm`, which is a distance function (i.e., similarity decreases as distance grows), you will want to set the maximum expected distance in your k-NN query, that is, the maximum distance that you consider acceptable. In other words, a vector having a distance between 0 and that maximum expected distance from the query vector will be considered “close” enough to be similar. If you have picked `dot_product` or `cosine` instead, which are similarity functions (i.e., similarity decreases as the vector angle gets wider), you will want to set a minimum expected similarity. A vector having a similarity between that minimum expected similarity and 1 with the query vector will be considered “close” enough to be similar.
Applied to the sample filtered query above and the sample data set that we indexed earlier, Table 2, below, summarizes the cosine similarities between the query vector and each indexed vector. As we can see, vectors 3 and 4 are selected by the filter (price >= 100), but only vector 3 reaches the minimum expected similarity (i.e., 0.975) and is thus returned.
Table 2: Sample filtered search with expected similarity
| Vector | Cosine similarity | Price |
|---|---|---|
| 1 | 0.8473 | 23 |
| 2 | 0.5193 | 9 |
| 3 | 0.9844 | 124 |
| 4 | 0.9683 | 1457 |
Limitations of k-NN
Now that we have reviewed all the capabilities of k-NN searches in Elasticsearch, let’s see the few limitations that you need to be aware of:
- Up until 8.11, k-NN searches could not be run on vector fields located inside `nested` documents. From 8.12 onwards, this limitation has been lifted; however, such nested `knn` queries do not support the specification of a filter.
- The `search_type` is always set to `dfs_query_then_fetch`, and it is not possible to change it dynamically.
- The `ccs_minimize_roundtrips` option is not supported when searching across different clusters with cross-cluster search.
- This has been mentioned a few times already, but due to the nature of the HNSW algorithm used by Lucene (as with any other approximate nearest neighbor search algorithm, for that matter), “approximate” really means that the k nearest neighbors being returned are not always the true ones.
Tuning k-NN
As you can imagine, there are quite a few options that you can use in order to optimize the indexing and search performance of k-NN searches. We are not going to review them in this article, but we really urge you to check them out in the official documentation if you are serious about implementing k-NN searches in your Elasticsearch cluster.
Beyond k-NN
Everything we have seen so far leverages dense vector models (hence the `dense_vector` field type), in which vectors usually contain mostly non-zero values. Elasticsearch also provides an alternative way of performing semantic search using sparse vector models.
Elastic has created a sparse NLP vector model called Elastic Learned Sparse EncodeR, or ELSER for short, which is an out-of-domain (i.e., not trained on a specific domain) sparse vector model that does not require any fine-tuning. It was pre-trained on a vocabulary of approximately 30,000 terms and, being a sparse model, produces vectors that all have the same number of values, most of which are zero.
The way it works is pretty simple. At indexing time, the sparse vectors (term/weight pairs) are generated using the `inference` ingest processor and stored in fields of type `sparse_vector`, which is the sparse counterpart to the `dense_vector` field type. At query time, a specific DSL query, also called `sparse_vector`, replaces the original query terms with terms available in the ELSER model vocabulary that are known to be the most similar to them given their weights.
We won’t dive deeper into ELSER in this article, but if you’re eager to discover how this works, you can check out this seminal article as well as the official documentation, which explains the topic in great detail.
A quick glimpse into some upcoming related topics
Elasticsearch also supports combining lexical search and vector search, and that will be the subject of the next and final article of this series.
So far, we’ve had to generate the embedding vectors outside of Elasticsearch and pass them explicitly in all our queries. Would it be possible to just provide the query text and have a model generate the embeddings on the fly? Well, the good news is that this is possible in Elasticsearch, either by leveraging a construct called `query_vector_builder` (for dense vectors) or by using the new `semantic_text` field type and `semantic` DSL query (for sparse vectors). You can learn more about these techniques in this article.
Let’s conclude
In this article, we delved deeply into Elasticsearch vector search support. We first shared some background on Elastic’s quest to provide accurate vector search and why we decided to use Apache Lucene as our vector indexing and search engine.
We then introduced the two main ways to perform vector search in Elasticsearch: either by leveraging the `script_score` query in order to run an exact brute-force search, or by resorting to approximate nearest neighbor search via the `knn` search option or the `knn` search query introduced in 8.12.
We showed how to run a simple k-NN search and, following up on that, we reviewed all the possible ways of configuring the `knn` search option and query using filters and expected similarity, as well as how to run multiple k-NN searches at the same time.
To wrap up, we listed some of the current limitations of k-NN searches and what to be aware of. We also invited you to check out all the possible options that can be used to optimize your k-NN searches.
If you like what you’re reading, make sure to check out the other parts of this series:
- Part 1: A Quick Introduction to Vector Search
- Part 3: Hybrid Search Using Elasticsearch (stay tuned!)