kNN search in Elasticsearch
A k-nearest neighbor (kNN) search finds the k nearest vectors to a query vector using a similarity metric such as cosine or L2 norm.
With Elasticsearch kNN search, you can retrieve results based on semantic meaning rather than exact keyword matches.
Common use cases for kNN vector similarity search include:
Search
- Semantic text search
- Image and video similarity
Recommendations
- Product recommendations
- Collaborative filtering
- Personalized content discovery
Analysis
- Anomaly detection
- Pattern matching
To run a kNN search in Elasticsearch:
Your data must be vectorized. You can use an NLP model in Elasticsearch or generate vectors outside Elasticsearch.
- Use the dense_vector field type for dense vectors.
- Query vectors must have the same dimension and be created with the same model as the document vectors.
- Already have vectors? Refer to Bring your own dense vectors.
Required index privileges:
- create_index or manage to create an index with a dense_vector field
- create, index, or write to add data
- read to search the index
Elasticsearch supports two methods for kNN search:
- Approximate kNN: Fast, scalable similarity search using the knn option, knn query, or a knn retriever. Ideal for most production workloads.
- Exact, brute-force kNN: Uses a script_score query with a vector function. Best for small datasets or precise scoring.
Approximate kNN offers low latency and good accuracy, while exact kNN guarantees accurate results but does not scale well for large datasets. With exact kNN, a script_score query must scan each matching document to compute the vector function, which can result in slow search speeds. However, you can improve latency by using a query to limit the number of matching documents passed to the function. If you filter your data to a small subset of documents, you can get good search performance with exact kNN.
Approximate kNN search has specific resource requirements. All vector data must fit in the node’s page cache for efficient performance. Refer to the approximate kNN tuning guide for configuration tips.
To run an approximate kNN search:
Map one or more dense_vector fields. Approximate kNN search requires the following mapping options:
- A similarity value. This value determines the similarity metric used to score documents based on similarity between the query and document vector. For a list of available metrics, see the similarity parameter documentation. The similarity setting defaults to cosine.
PUT image-index
{
  "mappings": {
    "properties": {
      "image-vector": { "type": "dense_vector", "dims": 3, "similarity": "l2_norm" },
      "title-vector": { "type": "dense_vector", "dims": 5, "similarity": "l2_norm" },
      "title": { "type": "text" },
      "file-type": { "type": "keyword" }
    }
  }
}
Index your data with embeddings.
POST image-index/_bulk?refresh=true
{ "index": { "_id": "1" } }
{ "image-vector": [1, 5, -20], "title-vector": [12, 50, -10, 0, 1], "title": "moose family", "file-type": "jpg" }
{ "index": { "_id": "2" } }
{ "image-vector": [42, 8, -15], "title-vector": [25, 1, 4, -12, 2], "title": "alpine lake", "file-type": "png" }
{ "index": { "_id": "3" } }
{ "image-vector": [15, 11, 23], "title-vector": [1, 5, 25, 50, 20], "title": "full moon", "file-type": "jpg" }
...
Query using the knn option or a knn query.
POST image-index/_search
{
  "knn": {
    "field": "image-vector",
    "query_vector": [-5, 9, -12],
    "k": 10,
    "num_candidates": 100
  },
  "fields": [ "title", "file-type" ]
}
The document _score is a positive 32-bit floating-point number that ranks result relevance. In Elasticsearch kNN search, _score is derived from the chosen vector similarity metric between the query and document vectors. Refer to similarity for details on how kNN scores are computed.
Support for approximate kNN search was added in version 8.0. Before 8.0, dense_vector fields did not support enabling index in the mapping. If you created an index prior to 8.0 with dense_vector fields, reindex using a new mapping with index: true (which is the default value) to use approximate kNN.
For approximate kNN, Elasticsearch stores dense vector values per segment as an HNSW graph. Building HNSW graphs is compute-intensive, so indexing vectors can take time; you may need to increase client request timeouts for index and bulk operations. The approximate kNN tuning guide covers indexing performance, sizing, and configuration trade-offs that affect search performance.
In addition to search-time parameters, HNSW exposes index-time settings that balance graph build cost, search speed, and accuracy. When defining your dense_vector mapping, use index_options to set these parameters:
PUT image-index
{
"mappings": {
"properties": {
"image-vector": {
"type": "dense_vector",
"dims": 3,
"similarity": "l2_norm",
"index_options": {
"type": "hnsw",
"m": 32,
"ef_construction": 100
}
}
}
}
}
To gather results, the kNN API first finds a num_candidates number of approximate neighbors per shard, computes similarity to the query vector, selects the top k per shard, and merges them into the global top k nearest neighbors.
- Increase num_candidates to improve recall and accuracy (at the cost of higher latency).
- Decrease num_candidates for faster queries (with a potential accuracy trade-off).
Choosing num_candidates is the primary knob for optimizing the latency/recall trade-off in Elasticsearch vector similarity search.
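The gather step described above can be pictured with a small Python sketch. It is an illustration only, not Elasticsearch code: the shard contents, document IDs, and scores are made up, and the helper names top_k and global_top_k are hypothetical.

```python
import heapq

def top_k(candidates, k):
    """Return the k highest-scoring (doc_id, score) pairs, best first."""
    return heapq.nlargest(k, candidates, key=lambda hit: hit[1])

def global_top_k(shard_candidates, k):
    """Keep the top k hits of every shard, then merge them into a global top k."""
    per_shard = [top_k(candidates, k) for candidates in shard_candidates]
    merged = [hit for shard in per_shard for hit in shard]
    return top_k(merged, k)

# Hypothetical candidates (doc_id, score) already gathered on two shards.
shard_a = [("a1", 0.91), ("a2", 0.40), ("a3", 0.75)]
shard_b = [("b1", 0.88), ("b2", 0.95), ("b3", 0.10)]

result = global_top_k([shard_a, shard_b], k=2)
print(result)
```

A larger candidate pool per shard gives the merge more chances to include the true nearest neighbors, which is the recall side of the num_candidates trade-off.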
The approximate kNN search API also supports byte (int8) value vectors alongside float vectors. Use the knn option to search a dense_vector field with element_type set to byte and indexing enabled. Byte vectors reduce memory footprint and can improve cache efficiency for large-scale vector similarity search.
Explicitly map one or more dense_vector fields with element_type set to byte and indexing enabled.
PUT byte-image-index
{
  "mappings": {
    "properties": {
      "byte-image-vector": { "type": "dense_vector", "element_type": "byte", "dims": 2 },
      "title": { "type": "text" }
    }
  }
}
Index your data, ensuring all vector values are integers within the range [-128, 127].
POST byte-image-index/_bulk?refresh=true
{ "index": { "_id": "1" } }
{ "byte-image-vector": [5, -20], "title": "moose family" }
{ "index": { "_id": "2" } }
{ "byte-image-vector": [8, -15], "title": "alpine lake" }
{ "index": { "_id": "3" } }
{ "byte-image-vector": [11, 23], "title": "full moon" }
Run the search using the knn option, ensuring the query_vector values are integers within the range [-128, 127].
POST byte-image-index/_search
{
  "knn": {
    "field": "byte-image-vector",
    "query_vector": [-5, 9],
    "k": 10,
    "num_candidates": 100
  },
  "fields": [ "title" ]
}
Note: In addition to the standard byte array, you can also provide a hex-encoded string value for the query_vector parameter. For example, the search request above can also be expressed as follows and yields the same results:
POST byte-image-index/_search
{
"knn": {
"field": "byte-image-vector",
"query_vector": "fb09",
"k": 10,
"num_candidates": 100
},
"fields": [ "title" ]
}
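To see how the byte array [-5, 9] corresponds to the string "fb09", the following Python sketch encodes each value as one signed byte (two hex digits). The helper name to_hex_query_vector is hypothetical, and the two's-complement layout is an assumption inferred from the example above.

```python
def to_hex_query_vector(values):
    """Encode signed int8 values as a hex string, two hex digits per dimension."""
    for value in values:
        if not -128 <= value <= 127:
            raise ValueError(f"{value} is outside the byte range [-128, 127]")
    # Masking with 0xFF maps a negative value to its two's-complement byte.
    return bytes(value & 0xFF for value in values).hex()

hex_vector = to_hex_query_vector([-5, 9])
print(hex_vector)  # fb09
```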
If you want to provide float vectors but still get the memory savings of byte vectors, use the quantization feature. Quantization allows you to provide float vectors, but internally they are indexed as byte vectors. Additionally, the original float vectors are still retained in the index.
The default index type for dense_vector is int8_hnsw.
To use quantization, set the dense_vector index type to int8_hnsw or int4_hnsw.
PUT quantized-image-index
{
"mappings": {
"properties": {
"image-vector": {
"type": "dense_vector",
"element_type": "float",
"dims": 2,
"index": true,
"index_options": {
"type": "int8_hnsw"
}
},
"title": {
"type": "text"
}
}
}
}
Index your float vectors.
POST quantized-image-index/_bulk?refresh=true
{ "index": { "_id": "1" } }
{ "image-vector": [0.1, -2], "title": "moose family" }
{ "index": { "_id": "2" } }
{ "image-vector": [0.75, -1], "title": "alpine lake" }
{ "index": { "_id": "3" } }
{ "image-vector": [1.2, 0.1], "title": "full moon" }
Run the search using the knn option. When searching, the float vector is automatically quantized to a byte vector.
POST quantized-image-index/_search
{
  "knn": {
    "field": "image-vector",
    "query_vector": [0.1, -2],
    "k": 10,
    "num_candidates": 100
  },
  "fields": [ "title" ]
}
Because the original float vectors are retained alongside the quantized index, you can use them for re-scoring: retrieve candidates quickly via the int8_hnsw (or int4_hnsw) index, then rescore the top k hits using the original float vectors. This provides the best of both worlds: fast search and accurate scoring.
POST quantized-image-index/_search
{
"knn": {
"field": "image-vector",
"query_vector": [0.1, -2],
"k": 15,
"num_candidates": 100
},
"fields": [ "title" ],
"rescore": {
"window_size": 10,
"query": {
"rescore_query": {
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "cosineSimilarity(params.query_vector, 'image-vector') + 1.0",
"params": {
"query_vector": [0.1, -2]
}
}
}
}
}
}
}
The kNN search API supports restricting vector similarity search with a filter. The request returns the top k nearest neighbors that also satisfy the filter query, enabling targeted, pre-filtered approximate kNN in Elasticsearch.
The following request performs an approximate kNN search filtered by the file-type field:
POST image-index/_search
{
"knn": {
"field": "image-vector",
"query_vector": [54, 10, -2],
"k": 5,
"num_candidates": 50,
"filter": {
"term": {
"file-type": "png"
}
}
},
"fields": ["title"],
"_source": false
}
The filter is applied during approximate kNN search to ensure that k matching documents are returned. In contrast, post-filtering applies the filter after the approximate kNN step and can return fewer than k results, even when enough relevant documents exist.
In approximate kNN search with an HNSW index, applying filters can decrease performance because the engine must explore more of the graph to gather enough candidates that satisfy the filter and reach num_candidates. This contrasts with conventional query filtering, where stricter filters often speed up queries.
To avoid significant performance drawbacks, Lucene implements the following strategies per segment:
- If the filtered document count is less than or equal to num_candidates, the search bypasses the HNSW graph and uses a brute force search on the filtered documents.
- While exploring the HNSW graph, if the number of nodes explored exceeds the number of documents that satisfy the filter, the search will stop exploring the graph and switch to a brute force search over the filtered documents.
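The two per-segment rules above can be restated as a small decision function. This Python sketch only mirrors the wording of the rules; it is not Lucene's implementation, and the function name and return labels are hypothetical.

```python
def segment_search_plan(filtered_doc_count, num_candidates, nodes_explored=0):
    """Pick a per-segment strategy following the two rules described above."""
    # Rule 1: few enough filtered documents, so skip the graph entirely.
    if filtered_doc_count <= num_candidates:
        return "brute_force"
    # Rule 2: graph exploration has already cost more than scanning the
    # filtered documents would, so fall back to scanning them.
    if nodes_explored > filtered_doc_count:
        return "brute_force"
    return "hnsw"

print(segment_search_plan(filtered_doc_count=40, num_candidates=50))
print(segment_search_plan(filtered_doc_count=1000, num_candidates=50, nodes_explored=200))
print(segment_search_plan(filtered_doc_count=1000, num_candidates=50, nodes_explored=1500))
```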
You can perform hybrid retrieval by combining the knn option with a standard query. This blends vector similarity with lexical relevance, filters, and aggregations.
POST image-index/_search
{
"query": {
"match": {
"title": {
"query": "mountain lake",
"boost": 0.9
}
}
},
"knn": {
"field": "image-vector",
"query_vector": [54, 10, -2],
"k": 5,
"num_candidates": 50,
"boost": 0.1
},
"size": 10
}
This search finds the global top k = 5 vector matches, combines them with the matches from the match query, and finally returns the 10 top-scoring results. The knn and query matches are combined through a disjunction, as if you took a boolean or between them. The top k vector results represent the global nearest neighbors across all index shards.
The score of each hit is the sum of the knn and query scores. You can specify a boost value to give a weight to each score in the sum. In the example above, the scores will be calculated as:
score = 0.9 * match_score + 0.1 * knn_score
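As a worked instance of this formula, the Python sketch below combines made-up clause scores with the boosts from the example. The function name and the sample scores are hypothetical; a document matched by only one clause is modeled with a score of 0 for the other clause, which is how a disjunction sums its matching clauses.

```python
def hybrid_score(match_score, knn_score, match_boost=0.9, knn_boost=0.1):
    """Weighted sum of the lexical and vector scores of one hit."""
    return match_boost * match_score + knn_boost * knn_score

both_clauses = hybrid_score(match_score=2.0, knn_score=0.8)   # roughly 1.88
only_knn = hybrid_score(match_score=0.0, knn_score=0.8)       # roughly 0.08
print(both_clauses, only_knn)
```

Because lexical scores are unbounded while kNN scores for metrics such as cosine stay within a fixed range, the boosts are worth tuning against real queries rather than copied from this example.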
The knn option can also be used with aggregations. In general, Elasticsearch computes aggregations over all documents that match the search. So for approximate kNN search, aggregations are calculated on the top k nearest documents. If the search also includes a query, then aggregations are calculated on the combined set of knn and query matches.
Looking for a minimal configuration approach? The semantic_text field type abstracts these vector search implementations with sensible defaults and automatic model management. It's the recommended approach for most users. Learn more about semantic_text.
kNN search enables you to perform semantic search by using a previously deployed text embedding model. Instead of literal matching on search terms, semantic search retrieves results based on the intent and the contextual meaning of a search query.
Under the hood, the text embedding NLP model converts your input query string (provided as model_text) into a dense vector. The query vector is compared against an index containing dense vectors created with the same text embedding machine learning model. The search results are semantically similar as learned by the model.
To perform semantic search:
- You need an index that contains dense vector representations of the input data to search against.
- You must use the same text embedding model for search that you used to create the document vectors.
- The text embedding NLP model deployment must be started.
Reference the deployed text embedding model or the model deployment in the query_vector_builder object, and provide the search string as model_text:
(...)
{
"knn": {
"field": "dense-vector-field",
"k": 10,
"num_candidates": 100,
"query_vector_builder": {
"text_embedding": {
"model_id": "my-text-embedding-model",
"model_text": "The opposite of blue"
}
}
}
}
(...)
- text_embedding: The natural language processing task to perform. It must be text_embedding.
- model_id: The ID of the text embedding model used to generate the query’s dense vector. Use the same model that produced the document embeddings in the target index. You can also provide the deployment_id as the model_id value.
- model_text: The query string from which the model generates the dense vector representation.
For more information on how to deploy a trained model and use it to create text embeddings, refer to this end-to-end example.
In addition to hybrid retrieval, you can search more than one kNN vector field in a single request:
POST image-index/_search
{
"query": {
"match": {
"title": {
"query": "mountain lake",
"boost": 0.9
}
}
},
"knn": [ {
"field": "image-vector",
"query_vector": [54, 10, -2],
"k": 5,
"num_candidates": 50,
"boost": 0.1
},
{
"field": "title-vector",
"query_vector": [1, 20, -52, 23, 10],
"k": 10,
"num_candidates": 10,
"boost": 0.5
}],
"size": 10
}
This search retrieves the global top k = 5 neighbors for image-vector and the global top k = 10 for title-vector. These vector result sets are combined with the matches from the match query, and the top 10 overall documents are returned. Multiple knn clauses and the query clause are combined via a disjunction (boolean OR). The top k vector results represent the global nearest neighbors across all index shards.
The scoring for a document with the above configured boosts would be:
score = 0.9 * match_score + 0.1 * knn_score_image-vector + 0.5 * knn_score_title-vector
While kNN is a powerful tool, it always tries to return k nearest neighbors. Consequently, when using knn with a filter, you could filter out all relevant documents and only have irrelevant ones left to search. In that situation, knn will still do its best to return k nearest neighbors, even though those neighbors could be far away in the vector space.
To control this, use the similarity parameter in the knn clause. This sets a minimum similarity threshold a vector must meet to be considered a match. The knn search flow with this parameter is:
- Apply any user-provided filter queries.
- Explore the vector space to gather k candidates.
- Exclude any vectors with similarity below the configured similarity threshold.
similarity is the true similarity value before it is transformed into _score and before any boosts are applied.
For each configured similarity, the following shows how to invert _score back to the underlying similarity. Use these when you want to filter based on _score:
- l2_norm: sqrt((1 / _score) - 1)
- cosine: (2 * _score) - 1
- dot_product: (2 * _score) - 1
- max_inner_product:
  - _score < 1: 1 - (1 / _score)
  - _score >= 1: _score - 1
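The inversions listed above translate directly into code. The following Python sketch applies them as written; the function name score_to_similarity is hypothetical, and the sample scores are made up for illustration.

```python
import math

def score_to_similarity(metric, score):
    """Invert a kNN _score back to the underlying similarity value."""
    if metric == "l2_norm":
        return math.sqrt((1 / score) - 1)
    if metric in ("cosine", "dot_product"):
        return (2 * score) - 1
    if metric == "max_inner_product":
        return 1 - (1 / score) if score < 1 else score - 1
    raise ValueError(f"unsupported similarity metric: {metric}")

print(score_to_similarity("l2_norm", 0.2))            # distance of about 2.0
print(score_to_similarity("cosine", 0.75))            # 0.5
print(score_to_similarity("max_inner_product", 0.5))  # -1.0
print(score_to_similarity("max_inner_product", 3.0))  # 2.0
```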
Example: the query searches for the given query_vector, with a filter applied, and requires that matches meet or exceed the specified similarity threshold. Results below the threshold are not returned, even if fewer than k neighbors remain.
POST image-index/_search
{
"knn": {
"field": "image-vector",
"query_vector": [1, 5, -20],
"k": 5,
"num_candidates": 50,
"similarity": 36,
"filter": {
"term": {
"file-type": "png"
}
}
},
"fields": ["title"],
"_source": false
}
In this data set, the only document with file-type = png has the vector [42, 8, -15]. The l2_norm distance between [42, 8, -15] and [1, 5, -20] is 41.412, which exceeds the configured similarity threshold of 36. As a result, this search returns no hits.
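The distance quoted above can be checked by hand: the component differences are (41, 3, 5), so the squared distance is 1681 + 9 + 25 = 1715. A short Python check, using only the vectors and threshold from the example:

```python
import math

document_vector = [42, 8, -15]
query_vector = [1, 5, -20]
threshold = 36  # the "similarity" value from the request

# Euclidean (L2) distance: sqrt(41^2 + 3^2 + 5^2) = sqrt(1715)
distance = math.dist(document_vector, query_vector)
is_match = distance <= threshold

print(distance, is_match)
```

For l2_norm the threshold acts as a maximum distance, so a distance of about 41.41 fails the check and the document is excluded.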
When text exceeds a model’s token limit, chunking must be performed before generating embeddings for each chunk. By combining nested fields with dense_vector, you can perform nearest passage retrieval without copying top-level document metadata.
Note that nested kNN queries only support score_mode=max.
Here is a simple passage vectors index that stores vectors and some top-level metadata for filtering.
PUT passage_vectors
{
"mappings": {
"properties": {
"full_text": {
"type": "text"
},
"creation_time": {
"type": "date"
},
"paragraph": {
"type": "nested",
"properties": {
"vector": {
"type": "dense_vector",
"dims": 2,
"index_options": {
"type": "hnsw"
}
},
"text": {
"type": "text",
"index": false
},
"language": {
"type": "keyword"
}
}
},
"metadata": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"value": {
"type": "text"
}
}
}
}
}
}
With the above mapping, we can index multiple passage vectors along with storing the individual passage text.
POST passage_vectors/_bulk?refresh=true
{ "index": { "_id": "1" } }
{ "full_text": "first paragraph another paragraph", "creation_time": "2019-05-04", "paragraph": [ { "vector": [ 0.45, 45 ], "text": "first paragraph", "paragraph_id": "1", "language": "EN" }, { "vector": [ 0.8, 0.6 ], "text": "another paragraph", "paragraph_id": "2", "language": "FR" } ], "metadata": [ { "key": "author", "value": "Jane Doe" }, { "key": "source", "value": "Internal Memo" } ] }
{ "index": { "_id": "2" } }
{ "full_text": "number one paragraph number two paragraph", "creation_time": "2020-05-04", "paragraph": [ { "vector": [ 1.2, 4.5 ], "text": "number one paragraph", "paragraph_id": "1", "language": "EN" }, { "vector": [ -1, 42 ], "text": "number two paragraph", "paragraph_id": "2", "language": "EN" }] , "metadata": [ { "key": "author", "value": "Jane Austen" }, { "key": "source", "value": "Financial" } ] }
The query will seem very similar to a typical kNN search:
POST passage_vectors/_search
{
"fields": ["full_text", "creation_time"],
"_source": false,
"knn": {
"query_vector": [
0.45,
45
],
"field": "paragraph.vector",
"k": 2
}
}
Note that even with 4 total nested vectors, the response still returns two documents. kNN search over nested dense vectors will always diversify the top results over the top-level document; "k" top-level documents will be returned, scored by their nearest passage vector (for example, "paragraph.vector").
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "passage_vectors",
"_id": "1",
"_score": 1.0,
"fields": {
"creation_time": [
"2019-05-04T00:00:00.000Z"
],
"full_text": [
"first paragraph another paragraph"
]
}
},
{
"_index": "passage_vectors",
"_id": "2",
"_score": 0.9997144,
"fields": {
"creation_time": [
"2020-05-04T00:00:00.000Z"
],
"full_text": [
"number one paragraph number two paragraph"
]
}
}
]
}
}
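The scores in this response can be reproduced offline. The Python sketch below scores every passage vector from the bulk request against the query and keeps the best passage per parent document. It assumes the field uses the default cosine similarity and that _score = (1 + cosine) / 2, which is the inverse of the cosine formula (2 * _score) - 1 given earlier; the helper name cosine_score is hypothetical.

```python
import math

def cosine_score(a, b):
    """Cosine similarity mapped to a score in [0, 1] via (1 + cosine) / 2."""
    dot = sum(x * y for x, y in zip(a, b))
    cosine = dot / (math.hypot(*a) * math.hypot(*b))
    return (1 + cosine) / 2

# Passage vectors per parent document, copied from the bulk request.
passages = {
    "1": [[0.45, 45], [0.8, 0.6]],
    "2": [[1.2, 4.5], [-1, 42]],
}
query_vector = [0.45, 45]
k = 2

# Each parent is scored by its nearest passage (score_mode=max).
doc_scores = {
    doc_id: max(cosine_score(query_vector, vector) for vector in vectors)
    for doc_id, vectors in passages.items()
}
ranked = sorted(doc_scores.items(), key=lambda item: item[1], reverse=True)[:k]
print(ranked)
```

Four passages collapse into two parent hits, and the second document's score comes from its [-1, 42] passage rather than [1.2, 4.5].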
Want to filter by metadata in a nested kNN search? Add a filter to your knn clause.
To ensure correct results, each individual filter must target either:
- Top-level metadata
- nested metadata
Note: A single knn search can include multiple filters: some over top-level metadata and others over nested metadata.
POST passage_vectors/_search
{
"fields": [
"creation_time",
"full_text"
],
"_source": false,
"knn": {
"query_vector": [0.45, 45],
"field": "paragraph.vector",
"k": 2,
"filter": {
"range": {
"creation_time": {
"gte": "2019-05-01",
"lte": "2019-05-05"
}
}
}
}
}
With the top-level creation_time filter applied, only one document falls within the specified range.
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "passage_vectors",
"_id": "1",
"_score": 1.0,
"fields": {
"creation_time": [
"2019-05-04T00:00:00.000Z"
],
"full_text": [
"first paragraph another paragraph"
]
}
}
]
}
}
The following query applies a nested metadata filter. When scoring parent documents, it only considers nested vectors whose "paragraph.language" is "EN".
POST passage_vectors/_search
{
"fields": [
"full_text"
],
"_source": false,
"knn": {
"query_vector": [0.45, 45],
"field": "paragraph.vector",
"k": 2,
"filter": {
"match": {
"paragraph.language": "EN"
}
}
}
}
The next example combines two filters: one on nested metadata and one on top-level metadata. Parent documents are scored only by vectors with "paragraph.language": "EN" and whose parent documents fall within the specified time range.
POST passage_vectors/_search
{
"fields": [
"full_text"
],
"_source": false,
"knn": {
"query_vector": [0.45,45],
"field": "paragraph.vector",
"k": 2,
"filter": [
{"match": {"paragraph.language": "EN"}},
{"range": { "creation_time": { "gte": "2019-05-01", "lte": "2019-05-05"}}}
]
}
}
Nested knn search also allows pre-filtering on sibling nested fields. For example, given "paragraph" and "metadata" as nested fields, we can search "paragraph.vector" and filter by "metadata.key" and "metadata.value".
POST passage_vectors/_search
{
"fields": [
"full_text"
],
"_source": false,
"knn": {
"query_vector": [0.45, 45],
"field": "paragraph.vector",
"k": 2,
"filter": {
"nested": {
"path": "metadata",
"query": {
"bool": {
"must": [
{ "match": { "metadata.key": "author" } },
{ "match": { "metadata.value": "Doe" } }
]
}
}
}
}
}
}
Retrieving "inner_hits" when filtering on sibling nested fields is not supported.
To extract the nearest passage for each matched parent document, add inner_hits to the knn clause.
When using inner_hits with multiple knn clauses, set a unique inner_hits.name for each clause to avoid naming collisions that would fail the search request.
POST passage_vectors/_search
{
"fields": [
"creation_time",
"full_text"
],
"_source": false,
"knn": {
"query_vector": [
0.45,
45
],
"field": "paragraph.vector",
"k": 2,
"num_candidates": 2,
"inner_hits": {
"_source": false,
"fields": [
"paragraph.text"
],
"size": 1
}
}
}
Now the result will contain the nearest found paragraph when searching.
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "passage_vectors",
"_id": "1",
"_score": 1.0,
"fields": {
"creation_time": [
"2019-05-04T00:00:00.000Z"
],
"full_text": [
"first paragraph another paragraph"
]
},
"inner_hits": {
"paragraph": {
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "passage_vectors",
"_id": "1",
"_nested": {
"field": "paragraph",
"offset": 0
},
"_score": 1.0,
"fields": {
"paragraph": [
{
"text": [
"first paragraph"
]
}
]
}
}
]
}
}
}
},
{
"_index": "passage_vectors",
"_id": "2",
"_score": 0.9997144,
"fields": {
"creation_time": [
"2020-05-04T00:00:00.000Z"
],
"full_text": [
"number one paragraph number two paragraph"
]
},
"inner_hits": {
"paragraph": {
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 0.9997144,
"hits": [
{
"_index": "passage_vectors",
"_id": "2",
"_nested": {
"field": "paragraph",
"offset": 1
},
"_score": 0.9997144,
"fields": {
"paragraph": [
{
"text": [
"number two paragraph"
]
}
]
}
}
]
}
}
}
}
]
}
}
Use nested kNN search with dense_vector fields and inner_hits in Elasticsearch to retrieve the most relevant passages from structured, chunked documents.
This approach is ideal when you:
- Chunk your content into paragraphs, sections, or other nested structures.
- Want to retrieve only the most relevant nested section of each matching document.
- Generate your own vectors with a custom model instead of relying on the semantic_text field provided by Elastic's semantic search capability.
This example creates an index that stores multiple vectors inside a nested field, one for each paragraph of a document.
PUT nested_vector_index
{
"mappings": {
"properties": {
"paragraphs": {
"type": "nested",
"properties": {
"text": {
"type": "text"
},
"vector": {
"type": "dense_vector",
"dims": 2,
"index_options": {
"type": "hnsw"
}
}
}
}
}
}
}
Add example documents with vectors for each paragraph.
POST _bulk
{ "index": { "_index": "nested_vector_index", "_id": "1" } }
{ "paragraphs": [ { "text": "First paragraph", "vector": [0.5, 0.4] }, { "text": "Second paragraph", "vector": [0.3, 0.8] } ] }
{ "index": { "_index": "nested_vector_index", "_id": "2" } }
{ "paragraphs": [ { "text": "Another one", "vector": [0.1, 0.9] } ] }
This example searches for documents with relevant paragraph vectors.
POST nested_vector_index/_search
{
"_source": false,
"knn": {
"field": "paragraphs.vector",
"query_vector": [0.5, 0.4],
"k": 2,
"num_candidates": 10,
"inner_hits": {
"size": 2,
"name": "top_passages",
"_source": false,
"fields": ["paragraphs.text"]
}
}
}
The inner_hits block returns the most relevant paragraphs within each top-level document. Use the size parameter to control how many matches are returned. If your query includes multiple kNN clauses, set a unique name for each clause to avoid naming conflicts in the response.
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "nested_vector_index",
"_id": "1",
"_score": 1,
"inner_hits": {
"top_passages": {
"hits": {
"total": {
"value": 2,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "nested_vector_index",
"_id": "1",
"_nested": {
"field": "paragraphs",
"offset": 0
},
"_score": 1,
"fields": {
"paragraphs": [
{
"text": [
"First paragraph"
]
}
]
}
},
{
"_index": "nested_vector_index",
"_id": "1",
"_nested": {
"field": "paragraphs",
"offset": 1
},
"_score": 0.92955077,
"fields": {
"paragraphs": [
{
"text": [
"Second paragraph"
]
}
]
}
}
]
}
}
}
},
{
"_index": "nested_vector_index",
"_id": "2",
"_score": 0.8535534,
"inner_hits": {
"top_passages": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.8535534,
"hits": [
{
"_index": "nested_vector_index",
"_id": "2",
"_nested": {
"field": "paragraphs",
"offset": 0
},
"_score": 0.8535534,
"fields": {
"paragraphs": [
{
"text": [
"Another one"
]
}
]
}
}
]
}
}
}
}
]
}
}
- hits.total.value: Two documents matched the query.
- _score: Document score, based on its most relevant paragraph.
- inner_hits: Matching paragraphs appear in the inner_hits section.
- fields: Actual paragraph text that matched the query.
- When using kNN search in cross-cluster search, the ccs_minimize_roundtrips option is not supported.
- Elasticsearch uses the HNSW algorithm for efficient kNN. Like most approximate methods, HNSW trades perfect accuracy for speed, so results aren’t always the true k closest neighbors.
Approximate kNN always uses the dfs_query_then_fetch search type to gather the global top k matches across shards. You can’t set search_type explicitly for kNN search.
When using quantized vectors for kNN search, you can optionally rescore results to balance performance and accuracy by:
- Oversampling: retrieving more candidates per shard.
- Rescoring: recalculating scores on those oversampled candidates using the original (non-quantized) vectors.
Because final scores are computed with the original float vectors, rescoring combines:
- The performance and memory benefits of approximate retrieval with quantized vectors.
- The accuracy of using the original vectors for rescoring the top candidates.
All quantization introduces some accuracy loss, and higher compression generally increases that loss. In practice:
- int8 typically needs little to no rescoring.
- int4 often benefits from rescoring for higher accuracy or recall; 1.5×–2× oversampling usually recovers most loss.
- bbq commonly requires rescoring except on very large indices or models specifically designed for quantization; 3×–5× oversampling is generally sufficient, but higher may be needed for low-dimension vectors or embeddings that quantize poorly.
Use rescore_vector to automatically perform reranking. When you specify an oversample value, approximate kNN will:
- Retrieve num_candidates candidates per shard.
- Rescore the top k * oversample candidates per shard using the original vectors.
- Return the top k rescored candidates.
Here is an example of using the rescore_vector option with the oversample parameter:
POST image-index/_search
{
"knn": {
"field": "image-vector",
"query_vector": [-5, 9, -12],
"k": 10,
"num_candidates": 100,
"rescore_vector": {
"oversample": 2.0
}
},
"fields": [ "title", "file-type" ]
}
This example will:
- Search using approximate kNN for the top 100 candidates (num_candidates) per shard.
- Rescore the top 20 candidates (oversample * k) per shard using the original, non-quantized vectors.
- Return the top 10 (k) rescored candidates from each shard.
- Merge the rescored candidates from all shards, and return the top 10 (k) results.
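The effect of oversampling can be seen in a toy Python sketch on a single shard. Everything in it is an illustrative assumption: the quantizer simply rounds each component to the nearest integer (far cruder than int8 or int4 quantization), the similarity is a plain dot product, and the documents and function names are made up.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def quantize(vector):
    """Toy quantizer (assumption): round every component to an integer."""
    return [round(x) for x in vector]

def knn_with_rescore(docs, query, k, oversample):
    # Approximate pass: rank documents by their quantized vectors.
    approximate = sorted(docs, key=lambda d: dot(quantize(docs[d]), query), reverse=True)
    # Keep k * oversample candidates for the exact pass.
    window = approximate[:math.ceil(k * oversample)]
    # Exact pass: rescore the window with the original float vectors.
    rescored = sorted(window, key=lambda d: dot(docs[d], query), reverse=True)
    return rescored[:k]

docs = {
    "a": [2.4, 1.4],
    "b": [1.6, 1.6],
    "c": [0.2, 0.3],
    "d": [3.4, 2.6],
    "e": [0.6, 0.1],
}
query = [1, 1]

without_oversampling = knn_with_rescore(docs, query, k=2, oversample=1.0)
with_oversampling = knn_with_rescore(docs, query, k=2, oversample=2.0)
print(without_oversampling, with_oversampling)
```

With no oversampling the quantized ranking wrongly prefers "b" over "a"; widening the window to 4 candidates lets the exact pass restore "a" to the top 2.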
The following sections provide additional ways of rescoring:
You can use this option when you don’t want to rescore on each shard, but on the top results from all shards.
Use the rescore section in the _search request to rescore the top results from a kNN search.
Here is an example using the top-level knn search with oversampling and using rescore to rerank the results:
POST /my-index/_search
{
"size": 10,
"knn": {
"query_vector": [0.04283529, 0.85670587, -0.51402352, 0],
"field": "my_int4_vector",
"k": 20,
"num_candidates": 50
},
"rescore": {
"window_size": 20,
"query": {
"rescore_query": {
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "(dotProduct(params.queryVector, 'my_int4_vector') + 1.0)",
"params": {
"queryVector": [0.04283529, 0.85670587, -0.51402352, 0]
}
}
}
},
"query_weight": 0,
"rescore_query_weight": 1
}
}
}
- size: The number of results to return. Note that it is only 10; the search oversamples by 2x, gathering 20 nearest neighbors.
- k: The number of results to return from the kNN search. This runs an approximate kNN search with 50 candidates per HNSW graph using the quantized vectors, and returns the 20 most similar vectors according to the quantized score. Additionally, since this is the top-level knn object, the global top 20 results from all shards are gathered before rescoring. Combined with rescore, this is oversampling by 2x: 20 nearest neighbors are gathered according to quantized scoring and then rescored with the higher-fidelity float vectors.
- window_size: The number of results to rescore. If you want to rescore all results, set this to the same value as k.
- script: The script to rescore the results. Script score interacts directly with the originally provided float32 vector.
- query_weight: The weight of the original query. Here the original score is discarded.
- rescore_query_weight: The weight of the rescore query. Here only the rescore query is used.
You can use this option when you want to rescore on each shard and want more fine-grained control over the rescoring than the rescore_vector option provides.
Use rescore per shard with the knn query and script_score query. Generally, this means that there will be more rescoring per shard, which can increase overall recall at the cost of compute.
POST /my-index/_search
{
"size": 10,
"query": {
"script_score": {
"query": {
"knn": {
"query_vector": [0.04283529, 0.85670587, -0.51402352, 0],
"field": "my_int4_vector",
"num_candidates": 20
}
},
"script": {
"source": "(dotProduct(params.queryVector, 'my_int4_vector') + 1.0)",
"params": {
"queryVector": [0.04283529, 0.85670587, -0.51402352, 0]
}
}
}
}
}
- size: The number of results to return.
- knn: The knn query that performs the initial search. It is executed per shard.
- num_candidates: The number of candidates to use for the initial approximate knn search. This searches using the quantized vectors and returns the top 20 candidates per shard to then be scored.
- script: The script to score the results. Script score interacts directly with the originally provided float32 vector.
To run an exact kNN search, use a script_score query with a vector function.
Explicitly map one or more dense_vector fields. If you don’t intend to use the field for approximate kNN, set the index mapping option to false. This can significantly improve indexing speed.
PUT product-index
{
  "mappings": {
    "properties": {
      "product-vector": { "type": "dense_vector", "dims": 5, "index": false },
      "price": { "type": "long" }
    }
  }
}
Index your data.
POST product-index/_bulk?refresh=true
{ "index": { "_id": "1" } }
{ "product-vector": [230.0, 300.33, -34.8988, 15.555, -200.0], "price": 1599 }
{ "index": { "_id": "2" } }
{ "product-vector": [-0.5, 100.0, -13.0, 14.8, -156.0], "price": 799 }
{ "index": { "_id": "3" } }
{ "product-vector": [0.5, 111.3, -13.0, 14.8, -156.0], "price": 1099 }
...
Use the search API to run a script_score query containing a vector function.
Tip: To limit the number of matched documents passed to the vector function, we recommend you specify a filter query in the script_score.query parameter. If needed, you can use a match_all query in this parameter to match all documents. However, matching all documents can significantly increase search latency.
POST product-index/_search
{
  "query": {
    "script_score": {
      "query": {
        "bool": {
          "filter": { "range": { "price": { "gte": 1000 } } }
        }
      },
      "script": {
        "source": "cosineSimilarity(params.queryVector, 'product-vector') + 1.0",
        "params": { "queryVector": [-0.5, 90.0, -10, 14.8, -156.0] }
      }
    }
  }
}
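To make the brute-force behavior concrete, the following Python sketch mimics what this request does on the three example documents: filter first, then score every remaining document with cosine similarity plus 1.0. It is an offline illustration under that reading, not Elasticsearch code, and the helper name cosine_similarity is a local function rather than the Painless one.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Documents copied from the bulk request above.
products = [
    {"_id": "1", "product-vector": [230.0, 300.33, -34.8988, 15.555, -200.0], "price": 1599},
    {"_id": "2", "product-vector": [-0.5, 100.0, -13.0, 14.8, -156.0], "price": 799},
    {"_id": "3", "product-vector": [0.5, 111.3, -13.0, 14.8, -156.0], "price": 1099},
]
query_vector = [-0.5, 90.0, -10, 14.8, -156.0]

# The filter runs first, so only matching documents reach the vector function.
matching = [doc for doc in products if doc["price"] >= 1000]

# Every matching document is scored; the +1.0 keeps scores non-negative.
hits = sorted(
    ((doc["_id"], cosine_similarity(query_vector, doc["product-vector"]) + 1.0) for doc in matching),
    key=lambda hit: hit[1],
    reverse=True,
)
print(hits)
```

The cost grows linearly with the number of documents that pass the filter, which is why a selective filter matters for exact kNN.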
A k-nearest neighbor (kNN) search finds the k nearest vectors to a query vector, as measured by a similarity metric.
Common use cases for kNN include:
- Relevance ranking based on natural language processing (NLP) algorithms
- Product recommendations and recommendation engines
- Similarity search for images or videos
Check out our hands-on tutorial to learn how to ingest dense vector embeddings into Elasticsearch.