kNN search in Elasticsearch | Elastic Docs

kNN search in Elasticsearch

Applies to: Elastic Stack and Serverless

A k-nearest neighbor (kNN) search finds the k nearest vectors to a query vector using a similarity metric such as cosine or L2 norm.
With Elasticsearch kNN search, you can retrieve results based on semantic meaning rather than exact keyword matches.

Common use cases for kNN vector similarity search include:

  • Search

    • Semantic text search
    • Image and video similarity
  • Recommendations

    • Product recommendations
    • Collaborative filtering
    • Personalized content discovery
  • Analysis

    • Anomaly detection
    • Pattern matching

To run a kNN search in Elasticsearch:

  • Your data must be vectorized. You can use an NLP model in Elasticsearch or generate vectors outside Elasticsearch.

    • Use the dense_vector field type for dense vectors.
    • Query vectors must have the same dimension and be created with the same model as the document vectors.
    • Already have vectors? Refer to Bring your own dense vectors.
  • Required index privileges:

    • create_index or manage to create an index with a dense_vector field
    • create, index, or write to add data
    • read to search the index

Elasticsearch supports two methods for kNN search:

  • Approximate kNN: Fast, scalable similarity search using the knn option, knn query, or a knn retriever. Ideal for most production workloads.
  • Exact, brute-force kNN: Uses a script_score query with a vector function. Best for small datasets or precise scoring.

Approximate kNN offers low latency and good accuracy, while exact kNN guarantees accurate results but does not scale well for large datasets. With this approach, a script_score query must scan each matching document to compute the vector function, which can result in slow search speeds. However, you can improve latency by using a query to limit the number of matching documents passed to the function. If you filter your data to a small subset of documents, you can get good search performance using this approach.
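The brute-force approach can be sketched in a few lines of Python. This stand-in scores every document with cosine similarity mapped into Elasticsearch's (1 + cosine) / 2 score range; the function names and data are illustrative:

```python
import math

def cosine_score(query, doc):
    """Cosine similarity mapped to Elasticsearch's score range: (1 + cosine) / 2."""
    dot = sum(q * d for q, d in zip(query, doc))
    norm_q = math.sqrt(sum(q * q for q in query))
    norm_d = math.sqrt(sum(d * d for d in doc))
    return (1.0 + dot / (norm_q * norm_d)) / 2.0

def brute_force_knn(query, docs, k):
    """Score every document against the query vector and keep the top k."""
    ranked = sorted(docs, key=lambda doc_id: cosine_score(query, docs[doc_id]), reverse=True)
    return ranked[:k]

# Vectors borrowed from the image-index example on this page.
docs = {"1": [1.0, 5.0, -20.0], "2": [42.0, 8.0, -15.0], "3": [15.0, 11.0, 23.0]}
print(brute_force_knn([-5.0, 9.0, -12.0], docs, k=2))  # ['1', '2']
```

The sketch makes the scaling behavior visible: every candidate document requires a full similarity computation, which is why limiting the candidate set with a query keeps exact kNN fast.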

Warning

Approximate kNN search has specific resource requirements. All vector data must fit in the node’s page cache for efficient performance. Refer to the approximate kNN tuning guide for configuration tips.

To run an approximate kNN search:

  1. Map one or more dense_vector fields. Approximate kNN search requires the following mapping options:

    • A similarity value. This determines the metric used to score documents based on the similarity between the query and document vectors. For a list of available metrics, see the similarity parameter documentation. The similarity setting defaults to cosine.
     PUT image-index
     {
      "mappings": {
        "properties": {
          "image-vector": {
            "type": "dense_vector",
            "dims": 3,
            "similarity": "l2_norm"
          },
          "title-vector": {
            "type": "dense_vector",
            "dims": 5,
            "similarity": "l2_norm"
          },
          "title": {
            "type": "text"
          },
          "file-type": {
            "type": "keyword"
          }
        }
      }
    }
    
  2. Index your data with embeddings.

     POST image-index/_bulk?refresh=true
    { "index": { "_id": "1" } }
    { "image-vector": [1, 5, -20], "title-vector": [12, 50, -10, 0, 1], "title": "moose family", "file-type": "jpg" }
    { "index": { "_id": "2" } }
    { "image-vector": [42, 8, -15], "title-vector": [25, 1, 4, -12, 2], "title": "alpine lake", "file-type": "png" }
    { "index": { "_id": "3" } }
    { "image-vector": [15, 11, 23], "title-vector": [1, 5, 25, 50, 20], "title": "full moon", "file-type": "jpg" }
    ...
    
  3. Query using the knn option or a knn query.

     POST image-index/_search
     {
      "knn": {
        "field": "image-vector",
        "query_vector": [-5, 9, -12],
        "k": 10,
        "num_candidates": 100
      },
      "fields": [ "title", "file-type" ]
    }
    

The document _score is a positive 32-bit floating-point number that ranks result relevance. In Elasticsearch kNN search, _score is derived from the chosen vector similarity metric between the query and document vectors. Refer to similarity for details on how kNN scores are computed.
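For l2_norm, for example, the distance d between the vectors is folded into a bounded score as 1 / (1 + d²), which is consistent with the inversion formulas later on this page. A minimal sketch:

```python
import math

def l2_norm_score(query, doc):
    """l2_norm distance folded into a bounded _score: 1 / (1 + d^2)."""
    d2 = sum((q - v) ** 2 for q, v in zip(query, doc))
    return 1.0 / (1.0 + d2)

# A document identical to the query scores 1.0; farther vectors score lower.
print(l2_norm_score([1.0, 5.0, -20.0], [1.0, 5.0, -20.0]))            # 1.0
print(round(l2_norm_score([-5.0, 9.0, -12.0], [1.0, 5.0, -20.0]), 6))  # 0.008547
```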

Note

Support for approximate kNN search was added in version 8.0. Before 8.0, dense_vector fields did not support enabling index in the mapping. If you created an index prior to 8.0 with dense_vector fields, reindex using a new mapping with index: true (which is the default value) to use approximate kNN.

For approximate kNN, Elasticsearch stores dense vector values per segment as an HNSW graph. Building HNSW graphs is compute-intensive, so indexing vectors can take time; you may need to increase client request timeouts for index and bulk operations. The approximate kNN tuning guide covers indexing performance, sizing, and configuration trade-offs that affect search performance.

In addition to search-time parameters, HNSW exposes index-time settings that balance graph build cost, search speed, and accuracy. When defining your dense_vector mapping, use index_options to set these parameters:

 PUT image-index
 {
  "mappings": {
    "properties": {
      "image-vector": {
        "type": "dense_vector",
        "dims": 3,
        "similarity": "l2_norm",
        "index_options": {
          "type": "hnsw",
          "m": 32,
          "ef_construction": 100
        }
      }
    }
  }
}

To gather results, the kNN API first finds a num_candidates number of approximate neighbors per shard, computes similarity to the query vector, selects the top k per shard, and merges them into the global top k nearest neighbors.

  • Increase num_candidates to improve recall and accuracy (at the cost of higher latency).
  • Decrease num_candidates for faster queries (with a potential accuracy trade-off).

Choosing num_candidates is the primary knob for optimizing the latency/recall trade-off in Elasticsearch vector similarity search.
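The gather-and-merge flow can be sketched as follows. A real search explores an HNSW graph per shard; this illustration scores candidates exactly, with score standing in for the chosen similarity metric:

```python
import heapq

def shard_topk(shard_vectors, query, num_candidates, k, score):
    """Per shard: gather num_candidates approximate neighbors, keep the top k."""
    candidates = heapq.nlargest(num_candidates, shard_vectors.items(),
                                key=lambda item: score(query, item[1]))
    return candidates[:k]

def global_topk(shards, query, num_candidates, k, score):
    """Merge each shard's top k into the global top k nearest neighbors."""
    merged = [hit for shard in shards
              for hit in shard_topk(shard, query, num_candidates, k, score)]
    return heapq.nlargest(k, merged, key=lambda item: score(query, item[1]))

# Stand-in similarity: negative squared distance (higher is closer).
score = lambda q, v: -sum((a - b) ** 2 for a, b in zip(q, v))
shards = [{"a": [0.0, 0.0], "b": [3.0, 3.0]}, {"c": [1.0, 0.0], "d": [9.0, 9.0]}]
print([doc_id for doc_id, _ in
       global_topk(shards, [0.0, 0.0], num_candidates=2, k=2, score=score)])  # ['a', 'c']
```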

The approximate kNN search API also supports byte (int8) value vectors alongside float vectors. Use the knn option to search a dense_vector field with element_type set to byte and indexing enabled. Byte vectors reduce memory footprint and can improve cache efficiency for large-scale vector similarity search.

  1. Explicitly map one or more dense_vector fields with element_type set to byte and indexing enabled.

     PUT byte-image-index
     {
      "mappings": {
        "properties": {
          "byte-image-vector": {
            "type": "dense_vector",
            "element_type": "byte",
            "dims": 2
          },
          "title": {
            "type": "text"
          }
        }
      }
    }
    
  2. Index your data ensuring all vector values are integers within the range [-128, 127].

     POST byte-image-index/_bulk?refresh=true
    { "index": { "_id": "1" } }
    { "byte-image-vector": [5, -20], "title": "moose family" }
    { "index": { "_id": "2" } }
    { "byte-image-vector": [8, -15], "title": "alpine lake" }
    { "index": { "_id": "3" } }
    { "byte-image-vector": [11, 23], "title": "full moon" }
    
  3. Run the search using the knn option ensuring the query_vector values are integers within the range [-128, 127].

     POST byte-image-index/_search
     {
      "knn": {
        "field": "byte-image-vector",
        "query_vector": [-5, 9],
        "k": 10,
        "num_candidates": 100
      },
      "fields": [ "title" ]
    }
    

Note: In addition to the standard byte array, you can also provide a hex-encoded string value for the query_vector parameter. For example, the search request above can also be expressed as follows, which yields the same results:

 POST byte-image-index/_search
 {
  "knn": {
    "field": "byte-image-vector",
    "query_vector": "fb09",
    "k": 10,
    "num_candidates": 100
  },
  "fields": [ "title" ]
}
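The hex string is simply the vector's signed byte values in two's-complement form; a quick sketch of the conversion (the helper name is illustrative):

```python
def byte_vector_to_hex(vector):
    """Encode signed byte values in [-128, 127] as the hex string form of query_vector."""
    return bytes(b & 0xFF for b in vector).hex()

print(byte_vector_to_hex([-5, 9]))  # fb09
```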

If you want to provide float vectors but still get the memory savings of byte vectors, use the quantization feature. Quantization allows you to provide float vectors, but internally they are indexed as byte vectors. Additionally, the original float vectors are still retained in the index.

Note

The default index type for dense_vector is int8_hnsw.

To use quantization, set the dense_vector index type to int8_hnsw or int4_hnsw.

 PUT quantized-image-index
 {
  "mappings": {
    "properties": {
      "image-vector": {
        "type": "dense_vector",
        "element_type": "float",
        "dims": 2,
        "index": true,
        "index_options": {
          "type": "int8_hnsw"
        }
      },
      "title": {
        "type": "text"
      }
    }
  }
}
  1. Index your float vectors.

     POST quantized-image-index/_bulk?refresh=true
    { "index": { "_id": "1" } }
    { "image-vector": [0.1, -2], "title": "moose family" }
    { "index": { "_id": "2" } }
    { "image-vector": [0.75, -1], "title": "alpine lake" }
    { "index": { "_id": "3" } }
    { "image-vector": [1.2, 0.1], "title": "full moon" }
    
  2. Run the search using the knn option. When searching, the float vector is automatically quantized to a byte vector.

     POST quantized-image-index/_search
     {
      "knn": {
        "field": "image-vector",
        "query_vector": [0.1, -2],
        "k": 10,
        "num_candidates": 100
      },
      "fields": [ "title" ]
    }
    

Because the original float vectors are retained alongside the quantized index, you can use them for re-scoring: retrieve candidates quickly via the int8_hnsw (or int4_hnsw) index, then rescore the top k hits using the original float vectors. This provides the best of both worlds: fast search and accurate scoring.

 POST quantized-image-index/_search
 {
  "knn": {
    "field": "image-vector",
    "query_vector": [0.1, -2],
    "k": 15,
    "num_candidates": 100
  },
  "fields": [ "title" ],
  "rescore": {
    "window_size": 10,
    "query": {
      "rescore_query": {
        "script_score": {
          "query": {
            "match_all": {}
          },
          "script": {
            "source": "cosineSimilarity(params.query_vector, 'image-vector') + 1.0",
            "params": {
              "query_vector": [0.1, -2]
            }
          }
        }
      }
    }
  }
}

The kNN search API supports restricting vector similarity search with a filter. The request returns the top k nearest neighbors that also satisfy the filter query, enabling targeted, pre-filtered approximate kNN in Elasticsearch.

The following request performs an approximate kNN search filtered by the file-type field:

 POST image-index/_search
 {
  "knn": {
    "field": "image-vector",
    "query_vector": [54, 10, -2],
    "k": 5,
    "num_candidates": 50,
    "filter": {
      "term": {
        "file-type": "png"
      }
    }
  },
  "fields": ["title"],
  "_source": false
}
Note

The filter is applied during approximate kNN search to ensure that k matching documents are returned. In contrast, post-filtering applies the filter after the approximate kNN step and can return fewer than k results, even when enough relevant documents exist.

In approximate kNN search with an HNSW index, applying filters can decrease performance as the engine must explore more of the graph to gather enough candidates that satisfy the filter and reach num_candidates. This contrasts with conventional query filtering, where stricter filters often speed up queries.

To avoid significant performance drawbacks, Lucene implements the following strategies per segment:

  • If the filtered document count is less than or equal to num_candidates, the search bypasses the HNSW graph and uses a brute force search on the filtered documents.
  • While exploring the HNSW graph, if the number of nodes explored exceeds the number of documents that satisfy the filter, the search will stop exploring the graph and switch to a brute force search over the filtered documents.
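Taken together, the two rules amount to a per-segment decision that can be sketched as follows (an illustration of the logic, not an actual Lucene API):

```python
def choose_knn_strategy(filtered_count, num_candidates, nodes_visited=0):
    """Illustrative per-segment strategy: brute force when the filter is
    selective enough, otherwise explore HNSW but fall back to brute force
    once exploration visits more nodes than the filter matches."""
    if filtered_count <= num_candidates:
        return "brute_force"
    if nodes_visited > filtered_count:
        return "brute_force"
    return "hnsw"

print(choose_knn_strategy(filtered_count=50, num_candidates=100))    # brute_force
print(choose_knn_strategy(filtered_count=5000, num_candidates=100))  # hnsw
```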

You can perform hybrid retrieval by combining the knn option with a standard query. This blends vector similarity with lexical relevance, filters, and aggregations.

 POST image-index/_search
 {
  "query": {
    "match": {
      "title": {
        "query": "mountain lake",
        "boost": 0.9
      }
    }
  },
  "knn": {
    "field": "image-vector",
    "query_vector": [54, 10, -2],
    "k": 5,
    "num_candidates": 50,
    "boost": 0.1
  },
  "size": 10
}

This search finds the global top k = 5 vector matches, combines them with the matches from the match query, and finally returns the 10 top-scoring results. The knn and query matches are combined through a disjunction, as if you took a boolean OR between them. The top k vector results represent the global nearest neighbors across all index shards.

The score of each hit is the sum of the knn and query scores. You can specify a boost value to give a weight to each score in the sum. In the example above, the scores will be calculated as

score = 0.9 * match_score + 0.1 * knn_score
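Because the combination is a disjunction, a clause that does not match a document simply contributes nothing to its score. A sketch, using the boost values from the example above:

```python
def hybrid_score(match_score, knn_score, match_boost=0.9, knn_boost=0.1):
    """Disjunctive combination: a clause that did not match contributes 0."""
    return match_boost * (match_score or 0.0) + knn_boost * (knn_score or 0.0)

print(round(hybrid_score(2.0, 0.8), 2))   # matched by both clauses
print(round(hybrid_score(None, 0.8), 2))  # kNN-only hit: only the knn term remains
```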

The knn option can also be used with aggregations. In general, Elasticsearch computes aggregations over all documents that match the search. So for approximate kNN search, aggregations are calculated on the top k nearest documents. If the search also includes a query, then aggregations are calculated on the combined set of knn and query matches.

Tip

Looking for a minimal configuration approach? The semantic_text field type abstracts these vector search implementations with sensible defaults and automatic model management. It's the recommended approach for most users. Learn more about semantic_text.

kNN search enables you to perform semantic search by using a previously deployed text embedding model. Instead of literal matching on search terms, semantic search retrieves results based on the intent and the contextual meaning of a search query.

Under the hood, the text embedding NLP model converts your input query string (provided as model_text) into a dense vector. The query vector is compared against an index containing dense vectors created with the same text embedding machine learning model. The search results are semantically similar as learned by the model.

Important

To perform semantic search:

  • You need an index that contains dense vector representations of the input data to search against.
  • You must use the same text embedding model for search that you used to create the document vectors.
  • The text embedding NLP model deployment must be started.

Reference the deployed text embedding model or the model deployment in the query_vector_builder object, and provide the search string as model_text:

(...)
{
  "knn": {
    "field": "dense-vector-field",
    "k": 10,
    "num_candidates": 100,
    "query_vector_builder": {
      "text_embedding": {
        "model_id": "my-text-embedding-model",
        "model_text": "The opposite of blue"
      }
    }
  }
}
(...)
  1. The natural language processing task to perform. It must be text_embedding.
  2. The ID of the text embedding model used to generate the query’s dense vector. Use the same model that produced the document embeddings in the target index. You can also provide the deployment_id as the model_id value.
  3. The query string from which the model generates the dense vector representation.

For more information on how to deploy a trained model and use it to create text embeddings, refer to this end-to-end example.

In addition to hybrid retrieval, you can search more than one kNN vector field in a single request:

 POST image-index/_search
 {
  "query": {
    "match": {
      "title": {
        "query": "mountain lake",
        "boost": 0.9
      }
    }
  },
  "knn": [ {
    "field": "image-vector",
    "query_vector": [54, 10, -2],
    "k": 5,
    "num_candidates": 50,
    "boost": 0.1
  },
  {
    "field": "title-vector",
    "query_vector": [1, 20, -52, 23, 10],
    "k": 10,
    "num_candidates": 10,
    "boost": 0.5
  }],
  "size": 10
}

This search retrieves the global top k = 5 neighbors for image-vector and the global top k = 10 for title-vector. These vector result sets are combined with the matches from the match query, and the top 10 overall documents are returned. Multiple knn clauses and the query clause are combined via a disjunction (boolean OR). The top k vector results represent the global nearest neighbors across all index shards.

The scoring for a document with the above configured boosts would be:

score = 0.9 * match_score + 0.1 * knn_score_image-vector + 0.5 * knn_score_title-vector

While kNN is a powerful tool, it always tries to return k nearest neighbors. Consequently, when using knn with a filter, you could filter out all relevant documents and only have irrelevant ones left to search. In that situation, knn will still do its best to return k nearest neighbors, even though those neighbors could be far away in the vector space.

To control this, use the similarity parameter in the knn clause. This sets a minimum similarity threshold a vector must meet to be considered a match. The knn search flow with this parameter is:

  • Apply any user-provided filter queries.
  • Explore the vector space to gather k candidates.
  • Exclude any vectors with similarity below the configured similarity threshold.
Note

similarity is the true similarity value before it is transformed into _score and before any boosts are applied.

For each configured similarity, the following shows how to invert _score back to the underlying similarity. Use these when you want to filter based on _score:

  • l2_norm: sqrt((1 / _score) - 1)
  • cosine: (2 * _score) - 1
  • dot_product: (2 * _score) - 1
  • max_inner_product:
    • _score < 1: 1 - (1 / _score)
    • _score >= 1: _score - 1

Example: the query searches for the given query_vector, with a filter applied, and requires that matches meet or exceed the specified similarity threshold. Results below the threshold are not returned, even if fewer than k neighbors remain.

 POST image-index/_search
 {
  "knn": {
    "field": "image-vector",
    "query_vector": [1, 5, -20],
    "k": 5,
    "num_candidates": 50,
    "similarity": 36,
    "filter": {
      "term": {
        "file-type": "png"
      }
    }
  },
  "fields": ["title"],
  "_source": false
}

In this data set, the only document with file-type = png has the vector [42, 8, -15]. The l2_norm distance between [42, 8, -15] and [1, 5, -20] is 41.412, which exceeds the configured similarity threshold of 36. As a result, this search returns no hits.
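You can verify the distance arithmetic directly:

```python
import math

query = [1, 5, -20]
doc = [42, 8, -15]  # the only document with file-type png
distance = math.sqrt(sum((q - v) ** 2 for q, v in zip(query, doc)))
print(round(distance, 2))  # 41.41, above the similarity threshold of 36
```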

When text exceeds a model’s token limit, chunk it before generating an embedding for each chunk. By combining nested fields with dense_vector fields, you can perform nearest passage retrieval without copying top-level document metadata.
Note that nested kNN queries only support score_mode=max.

Here is a simple passage vectors index that stores vectors and some top-level metadata for filtering.

 PUT passage_vectors
 {
    "mappings": {
        "properties": {
            "full_text": {
                "type": "text"
            },
            "creation_time": {
                "type": "date"
            },
            "paragraph": {
                "type": "nested",
                "properties": {
                    "vector": {
                        "type": "dense_vector",
                        "dims": 2,
                        "index_options": {
                            "type": "hnsw"
                        }
                    },
                    "text": {
                        "type": "text",
                        "index": false
                    },
                    "language": {
                        "type": "keyword"
                    }
                }
            },
            "metadata": {
                "type": "nested",
                "properties": {
                    "key": {
                        "type": "keyword"
                    },
                    "value": {
                        "type": "text"
                    }
                }
            }
        }
    }
}

With the above mapping, we can index multiple passage vectors per document while also storing each passage's text.

 POST passage_vectors/_bulk?refresh=true
{ "index": { "_id": "1" } }
{ "full_text": "first paragraph another paragraph", "creation_time": "2019-05-04", "paragraph": [ { "vector": [ 0.45, 45 ], "text": "first paragraph", "paragraph_id": "1", "language": "EN" }, { "vector": [ 0.8, 0.6 ], "text": "another paragraph", "paragraph_id": "2", "language": "FR" } ], "metadata": [ { "key": "author", "value": "Jane Doe" }, { "key": "source", "value": "Internal Memo" } ] }
{ "index": { "_id": "2" } }
{ "full_text": "number one paragraph number two paragraph", "creation_time": "2020-05-04", "paragraph": [ { "vector": [ 1.2, 4.5 ], "text": "number one paragraph", "paragraph_id": "1", "language": "EN" }, { "vector": [ -1, 42 ], "text": "number two paragraph", "paragraph_id": "2", "language": "EN" }] , "metadata": [ { "key": "author", "value": "Jane Austen" }, { "key": "source", "value": "Financial" } ] }

The query looks very similar to a typical kNN search:

 POST passage_vectors/_search
 {
    "fields": ["full_text", "creation_time"],
    "_source": false,
    "knn": {
        "query_vector": [
            0.45,
            45
        ],
        "field": "paragraph.vector",
        "k": 2
    }
}

Note that even with 4 total nested vectors, the response still returns two documents. kNN search over nested dense vectors will always diversify the top results over the top-level document; "k" top-level documents will be returned, scored by their nearest passage vector (for example, "paragraph.vector").
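The diversification behaves like taking the maximum over each parent's nested vectors (score_mode=max). A sketch with a stand-in l2-based score function, using the passage vectors indexed above:

```python
def nested_knn(parents, query, k, score):
    """Rank parent documents by their best (max-scoring) nested passage vector."""
    best = {
        parent_id: max(score(query, v) for v in vectors)
        for parent_id, vectors in parents.items()
    }
    return sorted(best, key=best.get, reverse=True)[:k]

# Stand-in score: 1 / (1 + squared l2 distance).
score = lambda q, v: 1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(q, v)))
parents = {
    "1": [[0.45, 45], [0.8, 0.6]],
    "2": [[1.2, 4.5], [-1, 42]],
}
print(nested_knn(parents, [0.45, 45], k=2, score=score))  # ['1', '2']
```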

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "passage_vectors",
                "_id": "1",
                "_score": 1.0,
                "fields": {
                    "creation_time": [
                        "2019-05-04T00:00:00.000Z"
                    ],
                    "full_text": [
                        "first paragraph another paragraph"
                    ]
                }
            },
            {
                "_index": "passage_vectors",
                "_id": "2",
                "_score": 0.9997144,
                "fields": {
                    "creation_time": [
                        "2020-05-04T00:00:00.000Z"
                    ],
                    "full_text": [
                        "number one paragraph number two paragraph"
                    ]
                }
            }
        ]
    }
}

Want to filter by metadata in a nested kNN search? Add a filter to your knn clause.

To ensure correct results, each individual filter must target either:

  • Top-level metadata
  • Nested metadata

    Note

    A single knn search can include multiple filters: some over top-level metadata and others over nested metadata.

 POST passage_vectors/_search
 {
    "fields": [
        "creation_time",
        "full_text"
    ],
    "_source": false,
    "knn": {
        "query_vector": [0.45, 45],
        "field": "paragraph.vector",
        "k": 2,
        "filter": {
            "range": {
                "creation_time": {
                    "gte": "2019-05-01",
                    "lte": "2019-05-05"
                }
            }
        }
    }
}

With the top-level creation_time filter applied, only one document falls within the specified range.

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "passage_vectors",
                "_id": "1",
                "_score": 1.0,
                "fields": {
                    "creation_time": [
                        "2019-05-04T00:00:00.000Z"
                    ],
                    "full_text": [
                        "first paragraph another paragraph"
                    ]
                }
            }
        ]
    }
}


The following query applies a nested metadata filter. When scoring parent documents, it only considers nested vectors whose "paragraph.language" is "EN".

 POST passage_vectors/_search
 {
    "fields": [
        "full_text"
    ],
    "_source": false,
    "knn": {
        "query_vector": [0.45, 45],
        "field": "paragraph.vector",
        "k": 2,
        "filter": {
            "match": {
                "paragraph.language": "EN"
            }
        }
    }
}

The next example combines two filters: one on nested metadata and one on top-level metadata. Parent documents are scored only by vectors with "paragraph.language": "EN" and whose parent documents fall within the specified time range.

 POST passage_vectors/_search
 {
    "fields": [
        "full_text"
    ],
    "_source": false,
    "knn": {
        "query_vector": [0.45,45],
        "field": "paragraph.vector",
        "k": 2,
        "filter": [
            {"match": {"paragraph.language": "EN"}},
            {"range": { "creation_time": { "gte": "2019-05-01", "lte": "2019-05-05"}}}
        ]
    }
}


Nested knn search also allows pre-filtering on sibling nested fields. For example, given "paragraph" and "metadata" as nested fields, we can search "paragraph.vector" and filter by "metadata.key" and "metadata.value".

 POST passage_vectors/_search
 {
    "fields": [
        "full_text"
    ],
    "_source": false,
    "knn": {
        "query_vector": [0.45, 45],
        "field": "paragraph.vector",
        "k": 2,
        "filter": {
            "nested": {
                "path": "metadata",
                "query": {
                    "bool": {
                        "must": [
                            { "match": { "metadata.key": "author" } },
                            { "match": { "metadata.value": "Doe" } }
                        ]
                    }
                }
            }
        }
    }
}
Note

Retrieving "inner_hits" when filtering on sibling nested fields is not supported.

To extract the nearest passage for each matched parent document, add inner_hits to the knn clause.

Note

When using inner_hits with multiple knn clauses, set a unique inner_hits.name for each clause to avoid naming collisions that would fail the search request.

 POST passage_vectors/_search
 {
    "fields": [
        "creation_time",
        "full_text"
    ],
    "_source": false,
    "knn": {
        "query_vector": [
            0.45,
            45
        ],
        "field": "paragraph.vector",
        "k": 2,
        "num_candidates": 2,
        "inner_hits": {
            "_source": false,
            "fields": [
                "paragraph.text"
            ],
            "size": 1
        }
    }
}

The response now includes the nearest matching paragraph for each hit:

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "passage_vectors",
                "_id": "1",
                "_score": 1.0,
                "fields": {
                    "creation_time": [
                        "2019-05-04T00:00:00.000Z"
                    ],
                    "full_text": [
                        "first paragraph another paragraph"
                    ]
                },
                "inner_hits": {
                    "paragraph": {
                        "hits": {
                            "total": {
                                "value": 2,
                                "relation": "eq"
                            },
                            "max_score": 1.0,
                            "hits": [
                                {
                                    "_index": "passage_vectors",
                                    "_id": "1",
                                    "_nested": {
                                        "field": "paragraph",
                                        "offset": 0
                                    },
                                    "_score": 1.0,
                                    "fields": {
                                        "paragraph": [
                                            {
                                                "text": [
                                                    "first paragraph"
                                                ]
                                            }
                                        ]
                                    }
                                }
                            ]
                        }
                    }
                }
            },
            {
                "_index": "passage_vectors",
                "_id": "2",
                "_score": 0.9997144,
                "fields": {
                    "creation_time": [
                        "2020-05-04T00:00:00.000Z"
                    ],
                    "full_text": [
                        "number one paragraph number two paragraph"
                    ]
                },
                "inner_hits": {
                    "paragraph": {
                        "hits": {
                            "total": {
                                "value": 2,
                                "relation": "eq"
                            },
                            "max_score": 0.9997144,
                            "hits": [
                                {
                                    "_index": "passage_vectors",
                                    "_id": "2",
                                    "_nested": {
                                        "field": "paragraph",
                                        "offset": 1
                                    },
                                    "_score": 0.9997144,
                                    "fields": {
                                        "paragraph": [
                                            {
                                                "text": [
                                                    "number two paragraph"
                                                ]
                                            }
                                        ]
                                    }
                                }
                            ]
                        }
                    }
                }
            }
        ]
    }
}

Use nested kNN search with dense_vector fields and inner_hits in Elasticsearch to retrieve the most relevant passages from structured, chunked documents.

This approach is ideal when you:

  • Chunk your content into paragraphs, sections, or other nested structures.
  • Want to retrieve only the most relevant nested section of each matching document.
  • Generate your own vectors with a custom model instead of relying on the semantic_text field provided by Elastic's semantic search capability.

This example creates an index that stores multiple vectors inside a nested field, one for each individual paragraph.

 PUT nested_vector_index
 {
  "mappings": {
    "properties": {
      "paragraphs": {
        "type": "nested",
        "properties": {
          "text": {
            "type": "text"
          },
          "vector": {
            "type": "dense_vector",
            "dims": 2,
            "index_options": {
              "type": "hnsw"
            }
          }
        }
      }
    }
  }
}

Add example documents with vectors for each paragraph.

POST _bulk
{ "index": { "_index": "nested_vector_index", "_id": "1" } }
{ "paragraphs": [ { "text": "First paragraph", "vector": [0.5, 0.4] }, { "text": "Second paragraph", "vector": [0.3, 0.8] } ] }
{ "index": { "_index": "nested_vector_index", "_id": "2" } }
{ "paragraphs": [ { "text": "Another one", "vector": [0.1, 0.9] } ] }

This example searches for documents with relevant paragraph vectors.

POST nested_vector_index/_search
{
  "_source": false,
  "knn": {
    "field": "paragraphs.vector",
    "query_vector": [0.5, 0.4],
    "k": 2,
    "num_candidates": 10,
    "inner_hits": {
      "size": 2,
      "name": "top_passages",
      "_source": false,
      "fields": ["paragraphs.text"]
    }
  }
}

The inner_hits block returns the most relevant paragraphs within each top-level document. Use the size parameter to control how many matches are returned. If your query includes multiple kNN clauses, set a unique name for each clause to avoid naming conflicts in the response.

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "nested_vector_index",
        "_id": "1",
        "_score": 1,
        "inner_hits": {
          "top_passages": {
            "hits": {
              "total": {
                "value": 2,
                "relation": "eq"
              },
              "max_score": 1,
              "hits": [
                {
                  "_index": "nested_vector_index",
                  "_id": "1",
                  "_nested": {
                    "field": "paragraphs",
                    "offset": 0
                  },
                  "_score": 1,
                  "fields": {
                    "paragraphs": [
                      {
                        "text": [
                          "First paragraph"
                        ]
                      }
                    ]
                  }
                },
                {
                  "_index": "nested_vector_index",
                  "_id": "1",
                  "_nested": {
                    "field": "paragraphs",
                    "offset": 1
                  },
                  "_score": 0.92955077,
                  "fields": {
                    "paragraphs": [
                      {
                        "text": [
                          "Second paragraph"
                        ]
                      }
                    ]
                  }
                }
              ]
            }
          }
        }
      },
      {
        "_index": "nested_vector_index",
        "_id": "2",
        "_score": 0.8535534,
        "inner_hits": {
          "top_passages": {
            "hits": {
              "total": {
                "value": 1,
                "relation": "eq"
              },
              "max_score": 0.8535534,
              "hits": [
                {
                  "_index": "nested_vector_index",
                  "_id": "2",
                  "_nested": {
                    "field": "paragraphs",
                    "offset": 0
                  },
                  "_score": 0.8535534,
                  "fields": {
                    "paragraphs": [
                      {
                        "text": [
                          "Another one"
                        ]
                      }
                    ]
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}
  1. Two documents matched the query.
  2. Document score, based on its most relevant paragraph.
  3. Matching paragraphs appear in the inner_hits section.
  4. Actual paragraph text that matched the query.
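As noted above, if a query includes multiple kNN clauses, each `inner_hits` needs a unique `name` to avoid naming conflicts in the response. A minimal sketch against the `nested_vector_index` above, using the array form of the `knn` option to compare two query vectors in one request (the `first_query_passages` and `second_query_passages` names are illustrative):

```console
POST nested_vector_index/_search
{
  "_source": false,
  "knn": [
    {
      "field": "paragraphs.vector",
      "query_vector": [0.5, 0.4],
      "k": 2,
      "num_candidates": 10,
      "inner_hits": {
        "size": 1,
        "name": "first_query_passages",
        "_source": false,
        "fields": ["paragraphs.text"]
      }
    },
    {
      "field": "paragraphs.vector",
      "query_vector": [0.1, 0.9],
      "k": 2,
      "num_candidates": 10,
      "inner_hits": {
        "size": 1,
        "name": "second_query_passages",
        "_source": false,
        "fields": ["paragraphs.text"]
      }
    }
  ]
}
```

Each named `inner_hits` section appears separately in the response, so you can tell which clause matched which nested paragraph.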
Limitations for approximate kNN search:

  • When using kNN search in cross-cluster search, the ccs_minimize_roundtrips option is not supported.
  • Elasticsearch uses the HNSW algorithm for efficient kNN search. Like most approximate methods, HNSW trades perfect accuracy for speed, so results aren’t always the true k closest neighbors.
Note

Approximate kNN always uses the dfs_query_then_fetch search type to gather the global top k matches across shards. You can’t set search_type explicitly for kNN search.

When using quantized vectors for kNN search, you can optionally rescore results to balance performance and accuracy by:

  • Oversampling — retrieving more candidates per shard.
  • Rescoring — recalculating scores on those oversampled candidates using the original (non-quantized) vectors.

Because final scores are computed with the original float vectors, rescoring combines:

  • The performance and memory benefits of approximate retrieval with quantized vectors.
  • The accuracy of using the original vectors for rescoring the top candidates.

All quantization introduces some accuracy loss, and higher compression generally increases that loss. In practice:

  • int8 typically needs little to no rescoring.
  • int4 often benefits from rescoring for higher accuracy or recall; 1.5×–2× oversampling usually recovers most loss.
  • bbq commonly requires rescoring except on very large indices or models specifically designed for quantization; 3×–5× oversampling is generally sufficient, but higher may be needed for low-dimension vectors or embeddings that quantize poorly.
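Quantization is selected in the field mapping via index_options. A sketch of a bbq-quantized field, assuming 64-dimension embeddings (the index and field names here are illustrative):

```console
PUT bbq-index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 64,
        "index_options": {
          "type": "bbq_hnsw"
        }
      }
    }
  }
}
```

With a mapping like this, searches run against the quantized vectors; the rescoring options below recover accuracy by rescoring top candidates against the original float vectors.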

Stack 9.1.0 (GA); Stack 9.0.0 (technical preview)

Use rescore_vector to automatically perform reranking. When you specify an oversample value, approximate kNN will:

  • Retrieve num_candidates candidates per shard.
  • Rescore the top k * oversample candidates per shard using the original vectors.
  • Return the top k rescored candidates.

Here is an example of using the rescore_vector option with the oversample parameter:

POST image-index/_search
{
  "knn": {
    "field": "image-vector",
    "query_vector": [-5, 9, -12],
    "k": 10,
    "num_candidates": 100,
    "rescore_vector": {
      "oversample": 2.0
    }
  },
  "fields": [ "title", "file-type" ]
}

This example will:

  • Search using approximate kNN for the top 100 candidates.
  • Rescore the top 20 candidates (oversample * k) per shard using the original, non-quantized vectors.
  • Return the top 10 (k) rescored candidates per shard.
  • Merge the rescored candidates from all shards, and return the top 10 (k) results.

The following sections provide additional ways of rescoring:

You can use this option when you don’t want to rescore on each shard, but on the top results from all shards.

Use the rescore section in the _search request to rescore the top results from a kNN search.

Here is an example that uses the top-level knn search with oversampling, and a rescore section to rerank the results:

POST /my-index/_search
{
  "size": 10,
  "knn": {
    "query_vector": [0.04283529, 0.85670587, -0.51402352, 0],
    "field": "my_int4_vector",
    "k": 20,
    "num_candidates": 50
  },
  "rescore": {
    "window_size": 20,
    "query": {
      "rescore_query": {
        "script_score": {
          "query": {
            "match_all": {}
          },
          "script": {
            "source": "(dotProduct(params.queryVector, 'my_int4_vector') + 1.0)",
            "params": {
              "queryVector": [0.04283529, 0.85670587, -0.51402352, 0]
            }
          }
        }
      },
      "query_weight": 0,
      "rescore_query_weight": 1
    }
  }
}
  1. The number of results to return. Note that it's only 10; we oversample by 2x, gathering 20 nearest neighbors.
  2. The number of results to return from the kNN search. This does an approximate kNN search with 50 candidates per HNSW graph, using the quantized vectors, and returns the 20 most similar vectors according to the quantized score. Additionally, since this is the top-level knn object, the global top 20 results from all shards will be gathered before rescoring. Combined with rescore, this is oversampling by 2x: gathering 20 nearest neighbors according to quantized scoring and rescoring with higher-fidelity float vectors.
  3. The number of results to rescore. If you want to rescore all results, set this to the same value as k.
  4. The script to rescore the results. Script score will interact directly with the originally provided float32 vector.
  5. The weight of the original query; here we simply discard the original score.
  6. The weight of the rescore query; here we use only the rescore query.

You can use this option when you want to rescore on each shard and want more fine-grained control on the rescoring than the rescore_vector option provides.

Use rescore per shard with the knn query and script_score query. Generally, this means more rescoring per shard, which can increase overall recall at the cost of compute.

POST /my-index/_search
{
  "size": 10,
  "query": {
    "script_score": {
      "query": {
        "knn": {
          "query_vector": [0.04283529, 0.85670587, -0.51402352, 0],
          "field": "my_int4_vector",
          "num_candidates": 20
        }
      },
      "script": {
        "source": "(dotProduct(params.queryVector, 'my_int4_vector') + 1.0)",
        "params": {
          "queryVector": [0.04283529, 0.85670587, -0.51402352, 0]
        }
      }
    }
  }
}
  1. The number of results to return.
  2. The knn query performs the initial search; it is executed per shard.
  3. The number of candidates to use for the initial approximate kNN search. This searches using the quantized vectors and returns the top 20 candidates per shard to then be scored.
  4. The script to score the results. Script score will interact directly with the originally provided float32 vector.

To run an exact kNN search, use a script_score query with a vector function.

  1. Explicitly map one or more dense_vector fields. If you don’t intend to use the field for approximate kNN, set the index mapping option to false. This can significantly improve indexing speed.

     PUT product-index
     {
      "mappings": {
        "properties": {
          "product-vector": {
            "type": "dense_vector",
            "dims": 5,
            "index": false
          },
          "price": {
            "type": "long"
          }
        }
      }
    }
    
  2. Index your data.

     POST product-index/_bulk?refresh=true
     { "index": { "_id": "1" } }
    { "product-vector": [230.0, 300.33, -34.8988, 15.555, -200.0], "price": 1599 }
    { "index": { "_id": "2" } }
    { "product-vector": [-0.5, 100.0, -13.0, 14.8, -156.0], "price": 799 }
    { "index": { "_id": "3" } }
    { "product-vector": [0.5, 111.3, -13.0, 14.8, -156.0], "price": 1099 }
    ...
    
  3. Use the search API to run a script_score query containing a vector function.

    Tip

    To limit the number of matched documents passed to the vector function, we recommend you specify a filter query in the script_score.query parameter. If needed, you can use a match_all query in this parameter to match all documents. However, matching all documents can significantly increase search latency.

     POST product-index/_search
     {
      "query": {
        "script_score": {
          "query" : {
            "bool" : {
              "filter" : {
                "range" : {
                  "price" : {
                    "gte": 1000
                  }
                }
              }
            }
          },
          "script": {
            "source": "cosineSimilarity(params.queryVector, 'product-vector') + 1.0",
            "params": {
              "queryVector": [-0.5, 90.0, -10, 14.8, -156.0]
            }
          }
        }
      }
    }
    

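If you do need to score every document, the tip above notes that you can use a match_all query instead of a filter. A sketch of that variant against the same product-index (remember that scanning all documents can significantly increase search latency):

```console
POST product-index/_search
{
  "query": {
    "script_score": {
      "query": { "match_all": {} },
      "script": {
        "source": "cosineSimilarity(params.queryVector, 'product-vector') + 1.0",
        "params": {
          "queryVector": [-0.5, 90.0, -10, 14.8, -156.0]
        }
      }
    }
  }
}
```

Prefer a filter whenever your use case allows it; the match_all form is mainly for small indices or exhaustive evaluation.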
Tip

Check out our hands-on tutorial to learn how to ingest dense vector embeddings into Elasticsearch.