Knn query
Finds the k nearest vectors to a query vector, as measured by a similarity metric. The knn query finds nearest vectors through approximate search on indexed dense_vector fields. The preferred way to do approximate kNN search is through the top-level knn section of a search request. The knn query is reserved for expert cases where you need to combine it with other queries or run a kNN search against a semantic_text field.
Example request
resp = client.indices.create(
    index="my-image-index",
    mappings={
        "properties": {
            "image-vector": {
                "type": "dense_vector",
                "dims": 3,
                "index": True,
                "similarity": "l2_norm"
            },
            "file-type": {
                "type": "keyword"
            },
            "title": {
                "type": "text"
            }
        }
    },
)
print(resp)

response = client.indices.create(
  index: 'my-image-index',
  body: {
    mappings: {
      properties: {
        "image-vector": {
          type: 'dense_vector',
          dims: 3,
          index: true,
          similarity: 'l2_norm'
        },
        "file-type": {
          type: 'keyword'
        },
        title: {
          type: 'text'
        }
      }
    }
  }
)
puts response

const response = await client.indices.create({
  index: "my-image-index",
  mappings: {
    properties: {
      "image-vector": {
        type: "dense_vector",
        dims: 3,
        index: true,
        similarity: "l2_norm",
      },
      "file-type": {
        type: "keyword",
      },
      title: {
        type: "text",
      },
    },
  },
});
console.log(response);

PUT my-image-index
{
  "mappings": {
    "properties": {
      "image-vector": {
        "type": "dense_vector",
        "dims": 3,
        "index": true,
        "similarity": "l2_norm"
      },
      "file-type": {
        "type": "keyword"
      },
      "title": {
        "type": "text"
      }
    }
  }
}
- Index your data.

resp = client.bulk(
    index="my-image-index",
    refresh=True,
    operations=[
        {"index": {"_id": "1"}},
        {"image-vector": [1, 5, -20], "file-type": "jpg", "title": "mountain lake"},
        {"index": {"_id": "2"}},
        {"image-vector": [42, 8, -15], "file-type": "png", "title": "frozen lake"},
        {"index": {"_id": "3"}},
        {"image-vector": [15, 11, 23], "file-type": "jpg", "title": "mountain lake lodge"}
    ],
)
print(resp)

response = client.bulk(
  index: 'my-image-index',
  refresh: true,
  body: [
    { index: { _id: '1' } },
    { "image-vector": [1, 5, -20], "file-type": 'jpg', title: 'mountain lake' },
    { index: { _id: '2' } },
    { "image-vector": [42, 8, -15], "file-type": 'png', title: 'frozen lake' },
    { index: { _id: '3' } },
    { "image-vector": [15, 11, 23], "file-type": 'jpg', title: 'mountain lake lodge' }
  ]
)
puts response

const response = await client.bulk({
  index: "my-image-index",
  refresh: "true",
  operations: [
    { index: { _id: "1" } },
    { "image-vector": [1, 5, -20], "file-type": "jpg", title: "mountain lake" },
    { index: { _id: "2" } },
    { "image-vector": [42, 8, -15], "file-type": "png", title: "frozen lake" },
    { index: { _id: "3" } },
    { "image-vector": [15, 11, 23], "file-type": "jpg", title: "mountain lake lodge" },
  ],
});
console.log(response);

POST my-image-index/_bulk?refresh=true
{ "index": { "_id": "1" } }
{ "image-vector": [1, 5, -20], "file-type": "jpg", "title": "mountain lake" }
{ "index": { "_id": "2" } }
{ "image-vector": [42, 8, -15], "file-type": "png", "title": "frozen lake" }
{ "index": { "_id": "3" } }
{ "image-vector": [15, 11, 23], "file-type": "jpg", "title": "mountain lake lodge" }

- Run the search using the knn query, asking for the top 10 nearest vectors from each shard, and then combine shard results to get the top 3 global results.

resp = client.search(
    index="my-image-index",
    size=3,
    query={
        "knn": {
            "field": "image-vector",
            "query_vector": [-5, 9, -12],
            "k": 10
        }
    },
)
print(resp)

const response = await client.search({
  index: "my-image-index",
  size: 3,
  query: {
    knn: {
      field: "image-vector",
      query_vector: [-5, 9, -12],
      k: 10,
    },
  },
});
console.log(response);

POST my-image-index/_search
{
  "size": 3,
  "query": {
    "knn": {
      "field": "image-vector",
      "query_vector": [-5, 9, -12],
      "k": 10
    }
  }
}
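To get a feel for what the search above should return, the ranking can be reproduced by hand. The following is a minimal sketch that computes exact (brute-force) distances for the three example documents rather than running Elasticsearch's approximate search. The helper names squared_l2 and l2_norm_score are hypothetical, and the score formula 1 / (1 + l2_norm(query, vector)^2) is taken from the dense_vector documentation for l2_norm similarity.

```python
# Exact nearest-neighbor check for the three example documents.
# This is a brute-force stand-in, not the approximate search Elasticsearch runs.
docs = {
    "1": [1, 5, -20],
    "2": [42, 8, -15],
    "3": [15, 11, 23],
}
query_vector = [-5, 9, -12]


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def l2_norm_score(a, b):
    """Score for l2_norm similarity: 1 / (1 + squared L2 distance)."""
    return 1.0 / (1.0 + squared_l2(a, b))


# Smaller distance means a closer vector and therefore a higher score.
ranked = sorted(docs, key=lambda doc_id: squared_l2(docs[doc_id], query_vector))

for doc_id in ranked:
    distance = squared_l2(docs[doc_id], query_vector)
    score = l2_norm_score(docs[doc_id], query_vector)
    print(doc_id, distance, round(score, 6))
```

With only three documents, k=10 and size=3 return all of them; the sketch only shows the order in which they are expected to appear.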
Top-level parameters for knn
- field
  (Required, string) The name of the vector field to search against. Must be a dense_vector field with indexing enabled, or a semantic_text field with a compatible dense vector inference model.
- query_vector
  (Optional, array of floats or string) Query vector. Must have the same number of dimensions as the vector field you are searching against. Must be either an array of floats or a hex-encoded byte vector. Either this or query_vector_builder must be provided.
- query_vector_builder
  (Optional, object) Query vector builder. A configuration object indicating how to build a query_vector before executing the request. You must provide either a query_vector_builder or query_vector, but not both. Refer to Perform semantic search to learn more.
  If all queried fields are of type semantic_text, the inference ID associated with the semantic_text field may be inferred.
- k
  (Optional, integer) The number of nearest neighbors to return from each shard. Elasticsearch collects k results from each shard, then merges them to find the global top results. This value must be less than or equal to num_candidates. Defaults to search request size.
- num_candidates
  (Optional, integer) The number of nearest neighbor candidates to consider per shard while doing knn search. Cannot exceed 10,000. Increasing num_candidates tends to improve the accuracy of the final results. Defaults to 1.5 * k if k is set, or 1.5 * size if k is not set.
- filter
  (Optional, query object) Query to filter the documents that can match. The kNN search will return the top documents that also match this filter. The value can be a single query or a list of queries. If filter is not provided, all documents are allowed to match.
  The filter is a pre-filter, meaning that it is applied during the approximate kNN search to ensure that num_candidates matching documents are returned.
- similarity
  (Optional, float) The minimum similarity required for a document to be considered a match. The similarity value calculated relates to the raw similarity used, not the document score. The matched documents are then scored according to similarity and the provided boost is applied.
- rescore_vector
  (Optional, object) [preview] This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features. Applies oversampling and rescoring to quantized vectors.
  Rescoring only makes sense for quantized vectors; when quantization is not used, the original vectors are used for scoring. The rescore option is ignored for non-quantized dense_vector fields.
  - oversample
    (Required, float) Applies the specified oversample factor to k on the approximate kNN search. The approximate kNN search will:
    - Retrieve num_candidates candidates per shard.
    - Rescore the top k * oversample of these candidates per shard using the original vectors.
    - Return the top k rescored candidates.
  See oversampling and rescoring quantized vectors for details.
- boost
  (Optional, float) Floating point number used to multiply the scores of matched documents. This value cannot be negative. Defaults to 1.0.
- _name
  (Optional, string) Name field to identify the query.
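The constraints in the parameter list can be checked on the client before a request is sent. The sketch below is one way to do that, assuming a hypothetical helper named build_knn_query that is not part of any Elasticsearch client; it only encodes rules stated above (exactly one of query_vector or query_vector_builder, k no larger than num_candidates, num_candidates capped at 10,000, non-negative boost) and leaves everything else to the server.

```python
MAX_NUM_CANDIDATES = 10_000  # upper bound stated in the parameter list


def build_knn_query(field, query_vector=None, query_vector_builder=None,
                    k=None, num_candidates=None, filter_query=None,
                    similarity=None, boost=None):
    """Build the body of a knn query, rejecting obviously invalid input.

    Hypothetical helper for illustration; the server remains the authority
    on what is valid.
    """
    # Exactly one of query_vector / query_vector_builder must be given.
    if (query_vector is None) == (query_vector_builder is None):
        raise ValueError("provide exactly one of query_vector or query_vector_builder")
    if num_candidates is not None and num_candidates > MAX_NUM_CANDIDATES:
        raise ValueError("num_candidates cannot exceed 10,000")
    if k is not None and num_candidates is not None and k > num_candidates:
        raise ValueError("k must be less than or equal to num_candidates")
    if boost is not None and boost < 0:
        raise ValueError("boost cannot be negative")

    knn = {"field": field}
    optional = {
        "query_vector": query_vector,
        "query_vector_builder": query_vector_builder,
        "k": k,
        "num_candidates": num_candidates,
        "filter": filter_query,
        "similarity": similarity,
        "boost": boost,
    }
    # Omit unset parameters so that server-side defaults apply.
    knn.update({name: value for name, value in optional.items() if value is not None})
    return {"knn": knn}


query = build_knn_query("image-vector", query_vector=[-5, 9, -12], k=10)
print(query)
```

The returned dictionary can be passed as the query argument of client.search, as in the example request earlier on this page.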
Pre-filters and post-filters in knn query
There are two ways to filter documents that match a kNN query:
- pre-filtering – the filter is applied during the approximate kNN search to ensure that k matching documents are returned.
- post-filtering – the filter is applied after the approximate kNN search completes, which can result in fewer than k results, even when there are enough matching documents.
Pre-filtering is supported through the filter parameter of the knn query. Filters from aliases are also applied as pre-filters.
All other filters found in the Query DSL tree are applied as post-filters. For example, the following knn query finds the top 3 documents with the nearest vectors (k=3) and combines them with a term filter that is applied as a post-filter. The final set of documents contains only a single document that passes the post-filter.
resp = client.search(
    index="my-image-index",
    size=10,
    query={
        "bool": {
            "must": {
                "knn": {
                    "field": "image-vector",
                    "query_vector": [-5, 9, -12],
                    "k": 3
                }
            },
            "filter": {
                "term": {
                    "file-type": "png"
                }
            }
        }
    },
)
print(resp)

const response = await client.search({
  index: "my-image-index",
  size: 10,
  query: {
    bool: {
      must: {
        knn: {
          field: "image-vector",
          query_vector: [-5, 9, -12],
          k: 3,
        },
      },
      filter: {
        term: {
          "file-type": "png",
        },
      },
    },
  },
});
console.log(response);

POST my-image-index/_search
{
  "size": 10,
  "query": {
    "bool": {
      "must": {
        "knn": {
          "field": "image-vector",
          "query_vector": [-5, 9, -12],
          "k": 3
        }
      },
      "filter": {
        "term": { "file-type": "png" }
      }
    }
  }
}
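The difference between the two filtering modes can be illustrated offline with the three example documents. The sketch below is an assumption-laden simulation: it uses exact brute-force distances in place of the approximate search, and the helper names nearest, pre_filtered, and post_filtered are hypothetical. It shows how a post-filter can return fewer than k hits even though a matching document exists.

```python
# Simulate pre- vs post-filtering on the example data with exact distances.
docs = [
    {"_id": "1", "image-vector": [1, 5, -20], "file-type": "jpg"},
    {"_id": "2", "image-vector": [42, 8, -15], "file-type": "png"},
    {"_id": "3", "image-vector": [15, 11, 23], "file-type": "jpg"},
]
query_vector = [-5, 9, -12]


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(candidates, k):
    """Return the k candidates closest to query_vector."""
    by_distance = sorted(
        candidates, key=lambda d: squared_l2(d["image-vector"], query_vector)
    )
    return by_distance[:k]


def is_png(doc):
    return doc["file-type"] == "png"


def pre_filtered(k):
    # Filter first, then pick the k nearest among the matching documents.
    return [d["_id"] for d in nearest([d for d in docs if is_png(d)], k)]


def post_filtered(k):
    # Pick the k nearest first, then drop those that fail the filter.
    return [d["_id"] for d in nearest(docs, k) if is_png(d)]


print(post_filtered(3))  # ['2']: all three docs are candidates, one is png
print(post_filtered(2))  # []: the two nearest docs are both jpg
print(pre_filtered(2))   # ['2']: the png doc is found despite being farthest
```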
Hybrid search with knn query
The knn query can be used as part of a hybrid search, where it is combined with lexical queries. For example, the query below finds documents with title matching mountain lake, and combines them with the top 10 documents that have the closest image vectors to the query_vector. The combined documents are then scored and the 3 top-scoring documents are returned.
resp = client.search(
    index="my-image-index",
    size=3,
    query={
        "bool": {
            "should": [
                {
                    "match": {
                        "title": {
                            "query": "mountain lake",
                            "boost": 1
                        }
                    }
                },
                {
                    "knn": {
                        "field": "image-vector",
                        "query_vector": [-5, 9, -12],
                        "k": 10,
                        "boost": 2
                    }
                }
            ]
        }
    },
)
print(resp)

const response = await client.search({
  index: "my-image-index",
  size: 3,
  query: {
    bool: {
      should: [
        {
          match: {
            title: {
              query: "mountain lake",
              boost: 1,
            },
          },
        },
        {
          knn: {
            field: "image-vector",
            query_vector: [-5, 9, -12],
            k: 10,
            boost: 2,
          },
        },
      ],
    },
  },
});
console.log(response);

POST my-image-index/_search
{
  "size": 3,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "mountain lake",
              "boost": 1
            }
          }
        },
        {
          "knn": {
            "field": "image-vector",
            "query_vector": [-5, 9, -12],
            "k": 10,
            "boost": 2
          }
        }
      ]
    }
  }
}
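As a rough illustration of how the two clauses contribute to the final score, the sketch below adds a boosted lexical score and a boosted vector score per document, which is how a bool query combines matching should clauses. The lexical scores are made-up placeholders (real BM25 scores depend on index statistics and are not reproduced here), and the vector scores use the l2_norm formula 1 / (1 + squared distance) on exact distances rather than the approximate search.

```python
# Hypothetical combination of lexical and vector scores under a bool/should.
vectors = {
    "1": [1, 5, -20],
    "2": [42, 8, -15],
    "3": [15, 11, 23],
}
query_vector = [-5, 9, -12]

# Placeholder lexical scores for the match on title; NOT real BM25 output.
lexical_scores = {"1": 0.5, "2": 0.1, "3": 0.4}

MATCH_BOOST = 1.0  # boost on the match clause in the request
KNN_BOOST = 2.0    # boost on the knn clause in the request


def l2_norm_score(a, b):
    squared = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / (1.0 + squared)


combined = {}
for doc_id, vector in vectors.items():
    lexical = lexical_scores.get(doc_id, 0.0)  # 0 if the clause did not match
    vector_score = l2_norm_score(vector, query_vector)
    # A bool query sums the scores of the should clauses that match.
    combined[doc_id] = MATCH_BOOST * lexical + KNN_BOOST * vector_score

ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)
```

Because l2_norm scores are small for distant vectors, the lexical clause dominates in this toy setup; raising the knn boost shifts the balance.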
Knn query inside a nested query
The knn query can be used inside a nested query. The behaviour here is similar to top-level nested kNN search:
- kNN search over nested dense_vectors diversifies the top results over the top-level document
- filter over the top-level document metadata is supported and acts as a pre-filter
- filter over nested field metadata is not supported
A sample query can look like below:
{
  "query": {
    "nested": {
      "path": "paragraph",
      "query": {
        "knn": {
          "query_vector": [0.45, 45],
          "field": "paragraph.vector",
          "num_candidates": 2
        }
      }
    }
  }
}
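The diversification over top-level documents mentioned above can be pictured as keeping only the best-matching nested vector per parent. The sketch below is a simplified simulation under several assumptions: the paragraph vectors and parent IDs are invented, squared L2 distance stands in for whatever similarity the field is mapped with, and top_parents is a hypothetical helper, not Elasticsearch's implementation.

```python
# Invented nested data: (parent document ID, paragraph vector).
paragraphs = [
    ("a", [0.5, 44.0]),
    ("a", [10.0, 10.0]),
    ("b", [0.45, 45.8]),
    ("b", [0.45, 45.5]),
    ("c", [5.0, 5.0]),
]
query = [0.45, 45.0]


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_parents(items, query_vector, k):
    """Return up to k parent IDs, each ranked by its closest paragraph."""
    best = {}
    for parent_id, vector in items:
        distance = squared_l2(vector, query_vector)
        if parent_id not in best or distance < best[parent_id]:
            best[parent_id] = distance
    return sorted(best, key=best.get)[:k]


# Without diversification, both of the closest paragraphs belong to parent "b".
closest_paragraphs = sorted(paragraphs, key=lambda pv: squared_l2(pv[1], query))[:2]
undiversified = [parent_id for parent_id, _ in closest_paragraphs]

print(undiversified)                      # parents of the 2 closest paragraphs
print(top_parents(paragraphs, query, 2))  # 2 distinct parents
```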
Knn query on a semantic_text field
Elasticsearch supports knn query over a semantic_text field.
Here is an example using the query_vector_builder:
{
  "query": {
    "knn": {
      "field": "inference_field",
      "k": 10,
      "num_candidates": 100,
      "query_vector_builder": {
        "text_embedding": {
          "model_text": "test"
        }
      }
    }
  }
}
Note that for semantic_text fields, the model_id does not have to be provided, as it can be inferred from the semantic_text field mapping.
kNN search using query vectors over semantic_text fields is also supported, with no change to the API.
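The two forms described above differ only in how the query vector is supplied. The following sketch builds either request body from Python; knn_on_semantic_text is a hypothetical helper, and the example vector values and their length are placeholders that would have to match the dimensions produced by the field's inference model.

```python
def knn_on_semantic_text(field, k, num_candidates, text=None, vector=None):
    """Build a search body for a knn query on a semantic_text field.

    Pass `text` to let the field's inference model embed the query, or
    `vector` to supply a precomputed query vector. Hypothetical helper.
    """
    if (text is None) == (vector is None):
        raise ValueError("provide exactly one of text or vector")
    knn = {"field": field, "k": k, "num_candidates": num_candidates}
    if text is not None:
        # model_id is omitted: it can be inferred from the field mapping.
        knn["query_vector_builder"] = {"text_embedding": {"model_text": text}}
    else:
        knn["query_vector"] = list(vector)
    return {"query": {"knn": knn}}


by_text = knn_on_semantic_text("inference_field", 10, 100, text="test")
# Placeholder vector; its length must match the model's output dimensions.
by_vector = knn_on_semantic_text("inference_field", 10, 100, vector=[0.1, 0.2, 0.3])
print(by_text)
print(by_vector)
```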
Knn query with aggregations
The knn query calculates aggregations on the top k documents from each shard. Thus, the final results from aggregations contain k * number_of_shards documents. This is different from the top-level knn section, where aggregations are calculated on the global top k nearest documents.
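The effect on aggregation counts can be sketched with a toy two-shard layout. All document IDs, scores, and shard assignments below are invented for illustration, and a simple Counter over file-type stands in for a terms aggregation.

```python
from collections import Counter

# Invented per-shard hits: (doc ID, similarity score, file-type).
shards = {
    "shard-0": [("a", 0.9, "jpg"), ("b", 0.8, "png"), ("c", 0.1, "jpg")],
    "shard-1": [("d", 0.7, "jpg"), ("e", 0.6, "png"), ("f", 0.5, "png")],
}
k = 2


def top_k(hits, limit):
    """Return the `limit` highest-scoring hits."""
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:limit]


# knn query: each shard contributes its own top k documents to aggregations.
per_shard_docs = [hit for hits in shards.values() for hit in top_k(hits, k)]
knn_query_agg = Counter(file_type for _, _, file_type in per_shard_docs)

# Top-level knn section: aggregations see only the global top k documents.
all_hits = [hit for hits in shards.values() for hit in hits]
top_level_agg = Counter(file_type for _, _, file_type in top_k(all_hits, k))

print(sum(knn_query_agg.values()), dict(knn_query_agg))   # k * number_of_shards docs
print(sum(top_level_agg.values()), dict(top_level_agg))   # k docs
```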