
Vector search filtering: Keep it relevant

Performing vector search to find the most similar results to a query is not enough. Filtering is often needed to narrow down search results. This article explains how filtering works for vector search in Elasticsearch and Apache Lucene.


Vector search is not enough to find relevant results. It's very common to use filtering criteria that will help narrow down search results and filter out irrelevant results.

Understanding how filtering works in vector search will help you balance the performance and recall tradeoffs, and discover some of the optimizations used to keep vector search performant when filtering is applied.

Why filtering?

Vector search has revolutionized how we find relevant information in large datasets, allowing us to discover items that are semantically similar to a query.

However, simply finding similar items isn't enough. We often need to narrow down the search results based on specific criteria or attributes.

Imagine you're searching for a product in an e-commerce store. A pure vector search might show you visually similar items, but you might also want to filter by price range, brand, availability, or customer ratings. Without filtering, you'd be presented with a vast array of similar products, making it difficult to find exactly what you're looking for.

Filtering enables precise control over the search results, ensuring that the retrieved items not only align semantically but also meet all the necessary requirements. This leads to a much more accurate, efficient, and user-friendly search experience.

This is where Elasticsearch and Apache Lucene excel—effective filtering across a variety of data types is one of the key differences from other vector databases.

Filtering in exact vector search

There are two main ways of performing exact vector search:

  • Using a flat index type for your dense_vector field. This makes knn searches use exact search instead of approximate.
  • Using a script_score query that uses vector functions to calculate the score. This can be used with any index type.
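As a rough sketch of these two options (index name, field name, dimensions, and filter values are made up for illustration), the flat index type is set in the mapping of the dense_vector field, and a script_score query can run an exact search over any index type:

```
PUT my-index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 3,
        "index_options": { "type": "flat" }
      }
    }
  }
}

POST my-index/_search
{
  "query": {
    "script_score": {
      "query": { "term": { "brand": "acme" } },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'my_vector') + 1.0",
        "params": { "query_vector": [0.4, 0.5, 0.6] }
      }
    }
  }
}
```

In the script_score case, the inner query acts as the filter: only the documents that match it have their vectors scored.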

When executing exact vector search, all vectors are compared to the query. In this scenario, filtering will help performance as only the vectors that pass the filter need to be compared.

This does not impact result quality, as all the vectors that pass the filter are still compared to the query. We're simply discarding the irrelevant documents in advance, so we can reduce the number of comparisons.

This is very important, as it can be more performant to execute an exact search instead of an approximate search when the applied filters result in a small number of documents.

The rule of thumb is to use exact search when fewer than 10k documents pass the filter. BBQ indices are much faster at vector comparisons, so for BBQ-based indices it makes sense to use exact search when fewer than 100k documents pass the filter. Check out this blog post for more details.

If your filters are always very restrictive, you may consider optimizing the index for exact search instead of approximate search by using a flat index type instead of an HNSW-based one. For more details, see the properties of index_options.

Filtering in approximate vector search

When executing approximate vector search, we trade result accuracy for performance. Vector search data structures like HNSW efficiently search for approximate nearest neighbors over millions of vectors. They focus on retrieving the most similar vectors while doing the fewest vector comparisons possible, as those comparisons are expensive to calculate.

The attributes used for filtering, however, are not part of the vector data. Different data types have their own indexing structures that make them efficient to find and filter, such as terms dictionaries, postings lists, and doc values.

Given that these data structures are separate from the vector search mechanism, how do we apply filtering to vector search? There are two options: applying filters after vector search (postfiltering) or before vector search (prefiltering).

Each of those options has pros and cons. Let's dive deeper into them!

Postfiltering

Postfiltering applies filters after vector search has been done. This means that filters are applied after the top k most similar vector results have been found.

We can end up with fewer than k results after applying the filters. We could, of course, retrieve more results from the vector search (a higher k value), but we still wouldn't be guaranteed to get k or more results after applying the filters.

Postfiltering’s advantage is that it doesn't change the runtime behaviour of vector search—vector search is unaware of filtering. But, it does change the final number of results retrieved.

The following is an example of postfiltering using the knn query. Note that the filtering clause is separate from the knn query:
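A hypothetical example (index, field, vector, and filter values are made up): the filter clause sits in the outer bool query, so it is applied to the top k results that the knn query retrieves:

```
POST products/_search
{
  "query": {
    "bool": {
      "must": {
        "knn": {
          "field": "image_vector",
          "query_vector": [0.1, 0.2, 0.3],
          "k": 10
        }
      },
      "filter": {
        "term": { "brand": "acme" }
      }
    }
  }
}
```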

Postfiltering is also available for the knn search using post-filter:
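A sketch of this variant (index, field, vector, and filter values are made up), with the filter in an explicit post_filter section alongside the top-level knn search:

```
POST products/_search
{
  "knn": {
    "field": "image_vector",
    "query_vector": [0.1, 0.2, 0.3],
    "k": 10,
    "num_candidates": 100
  },
  "post_filter": {
    "term": { "brand": "acme" }
  }
}
```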

Keep in mind that you need to use an explicit post-filter section with the knn search. Without it, the knn search combines its nearest-neighbor results with the other queries or filters instead of applying them as a post-filter.

Prefiltering

Applying filters before the vector search will first retrieve the documents that satisfy the filters, and then pass down that information to the vector search.

Lucene uses BitSets to efficiently store the documents that satisfy the filter condition. Vector search then traverses the HNSW graph, taking into account the documents that satisfy the condition. Before adding a candidate to the results, it checks that it's contained in the BitSet of valid documents.

However, the candidate must be explored and compared to the query, even if it's not a valid document. The effectiveness of HNSW relies on the connection between the vectors in the graph—if we stopped exploring a candidate, it would mean that we could be skipping its neighbors as well.

Think of it as driving to get to a gas station. If you discard any roads that don't have a gas station on them, it's improbable that you'll get to your destination. Other roads may not be what you need, but they connect you to your destination. Same with vectors on an HNSW graph!

So it follows that applying prefiltering is less performant than not applying filters. We need to do the work on all the vectors we visit in our search, and we need to throw away the ones that don't match the filter. We're doing more work and taking more time to get our top k results.

The following is an example of prefiltering in the Elasticsearch Query DSL. Note that the filtering clause is now part of the knn section:
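For illustration (index, field, vector, and filter values are made up), the filter now lives inside the top-level knn section, so it is applied while the graph is traversed:

```
POST products/_search
{
  "knn": {
    "field": "image_vector",
    "query_vector": [0.1, 0.2, 0.3],
    "k": 10,
    "num_candidates": 100,
    "filter": {
      "term": { "brand": "acme" }
    }
  }
}
```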

Prefiltering is available for both knn search and knn query:
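With the knn query, the same hypothetical prefilter (index, field, and values made up) goes in the query's own filter parameter:

```
POST products/_search
{
  "query": {
    "knn": {
      "field": "image_vector",
      "query_vector": [0.1, 0.2, 0.3],
      "k": 10,
      "filter": {
        "term": { "brand": "acme" }
      }
    }
  }
}
```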

Prefiltering optimizations

There are a couple of optimizations that we can apply to ensure prefiltering is performant.

We can switch to exact search if the filter is very restrictive. When there are few vectors to compare, it's faster to perform an exact search on the few documents that satisfy the filter.

This is an optimization that is applied automatically in Lucene and Elasticsearch.

Another optimization avoids scoring the vectors that do not satisfy the filter. Instead of comparing them to the query, it explores their neighbors, looking for vectors that do pass the filter. This effectively reduces the number of comparisons, as the filtered-out vectors are not scored, while the search keeps exploring vectors connected to the current path.

This algorithm is ACORN-1, and the process is described in detail in this blog post.

Filtering using Document Level Security

Document Level Security (DLS) is an Elasticsearch feature that specifies the documents that user roles can retrieve.

DLS is performed using queries. A role can have a query associated with indices, which effectively limits the documents that a user who belongs to that role can retrieve from the indices.
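As a sketch, a role with such a query might be defined like this (role name, index, and query values are hypothetical). Users with this role only ever see documents matching the query:

```
POST /_security/role/products_reader
{
  "indices": [
    {
      "names": ["products"],
      "privileges": ["read"],
      "query": {
        "term": { "brand": "acme" }
      }
    }
  ]
}
```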

The role query is used as a filter to retrieve the documents that match it, and the result is cached as a BitSet. This BitSet is then used to wrap the underlying Lucene reader, so only the documents returned by the query are considered live—that is, existing in the index and not deleted.

As the live documents are retrieved from the reader to perform the knn query, only the documents available to the user will be considered. If there is a prefilter, the DLS documents will be added to it.

This means that DLS filtering works as a prefilter for approximate vector search, with the same performance implications and optimizations.

DLS with exact search will have the same benefits as applying any filter—the fewer documents that are retrieved from DLS, the more performant an exact search will be. Consider as well the number of documents returned by DLS—if DLS roles are very restrictive, you may consider using exact search instead of approximate search.

Benchmarking

At Elastic, we want to make sure that vector search filtering is efficient. We have a specific benchmark for vector filtering that performs approximate vector searches with different filters to ensure that vector search keeps retrieving relevant results as fast as possible.

Check out the improvements when ACORN-1 was introduced: for tests where only 2% of the vectors pass the filter, query latency was reduced to 55% of its original duration.

Conclusion

Filtering is an integral part of search. Ensuring that filtering is performant in vector search, and understanding the tradeoffs and optimizations, is what makes or breaks an efficient and accurate search.

Filtering impacts performance for vector search:

  • Exact search is faster when using filtering. You should consider using exact search instead of approximate search if your filtering is restrictive enough. This is an automatic optimization in Elasticsearch.
  • Approximate search is slower when using prefiltering. Prefiltering allows us to get the top k results that match the filter, at the cost of slower search.
  • Postfiltering does not necessarily retrieve the top k results, as some of them may be removed when the filter is applied.

Happy filtering!
