In this short blog, I’ll show you how to use a model from Hugging Face to perform semantic reranking in your own Elasticsearch cluster at search time. We will download the model using Eland, load a dataset from Hugging Face, and perform sample queries using retrievers, all in a Jupyter notebook.
Overview
If you are unfamiliar with semantic reranking, check out these resources. They cover:
- What it is
- Why you would want to use it
- How to create an inference API and connect it to an external service
- How to use a retriever query for re-ranking
Please review the following links:
- What is semantic reranking and how to use it?
  - Learn about the trade-offs of using semantic reranking in search and RAG pipelines.
- Semantic reranking in Elasticsearch with retrievers
  - This blog includes a video presentation and an overview of everything you need to get started.
- Elastic Docs - Semantic re-ranking
  - This excellent doc guide covers use cases, encoder model types, and re-ranking in Elasticsearch.
The code in this blog and the accompanying notebook will get you started, but we aren’t going to go in-depth on the what and why. Note that while I’ll show code snippets below, the best way to follow along is to work through the accompanying notebook.
Step zero
I will assume you have an Elasticsearch cluster or serverless project to use for this guide. If not, head on over to cloud.elastic.co and sign up for a free trial! You’ll need your Cloud ID and an Elasticsearch API key.
I’ll wait...
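With those in hand, you can create a Python client to reuse throughout the notebook. A minimal sketch with placeholder credentials (the same `cloud_id` and `api_key` are reused by the Eland import below):

```python
from elasticsearch import Elasticsearch

# Placeholders -- substitute your own Cloud ID and API key
cloud_id = "my_super_cloud_id"
api_key = "my_super_secret_api_key!"

es = Elasticsearch(cloud_id=cloud_id, api_key=api_key)
print(es.info())  # quick connectivity check
```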
Model selection
The first (real) step is choosing a model to use for re-ranking. A deep discussion of selecting a model and evaluating results is outside the scope of this blog. Know that, for now, Elasticsearch only supports cross-encoder models.
While not directly covering model selection, the following blogs give a good overview of evaluating search relevance.
For this guide, we are going to use cross-encoder/ms-marco-MiniLM-L-6-v2. This model was trained on the MS MARCO dataset for retrieval and re-ranking.
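If you want a feel for what a cross-encoder actually does before loading it into Elasticsearch, you can score query/passage pairs locally with the sentence-transformers library. This isn’t required for the rest of the guide, and the passages below are made up purely for illustration:

```python
from sentence_transformers import CrossEncoder

# A cross-encoder scores a (query, passage) pair jointly, rather than
# embedding each side independently the way a bi-encoder does.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

pairs = [
    ("sparse vector embedding", "We study sparse vector representations for text retrieval."),
    ("sparse vector embedding", "A survey of convolutional networks for image classification."),
]
print(model.predict(pairs))  # higher score = more relevant to the query
```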
Model loading
To load an NLP model from Hugging Face into Elasticsearch, you will use the Eland Python Library.
Eland is Elastic's Python library for data frame analytics and loading supervised and NLP models into Elasticsearch. It offers a familiar Pandas-compatible API.
The code below is from the notebook section "Hugging Face Reranking Model."
```python
model_id = "cross-encoder/ms-marco-MiniLM-L-6-v2"
cloud_id = "my_super_cloud_id"
api_key = "my_super_secret_api_key!"

# Download the model from Hugging Face and import it into Elasticsearch
!eland_import_hub_model \
    --cloud-id $cloud_id \
    --es-api-key $api_key \
    --hub-model-id $model_id \
    --task-type text_similarity
```
Eland doesn’t have a specific `rerank` task type, so we use the `text_similarity` task type to load the model.
This step will download the model locally where your code is running, split it apart, and load it into your Elasticsearch cluster.
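Note that Eland normalizes the Hugging Face model ID when storing it (slashes become double underscores and the name is lowercased). Once the import finishes, a quick way to confirm the model landed in your cluster, assuming the `es` client from earlier:

```python
# The imported model is stored under a normalized ID
model_id_es = "cross-encoder__ms-marco-minilm-l-6-v2"

resp = es.ml.get_trained_models(model_id=model_id_es)
print(resp["trained_model_configs"][0]["model_id"])
```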
Cut to
In the notebook, you can follow along to set up your cluster to run the re-ranking query in the next section. The setup steps after downloading the model are:
- Create an inference endpoint with the `rerank` task type
  - This will also deploy our re-ranking model on Elasticsearch machine learning nodes
- Create an index mapping
- Download a dataset from Hugging Face - CShorten/ML-ArXiv-Papers
- Index the data into Elasticsearch
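Here is a condensed sketch of those steps. The endpoint name `semantic-reranking` matches what the query in the next section expects, but the mapping and bulk-indexing code are simplified assumptions, and the exact inference API surface depends on your elasticsearch-py version, so treat the notebook as the authoritative version:

```python
from datasets import load_dataset
from elasticsearch.helpers import bulk

# 1. Create an inference endpoint with the rerank task type. This also
#    deploys the imported model onto ML nodes.
es.inference.put(
    task_type="rerank",
    inference_id="semantic-reranking",
    inference_config={
        "service": "elasticsearch",
        "service_settings": {
            "model_id": "cross-encoder__ms-marco-minilm-l-6-v2",
            "num_allocations": 1,
            "num_threads": 1,
        },
    },
)

# 2. Create the index with a simple lexical mapping (an assumption;
#    see the notebook for the real mapping).
es.indices.create(
    index="arxiv-papers-lexical",
    mappings={
        "properties": {
            "title": {"type": "text"},
            "abstract": {"type": "text"},
        }
    },
)

# 3. Download the dataset from Hugging Face and bulk index it.
ds = load_dataset("CShorten/ML-ArXiv-Papers", split="train")
bulk(
    es,
    (
        {
            "_index": "arxiv-papers-lexical",
            "_source": {"title": row["title"], "abstract": row["abstract"]},
        }
        for row in ds
    ),
)
```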
Re-rank time!
With everything set up, we can query using the text_similarity_reranker retriever. The text similarity reranker is a two-stage reranker. This means that the specified retrievers are run first, and then those results are passed to the second re-ranking stage.
Example from the notebook:
```python
query = "sparse vector embedding"

# Query with semantic reranker
response_reranked = es.search(
    index="arxiv-papers-lexical",
    body={
        "size": 10,
        "retriever": {
            "text_similarity_reranker": {
                "retriever": {
                    "standard": {
                        "query": {
                            "match": {
                                "title": query
                            }
                        }
                    }
                },
                "field": "abstract",
                "inference_id": "semantic-reranking",
                "inference_text": query,
                "rank_window_size": 100
            }
        },
        "fields": [
            "title",
            "abstract"
        ],
        "_source": False
    }
)
```
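Since we request `fields` and set `"_source": False`, the titles and abstracts come back as lists under each hit’s `fields` object. A quick way to print the reranked top 10:

```python
for i, hit in enumerate(response_reranked["hits"]["hits"]):
    # With "_source": False, requested fields are returned as lists
    print(i, round(hit["_score"], 3), hit["fields"]["title"][0])
```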
The parameters for the text_similarity_reranker above are:
- `retriever` - Here, we do a simple match query with a standard retriever for lexical first-stage retrieval. You can also use a knn retriever or an rrf retriever here.
- `field` - The field from the first-stage results the re-ranking model will use for similarity comparisons.
- `inference_id` - The ID of the inference endpoint to use for re-ranking. Here, we are using the model we loaded earlier.
- `inference_text` - The string to use for the similarity ranking.
- `rank_window_size` - The number of top documents from the first stage the model will consider.
You may wonder why `rank_window_size` is set to 100, even though you might ultimately want only the top 10 results.
In a two-stage search setup, the initial lexical search provides a broad set of documents for the semantic re-ranker to evaluate. Returning a larger set of 100 results increases the chances that relevant documents are available for the semantic re-ranker to identify and reorder based on semantic content, not just lexical matches. This approach compensates for the lexical search's limitations in capturing nuanced meaning, allowing the semantic model to sift through a broader range of possibilities.
However, finding the right `rank_window_size` is a balance. While a larger candidate set improves accuracy, it may also increase resource demands, so some tuning is necessary to achieve an optimal trade-off between recall and resources.
Comparison
While I’m not going to provide an in-depth analysis of the results in this short guide, it is worth comparing the top 10 results from a standard lexical match query with the results from the re-ranked query above.
This dataset contains a subset of ArXiv papers about Machine Learning. The results listed are the titles of the papers.
- The “Scored Results” column shows the top 10 results using a standard retriever.
- The “Reranked Results” column shows the top 10 results after re-ranking.
| | Scored Results | Reranked Results |
|---|---|---|
| 0 | Compact Speaker Embedding: lrx-vector | Scaling Up Sparse Support Vector Machines by Simultaneous Feature and Sample Reduction |
| 1 | Quantum Sparse Support Vector Machines | Spaceland Embedding of Sparse Stochastic Graphs |
| 2 | Sparse Support Vector Infinite Push | Elliptical Ordinal Embedding |
| 3 | The Sparse Vector Technique, Revisited | Minimum-Distortion Embedding |
| 4 | L-Vector: Neural Label Embedding for Domain Adaptation | Free Gap Information from the Differentially Private Sparse Vector and Noisy Max Mechanisms |
| 5 | Spaceland Embedding of Sparse Stochastic Graphs | Interpolated Discretized Embedding of Single Vectors and Vector Pairs for Classification, Metric Learning and Distance Approximation |
| 6 | Sparse Signal Recovery in the Presence of Intra-Vector and Inter-Vector Correlation | Attention Word Embedding |
| 7 | Stable Sparse Subspace Embedding for Dimensionality Reduction | Binary Speaker Embedding |
| 8 | Auto-weighted Mutli-view Sparse Reconstructive Embedding | NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization |
| 9 | Embedding Words in Non-Vector Space with Unsupervised Graph Learning | Estimating Vector Fields on Manifolds and the Embedding of Directed Graphs |
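To reproduce a comparison like this yourself, you can run the same match query without the reranking retriever and line the titles up side by side. A sketch (the `titles` helper is just for display):

```python
# Plain lexical query -- no reranking stage
response_lexical = es.search(
    index="arxiv-papers-lexical",
    size=10,
    query={"match": {"title": query}},
    fields=["title"],
    source=False,
)

def titles(resp):
    return [hit["fields"]["title"][0] for hit in resp["hits"]["hits"]]

for scored, reranked in zip(titles(response_lexical), titles(response_reranked)):
    print(f"{scored[:50]:50} | {reranked[:50]}")
```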
Your turn
Hopefully, you see how easy it is to incorporate a re-ranking model from Hugging Face into Elasticsearch so you can start re-ranking. While this isn’t the only re-ranking option, it can be helpful when you are running air-gapped, don’t have access to an external re-ranking service, want to control costs, or have a model that works particularly well for your dataset.
Try it Now!
You can try the accompanying notebook in a live workshop environment for free!
Click here to head over to the lab now.
If you haven't clicked on one of the many links to the accompanying notebook, now's the time!
Ready to try this out on your own? Start a free trial.
Want to get Elastic certified? Find out when the next Elasticsearch Engineer training is running!