In this article, we are going to explore how to use the LlamaIndex RankGPT Reranker and the built-in Elasticsearch semantic reranker. Elastic provides an out-of-the-box experience to deploy and use rerankers as part of the retriever pipeline in a scalable way, without additional effort.
Originally, reranking in Elasticsearch required multiple steps, but now it's integrated directly into the retriever pipeline: the first stage runs the search query, and the second stage reranks the results, as shown in the image below:

What is reranking?
Reranking is the process of using a more expensive, more accurate mechanism to push the most relevant documents to the top of the results, after first retrieving a set of candidate documents for the user query.
There are many strategies to rerank documents using specialized cross-encoder models, like the Elastic Rerank model or a cross-encoder trained on MS MARCO (cross-encoder/ms-marco-MiniLM-L6-v2). Other approaches involve using an LLM for reranking. One of the advantages of the Elastic Rerank model is that it can be used as part of a semantic search pipeline or as a standalone tool to improve existing BM25 scoring systems.
A reranker needs a list of candidates and a user query to reorganize the candidates from most to least relevant based on the user query.
In this article, we will explore the LlamaIndex RankGPT Reranker, an implementation of the RankGPT approach, and the Elasticsearch semantic reranker, which uses the Elastic Rerank model.
The complete example is available in this notebook.
Steps
Products index
Let’s create a reranker for laptops based on a user’s question. If a user is a hardcore gamer, they should get the most powerful machines. If they are a student, they might be okay with the lighter ones.
Let’s start with creating some documents in our Notebook:
User question
Let's define the question we are going to use to rerank the results.
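The query used throughout the article is the gaming criteria applied during reranking:

```python
# The user question used for retrieval and reranking in the rest of the article.
question = "Best laptops for gaming"
```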
LlamaIndex reranking
Install dependencies and import packages
We install all the dependencies needed to run the LlamaIndex RankGPT reranker and Elasticsearch for document retrieval. Then, we load the laptops into an ElasticsearchStore, which is the LlamaIndex abstraction for the Elasticsearch vector database, and retrieve them using the VectorStoreIndex class.
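The exact package list from the notebook isn't shown here; these are the packages a setup like this typically needs (names are assumptions, pin versions as your environment requires):

```shell
# Assumed package names for the LlamaIndex + Elasticsearch stack.
pip install llama-index llama-index-vector-stores-elasticsearch \
    llama-index-llms-openai elasticsearch
```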
Setup keys
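A sketch reading credentials from environment variables; the variable names are assumptions, and the notebook may prompt for them with getpass instead:

```python
import os

# Assumed variable names; adapt to how your credentials are provisioned.
ELASTICSEARCH_ENDPOINT = os.environ.get("ELASTICSEARCH_ENDPOINT", "")
ELASTICSEARCH_API_KEY = os.environ.get("ELASTICSEARCH_API_KEY", "")
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")
```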
Elasticsearch client
We instantiate the Elasticsearch client to index documents and run queries against our cluster.
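A minimal sketch using the official `elasticsearch` Python client; the client variable name and placeholder credentials are assumptions, and a live cluster is required:

```python
from elasticsearch import Elasticsearch

# Connect to the cluster; replace the placeholders with your endpoint and key.
es_client = Elasticsearch(
    hosts=["<your-elasticsearch-endpoint>"],
    api_key="<your-api-key>",
)
es_client.info()  # sanity check that the cluster is reachable
```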
Mappings
We are going to use regular text fields for full-text search and also create a semantic_field with a copy of all the content so we can run semantic and hybrid queries. In Elasticsearch 8.18+, an inference endpoint will be deployed automatically.
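A sketch of such a mapping, with the text fields copied into a `semantic_text` field via `copy_to`; the index name and the exact field list are assumptions:

```python
# Mapping sketch: lexical fields for full-text search, plus a semantic_text
# field that receives a copy of the searchable content.
mappings = {
    "properties": {
        "name": {"type": "text", "copy_to": "semantic_field"},
        "features": {"type": "text", "copy_to": "semantic_field"},
        "price": {"type": "float"},
        "rating": {"type": "float"},
        "semantic_field": {"type": "semantic_text"},
    }
}
# es_client.indices.create(index="products-laptops", mappings=mappings)  # hypothetical index name
```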
Indexing data to LlamaIndex
Create an ElasticsearchStore from the array of products we defined. This will create an Elasticsearch vector store that we can consume later using VectorStoreIndex.
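A sketch of the indexing step, assuming the `laptops` product list defined earlier in the notebook; the index name and credential placeholders are assumptions:

```python
from llama_index.core import Document, StorageContext, VectorStoreIndex
from llama_index.vector_stores.elasticsearch import ElasticsearchStore

# Wrap each product as a LlamaIndex Document (assumes a `laptops` list of dicts).
documents = [
    Document(
        text=f"{l['name']} ({l['rating']}): {', '.join(l['features'])}",
        metadata=l,
    )
    for l in laptops
]

# Elasticsearch-backed vector store; endpoint and key are placeholders.
vector_store = ElasticsearchStore(
    index_name="laptops-llamaindex",
    es_url="<your-elasticsearch-endpoint>",
    es_api_key="<your-api-key>",
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```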
LLM setup
Define the LLM that will work as a reranker:
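A sketch using an OpenAI model through LlamaIndex; the specific model name is an assumption:

```python
from llama_index.llms.openai import OpenAI

# LLM that RankGPT will use to rerank candidates; model choice is an assumption.
llm = OpenAI(model="gpt-4o-mini", temperature=0)
```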
Rerank feature
We now create a function that runs a retriever to get the documents most similar to the user question from the vector index, applies RankGPTRerank on top, and finally returns the reordered documents.
We also create a function to format the resulting documents.
Without rerank
We first run the request without reranking.
Answer:
With rerank
Now we enable reranking, which will execute the same vector search and then rerank the results using an LLM by applying the "Best laptops for gaming" criteria to the top 5 results. We can see subtle differences, like the Intel Core i7 processor being pushed to the bottom and the Alienware m18 being promoted to position 2.
Answer:
Elasticsearch semantic reranking
Inference rerank endpoint
Create an inference endpoint that we can call standalone to rerank a list of candidates based on a query, or use as part of a retriever:
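A sketch of creating a rerank inference endpoint backed by the Elastic Rerank model; the endpoint id and allocation settings are assumptions, and `es_client` is assumed to be the client instantiated earlier:

```python
# Create a rerank inference endpoint using the built-in Elastic Rerank model.
# The endpoint id "my-rerank-endpoint" is a hypothetical name.
es_client.inference.put(
    task_type="rerank",
    inference_id="my-rerank-endpoint",
    inference_config={
        "service": "elasticsearch",
        "service_settings": {
            "model_id": ".rerank-v1",
            "num_allocations": 1,
            "num_threads": 1,
        },
    },
)
```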
We define a function to execute search queries and then parse the hits back.
As with LlamaIndex, we create a function to format the resulting documents.
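A sketch of the two helpers; the function names and document fields are assumptions based on the results shown below:

```python
def search(es_client, index_name, query, size=5):
    # Run the query and return the hit sources in ranked order.
    response = es_client.search(index=index_name, query=query, size=size)
    return [hit["_source"] for hit in response["hits"]["hits"]]


def format_results(results):
    # Human-readable list: "1. <price> - <name> (<rating>) <features>"
    return "\n".join(
        f"{i}. {doc['price']} - {doc['name']} ({doc['rating']}) {doc['features']}"
        for i, doc in enumerate(results, start=1)
    )
```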
Semantic query
We will start with a semantic query to return the most similar results to the user’s question.
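A sketch of the semantic query, targeting the semantic_field from the mapping; the index name and helper call are hypothetical:

```python
# Semantic query against the semantic_text field.
semantic_query = {
    "semantic": {
        "field": "semantic_field",
        "query": "Best laptops for gaming",
    }
}
# results = search(es_client, "products-laptops", semantic_query)  # hypothetical helper and index name
```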
Query results:
1. 1399.99 - Acer Predator Helios 300 (4.5) ['Intel Core i7', 'RTX 3060', '16GB RAM', '512GB SSD', '144Hz Display']
2. 2999.99 - Alienware m18 (4.8) ['Intel Core i9', 'RTX 4090', '32GB RAM', '2TB SSD', '480Hz Display']
3. 2799.99 - MSI Stealth 17 (4.8) ['Intel Core i9', 'RTX 4080', '32GB RAM', '1TB SSD', '4K Display']
4. 1999.99 - Gigabyte AORUS 17 (4.6) ['Intel Core i9', 'RTX 4070', '16GB RAM', '1TB SSD', '360Hz Display']
5. 1599.99 - HP Omen 16 (4.4) ['AMD Ryzen 7', 'RTX 3060', '16GB RAM', '512GB SSD', '165Hz Display']
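To produce the reranked results, the same semantic query can be wrapped in a `text_similarity_reranker` retriever that calls the rerank endpoint; a sketch, where the inference endpoint id and field choices are assumptions:

```python
# Two-stage retriever: semantic retrieval, then reranking via the
# inference endpoint (the id "my-rerank-endpoint" is hypothetical).
retriever_query = {
    "text_similarity_reranker": {
        "retriever": {
            "standard": {
                "query": {
                    "semantic": {
                        "field": "semantic_field",
                        "query": "Best laptops for gaming",
                    }
                }
            }
        },
        "field": "semantic_field",
        "inference_id": "my-rerank-endpoint",
        "inference_text": "Best laptops for gaming",
        "rank_window_size": 5,
    }
}
# response = es_client.search(index="products-laptops", retriever=retriever_query)  # hypothetical
```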
In the following table, we can see a position comparison across the different tests:
Laptop model | Llama (no rerank) | Llama (with rerank) | Elastic (no rerank) | Elastic (with rerank) |
---|---|---|---|---|
Razer Blade 15 | 1 | 5 | - | - |
ASUS ROG Strix G16 | 2 | 1 | - | - |
Gigabyte AORUS 17 | 3 | 4 | 5 | 4 |
MSI Stealth 17 | 4 | 3 | 2 | 3 |
Alienware m18 | 5 | 2 | 1 | 2 |
HP Omen 16 | - | - | 3 | 5 |
Acer Predator Helios 300 | - | - | 4 | 1 |
Legend: A dash (-) indicates the item did not appear in the top 5 for that method.
The Elastic semantic reranker maintains consistency by keeping high-end laptops, like the Alienware m18 and MSI Stealth 17, in the top positions, just like LlamaIndex reranking, while achieving a better quality-price balance.
Conclusion
Rerankers are a powerful tool to increase the quality of our search systems and ensure we always retrieve the most important information for each user’s question.
LlamaIndex offers a variety of reranking strategies using either specialized models or LLMs. In their simplest implementation, you can create an in-memory vector store and store your documents locally, then retrieve and rerank, or use Elasticsearch as the vector store for persistence.
Elasticsearch, on the other hand, provides an out-of-the-box inference endpoint framework where you can use rerankers as part of the retrieval pipeline or as a standalone endpoint. You can also choose from many providers, like Elastic itself, Cohere, Jina, or Alibaba, or deploy any compatible third-party model. In the simplest Elasticsearch implementation, both your documents and your reranking model live on your Elasticsearch cluster, allowing you to scale.