Elasticsearch has native integrations with industry-leading Gen AI tools and providers. Check out our webinars on going Beyond RAG Basics, or building prod-ready apps with the Elastic Vector Database.
To build the best search solutions for your use case, start a free cloud trial or try Elastic on your local machine now.
In December 2024, Elastic announced Elastic Rerank, which brings powerful semantic search capabilities with no reindexing required, delivering high relevance, top performance, and efficiency. The core set of capabilities powered by Elastic is now even more flexible, allowing developers to bring their own models from Cohere, Vertex AI, Hugging Face, Jina AI, and now IBM watsonx.ai. With our open Inference API, you get the control and choice to integrate, test, and optimize reranking for your needs.
Along with support for IBM watsonx™ Slate embedding models, the Elasticsearch vector database powers watsonx Assistant for Conversational Search, now with semantic reranking for even better answer quality.
Reranking refines LLM responses by using advanced scoring methods to prioritize the most relevant documents, ensuring accurate answers in multi-stage retrieval. Because it operates on already-retrieved results, it applies broadly, even to datasets that you don’t want to reindex or remap.
IBM watsonx offers high-quality reranker models that accurately score and prioritize passages based on query relevance, helping refine search results for better precision. These models enhance tasks like semantic search and document comparison, making them essential for delivering highly relevant answers in AI-driven retrieval systems.
In this blog, we’ll explore how to use IBM watsonx™ reranking when building search experiences in the Elasticsearch vector database to reorder search results by meaning, giving you sharper, more context-aware answers without altering your existing index.
How reranking can create powerful search experiences
Semantic reranking is crucial because users expect the best answers at the top, and GenAI models require accurate results to avoid generating incorrect information. Semantic reranking provides consistent scoring, ensuring the most relevant documents are used by AI models and enabling effective cutoff points to prevent hallucinations.
Prerequisites and creating an inference endpoint
Create an Elasticsearch Serverless project.
Elastic Cloud Serverless offers fast query execution and seamless integration with the open Inference API, making it ideal for deploying reranking without infrastructure overhead.
Generate an API key in IBM Cloud
- Go to IBM watsonx.ai Cloud and log in using your credentials. You will land on the welcome page.

- Go to the API keys page.
- Create an API key.
Steps in Elasticsearch
Using Dev Tools in Kibana, create an inference endpoint that uses the watsonxai service for reranking. This example uses the MS Marco MiniLM L-12 v2 model, which is supported by IBM and ensures high relevance in passage retrieval.
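A request along these lines creates the endpoint. The endpoint name `watsonx-rerank-endpoint`, the regional URL, and the placeholder credentials are illustrative; substitute the API key, project ID, and region from your own watsonx.ai account:

```json
PUT _inference/rerank/watsonx-rerank-endpoint
{
  "service": "watsonxai",
  "service_settings": {
    "api_key": "<your-ibm-cloud-api-key>",
    "url": "https://us-south.ml.cloud.ibm.com",
    "project_id": "<your-watsonx-project-id>",
    "model_id": "cross-encoder/ms-marco-minilm-l-12-v2",
    "api_version": "2024-05-02"
  }
}
```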
You will receive the following response on the successful creation of the inference endpoint:
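The response echoes the endpoint configuration and should look roughly like the following (sensitive settings such as the API key are not returned):

```json
{
  "inference_id": "watsonx-rerank-endpoint",
  "task_type": "rerank",
  "service": "watsonxai",
  "service_settings": {
    "url": "https://us-south.ml.cloud.ibm.com",
    "project_id": "<your-watsonx-project-id>",
    "model_id": "cross-encoder/ms-marco-minilm-l-12-v2",
    "api_version": "2024-05-02"
  }
}
```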
Let us now create an index.
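For example, a simple movies index with plain text fields is enough; reranking needs no special mapping (the index name and fields here are illustrative):

```json
PUT movies
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "description": { "type": "text" }
    }
  }
}
```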
Next, insert data into the created index.
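A bulk request can load a few sample documents. These descriptions are illustrative, chosen so that one document matches the query word “lightning” and another matches “feeling”:

```json
POST movies/_bulk
{ "index": { "_id": 1 } }
{ "title": "The Matrix", "description": "A computer hacker discovers reality is a simulation and joins a rebellion against the machines." }
{ "index": { "_id": 2 } }
{ "title": "Star Wars", "description": "Luke has a bad feeling about the Death Star as the rebels prepare their attack." }
{ "index": { "_id": 3 } }
{ "title": "The Avengers", "description": "Thor summons lightning to battle an alien army alongside Earth's mightiest heroes." }
```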
Next, let’s search using a text_similarity_reranker retriever, which enhances search results by reranking documents based on their semantic similarity to a specified inference text, using an ML model.
The retriever helps you configure both the retrieval and reranking of search results in a single API call.
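A request like the following runs a lexical match and then reranks the top hits through the inference endpoint (assuming an inference endpoint named `watsonx-rerank-endpoint` and a `movies` index with a `description` field; adjust the names to your setup):

```json
GET movies/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": {
            "match": { "description": "feeling lightning" }
          }
        }
      },
      "field": "description",
      "inference_id": "watsonx-rerank-endpoint",
      "inference_text": "feeling lightning",
      "rank_window_size": 10
    }
  }
}
```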
Next, let’s verify the returned result.
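A trimmed response might look like this; the `_score` values are illustrative, since actual scores depend on the reranker model:

```json
{
  "hits": {
    "hits": [
      {
        "_id": "3",
        "_score": 0.92,
        "_source": { "title": "The Avengers", "description": "..." }
      },
      {
        "_id": "2",
        "_score": 0.15,
        "_source": { "title": "Star Wars", "description": "..." }
      }
    ]
  }
}
```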
The results are now reordered so that the highest-scoring passages appear first. In this example, lexical retrieval initially selected The Avengers (_id: 3) and Star Wars (_id: 2) based on word matches—“lightning” in one and “feeling” in the other. This approach considers only surface-level overlaps and keywords.
IBM watsonx.ai rerank then re-evaluated the results based on context, ranking The Avengers higher because “lightning” directly aligned with the query "feeling lightning." This demonstrates that by prioritizing meaning over simple keyword matches, reranking ensures more relevant search results.
Try semantic reranking with watsonx and Elasticsearch today
With the integration of IBM watsonx™ rerank models, the Elasticsearch open Inference API continues to empower developers with enhanced capabilities for building powerful and flexible AI-powered search experiences. Explore more supported encoder foundation models available with watsonx.ai.
Additionally, try IBM watsonx Assistant’s new Conversational Search feature and IBM watsonx Discovery today. Visit IBM watsonx Discovery to learn more about this capability built on Elasticsearch. You can follow these steps for setup and integration with IBM watsonx Assistant.