Elasticsearch has native integrations with industry-leading Gen AI tools and providers. Check out our webinars on going Beyond RAG Basics, or building prod-ready apps with the Elastic Vector Database.
To build the best search solutions for your use case, start a free cloud trial or try Elastic on your local machine now.
In December 2024, Elastic announced Elastic Rerank, which brings powerful semantic search capabilities with no reindexing required, delivering high relevance, top performance, and efficiency. The core set of capabilities powered by Elastic is now even more flexible, allowing developers to bring their own models from Cohere, Vertex AI, Hugging Face, Jina AI, and now IBM watsonx.ai. With our open Inference API, you get the control and choice to integrate, test, and optimize reranking for your needs.
Along with support for IBM watsonx™ Slate embedding models, the Elasticsearch vector database powers watsonx Assistant for Conversational Search, now with semantic reranking for even better answer quality.
Reranking refines LLM responses by using advanced scoring methods to prioritize the most relevant documents, ensuring accurate answers in multi-stage retrieval. Because it operates on already-retrieved results, it applies broadly, even to datasets that you don’t want to reindex or remap.
IBM watsonx offers high-quality reranker models that accurately score and prioritize passages based on query relevance, helping refine search results for better precision. These models enhance tasks like semantic search and document comparison, making them essential for delivering highly relevant answers in AI-driven retrieval systems.
In this blog, we’ll explore how to use IBM watsonx™ reranking when building search experiences in the Elasticsearch vector database to reorder search results by meaning, giving you sharper, more context-aware answers without altering your existing index.
How reranking can create powerful search experiences
Semantic reranking is crucial because users expect the best answers at the top, and GenAI models require accurate results to avoid generating incorrect information. Semantic reranking provides consistent scoring, ensuring the most relevant documents are used by AI models and enabling effective cutoff points to prevent hallucinations.
Prerequisites and creating an inference endpoint
Create an Elasticsearch Serverless project.
Elastic Cloud Serverless offers fast query execution and seamless integration with the open Inference API, making it ideal for deploying reranking without infrastructure overhead.
Generate an API key in IBM Cloud
- Go to IBM watsonx.ai Cloud and log in using your credentials. You will land on the welcome page.

- Go to the API keys page.
- Create an API key.
Steps in Elasticsearch
Using Dev Tools in Kibana, create an inference endpoint that uses the watsonxai service for reranking. This example uses the MS Marco MiniLM L-12 v2 model, which is supported by IBM and ensures high relevance in passage retrieval.
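A request along these lines creates the endpoint. The endpoint name `watsonx-rerank-endpoint`, the regional URL, and the placeholder credentials are illustrative; substitute the API key, project ID, and region from your own watsonx.ai account:

```json
PUT _inference/rerank/watsonx-rerank-endpoint
{
  "service": "watsonxai",
  "service_settings": {
    "api_key": "<your-ibm-cloud-api-key>",
    "url": "https://us-south.ml.cloud.ibm.com",
    "project_id": "<your-watsonx-project-id>",
    "model_id": "cross-encoder/ms-marco-minilm-l-12-v2",
    "api_version": "2024-05-02"
  }
}
```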
You will receive the following response on the successful creation of the inference endpoint:
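The response echoes the endpoint configuration and should look roughly like the following (sensitive settings such as the API key are not returned):

```json
{
  "inference_id": "watsonx-rerank-endpoint",
  "task_type": "rerank",
  "service": "watsonxai",
  "service_settings": {
    "url": "https://us-south.ml.cloud.ibm.com",
    "project_id": "<your-watsonx-project-id>",
    "model_id": "cross-encoder/ms-marco-minilm-l-12-v2",
    "api_version": "2024-05-02"
  }
}
```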
Let us now create an index.
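For example, a simple movies index with plain text fields is enough; reranking needs no special mapping (the index name and fields here are illustrative):

```json
PUT movies
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "description": { "type": "text" }
    }
  }
}
```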
Next, insert data into the created index.
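A bulk request can load a few sample documents. These descriptions are illustrative, chosen so that one document matches the query word “lightning” and another matches “feeling”:

```json
POST movies/_bulk
{ "index": { "_id": 1 } }
{ "title": "The Matrix", "description": "A computer hacker discovers reality is a simulation and joins a rebellion against the machines." }
{ "index": { "_id": 2 } }
{ "title": "Star Wars", "description": "Luke has a bad feeling about the Death Star as the rebels prepare their attack." }
{ "index": { "_id": 3 } }
{ "title": "The Avengers", "description": "Thor summons lightning to battle an alien army alongside Earth's mightiest heroes." }
```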
Next, let’s search using a text_similarity_reranker retriever, which enhances search results by reranking documents based on their semantic similarity to a specified inference text, using an ML model.
The retriever helps you configure both the retrieval and reranking of search results in a single API call.
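A request like the following runs a lexical match and then reranks the top hits through the inference endpoint (assuming an inference endpoint named `watsonx-rerank-endpoint` and a `movies` index with a `description` field; adjust the names to your setup):

```json
GET movies/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": {
            "match": { "description": "feeling lightning" }
          }
        }
      },
      "field": "description",
      "inference_id": "watsonx-rerank-endpoint",
      "inference_text": "feeling lightning",
      "rank_window_size": 10
    }
  }
}
```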
Next, let’s verify the returned result.
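A trimmed response might look like this; the `_score` values are illustrative, since actual scores depend on the reranker model:

```json
{
  "hits": {
    "hits": [
      {
        "_id": "3",
        "_score": 0.92,
        "_source": { "title": "The Avengers", "description": "..." }
      },
      {
        "_id": "2",
        "_score": 0.15,
        "_source": { "title": "Star Wars", "description": "..." }
      }
    ]
  }
}
```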
The results are now reordered so that the highest-scoring passages appear first. In this example, lexical retrieval initially selected The Avengers (_id: 3) and Star Wars (_id: 2) based on word matches—“lightning” in one and “feeling” in the other. This approach considers only surface-level overlaps and keywords.
IBM watsonx.ai rerank then re-evaluated the results based on context, ranking The Avengers higher because “lightning” directly aligned with the query "feeling lightning." This demonstrates that by prioritizing meaning over simple keyword matches, reranking ensures more relevant search results.
Try semantic reranking with watsonx and Elasticsearch today
With the integration of IBM watsonx™ rerank models, the Elasticsearch open Inference API continues to empower developers with enhanced capabilities for building powerful and flexible AI-powered search experiences. Explore more supported encoder foundation models available with watsonx.ai.
Additionally, try IBM watsonx Assistant’s new Conversational Search feature and IBM watsonx Discovery today. Visit IBM watsonx Discovery to learn more about this capability built on Elasticsearch. You can follow these steps for setup and integration with IBM watsonx Assistant.