Elasticsearch Open Inference API adds support for Jina AI Embeddings and Rerank Model - Elasticsearch Labs

Elasticsearch Open Inference API adds support for Jina AI Embeddings and Rerank Model

Explore how to access Jina AI models using the Elasticsearch Open Inference API.

Our friends at Jina AI added native integration for Jina AI’s embedding models and reranking products to the Elasticsearch Open Inference API. This includes support for industry-leading multilingual text embeddings and multilingual reranking, optimized for retrieval, clustering, and classification. This integration provides developers with a high-performance, cost-effective toolkit for AI information retrieval and semantic applications built with the Elasticsearch vector database and Jina AI.

With asymmetric embeddings for search and high-performance reranking models to enhance precision, Jina AI’s models put top-shelf AI in Elasticsearch applications without additional integration or development costs.

This post explores how to access Jina AI models using the Elasticsearch Open Inference API.

About Jina AI Models

Founded in 2020, Jina AI is a leading search foundation company creating embeddings, rerankers, and small language models to help developers build reliable and high-quality multimodal search applications.

Jina Embeddings v3 is a multilingual embedding model from Jina AI that supports 8K tokens input length. Jina CLIP v2 is a multimodal and multilingual embedding model, supporting texts with 8K tokens and image inputs. Jina Reranker v2 is a neural reranker model, which is multilingual and post-trained, especially for agentic use cases. ReaderLM-v2 is a small language model that converts input data from various sources to Markdown or structured data formats suitable for interacting with LLMs.

Getting Started

We will be using the Kibana Dev Console to go through the setup. Alternatively, here is a Jupyter notebook to get you started.

First, you'll need a Jina AI API key. You can get a free key with a one million token usage limit here.

Jina AI makes several models available, but we recommend using the latest embedding model, jina-embeddings-v3, and their reranking model jina-reranker-v2-base-multilingual.

Step 1: Creating Jina AI inference API endpoint for generating embeddings

Create your text embedding inference endpoint in Elasticsearch by providing the service as jinaai. Use your Jina AI API key for api_key and model_id as jina-embeddings-v3 in service settings.
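In the Kibana Dev Console, that request looks roughly like the following. The inference endpoint name `jina_embeddings` matches what the rest of this tutorial refers to; replace the API key placeholder with your own key:

```
PUT _inference/text_embedding/jina_embeddings
{
  "service": "jinaai",
  "service_settings": {
    "api_key": "<your-jina-ai-api-key>",
    "model_id": "jina-embeddings-v3"
  }
}
```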

Let’s test our Jina AI endpoint to validate the configurations. To do this, let’s perform the inference on a sample text.
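A quick smoke test can be done by calling the endpoint directly; the input sentence here is just an illustrative example:

```
POST _inference/text_embedding/jina_embeddings
{
  "input": "A sample sentence to embed"
}
```

A successful response contains the embedding vector generated for the input text.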

Step 2: Creating Jina AI inference API endpoint for reranking

Similarly, create a rerank task_type service named jina_rerank for use during the search. Use jinaai as the service name, your Jina AI API key for api_key, and model_id as jina-reranker-v2-base-multilingual in service settings.

The task_settings section of the API sets the maximum number of documents for jina_rerank to return with the top_n setting, set here to 10. The return_documents setting informs jina_rerank that it should return a full copy of the search candidate documents it identifies.
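Putting the service settings and task settings together, the rerank endpoint creation looks roughly like this (again, substitute your own API key):

```
PUT _inference/rerank/jina_rerank
{
  "service": "jinaai",
  "service_settings": {
    "api_key": "<your-jina-ai-api-key>",
    "model_id": "jina-reranker-v2-base-multilingual"
  },
  "task_settings": {
    "top_n": 10,
    "return_documents": true
  }
}
```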

In the Kibana dev console, these commands should return a 200 response code indicating that the services are correctly configured.

Step 3: Generating Embeddings (automagically)

Let’s create an index configured to use the jina_embeddings to generate the embeddings. We will create an index named film_index and generate and store embeddings automatically with the semantic_text type using jina_embeddings as the value for inference_id.
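The mapping below is a minimal sketch of such an index: the `blurb` field uses the `semantic_text` type, which delegates embedding generation to the `jina_embeddings` endpoint created above:

```
PUT film_index
{
  "mappings": {
    "properties": {
      "blurb": {
        "type": "semantic_text",
        "inference_id": "jina_embeddings"
      }
    }
  }
}
```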

Now, we can bulk-insert documents into the index. We are using the films dataset below for this tutorial, which contains information about six films. Each document is a JSON string with a field labeled blurb.
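The bulk request has the following shape; the film blurbs below are illustrative placeholders rather than the actual six-film dataset from the notebook:

```
POST film_index/_bulk
{ "index": {} }
{ "blurb": "Two strangers meet on a night train and fall in love against all odds." }
{ "index": {} }
{ "blurb": "A retired detective is pulled back in for one final, impossible case." }
```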

As the documents are indexed, drumroll please… the Elasticsearch Open Inference API will call the jina_embeddings service to generate embeddings for the blurb text. Credit for this seamless developer experience goes to the semantic_text type and the Jina AI integration in the Elasticsearch Open Inference API.

Step 4: Semantic Reranking

Now, you can search film_index using semantic embedding vectors. The API Call below will

  • Create an embedding for the query string “An inspiring love story” using the jina_embeddings service.
  • Compare the resulting embedding to the ones stored in film_index.
  • Return the stored documents whose blurb fields best match the query.
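The steps above can be expressed with a `semantic` query against the `blurb` field:

```
GET film_index/_search
{
  "query": {
    "semantic": {
      "field": "blurb",
      "query": "An inspiring love story"
    }
  }
}
```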

Now, let’s use jina_rerank. It will perform the same query-matching procedure as the one above, then take the 50 best matches (specified by the rank_window_size field) and use the jina_rerank service to do a more precise ranking of the results, returning the top 10 (as specified in the configuration of jina_rerank previously).
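One way to wire this up is with a `text_similarity_reranker` retriever, which wraps the semantic query and hands its top candidates to the rerank endpoint; this sketch assumes the `jina_rerank` endpoint configured earlier:

```
GET film_index/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": {
            "semantic": {
              "field": "blurb",
              "query": "An inspiring love story"
            }
          }
        }
      },
      "field": "blurb",
      "inference_id": "jina_rerank",
      "inference_text": "An inspiring love story",
      "rank_window_size": 50
    }
  }
}
```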

RAG with Elasticsearch and Jina AI

As developers use Elasticsearch for their RAG use cases, the ability to use Jina AI’s search foundations natively in the inference API provides low-cost and seamless access to Jina AI’s search foundations. Developers can use this integration today in Elastic Cloud Serverless, and it will soon be available in the 8.18 version of Elasticsearch. Thank you, Jina AI team, for the contribution!

  • Try this notebook with an end-to-end example of using Inference API with the Jina AI models.
  • To learn more about Jina AI models, visit jina.ai and their blog.

Elasticsearch has native integrations with industry-leading Gen AI tools and providers. Check out our webinars on going Beyond RAG Basics, or building prod-ready apps with the Elastic Vector Database.

To build the best search solutions for your use case, start a free cloud trial for a fully managed Elastic Cloud project or try Elastic on your local machine now in a few minutes with `curl -fsSL https://elastic.co/start-local | sh`


Related content

Ready to build state of the art search experiences?

Sufficiently advanced search isn’t achieved with the efforts of one. Elasticsearch is powered by data scientists, ML ops, engineers, and many more who are just as passionate about search as you are. Let’s connect and work together to build the magical search experience that will get you the results you want.

Try it yourself