Our friends at Jina AI added native integration for Jina AI’s embedding models and reranking products to the Elasticsearch open Inference API. This includes support for industry-leading multilingual text embeddings and multilingual reranking, optimized for retrieval, clustering, and classification. This integration provides developers with a high-performance, cost-effective toolkit for AI information retrieval and semantic applications built on the Elasticsearch vector database and Jina AI.
With asymmetric embeddings for search and high-performance reranking models to enhance precision, Jina AI’s models put top-shelf AI in Elasticsearch applications without additional integration or development costs.
This post explores how to access Jina AI models using the Elasticsearch open Inference API.
About Jina AI Models
Founded in 2020, Jina AI is a leading search foundation company creating embeddings, rerankers, and small language models to help developers build reliable and high-quality multimodal search applications.
Jina Embeddings v3 is a multilingual embedding model from Jina AI that supports an 8K-token input length. Jina CLIP v2 is a multimodal, multilingual embedding model supporting text inputs of up to 8K tokens as well as images. Jina Reranker v2 is a multilingual neural reranker model, post-trained especially for agentic use cases. ReaderLM-v2 is a small language model that converts input data from various sources into Markdown or structured data formats suitable for interacting with LLMs.
Getting Started
We will be using the Kibana Dev Console to go through the setup. Alternatively, here is a Jupyter notebook to get you started.
First, you'll need a Jina AI API key. You can get a free key with a one million token usage limit here.
Jina AI makes several models available, but we recommend using their latest embedding model, `jina-embeddings-v3`, and their reranking model, `jina-reranker-v2-base-multilingual`.
Step 1: Creating Jina AI inference API endpoint for generating embeddings
Create your text embedding inference endpoint in Elasticsearch by providing the service as `jinaai`. Use your Jina AI API key for `api_key` and `jina-embeddings-v3` as the `model_id` in the service settings.
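In the Kibana Dev Console, the endpoint creation looks roughly like this (a sketch following the Inference API conventions; `jina_embeddings` is the endpoint name used throughout this post, and the API key is a placeholder):

```
PUT _inference/text_embedding/jina_embeddings
{
  "service": "jinaai",
  "service_settings": {
    "api_key": "<YOUR_JINA_API_KEY>",
    "model_id": "jina-embeddings-v3"
  }
}
```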
Let’s test our Jina AI endpoint to validate the configurations. To do this, let’s perform the inference on a sample text.
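A quick smoke test might look like the following (the sample text is arbitrary); the response should contain a dense vector of floats:

```
POST _inference/text_embedding/jina_embeddings
{
  "input": "A sample sentence to embed"
}
```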
Step 2: Creating Jina AI inference API endpoint for reranking
Similarly, create a `rerank` task_type service named `jina_rerank` for use during the search. Use `jinaai` as the service name, your Jina AI API key for `api_key`, and `jina-reranker-v2-base-multilingual` as the `model_id` in the service settings.
The `task_settings` section of the API sets the maximum number of documents for `jina_rerank` to return with the `top_n` setting, set here to 10. The `return_documents` setting tells `jina_rerank` to return a full copy of each search candidate document it identifies.
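Putting the settings above together, the rerank endpoint creation might look like this sketch (the API key is a placeholder):

```
PUT _inference/rerank/jina_rerank
{
  "service": "jinaai",
  "service_settings": {
    "api_key": "<YOUR_JINA_API_KEY>",
    "model_id": "jina-reranker-v2-base-multilingual"
  },
  "task_settings": {
    "top_n": 10,
    "return_documents": true
  }
}
```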
In the Kibana dev console, these commands should return a 200 response code indicating that the services are correctly configured.
Step 3: Generating Embeddings (automagically)
Let’s create an index configured to use `jina_embeddings` to generate the embeddings. We will create an index named `film_index` and generate and store embeddings automatically with the `semantic_text` field type, using `jina_embeddings` as the value for `inference_id`.
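The mapping for such an index might look like this (a sketch, assuming `blurb` is the field to embed, as described below):

```
PUT film_index
{
  "mappings": {
    "properties": {
      "blurb": {
        "type": "semantic_text",
        "inference_id": "jina_embeddings"
      }
    }
  }
}
```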
Now, we can bulk-insert documents into the index. We are using the `films` dataset below for this tutorial, which contains information about six films. Each document is a JSON string with a field labeled `blurb`.
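The bulk request follows the standard `_bulk` format; the two documents below are illustrative placeholders rather than the actual `films` dataset:

```
POST film_index/_bulk
{ "index": {} }
{ "blurb": "A sweeping romance set aboard an ill-fated ocean liner." }
{ "index": {} }
{ "blurb": "A ragtag crew of misfits pulls off an impossible heist." }
```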
As the documents are indexed, drumroll please… the Elasticsearch open inference API will call the `jina_embeddings` service to generate embeddings for the `blurb` text. Credit for this seamless developer experience goes to the `semantic_text` type and the Jina AI integration in the Elasticsearch open inference API.
Step 4: Semantic Reranking
Now, you can search `film_index` using semantic embedding vectors. The API call below will:
- Create an embedding for the query string “An inspiring love story” using the `jina_embeddings` service.
- Compare the resulting embedding to the ones stored in `film_index`.
- Return the stored documents whose `blurb` fields best match the query.
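With a `semantic_text` field, the steps above can be expressed with a `semantic` query (a sketch following the semantic query syntax):

```
POST film_index/_search
{
  "query": {
    "semantic": {
      "field": "blurb",
      "query": "An inspiring love story"
    }
  }
}
```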
Now, let’s use `jina_rerank`. It will perform the same query-matching procedure as the one above, then take the 50 best matches (specified by the `rank_window_size` field) and use the `jina_rerank` service to do a more precise ranking of the results, returning the top 10 (as specified previously in the configuration of `jina_rerank`).
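One way to express this is with the `text_similarity_reranker` retriever, which wraps the semantic query and hands its candidates to `jina_rerank` (a sketch; the retriever syntax assumes a recent Elasticsearch version that supports retrievers):

```
POST film_index/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": {
            "semantic": {
              "field": "blurb",
              "query": "An inspiring love story"
            }
          }
        }
      },
      "field": "blurb",
      "inference_id": "jina_rerank",
      "inference_text": "An inspiring love story",
      "rank_window_size": 50
    }
  }
}
```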
RAG with Elasticsearch and Jina AI
As developers use Elasticsearch for their RAG use cases, native support for Jina AI’s models in the Inference API gives them low-cost, seamless access to Jina AI’s search foundations. Developers can use this integration today in Elastic Cloud Serverless, and it will soon be available in the 8.18 version of Elasticsearch. Thank you, Jina AI team, for the contribution!
- Try this notebook with an end-to-end example of using Inference API with the Jina AI models.
- To learn more about Jina AI models, visit jina.ai and blog.
Elasticsearch has native integrations with industry-leading Gen AI tools and providers. Check out our webinars on going Beyond RAG Basics, or on building production-ready apps with the Elastic Vector Database.
To build the best search solutions for your use case, start a free cloud trial for a fully managed Elastic Cloud project or try Elastic on your local machine now in a few minutes with `curl -fsSL https://elastic.co/start-local | sh`