In this article, we'll cover how to integrate Alibaba Cloud AI features with Elasticsearch to improve relevance in semantic searches.
Alibaba Cloud AI Search is a solution that integrates advanced AI features with Elasticsearch tools, leveraging the Qwen LLM family to provide advanced models for inference and classification. In this article, we'll use descriptions of novels and plays written by the same author to test the Alibaba reranking and sparse embedding endpoints.
Steps
- Configure Alibaba Cloud AI
- Create Elasticsearch mappings
- Index data into Elasticsearch
- Query data
- Bonus: Answering questions with completion
Configure Alibaba Cloud AI
Alibaba Cloud AI reranking and embeddings
Alibaba Cloud offers different services through the Elasticsearch open inference API. In this example, we'll use the descriptions of popular books and plays by Agatha Christie to test the Alibaba Cloud embeddings and reranking endpoints in semantic search.
The Alibaba Cloud AI reranking endpoint is a semantic reranking functionality. This type of reranking uses a machine learning model to reorder search results based on their semantic similarity to a query. This allows you to use out-of-the-box semantic search capabilities on existing full-text search indices.
The sparse embedding endpoint generates sparse vectors, where most values are zero and the non-zero weights correspond to the most relevant tokens, making relevant information more prominent.
Get Alibaba Cloud API Key
We need a valid API Key to integrate Alibaba with Elasticsearch. To get it, follow these steps:
- Access the Alibaba Cloud portal from the Service Plaza section.
- In the left menu, go to API Keys.
- Generate a new API Key.

Configure Alibaba Endpoints
We'll first configure the sparse embedding endpoint to transform the text descriptions into semantic vectors:
Embeddings endpoint:
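Something like the following should work, using the alibabacloud-ai-search inference service in Elasticsearch. The inference ID (alibabacloud_ai_search_sparse) is our own choice, and the service_id, host, and workspace values are placeholders — replace them with the values from your Alibaba Cloud account:

```json
PUT _inference/sparse_embedding/alibabacloud_ai_search_sparse
{
  "service": "alibabacloud-ai-search",
  "service_settings": {
    "api_key": "<your-api-key>",
    "service_id": "ops-text-sparse-embedding-001",
    "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "workspace": "default"
  }
}
```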
We'll then configure the rerank endpoint to reorganize results.
Rerank Endpoint:
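A sketch of the rerank endpoint creation — again, the inference ID is our choice and the service_id, host, and workspace values are placeholders for your own account's values:

```json
PUT _inference/rerank/alibabacloud_ai_search_rerank
{
  "service": "alibabacloud-ai-search",
  "service_settings": {
    "api_key": "<your-api-key>",
    "service_id": "ops-bge-reranker-larger",
    "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "workspace": "default"
  }
}
```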
Now that the endpoints are configured, we can prepare the Elasticsearch index.
Create Elasticsearch mappings
Let's configure the mappings. For this, we need to organize both the texts with the descriptions as well as the model-generated vectors.
We'll use the following properties:
- semantic_description: to store the embeddings generated by the model and run semantic searches.
- description: we'll use a "text" type to store the novels and plays’ descriptions and use them for full-text search.
We'll include the copy_to parameter so that both the text and the semantic field are available for hybrid search:
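Assuming the sparse embedding endpoint was created with the ID alibabacloud_ai_search_sparse, the index creation could look like this (the index name novels is our own choice):

```json
PUT novels
{
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "copy_to": "semantic_description"
      },
      "semantic_description": {
        "type": "semantic_text",
        "inference_id": "alibabacloud_ai_search_sparse"
      }
    }
  }
}
```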
With the mappings ready, we can now index the data.
Index data into Elasticsearch
Here's the dataset with the descriptions that we'll use for this example. We'll index it using the Elasticsearch Bulk API.
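A sketch of the bulk request — the titles match works referenced later in the article, but the short descriptions here are illustrative stand-ins for the original dataset:

```json
POST novels/_bulk
{"index":{}}
{"title":"Black Coffee","description":"A play by Agatha Christie in which Hercule Poirot investigates the theft of a secret formula."}
{"index":{}}
{"title":"The Mousetrap","description":"A murder mystery play by Agatha Christie, famous for its record-breaking West End run."}
{"index":{}}
{"title":"And Then There Were None","description":"A novel by Agatha Christie about ten strangers lured to an isolated island."}
{"index":{}}
{"title":"Murder on the Orient Express","description":"A detective novel by Agatha Christie featuring Hercule Poirot aboard a snowbound train."}
{"index":{}}
{"title":"Death on the Nile","description":"A novel by Agatha Christie set aboard a steamer cruising the Nile."}
{"index":{}}
{"title":"The Murder of Roger Ackroyd","description":"A book by Agatha Christie, narrated by Dr. Sheppard, known for its famous twist ending."}
```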
Note that the first two documents, “Black Coffee” and “The Mousetrap,” are plays, while the others are novels.
Query data
To see the different results we can get, we'll run different types of queries, starting with a semantic query, then applying reranking, and finally using both. We'll use the same question, "Which novel was written by Agatha Christie?", expecting to get the three documents that explicitly say novel, plus the one that says book. The two plays should be the last results.
Semantic search
We'll begin querying the semantic_text field to ask: "Which novel was written by Agatha Christie?" Let's see what happens:
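A semantic query against the semantic_description field could look like this:

```json
GET novels/_search
{
  "query": {
    "semantic": {
      "field": "semantic_description",
      "query": "Which novel was written by Agatha Christie?"
    }
  }
}
```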
Response:
In this case, the response prioritized most of the novels, but the document that says “book” appears last. We can still further refine the results with reranking.
Refining results with Reranking
In this case, we'll use an _inference/rerank request to assess the documents we got in the first query and improve their ranking in the results.
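The request takes the query plus the documents to reorder; here we pass the description text of each hit from the previous search (shortened to two documents for brevity):

```json
POST _inference/rerank/alibabacloud_ai_search_rerank
{
  "query": "Which novel was written by Agatha Christie?",
  "input": [
    "A novel by Agatha Christie about ten strangers lured to an isolated island.",
    "A play by Agatha Christie in which Hercule Poirot investigates the theft of a secret formula."
  ]
}
```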
Response:
The response here shows that both plays are now at the bottom of the results.
Semantic search and reranking endpoint combined
Using a retriever, we'll combine the semantic query and reranking in just one step:
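A sketch using the text_similarity_reranker retriever, assuming the endpoint IDs from the configuration step:

```json
GET novels/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": {
            "semantic": {
              "field": "semantic_description",
              "query": "Which novel was written by Agatha Christie?"
            }
          }
        }
      },
      "field": "description",
      "inference_id": "alibabacloud_ai_search_rerank",
      "inference_text": "Which novel was written by Agatha Christie?",
      "rank_window_size": 10
    }
  }
}
```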
Response:
The results here differ from the semantic query. We can see that the document with no exact match for "novel" but that says “book” (The Murder of Roger Ackroyd) appears higher than in the first semantic search. Both plays are still the last results, just like with reranking.
Bonus: Answering questions with completion
With embeddings and reranking we can satisfy a search query, but the user still sees a list of search results rather than a direct answer.
With the examples provided, we are one step away from a RAG implementation, where we can provide the top results + the question to an LLM to get the right answer.
Fortunately, Alibaba Cloud AI Search also provides a completion endpoint we can use for this purpose.
Let’s create the endpoint:
Completion Endpoint:
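As with the other endpoints, the inference ID is our choice and the service_id, host, and workspace values are placeholders to replace with your own:

```json
PUT _inference/completion/alibabacloud_ai_search_completion
{
  "service": "alibabacloud-ai-search",
  "service_settings": {
    "api_key": "<your-api-key>",
    "service_id": "ops-qwen-turbo",
    "host": "default-j01.platform-cn-shanghai.opensearch.aliyuncs.com",
    "workspace": "default"
  }
}
```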
And now, send the results and question from the previous query:
Query
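A sketch of the completion request — the prompt format is our own, and "<top reranked descriptions>" stands in for the document text returned by the previous step:

```json
POST _inference/completion/alibabacloud_ai_search_completion
{
  "input": "Answer the question using only the provided context.\n\nContext: <top reranked descriptions>\n\nQuestion: Which novel was written by Agatha Christie?"
}
```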
Response
Conclusion
Integrating Alibaba Cloud AI Search with Elasticsearch allows us to easily access completion, embedding, and reranking models to incorporate them into our search pipeline.
We can use the reranking and embedding endpoints, either separately or together, with the help of a retriever.
We can also introduce the completion endpoint to finish up a RAG end-to-end implementation.
Want to get Elastic certified? Find out when the next Elasticsearch Engineer training is running!
Elasticsearch is packed with new features to help you build the best search solutions for your use case. Dive into our sample notebooks to learn more, start a free cloud trial, or try Elastic on your local machine now.