Elastic is thrilled to announce that the Elasticsearch vector database is now integrated into Google Cloud’s Vertex AI platform as a natively supported information retrieval engine, empowering users to leverage the multimodal strengths of Google’s Gemini models with the advanced AI-powered semantic and hybrid search capabilities of Elasticsearch.
Developers can now create their RAG applications within a unified journey, grounding their chat experiences on their private data in a low-code, flexible way. Whether you’re building AI agents for your customers and internal employees or leveraging LLM generation within your software, the Vertex AI platform puts Elasticsearch relevance at your fingertips with minimal configuration. This integration enables easier and faster adoption of Gemini models in production use cases, taking GenAI from PoCs to real-life scenarios.
In this blog, we will walk you through integrating Elasticsearch with Google Cloud’s Vertex AI platform for seamless data grounding and building fully customizable GenAI applications. Let’s discover how.
Google Cloud’s Vertex AI and Gemini models grounded on your data with Elasticsearch
Users leveraging Vertex AI services and tools for creating GenAI applications can now access the new “Grounding” option to bring their private data into their conversational interactions automatically. Elasticsearch is now part of this feature and can be used via both:
- Vertex AI LLM APIs, which directly enrich Google’s Gemini models at generation time (preferred);
- Grounded Generation API, used instead in the Vertex AI Agent Builder ecosystem to build agentic experiences.
With this integration, Elasticsearch – the most downloaded and deployed vector database – will bring your relevant enterprise data wherever it’s needed in your internal and customer-facing chats, which is crucial for the real-life adoption of GenAI into business processes.
The aforementioned APIs allow developers to adopt this new partnered feature in their code. However, prompt engineering and testing remain crucial steps in application development and serve as an initial discovery playground. To support this, Elasticsearch can be easily evaluated within the Vertex AI Studio console.
All it takes is a few simple steps to configure the Elastic endpoints with the desired parameters (index to be searched, the number of documents to be retrieved, and the desired search template) within the “Customize Grounding” tab in the UI, as shown below. Now you’re ready to generate with your private knowledge!

Production-ready GenAI applications with ease
Elastic and Google Cloud work to provide developer-first, comprehensive, and enjoyable experiences. Connecting to Elastic natively in both the LLM and Grounded Generation APIs reduces complexity and overhead while building GenAI applications on Vertex AI, avoiding unnecessary additional APIs and data orchestration by grounding in just one unified call.
Let’s see how it works in both scenarios.
The first example is executed with the LLM API:
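The request can be sketched as follows. This is an illustrative payload only: the cluster endpoint, API key, index, and template names are placeholders, and the exact field names should be checked against the current Vertex AI documentation.

```python
import json

# Illustrative sketch of a generateContent request grounded on Elasticsearch.
# The endpoint URL, API key, index, and search template below are placeholders.
payload = {
    "contents": [
        {"role": "user", "parts": [{"text": "What do I need to patch my drywall?"}]}
    ],
    "tools": [
        {
            "retrieval": {
                "externalApi": {
                    # Selects Elasticsearch as the external retrieval engine
                    "api_spec": "ELASTIC_SEARCH",
                    "endpoint": "https://my-cluster.es.us-central1.gcp.cloud.es.io:443",
                    "apiAuth": {
                        "apiKeyConfig": {"apiKeyString": "ApiKey <redacted>"}
                    },
                    "elasticSearchParams": {
                        "index": "products-catalog",
                        "searchTemplate": "google-template-knn",
                        "numHits": 5,
                    },
                }
            }
        }
    ],
}

# The payload would be POSTed to the Gemini generateContent endpoint, e.g.:
# https://{region}-aiplatform.googleapis.com/v1/projects/{project}/locations/
#   {region}/publishers/google/models/gemini-2.0-flash:generateContent
print(json.dumps(payload, indent=2))
```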
In the above example, with the `retrieval` field of the API request for content generation with Gemini 2.0 Flash, we can contextually set a retrieval engine for the request. Setting `api_spec` to `ELASTIC_SEARCH` enables the use of additional configuration parameters such as the API key and the cluster endpoint (needed to route a request to your Elastic cluster), the index to retrieve data from, and the search template to be used for your search logic.
Similarly, the same outcome can be achieved with the Grounded Generation API by setting the `groundingSpec` parameter:
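A request of this shape illustrates the idea. Treat the source field names as assumptions to verify against the Grounded Generation API reference; the cluster, index, and template values are placeholders mirroring the LLM API example above.

```json
{
  "contents": [
    { "role": "user", "parts": [{ "text": "What do I need to patch my drywall?" }] }
  ],
  "groundingSpec": {
    "groundingSources": [
      {
        "elasticSearchSource": {
          "endpoint": "https://my-cluster.es.us-central1.gcp.cloud.es.io:443",
          "apiKey": "<redacted>",
          "index": "products-catalog",
          "searchTemplate": "google-template-knn"
        }
      }
    ]
  }
}
```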
With both approaches, the response will provide an answer with the most relevant private documents found in Elasticsearch – and the related connected data sources – to support your query.
Simplicity, however, should not be confused with a lack of personalization to fulfill your specific needs and use cases. With this in mind, we designed it to allow you to perfectly adapt the search configuration to your scenario.
Fully customizable search at your fingertips: search templates
To provide maximum customization for your search scenario, we’ve built the experience, in collaboration with Google Cloud, on top of our well-known search templates. Elasticsearch search templates are an excellent tool for creating dynamic, reusable, and maintainable search queries. They allow you to predefine and reuse query structures, and they are particularly useful when executing similar queries with different parameters, as they save development time and reduce the chance of errors. Templates can include placeholders for variables, making the queries dynamic and adaptable to different search requirements.
While using Vertex AI APIs and Elasticsearch for grounding, you must reference a desired search template – as shown in the code snippets above – where the search logic is implemented and pushed down to Elasticsearch. Elastic power users can asynchronously manage, configure, and update the search approaches and tailor them to the specific indices, models, and data in a fully transparent way for Vertex AI users, web-app developers, or AI engineers, who only need to specify the name of the template in the grounding API.
This design allows for complete customization, putting the extensive Elasticsearch retrieval features at your disposal in a Google Cloud AI environment while ensuring modularity, transparency, and ease of use for different developers, even those unfamiliar with Elastic.
Whether you need BM25 search, semantic search, or a hybrid approach between the two (have you explored retrievers yet? They let you compose retrieval techniques in a single search API call), you can define your custom logic in a search template, which Vertex AI can automatically leverage.
This also applies to embeddings and reranking models you choose to manage vectors and results. Depending on your use case, you may want to host models on Elastic’s ML nodes, use a third-party service endpoint through the Inference API, or run your local model on-prem. This is doable via a search template, and we’ll see how it works in the next section.
Start with reference templates, then build your own
To help you get started quickly, we’ve provided a set of compatible search template samples to be used as an initial reference; you can then modify them and build your own custom templates on top:
- Semantic Search with ELSER model (sparse vectors and chunking)
- Semantic Search with e5 multilingual model (dense vectors and chunking)
- Hybrid Search with Vertex AI text-embedding model
You can find them in this GitHub repo.
Let’s look at one example: creating embeddings with Google Cloud’s Vertex AI APIs on a product catalog. First, we need to create the search template in Elasticsearch as shown below:
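A minimal sketch of such a template is below, assuming a product catalog with `title_embedding` and `description_embedding` vector fields and an inference endpoint named `googlevertexai_embeddings_004`; the template name and field names are illustrative.

```
PUT _scripts/google-template-knn
{
  "script": {
    "lang": "mustache",
    "source": {
      "_source": {
        "excludes": ["title_embedding", "description_embedding", "image_url"]
      },
      "size": "{{num_hits}}",
      "knn": [
        {
          "field": "title_embedding",
          "k": "{{num_hits}}",
          "num_candidates": 50,
          "query_vector_builder": {
            "text_embedding": {
              "model_id": "googlevertexai_embeddings_004",
              "model_text": "{{query}}"
            }
          }
        },
        {
          "field": "description_embedding",
          "k": "{{num_hits}}",
          "num_candidates": 50,
          "query_vector_builder": {
            "text_embedding": {
              "model_id": "googlevertexai_embeddings_004",
              "model_text": "{{query}}"
            }
          }
        }
      ]
    }
  }
}
```

The `{{query}}` and `{{num_hits}}` placeholders are mustache variables that will be filled in at search time.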
In this example, we will execute a kNN search on two fields within one single search: `title_embedding` – the vector field containing the name of the product – and `description_embedding` – the one containing the representation of its description.
You can leverage the `excludes` syntax to avoid returning unnecessary fields to the LLM, which may cause noise in its processing and impact the quality of the final answer. In our example, we excluded the fields containing vectors and image URLs.
Vectors are created on the fly at query time from the submitted input via an inference endpoint to the Vertex AI embeddings API, `googlevertexai_embeddings_004`, previously defined as follows:
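A sketch of that endpoint definition using Elastic’s Inference API is below; the project ID, location, and credentials are placeholders, and `text-embedding-004` is an assumed model choice.

```
PUT _inference/text_embedding/googlevertexai_embeddings_004
{
  "service": "googlevertexai",
  "service_settings": {
    "service_account_json": "<your service account credentials JSON>",
    "model_id": "text-embedding-004",
    "location": "us-central1",
    "project_id": "my-gcp-project"
  }
}
```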
You can find additional information on how to use Elastic’s Inference API here.
We’re now ready to test our templated search:
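Under the same assumptions as above (a `products-catalog` index and a stored template named `google-template-knn`), the templated search can be invoked like this:

```
GET products-catalog/_search/template
{
  "id": "google-template-knn",
  "params": {
    "query": "What do I need to patch my drywall?",
    "index_name": "products-catalog",
    "num_hits": 5
  }
}
```

Note that `index_name` is passed here because Vertex AI sends it as a variable, even though this template resolves the index from the request path.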
The `params` fields will replace the variables we set in the template script in double curly brackets. Currently, the Vertex AI LLM and Grounded Generation APIs can send the following input variables to Elastic:
- `query` – the user query to be searched
- `index_name` – the name of the index where to search
- `num_hits` – how many documents we want to retrieve in the final output
Here’s a sample output:
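A trimmed, illustrative response shape is shown below; the document, score, and hit count are made-up placeholders, not real output.

```json
{
  "hits": {
    "total": { "value": 5, "relation": "eq" },
    "hits": [
      {
        "_index": "products-catalog",
        "_score": 0.92,
        "_source": {
          "title": "All-purpose joint compound",
          "description": "Ready-mixed compound for patching holes and cracks in drywall."
        }
      }
    ]
  }
}
```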
The above query is precisely what Google Cloud’s Vertex AI will run on Elasticsearch behind the scenes when referring to the previously created search template. Gemini models will use the output documents to ground their answers: when you ask “What do I need to patch my drywall?”, instead of getting a generic suggestion, the chat agent will provide you with specific products!

End-to-end GenAI journey with Elastic and Google Cloud
Elastic partners with Google Cloud to create production-ready, end-to-end GenAI experiences and solutions. As we’ve just seen, Elastic is the first ISV to be integrated directly into the UI and SDK of the Vertex AI platform, enabling seamlessly grounded Gemini model prompts and agents using our vector search features. Moreover, Elastic integrates with Vertex AI and Google AI Studio’s embedding, reranking, and completion models to create and rank vectors without leaving the Google Cloud landscape, ensuring Responsible AI principles. By supporting multimodal approaches, we jointly facilitate applications across diverse data formats.
You can tune, test, and export your GenAI search code via our Playground.

But it’s not just about building search apps: Elastic leverages Gemini models to empower IT operations, such as in the Elastic AI Assistants, Attack Discovery, and Automatic Import features, reducing daily fatigue for security analysts and SREs on low-value tasks, and allowing them to focus on improving their business. Elastic also enables comprehensive monitoring of Vertex AI usage, tracking metrics and logs, like response times, tokens, and resources, to ensure optimal performance. Together, we manage the complete GenAI lifecycle, from data ingestion and embedding generation to grounding with hybrid search, while ensuring robust observability and security of GenAI tools with LLM-powered actions.
Explore more and try it out!
Are you interested in trying this out? The feature is currently available in Public Preview!
If you haven’t already, one of the easiest ways to get started with Elastic Search AI Platform and explore our capabilities is with your free Elastic Cloud trial or by subscribing through Google Cloud Marketplace.
The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all. Elastic, Elasticsearch and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.
Elasticsearch has native integrations with industry-leading GenAI tools and providers. Check out our webinars on going Beyond RAG Basics, or on building production-ready apps with the Elastic Vector Database.
To build the best search solutions for your use case, start a free cloud trial or try Elastic on your local machine now.