This is a cache of https://www.elastic.co/search-labs/blog/mcp-intelligent-search. It is a snapshot of the page at 2025-09-09T01:04:43.521+0000.
MCP for intelligent search - Elasticsearch Labs

MCP for intelligent search

Building an intelligent search system by integrating Elastic's intelligent query layer with MCP to enhance the generative efficacy of LLMs.

Elasticsearch is packed with new features to help you build the best search solutions for your use case. Dive into our sample notebooks to learn more, start a free cloud trial, or try Elastic on your local machine now.

When retrieving relevant documents from a database, the query structure makes all the difference. Even small adjustments to the search’s shape may significantly impact both the accuracy of your results and the speed at which they are returned. This is especially true when working with AI agents, as Large Language Models (LLMs) are inherently unpredictable in how they generate responses.

Unfortunately, if you're solely relying on an LLM to create the most critical part of your database search, you're essentially building on an unstable foundation. That's where the Intelligent Query Layer, which was introduced in an earlier blog, comes into play. By combining the reasoning power of LLMs with the precise search capabilities of Elasticsearch, we can create queries that leverage the best of both worlds while achieving near-perfect recall.

We've now adapted this proven approach to work within the MCP framework, and the improvements have been substantial. In this blog post, we'll walk through a practical example using a Python MCP server that demonstrates effective property search by combining MCP with Elasticsearch search templates, all orchestrated through an LLM.

Demo

This MCP server can be queried via Claude Desktop to search for properties using a generated search template. 

To replicate this environment, you must first launch the MCP server. You can find the server's source code and setup instructions in the elastic-property-mcp GitHub repository.

Setup and configuration

This example comprises a simple MCP server, available in this repo, that interacts with Claude Desktop.

To run it, first make sure the prerequisites are in place: the steps to initialize the project and API keys are listed in the installation section of the project’s README.

Data ingestion

The data consists of property listings in the state of Florida in JSON format. Each property has a text description, values for bedrooms and bathrooms, square footage, address, and price. Here is a sample entry:
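An illustrative entry might look like the following (field names are assumed from the rest of this post; the exact schema in the dataset may differ slightly):

```json
{
  "body_content": "Charming 2-bedroom bungalow with central air and tile floors throughout, close to downtown Miami.",
  "number_of_bedrooms": 2,
  "number_of_bathrooms": 2,
  "square_footage": 1150,
  "address": "123 Example Ave, Miami, FL 33101",
  "house_price": 285000
}
```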

The ingest_properties.py script serves as a data pipeline to prepare and load property listings into Elasticsearch. It is executed with the following command:
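Assuming the script takes no required arguments (connection details such as the Elasticsearch endpoint and API key are expected to come from the environment, per the README), the invocation is:

```shell
python ingest_properties.py
```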

The tasks this script performs when executed are broken down in the following sections.

Index configuration and creation

Before loading any data, the script creates an Elasticsearch index named properties with specific field mappings. These may be viewed in the properties_index_mappings.json file.

Below is a sample of the mappings for the index:
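A sketch of what those mappings may contain, assuming the field names used elsewhere in this post (the full file in the repo is more complete):

```json
{
  "mappings": {
    "properties": {
      "body_content": { "type": "text", "copy_to": "body_content_semantic_text" },
      "body_content_semantic_text": { "type": "semantic_text" },
      "number_of_bedrooms": { "type": "integer" },
      "number_of_bathrooms": { "type": "integer" },
      "house_price": { "type": "float" },
      "location": { "type": "geo_point" }
    }
  }
}
```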

Search template

The search_template.mustache file contains the main search structure used to query properties.

Elasticsearch search templates are pre-defined search queries that utilize the Mustache templating language to enable dynamic search parameters. They allow applications to run complex searches without needing to understand or directly interact with the full Elasticsearch query syntax. These templates are stored within an Elasticsearch instance and can be executed by providing the template ID along with any necessary variables.

Search templates promote the reuse of common search patterns across different parts of an application or various user interfaces. Changes to the underlying search logic can be made within the template itself, without requiring modifications to the application's code. Variables within the Mustache template (e.g., {{house_price}}) allow dynamic values to be passed at runtime, enabling flexible and customizable searches based on user input or other criteria.
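As a small illustration of this pattern (the template in the repo is more elaborate), a Mustache section can make a filter conditional on a variable being supplied at runtime:

```mustache
{
  "query": {
    "bool": {
      "filter": [
        {{#house_price}}
        { "range": { "house_price": { "lte": {{house_price}} } } }
        {{/house_price}}
      ]
    }
  }
}
```

When `house_price` is omitted from the parameters, the entire range filter simply does not render.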

For this demonstration, the LLM is given agency to pass the variables it deems necessary into a search template and execute the resulting query. This is what gives LLMs the power to perform extremely precise search operations with high recall.

Semantic text enhancement

A key feature of the ingestion process is the creation of semantic search capabilities with the simple addition of a field type. For each property record, the script copies the content of the body_content field to a field with the semantic_text type. This automatically embeds the text into a vector for semantic search. Querying this field is as simple as running an ordinary text query: the embedding and vector similarity search happen under the hood, so plain text queries are all that developers need to write.

Here you can see the body_content being copied over to body_content_semantic_text, which automatically embeds the text into a vector.
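In mapping terms, this copy is a `copy_to` directive, roughly:

```json
"body_content": { "type": "text", "copy_to": "body_content_semantic_text" },
"body_content_semantic_text": { "type": "semantic_text" }
```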

This is an example of querying the body_content_semantic_text field with plain text. The query is converted to a query vector, and similarity search is performed with no further configuration or tooling.
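Such a query might look like the following, using the `semantic` query type (the exact shape in the repo may differ):

```json
GET properties/_search
{
  "query": {
    "semantic": {
      "field": "body_content_semantic_text",
      "query": "home with tile floors and central air"
    }
  }
}
```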

Bulk indexing operations

To optimize performance, the script uses Elasticsearch's parallel bulk API to index multiple documents simultaneously:
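A sketch of this step, using the `parallel_bulk` helper from the elasticsearch Python client (the document structure and index name are assumed from context):

```python
from typing import Iterable, Iterator


def generate_actions(properties: Iterable[dict],
                     index_name: str = "properties") -> Iterator[dict]:
    """Yield one bulk action per property document."""
    for doc in properties:
        yield {"_index": index_name, "_source": doc}


def ingest(es, properties: Iterable[dict]) -> None:
    """Index the documents with the multi-threaded bulk helper."""
    # Imported here so generate_actions stays usable without the client installed.
    from elasticsearch.helpers import parallel_bulk

    # parallel_bulk fans the actions out across worker threads.
    for ok, info in parallel_bulk(es, generate_actions(properties),
                                  thread_count=4, chunk_size=500):
        if not ok:
            print("Failed to index:", info)
```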

This significantly speeds up the insertion of large quantities of documents into Elasticsearch.

System architecture

The layered architecture behind the intelligent search system is built using the FastMCP framework for interaction and orchestration, as shown in the architecture diagram below.

At the top, Claude manages the execution of LLM chains, which include tools, prompt functions, and external API calls.

The orchestration layer, powered by the Elastic Python MCP, coordinates tool invocation, API access, and response formatting. This layer acts as a bridge between user queries and downstream services.

MCP exposes tools that the LLM can invoke at will. When a tool is called, an Elasticsearch query is triggered, or a geolocation lookup is performed to enrich the search template with geospatial coordinates.

The results are returned to the LLM, which is empowered to create natural language responses.

This modular design allows the system to combine precise search capabilities with deep reasoning, enabling it to process natural language queries and return highly relevant and structured responses.

Component roles

Diving further into the architecture, each component provides the following features:

Claude Desktop

Claude Desktop serves as the primary user interface and interaction hub in this MCP server project. It handles real-time message streaming to ensure smooth, responsive communication. It also manages user session state throughout the conversation, maintaining context and continuity across multiple exchanges. The application displays property results and associated images in a user-friendly format, while enabling natural conversations that allow users to ask questions, request clarifications, and navigate through property data seamlessly through dialogue rather than traditional form-based interfaces.

Elasticsearch search templates

Elasticsearch search templating defines reusable query structures for different search scenarios, mapping detected entities from user input to specific search parameters. The system generates real-time Elasticsearch queries that handle boolean logic, combining filters for price, features, and location while supporting geo-spatial queries for radius-based searches and distance calculations.

Elastic Python MCP server

The MCP server orchestrates communication between all system components, managing tool calls and responses while handling API integrations with Elasticsearch and the Google Maps API. It processes search results from these services and formats the responses appropriately for display in the user interface, serving as the central coordination layer that ensures seamless data flow throughout the application.

Data flow architecture

In this example, intelligent parsing, semantic understanding, and modular orchestration come together to deliver precise and user-friendly search experiences. All natural language queries submitted by a user undergo several stages of processing to generate a meaningful response, as illustrated below:

Queries are processed by the MCP server to extract structured parameters using a search template and an LLM. The get_properties_template_params function determines which fields (e.g., number of bedrooms, bathrooms, location) are available as filters. The LLM semantically interprets the user query to detect matching candidates—for example, understanding "2 bed", "two bedrooms", or "needs a couple of rooms" as a filter for number_of_bedrooms = 2.
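A hypothetical sketch of what such a parameter-discovery function could look like (the parameter names and hints are illustrative; the real get_properties_template_params in the repo may differ):

```python
# In the MCP server, a function like this would be registered as a tool
# (e.g., with FastMCP's tool decorator) so the LLM can discover the filters.
TEMPLATE_PARAMS = {
    "number_of_bedrooms": "integer, e.g. 2 for '2 bed' or 'two bedrooms'",
    "number_of_bathrooms": "integer",
    "house_price": "maximum price in dollars, e.g. 300000 for '300k'",
    "features": "free-text amenities, e.g. 'central air tile floors'",
    "latitude": "decimal degrees, filled in after geocoding",
    "longitude": "decimal degrees, filled in after geocoding",
    "distance": "search radius, e.g. '16.1km'",
}


def get_properties_template_params() -> dict:
    """Return the parameter names (and hints) the search template accepts."""
    return TEMPLATE_PARAMS
```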

If a location is present in the query, the system calls a geocoding service (in this case, the Google Maps API) to convert it into coordinates. The fully constructed query, enriched with structured parameters and geo-points, is then executed against Elasticsearch. This is a combination of the original query—executed as a semantic search—with filtering against the parameters passed in to reduce the possible results to search through. The retrieved results are then formatted and returned to the user.
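Executing the stored template from the Python client might look like the following sketch (the template id "property_search" and the helper names are illustrative):

```python
def build_search_request(params: dict,
                         template_id: str = "property_search") -> dict:
    """Drop unset parameters so optional Mustache sections simply don't render."""
    return {"id": template_id,
            "params": {k: v for k, v in params.items() if v is not None}}


def run_search(es, params: dict):
    request = build_search_request(params)
    # elasticsearch-py exposes the _search/template endpoint directly.
    return es.search_template(index="properties", id=request["id"],
                              params=request["params"])
```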

Real-world example

Consider this example query: "Find a home within 10 miles of Miami, Florida that has 2 bedrooms, 2 bathrooms, central air, and tile floors, with a budget up to $300,000." Let's examine the processing of this query within the system.

Entity detection and mapping

The MCP application provides several tools leveraged by an LLM to identify and validate key entities:

1. Location detection and geocoding:

  • Validates the existence of locations (e.g., Miami, Florida).
  • Transforms distance measurements (e.g., "10 miles") into searchable radii.
  • Translates locations into Elasticsearch geo-point coordinates.

2. Property requirements:

  • Verifies amenities (e.g., "pool") for searchability.
  • Normalizes terms like "bed" to "bedrooms" and extracts numerical values.
  • Normalizes terms like "bath" to "bathrooms" and extracts numerical values.
  • Converts abbreviated price formats (e.g., "300k") to numerical values (e.g., 300,000).
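The normalization steps above can be sketched as small helper functions (the function names and the exact radius format are illustrative; the geocoding call assumes the googlemaps client library):

```python
import re
from typing import Optional

MILES_TO_KM = 1.60934


def parse_radius(text: str) -> Optional[str]:
    """Turn '10 miles' into an Elasticsearch distance string like '16.1km'."""
    m = re.search(r"(\d+(?:\.\d+)?)\s*miles?", text, re.IGNORECASE)
    return f"{float(m.group(1)) * MILES_TO_KM:.1f}km" if m else None


def parse_price(text: str) -> Optional[int]:
    """Convert '300k' or '$300,000' into an integer dollar amount."""
    m = re.search(r"\$?(\d+(?:,\d{3})*(?:\.\d+)?)([km]?)", text, re.IGNORECASE)
    if not m:
        return None
    value = float(m.group(1).replace(",", ""))
    return int(value * {"k": 1_000, "m": 1_000_000}.get(m.group(2).lower(), 1))


def geocode(location: str, api_key: str):
    """Resolve a place name to (lat, lng) with the Google Maps geocoding API."""
    import googlemaps  # requires the googlemaps package and a valid API key

    result = googlemaps.Client(key=api_key).geocode(location)
    loc = result[0]["geometry"]["location"]
    return loc["lat"], loc["lng"]
```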

Intelligent dynamic query generation

A user begins an interactive session with the application with the specific goal of locating a new home. This interaction utilizes natural language, allowing the user to express their preferences, requirements, and questions in a conversational manner, just as they would with another person. This dialogue format facilitates a dynamic and responsive search process, where the chatbot can clarify ambiguous queries, offer suggestions based on the provided information, and refine results as the conversation progresses. The aim is to leverage the capabilities of MCP to streamline and simplify the home search experience, making it more intuitive and efficient for the user.

MCP is the core of intelligent search, processing queries by analyzing their characteristics and matching them to search templates. It integrates with the Google Maps API to geocode locations within queries, improving search relevancy. All MCP processes are logged and displayed in real-time, providing a transparent audit trail for users. This sophistication transforms search inputs into refined, contextually rich retrieval operations, making MCP vital for intelligent search.

Methods used

  • geocode_location:
    • If a geographical query is detected (e.g., "10 miles from Miami, Florida"), it calls the Google Maps geocoding API.
  • search_template:
    • Runs a search using the stored Elasticsearch search template once the query parameters match the template's available inputs.

Property search template

Using identified query parameters, MCP initiates a call to the Elasticsearch search template. For instance, the user inquiry, "Home within 10 miles of Miami, Florida with 2 beds 2 baths with central air and tile floors, up to 300k," is transformed into a specific invocation.
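An illustrative parameter payload for that invocation could look like this (the template id and coordinates for Miami are assumptions for the example):

```json
{
  "id": "property_search",
  "params": {
    "latitude": 25.7617,
    "longitude": -80.1918,
    "distance": "16.1km",
    "number_of_bedrooms": 2,
    "number_of_bathrooms": 2,
    "features": "central air tile floors",
    "house_price": 300000
  }
}
```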

The template produces the query as follows:
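A plausible rendered query, consistent with the "should"/"match" behavior described next (the field names are assumptions based on the rest of this post):

```json
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "body_content": {
              "query": "central air tile floors",
              "operator": "or"
            }
          }
        }
      ],
      "minimum_should_match": 1,
      "filter": [
        { "term": { "number_of_bedrooms": 2 } },
        { "term": { "number_of_bathrooms": 2 } },
        { "range": { "house_price": { "lte": 300000 } } },
        {
          "geo_distance": {
            "distance": "16.1km",
            "location": { "lat": 25.7617, "lon": -80.1918 }
          }
        }
      ]
    }
  }
}
```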

Notice that the “should” clause with “match” uses the “or” operator. With "operator": "or" and "minimum_should_match": 1, the query will:

  1. Match documents with ANY of the features ("central air" OR "tile floors" OR both).
  2. Score documents with MORE matching features higher, thanks to Elasticsearch's default relevance scoring (BM25).

Documents with partial matches qualify, but those with all features get better relevance scores.

With the data sent back to the LLM, the LLM may further enhance the natural language interaction with optional features, such as projecting the properties on a custom map site, as seen below.

Conclusion

The "MCP for Intelligent Search" system improves LLM generative capabilities by integrating Elasticsearch. It uses an intelligent query layer and an Elastic Python MCP server. LLMs handle entity detection, while Elasticsearch search templates generate dynamic queries.

With a property search example, we demonstrated how natural language queries are processed, highlighting the collaboration between LLMs and Elasticsearch for efficient data retrieval and response. Practical setup details and the system architecture were provided, giving you a clear foundation for implementation.

Resources

Elastic Python MCP Server by Sunile Manjee

The current state of MCP (Model Context Protocol)

Fastmcp documentation

The Road to MCP video playlist

Building an MCP server with Elasticsearch for real health data

Search Template Documentation

How to Build a RAG Ready MCP Server with Elasticsearch Integration


Ready to build state-of-the-art search experiences?

Sufficiently advanced search isn’t achieved with the efforts of one. Elasticsearch is powered by data scientists, ML ops, engineers, and many more who are just as passionate about search as you are. Let’s connect and work together to build the magical search experience that will get you the results you want.

Try it yourself