The ultimate guide to Elasticsearch
Elasticsearch has undergone a remarkable transformation from a simple keyword search engine to a sophisticated AI-powered search platform that combines traditional lexical search with modern vector-based techniques. This evolution has positioned Elasticsearch as a comprehensive retrieval platform capable of handling diverse data types including structured, unstructured, and vector data in real-time. Today's developers need to understand not just basic indexing and querying, but also how to leverage advanced features like language analyzers, dense vectors, and hybrid search to build cutting-edge applications.
The power of Elasticsearch lies in its ability to bridge the gap between traditional keyword search (Best Match 25, or BM25) and semantic vector search (AI-powered), creating a robust foundation for modern search experiences. In this comprehensive guide, we'll explore the practical aspects of implementing and optimizing Elasticsearch for contemporary use cases, with special emphasis on indexing strategies, analyzer configuration, and the powerful combination of lexical and semantic search known as hybrid search.
Text analysis in depth
At the heart of Elasticsearch's text processing capabilities lie analyzers, components that determine how text is split into tokens, normalized, and indexed. An analyzer consists of three main components:
Zero or more character filters that preprocess the text before tokenization
Exactly one tokenizer that splits the text into tokens
Zero or more token filters that modify those tokens
Learn more about the text analysis components in the Elastic docs.
Built-in analyzers
Elasticsearch provides a rich set of built-in analyzers:
Standard Analyzer. The Standard Analyzer is Elasticsearch's default and most commonly used analyzer. It's a great all-purpose choice because it balances speed and accuracy. It performs two main operations:
Tokenization: It breaks a block of text into individual tokens (words) on word boundaries defined by the grammar-based rules of the Unicode Text Segmentation algorithm, which handles punctuation and most symbols sensibly.
Lowercasing: It converts all tokens to lowercase. This step is crucial for search, as it ensures that a query for "Apple" will match "apple" in the document and vice versa.
Simple Analyzer. The Simple Analyzer is a very basic analyzer that's useful when you need fast, simple tokenization without any frills. It operates on a simple rule: it splits text into tokens wherever it encounters a character that is not a letter. It also lowercases the tokens. For example, the text “Quick Brown Fox!” would be tokenized into quick, brown, and fox. The exclamation mark is a non-letter, so it serves as a delimiter.
Whitespace Analyzer. The Whitespace Analyzer is the most straightforward of the bunch. It performs only one task: it splits a text into tokens only on whitespace characters (like spaces, tabs, and newlines). It doesn't perform any lowercasing or punctuation removal. This analyzer is ideal for fields where you need to preserve the exact case and punctuation of the original text. For example, it's useful for parsing log files or product codes where case sensitivity matters. The text “SKU-123-A” would be tokenized into SKU-123-A as a single token.
Stop Analyzer. The Stop Analyzer builds on the Simple Analyzer, with one added feature: it removes stopwords. Stopwords are common words that are often filtered out of search queries because they add little meaning and can bloat the index size; examples include "the," "a," "is," and "and." The analyzer first tokenizes and lowercases the text (just like the Simple Analyzer), and then removes any tokens that appear on a predefined stopword list. This analyzer is useful for improving search efficiency and reducing index size, particularly for large text fields where common words are plentiful. For example, the text “The quick brown fox is fast.” would be tokenized into quick, brown, fox, and fast; the words “the” and “is” are removed.
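You can verify each of these behaviors directly with the _analyze API. For example, the following request returns the tokens quick, brown, fox, and fast:

POST /_analyze
{
  "analyzer": "stop",
  "text": "The quick brown fox is fast."
}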
Custom analyzers and plugins
Elasticsearch's built-in analyzers are great, but they are not a one-size-fits-all solution. For many use cases, you need more control over how your text is processed. This is where custom analyzers come in. A custom analyzer is a combination of up to three building blocks:
Character Filters. These are applied first. They modify the original string by adding, removing, or changing characters. A common use case is to clean up HTML tags or map specific characters.
Tokenizer. This is the core component. It breaks the text into a stream of tokens (words). You can choose from a variety of tokenizers that work differently, for example, splitting on whitespace, punctuation, or based on grammar rules.
Token Filters. These are applied last. They take the stream of tokens from the tokenizer and modify them. This is where you can perform a variety of operations like lowercasing, stemming (reducing words to their root form), or removing stopwords.
By combining these components, you can create a highly tailored analysis pipeline that precisely matches your data's requirements.
A great example of a custom analyzer is building an autocomplete feature. For "search-as-you-type" functionality, you can't rely on a standard analyzer. Instead, you need a custom one that breaks a word into smaller parts, or "prefixes," which can be matched as a user types.
Consider the following example of an edge_ngram analyzer for autocomplete:
PUT /autocomplete
{"settings":{"analysis":{"analyzer":{"autocomplete_analyzer":{"tokenizer":"edge_ngram","filter":["lowercase"]}},"tokenizer":{"edge_ngram":{"type":"edge_ngram","min_gram":2,"max_gram":20,"token_chars":["letter","digit"]}}}},"mappings":{"properties":{"suggest":{"type":"text","analyzer":"autocomplete_analyzer"}}}}
In this setup, we've created a custom analyzer named autocomplete_analyzer. The magic happens with the edge_ngram tokenizer, which breaks a word like "search" into “se,” “sea,” “sear,” “searc,” and “search.” Then, a lowercase filter ensures case-insensitivity. Finally, this custom analyzer is applied to our suggest field in the index mapping, ensuring that any text we add to this field is automatically processed and ready for instant autocomplete suggestions. This is a powerful demonstration of how combining core Elasticsearch components allows you to build sophisticated search features.
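As a quick sketch of how this index behaves, a match query on the suggest field returns prefix matches as the user types. (Note that with this mapping the same analyzer also runs on the query text; production setups often add "search_analyzer": "standard" to the field so the query itself isn't n-grammed.)

GET /autocomplete/_search
{
  "query": {
    "match": {
      "suggest": "sea"
    }
  }
}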
Learn more about custom analyzers in the Elastic docs.
Language-specific plugins
For languages that don't fit the standard tokenization model, such as those that don't use spaces between words, Elasticsearch provides specialized plugins. These plugins contain pre-configured analyzers designed to handle the specific linguistic complexities of those languages.
Japanese (analysis-kuromoji): This plugin uses a Japanese morphological analyzer to break down sentences into meaningful words, which is essential because written Japanese doesn't use spaces between words. It also handles katakana, hiragana, kanji, and romaji.
Chinese (analysis-smartcn): This plugin provides a smart tokenizer for Chinese, which also lacks word boundaries. It uses a dictionary-based approach to segment text into words and also supports both simplified and traditional Chinese characters.
Korean (analysis-nori): This plugin is a Korean morphological analyzer that breaks words down into their base forms and parts of speech. Korean is written with spaces, but it is highly agglutinative (stems fuse with particles and endings), so this kind of morphological analysis is critical for accurate search results.
Phonetic Matching (analysis-phonetic): This plugin is used for phonetic matching, meaning it helps find words that sound similar even if they are spelled differently. It does this by converting tokens into a phonetic representation (like a soundex code), which is then stored in the index. This is useful for searching for names or terms where misspellings are common. For example, a search for "Smith" could also find "Smyth."
These plugins are not bundled with Elasticsearch by default because they add significant size to the installation and are not needed by every user. To use them, you must install them on every node in your cluster using the bin/elasticsearch-plugin install command. Once installed, you can configure your index to use the new analyzers provided by the plugin.
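For example, to add the Japanese analyzer, run the following on each node and restart it:

bin/elasticsearch-plugin install analysis-kuromoji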
Vector search essentials
Vector search, also known as semantic search, represents a fundamental shift from traditional lexical matching. Instead of matching exact words, it finds content with similar meaning by representing data as dense vectors in a high-dimensional space. These vectors, called embeddings, are generated by machine learning models that capture semantic relationships between pieces of content.
Modern embedding models can represent various types of content as vectors, including text, images, events and more. Each dimension in the vector represents a feature or characteristic of the content, allowing similarity to be calculated based on semantic meaning rather than surface-level characteristics.
Approximate Nearest Neighbor (ANN) search
Searching through millions of vectors to find the closest one is computationally intensive. To solve this, vector search uses Approximate Nearest Neighbor (ANN) algorithms. These algorithms sacrifice perfect accuracy for a huge boost in performance.
Elasticsearch's chosen ANN algorithm is Hierarchical Navigable Small World (HNSW). HNSW works by building a layered graph of your vectors in which each vector (or node) is connected to its nearest neighbors. When you run a query, the search begins at an entry point in the top layer of the graph and "hops" from node to node, getting closer to the target vector with each step. This process quickly navigates to the most relevant results without having to scan every single vector.
For example, this JSON creates a vector field using HNSW parameters:
PUT /semantic-search
{"mappings":{"properties":{"embedding":{"type":"dense_vector","dims":768,"index":true,"similarity":"cosine","index_options":{"type":"hnsw","m":16,"ef_construction":128}}}}}
This example shows how to configure an index for vector search using the dense_vector data type. The dims: 768 parameter specifies the vector's size (its dimensionality), which is determined by the machine learning model used to create the embeddings. The similarity: "cosine" parameter tells Elasticsearch how to measure the "distance" between vectors, with a smaller angle between two vectors indicating higher similarity. It also demonstrates how to enable HNSW and configure its parameters:
m: This parameter controls how many connections each node has. Higher values create more connections, leading to a denser graph. This improves search accuracy but increases the time and memory needed to build the index.
ef_construction: This parameter relates to the accuracy of the graph-building process. Higher values mean the graph is built more accurately, improving search quality at the cost of slower indexing.
num_candidates: Unlike m and ef_construction, this is a query-time parameter rather than part of the mapping. It specifies how many candidates to consider on each shard before refining to the top k results, as shown in the sketch below.
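For example, a query-time kNN search against this field might look like the following sketch (the query vector is elided for brevity):

GET /semantic-search/_search
{
  "knn": {
    "field": "embedding",
    "query_vector": [0.12, -0.45, 0.88, ...],
    "k": 10,
    "num_candidates": 100
  }
}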
Implementing hybrid search
Hybrid search combines traditional full-text search with AI-powered semantic search, creating more powerful search experiences that serve a wider range of user needs. This approach is particularly effective because:
Lexical search excels when users know the exact words they're looking for.
Semantic search shines when users search for concepts or ideas not explicitly defined in documents.
Hybrid search gives you the best of both worlds by blending precision with contextual understanding.
Elasticsearch makes hybrid search easier with Reciprocal Rank Fusion (RRF), which is a rank aggregation method that merges rankings from multiple retrievers (such as BM25 and semantic).
In the following example, both a match query (lexical) and a semantic query are run. RRF fuses the results so that documents ranking high in either search are surfaced to the user.
GET semantic-embeddings/_search
{"retriever":{"rrf":{"retrievers":[{"standard":{"query":{"match":{"content":"How to avoid muscle soreness while running?"}}}},{"standard":{"query":{"semantic":{"field":"semantic_text","query":"How to avoid muscle soreness while running?"}}}}]}}}
Custom weighting strategies
For more control over how lexical and semantic scores contribute to the final ranking, you can implement custom weighting strategies using script_score queries.
In the following example, 40% weight is given to the keyword relevance (_score) and 60% weight goes to vector similarity. This custom weighting lets you fine-tune hybrid ranking for your domain.
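Here is a minimal sketch of such a query, assuming the same products index and embedding field used in the function_score example below. The + 1.0 shifts cosine similarity into a non-negative range, which script_score requires:

GET /products/_search
{
  "query": {
    "script_score": {
      "query": { "match": { "name": "gaming mouse" } },
      "script": {
        "source": "0.4 * _score + 0.6 * (cosineSimilarity(params.query_vector, 'embedding') + 1.0)",
        "params": { "query_vector": [0.22, -0.13, 0.55, ...] }
      }
    }
  }
}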
Another approach combines scores using function_score queries with various boost modes.
The following example uses boost_mode sum to combine BM25 and vector scores.
GET /products/_search
{"query":{"function_score":{"query":{"match":{"name":"gaming mouse"}},"script_score":{"script":{"source":"cosineSimilarity(params.query_vector,'embedding') + 1.0","params":{"query_vector":[0.22,-0.13,0.55, ...]}}},"boost_mode":"sum"}}}
Common boost_mode options include:
sum: Adds the scores together.
multiply: Multiplies the scores.
replace: Replaces the query score with the script score.
max/min: Takes the maximum or minimum of the scores.
Optimizing search relevance
Improving search relevance is a key goal for any search application, and Elasticsearch provides several tools for fine-tuning how documents are scored. While vector and hybrid search focus on semantic meaning, these strategies give you more control over traditional keyword-based scoring.
Field boosting: You can tell Elasticsearch that matches in a certain field are more important than others. For example, a keyword match in a product's title might be more relevant than a match in its description. You do this by adding a boost value to the field in your query. A boost of 2.0, as shown in the first example after this list, means that any document matching "mouse" in the title field will get twice the normal score for that term.
Field weighting in multi_match: This is a powerful feature for queries that search across multiple fields at once. Instead of a simple boost, you can apply a weight to each field directly. In the multi_match example below, the name field has a weight of ^3, meaning a match in the product's name is three times more important than a match in the description. This gives you precise control over which fields contribute most to the final relevance score.
Minimum should match: This setting helps you manage the strictness of your queries. When a user enters multiple words, Elasticsearch by default returns documents that contain any of those words. By using minimum_should_match, you can enforce that a certain number or percentage of the user's terms must be present for a document to be considered a match. For example, setting it to 75% on a two-word query like "gaming mouse" would require at least one word to be present, while on a four-word query, it would require at least three words to be present.
Rescoring: This is an advanced technique that's perfect for hybrid search. A typical search might return thousands of results, but a user only cares about the top few. Rescoring works by first running a fast query (like a traditional BM25 search) to find the top N results. Then, a second, more computationally intensive scoring method, such as vector similarity, is applied only to those top N results to re-rank them, as sketched in the rescore example below. This gives you the speed of a fast query combined with the high accuracy of a more complex one, without the performance hit of applying the complex scoring to the entire dataset.
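The following sketches illustrate these techniques against a hypothetical products index. First, field boosting with a boost of 2.0 on the title field:

GET /products/_search
{
  "query": {
    "match": {
      "title": { "query": "mouse", "boost": 2.0 }
    }
  }
}

Next, field weighting combined with minimum_should_match in a multi_match query:

GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "wireless gaming mouse pad",
      "fields": ["name^3", "description"],
      "minimum_should_match": "75%"
    }
  }
}

Finally, a rescore sketch that re-ranks the top 50 BM25 hits by vector similarity (the embedding field and query vector are placeholders):

GET /products/_search
{
  "query": { "match": { "name": "gaming mouse" } },
  "rescore": {
    "window_size": 50,
    "query": {
      "rescore_query": {
        "script_score": {
          "query": { "match_all": {} },
          "script": {
            "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
            "params": { "query_vector": [0.22, -0.13, 0.55, ...] }
          }
        }
      },
      "query_weight": 1.0,
      "rescore_query_weight": 1.0
    }
  }
}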
Performance and evaluation
When working with vector or hybrid search in Elasticsearch, performance tuning is critical to ensure queries remain responsive at scale.
Query optimization. Use filters instead of queries for simple yes/no conditions (for example, status: active). Filters don't compute relevance scores, making them much faster; see the sketch after this list.
Index partitioning. Split large datasets into multiple indices (for example, by time range or category) and use aliases to query across them. This reduces the search space and improves speed.
Caching strategies. Elasticsearch automatically caches frequent filters and query results. Designing queries to take advantage of caching (for example, stable filters) can yield big performance wins.
Hardware allocation. Vector search, especially with HNSW graphs, is memory-intensive. Ensure sufficient RAM and allocate enough file system cache for fast graph traversal. SSDs are strongly recommended.
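As a sketch of the first point, moving a status condition into a bool query's filter clause keeps it out of scoring and makes it cacheable (index and field names are illustrative):

GET /products/_search
{
  "query": {
    "bool": {
      "must": { "match": { "name": "gaming mouse" } },
      "filter": [
        { "term": { "status": "active" } }
      ]
    }
  }
}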
Once your search system is live, measuring search quality is as important as measuring speed. Elasticsearch provides tools for both.
Ranking Evaluation API. Automates query testing by checking where relevant documents appear in results; see the example after this list.
Mean Reciprocal Rank (MRR). Prioritizes returning relevant results at the very top. A document at position 1 scores 1.0, at position 2 scores 0.5, and so on.
Mean Average Precision (MAP). Evaluates how well relevant documents are distributed across the result set.
Normalized Discounted Cumulative Gain (NDCG). Weighs highly ranked relevant documents more heavily, rewarding systems that surface useful results earlier.
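For example, here is a minimal Ranking Evaluation API request that computes MRR for a single test query (the index, query, and document IDs are hypothetical):

GET /products/_rank_eval
{
  "requests": [
    {
      "id": "gaming_mouse_query",
      "request": { "query": { "match": { "name": "gaming mouse" } } },
      "ratings": [
        { "_index": "products", "_id": "prod_0", "rating": 1 }
      ]
    }
  ],
  "metric": {
    "mean_reciprocal_rank": { "relevant_rating_threshold": 1 }
  }
}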
Bulk operations for efficiency
Elasticsearch's Bulk API allows you to perform multiple index, create, delete, and update operations in a single request, significantly reducing overhead and increasing indexing speed. Performing multiple operations at once is particularly valuable when dealing with large datasets or regular updates.
The Bulk API uses a newline-delimited JSON (NDJSON) structure, where each operation is specified with up to two lines: one for the action and its metadata, and an optional second line carrying the source document (for index and create operations).
You can use curl to submit a bulk request, as in the following sketch.
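This sketch assumes an unsecured local cluster on localhost:9200 and a hypothetical products index. Note the two-line structure: index and update each carry a second line, while delete does not.

curl -s -H "Content-Type: application/x-ndjson" -X POST "http://localhost:9200/_bulk" --data-binary @- <<'EOF'
{ "index": { "_index": "products", "_id": "1" } }
{ "name": "Wireless Gaming Mouse", "category": "peripherals" }
{ "update": { "_index": "products", "_id": "1" } }
{ "doc": { "category": "accessories" } }
{ "delete": { "_index": "products", "_id": "2" } }
EOF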
To update multiple documents matching specific conditions, use the Update By Query API. For example, you can add a new alias to all documents matching a specific tag (this assumes aliases is an array field):
POST /myindex/_update_by_query
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.aliases.add(params.new_alias)",
    "params": { "new_alias": "new_value" }
  },
  "query": {
    "term": { "tag": "existing_tag" }
  }
}
Best practices for bulk operations
Consider these best practices when implementing bulk operations:
Batch size: Experiment with different batch sizes to find the optimal setting for your workload.
HTTP limits: Remember that Elasticsearch limits HTTP request size to 100MB by default.
Error handling: Check responses for individual operation failures within bulk requests.
Client helpers: Use official client helpers for bulk operations in various programming languages.
Practical implementation guide using Python
In this section, we’ll get hands-on with Elasticsearch using Python. To simplify development, Elasticsearch offers a dedicated Python SDK that streamlines tasks such as connecting to your cluster, creating indices, indexing documents, performing hybrid and regex searches, and ingesting data with embeddings. Each example builds on earlier concepts, showing how to translate theory into working code.
Set up a secure connection
The first step is establishing a secure connection to your Elasticsearch cluster. Here’s how you can connect using environment variables for credentials:
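The following is a minimal sketch using the official elasticsearch Python client; the ES_URL and ES_API_KEY environment variable names are illustrative:

import os
from elasticsearch import Elasticsearch

# Credentials come from the environment rather than being hard-coded.
es_client = Elasticsearch(
    os.environ["ES_URL"],  # for example, "https://localhost:9200"
    api_key=os.environ["ES_API_KEY"],
)
print(es_client.info())  # verify the connection

With the connection in place, we can create a product index. All field and analyzer names below are illustrative, and dims must match your embedding model (384 matches the intfloat/multilingual-e5-small model used later in this article):

index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "name_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "category": {"type": "keyword"},
            "name": {"type": "text", "analyzer": "name_analyzer"},
            "description_vector": {
                "type": "dense_vector",
                "dims": 384,
                "index": True,
                "similarity": "cosine"
            }
        }
    }
}
es_client.indices.create(index="products", **index_body)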
In this code, we’re combining the best of both worlds: a keyword field (category) for exact matches, a custom analyzer for full-text search on name, and a dense vector field for semantic similarity searches.
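Next is a sketch of loading a few documents into that index. The vectors here are placeholders; in practice, generate them with an embedding model, as in the ingestion pipeline at the end of this section.

products = [
    {"category": "peripherals", "name": "Wireless Gaming Mouse"},
    {"category": "peripherals", "name": "Mechanical Keyboard"},
]
for i, product in enumerate(products):
    product["description_vector"] = [0.1] * 384  # placeholder embedding
    es_client.index(index="products", id=f"prod_{i}", document=product)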
This snippet shows how to index structured documents, including a vector field for semantic search. Think of it as loading your product catalog into Elasticsearch.
Implement hybrid search
Now let’s run a hybrid search that combines keyword relevance with vector similarity.
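Here is a sketch of that query, assuming the products index above and the same embedding model at query time:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-small")
query_text = "gaming mouse"
query_vector = model.encode(query_text).tolist()

hybrid_query = {
    "script_score": {
        "query": {"match": {"name": query_text}},
        "script": {
            # 40% BM25 relevance, 60% vector similarity (shifted to stay non-negative).
            "source": "0.4 * _score + 0.6 * (cosineSimilarity(params.query_vector, 'description_vector') + 1.0)",
            "params": {"query_vector": query_vector},
        },
    }
}
response = es_client.search(index="products", query=hybrid_query)
for hit in response["hits"]["hits"]:
    print(f"{hit['_score']:.3f}  {hit['_source']['name']}")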
In this code, BM25 text relevance (match on name) gets 40% weight and semantic similarity on the description_vector gets 60%. The output shows a balanced ranking.
Implement regex search for Japanese content
Sometimes, you need more specialized queries—like finding documents containing Japanese characters using regex.
def search_japanese_with_regex(es_client, index_name, field_name):
    # Match one or more characters from the CJK Unified Ideographs block (kanji).
    japanese_regex = "[\u4E00-\u9FFF]+"
    query_body = {
        "query": {
            "regexp": {
                field_name: {"value": japanese_regex}
            }
        },
        "size": 5
    }
    response = es_client.search(index=index_name, body=query_body)
    print(f"Found {response['hits']['total']['value']} documents with Japanese text.")
    for i, hit in enumerate(response['hits']['hits'], 1):
        print(f"\n--- Result {i} ---")
        print(f"ID: {hit['_id']}")
        print(f"Content: {hit['_source'][field_name]}")
This example demonstrates how regex queries can be applied for language-specific searches. Here, the Unicode range matches Japanese Kanji characters.
Implement an ingestion pipeline with embeddings
Finally, let’s walk through a full ingestion pipeline: loading data from CSV, generating embeddings with a transformer model, and then bulk indexing them.
from sentence_transformers import SentenceTransformer
from elasticsearch.helpers import bulk
import pandas as pd

INDEX_NAME = 'my_generic_vector_index'
CSV_FILE_PATH = 'your_data.csv'

# Load the embedding model and the source data.
embedding_model = SentenceTransformer('intfloat/multilingual-e5-small')
df = pd.read_csv(CSV_FILE_PATH)

# Build one bulk action per row, embedding the text column on the fly.
actions = []
for idx, row in df.iterrows():
    text_content = str(row['your_text_column'])
    vector_embedding = embedding_model.encode(text_content).tolist()
    doc = {
        "_index": INDEX_NAME,
        "_id": f"doc_{idx}",
        "_source": {
            "text_content": text_content,
            "embedding": vector_embedding,
            "metadata_field_1": str(row['metadata_column_1']),
            "metadata_field_2": int(row['metadata_column_2'])
        }
    }
    actions.append(doc)

# Send everything in one bulk call (es_client is the connection created earlier).
success, _ = bulk(es_client, actions, request_timeout=120)
print(f"Successfully ingested {success} records into {INDEX_NAME}.")
This example shows how to bring it all together: load your dataset, generate embeddings using a SentenceTransformer and index documents efficiently with Elasticsearch’s bulk API. This is the foundation for any production-grade semantic search system.
Conclusion
Through implementing Elasticsearch in various scenarios, several important lessons emerge:
Analyzer selection is critical: Multilingual search requires proper tokenization and language-specific processing.
Hybrid search requires tuning: Simply combining keyword and vector search isn't enough. The weighting and boost modes significantly impact relevance.
Vector parameters affect performance: HNSW parameters (m, ef_construction, num_candidates) dramatically impact both speed and accuracy.
Combination is powerful: Filters plus hybrid search deliver the best results for real-world applications.
Consider these recommendations as you implement Elasticsearch in your applications:
Start with clear requirements: Understand your use case before choosing between lexical, semantic, or hybrid search.
Implement iterative testing: Use A/B testing and relevance evaluation to refine your approach.
Monitor performance: Keep an eye on both latency and accuracy metrics as your data evolves.
Stay updated: Elasticsearch's vector capabilities are rapidly evolving, so you need to keep abreast of new features and best practices.
Elasticsearch has transformed from a simple search engine into a powerful AI-powered platform capable of handling diverse search requirements. By mastering indexing, analyzers and hybrid search techniques, developers can build sophisticated search experiences that combine the precision of keyword search with the contextual understanding of semantic search. As the platform continues to evolve, those who invest in understanding these capabilities will be well-positioned to create the next generation of search applications.
Next steps
Explore using Elasticsearch for retrieval-augmented generation (RAG) in the related articles and tutorials on IBM Developer.