This is a cache of https://www.elastic.co/docs/reference/elasticsearch-hadoop. It is a snapshot of the page as it appeared on 2025-11-19T03:06:17.916+0000.

Elasticsearch for Apache Hadoop

Applies to: Elastic Stack (unavailable on Serverless)

Elasticsearch for Apache Hadoop is an umbrella project consisting of two similar, yet independent, sub-projects: elasticsearch-hadoop and repository-hdfs. This documentation covers elasticsearch-hadoop. For information about repository-hdfs, which uses HDFS as a back-end repository for taking Elasticsearch snapshots and restoring from them, go to Hadoop HDFS repository plugin.

Elasticsearch for Apache Hadoop is an open-source, stand-alone, self-contained, small library that allows Hadoop jobs to interact with Elasticsearch. These jobs can use Map/Reduce directly or libraries built on top of it, such as Hive. Newer libraries, such as Apache Spark, are also supported. Think of it as a connector that lets data flow bi-directionally, so that applications can transparently leverage the Elasticsearch engine to enrich their own capabilities and improve performance.

Elasticsearch for Apache Hadoop offers first-class support for vanilla Map/Reduce and Hive, so working with Elasticsearch feels just like working with resources inside the Hadoop cluster. Elasticsearch for Apache Hadoop is therefore a passive component. Hadoop jobs use it as a library and interact with Elasticsearch through its APIs.

While the official name of the project is Elasticsearch for Apache Hadoop, the documentation uses the shorter term elasticsearch-hadoop throughout to improve readability.

Note

This document assumes the reader is already familiar with basic Elasticsearch and Hadoop concepts. For more information, refer to Elasticsearch for Apache Hadoop resources.