Elastic Open Web Crawler

An intelligent, intuitive indexing tool

The fastest way to index web content into Elasticsearch on serverless, in the cloud, or on-prem

Start crawling now!

Set up and deploy a crawler for your web content with just a terminal and Elasticsearch.

  • Run Docker image

    Deploy the web crawler code on your own infrastructure by running it from source or with Docker.

  • Set URLs to crawl

    Specify one or more URLs you want to crawl.

  • Configure and connect

    Configure your crawler and connect it to Elasticsearch.

Elasticsearch — the most widely deployed vector database

Copy to try locally in two minutes

curl -fsSL https://elastic.co/start-local | sh

Take control with open code

Customize Elastic Open Web Crawler (Open Crawler) to fit your needs. Inspect, modify, and contribute to the project while handling large documents, running transformations, and retrieving data in your desired format.

Flexible and fast: The Open Crawler advantage

Benefit from unrestricted index naming and the ability to define custom mappings before crawling. Boost performance by bulk indexing crawl results into Elasticsearch instead of indexing one web page at a time.
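To make the bulk indexing point concrete: the Elasticsearch `_bulk` API accepts many documents in a single newline-delimited JSON (NDJSON) request, so a batch of crawled pages costs one round trip instead of one per page. The sketch below only builds such a request body. It is an illustration, not Open Crawler's own implementation; the index name, the document fields, and the choice of the page URL as `_id` are all assumptions.

```python
import json


def build_bulk_body(index_name, pages):
    """Serialize crawled pages into an NDJSON body for the Elasticsearch _bulk API."""
    lines = []
    for page in pages:
        # Action line: index the document on the next line into index_name.
        # Using the URL as _id is an illustrative choice, not the crawler's scheme.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": page["url"]}}))
        # Source line: the document itself.
        lines.append(json.dumps(page))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical crawl results.
pages = [
    {"url": "https://example.com/", "title": "Home", "body": "Welcome"},
    {"url": "https://example.com/docs", "title": "Docs", "body": "Read me"},
]

body = build_bulk_body("my-crawl-index", pages)
print(body, end="")
```

The resulting string could be sent as the body of one `POST /_bulk` request with the `Content-Type: application/x-ndjson` header; two pages produce four lines, one action line and one source line per page.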

Manage deployments with ease

Manage Open Crawler programmatically with simple CLI commands. Scale deployments with Terraform or Puppet, and spin them up or down as needed. Eliminate unnecessary dependencies to simplify management. Deploy it anywhere, including serverless environments, and connect it easily with small, simple tools.

  • SIMPLE

    Open code

    Work with a fully transparent and modifiable codebase on GitHub.

  • CUSTOMIZABLE

    Crawl on your terms

    Use XPath and CSS selectors to extract exactly the content you need from your pages.

  • THOROUGH

    Extract all data — including PDFs

    With binary content extraction, the data types you need, including PDFs, can be extracted and turned into searchable content.

  • SEARCHABLE

    Integrate your crawled content easily to power hybrid, conversational search experiences.
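The selector-based extraction described under "Crawl on your terms" can be pictured with a small, self-contained sketch. It uses the limited XPath support in Python's standard library on a hypothetical, well-formed page fragment. It only illustrates the idea of pulling specific fields out of a page with selectors and is not Open Crawler's configuration syntax or implementation; the markup and the field names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment. Real-world HTML is often not
# well-formed XML and would need a proper HTML parser instead.
PAGE = """
<html>
  <body>
    <h1 class="title">Release notes</h1>
    <div class="post">
      <span class="author">Ada</span>
      <p>First paragraph.</p>
      <p>Second paragraph.</p>
    </div>
  </body>
</html>
""".strip()

root = ET.fromstring(PAGE)

# XPath rule: the text of the <span> whose class attribute is "author".
author = root.find(".//span[@class='author']").text

# Rough XPath equivalent of the CSS selector "div.post > p":
# every <p> that is a direct child of <div class="post">.
paragraphs = [p.text for p in root.findall(".//div[@class='post']/p")]

# The extracted fields form the document that would be indexed.
document = {"author": author, "paragraphs": paragraphs}
print(document)
```

Only the selected fields end up in `document`; the heading and the rest of the markup are left out, which is the point of refining a crawl with selectors.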