Elasticsearch Query Language (ES|QL) is a new piped query language that lets users chain operations together step by step. It is optimized for data analysis and runs on a new architecture designed to analyze large data volumes efficiently.
You can learn more about ES|QL in this article and the documentation.
ES|QL can return its response in different formats, such as JSON, CSV, TSV, YAML, Arrow, and binary. Starting in Elasticsearch 8.16, the Node.js client includes helpers to handle some of these formats.
This article covers the newest helpers, toArrowReader and toArrowTable, which support Apache Arrow in the Elasticsearch Node.js client. For more on helpers, check out this article.
What is Apache Arrow?
Apache Arrow is a language-agnostic, columnar in-memory data format for analytics, with implementations across the programming languages used in modern data environments.
One of the primary benefits of the Arrow format is that its binary, columnar format is optimized for very fast reads, enabling high-performance analytics calculations.
Read more about how to leverage Arrow with ES|QL in this article.
ES|QL Apache Arrow helpers
For the examples, we are going to use Elastic’s Web logs sample dataset. You can ingest it by following this documentation.
Elasticsearch client
Set up the Elasticsearch client by specifying your Elasticsearch endpoint URL and API key.
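A minimal sketch of that setup, assuming the endpoint URL and API key are supplied through environment variables. The variable names ELASTICSEARCH_ENDPOINT and ELASTICSEARCH_API_KEY and the two function names are illustrative choices, not something the client requires; node and auth.apiKey are the client's standard configuration options.

```javascript
// Build the client options from environment variables.
// ELASTICSEARCH_ENDPOINT and ELASTICSEARCH_API_KEY are hypothetical names
// chosen for this sketch; use whatever your deployment provides.
function buildClientOptions(env = process.env) {
  const node = env.ELASTICSEARCH_ENDPOINT;
  const apiKey = env.ELASTICSEARCH_API_KEY;
  if (!node || !apiKey) {
    throw new Error("Set ELASTICSEARCH_ENDPOINT and ELASTICSEARCH_API_KEY");
  }
  return { node, auth: { apiKey } };
}

// Create the client. The package is loaded inside the function so that
// the options helper above can be used (and tested) on its own.
function createClient(env = process.env) {
  const { Client } = require("@elastic/elasticsearch");
  return new Client(buildClientOptions(env));
}
```

Calling createClient() once at startup and reusing the returned client for every request is usually enough; the helper examples in this article only need that client object.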
toArrowReader
The toArrowReader helper optimizes memory use by streaming the result set in batches rather than loading it into memory all at once. This makes it possible to perform calculations on very large data sets without exhausting your system's memory.
This helper lets you process each row as its batch arrives.
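The row-by-row processing could look like the following sketch. It assumes an already configured client is passed in, and that the Web logs sample data lives in the kibana_sample_data_logs index with a numeric bytes field; the query, the function name sumBytesStreaming, and the choice to sum bytes are illustrative only.

```javascript
// Assumed query against the Web logs sample index; adjust to your data.
const READER_QUERY = `
  FROM kibana_sample_data_logs
  | KEEP @timestamp, response, bytes
  | LIMIT 1000
`;

// Streams the ES|QL result as Arrow record batches and sums the `bytes`
// column without holding the whole result set in memory.
async function sumBytesStreaming(client, query = READER_QUERY) {
  const reader = await client.helpers.esql({ query }).toArrowReader();

  let rows = 0;
  let totalBytes = 0;
  for await (const recordBatch of reader) {
    // Each record batch holds a chunk of rows.
    for (const record of recordBatch) {
      const row = record.toJSON();
      rows += 1;
      // 64-bit integers may surface as BigInt, so normalize to Number.
      totalBytes += Number(row.bytes ?? 0);
    }
  }
  return { rows, totalBytes };
}
```

Because only one batch is in scope at a time, memory use depends on the batch size rather than on the total number of rows returned.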
toArrowTable
We can use toArrowTable if we want to load all the results into an Arrow table object once the request is completed, instead of receiving the rows as a stream.
This helper is useful if your dataset will easily fit in memory and you still want to leverage Arrow’s zero-copy reads and compact transfer size while keeping the code simple.
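A short sketch of loading the whole result into a table, again assuming a configured client is passed in and that the sample data is in the kibana_sample_data_logs index. The query and the function name summarizeAsTable are hypothetical; numRows, schema.fields, and toArray() are standard properties of an Arrow JS table.

```javascript
// Assumed query against the Web logs sample index; adjust to your data.
const TABLE_QUERY = `
  FROM kibana_sample_data_logs
  | KEEP @timestamp, response, bytes
  | LIMIT 100
`;

// Loads the full ES|QL result into a single Arrow table and returns
// a small summary of it.
async function summarizeAsTable(client, query = TABLE_QUERY) {
  const table = await client.helpers.esql({ query }).toArrowTable();
  return {
    numRows: table.numRows,
    // Column names come from the Arrow schema.
    columns: table.schema.fields.map((field) => field.name),
    // Convert each row to a plain object for display.
    rows: table.toArray().map((row) => row.toJSON()),
  };
}
```

Converting every row to a plain object is convenient for inspection, but it gives up some of the benefit of the columnar layout; for analytics it is usually better to work with the table's columns directly.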
toArrowTable is also a good option if the application is already working with Arrow data, since you don't need to convert the data into another format first. In addition, because Arrow is language-agnostic, you can use it regardless of platform and language.
Conclusion
The Apache Arrow helpers provided by the Elasticsearch Node.js client simplify day-to-day tasks like analyzing large data sets efficiently and receiving Elasticsearch responses in a compact, language-agnostic format.
In this article, we learned how to use the ES|QL client helpers to parse the Elasticsearch response as an Arrow Reader or an Arrow Table.