
Ingesting data with BigQuery

Learn how to index and search Google BigQuery data in Elasticsearch using Python.

BigQuery is a Google platform that lets you centralize data from different sources and services into a single repository, and run data analysis, GenAI, and ML workloads on it. You can bring data into BigQuery from sources such as Google Drive and Google Cloud Storage, or by uploading local files.

Indexing data from these sources into Elasticsearch lets you centralize your data for a better observability experience.

In this article, you'll learn how to index data from BigQuery into Elasticsearch using Python, enabling you to unify data from different systems for search and analysis.

You can use the example from this article in this Google Colab notebook.

Steps

  1. Prepare BigQuery
  2. Configure the BigQuery Python client
  3. Index data to Elasticsearch
  4. Search data

Prepare BigQuery

To use BigQuery, you need to access the Google Cloud console and create a project. Once done, you'll be redirected to the project dashboard.

BigQuery allows you to transfer data from Google Drive and Google Cloud Storage, and to upload local files. To upload data to BigQuery, you must first create a dataset. Create one and name it "server_logs" so we can upload some files.

For this article, we'll upload a local dataset containing server logs. Check BigQuery's official documentation to learn how to upload local files.

Dataset

The file we will upload to BigQuery contains server log data with HTTP responses and their descriptions, in NDJSON format. The NDJSON file includes these fields: ip_address, _timestamp, http_method, endpoint, status_code, response_time, and status_code_description.

BigQuery will extract data from this file. Then, we'll consolidate it with Python and index it to Elasticsearch.

Create a file named logs.ndjson and populate it with the following:
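The original post ships a sample file; here are a few illustrative lines with made-up values that match the fields listed above:

```
{"ip_address": "192.168.1.15", "_timestamp": "2024-10-01T10:15:32Z", "http_method": "GET", "endpoint": "/api/users", "status_code": 200, "response_time": 120, "status_code_description": "OK"}
{"ip_address": "10.0.0.7", "_timestamp": "2024-10-01T10:16:05Z", "http_method": "POST", "endpoint": "/api/orders", "status_code": 404, "response_time": 95, "status_code_description": "Not Found: the requested resource does not exist on the server"}
{"ip_address": "172.16.0.3", "_timestamp": "2024-10-01T10:17:44Z", "http_method": "GET", "endpoint": "/api/products", "status_code": 500, "response_time": 510, "status_code_description": "Internal Server Error: the server encountered an unexpected condition"}
```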

Upload this file to the dataset we've just created (shown as "server_logs") and use "logs" as the table name (shown as "Table ID").

Once you're done, the logs table will appear under the server_logs dataset.

Configure the BigQuery Python client

Below, we'll learn how to use the BigQuery Python client and Google Colab to build an app.

1. Dependencies

First, we must install the following dependencies:
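Assuming a Colab or Jupyter notebook, the install cell would look like this (getpass ships with the Python standard library, so only the first two packages need installing):

```python
%pip install google-cloud-bigquery elasticsearch
```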

The google-cloud-bigquery dependency provides the tools to consume BigQuery data, elasticsearch lets us connect to Elastic and index the data, and getpass (part of the Python standard library) lets us enter sensitive variables without exposing them in the code. Let's import all the necessary dependencies:
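A minimal import block covering everything used below:

```python
from getpass import getpass

from elasticsearch import Elasticsearch, helpers
from google.cloud import bigquery
```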

We also need to declare other variables and initialize the Elasticsearch client for Python:
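A setup sketch, assuming an Elastic Cloud endpoint and API key (the variable names are illustrative, and the project ID is a placeholder for your own):

```python
ELASTIC_ENDPOINT = getpass("Elasticsearch endpoint: ")
ELASTIC_API_KEY = getpass("Elastic API key: ")
PROJECT_ID = "your-google-cloud-project-id"  # replace with your project ID

# Initialize the Elasticsearch Python client
es_client = Elasticsearch(
    ELASTIC_ENDPOINT,
    api_key=ELASTIC_API_KEY,
)
```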

2. Authentication

To get the necessary credentials to use BigQuery, we'll use Colab's auth module. Run the cell below and choose the same account you used to create the Google Cloud project:
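```python
from google.colab import auth

# Opens a prompt to authenticate with your Google account
auth.authenticate_user()
```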

Now, let's see the data in BigQuery:
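A minimal query sketch that pulls every row from the logs table into a list of dictionaries (the fully qualified table name assumes the dataset and table created earlier):

```python
client = bigquery.Client(project=PROJECT_ID)

query = f"SELECT * FROM `{PROJECT_ID}.server_logs.logs`"
rows = client.query(query).result()

# Convert each BigQuery Row into a plain Python dict
logs_data = [dict(row) for row in rows]
print(logs_data[:1])
```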

Each BigQuery Row is converted into a plain Python dictionary, giving us a list of log entries.

With this simple code, we've extracted the data from BigQuery. We've stored it in the logs_data variable and can now use it with Elasticsearch.

Index data to Elasticsearch

We'll begin by defining the data structure from the Kibana Dev Tools console:
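A mapping sketch along these lines, with status_code_description stored as match_only_text (the field names follow the dataset above; the other field types are reasonable assumptions you can adjust):

```
PUT bigquery-logs
{
  "mappings": {
    "properties": {
      "ip_address": { "type": "ip" },
      "_timestamp": { "type": "date" },
      "http_method": { "type": "keyword" },
      "endpoint": { "type": "keyword" },
      "status_code": { "type": "integer" },
      "response_time": { "type": "integer" },
      "status_code_description": { "type": "match_only_text" }
    }
  }
}
```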

The match_only_text field is a variant of the text field type that saves disk space by not storing the metadata needed to calculate relevance scores. We use it because logs are usually time-centric, i.e. the date matters more than the match quality in the text field. Queries that use a text field are compatible with those that use a match_only_text field.

We'll index the documents using the Elasticsearch bulk API:
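A bulk-indexing sketch using the Python client's helpers.bulk, assuming the logs_data list from the previous step:

```python
def generate_docs(logs):
    # One bulk action per log entry, targeting the bigquery-logs index
    for log in logs:
        yield {"_index": "bigquery-logs", "_source": log}

success, errors = helpers.bulk(es_client, generate_docs(logs_data), raise_on_error=False)
print(f"Indexed {success} documents, {len(errors)} errors")
```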

Search data

We can now run queries using the data from the bigquery-logs index.

For this example, we'll run a search using the server's error descriptions (the status_code_description field). In addition, we'll sort the hits by date and get the IP addresses that produced the errors:
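A query sketch using the Python client (the simple match on the description field is an assumption; the original post may phrase the query differently):

```python
response = es_client.search(
    index="bigquery-logs",
    query={"match": {"status_code_description": "error"}},
    sort=[{"_timestamp": {"order": "desc"}}],
    source=["_timestamp", "ip_address", "status_code", "status_code_description"],
)

for hit in response["hits"]["hits"]:
    print(hit["_source"])
```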

The response contains the matching error entries, sorted by date, each with the IP address that produced it.

Conclusion

Tools like BigQuery, which help to centralize information, are very useful for data management. In addition to search, using BigQuery with Elasticsearch allows you to leverage the power of ML and data analysis to detect or analyze issues in a simpler and faster way.

Want to get Elastic certified? Find out when the next Elasticsearch Engineer training is running!

Elasticsearch is packed with new features to help you build the best search solutions for your use case. Dive into our sample notebooks to learn more, start a free cloud trial, or try Elastic on your local machine now.
