A fully managed service is a convenient way to run software, but depending on the use case, it might not always be the best approach. As an alternative, Kubernetes (K8s) brings cloud-native advantages to both on-premises and cloud-hosted container platforms.
Elastic, on the other hand, offers comprehensive tools for securing and monitoring cloud-native environments. In this blog, I would like to flip that perspective: considering Elastic itself as a distributed, cloud-native application, and applying DevOps best practices to its deployment.
This blog reflects my experience deploying a small Elastic cluster leveraging Elastic Cloud on Kubernetes (ECK) and Google Kubernetes Engine (GKE). It is not intended to be an installation guide, but rather a way to share the insights gained along the way.
The technologies involved in this exercise include:
- Terraform: Infrastructure-as-code to provision and manage cloud resources.
- GKE: Container-as-a-service offering on Google Cloud Platform (GCP).
- Kubernetes: The de facto standard container orchestrator for running containerized applications; in this case, our containerized Elastic Stack.
- ECK: Elastic Cloud on Kubernetes, which extends the Kubernetes APIs with an operator and custom resources to run Elasticsearch as a cloud-native application.
The approach presented here is fully declarative. All manifests and files are available on GitHub.
1. Pre-requisites
To follow along, a few tools and accounts need to be available beforehand; their installation and setup are outside the scope of this blog. For the Terraform part:
- Terraform and the google-cloud-sdk (gcloud) must be installed on the local machine.
- A GCP account and, ideally, a Service Account—acting as a technical user—with at least the necessary permissions to create a VPC, provision a K8s cluster, spin up VMs on Compute Engine, and create a DNS record.
Note: I recommend following the principle of least privilege when assigning the GCP permissions. Whenever possible, avoid assigning broader roles and instead grant only the capabilities that are strictly necessary.
It does not necessarily have to be a GKE cluster. The instructions in the Kubernetes section work for any other Kubernetes infrastructure, whether on-prem or cloud-hosted. For this part, the only additional precondition is to have kubectl installed on the local machine.
2. Architecture overview
The diagram below illustrates the elements being deployed: Elasticsearch and Kibana, an Elastic Agent monitoring the same Kubernetes infrastructure everything is running on, and a Fleet Server centrally managing the Elastic Agent configuration.
All resources can be grouped into three categories:
- The Cloud infrastructure (depicted in BLUE), deployed via Terraform: a K8s Cluster with its Node Pool and all necessary Networking components to make our Kibana reachable from the outside world.
- The Elastic Resources (depicted in GREEN), deployed as Custom Resources via K8s manifests: Elasticsearch, Kibana, Elastic Agent and Fleet—with the corresponding Services and Persistent Volumes.
- Additional K8s resources (depicted in RED): the Ingress Controller and the Certificate Manager, deployed via Terraform and Helm Charts, and the Cluster Issuer, deployed via K8s manifest.

3. Cloud infrastructure
The cloud infrastructure featured in this demo will be hosted on GCP, leveraging Google Kubernetes Engine (GKE) as the container platform.
Feel free to skip this section if you already have your K8s cluster up and running, or if you prefer to set up your own cluster locally or with any other cloud provider.
Terraform is an open-source infrastructure-as-code (IaC) tool used to define and provision infrastructure, in our case a Kubernetes cluster and networking resources.
The Terraform sample files are available in the GitHub repository, which follows a modular Terraform structure: the main.tf in the main terraform directory defines which modules to apply. In this demo, just two of them are used:
- GKE: Provisions the GKE cluster and related resources.
- Helm: Deploys essential K8s utilities using Helm charts.
In the main.tf for the GKE module, the following cloud resources are defined:
- A VPC (Virtual Private Cloud) network and a subnet.
- A public static IP for exposing the Kibana service outside the K8s cluster.
- A K8s cluster with a custom node pool, allowing us to specify which hardware to use, among other settings. Since we manage our own node pool, the default node pool is removed immediately upon creation.
- A separately managed node pool with three worker nodes, so we can distribute our Elastic cluster nicely across three different availability zones. Defining a region with node_count = 1 will deploy one node in each availability zone of the region.
- A DNS record of type A, which will point to the public IP address.
Through the main.tf for the Helm module, the following applications are installed:
- Ingress Nginx Controller: For managing SSL termination and routing external requests to the Kibana service.
- Certificate Manager: To issue trusted certificates, signed by the Let's Encrypt certificate authority, for the Kibana endpoint.
- Kube-state-metrics: A utility to expose telemetry data from the GKE cluster.
An interesting aspect of this setup is that some of the outputs from the GKE module are passed directly as input parameters to the Helm module. For example, the new public IP address is used as an argument for the Ingress Nginx Controller created just afterwards. This is why the installation of these resources is managed directly by Terraform, rather than including them in the K8s manifests deployed later.
Note: this configuration pins known stable versions (ingress-nginx v1.10.1 and cert-manager v1.13.3) to avoid breaking changes introduced in later releases. For production setups, it's recommended to always check for newer versions.
Before running Terraform to create the resources, the required parameters defined in the variables.tf file, located in the terraform directory, must be provided: either by creating a var file (e.g., myvars.tfvars) to include them, or by passing each parameter directly as an argument when invoking the terraform apply command.
Once the variables are declared, still in the terraform folder, initialize Terraform:
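```bash
# downloads the providers and sets up the modules
terraform init
```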
Then apply the configuration:
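```bash
# the var file name is just the example from above; adjust it to yours
terraform apply -var-file="myvars.tfvars"
```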
Confirm with “yes” when prompted and go grab a coffee—this will take a few minutes, especially the creation of the node pool.
Once Terraform completes successfully, initialize the Google Cloud SDK:
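```bash
gcloud init
```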
Then configure gcloud to use the credentials of the new Kubernetes cluster, enabling kubectl access:
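```bash
# the cluster name and region are read from the Terraform outputs
gcloud container clusters get-credentials \
  "$(terraform output -raw gke_cluster_name)" \
  --region "$(terraform output -raw gke_cluster_region)"
```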
This command works because gke_cluster_name and gke_cluster_region are defined as Terraform outputs.
After this, the kubectl command-line tool is configured to interact with the shiny new GKE cluster:
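```bash
kubectl cluster-info
```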
The GCP console should look similar to this:

4. Kubernetes
Kubernetes is the open-source container orchestration system used to automate the deployment, scaling, and management of containerized applications. The project is hosted by the Cloud Native Computing Foundation.
Interacting with the Kubernetes cluster requires the kubectl command-line interface tool, which can be used in two ways:
- The imperative way: you send commands to the K8s control plane, telling it exactly what to do.
- The declarative way: you define the desired cluster state in a YAML file, called a manifest, and the control plane takes care of performing all the steps needed to reach that state.
Embracing a declarative approach is a best practice for a cloud-native mindset: it promotes idempotency, easier version control, and automation. Let’s put it into practice by exploring the cluster’s worker nodes:
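```bash
# list the worker nodes together with the availability zone each one runs in
kubectl get nodes --label-columns topology.kubernetes.io/zone
```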
As intended, the Kubernetes cluster consists of three worker nodes evenly distributed across the region’s three availability zones. This setup is perfect for running a small Elasticsearch cluster with three nodes.
5. Elastic Cloud on Kubernetes
At this stage, a running Kubernetes cluster is available and ready to host our cloud-native Elastic deployment. The recommended way to run Elastic as a containerized application is to use Elastic Cloud on Kubernetes (ECK).
5.1. Custom Resource Definitions
Custom Resources are extensions of the Kubernetes API. Custom Resource Definitions (CRDs) tell the K8s controller how to manage Custom Resources as K8s objects. Here, CRDs are what allow the Elastic Stack components to be managed as standard K8s resources.
To install the CRDs, navigate to the k8s directory and apply the manifest using kubectl:
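```bash
# the file name is an assumption; check the actual manifest name in the k8s directory
kubectl create -f crds.yaml
```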
5.2. ECK Operator
Operators are software extensions to Kubernetes that use custom resources to manage applications and their components.
Built on the Kubernetes Operator pattern, ECK extends the basic Kubernetes orchestration capabilities to support the setup and management of Elasticsearch, Kibana, APM Server, Enterprise Search, Beats, Elastic Agent, Elastic Maps Server, and Logstash on Kubernetes.
To install the ECK operator, apply the manifest using kubectl, as done previously for the CRDs:
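```bash
# again, the file name is an assumption; check the repository
kubectl apply -f operator.yaml
```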
The ECK operator runs in the elastic-system namespace. We can take a look at the running pods in this namespace to check whether the operator is up and running:
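```bash
kubectl get pods -n elastic-system
```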
To verify that everything worked correctly, we can also take a look at the operator logs:
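```bash
# in a default ECK install, the operator runs as a StatefulSet named elastic-operator
kubectl logs -n elastic-system statefulset/elastic-operator
```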
5.3. Deploying Elasticsearch
At this point, the ECK operator is running, and the CRDs are installed on the Kubernetes cluster. Now it’s time to deploy Elasticsearch and Kibana.
To keep things neat and organized, I personally like using Kustomize, a tool that lets you customize Kubernetes objects through a kustomization.yaml file.
The kustomization.yaml inside the k8s/elasticsearch/base directory includes a base Elastic installation and references the Elasticsearch and Kibana manifests.
Note: Manifests can be written from scratch, adapted from this example, or used simply as a reference. I recommend checking the official ECK documentation.
Let's walk through the most relevant aspects of the elasticsearch.yaml manifest.
Object config (K8s):
A resource of type Elasticsearch in version 9.1.2 (the latest at the time of this writing) is specified. The type of the service is ClusterIP since the Elasticsearch API is not exposed to the outside.
Note: In a production environment, a top-level Kustomize overlay can be used to centrally override the spec.version argument.
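A minimal sketch of this part of the manifest, assuming the resource is simply named elasticsearch (check the actual manifest in the repository):

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch   # assumed name
spec:
  version: 9.1.2        # overridable via a Kustomize overlay, as noted above
  http:
    service:
      spec:
        type: ClusterIP # the Elasticsearch API is not exposed externally
```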
Resource config (Elasticsearch):
A node set with three nodes is declared. Each node will have all roles (master, data, and ml), so there’s no need to specifically declare the node type.
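As a fragment of the same manifest, this might look like (the node set name is arbitrary):

```yaml
  nodeSets:
    - name: default
      count: 3
      # no node.roles specified: every node takes all roles (master, data, ml, ...)
```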
Pod template config:
For each container, the resource requests and limits must be defined, as Kubernetes best practices dictate. For this small use case, each Elasticsearch node is assigned 2 GB of RAM and 1 virtual core (with a CPU limit of 2 to ease garbage collection).
For managing storage, we use Persistent Volumes. The cloud provider offers Storage Classes, and through a Persistent Volume Claim, a pod can request one. That’s exactly what we are doing here: each Elasticsearch node is claiming a volume with 2 GB capacity.
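A sketch of the corresponding fragment, continuing the node set above (elasticsearch-data is ECK's default data volume claim name):

```yaml
      podTemplate:
        spec:
          containers:
            - name: elasticsearch
              resources:
                requests:
                  memory: 2Gi
                  cpu: 1
                limits:
                  memory: 2Gi
                  cpu: 2   # higher CPU limit to ease garbage collection
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 2Gi
```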
According to best practices, distributed apps should remain as stateless as possible. Kubernetes inherently embraces most of these principles, providing mechanisms to support them. But what about databases and data stores like Elasticsearch?
A data store is stateful by definition. Kubernetes handles this through StatefulSets. ECK defines an Elasticsearch deployment as a StatefulSet to ensure that an Elasticsearch node is always scheduled onto the same worker node, where its local storage is attached.
Going further, ECK applies anti-affinity rules to avoid scheduling multiple Elasticsearch pods on the same worker node, which keeps a primary shard and its replica on separate machines and ensures data redundancy in case of a machine failure.
Note: Elastic also allows you to extend the declarative approach beyond cluster deployment through Elastic Stack Configuration Policies, enabling configuration-as-code for cluster settings, lifecycle management, ingest pipelines, secure settings, and more.
5.4. Deploying Kibana
The kibana.yaml manifest is structured in an analogous manner. Let's take a look at the relevant parts of the configuration.
Object config (K8s):
We specify a single instance of a resource of type Kibana in version 9.1.2 (the latest at the time of this writing). Declaring the elasticsearchRef lets Kibana know which Elasticsearch instance to connect to without having to specify the endpoint.
The type of service is ClusterIP: the Kibana service is not exposed directly, but through the Ingress Controller, as explained below. Ingress is a Kubernetes mechanism for exposing HTTP(S) services. TLS is disabled, since I prefer to do SSL termination directly at the Ingress rather than at the application layer.
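A minimal sketch of this part of the manifest, under the same naming assumptions:

```yaml
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana            # assumed name
spec:
  version: 9.1.2
  count: 1
  elasticsearchRef:
    name: elasticsearch   # must match the Elasticsearch resource name
  http:
    service:
      spec:
        type: ClusterIP   # exposed only through the Ingress
    tls:
      selfSignedCertificate:
        disabled: true    # SSL termination happens at the Ingress
```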
Pod template config:
As with Elasticsearch, the resource requests and limits are defined. For the single Kibana instance in this demo, 1 GB of RAM and 1 CPU (virtual core) are enough.
Regarding persistence, Kibana is stateless: we don't need to care about storage or about pinning the container to a particular worker node. ECK therefore deploys Kibana as a regular Deployment.
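The pod template fragment might look like this (a sketch):

```yaml
  podTemplate:
    spec:
      containers:
        - name: kibana
          resources:
            requests:
              memory: 1Gi
              cpu: 1
            limits:
              memory: 1Gi
              cpu: 1
```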
Considering all the correct settings for the manifests takes time and effort; however, deploying Elasticsearch and Kibana from the existing manifests is as fast as this:
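```bash
# -k applies the kustomization found in the given directory
kubectl apply -k k8s/elasticsearch/base
```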
When listing the running pods, there should be three Elasticsearch nodes, each running on a different worker node, plus a Kibana instance:
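```bash
# -o wide shows the worker node each pod runs on
kubectl get pods -o wide
```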
Since Elasticsearch is a custom resource, it’s possible to list the “Elasticsearches”:
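```bash
kubectl get elasticsearch
```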
And also the “Kibanas”:
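```bash
kubectl get kibana
```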
In this demo, external access is managed exclusively by the Ingress Controller, and Kibana is configured as the only internet-facing component. Cert-manager and Let’s Encrypt are used to automatically issue trusted TLS certificates for the Ingress Controller.
Note: Alternatively, it would be possible to rely on the Load Balancer resources provided by the cloud provider by setting the service type of Kibana (or Elasticsearch) to LoadBalancer.
The featured Ingress Controller is ingress-nginx, which was installed via Terraform in the first part of this blog. The Ingress resource configuration is defined in the ingress.yaml file inside the k8s/elasticsearch/ingress directory.
Note: the ingress.yaml.sample file in GitHub contains a placeholder. It must be edited to point to the actual domain hosting Kibana and renamed to ingress.yaml.
Apply the Ingress and Issuer manifests in the k8s/elasticsearch/ingress directory:
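```bash
kubectl apply -f k8s/elasticsearch/ingress/
```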
To get the password for the default “elastic” user, run the following command:
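```bash
# assuming the Elasticsearch resource is named elasticsearch;
# ECK stores the password in a secret called <name>-es-elastic-user
kubectl get secret elasticsearch-es-elastic-user \
  -o go-template='{{.data.elastic | base64decode}}'
```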
After a minute, you should be able to access your Kibana instance via the browser:

As always, Kibana is the user-facing component tying everything together.
5.5. Fleet and the Elastic Agent
Ingesting data into Elastic normally requires an Elastic Agent (or Beats, Logstash, or a combination of them). In this demo, Kubernetes logs and metrics are ingested into Elasticsearch via the Kubernetes integration. Fleet provides a central hub in Kibana for the agent configuration.
Note: As an alternative, the agent configuration could be defined directly in a ConfigMap instead of using Fleet.
To allow the Kubernetes integration to gather cluster data, kube-state-metrics must be installed. In this blog, it was installed via a Helm chart included in the Terraform Helm module when provisioning the cloud infrastructure.
Note: The Kubernetes integration is featured to illustrate an example of the configuration and deployment of the Elastic Agent. Kubernetes Observability is out of the scope of this particular blog.
The kustomization.yaml for Fleet and the Elastic Agent is a little different, introducing the concept of an overlay: on top of the running base installation, new resources are deployed and existing resources are patched.
Through the kibana-patch.yaml, the manifest of the running Kibana instance is modified to include the central configuration for the Elastic Agents:
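A minimal sketch of such a patch; the resource, service, and namespace names are assumptions, and the full agent policy definitions are omitted:

```yaml
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana   # must match the running Kibana resource
spec:
  config:
    # endpoints the enrolled agents use to reach Elasticsearch and the Fleet Server
    xpack.fleet.agents.elasticsearch.hosts: ["https://elasticsearch-es-http.default.svc:9200"]
    xpack.fleet.agents.fleet_server.hosts: ["https://fleet-server-agent-http.default.svc:8220"]
    # integration packages to install centrally
    xpack.fleet.packages:
      - name: fleet_server
        version: latest
      - name: kubernetes
        version: latest
    # xpack.fleet.agentPolicies would follow here (one per agent / Fleet Server)
```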
The packages and agent policy configurations are included in the config section, along with the endpoints the agents use to reach Elasticsearch and the Fleet Server.
Regarding the elastic-agent.yaml manifest, let's review some important aspects:
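A sketch under the same naming assumptions (the policyID must match a policy defined in the Kibana config above):

```yaml
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent      # assumed name
spec:
  version: 9.1.2
  kibanaRef:
    name: kibana
  fleetServerRef:
    name: fleet-server     # must match the Fleet Server resource
  mode: fleet
  policyID: k8s-policy     # assumed id of the agent policy for the K8s integration
  daemonSet: {}            # one agent pod per worker node; the repository manifest
                           # also configures the volume for the agent state (see below)
```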
The agent is enrolled with the agent policy for the K8s integration, and the Fleet Server is referenced via fleetServerRef.
The Elastic Agent is deployed as a DaemonSet: for the Kubernetes integration, there must be an agent running on each worker node. A small persistent volume, mapped to a local directory on the worker node, provides persistence for the agent state.
Finally, regarding the fleet-server.yaml manifest, here are some points to highlight:
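A sketch with assumed names:

```yaml
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: fleet-server       # assumed name
spec:
  version: 9.1.2
  kibanaRef:
    name: kibana
  elasticsearchRefs:
    - name: elasticsearch
  mode: fleet
  fleetServerEnabled: true
  policyID: fleet-server-policy   # assumed id of the Fleet Server policy
  deployment:
    replicas: 1
```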
Behind the scenes, Fleet Server is itself an Elastic Agent, and therefore a resource of kind “Agent”. Kibana and Elasticsearch are referenced via kibanaRef and elasticsearchRefs, respectively. Fleet Server is enrolled in the agent policy for the Fleet integration.
Apply the Elastic Agent and Fleet manifests in the k8s/elasticsearch/fleet directory:
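```bash
# the directory contains the kustomization.yaml overlay described above
kubectl apply -k k8s/elasticsearch/fleet
```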
Monitoring the running pods will let us know when Fleet is available:
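```bash
kubectl get pods --watch
```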
Similarly to Elasticsearch and Kibana, it is possible to list the “agent” custom resources:
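```bash
kubectl get agent
```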
In Kibana, the Fleet section will now be available, and the K8s integration should be delivering data.

Wrapping up
Setting up cloud infrastructure and spinning up an Elasticsearch cluster with ECK is straightforward—once everything is declared as code and all manifests and configuration files are in place.
To me, this approach truly reflects the heart of cloud-native and the very essence of distributed applications. Have fun :)
This blog is human-written, since the author firmly believes in writing—even technical writing—as a fundamental way of personal expression.