How to ingest data from AWS S3 into Elastic Cloud - Part 2 : Elastic Agent

Learn about different options to ingest data from AWS S3 into Elastic Cloud.

This is the second installment in a multi-part blog series exploring different options for ingesting data from AWS S3 into Elastic Cloud.

In this blog, we will learn how to ingest data from AWS S3 using Elastic Agent.

Note 1: For a comparison of the different options, see Part 1 : Elastic Serverless Forwarder.
Note 2: An Elastic Cloud deployment is a prerequisite for following the steps described below.

Elastic Cloud

See Part 1 : Elastic Serverless Forwarder of the blog series for how to get started with Elastic Cloud. Skip this if you already have an active deployment.

Elastic Agent

Another option for ingesting data from AWS S3 is Elastic Agent. Elastic Agent is a single, unified way to ingest data such as logs and metrics. It is installed on an instance such as EC2 and, through integrations, connects to AWS services such as S3 and forwards the data to Elasticsearch.

At a high level, Elastic Agent works as follows:

  • A policy is created. It works like a manifest file and contains instructions for the agent.
  • Integrations are added to the policy. These are essentially modules consisting of assets such as configs, mappings, dashboards, etc.
  • Agents are installed with the required policy.
  • Each agent performs ingestion based on the integrations in its policy.

Features

  • Ships both logs and metrics
  • Supports data transfer over AWS PrivateLink
  • Supports all integrations, and agents can be managed using Fleet (included by default with Elastic Cloud)
  • Agents need to be installed and maintained, and there is no autoscaling. Using Fleet can simplify agent maintenance.
  • Good performance out of the box. Performance parameters can be configured through performance presets, chosen according to the data type and ingestion requirements. More about Fleet server scalability here
  • Cost is that of the EC2 instance hosting the agent and of the SQS notifications

Data Flow

High-level data flow for Elastic Agent based data ingestion:

  • VPC flow logs are configured to be written to an S3 bucket
  • Once a log is written to the S3 bucket, an S3 event notification is sent to SQS
  • Elastic Agent polls the SQS queue for new messages. Based on the metadata in each message, it reads the log data from the S3 bucket and sends it to Elasticsearch
  • SQS is recommended for performance, so that the agent reads only new or updated objects in the S3 bucket instead of polling the entire bucket each time
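
To make the third step of the flow more concrete, here is a minimal Python sketch of how a consumer could extract the bucket and object key from the body of an S3 event notification received via SQS. This is not Elastic Agent's actual implementation; the function name objects_from_sqs_body, the sample account ID, and the sample object key are hypothetical and used only for illustration.

```python
import json
from urllib.parse import unquote_plus


def objects_from_sqs_body(body: str) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for objects reported as created in one SQS message body."""
    event = json.loads(body)
    pairs = []
    # Messages without "Records" (for example the s3:TestEvent that S3 sends
    # when notifications are first configured) yield an empty list.
    for record in event.get("Records", []):
        if record.get("eventSource") != "aws:s3":
            continue
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        # Object keys are URL-encoded in S3 event notifications.
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


if __name__ == "__main__":
    # Hypothetical, heavily trimmed notification body for illustration only.
    sample = json.dumps({
        "Records": [{
            "eventSource": "aws:s3",
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "s3-vpc-flow-logs-elastic"},
                "object": {"key": "AWSLogs/123456789012/vpcflowlogs/sample.log.gz"},
            },
        }]
    })
    print(objects_from_sqs_body(sample))
```

Because each message names the exact object that was created, the consumer never has to list the whole bucket, which is the performance benefit mentioned above.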

Set up

For Steps (1)-(2), follow the details from Part 1 : Elastic Serverless Forwarder:

1. Create an S3 bucket to store VPC flow logs

2. Enable VPC flow logs and send them to the S3 bucket created above

3. Create SQS queue with default settings

Note: Create the SQS queue in the same region as the S3 bucket.

Provide the queue name sqs-vpc-flow-logs-elastic-agent and keep the other settings as default:

Update the SQS Access Policy (Advanced) to allow the S3 bucket to send notifications to the SQS queue. Replace account-id with your AWS account ID, and ap-southeast-2 with your region if it differs. Keep other options as default.

Here, the policy allows S3 to send messages to the SQS queue (identified by its ARN), but only for events originating from the S3 bucket:

  {
    "Version": "2012-10-17",
    "Id": "example-ID",
    "Statement": [
      {
        "Sid": "example-statement-ID",
        "Effect": "Allow",
        "Principal": {
          "Service": "s3.amazonaws.com"
        },
        "Action": "SQS:SendMessage",
        "Resource": "arn:aws:sqs:ap-southeast-2:<account-id>:sqs-vpc-flow-logs-elastic-agent",
        "Condition": {
          "StringEquals": {
            "aws:SourceAccount": "<account-id>"
          },
          "ArnLike": {
            "aws:SourceArn": "arn:aws:s3:::s3-vpc-flow-logs-elastic"
          }
        }
      }
    ]
  }
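
If you create queues for several buckets or regions, the same policy can be generated rather than edited by hand. The following Python sketch builds the policy document shown above from its four variable parts. The function name sqs_access_policy and the account ID 123456789012 are hypothetical placeholders.

```python
import json


def sqs_access_policy(account_id: str, region: str, queue_name: str, bucket_name: str) -> dict:
    """Build an SQS access policy that lets one S3 bucket send messages to one queue."""
    return {
        "Version": "2012-10-17",
        "Id": "example-ID",
        "Statement": [
            {
                "Sid": "example-statement-ID",
                "Effect": "Allow",
                "Principal": {"Service": "s3.amazonaws.com"},
                "Action": "SQS:SendMessage",
                # The queue that receives the notifications.
                "Resource": f"arn:aws:sqs:{region}:{account_id}:{queue_name}",
                "Condition": {
                    # Only accept messages on behalf of this account and bucket.
                    "StringEquals": {"aws:SourceAccount": account_id},
                    "ArnLike": {"aws:SourceArn": f"arn:aws:s3:::{bucket_name}"},
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = sqs_access_policy(
        account_id="123456789012",  # hypothetical account ID
        region="ap-southeast-2",
        queue_name="sqs-vpc-flow-logs-elastic-agent",
        bucket_name="s3-vpc-flow-logs-elastic",
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the Access Policy editor of the queue.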

Note the SQS URL, shown in the queue settings under Details:
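
Since the queue must be in the same region as the bucket, it can be useful to check the region encoded in the queue URL before pasting it into the integration later. The sketch below assumes the common URL form https://sqs.<region>.amazonaws.com/<account-id>/<queue-name>; other (legacy or custom endpoint) forms are not handled. The function name parse_queue_url and the account ID are hypothetical.

```python
from urllib.parse import urlparse


def parse_queue_url(url: str) -> dict:
    """Split an SQS queue URL into region, account ID, queue name, and ARN.

    Assumes the form https://sqs.<region>.amazonaws.com/<account-id>/<queue-name>.
    """
    parsed = urlparse(url)
    host_parts = (parsed.hostname or "").split(".")
    if len(host_parts) < 4 or host_parts[0] != "sqs":
        raise ValueError(f"unexpected SQS host in URL: {url!r}")
    region = host_parts[1]
    # Raises ValueError if the path does not have exactly two segments.
    account_id, queue_name = parsed.path.strip("/").split("/")
    return {
        "region": region,
        "account_id": account_id,
        "queue_name": queue_name,
        "arn": f"arn:aws:sqs:{region}:{account_id}:{queue_name}",
    }


if __name__ == "__main__":
    info = parse_queue_url(
        "https://sqs.ap-southeast-2.amazonaws.com/123456789012/sqs-vpc-flow-logs-elastic-agent"
    )
    print(info["region"], info["arn"])
```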

4. Enable VPC flow log event notification in S3 bucket

Go to S3 bucket s3-vpc-flow-logs-elastic -> Properties and click Create event notification.

Provide a name and choose the event type that should trigger the SQS notification. We have selected object creation, so a notification is sent whenever an object is added to the bucket:

Select SQS queue as the destination and choose sqs-vpc-flow-logs-elastic-agent:

Once saved, the configuration will look like this:

Confirm that VPC flow logs are published in the S3 bucket:

Confirm that S3 event notifications are sent to the SQS queue:
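
For readers who prefer to script this step, the console settings correspond roughly to an S3 notification configuration document. The Python sketch below only builds that document; it does not call AWS. The notification name vpc-flow-logs-created and the account ID are hypothetical, and the resulting dictionary has the shape expected by the NotificationConfiguration parameter of the S3 put_bucket_notification_configuration API.

```python
import json


def queue_notification_config(notification_id: str, queue_arn: str) -> dict:
    """Build an S3 notification configuration that sends object-created events to SQS."""
    return {
        "QueueConfigurations": [
            {
                "Id": notification_id,
                "QueueArn": queue_arn,
                # Matches the "all object create events" choice in the console.
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    }


if __name__ == "__main__":
    config = queue_notification_config(
        notification_id="vpc-flow-logs-created",  # hypothetical name
        queue_arn="arn:aws:sqs:ap-southeast-2:123456789012:sqs-vpc-flow-logs-elastic-agent",
    )
    print(json.dumps(config, indent=2))
```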
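
To spot-check the contents of a published object after downloading it, a small parser is enough. The sketch below assumes the object is gzip-compressed text whose first line is a header naming the fields, followed by space-separated records; the function name parse_flow_log_object is hypothetical, and the sample record is illustrative rather than taken from this setup.

```python
import gzip


def parse_flow_log_object(data: bytes) -> list[dict]:
    """Parse a gzip-compressed flow log object into one dict per record.

    Assumes the first non-empty line is a header with the field names.
    """
    text = gzip.decompress(data).decode("utf-8")
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    # Pair each value with its column name from the header line.
    return [dict(zip(header, line.split())) for line in lines[1:]]


if __name__ == "__main__":
    raw = (
        "version account-id interface-id srcaddr dstaddr srcport dstport "
        "protocol packets bytes start end action log-status\n"
        "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
        "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK\n"
    )
    for row in parse_flow_log_object(gzip.compress(raw.encode("utf-8"))):
        print(row["srcaddr"], "->", row["dstaddr"], row["action"])
```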

5. Install Elastic Agent on EC2 instance

Launch an EC2 instance.

To get the installation commands, go to:

Kibana -> Fleet -> Add Agent

Create a new agent policy named aws-vpc-flow-logs-s3-policy and click Create Policy.

Once the policy is created, copy the instructions to install Elastic Agent. Leave other settings as default:

Log in to the EC2 instance and run the commands:

  [root@ip-xxx-xx-xx-xxx ~]# curl -L -O https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.14.3-linux-x86_64.tar.gz
  tar xzvf elastic-agent-8.14.3-linux-x86_64.tar.gz
  cd elastic-agent-8.14.3-linux-x86_64
  sudo ./elastic-agent install --url=https://xxxxxxxxxxx.fleet.ap-southeast-2.aws.found.io:443 --enrollment-token=xxxxxxxxxxx
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                  Dload  Upload   Total   Spent    Left  Speed
  100  327M  100  327M    0     0  5068k      0  0:01:06  0:01:06 --:--:-- 5065k
  elastic-agent-8.14.3-linux-x86_64/manifest.yaml
  elastic-agent-8.14.3-linux-x86_64/data/elastic-agent-2df2c1/elastic-agent
  ..........................
  Elastic Agent will be installed at /opt/Elastic/Agent and will run as a service. Do you want to continue? [Y/n]:Y
  [=   ] Service Started  [0s] Elastic Agent successfully installed, starting enrollment.
  [==  ] Waiting For Enroll...  [1s] {"log.level":"info","@timestamp":"2024-09-03T03:43:40.209Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":517},"message":"Starting enrollment to URL: https://xxxxxxxxxxx.fleet.ap-southeast-2.aws.found.io:443/","ecs.version":"1.6.0"}
  [  ==] Waiting For Enroll...  [2s] {"log.level":"info","@timestamp":"2024-09-03T03:43:41.396Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":480},"message":"Restarting agent daemon, attempt 0","ecs.version":"1.6.0"}
  [ ===] Waiting For Enroll...  [2s] {"log.level":"info","@timestamp":"2024-09-03T03:43:41.448Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":298},"message":"Successfully triggered restart on running Elastic Agent.","ecs.version":"1.6.0"}
  Successfully enrolled the Elastic Agent.
  [ ===] Done  [2s]                               
  Elastic Agent has been successfully installed.

Upon successful completion, the agent status will be updated on the Fleet page:

Update policy aws-vpc-flow-logs-s3-policy with the AWS integration. This pushes the AWS integration configuration to every agent subscribed to this policy. More on how Fleet and agents work together is here.

Go to Kibana -> Fleet -> Agent policies. Select the policy aws-vpc-flow-logs-s3-policy and click Add integration. This takes you to the integrations page, where you can search for the AWS integration. Choosing the AWS integration is the better option if you want to monitor more than one AWS service:

Provide an AWS Access Key ID and Secret Access Key for authentication, which allows Elastic Agent to read from AWS services. Other authentication options are available. Details here. The Namespace option is used to segregate data by environment or any other identifier:

Toggle off the other services and enable Collect VPC flow logs from S3. Enter the S3 bucket and the SQS queue URL copied earlier. Leave the advanced settings as default:

Scroll down, select the Existing hosts option, as we have already installed the agent, and select the policy aws-vpc-flow-logs-s3-policy. Click Save and continue. This pushes the configured integration to Elastic Agent:

Go to Kibana -> Fleet -> Agent policies to confirm that the policy aws-vpc-flow-logs-s3-policy has been updated with the AWS integration.

After a couple of minutes, you can validate that flow logs are being ingested from S3 into Elastic. Go to Kibana -> Discover:
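
If you would rather check from a script than from Discover, the sketch below builds the two things such a check needs: the data stream name and a search body counting recent documents. It relies on two assumptions that are not stated in this post: that the integration writes to a data stream named with the type-dataset-namespace scheme, and that the VPC flow logs dataset is called aws.vpcflow. Both function names are hypothetical, and the code only builds values without contacting Elasticsearch.

```python
def data_stream_name(dataset: str, namespace: str = "default", stream_type: str = "logs") -> str:
    """Compose a data stream name following the <type>-<dataset>-<namespace> scheme."""
    return f"{stream_type}-{dataset}-{namespace}"


def recent_documents_query(minutes: int = 15) -> dict:
    """Build a search body that counts documents from the last `minutes` minutes."""
    return {
        "size": 0,  # only the hit count is needed, not the documents
        "track_total_hits": True,
        "query": {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
    }


if __name__ == "__main__":
    # "aws.vpcflow" is an assumed dataset name; adjust it and the namespace
    # to match what Discover shows for your deployment.
    print(data_stream_name("aws.vpcflow"))
    print(recent_documents_query())
```

The printed name and body could then be used with the Elasticsearch search API or in Kibana Dev Tools.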

6. Monitor VPC flow logs in Kibana dashboards

Integrations come with assets such as dashboards, which are pre-built for common use cases. Go to Kibana -> Dashboard and search for VPC Flow logs:

More Dashboards!

As promised, here are a few dashboards that can help monitor the AWS services used in our setup with the Elastic Agent ingestion method. They help in tracking usage and in optimisation.

We will use the same setup used in the Elastic Agent data ingestion option to configure settings and populate dashboards.

Go to Kibana -> Fleet -> aws-vpc-flow-logs-s3-policy. Select the AWS integration, toggle on the required services, and fill in the details.

Some of the interesting dashboards:

Note: All dashboards are available under Kibana -> Analytics -> Dashboards

[Metrics AWS] Lambda Overview

If you have implemented ingestion using Elastic Serverless Forwarder, then you can use this dashboard to track AWS Lambda metrics. It mainly shows Lambda function duration, errors, and any function throttling:

[Metrics AWS] S3 Overview

This dashboard outlines S3 usage and helps in monitoring bucket size, number of objects, etc. This can help in optimisation of S3 usage by tracking stale buckets and objects:

[Logs AWS] S3 Server Access Log Overview

This dashboard shows S3 server access logging and provides detailed records for the requests that are made to a bucket. This can be useful in security and access audits and can also help in learning how users access your S3 buckets and objects:

[Metrics AWS] Usage Overview

This dashboard shows the general usage of AWS services and highlights API usage against AWS services. This can help in understanding the service usage and potential optimisation:

[Metrics AWS] Billing Overview

This dashboard shows billing by service and helps monitor how much is spent on each service:

[Metrics AWS] SQS Overview

This dashboard shows SQS queue utilisation, including messages sent and received and any delay in sending messages. Monitoring the SQS queues matters because SQS is a key component of the architecture, and any issue with it can delay data ingestion:

[Metrics AWS] EC2 Overview

If you are using the Elastic Agent ingestion method, you can monitor the CPU, memory, disk, and other utilisation of the EC2 instance hosting Elastic Agent. This can be helpful in sizing the instance if there is a high traffic load. It can also be used for your other EC2 instances:

[Elastic Agent] S3 Input Metrics

This dashboard shows in detail how Elastic Agent processes S3 inputs and interacts with SQS and S3. It presents aggregated metrics on the agent reading SQS messages and S3 objects and forwarding them to Elasticsearch. Together with the [Metrics AWS] EC2 Overview dashboard, it can help in understanding the utilisation of EC2 and Elastic Agent and in scaling these components:

Conclusion

Elasticsearch provides multiple options to sync data from AWS S3 into Elasticsearch deployments. In this walkthrough, we have demonstrated that it is relatively easy to implement the Elastic Agent ingestion option and leverage Elastic's industry-leading search capabilities.

In Part 3 of this series, we'll dive into using Elastic S3 Native Connector as another option for ingesting AWS S3 data.

Don't forget to check out Part 1 : Elastic Serverless Forwarder of the series.

You can build search with data from any source. Check out this webinar to learn about different connectors and sources that Elasticsearch supports.

Ready to try this out on your own? Start a free trial.
