Observability for Amazon MQ with Elastic: Demystifying Messaging Flows with Real-Time Insights
Managing the Hidden Complexity of Message-Driven Architectures
Amazon MQ is a managed message broker service for Apache ActiveMQ Classic and RabbitMQ that handles the setup, operation, and maintenance of message brokers. Messaging systems like RabbitMQ, managed by Amazon MQ, are pivotal in modern decoupled, event-driven applications. By serving as an intermediary between services, RabbitMQ facilitates asynchronous communication through message queuing, routing, and reliable delivery, making it an ideal fit for microservices, real-time pipelines, and event-driven architectures. However, this flexibility introduces operational challenges, such as retries, processing delays, consumer failures, and queue backlogs, which can gradually impact downstream performance and system reliability.
With Elastic’s Amazon MQ integration, users gain deep visibility into message flow patterns, queue performance, and consumer health. This integration allows for the proactive detection of bottlenecks, helps optimize system behavior, and ensures reliable message delivery at scale.
In this blog, we'll dive into the operational challenges of RabbitMQ in modern architectures and examine common observability gaps, along with strategies for overcoming them.
Why Observability for RabbitMQ on Amazon MQ Matters
RabbitMQ brokers are integral to distributed systems, handling tasks ranging from order processing to payment workflows and notification delivery. Any disruption can cascade into significant downstream issues. Observability into RabbitMQ helps answer critical operational questions like:
- Is CPU and memory utilization increasing over time?
- What are the trends in message publish and confirmation rates?
- Are consumers failing to acknowledge messages?
- Which queues are experiencing abnormal growth?
- Is the number of messages being dead-lettered increasing over time?
Enhanced Observability with Amazon MQ Integration
Elastic provides a dedicated Amazon MQ integration for RabbitMQ that utilizes Amazon CloudWatch metrics and logs to deliver comprehensive observability data. This integration enables the ingestion of metrics related to connections, nodes, queues, exchanges, and system logs.
By deploying Elastic Agent with this integration, users can monitor:
- Queue performance and dead-letter queue (DLQ) metrics, including total message count (MessageCount.max), messages ready for delivery (MessageReadyCount.max), and unacknowledged messages (MessageUnacknowledgedCount.max). The MessageCount.max metric tracks the total number of messages in a queue, including those that have been dead-lettered; monitoring it over time can help identify trends in message accumulation that may point to issues leading to dead-lettering.
- Consumer behavior through metrics like consumer count (ConsumerCount.max) and acknowledgement rate (AckRate.max), which help identify underperforming consumers or potential backlogs.
- Messaging throughput by tracking publish (PublishRate.max), confirm (ConfirmRate.max), and acknowledgement rates in real time. These are crucial for understanding application messaging patterns and flow.
- Broker and node-level health, including memory usage (RabbitMQMemUsed.max), CPU utilization (SystemCpuUtilization.max), disk availability (RabbitMQDiskFree.min), and file descriptor usage (RabbitMQFdUsed.max). These indicators are essential for diagnosing resource saturation and avoiding service disruption.
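To make these metric names concrete, here is a minimal sketch of pulling one of them straight from CloudWatch with boto3. This is only for illustration of where the data originates; Elastic's integration handles this collection for you. The region, broker name, and queue are placeholder assumptions.

import boto3
from datetime import datetime, timedelta, timezone

# Query the AWS/AmazonMQ namespace for a queue-level metric. For RabbitMQ
# brokers, queue metrics are dimensioned by Broker, VirtualHost, and Queue.
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # assumed region

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/AmazonMQ",
    MetricName="MessageCount",
    Dimensions=[
        {"Name": "Broker", "Value": "my-rabbitmq-broker"},  # hypothetical broker name
        {"Name": "VirtualHost", "Value": "/"},
        {"Name": "Queue", "Value": "myQueue"},
    ],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    EndTime=datetime.now(timezone.utc),
    Period=300,
    Statistics=["Maximum"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Maximum"])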
Integrating Amazon MQ Metrics into Elastic Observability
Elastic's Amazon MQ integration facilitates the ingestion of CloudWatch metrics and logs into Elastic Observability, delivering near real-time insights into RabbitMQ. The prebuilt Amazon MQ dashboard visualizes this data, providing a centralized view of broker health, messaging activity, and resource usage, helping users quickly detect and resolve issues. Elastic's alerting for Observability enables proactive notifications based on custom conditions, while its SLO capabilities allow users to define and track key performance targets, strengthening system reliability and service commitments.
Elastic brings together logs and metrics from Amazon MQ alongside data from a wide range of other services and applications, whether running in AWS, on-premises, or across multi-cloud environments, offering unified observability from a single platform.
Prerequisites
To follow along, ensure you have:
- An account on Elastic Cloud and a deployed stack in AWS (see instructions here). Ensure you are using version 8.16.5 or higher. Alternatively, you can use Elastic Cloud Serverless, a fully managed solution that eliminates infrastructure management, automatically scales based on usage, and lets you focus entirely on extracting value from your data.
- An AWS account with permissions to pull the necessary data from AWS. See details in our documentation.
Architecture
Tracing Audit Flows from RabbitMQ to AWS Lambda
Consider a financial audit trail use case, where every user action, such as a funds transfer, is published to RabbitMQ. A Python-based AWS Lambda function consumes these messages, deduplicates them using the id field, and logs structured audit events for downstream analysis.
Sample payload sent through RabbitMQ:
{
  "id": "txn-849302",
  "type": "audit",
  "payload": {
    "user_id": "u-10245",
    "event": "funds.transfer",
    "amount": 1200.75,
    "currency": "USD",
    "timestamp": "T14:20:15Z",
    "ip": "192.168.0.8",
    "location": "New York, USA"
  }
}
You can now correlate message publishing activity from RabbitMQ with AWS Lambda invocation logs, track processing latency, and configure alerts for conditions like drops in consumer throughput or an unexpected surge in RabbitMQ queue depth.
AWS Lambda Function: Processing RabbitMQ Messages
This Python-based AWS Lambda function processes audit events received from RabbitMQ. It deduplicates messages based on the id field and logs structured event data for downstream analysis or compliance. Save the code below in a file named app.py.
import json
import logging
import base64

# Configure logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# In-memory set to track processed message IDs for deduplication.
# Note: this set resets on cold starts; use a durable store (e.g., DynamoDB)
# if deduplication must survive across invocations.
processed_ids = set()

def lambda_handler(event, context):
    logger.info("Lambda triggered by RabbitMQ event")

    if 'rmqMessagesByQueue' not in event:
        logger.warning("Invalid event: missing 'rmqMessagesByQueue'")
        return {'statusCode': 400, 'body': 'Invalid RabbitMQ event'}

    for queue_name, messages in event['rmqMessagesByQueue'].items():
        logger.info(f"Processing queue: {queue_name}, Messages count: {len(messages)}")

        for msg in messages:
            try:
                # Message bodies arrive base64-encoded in the 'data' field
                raw_data = msg['data']
                decoded_json = base64.b64decode(raw_data).decode('utf-8')
                message = json.loads(decoded_json)
                logger.info(f"Decoded message: {json.dumps(message)}")

                message_id = message.get('id')
                if not message_id:
                    logger.warning("Message missing 'id', skipping.")
                    continue

                if message_id in processed_ids:
                    logger.warning(f"Duplicate message detected: {message_id}")
                    continue

                payload = message.get('payload', {})
                logger.info(f"Processing message ID: {message_id}")
                logger.info(f"Event Type: {message.get('type')}")
                logger.info(f"User ID: {payload.get('user_id')}")
                logger.info(f"Event: {payload.get('event')}")
                logger.info(f"Amount: {payload.get('amount')} {payload.get('currency')}")
                logger.info(f"Timestamp: {payload.get('timestamp')}")
                logger.info(f"IP Address: {payload.get('ip')}")
                logger.info(f"Location: {payload.get('location')}")

                processed_ids.add(message_id)

            except Exception as e:
                logger.error(f"Error processing message: {str(e)}")

    return {'statusCode': 200, 'body': 'Messages processed successfully'}
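Before wiring up the event source, you can sanity-check the handler locally by handcrafting an event in the shape Lambda receives from Amazon MQ. This is a minimal sketch; it assumes the handler above is saved as app.py in the current directory and that queue keys follow the queue::virtual-host naming used in RabbitMQ events.

import base64
import json

from app import lambda_handler

# Build a message body and base64-encode it, as Lambda does for RabbitMQ events
sample_message = {
    "id": "txn-849302",
    "type": "audit",
    "payload": {"user_id": "u-10245", "event": "funds.transfer",
                "amount": 1200.75, "currency": "USD"},
}
encoded = base64.b64encode(json.dumps(sample_message).encode("utf-8")).decode("utf-8")

# Minimal event in the rmqMessagesByQueue shape the handler expects
event = {"rmqMessagesByQueue": {"myQueue::/": [{"data": encoded}]}}

print(lambda_handler(event, context=None))
# Expected: {'statusCode': 200, 'body': 'Messages processed successfully'}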
Setting up AWS Secrets Manager
To securely store and manage your RabbitMQ credentials, use AWS Secrets Manager.
- Create a New Secret:
  - Navigate to the AWS Secrets Manager console.
  - Choose Store a new secret.
  - Select Other type of secret.
  - Enter the following key-value pairs:
    - username: Your RabbitMQ username
    - password: Your RabbitMQ password
- Configure the Secret:
  - Provide a meaningful name, such as RabbitMQAccess.
  - Optionally, add tags and set rotation if needed.
- Store the Secret:
  - Review the settings and store the secret. Note the ARN of the secret you have created.
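If you prefer to script this step, the same secret can be created with boto3. A minimal sketch, assuming the secret name RabbitMQAccess from above and placeholder region and credential values:

import json
import boto3

client = boto3.client("secretsmanager", region_name="us-east-1")  # assumed region

response = client.create_secret(
    Name="RabbitMQAccess",
    SecretString=json.dumps({
        "username": "your-rabbitmq-username",  # placeholder
        "password": "your-rabbitmq-password",  # placeholder
    }),
)

# Note the ARN; you will reference it later in the SAM template's SourceAccessConfigurations
print(response["ARN"])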
Setting up Amazon MQ for RabbitMQ
To get started with RabbitMQ on Amazon MQ, follow these steps to set up your broker.
- Open the Amazon MQ console.
- Create a new broker with the RabbitMQ engine.
- Choose your preferred deployment option (single-instance or clustered).
- Use the same username and password that you previously stored in AWS Secrets Manager.
- Under Additional settings, enable CloudWatch Logs for observability.
- Configure access and security settings, ensuring that the broker is accessible to your AWS Lambda function.
- After the broker is created, note the following important details:
  - ARN of the RabbitMQ broker.
  - RabbitMQ web console URL.
- You’ll need the RabbitMQ log group ARN to set up Elastic’s Amazon MQ integration for RabbitMQ. Follow these steps to locate it:
  - Go to the General – Enabled Logs section of the broker.
  - Copy the CloudWatch log group ARN.
Create a RabbitMQ Queue
Now that the RabbitMQ broker is configured, use the management console to create a queue where messages will be published.
- Access the RabbitMQ management console using the web console URL.
- Create a new queue (example: myQueue) to receive messages.
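As an alternative to clicking through the console, the queue can also be declared through the RabbitMQ management HTTP API, which is the same API the web console uses. A minimal sketch with the requests library; the console endpoint and credentials are placeholders:

import requests

console_url = "https://b-xxxxxxxx.mq.us-east-1.amazonaws.com"  # placeholder web console endpoint

# PUT /api/queues/<vhost>/<name> declares the queue; %2F is the default "/" vhost
response = requests.put(
    f"{console_url}/api/queues/%2F/myQueue",
    auth=("your-rabbitmq-username", "your-rabbitmq-password"),  # placeholders
    json={"durable": True},
)
response.raise_for_status()  # 201 Created on first declare, 204 if it already exists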
Build and deploy the AWS Lambda function
In this section, we'll set up the Lambda function using AWS SAM, add the message processing logic, and deploy it to AWS. This Lambda function will be responsible for consuming messages from RabbitMQ and logging audit events.
Before continuing, make sure you have completed the following prerequisites.
Next, follow the steps outlined below to continue with the setup.
- In your command line, run the command sam init from a directory of your choice.
- The AWS SAM CLI will walk you through the setup.
- Select AWS Quick Start Templates.
- Choose the Hello World Example template.
- Use the Python runtime and zip package type.
- Proceed with the default options.
- Name your application sample-rabbitmq-app.
- The AWS SAM CLI downloads your starting template and creates the application project directory structure.
- From your command line, move to the newly created sample-rabbitmq-app directory.
- Replace the content of the hello_world/app.py file with the Lambda function code for RabbitMQ message processing shown earlier.
- In the template.yaml file, update the resource definition with the values shown below.
Resources:
  SampleRabbitMQApp:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: hello_world/
      Description: A starter AWS Lambda function.
      MemorySize: 128
      Timeout: 3
      Handler: app.lambda_handler
      Runtime: python3.10
      PackageType: Zip
      Policies:
        - Statement:
            - Effect: Allow
              Resource: '*'
              Action:
                - mq:DescribeBroker
                - secretsmanager:GetSecretValue
                - ec2:CreateNetworkInterface
                - ec2:DescribeNetworkInterfaces
                - ec2:DescribeVpcs
                - ec2:DeleteNetworkInterface
                - ec2:DescribeSubnets
                - ec2:DescribeSecurityGroups
      Events:
        MQEvent:
          Type: MQ
          Properties:
            Broker: <ARN of the Broker>
            Queues:
              - myQueue
            SourceAccessConfigurations:
              - Type: BASIC_AUTH
                URI: <ARN of the secret>
- Run the command sam deploy --guided and wait for the confirmation message. This deploys all of the resources.
Sending Audit Events to RabbitMQ and Triggering Lambda
To test the end-to-end setup, simulate the flow by publishing audit event data into RabbitMQ using its web UI. Once the message is sent, it triggers the Lambda function.
- Navigate to the Amazon MQ console and select your newly created broker.
- Locate and open the RabbitMQ web console URL.
- Under the Queues and Streams tab, select the target queue (example: myQueue).
- Enter the message payload, and click Publish message to send it to the queue. Here’s a sample payload published via RabbitMQ:

  {
    "id": "txn-849302",
    "type": "audit",
    "payload": {
      "user_id": "u-10245",
      "event": "funds.transfer",
      "amount": 1200.75,
      "currency": "USD",
      "timestamp": "T14:20:15Z",
      "ip": "192.168.0.8",
      "location": "New York, USA"
    }
  }

- Navigate to the AWS Lambda function created earlier.
- Under the Monitor tab, click View CloudWatch logs.
- Check the latest log stream to confirm that the Lambda function was triggered by Amazon MQ and that the message was processed successfully.
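If you would rather publish programmatically than through the web UI, here is a minimal sketch using the pika client over AMQPS (Amazon MQ for RabbitMQ accepts only TLS connections, on port 5671); the broker endpoint and credentials are placeholders:

import json
import ssl

import pika

# Amazon MQ for RabbitMQ requires TLS (AMQPS, port 5671)
ssl_context = ssl.create_default_context()
params = pika.ConnectionParameters(
    host="b-xxxxxxxx.mq.us-east-1.amazonaws.com",  # placeholder broker endpoint
    port=5671,
    credentials=pika.PlainCredentials("your-rabbitmq-username", "your-rabbitmq-password"),
    ssl_options=pika.SSLOptions(ssl_context),
)

payload = {
    "id": "txn-849303",
    "type": "audit",
    "payload": {"user_id": "u-10245", "event": "funds.transfer",
                "amount": 1200.75, "currency": "USD"},
}

connection = pika.BlockingConnection(params)
channel = connection.channel()
# Publish to the default exchange; the routing key is the queue name
channel.basic_publish(exchange="", routing_key="myQueue", body=json.dumps(payload))
connection.close()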
Configuring Amazon MQ integration for Metrics and Logs collection
Elastic’s Amazon MQ integration simplifies the collection of logs and metrics from RabbitMQ brokers managed by Amazon MQ. Logs are ingested via Amazon CloudWatch Logs, while metrics are fetched from the specified AWS region at a defined interval.
Elastic provides a default configuration for metrics collection. You can accept these defaults or adjust settings such as the Collection Period to better fit your needs.
To enable the collection of logs:
- Navigate to the Amazon MQ console and select the newly created broker.
- Click the Logs hyperlink under the General – Enabled Logs section to open the detailed log settings page.
- From this page, copy the CloudWatch log group ARN.
- In Elastic, set up the Amazon MQ integration and paste the CloudWatch log group ARN.
- Accept Defaults or Customize Settings – Elastic provides a default configuration for log collection. You can accept these defaults or adjust settings such as collection intervals to better fit your needs.
Visualizing RabbitMQ Workloads with the Pre-Built Amazon MQ Dashboard
You can access the RabbitMQ dashboard in either of the following ways:
- Navigate to the Dashboard menu – Select the Dashboard menu option in Elastic and search for [Amazon MQ] RabbitMQ Overview to open the dashboard.
- Navigate to the Integrations menu – Open the Integrations menu in Elastic, select Amazon MQ, go to the Assets tab, and choose [Amazon MQ] RabbitMQ Overview from the dashboard assets.
The Amazon MQ RabbitMQ dashboard in the Elastic integration delivers a comprehensive overview of broker health and messaging activity. It provides real-time insights into broker resource utilization, queue and topic performance, connection trends, and messaging throughput. The dashboard helps users track system behavior, detect performance bottlenecks, and ensure reliable message delivery across distributed applications.
Broker Metrics
This section provides a centralized view of the overall health and performance of the RabbitMQ broker on Amazon MQ. The visualizations highlight the number of configured exchanges and queues, active broker connections, producers, consumers, and total messages in flight. System-level metrics such as CPU utilization, memory consumption, and free disk space help assess whether the broker has sufficient resources to handle current workloads.
Message flow metrics such as publish rate, confirmation rate, and acknowledgement rate are displayed to provide visibility into how messages are processed through the broker. Monitoring trends in these values helps detect message delivery issues, throughput degradation, or potential saturation of the broker under load.
Node Metrics
Node-level visibility helps identify resource imbalances across nodes in clustered RabbitMQ setups. This section includes per-node CPU usage, memory consumption, and available disk space, offering insight into the underlying infrastructure's ability to support broker operations.
Queue Metrics
Queue-specific insights are critical for understanding message delivery patterns and backlog conditions. This section details total messages, ready messages, and unacknowledged messages, segmented by broker, virtual host, and queue.
By observing how these counts change over time, users can identify slow consumers, message build-ups, or delivery issues that may affect application performance or lead to dropped messages under pressure.
Logs
This section displays log level, process ID, and raw message content. These logs provide immediate visibility into events such as connection failures, resource thresholds being hit, or unexpected queue behaviors.
Detecting Queue Backlogs with Alerting Rules
Elastic’s alert framework allows you to define rules that monitor critical RabbitMQ metrics and automatically trigger actions when specific thresholds are breached.
Alert: Queue Backlog (Message Ready or Unacknowledged Messages)
This alert helps detect queue backlog in Amazon MQ by evaluating two metrics:
- MessageUnacknowledgedCount.max
- MessageReadyCount.max

The alert is triggered if either condition persists for more than 10 minutes:
- MessageUnacknowledgedCount.max exceeds 5,000
- MessageReadyCount.max exceeds 7,000
These thresholds should be adjusted based on typical message volume and consumer throughput. Sustained high values can indicate that consumers are not keeping up or that message delivery pipelines are congested, which may result in processing delays or dropped messages if not addressed.
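For a sense of what such a rule evaluates, the sketch below runs the equivalent check directly against Elasticsearch with the Python client. The index pattern and field path follow the naming conventions of Elastic's AWS integrations but may differ in your deployment, so treat them as assumptions:

from elasticsearch import Elasticsearch

es = Elasticsearch("https://your-deployment.es.example.com:443", api_key="<your-api-key>")

# Max ready-message count across the last 10 minutes, mirroring the rule's window
response = es.search(
    index="metrics-aws.amazonmq-*",  # assumed data stream pattern
    size=0,
    query={"range": {"@timestamp": {"gte": "now-10m"}}},
    aggs={"max_ready": {"max": {"field": "aws.amazonmq.metrics.MessageReadyCount.max"}}},
)

max_ready = response["aggregations"]["max_ready"]["value"]
if max_ready is not None and max_ready > 7000:
    print(f"Queue backlog: {max_ready:.0f} messages ready for delivery")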
Tracking Resource Utilization to Maintain RabbitMQ Performance
Elastic’s Service-level objectives (SLOs) capabilities allow you to define and monitor performance targets using key indicators like latency, availability, and error rates. Once configured, Elastic continuously evaluates these SLOs in real time, offering intuitive dashboards, alerts for threshold violations, and insights into error budget consumption. This enables teams to stay ahead of issues, ensuring service reliability and consistent performance.
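To make the error-budget idea concrete, here is a toy calculation (not Elastic's implementation) for a 99% target evaluated over one-minute windows in a 30-day period:

# A 99% SLO leaves a 1% error budget of the evaluation windows
target = 0.99
total_windows = 30 * 24 * 60                  # one-minute windows in 30 days = 43,200
error_budget = (1 - target) * total_windows   # 432 windows may violate the SLO

bad_windows = 180                             # e.g., minutes where a threshold was breached
budget_consumed = bad_windows / error_budget
print(f"Error budget consumed: {budget_consumed:.0%}")  # -> 42%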
SLO: Node Resource Health (CPU, Memory, Disk)
This SLO focuses on ensuring RabbitMQ brokers and nodes have sufficient resources to process messages without performance degradation. It tracks CPU, memory, and disk usage across RabbitMQ brokers and nodes to prevent resource exhaustion that could lead to service interruptions.
Target thresholds:
- SystemCpuUtilization.max remains below 85% for 99% of the time.
- RabbitMQMemUsed.max remains below 80% of RabbitMQMemLimit.max for 99% of the time.
- RabbitMQDiskFree.min remains above 25% of RabbitMQDiskFreeLimit.max for 99% of the time.
Sustained high values in CPU or memory usage can signal resource contention, which may result in slower message processing or downtime. Low disk availability may cause the broker to stop accepting messages, risking message loss. These thresholds are designed to catch early signs of resource saturation and ensure smooth, uninterrupted message flow across RabbitMQ deployments.
Conclusion
As RabbitMQ-based messaging architectures scale and become more complex, the need for in-depth visibility into system performance and potential issues deepens. Elastic’s Amazon MQ integration brings that visibility front and center—helping you go beyond basic health checks to understand real-time messaging throughput, queue backlog trends, and resource saturation across your brokers and consumers.
By leveraging the prebuilt dashboards and configuring alerts and SLOs, you can proactively detect anomalies, fine-tune consumer performance, and ensure reliable delivery across your event-driven applications.