Overview
You can use a variety of tools, such as Beats collectors and Elastic Agent (OTel), to collect data such as server logs, container logs, system metrics, and application performance data, and send it to Elasticsearch for search, analysis, and visualization. Choose the collection method that best fits your data source and deployment environment.
Collection methods
| Scenario | Recommended method | Description |
| Server log file collection | Filebeat / Elastic Agent | Lightweight collection with low resource overhead. Elastic Agent also supports centralized management through Fleet, ideal for managing multiple nodes. |
| System and service metrics monitoring | Metricbeat / Elastic Agent | Provides out-of-the-box modules for collecting system and service metrics, including CPU, memory, Nginx, and MySQL metrics. |
| Kubernetes container logs | Filebeat sidecar / Elastic Agent sidecar | Deploy as a sidecar in a Pod to collect application log files from within the container. |
| Large-scale log collection | Log Collection and Processing Service (recommended); see Log Collection and Processing Service | This service has a built-in message queue for buffering, eliminates operational overhead, and features a simplified architecture. Alternatively, you can build your own pipeline by using Filebeat, Kafka, and Logstash. |
| Middleware and application monitoring | Metricbeat / rsbeat / Dedicated Beat | Use dedicated collectors for middleware such as RabbitMQ and Redis. |
| Distributed tracing and APM | SkyWalking / APM Server | Ideal for distributed tracing and Application Performance Management (APM) in microservices architectures. |
| Service availability monitoring | Heartbeat / Uptime | Proactively probes the availability and response time of ICMP, TCP, and HTTP services. |
| Security auditing | Auditbeat | Collects system audit data and monitors file integrity changes. |
Collection tools
Beats collectors
Beats is a family of lightweight, single-purpose data collectors from Elastic that use few resources, are easy to deploy, and require no application code modifications.
| Collector | Purpose |
| Filebeat | Collects and forwards log file data. It supports various log formats, including Nginx, Apache, and MySQL. |
| Metricbeat | Collects metrics from operating systems and services, such as CPU, memory, disk usage, and Nginx operational metrics. |
| Auditbeat | Collects data from the Linux Audit Framework and monitors file integrity changes. |
| Heartbeat | Probes service availability and performs health checks using ICMP, TCP, and HTTP protocols. |
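To illustrate how little configuration a Beat needs, the sketch below builds a minimal `filebeat.yml` as a Python dictionary and prints it as JSON. YAML parsers generally accept JSON, although block-style YAML is more conventional. It assumes Filebeat 8.x with the `filestream` input. The log path, input ID, Elasticsearch endpoint, and credentials are hypothetical placeholders.

```python
import json


def build_filebeat_config(log_paths, es_host, username, password, input_id="app-logs"):
    """Return a minimal filebeat.yml structure as a dict.

    Assumes Filebeat 8.x, where `filestream` is the recommended input type.
    input_id and all argument values are hypothetical placeholders.
    """
    return {
        "filebeat.inputs": [
            {
                "type": "filestream",
                # Each filestream input should have a unique ID.
                "id": input_id,
                "paths": list(log_paths),
            }
        ],
        "output.elasticsearch": {
            "hosts": [es_host],
            "username": username,
            "password": password,
        },
    }


if __name__ == "__main__":
    config = build_filebeat_config(
        ["/var/log/app/*.log"],      # hypothetical log path
        "http://localhost:9200",     # hypothetical endpoint; use your cluster's
        "elastic",
        "<your-password>",
    )
    # Save this output as filebeat.yml (JSON is accepted as YAML flow syntax).
    print(json.dumps(config, indent=2))
```

The other Beats follow the same shape. The input section changes (for example, `metricbeat.modules` or `heartbeat.monitors`), and the `output.elasticsearch` section stays the same.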
Elastic Agent (OTel)
Elastic Agent is a next-generation unified collector from Elastic. It includes a built-in OpenTelemetry (OTel) compatibility mode and can replace multiple Beats collectors. Elastic Agent collects various data types, such as logs, metrics, and security data, and supports centralized management through Fleet.
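Because Elastic Agent can work with the OpenTelemetry protocol, it helps to see what an OTel log record looks like on the wire. The sketch below builds an OTLP/HTTP JSON payload for a single log line. It follows the OpenTelemetry `ExportLogsServiceRequest` structure (`resourceLogs` → `scopeLogs` → `logRecords`). The service name, scope name, and attribute keys are hypothetical. The sketch does not show how Elastic Agent or any receiver is configured.

```python
import json
import time


def _attr(key, value):
    # OTLP JSON encodes attributes as key/AnyValue pairs; only strings are handled here.
    return {"key": key, "value": {"stringValue": str(value)}}


def build_otlp_log_payload(message, service_name, severity="INFO",
                           attributes=None, ts_ns=None):
    """Build an OTLP/HTTP JSON body (ExportLogsServiceRequest) with one log record."""
    if ts_ns is None:
        ts_ns = time.time_ns()
    record = {
        # The Protobuf JSON mapping represents 64-bit integers as decimal strings.
        "timeUnixNano": str(ts_ns),
        "severityText": severity,
        "body": {"stringValue": message},
        "attributes": [_attr(k, v) for k, v in (attributes or {}).items()],
    }
    return {
        "resourceLogs": [
            {
                "resource": {"attributes": [_attr("service.name", service_name)]},
                "scopeLogs": [
                    {
                        "scope": {"name": "example-scope"},  # hypothetical scope name
                        "logRecords": [record],
                    }
                ],
            }
        ]
    }


if __name__ == "__main__":
    payload = build_otlp_log_payload(
        "GET /index.html 200", "nginx-frontend",  # hypothetical values
        attributes={"log.file.path": "/var/log/nginx/access.log"},
    )
    # OTLP/HTTP receivers conventionally accept logs at the /v1/logs path
    # with Content-Type: application/json.
    print(json.dumps(payload, indent=2))
```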
Data processing tools
Logstash
Logstash is a server-side data processing pipeline that filters, cleans, and transforms data. It is often used with collectors like Beats or Elastic Agent to process data before sending it to Elasticsearch, making it suitable for scenarios that require complex data processing. Logstash can also use its extensive library of input plugins to connect directly to external data sources, allowing you to migrate or continuously sync data to Elasticsearch.
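Logstash organizes processing into input, filter, and output stages. The Python sketch below imitates that flow for an Nginx access log. A regular expression stands in for a Grok pattern covering the "combined" log format, which Logstash's `COMBINEDAPACHELOG` Grok pattern also matches. The output stage serializes documents into the newline-delimited body that the Elasticsearch `_bulk` API expects. This illustrates the concept and is not Logstash itself. The index name and field names are hypothetical.

```python
import json
import re

# Stand-in for a Grok pattern: the Nginx/Apache "combined" access log format.
COMBINED_LOG = re.compile(
    r'(?P<client_ip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def filter_stage(line):
    """Parse one raw line into a document; return None if it does not match."""
    match = COMBINED_LOG.match(line)
    if match is None:
        # This sketch drops the line; Logstash's grok filter would instead keep
        # the event and tag it with _grokparsefailure.
        return None
    doc = match.groupdict()
    # Convert numeric fields, similar to a mutate/convert step.
    doc["status"] = int(doc["status"])
    doc["bytes"] = 0 if doc["bytes"] == "-" else int(doc["bytes"])
    return doc


def output_stage(docs, index):
    """Serialize documents as an Elasticsearch _bulk request body (NDJSON)."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


def run_pipeline(raw_lines, index="nginx-access"):  # hypothetical index name
    parsed = (filter_stage(line) for line in raw_lines)
    return output_stage([doc for doc in parsed if doc is not None], index)


if __name__ == "__main__":
    sample = [
        '203.0.113.7 - - [10/Oct/2024:13:55:36 +0800] "GET /index.html HTTP/1.1" 200 612 "-" "curl/8.0"',
        "not an access log line",
    ]
    print(run_pipeline(sample), end="")
```

In a real deployment, the body would be sent with `POST /_bulk` and `Content-Type: application/x-ndjson`. Logstash's `elasticsearch` output plugin handles that step for you.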
Log Collection and Processing Service
Traditional log collection solutions often require a self-built, multi-stage pipeline using Kafka and Logstash between the collector and Elasticsearch to handle high-concurrency writes. The Log Collection and Processing Service consolidates these intermediate components into a fully managed, server-side ingestion channel. This simplifies the four-stage pipeline into a two-stage pipeline and offers the following key advantages:
| Feature | Traditional architecture | Log Collection and Processing Service |
| Pipeline architecture | Collector → Kafka → Logstash → ES. A four-stage serial pipeline with multiple components. | Collector → Managed service → ES. A two-stage pipeline accessible via a single write endpoint. |
| Operational burden | Requires you to deploy and maintain Kafka and Logstash. You also need to manage broker scaling, topics, partitions, and consumer groups. | Fully managed. No need to purchase or maintain additional middleware. |
| Troubleshooting | Multiple potential points of failure require checking the status and configuration of each component. | A shorter pipeline significantly narrows the scope of troubleshooting. |
| Ingestion capacity | Depends on the buffering capacity of a self-managed Kafka cluster, which requires manual scaling. | Managed high-concurrency ingestion automatically handles traffic spikes and prevents data loss. |
| Resource cost | Requires provisioning Kafka and Logstash cluster resources for peak traffic, which leads to idle resources and waste during off-peak hours. | The pay-as-you-go model eliminates the need to provision dedicated cluster resources. |
| Compatibility | Components have version compatibility requirements. Upgrades require synchronized configuration adjustments. | Supports both Filebeat and OTel ingestion protocols and is compatible with existing collectors. The service is fully managed and maintenance-free. |
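A toy model makes the "Ingestion capacity" row concrete. The sketch below feeds a burst of events to a downstream writer with fixed throughput. It runs once with no buffer and once with a bounded queue between the collector and the writer, then counts dropped events. The arrival rates, drain rate, and capacities are made-up numbers. The model is a deliberate simplification and does not describe how Kafka or the managed service is implemented.

```python
def simulate(arrivals, drain_per_tick, buffer_capacity):
    """Simulate a bounded buffer between a collector and an Elasticsearch writer.

    arrivals: number of events arriving in each tick.
    drain_per_tick: events the downstream writer can index per tick.
    buffer_capacity: events the buffer can hold (0 means no buffer).
    Returns (indexed, dropped, peak_backlog).
    """
    backlog = indexed = dropped = peak = 0
    for incoming in arrivals:
        backlog += incoming
        # The writer indexes as much as its throughput allows this tick.
        written = min(backlog, drain_per_tick)
        backlog -= written
        indexed += written
        # Events that do not fit in the buffer are lost.
        if backlog > buffer_capacity:
            dropped += backlog - buffer_capacity
            backlog = buffer_capacity
        peak = max(peak, backlog)
    # Simplification: any backlog left after the last tick is eventually indexed.
    indexed += backlog
    return indexed, dropped, peak


if __name__ == "__main__":
    burst = [100, 100, 500, 100, 0, 0]  # hypothetical traffic spike in tick 3
    print("no buffer:  ", simulate(burst, drain_per_tick=150, buffer_capacity=0))
    print("with buffer:", simulate(burst, drain_per_tick=150, buffer_capacity=1000))
```

With these numbers, the unbuffered run loses the part of the spike that exceeds the writer's throughput. The buffered run absorbs it as a temporary backlog and indexes every event a few ticks later.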
The Log Collection and Processing Service currently supports Elasticsearch versions 7.10 and 8.17.0. Ingestion via the OTel protocol is supported only for version 8.17.0, while the Filebeat protocol is supported for both versions. For usage instructions, see Log Collection and Processing Service.
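The version requirements above can be encoded as a simple lookup, for example in a deployment script that validates settings before collectors are pointed at the service. The sketch below mirrors only the versions and protocols stated in this section. The table and function names are hypothetical.

```python
# Ingestion protocols supported by the Log Collection and Processing Service,
# per Elasticsearch version, as stated in this document.
SUPPORTED_PROTOCOLS = {
    "7.10": {"filebeat"},
    "8.17.0": {"filebeat", "otel"},
}


def check_ingestion_protocol(es_version, protocol):
    """Return True if the protocol is supported for the given cluster version.

    Raises ValueError for Elasticsearch versions the service does not support.
    """
    try:
        protocols = SUPPORTED_PROTOCOLS[es_version]
    except KeyError:
        raise ValueError(
            f"Elasticsearch {es_version} is not supported by the service"
        ) from None
    return protocol.lower() in protocols


if __name__ == "__main__":
    print(check_ingestion_protocol("8.17.0", "OTel"))
    print(check_ingestion_protocol("7.10", "OTel"))
```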
Host log collection
When your application is deployed on Alibaba Cloud ECS instances, in an on-premises data center, or on servers from other cloud providers, you can use the following methods to collect logs and metrics from your hosts.
| Collection tool | Data source/scenario | Description | Documentation |
| Filebeat | ECS service logs | Deploy Filebeat on an ECS instance to collect server log files and write them to Elasticsearch. | |
| Filebeat | MySQL logs | Use a self-managed Filebeat to collect MySQL slow query logs and error logs. | |
| Filebeat | Apache logs | Collect access logs and error logs from Apache HTTP Server. | |
| Metricbeat | System metrics and Nginx service data | Collect system-level metrics (such as CPU, memory, and disk) and Nginx operational metrics. | Collect system data and Nginx operational metrics using Metricbeat |
| Auditbeat | System audit data | Collect data from the system audit framework and monitor critical file integrity changes. | Collect system audit data and monitor file changes using Auditbeat |
| Heartbeat | ICMP and HTTP services | Proactively probe ICMP and HTTP services to monitor their availability and response time. | |
| Elastic Agent (OTel) | Nginx logs | Use Fleet for centralized management to collect Nginx access logs and error logs. | |
| Elastic Agent (OTel) | NetFlow logs | Collect NetFlow network traffic log data to enable visualization and analysis of network traffic. | |
| Elastic Agent (OTel) | Custom logs | Collect custom-formatted log data. This method is suitable for non-standard log collection scenarios. | |
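For the Metricbeat row in the table above, a minimal configuration enables the `system` module and the `nginx` module's `stubstatus` metricset. The `stubstatus` metricset reads the status page that Nginx exposes through its `stub_status` directive. The sketch below builds that `metricbeat.yml` structure as a Python dictionary. The hosts, the status path, and the collection period are hypothetical placeholders. The status path must match the location where `stub_status` is enabled in your Nginx configuration.

```python
import json


def build_metricbeat_config(es_host, nginx_host, status_path="nginx_status",
                            period="10s"):
    """Return a minimal metricbeat.yml structure as a dict.

    Collects host CPU, memory, filesystem, and network metrics through the
    system module, plus Nginx connection metrics through stubstatus.
    """
    return {
        "metricbeat.modules": [
            {
                "module": "system",
                "metricsets": ["cpu", "memory", "filesystem", "network"],
                "period": period,
            },
            {
                "module": "nginx",
                "metricsets": ["stubstatus"],
                "period": period,
                "hosts": [nginx_host],
                # Must match the Nginx location that serves stub_status (assumed).
                "server_status_path": status_path,
            },
        ],
        "output.elasticsearch": {"hosts": [es_host]},
    }


if __name__ == "__main__":
    cfg = build_metricbeat_config("http://localhost:9200", "http://127.0.0.1")
    print(json.dumps(cfg, indent=2))
```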
Kubernetes container log collection
When your application is deployed in a Kubernetes environment, such as an Alibaba Cloud ACK cluster or a native Kubernetes cluster, you can use the following methods to collect log data from your containerized applications.
| Collection tool | Data source/scenario | Description | Documentation |
| Filebeat sidecar | ACK clusters | Deploy Filebeat as a sidecar in an ACK cluster to collect application logs from Pods. You can write data directly to Elasticsearch or forward it through Kafka. | Use a Filebeat sidecar to collect ACK cluster logs to Alibaba Cloud Elasticsearch |
| Filebeat + Kafka + Logstash | ACK clusters | Build a complete log analysis pipeline: Filebeat → Kafka → Logstash → Elasticsearch. This is suitable for large-scale logging scenarios. | Build a log analysis system with Filebeat, Kafka, Logstash, and Elasticsearch |
| Elastic Agent sidecar | ACK clusters | Deploy Elastic Agent as a sidecar in an ACK cluster to collect logs and send them to Elasticsearch using the OTel protocol. | |
| Elastic Agent sidecar | Native Kubernetes clusters | Deploy Elastic Agent as a sidecar in a native Kubernetes cluster to collect logs using the OTel protocol. | |
| Elastic Agent sidecar + Kafka | ACK clusters | Use a two-tier OTel collector architecture (sidecar → Kafka → centralized collector → ES). This architecture is suitable for scenarios that require high reliability and must handle traffic spikes. | |
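The sidecar approaches in the table work because the application container and the collector container in the same Pod share a volume. The sketch below generates such a Pod manifest as a Python dictionary. `kubectl apply -f` accepts JSON manifests, so the printed output can be applied directly. The application image, log directory, and Pod name are hypothetical. The Filebeat image tag is an assumption and should match your environment. The Filebeat configuration itself, for example mounted from a ConfigMap, is omitted.

```python
import json


def build_sidecar_pod(name, app_image, log_dir="/var/log/app",
                      filebeat_image="docker.elastic.co/beats/filebeat:8.17.0"):
    """Return a Pod manifest in which a Filebeat sidecar reads the app's log files.

    Both containers mount the same emptyDir volume at log_dir, so files the
    application writes there are visible to Filebeat.
    """
    shared_logs = {"name": "app-logs", "mountPath": log_dir}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": app_image,
                    "volumeMounts": [dict(shared_logs)],
                },
                {
                    "name": "filebeat",
                    "image": filebeat_image,
                    # A filebeat.yml whose input paths point at log_dir would also
                    # be mounted here (omitted in this sketch).
                    "volumeMounts": [dict(shared_logs, readOnly=True)],
                },
            ],
            # An emptyDir volume lives as long as the Pod and is shared by its containers.
            "volumes": [{"name": "app-logs", "emptyDir": {}}],
        },
    }


if __name__ == "__main__":
    pod = build_sidecar_pod("web", "registry.example.com/web:1.0")  # hypothetical image
    print(json.dumps(pod, indent=2))
```

Mounting the volume read-only in the sidecar keeps the collector from modifying the application's files. An Elastic Agent sidecar would use the same volume layout with a different image.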
Application log collection
For log data generated by middleware and application services, you can use the following methods for collection and analysis.
| Collection tool | Data source/scenario | Description | Documentation |
| Filebeat + Metricbeat | RabbitMQ monitoring | Use Filebeat to collect RabbitMQ logs and Metricbeat to collect operational metrics. You can then visualize and monitor the data in Kibana. | |
| Filebeat + Logstash | RocketMQ client logs | Collect RocketMQ client logs, parse them with a Logstash Grok filter, and write them to Elasticsearch. This helps troubleshoot message sending and receiving errors. | |
| rsbeat | Redis slow query log | Use rsbeat to collect Redis slow query logs in real time and analyze their distribution in Kibana. | Analyze Redis slow query logs in real time with Elasticsearch and rsbeat |
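Redis returns slow query log entries from the `SLOWLOG GET` command. Each entry is an array containing an ID, a Unix timestamp, the execution time in microseconds, and the command arguments. Redis 4.0 and later also append the client address and client name. The sketch below shows the kind of transformation a collector such as rsbeat performs: it turns raw entries into documents and buckets their durations so the distribution can be charted. It assumes the entries have already been fetched and decoded to strings. The output field names and bucket boundaries are hypothetical and are not rsbeat's actual schema.

```python
from collections import Counter
from datetime import datetime, timezone


def slowlog_entry_to_doc(entry):
    """Convert one decoded SLOWLOG GET entry into a flat document."""
    entry_id, ts, duration_us, args = entry[:4]
    doc = {
        "slowlog_id": entry_id,
        "@timestamp": datetime.fromtimestamp(ts, tz=timezone.utc).isoformat(),
        "duration_us": duration_us,
        "command": args[0].upper() if args else "",
        "args": " ".join(args[1:]),
    }
    if len(entry) >= 6:  # Redis 4.0+ adds the client address and client name.
        doc["client_addr"], doc["client_name"] = entry[4], entry[5]
    return doc


def duration_histogram(docs, bounds_us=(10_000, 100_000, 1_000_000)):
    """Count documents per duration bucket, similar to a Kibana histogram."""
    counts = Counter()
    for doc in docs:
        label = next(
            (f"<{b}us" for b in bounds_us if doc["duration_us"] < b),
            f">={bounds_us[-1]}us",
        )
        counts[label] += 1
    return counts


if __name__ == "__main__":
    raw = [
        [14, 1700000000, 15000, ["keys", "*"], "127.0.0.1:50000", ""],
        [13, 1700000001, 2500, ["get", "user:1"]],
    ]
    docs = [slowlog_entry_to_doc(e) for e in raw]
    print(docs)
    print(duration_histogram(docs))
```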
Server data collection
For scenarios like server performance monitoring, distributed tracing, and APM, you can use the following methods to collect monitoring data and send it to Elasticsearch.
| Tool | Data source/scenario | Description | Documentation |
| Metricbeat | System metrics | Use a self-managed Metricbeat to collect system-level metrics, such as CPU, memory, file system, and network I/O. | |
| SkyWalking | Distributed tracing | Store trace and metric data from SkyWalking in Elasticsearch to implement distributed tracing in a microservices architecture. | Implement distributed tracing with SkyWalking and Elasticsearch |
| Heartbeat / Uptime | Elasticsearch service monitoring | Use Heartbeat with the Kibana Uptime feature to monitor the availability and response time of your Elasticsearch service in real time. | Monitor Alibaba Cloud Elasticsearch services in real time using Uptime |
| APM Server | Application performance data | Use a self-managed APM Server to collect application performance data, such as transactions, spans, and error information, and write it to Elasticsearch for analysis. | Collect data to Alibaba Cloud Elasticsearch using a self-managed APM Server |
| Logstash | Data migration and processing | Use a self-managed Logstash to filter and transform data from external sources before migrating or continuously syncing it to Elasticsearch. Logstash supports a wide range of input plugins and filters. | Migrate data to Alibaba Cloud Elasticsearch using a self-managed Logstash |
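At its core, Heartbeat-style availability monitoring repeatedly requests an endpoint and records whether it answered and how long it took. The sketch below performs a single HTTP check using only the Python standard library. This sketch treats any response below HTTP 400 as "up". That rule is an assumption and does not describe Heartbeat's documented behavior. The endpoint in the usage example is a hypothetical placeholder.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=5.0):
    """Request url once and report availability and response time.

    Returns a dict with 'up' (bool), 'status' (int or None),
    'duration_ms' (float), and 'error' (str or None).
    """
    start = time.monotonic()
    status, error = None, None
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx/5xx responses; the server did answer.
        status, error = exc.code, str(exc)
    except OSError as exc:
        # URLError (connection refused, DNS failure) and timeouts are OSError subclasses.
        error = str(exc)
    duration_ms = (time.monotonic() - start) * 1000
    up = status is not None and status < 400
    return {"up": up, "status": status, "duration_ms": duration_ms, "error": error}


if __name__ == "__main__":
    # Hypothetical endpoint; replace with the service you want to monitor.
    print(http_probe("http://localhost:9200", timeout=3))
```

Running the probe on a fixed schedule and indexing each result as a document yields the kind of availability history that the Kibana Uptime app visualizes.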
Related documents
- To sync data from sources like MySQL, MongoDB, MaxCompute, and Hadoop to Elasticsearch, see the related documents in Best Practices Overview.
- To migrate data from a self-managed or third-party Elasticsearch cluster to Alibaba Cloud Elasticsearch, see the related documents in Best Practices Overview.