Overview

Last updated: 2026-04-21 04:25:08

Elasticsearch supports various tools, such as Beats collectors and Elastic Agent (OTel), to collect data like server logs, container logs, system metrics, and application performance data. This data is sent to Elasticsearch for search, analysis, and visualization. You can choose the most suitable collection method based on your data source and deployment environment.
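
As a rough illustration of what "sent to Elasticsearch" can look like at the request level, the sketch below builds a body in the newline-delimited JSON format accepted by the Elasticsearch `_bulk` API. The helper name `build_bulk_body`, the index name `demo-logs`, and the sample events are hypothetical. Real collectors also handle batching, retries, and authentication, which this sketch leaves out.

```python
import json
from typing import Iterable


def build_bulk_body(index: str, events: Iterable[dict]) -> str:
    """Serialize events into a newline-delimited JSON body for the _bulk API."""
    lines = []
    for event in events:
        # Action line: ask Elasticsearch to index the next document into `index`.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: the document itself.
        lines.append(json.dumps(event))
    # Every line, including the last one, must be terminated by a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical sample events, standing in for collected log lines.
    sample = [
        {"message": "GET /index.html 200", "host": "web-01"},
        {"message": "GET /missing 404", "host": "web-02"},
    ]
    print(build_bulk_body("demo-logs", sample), end="")
```

A body built this way would be sent with `Content-Type: application/x-ndjson` to the `_bulk` endpoint of the cluster.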

Collection methods

| Scenario | Recommended method | Description |
| --- | --- | --- |
| Server log file collection | Filebeat / Elastic Agent | Lightweight collection with low resource overhead. Elastic Agent also supports centralized management through Fleet, ideal for managing multiple nodes. |
| System and service metrics monitoring | Metricbeat / Elastic Agent | Provides out-of-the-box modules for collecting system and service metrics, including CPU, memory, Nginx, and MySQL metrics. |
| Kubernetes container logs | Filebeat sidecar / Elastic Agent sidecar | Deploy as a sidecar in a Pod to collect application log files from within the container. |
| Large-scale log collection | Log Collection and Processing Service (recommended), see Log Collection and Processing Service | This service has a built-in message queue for buffering, eliminates operational overhead, and features a simplified architecture. Alternatively, you can build your own pipeline by using Filebeat, Kafka, and Logstash. |
| Middleware and application monitoring | Metricbeat / rsbeat / Dedicated Beat | Use dedicated collectors for middleware such as RabbitMQ and Redis. |
| Distributed tracing and APM | SkyWalking / APM Server | Ideal for distributed tracing and Application Performance Management (APM) in microservices architectures. |
| Service availability monitoring | Heartbeat / Uptime | Proactively probes the availability and response time of ICMP, TCP, and HTTP services. |
| Security auditing | Auditbeat | Collects system audit data and monitors file integrity changes. |

Collection tools

Beats collectors

Beats is a family of lightweight, single-purpose data collectors from Elastic that use few resources, are easy to deploy, and require no application code modifications.

| Collector | Purpose |
| --- | --- |
| Filebeat | Collects and forwards log file data. It supports various log formats, including Nginx, Apache, and MySQL. |
| Metricbeat | Collects metrics from operating systems and services, such as CPU, memory, disk usage, and Nginx operational metrics. |
| Auditbeat | Collects data from the Linux Audit Framework and monitors file integrity changes. |
| Heartbeat | Probes service availability and performs health checks using ICMP, TCP, and HTTP protocols. |
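
To make the idea of active probing concrete, the following minimal sketch performs a single TCP availability check and measures how long the connection takes. It only illustrates the concept behind a TCP health check. `probe_tcp` is a hypothetical helper, not part of Heartbeat, and the host and port in the usage note are placeholders.

```python
import socket
import time
from typing import Optional, Tuple


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> Tuple[bool, Optional[float]]:
    """Attempt one TCP connection and return (is_up, response_time_ms)."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = (time.monotonic() - start) * 1000.0
            return True, elapsed_ms
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False, None
```

For example, `probe_tcp("example.internal", 9200)` would report whether a hypothetical endpoint accepts connections on port 9200. A real monitor would repeat the check on a schedule and record each result.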

Elastic Agent (OTel)

Elastic Agent is a next-generation unified collector from Elastic. It includes a built-in OpenTelemetry (OTel) compatibility mode and can replace multiple Beats collectors. Elastic Agent collects various data types, such as logs, metrics, and security data, and supports centralized management through Fleet.

Data processing tools

Logstash

Logstash is a server-side data processing pipeline that filters, cleans, and transforms data. It is often used with collectors like Beats or Elastic Agent to process data before sending it to Elasticsearch, making it suitable for scenarios that require complex data processing. Logstash can also use its extensive library of input plugins to connect directly to external data sources, allowing you to migrate or continuously sync data to Elasticsearch.
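
The filter, clean, and transform steps described above can be sketched as a chain of small functions applied to each event. This is a conceptual Python illustration, not Logstash configuration or its plugin API. The stage names (`drop_debug`, `strip_message`, `status_to_int`) and the event fields are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Optional

Event = dict
Stage = Callable[[Event], Optional[Event]]


def drop_debug(event: Event) -> Optional[Event]:
    """Filter: discard DEBUG events by returning None."""
    return None if event.get("level") == "DEBUG" else event


def strip_message(event: Event) -> Optional[Event]:
    """Clean: trim surrounding whitespace from the message field."""
    cleaned = dict(event)
    cleaned["message"] = str(event.get("message", "")).strip()
    return cleaned


def status_to_int(event: Event) -> Optional[Event]:
    """Transform: convert a numeric status string to an integer."""
    converted = dict(event)
    status = converted.get("status")
    if isinstance(status, str) and status.isdigit():
        converted["status"] = int(status)
    return converted


def run_pipeline(events: Iterable[Event], stages: List[Stage]) -> Iterator[Event]:
    """Pass each event through every stage in order; skip dropped events."""
    for event in events:
        current: Optional[Event] = event
        for stage in stages:
            current = stage(current)
            if current is None:
                break  # A stage dropped the event.
        if current is not None:
            yield current
```

Each stage returns a new dictionary instead of mutating its input, so stages can be reordered or reused without side effects.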

Log Collection and Processing Service

Traditional log collection solutions often require a self-built, multi-stage pipeline using Kafka and Logstash between the collector and Elasticsearch to handle high-concurrency writes. The Log Collection and Processing Service consolidates these intermediate components into a fully managed, server-side ingestion channel. This simplifies the four-stage pipeline into a two-stage pipeline and offers the following key advantages:

| Feature | Traditional architecture | Log Collection and Processing Service |
| --- | --- | --- |
| Pipeline architecture | Collector → Kafka → Logstash → ES. A four-stage serial pipeline with multiple components. | Collector → Managed service → ES. A two-stage pipeline accessible via a single write endpoint. |
| Operational burden | Requires you to deploy and maintain Kafka and Logstash. You also need to manage broker scaling, topics, partitions, and consumer groups. | Fully managed. No need to purchase or maintain additional middleware. |
| Troubleshooting | Multiple potential points of failure require checking the status and configuration of each component. | A shorter pipeline significantly narrows the scope of troubleshooting. |
| Ingestion capacity | Depends on the buffering capacity of a self-managed Kafka cluster, which requires manual scaling. | Managed high-concurrency ingestion automatically handles traffic spikes and prevents data loss. |
| Resource cost | Requires provisioning Kafka and Logstash cluster resources for peak traffic, which leads to idle resources and waste during off-peak hours. | The pay-as-you-go model eliminates the need to provision dedicated cluster resources. |
| Compatibility | Components have version compatibility requirements. Upgrades require synchronized configuration adjustments. | Supports both Filebeat and OTel ingestion protocols and is compatible with existing collectors. The service is fully managed and maintenance-free. |
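
The buffering role of a message queue between collectors and Elasticsearch can be sketched as a bounded buffer sitting between a bursty producer and a fixed-rate consumer. This is a toy simulation under assumed numbers, not a model of Kafka or of the managed service. `simulate_buffer`, its tick-based timing, and the capacities are hypothetical.

```python
from typing import List, Tuple


def simulate_buffer(arrivals: List[int], drain_per_tick: int, capacity: int) -> Tuple[int, int, int]:
    """Simulate a bounded buffer and return (delivered, dropped, peak_depth).

    arrivals[i] is the number of events produced during tick i.
    drain_per_tick is how many events the downstream writer can take per tick.
    capacity is the maximum number of events the buffer can hold.
    """
    depth = 0
    delivered = 0
    dropped = 0
    peak = 0
    for count in arrivals:
        # Accept as many new events as free space allows; the rest are lost.
        accepted = min(count, capacity - depth)
        dropped += count - accepted
        depth += accepted
        peak = max(peak, depth)
        # The downstream writer drains at a fixed rate.
        sent = min(drain_per_tick, depth)
        depth -= sent
        delivered += sent
    # Anything still buffered after the last tick is flushed afterwards.
    delivered += depth
    return delivered, dropped, peak


if __name__ == "__main__":
    spike = [2, 10, 2, 0, 0]  # a burst of 10 events in the second tick
    print(simulate_buffer(spike, drain_per_tick=4, capacity=100))  # (14, 0, 10)
    print(simulate_buffer(spike, drain_per_tick=4, capacity=5))    # (9, 5, 5)
```

With enough capacity the burst is absorbed and delivered over the following ticks. With too little, events are dropped, which is why a self-managed buffer has to be sized for peak traffic.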

The Log Collection and Processing Service currently supports Elasticsearch versions 7.10 and 8.17.0. Ingestion via the OTel protocol is supported only for version 8.17.0, while the Filebeat protocol is supported for both versions. For usage instructions, see Log Collection and Processing Service.
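
The version and protocol rules above can be captured in a small lookup, which may be handy in a deployment script's preflight check. The table below mirrors only the versions stated in this document. The names `SUPPORTED_PROTOCOLS` and `is_supported` are hypothetical and not part of any Alibaba Cloud SDK.

```python
# Supported ingestion protocols per Elasticsearch version, as stated in this document.
SUPPORTED_PROTOCOLS = {
    "7.10": {"filebeat"},
    "8.17.0": {"filebeat", "otel"},
}


def is_supported(es_version: str, protocol: str) -> bool:
    """Return True if the service accepts `protocol` for the given version."""
    return protocol.lower() in SUPPORTED_PROTOCOLS.get(es_version, set())


if __name__ == "__main__":
    print(is_supported("7.10", "otel"))    # False
    print(is_supported("8.17.0", "OTel"))  # True
```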

Host log collection

When your application is deployed on Alibaba Cloud ECS instances, in an on-premises data center, or on servers from other cloud providers, you can use the following methods to collect logs and metrics from your hosts.

| Collection tool | Data source/scenario | Description | Documentation |
| --- | --- | --- | --- |
| Filebeat | ECS service logs | Deploy Filebeat on an ECS instance to collect server log files and write them to Elasticsearch. | Collect ECS service logs using Filebeat |
| Filebeat | MySQL logs | Use a self-managed Filebeat to collect MySQL slow query logs and error logs. | Collect MySQL logs using a self-managed Filebeat |
| Filebeat | Apache logs | Collect access logs and error logs from Apache HTTP Server. | Collect Apache log data using Filebeat |
| Metricbeat | System metrics and Nginx service data | Collect system-level metrics (such as CPU, memory, and disk) and Nginx operational metrics. | Collect system data and Nginx operational metrics using Metricbeat |
| Auditbeat | System audit data | Collect data from the system audit framework and monitor critical file integrity changes. | Collect system audit data and monitor file changes using Auditbeat |
| Heartbeat | ICMP and HTTP services | Proactively probe ICMP and HTTP services to monitor their availability and response time. | Monitor ICMP and HTTP services using Heartbeat |
| Elastic Agent (OTel) | Nginx logs | Use Fleet for centralized management to collect Nginx access logs and error logs. | Collect Nginx log data using Elastic Agent |
| Elastic Agent (OTel) | NetFlow logs | Collect NetFlow network traffic log data to enable visualization and analysis of network traffic. | Collect NetFlow log data using Elastic Agent |
| Elastic Agent (OTel) | Custom logs | Collect custom-formatted log data. This method is suitable for non-standard log collection scenarios. | Collect custom log data using Elastic Agent |
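
Collecting a log file from a host generally means reading only the lines appended since the last read. The sketch below shows that general idea with a byte offset that the caller persists between calls. It is a simplified illustration and not how Filebeat or Elastic Agent is implemented. `read_new_lines` is a hypothetical helper, and file rotation and truncation are not handled.

```python
from typing import List, Tuple


def read_new_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Read complete lines appended after `offset` and return (lines, new_offset)."""
    lines: List[str] = []
    with open(path, "rb") as handle:
        handle.seek(offset)
        while True:
            raw = handle.readline()
            if not raw.endswith(b"\n"):
                # End of file, or a line still being written: stop without
                # advancing the offset so the partial line is re-read later.
                break
            lines.append(raw.decode("utf-8", errors="replace").rstrip("\r\n"))
            offset = handle.tell()
    return lines, offset
```

A caller would store the returned offset (for example, in a small state file) and pass it back on the next poll, so that each line is forwarded once.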

Kubernetes container log collection

When your application is deployed in a Kubernetes environment, such as an Alibaba Cloud ACK cluster or a native Kubernetes cluster, you can use the following methods to collect log data from your containerized applications.

| Collection tool | Data source/scenario | Description | Documentation |
| --- | --- | --- | --- |
| Filebeat sidecar | ACK clusters | Deploy Filebeat as a sidecar in an ACK cluster to collect application logs from Pods. You can write data directly to Elasticsearch or forward it through Kafka. | Use a Filebeat sidecar to collect ACK cluster logs to Alibaba Cloud Elasticsearch |
| Filebeat + Kafka + Logstash | ACK clusters | Build a complete log analysis pipeline: Filebeat → Kafka → Logstash → Elasticsearch. This is suitable for large-scale logging scenarios. | Build a log analysis system with Filebeat, Kafka, Logstash, and Elasticsearch |
| Elastic Agent sidecar | ACK clusters | Deploy Elastic Agent as a sidecar in an ACK cluster to collect logs and send them to Elasticsearch using the OTel protocol. | Collect data from Alibaba Cloud ACK (Kubernetes) |
| Elastic Agent sidecar | Native Kubernetes clusters | Deploy Elastic Agent as a sidecar in a native Kubernetes cluster to collect logs using the OTel protocol. | Collect data from Kubernetes Pods |
| Elastic Agent sidecar + Kafka | ACK clusters | Use a two-tier OTel collector architecture (sidecar → Kafka → centralized collector → ES). This architecture is suitable for scenarios that require high reliability and must handle traffic spikes. | Collect data from Alibaba Cloud ACK using Kafka |

Application log collection

For log data generated by middleware and application services, you can use the following methods for collection and analysis.

| Collection tool | Data source/scenario | Description | Documentation |
| --- | --- | --- | --- |
| Filebeat + Metricbeat | RabbitMQ monitoring | Use Filebeat to collect RabbitMQ logs and Metricbeat to collect operational metrics. You can then visualize and monitor the data in Kibana. | Monitor RabbitMQ using Alibaba Cloud Elasticsearch |
| Filebeat + Logstash | RocketMQ client logs | Collect RocketMQ client logs, parse them with a Logstash Grok filter, and write them to Elasticsearch. This helps troubleshoot message sending and receiving errors. | Query and analyze RocketMQ client logs |
| rsbeat | Redis slow query log | Use rsbeat to collect Redis slow query logs in real time and analyze their distribution in Kibana. | Analyze Redis slow query logs in real time with Elasticsearch and rsbeat |
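
Parsing a raw log line into named fields, as a Grok filter does for the RocketMQ client logs above, can be approximated in Python with a regular expression that uses named groups. The log line layout assumed here (timestamp, level, logger, then a message after " - ") is hypothetical and may not match actual RocketMQ client output. `LINE_PATTERN` and `parse_line` are illustrative names.

```python
import re
from typing import Optional

# Assumed layout: "2024-05-01 12:00:00,123 WARN RocketmqClient - message text"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<logger>\S+) - "
    r"(?P<message>.*)$"
)


def parse_line(line: str) -> Optional[dict]:
    """Return the named fields of a log line, or None if it does not match."""
    match = LINE_PATTERN.match(line.rstrip("\n"))
    return match.groupdict() if match else None
```

Lines that do not match return `None`, so a caller can route them to a separate index or tag them as parse failures instead of silently discarding them.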

Server data collection

For scenarios like server performance monitoring, distributed tracing, and APM, you can use the following methods to collect monitoring data and send it to Elasticsearch.

| Tool | Data source/scenario | Description | Documentation |
| --- | --- | --- | --- |
| Metricbeat | System metrics | Use a self-managed Metricbeat to collect system-level metrics, such as CPU, memory, file system, and network I/O. | Collect system metrics using a self-managed Metricbeat |
| SkyWalking | Distributed tracing | Store trace and metric data from SkyWalking in Elasticsearch to implement distributed tracing in a microservices architecture. | Implement distributed tracing with SkyWalking and Elasticsearch |
| Heartbeat / Uptime | Elasticsearch service monitoring | Use Heartbeat with the Kibana Uptime feature to monitor the availability and response time of your Elasticsearch service in real time. | Monitor Alibaba Cloud Elasticsearch services in real time using Uptime |
| APM Server | Application performance data | Use a self-managed APM Server to collect application performance data, such as transactions, spans, and error information, and write it to Elasticsearch for analysis. | Collect data to Alibaba Cloud Elasticsearch using a self-managed APM Server |
| Logstash | Data migration and processing | Use a self-managed Logstash to filter and transform data from external sources before migrating or continuously syncing it to Elasticsearch. Logstash supports a wide range of input plugins and filters. | Migrate data to Alibaba Cloud Elasticsearch using a self-managed Logstash |

Related documents

  • To sync data from sources like MySQL, MongoDB, MaxCompute, and Hadoop to Elasticsearch, see the related documents in Best Practices Overview.

  • To migrate data from a self-managed or third-party Elasticsearch cluster to Alibaba Cloud Elasticsearch, see the related documents in Best Practices Overview.
