Troubleshoot Logstash data write issues

When you use Alibaba Cloud Logstash to write data to an Alibaba Cloud Elasticsearch instance, you may encounter issues such as network problems, incorrect pipeline configurations, and high server load. You might also find that a pipeline starts correctly but fails to write data, or that the service is running but data is missing. This topic provides solutions to help you troubleshoot and resolve these issues.

Network connectivity issues

Troubleshooting plan

Common error cases

Recommended solution

Check whether Logstash is in the same network as the source and destination services.

Note

Alibaba Cloud Logstash and Alibaba Cloud Elasticsearch are deployed in a virtual private cloud (VPC). Deploy your source and destination services in the same VPC.

The source service is in a public network environment, while Logstash is in a VPC environment.

Choose one of the following solutions:

  • Use network products to connect the network environments.

  • Configure a NAT Gateway for public network data transmission. For more information, see Configure NAT for public network data transmission.

  • Purchase new Logstash and Elasticsearch instances in the same VPC and reconfigure the pipeline.

Check for errors in the NAT Gateway configuration.

  • The address or port in the NAT entry is incorrect.

  • The NAT Gateway type does not match the scenario.

Resolve the issue as follows:

  • Check the NAT entry address and port to ensure network connectivity.

  • SNAT and DNAT are used in different scenarios. Select the correct gateway translation method for your business scenario:

    • SNAT: Logstash actively accesses the public network.

    • DNAT: A public network service pushes data to Logstash nodes.

Check whether the correct Java Database Connectivity (JDBC) driver plugin is uploaded.

In a PolarDB data synchronization scenario, a newer version of the JDBC driver is used. The log shows no errors, but data is not written to the destination. Switching to an older driver version resolves the issue.

Select the correct version of the JDBC driver. For more information, see Configure extension files.

Check for restrictions in the whitelist or security group.

Filebeat is used to collect data for processing in Logstash. Filebeat is deployed on a user's ECS instance, but the listener port for the ECS instance is not open in the security group.

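Resolve the issue as follows: open the required port in the security group of the ECS instance, and make sure that the whitelist of the source or destination service allows access from Logstash.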

Check whether a Resource Access Management (RAM) user lacks the required permissions to access the source or destination service in the Logstash pipeline configuration.

  • The Logstash output configuration specifies a RAM user to access Elasticsearch, but the RAM user has not been granted index permissions on the Elasticsearch instance.

  • The main Logstash log reports a 401 error.

Resolve the issue as follows:

  • Grant the required permissions to the RAM user. For more information, see Grant permissions to a RAM user.

  • Use the correct username and password for the source and destination. The password must not contain any special characters. If the current password contains special characters, you must change the password for the source or destination. For more information, see Reset the access password of an instance.

Pipeline configuration errors

Troubleshooting plan

Common error cases

Recommended solution

Check the main Logstash log for errors. For more information, see Query logs.

A plugin is not installed. For example, the error message Couldn't find any output plugin named 'file_extend' indicates that the logstash-output-file_extend plugin is not installed on the cluster.

Choose one of the following solutions:

  • Install the plugin.

  • Delete the plugin's configuration information from the pipeline configuration.

The configuration contains hidden special characters.

Type the configuration manually instead of pasting it, so that hidden characters are not carried over.
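Hidden characters can also be located before the configuration is submitted. The following Python sketch is one possible way to do that, not part of Logstash or the console. The helper name `find_hidden_chars` and the sample configuration are hypothetical. It reports control characters, zero-width or other format characters, and non-ASCII spaces, which are the kinds of characters that are usually invisible in an editor.

```python
import unicodedata


def find_hidden_chars(config_text: str):
    """Return (line, column, code point, name) for characters that are easy to miss."""
    findings = []
    for line_no, line in enumerate(config_text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in ("\t", "\r"):
                continue  # ordinary tabs and Windows line endings are not reported
            category = unicodedata.category(ch)
            # Cc = control, Cf = format (zero-width space, BOM), Zs = space separators
            if category in ("Cc", "Cf") or (category == "Zs" and ch != " "):
                findings.append(
                    (line_no, col, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
                )
    return findings


# Hypothetical pipeline text with a zero-width space and a no-break space pasted in.
sample = 'output {\n  elasticsearch {\u200b\n    hosts => ["http://es:9200"]\u00a0\n  }\n}'
for finding in find_hidden_chars(sample):
    print(finding)
```

Typographic quotation marks copied from a document are visible and are therefore not reported by this sketch, but they can break a configuration in the same way.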

The filter code is incorrect. For example, the Ruby code contains errors.

Choose one of the following solutions:

  • Reduce the filter section to a minimal configuration. Then, add the filter settings back one at a time to identify the root cause and resolve the issue.

  • Use a third-party debugging tool to debug the code before you deploy it.

A pipeline parameter name or value is incorrect. For example, hosts is written as host in the logstash-output-elasticsearch plugin, or the RDS instance name is incorrect.

Write the pipeline configuration according to the official Logstash documentation or the Alibaba Cloud Elasticsearch best practices.

The connection between Logstash and the source or destination times out. For example, if Elasticsearch is inaccessible, the error message Elasticsearch Unreachable: [http://xxxx:9200/][Manticore::ConnectTimeout] connect timed out appears.

Ensure network connectivity between Logstash and Elasticsearch, and enter the correct source and destination addresses.
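A plain TCP connection attempt is often enough to tell a network problem from a configuration problem. The following Python sketch assumes that it runs on a machine in the same VPC as Logstash, such as an ECS instance, because scripts cannot be run on the managed Logstash nodes. The helper name `can_connect` and the endpoint in the comment are placeholders.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Usage (placeholder endpoint, replace with the address in your pipeline output):
# can_connect("es-cn-xxxx.elasticsearch.aliyuncs.com", 9200)
```

A result of False from inside the VPC points to the address, port, whitelist, or security group rather than to the pipeline syntax. A result of True only shows that the port is reachable. It does not validate the protocol or the credentials.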

The Elasticsearch instance has the HTTPS protocol enabled, but the Logstash pipeline is configured to use http.

Modify the pipeline configuration to use the same access protocol as the source and destination.

Load issues

Troubleshooting plan

Common error cases

Recommended solution

Check whether the node disk usage is too high. For more information, see the Cluster monitoring topic.

  • In the pipeline configuration, the queue type is set to persistent (PERSISTED). Data is stored permanently on the disk, which can become full as data accumulates.

  • The pipeline output configuration specifies stdout{}.

Resolve the issue as follows:

  • Set the Logstash pipeline queue type to the default in-memory type (MEMORY). For more information, see Manage pipelines using configuration files.

    Important

    Alibaba Cloud Logstash does not currently provide an option to clear disks. If a disk becomes full, contact technical support to have the issue resolved.

  • Delete stdout{} from the pipeline output configuration.

    Important

    The pipeline output configuration does not support defining stdout{}. If you define it, high disk usage will occur.

Check whether the node has an out-of-memory (OOM) error. For more information, see the Cluster monitoring topic.

The node has an OOM error and fails to start.

Restart the corresponding node in the console.

Check for load issues on the source or destination.

The Elasticsearch cluster is unhealthy, which affects write operations.

Pause write operations and prioritize restoring the health of the cluster. You can also scale out the cluster.

Pipeline starts normally, but no data is written to the destination

Troubleshooting plan

Common error cases

Recommended solution

Enable the Logstash pipeline configuration debugging feature and check the debug logs to determine whether data is flowing into Logstash. This feature requires the logstash-output-file_extend plugin to be installed. For more information, see Use the Logstash pipeline configuration debugging feature.

  • If no data flows into Logstash, check whether the source configuration is correct.

  • If data flows into Logstash, check whether the destination configuration is correct.

No data flows into Logstash:

  • The source configuration contains Alibaba Cloud AccessKey information, but the AccessKey has expired.

  • The source does not generate data in real time. For example, Filebeat collects file data in real time, but no new data is written to the file.

Resolve the issue as follows:

  • Check the configuration and correct any inaccurate information.

  • If Logstash uses a real-time stream plugin, ensure that data is being written to the source in real time.

Data flows into Logstash:

  • The automatic index creation feature is not enabled for the Alibaba Cloud Elasticsearch instance.

  • Write operations are forbidden on the destination. For example, write operations are forbidden for the Elasticsearch index.

Resolve the issue as follows:

  • Enable the automatic index creation feature for the Alibaba Cloud Elasticsearch instance.

  • Ensure that the destination is writable.
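One way to confirm that an index is writable is to inspect its settings, which Elasticsearch returns for the request GET /<index>/_settings. The following Python sketch only interprets a response that has already been fetched. The helper name `write_blocks` and the index names in the sample are hypothetical. It looks for the index block settings that reject writes: index.blocks.write, index.blocks.read_only, and index.blocks.read_only_allow_delete.

```python
# Index block settings that reject writes when set to "true".
WRITE_BLOCKING_SETTINGS = ("write", "read_only", "read_only_allow_delete")


def write_blocks(settings_response: dict) -> dict:
    """Map each index name to the active block settings that prevent writes."""
    blocked = {}
    for index_name, body in settings_response.items():
        blocks = body.get("settings", {}).get("index", {}).get("blocks", {})
        active = [
            name
            for name in WRITE_BLOCKING_SETTINGS
            if str(blocks.get(name, "false")).lower() == "true"
        ]
        if active:
            blocked[index_name] = active
    return blocked


# Sample shaped like a GET /<index>/_settings response (index names are made up).
sample = {
    "logs-2024": {"settings": {"index": {"blocks": {"read_only_allow_delete": "true"}}}},
    "metrics-2024": {"settings": {"index": {"number_of_shards": "1"}}},
}
print(write_blocks(sample))
```

An empty result means that none of these blocks is set. Writes can still fail for other reasons, such as missing permissions or a disabled automatic index creation feature.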

Service is normal but data is missing

Troubleshooting plan

Common error cases

Recommended solution

Troubleshoot based on the pipeline configuration scenario and plugin properties:

  • Check whether the JDBC query statement is correct.

  • Check whether data is being written in real time to the source of the logstash-input-elasticsearch plugin.

JDBC scenario:

  • The query statement does not return all of the expected data.

  • The tracking field is not an incrementing field, such as a timestamp or a numeric ID.

  • The JDBC source and the Elasticsearch cluster are in different time zones.

Resolve the issue as follows:

  • Debug the query statement at the source.

  • Check whether the tracking field is of a supported type. Use a numeric or timestamp field.

  • Check for time zone differences and handle them accordingly.
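The effect of a time zone mismatch can be shown with a short worked example. The following Python sketch assumes a source database that stores local time at UTC+8 without zone information. The offset, the sample timestamp, and the helper name `to_utc` are illustrative. It compares the correct UTC value with the value that results when the local time is read as if it were already UTC.

```python
from datetime import datetime, timedelta, timezone

# Assumption for this example: the source stores local time at UTC+8.
SOURCE_TZ = timezone(timedelta(hours=8))


def to_utc(naive_local: datetime, source_tz: timezone = SOURCE_TZ) -> datetime:
    """Attach the source time zone to a naive timestamp and convert it to UTC."""
    return naive_local.replace(tzinfo=source_tz).astimezone(timezone.utc)


row_time = datetime(2024, 1, 1, 8, 0, 0)          # value as stored at the source
correct = to_utc(row_time)                        # what the timestamp means in UTC
misread = row_time.replace(tzinfo=timezone.utc)   # the same value read as UTC

print(correct.isoformat())   # 2024-01-01T00:00:00+00:00
print(misread - correct)     # 8:00:00
```

With an offset like this, a query that compares the tracking field with the last recorded value can skip or repeat rows that fall inside the shifted window, which appears as missing or duplicate data at the destination.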

Scenario using the logstash-input-elasticsearch plugin:

  • The source is being updated with data in real time.

  • The scheduled time interval in the pipeline configuration is too short. A large amount of data is written, which causes data to accumulate at the destination.

Logstash is not suitable for real-time data synchronization. If data is written to the source in real time, increase the scheduled query interval. This reduces the frequency of queries and write operations on the source and destination.

Check the Logstash slow log for slow write issues. For more information, see Query logs.

The load on the source and destination has not reached a bottleneck, but the number of Logstash pipeline worker threads is still set to the default value.

Increase the Logstash pipeline batch size and the number of worker threads. For more information, see Manage pipelines using configuration files.
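When you tune these two values together, note that in Logstash the number of events in flight in a pipeline is the product of the worker count (pipeline.workers) and the batch size (pipeline.batch.size). The following Python sketch is plain arithmetic for comparing settings. The helper name `inflight_events` and the tuned values are illustrative and are not recommendations.

```python
def inflight_events(workers: int, batch_size: int) -> int:
    """Upper bound on the events that the workers of one pipeline hold in memory at a time."""
    return workers * batch_size


# 4 workers with the Logstash default batch size of 125.
print(inflight_events(4, 125))   # 500

# A hypothetical tuned setting: 8 workers with a batch size of 500.
print(inflight_events(8, 500))   # 4000
```

More events in flight means more heap usage, so increase the values in steps and watch node memory to avoid the OOM errors described in the Load issues section.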