Troubleshooting Logtail machine group issues

This topic describes how to troubleshoot machine group heartbeat issues on a server.

Machine group installation examples

Installation method and applicable scenario:

  • Same account and region: The server is an ECS instance, and the instance and project are in the same region under the same Alibaba Cloud account.

  • Same account, different regions: The server is an ECS instance under the same Alibaba Cloud account as the project, but the instance and project are in different regions.

  • Different accounts, same region: The server is an ECS instance, and the instance and project are in the same region but belong to different Alibaba Cloud accounts.

  • Other cloud or self-managed servers:

    • The server is not an ECS instance, such as a self-managed server or a server from another cloud provider.

    • The server is an ECS instance, but the instance and project belong to different Alibaba Cloud accounts and are in different regions. Treat it as a self-managed server.
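
The scenarios above reduce to three yes/no questions about the server. The following is a minimal sketch of that decision in shell. The function name classify_scenario and its yes/no arguments are hypothetical and exist only to illustrate the scenarios.

```shell
# Hypothetical helper: map three yes/no facts about a server to the
# installation scenario described above.
#   $1 = is the server an ECS instance?              (yes|no)
#   $2 = same Alibaba Cloud account as the project?  (yes|no)
#   $3 = same region as the project?                 (yes|no)
classify_scenario() {
  is_ecs="$1"; same_account="$2"; same_region="$3"
  if [ "$is_ecs" != "yes" ]; then
    echo "Other cloud or self-managed servers"
  elif [ "$same_account" = "yes" ] && [ "$same_region" = "yes" ]; then
    echo "Same account and region"
  elif [ "$same_account" = "yes" ]; then
    echo "Same account, different regions"
  elif [ "$same_region" = "yes" ]; then
    echo "Different accounts, same region"
  else
    # ECS instance in a different account and a different region.
    echo "Other cloud or self-managed servers"
  fi
}
```

For example, classify_scenario yes no yes prints "Different accounts, same region".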

Troubleshooting checklist

  1. Step 1: Verify that Logtail is running: Check if the Logtail process is active on the server.

  2. Step 2: Verify that the machine group IP address matches the IP address that Logtail obtains: A mismatched IP address between the machine group and the Logtail app_info.json file can cause heartbeat failures.

  3. Step 3: Check whether the Logtail startup parameters are correct: Check whether the project region configured in the ilogtail_config.json file is correct.

  4. Step 4: Check network connectivity: Ensure the server can connect to the project endpoints.

  5. Step 5: Verify the system time on the Logtail server: If the system time is significantly different from the actual time, correct it.

  6. Step 6: Verify the user identifier for cross-account collection: This step is required if the server is not an ECS instance or if the instance and project belong to different Alibaba Cloud accounts.

  7. Step 7: Verify the custom identifier for a custom identifier-based machine group: If you use a custom identifier-based machine group, ensure that the custom identifier is configured on the server.

  8. Step 8: Restart Logtail: After making changes, restart Logtail to apply them.

Next steps

View Logtail collection errors: If the heartbeat status is OK but logs are still not collected, check Logtail error messages to continue troubleshooting.

Step 1: Verify Logtail status

Linux

  1. Log on to the server where Logtail is installed.

  2. Run the following command.

    ps -ef | grep ilogtail
    • If the output includes two entries similar to the following, Logtail is running correctly. These entries represent the Logtail daemon and worker processes.

      UID          PID    PPID  C STIME TTY          TIME CMD
      ...
      root          12       1  0 Nov10 ?        00:00:00 /usr/local/ilogtail/ilogtail
      root          14      12  0 Nov10 ?        03:07:43 /usr/local/ilogtail/ilogtail
      ...
      Important

      If the output shows three or more Logtail processes, multiple Logtail instances are running on the server. This can cause duplicate log collection. Check if this is the intended behavior.

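    • If the output does not show any running Logtail processes, Logtail is not running. Restart it as described in Step 8: Restart Logtail, or install Logtail if it is not installed on the server.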
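
The process check above can be scripted. The following is a minimal sketch, assuming Logtail is installed at the default path /usr/local/ilogtail/ilogtail. The function names and the result labels are hypothetical.

```shell
# Count processes whose command line contains the default Logtail binary path.
# The [i] bracket keeps grep from matching its own command line.
count_logtail_processes() {
  ps -ef | grep -c '/usr/local/ilogtail/[i]logtail' || true
}

# Interpret the count as described above. A count of one is not covered by
# the text, so "unexpected" is an assumption of this sketch.
classify_logtail_count() {
  case "$1" in
    ''|*[!0-9]*) echo "invalid"; return 1 ;;
    0) echo "not-running" ;;
    1) echo "unexpected" ;;
    2) echo "running" ;;
    *) echo "multiple-instances" ;;
  esac
}
```

To use it on a server, run classify_logtail_count "$(count_logtail_processes)".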

Windows

  1. Log on to the server where Logtail is installed.

  2. Open the Run window and enter services.msc.

  3. Check the status of the LogtailDaemon service (for Logtail 1.0.0.0 and later) or the LogtailWorker service (for Logtail 0.x.x.x versions).

    If the service is not running, start it. If the service does not exist, install Logtail.

    Important

    When installing Logtail, select a supported operating system and choose the correct installation parameters for your Simple Log Service project's region and network type. For more information about network types, see Logtail network types, startup parameters, and configuration files.

Step 2: Verify the IP address match

Note

Logtail obtains the IP address of a Linux server in the following ways:

  • If you have not configured hostname binding, Logtail uses the IP address of the first network interface controller (NIC) on the server.

  • If you want to customize the IP address, you can set working_ip in the ilogtail_config.json file in Step 3. After you set this parameter, the ip field in the app_info.json file will automatically be updated to the value of the working_ip field. For more information about working_ip, see Set startup parameters.

  • If you set a hostname binding in the /etc/hosts file, Logtail obtains the IP address corresponding to the bound hostname.

  1. Obtain the value of the ip field from the app_info.json file.

    The default paths to this file on different operating systems are:

    • Linux, Logtail (64-bit): /usr/local/ilogtail/app_info.json

    • Windows (64-bit), Logtail (64-bit): C:\Program Files\Alibaba\Logtail\app_info.json

    • Windows (64-bit), Logtail (32-bit): C:\Program Files (x86)\Alibaba\Logtail\app_info.json

    • Windows (32-bit), Logtail (32-bit): C:\Program Files\Alibaba\Logtail\app_info.json

    Logtail records the IP address that it obtains in the ip field of the app_info.json file.

    {
      "UUID" : "",
      "hostname" : "iZ8vbdlzf******azuhZ",
      "instance_id" : "E9633380-***********-00163E1AA597_172.16.2.200_166****11",
      "ip" : "172.**.**.200",
      "logtail_version" : "1.3.1",
      "os" : "Linux; 4.19.91-26.1.al7.x86_64; #1 SMP Tue Jul 26 17:52:28 CST 2022; x86_64",
      "update_time" : "2022-12-27 05:38:33"
    }
  2. Verify that the machine group uses the IP address that Logtail obtains.

    Simple Log Service provides two types of machine groups: IP address-based machine groups and custom identifier-based machine groups. For more information, see Machine groups.

    • IP address-based machine group: Check if the IP Address list contains the IP address that you obtained in the previous step.

      If the IP address is not included, first confirm the server's correct IP address. If the address in the IP Address list is incorrect, update it in the machine group settings. If the IP address obtained by Logtail in the previous step is wrong, modify the working_ip parameter as described in Set Logtail startup parameters and restart Logtail. After making the correction, monitor the machine's heartbeat. If the status becomes normal, the issue is resolved.

    • Custom identifier-based machine group: In the Machine Group Status list, check for the IP address you obtained in the previous step. If the Heartbeat status is OK, the issue is resolved.
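
The comparison in this step can be sketched in shell. The sketch assumes that app_info.json keeps one field per line, as in the sample above, and it uses sed rather than a JSON parser so that it has no dependencies. The function names are hypothetical, and the list of machine group IP addresses must be copied from the console by hand.

```shell
# Print the value of the "ip" field from an app_info.json file.
#   $1 = path to app_info.json
get_logtail_ip() {
  sed -n 's/.*"ip"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Return 0 if the first argument equals any of the remaining arguments.
#   $1 = IP address obtained by Logtail, $2... = machine group IP addresses
ip_in_list() {
  target="$1"
  shift
  for candidate in "$@"; do
    if [ "$candidate" = "$target" ]; then
      return 0
    fi
  done
  return 1
}
```

On Linux, a check might look like ip_in_list "$(get_logtail_ip /usr/local/ilogtail/app_info.json)" followed by the addresses from the IP Address list of the machine group.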

Step 3: Verify startup parameters

The ilogtail_config.json file contains the startup parameters for Logtail.

  1. Log on to the server where Logtail is installed.

  2. Find the ilogtail_config.json file.

    The default paths to this file on different operating systems are:

    • Linux, Logtail (64-bit): /usr/local/ilogtail/ilogtail_config.json

    • Windows (64-bit), Logtail (64-bit): C:\Program Files\Alibaba\Logtail\ilogtail_config.json

    • Windows (64-bit), Logtail (32-bit): C:\Program Files (x86)\Alibaba\Logtail\ilogtail_config.json

    • Windows (32-bit), Logtail (32-bit): C:\Program Files\Alibaba\Logtail\ilogtail_config.json

  3. Open the ilogtail_config.json file and check whether the configuration parameters are correct.

      {
        "config_server_address" : "http://logtail.<config_region>.log.aliyuncs.com",
        "data_server_list" :
        [
          {
            "cluster" : "<project_region>",
            "endpoint" : "<endpoint>"
          }
        ],
        ...
      }
      • If the startup parameters in the ilogtail_config.json file match the following descriptions, they are correct.

      • If the startup parameters are incorrect, modify the ilogtail_config.json file based on the following values and then restart Logtail. For more information, see Restart Logtail.

        For information about project regions, see Supported regions.

        • The server is an ECS instance in the same region as the project (network type: Alibaba Cloud internal network): <config_region> is <project_region>-intranet, and <endpoint> is <project_region>-intranet.log.aliyuncs.com.

        • Other scenarios (network type: Internet): <config_region> is <project_region>, and <endpoint> is <project_region>.log.aliyuncs.com.

        • Other scenarios (network type: transfer acceleration): <endpoint> is log-global.aliyuncs.com.
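
The values above can be generated and compared against the file. The following is a minimal sketch, assuming the file keeps one field per line as in the sample. The function names and the network labels intranet, internet, and acceleration are hypothetical. The sketch does not derive config_server_address for transfer acceleration.

```shell
# Print the first string value of a key in ilogtail_config.json.
#   $1 = path to the file, $2 = key name
config_value() {
  sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$1" | head -n 1
}

# Expected data_server_list endpoint for a project region and network label.
expected_endpoint() {
  case "$2" in
    intranet)     echo "$1-intranet.log.aliyuncs.com" ;;
    internet)     echo "$1.log.aliyuncs.com" ;;
    acceleration) echo "log-global.aliyuncs.com" ;;
    *)            return 1 ;;
  esac
}

# Expected config_server_address for a project region and network label.
expected_config_server() {
  case "$2" in
    intranet) echo "http://logtail.$1-intranet.log.aliyuncs.com" ;;
    internet) echo "http://logtail.$1.log.aliyuncs.com" ;;
    *)        return 1 ;;
  esac
}
```

A check for an ECS instance in the same region as a cn-hangzhou project could compare config_value /usr/local/ilogtail/ilogtail_config.json endpoint with expected_endpoint cn-hangzhou intranet.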

Step 4: Check network connectivity

For successful data uploads, the Logtail server must be able to connect to the following addresses.

Important

If you use an internal network, the endpoint must contain -intranet after the region ID, for example, cn-hangzhou-intranet.log.aliyuncs.com.

  1. The address specified by the config_server_address field in the ilogtail_config.json file and its HTTPS version.

  2. http://<project_name>.<endpoint>.

    • <project_name> is the name of your project. [Figure: where to find the project name and region]

    • <endpoint> is the address specified by the data_server_list.endpoint parameter in the ilogtail_config.json file.

  3. http://ali-<project_region>-sls-admin.<endpoint>, where <endpoint> is the address specified by the data_server_list.endpoint parameter in the ilogtail_config.json file.

Follow these steps to check and resolve network issues:

Linux

  1. Log on to the server where Logtail is installed.

  2. Run the curl command to connect to the preceding addresses in sequence.

    curl http://<project_name>.cn-hangzhou-intranet.log.aliyuncs.com

    If all responses are similar to the following example, the network connection is working correctly.

    {"Error":{"Code":"OLSInvalidMethod","Message":"The script name is invalid : /","RequestId":"5D****09"}}

    If the network connection fails, check for issues such as blocked ports (80 and 443), incorrect DNS settings, or misconfigured security groups.
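
To avoid typing each address by hand, the list can be generated and probed in a loop. The following is a minimal sketch. The function names are hypothetical, and the five-second timeout is an arbitrary choice. Because curl exits with status 0 when the server returns an error body such as the one shown above, any HTTP response counts as reachable here.

```shell
# Print the addresses that Logtail must be able to reach, one per line.
#   $1 = config_server_address, $2 = project name,
#   $3 = project region,        $4 = data_server_list endpoint
logtail_check_urls() {
  echo "$1"
  echo "$1" | sed 's|^http://|https://|'
  echo "http://$2.$4"
  echo "http://ali-$3-sls-admin.$4"
}

# Read URLs from standard input and report whether each one answers.
probe_urls() {
  while IFS= read -r url; do
    if curl -s -m 5 -o /dev/null "$url"; then
      echo "OK   $url"
    else
      echo "FAIL $url"
    fi
  done
}
```

The two functions are meant to be piped together: logtail_check_urls followed by the four values from your configuration, then | probe_urls.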

Windows

  1. Log on to the server where Logtail is installed.

  2. Run the telnet command to connect to each of the preceding addresses in turn. Use port 80 for HTTP or port 443 for HTTPS.

    telnet <project_name>.cn-hangzhou-intranet.log.aliyuncs.com 80

    If all responses are similar to the following example, the network connection is working correctly.

    Trying 100*0*7*5...
    Connected to xxx.
    Escape character is '^]'.

    If the network connection fails, check for issues such as blocked ports (80 and 443), incorrect DNS settings, or misconfigured security groups.

Step 5: Verify the system time

Linux

  1. Log on to the server where Logtail is installed.

  2. Run the date command to view the system time.

    Wed Dec 28 06:59:26 UTC 2022

    If the system time deviates significantly from the actual time, take one of the following actions.

    • Adjust the system time to the correct time.

    • If you cannot modify the system time, add the "enable_log_time_auto_adjust": true configuration item to the ilogtail_config.json file, and then restart Logtail. For more information, see Restart Logtail. For the path of the ilogtail_config.json file, see Step 3: Verify startup parameters.
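
The text does not state how large a deviation is significant, so the following sketch takes the limit as a parameter. It assumes that the remote server returns an HTTP Date header and that GNU date is available to parse it. The function names are hypothetical.

```shell
# Absolute difference, in seconds, between two epoch timestamps.
abs_drift_seconds() {
  delta=$(( $1 - $2 ))
  if [ "$delta" -lt 0 ]; then
    delta=$(( 0 - delta ))
  fi
  echo "$delta"
}

# Assumption: the server answers a HEAD request with a Date header, and
# GNU date (date -d) is available to convert it to an epoch timestamp.
remote_epoch() {
  header="$(curl -sI -m 5 "$1" | tr -d '\r' | sed -n 's/^[Dd]ate: //p' | head -n 1)"
  date -d "$header" +%s
}

# Compare two clocks against a limit.
#   $1 = local epoch, $2 = reference epoch, $3 = limit in seconds
check_drift() {
  if [ "$(abs_drift_seconds "$1" "$2")" -gt "$3" ]; then
    echo "drift-too-large"
  else
    echo "ok"
  fi
}
```

A possible invocation is check_drift "$(date +%s)" "$(remote_epoch http://<project_name>.<endpoint>)" 300, where 300 is an arbitrary limit chosen for illustration.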

Windows

  1. Log on to the server where Logtail is installed.

  2. Check the time in the taskbar at the bottom-right of the desktop.

    If the system time deviates significantly from the actual time, take one of the following actions.

    • Adjust the system time to the correct time.

    • If you cannot modify the system time, add the "enable_log_time_auto_adjust": true configuration item to the ilogtail_config.json file, and then restart Logtail. For more information, see Restart Logtail. For the path of the ilogtail_config.json file, see Step 3: Verify startup parameters.

Step 6: Verify the user identifier

Important
  • This step is required if the server is not an ECS instance or if the ECS instance and project belong to different Alibaba Cloud accounts.

  • The user identifier must be the Alibaba Cloud account ID (root account ID). For more information, see Configure a user identifier.

Check for the user identifier file in its specified directory to verify its configuration. This file grants the project's account permission to access the server.

Note

The user identifier file is stored in the following directories:

  • Linux: /etc/ilogtail/users/

  • Windows: C:\LogtailData\users\

Check the directory and act on the result:

  • If a file named after the project's Alibaba Cloud account ID exists in the directory, the user identifier is correctly configured.

  • If the user identifier file does not exist in the directory or is misconfigured, create it:

    • Linux: Run the cd /etc/ilogtail/users/ && touch <uid> command to create the user identifier file, where <uid> is the ID of the Alibaba Cloud account to which the project belongs.

    • Windows: Go to the C:\LogtailData\users\ directory and create an empty file named <uid>, where <uid> is the ID of the Alibaba Cloud account to which the project belongs.
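
On Linux, the check and the fix can be combined. The following is a minimal sketch. The function names are hypothetical, and the directory is passed as an argument so that the default /etc/ilogtail/users path from this step is not hard-coded.

```shell
# Return 0 if the user identifier file for an account ID exists.
#   $1 = users directory, $2 = Alibaba Cloud account ID
has_user_identifier() {
  [ -f "$1/$2" ]
}

# Create the empty user identifier file if it is missing.
# Creating the directory when it is absent is an assumption of this sketch.
ensure_user_identifier() {
  if has_user_identifier "$1" "$2"; then
    echo "already-configured"
  else
    mkdir -p "$1" && touch "$1/$2" && echo "created"
  fi
}
```

On a server, this might be run as ensure_user_identifier /etc/ilogtail/users <uid>, typically with root privileges.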

Step 7: Verify the custom identifier

If you use a custom identifier-based machine group, check the user_defined_id file in the specified directory to determine whether a custom identifier is configured on the server.

Note

The paths to the user_defined_id file are:

  • Linux: /etc/ilogtail/user_defined_id

  • Windows: C:\LogtailData\user_defined_id

Check the file and act on the result:

  • If the user_defined_id file does not exist, create it and enter the custom identifier for the machine group. For more information, see Configure a custom identifier.

  • If the user_defined_id file exists but is empty or does not contain the correct custom identifier, add the custom identifier for the machine group to the file. For more information, see Configure a custom identifier.

  • If the user_defined_id file already contains the custom identifier that you set for the machine group, the configuration is correct.
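
The three outcomes above map onto a small check. The following is a minimal sketch, assuming each custom identifier occupies its own line in the file. That layout is an assumption of this sketch, and the function name and result labels are hypothetical.

```shell
# Report the state of a user_defined_id file for one custom identifier.
#   $1 = path to user_defined_id, $2 = custom identifier of the machine group
check_custom_identifier() {
  if [ ! -f "$1" ]; then
    echo "missing-file"
  elif grep -Fxq -- "$2" "$1"; then
    # -F: fixed string, -x: whole-line match, -q: no output
    echo "ok"
  else
    echo "identifier-not-found"
  fi
}
```

On Linux, this might be run as check_custom_identifier /etc/ilogtail/user_defined_id followed by the identifier configured for the machine group.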

Step 8: Restart Logtail

After you make any of the preceding changes, restart Logtail to apply them.

Linux

  1. Log on to the server where Logtail is installed.

  2. Run the following command.

    sudo /etc/init.d/ilogtaild restart

Windows

  1. Log on to the server where Logtail is installed.

  2. Open the Run window and enter services.msc.

  3. Restart the LogtailDaemon service (for Logtail 1.0.0.0 and later) or the LogtailWorker service (for Logtail 0.x.x.x versions).