Troubleshoot client issues

If the Security Center client is offline, fails to install or uninstall, or experiences high CPU usage, you can use the automated troubleshooting tools for a quick diagnosis or investigate the issue manually.

Background

When the Security Center console shows that the client is offline or was not installed successfully, the server is no longer protected by Security Center and is vulnerable to compromise. The following list describes common reasons why a client goes offline.

  • Client process issue: One or both of the core client processes, AliYunDun and AliYunDunUpdate, are not running correctly. This can result from a process crash, manual termination, or a system error.

  • Network connection issue: The server cannot connect to the Security Center service, which prevents the client from sending heartbeat data.

  • DNS resolution failure: The server's DNS service is malfunctioning and cannot resolve the domain names of the Security Center service.

  • Firewall or security group restrictions: Firewall ACL rules on the server or Alibaba Cloud security group rules are blocking communication between the client and the Security Center service.

  • Insufficient server resources: The server's CPU or memory usage is consistently high (for example, above 95%), which can prevent the client from running.

  • Third-party software conflict: Third-party antivirus software on the server is blocking the Security Center client's network access.

Troubleshooting methods

  • Console troubleshooting: Use this method when the server is connected to Security Center. The Agent Troubleshooting feature in the console automatically collects and analyzes client data.

  • Command-line troubleshooting: Use this method when the server is not connected to Security Center. Run the aegis_checker troubleshooting tool on the server to automatically diagnose client issues.

  • Manual troubleshooting: Use this method when the server supports neither the Agent Troubleshooting feature nor the aegis_checker tool. Manually check the client process, network connection, and system resources to identify the cause.

Troubleshoot from the console

Use the Agent Troubleshooting feature on the Security Center console to automatically detect and analyze client issues.

Usage notes

  • Supported operating systems:

    • Windows Server 2008 and later

    • 64-bit Linux distributions (CentOS 5 and earlier are not supported)

  • The server must be connected to Security Center.

Procedure

  1. Log on to the Security Center console.

  2. In the left-side navigation pane, choose Assets > Host. In the upper-left corner of the console, select the region where your assets are located: Chinese Mainland or Outside Chinese Mainland.

  3. On the Host page, on the Server tab, select the server that you want to troubleshoot and click Agent Troubleshooting below the list.

  4. In the Agent Troubleshooting dialog box, select an Issue Type and a Mode, and then click Start Check.

    • Issue Type: Select the type of client issue. If you are unsure, select Overall Check (Unknown Issues).

    • Mode: Select a troubleshooting mode:

      • Standard Mode: Collects client-related log data and sends it to Security Center for analysis. The diagnosis takes about 1 minute.

      • Enhancement Mode: Collects data about the client's network, processes, and logs, and sends it to Security Center for analysis. The diagnosis takes about 5 minutes.

    Note

    The diagnostic program collects network, process, and log data from the server and uploads it to Security Center for analysis.

  5. In the Note dialog box, click OK to open the Task Management panel, which displays all client troubleshooting tasks.

    Note

    You can also open the Task Management panel by clicking Agent Task Management in the upper-right corner of the Host page.

  6. Find the troubleshooting task that you want to view and click Details in the Actions column to open the Run Logs panel. The Run Logs panel displays the troubleshooting details for each server.

    • Start Time/End Time: The start and end times of the client troubleshooting task.

    • Server Information: Information about the diagnosed server.

    • Status: The status of the client troubleshooting task. Valid values:

      • Starting: The client troubleshooting command has been sent.

      • Timeout: The task timed out because it did not receive a result within the specified time.

      • Success: The troubleshooting result has been generated.

    • Issue: The issue found by the troubleshooting task.

    • Result: The recommended solution for the detected issue.

    • Actions: Allows you to download the diagnostic log for further analysis.

  7. Analyze the results:

    • If a solution is provided in the Result column, follow the recommendation.

    • If no solution is provided in the Result column, click Download Diagnostic Logs in the Actions column. Provide the exported diagnostic log and your AliUid to technical support for further analysis.

Troubleshoot from the command line

Run the troubleshooting tool from the command line on your server to automatically diagnose client issues.

Usage notes

  • Supported operating systems:

    • Windows Server 2008 and later

    • 64-bit Linux distributions (CentOS 5 and earlier are not supported)

Procedure

  1. Log on to the target server.

    Note

    • On Windows, you must log on with administrator privileges.

    • On Linux, you must log on as the root user.

  2. Run the appropriate command on the server.

    Alibaba Cloud ECS - Linux

    • Standard mode (diagnosis takes about 1 minute):

      • If the ECS instance can connect to Security Center, run the following command as the root user:

        wget "http://update2.aegis.aliyun.com/download/aegis_client_self_check/linux64/aegis_checker.bin" && chmod +x aegis_checker.bin && ./aegis_checker.bin

      • If the ECS instance cannot connect to Security Center, download aegis_checker, copy it to the target server, and then run the following commands as the root user:

        chmod +x aegis_checker.bin
        ./aegis_checker.bin

    • Enhanced mode (diagnosis takes about 5 minutes): Run the following command as the root user:

      wget "http://update2.aegis.aliyun.com/download/aegis_client_self_check/linux64/aegis_checker.bin" && chmod +x aegis_checker.bin && ./aegis_checker.bin -b "ew0KICAgICJ1dWlkIjogIiIsDQogICAgImNtZF9pZHgiOiAiIiwNCiAgICAiaXNzdWUiOiAib3RoZXJfaXNzdWUiLA0KICAgICJtb2RlIjogMywNCiAgICAianNydl9kb21haW4iOiBbXSwNCiAgICAidXBkYXRlX2RvbWFpbiI6IFtdDQp9"

    Alibaba Cloud ECS - Windows

    Standard mode (diagnosis takes about 1 minute): Choose one of the following methods:

    • Download the aegis_checker program and run it with administrator privileges.

    • Run the following command in a Command Prompt window with administrator privileges:

      powershell -executionpolicy bypass -c "(New-Object Net.WebClient).DownloadFile('http://update2.aegis.aliyun.com/download/aegis_client_self_check/win32/aegis_checker.exe', $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath('.\aegis_checker.exe'))"; "./aegis_checker.exe"

    Note

    Enhanced mode is not supported on Windows.

    Non-Alibaba Cloud - Linux

    • Standard mode (diagnosis takes about 1 minute): Run the following command as the root user:

      wget "http://aegis.alicdn.com/download/aegis_client_self_check/linux64/aegis_checker.bin" && chmod +x aegis_checker.bin && ./aegis_checker.bin

    • Enhanced mode (diagnosis takes about 5 minutes): Run the following command as the root user:

      wget "http://aegis.alicdn.com/download/aegis_client_self_check/linux64/aegis_checker.bin" && chmod +x aegis_checker.bin && ./aegis_checker.bin -b "ew0KICAgICJ1dWlkIjogIiIsDQogICAgImNtZF9pZHgiOiAiIiwNCiAgICAiaXNzdWUiOiAib3RoZXJfaXNzdWUiLA0KICAgICJtb2RlIjogMywNCiAgICAianNydl9kb21haW4iOiBbXSwNCiAgICAidXBkYXRlX2RvbWFpbiI6IFtdDQp9"

    Non-Alibaba Cloud - Windows

    Standard mode (diagnosis takes about 1 minute): Choose one of the following methods:

    • Download the aegis_checker program and run it with administrator privileges.

    • Run the following command in a Command Prompt window with administrator privileges:

      powershell -executionpolicy bypass -c "(New-Object Net.WebClient).DownloadFile('http://aegis.alicdn.com/download/aegis_client_self_check/win32/aegis_checker.exe', $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath('.\aegis_checker.exe'))"; "./aegis_checker.exe"

    Note

    Enhanced mode is not supported on Windows.

  3. After the check is complete, export the generated log package. The location of the log package depends on the operating system.

    • Linux: The log package is in the /root/miniconda2/aegis_checker/output directory.

    • Windows: The log package is in the ./miniconda2/aegis_checker/output directory, relative to the directory in which you ran aegis_checker.

  4. Analyze the results: In the log file, lines prefixed with [root cause] indicate issues detected by aegis_checker.

    1. For some issues, a "processed" message or a recommended solution is provided. Follow the provided instructions.

    2. If aegis_checker does not provide a solution, provide a screenshot of the output, the log package, and your AliUid to Alibaba Cloud technical support for further analysis.
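Because step 4 identifies detected issues by the [root cause] prefix, you can pull them out of a long log with a short filter. This is a minimal Python sketch that assumes you have already extracted a plain-text log file from the log package; the script name in the usage message is hypothetical. If your log lines carry a timestamp before the prefix, replace the startswith check with a substring check.

```python
import sys
from typing import Iterable, List

# Prefix that marks issues detected by aegis_checker (see step 4 above).
ROOT_CAUSE_PREFIX = "[root cause]"


def find_root_causes(lines: Iterable[str]) -> List[str]:
    """Return the lines that start with the [root cause] prefix, whitespace-trimmed."""
    causes = []
    for line in lines:
        text = line.strip()
        if text.startswith(ROOT_CAUSE_PREFIX):
            causes.append(text)
    return causes


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python find_root_causes.py <extracted-log-file>")
    else:
        # errors="replace" keeps the scan going if the log contains non-UTF-8 bytes.
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
            for cause in find_root_causes(log_file):
                print(cause)
```

An empty result does not prove the client is healthy; it only means no line in that file carried the prefix.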

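The enhanced-mode commands in step 2 pass aegis_checker a Base64-encoded -b argument. If you want to see what that argument contains before you run it, the following minimal Python sketch decodes the string copied from those commands and parses it as JSON. The field names come from the decoded payload itself (uuid, cmd_idx, issue, mode, jsrv_domain, update_domain); what each field means, for example that mode 3 selects enhanced mode, is an inference and not documented behavior.

```python
import base64
import json

# The -b argument copied verbatim from the enhanced-mode commands (split for readability).
ENHANCED_MODE_ARG = (
    "ew0KICAgICJ1dWlkIjogIiIsDQogICAgImNtZF9pZHgiOiAiIiwNCiAgICAiaXNzdWUiOiAib3RoZXJfaXNzdWUiLA0K"
    "ICAgICJtb2RlIjogMywNCiAgICAianNydl9kb21haW4iOiBbXSwNCiAgICAidXBkYXRlX2RvbWFpbiI6IFtdDQp9"
)


def decode_checker_arg(encoded: str) -> dict:
    """Decode a Base64 -b argument and parse the JSON document it contains."""
    raw = base64.b64decode(encoded, validate=True)  # Reject non-Base64 characters.
    return json.loads(raw.decode("utf-8"))


if __name__ == "__main__":
    print(json.dumps(decode_checker_arg(ENHANCED_MODE_ARG), indent=2))
```

Decoding shows that the payload sets issue to other_issue and mode to 3, with empty uuid, cmd_idx, and domain lists.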
Troubleshoot manually

If the client is offline, you can log on to the server and follow these steps to investigate the cause.

Agent processes

Diagnostic steps: Verify that the two core processes, AliYunDun and AliYunDunUpdate, are running.

  • Linux: Run the ps -ef | grep AliYunDun command to check.

  • Windows: Open **Task Manager** and go to the **Details** or **Services** tab to find the related processes and services.
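On Linux, the ps -ef | grep AliYunDun check can also be scripted. A minimal sketch, assuming a Linux system with /proc mounted: it reads each process name from /proc/<pid>/comm and reports which of the two core processes are missing. The kernel truncates names in comm to 15 characters, and both AliYunDun and AliYunDunUpdate fit within that limit. Unlike ps -ef | grep AliYunDun, whose output also includes the grep command itself, this check compares exact process names. The function names are illustrative.

```python
import os
from typing import Iterable, List, Set

# The two core processes named in the diagnostic step above.
REQUIRED_PROCESSES = ("AliYunDun", "AliYunDunUpdate")


def running_process_names(proc_root: str = "/proc") -> Set[str]:
    """Collect the command names of all processes listed under /proc (Linux only)."""
    names: Set[str] = set()
    if not os.path.isdir(proc_root):
        return names  # Not Linux, or /proc is not mounted.
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # Skip non-PID entries such as /proc/meminfo.
        try:
            comm_path = os.path.join(proc_root, entry, "comm")
            with open(comm_path, encoding="utf-8", errors="replace") as f:
                names.add(f.read().strip())
        except OSError:
            continue  # The process exited or its comm file is not readable.
    return names


def missing_processes(running: Iterable[str], required: Iterable[str] = REQUIRED_PROCESSES) -> List[str]:
    """Return the required process names that are absent from running, in order."""
    running_set = set(running)
    return [name for name in required if name not in running_set]


if __name__ == "__main__":
    missing = missing_processes(running_process_names())
    if missing:
        print("Not running:", ", ".join(missing))
    else:
        print("AliYunDun and AliYunDunUpdate are both running.")
```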

Resolution: Manually restart the agent processes.

Linux

Run the following commands to restart the processes.

  1. Stop the related processes:

    killall AliYunDun
    killall AliYunDunUpdate
  2. Start the latest version of the agent.

    In the /usr/local/aegis/aegis_client directory, find the aegis_10_xx folders and select the one with the highest version number.

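    For example, among aegis_10_70, aegis_10_73, and aegis_10_75, select aegis_10_75. Then run the AliYunDun binary in that folder, replacing aegis_10_xx with the folder name you selected: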

    /usr/local/aegis/aegis_client/aegis_10_xx/AliYunDun
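Picking the highest version by eye is straightforward, but a script that sorts folder names as strings gets it wrong: a hypothetical aegis_10_9 sorts after aegis_10_75 alphabetically. The following minimal Python sketch assumes the folders follow the aegis_<major>_<minor> pattern shown above and compares the numeric parts instead. It only prints the path of the AliYunDun binary; start the binary yourself as the root user after checking the path.

```python
import os
import re
from typing import Iterable, Optional

# Assumed naming pattern for version folders, such as aegis_10_75.
VERSION_DIR = re.compile(r"^aegis_(\d+)_(\d+)$")
CLIENT_ROOT = "/usr/local/aegis/aegis_client"


def latest_version_dir(names: Iterable[str]) -> Optional[str]:
    """Return the folder name with the highest (major, minor) version, or None."""
    best_name, best_key = None, None
    for name in names:
        match = VERSION_DIR.match(name)
        if not match:
            continue  # Ignore entries that are not version folders.
        key = (int(match.group(1)), int(match.group(2)))  # Numeric, not string, order.
        if best_key is None or key > best_key:
            best_name, best_key = name, key
    return best_name


if __name__ == "__main__":
    try:
        entries = os.listdir(CLIENT_ROOT)
    except OSError:
        entries = []  # Directory missing or unreadable on this machine.
    latest = latest_version_dir(entries)
    if latest is None:
        print("No aegis_<major>_<minor> folder found in", CLIENT_ROOT)
    else:
        print(os.path.join(CLIENT_ROOT, latest, "AliYunDun"))
```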

Windows

In the Services panel, restart the two Security Center services: Alibaba Security Aegis Detect Service and Alibaba Security Aegis Update Service. To do so, right-click each service and select Restart.

Network connection

Diagnostic steps: If the server cannot connect to the Security Center service, the agent goes offline. Verify that your firewall or security group allows outbound traffic to the Security Center service IP addresses or domain names, such as jsrv.aegis.aliyun.com and update.aegis.aliyun.com.

Note

For more information about the Security Center service IP addresses and domain names, see Appendix: Agent communication endpoints (domain names and IP addresses).

Resolution:

  1. Verify that DNS resolution works on the server, for example by running nslookup jsrv.aegis.aliyun.com.

    If the domain names cannot be resolved, restart the server or troubleshoot the DNS service.

  2. Check if network access policies are configured on the server.

    1. Firewall ACL rules

      Add the Security Center service IP addresses or domain names to your firewall's allowlist to permit network access. You only need to configure rules for outbound traffic.

      Note

      If you use Alibaba Cloud Firewall, see Create an outbound access control policy for traffic from an internal network to the Internet for instructions.

      Example firewall configuration (iptables):

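      # Allow outbound access to the control service.
      # iptables resolves a hostname only once, when the rule is added, so
      # re-run these commands if the service IP addresses change.
      # If an earlier OUTPUT rule already drops this traffic, use -I OUTPUT
      # (insert at the top of the chain) instead of -A OUTPUT (append).
      iptables -A OUTPUT -p tcp -d jsrv.aegis.aliyun.com --dport 443 -j ACCEPT
      iptables -A OUTPUT -p tcp -d jsrv.aegis.aliyun.com --dport 80 -j ACCEPT

      # Allow outbound access to the update service.
      iptables -A OUTPUT -p tcp -d update.aegis.aliyun.com --dport 443 -j ACCEPT
      iptables -A OUTPUT -p tcp -d update.aegis.aliyun.com --dport 80 -j ACCEPT
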
    2. Alibaba Cloud security group rules

      If you use an ECS instance, see Manage security groups for specific steps.

      Note

      Allow outbound traffic to the Security Center CIDR blocks. You can either leave the port unrestricted or allow traffic on ports 80 and 443.

      The following is an example configuration for the 100.100.0.0/16 CIDR block:

      • Direction: Outbound

      • Authorization policy: Allow

      • Protocol type: TCP

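      • Port range: 80/443 (in the security group port-range format, this opens ports 80 through 443; to open only ports 80 and 443, add two rules with 80/80 and 443/443)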

      • Authorization object: 100.100.0.0/16
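The DNS and outbound-access checks above can be combined into one quick probe. The following minimal Python sketch resolves each domain and then attempts a TCP connection on ports 443 and 80. It checks only the two example domains named in this section; the full list of endpoints is in the appendix referenced above. A successful connection shows only that the port is reachable from the server, not that the agent can communicate with the service.

```python
import socket
from typing import List

# Example endpoints named in this section; the appendix lists the full set.
ENDPOINTS = ("jsrv.aegis.aliyun.com", "update.aegis.aliyun.com")
PORTS = (443, 80)


def resolve(host: str) -> List[str]:
    """Return the sorted IPv4 addresses for host, or [] if DNS resolution fails."""
    try:
        infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_endpoints() -> List[str]:
    """Probe each endpoint: DNS first, then a TCP connection on each port."""
    report = []
    for host in ENDPOINTS:
        addresses = resolve(host)
        if not addresses:
            report.append(f"{host}: DNS resolution failed")
            continue
        for port in PORTS:
            state = "reachable" if can_connect(host, port) else "blocked or unreachable"
            report.append(f"{host}:{port} ({', '.join(addresses)}): {state}")
    return report


if __name__ == "__main__":
    for line in check_endpoints():
        print(line)
```

A DNS failure points to step 1; a resolved domain with a blocked port points to the firewall or security group rules in step 2.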

System resources

Diagnostic steps:

Verify that the server has sufficient resources. The agent may stop running if server resources are exhausted.

  • CPU/Memory: Use top (Linux) or **Task Manager** (Windows) to check the usage.

  • Disk space: Use df -h (Linux) or **This PC** (Windows) to check the remaining disk space.

Resolution:

  • High resource usage

    • If the AliYunDun process is the cause, contact technical support and provide the relevant logs.

    • If other business processes are the cause, optimize your applications or consider upgrading the server configuration.

  • Insufficient disk space: Delete unnecessary files to free up disk space.
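If you prefer a script to top and df -h, the following Linux-only Python sketch computes memory usage from the MemTotal and MemAvailable fields of /proc/meminfo (MemAvailable is reported by Linux 3.14 and later) and disk usage with shutil.disk_usage. The 95% threshold mirrors the example given under Background and is an illustrative value, not a documented limit. CPU usage still needs top or a similar tool, because it has to be sampled over time.

```python
import shutil
from typing import Dict

THRESHOLD_PERCENT = 95.0  # Illustrative threshold taken from the Background example.


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo content into a {field: value} mapping (values in kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def memory_used_percent(meminfo: Dict[str, int]) -> float:
    """Percentage of memory in use, based on MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    return 100.0 * (total - meminfo["MemAvailable"]) / total


def disk_used_percent(path: str = "/") -> float:
    """Percentage of the file system containing path that is in use.

    df computes Use% as used / (used + available), so its figure can be
    slightly higher than this one on file systems with reserved blocks.
    """
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    results = []
    try:
        with open("/proc/meminfo", encoding="utf-8") as f:
            results.append(("Memory", memory_used_percent(parse_meminfo(f.read()))))
    except (OSError, KeyError):
        print("Memory check skipped: /proc/meminfo or MemAvailable is not available.")
    results.append(("Disk (/)", disk_used_percent("/")))
    for label, value in results:
        flag = "HIGH" if value >= THRESHOLD_PERCENT else "ok"
        print(f"{label}: {value:.1f}% used [{flag}]")
```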

Duplicate agent IDs

Diagnostic steps: This issue often occurs when you create multiple servers from the same system image. Check if the uuid field in the following configuration files is duplicated across multiple servers.

  • Linux: /usr/local/aegis/aegis_client.conf

  • Windows:

    • 32-bit Windows: C:\Program Files\Alibaba\aegis\aegis_client.conf

    • 64-bit Windows: C:\Program Files (x86)\Alibaba\aegis\aegis_client.conf
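When many servers are involved, comparing uuid values by hand is tedious. As a sketch, assuming you have already collected the content of aegis_client.conf from each server (for example, over SSH), the following Python groups servers by uuid and reports values that appear more than once. The exact syntax of the uuid line is not described in this topic, so the regular expression is an assumption that accepts uuid=value and "uuid": "value" forms; adjust it to match your files. The server names in the test data are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# Assumed line format: uuid=<value> or "uuid": "<value>" (quotes optional).
UUID_LINE = re.compile(r'^[ \t]*"?uuid"?[ \t]*[=:][ \t]*"?([^"\s,]+)"?', re.MULTILINE)


def extract_uuid(conf_text: str) -> Optional[str]:
    """Return the first uuid value found in the configuration text, or None."""
    match = UUID_LINE.search(conf_text)
    return match.group(1) if match else None


def find_duplicate_uuids(configs: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each uuid shared by more than one server to the sorted server names."""
    servers_by_uuid: Dict[str, List[str]] = defaultdict(list)
    for server, text in configs.items():
        uuid = extract_uuid(text)
        if uuid:
            servers_by_uuid[uuid].append(server)
    return {u: sorted(s) for u, s in servers_by_uuid.items() if len(s) > 1}
```

Call find_duplicate_uuids with a dictionary that maps each server name to its configuration text; an empty result means no duplicates were found.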

Resolution:

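Before you create a custom image from a template server and use it to create multiple servers, uninstall the agent from the template server and clean up its files. Then obtain a new installation command and install the agent on each new server.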

Software conflicts

Diagnostic steps: Check if other Host-based Intrusion Detection System (HIDS), Endpoint Detection and Response (EDR), or antivirus software is installed on the server. Such software can conflict with the Security Center agent.

Resolution:

Disable or uninstall the third-party security software. After the Security Center agent is installed, you can restart or reinstall the original software as needed.

Agent logs

Diagnostic steps: Review the agent logs for specific error messages. The log files are located in the following directories:

  • Linux: /usr/local/aegis/aegis_client/aegis_xx_xx/data/

    Note

    The aegis_xx_xx placeholder represents the version directory of the running agent. To find the exact path, check the output of the ps -ef | grep AliYunDun command.

  • Windows: C:\Program Files (x86)\Alibaba\Aegis\aegis_client\aegis_xx_xx\data\
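Following the note above, the Linux data directory can be derived from the ps -ef output. A minimal Linux-only Python sketch, assuming the process line shows the full path of the AliYunDun binary (for example, /usr/local/aegis/aegis_client/aegis_10_75/AliYunDun): it extracts the version directory and appends data. The (?:\s|$) check after AliYunDun keeps the pattern from matching AliYunDunUpdate.

```python
import os
import re
import subprocess
from typing import Optional

# Assumed path layout: <...>/aegis_<major>_<minor>/AliYunDun followed by whitespace or end.
AGENT_PATH = re.compile(r"(/\S*/aegis_\d+_\d+)/AliYunDun(?:\s|$)")


def agent_data_dir(ps_output: str) -> Optional[str]:
    """Return the data directory next to the running AliYunDun binary, or None."""
    match = AGENT_PATH.search(ps_output)
    if not match:
        return None
    return os.path.join(match.group(1), "data")


if __name__ == "__main__":
    try:
        output = subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        output = ""  # ps is unavailable or failed; report that no agent was found.
    print(agent_data_dir(output) or "AliYunDun does not appear to be running.")
```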

Resolution:

Troubleshoot the issue based on the error messages in the logs. If you cannot resolve the issue, contact technical support and provide the complete log files.