If the Security Center client is offline, fails to install or uninstall, or experiences high CPU usage, you can use the automated troubleshooting tool for a quick diagnosis or investigate the issue manually.
Background
When the Security Center console indicates the client is offline or was not installed successfully, the server is no longer protected by Security Center and is vulnerable to compromise. The following table describes common reasons why a client goes offline.
Cause | Description |
Client process issue | The core client processes, |
Network connection issue | The server cannot connect to the Security Center service, which prevents the client from sending heartbeat data. |
DNS resolution failure | The server's DNS service is malfunctioning and cannot resolve the domain names of the Security Center service. |
Firewall or security group restrictions | Firewall ACL rules on the server or Alibaba Cloud security group rules are blocking communication between the client and the Security Center service. |
Insufficient server resources | The server's CPU or memory usage is consistently high (for example, above 95%), which can prevent the client from running. |
Third-party software conflict | Third-party antivirus software on the server is blocking the Security Center client's network access. |
Troubleshooting methods
Method | Use case | Instructions |
Console troubleshooting | The server is connected to Security Center. | Use the client troubleshooting feature on the console to automatically collect and analyze client data. |
Command-line troubleshooting | The server is not connected to Security Center. | Run the |
Manual troubleshooting | The server does not support the Agent Troubleshooting feature or the | Manually check the client process, network connection, and system resources to identify the cause. |
Troubleshoot from the console
Use the Agent Troubleshooting feature on the Security Center console to automatically detect and analyze client issues.
Usage notes
Supported operating systems:
Windows Server 2008 and later
64-bit Linux distributions (CentOS 5 and earlier are not supported)
The server must be connected to Security Center.
Procedure
Log on to the Security Center console.
In the left-side navigation pane, choose . In the upper-left corner of the console, select the region where your assets are located: Chinese Mainland or Outside Chinese Mainland.
On the Host page, on the Server tab, select the server that you want to troubleshoot and click Agent Troubleshooting below the list.
In the Agent Troubleshooting dialog box, select an Issue Type and a Mode, and then click Start Check.
Parameter | Description |
Issue Type | Select the type of client issue. If you are unsure, select Overall Check (Unknown Issues). |
Mode | Select a troubleshooting mode. Standard Mode: Collects client-related log data and sends it to Security Center for analysis; the diagnosis takes about 1 minute. Enhancement Mode: Collects data about the client's network, processes, and logs, and sends it to Security Center for analysis; the diagnosis takes about 5 minutes. |
Note: The diagnostic program collects network, process, and log data from the server and uploads it to Security Center for analysis.
In the Note dialog box, click OK to open the Task Management panel, which displays all client troubleshooting tasks.
Note: You can also open the Task Management panel by clicking Agent Task Management in the upper-right corner of the Host page.
Find the troubleshooting task that you want to view and click Details in the Actions column to open the Run Logs panel. The Run Logs panel displays the troubleshooting details for each server.
Column | Description |
Start Time/End Time | The start and end times of the client troubleshooting task. |
Server Information | Information about the diagnosed server. |
Status | The status of the client troubleshooting task. Valid values: Starting (the client troubleshooting command has been sent), Timeout (the task did not receive a result within the specified time), Success (the troubleshooting result has been generated). |
Issue | The issue found by the troubleshooting task. |
Result | The recommended solution for the detected issue. |
Actions | Allows you to download the diagnostic log for further analysis. |
Analyze the results:
If a solution is provided in the Result column, follow the recommendation.
If no solution is provided in the Result column, click Download Diagnostic Logs in the Actions column. Provide the exported diagnostic log and your AliUid to technical support for further analysis.
Troubleshoot from the command line
Run the troubleshooting tool from the command line on your server to automatically diagnose client issues.
Usage notes
Supported operating systems:
Windows Server 2008 and later
64-bit Linux distributions (CentOS 5 and earlier are not supported)
Procedure
Log on to the target server.
Note: On Windows, you must log on with administrator privileges. On Linux, you must log on as the root user.
Run the appropriate command on the server.
Alibaba Cloud ECS - Linux
Standard mode (diagnosis takes about 1 minute):
If the ECS instance can connect to Security Center, run the following command as the root user:
wget "http://update2.aegis.aliyun.com/download/aegis_client_self_check/linux64/aegis_checker.bin" && chmod +x aegis_checker.bin && ./aegis_checker.bin
If the ECS instance cannot connect to Security Center, download aegis_checker, copy it to the target server, and then run the following commands as the root user:
chmod +x aegis_checker.bin
./aegis_checker.bin
Enhanced mode (diagnosis takes about 5 minutes): Run the following command as the root user:
wget "http://update2.aegis.aliyun.com/download/aegis_client_self_check/linux64/aegis_checker.bin" && chmod +x aegis_checker.bin && ./aegis_checker.bin -b "ew0KICAgICJ1dWlkIjogIiIsDQogICAgImNtZF9pZHgiOiAiIiwNCiAgICAiaXNzdWUiOiAib3RoZXJfaXNzdWUiLA0KICAgICJtb2RlIjogMywNCiAgICAianNydl9kb21haW4iOiBbXSwNCiAgICAidXBkYXRlX2RvbWFpbiI6IFtdDQp9"
Alibaba Cloud ECS - Windows
Standard mode (diagnosis takes about 1 minute): Choose one of the following methods:
Download the aegis_checker program and run it with administrator privileges.
Run the following command in a Command Prompt window with administrator privileges:
powershell -executionpolicy bypass -c "(New-Object Net.WebClient).DownloadFile('http://update2.aegis.aliyun.com/download/aegis_client_self_check/win32/aegis_checker.exe', $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath('.\aegis_checker.exe'))"; "./aegis_checker.exe"
Note: Enhanced mode is not supported on Windows.
Non-Alibaba Cloud - Linux
Standard mode (diagnosis takes about 1 minute): Run the following command as the root user:
wget "http://aegis.alicdn.com/download/aegis_client_self_check/linux64/aegis_checker.bin" && chmod +x aegis_checker.bin && ./aegis_checker.bin
Enhanced mode (diagnosis takes about 5 minutes): Run the following command as the root user:
wget "http://aegis.alicdn.com/download/aegis_client_self_check/linux64/aegis_checker.bin" && chmod +x aegis_checker.bin && ./aegis_checker.bin -b "ew0KICAgICJ1dWlkIjogIiIsDQogICAgImNtZF9pZHgiOiAiIiwNCiAgICAiaXNzdWUiOiAib3RoZXJfaXNzdWUiLA0KICAgICJtb2RlIjogMywNCiAgICAianNydl9kb21haW4iOiBbXSwNCiAgICAidXBkYXRlX2RvbWFpbiI6IFtdDQp9"
Non-Alibaba Cloud - Windows
Standard mode (diagnosis takes about 1 minute): Choose one of the following methods:
Download the aegis_checker program and run it with administrator privileges.
Run the following command in a Command Prompt window with administrator privileges:
powershell -executionpolicy bypass -c "(New-Object Net.WebClient).DownloadFile('http://aegis.alicdn.com/download/aegis_client_self_check/win32/aegis_checker.exe', $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath('.\aegis_checker.exe'))"; "./aegis_checker.exe"
Note: Enhanced mode is not supported on Windows.
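The long string passed to `-b` in the enhanced-mode commands is Base64 text. As a minimal sketch for inspecting it before you run the checker as root, the helper below decodes it. The function name `decode_checker_payload` is hypothetical, and the meaning of the individual fields is not documented in this topic.

```shell
# Hypothetical helper: decode the Base64 argument that the enhanced-mode
# commands pass to `aegis_checker.bin -b`, so it can be read before use.
decode_checker_payload() {
    # tr removes carriage returns because the payload uses CRLF line endings
    printf '%s' "$1" | base64 -d | tr -d '\r'
}

# The payload string is copied verbatim from the enhanced-mode command.
payload='ew0KICAgICJ1dWlkIjogIiIsDQogICAgImNtZF9pZHgiOiAiIiwNCiAgICAiaXNzdWUiOiAib3RoZXJfaXNzdWUiLA0KICAgICJtb2RlIjogMywNCiAgICAianNydl9kb21haW4iOiBbXSwNCiAgICAidXBkYXRlX2RvbWFpbiI6IFtdDQp9'
decode_checker_payload "$payload"
```

The decoded text is a small JSON object that includes `"issue": "other_issue"` and `"mode": 3`, along with empty `uuid`, `cmd_idx`, `jsrv_domain`, and `update_domain` fields.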
After the check is complete, export the generated log package. The location of the log package depends on the operating system.
Linux: The log package is in the /root/miniconda2/aegis_checker/output directory.
Windows: The log package is in the ./miniconda2/aegis_checker/output directory relative to the current path.
Analyze the results: In the log file, lines prefixed with [root cause] indicate issues detected by `aegis_checker`. For some issues, a "processed" message or a recommended solution is provided. Follow the provided instructions.
If `aegis_checker` does not provide a solution, provide a screenshot of the output, the log package, and your AliUid to Alibaba Cloud technical support for further analysis.
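To pull the flagged lines out of a long log, a fixed-string search is usually enough. The sketch below assumes you have already extracted the log package; the function name `show_root_causes` and the file name `checker.log` are hypothetical, so substitute the actual log file from the output directory.

```shell
# Print every line that contains the [root cause] marker.
# -F treats the pattern as a fixed string, so the brackets are not
# interpreted as a regular-expression character class.
show_root_causes() {
    grep -F '[root cause]' "$1"
}

# Usage (checker.log is a hypothetical file name inside the log package):
#   show_root_causes /root/miniconda2/aegis_checker/output/checker.log
```

If the command prints nothing and exits with a non-zero status, the log contains no `[root cause]` lines.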
Troubleshoot manually
If the client is offline, you can log on to the server and follow these steps to investigate the cause.
Agent processes
Diagnostic steps: Verify that the two core processes, AliYunDun and AliYunDunUpdate, are running.
- Linux: Run the `ps -ef | grep AliYunDun` command to check.
- Windows: Open **Task Manager** and go to the **Details** or **Services** tab to find the related processes and services.
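On Linux, `grep AliYunDun` also matches `AliYunDunUpdate` and the `grep` command itself, so it is easy to miss that one of the two processes is absent. A minimal sketch that matches whole process names instead is shown below; the function name `missing_agent_processes` is hypothetical.

```shell
# Read one process name per line on stdin and print each core agent
# process that does not appear in the list.
missing_agent_processes() {
    names=$(cat)
    for proc in AliYunDun AliYunDunUpdate; do
        # -x requires the whole line to match, so AliYunDunUpdate
        # is not mistaken for AliYunDun
        if ! printf '%s\n' "$names" | grep -qx "$proc"; then
            printf '%s\n' "$proc"
        fi
    done
}

# Usage on a live server:
#   ps -e -o comm= | missing_agent_processes
```

Empty output means both processes were found.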
Resolution: Manually restart the agent processes.
Linux
Run the following commands to restart the processes.
- Stop the related processes:
killall AliYunDun
killall AliYunDunUpdate
- Start the latest version of the agent. In the `/usr/local/aegis/aegis_client` directory, find the `aegis_10_xx` folders and select the one with the highest version number. For example, among `aegis_10_70`, `aegis_10_73`, and `aegis_10_75`, select `aegis_10_75`. Then run the AliYunDun binary in that folder:
/usr/local/aegis/aegis_client/aegis_10_xx/AliYunDun
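Picking the highest-numbered folder by eye is error-prone when many versions are present. The sketch below sorts the folder names numerically on their last component. It assumes the `aegis_10_` prefix shown above and that the last component is a plain integer; the function name `latest_agent_dir` is hypothetical.

```shell
# Print the name of the aegis_10_* folder with the highest numeric suffix.
# $1 is the client directory (defaults to the path given in this topic).
latest_agent_dir() {
    base=${1:-/usr/local/aegis/aegis_client}
    for d in "$base"/aegis_10_*; do
        [ -d "$d" ] || continue      # skip the literal pattern if nothing matched
        basename "$d"
    done | sort -t_ -k3,3n | tail -n 1   # numeric sort on the third _-separated field
}

# Usage: start the agent from the newest folder
#   "/usr/local/aegis/aegis_client/$(latest_agent_dir)/AliYunDun"
```

A numeric sort is used so that a single-digit suffix would not be ranked above a two-digit one, which a plain alphabetical sort would do.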
Windows
In the Services panel, restart the two Security Center services: Alibaba Security Aegis Detect Service and Alibaba Security Aegis Update Service. To do so, right-click each service and select Restart.

Network connection
Diagnostic steps: The agent goes offline if the server cannot connect to the Security Center service. Verify that your firewall or security group allows outbound traffic to the Security Center service IP addresses or domain names, such as jsrv.aegis.aliyun.com or update.aegis.aliyun.com.
For more information about the Security Center service IP addresses and domain names, see Appendix: Agent communication endpoints (domain names and IP addresses).
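Before changing firewall rules, it helps to confirm whether the endpoint names resolve at all, because a DNS failure looks the same as a blocked connection from the agent's point of view. The following sketch reports the resolution status of each name through the system resolver. The function name `check_dns` is hypothetical, and it assumes `getent` is available, as it is on most Linux distributions.

```shell
# Report whether each host name resolves through the system resolver.
check_dns() {
    for host in "$@"; do
        if getent hosts "$host" >/dev/null 2>&1; then
            printf 'OK   %s\n' "$host"
        else
            printf 'FAIL %s\n' "$host"
        fi
    done
}

# Usage with the two domain names mentioned in this topic:
#   check_dns jsrv.aegis.aliyun.com update.aegis.aliyun.com
```

A `FAIL` line points to a DNS problem on the server. An `OK` line only shows that the name resolves, so a firewall or security group can still be blocking the connection.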
Resolution:
- Verify that the DNS service on the server is running correctly. If the DNS service is not running, restart the server or troubleshoot the DNS service.
- Check if network access policies are configured on the server.
  - Firewall ACL rules
    Add the Security Center service IP addresses or domain names to your firewall's allowlist to permit network access. You only need to configure rules for outbound traffic.
    Note: If you use Alibaba Cloud Firewall, see Create an outbound access control policy for traffic from an internal network to the Internet for instructions.
    Example firewall configuration (iptables):
    # Allow access to the control service
    iptables -A OUTPUT -p tcp -d jsrv.aegis.aliyun.com --dport 443 -j ACCEPT
    iptables -A OUTPUT -p tcp -d jsrv.aegis.aliyun.com --dport 80 -j ACCEPT
    # Allow access to the update service
    iptables -A OUTPUT -p tcp -d update.aegis.aliyun.com --dport 443 -j ACCEPT
    iptables -A OUTPUT -p tcp -d update.aegis.aliyun.com --dport 80 -j ACCEPT
  - Alibaba Cloud security group rules
    If you use an ECS instance, see Manage security groups for specific steps.
    Note: Allow outbound traffic to the Security Center CIDR blocks. You can either leave the port unrestricted or allow traffic on ports 80 and 443.
    The following is an example configuration for the 100.100.0.0/16 CIDR block:
    - Direction: Outbound
    - Authorization policy: Allow
    - Protocol type: TCP
    - Port range: 80/443
    - Authorization object: 100.100.0.0/16
System resources
Diagnostic steps:
Verify that the server has sufficient resources. The agent may stop running if server resources are exhausted.
- CPU/Memory: Use `top` (Linux) or **Task Manager** (Windows) to check the usage.
- Disk space: Use `df -h` (Linux) or **This PC** (Windows) to check the remaining disk space.
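For the disk check on Linux, scanning `df` output by eye can be replaced with a small filter. The sketch below lists the file systems at or above a usage threshold. The function name `full_filesystems` and the 90% threshold in the usage line are hypothetical choices, not values from Security Center. It reads the POSIX format produced by `df -P` and assumes mount points contain no spaces.

```shell
# Read `df -P` output on stdin and print "mount-point usage" for every
# file system whose usage is at or above the threshold given in $1.
full_filesystems() {
    awk -v limit="$1" '
        NR > 1 {                      # skip the header row
            use = $5
            sub(/%/, "", use)         # "96%" -> "96"
            if (use + 0 >= limit) print $6, $5
        }'
}

# Usage:
#   df -P | full_filesystems 90
```

Empty output means no file system reached the threshold.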
Resolution:
- High resource usage
  - If the `AliYunDun` process is the cause, contact technical support and provide the relevant logs.
  - If other business processes are the cause, optimize your applications or consider upgrading the server configuration.
- Insufficient disk space: Delete unnecessary files to free up disk space.
Duplicate agent IDs
Diagnostic steps: This issue often occurs when you create multiple servers from the same system image. Check if the uuid field in the following configuration files is duplicated across multiple servers.
- Linux: /usr/local/aegis/aegis_client.conf
- Windows:
  - 32-bit: C:\Program Files\Alibaba\aegis\aegis_client.conf
  - 64-bit: C:\Program Files (x86)\Alibaba\aegis\aegis_client.conf
Resolution:
Before you create an image from a template server, uninstall and clean up the old agent on that server, and then obtain a new installation command to install the agent on each server created from the image.
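Once you have read the uuid value from the configuration file on each server, comparing them by hand gets tedious beyond a few hosts. The sketch below takes lines of the form `hostname uuid`, which you collect yourself, and prints each uuid that appears on more than one host. The function name `duplicate_uuids` and the input format are hypothetical, and the sketch makes no assumption about the layout of `aegis_client.conf`.

```shell
# Read "hostname uuid" pairs on stdin and print every uuid that is
# shared by more than one host, followed by the hosts that use it.
duplicate_uuids() {
    awk '
        { count[$2]++; hosts[$2] = hosts[$2] " " $1 }
        END {
            for (u in count)
                if (count[u] > 1) print u ":" hosts[u]
        }'
}

# Usage (uuids.txt is a hypothetical file you build by hand):
#   duplicate_uuids < uuids.txt
```

Empty output means every collected uuid is unique.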
Software conflicts
Diagnostic steps: Check if other Host-based Intrusion Detection System (HIDS), Endpoint Detection and Response (EDR), or antivirus software is installed on the server. Such software can conflict with the Security Center agent.
Resolution:
Disable or uninstall the third-party security software. After the Security Center agent is installed, you can restart or reinstall the original software as needed.
Agent logs
Diagnostic steps: Review the agent logs for specific error messages. The log files are located in the following directories:
- Linux: /usr/local/aegis/aegis_client/aegis_12_xx/data/
  Note: The `aegis_xx_xx` placeholder represents the version directory for the running agent. To find the exact path, check the output of the `ps -ef | grep AliYunDun` command.
- Windows: C:\Program Files (x86)\Alibaba\Aegis\aegis_client\aegis_12_xx\data\
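On Linux, the log directory can be derived from the path of the running binary rather than guessed. The sketch below assumes, as the paths in this topic suggest, that the `data` directory sits next to the `AliYunDun` executable inside the version folder. The function name `agent_data_dir` is hypothetical.

```shell
# Given the full path of the running AliYunDun executable, print the
# data directory that sits beside it in the same version folder.
agent_data_dir() {
    printf '%s/data\n' "$(dirname "$1")"
}

# Usage: resolve the executable of the running process, then list its logs
#   exe=$(readlink -f "/proc/$(pgrep -x AliYunDun | head -n 1)/exe")
#   ls -l "$(agent_data_dir "$exe")"
```

If `pgrep` finds no process, the agent is not running and the process checks described earlier apply first.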
Resolution:
Troubleshoot the issue based on the error messages in the logs. If you cannot resolve the issue, contact technical support and provide the complete log files.