The instance health diagnosis feature helps you identify the specific cause of an ECS instance startup failure. After the diagnosis, you can start the instance from an attached repair disk, log on, and fix the configurations in the original operating system that prevent the instance from starting. This topic describes how to use a repair disk and provides solutions for common ECS instance startup failures.
Scenarios
After you start or restart an ECS instance, its lifecycle state remains Starting or Running, and its health status remains Initializing. In this case, the operating system has failed to start, and you cannot connect to the ECS instance using SSH or RDP. You can only log on to the instance using a VNC connection to view the startup progress and check for error logs.
This issue can be caused by incorrect operating system configurations that prevent the instance from starting as expected. You can use the instance health diagnosis feature to diagnose the issue and then fix it based on the diagnostic results.
Prerequisites
The ECS instance must be in the Stopped state. For more information, see Stop an instance.
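As a rough pre-check, the Stopped state can also be confirmed from a terminal. The following is a minimal sketch, not an official procedure. It assumes that the Alibaba Cloud CLI (`aliyun`) is installed and configured, and that the `DescribeInstanceStatus` response carries a `"Status"` field for each instance. The function names, the region, and the instance ID are hypothetical placeholders.

```shell
#!/bin/sh
# Read a DescribeInstanceStatus JSON response on stdin and print the first
# "Status" value found. Assumption: the response carries one "Status" field
# per instance. This is a plain text match, not a full JSON parser.
first_status() {
    tr ',' '\n' \
        | sed -n 's/.*"Status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
        | sed -n '1p'
}

# Succeed only if the status read from stdin is "Stopped".
is_stopped() {
    [ "$(first_status)" = "Stopped" ]
}

# Hypothetical usage (region and instance ID are placeholders):
#   aliyun ecs DescribeInstanceStatus --RegionId cn-hangzhou \
#       --InstanceId.1 i-exampleinstance0000 | is_stopped && echo "ready to diagnose"
```

The parsing is kept separate from the CLI call so that it can be checked against a saved response before it is used against a real account.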
Procedure
Step 1: Diagnose connection or startup failures
When you fix an operating system startup issue, you may need to modify system configurations on the original system disk of the instance. To prevent risks, we recommend that you create a snapshot of the system disk before you run the diagnosis.
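If you prefer to create the snapshot from a terminal instead of the console, the recommendation above can be sketched as follows. The sketch assumes that the Alibaba Cloud CLI (`aliyun`) is installed and configured with the correct region, and that you already know the ID of the system disk. The function name, the snapshot name prefix, and the disk ID are hypothetical. The function only prints the command so that you can review it before you run it.

```shell
#!/bin/sh
# Print (without running) an Alibaba Cloud CLI command that snapshots a disk.
# Assumptions: `aliyun` is installed and configured with the right region,
# and the disk ID passed in is the system disk of the instance.
snapshot_cmd() {
    disk_id="$1"
    # Timestamped name so that repeated runs produce distinct snapshot names.
    name="pre-diagnosis-$(date +%Y%m%d-%H%M%S)"
    printf 'aliyun ecs CreateSnapshot --DiskId %s --SnapshotName %s\n' \
        "$disk_id" "$name"
}

# Hypothetical usage: review the printed command, then run it yourself.
#   snapshot_cmd d-exampledisk0000
```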
- In the upper-left corner of the page, select a region and resource group.
- On the Instance Troubleshooting tab, click Instance Connection Errors or Startup Exceptions.
- Select the Operating System Startup Failure on the Instance in the Stopped State problem, select the Instance ID of the instance to diagnose and a time range, and then click Start Troubleshooting.
  Note: The ECS instance to be diagnosed must be in the Stopped state. For more information, see Stop an instance.
After the diagnosis is complete, the instance's state changes to Running and Fixing.
Step 2: View the diagnostic results
Unlike in other health diagnosis scenarios, after an instance startup failure is diagnosed, a repair disk is attached to the diagnosed instance regardless of whether the result is Critical, Warning, or Passed. You can view information about the repair disk in the diagnostic report and fix the issues identified in the report. For information about how to view a diagnostic report, see Diagnostic items and results.
Repair disk information
The following applies when an instance starts from the repair disk operating system:
| Parameter | Description |
| --- | --- |
| Operating system | |
| Logon account | |
| Logon password | View the password in the diagnostic report. Important: Even if you change the password in the repair disk operating system, the password reverts to the system-provided password after you restart the repair disk operating system. You cannot use any credentials from the original operating system to log on to the instance. |
| Cloud disk read/write | |
| Create snapshot | |
Diagnostic results
The diagnostic report consists of two main sections: repair disk information, and diagnostic results and suggestions.
Repair disk information
- Logon information: Displays the username (root for Linux and Administrator for Windows) and password that ECS provides for logging on to the instance's repair system. Use these credentials to log on to the instance.
- VNC connection entry: Provides a shortcut to establish a VNC connection to the ECS instance. When a repair disk is attached to an ECS instance, you can log on to the instance to fix issues only by using this entry. You cannot directly log on to the instance by using other methods, such as Workbench or third-party remote connection tools.
- Detach repair disk entry: Provides an entry to detach the repair disk. Based on the health diagnosis results and repair progress, you can use this entry to detach the repair disk and restore the instance to its original operating system. Details are described as follows:
  - If the health diagnosis result is Passed, no issues are found. You can follow the steps in Step 4: Restore the instance and then use a VNC connection to further troubleshoot the ECS instance. If the issue persists, submit a ticket for technical support.
  - If the health diagnosis result is Critical or Warning, your ECS instance has an issue. You must use the VNC connection entry in the diagnostic report to connect to the ECS instance and then use the logon information for the repair disk to log on to the instance. After you fix the issue in the instance, detach the repair disk.
Exception details and repair suggestions
This section describes the specific misconfigurations of the instance operating system and provides repair suggestions. You can click the recommended repair documents to view specific issue descriptions and suggestions for how to fix the issues. For more information, see Step 3: Fix OS configurations.
Step 3: Fix OS configurations
- View the mount information of the original system disk of the problematic instance.
  - Linux systems: In the temporary repair disk environment, the file system of the instance's original system disk is mounted to a temporary directory. You can use one of the following methods to find the temporary directory:
    - On the details page of the system disk, view the directory on the Mounted on Instance tab. An example of the temporary directory format is /tmp/ecs-offline-diagnose_disk-uf67g4wwius3metl****, where uf67g4wwius3metl**** is the serial number of the original system disk.
    - In the temporary repair disk environment, run the mount command to view the temporary directory information. For example, if the device path of the original system disk is /dev/vda, run the following command:
      mount | grep /dev/vda
      The following output is returned:
      /dev/vda1 on /tmp/ecs-offline-diagnose_disk-uf67g4wwius3metl**** type ext4 (rw,relatime)
  - Windows systems: The attached repair disk is drive X. The drive letters of the original system disk and data disks remain unchanged.
- Fix the incorrect OS configurations.
  The following topics provide common solutions to instance startup failures. You can check the details of the diagnostic items in the diagnostic report to identify the cause of the startup failure and select the appropriate solution.
  - Linux
  - Windows
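On Linux, files of the original operating system are reached through the temporary directory, not through their usual paths. The following sketch shows one way to look up that directory from `mount` output and to translate an original path into its location on the mounted disk. The function names are hypothetical, `/dev/vda` is the example device path used earlier in this step, and `/etc/fstab` is only an illustrative file, not a statement about what causes a given failure.

```shell
#!/bin/sh
# Print the mount directory of the first entry whose device name starts with
# the given device path. Reads `mount`-style lines from stdin in the form:
#   <device> on <directory> type <fstype> (<options>)
# Assumption: the mount directory contains no spaces.
find_mount_dir() {
    awk -v dev="$1" 'index($1, dev) == 1 && $2 == "on" { print $3; exit }'
}

# Map a path from the original OS to its location under the mount directory.
path_on_original_disk() {
    printf '%s/%s\n' "${1%/}" "${2#/}"
}

# Hypothetical usage inside the repair disk environment:
#   dir=$(mount | find_mount_dir /dev/vda)
#   cat "$(path_on_original_disk "$dir" /etc/fstab)"
```

Editing `/etc/fstab` directly in the repair environment would change the repair disk operating system, not the original one, which is why the path translation matters.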
Step 4: Restore the instance
After you fix the startup issue, detach the repair disk to restore the instance to its original operating system.
After a repair disk is detached, it is not reattached automatically. To attach a repair disk again, start another diagnosis for an operating system startup failure.
Method 1: Restore from the diagnostic report
- In the upper-left corner of the page, select a region and resource group.
- On the Instance Troubleshooting tab, click View History.
- On the Instance Health Diagnosis tab, find the desired diagnostic report and click View Report in the Actions column.
- Click Detach Repair Disk.
- In the Are you sure you want to detach the repair disk? dialog box, read the notes and click Detach Immediately.
  Important: To detach the repair disk, you must first stop the instance. If the instance is not stopped, follow the on-screen instructions to stop it and then detach the repair disk.
- Start the instance and then remotely connect to it to make sure the connection is successful.
  The ECS instance is in the Stopped state after the repair disk is detached. You must start the instance before you can connect to it remotely.
Method 2: Restore from the Instances page
- Go to ECS console - Instances.
  In the upper-left corner of the page, select a region and resource group.
- Find the target instance and stop it.
  For more information, see Stop an instance.
- Hover over the Fixing state and click Detach Repair Disk.
  Alternatively, you can click the ID of the target instance, and then click Detach Repair Disk on the Instance Details page.
  A message explains that the instance will be stopped before the repair disk is detached. After detachment, the instance's original system disk becomes active, and the repair disk credentials become invalid. After you confirm this information, click Detach Repair Disk.
- In the Are you sure you want to detach the repair disk? dialog box, read the notes and click Detach Immediately.
  Note: Detaching a repair disk first stops the instance. Then, the original system disk is reactivated, and the repair disk and its credentials become invalid.
- Start the instance and then remotely connect to it to make sure the connection is successful.
  The ECS instance is in the Stopped state after the repair disk is detached. You must start the instance before you can connect to it remotely.