Troubleshoot common instance startup errors


This topic explains how to resolve startup failures caused by operating system issues, such as an invalid configuration or an unexpected shutdown. To identify the startup error, log in to the instance via VNC or check the error fields in the instance health diagnosis tool.

Windows

1662001135: Windows enters Windows Recovery Environment

Symptoms

A Windows ECS instance fails to start after a restart. When you connect to the instance using VNC, the startup screen displays the System Recovery Options.

Causes

The System Recovery Options screen appears when the Windows operating system enters the Windows Recovery Environment (WinRE), which occurs when an error prevents the operating system from starting. Possible causes include the following:

  • Registry corruption, cloud disk issues, driver issues, corrupted or missing system files, or a corrupted Boot Configuration Data (BCD) file.

  • User error, viruses, third-party antivirus software, or an unexpected forced restart.

Solution

To resolve this issue, follow the solution provided in the official Microsoft documentation. For more information, see How Windows RE Works.

To prevent this issue from recurring, follow these best practices:

  • Store important data on data disks.

  • Regularly create snapshots of the system disk and data disks to restore your data if an issue occurs.

  • Back up the registry before you modify it to prevent issues caused by incorrect changes.

  • Regularly run Windows Update to install the latest Microsoft security updates.

  • Enable Security Center or other commercial antivirus tools on your ECS instance to regularly scan for viruses and keep the tools updated.

1662001136: File system error on a Windows instance

Symptoms

When you log on to an instance by using VNC, the Windows startup screen displays one of the following error messages: Checking file system on, CHKDSK is verifying files, or CHKDSK is verifying indexes.

Causes

Possible causes include:

  • The ECS instance shut down unexpectedly.

  • The Windows operating system is corrupt.

Solutions

Solution 1: Restore the system disk from a snapshot

If a snapshot is available, you can use it to restore the system disk. To do this, follow these steps:

Warning

A cloud disk rollback is irreversible. You will lose any data added to the cloud disk after the snapshot was created. To prevent accidental data loss, we recommend creating a snapshot of the cloud disk before you roll back. For more information, see Create a snapshot for a disk.

  1. Go to ECS console - Snapshots.

  2. In the upper-left corner of the page, select a region and resource group.

  3. On the Disk Snapshot tab, find the target snapshot. In the Actions column, click Roll Back Disk.

  4. In the Roll Back Disk dialog, select the confirmation checkbox, and click OK.
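The same backup-then-rollback sequence can be sketched with the Alibaba Cloud CLI. This is a minimal sketch, not part of the console procedure above: it assumes the aliyun CLI is installed and configured, that the ECS CreateSnapshot and ResetDisk operations are available to your account, and that d-example and s-example stand in for your own disk and snapshot IDs. The helper names run and backup_and_rollback are hypothetical. The function defaults to a dry run that only prints the commands. Check the ECS API reference for preconditions, such as the required instance state and waiting for the backup snapshot to complete, before you run it for real.

```shell
#!/bin/sh
# Sketch: create a backup snapshot of a disk, then roll the disk back to an
# older snapshot. Assumes a configured aliyun CLI; all IDs are placeholders.

# DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: backup_and_rollback DISK_ID SNAPSHOT_ID
backup_and_rollback() {
    disk_id="$1"
    snapshot_id="$2"
    # Back up the current disk contents first, because a rollback is irreversible.
    run aliyun ecs CreateSnapshot --DiskId "$disk_id"
    # Roll the disk back to the chosen snapshot.
    run aliyun ecs ResetDisk --DiskId "$disk_id" --SnapshotId "$snapshot_id"
}
```

Calling backup_and_rollback d-example s-example with DRY_RUN left at 1 prints the two commands so that you can review them before setting DRY_RUN=0.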

Solution 2: Re-initialize the system disk

You can re-initialize the system disk to fix the system error. To do this, follow these steps:

Warning

Re-initializing a system disk clears all its data. We recommend creating a snapshot first. For more information, see Create a snapshot for a disk.

  1. Stop the ECS instance.

    1. Go to ECS console - Instances.

    2. In the upper-left corner of the page, select a region and resource group.

    3. Click the target instance to view its details. Click All Actions, and then click Stop.

  2. Re-initialize the system disk.

    1. On the instance details page, click All Actions, and then click Re-initialize Disk.

    2. In the Re-initialize Disk dialog box, configure the re-initialization parameters.

    3. Click Confirm.

      When the re-initialization completes, the instance starts automatically.

1662001137: Windows internal error

Symptoms

When you connect to the instance using VNC, the Windows startup screen displays the error message An internal error has occurred or The system cannot find the file specified.

Cause

This issue may occur if a corrupted Windows operating system prevents the instance from starting.

Solutions

Try restarting the instance to resolve this error. If the issue persists, you can roll back or reset the system disk.

Solution 1: Restart the instance

Try restarting the instance to resolve the system error. Follow these steps:

  1. Go to the Instances page in the ECS console. In the upper-left corner of the page, select the target region and resource group.

  2. Click the target instance ID to open the instance details page. In the upper-right corner of the page, click Restart.

  3. In the dialog box that appears, select a restart mode.

    • Clear Force Restart (default): The operating system attempts to gracefully shut down all processes before restarting.

    • Select Force Restart: This is equivalent to a power-off operation. Data in memory may be lost and the file system may be corrupted. Use this option only if the instance does not respond to a normal restart.

    Click OK.

Solution 2: Roll back the system disk

If a snapshot is available, you can use it to roll back the system disk. Follow these steps:

Warning

Rolling back a cloud disk is an irreversible operation. Any data added to the cloud disk after the snapshot was created will be lost. To prevent accidental data loss, we recommend creating a snapshot of the cloud disk to back up its data before the rollback. For more information, see Create a snapshot for a disk.

  1. Go to ECS console - Snapshots.

  2. In the upper-left corner of the page, select a region and resource group.

  3. On the Disk Snapshot tab, find the target snapshot. In the Actions column, click Roll Back Disk.

  4. In the Roll Back Disk dialog, select the confirmation checkbox, and click OK.

Solution 3: Reset the system disk

You can reset the system disk to resolve the system error. Follow these steps:

Warning

Resetting a system disk clears all data on the disk. We recommend creating a snapshot to back up your data before proceeding. For more information, see Create a snapshot for a disk.

  1. Stop the ECS instance.

    1. Go to ECS console - Instances.

    2. In the upper-left corner of the page, select a region and resource group.

    3. Click the target instance to view its details. Click All Actions, and then click Stop.

  2. Re-initialize the system disk.

    1. On the instance details page, click All Actions, and then click Re-initialize Disk.

    2. In the Re-initialize Disk dialog box, configure the re-initialization parameters.

    3. Click Confirm.

      When the re-initialization completes, the instance starts automatically.

1662001138: Windows BCD missing or corrupted

Symptoms

When you log on to an instance by using VNC, the Windows startup screen displays the error An error occurred while attempting to read the boot configuration data.

Windows Boot Manager

Windows failed to start. A recent hardware or software change might be the cause. To fix this problem:

File: \windows\system32\config\system
Status: 0xc000000f
Info: Windows failed to load because the system registry file is missing or corrupt.

Cause

This issue occurs when the Boot Configuration Data (BCD) file is missing or corrupted, preventing the system from starting.

Solution

If a snapshot is available, you can use it to restore the system disk. To do this, follow these steps:

Warning

Rolling back a cloud disk is an irreversible operation. Any data added to the cloud disk after the snapshot was created will be lost. To prevent accidental data loss, create a snapshot of the cloud disk before rolling it back. For more information, see Create a snapshot.

  1. Go to ECS console - Snapshots.

  2. In the upper-left corner of the page, select a region and resource group.

  3. On the Disk Snapshot tab, find the target snapshot. In the Actions column, click Roll Back Disk.

  4. In the Roll Back Disk dialog, select the confirmation checkbox, and click OK.

1662001139: Missing or corrupted Windows boot files

Symptoms

When connecting to the instance using VNC, the Windows startup screen displays a blue screen with the INACCESSIBLE BOOT DEVICE error message.

Your PC ran into a problem and needs to restart. We're just collecting some error info, and then we'll restart for you. (0% complete)

If you'd like to know more, you can search online later for this error: INACCESSIBLE BOOT DEVICE

Causes

This problem may be caused by one of the following:

  • The Windows boot sector is missing or corrupted.

  • A Windows driver file is missing or corrupted.

Solution

To restore the system disk using an available snapshot, follow these steps:

Warning

Rolling back a cloud disk is an irreversible operation. Data that is added to the cloud disk after the snapshot is created will be lost. To prevent data loss, create a new snapshot of the cloud disk before you roll it back. For more information, see Create a snapshot.

  1. Go to ECS console - Snapshots.

  2. In the upper-left corner of the page, select a region and resource group.

  3. On the Disk Snapshot tab, find the target snapshot. In the Actions column, click Roll Back Disk.

  4. In the Roll Back Disk dialog, select the confirmation checkbox, and click OK.

1662001140: Windows Bootmgr is missing or corrupted

Symptoms

When you connect to the instance using VNC, the Windows startup screen displays the BOOTMGR is missing error.

Causes

This issue occurs because the Windows Boot Manager (Bootmgr) is missing or corrupted, which prevents the system from starting.

Solution

If a snapshot is available, you can use it to restore the system disk. To do this, follow these steps:

Warning

Rolling back a cloud disk is an irreversible operation. You will lose any data added to the cloud disk after the snapshot was created. To prevent accidental data loss, create a snapshot of the cloud disk to back up its data before you roll it back. For more information, see Create a snapshot.

  1. Go to ECS console - Snapshots.

  2. In the upper-left corner of the page, select a region and resource group.

  3. On the Disk Snapshot tab, find the target snapshot. In the Actions column, click Roll Back Disk.

  4. In the Roll Back Disk dialog, select the confirmation checkbox, and click OK.

1662001141: Windows system file missing or corrupted

Symptoms

When you connect to the instance using VNC, Windows fails to start and the startup screen displays the Missing operating system error.

Cause

This issue can occur when a Windows system file is missing or corrupted, preventing the system from starting.

Solution

If a snapshot is available, you can use it to restore the system disk. To do this, follow these steps:

Warning

A disk rollback is an irreversible operation. Data that is added to the cloud disk after the snapshot is created will be lost. To prevent accidental data loss, we recommend creating a snapshot of the cloud disk to back up its data before you roll it back. For more information, see Create a snapshot for a disk.

  1. Go to ECS console - Snapshots.

  2. In the upper-left corner of the page, select a region and resource group.

  3. On the Disk Snapshot tab, find the target snapshot. In the Actions column, click Roll Back Disk.

  4. In the Roll Back Disk dialog, select the confirmation checkbox, and click OK.

1662001142: Windows registry is missing or corrupted

Symptoms

When you connect to the instance using VNC, Windows fails to start and displays the "Windows failed to start. A recent hardware or software change might be the cause." error message.

Windows Boot Manager

Windows failed to start. A recent hardware or software change might be the cause. To fix this problem:

File: \windows\system32\config\system
Status: 0xc000000f
Info: Windows failed to load because the system registry file is missing or corrupt.

Cause

This issue occurs because the Windows registry is missing or corrupted, which prevents the system from loading.

Solution

If a snapshot is available, you can use it to restore the system disk. Follow these steps:

Warning

Rolling back a cloud disk is an irreversible operation. Data that is added to the cloud disk after the snapshot is created will be lost. To prevent accidental data loss, create a snapshot of the cloud disk before you proceed. For more information, see Create a snapshot.

  1. Go to ECS console - Snapshots.

  2. In the upper-left corner of the page, select a region and resource group.

  3. On the Disk Snapshot tab, find the target snapshot. In the Actions column, click Roll Back Disk.

  4. In the Roll Back Disk dialog, select the confirmation checkbox, and click OK.

1662001151: Windows OS initialization failure due to interrupted Sysprep

Symptoms

An ECS instance restarts unexpectedly or encounters an error. When you connect to the instance using VNC, the Windows startup screen shows the Windows Could Not Complete The Installation error. The dialog box reads: Windows could not complete the installation. To install Windows on this computer, restart the installation.

Causes

This issue occurs because the Sysprep process was incomplete when the instance restarted.

Solution

Warning

Re-initializing a system disk erases all data on the disk. We recommend creating a snapshot to back up the data before you proceed. For more information, see Create a snapshot.

  1. Stop the ECS instance.

    1. Go to ECS console - Instances.

    2. In the upper-left corner of the page, select a region and resource group.

    3. Click the target instance to view its details. Click All Actions, and then click Stop.

  2. Re-initialize the system disk.

    1. On the instance details page, click All Actions, and then click Re-initialize Disk.

    2. In the Re-initialize Disk dialog box, configure the re-initialization parameters.

    3. Click Confirm.

      When the re-initialization completes, the instance starts automatically.

1671696280: Windows startup failure due to BCD or file system error

Symptoms

When you connect to the instance by using VNC, Windows fails to start. The startup screen displays the "Windows failed to start. A recent hardware or software change might be the cause." error message with the status 0xc0000001.

Windows Boot Manager

Windows failed to start. A recent hardware or software change might be the cause.

Status: 0xc0000001
Info: After multiple tries, the operating system on your PC failed to start.

Press ENTER to try again
Press F8 for Startup Settings

Causes

Incorrect boot configuration data (BCD) or a corrupted cloud disk file system prevents the system from loading.

Solution

If a snapshot is available, you can use it to restore the system disk.

Warning

Rolling back a cloud disk is an irreversible operation. Data that is added to the cloud disk after the snapshot is created will be lost. To prevent accidental data loss, we recommend that you create a snapshot of the cloud disk to back up its data before proceeding. For more information, see Create a snapshot.

  1. Go to ECS console - Snapshots.

  2. In the upper-left corner of the page, select a region and resource group.

  3. On the Disk Snapshot tab, find the target snapshot. In the Actions column, click Roll Back Disk.

  4. In the Roll Back Disk dialog, select the confirmation checkbox, and click OK.

1671696281: Windows instance fails to start

Symptoms

When you log in to an instance by using VNC, Windows fails to start and the startup screen is stuck on the Choose Your Keyboard Layout page, which lists the available input method options, such as Microsoft Pinyin and US Keyboard.

Causes

This issue occurs when a corrupted Windows system file or an incompatible driver prevents the system from starting.

Solution

If a snapshot is available, you can use it to restore the system disk. To do this, follow these steps:

Warning

Rolling back a cloud disk is an irreversible operation. Any data added to the cloud disk after the snapshot was created will be lost. To prevent data loss, we recommend creating a snapshot of the cloud disk before you roll it back. For more information, see Create a snapshot for a disk.

  1. Go to ECS console - Snapshots.

  2. In the upper-left corner of the page, select a region and resource group.

  3. On the Disk Snapshot tab, find the target snapshot. In the Actions column, click Roll Back Disk.

  4. In the Roll Back Disk dialog, select the confirmation checkbox, and click OK.

1671696282: Windows instance fails to start

Symptoms

When you connect to the instance using VNC, Windows fails to start and gets stuck on the Windows Error Recovery screen. This screen provides startup options such as Safe Mode, Safe Mode with Networking, Safe Mode with Command Prompt, Last Known Good Configuration, and Start Windows Normally.

Causes

This issue can be caused by a system disk failure or hardware changes.

Solution

You can restore the system disk from an available snapshot. Follow these steps:

Warning

Rolling back a cloud disk is an irreversible operation. Data added to the cloud disk after the snapshot was created will be lost. To prevent data loss, we recommend creating a snapshot of the cloud disk before you roll it back. For more information, see Manually create a single snapshot.

  1. Go to ECS console - Snapshots.

  2. In the upper-left corner of the page, select a region and resource group.

  3. On the Disk Snapshot tab, find the target snapshot. In the Actions column, click Roll Back Disk.

  4. In the Roll Back Disk dialog, select the confirmation checkbox, and click OK.

1671696284: Bootable device not found

Symptoms

When you connect to the instance by using VNC, the Windows startup screen displays the "An operating system wasn't found. Try disconnecting any drives that don't contain an operating system." error.

No operating system was found. Try disconnecting any drives that don't contain an operating system.
Press Ctrl+Alt+Del to restart

Causes

This issue occurs because the Windows system cannot find a bootable device.

Solution

If a snapshot is available, you can use it to roll back the system disk. To do this, follow these steps:

Warning

Rolling back a cloud disk is an irreversible operation. Data that is added to the cloud disk after the snapshot is created will be lost. To prevent accidental data loss, we recommend that you create a snapshot for the cloud disk to back up its data before you roll it back. For more information, see Create a snapshot for a disk.

  1. Go to ECS console - Snapshots.

  2. In the upper-left corner of the page, select a region and resource group.

  3. On the Disk Snapshot tab, find the target snapshot. In the Actions column, click Roll Back Disk.

  4. In the Roll Back Disk dialog, select the confirmation checkbox, and click OK.

1706506808: Boot disk not found

Symptoms

A Windows ECS instance fails to start and displays the "no bootable device" error.

Note

When the operating system fails to start, you can only access the instance by using VNC.

Booting from Hard Disk...
Boot failed: could not read the boot disk

No bootable device.

Causes

This issue can have multiple causes. You can use the instance health diagnostics feature to identify the specific cause based on the information it returns. For more information about using this feature, see Use the self-service troubleshooting feature to resolve instance issues.

Solution

You can select a solution based on the information returned by the instance health diagnostics feature. For more information, see What do I do if a "no bootable device" error occurs when I start a Windows instance?.

1706506809: Operating system crashes unexpectedly

Symptoms

A Windows ECS instance experiences issues such as a kernel panic, an out-of-memory (OOM) error, or a blue screen freeze during runtime.

Causes

An operating system crash causes the instance to go down. Possible causes include the unexpected termination of a core system service process, illegal memory access by the kernel or a driver, and a corrupted kernel data structure.

Solution

You can use a self-service diagnostics tool or check system events to identify the cause and resolve the issue. For more information, see Troubleshoot downtime issues for Windows instances.

1706506811: Stuck in boot mode selection

Symptoms

When you start a Windows instance, the operating system fails to load and enters Repair mode (Preparing Automatic Repair).

Preparing Automatic Repair
Loading files...

Causes

This issue can have multiple causes. You can use the instance health diagnostics feature to identify the specific cause based on the information it returns. For more information about using this feature, see Use the self-service troubleshooting feature to resolve instance issues.

Solution

You can select a solution based on the information returned by the instance health diagnostics feature. For more information, see What do I do if the operating system enters the "Preparing Automatic Repair" mode when I start a Windows instance?

1706506813: Corrupted registry

Symptoms

When you start a Windows instance, it displays the "Windows failed to start. A recent hardware or software change might be the cause." error.

Windows Boot Manager

Windows failed to start. A recent hardware or software change might be the cause.

File: \windows\system32\config\system
Status: 0xc000000f
Info: Windows failed to load because the system registry file is missing or corrupt.

Causes

A critical registry file on the instance is missing or corrupted.

Solution

You can enter Repair mode or use a repair instance to fix the corrupted registry file. For more information, see Repair a corrupted registry file.

1706506814: Critical file corruption

Symptoms

Windows Boot Manager

Windows failed to start. A recent hardware or software change might be the cause.

File: \Windows\system32\winload.exe
Status: 0xc0000017
Info: The boot selection failed because a required file is missing or contains errors.

ENTER=OS Selection  ESC=Recovery

Causes

This issue is caused by an exception in a Windows core component. Possible causes include:

  • An image built from a Public Preview version has expired.

  • System file corruption: A system file is deleted or its content is corrupted.

Solutions

1706506815: Cannot verify digital signature

Symptoms

A Windows instance fails to start and displays the "Status: 0xc0000428" error when you connect to it by using VNC.

Windows Boot Manager

Windows cannot verify the digital signature for this file.

File: \windows\system32\drivers\viostor.sys
Status: 0xc0000428

Causes

The Windows operating system cannot correctly verify the SHA256 digital signature used by the viostor.sys driver.

Solution

Temporarily disable Driver Signature Enforcement to boot into Windows, and then install the KB3033929 patch on the Windows instance. The patch adds support for the SHA256 algorithm so that the system can recognize the SHA256 signature used by the viostor.sys driver. For more information, see What do I do if a Windows instance fails to start and reports the 'Status: 0xc0000428' error when I log on by using VNC?

1706506816: Image boot mode mismatch

Symptoms

When you start a Windows ECS instance, it fails to boot. When you connect by using VNC, the instance displays the EFI Shell screen.

EFI Shell version 2.31 [2.70]
Current running mode 1.1.2

Device mapping table
  fs0  :HardDisk - Alias hd14b0b1 blk0
        PciRoot(0x0)/Pci(0x5,0x0)/VirtIO(0x0)/HD(...)
  blk0 :HardDisk - Alias hd14b0b1 fs0
        PciRoot(0x0)/Pci(0x5,0x0)/VirtIO(0x0)/HD(...)
  blk1 :HardDisk
        PciRoot(0x0)/Pci(0x5,0x0)/VirtIO(0x0)
  blk2 :HardDisk
        PciRoot(0x0)/Pci(0x8,0x0)/VirtIO(0x0)
  blk3 :HardDisk
        PciRoot(0x0)/Pci(0x8,0x0)/VirtIO(0x0)/HD(...)
  blk4 :BlockDevice
        PciRoot(0x0)/Pci(0x1,0x0)/Floppy(0x0)

Press ESC in 5 seconds to skip startup.nsh, any other key to continue.
Shell>

Causes

This screen indicates that the ECS instance failed to start in UEFI mode. Possible causes include:

  • The instance image does not support UEFI, but the instance's boot mode is set to UEFI. This typically occurs with ECS instances created from a custom image.

  • The instance image supports UEFI, but its UEFI firmware is corrupted.

Solution

You can resolve this issue by changing the instance's boot mode or by repairing the UEFI firmware. For more information, see What do I do if a Linux ECS instance fails to start and displays a "UEFI Interactive Shell" error?.

Linux

1662001143: GRUB boot failure on a Linux instance

Symptoms

When you connect to the ECS instance by using VNC, the operating system fails to start and an error message that contains grub> or grub rescue> appears.

SeaBIOS (version 8f19b21)
Machine UUID 00xxx...ee0c

iPXE (http://xxx.org) 00:03.0 C980 PCI2.10 PnP PMM+BF...

Booting from Hard Disk...
.
error: file '/boot/grub2/i386-pc/normal.mod' not found.
Entering rescue mode...
grub rescue> telnet 47.xxx.xxx.xxx 22
Unknown command 'telnet'.
grub rescue> ls
(hd0) (hd0,msdos1)
grub rescue> dir
Unknown command 'dir'.
grub rescue>

Causes

A GRUB boot failure prevents the operating system from starting.

This issue can occur for the following reasons:

  • A GRUB component file, such as /boot/grub2/grub.cfg, is missing.

  • The GRUB2 configuration is incorrect, or the GRUB2 bootloader on the disk is corrupted.

  • If the VNC console displays a message like /boot/grub2/i386-pc/normal.mod not found., a critical GRUB2 dependency module is missing.

  • If the VNC console displays a message like error: no such partition, GRUB2 cannot recognize the corresponding partition.

  • If the VNC console displays a message like error: unknown filesystem, GRUB2 cannot recognize the file system type of the partition.
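The mapping from console message to likely cause in the list above can be sketched as a small shell function. This is only an illustration: the function name classify_grub_error and the wording of the returned labels are hypothetical, and the patterns cover only the messages quoted in this topic.

```shell
#!/bin/sh
# Sketch: map one line of GRUB console output to the likely cause listed in
# this topic. The function name and the labels are illustrative only.
classify_grub_error() {
    case "$1" in
        *"normal.mod"*"not found"*)
            echo "missing GRUB2 dependency module" ;;
        *"no such partition"*)
            echo "partition not recognized by GRUB2" ;;
        *"unknown filesystem"*)
            echo "file system type not recognized by GRUB2" ;;
        *"grub rescue>"*|*"grub>"*)
            echo "GRUB boot failure, cause not identified" ;;
        *)
            echo "not a known GRUB error" ;;
    esac
}
```

For example, passing the line error: file '/boot/grub2/i386-pc/normal.mod' not found. selects the first branch, which points to reinstalling GRUB2 as described in Solution 1.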

Solutions

Solution 1: Repair the system disk

  1. Detach the system disk from the faulty instance and attach it as a data disk to a working instance.

    For more information, see Steps 1 and 2 in Best practices for data restoration on Linux instances.

    Note

    If you are still in the chroot environment on the working instance, run the exit command.

  2. Use one of the following methods to repair the system disk.

    • Method 1: If a critical GRUB2 dependency module is missing or the GRUB2 bootloader on the disk is corrupted, reinstall GRUB2 from a chroot environment:

      1. Use the disk serial number to find the device name of the attached system disk.

        For more information, see Query the serial number of a disk. In this example, /dev/vdb is used.

      2. Connect to the working instance, switch to the root user, and run the following commands in sequence to bind-mount the required directories and enter the mount directory of the system disk by using chroot. In this example, the mount directory is /mnt.

        mount -o bind /proc/ /mnt/proc/
        mount -o bind /sys/ /mnt/sys/
        mount -o bind /dev/ /mnt/dev/
        chroot /mnt

      3. Run the grub2-install /dev/vdb command to reinstall GRUB on the system disk.

      4. On the working instance, run the exit command to exit the chroot environment.

    • Method 2: In the ECS console, re-initialize the system disk. For more information, see Re-initialize the system disk.

      Warning

      Re-initializing a disk erases all its data. Before you proceed, create a snapshot to back up your data. For more information, see Create a snapshot.

  3. Re-attach the repaired system disk to the original instance.

    For more information, see Step 3 in Best practices for data restoration on Linux instances.

  4. Connect to the repaired instance by using SSH or VNC to verify that the instance is running correctly.

Solution 2: Restore the system disk from a snapshot

If Solution 1 fails, restore the system disk from a snapshot if you have one.

To restore the system disk, follow these steps:

Warning

A disk rollback is irreversible and erases all data added after the snapshot was created. To prevent accidental data loss, create a backup snapshot before you roll back the disk. For more information, see Create a snapshot.

  1. Go to ECS console - Snapshots.

  2. In the upper-left corner of the page, select a region and resource group.

  3. On the Disk Snapshot tab, find the target snapshot. In the Actions column, click Roll Back Disk.

  4. In the Roll Back Disk dialog, select the confirmation checkbox, and click OK.

Solution 3: Reset the system disk

If the system disk does not contain important data, you can reset it. To do so, follow these steps:

Warning

Resetting a system disk erases all its data. Before you proceed, create a snapshot to back up your data. For more information, see Create a snapshot.

  1. Stop the ECS instance.

    1. Go to ECS console - Instances.

    2. In the upper-left corner of the page, select a region and resource group.

    3. Click the target instance to view its details. Click All Actions, and then click Stop.

  2. Re-initialize the system disk.

    1. On the instance details page, click All Actions, and then click Re-initialize Disk.

    2. In the Re-initialize Disk dialog box, configure the re-initialization parameters.

    3. Click Confirm.

      When the re-initialization completes, the instance starts automatically.

1662001144: Incorrect root filesystem UUID in GRUB

Symptoms

When you connect to the instance using VNC, the console displays logs similar to the following:

Warning: /dev/disk/by-uuid/10xxx...527 does not exist:

dracut-initqueue[217]: Warning: dracut-initqueue timeout - starting timeout scripts
...
dracut-initqueue[217]: Warning: Could not boot.
dracut-initqueue[217]: Warning: /dev/disk/by-uuid/10c0e7e5-557a-40c1-893c-1e2dcbac1527 does not exist
Starting Dracut Emergency Shell...
Generating "/run/initramfs/rdsOSreport.txt"
Entering emergency mode. Exit the Shell to continue
Type "journalctl" to view system logs.
dracut:/#

Cause

This issue occurs because the GRUB2 configuration file (/boot/grub2/grub.cfg) identifies the root device by its UUID, but no device with that UUID exists.

For example, the configuration specifies the root device as root=UUID=10c0e7e5-557a-40c1-893c-1e2dcba*****, but no device with this UUID is attached to the ECS instance.

### BEGIN /etc/grub.d/10_linux ###
menuentry 'CentOS Linux (3.10.0-1160.66.1.el7.x86_64) 7 (Core)' ... {
        load_video
        set gfxpayload=keep
        insmod gzip
        insmod part_msdos
        insmod ext2
        set root='hd0,msdos1'
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root --hint='hd0,msdos1' 10cxxx...527
        else
          search --no-floppy --fs-uuid --set=root 10cxxx...527
        fi
        linux16 /boot/vmlinuz-3.10.0-1160.66.1.el7.x86_64 root=UUID=10cxxx...527 ro crashkernel=auto spectre_v2=retpoline rhgb quiet net.ifnames=0 console=ttyS0,115200n8 noibrs nvme_core.io_timeout=4294967295
        initrd16 /boot/initramfs-3.10.0-1160.66.1.el7.x86_64.img
}

Solution

  1. Detach the system disk from the faulty ECS instance and attach it as a data disk to a healthy ECS instance.

    For more information, see Steps 1 and 2 in Best practices for data restoration on Linux instances.

  2. On the healthy instance, use the cloud disk serial number to find the device name that the operating system assigned to the faulty system disk.

    For more information, see View block storage serial numbers. This example assumes the device name is /dev/vdb.

  3. Run the following command to query the file system UUID of /dev/vdb1.

    blkid /dev/vdb1

    The command returns the file system UUID.

    /dev/vdb1: UUID="10c0e7e5-557a-40c1-893c-1e2dcba*****" TYPE="ext4"

  4. In the GRUB2 configuration file (/boot/grub2/grub.cfg) on the faulty system disk, replace every occurrence of the incorrect UUID with the UUID returned in the previous step.

  5. Reattach the system disk to the faulty ECS instance.

    For more information, see Step 3 in Best practices for data restoration on Linux instances.

  6. Connect to the repaired ECS instance using SSH or VNC. If the boot error no longer appears, the issue is resolved.
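Step 4 can be sketched as the following shell helper. This is a minimal sketch under stated assumptions, not the documented procedure. The function name replace_grub_uuid is hypothetical, and the mount point /mnt/disk and the path boot/grub2/grub.cfg in the usage comment are assumptions about where the faulty system disk is mounted and where its GRUB configuration file is stored.

```shell
#!/bin/sh
# Hypothetical helper: replace a stale UUID with the correct one in a GRUB
# configuration file. A backup copy is kept next to the original as FILE.bak.
# Usage: replace_grub_uuid OLD_UUID NEW_UUID FILE
replace_grub_uuid() {
    old_uuid="$1"
    new_uuid="$2"
    cfg="$3"
    cp -p "$cfg" "$cfg.bak" || return 1
    # UUIDs contain only hexadecimal digits and hyphens, so they are safe to
    # use directly in a sed pattern.
    sed "s/$old_uuid/$new_uuid/g" "$cfg.bak" > "$cfg"
}

# Example (assumed paths; run on the healthy instance after mounting /dev/vdb1):
#   new_uuid=$(blkid -s UUID -o value /dev/vdb1)
#   replace_grub_uuid OLD_UUID "$new_uuid" /mnt/disk/boot/grub2/grub.cfg
```

The helper rewrites every occurrence of the old UUID, which covers both the search lines and the root=UUID= kernel parameter. If the result is wrong, restore the configuration from the .bak copy.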

1662001145: Linux instance hangs on shutdown

Symptoms

An ECS instance remains in the Stopping state. When you log on to the instance by using VNC, you see the following error message.

CentOS release 5.8 (Final)
Kernel 2.6.18-308.el5 on an x86_64
iZuf6isbofkgfnm5qp***** login: md: stopping all md devices.
System halted.

Cause

This issue occurs when the system becomes unresponsive during the shutdown process. A detailed analysis of the system logs is required to determine the specific root cause.

Solution

To resolve this issue, force restart the instance in the ECS console. Follow these steps:

Important

A forced restart can cause the loss of data cached in memory.

  1. Go to the Instances page in the ECS console. In the top-left corner of the page, select the region and resource group where the target instance is located.

  2. Click the ID of the target instance to open the instance details page. In the upper-right corner, click Restart.

  3. In the dialog box that appears, select a restart mode.

    Select Force Restart. This action is equivalent to a power cycle: data that is still in memory can be lost and the file system can be corrupted. Use this option only if the instance does not respond to a normal restart.

  4. Click OK.

1662001146: Nonexistent device in /etc/fstab

Symptoms

When you connect to the ECS instance using VNC, the console repeatedly displays the A start job is running error.

Booting from 0000:7c00
/: clean, 53966/2621440 files, 648440/10485499 blocks
A start job is running for dev-xvda1.device (5s / 1min 30s)
A start job is running for dev-xvda1.device (6s / 1min 30s)
A start job is running for dev-xvda1.device (7s / 1min 30s)
......

Cause

This issue may occur if the /etc/fstab file on the ECS instance contains an entry for a device that does not exist.

For example, if the /dev/xvda1 device configured in /etc/fstab does not exist, the system reports the A start job is running for dev-xvda1.device error.

#
# /etc/fstab
# Created by anaconda on Fri Jul  1 06:25:22 2022
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=2b890xxx...cd5f /                       ext4    defaults        1 1
/dev/xvda1       /    ext4    defaults        1 1

Solution

  1. Wait for the system check to complete.

    After the check completes, the following screen appears:

    [  OK  ] Started Network Name Resolution.
    [  OK  ] Reached target Host and Network Name Lookups.
    [  OK  ] Reached target Network.
    [  OK  ] Reached target Network is Online.
    [  OK  ] Started Tell Plymouth To Write Out Runtime Data.
    emergency.service
    console-setup.service
    setvtrgb.service
    systemd-tmpfiles-setup.service
    systemd-update-utmp.service
    systemd-resolved.service
    [  OK  ] Started AppArmor initialization.
    apparmor.service
    Give root password for maintenance
    (or press Control-D to continue):
  2. Enter the logon password for the ECS instance (not the VNC password) to access the operating system.

  3. Run the mount -a command to identify the erroneous line.

    The following output indicates an error on line 11 of the /etc/fstab file.

    [    1.686180] systemd[1]: [/run/systemd/generator/\x27-swapfile.mount:10] Not an absolute path, ignoring: '/swapfile'
    Give root password for maintenance
    (or press Control-D to continue):
    Login incorrect
    
    Give root password for maintenance
    (or press Control-D to continue):
    root@ixxx:~# mount -a
    mount: /etc/fstab: parse error: ignore entry at line 11.
    root@ixxx:~#
  4. Run vim /etc/fstab to edit the /etc/fstab file.

  5. Comment out (by adding a # symbol at the beginning of the line) or delete the incorrect line in the /etc/fstab file. Then, enter :wq and press Enter to save your changes and exit.

  6. Restart the ECS instance.

    For more information, see Restart an instance.

  7. If the error message no longer appears, the issue is resolved.
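To find the faulty entry before you edit the file, a check such as the following can help. This is a minimal sketch: the function name check_fstab_devices is hypothetical, and it only inspects entries that start with /dev/ or UUID=. Other entry types, such as LABEL= entries or network mounts, are skipped.

```shell
#!/bin/sh
# Hypothetical helper: report fstab entries whose device does not exist.
# Usage: check_fstab_devices [FILE]   (FILE defaults to /etc/fstab)
check_fstab_devices() {
    fstab="${1:-/etc/fstab}"
    lineno=0
    while read -r spec mnt rest || [ -n "$spec" ]; do
        lineno=$((lineno + 1))
        case "$spec" in
            ''|'#'*) continue ;;                                  # blank line or comment
            UUID=*)  dev="/dev/disk/by-uuid/${spec#UUID=}" ;;     # udev symlink for the UUID
            /dev/*)  dev="$spec" ;;
            *)       continue ;;                                  # not checked in this sketch
        esac
        [ -e "$dev" ] || echo "line $lineno: $spec (mount point $mnt) not found"
    done < "$fstab"
}

# Example:
#   check_fstab_devices /etc/fstab
```

Each reported line number points at an entry that you can comment out or delete in step 5.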

1662001147: Linux kernel panic

Symptoms

The operating system of the ECS instance fails to start. When you connect to the instance by using VNC, the console shows an error message such as Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block. The same message may also be recorded in logs such as /var/log/dmesg and /var/log/messages.

Cause

Various factors can cause a kernel panic, including kernel issues, application issues, or problems with other system components. Example error messages include:

  • Kernel panic - not syncing: Fatal exception in interrupt

  • Kernel panic - not syncing: Attempted to kill the idle task!

  • Kernel panic - not syncing: killing interrupt handler!

  • Kernel panic - not syncing: Attempted to kill init!

Solutions

Restart the ECS instance

To quickly restore your service, force restart the instance from the console:

Important

A force restart may cause the loss of cached data.

  1. Go to the Instances page in the ECS console. In the top navigation bar, select the region and resource group to which the instance belongs.

  2. Click the target instance's ID to open its details page. In the upper-right corner of the page, click Restart.

  3. In the dialog box that appears, select a restart mode.

    Select Force Restart. This action is like a hard power-off and risks data loss and file system corruption. Use this option only if the instance is unresponsive to a normal restart.

  4. Click OK.

Enable the Kdump service

A kernel panic is a system crash caused by a critical failure in the kernel or an application. To diagnose the root cause, you can enable the Kdump service to capture kernel dump files for analysis. For more information, see How to enable the Kdump service for Linux instances.

1662001148: A critical Linux system file is missing

Symptoms

An ECS instance fails to start. When you connect to the instance using VNC, the startup screen shows the Failed to execute /bin/sh, giving up: No such file or directory error message.

[  OK  ] Stopped dracut cmdline hook.
Stopping dracut cmdline hook...
[  OK  ] Stopped Create Static Device Nodes in /dev.
Stopping Create Static Device Nodes in /dev...
[  OK  ] Stopped Create list of required sta...ce nodes for the current kernel.
Stopping Create list of required st...nodes for the current kernel...
[  OK  ] Closed udev Control Socket.
[  OK  ] Closed udev Kernel Socket.
Starting Cleanup udevd DB.
[  OK  ] Started Cleanup udevd DB.
[  6.573859] systemd-journald[106]: Received SIGTERM from PID 1 (systemd).
[  OK  ] Reached target Switch Root.
[  OK  ] Started Plymouth switch root service.
Starting Switch Root...
[  6.583367] systemd[1]: No /sbin/init, trying fallback
[  6.584388] systemd[1]: Failed to execute /bin/sh, giving up: No such file or directory
[892.889761] random: crng init done

Cause

This issue may occur if the /bin/sh or /bin/bash file or its symbolic link is deleted from the ECS instance, which prevents the instance from starting.

Solutions

Solution 1: Repair the system disk by using a rescue instance

  1. Detach the system disk from the faulty ECS instance and attach it as a data disk to a running ECS instance.

    For more information, see Steps 1 and 2 in Best practices for data restoration on Linux instances.

  2. On the running ECS instance, copy its /bin/sh or /bin/bash file, whichever is missing, to the system disk of the faulty instance.

    For example, if the disk is mounted at /mnt/disk, run the following command to copy the /bin/sh file:

    cp /bin/sh /mnt/disk/bin/
  3. Reattach the system disk to the faulty ECS instance.

    For more information, see Step 3 in Best practices for data restoration on Linux instances.

  4. Start the instance and connect to it using SSH or VNC. If the instance starts successfully, the issue is resolved.
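Before and after the copy in step 2, you can check which shell files are present on the mounted system disk. The following is a minimal sketch: the function name check_shell_files is hypothetical, and the mount point /mnt/disk in the usage comment follows the example above.

```shell
#!/bin/sh
# Hypothetical helper: report whether bin/sh and bin/bash exist under a
# mounted root directory, and flag symbolic links whose target is missing.
# Note: a link with an absolute target is resolved against the rescue
# instance's own root, so treat results for such links with care.
# Usage: check_shell_files ROOT
check_shell_files() {
    root="${1%/}"
    for f in bin/sh bin/bash; do
        path="$root/$f"
        if [ -L "$path" ] && [ ! -e "$path" ]; then
            echo "$f: dangling symbolic link -> $(readlink "$path")"
        elif [ ! -e "$path" ]; then
            echo "$f: missing"
        else
            echo "$f: present"
        fi
    done
}

# Example:
#   check_shell_files /mnt/disk
```

On many distributions /bin/sh is a symbolic link to another shell. If the helper reports a dangling link, restore the link target rather than only the link.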

Solution 2: Restore the system disk from a snapshot

If Solution 1 does not resolve the issue, you can try this solution.

Warning

Rolling back a disk is an irreversible operation. All data added to the disk after the snapshot was created will be lost. To prevent data loss, create a new snapshot to back up the disk before you proceed. For more information, see Create a manual snapshot.

  1. Go to ECS console - Snapshots.

  2. In the upper-left corner of the page, select a region and resource group.

  3. On the Disk Snapshot tab, find the target snapshot. In the Actions column, click Roll Back Disk.

  4. In the Roll Back Disk dialog, select the confirmation checkbox, and click OK.

Solution 3: Reinitialize the system disk

If the system disk does not contain important data, you can reinitialize it. To do this, follow these steps:

Warning

Reinitializing a system disk erases all data on it. Before you proceed, create a snapshot to back up the data. For more information, see Create a manual snapshot.

  1. Stop the ECS instance.

    1. Go to ECS console - Instances.

    2. In the upper-left corner of the page, select a region and resource group.

    3. Click the target instance to view its details. Click All Actions, and then click Stop.

  2. Re-initialize the system disk.

    1. On the instance details page, click All Actions, and then click Re-initialize Disk.

    2. In the Re-initialize Disk dialog box, configure the re-initialization parameters.

    3. Click Confirm.

      When the re-initialization completes, the instance starts automatically.

1662001149: Fsck file system error on startup

Symptoms

The ECS instance hangs during startup. When you connect to the instance by using VNC, you see an error message similar to the following:

fsck from util-linux 2.20.1
fsck from util-linux 2.20.1
: clean, 193163/1310720 files, 2415199/5242368 blocks
/dev/vdb: Superblock last write time (Tue Nov 23 00:31:58 2021,
now=Wed Nov 10 18:28:55 2021) is in the future.
/dev/vdb: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)
mountall: fsck /data[294] terminated with status 4
mountall: File system has errors: /data
Errors were found while checking the disk drive for /data.
Press F to attempt to fix the errors, I to ignore, S to skip mounting, or M for manual recovery

Cause

The /dev/vdb file system for the /data mount point has an inconsistency. The fsck utility reports an error requiring you to manually confirm the repair.

Solution

Note

This solution provides a general approach. The specific repair steps depend on the error that the fsck utility reports.

  1. As prompted, press F to automatically repair the errors.

  2. If the repair fails, press S to skip mounting /data and continue starting the operating system.

  3. After the operating system starts, run the following command to repair the /dev/vdb file system for the /data mount point.

    fsck -y /dev/vdb
  4. After the repair is complete, run the following command to restart the instance.

    reboot
  5. Connect to the instance remotely. If the error no longer appears, the issue is resolved.
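The console output above reports that fsck terminated with status 4. The exit status of fsck is a sum of bit flags documented in the fsck(8) manual page, and the following sketch decodes it. The function name describe_fsck_status is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: decode an fsck exit status, which is a bitwise OR of
# the conditions listed in fsck(8).
# Usage: describe_fsck_status STATUS
describe_fsck_status() {
    status="$1"
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    result=""
    for entry in \
        "1:file system errors corrected" \
        "2:system should be rebooted" \
        "4:file system errors left uncorrected" \
        "8:operational error" \
        "16:usage or syntax error" \
        "32:checking canceled by user request" \
        "128:shared-library error"
    do
        bit=${entry%%:*}
        text=${entry#*:}
        # Append the description when the corresponding bit is set.
        if [ $(( $status & $bit )) -ne 0 ]; then
            result="${result:+$result; }$text"
        fi
    done
    echo "$result"
}

# Example:
#   fsck -y /dev/vdb; describe_fsck_status $?
```

A status of 4 means that errors were found but not repaired, which is why the startup screen asks you to run fsck manually.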

1662001150: Missing device in /etc/fstab or fsck error

Symptoms

An ECS instance fails to start and enters emergency mode. You are prompted for the root password with the message Give root password for maintenance, as shown in the following output:

Booting from Hard Disk...
Booting from 0000:7c00
[ 2.845229] EXT4-fs (vdb1): Unrecognized mount option "default" or missing value
Welcome to emergency mode! After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to reboot, "systemctl default" or ^D to
try again to boot into default mode.
Give root password for maintenance
(or press Control-D to continue):

Causes

This issue may occur for the following reasons:

  • A device specified for a mount point in the /etc/fstab file on the ECS instance does not exist.

  • A file system check using fsck is configured to run at startup, and fsck detects an error in the file system that requires manual repair.

Solutions

Select a solution based on the root cause of the issue.

Solution 1: Fix configuration errors

  1. Enter the root password to access the system.

  2. Run the following command to remount the root partition with read-write permissions.

    mount / -o remount,rw
  3. Run the following command to view the detailed error message.

    journalctl -xb

    The output [ 2.845229] EXT4-fs (vdb1): Unrecognized mount option "default" or missing value indicates that the mount option for /dev/vdb1 in the /etc/fstab file is incorrectly set to default instead of defaults.

  4. In the /etc/fstab file, change the mount option for /dev/vdb1 to defaults.

    1. Run the following command to open the /etc/fstab file.

      vim /etc/fstab
    2. Press the i key to enter insert mode.

    3. In the entry for /dev/vdb1, change default to defaults. The following line shows the incorrect entry before the change:

      /dev/vdb1       /      ext4   default  1  1
    4. Press the Esc key, enter :wq, and then press Enter to save the changes and exit.

  5. Restart the ECS instance.

    For more information, see Restart an instance.

  6. Remotely connect to the instance. If the error no longer occurs, the issue is resolved.
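As an alternative to editing the file by hand in step 4, the correction can be previewed with a small filter. This is a minimal sketch: the function name fix_default_option is hypothetical, it prints the corrected content to standard output instead of changing the file, and it collapses the column spacing of the data lines it processes.

```shell
#!/bin/sh
# Hypothetical helper: print an fstab file with the invalid mount option
# "default" replaced by "defaults". The input file is not modified.
# Usage: fix_default_option FILE
fix_default_option() {
    awk '
        # Keep comments and lines with fewer than four fields unchanged.
        $1 ~ /^#/ || NF < 4 { print; next }
        {
            # The fourth field is a comma-separated list of mount options.
            n = split($4, opts, ",")
            fixed = ""
            for (i = 1; i <= n; i++) {
                if (opts[i] == "default") opts[i] = "defaults"
                fixed = fixed (i > 1 ? "," : "") opts[i]
            }
            $4 = fixed      # rebuilds the line with single spaces
            print
        }
    ' "$1"
}

# Example:
#   fix_default_option /etc/fstab
```

Review the printed result before you apply the same change to /etc/fstab.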

Solution 2: Repair file system errors

After you enter the root password and access the system, address the errors reported by the fsck check. The appropriate solution depends on the specific error messages.

Troubleshoot an initrd switch root failure

Symptoms

A Linux system fails to start. When you log on to the ECS instance using VNC, an error message similar to Failed to start Switch Root appears.

[  OK  ] Stopped target Sockets.
[  OK  ] Stopped target Paths.
[  OK  ] Stopped target Slices.
[  OK  ] Stopped target Swap.
...
         Starting Switch Root...
[FAILED] Failed to start Switch Root.
See 'systemctl status initrd-switch-root.service' for details.

Cause

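The switch root process can fail during system startup for several reasons. One cause, which Solution 1 addresses, is that the /etc/os-release file, or the /usr/lib/os-release file that it links to, is missing from the system disk.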

Solutions

Solution 1: Create the missing file

  1. Detach the system disk from the problematic ECS instance and attach the cloud disk as a data disk to a helper ECS instance.

    For more information, see Steps 1 and 2 in Best practices for data recovery of a Linux instance.

    Note

    We recommend using a helper instance that runs the same Linux distribution as the problematic instance. This simplifies restoring the /etc/os-release file.

  2. On the helper ECS instance, the required operations depend on whether the /etc/os-release file exists on the problematic system disk.

    Note

    In this example, the file system of the problematic system disk is mounted to the /mnt directory.

    Scenario: The /etc/os-release file does not exist.

    Procedure:

    The required steps depend on whether the helper and problematic ECS instances run the same Linux distribution.

    • If the instances run the same Linux distribution, run the cp /etc/os-release /mnt/etc/os-release command to copy the /etc/os-release file from the helper instance to the system disk of the problematic instance.

    • If the instances run different Linux distributions, run the vi /mnt/etc/os-release command to manually create the /etc/os-release file. Populate it with the content of the /etc/os-release file from a normal instance that runs the same Linux distribution as the problematic instance.

    Note

    The /etc/os-release file on a helper instance might be a symbolic link. Choose one of the following methods to resolve this issue:

    • Perform the preceding operations to directly restore the /etc/os-release file that contains the actual content.

      systemd can correctly identify the OS as long as the /etc/os-release file or the /usr/lib/os-release file exists.

    • Refer to the following steps to first restore the content file that /etc/os-release points to, and then restore the /etc/os-release symbolic link to ensure the integrity and consistency of the system files.

    Scenario: The /etc/os-release file exists.

    Procedure:

    1. Run the ls -hal /mnt/etc/os-release command to inspect the file on the problematic system disk. If the output is similar to the following, the /etc/os-release file is a symbolic link to the /usr/lib/os-release file.

      lrwxrwxrwx 1 root root 19 Dec 20 15:13 os-release -> /usr/lib/os-release
    2. Perform the required operations based on whether the /usr/lib/os-release file is missing.

      • The /usr/lib/os-release file does not exist.

        • If the instances run the same Linux distribution, run the cp /usr/lib/os-release /mnt/usr/lib/os-release command to copy the /usr/lib/os-release file from the helper instance to the problematic ECS instance.

        • If the instances run different Linux distributions, run the vi /mnt/usr/lib/os-release command to manually create the /usr/lib/os-release file. Its content must match the /usr/lib/os-release file on a normal instance that runs the same Linux distribution as the problematic instance.

      • If the /usr/lib/os-release file exists, roll back the cloud disk from a snapshot. For more information, see Roll back a cloud disk.

  3. Reattach the system disk to the problematic ECS instance.

    For more information, see Step 3 in Best practices for data recovery of a Linux instance.

  4. Log on to the ECS instance using SSH or VNC to verify that it is restored.
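To decide which scenario in step 2 applies, you can inspect both files on the mounted disk. The following is a minimal sketch: the function name check_os_release is hypothetical, and the /mnt mount point in the usage comment follows the example in step 2.

```shell
#!/bin/sh
# Hypothetical helper: report the state of etc/os-release and
# usr/lib/os-release under a mounted root directory.
# Usage: check_os_release ROOT
check_os_release() {
    root="${1%/}"
    etc_file="$root/etc/os-release"
    lib_file="$root/usr/lib/os-release"
    if [ -L "$etc_file" ]; then
        echo "etc/os-release: symbolic link to $(readlink "$etc_file")"
    elif [ -f "$etc_file" ]; then
        echo "etc/os-release: regular file"
    else
        echo "etc/os-release: missing"
    fi
    if [ -f "$lib_file" ]; then
        echo "usr/lib/os-release: present"
    else
        echo "usr/lib/os-release: missing"
    fi
}

# Example:
#   check_os_release /mnt
```

If the first line reports a symbolic link and the second line reports a missing file, restore /usr/lib/os-release first so that the link resolves again.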

Solution 2: Restore the system disk from a snapshot

  1. Go to ECS console - Snapshots.

  2. In the upper-left corner of the page, select a region and resource group.

  3. On the Disk Snapshot tab, find the target snapshot. In the Actions column, click Roll Back Disk.

  4. In the Roll Back Disk dialog, select the confirmation checkbox, and click OK.

Out-of-memory (OOM) error on a Linux instance

Symptoms

When you use a Linux instance, you may encounter issues such as program crashes, abnormal process I/O, or a slow or unresponsive ECS instance. The system log /var/log/messages contains a large number of out-of-memory (OOM) messages, as shown below:

Out of memory: Kill process 19476 (httpd) score 7 or sacrifice child
Killed process 19497 (httpd) total-vm:558024kB, anon-rss:201248kB, file-rss:732kB, shmem-rss:12kB
Out of memory: Kill process 19477 (httpd) score 7 or sacrifice child
...
INFO: task http:28062 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task http:28073 blocked for more than 120 seconds.
...

Causes

The out-of-memory (OOM) killer is a mechanism in the Linux kernel that forcibly terminates processes to free up memory when the system is critically low on memory.

Numerous OOM messages indicate that the system has insufficient memory. As a result, the system cannot allocate enough memory for applications or processes, causing them to fail or experience data read/write errors.

Solution

OOM errors can be triggered by various factors, such as insufficient system resources, a memory leak, improper system configurations, or inefficient memory allocation. Follow these steps to troubleshoot the issue.

  1. Remotely log on to the ECS instance.

    For details, see Log on to a Linux instance by using Workbench.

  2. Run the following command to check the system log for the processes, timestamps, and frequency of OOM events.

    cat /var/log/messages
  3. Review the system load of the Linux instance and the application logs at the time the OOM errors occurred.

    • Check the system load of the Linux instance.

    • Review the application logs to identify the cause of the OOM error.

      Use memory profiling tools such as Valgrind, JProfiler, or jmap to analyze the memory usage of your applications.

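  4. Analyze the cause of the OOM error based on the system load and application logs, and then take the action that matches the cause, such as adding memory to the instance, fixing a memory leak in the application, or correcting the system configuration.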

  5. Continue to monitor the ECS instance. If OOM errors no longer occur, the issue is resolved.
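To see which processes the kernel terminated most often, the log review in step 2 can be summarized with a short filter. This is a minimal sketch: the function name count_oom_kills is hypothetical, and it assumes that each kill is logged on a line containing Killed process, followed by the process ID and the process name in parentheses, as in the sample output above.

```shell
#!/bin/sh
# Hypothetical helper: count OOM kills per process name in a kernel log file.
# Matches lines such as: "Killed process 19497 (httpd) total-vm:558024kB, ..."
# Usage: count_oom_kills [FILE]   (FILE defaults to /var/log/messages)
count_oom_kills() {
    awk '
        match($0, /Killed process [0-9]+ \([^)]*\)/) {
            name = substr($0, RSTART, RLENGTH)
            sub(/^[^(]*\(/, "", name)   # drop everything up to the opening parenthesis
            sub(/\)$/, "", name)        # drop the closing parenthesis
            kills[name]++
        }
        END { for (name in kills) print name ": " kills[name] }
    ' "${1:-/var/log/messages}" | sort
}

# Example:
#   count_oom_kills /var/log/messages
```

A process that appears repeatedly is a reasonable first candidate for the memory analysis in step 3.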