Troubleshoot common pod status issues and solutions - Container Compute Service (ACS) - Alibaba Cloud Help Center

In this topic

Category	Content
Diagnostic process	Diagnostic process
Common troubleshooting methods	Check the pod status; Check the pod details; Check the pod configuration; Check pod events; Check pod logs; Check pod monitoring data; Connect to a container by using the terminal; Diagnose pod failures
Common issues and solutions	Common abnormal pod states and solutions; Troubleshoot pod OOM issues

Diagnostic process

[Figure: Diagnostic process]

1. Check for abnormal pod states. For more information, see Check the pod status.
  - If a pod is in an abnormal state, check its events, logs, and configuration to identify the cause. For more information, see Common troubleshooting methods. For information about abnormal pod states and how to handle them, see Common abnormal pod states and solutions.
  - If the pod is in the Running state but does not work as expected, see A pod is in the Running state but does not work as expected.
2. If you confirm a pod OOM issue, see Troubleshoot pod OOM issues.
3. If the issue persists, submit a ticket.

Common abnormal pod states and solutions

Pod status	Description	Solution
Pending	The pod has not been scheduled.	A pod is in the Pending state
Init:N/M	The pod has M init containers, and N have completed so far.	A pod is in the Init:N/M, Init:Error, or Init:CrashLoopBackOff state
Init:Error	An init container failed to start.	A pod is in the Init:N/M, Init:Error, or Init:CrashLoopBackOff state
Init:CrashLoopBackOff	An init container failed to start and is restarting repeatedly.	A pod is in the Init:N/M, Init:Error, or Init:CrashLoopBackOff state
Completed	The pod has finished executing its startup command.	A pod is in the Completed state
CrashLoopBackOff	The pod failed to start and is restarting repeatedly.	A pod is in the CrashLoopBackOff state
ImagePullBackOff	The pod failed to pull the image.	A pod is in the ImagePullBackOff state
Running	The pod is running normally.	No action is required.
Running	The pod is in the Running state but does not work as expected.	A pod is in the Running state but does not work as expected
Terminating	The pod is being terminated.	A pod is in the Terminating state

Common troubleshooting methods

Check the pod status

Log on to the ACS console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Workloads > Pods.
In the upper-left corner of the Pods page, select the pod's Namespace and check its status.
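- If the status is Running, the pod has started normally. If it still does not behave as expected, see A pod is in the Running state but does not work as expected.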
- If the status is not Running, the pod is in an abnormal state. For solutions, see Common abnormal pod states and solutions.
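The same check can be sketched on the command line, assuming kubectl is configured for the cluster. The helper name list_abnormal_pods and the namespace demo are hypothetical. The helper filters on the STATUS column that kubectl get pods prints.

```shell
# Hypothetical helper: print "NAME STATUS" for every pod in a namespace
# whose STATUS column is neither Running nor Completed.
# $1 = namespace (defaults to "default")
list_abnormal_pods() {
  kubectl get pods -n "${1:-default}" --no-headers |
    awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}
```

For example, list_abnormal_pods demo prints one line per pod that needs attention and prints nothing if all pods are Running or Completed.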

Check the pod details

Log on to the ACS console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Workloads > Pods.
In the upper-left corner of the Pods page, select the pod's Namespace. Then, click the target pod's name or click Details in the Actions column to view its details, such as name, image, and IP address.

Check the pod configuration

Log on to the ACS console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Workloads > Pods.
In the upper-left corner of the Pods page, select the pod's Namespace. Then, click the target pod's name or click Details in the Actions column.
On the pod details page, click Edit in the upper-right corner to view the pod's YAML file and detailed configuration.

Check pod events

Log on to the ACS console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Workloads > Pods.
In the upper-left corner of the Pods page, select the pod's Namespace. Then, click the target pod's name or click Details in the Actions column.
On the pod details page, click the Events tab.

Note
By default, Kubernetes retains events from the last hour. To store events for a longer period, see Create and use a Kubernetes event center.
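As a command-line alternative, the events of a single pod can be listed with kubectl. This is a minimal sketch. The helper name pod_events is hypothetical, and the command only returns events that the cluster still retains.

```shell
# Hypothetical helper: list events that reference one pod, oldest first.
# $1 = pod name, $2 = namespace (defaults to "default")
pod_events() {
  kubectl get events -n "${2:-default}" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}
```

For example, pod_events web-1 demo shows scheduling, image pull, and probe events for the pod web-1.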

Check pod logs

Log on to the ACS console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Workloads > Pods.
In the upper-left corner of the Pods page, select the pod's Namespace. Then, click the target pod's name or click Details in the Actions column.
On the pod details page, click the Logs tab.

Note
Alibaba Cloud Container Service for Kubernetes (ACK) clusters are integrated with Simple Log Service. Enable Simple Log Service when creating a cluster to collect container logs, including standard output and text files within containers. For more information, see Configure application log collection by using pod environment variables.
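Standard output logs can also be read with kubectl. The following sketch uses hypothetical helper names. The 100-line tail is an arbitrary choice, not a recommended value.

```shell
# Hypothetical helpers for reading container logs.
# recent_logs:   $1 = pod name, $2 = namespace (defaults to "default")
# previous_logs: $1 = pod name, $2 = namespace, $3 = container name
recent_logs() {
  # Last 100 lines from every container in the pod.
  kubectl logs "$1" -n "${2:-default}" --all-containers --tail=100
}

previous_logs() {
  # Logs of the previous instance of a container, useful after a restart.
  kubectl logs "$1" -n "${2:-default}" -c "$3" --previous --tail=100
}
```

The --previous flag only returns output if the container has restarted at least once.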

Check pod monitoring data

Log on to the ACS console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Workloads > Prometheus Monitoring.
On the Prometheus Monitoring page, click the Cluster Overview tab to view monitoring dashboards for pod CPU, memory, and network I/O.
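For a quick point-in-time reading without the dashboards, kubectl top can report current CPU and memory usage. This sketch assumes that the cluster exposes the Kubernetes metrics API, which this topic does not confirm. The helper name pod_usage is hypothetical.

```shell
# Hypothetical helper: show current CPU and memory usage per container.
# Assumes the metrics API is available in the cluster.
# $1 = pod name, $2 = namespace (defaults to "default")
pod_usage() {
  kubectl top pod "$1" -n "${2:-default}" --containers
}
```

Unlike the Prometheus dashboards, this output has no history, so it cannot show a growth trend.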

Connect to a container by using the terminal

Log on to the ACS console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Workloads > Pods.
On the Pods page, find the target pod and click Terminal in the Actions column.

Note
Use the terminal to inspect local files and other information inside the container.
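A command can also be run inside a container with kubectl exec. The following is a sketch with a hypothetical helper name, pod_exec. It assumes the command exists in the container image.

```shell
# Hypothetical helper: run a one-off command in the pod's default container.
# $1 = pod name, $2 = namespace, remaining arguments = command to run
pod_exec() {
  pe_pod="$1"
  pe_ns="$2"
  shift 2
  kubectl exec "$pe_pod" -n "$pe_ns" -- "$@"
}
```

For example, pod_exec web-1 demo ls /etc lists a directory inside the container. For an interactive shell, kubectl exec -it web-1 -n demo -- sh is the usual form, assuming the image ships a shell.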

Diagnose pod failures

Log on to the ACS console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Workloads > Pods.
In the upper-left corner of the Pods page, select the pod's Namespace. Then, find the target pod and click Diagnose in the Actions column.

Note
Run diagnostics on the pod and resolve the issue based on the results. For more information, see Use cluster diagnostics.

A pod is in the Pending state

Cause

A pod in the Pending state cannot be scheduled, typically due to resource dependencies or misconfigured quotas.

Symptom

The pod status is Pending.

Solution

Examine the pod events to identify why the pod cannot be scheduled. The main causes include:

- Resource dependencies: A pod may depend on other cluster resources such as ConfigMaps or persistent volume claims (PVCs). For example, a persistent volume claim must be bound to a persistent volume before it can be used by a pod.
- Misconfigured quotas: Check the events and audit logs.
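Both causes can be checked from the command line. The following sketch uses hypothetical helper names. It lists PVCs that are not yet bound and prints the resource quotas defined in the namespace.

```shell
# Hypothetical helpers for a Pending pod.
# $1 = namespace (defaults to "default")
unbound_pvcs() {
  # Print "NAME STATUS" for every PVC whose STATUS column is not Bound.
  kubectl get pvc -n "${1:-default}" --no-headers |
    awk '$2 != "Bound" { print $1, $2 }'
}

show_quotas() {
  # Print used and hard limits for each ResourceQuota in the namespace.
  kubectl describe resourcequota -n "${1:-default}"
}
```

If unbound_pvcs prints a PVC that the pod mounts, resolve the binding first and then check the pod events again.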

A pod is in the Init:N/M, Init:Error, or Init:CrashLoopBackOff state

Cause

If a pod is stuck in the Init:N/M state, it has M init containers, of which only N have completed successfully. The remaining M-N init containers have not completed.
If a pod is in the Init:Error state, an init container in the pod failed to start.
If a pod is in the Init:CrashLoopBackOff state, an init container failed to start and is restarting repeatedly.

Symptom

The pod status is Init:N/M.
The pod status is Init:Error.
The pod status is Init:CrashLoopBackOff.

Solution

Check pod events to see if there are any issues with the init containers that have not started. For more information, see Check pod events.
Check the logs of the init containers that have not started to troubleshoot the issue. For more information, see Check pod logs.
Check the pod configuration to verify that the configuration of the init containers that have not started is correct. For more information, see Check the pod configuration. For more information about init containers, see Debug Init Containers.
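The steps above can be sketched with kubectl as follows. The helper names are hypothetical. The init container name passed to init_container_logs is whatever name the first helper reports.

```shell
# Hypothetical helpers for init container problems.
# $1 = pod name, $2 = namespace (defaults to "default")
init_container_states() {
  # Print name, ready flag, and restart count for each init container.
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{"\t"}{.ready}{"\t"}{.restartCount}{"\n"}{end}'
}

# $3 = init container name
init_container_logs() {
  kubectl logs "$1" -n "${2:-default}" -c "$3"
}
```

An init container that reports a growing restart count is usually the one to inspect with init_container_logs.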

A pod is in the ImagePullBackOff state

Cause

A pod in the ImagePullBackOff state was scheduled but failed to pull its container image.

Symptom

The pod status is ImagePullBackOff.

Solution

Check the pod event description to find the name of the image that failed to pull.

Verify that the container image name is correct.
If you are using a private image repository, see Use an image from an image repository to create an ACK workload for the solution.
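To verify the image name and the pull credentials that the pod references, a sketch such as the following can help. The helper names are hypothetical.

```shell
# Hypothetical helpers for image pull problems.
# $1 = pod name, $2 = namespace (defaults to "default")
pod_images() {
  # Print each container name with the image it is configured to pull.
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'
}

pod_pull_secrets() {
  # Print the names of the image pull secrets attached to the pod, if any.
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{.spec.imagePullSecrets[*].name}'
}
```

Compare the output of pod_images with the repository address and tag that you expect. Empty output from pod_pull_secrets for a private repository suggests that no pull secret is configured on the pod.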

A pod is in the CrashLoopBackOff state

Cause

The CrashLoopBackOff state indicates an application problem within the container.

Symptom

The pod status is CrashLoopBackOff.

Solution

Check the pod events to confirm if the pod has an issue. For more information, see Check pod events.
Check the pod logs to troubleshoot the issue. For more information, see Check pod logs.
Check the pod configuration to verify that the container's health check configuration is correct. For more information, see Check the pod configuration. For more information about pod health checks, see Configure Liveness, Readiness, and Startup Probes.
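As a complement to events and logs, the last termination record of each container often shows why it keeps restarting. The following is a minimal sketch with a hypothetical helper name.

```shell
# Hypothetical helper: print name, restart count, last termination reason,
# and last exit code for each container in a pod.
# $1 = pod name, $2 = namespace (defaults to "default")
last_termination() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}
```

The reason and exit code columns are empty for containers that have never been terminated.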

A pod is in the Completed state

Cause

A pod enters the Completed state after all its container processes exit.

Symptom

The pod status is Completed.

Solution

Check the pod configuration to identify the startup command for the container in the pod. For more information, see Check the pod configuration.
Check the pod logs to troubleshoot the issue. For more information, see Check pod logs.
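The startup command can also be read from the pod spec with kubectl. This is a sketch. The helper name pod_commands is hypothetical.

```shell
# Hypothetical helper: print the command and args configured for each container.
# Empty columns mean the container runs the default command from its image.
# $1 = pod name, $2 = namespace (defaults to "default")
pod_commands() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.command}{"\t"}{.args}{"\n"}{end}'
}
```

If the command is expected to exit, as in a one-off job, the Completed state is normal. If the container is meant to run continuously, the command or the image's default command likely exits early.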

A pod is in the Running state but does not work as expected

Cause

The YAML file used for deployment contains an error.

Symptom

The pod is in the Running state but does not work as expected.

Solution

Check the pod configuration to determine if the container configuration meets your expectations. For more information, see Check the pod configuration.
Use the following methods to check the YAML file for misspelled keys.

The following example shows how to identify the error if command is misspelled as commnd.

Note
When you create a pod, the cluster may ignore misspelled keys in the YAML file. For example, if you misspell command as commnd, you can still use the YAML file to create the resource. However, at runtime, the container ignores the misspelled key and runs the default command from the image instead.
1. Add the --validate flag when you apply the file: run kubectl apply --validate -f XXX.yaml instead of kubectl apply -f XXX.yaml.
  
  If command is misspelled as commnd, an error message similar to the following is returned: XXX] unknown field: commnd XXX] this may be a false alarm, see https://gXXXb.XXX/6842pods/test.
2. Run the following command and compare the output pod.yaml file with the original file that you used to create the pod.
```
  kubectl get pods [$Pod] -o yaml > pod.yaml
```
  Note
  [$Pod] is the name of the abnormal pod. You can run the kubectl get pods command to view the pod name.
  - If the pod.yaml file has a few more lines than the file you used to create the pod, it means the created pod matches your expectations.
  - If a line of code from your original file is missing in the pod.yaml file, it indicates a spelling error in your original file.
Check the pod logs to troubleshoot the issue. For more information, see Check pod logs.
Access the container through the terminal to check if the local files inside are as expected. For more information, see Connect to a container by using the terminal.
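The file comparison described above can be partly automated. The following is a rough heuristic sketch, not a YAML parser. The helper name missing_keys is hypothetical. It only looks at simple key names made of letters, digits, underscores, dots, and hyphens, and reports keys that appear in the original file but nowhere in the exported pod.yaml file.

```shell
# Hypothetical helper: print keys that exist in the original manifest
# but are absent from the exported pod.yaml file.
# $1 = original manifest, $2 = exported pod.yaml
missing_keys() {
  # Extract the key name in front of the first colon on each line.
  sed -n 's/^[[:space:]-]*\([A-Za-z][A-Za-z0-9_.-]*\):.*$/\1/p' "$1" | sort -u |
    while IFS= read -r mk_key; do
      # Report the key if no line in the exported file starts with it.
      grep -q "^[[:space:]-]*$mk_key:" "$2" || printf '%s\n' "$mk_key"
    done
}
```

For a manifest that contains commnd instead of command, missing_keys original.yaml pod.yaml prints commnd. Review the output manually, because keys that contain other characters, such as a slash, are skipped.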

A pod is in the Terminating state

Cause

The pod is being shut down.

Symptom

The pod status is Terminating.

Solution

A pod in the Terminating state is automatically deleted after a grace period. If it remains stuck, run the following command to forcefully delete it:

kubectl delete pod [$Pod] -n [$namespace] --grace-period=0 --force

Troubleshoot pod OOM issues

Cause

When a container exceeds its memory limit, the system terminates it with an OOM (Out of Memory) event, causing an unexpected exit. For more information about OOM events, see Assign Memory Resources to Containers and Pods.

Symptom

If the terminated process is a blocking process, the container may restart unexpectedly.
If an OOM issue occurs, the Events tab on the pod details page in the console displays the OOM event: pod was OOM killed.

Solution

Review the memory growth curve in the pod monitoring data to determine when the issue occurred. For more information, see Check pod monitoring data.
Based on the monitoring data, memory growth timeline, logs, and process names, check if the corresponding process has a memory leak.
- If the OOM is caused by a process memory leak, troubleshoot the root cause based on your application.
- If the process is running normally, increase the pod's memory limit based on your workload needs. We recommend that a pod's actual memory usage not exceed 80% of its memory limit. For more information, see Manage pods.
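The following sketch shows one way to confirm an OOM termination and to check the 80% recommendation. The helper names are hypothetical. The usage and limit values are assumed to be whole numbers in the same unit, for example MiB read from the monitoring data.

```shell
# Hypothetical helper: print the names of containers whose last
# termination reason is OOMKilled.
# $1 = pod name, $2 = namespace (defaults to "default")
oom_killed_containers() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\n"}{end}' |
    awk -F'\t' '$2 == "OOMKilled" { print $1 }'
}

# Hypothetical helper: succeed if usage is at most 80% of the limit.
# $1 = memory usage, $2 = memory limit (integers in the same unit)
memory_within_target() {
  [ $(( $1 * 100 )) -le $(( $2 * 80 )) ]
}
```

For example, memory_within_target 400 512 succeeds because 400 is below 80% of 512 (409.6), whereas memory_within_target 450 512 fails and suggests raising the limit or reducing usage.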
