Diagnose and resolve common pod issues, including abnormal states, image pull failures, and OOM errors.
In this topic
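| Category | Content |
| --- | --- |
| Diagnostic process | Diagnostic process; Common abnormal pod states and solutions |
| Common troubleshooting methods | Check the pod status; Check the pod details; Check the pod configuration; Check pod events; Check pod logs; Check pod monitoring data; Connect to a container via terminal; Diagnose pod failures |
| Common issues and solutions | A pod is in the Pending state; A pod is in the Init:N/M, Init:Error, or Init:CrashLoopBackOff state; A pod is in the ImagePullBackOff state; A pod is in the CrashLoopBackOff state; A pod is in the Completed state; A pod is in the Running state but does not work as expected; A pod is in the Terminating state; Troubleshoot pod OOM issues |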
Diagnostic process

- Check for abnormal pod states. For more information, see Check the pod status.
- If a pod is in an abnormal state, check its events, logs, and configuration to identify the cause. For more information, see Common troubleshooting methods. For information about abnormal pod states and how to handle them, see Common abnormal pod states and solutions.
- If the pod is in the Running state but does not work as expected, see A pod is in the Running state but does not work as expected.
- If you confirm a pod OOM issue, see Troubleshoot pod OOM issues.
- If the issue persists, submit a ticket.
Common abnormal pod states and solutions
| Pod status | Description | Solution |
| --- | --- | --- |
| Pending | The pod has not been scheduled. | A pod is in the Pending state |
| Init:N/M | The pod has M init containers, and N have completed successfully. | A pod is in the Init:N/M, Init:Error, or Init:CrashLoopBackOff state |
| Init:Error | An init container failed to start. | A pod is in the Init:N/M, Init:Error, or Init:CrashLoopBackOff state |
| Init:CrashLoopBackOff | An init container failed to start and is restarting repeatedly. | A pod is in the Init:N/M, Init:Error, or Init:CrashLoopBackOff state |
| Completed | The pod has finished executing its startup command. | A pod is in the Completed state |
| CrashLoopBackOff | The pod failed to start and is restarting repeatedly. | A pod is in the CrashLoopBackOff state |
| ImagePullBackOff | The pod failed to pull the image. | A pod is in the ImagePullBackOff state |
| Running | The pod is in the Running state but does not work as expected. | A pod is in the Running state but does not work as expected |
| Terminating | The pod is being terminated. | A pod is in the Terminating state |
Common troubleshooting methods
Check the pod status
- Log on to the ACS console. In the left navigation pane, click Clusters.
- On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Workloads > Pods.
- In the upper-left corner of the Pods page, select the pod's Namespace and check its status.
  - If the status is Running, the pod is working as expected.
  - If the status is not Running, the pod is in an abnormal state. For solutions, see Common abnormal pod states and solutions.
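The same check can be sketched from the command line, assuming kubectl is already configured for the cluster. The helper name `list_abnormal_pods` is hypothetical and not part of any tool. It relies on the STATUS value being the third column of `kubectl get pods --no-headers`.

```shell
# Hypothetical helper: print the name and status of every pod in a namespace
# whose STATUS column is not "Running".
list_abnormal_pods() {
  kubectl get pods -n "$1" --no-headers | awk '$3 != "Running" { print $1, $3 }'
}
```

For example, `list_abnormal_pods default` prints one line per pod that is not in the Running state. Pods in the Completed state are also listed, so interpret the output together with the status table above.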
Check the pod details
- Log on to the ACS console. In the left navigation pane, click Clusters.
- On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Workloads > Pods.
- In the upper-left corner of the Pods page, select the pod's Namespace. Then, click the target pod's name or click Details in the Actions column to view its details, such as name, image, and IP address.
Check the pod configuration
- Log on to the ACS console. In the left navigation pane, click Clusters.
- On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Workloads > Pods.
- In the upper-left corner of the Pods page, select the pod's Namespace. Then, click the target pod's name or click Details in the Actions column.
- On the pod details page, click Edit in the upper-right corner to view the pod's YAML file and detailed configuration.
Check pod events
- Log on to the ACS console. In the left navigation pane, click Clusters.
- On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Workloads > Pods.
- In the upper-left corner of the Pods page, select the pod's Namespace. Then, click the target pod's name or click Details in the Actions column.
- In the upper-right corner of the pod details page, click Edit to view the pod's YAML file and detailed configuration.
- On the pod details page, click the Events tab.

Note: By default, Kubernetes retains events from the last hour. To store events for a longer period, see Create and use a Kubernetes event center.
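As a command-line alternative to the Events tab, the following sketch lists the events recorded for one pod, oldest first. It assumes kubectl access to the cluster. The function name `pod_events` is hypothetical.

```shell
# Hypothetical helper: list the events whose involved object is the given pod,
# sorted by the time they were last observed.
pod_events() {
  ns="$1"; pod="$2"
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=$pod" \
    --sort-by=.lastTimestamp
}
```

Call it as `pod_events <namespace> <pod-name>`. Only events that are still within the retention window are returned.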
Check pod logs
- Log on to the ACS console. In the left navigation pane, click Clusters.
- On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Workloads > Pods.
- In the upper-left corner of the Pods page, select the pod's Namespace. Then, click the target pod's name or click Details in the Actions column.
- On the pod details page, click the Logs tab.

Note: Alibaba Cloud Container Service for Kubernetes (ACK) clusters are integrated with Simple Log Service. Enable Simple Log Service when creating a cluster to collect container logs, including standard output and text files within containers. For more information, see Configure application log collection by using pod environment variables.
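If you prefer the command line, a minimal sketch such as the following fetches recent log lines with kubectl. The wrapper name `pod_logs` and the 100-line tail are illustrative choices, not values from this topic.

```shell
# Hypothetical helper: show the last 100 log lines of a pod.
# Extra arguments are passed through to "kubectl logs", for example
# "-c <container>" to pick a container or "--previous" to read the logs
# of the previous, crashed instance of a container.
pod_logs() {
  ns="$1"; pod="$2"; shift 2
  kubectl logs "$pod" -n "$ns" --tail=100 "$@"
}
```

For a pod that keeps restarting, `pod_logs <namespace> <pod-name> --previous` is often the more useful call, because the current container instance may not have written anything yet.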
Check pod monitoring data
- Log on to the ACS console. In the left navigation pane, click Clusters.
- On the Clusters page, click the name of the target cluster. In the left navigation pane, open the Prometheus Monitoring page.
- On the Prometheus Monitoring page, click the Cluster Overview tab to view monitoring dashboards for pod CPU, memory, and network I/O.
Connect to a container via terminal
- Log on to the ACS console. In the left navigation pane, click Clusters.
- On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Workloads > Pods.
- On the Pods page, find the target pod and click Terminal in the Actions column.

Note: Use the terminal to inspect local files and other information inside the container.
Diagnose pod failures
- Log on to the ACS console. In the left navigation pane, click Clusters.
- On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Workloads > Pods.
- In the upper-left corner of the Pods page, select the pod's Namespace. Then, click the target pod's name or click Details in the Actions column.
- On the Pods page, find the target pod and click Diagnose in the Actions column.

Note: Run diagnostics on the pod and resolve the issue based on the results. For more information, see Use cluster diagnostics.
A pod is in the Pending state
Cause
A pod in the Pending state cannot be scheduled, typically due to resource dependencies or misconfigured quotas.
Symptom
The pod status is Pending.
Solution
Examine the pod events to identify why the pod cannot be scheduled. The main causes include:
- Resource dependencies
  A pod may depend on other cluster resources such as ConfigMaps or persistent volume claims (PVCs). For example, a persistent volume claim must be bound to a persistent volume before it can be used by a pod.
- Misconfigured quotas
  Check the events and audit logs.
A pod is in the Init:N/M, Init:Error, or Init:CrashLoopBackOff state
Cause
- If a pod is stuck in the Init:N/M state, it has M init containers, but only N have completed successfully. The remaining M-N init containers have not completed.
- If a pod is in the Init:Error state, an init container in the pod failed to start.
- If a pod is in the Init:CrashLoopBackOff state, an init container failed to start and is restarting repeatedly.
Symptom
- The pod status is Init:N/M.
- The pod status is Init:Error.
- The pod status is Init:CrashLoopBackOff.
Solution
- Check pod events to see if there are any issues with the init containers that have not started. For more information, see Check pod events.
- Check the logs of the init containers that have not started to troubleshoot the issue. For more information, see Check pod logs.
- Check the pod configuration to verify that the configuration of the init containers that have not started is correct. For more information, see Check the pod configuration. For more information about init containers, see Debug Init Containers.
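The log check for init containers can be sketched with kubectl as follows, assuming kubectl access to the cluster. The function name `init_container_logs` and the 50-line tail are hypothetical choices. The sketch reads the init container names from the pod spec and prints the recent logs of each one.

```shell
# Hypothetical helper: print the last 50 log lines of every init container
# declared in the pod spec, with a header line per container.
init_container_logs() {
  ns="$1"; pod="$2"
  for c in $(kubectl get pod "$pod" -n "$ns" \
      -o jsonpath='{.spec.initContainers[*].name}'); do
    echo "== init container: $c =="
    kubectl logs "$pod" -n "$ns" -c "$c" --tail=50
  done
}
```

Init containers that have not started yet have no logs, so kubectl reports an error for them. In that case, rely on the pod events instead.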
A pod is in the ImagePullBackOff state
Cause
A pod in the ImagePullBackOff state was scheduled but failed to pull its container image.
Symptom
The pod status is ImagePullBackOff.
Solution
Check the pod event description to find the name of the image that failed to pull.
- Verify that the container image name is correct.
- If you are using a private image repository, see Use an image from an image repository to create an ACK workload for the solution.
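To double-check the image names that the pod actually references, a sketch like the following prints them one per line. It assumes kubectl access to the cluster, and the helper name `pod_images` is hypothetical. It covers only the regular containers of the pod, not init containers.

```shell
# Hypothetical helper: print the image of each regular container in a pod,
# one image per line.
pod_images() {
  kubectl get pod "$2" -n "$1" -o jsonpath='{.spec.containers[*].image}' | tr ' ' '\n'
}
```

Compare the output of `pod_images <namespace> <pod-name>` with the repository address, image name, and tag that you expect.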
A pod is in the CrashLoopBackOff state
Cause
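A pod in the CrashLoopBackOff state has a container that repeatedly exits after it starts, so the kubelet keeps restarting it with an increasing back-off delay. This usually indicates a problem with the application in the container or with its health check configuration.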
Symptom
The pod status is CrashLoopBackOff.
Solution
- Check the pod events to confirm if the pod has an issue. For more information, see Check pod events.
- Check the pod logs to troubleshoot the issue. For more information, see Check pod logs.
- Check the pod configuration to verify that the container's health check configuration is correct. For more information, see Check the pod configuration. For more information about pod health checks, see Configure Liveness, Readiness, and Startup Probes.
A pod is in the Completed state
Cause
A pod enters the Completed state after all its container processes exit.
Symptom
The pod status is Completed.
Solution
- Check the pod configuration to identify the startup command for the container in the pod. For more information, see Check the pod configuration.
- Check the pod logs to troubleshoot the issue. For more information, see Check pod logs.
A pod is in the Running state but does not work as expected
Cause
The YAML file used for deployment contains an error.
Symptom
The pod is in the Running state but does not work as expected.
Solution
- Check the pod configuration to determine if the container configuration meets your expectations. For more information, see Check the pod configuration.
- Use the following methods to check for spelling errors in the keys of the YAML file, such as the keys of environment variables.
  The following example shows how to identify a spelling error if command is misspelled as commnd.
  Note: When you create a pod, the cluster ignores spelling errors in keys. For example, if you misspell command as commnd, you can still use the YAML file to create the resource. However, at runtime, the container ignores the misspelled command and runs the default command from the image instead.
  - Add the --validate option when you run the kubectl apply -f command:
    kubectl apply --validate -f XXX.yaml
    If you misspell command as commnd, you will see an error message similar to the following:
    XXX] unknown field: commnd XXX] this may be a false alarm, see https://gXXXb.XXX/6842pods/test.
  - Run the following command and compare the output pod.yaml file with the original file that you used to create the pod.
    kubectl get pods [$Pod] -o yaml > pod.yaml
    Note: [$Pod] is the name of the abnormal pod. You can run the kubectl get pods command to view the pod name.
    - If the pod.yaml file has a few more lines than the file you used to create the pod, the created pod matches your expectations.
    - If a line from your original file is missing in the pod.yaml file, your original file contains a spelling error.
- Check the pod logs to troubleshoot the issue. For more information, see Check pod logs.
- Access the container through the terminal to check if the local files inside are as expected. For more information, see Connect to a container via terminal.
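The manual comparison between your original file and pod.yaml can be approximated with a small script. The following is a rough sketch, not an exact YAML diff. The helper name `missing_keys` is hypothetical. It only compares key names line by line and ignores nesting, so treat a reported key as a hint to inspect and not as proof of an error.

```shell
# Hypothetical helper: print every key that appears in the original manifest ($1)
# but nowhere in the YAML exported from the cluster ($2).
missing_keys() {
  # Extract "key" from lines such as "  key: value" or "  - key: value".
  sed -n 's|^[[:space:]]*\(-[[:space:]]*\)\{0,1\}\([A-Za-z][A-Za-z0-9_./-]*\):.*$|\2|p' "$1" |
    sort -u |
    while read -r key; do
      # Report the key if no line in the exported YAML starts with it.
      grep -Eq "^[[:space:]]*(-[[:space:]]*)?${key}:" "$2" || echo "$key"
    done
}
```

For example, after exporting the pod with `kubectl get pods [$Pod] -o yaml > pod.yaml`, running `missing_keys XXX.yaml pod.yaml` would report a misspelled key such as commnd, because the cluster drops it from the stored pod.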
A pod is in the Terminating state
Cause
The pod is being shut down.
Symptom
The pod status is Terminating.
Solution
A pod in the Terminating state is automatically deleted after a grace period. If it remains stuck, run the following command to forcefully delete it:
kubectl delete pod [$Pod] -n [$namespace] --grace-period=0 --force
Troubleshoot pod OOM issues
Cause
When a container exceeds its memory limit, the system terminates it with an OOM (Out of Memory) event, causing an unexpected exit. For more information about OOM events, see Assign Memory Resources to Containers and Pods.
Symptom
- If the terminated process is a blocking process, the container may restart unexpectedly.
- If an OOM issue occurs, the Events tab on the pod details page in the console displays the OOM event: pod was OOM killed.
Solution
- Review the memory growth curve in the pod monitoring data to determine when the issue occurred. For more information, see Check pod monitoring data.
- Based on the monitoring data, memory growth timeline, logs, and process names, check if the corresponding process has a memory leak.
  - If the OOM is caused by a process memory leak, troubleshoot the root cause based on your application.
  - If the process is running normally, increase the pod's memory limit based on your workload needs. We recommend that a pod's actual memory usage not exceed 80% of its memory limit. For more information, see Manage pods.
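The two checks above can be sketched on the command line, assuming kubectl access to the cluster. Both helper names are hypothetical. The first reads the reason for the last container termination, which Kubernetes reports as OOMKilled when the container was killed for exceeding its memory limit. The second applies the 80% recommendation to an observed peak usage in MiB and returns the smallest limit that satisfies it.

```shell
# Hypothetical helper: print the reason recorded for the last termination
# of each container in the pod (for example, OOMKilled).
last_termination_reason() {
  kubectl get pod "$2" -n "$1" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
}

# Hypothetical helper: given peak memory usage in MiB, print the smallest
# limit in MiB for which usage stays at or below 80% of the limit.
# limit = ceil(usage / 0.8) = ceil(usage * 5 / 4), using integer arithmetic.
recommended_limit_mi() {
  echo $(( ($1 * 5 + 3) / 4 ))
}
```

For example, `recommended_limit_mi 800` prints 1000, which means a container that peaks at 800 MiB needs a memory limit of at least 1000 MiB to stay within the recommendation.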