This topic answers frequently asked questions about nodes and node pools, such as how the maximum number of pods per node is determined, how to replace the OS image of a node pool, and how to troubleshoot timeouts when you add nodes.
To diagnose and troubleshoot node issues, see Troubleshoot abnormal nodes.
Using spot instances in a node pool
To use spot instances, create a new node pool or use the spot-instance-advisor command. For more information, see Best practices for spot instance node pools.
To maintain consistency within a node pool, you cannot convert a spot instance node pool to a pay-as-you-go or subscription node pool, or vice versa.
Multiple ECS instance types per node pool
Yes, a node pool can contain multiple ECS instance types. We recommend this configuration because it prevents scale-out failures caused by instance type unavailability or inventory shortages. To do this, configure multiple vSwitches across multiple availability zones, select multiple ECS instance types, or specify instance types based on vCPU and memory. After the node pool is created, you can view its scalability level in the console and add instance types based on the scalability level recommendations.
For a list of supported instance types and node configuration recommendations, see ECS instance type configuration recommendations.
Maximum number of pods per node
The calculation for the maximum number of Pods per node depends on the cluster's network plugin. For more information, see Maximum number of Pods per node.
- Terway: The maximum number of Pods per node is the sum of Pods on the container network and on the host network.
- Flannel: The limit is the Number of Pods per Node value specified during cluster creation.
You can view the maximum number of Pods per node, also known as the Pod Quota, on the Nodes page of the console.
You cannot change the maximum number of Pods per node after creating a cluster. If a node reaches this limit, scale out your nodes to increase Pod capacity. For more information, see Adjust available Pods on a node.
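From the command line, the pod limit that each node reports to Kubernetes can be read from its allocatable resources. The following is a minimal sketch, assuming kubectl is already configured for the cluster. The helper name show_pod_quota is illustrative and not part of ACK, and the reported value should correspond to the Pod Quota shown in the console.

```shell
# Print each node's name and the number of pods it can host, as reported
# by the Kubernetes API in .status.allocatable.pods.
# show_pod_quota is a hypothetical helper name used only for this sketch.
show_pod_quota() {
    kubectl get nodes \
        -o custom-columns='NODE:.metadata.name,MAX_PODS:.status.allocatable.pods'
}

# Usage (requires access to the cluster):
#   show_pod_quota
```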
Adjust node pod capacity
When your Container Service for Kubernetes (ACK) cluster runs out of pod capacity, you need to increase the number of pods that the cluster can run. The maximum number of pods per node depends on the network plug-in and the Elastic Compute Service (ECS) instance type. This limit is fixed by the instance type in Terway mode and by cluster creation settings in Flannel mode, and cannot be adjusted in most cases.
For details on how pod limits are calculated for each network plug-in and how to increase pod capacity, see Increase the maximum number of pods in a cluster.
Modify node configuration
- To ensure service stability, certain parameters, specifically those related to availability and networking, are immutable after a node pool is created. For example, you cannot change the container runtime or the VPC to which a node belongs.
- For mutable parameters, changes to the node pool configuration apply only to new nodes. Existing nodes are not affected unless you use specific options such as Update ECS Tags of Existing Nodes or Update Labels and Taints of Existing Nodes.
See Create and manage node pools for details on modifiable parameters and when changes take effect.
Alternatively, to apply a new configuration, you can create a new node pool with the desired configuration. Then, cordon and drain the nodes in the old node pool to migrate your workloads. After the migration is complete, you can release the instances in the old node pool. For instructions, see Cordon and drain nodes.
Can I disable Expected Nodes?
No. If the Scaling Mode of a node pool is set to Manual, you must configure Expected Nodes, and the feature cannot be disabled.
To remove a specific node, see Remove a node. To add a specific node, see Add an existing node. After either operation, Expected Nodes is automatically updated to the new node count, so you do not need to change it manually.
Node pools with and without Expected Nodes
The Expected Nodes parameter defines the intended capacity of a node pool. You can scale a node pool in or out by adjusting this value. However, some legacy node pools may not have this feature enabled.
The following table describes how the system responds to operations for node pools with and without the Expected Nodes feature enabled.
| Action | Expected Nodes enabled | Expected Nodes disabled (legacy) | Recommendation |
| --- | --- | --- | --- |
| Scale in by reducing Expected Nodes in the ACK console or by using OpenAPI. | The system terminates nodes until the actual node count matches the new Expected Nodes value. | If the current node count is greater than the specified value, the system terminates the excess nodes. This action also enables the Expected Nodes feature for the node pool. | None. |
| Remove a specific node from the ACK console or by using OpenAPI. | The Expected Nodes value decreases by the number of nodes removed. For example, if the Expected Nodes value is 10 and you remove 3 nodes, the value becomes 7. | The specified node is removed from the cluster. | None. |
| Remove a node by running | The Expected Nodes value remains unchanged. | No change. | Not recommended. |
| Manually release an ECS instance from the ECS console or by using OpenAPI. | The system automatically creates a new ECS instance to maintain the Expected Nodes count. | The node pool is unaware of the change. No new ECS instance is created. The deleted node temporarily displays an Unknown status. | Not recommended. This causes data inconsistency between ACK and Auto Scaling (ESS). Use the recommended method to remove nodes. For more information, see Remove a node. |
| A subscription ECS instance expires. | The system automatically creates a new ECS instance to maintain the Expected Nodes count. | The node pool is unaware of the change. No new ECS instance is created. The deleted node temporarily displays an Unknown status. | Not recommended. This causes data inconsistency between ACK and ESS. Use the recommended method to remove nodes. For more information, see Remove a node. |
| An ECS instance in an ESS scaling group with health checks enabled fails a health check (for example, because the instance is stopped). | The system automatically creates a new ECS instance to maintain the Expected Nodes count. | The system creates a new ECS instance to replace the failed one. | Not recommended. Do not directly manage scaling groups that are associated with a node pool. |
| You remove an ECS instance from an ESS scaling group without modifying the expected instance count. | The system automatically creates a new ECS instance to maintain the Expected Nodes count. | No new ECS instance is created. | Not recommended. Do not directly manage scaling groups that are associated with a node pool. |
Migrate unmanaged nodes to a node pool
In older ACK clusters created before the node pool feature was released, some worker nodes may not be managed by any node pool. If you no longer need these nodes, release their ECS instances directly. If you want to keep them, add them to a node pool for grouped management and automated operations and maintenance.
To do this, create a new node pool or scale out an existing one, remove the unmanaged nodes from the cluster, and then add them to the target node pool. For more information, see Migrate unmanaged nodes to a node pool.
Replace the OS image of a node pool
You can replace the operating system of a node pool, for example, to migrate from a version that has reached its end-of-life (EOL) to a supported one. Before you begin, consult the OS image release notes for supported operating systems, the latest image versions, and usage limitations.
See Replace the OS of a node pool for detailed instructions and considerations.
Release a specific ECS instance
To release a specific ECS instance, remove the node. This action automatically updates the expected node count. Do not attempt to release a specific instance by changing the expected node count, as this triggers a random scale-in and is not guaranteed to remove the intended instance.
What do I do if adding an existing node fails with a timeout error?
- Check connectivity: Make sure that the node can reach the Classic Load Balancer (CLB) instance of the API server over the network.
- Check security groups: Verify that the security group rules allow the required traffic. For more information, see Security group limits for adding existing nodes.
- General networking: For more complex issues, see Network management FAQ.
Change worker node hostnames
You cannot directly change a worker node's hostname after you create the cluster. When you create a cluster, you can define the hostname of worker nodes in the Custom Node Name parameter. For more information, see Create an ACK managed cluster.
As a workaround for an existing node, you can use the node pool's naming rule to change the hostname:
- Remove the node. For more information, see Remove a node.
- Add the node that you removed back to the node pool. For more information, see Manually add nodes.
The node is then automatically renamed based on the naming rule of the node pool.
Manually upgrade a GPU node kernel
This section describes how to manually upgrade the NVIDIA driver on a GPU node in an existing cluster after a kernel upgrade. The kernel upgrade process itself is not covered.
The procedure applies when the current kernel version is lower than 3.10.0-957.21.3.
Upgrading the kernel is a sensitive operation. Confirm your target kernel version and proceed with caution.
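To check the version condition in a script, a version-aware comparison is safer than a plain string comparison. The following is a minimal sketch that relies on the -V (version sort) option of GNU sort. The helper name kernel_older_than is illustrative, and the output of uname -r usually carries a distribution suffix that you would strip before comparing.

```shell
# Return success (0) when kernel version $1 is strictly lower than $2.
# Relies on GNU sort's -V (version sort); kernel_older_than is an
# illustrative helper name, not an ACK or NVIDIA tool.
kernel_older_than() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage:
#   kernel_older_than "3.10.0-693.2.2" "3.10.0-957.21.3" && echo "older"
```

A version sort matters here because a plain string comparison would rank 3.10.0-1062 below 3.10.0-957.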
- Cordon the GPU node (for example, node cn-beijing.i-2ze19qyi8votgjz*****).

      kubectl cordon cn-beijing.i-2ze19qyi8votgjz*****

  Expected output:

      node/cn-beijing.i-2ze19qyi8votgjz***** cordoned

- Drain the GPU node where you want to upgrade the driver.

      kubectl drain cn-beijing.i-2ze19qyi8votgjz***** --grace-period=120 --ignore-daemonsets=true

  Expected output:

      node/cn-beijing.i-2ze19qyi8votgjz***** cordoned
      WARNING: Ignoring DaemonSet-managed pods: flexvolume-9scb4, kube-flannel-ds-r2qmh, kube-proxy-worker-l62sf, logtail-ds-f9vbg
      pod/nginx-ingress-controller-78d847fb96-***** evicted

- Uninstall the current NVIDIA driver.

  Note: The driver package uninstalled in this step is version 384.111. If your driver version is not 384.111, download the corresponding driver installer from the official NVIDIA website and replace 384.111 in this step with your actual version.

  - Log in to the GPU node and run nvidia-smi to check the driver version.

        sudo nvidia-smi -a | grep 'Driver Version'

    Expected output:

        Driver Version : 384.111

  - Download the NVIDIA driver installer.

        cd /tmp/
        sudo curl -O https://cn.download.nvidia.cn/tesla/384.111/NVIDIA-Linux-x86_64-384.111.run

    Note: You must use the installer to uninstall the NVIDIA driver.

  - Uninstall the current NVIDIA driver.

        sudo chmod u+x NVIDIA-Linux-x86_64-384.111.run
        sudo sh ./NVIDIA-Linux-x86_64-384.111.run --uninstall -a -s -q

- Upgrade the kernel.

  Follow your operating system's procedures to upgrade the kernel.

- Restart the GPU instance.

      sudo reboot

- Log in to the GPU node again and install the kernel-devel package that matches the running kernel.

      sudo yum install -y kernel-devel-$(uname -r)

- Go to the official NVIDIA website to download and install the required NVIDIA driver. This topic uses version 410.79 as an example.

      # Change to the /tmp directory.
      cd /tmp/
      # Download the NVIDIA driver installer.
      sudo curl -O https://cn.download.nvidia.cn/tesla/410.79/NVIDIA-Linux-x86_64-410.79.run
      # Add executable permissions to the installer.
      sudo chmod u+x NVIDIA-Linux-x86_64-410.79.run
      # Run the installer in silent mode.
      sudo sh ./NVIDIA-Linux-x86_64-410.79.run -a -s -q
      # Warm up the GPU.
      sudo nvidia-smi -pm 1 || true
      sudo nvidia-smi -acp 0 || true
      sudo nvidia-smi --auto-boost-default=0 || true
      sudo nvidia-smi --auto-boost-permission=0 || true
      sudo nvidia-modprobe -u -c=0 -m || true

- Check /etc/rc.d/rc.local to confirm whether it contains the following configuration. If not, add it manually.

      sudo nvidia-smi -pm 1 || true
      sudo nvidia-smi -acp 0 || true
      sudo nvidia-smi --auto-boost-default=0 || true
      sudo nvidia-smi --auto-boost-permission=0 || true
      sudo nvidia-modprobe -u -c=0 -m || true

- Restart kubelet and Docker.

      sudo service kubelet stop
      sudo service docker restart
      sudo service kubelet start

- Uncordon the GPU node to allow pods to be scheduled on it again.

      kubectl uncordon cn-beijing.i-2ze19qyi8votgjz*****

  Expected output:

      node/cn-beijing.i-2ze19qyi8votgjz***** uncordoned

- Verify the driver version from the device plugin pod on the GPU node.

      kubectl exec -n kube-system -t nvidia-device-plugin-cn-beijing.i-2ze19qyi8votgjz***** -- nvidia-smi

  Expected output:

      Thu Jan 17 00:33:27 2019
      +-----------------------------------------------------------------------------+
      | NVIDIA-SMI 410.79       Driver Version: 410.79       CUDA Version: N/A      |
      |-------------------------------+----------------------+----------------------+
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |===============================+======================+======================|
      |   0  Tesla P100-PCIE...  On   | 00000000:00:09.0 Off |                    0 |
      | N/A   27C    P0    28W / 250W |      0MiB / 16280MiB |      0%      Default |
      +-------------------------------+----------------------+----------------------+

      +-----------------------------------------------------------------------------+
      | Processes:                                                       GPU Memory |
      |  GPU       PID   Type   Process name                             Usage      |
      |=============================================================================|
      |  No running processes found                                                 |
      +-----------------------------------------------------------------------------+

  Note: If you run the docker ps command and find that no containers are started on the GPU node, see Fix container startup on GPU nodes.
Fix container startup on GPU nodes
On a GPU node running certain versions of Kubernetes, containers may fail to start after you restart the kubelet and Docker services. The sudo docker ps command returns an empty list.
sudo service kubelet stop
# Redirecting to /bin/systemctl stop kubelet.service
sudo service docker stop
# Redirecting to /bin/systemctl stop docker.service
sudo service docker start
# Redirecting to /bin/systemctl start docker.service
sudo service kubelet start
# Redirecting to /bin/systemctl start kubelet.service
sudo docker ps
# CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
This issue occurs when the Cgroup Driver used by Docker does not match the one expected by the kubelet. To diagnose the issue, check Docker's Cgroup Driver.
sudo docker info | grep -i cgroup
Cgroup Driver: cgroupfs
If the output is cgroupfs, it confirms a mismatch, as the kubelet is configured to use the systemd driver.
To fix this issue, change the Docker Cgroup Driver to systemd.
- Back up /etc/docker/daemon.json, and then run the following commands to update /etc/docker/daemon.json. The file is written through sudo tee because a shell redirection after sudo cat would run without root privileges.

      # Back up the current configuration first (the .bak name is only an example).
      sudo cp /etc/docker/daemon.json /etc/docker/daemon.json.bak
      sudo tee /etc/docker/daemon.json >/dev/null <<-EOF
      {
          "default-runtime": "nvidia",
          "runtimes": {
              "nvidia": {
                  "path": "/usr/bin/nvidia-container-runtime",
                  "runtimeArgs": []
              }
          },
          "exec-opts": ["native.cgroupdriver=systemd"],
          "log-driver": "json-file",
          "log-opts": {
              "max-size": "100m",
              "max-file": "10"
          },
          "oom-score-adjust": -1000,
          "storage-driver": "overlay2",
          "storage-opts": ["overlay2.override_kernel_check=true"],
          "live-restore": true
      }
      EOF

- Restart Docker and kubelet to apply the changes.

      sudo service kubelet stop
      # Redirecting to /bin/systemctl stop kubelet.service
      sudo service docker restart
      # Redirecting to /bin/systemctl restart docker.service
      sudo service kubelet start
      # Redirecting to /bin/systemctl start kubelet.service

- Verify that the Docker Cgroup Driver is set to systemd.

      sudo docker info | grep -i cgroup

  Expected output:

      Cgroup Driver: systemd
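The check in this section can also be scripted. The following is a minimal sketch that parses docker info output read from standard input. The helper names docker_cgroup_driver and cgroup_driver_matches are illustrative, not Docker or ACK commands.

```shell
# Read `docker info` output on stdin and print the cgroup driver name.
# docker_cgroup_driver is an illustrative helper name.
docker_cgroup_driver() {
    sed -n 's/^[[:space:]]*Cgroup Driver:[[:space:]]*//p' | head -n 1
}

# Succeed when the driver read from stdin matches the expected one
# (systemd by default, the value this section configures).
cgroup_driver_matches() {
    [ "$(docker_cgroup_driver)" = "${1:-systemd}" ]
}

# Usage on the node:
#   sudo docker info 2>/dev/null | cgroup_driver_matches systemd \
#       || echo "Docker is not using the systemd cgroup driver"
```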
Migrate Pods from a failed node
To migrate application Pods from a failed node, mark the node as unschedulable and then drain it. This process safely evicts the Pods and reschedules them onto healthy nodes.
- Log on to the ACK console. On the Nodes page, find the failed node. In the Actions column, choose More > Drain.
- Troubleshoot the failed node. For more information, see Troubleshoot node issues.
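If you prefer the command line to the console, the same cordon-and-drain sequence can be run with kubectl. The following is a minimal sketch, assuming kubectl is configured for the cluster. The helper name evacuate_node is illustrative, and the 120-second grace period mirrors the value used in the GPU procedure in this topic rather than a required setting.

```shell
# Cordon a node, then drain it so that its pods are rescheduled elsewhere.
# evacuate_node is an illustrative helper name; adjust the grace period
# to suit your workloads.
evacuate_node() {
    node="$1"
    kubectl cordon "$node" &&
        kubectl drain "$node" --ignore-daemonsets --grace-period=120
}

# Usage:
#   evacuate_node cn-beijing.i-2ze19qyi8votgjz*****
```

Pods that use emptyDir volumes block the drain unless you pass an additional flag (--delete-emptydir-data on recent kubectl versions), so review what data would be lost before you add it.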
Node eviction policy during availability zone failures
When a node becomes unhealthy, the node controller initiates an eviction. The default eviction rate is 0.1 nodes per second, controlled by the --node-eviction-rate parameter. This means Pods are evicted from at most one node every 10 seconds.
However, for an ACK cluster with nodes in multiple availability zones, the node controller adjusts this policy based on the health status of each availability zone and the cluster size.
An availability zone can be in one of three health states.
- FullDisruption: The availability zone has no healthy nodes and at least one unhealthy node.
- PartialDisruption: The availability zone contains at least two unhealthy nodes, and the ratio of unhealthy nodes to total nodes, calculated as unhealthy nodes / (unhealthy nodes + healthy nodes), exceeds 0.55.
- Normal: The availability zone does not meet the criteria for FullDisruption or PartialDisruption.
Clusters are also classified by size:
- Large cluster: A cluster with more than 50 nodes.
- Small cluster: A cluster with 50 or fewer nodes.
The node controller determines the eviction rate based on these states:
- If all availability zones are in a FullDisruption state, eviction is disabled for the entire cluster.
- If at least one availability zone is not in a FullDisruption state, the eviction rate is determined as follows:
  - For an availability zone in a FullDisruption state, the eviction rate is set to the default value of 0.1 nodes per second, regardless of cluster size.
  - For an availability zone in a PartialDisruption state, the eviction rate depends on the cluster size. In a large cluster, the rate is reduced to 0.01 nodes per second. In a small cluster, the rate is set to 0, which disables eviction.
  - For an availability zone in a Normal state, the eviction rate is set to the default value of 0.1 nodes per second, regardless of cluster size.
For more information, see Rate limits on eviction.
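The rules above can be sketched as two small shell functions. This illustrates the logic described in this section and is not the node controller's implementation. The function names zone_state and eviction_rate, and the yes/no flag for "all zones fully disrupted", are assumptions made for the example.

```shell
# Classify an availability zone from its unhealthy and healthy node counts.
zone_state() {
    unhealthy="$1"; healthy="$2"
    if [ "$healthy" -eq 0 ] && [ "$unhealthy" -ge 1 ]; then
        echo FullDisruption
    # unhealthy / (unhealthy + healthy) > 0.55, written in integer arithmetic
    elif [ "$unhealthy" -ge 2 ] &&
         [ $((unhealthy * 100)) -gt $((55 * (unhealthy + healthy))) ]; then
        echo PartialDisruption
    else
        echo Normal
    fi
}

# Print the eviction rate (nodes per second) for one zone.
# $1: zone state, $2: total number of nodes in the cluster,
# $3: "yes" if every zone in the cluster is in FullDisruption (default "no").
eviction_rate() {
    state="$1"; cluster_nodes="$2"; all_full="${3:-no}"
    if [ "$all_full" = "yes" ]; then echo 0; return; fi
    case "$state" in
        PartialDisruption)
            if [ "$cluster_nodes" -gt 50 ]; then echo 0.01; else echo 0; fi ;;
        *) echo 0.1 ;;   # FullDisruption and Normal use the default rate
    esac
}

# Usage:
#   eviction_rate "$(zone_state 3 2)" 100
```

For example, a zone with 3 unhealthy and 2 healthy nodes has a ratio of 0.6 and is classified as PartialDisruption, so a 100-node cluster evicts from it at 0.01 nodes per second while a 30-node cluster does not evict at all.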
Kubelet path customization
No, the kubelet path cannot be customized. In an ACK cluster, the kubelet path is fixed at /var/lib/kubelet. Do not change this path.
Mount a data disk to a custom directory
This feature is currently in canary release. To enable this feature, submit a ticket. Once enabled, the system automatically formats and mounts any data disk that you add to the node pool to a specified directory. The mount directory has the following restrictions.
- Do not mount a data disk to the following critical operating system directories:
  - /
  - /etc
  - /var/run
  - /run
  - /boot
- Do not mount a data disk to the following directories used by the system and container runtimes, or their subdirectories:
  - /usr
  - /bin
  - /sbin
  - /lib
  - /lib64
  - /ostree
  - /sysroot
  - /proc
  - /sys
  - /dev
  - /var/lib/kubelet
  - /var/lib/docker
  - /var/lib/containerd
  - /var/lib/container
- Each data disk must have a unique mount directory.
- The mount directory must be an absolute path that starts with /.
- The mount directory must not contain carriage return or line feed characters (\r and \n) or end with a backslash (\).
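The per-directory restrictions above can be checked before you submit a configuration. The following is a minimal sketch of such a check, written from the rules in this section. The helper name valid_mount_dir is illustrative and not part of ACK, paths are compared literally without normalization, and uniqueness across data disks must be checked separately.

```shell
# Return 0 if $1 satisfies the mount-directory restrictions, 1 otherwise.
# valid_mount_dir is an illustrative helper, not part of ACK.
valid_mount_dir() {
    dir="$1"
    cr=$(printf '\r')
    nl='
'
    case "$dir" in
        /*) ;;                          # must be an absolute path
        *) return 1 ;;
    esac
    case "$dir" in
        *"$cr"*|*"$nl"*) return 1 ;;    # no carriage return or line feed
        *\\) return 1 ;;                # must not end with a backslash
    esac
    case "$dir" in
        /|/etc|/var/run|/run|/boot) return 1 ;;   # critical OS directories
    esac
    # System and container runtime directories, including subdirectories.
    for reserved in /usr /bin /sbin /lib /lib64 /ostree /sysroot /proc /sys \
        /dev /var/lib/kubelet /var/lib/docker /var/lib/containerd \
        /var/lib/container; do
        case "$dir" in
            "$reserved"|"$reserved"/*) return 1 ;;
        esac
    done
    return 0
}

# Usage:
#   valid_mount_dir /data/disk1 && echo "ok"
```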
Modify file descriptor limits
The maximum number of file descriptors limits the number of files that can be open simultaneously. Alibaba Cloud Linux and CentOS systems have two levels of file descriptor limits:
- System-level: The maximum number of files that all processes on the system can open simultaneously.
- User-level: The maximum number of files that a single user's processes can open.
Container environments have an additional file descriptor limit: the maximum number of file descriptors per process within a container.
A node pool upgrade may overwrite changes made manually from the command line. To ensure your settings persist, edit the node pool.
Modify system-level file descriptor limit
For instructions, see Customize OS parameters for a node pool.
Modify per-process file descriptor limit
- Log on to the node and check the /etc/security/limits.conf file.

      cat /etc/security/limits.conf

  The following parameters set the maximum number of file descriptors for a single process on the node:

      ...
      root soft nofile 65535
      root hard nofile 65535
      * soft nofile 65535
      * hard nofile 65535

- Run the sed command to modify the maximum number of file descriptors. The recommended value is 65535.

      sed -i "s/nofile.[0-9]*$/nofile 65535/g" /etc/security/limits.conf

- Log on to the node again and run the following command to verify your change. If the output matches your configured value, the change was successful.

      ulimit -n

  Expected output:

      65535
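Because sed -i rewrites the file in place, it can be useful to preview the substitution first. The following is a minimal sketch that applies the same expression without -i, so the original file is left untouched. The helper name preview_nofile_change is illustrative.

```shell
# Apply the nofile substitution to a limits file and print the result
# on stdout, leaving the file itself unchanged.
# preview_nofile_change is an illustrative helper name.
preview_nofile_change() {
    src="$1"; value="${2:-65535}"
    sed "s/nofile.[0-9]*\$/nofile $value/g" "$src"
}

# Usage on the node:
#   preview_nofile_change /etc/security/limits.conf | grep nofile
```

The expression only rewrites lines that end in nofile, one separator character, and a number, so lines with other values or extra spacing are left as they are.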
Modify container file descriptor limit
Modifying the file descriptor limit for a container requires restarting the Docker or containerd service, which will interrupt running containers. To avoid service interruptions, perform this operation during off-peak hours.
- Log on to the node and run the following command to view the configuration file.
  - containerd node:

        cat /etc/systemd/system/containerd.service

  - Docker node:

        cat /etc/systemd/system/docker.service

  The following parameters set the file descriptor limit for a single process inside a container:

      ...
      LimitNOFILE=1048576
      LimitNPROC=1048576
      ...

- Run the following commands to modify the parameter values. The recommended value for the file descriptor limit is 1048576.
  - containerd node:

        sed -i "s/LimitNOFILE=[0-9a-zA-Z]*$/LimitNOFILE=1048576/g" /etc/systemd/system/containerd.service && \
        sed -i "s/LimitNPROC=[0-9a-zA-Z]*$/LimitNPROC=1048576/g" /etc/systemd/system/containerd.service && \
        systemctl daemon-reload && \
        systemctl restart containerd

  - Docker node:

        sed -i "s/LimitNOFILE=[0-9a-zA-Z]*$/LimitNOFILE=1048576/g" /etc/systemd/system/docker.service && \
        sed -i "s/LimitNPROC=[0-9a-zA-Z]*$/LimitNPROC=1048576/g" /etc/systemd/system/docker.service && \
        systemctl daemon-reload && \
        systemctl restart docker

- Run the following command to check the file descriptor limit for a single process inside the container. If the output matches your configured value, the change was successful.
  - containerd node:

        cat /proc/$(pidof containerd)/limits | grep files

    Expected output:

        Max open files 1048576 1048576 files

  - Docker node:

        cat /proc/$(pidof dockerd)/limits | grep files

    Expected output:

        Max open files 1048576 1048576 files
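To compare the applied limit against the configured value in a script, you can extract the number from the limits file instead of reading it by eye. The following is a minimal sketch that reads the contents of /proc/<pid>/limits from standard input. The helper name soft_nofile_limit is illustrative.

```shell
# Read the contents of /proc/<pid>/limits on stdin and print the soft
# "Max open files" limit (the first numeric column of that row).
# soft_nofile_limit is an illustrative helper name.
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }'
}

# Usage on a containerd node:
#   [ "$(soft_nofile_limit < /proc/$(pidof containerd)/limits)" = "1048576" ] \
#       && echo "limit applied"
```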
Upgrade container runtime for unmanaged worker nodes
Legacy clusters created before node pools were introduced may contain unmanaged worker nodes. To upgrade the container runtime for these nodes, you must migrate them to a node pool.
Follow these steps:
- Create a node pool: If no suitable node pool exists in the cluster, create one with a configuration that matches the unmanaged nodes.
- Remove the node: When you remove a node, the system cordons it (marks it as unschedulable) and then drains its pods to evict them. If the drain fails, the removal process halts. The node is removed from the cluster only if the drain succeeds.
- Add an existing node: Add the node to an existing node pool. Alternatively, you can create a node pool with zero nodes and then add the node to it. After the node is added, its container runtime automatically updates to match the one specified in the node pool's configuration.

  Note: While the node pool feature itself is free of charge, you are billed for the underlying cloud resources, such as ECS instances, used by the node pool. For more information, see Cloud resource fees.
Node pool displayed as "Other Nodes"
ACK provides standard methods to add compute resources to a cluster through the console, OpenAPI, or CLI. For more information, see Add an existing node. If you add nodes using methods outside of standard ACK workflows, ACK cannot identify their source and assigns them to the Other Nodes group on the Nodes page. ACK cannot manage these nodes through a node pool, so features like lifecycle management, automated O&M, and guaranteed technical support are unavailable.
If you continue to use these nodes, you must ensure their compatibility with cluster add-ons and assume all potential risks. These risks include, but are not limited to, the following:
- Version compatibility: During control plane or system component upgrades, the operating system and components on these unmanaged nodes may become incompatible with the new versions, which can cause service disruptions.
- Workload scheduling compatibility: The cluster may fail to accurately report the status of these nodes, such as their availability zone and remaining resource capacity. This can lead to incorrect workload scheduling decisions, causing availability issues or performance degradation.
- Data plane compatibility: The compatibility of node-side components and the operating system with the cluster's control plane and system components is not validated, posing potential stability risks.
- O&M compatibility: Maintenance operations on these nodes through the console or OpenAPI may fail or produce unexpected results because the management channel and execution environment for these nodes are not verified.
Configure network ACLs for node vSwitches
If a node pool's vSwitch has a network ACL that denies traffic from required CIDR blocks, new nodes will fail to join the cluster and remain in a Failed or Offline state.
Follow these steps to allow the required CIDR blocks and re-add nodes:
- Configure network ACL rules. In the inbound and outbound rules, allow traffic to and from the following CIDR blocks:
  - 100.104.0.0/16: The management CIDR block for the ACK control plane.
  - 100.64.0.0/10: The Alibaba Cloud internal service CIDR block.
  - 100.100.100.200/32: The ECS instance metadata service endpoint.
  - The primary and any secondary CIDR blocks of the cluster's VPC, or the CIDR block of the vSwitch containing the nodes.
- Remove faulty nodes. Remove any nodes that were in a Failed or Offline state before the new network ACL rules took effect.
- Add new nodes. Create a node pool or scale out an existing node pool. For more information, see Create and manage node pools. A Ready status on the new nodes confirms that the network ACL rules are configured correctly.
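When you review ACL rules, it can help to check whether a given address falls inside one of the blocks listed above. The following is a minimal sketch that uses only shell arithmetic and assumes a shell with at least 64-bit integers. The helper names ip_to_int and ip_in_cidr are illustrative, and only IPv4 dotted-quad input is handled.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Return 0 when IPv4 address $1 falls inside CIDR block $2.
ip_in_cidr() {
    addr=$(ip_to_int "$1")
    base=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( base & mask )) ]
}

# Usage:
#   ip_in_cidr 100.100.100.200 100.64.0.0/10 && echo "covered"
```

Note that 100.64.0.0/10 spans 100.64.0.0 through 100.127.255.255, so it also contains the 100.104.0.0/16 block and the 100.100.100.200 address from the list.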