A node pool is a logical group of nodes that share the same properties. Node pools allow for unified node management and Operations and Maintenance (O&M), such as node upgrades and elastic scaling. ACK also provides various automated O&M capabilities for node pools, such as automatic remediation for OS CVE vulnerabilities and automatic recovery of failed nodes, to help reduce O&M costs.
Introduction to node pools
A node pool is a configuration template. Nodes that are scaled out in a node pool adopt its configuration. You can create multiple node pools with different configurations and types in a cluster. The configuration of a node pool includes node properties, such as instance types, billing methods, zones (vSwitches), operating system images, CPU architectures, labels, and taints. You can specify these properties when you create a node pool or edit them after the node pool is created.
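The template relationship can be pictured with a small conceptual model. This is a minimal sketch, not the ACK API: the class names, field names, and example values are hypothetical. It models only the behavior described for billing methods in the billing section, where an edit affects nodes created afterward; other properties may behave differently when edited.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass(frozen=True)
class NodePoolConfig:
    """Hypothetical subset of the properties a node pool carries."""
    instance_type: str
    billing_method: str
    os_image: str
    cpu_architecture: str
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class Node:
    name: str
    config: NodePoolConfig  # snapshot of the pool configuration at creation time


@dataclass
class NodePool:
    name: str
    config: NodePoolConfig
    nodes: List[Node] = field(default_factory=list)

    def scale_out(self, count: int) -> List[Node]:
        """Create `count` nodes that adopt the pool's current configuration."""
        start = len(self.nodes)
        new_nodes = [Node(f"{self.name}-{start + i}", self.config) for i in range(count)]
        self.nodes.extend(new_nodes)
        return new_nodes

    def edit(self, **changes) -> None:
        """Edit the template; nodes created earlier keep their snapshot."""
        self.config = replace(self.config, **changes)


# Example values are placeholders, not recommendations.
pool = NodePool(
    "pool-a",
    NodePoolConfig("example-instance-type", "pay-as-you-go", "ContainerOS", "x86"),
)
first = pool.scale_out(2)
pool.edit(billing_method="subscription")
second = pool.scale_out(1)
print(first[0].config.billing_method)   # pay-as-you-go
print(second[0].config.billing_method)  # subscription
```

Keeping the configuration immutable and replacing it on edit makes it explicit which nodes were created from which version of the template.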
Older clusters that were created before the node pool feature was introduced may contain unmanaged worker nodes. We recommend that you add these nodes to a node pool for easier management. For more information, see Migrate unmanaged nodes to a node pool.
You can use a single node pool to reduce management and configuration complexity. You can also use multiple node pools to implement fine-grained resource isolation and manage a hybrid deployment of different node types.
Single node pool | Multiple node pools |
Manage compute resources for multiple teams or workloads through a single node pool to simplify operations and maintenance. A single node pool supports the following features. Instances with different operating systems and CPU architectures (Arm and x86) cannot be mixed. | Create multiple node pools to provide independent compute resources for different workloads or teams. This helps avoid resource contention and potential security risks. This is suitable for the following scenarios. |
When you use multiple node pools, you can use scheduling policies to define the priority of different node pools to optimize resource and cost management. For example:
Control the provisioning priority of compute resources with different costs, such as spot instances and subscription instances, to reduce overall costs.
Allocate different instance types based on workload requirements, such as the required ratio of x86 to Arm architectures.
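The idea of provisioning by priority can be illustrated with a toy allocation routine. This is a sketch under stated assumptions and does not represent the ACK scheduler or its policy format: `PoolCapacity`, `place_by_priority`, the convention that a lower number means higher priority, and the capacity figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PoolCapacity:
    name: str
    priority: int    # lower number is tried first (assumption of this sketch)
    free_slots: int  # how many more pods this pool can accept


def place_by_priority(pools: List[PoolCapacity], replicas: int) -> Dict[str, int]:
    """Fill pools in ascending priority order until all replicas are placed."""
    placement: Dict[str, int] = {}
    remaining = replicas
    for pool in sorted(pools, key=lambda p: p.priority):
        if remaining == 0:
            break
        take = min(pool.free_slots, remaining)
        if take > 0:
            placement[pool.name] = take
            remaining -= take
    if remaining > 0:
        # Nothing left to place on; report the shortfall instead of hiding it.
        placement["unschedulable"] = remaining
    return placement


pools = [
    PoolCapacity("spot-pool", priority=2, free_slots=10),
    PoolCapacity("subscription-pool", priority=1, free_slots=3),
]
print(place_by_priority(pools, 5))  # {'subscription-pool': 3, 'spot-pool': 2}
```

In this example the already-paid subscription capacity is consumed first and spot capacity absorbs the overflow. Reversing the priorities would model the opposite policy.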
Node pool features
ACK provides various node management capabilities at the node pool level. If you want to reduce the O&M workload for worker nodes and focus more on application development, you can enable the managed node pool feature to use various automated O&M capabilities.
Basic features
Feature | Description | References |
Create, edit, delete, and view | | |
Manual or automatic scaling | | |
Add existing nodes | Use the Add Existing Nodes feature to add a purchased ECS instance to an ACK cluster as a worker node or to add a worker node back to a node pool after it was removed. This feature has some limitations and important considerations. For more information, see the referenced document. | |
Remove nodes | If you no longer need certain nodes, you can remove them from the cluster or node pool. Follow the standard procedures to avoid unexpected behavior. | |
Upgrade the kubelet version | Upgrade the kubelet and containerd versions of nodes in a node pool. You can also upgrade the kubelet and container runtime automatically by using the automatic cluster upgrade feature. | |
Change the operating system | Upgrade the operating system version or change the operating system type. For example, you can switch from an end-of-life (EOL) operating system to ContainerOS or Alibaba Cloud Linux. | |
CVE vulnerability remediation | Manually scan for and fix CVE vulnerabilities in the node's operating system. To have ACK remediate them automatically, you can enable automated O&M capabilities. Some CVE vulnerability fixes require a node restart. For more information about this feature and its considerations, see the referenced document. | |
Customize kubelet parameters for a node pool | Customize kubelet parameters for nodes at the node pool level to adjust node behavior. For example, you can adjust cluster resource reservations to manage resource allocation. | |
Customize OS parameters for a node pool | Customize OS parameters for nodes at the node pool level to tune system performance. | |
Cost insights | Analyze resource usage and cost distribution at the node pool level to optimize costs and improve cluster resource utilization. | |
Automated O&M capabilities
Enabling automated O&M capabilities for a node pool can reduce the O&M workload for worker nodes. This allows ACK to automatically perform certain O&M operations, such as automatic remediation for operating system (OS) CVE vulnerabilities, automatic kubelet upgrades, and automatic fault recovery for nodes. However, this approach is not recommended if your services are sensitive to changes in underlying nodes and cannot tolerate node restarts or application pod migrations.
Preparations
Ensure that the operating system is Alibaba Cloud Linux 3 Container-Optimized Edition, ContainerOS, Alibaba Cloud Linux, Red Hat, or Ubuntu.
For more information about the operating system images supported by ACK clusters and their limitations, see Operating systems.
Before you use the automated O&M capabilities of a node pool, you must complete the following operations on the Node Pool page in the Container Service Management Console. You can modify these configurations at any time.
For more information about how to create and edit a node pool, see Create and manage a node pool.
Feature introduction
Feature | Description |
Node auto-healing | ACK automatically monitors node status and performs auto-healing tasks when a node becomes abnormal. These tasks address issues with the system, Kubernetes components, and node instances. For more information, see Enable node auto-healing. |
Automatic OS CVE vulnerability remediation | ACK scans nodes for security vulnerabilities, then schedules and executes a CVE vulnerability remediation plan based on the cluster O&M window. This improves cluster stability, security, and compliance. For usage notes, see Fix OS CVE vulnerabilities for a node pool. |
Automatic response to ECS system events | ACK can automatically respond to ECS system events. The following system event types are currently supported. |
Starting from January 31, 2026, the configuration entries for automatic upgrades of kubelet and container runtimes in managed node pools will be removed. You can configure automatic cluster upgrades to automatically upgrade node pools. For more information, see Product Change | Announcement on changes to security vulnerability remediation and automatic upgrades for managed node pools.
Node pool lifecycle
The lifecycle of an ACK cluster node pool involves multiple stages and states, from creation and deployment, to running and maintenance (including scaling, updating, and node removal), and finally to deletion. The following table describes the different states.
Node pool state | Description |
Initializing (initial) | The node pool is being created. |
Active (active) | The node pool is successfully created and is running. |
Failed (failed) | The node pool failed to be created. |
Scaling (scaling) | The node pool is scaling out or adding nodes. |
Updating (updating) | The node pool configuration is being updated. |
Removing nodes (removing_nodes) | Nodes are being removed from the node pool. |
Upgrading (upgrading) | The node pool is being upgraded. |
Repairing (repairing) | The node pool is being repaired, for example, repairing nodes or fixing CVE vulnerabilities in the node pool. |
Deleting (deleting) | The node pool is being deleted. |
Deleted (deleted, this state is not visible to you) | The node pool is successfully deleted. |
Deletion failed (deleted_failed) | The node pool failed to be deleted. Try to delete it again. If the deletion still fails, submit a ticket. |
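When automating against these states, it can help to treat them as an enumeration. The sketch below takes the state identifiers from the table above. The grouping into "in progress" and "needs attention" is an interpretation of the descriptions, not an official classification, and the helper names are hypothetical. How a state string is retrieved from ACK is outside the scope of this sketch.

```python
from enum import Enum


class NodePoolState(Enum):
    INITIAL = "initial"
    ACTIVE = "active"
    FAILED = "failed"
    SCALING = "scaling"
    UPDATING = "updating"
    REMOVING_NODES = "removing_nodes"
    UPGRADING = "upgrading"
    REPAIRING = "repairing"
    DELETING = "deleting"
    DELETED = "deleted"
    DELETED_FAILED = "deleted_failed"


# States whose description says an operation is still running (interpretation).
IN_PROGRESS = frozenset({
    NodePoolState.INITIAL,
    NodePoolState.SCALING,
    NodePoolState.UPDATING,
    NodePoolState.REMOVING_NODES,
    NodePoolState.UPGRADING,
    NodePoolState.REPAIRING,
    NodePoolState.DELETING,
})

# States that describe a failed operation (interpretation).
NEEDS_ATTENTION = frozenset({NodePoolState.FAILED, NodePoolState.DELETED_FAILED})


def is_in_progress(state: str) -> bool:
    """True if the state string denotes an operation that has not finished."""
    return NodePoolState(state) in IN_PROGRESS


def needs_attention(state: str) -> bool:
    """True if the state string denotes a failed creation or deletion."""
    return NodePoolState(state) in NEEDS_ATTENTION
```

An unknown state string raises `ValueError`, so a newly introduced state is surfaced rather than silently treated as healthy.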
Node pool billing
The use of node pools and their automated O&M capabilities is free of charge. However, you are charged for the cloud resources within the node pool, such as ECS instances, by the corresponding cloud services.
For more information about the billing of ECS instances, see Billing overview.
Changing the billing method of a node pool affects only nodes that are scaled out afterward. It does not change the billing method of existing nodes in the node pool. To change the billing method of existing nodes, see Change the billing method of an instance from pay-as-you-go to subscription.
For more information about the billing of scaling groups, see Auto Scaling billing.
Glossary
Before you use node pools, we recommend that you familiarize yourself with the following concepts and terms.
Scaling group: When you scale a node pool in or out, ACK performs scale-out and node removal operations using the Auto Scaling (ESS) service. Each node pool has a one-to-one relationship with an Auto Scaling scaling group. A scaling group is a collection of one or more ECS instances (worker nodes).
Scaling configuration: A node pool uses a scaling configuration to manage node configurations. A scaling configuration is a template that is used to create ECS instances during elastic scaling. When a scaling activity is triggered, Auto Scaling uses the specified scaling configuration to automatically create ECS instances.
Scaling activity: Every scale-in, scale-out, node addition, or node removal in a node pool triggers a scaling activity. After a scaling activity is triggered, all scaling actions are automatically performed by the system and the relevant records are saved. You can view the scaling activity history of a node pool.
Replace the system disk: Some operations in a node pool, such as automatically adding existing nodes and changing the container runtime, initialize the node by replacing its system disk. The instance properties of the node, such as node name, instance ID, and IP address, do not change, but the data on the node's system disk is deleted. Data disks that are attached to the node are not affected.
When ACK replaces a system disk, it drains the node. This process evicts the pods on the node to other available nodes and adheres to the Pod Disruption Budget (PDB). To ensure high service availability, we recommend that you use a multi-replica deployment to distribute workloads across multiple nodes and configure PDBs for critical services. This helps control the number of pods that can be disrupted at the same time.
In-place upgrade: An upgrade method that serves as an alternative to replacing the system disk. This method directly updates and replaces the required components on the node. An in-place upgrade does not replace the system disk or reinitialize the node, and the data on the node is not affected.
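The recommendation under "Replace the system disk" to combine multiple replicas with a PDB can be made concrete with a little arithmetic. This is a simplified sketch that assumes a budget expressed as an integer `minAvailable`. Kubernetes PDBs also accept `maxUnavailable` and percentage values, which are not modeled here, and the function names are hypothetical. It does not describe how ACK implements draining.

```python
def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """Pods that may be evicted right now without dropping below the budget."""
    return max(0, healthy_pods - min_available)


def can_evict(healthy_pods: int, min_available: int) -> bool:
    """True if at least one pod can be evicted under the budget."""
    return allowed_disruptions(healthy_pods, min_available) > 0


# Three healthy replicas with minAvailable=2: one pod can be evicted at a time.
print(allowed_disruptions(3, 2))  # 1

# A single replica with minAvailable=1: no eviction is allowed under the budget.
print(can_evict(1, 1))  # False
```

The second case shows why a single-replica workload pairs poorly with a strict budget: the budget leaves no room for any eviction, so a drain that honors it cannot move the pod.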
References
For more information about the capacity limits of Kubernetes resources supported by a cluster, such as the maximum number of nodes and pods, see Quotas.
ACK supports automatic cluster upgrades.
Clusters that run Kubernetes 1.24 or later no longer support Docker as the container runtime. You must migrate to containerd. For more information, see Migrate from Docker to containerd.
As the cluster load increases, you can enable an elastic scaling solution to dynamically adjust node resources. For more information, see Auto Scaling.
To specify the node pool for application pods, see Schedule pods to a specific node pool.
ACK managed cluster Basic Edition supports a maximum of 10 nodes. We recommend that you perform a hot migration to ACK managed cluster Pro Edition to obtain a higher resource quota. For more information, see Hot migrate an ACK managed cluster of Basic Edition to Pro Edition.
If you encounter problems when you use nodes or node pools, see Node and node pool FAQ for troubleshooting.
ACK provides a cluster cost management solution. For more information, see Cost management suite.
For best practices related to node pools, see Best practices for nodes and node pools.