By default, the ACK scheduler filters nodes based only on their resource requests. We recommend that you enable the load-aware scheduling feature in ACK Pro clusters. This feature analyzes the actual load of each node and preferentially schedules Pods to nodes with lower loads. This improves load balancing across the cluster and reduces the risk of node failures.
Prerequisites
- The ack-koordinator component of version 1.1.1-ack.1 or later is installed. For more information, see ack-koordinator (ack-slo-manager).
- The ACK kube-scheduler in your cluster must use a version that supports load-aware scheduling. See the table below for details.

| ACK version | Supported ACK scheduler version |
| --- | --- |
| 1.26 or later | All versions are supported. |
| 1.24 | v1.24.6-ack-4.0 or later |
| 1.22 | v1.22.15-ack-4.0 or later |
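The version requirements above can be checked with a small script. The following is a minimal sketch, not an official tool: it assumes scheduler versions always look like `v1.24.6-ack-4.0` and that comparing their numeric components in order gives the right result. The helper names `parse_version` and `supports_load_aware` are hypothetical.

```python
import re

# Minimum ACK scheduler versions from the prerequisites table,
# keyed by Kubernetes minor version.
MIN_SCHEDULER_VERSION = {
    "1.24": "v1.24.6-ack-4.0",
    "1.22": "v1.22.15-ack-4.0",
}


def parse_version(version: str) -> tuple:
    """Turn 'v1.24.6-ack-4.0' into (1, 24, 6, 4, 0) so tuples compare in order.

    Assumption: every numeric component matters and appears in this order.
    """
    return tuple(int(part) for part in re.findall(r"\d+", version))


def supports_load_aware(scheduler_version: str) -> bool:
    """Return True if the scheduler version meets the documented minimum."""
    parsed = parse_version(scheduler_version)
    major, minor = parsed[0], parsed[1]
    if (major, minor) >= (1, 26):
        return True  # all scheduler versions are supported on 1.26 or later
    minimum = MIN_SCHEDULER_VERSION.get(f"{major}.{minor}")
    if minimum is None:
        return False  # this Kubernetes version is not listed in the table
    return parsed >= parse_version(minimum)


if __name__ == "__main__":
    for candidate in ("v1.24.6-ack-4.0", "v1.24.6-ack-3.1", "v1.22.15-ack-4.0"):
        print(candidate, supports_load_aware(candidate))
```

Versions that the table does not list are reported as unsupported rather than guessed at.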
Billing
The ack-koordinator component is free to install and use. However, additional fees may be incurred in the following scenarios:
- ack-koordinator is a self-managed component and consumes worker node resources after installation. You can configure the resource requests for each module when you install the component.
- By default, ack-koordinator exposes monitoring metrics for features such as resource profiling and fine-grained scheduling in Prometheus format. If you select the Enable Prometheus Monitoring for ACK-Koordinator option when you configure the component and use the Alibaba Cloud Prometheus service, these metrics are considered custom metrics and incur fees. The fees depend on factors such as your cluster size and the number of applications. Before you enable this feature, carefully read the Billing of Prometheus instances documentation for Alibaba Cloud Prometheus to understand the free quota and billing policies for custom metrics. You can monitor and manage your resource usage by querying usage data.
Limitations
This feature is available only in ACK Pro clusters. For more information, see Create an ACK Pro cluster.
How load-aware scheduling works
Load-aware scheduling is a plugin for the ACK kube-scheduler and is implemented based on the Kubernetes Scheduling Framework. Unlike the native Kubernetes scheduler, which primarily makes scheduling decisions based on resource allocation, the ACK scheduler understands the actual resource load on each node. By analyzing historical load statistics and predicting the needs of new Pods, the scheduler places Pods on nodes with lower loads. This achieves better load balancing and prevents application or node failures caused by overloaded nodes.
As shown in the following figure, Requested represents the amount of resources that are requested, and Usage represents the amount of resources that are actually used. Only the resources that are actually used are counted as the actual load. Given two identical nodes, the ACK scheduler assigns a newly created Pod to Node B, which has a lower load.
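The difference between Requested and Usage can be made concrete with a short sketch. The node names and numbers below are hypothetical and are not taken from the figure. The sketch only shows that two nodes can look identical by allocation while differing in actual load.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity_cores: float   # allocatable CPU on the node
    requested_cores: float  # sum of the CPU requests of Pods on the node
    used_cores: float       # CPU that is actually in use

    @property
    def allocation_ratio(self) -> float:
        """Share of capacity that is requested (what a request-only view sees)."""
        return self.requested_cores / self.capacity_cores

    @property
    def load_ratio(self) -> float:
        """Share of capacity that is actually used (the actual load)."""
        return self.used_cores / self.capacity_cores


def pick_by_load(nodes):
    """Pick the node with the lowest actual load."""
    return min(nodes, key=lambda node: node.load_ratio)


# Hypothetical values: same capacity and requests, different actual usage.
node_a = Node("node-a", capacity_cores=4, requested_cores=2, used_cores=3.0)
node_b = Node("node-b", capacity_cores=4, requested_cores=2, used_cores=1.0)

if __name__ == "__main__":
    print(node_a.allocation_ratio == node_b.allocation_ratio)  # same by requests
    print(pick_by_load([node_a, node_b]).name)
```

A view based only on requests cannot tell these two nodes apart, whereas a load-based view prefers the node with lower usage.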

To account for dynamic changes in node utilization over time due to the cluster environment and workload traffic, the ack-koordinator component provides a descheduling feature to prevent extreme load imbalance from re-emerging in the cluster after Pods are scheduled. You can achieve optimal cluster load balancing by combining load-aware scheduling with hotspot descheduling. For more information about hotspot descheduling, see Use hotspot descheduling.
How it works
The load-aware scheduling feature is implemented by the kube-scheduler and ack-koordinator components working together. The ack-koordinator component is responsible for collecting and reporting node resource utilization. The ACK scheduler uses the utilization data to score and rank nodes, and prioritizes nodes with lower loads for scheduling. For more information about the component architecture, see ack-koordinator component architecture.
Scheduling policies
| Policy name | Description |
| --- | --- |
| Node filtering | When node filtering is enabled, the scheduler filters nodes based on their actual load. If a node's actual load exceeds the configured load threshold, the scheduler will not schedule Pods to that node. This feature is disabled by default. You can enable it by modifying the … Important: If node auto scaling is enabled for a cluster, configuring load-aware threshold filtering may cause unexpected node scaling. This is because auto scaling scales out nodes based on pending Pods, whereas it scales in nodes based on the cluster's allocation rate. If you need to enable both node auto scaling and load-aware node filtering, you must adjust the configuration based on your cluster's capacity and utilization. For more information, see Enable node auto scaling. |
| Node sorting | Load-aware scheduling considers both CPU and memory dimensions. The scheduler uses a weighted formula to score nodes and prioritizes those with higher scores. When you enable the feature by selecting Specifies whether to enable load-aware node scoring during pod scheduling, you can further customize the weights for CPU and memory. For more information, see the Formula: … |
| Resource utilization calculation algorithm | The calculation of resource utilization supports multiple configurations, such as average and percentile values. The default is the average value over the last 5 minutes. For more information, see Kube Scheduler Parameter Configuration. Additionally, memory usage data excludes the Page Cache because it can be reclaimed by the operating system. Note that the utilization value returned by the … |
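The node filtering and node sorting policies can be sketched as follows. This is an illustration under stated assumptions, not the ACK scheduler's implementation: the scoring formula used here (a weighted average of the unused percentage of each resource) is an assumed form, since the exact formula is not given in this section. The node names and utilization values are hypothetical.

```python
def passes_filter(usage: dict, thresholds: dict) -> bool:
    """A node passes if no resource exceeds its threshold (both in percent).

    An empty thresholds dict means that filtering is disabled.
    """
    return all(usage[resource] <= limit for resource, limit in thresholds.items())


def score(usage: dict, weights: dict) -> float:
    """Assumed scoring form: weighted average of the unused percentage."""
    total_weight = sum(weights.values())
    weighted = sum(w * (100 - usage[resource]) for resource, w in weights.items())
    return weighted / total_weight


def rank_nodes(nodes: dict, thresholds: dict, weights: dict) -> list:
    """Filter out overloaded nodes, then sort the rest by descending score."""
    candidates = {name: u for name, u in nodes.items() if passes_filter(u, thresholds)}
    return sorted(candidates, key=lambda name: score(candidates[name], weights), reverse=True)


# Hypothetical utilization in percent.
NODES = {
    "node-1": {"cpu": 2, "memory": 9},
    "node-2": {"cpu": 53, "memory": 28},
    "node-3": {"cpu": 85, "memory": 40},
}

if __name__ == "__main__":
    # cpu threshold 80 removes node-3; equal weights rank node-1 first.
    print(rank_nodes(NODES, thresholds={"cpu": 80}, weights={"cpu": 1, "memory": 1}))
```

With an empty threshold list every node stays a candidate, which mirrors the default behavior where filtering is disabled and only the ordering is affected.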
Step 1: Enable load-aware scheduling
Ensure the ack-koordinator component is version 1.1.1-ack.1 or later. Otherwise, load-aware scheduling will not take effect.
- Log on to the ACK console. In the left navigation pane, click Clusters.
- On the Clusters page, click the name of your cluster. In the left navigation pane, click Add-ons.
- On the Add-ons page, locate Kube Scheduler, and then on the Kube Scheduler card, click Configuration.
- In the dialog box, configure the parameters based on the following table and then click OK.
  The following table describes the main parameters for load-aware scheduling. For a detailed description of all parameters and their component version dependencies, see kube-scheduler and Custom scheduler parameters.
| Parameter | Type | Description | Value | Example |
| --- | --- | --- | --- | --- |
| loadAwareThreshold | A list consisting of a resource name (resourceName) and a threshold (threshold). | Specifies the threshold for a resource type based on the node filtering policy. | resourceName: cpu or memory. threshold: An integer from 0 to 100. The default value is empty, which means the filtering feature is disabled. | resourceName: cpu; threshold: 80 |
| loadAwareResourceWeight | A list consisting of a resource name (resourceName) and a weight (resourceWeight). | This is the scoring weight of the resource type for the node sorting strategy. This setting takes effect only if you select Enable load-aware scoring for Pod scheduling. | resourceName: Validated by schema. Only cpu or memory is supported. resourceWeight: An integer from 1 to 100. Default: cpu=1, memory=1. | resourceName: cpu; resourceWeight: 1 |
| loadAwareAggregatedUsageAggregationType | enum | The aggregation type for load statistics. avg: Average value. p50: The 50th percentile value, the median. p90, p95, p99: 90th, 95th, and 99th percentile values, respectively. | avg, p50, p90, p95, or p99. The default value is avg. | p90 |
- In the navigation pane on the left, click Cluster Information. On the Basic Information tab, wait for the cluster status to become Running, which indicates that the feature is enabled.
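The aggregation types accepted by loadAwareAggregatedUsageAggregationType differ in how they treat short spikes. The sketch below illustrates this with hypothetical CPU utilization samples. It uses the nearest-rank percentile method, which is an assumption: this section does not state how ack-koordinator computes percentiles.

```python
def aggregate(samples, aggregation_type="avg"):
    """Aggregate utilization samples as avg, p50, p90, p95, or p99.

    Percentiles use the nearest-rank method (an assumption for illustration).
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if aggregation_type == "avg":
        return sum(samples) / len(samples)
    if aggregation_type in ("p50", "p90", "p95", "p99"):
        percentile = int(aggregation_type[1:])
        ordered = sorted(samples)
        # Integer ceiling of percentile/100 * n gives the 1-based rank.
        rank = (percentile * len(ordered) + 99) // 100
        return ordered[rank - 1]
    raise ValueError(f"unsupported aggregation type: {aggregation_type}")


# Hypothetical CPU utilization samples (percent) with one short spike.
SAMPLES = [10, 12, 11, 13, 90, 12, 11, 10, 12, 14]

if __name__ == "__main__":
    for kind in ("avg", "p50", "p90", "p99"):
        print(kind, aggregate(SAMPLES, kind))
```

In this data set the median ignores the spike, the average is pulled up by it, and p99 reports the spike itself, which is why a higher percentile leads to more conservative scheduling decisions.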
Step 2: Verify load-aware scheduling
The following example uses a cluster with three 4-core, 16 GB nodes.
- Create a file named stress-demo.yaml with the following YAML content.
- Run the following command to create a Pod. This will increase the load on one of the nodes.

      kubectl create -f stress-demo.yaml
      # Expected output
      deployment.apps/stress-demo created

- Run the following command to check the status of the Pod until it is running.

      kubectl get pod -o wide

  Expected output:

      NAME                           READY   STATUS    RESTARTS   AGE   IP             NODE                      NOMINATED NODE   READINESS GATES
      stress-demo-7fdd89cc6b-g****   1/1     Running   0          82s   10.XX.XX.112   cn-beijing.10.XX.XX.112   <none>           <none>

  The output shows that the Pod stress-demo-7fdd89cc6b-g**** is scheduled to the node cn-beijing.10.XX.XX.112.

  Wait about 3 minutes for the Pod to initialize and for the node's load to increase.
- Run the following command to check the load on each node.

      kubectl top node

  Expected output:

      NAME                      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
      cn-beijing.10.XX.XX.110   92m          2%     1158Mi          9%
      cn-beijing.10.XX.XX.111   77m          1%     1162Mi          9%
      cn-beijing.10.XX.XX.112   2105m        53%    3594Mi          28%

  The output shows that the node cn-beijing.10.XX.XX.111 has the lowest load, while the node cn-beijing.10.XX.XX.112 has the highest load. This indicates an uneven load distribution in the cluster.
- Create a file named nginx-with-loadaware.yaml with the following YAML content.
- Run the following command to create the Pods.

      kubectl create -f nginx-with-loadaware.yaml
      # Expected output
      deployment/nginx-with-loadaware created

- Run the following command to view the Pod scheduling details.

      kubectl get pods -l app=nginx -o wide

  Expected output:

      NAME                                    READY   STATUS    RESTARTS   AGE   IP             NODE                      NOMINATED NODE   READINESS GATES
      nginx-with-loadaware-5646666d56-2****   1/1     Running   0          18s   10.XX.XX.118   cn-beijing.10.XX.XX.110   <none>           <none>
      nginx-with-loadaware-5646666d56-7****   1/1     Running   0          18s   10.XX.XX.115   cn-beijing.10.XX.XX.110   <none>           <none>
      nginx-with-loadaware-5646666d56-k****   1/1     Running   0          18s   10.XX.XX.119   cn-beijing.10.XX.XX.110   <none>           <none>
      nginx-with-loadaware-5646666d56-q****   1/1     Running   0          18s   10.XX.XX.113   cn-beijing.10.XX.XX.111   <none>           <none>
      nginx-with-loadaware-5646666d56-s****   1/1     Running   0          18s   10.XX.XX.120   cn-beijing.10.XX.XX.111   <none>           <none>
      nginx-with-loadaware-5646666d56-z****   1/1     Running   0          18s   10.XX.XX.116   cn-beijing.10.XX.XX.111   <none>           <none>

  The output shows that with load-aware scheduling enabled, the scheduler avoids the high-load node cn-beijing.10.XX.XX.112 and schedules the Pods to other, less-loaded nodes.
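The node load comparison from the verification steps can also be done programmatically. The following sketch parses the `kubectl top node` output shown in this section and orders the nodes by CPU usage. It assumes the five-column layout shown in the example output, with CPU in millicores (`m`) and memory in `Mi`. The function name `parse_top_node` is hypothetical.

```python
# Sample text copied from the expected `kubectl top node` output in this section.
TOP_NODE_OUTPUT = """\
NAME                      CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
cn-beijing.10.XX.XX.110   92m          2%     1158Mi          9%
cn-beijing.10.XX.XX.111   77m          1%     1162Mi          9%
cn-beijing.10.XX.XX.112   2105m        53%    3594Mi          28%
"""


def parse_top_node(text: str) -> list:
    """Parse `kubectl top node` text into one dict per node.

    Assumes the units shown in the sample: millicores and Mi.
    """
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        name, cpu, cpu_pct, memory, memory_pct = line.split()
        rows.append({
            "name": name,
            "cpu_millicores": int(cpu.replace("m", "")),
            "cpu_pct": int(cpu_pct.replace("%", "")),
            "memory_mib": int(memory.replace("Mi", "")),
            "memory_pct": int(memory_pct.replace("%", "")),
        })
    return rows


def order_by_cpu(rows: list) -> list:
    """Return node names from lowest to highest CPU usage."""
    return [row["name"] for row in sorted(rows, key=lambda row: row["cpu_millicores"])]


if __name__ == "__main__":
    print(order_by_cpu(parse_top_node(TOP_NODE_OUTPUT)))
```

In a live cluster, the same text could be captured from the command's standard output instead of the embedded sample.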
Related operations
Modify the load-aware scheduling configuration
- Log on to the ACK console. In the left navigation pane, click Clusters.
- On the Clusters page, click the name of your cluster. In the left navigation pane, click Add-ons.
- On the Add-ons page, find Kube Scheduler, and then in the Kube Scheduler card, click Configuration.
- In the Kube Scheduler Parameters dialog box, modify the configuration parameters for load-aware scheduling and click OK.
- In the navigation pane on the left, click Cluster Information. On the Basic Information tab, wait for the cluster status to become Running. This indicates that the update is complete.
Disable load-aware scheduling
In the Kube Scheduler Parameters dialog box, deselect Specifies whether to enable load-aware node scoring during pod scheduling, delete the loadAwareResourceWeight and loadAwareThreshold parameters, and then click OK.
In the navigation pane on the left, click Cluster Information. On the Basic Information tab, wait for the cluster status to become Running. This indicates that the update is complete.
FAQ
Why not always the lowest-load node?
If the scheduler placed a batch of new Pods onto the single node with the lowest load, that node could quickly become overloaded and create a new hotspot.
To prevent this, the scheduler adjusts a node's score as soon as a new Pod is scheduled to it, without waiting for the next utilization report. This compensates for reporting delays, so a batch of new Pods is spread across several low-load nodes instead of piling onto one.
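The effect of this adjustment can be simulated in a few lines. The sketch below is an illustration only: it assumes a fixed estimated load per new Pod (`estimated_pod_load`, a hypothetical parameter), whereas the way the scheduler actually estimates a new Pod's load is not described here. The node names and load values are also hypothetical.

```python
from collections import Counter


def place_pods(reported_load: dict, pod_count: int, estimated_pod_load: float = 0.0) -> Counter:
    """Place Pods one at a time on the node with the lowest tracked load.

    reported_load holds the last reported utilization per node (percent).
    After each placement the tracked load of the chosen node is raised by
    estimated_pod_load; 0.0 models a scheduler that only sees stale reports.
    """
    load = dict(reported_load)  # working copy, the input is left untouched
    placements = Counter()
    for _ in range(pod_count):
        target = min(load, key=load.get)
        placements[target] += 1
        load[target] += estimated_pod_load
    return placements


# Hypothetical reported CPU utilization (percent) for three nodes.
REPORTED = {"node-110": 2.0, "node-111": 1.0, "node-112": 53.0}

if __name__ == "__main__":
    print("without adjustment:", dict(place_pods(REPORTED, 6)))
    print("with adjustment:   ", dict(place_pods(REPORTED, 6, estimated_pod_load=10.0)))
```

Without the adjustment, all six Pods land on the node that was least loaded at the last report. With it, the Pods alternate between the two low-load nodes and the busy node is still avoided.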
What else affects scheduling?
The Kubernetes scheduler consists of multiple plugins. During the scheduling process, many plugins, such as the affinity and topology spread plugins, contribute to the scoring for node sorting. The final node sort order is determined by the combined influence of these plugins, so the node with the lowest load does not always rank first. You can adjust the scoring weight of each plugin as needed.
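In the Kubernetes Scheduling Framework, each scoring plugin produces a per-node score, and the scheduler sums these scores after multiplying each by the plugin's weight. The sketch below shows how changing a weight can change the winning node. The plugin keys (`load_aware`, `topology_spread`), the node names, and all scores are hypothetical values chosen for illustration.

```python
def combine_scores(plugin_scores: dict, plugin_weights: dict) -> dict:
    """Sum weight * score over all plugins for every node.

    plugin_scores maps plugin name -> {node name -> score}.
    Plugins missing from plugin_weights default to weight 1.
    """
    totals = {}
    for plugin, node_scores in plugin_scores.items():
        weight = plugin_weights.get(plugin, 1)
        for node, node_score in node_scores.items():
            totals[node] = totals.get(node, 0) + weight * node_score
    return totals


def best_node(plugin_scores: dict, plugin_weights: dict) -> str:
    """Return the node with the highest combined score."""
    totals = combine_scores(plugin_scores, plugin_weights)
    return max(totals, key=totals.get)


# Hypothetical per-plugin scores: node-a has the lower load,
# node-b gives the better topology spread.
PLUGIN_SCORES = {
    "load_aware": {"node-a": 90, "node-b": 40},
    "topology_spread": {"node-a": 20, "node-b": 100},
}

if __name__ == "__main__":
    print(best_node(PLUGIN_SCORES, {"load_aware": 1, "topology_spread": 2}))
    print(best_node(PLUGIN_SCORES, {"load_aware": 3, "topology_spread": 1}))
```

When topology spread carries more weight, the more heavily loaded node wins, and raising the load-aware weight reverses the outcome.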
Is the old protocol supported after an upgrade?
To use the load-aware scheduling feature with an older protocol, you must add the annotation alibabacloud.com/loadAwareScheduleEnabled: "true" to your Pods.
The ACK scheduler is backward compatible, allowing you to upgrade seamlessly. After upgrading, we recommend that you enable the global load-aware scheduling policy for the scheduler by following the steps in Step 1: Enable load-aware scheduling. This eliminates the need to configure each Pod individually.
The ACK scheduler for Kubernetes 1.22 maintains compatibility with the old protocol. For Kubernetes 1.24, support for the old protocol ended on August 30, 2023. We recommend upgrading your cluster and using the new configuration method. For information on upgrading your cluster, see Manually upgrade an ACK cluster.
The following tables describe the protocol support and component version requirements.
1.26 and later
| ACK scheduler version | ack-koordinator version | Pod annotation protocol | Console parameter |
| --- | --- | --- | --- |
| All ACK scheduler versions | ≥1.1.1-ack.1 | No | Yes |
1.24
| ACK scheduler version | ack-koordinator version | Pod annotation protocol | Console parameter |
| --- | --- | --- | --- |
| ≥v1.24.6-ack-4.0 | ≥1.1.1-ack.1 | Yes | Yes |
| ≥v1.24.6-ack-3.1 and <v1.24.6-ack-4.0 | ≥0.8.0 | Yes | No |
1.22 and earlier
| ACK scheduler version | ack-koordinator version | Pod annotation protocol | Console parameter |
| --- | --- | --- | --- |
| ≥1.22.15-ack-4.0 | ≥1.1.1-ack.1 | Yes | Yes |
| ≥1.22.15-ack-2.0 and <1.22.15-ack-4.0 | ≥0.8.0 | Yes | No |
| | ≥0.3.0 and <0.8.0 | Yes | No |