In a Kubernetes cluster, scheduling is the process by which the scheduler component (kube-scheduler) assigns Pods to the most suitable nodes based on resource planning to improve application availability and cluster resource utilization. ACK provides more flexible and comprehensive scheduling policies for different workloads, including job scheduling, topology-aware scheduling, QoS-aware scheduling, and descheduling.
Before you begin
This topic describes cluster scheduling solutions for cluster O&M engineers (including cluster resource administrators) and application developers. You can select an appropriate scheduling policy based on your business scenario and role.
Cluster O&M engineers: Focus on managing cluster costs, maximizing resource utilization, ensuring high availability, and maintaining load balancing across nodes to prevent single points of failure.
Application developers: Focus on easily deploying and managing applications and obtaining the necessary resources (such as CPU, GPU, and memory) based on performance requirements.
To make the most of the scheduling policies in ACK, you should first understand fundamental Kubernetes concepts by reviewing the official documentation for the Kubernetes Scheduler, node labels, eviction, and pod topology spread constraints.
Additionally, the default scheduling policy of the ACK Scheduler is consistent with the upstream Kubernetes scheduler and involves two stages: Filter and Score.
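The two stages can be illustrated with a minimal sketch. It is a toy model rather than the kube-scheduler or ACK implementation: the `Node` and `Pod` fields and the headroom-based score are assumptions chosen only to show how Filter removes infeasible nodes and Score ranks the remainder.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu: float      # unallocated CPU cores (hypothetical field)
    free_mem_gib: float  # unallocated memory in GiB (hypothetical field)


@dataclass
class Pod:
    name: str
    cpu_request: float
    mem_request_gib: float


def filter_nodes(pod, nodes):
    """Filter stage: keep only nodes whose free resources cover the Pod's requests."""
    return [
        n for n in nodes
        if n.free_cpu >= pod.cpu_request and n.free_mem_gib >= pod.mem_request_gib
    ]


def score_node(pod, node):
    """Score stage: toy score that favors the node with the most headroom left."""
    cpu_left = (node.free_cpu - pod.cpu_request) / node.free_cpu if node.free_cpu else 0.0
    mem_left = (node.free_mem_gib - pod.mem_request_gib) / node.free_mem_gib if node.free_mem_gib else 0.0
    return (cpu_left + mem_left) / 2


def schedule(pod, nodes):
    """Run Filter, then Score, and return the chosen node name (None if no node fits)."""
    feasible = filter_nodes(pod, nodes)
    if not feasible:
        return None
    return max(feasible, key=lambda n: score_node(pod, n)).name
```

For example, with nodes offering 1, 4, and 8 free cores, a Pod that requests 2 cores is filtered away from the first node and then scored onto the node with the most remaining headroom.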
Kubernetes-native scheduling policies
Kubernetes-native scheduling policies can be divided into node scheduling policies and inter-Pod scheduling policies.
Node scheduling policies focus on node characteristics and resource availability to ensure Pods are scheduled to nodes that meet their requirements.
Inter-Pod scheduling policies focus on controlling the placement and distribution of Pods to optimize their overall layout and ensure high application availability.
Policy | Description | Use case |
Uses key-value pairs to label nodes. A For example, you can use | Provides basic node selection but does not support more complex scheduling rules, such as preferences (soft rules). | |
This is a more flexible and fine-grained Pod scheduling strategy than NodeSelector, which supports hard scheduling rules ( | Places Pods based on node characteristics such as region, instance type, or hardware configuration. Anti-affinity rules can be used to spread Pods across nodes. | |
A taint consists of a key, a value, and an effect. Common effects include |
| |
You can use Pod labels to specify whether a Pod should be scheduled to certain nodes by configuring hard scheduling rules ( |
|
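As a rough illustration of how a taint (key, value, effect) interacts with a Pod's tolerations, the following sketch follows the upstream Kubernetes matching rules as commonly documented. The class and function names are hypothetical, and the sketch models only hard effects for scheduling purposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # an empty effect matches every effect


def tolerates(tol, taint):
    """Return True if the toleration matches the taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if not tol.key:
        # An empty key is only meaningful with Exists and then matches every key.
        return tol.operator == "Exists"
    if tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.operator == "Equal" and tol.value == taint.value


def can_schedule(taints, tolerations):
    """A Pod fits only if every hard taint is tolerated; PreferNoSchedule is a soft rule."""
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)
```

A node tainted with `gpu=true:NoSchedule` therefore rejects Pods that have no matching toleration, while a `PreferNoSchedule` taint only lowers the node's preference.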
Scheduling policies provided by ACK
If native Kubernetes scheduling policies cannot meet your more complex business requirements, such as sequential scale-out and reverse-order scale-in for different instance resource types or load-aware scheduling based on actual node resource usage, you can use the advanced scheduling policies provided by ACK.
Scheduling resource priority
Intended role: cluster O&M engineer
Description: If your cluster contains different types of instance resources, such as ECS and Elastic Container Instance (ECI), with different billing methods such as subscription, pay-as-you-go, and preemptible instances, consider configuring scheduling resource priority. This policy allows you to specify the order in which Pods are scheduled to different types of node resources and enables reverse-order scale-in.
Policy | Description | Typical use case | References |
Custom priority scheduling for elastic resources | You can customize the During scale-in, it also supports reverse-order scale-in. For example, it prioritizes deleting ECI Pods, followed by Pods on pay-as-you-go ECS instances, and finally Pods on subscription ECS instances. |
|
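The ordering behavior can be sketched as follows. This is a simplified model, not the ACK scheduler's logic: the resource type names and the slot-based capacity map are hypothetical, and the scale-out order is inferred as the reverse of the scale-in order described above.

```python
# Hypothetical priority order for scale-out, highest priority first.
SCALE_OUT_ORDER = ["ecs-subscription", "ecs-pay-as-you-go", "eci"]


def scale_out(replicas, capacity):
    """Place replicas on resource types in priority order.

    capacity maps a resource type to the number of Pods it can still hold.
    Returns (placement, pending), where pending is the number of unplaced replicas.
    """
    placement = {}
    remaining = replicas
    for rtype in SCALE_OUT_ORDER:
        take = min(remaining, capacity.get(rtype, 0))
        if take:
            placement[rtype] = take
        remaining -= take
    return placement, remaining


def scale_in(placement, count):
    """Remove count replicas in reverse priority order (ECI Pods first)."""
    result = dict(placement)
    for rtype in reversed(SCALE_OUT_ORDER):
        drop = min(count, result.get(rtype, 0))
        result[rtype] = result.get(rtype, 0) - drop
        count -= drop
    return {k: v for k, v in result.items() if v > 0}
```

Scaling seven replicas out and then three in removes the two ECI Pods before touching any Pod on a pay-as-you-go ECS instance.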
Job scheduling
Intended role: cluster O&M engineer
Description: The default scheduler places Pods based on predefined rules but is not optimized for co-scheduling Pods in batch processing tasks. To address this, ACK supports Gang Scheduling and Capacity Scheduling for batch computing jobs.
Policy | Description | Typical use case | References |
Gang Scheduling | Ensures that a group of related Pods are either all scheduled or none are, preventing deadlocks that occur when only some of the required Pods can be scheduled. |
| |
Capacity Scheduling | Reserves a certain amount of resource capacity for specific namespaces or user groups and improves overall resource utilization through resource sharing when cluster resources are scarce. | Ideal for multi-tenant scenarios where different tenants have varying resource usage cycles and patterns, leading to low overall cluster utilization. This policy allows resources to be borrowed and reclaimed on top of fixed allocations. |
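The all-or-nothing behavior of Gang Scheduling can be sketched with a tentative first-fit placement that is committed only when enough Pods fit. The function name, the CPU-only resource model, and the `min_available` parameter are assumptions for illustration and do not describe the ACK scheduler's internals.

```python
def gang_schedule(pod_cpu_requests, node_free_cpu, min_available=None):
    """Tentatively place Pods first-fit and commit only if the whole gang fits.

    pod_cpu_requests: {pod_name: cpu_request}
    node_free_cpu:    {node_name: free_cpu}
    min_available:    Pods that must fit together (defaults to all Pods).
    Returns (assignment, remaining_free_cpu); the assignment is empty on failure.
    """
    if min_available is None:
        min_available = len(pod_cpu_requests)
    free = dict(node_free_cpu)  # work on a copy so a failed attempt leaves no trace
    assignment = {}
    for pod, cpu in pod_cpu_requests.items():
        for node, avail in free.items():
            if avail >= cpu:
                assignment[pod] = node
                free[node] = avail - cpu
                break
    if len(assignment) < min_available:
        # Not enough Pods fit: release everything instead of holding partial resources.
        return {}, dict(node_free_cpu)
    return assignment, free
```

Releasing the partial placement is what avoids the deadlock in which two jobs each hold some resources while waiting for the rest.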
Topology-aware scheduling
Intended role: cluster O&M engineer
Description: In machine learning and big data analytics jobs, Pods often have high network communication demands. By default, the scheduler spreads Pods evenly across the cluster, which can place the Pods of one job far apart and increase job completion time. Native node or Pod affinity cannot retry scheduling across multiple topology domains, and nodes typically have only zone-level labels.
Description | Typical use case | References |
The scheduler adds a gang scheduling identifier to the job, requiring that all Pods acquire their necessary resources simultaneously. Combined with topology-aware scheduling, this allows the scheduler to find a topology domain that can satisfy the entire job. You can also use the deployment set feature of node pools to schedule Pods to ECS instances within the same low-latency deployment set, which further improves job performance. | Ideal for machine learning or big data analytics jobs with significant network communication between Pods. The goal is to allow the job to retry scheduling across multiple topology domains until one with sufficient resources is found, which reduces the job execution time. |
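The retry behavior described above can be sketched as a loop that tries each topology domain in turn and accepts the first one that holds the entire job. The domain names, the CPU-only model, and the first-fit placement are hypothetical simplifications, not the ACK implementation.

```python
def place_job_in_one_domain(pod_cpu_requests, domains):
    """Try each topology domain until one can hold every Pod of the job.

    pod_cpu_requests: {pod_name: cpu_request}
    domains:          {domain_name: {node_name: free_cpu}}
    Returns (domain_name, assignment), or (None, {}) if no domain fits the whole job.
    """
    for domain, nodes in domains.items():
        free = dict(nodes)  # copy so a failed attempt does not consume capacity
        assignment = {}
        for pod, cpu in pod_cpu_requests.items():
            target = next((n for n, avail in free.items() if avail >= cpu), None)
            if target is None:
                break  # this domain cannot hold the job; retry in the next domain
            assignment[pod] = target
            free[target] -= cpu
        else:
            return domain, assignment  # every Pod fit in this domain
    return None, {}
```

In this model a job of two 4-core Pods skips a domain with a single 4-core node and lands in the next domain that has two such nodes.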
Load-aware scheduling
Intended roles: cluster O&M engineer, application developer
Description: With native Kubernetes scheduling, the scheduler makes decisions primarily based on resource allocation, comparing a Pod's resource requests with the unallocated resources on a node. However, a node's actual utilization varies with time, the cluster environment, and workload traffic, and the native scheduler is unaware of this real-time load.
Description | Typical use case | References |
By analyzing historical node load statistics and predicting the needs of newly scheduled Pods, the ACK scheduler is aware of the actual resource usage on nodes. It prioritizes scheduling Pods to nodes with lower loads to achieve load balancing, which prevents application or node failures caused by overloaded nodes. | Ideal for latency-sensitive applications that have specific requirements for metrics like request pressure or access latency and are sensitive to resource quality. |
Use this feature with load-aware hotspot descheduling to prevent severe load imbalances from recurring after Pods are scheduled.
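The difference from request-based placement can be sketched as follows. The sketch assumes a per-node average CPU usage figure and an estimated usage for the new Pod, both hypothetical inputs, and it uses a simple lowest-projected-utilization rule rather than the scoring that the ACK scheduler actually applies.

```python
def load_aware_pick(nodes, pod_estimated_cpu, usage_threshold=0.8):
    """Pick the node with the lowest projected CPU utilization.

    nodes: {node_name: (cpu_capacity, recent_average_cpu_used)}
    Nodes whose projected utilization exceeds usage_threshold are skipped.
    Returns the chosen node name, or None if every node would be overloaded.
    """
    best, best_util = None, None
    for name, (capacity, used) in nodes.items():
        projected = (used + pod_estimated_cpu) / capacity
        if projected > usage_threshold:
            continue  # placing the Pod here would create a hotspot
        if best_util is None or projected < best_util:
            best, best_util = name, projected
    return best
```

Here the decision depends on measured usage, so a node with plenty of unallocated requests but high real load is passed over.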
QoS-aware scheduling
Intended roles: cluster O&M engineer, application developer
Description: You can configure specific Quality of Service (QoS) classes for Pods, including Guaranteed, Burstable, and BestEffort. When node resources are insufficient, the kubelet can determine the eviction order of Pods based on their QoS class. ACK provides differentiated service level objective (SLO) features to improve the performance and service quality of latency-sensitive applications while ensuring resource availability for lower-priority tasks.
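How a Pod ends up in one of the three QoS classes can be sketched from its containers' requests and limits. The function below follows the upstream Kubernetes rules as commonly documented. It compares quantities as plain strings, which is a simplification (for example, "500m" and "0.5" would be treated as different), and the input layout is a hypothetical stand-in for the container spec.

```python
def qos_class(containers):
    """Classify a Pod from its containers' CPU and memory requests and limits.

    containers: list of dicts such as
        {"requests": {"cpu": "1", "memory": "1Gi"}, "limits": {"cpu": "1", "memory": "1Gi"}}
    """
    has_any = False
    guaranteed = True
    for c in containers:
        requests = c.get("requests", {})
        limits = c.get("limits", {})
        if requests or limits:
            has_any = True
        for res in ("cpu", "memory"):
            limit = limits.get(res)
            # When a request is omitted, Kubernetes defaults it to the limit.
            request = requests.get(res, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not has_any:
        return "BestEffort"   # no requests or limits on any container
    return "Guaranteed" if guaranteed else "Burstable"
```

The kubelet's actual eviction ranking also takes into account usage relative to requests and Pod priority, so the class alone is only a first approximation of eviction order.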
Policy | Description | Typical use case | References |
CPU Burst | Because the CPU Limit mechanism restricts resource usage over a fixed time period, containers can be throttled. The CPU Burst feature allows a container to accumulate CPU time slices during idle periods to meet bursty resource demands. This improves container performance, reduces latency, and enhances the application's service quality. |
| |
CPU topology-aware scheduling | For performance-sensitive applications, this policy pins a Pod to specific CPU cores on a node, which mitigates performance degradation caused by CPU context switching and cross-NUMA memory access. |
| |
GPU topology-aware scheduling | When multiple GPU cards are deployed in a cluster and multiple GPU-intensive Pods run simultaneously, they may compete for the node's GPU resources. This can cause Pods to switch frequently between different GPUs or even NUMA nodes, which impacts performance. GPU topology-aware scheduling intelligently assigns workloads to different GPU cards, which reduces cross-NUMA node memory access and improves application performance and responsiveness. |
| |
Dynamic resource overcommitment | Quantifies allocated but unused resources in the cluster and makes them available to low-priority tasks, which enables resource overcommitment. This must be used with the following single-node QoS policies to prevent performance interference between applications.
| Used to improve cluster resource utilization through colocation. Typical colocation scenarios include machine learning training and inference, big data batch jobs and data analysis, and running online services alongside offline backup services. | |
Dynamically modify Pod resource parameters | In Kubernetes 1.27 and earlier versions, modifying a container's parameters while a Pod is running requires updating the PodSpec and resubmitting it, which triggers a Pod restart. ACK allows you to modify single-node isolation parameters like CPU, memory, and disk I/O without restarting the Pod. | Suitable only for temporary adjustments to Pod resources (CPU and memory). |
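The accumulation idea behind the CPU Burst row in the table above can be sketched with a toy per-period model. It assumes a fixed quota per scheduling period and a cap on saved time, both hypothetical parameters, and it is not a description of how ack-koordinator or the kernel implements the feature.

```python
def simulate_cpu_burst(demand_ms, quota_ms, burst_cap_ms):
    """Simulate CPU time granted per period when unused quota can be saved.

    demand_ms:    CPU time the container wants in each period
    quota_ms:     CPU time allowed per period by the CPU limit
    burst_cap_ms: maximum unused time that can be carried over (0 disables bursting)
    Returns (granted, throttled), two lists with one entry per period.
    """
    saved = 0
    granted, throttled = [], []
    for want in demand_ms:
        allowed = quota_ms + saved       # this period's quota plus saved time
        run = min(want, allowed)
        granted.append(run)
        throttled.append(want - run)     # demand that could not be served
        saved = min(burst_cap_ms, allowed - run)
    return granted, throttled
```

With two mostly idle periods followed by a spike, the saved time absorbs the spike. With `burst_cap_ms=0`, the same spike is throttled at the quota.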
Descheduling
Intended roles: cluster O&M engineer, application developer
Description: A cluster's state is constantly changing. For various reasons, you may need to move running Pods to other nodes, a process known as descheduling.
Policy | Description | Typical use case | References |
Descheduling | Uneven cluster utilization can create hotspot nodes, and changes in node properties can make existing Pod placements no longer optimal. In such cases, you may need to deschedule poorly placed Pods so that they are rescheduled to other nodes. This keeps Pods running on the most suitable nodes, which safeguards cluster high availability and workload efficiency. |
| |
Load-aware hotspot descheduling | Combining load-aware scheduling with hotspot descheduling allows you to not only monitor node load changes in real time but also automatically rebalance nodes that exceed a safe load threshold by descheduling Pods, which prevents extreme load imbalances. |
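The threshold-based rebalancing can be sketched as selecting Pods to evict from a hotspot node until its utilization falls back under a safe level. The largest-consumer-first order and the single CPU metric are assumptions made for illustration. A real descheduler would also weigh factors such as Pod priority and QoS class.

```python
def pick_evictions(node_capacity, pod_usage, high_threshold=0.8):
    """Choose Pods to evict from a node until utilization is at or below the threshold.

    node_capacity: total CPU of the node
    pod_usage:     {pod_name: measured CPU usage}
    Returns the Pod names to deschedule, largest consumers first.
    """
    used = sum(pod_usage.values())
    evict = []
    for pod, usage in sorted(pod_usage.items(), key=lambda kv: kv[1], reverse=True):
        if used / node_capacity <= high_threshold:
            break  # the node is back under the safe load threshold
        evict.append(pod)
        used -= usage
    return evict
```

A node that is already under the threshold yields an empty list, so no Pods are disturbed.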
Billing
When you use the scheduling features provided by ACK, you may incur fees for the scheduling components, in addition to cluster management fees and costs for related cloud resources as described in Billing.
The default ACK scheduler is provided by the kube-scheduler component. As a control plane component, it is free to install and use.
The ack-koordinator component provides resource scheduling optimization and descheduling capabilities. The ack-koordinator component itself is free to install and use, but additional fees may apply in certain scenarios. For more information, see ack-koordinator (ack-slo-manager).
FAQ
If you encounter issues when you use scheduling features, see Scheduling FAQ for troubleshooting.
References
For component details and release notes for kube-scheduler and ack-koordinator, see kube-scheduler and ack-koordinator (ack-slo-manager).
To customize the behavior of kube-scheduler, see Customize scheduler parameters.
For best practices in scheduling scenarios, such as colocation, see Scheduling best practices.
You can also enable the Cost Insight feature to understand cluster resource usage and cost distribution. This feature provides cost-saving recommendations to help you improve resource utilization. For more information, see Cost Insight.
To implement capabilities like GPU sharing and memory isolation, see GPU sharing.
For scheduling solutions involving virtual nodes, see Schedule a Pod to a virtual node.