Scheduling policies supported by ACK clusters - Container Service for Kubernetes (ACK)

Before you begin

Select a scheduling policy based on your role and business scenario:
- O&M engineers focus on controlling cluster cost, maximizing resource utilization, ensuring cluster high availability, balancing node loads, and avoiding single points of failure (SPOFs).
- Application developers need simple deployment and management of applications, and adequate resources, such as CPU, GPU, and memory, for application performance.
To use ACK scheduling policies effectively, learn about Kubernetes Scheduler, Node labels, Node-pressure Eviction, and Pod topology spread constraints.

The ACK scheduler uses the same default policy as the open source Kubernetes scheduler, consisting of Filter and Score plug-ins.
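The two-phase Filter and Score flow can be illustrated with a minimal sketch. This is a toy model, not the kube-scheduler or ACK implementation: the `Node` fields, the fit check, and the "most free capacity wins" scoring rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    cpu_total: float      # allocatable CPU cores
    cpu_requested: float  # cores already requested by running pods
    mem_total: float      # allocatable memory in GiB
    mem_requested: float  # GiB already requested by running pods

def filter_nodes(nodes, cpu_req, mem_req):
    """Filter phase: keep only nodes whose unrequested capacity fits the pod."""
    return [n for n in nodes
            if n.cpu_total - n.cpu_requested >= cpu_req
            and n.mem_total - n.mem_requested >= mem_req]

def score_node(node, cpu_req, mem_req):
    """Score phase (hypothetical rule): mean free fraction after placing the pod."""
    cpu_free = (node.cpu_total - node.cpu_requested - cpu_req) / node.cpu_total
    mem_free = (node.mem_total - node.mem_requested - mem_req) / node.mem_total
    return (cpu_free + mem_free) / 2

def schedule(nodes, cpu_req, mem_req):
    """Return the name of the best feasible node, or None if the pod stays Pending."""
    feasible = filter_nodes(nodes, cpu_req, mem_req)
    if not feasible:
        return None
    return max(feasible, key=lambda n: score_node(n, cpu_req, mem_req)).name

nodes = [
    Node("node-a", 4, 3, 16, 8),
    Node("node-b", 8, 4, 32, 8),
    Node("node-c", 8, 6, 32, 4),
]
print(schedule(nodes, cpu_req=2, mem_req=4))  # node-b
```

In this example node-a is removed by the filter (only 1 free core), and node-b outscores node-c because it keeps more CPU free after placement.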

Kubernetes-native scheduling policies

Kubernetes-native scheduling policies fall into two categories: node scheduling and inter-pod scheduling.

Node scheduling policies: schedule pods to nodes that match specific characteristics and resource conditions.
Inter-pod scheduling policies: control pod distribution to optimize deployment and ensure application high availability.

Policy	Description	Scenario
nodeSelector	Label nodes with key-value pairs, then use nodeSelector to schedule pods to matching nodes. For example, schedule pods to specific nodes or schedule pods to a specific node pool.	A basic node selection method that does not support more complex scheduling features, such as soft scheduling rules.
nodeAffinity	More flexible and fine-grained than nodeSelector. Supports hard rules (`requiredDuringSchedulingIgnoredDuringExecution`), which a node must satisfy, and soft rules (`preferredDuringSchedulingIgnoredDuringExecution`), which the scheduler tries to satisfy but may skip.	Schedule pods to nodes with specific characteristics, such as regions, device types, and hardware. Node anti-affinity, expressed with the `NotIn` and `DoesNotExist` operators, keeps pods off nodes with specific labels.
Taints and tolerations	A taint consists of a key, value, and effect (common effects: `NoSchedule`, `PreferNoSchedule`, `NoExecute`). Pods without a matching toleration are not scheduled to nodes tainted with `NoSchedule` or `NoExecute`; `PreferNoSchedule` is a soft rule that the scheduler tries to honor.	Reserve dedicated node resources for specific applications, such as GPU-accelerated nodes for AI or ML workloads. Add taints or labels to node pools to schedule application pods to specific pools. See Create and manage node pools and Modify a node pool. Evict pods based on taints and tolerations. For example, add a taint to an unhealthy node and set the effect to `NoExecute` to evict pods that do not tolerate it.
Inter-pod affinity and anti-affinity	Schedule pods based on the labels of pods that already run on a node or in a topology domain, rather than on node labels. Both affinity and anti-affinity support the hard rule `requiredDuringSchedulingIgnoredDuringExecution` and the soft rule `preferredDuringSchedulingIgnoredDuringExecution`.	Co-locate collaborative pods on the same or neighboring nodes to reduce network latency. Spread critical application pods across different nodes or fault domains.
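To make the interaction between nodeSelector and taints concrete, the sketch below models the two checks as plain Python functions. It is a simplified model under stated assumptions: only the `Equal` and `Exists` toleration operators are handled, and the label and taint names (`pool`, `dedicated`) are hypothetical.

```python
HARD_EFFECTS = {"NoSchedule", "NoExecute"}  # PreferNoSchedule is soft and never blocks

def selector_matches(node_labels, node_selector):
    """nodeSelector: every key-value pair must be present on the node."""
    return all(node_labels.get(k) == v for k, v in node_selector.items())

def tolerates(taint, tolerations):
    """Return True if any toleration matches the taint's key, value, and effect."""
    for tol in tolerations:
        if tol.get("effect") not in (None, taint["effect"]):
            continue  # a toleration without an effect matches every effect
        if tol.get("key") != taint["key"]:
            continue
        op = tol.get("operator", "Equal")
        if op == "Exists" or (op == "Equal" and tol.get("value") == taint.get("value")):
            return True
    return False

def can_schedule(pod, node):
    """A pod fits if its nodeSelector matches and it tolerates every hard taint."""
    if not selector_matches(node["labels"], pod.get("nodeSelector", {})):
        return False
    return all(tolerates(t, pod.get("tolerations", []))
               for t in node.get("taints", []) if t["effect"] in HARD_EFFECTS)

gpu_node = {"labels": {"pool": "gpu"},
            "taints": [{"key": "dedicated", "value": "gpu", "effect": "NoSchedule"}]}
general_node = {"labels": {"pool": "general"}, "taints": []}

training_pod = {"nodeSelector": {"pool": "gpu"},
                "tolerations": [{"key": "dedicated", "operator": "Equal",
                                 "value": "gpu", "effect": "NoSchedule"}]}
web_pod = {}  # no selector, no tolerations

print(can_schedule(training_pod, gpu_node))  # True
print(can_schedule(web_pod, gpu_node))       # False: the taint is not tolerated
```

The taint keeps ordinary pods off the GPU node, while the nodeSelector keeps the training pod off general-purpose nodes; reserving a node pool typically needs both.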

ACK scheduling policies

ACK extends Kubernetes scheduling for requirements such as ordered scale-out with reverse scale-in and load-aware scheduling based on actual node resource usage.

Configure priority-based resource scheduling

Intended role: Cluster O&M engineers.
Description: For clusters with mixed instance types, such as ECS instances and elastic container instances, and billing methods, such as subscription, pay-as-you-go, and preemptible instances, configure priority-based resource scheduling to define the node selection order for pod scheduling and reverse it for scale-in.

Policy	Description	Scenario	Reference
Custom priority-based resource scheduling	Specify a custom ResourcePolicy value during releases or scaling to define the node resource selection order. For example, prefer subscription ECS instances, then pay-as-you-go ECS instances, then elastic container instances. Scale-in reverses this order: elastic container instances first, then pay-as-you-go ECS instances, then subscription ECS instances.	Define preferred or avoided nodes to balance cluster resource usage. High-performance application pods are preferentially scheduled to high-performance nodes. Non-performance-critical pods are preferentially scheduled to preemptible instances or nodes with idle resources, reducing costs.	Custom priority scheduling for elastic resources
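The ordered scale-out and reverse scale-in behaviour can be sketched as follows. This is a toy model of the ordering only, not the ResourcePolicy API: the resource-type names and the per-type pod capacities are hypothetical.

```python
# Selection order for scale-out, highest priority first (hypothetical names).
PRIORITY = ["subscription-ecs", "pay-as-you-go-ecs", "eci"]

def scale_out(capacity, replicas):
    """Fill resource types in priority order.

    capacity maps resource type -> number of pods it can still host.
    Returns (placement, pending), where pending counts replicas that did not fit.
    """
    placement = {}
    remaining = replicas
    for kind in PRIORITY:
        take = min(remaining, capacity.get(kind, 0))
        if take:
            placement[kind] = take
        remaining -= take
    return placement, remaining

def scale_in(placement, remove):
    """Remove pods in the reverse of the scale-out order."""
    result = dict(placement)
    for kind in reversed(PRIORITY):
        drop = min(remove, result.get(kind, 0))
        result[kind] = result.get(kind, 0) - drop
        remove -= drop
    return {kind: count for kind, count in result.items() if count > 0}

capacity = {"subscription-ecs": 3, "pay-as-you-go-ecs": 2, "eci": 10}
placement, pending = scale_out(capacity, replicas=7)
print(placement)               # {'subscription-ecs': 3, 'pay-as-you-go-ecs': 2, 'eci': 2}
print(scale_in(placement, 3))  # {'subscription-ecs': 3, 'pay-as-you-go-ecs': 1}
```

Scaling in by three removes both elastic container instance pods and one pay-as-you-go pod, leaving the subscription capacity fully used.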

Job scheduling

Intended role: Cluster O&M engineers.
Description: The default scheduler is not suited for batch job scheduling. ACK supports gang scheduling and capacity scheduling for batch jobs.

Policy	Description	Scenario	Reference
Gang Scheduling	All related pods are scheduled together or none at all, preventing abnormal processes from blocking the group.	Batch jobs: A job contains multiple interdependent tasks that must be processed at the same time. Distributed computing: Machine learning training jobs or other distributed applications that must run simultaneously. High-performance computing: A job may require all resources available simultaneously before execution.	Use Gang scheduling
Capacity Scheduling	Reserve resources for specific namespaces or user groups, and improve utilization through resource sharing when cluster resources are constrained.	In multi-tenant clusters, varied resource lifecycles and usage patterns lead to low utilization. Resource sharing and reclaiming improve overall utilization.	Work with capacity scheduling
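The all-or-nothing rule of gang scheduling can be illustrated with a short sketch. It assumes a greedy first-fit placement on CPU only, which is a simplification chosen for illustration and not how the ACK scheduler places pods.

```python
def gang_schedule(free_cpu, pod_requests):
    """Place every pod of the group, or none of them.

    free_cpu maps node name -> free CPU cores.
    pod_requests lists the CPU request of each pod in the group.
    Returns {pod index: node name}, or None if any pod cannot be placed.
    """
    tentative = dict(free_cpu)  # work on a copy so a failed attempt reserves nothing
    assignment = {}
    for index, request in enumerate(pod_requests):
        node = next((n for n, free in tentative.items() if free >= request), None)
        if node is None:
            return None  # one pod does not fit, so the whole group waits
        tentative[node] -= request
        assignment[index] = node
    return assignment

free_cpu = {"n1": 4, "n2": 4}
print(gang_schedule(free_cpu, [2, 2, 2, 2]))     # {0: 'n1', 1: 'n1', 2: 'n2', 3: 'n2'}
print(gang_schedule(free_cpu, [2, 2, 2, 2, 2]))  # None
```

In the second call four of the five workers would fit, but placing them would hold 8 cores for a job that cannot start, so nothing is scheduled.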

Topology-aware scheduling

Intended role: Cluster O&M engineers.
Description: Machine learning and big data workloads require intensive inter-pod communication, but the default scheduler distributes pods evenly across the cluster, extending job completion times. Native affinity mechanisms cannot retry across multiple topology domains.

Description	Scenario	Reference
The scheduler uses gang scheduling labels to ensure all pod resource requests are fulfilled simultaneously. Topology-aware scheduling iterates through topology domains to find one that meets all pod requirements. Associate node pools with deployment sets to schedule pods to ECS instances in the same low-latency deployment set for improved job performance.	In machine learning or big data jobs, pods need frequent communication. The scheduler iterates through topology domains to find one that satisfies all pod requirements, reducing job completion time.	
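The idea of iterating through topology domains until one can hold the whole job can be sketched as below. This is a toy model: the domain names, the CPU-only fit check, and the first-fit placement are assumptions for illustration.

```python
def fit_all(nodes, pod_requests):
    """Try to place every pod on the given nodes; return the assignment or None."""
    free = dict(nodes)
    assignment = {}
    for index, request in enumerate(pod_requests):
        node = next((n for n, cpu in free.items() if cpu >= request), None)
        if node is None:
            return None
        free[node] -= request
        assignment[index] = node
    return assignment

def pick_topology_domain(domains, pod_requests):
    """Return (domain, assignment) for the first domain that fits all pods.

    domains maps a topology domain (for example a zone) -> {node: free CPU cores}.
    """
    for domain, nodes in domains.items():
        assignment = fit_all(nodes, pod_requests)
        if assignment is not None:
            return domain, assignment
    return None  # no single domain can host the job

domains = {
    "zone-a": {"a1": 2, "a2": 2},
    "zone-b": {"b1": 4, "b2": 4},
}
print(pick_topology_domain(domains, [2, 2, 2]))
# ('zone-b', {0: 'b1', 1: 'b1', 2: 'b2'})
```

zone-a can host only two of the three pods, so the search moves on to zone-b rather than splitting the job across domains.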

Load-aware scheduling

Intended role: Cluster O&M engineers and application developers.
Description: The native scheduler assigns pods based on resource allocation (requests), not actual usage. Node loads change dynamically with traffic and workloads, so allocation can diverge from the real load, and the native scheduler has no view of real-time resource loads.

Description	Scenario	Reference
The ACK scheduler monitors node load history and estimates new pod resource usage to schedule pods to lower-load nodes, preventing crashes from overloaded nodes.	Applications sensitive to load, access latency, or resource QoS.	Use load-aware scheduling

Use load-aware hotspot descheduling to prevent imbalanced node loads.
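A minimal sketch of load-aware node selection follows. It is not the ACK algorithm: the single CPU metric, the estimated pod usage, and the `usage_threshold` default of 0.65 are hypothetical values chosen for illustration.

```python
def pick_node(nodes, pod_estimated_cpu, usage_threshold=0.65):
    """Pick the node with the lowest projected CPU utilization.

    nodes maps node name -> (capacity in cores, measured usage in cores).
    Nodes whose projected utilization exceeds usage_threshold are skipped.
    """
    candidates = {}
    for name, (capacity, used) in nodes.items():
        projected = (used + pod_estimated_cpu) / capacity
        if projected <= usage_threshold:
            candidates[name] = projected
    if not candidates:
        return None
    return min(candidates, key=candidates.get)

nodes = {
    "n1": (8, 6.0),   # busy node: 75% used
    "n2": (8, 2.0),   # lightly loaded: 25% used
    "n3": (16, 9.0),  # about 56% used
}
print(pick_node(nodes, pod_estimated_cpu=1.0))  # n2
```

Here n1 is excluded because the new pod would push it to 87.5% utilization, and n2 wins over n3 because its projected utilization (37.5%) is lower (62.5%).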

QoS-aware scheduling

Intended role: Cluster O&M engineers and application developers.
Description: Kubernetes QoS classes (Guaranteed, Burstable, BestEffort) determine pod eviction priority when node resources are insufficient. ACK adds SLO-aware scheduling to enhance latency-sensitive application performance while ensuring resource access for lower-priority jobs.

Policy	Description	Scenario	Reference
CPU Burst	The OS may throttle container CPU usage within a cycle (CPU throttling). CPU Burst lets idle containers accumulate CPU time slices and burst above the CPU limit during demand spikes, enhancing performance and reducing latency.	Containers that consume high CPU during startup and loading but require regular CPU afterward. Applications with sudden CPU spikes, such as e-commerce, gaming, and other web services, that must respond quickly to traffic surges.	Enable the CPU Burst performance optimization policy
Topology-aware CPU scheduling	Pin CPU-sensitive pods to specific CPU cores to avoid performance degradation from frequent context switching and cross-NUMA memory access.	Applications not adapted to cloud-native environments — for example, thread counts based on physical cores instead of container specifications, causing performance degradation. Applications on multi-core ECS Bare Metal instances with Intel or AMD CPUs that experience performance degradation from cross-NUMA memory access. Applications highly sensitive to CPU context switching that cannot tolerate performance fluctuations.	Enable CPU topology-aware scheduling
Topology-aware GPU scheduling	When multiple GPU-intensive pods run concurrently, they may compete for GPU resources and switch between GPUs or NUMA nodes, degrading performance. Topology-aware GPU scheduling assigns workloads to specific GPUs, reducing cross-NUMA memory access and improving performance.	Large-scale distributed computing that requires efficient data transfer, such as high-performance computing. Machine learning and deep learning workloads that require extensive GPU resources and proper allocation of training jobs across GPUs. Graphics rendering and game development that require efficient allocation of rendering jobs across GPUs.	GPU topology-aware scheduling Enable NUMA topology-aware scheduling
Dynamic resource overcommitment	Reclaim resources allocated to but unused by pods and schedule them to low-priority jobs for overcommitment. Use the following single-node QoS policies together to prevent applications from affecting each other: CPU Suppress: Limit the CPU resources available to low-priority pods so that overall node usage stays below the safety threshold, ensuring container stability. CPU QoS: Ensure sufficient CPU allocation for high-priority applications. Memory QoS: Ensure sufficient memory allocation for high-priority applications and delay memory reclaiming. Resource isolation based on the L3 cache and Memory Bandwidth Allocation (MBA): Prioritize L3 cache and MBA for high-priority applications.	Improve cluster resource utilization through colocation. Typical scenarios include ML model training and inference, big data batch processing and analysis, online services, and offline backup.	Enable dynamic resource overcommitment Enable CPU Suppress Enable CPU QoS for containers Memory QoS Enable resource isolation based on the L3 cache and MBA Best practices for colocation
Dynamically modify the resource parameters of a pod	In Kubernetes 1.27 or earlier, modifying container parameters requires deleting and recreating the pod. ACK lets you modify CPU, memory, and disk IOPS limits without restarting the pod.	Temporary CPU or memory resource adjustments.	Dynamically modify pod resource parameters
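The CPU Burst row above describes idle containers banking unused CPU time and spending it during spikes. The sketch below models that accounting per scheduling period. It is a simplified illustration, not the kernel or ack-koordinator implementation; the quota, the burst cap, and the millisecond figures are hypothetical.

```python
def simulate_cpu_burst(demand_ms, quota_ms, burst_cap_ms):
    """Return the CPU time granted in each period.

    demand_ms:    CPU time the container wants in each period.
    quota_ms:     CPU limit per period.
    burst_cap_ms: maximum unused time that can be carried over (0 disables bursting).
    """
    banked = 0.0
    granted = []
    for want in demand_ms:
        allowed = quota_ms + banked          # limit plus accumulated time
        used = min(want, allowed)            # anything above this is throttled
        granted.append(used)
        banked = min(burst_cap_ms, allowed - used)
    return granted

demand = [20, 20, 150, 150]  # two idle periods followed by a spike
print(simulate_cpu_burst(demand, quota_ms=100, burst_cap_ms=0))    # throttled to the limit
print(simulate_cpu_burst(demand, quota_ms=100, burst_cap_ms=100))  # spike fully served
```

Without bursting, both spike periods are capped at 100 ms. With a 100 ms burst cap, the time saved in the idle periods covers the extra 50 ms needed in each spike period.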

Descheduling

Intended role: Cluster O&M engineers and application developers.
Description: As cluster conditions change, running pods may need migration to more suitable nodes.

Policy	Description	Scenario	Reference
Descheduling	Reschedule improperly placed pods to optimal nodes when hotspots form due to uneven resource usage or node attribute changes, ensuring workload high availability and efficiency.	Uneven workload distribution with overloaded nodes, for example in colocation scenarios. Low cluster resource utilization requiring node removal to reduce costs. Resource fragmentation prevents individual nodes from having sufficient resources despite adequate cluster-wide capacity. Taints or labels are added to or removed from a node.	Descheduling Enable the descheduling feature
Load-aware hotspot descheduling	Combine load-aware scheduling and hotspot descheduling to monitor node loads and automatically rebalance nodes that exceed the load threshold.		Load-aware hotspot descheduling
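Hotspot descheduling can be pictured as a periodic pass that finds nodes above a load threshold and selects pods to evict until the node is back under it. The sketch below is a toy model: the 0.7 threshold and the "largest consumer first" eviction order are assumptions, and real deschedulers also weigh pod priority and QoS.

```python
def plan_evictions(node_capacity, pods, threshold=0.7):
    """Return the names of pods to evict so every node drops to the threshold.

    node_capacity maps node name -> CPU cores.
    pods is a list of (pod name, node name, measured CPU usage in cores).
    """
    evictions = []
    for node, capacity in node_capacity.items():
        on_node = sorted((p for p in pods if p[1] == node),
                         key=lambda p: p[2], reverse=True)
        usage = sum(p[2] for p in on_node)
        for name, _, cpu in on_node:
            if usage / capacity <= threshold:
                break  # node is no longer a hotspot
            evictions.append(name)
            usage -= cpu
    return evictions

node_capacity = {"n1": 8, "n2": 8}
pods = [
    ("web-1", "n1", 3.0),
    ("batch-1", "n1", 2.5),
    ("batch-2", "n1", 1.5),
    ("web-2", "n2", 2.0),
]
print(plan_evictions(node_capacity, pods))  # ['web-1']
```

n1 runs at 87.5% utilization, so one eviction brings it to 50%. The evicted pod is then placed again by the scheduler, ideally on a lower-load node such as n2.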

Billing

Using ACK scheduling features incurs the standard charges for cluster management and cloud resources according to the billing rules. The scheduling components themselves are billed as follows:

The default ACK scheduler (kube-scheduler) is free to install and use.
ACK resource scheduling and descheduling are based on ack-koordinator. ack-koordinator is free to install and use, but may incur additional fees in specific scenarios. See ack-koordinator (FKA ack-slo-manager).

FAQ

For scheduling issues, see Scheduling FAQ.

References

For an introduction to kube-scheduler and ack-koordinator and their release notes, see kube-scheduler and ack-koordinator (FKA ack-slo-manager).
To customize kube-scheduler behavior, see Customize scheduler parameters.
For scheduling best practices, such as colocation architecture, see Best practices for resource scheduling.
Enable cost insights to view resource usage, cost allocation, and savings recommendations for ACK clusters.
For GPU scheduling and memory isolation, see GPU sharing.
For virtual node scheduling, see Schedule a pod to a virtual node.