Scaling solution for workloads and compute resources in ACK clusters - Container Service for Kubernetes (ACK) - Alibaba Cloud Help Center

Usage notes

Before configuring workload scaling and node scaling, familiarize yourself with community solutions such as Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and Cluster Autoscaling.
If your cluster has more than 500 nodes or 10,000 pods, see Plan scaling rates to ensure cluster and control plane stability.

Workload scaling and compute resource scaling

ACK auto scaling operates at two layers:

Workload scaling: Adjusts pod counts or per-pod resource allocation at the scheduling layer. For example, HPA scales application pods based on traffic changes.
Compute resource scaling: Adjusts cluster resources through node scaling and virtual node scaling based on pod scheduling and resource usage.

Combine both layers to improve resource utilization while meeting pod scheduling demands.

Workload scaling solutions

For temporary scaling, run kubectl scale to manually adjust pod counts. For automated scaling, select from the following workload scaling solutions.
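As a rough illustration of the manual path, the following Python sketch assembles a kubectl scale command line without executing it. The helper name, the Deployment name web, and the namespace are hypothetical and chosen only for the example.

```python
import shlex


def kubectl_scale_command(deployment, replicas, namespace="default"):
    """Build the argv list for a manual `kubectl scale` call.

    The deployment and namespace values are placeholders supplied by the
    caller; nothing is executed here.
    """
    if replicas < 0:
        raise ValueError("replicas must be zero or greater")
    return [
        "kubectl", "scale",
        f"deployment/{deployment}",
        f"--replicas={replicas}",
        "--namespace", namespace,
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before being run by hand.
    print(shlex.join(kubectl_scale_command("web", 5)))
```

Building the command as a list rather than a string keeps each argument separate, so the same list can later be passed to subprocess.run if you decide to execute it.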

Solution	Description	Scaling metric	Scenario	References
HPA	HPA scales out pods during peaks and scales in during off-peak periods to optimize costs. Suitable for most scenarios.	Resource metrics such as CPU and memory utilization; custom metrics	Ideal for online services with frequent traffic fluctuations, such as e-commerce, online education, and financial services.	Use Horizontal Pod Autoscaler (HPA)
CronHPA	CronHPA scales pods on a predefined Crontab-like schedule with time zone and date support. Dates such as holidays can be excluded. Compatible with HPA.	Scheduled scaling	Ideal for applications with predictable traffic patterns or scheduled tasks.	Use CronHPA; Make CronHPA compatible with HPA
VPA	VPA monitors pod resource consumption, recommends CPU and memory allocation, and adjusts allocation without changing replica counts.	VPA recommends and optionally auto-adjusts CPU and memory requests and limits for pods.	Ideal for stable resource allocation, such as stateful applications and large monolithic deployments. VPA typically takes effect when pods recover from anomalies.	Vertical Pod Autoscaler (VPA)
Kubernetes-based Event Driven Autoscaling (KEDA)	KEDA enables event-driven auto scaling for workloads from diverse event sources.	Number of events, such as the queue length.	Ideal for event-based offline jobs requiring instant scaling, such as video and audio transcoding, event-driven jobs, and stream processing.	Event-driven autoscaling
Advanced Horizontal Pod Autoscaler (AHPA)	AHPA learns workload fluctuation patterns from historical metrics to predict resource demand and enable predictive scaling.	Resource metrics such as CPU, memory, and GPU utilization; traffic metrics such as queries per second (QPS) and response time (RT); other custom metrics	Ideal for periodic traffic patterns, such as live streaming, online education, and gaming.	Auto Scaling Prediction (AHPA)
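To make the HPA row more concrete, the following Python sketch shows the replica calculation described in the upstream Kubernetes HPA documentation: the desired replica count is the current count scaled by the ratio of the observed metric to the target, rounded up. The function name, the default bounds, and the 10% tolerance are assumptions for this example. This is a simplified model, not the controller implementation, and it ignores details such as unready pods and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified model of the HPA replica calculation.

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within the tolerance band around 1.0,
    then clamped to [min_replicas, max_replicas].
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Small deviations from the target do not trigger scaling.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods at 90% average CPU with a 60% target.
    print(desired_replicas(4, 90, 60))
```

For example, with 4 pods averaging 90% CPU against a 60% target, the ratio is 1.5 and the sketch returns 6 replicas.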

The UnitedDeployment controller manages same-type workloads across multiple subsets with per-subset replica adjustment. Combine it with the scaling solutions above for flexible scaling across compute resource types. See Implement workload scaling based on the UnitedDeployment controller.

Compute resource scaling

Compute resource scaling components detect pending pods and provision ECS nodes or elastic container instances to fulfill scheduling demands.

For node scaling comparisons, see Node scaling.

Important

Resource delivery statistics in the following table are theoretical. Actual values may vary by environment.

Solution	Description	Scenario	Resource delivery efficiency	References
Node auto scaling	ACK automatically scales nodes when cluster resources cannot fulfill pod scheduling.	Suitable for all scenarios, especially online services, deep learning tasks, and small-scale scaling. Recommended for clusters with fewer than 20 auto-scaling node pools or node pools with fewer than 100 nodes.	The time required to add 100 nodes to a cluster: Standard mode: 120 seconds. Swift mode: 60 seconds. Standard mode with images that support quick boot (Qboot): 90 seconds. Swift mode with images that support quick boot (Qboot): 45 seconds.	Enable node autoscaling
Node instant scaling	Node instant scaling offers faster scaling, higher delivery success rates, and ECS instance inventory health monitoring compared to node auto scaling. See Solution comparison.	Suitable for all scenarios, especially large-scale clusters requiring faster scaling, multi-instance-type and multi-zone scaling, or advanced scheduling such as topology spread constraints. A cluster is large if any auto-scaling node pool has more than 100 nodes or the cluster has more than 20 auto-scaling node pools.	The time required to add 100 nodes to a cluster: ContainerOS in swift mode: 45 seconds. Standard mode: 103 seconds. Not yet supported	Enable node instant scaling; View the health status of node instant scaling
Virtual nodes	Virtual nodes eliminate node management and capacity planning. A cluster supports up to 50,000 pods on virtual nodes, with up to 10,000 pods created within 1 minute during scale-out.	Suitable for all scenarios, especially tasks, scheduled tasks, data computing, AI applications, and workload spikes.	The time required to create 1,000 pods in a cluster: When image caching is disabled: 30 seconds. When image caching is enabled: 15 seconds.	Schedule pods to run on ECI
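The size thresholds in the table above can be expressed as a small decision helper. The following Python sketch is one possible reading of those thresholds, not an official selection tool. The function name and the input shape (a list of node counts, one per auto-scaling node pool) are assumptions, and the boundary cases of exactly 20 pools or exactly 100 nodes are treated here as not large.

```python
def recommend_node_scaling(pool_node_counts):
    """Suggest a node scaling solution from auto-scaling node pool sizes.

    pool_node_counts: node count of each auto-scaling node pool
    (hypothetical input shape). A cluster counts as large when any pool
    has more than 100 nodes or there are more than 20 such pools.
    """
    too_many_pools = len(pool_node_counts) > 20
    oversized_pool = any(count > 100 for count in pool_node_counts)
    if too_many_pools or oversized_pool:
        return "node instant scaling"
    return "node auto scaling"


if __name__ == "__main__":
    print(recommend_node_scaling([30, 45, 80]))
    print(recommend_node_scaling([30, 150]))
```

The helper covers only the size criterion. Other factors named in the table, such as multi-zone scaling or topology spread constraints, may still favor node instant scaling for smaller clusters.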

Billing

Auto scaling itself is free. The scaling component runs as pods, so your cluster must have at least one node. You are charged for nodes added through auto scaling. See Billing overview.

FAQ

See Auto scaling FAQs.

FAQ index of node auto scaling

Category	Subcategory	Link
Scaling behavior of node auto scaling	Known limitations
	Scale-out behavior	What scheduling policies does the cluster-autoscaler use to determine whether an unschedulable pod can be scheduled to a node pool with auto scaling enabled? What resources can the cluster-autoscaler simulate for scheduling analysis? Why does the node auto scaling add-on fail to scale out nodes? How does the autoscaler calculate the resources of a scaling group that contains multiple instance types? During a scale-out, how does the autoscaler choose between multiple enabled node pools? How to configure custom resources for a node pool with auto scaling enabled? Why does enabling auto scaling for a node pool fail?
	Scale-in behavior	Why does the cluster-autoscaler fail to scale in a node? How to enable or disable eviction for a specific DaemonSet? What types of pods can prevent the cluster-autoscaler from removing a node?
	Extension support	Does the cluster-autoscaler support CustomResourceDefinitions (CRDs)?
Custom scaling behavior	Control scaling behavior by using pods	How to delay the cluster-autoscaler's scale-out response to an unschedulable pod?
Custom scaling behavior	Control scaling behavior by using nodes	How to prevent a specific node from being scaled in by the cluster-autoscaler? How to influence node scale-in by using pod annotations?
cluster-autoscaler component		How to upgrade the cluster-autoscaler to the latest version? What operations trigger an automatic update of the cluster-autoscaler? Why does node scaling fail on my ACK managed cluster even after I have granted the required role permissions?

FAQ index of node instant scaling

Category	Subcategory	Link
Node instant scaling behavior	Known limitations
	Scale-out behavior	What resource types does node instant scaling use for scaling simulations? Does node instant scaling select an appropriate instance type from a node pool based on pod resource requests? How does node instant scaling select an instance type from a node pool with multiple types? How do I monitor the inventory of instance types in a node pool when using node instant scaling? How can I optimize the node pool configuration to avoid scale-out failures from insufficient inventory? Why does node instant scaling fail to scale out nodes? How do I configure custom resources for a node pool with node instant scaling enabled?
	Scale-in behavior	Why does node instant scaling fail to scale in nodes? What types of pods can prevent node instant scaling from removing a node?
Custom scaling behavior	Controlling scaling with pods	How can I control node scale-in by using pods?
Custom scaling behavior	Controlling scaling with nodes	How do I specify which node to delete during a scale-in? How do I prevent a specific node from being scaled in? Can node instant scaling be configured to scale in only empty nodes?
Node instant scaling add-on		Does the node instant scaling add-on update automatically? Why does node scaling fail on my ACK managed cluster after I granted the required permissions?

References

For preinstallation or high-performance scaling, see Custom images for scaling optimization.
To collect auto scaling logs, see Collect log files of system components.
Follow Recommended workload configurations when configuring your workloads.
For serverless containers, Knative scales pods based on request count and concurrency, including scale-to-zero. See Enable auto scaling to withstand traffic fluctuations.