Workload scaling and compute resource scaling dynamically adjust pod replicas and cluster capacity to handle traffic spikes and reduce costs.
Usage notes
- Before configuring workload scaling and node scaling, familiarize yourself with community solutions such as Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and Cluster Autoscaler.
- If your cluster has more than 500 nodes or 10,000 pods, see Plan scaling rates to ensure cluster and control plane stability.
Workload scaling and compute resource scaling
ACK auto scaling operates at two layers:
- Workload scaling: Adjusts pod counts or per-pod resource allocation at the scheduling layer. For example, HPA scales application pods based on traffic changes.
- Compute resource scaling: Adjusts cluster resources through node scaling and virtual node scaling based on pod scheduling and resource usage.
Combine both layers to improve resource utilization while meeting pod scheduling demands.
Workload scaling solutions
For temporary scaling, run kubectl scale to manually adjust pod counts. For automated scaling, select from the following workload scaling solutions.
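As a minimal sketch of the manual path, the following Python helper assembles a `kubectl scale` invocation. The helper name `build_scale_command`, the workload name `web`, and the `default` namespace are illustrative assumptions, not names from this documentation. The snippet only builds and prints the command; it does not contact a cluster.

```python
import shlex


def build_scale_command(kind, name, replicas, namespace="default"):
    """Return the argv list for a `kubectl scale` call (hypothetical helper)."""
    if replicas < 0:
        raise ValueError("replicas must be zero or a positive integer")
    return [
        "kubectl", "scale", f"{kind}/{name}",
        f"--replicas={replicas}",
        "--namespace", namespace,
    ]


# Example: temporarily scale an assumed Deployment named "web" to 5 pods.
cmd = build_scale_command("deployment", "web", 5)
print(shlex.join(cmd))
```

To actually apply the change, the list could be passed to `subprocess.run(cmd, check=True)` on a machine where `kubectl` is configured for the target cluster. Note that an autoscaler such as HPA may later override a manually set replica count.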
| Solution | Description | Scaling metric | Scenario | References |
| --- | --- | --- | --- | --- |
| HPA | HPA scales out pods during peaks and scales in during off-peaks to optimize costs. Suitable for most scenarios. | | Ideal for online services with frequent traffic fluctuations, such as e-commerce, online education, and financial services. | |
| CronHPA | CronHPA scales pods on a predefined Crontab-like schedule with time zone and date support. Dates such as holidays can be excluded. Compatible with HPA. | Scheduled scaling | Ideal for applications with predictable traffic patterns or scheduled tasks. | |
| VPA | VPA monitors pod resource consumption, recommends CPU and memory allocation, and adjusts allocation without changing replica counts. | VPA recommends and optionally auto-adjusts CPU and memory requests and limits for pods. | Ideal for stable resource allocation, such as stateful applications and large monolithic deployments. VPA typically takes effect when pods recover from anomalies. | |
| Kubernetes-based Event Driven Autoscaling (KEDA) | KEDA enables event-driven auto scaling for workloads from diverse event sources. | Number of events, such as the queue length. | Ideal for event-based offline jobs requiring instant scaling, such as video and audio transcoding, event-driven jobs, and stream processing. | |
| Advanced Horizontal Pod Autoscaler (AHPA) | AHPA learns workload fluctuation patterns from historical metrics to predict resource demand and enable predictive scaling. | | Ideal for periodic traffic patterns, such as live streaming, online education, and gaming. | |
The UnitedDeployment controller manages same-type workloads across multiple subsets with per-subset replica adjustment. Combine it with the scaling solutions above for flexible scaling across compute resource types. See Implement workload scaling based on the UnitedDeployment controller.
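To make the HPA entry in the workload scaling table more concrete, the sketch below applies the replica formula described in the upstream Kubernetes HPA documentation: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance of 1. The function name, the default bounds, and the example values are assumptions for illustration. The real controller also handles stabilization windows, missing metrics, and not-yet-ready pods, which are omitted here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     tolerance=0.1, min_replicas=1, max_replicas=10):
    """Simplified HPA-style replica calculation (illustrative sketch only)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (assumed defaults, not ACK defaults).
    return max(min_replicas, min(max_replicas, desired))


# 4 pods at 90% average CPU with a 60% target scale out; at 30% they scale in.
print(desired_replicas(4, 90, 60))
print(desired_replicas(4, 30, 60))
```

Because the result is clamped to `min_replicas` and `max_replicas`, workload scaling alone cannot exceed those bounds; compute resource scaling then decides whether the cluster has room for the requested pods.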
Compute resource scaling
Compute resource scaling components detect pending pods and provision ECS nodes or elastic container instances to fulfill scheduling demands.
For node scaling comparisons, see Node scaling.
Resource delivery statistics in the following table are theoretical. Actual values may vary by environment.
| Solution | Description | Scenario | Resource delivery efficiency | References |
| --- | --- | --- | --- | --- |
| Node auto scaling | ACK automatically scales nodes when cluster resources cannot fulfill pod scheduling. | Suitable for all scenarios, especially online services, deep learning tasks, and small-scale scaling. Recommended for clusters with fewer than 20 auto-scaling node pools or node pools with fewer than 100 nodes. | The time required to add 100 nodes to a cluster: | |
| Node instant scaling | Node instant scaling offers faster scaling, higher delivery success rates, and ECS instance inventory health monitoring compared to node auto scaling. See Solution comparison. | Suitable for all scenarios, especially large-scale clusters requiring faster scaling, multi-instance-type and multi-zone scaling, or advanced scheduling such as topology spread constraints. A cluster is large if any auto-scaling node pool has more than 100 nodes or the cluster has more than 20 auto-scaling node pools. | The time required to add 100 nodes to a cluster: | |
| Virtual nodes | Virtual nodes eliminate node management and capacity planning. A cluster supports up to 50,000 pods on virtual nodes, with up to 10,000 pods created within 1 minute during scale-out. | Suitable for all scenarios, especially tasks, scheduled tasks, data computing, AI applications, and workload spikes. | The time required to create 1,000 pods in a cluster: | |
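The size thresholds in the comparison above can be expressed as a small check. This is a rough sketch that assumes the only input is the node count of each auto-scaling node pool; the function names are hypothetical. Both solutions are described as suitable for all scenarios, so the result is a hint rather than a rule.

```python
def is_large_cluster(node_pool_sizes):
    """Return True if the cluster counts as large per the stated thresholds.

    node_pool_sizes: node counts, one entry per auto-scaling node pool.
    Pools of exactly 100 nodes, or exactly 20 pools, are treated as not
    large here; the source text does not define the boundary case.
    """
    too_many_pools = len(node_pool_sizes) > 20
    oversized_pool = any(size > 100 for size in node_pool_sizes)
    return too_many_pools or oversized_pool


def suggest_node_scaling(node_pool_sizes):
    """Map cluster size to the solution the table recommends for it."""
    if is_large_cluster(node_pool_sizes):
        return "node instant scaling"
    return "node auto scaling"


# Two pools of 10 and 50 nodes: a small cluster under these thresholds.
print(suggest_node_scaling([10, 50]))
# One pool with 150 nodes is enough to count as large.
print(suggest_node_scaling([10, 150]))
```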
Billing
Auto scaling itself is free. The scaling component runs as pods, so your cluster must have at least one node. You are charged for nodes added through auto scaling. See Billing overview.
FAQ
See Auto scaling FAQs.
References
- For preinstallation or high-performance scaling, see Custom images for scaling optimization.
- To collect auto scaling logs, see Collect log files of system components.
- Follow Recommended workload configurations when configuring your workloads.
- For serverless containers, Knative scales pods based on request count and concurrency, including scale-to-zero. See Enable auto scaling to withstand traffic fluctuations.