Auto scaling overview

Workload scaling and compute resource scaling dynamically adjust pod replicas and cluster capacity to handle traffic spikes and reduce costs.

Usage notes

Workload scaling and compute resource scaling

ACK auto scaling operates at two layers:

  • Workload scaling: Adjusts pod counts or per-pod resource allocation at the scheduling layer. For example, Horizontal Pod Autoscaler (HPA) changes the number of application pods as traffic changes.

  • Compute resource scaling: Adjusts cluster resources through node scaling and virtual node scaling based on pod scheduling and resource usage.

Combine both layers to improve resource utilization while meeting pod scheduling demands.

Workload scaling solutions

For temporary scaling, run kubectl scale to manually adjust pod counts. For automated scaling, select from the following workload scaling solutions.
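As a rough illustration of the manual path, the sketch below assembles a kubectl scale invocation for a workload. The workload name `web`, the `default` namespace, and the helper name `kubectl_scale_command` are hypothetical; the helper only builds the argument list and does not contact a cluster.

```python
import shlex


def kubectl_scale_command(kind, name, replicas, namespace="default"):
    """Build the argument list for a manual `kubectl scale` call."""
    if replicas < 0:
        raise ValueError("replicas must be zero or greater")
    return [
        "kubectl", "scale", f"{kind}/{name}",
        f"--replicas={replicas}",
        "--namespace", namespace,
    ]


# Hypothetical workload: a Deployment named "web" scaled to 5 pods.
cmd = kubectl_scale_command("deployment", "web", 5)
print(shlex.join(cmd))
```

To actually apply the change, the list could be passed to `subprocess.run(cmd, check=True)` on a machine where kubectl is configured for the cluster.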

Each solution is listed with its description, scaling metric, scenario, and references.

HPA

  • Description: HPA scales out pods during traffic peaks and scales them in during off-peak periods to optimize costs. Suitable for most scenarios.

  • Scenario: Ideal for online services with frequent traffic fluctuations, such as e-commerce, online education, and financial services.

  • References: Use Horizontal Pod Autoscaler (HPA)
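To make the scale-out and scale-in behavior concrete, the sketch below applies the replica formula documented for the upstream Kubernetes HPA: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance of 1. It assumes ACK's HPA follows the upstream behavior. The function name, the 0.1 tolerance default, and the replica bounds are illustrative values, not settings taken from this page.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         min_replicas=1, max_replicas=10, tolerance=0.1):
    """Estimate the replica count an HPA-style controller would aim for."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


# 4 pods at 90% average CPU against a 60% target.
print(hpa_desired_replicas(4, 90, 60))
```

The clamp at the end is why an HPA never scales past its configured maximum even during a large spike; capacity beyond that point has to come from raising the limit.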

CronHPA

  • Description: CronHPA scales pods on a predefined Crontab-like schedule with time zone and date support. Dates such as holidays can be excluded. Compatible with HPA.

  • Scaling metric: Scheduled scaling

  • Scenario: Ideal for applications with predictable traffic patterns or scheduled tasks.
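The idea of a schedule with excluded dates can be sketched as follows. This is a simplified model, not the CronHPA resource format: `ScheduledJob`, `due_job`, the daily hour and minute fields, and the sample dates are all hypothetical, and time zones are ignored.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class ScheduledJob:
    name: str
    hour: int             # hour of day at which the job fires
    minute: int           # minute of the hour at which the job fires
    target_replicas: int  # replica count to set when the job fires


def due_job(jobs, now, excluded_dates=frozenset()):
    """Return the job that fires at `now`, or None if no job is due."""
    if now.date() in excluded_dates:
        # Excluded dates (for example, holidays) suppress every job.
        return None
    for job in jobs:
        if (job.hour, job.minute) == (now.hour, now.minute):
            return job
    return None


jobs = [
    ScheduledJob("scale-up", 8, 0, 20),
    ScheduledJob("scale-down", 20, 0, 4),
]
holidays = {date(2025, 1, 1)}

job = due_job(jobs, datetime(2025, 1, 2, 8, 0), holidays)
print(job.name if job else "no job due")
```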

VPA

  • Description: VPA monitors pod resource consumption, recommends CPU and memory allocation, and adjusts allocation without changing replica counts.

  • Scaling metric: Pod CPU and memory usage. VPA recommends and optionally auto-adjusts CPU and memory requests and limits for pods.

  • Scenario: Ideal for workloads that need stable resource allocation, such as stateful applications and large monolithic deployments. VPA typically takes effect when pods recover from anomalies.

  • References: Vertical Pod Autoscaler (VPA)
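How a recommendation might be derived from observed usage can be illustrated with a percentile plus a safety margin. This is a simplified stand-in and not VPA's actual recommender; the function name, the 90th percentile, the 15% margin, and the sample values are assumptions chosen for the example.

```python
import math


def recommend_request(samples_millicores, percentile=90, margin_percent=15):
    """Suggest a CPU request from usage samples (nearest-rank percentile)."""
    if not samples_millicores:
        raise ValueError("at least one usage sample is required")
    ordered = sorted(samples_millicores)
    # Nearest-rank percentile: the smallest sample covering `percentile` percent.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    base = ordered[rank - 1]
    # Add headroom so short bursts do not immediately exceed the request.
    return math.ceil(base * (100 + margin_percent) / 100)


# Hypothetical CPU usage samples for one container, in millicores.
usage = [100, 120, 110, 300, 130, 125, 115, 105, 140, 135]
print(recommend_request(usage))
```

Using a percentile rather than the maximum keeps a single outlier (the 300 millicore sample here) from inflating the request.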

Kubernetes-based Event Driven Autoscaling (KEDA)

  • Description: KEDA enables event-driven auto scaling for workloads from diverse event sources.

  • Scaling metric: Number of events, such as the queue length.

  • Scenario: Ideal for event-based offline jobs requiring instant scaling, such as video and audio transcoding, event-driven jobs, and stream processing.

  • References: Event-driven autoscaling
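Scaling on an event count such as queue length can be sketched as dividing the backlog by the number of messages each replica is expected to handle. The function name, the per-replica target, and the replica bounds below are hypothetical; a minimum of zero is assumed so that an empty queue can scale the workload down completely.

```python
import math


def replicas_for_queue(queue_length, target_per_replica,
                       min_replicas=0, max_replicas=50):
    """Estimate replicas needed to drain a queue at a per-replica target."""
    if queue_length < 0:
        raise ValueError("queue_length must be zero or greater")
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    needed = math.ceil(queue_length / target_per_replica)
    return max(min_replicas, min(max_replicas, needed))


# 95 pending messages, each replica sized for 10 messages.
print(replicas_for_queue(95, 10))
```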

Advanced Horizontal Pod Autoscaler (AHPA)

  • Description: AHPA learns workload fluctuation patterns from historical metrics to predict resource demand and enable predictive scaling.

  • Scaling metric: Resource metrics such as CPU, memory, and GPU utilization; traffic metrics such as queries per second (QPS) and response time (RT); other custom metrics.

  • Scenario: Ideal for periodic traffic patterns, such as live streaming, online education, and gaming.

  • References: Auto Scaling Prediction (AHPA)
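Predictive scaling from historical metrics can be illustrated with a deliberately naive forecast: take the highest value seen in the same time slot on previous days and size the workload for it ahead of time. This is not AHPA's prediction model, which this page does not describe; the function names, the per-pod capacity, and the sample QPS values are assumptions.

```python
import math


def predict_slot(history_by_day, slot):
    """Forecast a slot's load as the peak seen in that slot on past days."""
    return max(day[slot] for day in history_by_day)


def replicas_ahead(history_by_day, slot, capacity_per_pod, min_replicas=1):
    """Replicas to provision before the slot starts."""
    if capacity_per_pod <= 0:
        raise ValueError("capacity_per_pod must be positive")
    predicted = predict_slot(history_by_day, slot)
    return max(min_replicas, math.ceil(predicted / capacity_per_pod))


# Hypothetical QPS per time slot over three days (three slots per day).
history = [
    [100, 400, 150],
    [120, 450, 160],
    [110, 420, 140],
]
# Each pod is assumed to serve 100 QPS.
print(replicas_ahead(history, slot=1, capacity_per_pod=100))
```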

The UnitedDeployment controller manages same-type workloads across multiple subsets with per-subset replica adjustment. Combine it with the scaling solutions above for flexible scaling across compute resource types. See Implement workload scaling based on the UnitedDeployment controller.

Compute resource scaling

Compute resource scaling components detect pending pods and provision ECS nodes or elastic container instances to fulfill scheduling demands.
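The step from pending pods to new capacity can be sketched as a packing estimate: given the CPU requests of pods that cannot be scheduled and the CPU of one candidate node, count how many nodes would be needed. The first-fit-decreasing heuristic, the function name, and the sample sizes are assumptions for illustration. Real scaling components also consider memory, taints, affinity, and instance availability.

```python
def nodes_needed(pending_cpu_millicores, node_cpu_millicores):
    """Estimate nodes required for pending pods (first-fit decreasing)."""
    free = []  # remaining CPU on each node planned so far
    for request in sorted(pending_cpu_millicores, reverse=True):
        if request > node_cpu_millicores:
            raise ValueError("a pod requests more CPU than one node offers")
        for i, remaining in enumerate(free):
            if request <= remaining:
                free[i] -= request  # place the pod on an already planned node
                break
        else:
            # No planned node has room: plan one more node.
            free.append(node_cpu_millicores - request)
    return len(free)


# Five pending pods and a hypothetical node type with 4,000 millicores.
print(nodes_needed([2000, 1500, 1500, 1000, 500], 4000))
```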

For node scaling comparisons, see Node scaling.

Important

Resource delivery statistics in the following table are theoretical. Actual values may vary by environment.

Each solution is listed with its description, scenario, resource delivery efficiency, and references.

Node auto scaling

  • Description: ACK automatically scales nodes when cluster resources cannot fulfill pod scheduling.

  • Scenario: Suitable for all scenarios, especially online services, deep learning tasks, and small-scale scaling. Recommended for clusters with fewer than 20 auto-scaling node pools, or for node pools with fewer than 100 nodes.

  • Resource delivery efficiency: The time required to add 100 nodes to a cluster:

  • References: Enable node autoscaling

Node instant scaling

  • Description: Node instant scaling offers faster scaling, higher delivery success rates, and ECS instance inventory health monitoring compared to node auto scaling. See Solution comparison.

  • Scenario: Suitable for all scenarios, especially large-scale clusters requiring faster scaling, multi-instance-type and multi-zone scaling, or advanced scheduling such as topology spread constraints. A cluster is large if any auto-scaling node pool has more than 100 nodes or the cluster has more than 20 auto-scaling node pools.

  • Resource delivery efficiency: The time required to add 100 nodes to a cluster:

Virtual nodes

  • Description: Virtual nodes eliminate node management and capacity planning. A cluster supports up to 50,000 pods on virtual nodes, with up to 10,000 pods created within 1 minute during scale-out.

  • Scenario: Suitable for all scenarios, especially tasks, scheduled tasks, data computing, AI applications, and workload spikes.

  • Resource delivery efficiency: The time required to create 1,000 pods in a cluster is 30 seconds when image caching is disabled and 15 seconds when image caching is enabled.

  • References: Schedule pods to run on ECI
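The sizing guidance for choosing between node auto scaling and node instant scaling can be expressed as a small check. The thresholds come from the definition of a large cluster above. The function names are hypothetical, and treating exactly 20 node pools or exactly 100 nodes as not large is an assumption, since the text only says "more than".

```python
def is_large_cluster(auto_scaling_pool_sizes):
    """True if any auto-scaling pool exceeds 100 nodes or pools exceed 20."""
    pools = list(auto_scaling_pool_sizes)
    return len(pools) > 20 or any(size > 100 for size in pools)


def suggested_node_scaling(auto_scaling_pool_sizes):
    """Map cluster size to the solution this page points to for that size."""
    if is_large_cluster(auto_scaling_pool_sizes):
        return "node instant scaling"
    return "node auto scaling"


# Two auto-scaling node pools with 10 and 50 nodes.
print(suggested_node_scaling([10, 50]))
```

Both solutions are described as suitable for all scenarios, so the result is a starting point rather than a hard rule.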

Billing

Auto scaling itself is free. The scaling component runs as pods, so your cluster must have at least one node. You are charged for nodes added through auto scaling. See Billing overview.

FAQ

See Auto scaling FAQs.

FAQ index of node auto scaling

Scaling behavior of node auto scaling

  • Known limitations

  • Scale-out behavior

  • Scale-in behavior

  • Extension support: Does the cluster-autoscaler support CustomResourceDefinitions (CRDs)?

Custom scaling behavior

  • Control scaling behavior by using pods

  • Control scaling behavior by using nodes

cluster-autoscaler component

References