Enable the Horizontal Pod Autoscaler (HPA) to automatically scale your pods based on CPU utilization, memory usage, or custom metrics. HPA adds pod replicas to handle traffic spikes and removes them during idle periods to save resources. This feature is ideal for applications with fluctuating workloads that require frequent scaling, such as e-commerce platforms, online education services, and financial applications.
Before you begin
To use HPA effectively, we recommend reading the official Kubernetes documentation on Horizontal Pod Autoscaling to understand its fundamental principles, algorithms, and configurable scaling behaviors.
Additionally, Alibaba Cloud Container Service for Kubernetes (ACK) clusters offer various workload scaling (scheduling layer elasticity) and node scaling (resource layer elasticity) solutions. Before you proceed, read Auto scaling to understand the use cases and limitations of each solution.
Prerequisites
-
You have created an ACK managed cluster or an ACK dedicated cluster. For more information, see Create a Kubernetes cluster.
-
If you use kubectl commands to implement HPA, you must connect to the Kubernetes cluster with kubectl. For more information, see Obtain the KubeConfig of a cluster and use kubectl to connect to the cluster.
Create an HPA application in the console
ACK is integrated with HPA, allowing you to create HPA-enabled applications from the console. You can enable HPA when you create a new application or for an existing one. We recommend creating only one HPA per workload.
Enable HPA when creating an application
The following example uses a Deployment to show you how to enable HPA when you create an application. The steps for other workload types are similar.
Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of your cluster. In the left navigation pane, go to the Deployments page.
-
On the Deployments page, click Create from Image.
-
On the Create page, follow the on-screen instructions to configure the basic application information, containers, services, and scaling to create a Deployment that supports HPA.
For detailed steps and parameter descriptions, see Create a stateless workload (Deployment). The following section describes only the key parameters.
-
Basic Information: Configure settings for the application, such as its name and number of replicas.
-
Container: Configure the image and the required CPU and memory resources.
The Resource profiling feature analyzes historical resource usage and provides recommendations for configuring container Requests and Limits. For more information, see Resource profiling.
Important: You must set resource Requests for the application to enable HPA.
-
Advanced Settings:
-
In the Scaling section, select Enable for HPA, and configure the conditions and parameters for scaling.
-
Metric: The supported metrics are CPU and memory. You can select a metric only if you set a request for the same resource type on the container. If you specify both CPU and memory, HPA performs a scaling operation when either metric reaches its scaling threshold.
-
Trigger Condition: The resource utilization threshold, specified as a percentage. When resource utilization exceeds this threshold, containers are scaled out. For more information about the algorithm for Horizontal Pod Autoscaling, see Algorithm details.
-
Max. Replicas: The maximum number of replicas that the Deployment can be scaled out to. This value must be greater than the minimum number of replicas.
-
Min. Replicas: The minimum number of replicas for the Deployment. The value must be an integer that is greater than or equal to 1.
-
After the creation is complete, you can view the created Deployment on the Deployments page. Click the Deployment name, and then on the Deployment details page, click the Pod Scaling tab. In this section, you can view metrics related to HPA activity, such as CPU or memory utilization and the maximum or minimum number of replicas, and manage the HPA by updating its configuration or disabling it.
Enable HPA for an existing application
This example shows how to enable HPA for an existing Deployment. The steps for other workload types are similar.
Workload page
Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of your cluster. In the left navigation pane, go to the Deployments page.
-
On the Deployments page, click the name of the target application, click the Pod Scaling tab, and then click Create in the HPA area.
-
In the Create dialog box, configure the scaling settings as prompted on the page.
-
Name: The name of the HPA policy.
-
Metric: Click Add.
-
Metric Name: CPU and memory are supported. You can select a metric only if you set a request for the same resource type on the container. If you specify both CPU and memory, HPA performs a scaling operation as soon as either metric reaches its scaling threshold.
-
Threshold: The percentage of resource utilization above which the containers start to scale out. For more information about the Horizontal Pod Autoscaling algorithm, see Algorithm details.
-
Max. Containers: The maximum number of replicas to which the Deployment can scale out. This value must be greater than the minimum number of replicas.
-
Min. Containers: The minimum number of replicas to which the Deployment can be scaled in. The value must be an integer greater than or equal to 1.
After the configuration is complete, you can click the Deployment name on the Deployments page, and then click the Pod Scaling tab on the Deployment details page. On this tab, you can view metrics related to HPA activity, such as CPU or memory utilization and the maximum or minimum number of replicas, and manage the HPA by updating its configuration or disabling it.
Workload scaling page
This page is available only to allowlisted users. To request access, submit a ticket.
Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of your cluster. In the left navigation pane, click Workload Scaling.
-
In the top-right corner of the page, click Create Auto Scaling, and then click the HPA and CronHPA tab. Select the target workload, and in the Configure Auto Scaling Policy section, select HPA and configure the HPA policy as prompted.
-
Scaling Policy Name: The name of the HPA policy.
-
Min. Containers: The minimum number of replicas for the workload. This value must be an integer greater than or equal to 1.
-
Max. Containers: The maximum number of replicas to which the workload can be scaled out. This value must be greater than the minimum number of replicas.
-
Metric Name: Supported metrics include CPU, GPU, memory, Nginx Ingress requests, and custom metrics. The metric type must match the specified resource type. If you specify multiple metrics, the Horizontal Pod Autoscaler (HPA) performs a scaling operation when any of the metrics reaches its threshold.
-
Threshold: The percentage of resource utilization. If the utilization exceeds this value, containers start to scale out. For more information about the Horizontal Pod Autoscaling algorithm, see Algorithm details.
-
After the creation is complete, you can view the list of HPAs on the Workload Scaling page. In the Actions column, you can view metrics related to HPA activity, such as resource utilization and the maximum or minimum number of replicas, and manage the HPA by updating its configuration or disabling it.
Verify the results
On the Clusters page, click the name of your cluster. In the left navigation pane, click Workload Scaling.
-
Click the Horizontal Scaling tab and then select HPA to view the scaling status and task list.
In a production environment, the application scales based on pod load. You can also run a stress test on the pods in a test environment to verify the horizontal scaling behavior.
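As a rough guide to what to expect during such a test, the upstream Kubernetes documentation describes the core HPA calculation as a ratio between the observed and the target metric value. The formula below restates that calculation, and the numbers in the worked example are illustrative assumptions, not measurements from this guide.

```latex
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil

% Worked example (assumed values): 2 replicas, 90% average CPU utilization, 50% target
\text{desiredReplicas} = \left\lceil 2 \times \frac{90}{50} \right\rceil = \lceil 3.6 \rceil = 4
```

The result is then clamped to the configured minimum and maximum number of replicas. Upstream Kubernetes also skips scaling when the ratio is within a tolerance of 1.0 (0.1 by default), so a utilization of 52% against a 50% target would normally not trigger a scale-out.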
Create an HPA application by using kubectl
You can also use an orchestration template to manually create an HPA and bind it to the target Deployment. We recommend creating only one HPA per workload. This example shows how to deploy an Nginx application that supports HPA.
-
Create a file named nginx.yml and copy the following content into it.
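A minimal sketch of such a manifest is shown here. It assumes a Deployment named nginx, chosen to match the scaleTargetRef used in hpa.yml later in this procedure. The labels, image tag, replica count, and request and limit values are illustrative assumptions, not values taken from this guide.

```yaml
# nginx.yml: illustrative sketch. Name, labels, image tag, and resource values are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx                  # assumed to match scaleTargetRef.name in hpa.yml
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25      # assumed tag; use the image you actually deploy
        ports:
        - containerPort: 80
        resources:
          requests:            # required: HPA computes utilization relative to requests
            cpu: 500m
            memory: 256Mi
          limits:
            cpu: "1"
            memory: 512Mi
```

With a 500m CPU request and a 50% utilization target, the HPA would aim to keep average usage at about 250m of CPU per Pod.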
Important: To implement HPA, you must set request resources for the Pod. Otherwise, HPA cannot run. You can use the resource profiling feature to analyze historical resource usage data and obtain recommendations for configuring container Requests and Limits. For more information, see Resource profiling.
-
Run the following command to create the Nginx application.
    kubectl apply -f nginx.yml
-
Create a file named hpa.yml and copy the following content into it to create the HPA.
Use scaleTargetRef to specify the target object for the HPA. In this example, the HPA is bound to a Deployment named nginx and triggers a scaling operation when the average CPU utilization of the containers in all Pods reaches 50%.

Kubernetes 1.24 and later

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: nginx-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: nginx
      minReplicas: 1  # The minimum number of replicas for the Deployment. Must be an integer greater than or equal to 1.
      maxReplicas: 10  # The maximum number of replicas for the Deployment. Must be greater than minReplicas.
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50  # The target average resource utilization, which is the ratio of average resource usage to the requested resource amount.

Kubernetes before 1.24

    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: nginx-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: nginx
      minReplicas: 1  # Must be an integer greater than or equal to 1.
      maxReplicas: 10  # Must be greater than the minimum number of replicas.
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 50

If you need to specify both CPU and memory metrics, you can specify both cpu and memory resource types in the metrics field instead of creating two HPAs. When the HPA detects that any of the metrics reaches the scaling threshold, it performs a scaling operation.

    metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 50
-
Run the following command to create the HPA.
    kubectl apply -f hpa.yml

At this point, run the kubectl describe hpa <HPA name> command. In this example, the HPA name is nginx-hpa. The expected output is the following warning message, which indicates that the HPA is still being deployed. You can run the kubectl get hpa command to check the status of the HPA.

    Warning  FailedGetResourceMetric       2m (x6 over 4m)  horizontal-pod-autoscaler  missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5897-mqzs7
    Warning  FailedComputeMetricsReplicas  2m (x6 over 4m)  horizontal-pod-autoscaler  failed to get cpu utilization: missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5
-
After the HPA is created and the Pod meets the scaling conditions (in this example, the CPU utilization of the Nginx Pod exceeds 50%), run the kubectl describe hpa <HPA-name> command again to view the horizontal scaling status. The following output indicates that the HPA is running correctly.

    Type    Reason             Age   From                       Message
    ----    ------             ----  ----                       -------
    Normal  SuccessfulRescale  5m6s  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
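To see a scale-out in a test environment, one option is to generate HTTP load against the Pods. The manifest below is a hypothetical sketch and not part of the original procedure. It assumes that the Deployment's Pods carry the label app: nginx and listen on port 80, and it creates a Service named nginx plus a throwaway busybox Pod named load-generator (both names are illustrative) that requests the Service in a loop.

```yaml
# load-test.yml: hypothetical test-only resources. All names and labels are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  selector:
    app: nginx                 # assumed Pod label; change it to match your Deployment
  ports:
  - port: 80
    targetPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: load-generator
spec:
  restartPolicy: Never
  containers:
  - name: load-generator
    image: busybox:1.28
    command: ["/bin/sh", "-c"]
    args:
    - "while sleep 0.01; do wget -q -O- http://nginx > /dev/null; done"
```

Apply the file with kubectl apply -f, watch the replica count with kubectl get hpa, and delete the load-generator Pod when the test is finished so that the HPA can scale the Deployment back in.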
Related operations
If the default scaling behavior cannot meet your business requirements, you can use the behavior field to configure the scale-down (scaleDown) and scale-up (scaleUp) behaviors with finer granularity. For more information, see Configurable scaling behavior.
Typical scenarios for behavior include but are not limited to:
-
Rapidly scale out during sudden traffic spikes.
-
Rapidly scale out but slowly scale in for workloads with frequent fluctuations.
-
Disable scale-in for state-sensitive applications.
-
In resource-constrained or cost-sensitive scenarios, use the stabilization window (stabilizationWindowSeconds) to limit the scale-out rate and reduce frequent adjustments caused by temporary fluctuations.
For a description and examples of the behavior configuration, see Tune HPA scaling sensitivity.
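As an illustration of the "scale out rapidly, scale in slowly" scenario, the sketch below adds a behavior block to the nginx-hpa example from this topic. The field names come from the autoscaling/v2 API. The specific window lengths and policy values are assumptions chosen for illustration and should be tuned for your workload.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0     # act on scale-out recommendations immediately
      policies:
      - type: Percent
        value: 100                      # add at most 100% of the current replicas
        periodSeconds: 15               # within any 15-second period
    scaleDown:
      stabilizationWindowSeconds: 300   # use the highest recommendation from the last 5 minutes
      policies:
      - type: Pods
        value: 1                        # remove at most 1 Pod
        periodSeconds: 60               # within any 60-second period
      # To disable scale-in entirely, replace the policies above with:
      # selectPolicy: Disabled
```

A longer scaleDown stabilization window trades some idle capacity for fewer replica changes, which tends to suit workloads whose load fluctuates frequently.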
FAQ
-
Why is the `current` field for HPA monitoring data displayed as `unknown`?
-
What should I do if HPA scaling fails due to abnormal metric retrieval?
-
Why does HPA scale out extra pods during a rolling deployment?
-
Is CronHPA compatible with HPA? How do I make it compatible?
-
How do I prevent HPA from scaling out extra pods due to high CPU or memory usage during startup?
-
Why does HPA scale even if the audit log values do not reach the threshold?
-
Can I determine the pod scale-in order during an HPA scale-in?
-
What should I do if the `target` column is `unknown` after I run `kubectl get hpa`?
-
How do you adapt after customizing the NGINX Ingress log format?
-
How do I obtain the sls_ingress_qps metric using the command line?
-
How do I manage a VPA that is installed with kubectl using the console?
Related documentation
Other related operations
-
To implement Horizontal Pod Autoscaling based on metrics from Alibaba Cloud components by using Kubernetes External Metrics, see Horizontal Pod Autoscaling based on Nginx Ingress metrics.
-
To convert metrics from Managed Service for Prometheus into HPA-compatible metrics, see Horizontal Pod autoscaling based on metrics from Managed Service for Prometheus.
-
For HPA troubleshooting, see Node auto scaling FAQ.
-
If you need to coordinate CronHPA and HPA, see Use CronHPA with HPA.
Other workload scaling solutions
-
If your application has periodic resource usage patterns and you need to scale pods based on a schedule, see Use CronHPA for scheduled horizontal scaling.
-
If your application resource usage has cyclical changes that are difficult to define by using rules, you can use AHPA to automatically identify workload patterns based on historical metrics and perform Pod auto scaling. For more information, see Auto Scaling Prediction (AHPA).
-
To automatically set resource limits for pods based on their resource usage, see Use Vertical Pod Autoscaler (VPA).
-
To create flexible scaling policies based on events from message queues, schedules, or custom metrics, see Event-driven autoscaling.
Combined solutions
You can use HPA with the node auto scaling feature to automatically scale cluster nodes when resources are insufficient. For more information, see Enable node auto scaling.