Scale Pods based on CPU and memory using Horizontal Pod Autoscaling (HPA)

Before you begin

To use HPA effectively, we recommend reading the official Kubernetes documentation on Horizontal Pod Autoscaling to understand its fundamental principles, algorithms, and configurable scaling behaviors.

Additionally, Alibaba Cloud Container Service for Kubernetes (ACK) clusters offer various workload scaling (scheduling layer elasticity) and node scaling (resource layer elasticity) solutions. Before you proceed, read Auto scaling to understand the use cases and limitations of each solution.

Prerequisites

You have created an ACK managed cluster or an ACK dedicated cluster. For more information, see Create a Kubernetes cluster.
If you use kubectl commands to implement HPA, you must connect to the Kubernetes cluster with kubectl. For more information, see Obtain the KubeConfig of a cluster and use kubectl to connect to the cluster.

Create an HPA application in the console

ACK is integrated with HPA, allowing you to create HPA-enabled applications from the console. You can enable HPA when you create a new application or for an existing one. We recommend creating only one HPA per workload.

Enable HPA when creating an application

The following example uses a Deployment to show you how to enable HPA when you create an application. The steps for other workload types are similar.

Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of your cluster. In the left navigation pane, click Workloads > Deployments.
On the Deployments page, click Create from Image.
On the Create page, follow the on-screen instructions to configure the basic application information, containers, services, and scaling to create a Deployment that supports HPA.

For detailed steps and parameter descriptions, see Create a stateless workload (Deployment). The following section describes only the key parameters.
- Basic Information: Configure settings for the application, such as its name and number of replicas.
- Container: Configure the image and the required CPU and memory resources.
  
  The Resource profiling feature analyzes historical resource usage and provides recommendations for configuring container Requests and Limits. For more information, see Resource profiling.
  
  Important
  You must set resource Requests for the application to enable HPA.
- Advanced Settings:
  - In the Scaling section, select Enable for HPA, and configure the conditions and parameters for scaling.
    - Metric: The supported metrics are CPU and memory. The metric must match a resource type for which you configured required resources (Requests). When you specify both CPU and memory, HPA performs a scaling operation when either metric reaches its scaling threshold.
    - Trigger Condition: The resource utilization threshold, specified as a percentage. When resource utilization exceeds this threshold, containers are scaled out. For more information about the algorithm for Horizontal Pod Autoscaling, see Algorithm details.
    - Max. Replicas: The maximum number of replicas that the Deployment can be scaled out to. This value must be greater than the minimum number of replicas.
    - Min. Replicas: The minimum number of replicas for the Deployment. The value must be an integer that is greater than or equal to 1.
After the creation is complete, you can view the created Deployment on the Deployments page. Click the Deployment name, and then on the Deployment details page, click the Pod Scaling tab. In this section, you can view metrics related to HPA activity, such as CPU or memory utilization and the maximum or minimum number of replicas, and manage the HPA by updating its configuration or disabling it.
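
To make the Trigger Condition setting more concrete, the following worked example applies the replica formula from the Kubernetes HPA algorithm documentation. The input numbers (2 replicas, 90% average CPU utilization, a 50% threshold) are illustrative assumptions, not values from this guide. Utilization is measured relative to the configured Requests, so a Pod that requests 500m of CPU and uses 450m on average is at 90%.

```latex
% Replica formula used by the Horizontal Pod Autoscaler
\[
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]
% Assumed inputs: currentReplicas = 2, currentMetricValue = 90 (%), desiredMetricValue = 50 (%)
\[
\left\lceil 2 \times \frac{90}{50} \right\rceil = \lceil 3.6 \rceil = 4
\]
```

In this sketch the Deployment would scale out from 2 to 4 replicas, provided that Max. Replicas is at least 4. By default, Kubernetes skips scaling when the ratio of the current value to the target value is within a small tolerance of 1.0.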

Enable HPA for an existing application

This example shows how to enable HPA for an existing Deployment. The steps for other workload types are similar.

Workload page

Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of your cluster. In the left navigation pane, click Workloads > Deployments.
On the Deployments page, click the name of the target application, click the Pod Scaling tab, and then click Create in the HPA area.
In the Create dialog box, configure the scaling settings as prompted on the page.
- Name: The name of the HPA policy.
- Metric: Click Add.
  - Metric Name: Supports CPU and memory. The metric must match the resource type that you have configured. When you specify both CPU and memory, the HPA performs a scaling operation as soon as either metric reaches its scaling threshold.
  - Threshold: The percentage of resource utilization above which the containers start to scale out. For more information about the Horizontal Pod Autoscaling algorithm, see Algorithm details.

- Max. Containers: The maximum number of replicas to which the Deployment can scale out. This value must be greater than the minimum number of replicas.
- Min. Containers: The minimum number of replicas to which the Deployment can scale in. The value must be an integer greater than or equal to 1.

After the configuration is complete, you can click the Deployment name on the Deployments page, and then click the Pod Scaling tab on the Deployment details page. In this area, you can view metrics related to HPA activity, such as CPU or memory utilization and the maximum or minimum number of replicas, and manage the HPA by updating its configuration or disabling it.

Workload scaling page

Note

This page is available only to allowlisted users. To request access, submit a ticket.

Log on to the ACK console. In the left navigation pane, click Clusters.
On the Clusters page, click the name of your cluster. In the left navigation pane, click Workload Scaling.
In the top-right corner of the page, click Create Auto Scaling, and then click the HPA and CronHPA tab. Select the target workload, and in the Configure Auto Scaling Policy section, select HPA and configure the HPA policy as prompted.
- Scaling Policy Name: The name of the HPA policy.
- Min. Containers: The minimum number of replicas for the workload. This value must be an integer greater than or equal to 1.
- Max. Containers: The maximum number of replicas to which the workload can be scaled out. This value must be greater than the minimum number of replicas.
- Metric Name: Supported metrics include CPU, GPU, memory, Nginx Ingress requests, and custom metrics. The metric type must match the specified resource type. If you specify multiple metrics, the Horizontal Pod Autoscaler (HPA) performs a scaling operation when any of the metrics reaches its threshold.
- Threshold: The percentage of resource utilization. If the utilization exceeds this value, containers start to scale out. For more information about the Horizontal Pod Autoscaling algorithm, see Algorithm details.

After the creation is complete, you can view the list of HPAs on the Workload Scaling page. In the Actions column, you can view metrics related to HPA activity, such as resource utilization and the maximum or minimum number of replicas, and manage the HPA by updating its configuration or disabling it.

Verify the results

On the Clusters page, click the name of your cluster. In the left navigation pane, click Workload Scaling.
Click the Horizontal Scaling tab and then select HPA to view the scaling status and task list.

Note

In a production environment, the application scales based on pod load. You can also run a stress test on the pods in a test environment to verify the horizontal scaling behavior.

Create an HPA application by using kubectl

You can also use an orchestration template to manually create an HPA and bind it to the target Deployment. We recommend creating only one HPA per workload. This example shows how to deploy an Nginx application that supports HPA.

Create a file named nginx.yml and copy the following content into it.

Important

To implement HPA, you must set resource requests for the Pod. Otherwise, HPA cannot run. You can use the resource profiling feature to analyze historical resource usage data and obtain recommendations for configuring container Requests and Limits. For more information, see Resource profiling.

YAML template

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9 # Replace with your actual <image_name:tags>.
        ports:
        - containerPort: 80
        resources:
          requests:         # You must set requests. Otherwise, HPA cannot calculate metrics, and the status will be 'unknown'.
            cpu: 500m
```

Run the following command to create the Nginx application.
```
kubectl apply -f nginx.yml
```

Create a file named hpa.yml and copy the following content into it to create the HPA.

Use scaleTargetRef to specify the target object for the HPA. In this example, the HPA is bound to a Deployment named nginx and triggers a scaling operation when the average CPU utilization of the containers in all Pods reaches 50%.

Kubernetes 1.24 and later

```
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1  # The minimum number of replicas for the Deployment. Must be an integer greater than or equal to 1.
  maxReplicas: 10  # The maximum number of replicas for the Deployment. Must be greater than minReplicas.
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50 # The target average resource utilization, which is the ratio of average resource usage to the requested resource amount.
```

Kubernetes versions earlier than 1.24

```
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1  # Must be an integer greater than or equal to 1.
  maxReplicas: 10  # Must be greater than the minimum number of replicas.
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

If you need to specify both CPU and memory metrics, you can specify both cpu and memory resource types in the metrics field instead of creating two HPAs. When the HPA detects that any of the metrics reaches the scaling threshold, it will perform a scaling operation.

```
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 50
```
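
Because utilization is calculated against the requested amount, a memory target most likely also needs a memory request on the container. The nginx.yml example in this topic sets only a cpu request. The following fragment is a minimal sketch of how the container's resources section might look when both metrics are used. The 256Mi value is an illustrative assumption, not a recommendation.

```yaml
# Fragment of the container spec in nginx.yml (values are illustrative assumptions).
resources:
  requests:
    cpu: 500m       # Basis for the cpu utilization percentage.
    memory: 256Mi   # Assumed value. Lets HPA calculate memory utilization.
```

Choose request values that reflect the actual usage of your application, for example by using the Resource profiling recommendations.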

Run the following command to create the HPA.

```
kubectl apply -f hpa.yml
```

Run the kubectl describe hpa <HPA name> command. In this example, the HPA name is nginx-hpa. While the HPA is still being deployed, the output is expected to contain warning messages similar to the following. You can run the kubectl get hpa command to check the status of the HPA.

```
Warning  FailedGetResourceMetric       2m (x6 over 4m)  horizontal-pod-autoscaler  missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5897-mqzs7
Warning  FailedComputeMetricsReplicas  2m (x6 over 4m)  horizontal-pod-autoscaler  failed to get cpu utilization: missing request for cpu on container nginx in pod default/nginx-deployment-basic-75675f5
```

After the HPA is created and the Pod meets the scaling conditions (in this example, the CPU utilization of the Nginx Pod exceeds 50%), run the kubectl describe hpa <HPA name> command again to view the horizontal scaling status.

The following output indicates that the HPA is running correctly.
```
Type    Reason             Age   From                       Message
----    ------             ----  ----                       -------
Normal  SuccessfulRescale  5m6s  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
```
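
To observe a scale-out in a test environment, you can generate load against the Nginx Pods. The following manifest is a rough sketch modeled on the load generator in the upstream Kubernetes HPA walkthrough. The Service name nginx, the Pod name load-generator, and the busybox:1.28 image are assumptions made for illustration and are not part of the original example, which does not define a Service.

```yaml
# Hypothetical Service that exposes the nginx Deployment inside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  selector:
    app: nginx          # Matches the Pod label used in nginx.yml.
  ports:
  - port: 80
    targetPort: 80
---
# Hypothetical load generator that requests the Service in a loop.
apiVersion: v1
kind: Pod
metadata:
  name: load-generator
spec:
  restartPolicy: Never
  containers:
  - name: load
    image: busybox:1.28 # Assumed image. Any image that provides sh and wget should work.
    command: ["/bin/sh", "-c", "while sleep 0.01; do wget -q -O- http://nginx; done"]
```

A single generator may not push a static Nginx page above the 50% threshold, so you might need several of these Pods. Delete the load-generator Pod after the test so that the HPA can scale the Deployment back in.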

Related operations

If the default scaling behavior cannot meet your business requirements, you can use the behavior field to configure the scale-down (scaleDown) and scale-up (scaleUp) behaviors with finer granularity. For more information, see Configurable scaling behavior.

Typical scenarios for behavior include but are not limited to:

Rapidly scale out during sudden traffic spikes.
Rapidly scale out but slowly scale in for workloads with frequent fluctuations.
Disable scale-in for state-sensitive applications.
Limit the scale-out rate in resource-constrained or cost-sensitive scenarios by using the stabilization window (stabilizationWindowSeconds), which reduces frequent adjustments caused by temporary fluctuations.

For a description and examples of the behavior configuration, see Tune HPA scaling sensitivity.
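
As an illustration of the "scale out rapidly, scale in slowly" scenario, the following sketch adds a behavior section to the nginx-hpa example from this topic. The window lengths and policy values are assumptions chosen for readability. Tune them for your workload.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0    # React to load increases immediately.
      policies:
      - type: Percent
        value: 100                     # At most double the replicas...
        periodSeconds: 15              # ...every 15 seconds.
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 minutes before scaling in.
      policies:
      - type: Pods
        value: 1                       # Remove at most 1 Pod...
        periodSeconds: 60              # ...per minute.
```

To disable scale-in entirely, for example for state-sensitive applications, you can set selectPolicy: Disabled under scaleDown instead of defining policies.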
