This topic describes common issues and solutions that you may encounter when you use the workload scaling feature, including Horizontal Pod Autoscaler (HPA), CronHPA, and Vertical Pod Autoscaler (VPA).
Why does the current field in HPA metrics show unknown?
If the current field in the HPA monitoring data shows unknown, HPA scaling fails because kube-controller-manager cannot access the monitoring data source to retrieve the metrics. The output of kubectl describe hpa is similar to the following:
Name:                     kubernetes-tutorial-deployment
Namespace:                default
Labels:                   <none>
Annotations:              <none>
CreationTimestamp:        Mon, 10 Jun 2019 11:46:48 +0530
Reference:                Deployment/kubernetes-tutorial-deployment
Metrics:                  ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 2%
Min replicas:             1
Max replicas:             4
Deployment pods:          1 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
Events:
  Type     Reason                   Age                      From                       Message
  ----     ------                   ----                     ----                       -------
  Warning  FailedGetResourceMetric  3m3s (x1009 over 4h18m)  horizontal-pod-autoscaler  unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server is currently unable to handle the request (get pods.metrics.k8s.io)
Cause 1: The resource metrics data source is unavailable
First, run kubectl top pod to check whether metric data is returned. If no data is returned for any pod, run kubectl get apiservice to check the status of the data source that provides the Resource Metrics API. In the output, check which Service backs v1beta1.metrics.k8s.io (the SERVICE column) and whether it is available (the AVAILABLE column).
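If you prefer to check the APIService object from a script, the following Python sketch takes the parsed JSON of kubectl get apiservice v1beta1.metrics.k8s.io -o json and reports whether it points to kube-system/metrics-server and whether its Available condition is True. The function name check_metrics_apiservice and the sample object (a hypothetical adapter in a monitoring namespace) are illustrative only.

```python
import json


def check_metrics_apiservice(apiservice: dict) -> list[str]:
    """Return the problems found in a v1beta1.metrics.k8s.io APIService object."""
    problems = []
    service = apiservice.get("spec", {}).get("service") or {}
    backend = f'{service.get("namespace")}/{service.get("name")}'
    if backend != "kube-system/metrics-server":
        problems.append(f"APIService points to {backend}, not kube-system/metrics-server")
    conditions = apiservice.get("status", {}).get("conditions", [])
    available = next((c for c in conditions if c.get("type") == "Available"), None)
    if available is None or available.get("status") != "True":
        problems.append("APIService is not Available")
    return problems


# Hypothetical object, shaped like the output of
# kubectl get apiservice v1beta1.metrics.k8s.io -o json
sample = json.loads("""
{
  "spec": {"service": {"name": "prometheus-adapter", "namespace": "monitoring"}},
  "status": {"conditions": [{"type": "Available", "status": "True"}]}
}
""")
print(check_metrics_apiservice(sample))
```

An empty list means the APIService is backed by metrics-server and reported as available.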
If the APIService v1beta1.metrics.k8s.io does not point to kube-system/metrics-server, check whether it was overwritten when Prometheus Operator was installed. If so, restore it by deploying the following YAML template.
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100

If the issue is not caused by the preceding reason, go to the Operations > Add-ons page for your cluster and verify that the metrics-server component is installed. For more information, see metrics-server.
Cause 2: Data cannot be retrieved during a rolling update or scale-out
By default, metrics-server collects metrics every 1 minute. After a scale-out or update, metrics-server cannot return metrics for the new pods for a short period. Check the metrics about 2 minutes after the scaling or update is complete.
Cause 3: The request field is not configured
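By default, HPA calculates utilization as actual usage divided by the request. If a container in the pod does not set a request for the metric (for example, resources.requests.cpu), HPA cannot compute utilization. Check whether the resources field of each container in the pod contains a requests field.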
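As a minimal sketch of this calculation, the following Python code shows why a missing request leaves HPA with no utilization value. It assumes CPU values are already plain millicore integers and ignores HPA's tolerance and pod readiness handling; the function name utilization_percent is hypothetical.

```python
from typing import Optional


def utilization_percent(usage_millicores: int, request_millicores: Optional[int]) -> Optional[float]:
    """Return usage as a percentage of the request, or None if no request is set.

    Utilization is usage / request, so without a request there is no denominator.
    """
    if not request_millicores:  # None or 0: utilization is undefined
        return None
    return usage_millicores * 100 / request_millicores


# A container using 250m CPU with a 500m request is at 50% utilization.
print(utilization_percent(250, 500))   # 50.0
# The same container without a request has no utilization value.
print(utilization_percent(250, None))  # None
```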
Cause 4: The metric name is incorrect
Verify that the metric name is correct, including its case. For example, if you mistype the cpu metric supported by HPA as CPU, the current field in the monitoring data will show unknown.
What should I do if HPA scaling fails due to metric collection errors?
HPA scaling fails if metrics cannot be retrieved. In this case, the current field in the HPA monitoring data shows unknown, and because HPA cannot obtain the metrics it needs to make scaling decisions, it does not adjust the number of pods. See Node Autoscaling FAQ to troubleshoot the issue and find a solution.
Why does HPA create extra pods during a rolling update?
During a rolling update, the community-provided controller manager fills in zero values for pods that do not have metric data. This can sometimes cause HPA to create more pods than necessary. To prevent this behavior, use one of the following configurations.
Cluster-level configuration
Upgrade the ACK metrics-server to the latest version and add the following startup parameter.
This is a global setting that affects all relevant workloads in the cluster.
# Add the following option to the metrics-server startup parameters.
--enable-hpa-rolling-update-skipped=true
Workload-level configuration
To prevent this behavior for specific workloads, use one of the following methods.
Method 1: Add the following annotation to the template of the target workload. This temporarily suspends HPA evaluation during a rolling update.
# Add this annotation to spec.template.metadata.annotations to temporarily suspend HPA evaluation during a rolling update.
HPARollingUpdateSkipped: "true"
Method 2: Add the following annotation to the template of the target workload. This tells HPA to ignore the pod for a specified warm-up period after it starts.
# Add this annotation to spec.template.metadata.annotations to skip HPA evaluation for a specified warm-up period.
HPAScaleUpDelay: 3m # The value 3m is an example. Set the duration based on your needs.
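One way to add either annotation to an existing workload is a JSON merge patch on spec.template.metadata.annotations. The following Python sketch only builds the patch body; the helper name annotation_patch is hypothetical, and the values are the ones shown above.

```python
import json


def annotation_patch(annotations: dict[str, str]) -> str:
    """Build a JSON merge patch that sets pod template annotations on a workload."""
    patch = {"spec": {"template": {"metadata": {"annotations": annotations}}}}
    return json.dumps(patch, separators=(",", ":"))


# Method 1: suspend HPA evaluation during rolling updates.
print(annotation_patch({"HPARollingUpdateSkipped": "true"}))
# Method 2: skip HPA evaluation for a 3-minute warm-up period (example value).
print(annotation_patch({"HPAScaleUpDelay": "3m"}))
```

You could then apply the output with kubectl patch deployment <your-deployment> --type merge -p '<patch>'. Because this changes the pod template, the Deployment rolls out new pods once.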
Why does HPA not scale when the threshold is reached?
HPA does not decide to scale based only on whether CPU or memory usage is above or below the threshold. It also considers whether a scale-out or scale-in would immediately trigger the opposite action, which prevents rapid fluctuations, also known as thrashing.
For example, assume the target CPU utilization is 80% and you have two pods, each at 70% CPU usage. HPA does not scale in. If it did, the combined load of the two pods (2 × 70% = 140% of one pod's request) would fall on the single remaining pod, which would immediately exceed 80% and trigger a scale-out.
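The following Python sketch reproduces this example with the upstream Kubernetes replica formula and its default tolerance of 0.1 (the --horizontal-pod-autoscaler-tolerance flag). It assumes all pods are ready and report metrics, so it is a simplification rather than the exact behavior of the ACK controller.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, tolerance: float = 0.1) -> int:
    """Simplified HPA formula: ceil(current_replicas * current / target).

    If the ratio is within the tolerance of 1.0, the replica count is unchanged.
    """
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: no scaling
    return math.ceil(current_replicas * ratio)


# Two pods at 70% with an 80% target: ceil(2 * 0.875) = ceil(1.75) = 2, so no scale-in.
print(desired_replicas(2, 70, 80))   # 2
# One pod carrying the same total load (140%): ceil(1 * 1.75) = 2, an immediate scale-out.
print(desired_replicas(1, 140, 80))  # 2
```

Upstream HPA also applies a scale-down stabilization window (5 minutes by default), which this sketch omits.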
How do I configure the HPA metric collection interval?
For metrics-server versions later than v0.2.1-b46d98c-aliyun, set the --metric-resolution startup parameter, for example --metric-resolution=15s.
Is CronHPA compatible with HPA?
Yes, CronHPA is compatible with HPA. In Container Service for Kubernetes (ACK), you can set the scaleTargetRef of a CronHPA to an HPA object. CronHPA then uses the HPA object to find the actual scaleTargetRef, which makes CronHPA aware of the HPA's current state. CronHPA does not directly adjust the replica count of a Deployment. Instead, it operates through the HPA, which prevents conflicts between the two controllers. For more information, see Coordinate CronHPA and HPA.
How do I prevent HPA from creating extra pods due to CPU or memory spikes at startup?
For applications that require a warm-up period, such as those written in Java, CPU and memory usage can spike for several minutes after a container starts. This can cause HPA to scale out unnecessarily. To resolve this, upgrade the metrics-server component provided by ACK to version 0.3.9.6 or later and add an annotation to your pods to prevent false triggers. For more information about how to upgrade the metrics-server component, see Upgrade the metrics-server component before you upgrade a cluster to Kubernetes 1.12.
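For example, add the HPAScaleUpDelay annotation to spec.template.metadata.annotations of the workload, such as HPAScaleUpDelay: 3m, so that HPA ignores the pod for the first 3 minutes after it starts. Set the duration based on how long your application takes to warm up.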
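To illustrate why a warm-up delay helps, the following Python sketch averages CPU utilization only over pods that are older than a warm-up period. It is a conceptual model under stated assumptions (equal requests, made-up pod names and numbers), not a description of how ACK implements HPAScaleUpDelay internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PodSample:
    name: str
    age_seconds: int
    cpu_utilization: float  # percent of the CPU request


def average_utilization(pods: list[PodSample], warmup_seconds: int) -> Optional[float]:
    """Average CPU utilization over pods that have finished the warm-up period."""
    warmed_up = [p for p in pods if p.age_seconds >= warmup_seconds]
    if not warmed_up:
        return None  # no pod has finished warming up; nothing to evaluate
    return sum(p.cpu_utilization for p in warmed_up) / len(warmed_up)


pods = [
    PodSample("app-a", age_seconds=600, cpu_utilization=40),
    PodSample("app-b", age_seconds=600, cpu_utilization=50),
    PodSample("app-c", age_seconds=30, cpu_utilization=190),  # new JVM pod still warming up
]
print(average_utilization(pods, warmup_seconds=0))    # about 93.3, startup spike included
print(average_utilization(pods, warmup_seconds=180))  # 45.0, startup spike ignored
```

With an 80% target, the first average would trigger a scale-out, while the second would not.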
Why did HPA scale my workload even if the metric in the audit log did not reach the threshold?
Cause
HPA calculates the scaling ratio from the current metric and the target metric: desired replicas = ceil(current replicas × (current metric / target metric)).
The desired replica count is therefore only as accurate as the current replica count, the current metric value, and the target metric value. Take resource metrics, the most widely used HPA metrics, as an example. To get the current replica count, HPA first reads the scale subresource of the object specified by scaleTargetRef. It then converts the selector in the status of the scale object into a label selector and uses it to list the matching pods. If the pods matched by this selector do not belong exclusively to the scaleTargetRef object, the calculated desired replica count can be wrong. For example, HPA may scale out even though the real-time metric of the workload is below the threshold.
Common reasons for an inaccurate pod count include:
A rolling update is in progress.
Other pods that do not belong to the scaleTargetRef object have the same label. Run the following command to check for such pods:
kubectl get pods -n {your-namespace} -l {value-of-status.selector}
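To see how a stray pod skews the result, the following Python sketch applies the formula above to every pod matched by the selector. It assumes all matched pods have the same CPU request and ignores tolerance and readiness handling; the utilization values are made up for illustration.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def desired_from_pods(utilizations: list[int], target: int) -> int:
    """ceil(pod_count * average / target), which equals ceil(sum / target)."""
    return ceil_div(sum(utilizations), target)


target = 50  # target CPU utilization in percent
workload_pods = [40, 40]        # pods owned by the scaleTargetRef object
stray_pod = [100]               # unrelated pod that happens to share the label
print(desired_from_pods(workload_pods, target))              # 2: no change needed
print(desired_from_pods(workload_pods + stray_pod, target))  # 4: scale-out despite 40% usage
```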
Solution
For issues related to rolling updates, see "Why does HPA create extra pods during a rolling update?"
If other pods have the same label, locate these pods. If the pods are still in use, change their labels. If they are no longer needed, delete them.
Can I control the pod termination order during an HPA scale-in?
HPA automatically increases or decreases the number of pods based on defined metrics, but it does not directly decide which pods are terminated first. The pod termination order and graceful shutdown period are determined by the controller that manages the pods, such as a Deployment.
In scenarios where you use a mix of compute resources, such as ECS instances and serverless ECI instances or multiple node pools, you can specify the scale-in priority by configuring a custom ResourcePolicy for your application. By using HPA with a Deployment and a custom ResourcePolicy, you can prioritize the termination of pods on serverless ECI nodes over those on ECS nodes. For more information, see Custom priority-based scheduling of elastic resources.
What does the unit of an HPA utilization metric mean?
Usage metrics are typically unitless integers or integers that use m as the unit, where the conversion ratio is 1000m=1. For example, when tcp_connection_counts is 70000m, it is equivalent to 70.
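The following Python sketch converts such values to numbers. It handles only the plain and m forms described here (1000m = 1); other Kubernetes quantity suffixes, such as k or Mi, are out of scope. The function name parse_metric_quantity is hypothetical.

```python
from fractions import Fraction


def parse_metric_quantity(value: str) -> Fraction:
    """Convert a metric value such as "70000m" or "70" to an exact number."""
    if value.endswith("m"):
        return Fraction(value[:-1]) / 1000  # milli-units: 1000m = 1
    return Fraction(value)


print(parse_metric_quantity("70000m"))        # 70
print(float(parse_metric_quantity("1500m")))  # 1.5
```

Fraction keeps the conversion exact, so 70000m compares equal to 70 without floating-point rounding.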
What should I do if the kubectl get hpa command shows unknown in the TARGETS column?
Follow these steps to resolve the issue.
1. Run kubectl describe hpa <hpa_name> to determine the cause of the HPA failure.
   - If the Conditions field shows that AbleToScale is False, verify that the Deployment is deployed correctly.
   - If the Conditions field shows that ScalingActive is False, continue to the next step.
2. Run kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/". If the command returns Error from server (NotFound): the server could not find the requested resource, check the status of alibaba-cloud-metrics-adapter.
   If alibaba-cloud-metrics-adapter is running properly, check whether the HPA metric is related to an Ingress. If so, deploy the Log Service component first. For more information, see Collect and analyze NGINX Ingress access logs.
3. Verify that the HPA metric is specified correctly. The value of sls.ingress.route must be in the format <namespace>-<svc>-<port>.
   - namespace: The namespace where the Ingress is located.
   - svc: The name of the Service that corresponds to the Ingress.
   - port: The name of the port on the Service that the Ingress maps to.
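Step 1 can also be scripted. The following Python sketch reads the status conditions from the parsed output of kubectl get hpa <hpa_name> -o json and suggests the next step. It assumes an autoscaling/v2 object, where conditions are stored under status.conditions; the function name diagnose_hpa and the sample object are illustrative.

```python
def diagnose_hpa(hpa: dict) -> str:
    """Suggest the next troubleshooting step from an HPA's status conditions."""
    conditions = {
        c.get("type"): c.get("status")
        for c in hpa.get("status", {}).get("conditions", [])
    }
    if conditions.get("AbleToScale") == "False":
        return "Verify that the Deployment is deployed correctly."
    if conditions.get("ScalingActive") == "False":
        return "Query /apis/external.metrics.k8s.io/v1beta1/ and check alibaba-cloud-metrics-adapter."
    return "No blocking condition reported."


sample = {
    "status": {
        "conditions": [
            {"type": "AbleToScale", "status": "True"},
            {"type": "ScalingActive", "status": "False"},
        ]
    }
}
print(diagnose_hpa(sample))
```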
How do I find the metrics supported by HPA?
For a list of supported metrics, see Alibaba Cloud HPA metrics. The following table describes some commonly used metrics.
Metric name | Description | Additional parameter |
sls_ingress_qps | The queries per second (QPS) for a specified Ingress route. | sls.ingress.route |
sls_alb_ingress_qps | The QPS for an ALB Ingress route. | sls.ingress.route |
sls_ingress_latency_avg | The average latency of all requests. | sls.ingress.route |
sls_ingress_latency_p50 | The 50th percentile request latency. | sls.ingress.route |
sls_ingress_latency_p95 | The 95th percentile request latency. | sls.ingress.route |
sls_ingress_latency_p99 | The 99th percentile request latency. | sls.ingress.route |
sls_ingress_latency_p9999 | The 99.99th percentile request latency. | sls.ingress.route |
sls_ingress_inflow | The inbound bandwidth of the Ingress. | sls.ingress.route |
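These metrics are consumed through the External metric type of the autoscaling/v2 HPA API. The following Python sketch builds one entry of an HPA's spec.metrics list for sls_ingress_qps. The selector labels mirror the ones used in this topic; the placeholder values and the assumption that no other labels are required for your cluster should be checked against Scale pods based on NGINX Ingress metrics.

```python
import json


def external_qps_metric(project: str, route: str, average_value: str) -> dict:
    """Build an autoscaling/v2 External metric entry for sls_ingress_qps."""
    return {
        "type": "External",
        "external": {
            "metric": {
                "name": "sls_ingress_qps",
                "selector": {
                    "matchLabels": {
                        "sls.project": project,
                        "sls.logstore": "nginx-ingress",
                        "sls.ingress.route": route,  # <namespace>-<svc>-<port>
                    }
                },
            },
            # Scale so that each pod handles about average_value queries per second.
            "target": {"type": "AverageValue", "averageValue": average_value},
        },
    }


# Placeholder values; replace them with your SLS project and Ingress route.
print(json.dumps(external_qps_metric("k8s-log-<ClusterId>", "<namespace>-<svc>-<port>", "10"), indent=2))
```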
How do I use HPA after customizing the NGINX Ingress log format?
To scale pods based on NGINX Ingress metrics from Log Service, enable and correctly configure NGINX Ingress log collection for your cluster. For more information, see Scale pods based on NGINX Ingress metrics.
When you create a cluster, Log Service is enabled by default. If you keep the default settings, you can view NGINX Ingress access log analysis reports and monitor the real-time status of your NGINX Ingress in the Log Service console after the cluster is created.
If you manually disabled Log Service when you created the cluster, re-enable or configure it to use Log Service Ingress metrics for pod scaling. For more information, see Collect and analyze NGINX Ingress access logs.
The AliyunLogConfig CRD that is deployed when you first enable Log Service in a cluster applies only to the log format of the default ACK Ingress Controller. If you modify the access log format of the Ingress Controller, you must also modify the processor_regex section in the CRD configuration. For more information, see Collect container logs by using a DaemonSet-CRD configuration.
How do I get the sls_ingress_qps metric from the command line?
Use the following command to query the sls_ingress_qps metric:
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/*/sls_ingress_qps?labelSelector=sls.project={{SLS_Project}},sls.logstore=nginx-ingress"
Here, {{SLS_Project}} is the name of the SLS project for the ACK cluster. If you do not specify a custom name, the default is k8s-log-{{ClusterId}}, where {{ClusterId}} is the cluster ID.
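If you run this query for several clusters, a small helper can assemble the path. The following Python sketch applies the default project naming rule described above; the helper name sls_qps_raw_path and the cluster ID c1234 are made up for illustration.

```python
def sls_qps_raw_path(cluster_id: str, project: str = "", metric: str = "sls_ingress_qps") -> str:
    """Build the path passed to kubectl get --raw for an SLS Ingress metric.

    Falls back to the default project name k8s-log-<cluster_id> when no
    custom project name is given.
    """
    project = project or f"k8s-log-{cluster_id}"
    return (
        f"/apis/external.metrics.k8s.io/v1beta1/namespaces/*/{metric}"
        f"?labelSelector=sls.project={project},sls.logstore=nginx-ingress"
    )


print(sls_qps_raw_path("c1234"))
```

Pass the result in double quotes, as in kubectl get --raw "<path>", so that the shell does not expand the * or interpret the ?.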
If the result is:
Error from server: {
  "httpCode": 400,
  "errorCode": "ParameterInvalid",
  "errorMessage": "key (slb_pool_name) is not config as key value config,if symbol : is in your log,please wrap : with quotation mark \"",
  "requestID": "xxxxxxx"
}
This indicates that no data is available for this metric. This might happen if you query the sls_alb_ingress_qps metric but are not using an ALB Ingress.
If the result is similar to the following:
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "sls_ingress_qps",
      "timestamp": "2023-02-26T16:45:00Z",
      "value": "50",
      "metricLabels": {
        "sls.project": "your-sls-project-name",
        "sls.logstore": "nginx-ingress"
      }
    }
  ]
}
This indicates that the Kubernetes external metric is retrieved, where value is the QPS value.
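To use the result in a script, the following Python sketch parses an ExternalMetricValueList response and returns each metric name with its value. It assumes values are either plain numbers or use the m suffix (1000m = 1); the function name metric_values is hypothetical, and the sample is the response shown above.

```python
import json
from fractions import Fraction


def metric_values(response_text: str) -> list[tuple[str, Fraction]]:
    """Extract (metricName, value) pairs from an ExternalMetricValueList response."""
    data = json.loads(response_text)
    if data.get("kind") != "ExternalMetricValueList":
        raise ValueError("unexpected response kind")
    result = []
    for item in data.get("items", []):
        raw = item["value"]
        value = Fraction(raw[:-1]) / 1000 if raw.endswith("m") else Fraction(raw)
        result.append((item["metricName"], value))
    return result


sample = """{"kind": "ExternalMetricValueList",
 "apiVersion": "external.metrics.k8s.io/v1beta1", "metadata": {},
 "items": [{"metricName": "sls_ingress_qps", "timestamp": "2023-02-26T16:45:00Z",
            "value": "50", "metricLabels": {"sls.project": "your-sls-project-name",
                                            "sls.logstore": "nginx-ingress"}}]}"""
for name, value in metric_values(sample):
    print(f"{name}: {float(value)}")  # sls_ingress_qps: 50.0
```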
Failed to pull alibaba-cloud-metrics-adapter image
Symptom
When you upgrade the ack-alibaba-cloud-metrics-adapter add-on to version 1.3.7, the image pull fails. The error message is similar to the following:
Failed to pull image "registry-<region-id>-vpc.ack.aliyuncs.com/acs/alibaba-cloud-metrics-adapter-amd64:v0.2.9-ba634de-aliyun".
Cause
The ack-alibaba-cloud-metrics-adapter add-on does not support in-place upgrades.
Solution
Upgrade the add-on by following these steps:
Back up the current add-on configuration.
Uninstall the old version of the add-on.
Install the latest version of the add-on by using the backed-up configuration.
During the uninstallation and reinstallation process, related HPAs pause scaling because metric collection stops.