Service Mesh (ASM) collects telemetry data from ACK and ACS clusters non-intrusively, generating service metrics based on the four golden signals: latency, traffic, errors, and saturation. This topic explains how to configure a Horizontal Pod Autoscaler (HPA) that scales workloads based on these ASM metrics — going beyond CPU and memory to scale on real traffic patterns.
How it works
ASM exposes service metrics (such as requests per second) to Prometheus. A custom metrics adapter, kube-metrics-adapter, registers these Prometheus metrics with the Kubernetes aggregation layer, making them available to HPAs through the external metrics API.
The end-to-end flow:
ASM collects request metrics and writes them to Prometheus.
kube-metrics-adapter queries Prometheus and registers the metrics as external metrics in Kubernetes.
The HPA polls the external metrics API every 30 seconds and adjusts the replica count when the metric value crosses the threshold.
For a full list of ASM-generated metrics, see Istio Standard Metrics.
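The scaling decision in the last step of the flow above comes down to a little arithmetic. The following Python sketch approximates how the HPA controller derives a replica count for an External metric with an AverageValue target: it divides the total metric value by the per-replica target and rounds up, and it leaves the replica count unchanged when the current average is within a tolerance of the target. The function name, the 10% tolerance default, and the sample numbers are assumptions for illustration. The real controller also applies stabilization windows and scaling policies, which this sketch omits.

```python
import math


def desired_replicas(total_value, target_average, current_replicas,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA recommendation for an AverageValue target.

    total_value is the metric summed over the whole workload (for example,
    total requests per second); target_average is the per-replica target.
    """
    # Ratio of the current per-replica average to the target.
    usage_ratio = total_value / (target_average * current_replicas)
    if abs(1.0 - usage_ratio) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(total_value / target_average)
    # Clamp to the bounds configured on the HPA.
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical readings with a target of 10 requests per second per replica.
print(desired_replicas(50, 10, current_replicas=1))    # 5
print(desired_replicas(10.5, 10, current_replicas=1))  # 1 (within tolerance)
print(desired_replicas(500, 10, current_replicas=5))   # 10 (capped at max)
```

The clamp at the end is why a very high request rate never pushes the workload past maxReplicas.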
Prerequisites
Before you begin, ensure that you have:
An ACK cluster or ACS cluster. See Create an ACK managed cluster or Create an ACS cluster.
An ASM instance. See Create an ASM instance.
A Prometheus instance and a Grafana instance deployed in the clusters. See Use open source Prometheus to monitor an ACK cluster.
A Prometheus instance configured to monitor the ASM instance. See Monitor ASM instances by using a self-managed Prometheus instance.
Step 1: Enable Prometheus monitoring for the ASM instance
Follow the instructions in Collect metrics to Managed Service for Prometheus to enable Prometheus scraping for your ASM instance.
Step 2: Deploy the custom metrics adapter
The custom metrics adapter (kube-metrics-adapter) bridges Prometheus metrics and the Kubernetes external metrics API, so HPAs can query ASM metrics directly.
Install kube-metrics-adapter into the kube-system namespace using Helm 3. Set prometheus.url to the in-cluster address of your Prometheus instance. For the chart source, see kube-metrics-adapter.

| Parameter | Description |
|---|---|
| asm-custom-metrics | Helm release name |
| prometheus.url | In-cluster address of the Prometheus instance that scrapes ASM metrics |

    helm -n kube-system install asm-custom-metrics ./kube-metrics-adapter \
      --set prometheus.url=http://prometheus.istio-system.svc:9090

Verify that the adapter is running:

Check that the autoscaling/v2beta API group is registered:

    kubectl api-versions | grep "autoscaling/v2beta"

Expected output:

    autoscaling/v2beta

Check that the adapter pod is running:

    kubectl get po -n kube-system | grep metrics-adapter

Expected output:

    asm-custom-metrics-kube-metrics-adapter-85c6d5d865-2****   1/1   Running   0   19s

Check that the external metrics API is available (no metrics registered yet):

    kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

Expected output:

    {
      "kind": "APIResourceList",
      "apiVersion": "v1",
      "groupVersion": "external.metrics.k8s.io/v1beta1",
      "resources": []
    }
Step 3: Deploy a sample application
This step deploys a podinfo application and a load testing service in the test namespace, so you can later trigger and observe auto scaling.
Create the test namespace. See Manage namespaces and resource quotas.

Enable automatic sidecar proxy injection for the test namespace. See Enable automatic sidecar proxy injection.

Deploy the podinfo application. Create a file named podinfo.yaml with the following content, then apply it.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: podinfo
      namespace: test
      labels:
        app: podinfo
    spec:
      minReadySeconds: 5
      strategy:
        rollingUpdate:
          maxUnavailable: 0
        type: RollingUpdate
      selector:
        matchLabels:
          app: podinfo
      template:
        metadata:
          annotations:
            prometheus.io/scrape: "true"
          labels:
            app: podinfo
        spec:
          containers:
          - name: podinfod
            image: stefanprodan/podinfo:latest
            imagePullPolicy: IfNotPresent
            ports:
            - containerPort: 9898
              name: http
              protocol: TCP
            command:
            - ./podinfo
            - --port=9898
            - --level=info
            livenessProbe:
              exec:
                command:
                - podcli
                - check
                - http
                - localhost:9898/healthz
              initialDelaySeconds: 5
              timeoutSeconds: 5
            readinessProbe:
              exec:
                command:
                - podcli
                - check
                - http
                - localhost:9898/readyz
              initialDelaySeconds: 5
              timeoutSeconds: 5
            resources:
              limits:
                cpu: 2000m
                memory: 512Mi
              requests:
                cpu: 100m
                memory: 64Mi
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: podinfo
      namespace: test
      labels:
        app: podinfo
    spec:
      type: ClusterIP
      ports:
      - name: http
        port: 9898
        targetPort: 9898
        protocol: TCP
      selector:
        app: podinfo

Apply the manifest:

    kubectl apply -n test -f podinfo.yaml

Deploy the load testing service. Create a file named loadtester.yaml with the following content, then apply it.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: loadtester
      namespace: test
      labels:
        app: loadtester
    spec:
      selector:
        matchLabels:
          app: loadtester
      template:
        metadata:
          labels:
            app: loadtester
          annotations:
            prometheus.io/scrape: "true"
        spec:
          containers:
          - name: loadtester
            image: weaveworks/flagger-loadtester:0.18.0
            imagePullPolicy: IfNotPresent
            ports:
            - name: http
              containerPort: 8080
            command:
            - ./loadtester
            - -port=8080
            - -log-level=info
            - -timeout=1h
            livenessProbe:
              exec:
                command:
                - wget
                - --quiet
                - --tries=1
                - --timeout=4
                - --spider
                - http://localhost:8080/healthz
              timeoutSeconds: 5
            readinessProbe:
              exec:
                command:
                - wget
                - --quiet
                - --tries=1
                - --timeout=4
                - --spider
                - http://localhost:8080/healthz
              timeoutSeconds: 5
            resources:
              limits:
                memory: "512Mi"
                cpu: "1000m"
              requests:
                memory: "32Mi"
                cpu: "10m"
            securityContext:
              readOnlyRootFilesystem: true
              runAsUser: 10001
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: loadtester
      namespace: test
      labels:
        app: loadtester
    spec:
      type: ClusterIP
      selector:
        app: loadtester
      ports:
      - name: http
        port: 80
        protocol: TCP
        targetPort: http

Apply the manifest:

    kubectl apply -n test -f loadtester.yaml

Verify that both workloads are running:

    kubectl get pod -n test

Expected output (both pods show 2/2 Running, indicating the app container and the Istio sidecar are both ready):

    NAME                          READY   STATUS    RESTARTS   AGE
    loadtester-64df4846b9-nxhvv   2/2     Running   0          2m8s
    podinfo-6d845cc8fc-26xbq      2/2     Running   0          11m

Send a short burst of traffic to confirm the setup is working end-to-end:

    export loadtester=$(kubectl -n test get pod -l "app=loadtester" -o jsonpath='{.items[0].metadata.name}')
    kubectl -n test exec -it ${loadtester} -c loadtester -- hey -z 5s -c 10 -q 2 http://podinfo.test:9898

A successful response from hey confirms that the podinfo service is reachable through the mesh.
Step 4: Configure an HPA using ASM metrics
Define an HPA that scales the podinfo deployment based on the number of incoming requests per second, as measured by the istio_requests_total metric in Prometheus.
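To make the metric concrete: istio_requests_total is a counter, and a PromQL query of the form sum(rate(...[1m])) turns it into requests per second summed across all matching series. The Python sketch below approximates that calculation from raw samples. It is a simplification: Prometheus also extrapolates to the window boundaries, which is omitted here, and the sample data and function names are hypothetical.

```python
def simple_rate(samples):
    """Approximate per-second increase of one counter series.

    samples is a list of (timestamp_seconds, counter_value) pairs, oldest
    first. A drop in value is treated as a counter reset.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so the new value
        # itself is the increase since the reset.
        increase += (curr - prev) if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


def summed_rate(series):
    """Sum the per-series rates, mirroring the outer sum() in the query."""
    return sum(simple_rate(samples) for samples in series)


# Hypothetical samples from two podinfo replicas over a 60-second window.
replica_a = [(0, 100), (30, 250), (60, 400)]   # 300 requests in 60 s
replica_b = [(0, 900), (30, 1050), (60, 60)]   # reset between 30 s and 60 s
print(summed_rate([replica_a, replica_b]))     # 8.5
```

Summing across series is what makes the result a workload-wide request rate, which the HPA then divides by the replica count when it evaluates an AverageValue target.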
The HPA uses two Kubernetes constructs together:
An annotation that embeds the PromQL query and gives it a name (processed-requests-per-second).
A metric reference in spec.metrics that points to the named query and sets the scale threshold.
Create a file named hpa.yaml with the following content, then apply it:
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: podinfo
      namespace: test
      annotations:
        # The annotation key format is:
        #   metric-config.external.prometheus-query.prometheus/<query-name>
        # The query-name must match the value of the matchLabels selector below.
        metric-config.external.prometheus-query.prometheus/processed-requests-per-second: |
          sum(
            rate(
              istio_requests_total{
                destination_workload="podinfo",
                destination_workload_namespace="test",
                reporter="destination"
              }[1m]
            )
          )
    spec:
      maxReplicas: 10
      minReplicas: 1
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: podinfo
      metrics:
      - type: External
        external:
          metric:
            name: prometheus-query
            selector:
              matchLabels:
                query-name: processed-requests-per-second  # matches the annotation key suffix above
          target:
            type: AverageValue
            averageValue: "10"  # scale out when average RPS per replica exceeds 10

Apply the manifest:

    kubectl apply -f hpa.yaml

Key fields explained:
| Field | Value | Description |
|---|---|---|
| metric-config.external.prometheus-query.prometheus/<query-name> annotation | PromQL expression | Defines the query that kube-metrics-adapter runs against Prometheus. The <query-name> suffix must match the query-name label in spec.metrics. |
| query-name label | processed-requests-per-second | Links the annotation (the PromQL query) to the metric reference in spec.metrics. |
| averageValue | "10" | The HPA scales out when the average number of requests per second per replica exceeds 10. |
| minReplicas / maxReplicas | 1 / 10 | Replica count bounds. |
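Because a mismatch between the annotation suffix and the query-name label leaves the HPA without a metric, it can help to check the linkage before applying the manifest. The following Python sketch performs that check on an HPA represented as a plain dictionary (for example, the result of loading hpa.yaml with a YAML parser). The helper name unmatched_query_names is hypothetical, and the check covers only the annotation prefix described in this topic.

```python
ANNOTATION_PREFIX = "metric-config.external.prometheus-query.prometheus/"


def unmatched_query_names(hpa):
    """Return query-name labels that have no matching annotation."""
    annotations = hpa.get("metadata", {}).get("annotations", {})
    # Query names defined by annotations, taken from the key suffix.
    defined = {
        key[len(ANNOTATION_PREFIX):]
        for key in annotations
        if key.startswith(ANNOTATION_PREFIX)
    }
    missing = []
    for metric in hpa.get("spec", {}).get("metrics", []):
        if metric.get("type") != "External":
            continue
        selector = metric["external"]["metric"].get("selector", {})
        name = selector.get("matchLabels", {}).get("query-name")
        if name is not None and name not in defined:
            missing.append(name)
    return missing


# A trimmed-down version of the HPA from this step, as a dictionary.
# The PromQL text is replaced by a placeholder because only keys matter here.
hpa = {
    "metadata": {
        "annotations": {
            ANNOTATION_PREFIX + "processed-requests-per-second": "<PromQL query>",
        }
    },
    "spec": {
        "metrics": [
            {
                "type": "External",
                "external": {
                    "metric": {
                        "name": "prometheus-query",
                        "selector": {
                            "matchLabels": {
                                "query-name": "processed-requests-per-second"
                            }
                        },
                    }
                },
            }
        ]
    },
}
print(unmatched_query_names(hpa))  # []
```

An empty list means every External metric in the HPA refers to a query that an annotation defines.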
After applying the HPA, verify that the external metric is now registered:
    kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .

Expected output:

    {
      "kind": "APIResourceList",
      "apiVersion": "v1",
      "groupVersion": "external.metrics.k8s.io/v1beta1",
      "resources": [
        {
          "name": "prometheus-query",
          "singularName": "",
          "namespaced": true,
          "kind": "ExternalMetricValueList",
          "verbs": [
            "get"
          ]
        }
      ]
    }

The prometheus-query entry in resources confirms that kube-metrics-adapter has registered the metric and the HPA is active.
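This verification can also be scripted. The Python sketch below extracts the registered metric names from the JSON that the external metrics API returns, so a script can tell the empty response (before the HPA exists) from the populated one. The function name is hypothetical, and the two sample documents are condensed copies of the expected outputs in this topic.

```python
import json


def registered_metric_names(api_resource_list_json):
    """Return the metric names listed in an APIResourceList document."""
    document = json.loads(api_resource_list_json)
    return [resource["name"] for resource in document.get("resources", [])]


# Responses shaped like the expected outputs in this topic.
before_hpa = '''
{"kind": "APIResourceList", "apiVersion": "v1",
 "groupVersion": "external.metrics.k8s.io/v1beta1", "resources": []}
'''
after_hpa = '''
{"kind": "APIResourceList", "apiVersion": "v1",
 "groupVersion": "external.metrics.k8s.io/v1beta1",
 "resources": [{"name": "prometheus-query", "singularName": "",
                "namespaced": true, "kind": "ExternalMetricValueList",
                "verbs": ["get"]}]}
'''
print("prometheus-query" in registered_metric_names(before_hpa))  # False
print("prometheus-query" in registered_metric_names(after_hpa))   # True
```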
Verify auto scaling
Open a terminal and start a sustained load against podinfo (5 minutes, 10 concurrent users, 5 requests/second each). First, open a shell in the loadtester container:

    kubectl -n test exec -it ${loadtester} -c loadtester -- sh

Then run hey from that shell:

    hey -z 5m -c 10 -q 5 http://podinfo.test:9898

In a separate terminal, watch the HPA scale up:

    watch kubectl -n test get hpa/podinfo

Note: Metrics are synchronized every 30 seconds by default. The HPA also enforces a cooldown of 3–5 minutes between scale events to prevent thrashing.

As load increases above the threshold, the HPA scales out:

    NAME      REFERENCE            TARGETS          MINPODS   MAXPODS   REPLICAS   AGE
    podinfo   Deployment/podinfo   8308m/10 (avg)   1         10        6          124m

The value 8308m is Kubernetes milli-unit notation for 8.308 requests per second. Because the average RPS per replica (8.3) is below the threshold of 10, the HPA has stabilized at 6 replicas. If the load were higher, the HPA would continue scaling up toward the 10-replica maximum.

After the load test finishes, the request rate drops to zero. The HPA begins scaling down, and within a few minutes the replica count returns to 1.
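The numbers in this output can be reproduced with a short calculation. The Python sketch below converts the milli-unit value shown by kubectl into a plain number and compares it with the load offered by hey, assuming that every worker sustains its configured rate. The helper names are hypothetical, and the parser handles only plain numbers and the m suffix, not the full Kubernetes quantity syntax.

```python
def parse_milli_quantity(text):
    """Convert values such as '8308m' or '10' to a float."""
    if text.endswith("m"):
        return int(text[:-1]) / 1000
    return float(text)


def offered_rps(workers, rate_per_worker):
    """Upper bound on the request rate generated by hey (-c times -q)."""
    return workers * rate_per_worker


average = parse_milli_quantity("8308m")
print(average)                       # 8.308

# hey -c 10 -q 5 offers at most 50 requests per second in total.
total = offered_rps(workers=10, rate_per_worker=5)
print(total)                         # 50
print(round(total / 6, 3))           # 8.333 per replica across 6 replicas
```

The measured average of 8.308 sits just under the 8.333 upper bound, which is consistent with the load test running close to its configured rate across 6 replicas.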