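When request traffic to an application's APIs surges, you can configure a Horizontal Pod Autoscaler (HPA) policy based on the queries per second (QPS) of the Java application's APIs so that the application scales out and in automatically. This topic describes how to implement HPA-based scaling by using ARMS Application Monitoring (ARMS APM).
How it works
After you connect a Java application in an ACK cluster to ARMS APM, ARMS APM provides access details for the application's APIs. For more information about how to connect a Java application to ARMS APM, see Monitor Java applications. ARMS APM converts its data into the Alibaba Cloud Prometheus data format, and the alibaba-cloud-metrics-adapter component converts the Alibaba Cloud Prometheus metrics into metrics that the HPA can consume. The HPA then scales the application based on these metrics.
This topic uses the arms-springboot-demo application as an example and runs a load test against its /demo/queryUser/10 API.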
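Prerequisites
The Alibaba Cloud Prometheus monitoring component is deployed. For more information, see Enable Alibaba Cloud Prometheus monitoring.
The alibaba-cloud-metrics-adapter component is deployed in the kube-system namespace. For more information, see Step 1: Deploy the ack-alibaba-cloud-metrics-adapter component.
A namespace is created. For more information, see Manage namespaces and quotas. The example namespace in this topic is arms-demo.
A JDK is installed. For the JDK versions that ARMS APM supports, see Java components and frameworks supported by ARMS Application Monitoring.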
Procedure
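Step 1: Install the ARMS APM component
To connect an application to ARMS APM, install the ARMS APM component ack-onepilot in the cluster.
Log on to the Container Service for Kubernetes (ACK) console. In the left-side navigation pane, click Clusters.
On the Clusters page, click the name of the target cluster. Then, in the left-side navigation pane, go to the Add-ons page.
On the Add-ons page, search for the ack-onepilot component and click Install on its card. Configure the parameters as prompted in the dialog box, and then click OK.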
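Step 2: Grant access permissions on ARMS resources
To monitor applications in an ACK Serverless cluster or in a cluster connected to Elastic Container Instance (ECI), complete the authorization on the quick authorization page of Resource Access Management (RAM), and then restart all Pods of the ack-onepilot component.
To monitor applications in an ACK cluster, first check whether an ARMS Addon Token exists in the cluster.
If an ARMS Addon Token exists in the ACK cluster, ARMS performs password-free authorization.
ACK managed clusters have an ARMS Addon Token by default. However, some ACK managed clusters that were created early may not have one. In that case, manually grant the cluster access to ARMS resources as described below.
If no ARMS Addon Token exists in the ACK cluster, perform the following operations to manually grant the cluster access to ARMS resources.
Create a custom policy with the following content. For more information, see Step 1: Create a custom policy.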
{ "Action": "arms:*", "Resource": "*", "Effect": "Allow" }
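Attach the custom policy created in the previous step to the WorkerRole of the cluster. For more information, see Step 2: Grant permissions to the worker RAM role of the cluster.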
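The policy content shown in this step is a single statement. As a minimal sketch, the following Python snippet wraps that statement in a complete policy document, assuming the usual RAM policy layout with a Version field and a Statement list. That wrapper and the helper name build_arms_policy are assumptions and do not appear in this topic, so check them against the RAM policy editor before use.

```python
import json


def build_arms_policy() -> str:
    """Return a policy document that wraps the ARMS statement from this step.

    Assumption: RAM policy documents use {"Version": "1", "Statement": [...]}.
    Only the statement itself is taken from this topic.
    """
    statement = {"Action": "arms:*", "Resource": "*", "Effect": "Allow"}
    policy = {"Version": "1", "Statement": [statement]}
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(build_arms_policy())
```

The printed JSON can be pasted into the script editor when you create the custom policy.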
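Step 3: Enable ARMS APM for the Java application
When you deploy a Java application in the cluster, you enable ARMS APM by adding labels to the application.
Log on to the Container Service for Kubernetes (ACK) console. In the left-side navigation pane, click Clusters.
On the Clusters page, click the name of the target cluster. Then, in the left-side navigation pane, go to the Deployments page.
In the upper-right corner of the Deployments page, click Create from YAML.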
Select a sample template, and in the template (YAML format) add the following labels under spec.template.metadata.
labels:
  armsPilotAutoEnable: "on"
  armsPilotCreateAppName: "<your-deployment-name>"  # Replace <your-deployment-name> with the name of your application.
  one-agent.jdk.version: "OpenJDK11"  # Required only if the application runs on JDK 11.
  armsSecAutoEnable: "on"  # Required only if you want to enable Application Security.
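To illustrate where these labels belong, the following Python sketch adds them to a Deployment manifest held as a plain dictionary. The helper name enable_arms_apm and the sample manifest are hypothetical. Only the label keys, their values, and the spec.template.metadata location come from this topic.

```python
import copy


def enable_arms_apm(deployment: dict, app_name: str,
                    jdk11: bool = False, app_security: bool = False) -> dict:
    """Return a copy of the manifest with the ARMS APM labels set on the Pod template."""
    patched = copy.deepcopy(deployment)
    # The labels must sit under spec.template.metadata, not under the top-level metadata.
    labels = (patched.setdefault("spec", {})
                     .setdefault("template", {})
                     .setdefault("metadata", {})
                     .setdefault("labels", {}))
    labels["armsPilotAutoEnable"] = "on"
    labels["armsPilotCreateAppName"] = app_name
    if jdk11:
        labels["one-agent.jdk.version"] = "OpenJDK11"
    if app_security:
        labels["armsSecAutoEnable"] = "on"
    return patched


if __name__ == "__main__":
    # Hypothetical minimal manifest, used only to show the resulting labels.
    manifest = {"apiVersion": "apps/v1", "kind": "Deployment",
                "metadata": {"name": "arms-springboot-demo", "namespace": "arms-demo"},
                "spec": {"template": {"metadata": {"labels": {"app": "arms-springboot-demo"}}}}}
    patched = enable_arms_apm(manifest, "arms-springboot-demo", jdk11=True)
    print(patched["spec"]["template"]["metadata"]["labels"])
```

The function copies the manifest instead of mutating it, so the original definition stays available for comparison.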
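The following YAML template shows how to create a stateless application (Deployment) with ARMS APM enabled.
Verify that ARMS APM is enabled for the application.
On the Deployments page, an ARMS Console button appears in the Actions column of the target application.
Click ARMS Console to view the monitoring data. In the left-side navigation pane, click Interface Invocation to view the access details of the application's APIs, such as HTTP APIs. The demo application arms-springboot-demo provided here automatically generates a steady stream of API calls.
Manually create a Service for the arms-springboot-demo application and enable load balancing so that the application's APIs can be accessed.
On the Clusters page, click the name of the target cluster. Then, in the left-side navigation pane, go to the Services page.
In the upper-right corner of the page, click Create, configure a Service that is associated with the application, and then click Create. For more information about the parameters, see Create a Service.
Wait until the Service is created. On the Services page, record the external endpoint of arms-demo-svc, for example, 47.94.XX.XX:8080.
Run the following command to access the /demo/queryUser/10 API of the Service through the external endpoint.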
curl http://47.94.XX.XX:8080/demo/queryUser/10
Expected output:
{"id":1,"name":"KeyOfSpectator","password":"12****"}
The expected output indicates that the API can be accessed as expected.
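If you prefer to script this check, the following Python sketch performs the same request with the standard library and validates the shape of the response. The helper names parse_user and query_user are hypothetical, and the check for the id and name fields is an assumption based only on the sample output above.

```python
import json
from urllib.request import urlopen


def parse_user(body: str) -> dict:
    """Parse the JSON body and make sure it looks like the sample response."""
    user = json.loads(body)
    if not isinstance(user, dict):
        raise ValueError("expected a JSON object")
    missing = {"id", "name"} - user.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return user


def query_user(endpoint: str, user_id: int, timeout: float = 5.0) -> dict:
    """Call /demo/queryUser/<user_id> on the given host:port endpoint."""
    url = f"http://{endpoint}/demo/queryUser/{user_id}"
    with urlopen(url, timeout=timeout) as resp:
        return parse_user(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Replace the masked address with the external endpoint of arms-demo-svc.
    print(query_user("47.94.XX.XX:8080", 10))
```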
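Step 4: Connect the alibaba-cloud-metrics-adapter component
Make sure that the Alibaba Cloud Prometheus monitoring component is deployed. Otherwise, you cannot perform this step. For more information, see Enable Alibaba Cloud Prometheus monitoring.
Make sure that the alibaba-cloud-metrics-adapter component is deployed in the kube-system namespace. Otherwise, you cannot perform this step. For more information, see Step 1: Deploy the ack-alibaba-cloud-metrics-adapter component.
Log on to the ARMS console.
In the left-side navigation pane, go to the Instances page of Managed Service for Prometheus.
On the Instances page, click the name of the target instance (in the format arms_metrics_{RegionId}_XXX). In the left-side navigation pane, click Settings. At the bottom of the Settings tab on the right, view and record the HTTP API URL (Grafana read URL). This is the Prometheus URL.
In ack-alibaba-cloud-metrics-adapter, enter the HTTP API URL (Grafana read URL) recorded in the previous step as the Prometheus URL.
Modify the adapter-config configuration of ack-alibaba-cloud-metrics-adapter.
On the Helm page, click ack-alibaba-cloud-metrics-adapter.
On the Basic Information tab, click adapter-config.
In the upper-right corner of the page, click Edit YAML.
Add the following content to adapter-config.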
rules:
- metricsQuery: sum(sum_over_time_lorc(<<.Series>>{service="arms-k8s-demo",clusterId="cc13c8725******a9839190b7d1695d7",serverIp=~".*",callKind=~"http|rpc|custom_entry|server|consumer|schedule",source="apm",<<.LabelMatchers>>}[1m])) or vector(0)
  name:
    as: "${1}_per_second"
    matches: "^(.*)_count_ign_destid_endpoint_ppid_prpc"
  resources:
    namespaced: false
  seriesQuery: arms_app_requests_count_ign_destid_endpoint_ppid_prpc{service="arms-k8s-demo",clusterId="cc13c8725******a9839190b7d1695d7"}
The complete example is as follows:
Check the metric data in the cluster.
Run the following command to check whether the arms_app_requests_per_second metric exists.
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"
Expected output:
{"kind":"APIResourceList","apiVersion":"v1","groupVersion":"external.metrics.k8s.io/v1beta1","resources":[{"name":"slb_l4_packet_rx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_cpu_util","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_custom_week","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_custom_month","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"billing_pretax_gross_amount_total","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_memory_usage","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_network_rx_rate","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_network_rx_errors","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_latency_p95","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_cpu_request","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_week","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"metrics_kube_pod_labels","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_total","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_latency_p9999","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_status_3xx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"ahas_sentinel_block_qps","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_month","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_custom_hour","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_pod_cpu_request","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_memory_request","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_qps","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_traffic_rx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_traffic_tx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_packet_tx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_connection_utilization","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_rt","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_cpu_request","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_cpu_utilization","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_memory_request","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_total_hour","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"memory_usage_average","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_pod_memory_request","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_cpu_limit","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"ahas_sentinel_pass_qps","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_day","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_min","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_ratio","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_custom","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_status_2xx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_upstream_5xx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_upstream_rt","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_hour","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_total_month","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"metrics_kube_pod_info","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"metrics_kube_node_info","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"memory_request_average","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_latency_avg","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_status_5xx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_memory_limit","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_memory_working_set","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_memory_utilization","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_active_connection","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_max_connection","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"billing_pretax_amount_node","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_status_4xx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_upstream_4xx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_memory_cache","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_custom_day","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"ahas_sentinel_total_qps","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_alb_ingress_qps","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_memory_rss","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_network_tx_rate","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_cpu_usage","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_total_min","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_total_week","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"billing_pretax_amount_total","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_latency_p99","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_percorepricing","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_total_day","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cpu_core_request_average","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_qps","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"ahas_sentinel_avg_rt","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_memory_limit","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cpu_core_usage_average","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_latency_p50","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_inflow","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_network_tx_errors","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_cpu_limit","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_memory_usage","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_node","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"arms_app_requests_per_second","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]}]}
The expected output indicates that the arms_app_requests_per_second metric exists.
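The exposed name arms_app_requests_per_second follows from the name.matches and name.as fields of the adapter rule: the part captured by the regular expression replaces ${1}. The following Python sketch reproduces that renaming and then looks the result up in an APIResourceList document like the one above. The helper names are hypothetical, and the renaming is a simplified stand-in for what the adapter does internally.

```python
import json
import re


def exposed_metric_name(series: str, matches: str, as_template: str) -> str:
    """Apply a rule's name.matches / name.as pair to a Prometheus series name."""
    # Rewrite the rule's ${1}-style group references as Python's \g<1> form.
    py_template = re.sub(r"\$\{(\d+)\}", r"\\g<\1>", as_template)
    return re.sub(matches, py_template, series, count=1)


def metric_is_registered(api_resource_list: str, name: str) -> bool:
    """Check whether the metric name is listed in an APIResourceList JSON document."""
    doc = json.loads(api_resource_list)
    return any(res.get("name") == name for res in doc.get("resources", []))


if __name__ == "__main__":
    name = exposed_metric_name(
        "arms_app_requests_count_ign_destid_endpoint_ppid_prpc",
        "^(.*)_count_ign_destid_endpoint_ppid_prpc",
        "${1}_per_second",
    )
    print(name)
```

To check a live cluster, pass the output of the kubectl get --raw command above to metric_is_registered.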
Run the following command to query the real-time value of the metric.
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/arms-demo/arms_app_requests_per_second"| jq .
Expected output:
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "arms_app_requests_per_second",
      "metricLabels": {},
      "timestamp": "2025-02-13T02:51:31Z",
      "value": "2"
    }
  ]
}
The expected output indicates that real-time data is returned as expected.
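The value field is a Kubernetes quantity string, so it can appear as a plain number such as "2" or in milli-units such as "47500m". The following Python sketch reads an ExternalMetricValueList document and converts the values to decimal numbers. The helper names are hypothetical, and the parser is deliberately minimal: it handles only plain numbers and the m, k, M, and G suffixes.

```python
import json
from decimal import Decimal

# Decimal multipliers for the few quantity suffixes this sketch understands.
_SUFFIXES = {
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity string such as "2" or "47500m" to a Decimal."""
    quantity = quantity.strip()
    if quantity and quantity[-1] in _SUFFIXES:
        return Decimal(quantity[:-1]) * _SUFFIXES[quantity[-1]]
    return Decimal(quantity)


def external_metric_values(payload: str) -> dict:
    """Map each metricName in an ExternalMetricValueList to its numeric value."""
    doc = json.loads(payload)
    return {item["metricName"]: parse_quantity(item["value"])
            for item in doc.get("items", [])}


if __name__ == "__main__":
    print(parse_quantity("47500m"))
```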
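Step 5: Configure HPA scaling based on the APM metric
Create a file named hpa.yaml with the following content.
The metric name in hpa.yaml must be the same as the metric name defined in ack-alibaba-cloud-metrics-adapter in the previous step.
The target in hpa.yaml is the scaling threshold. With the following configuration, the application scales out when the average QPS per Pod exceeds 40.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: test-hpa
  namespace: arms-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: arms-springboot-demo
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: arms_app_requests_per_second
        # For the External metric type, only Value and AverageValue target types are supported.
        target:
          type: AverageValue
          averageValue: 40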
Run the following command to deploy the HPA for the arms-springboot-demo application.
kubectl apply -f hpa.yaml
Run the following command to view the HPA details.
kubectl get hpa -n arms-demo
Expected output:
NAME       REFERENCE                         TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
test-hpa   Deployment/arms-springboot-demo   12/40 (avg)   1         10        1          113s
The expected output indicates that the TARGETS column contains data and the HPA is configured successfully.
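Step 6: Verify scaling with a load test
Run the following command to run a load test against the demo application.
Replace 47.94.XX.XX:8080 with the external endpoint of the arms-demo-svc Service.
ab -c 50 -n 2000 http://47.94.XX.XX:8080/demo/queryUser/10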
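The ab command above sends 2,000 requests with a concurrency of 50. If ab is not available, a rough equivalent can be sketched in Python with a thread pool. The function names run_load and http_get_ok are hypothetical, and this sketch only counts successful and failed requests. It does not report latency statistics the way ab does.

```python
from concurrent.futures import ThreadPoolExecutor
from http.client import HTTPException
from urllib.request import urlopen


def http_get_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if the GET request completes with a 2xx status."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, HTTPException):
        # Covers connection errors, timeouts, and HTTP error statuses.
        return False


def run_load(url: str, total: int = 2000, concurrency: int = 50, fetch=http_get_ok):
    """Send `total` requests using `concurrency` worker threads.

    Returns a (succeeded, failed) tuple.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(fetch, [url] * total))
    succeeded = sum(1 for ok in results if ok)
    return succeeded, total - succeeded


if __name__ == "__main__":
    # Replace the masked address with the external endpoint of arms-demo-svc.
    print(run_load("http://47.94.XX.XX:8080/demo/queryUser/10"))
```

The fetch parameter is injectable so that the loop can be exercised without a live endpoint.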
Run the following command to view the HPA details.
kubectl get hpa -n arms-demo
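Expected output:
NAME       REFERENCE                         TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
test-hpa   Deployment/arms-springboot-demo   47500m/40 (avg)   1         10        10         6m43s
The expected output indicates that the average value per Pod (47500m, that is, 47.5) exceeds the threshold of 40 and that the HPA has scaled the application out to 10 replicas, which is the value of maxReplicas.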
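The relationship between the TARGETS column and the replica count can be approximated with the replica formula documented for the Kubernetes HPA. The following Python sketch is a simplified version for an External metric with an AverageValue target. It assumes the default tolerance of 0.1 and ignores stabilization windows, scaling policies, and Pod readiness. The function name desired_replicas is hypothetical.

```python
import math


def desired_replicas(total_value: float, target_average: float,
                     current_replicas: int, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Estimate the replica count the HPA aims for with an AverageValue target."""
    # Ratio of the current per-Pod average to the target average.
    usage_ratio = total_value / (target_average * current_replicas)
    if abs(1.0 - usage_ratio) <= tolerance:
        desired = current_replicas  # Close enough to the target: no change.
    else:
        desired = math.ceil(total_value / target_average)
    # The result is always kept within minReplicas and maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 12/40 (avg) with 1 replica: the total metric value is 12.
    print(desired_replicas(12, 40, 1, 1, 10))
    # 47500m/40 (avg) with 10 replicas: the total is about 47.5 * 10 = 475.
    print(desired_replicas(475, 40, 10, 1, 10))
```

With the values from this topic, the first call stays at 1 replica. The second call computes 12 replicas and is capped at the maxReplicas value of 10.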
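View the scaling results.
In the ARMS APM console, the number of requests to this API rises sharply because of the load test.
On the Prometheus dashboard, you can see that HPA scaling is triggered when the QPS of the application API exceeds the threshold.
In the ACK cluster, the number of Pod replicas of the sample application scales with the QPS of API calls.
You can run kubectl describe hpa test-hpa -n arms-demo to view the scaling events.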
Extended examples
The following examples show how to configure metrics-adapter for specific scenarios.
Convert metrics for multiple services.
rules:
- metricsQuery: sum(sum_over_time_lorc(<<.Series>>{service="arms-k8s-demo",clusterId="cc13c8725******a9839190b7d1695d7",serverIp=~".*",callKind=~"http|rpc|custom_entry|server|consumer|schedule",source="apm",<<.LabelMatchers>>}[1m])) or vector(0)
  name:
    as: "${1}_per_second_arms_k8s_demo"
    matches: "^(.*)_count_ign_destid_endpoint_ppid_prpc"
  resources:
    namespaced: false
  seriesQuery: arms_app_requests_count_ign_destid_endpoint_ppid_prpc{service="arms-k8s-demo",clusterId="cc13c8725******a9839190b7d1695d7"}
- metricsQuery: sum(sum_over_time_lorc(<<.Series>>{service="arms-k8s-demo-subcomponent",clusterId="cc13c8725******a9839190b7d1695d7",serverIp=~".*",callKind=~"http|rpc|custom_entry|server|consumer|schedule",source="apm",<<.LabelMatchers>>}[1m])) or vector(0)
  name:
    as: "${1}_per_second_arms_k8s_demo_subcomponent"
    matches: "^(.*)_count_ign_destid_endpoint_ppid_prpc"
  resources:
    namespaced: false
  seriesQuery: arms_app_requests_count_ign_destid_endpoint_ppid_prpc{service="arms-k8s-demo-subcomponent",clusterId="cc13c8725******a9839190b7d1695d7"}
Convert metrics for multiple RPCs.
rules:
- metricsQuery: sum(sum_over_time_lorc(<<.Series>>{service="arms-k8s-demo",rpc="/demo/queryUser/{id}",clusterId="cc13c8725******a9839190b7d1695d7",serverIp=~".*",callKind=~"http|rpc|custom_entry|server|consumer|schedule",source="apm",<<.LabelMatchers>>}[1m])) or vector(0)
  name:
    as: "${1}_per_second_arms_k8s_demo_queryUser"
    matches: "^(.*)_count_ign_destid_endpoint_ppid_prpc"
  resources:
    namespaced: false
  seriesQuery: arms_app_requests_count_ign_destid_endpoint_ppid_prpc{service="arms-k8s-demo",rpc="/demo/queryUser/{id}",clusterId="cc13c8725******a9839190b7d1695d7"}
- metricsQuery: sum(sum_over_time_lorc(<<.Series>>{service="arms-k8s-demo",rpc="/demo/queryException/{id}",clusterId="cc13c8725******a9839190b7d1695d7",serverIp=~".*",callKind=~"http|rpc|custom_entry|server|consumer|schedule",source="apm",<<.LabelMatchers>>}[1m])) or vector(0)
  name:
    as: "${1}_per_second__arms_k8s_demo_queryException"
    matches: "^(.*)_count_ign_destid_endpoint_ppid_prpc"
  resources:
    namespaced: false
  seriesQuery: arms_app_requests_count_ign_destid_endpoint_ppid_prpc{service="arms-k8s-demo",rpc="/demo/queryException/{id}",clusterId="cc13c8725******a9839190b7d1695d7"}
- metricsQuery: sum(sum_over_time_lorc(<<.Series>>{service="arms-k8s-demo",rpc="/demo/queryNotExistDB/{id}",clusterId="cc13c8725******a9839190b7d1695d7",serverIp=~".*",callKind=~"http|rpc|custom_entry|server|consumer|schedule",source="apm",<<.LabelMatchers>>}[1m])) or vector(0)
  name:
    as: "${1}_per_second__arms_k8s_demo_queryNotExistDB"
    matches: "^(.*)_count_ign_destid_endpoint_ppid_prpc"
  resources:
    namespaced: false
  seriesQuery: arms_app_requests_count_ign_destid_endpoint_ppid_prpc{service="arms-k8s-demo",rpc="/demo/queryNotExistDB/{id}",clusterId="cc13c8725******a9839190b7d1695d7"}
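The rules above differ only in the service, the optional rpc selector, and the name suffix, so they can be generated instead of written by hand. The following Python sketch builds rule dictionaries with the same fields. The function name build_rule and the placeholder cluster ID are hypothetical. Serializing the result to YAML for adapter-config is left out because it needs a YAML library outside the standard library.

```python
SERIES = "arms_app_requests_count_ign_destid_endpoint_ppid_prpc"
CALL_KINDS = "http|rpc|custom_entry|server|consumer|schedule"


def build_rule(cluster_id: str, service: str, suffix: str, rpc: str = None) -> dict:
    """Build one adapter rule for a service, optionally narrowed to a single RPC."""
    selectors = [f'service="{service}"']
    if rpc is not None:
        selectors.append(f'rpc="{rpc}"')
    selectors.append(f'clusterId="{cluster_id}"')
    # metricsQuery adds the fixed APM selectors and the adapter's label matchers.
    query_selectors = selectors + [
        'serverIp=~".*"',
        f'callKind=~"{CALL_KINDS}"',
        'source="apm"',
        "<<.LabelMatchers>>",
    ]
    return {
        "metricsQuery": "sum(sum_over_time_lorc(<<.Series>>{"
                        + ",".join(query_selectors)
                        + "}[1m])) or vector(0)",
        "name": {
            "as": "${1}_per_second_" + suffix,
            "matches": "^(.*)_count_ign_destid_endpoint_ppid_prpc",
        },
        "resources": {"namespaced": False},
        "seriesQuery": SERIES + "{" + ",".join(selectors) + "}",
    }


if __name__ == "__main__":
    cluster = "<your-cluster-id>"  # Placeholder: use the ID of your own cluster.
    rules = [
        build_rule(cluster, "arms-k8s-demo", "arms_k8s_demo"),
        build_rule(cluster, "arms-k8s-demo", "arms_k8s_demo_queryUser",
                   rpc="/demo/queryUser/{id}"),
    ]
    for rule in rules:
        print(rule["name"]["as"], "<-", rule["seriesQuery"])
```

Each generated metric needs a unique suffix, because the HPA refers to the metric by the name produced from name.as.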
References
To use other ARMS metrics or learn more about the metrics, see Application Monitoring metrics.