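Implement HPA auto scaling with the ARMS APM application monitoring service

When request traffic to an application's interfaces surges, you can configure an HPA scaling policy based on the QPS of a Java application interface so that the application scales out and in automatically. This topic describes how to use the ARMS APM application monitoring service to drive HPA scaling for an application.

How it works

After you connect a Java application in an ACK cluster to the ARMS APM application monitoring service, ARMS APM provides access details for the application's interfaces. For how to connect a Java application to ARMS APM, see Application Monitoring. ARMS APM converts its data into the Alibaba Cloud Prometheus data format, and the alibaba-cloud-metrics-adapter component converts those Prometheus metrics into metrics that HPA can consume. Together they enable HPA scaling for the application.

This topic uses the arms-springboot-demo application, and a load test on its /demo/queryUser/10 interface, as the example.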

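Step 1: Install the ARMS APM application monitoring component

To connect applications to ARMS APM, install the ARMS APM application monitoring component ack-onepilot in the cluster.

  1. Log on to the Container Service management console. In the left-side navigation pane, choose Clusters.

  2. On the Clusters page, click the name of the target cluster. In the left-side navigation pane, choose Operations > Add-ons.

  3. On the Add-ons page, search for and locate the ack-onepilot component. Click Install on the component card, configure the parameters as prompted in the dialog box, and then click OK.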

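Step 2: Grant access permissions on ARMS resources

  • To monitor applications in an ACK Serverless cluster or in a cluster connected to ECI, complete authorization on the Cloud Resource Access Authorization page, and then restart all Pods of the ack-onepilot component.

  • To monitor applications in an ACK cluster, first check whether an ARMS Addon Token exists.

    To check whether an ARMS Addon Token exists in the cluster:

    1. Log on to the Container Service management console. In the left-side navigation pane, choose Clusters.

    2. On the Clusters page, click the name of the target cluster. In the left-side navigation pane, choose Configurations > Secrets.

    3. At the top of the page, select the kube-system namespace and check whether addon.arms.token exists.

    • If the ACK cluster has an ARMS Addon Token, ARMS performs password-free authorization.

      Note

      ACK managed clusters have an ARMS Addon Token by default. However, some ACK managed clusters created earlier may not have one. In that case, manually grant the cluster access to ARMS resources as described below.

    • If the ACK cluster does not have an ARMS Addon Token, perform the following steps to manually grant the cluster access to ARMS resources.

      1. Create a custom policy with the following content. For details, see Step 1: Create a custom policy.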

            {
              "Action": "arms:*",
              "Resource": "*",
              "Effect": "Allow"
            }
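      2. Attach the custom policy created in the previous step to the WorkerRole of the cluster. For details, see Step 2: Grant permissions to the Worker RAM role of the cluster.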

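Step 3: Enable ARMS APM application monitoring for the Java application

When you deploy a Java application in the cluster, enable ARMS APM application monitoring by adding labels to the application.

  1. Log on to the Container Service management console. In the left-side navigation pane, choose Clusters.

  2. On the Clusters page, click the name of the target cluster. In the left-side navigation pane, choose Workloads > Deployments.

  3. In the upper-right corner of the Deployments page, click Create from YAML.

  4. Select a sample template, and add the following labels under spec.template.metadata in the template (YAML format).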

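    labels:
      armsPilotAutoEnable: "on"
      armsPilotCreateAppName: "<your-deployment-name>" # Replace <your-deployment-name> with your application name.
      one-agent.jdk.version: "OpenJDK11"    # Required only if the application runs on JDK 11.
      armsSecAutoEnable: "on"    # Required only if you want to enable Application Security.

    The following sample YAML template shows how to create a Deployment (stateless application) with ARMS APM application monitoring enabled.

    Complete sample YAML file (Java):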

    apiVersion: v1
    kind: Namespace
    metadata:
      name: arms-demo
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: arms-springboot-demo
      namespace: arms-demo
      labels:
        app: arms-springboot-demo
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: arms-springboot-demo
      template:
        metadata:
          labels:
            app: arms-springboot-demo
            armsPilotAutoEnable: "on"
            armsPilotCreateAppName: "arms-k8s-demo"
            one-agent.jdk.version: "OpenJDK11"
        spec:
          containers:
            - resources:
                limits:
                  cpu: 0.5
              image: registry.cn-hangzhou.aliyuncs.com/arms-docker-repo/arms-springboot-demo:v0.1
              imagePullPolicy: Always
              name: arms-springboot-demo
              env:
                - name: SELF_INVOKE_SWITCH
                  value: "true"
                - name: COMPONENT_HOST
                  value: "arms-demo-component"
                - name: COMPONENT_PORT
                  value: "6666"
                - name: MYSQL_SERVICE_HOST
                  value: "arms-demo-mysql"
                - name: MYSQL_SERVICE_PORT
                  value: "3306"
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        name: arms-springboot-demo
      name: arms-springboot-demo
      namespace: arms-demo
    spec:
      ports:
        # the port that this service should serve on
        - name: arms-demo-svc
          port: 6666
          targetPort: 8888
      # label keys and values that must match in order to receive traffic for this service
      selector:
        app: arms-springboot-demo
    ---
    apiVersion: apps/v1 # for versions before 1.8.0 use apps/v1beta1
    kind: Deployment
    metadata:
      name: arms-springboot-demo-subcomponent
      namespace: arms-demo
      labels:
        app: arms-springboot-demo-subcomponent
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: arms-springboot-demo-subcomponent
      template:
        metadata:
          labels:
            app: arms-springboot-demo-subcomponent
            armsPilotAutoEnable: "on"
            armsPilotCreateAppName: "arms-k8s-demo-subcomponent"
            one-agent.jdk.version: "OpenJDK11"
        spec:
          containers:
            - resources:
                limits:
                  cpu: 0.5
              image: registry.cn-hangzhou.aliyuncs.com/arms-docker-repo/arms-springboot-demo:v0.1
              imagePullPolicy: Always
              name: arms-springboot-demo-subcomponent
              env:
                - name: SELF_INVOKE_SWITCH
                  value: "false"
                - name: MYSQL_SERVICE_HOST
                  value: "arms-demo-mysql"
                - name: MYSQL_SERVICE_PORT
                  value: "3306"
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        name: arms-demo-component
      name: arms-demo-component
      namespace: arms-demo
    spec:
      ports:
        # the port that this service should serve on
        - name: arms-demo-component-svc
          port: 6666
          targetPort: 8888
      # label keys and values that must match in order to receive traffic for this service
      selector:
        app: arms-springboot-demo-subcomponent
    ---
    apiVersion: apps/v1 # for versions before 1.8.0 use apps/v1beta1
    kind: Deployment
    metadata:
      name: arms-demo-mysql
      namespace: arms-demo
      labels:
        app: mysql
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: mysql
      template:
        metadata:
          labels:
            app: mysql
        spec:
          containers:
            - resources:
                limits:
                  cpu: 0.5
              image: registry.cn-hangzhou.aliyuncs.com/arms-docker-repo/arms-demo-mysql:v0.1
              name: mysql
              ports:
                - containerPort: 3306
                  name: mysql
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        name: mysql
      name: arms-demo-mysql
      namespace: arms-demo
    spec:
      ports:
        # the port that this service should serve on
        - name: arms-mysql-svc
          port: 3306
          targetPort: 3306
      # label keys and values that must match in order to receive traffic for this service
      selector:
        app: mysql
    ---
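  5. Check the result of enabling ARMS APM for the application.

    On the Deployments page, an ARMS Console button appears in the Actions column of the target application.

    Click ARMS Console to view the monitoring data. In the left-side navigation pane, click Interface Invocation to view access details for the application's interfaces, such as HTTP interfaces. The demo application arms-springboot-demo used here automatically generates a steady stream of interface calls.

  6. Manually create a Service for the arms-springboot-demo application and enable load balancing so that the application's interfaces can be accessed.

    1. On the Clusters page, click the name of the target cluster. In the left-side navigation pane, choose Network > Services.

    2. In the upper-right corner of the page, click Create, configure a Service associated with the application, and then click Create. For a description of the configuration items, see Create a Service.

    3. Wait for the Service to be created. On the Services page, record the external endpoint of arms-demo-svc, for example 47.94.XX.XX:8080.

    4. Run the following command to access the /demo/queryUser/10 interface of the Service through the external endpoint.

      curl http://47.94.XX.XX:8080/demo/queryUser/10

      Expected output:

      {"id":1,"name":"KeyOfSpectator","password":"12****"}

      The output indicates that the interface is accessible.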
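
Before applying a manifest, it can help to confirm that the Pod template carries the labels from step 4. The following is a minimal sketch, assuming the manifest has already been loaded into a Python dict (for example with a YAML parser). The function name missing_arms_labels and the trimmed demo dict are illustrative and are not part of ARMS or ACK.

```python
from typing import List

# Labels that this guide adds under spec.template.metadata.labels to enable ARMS APM.
REQUIRED_ARMS_LABELS = ("armsPilotAutoEnable", "armsPilotCreateAppName")


def missing_arms_labels(deployment: dict) -> List[str]:
    """Return the required ARMS labels that the Pod template lacks or sets incorrectly."""
    labels = (
        deployment.get("spec", {})
        .get("template", {})
        .get("metadata", {})
        .get("labels", {})
    )
    problems = [key for key in REQUIRED_ARMS_LABELS if key not in labels]
    # The guide sets armsPilotAutoEnable to the string "on"; flag any other value.
    if "armsPilotAutoEnable" in labels and labels["armsPilotAutoEnable"] != "on":
        problems.append("armsPilotAutoEnable")
    return problems


# Trimmed-down copy of the arms-springboot-demo Deployment from the sample YAML.
demo = {
    "kind": "Deployment",
    "metadata": {"name": "arms-springboot-demo", "namespace": "arms-demo"},
    "spec": {
        "template": {
            "metadata": {
                "labels": {
                    "app": "arms-springboot-demo",
                    "armsPilotAutoEnable": "on",
                    "armsPilotCreateAppName": "arms-k8s-demo",
                    "one-agent.jdk.version": "OpenJDK11",
                }
            }
        }
    },
}

print(missing_arms_labels(demo))  # prints []
```

Labels placed on the Deployment's own metadata instead of the Pod template are reported as missing, which matches the requirement that the labels sit under spec.template.metadata.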

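Step 4: Connect the alibaba-cloud-metrics-adapter component

  1. Log on to the ARMS console.

  2. In the left-side navigation pane, choose Managed Service for Prometheus > Instances to open the instance list page of Managed Service for Prometheus.

  3. On the Instances page, click the name of the target instance (in the format arms_metrics_{RegionId}_XXX). In the left-side navigation pane, click Settings. At the bottom of the Settings tab on the right, view and record the HTTP API URL (Grafana Read URL). This is the Prometheus URL.

  4. Enter the HTTP API URL (Grafana Read URL) recorded in the previous step, that is, the Prometheus URL, in ack-alibaba-cloud-metrics-adapter.

    1. Log on to the Container Service management console. In the left-side navigation pane, choose Clusters.

    2. On the Clusters page, click the name of the target cluster. In the left-side navigation pane, choose Applications > Helm.

    3. On the Helm page, find the row of ack-alibaba-cloud-metrics-adapter and click Update in the Actions column.

    4. In the Update Release panel, enter the Prometheus URL recorded in step 3.

  5. Modify the adapter-config configuration of ack-alibaba-cloud-metrics-adapter.

    1. On the Helm page, click ack-alibaba-cloud-metrics-adapter.

    2. On the Basic Information tab, click adapter-config.

    3. In the upper-right corner of the page, click Edit YAML.

    4. Add the following content to adapter-config.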

      rules:
      - metricsQuery: sum by (rpc) (sum_over_time(<<.Series>>{rpc="/demo/queryUser/{id}",service="arms-demo:arms-k8s-demo",prpc="__all__",ppid="__all__",endpoint="__all__",destId="__all__",<<.LabelMatchers>>}[1m]))
        name:
          as: ${1}_per_second_queryuser
          matches: ^(.*)_count
        resources:
          namespaced: false
        seriesQuery: arms_app_requests_count

      The complete example is as follows:

      apiVersion: v1
      data:
        config.yaml: >
          rules:
      
          - metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)
            name:
              as: ${1}_bytes_per_second
              matches: ^(.*)_bytes
            resources:
              overrides:
                namespace:
                  resource: namespace
                pod:
                  resource: pod
            seriesQuery: container_memory_working_set_bytes{namespace!="",pod!=""}
          - metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by
          (<<.GroupBy>>)
            name:
              as: ${1}_core_per_second
              matches: ^(.*)_seconds_total
            resources:
              overrides:
                namespace:
                  resource: namespace
                pod:
                  resource: pod
            seriesQuery: container_cpu_usage_seconds_total{namespace!="",pod!=""}
          - metricsQuery: sum by (rpc)
          (sum_over_time(<<.Series>>{rpc="/demo/queryUser/{id}",service="arms-demo:arms-k8s-demo",prpc="__all__",ppid="__all__",endpoint="__all__",destId="__all__",<<.LabelMatchers>>}[1m]))
            name:
              as: ${1}_per_second_queryuser
              matches: ^(.*)_count
            resources:
              namespaced: false
            seriesQuery: arms_app_requests_count
      kind: ConfigMap
      metadata:
        annotations:
          meta.helm.sh/release-name: ack-alibaba-cloud-metrics-adapter
          meta.helm.sh/release-namespace: kube-system
        creationTimestamp: '2024-04-02T02:29:32Z'
        labels:
          app.kubernetes.io/managed-by: Helm
        managedFields:
          - apiVersion: v1
            fieldsType: FieldsV1
            fieldsV1:
              'f:data':
                .: {}
                'f:config.yaml': {}
              'f:metadata':
                'f:annotations':
                  .: {}
                  'f:meta.helm.sh/release-name': {}
                  'f:meta.helm.sh/release-namespace': {}
                'f:labels':
                  .: {}
                  'f:app.kubernetes.io/managed-by': {}
            manager: rc
            operation: Update
            time: '2024-04-02T02:40:52Z'
        name: adapter-config
        namespace: kube-system
        resourceVersion: '8223891'
        uid: 294634e6-aeae-4048-9e69-365a4ce4b2cd
      
  6. Check the metric data in the cluster.

    1. Run the following command to check whether the arms_app_requests_per_second_queryuser metric exists.

      kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"

      Expected output:

      {"kind":"APIResourceList","apiVersion":"v1","groupVersion":"external.metrics.k8s.io/v1beta1","resources":[{"name":"k8s_workload_memory_working_set","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_memory_rss","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_latency_p9999","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_qps","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_memory_usage","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_traffic_rx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_inflow","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_upstream_rt","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_ratio","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_traffic_tx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_packet_rx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"ahas_sentinel_pass_qps","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_memory_request","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_latency_avg","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_max_connection","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_cpu_limit","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_day","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_month","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_connection_utilization","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_status_5xx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_cpu_request","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_network_rx_rate","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_network_rx_errors","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_week","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_memory_cache","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_cpu_request","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_percorepricing","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_qps","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_packet_tx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_status_4xx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_upstream_5xx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_cpu_limit","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"ahas_sentinel_block_qps","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_cpu_utilization","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_alb_ingress_qps","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_memory_utilization","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_latency_p95","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l4_active_connection","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_memory_limit","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_hour","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_status_2xx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_status_3xx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_cpu_util","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_memory_usage","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_network_tx_rate","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"ahas_sentinel_total_qps","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_cpu_usage","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_network_tx_errors","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_rt","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_min","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_latency_p50","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"sls_ingress_latency_p99","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"slb_l7_upstream_4xx","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"k8s_workload_memory_request","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"ahas_sentinel_avg_rt","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"cost_memory_limit","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]},{"name":"arms_app_requests_per_second_queryuser","singularName":"","namespaced":true,"kind":"ExternalMetricValueList","verbs":["get"]}]}

      The output indicates that the arms_app_requests_per_second_queryuser metric exists.

    2. Run the following command to view the real-time value of the metric.

      kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/arms-demo/arms_app_requests_per_second_queryuser" | jq .

      Expected output:

      {
        "kind": "ExternalMetricValueList",
        "apiVersion": "external.metrics.k8s.io/v1beta1",
        "metadata": {},
        "items": [
          {
            "metricName": "arms_app_requests_per_second_queryuser",
            "metricLabels": {
              "rpc": "/demo/queryUser/10"
            },
            "timestamp": "2022-11-09T07:49:07Z",
            "value": "6"
          }
        ]
      }

      The output indicates that real-time data is returned as expected.
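
The adapter rule in this step does two things: it renames the Prometheus series through name.matches and name.as, and it fills the metricsQuery template to build the PromQL query. The following is a simplified sketch of both substitutions, written to show why arms_app_requests_count surfaces as arms_app_requests_per_second_queryuser. It is not the adapter's actual implementation: the function names are illustrative, and the sketch assumes that no extra label matchers are supplied, so <<.LabelMatchers>> expands to an empty string.

```python
import re


def external_metric_name(series: str, matches: str, as_template: str) -> str:
    """Apply a rule's name.matches / name.as pair to a Prometheus series name."""
    match = re.match(matches, series)
    if match is None:
        raise ValueError(f"{series!r} does not match {matches!r}")
    # Replace each ${N} placeholder with the N-th capture group of the match.
    return re.sub(r"\$\{(\d+)\}", lambda m: match.group(int(m.group(1))), as_template)


def expand_metrics_query(template: str, series: str, label_matchers: str = "") -> str:
    """Fill the two placeholders used by the rule in this guide."""
    return template.replace("<<.Series>>", series).replace("<<.LabelMatchers>>", label_matchers)


# Values copied from the adapter-config rule above.
METRICS_QUERY = (
    'sum by (rpc) (sum_over_time(<<.Series>>{rpc="/demo/queryUser/{id}",'
    'service="arms-demo:arms-k8s-demo",prpc="__all__",ppid="__all__",'
    'endpoint="__all__",destId="__all__",<<.LabelMatchers>>}[1m]))'
)

name = external_metric_name(
    "arms_app_requests_count", r"^(.*)_count", "${1}_per_second_queryuser"
)
print(name)  # prints arms_app_requests_per_second_queryuser
print(expand_metrics_query(METRICS_QUERY, "arms_app_requests_count"))
```

The resulting name is the one that must appear in the external metrics API listing and, later, in hpa.yaml.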

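Step 5: Configure HPA scaling based on the APM metric

  1. Create a file named hpa.yaml with the following content.

    Note
    • The metric name in hpa.yaml must match the metric name defined in ack-alibaba-cloud-metrics-adapter in the previous step.

    • The target in hpa.yaml is the scaling threshold. Because the target type is AverageValue, scale-out is triggered when the metric value averaged across the replicas exceeds 40.

    # autoscaling/v2beta2 is no longer served as of Kubernetes 1.26. On clusters that run 1.23 or later, use autoscaling/v2.
    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: test-hpa
      namespace: arms-demo   # Same namespace as the arms-springboot-demo Deployment.
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: arms-springboot-demo
      minReplicas: 1
      maxReplicas: 10
      metrics:
        - type: External
          external:
            metric:
              name: arms_app_requests_per_second_queryuser
            # External metrics support only the Value and AverageValue target types.
            target:
              type: AverageValue
              averageValue: 40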
  2. Run the following command to deploy the HPA for the arms-springboot-demo application.

    kubectl apply -f hpa.yaml
  3. Run the following command to view how the metric changes.

    kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/arms-demo/arms_app_requests_per_second_queryuser" | jq .

    Expected output:

    {
      "kind": "ExternalMetricValueList",
      "apiVersion": "external.metrics.k8s.io/v1beta1",
      "metadata": {},
      "items": [
        {
          "metricName": "arms_app_requests_per_second_queryuser",
          "metricLabels": {
            "rpc": "/demo/queryUser/10"
          },
          "timestamp": "2022-11-09T07:53:16Z",
          "value": "4216"
        }
      ]
    }
  4. Run the following command to view the HPA details.

    kubectl get hpa -n arms-demo

    Expected output:

    NAME       REFERENCE                         TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
    test-hpa   Deployment/arms-springboot-demo   300m/40 (avg)   1         10        10         148m

    The output shows that the TARGETS column contains data, which indicates that the HPA is configured successfully.
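
The metric values in this step appear as Kubernetes quantity strings: "4216" in the JSON output, and 300m (0.3) in the TARGETS column. The following is a minimal sketch for reading such output in a script, assuming the JSON has been captured from the kubectl get --raw command above. The helper names are illustrative, and only plain numbers and the milli suffix (m) are handled. Any other suffix raises an error rather than being guessed.

```python
import json
from decimal import Decimal


def quantity_to_decimal(quantity: str) -> Decimal:
    """Convert a plain or milli-suffixed quantity such as "6" or "300m" to a Decimal."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    # Decimal() raises decimal.InvalidOperation for suffixes this sketch does not support.
    return Decimal(quantity)


def metric_values(raw_json: str) -> dict:
    """Map the rpc label of each returned item to its numeric value."""
    payload = json.loads(raw_json)
    return {
        item["metricLabels"].get("rpc", ""): quantity_to_decimal(item["value"])
        for item in payload["items"]
    }


# Sample copied from the expected output of step 3 above.
SAMPLE = """
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "arms_app_requests_per_second_queryuser",
      "metricLabels": {"rpc": "/demo/queryUser/10"},
      "timestamp": "2022-11-09T07:53:16Z",
      "value": "4216"
    }
  ]
}
"""

print(metric_values(SAMPLE))        # prints {'/demo/queryUser/10': Decimal('4216')}
print(quantity_to_decimal("300m"))  # prints 0.3
```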

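Verify auto scaling with a load test

  1. Run the following command to run a load test against the demo application.

    ab -c 50 -n 2000 http://47.94.XX.XX:8080/demo/queryUser/10
    Note

    47.94.XX.XX:8080 is the external endpoint of the arms-demo-svc Service.

  2. Check the scaling result.

    • In the ARMS APM console, the request count for this interface surges because of the load test.

    • On the Prometheus dashboard, HPA scaling takes effect once the QPS of the application interface exceeds the threshold.

    • In the ACK cluster, the number of Pod replicas of the demo application scales with the QPS of the interface.

      Run kubectl describe hpa test-hpa -n arms-demo to view the scaling events.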
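
To reason about how many replicas to expect during the load test, the calculation for an External metric with an AverageValue target can be approximated as follows. This is a simplified sketch based on the documented HPA behavior, not the controller's code. It assumes the default tolerance of 0.1 and ignores stabilization windows and scaling policies. The function name and parameters are illustrative.

```python
import math


def desired_replicas(
    metric_total: float,
    target_average: float,
    current_replicas: int,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Approximate the replica count for an External metric with an AverageValue target."""
    # Ratio of the current per-replica average to the target average.
    usage_ratio = metric_total / (target_average * current_replicas)
    if abs(1.0 - usage_ratio) <= tolerance:
        desired = current_replicas  # Close enough to the target: keep the current count.
    else:
        desired = math.ceil(metric_total / target_average)
    # Clamp to the bounds declared in hpa.yaml.
    return max(min_replicas, min(max_replicas, desired))


# Metric values taken from the sample outputs in this topic, with averageValue: 40,
# minReplicas: 1, maxReplicas: 10, and the Deployment's initial 2 replicas.
print(desired_replicas(4216, 40, 2, 1, 10))  # prints 10
print(desired_replicas(6, 40, 2, 1, 10))     # prints 1
```

With the load-test value of 4216, the unclamped result would be 106 replicas, so the HPA settles at maxReplicas, which is consistent with the REPLICAS value of 10 in the kubectl get hpa output.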