Use Prometheus to collect data plane metrics from Gateway with Inference Extension - Container Service for Kubernetes (ACK) - Alibaba Cloud

The Gateway with Inference Extension component can export its data plane metrics to Prometheus. This topic describes how to use Managed Service for Prometheus to monitor the running status of the component's data plane.

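Prerequisites

Gateway with Inference Extension version 1.4.0 is installed, with Enable Gateway API Inference Extension selected. For the installation entry point, see Install the component.
Managed Service for Prometheus is activated.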

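Metric collection methods

For generative AI inference services, the inference extension of Gateway with Inference Extension provides more comprehensive metrics, including time to first token (TTFT) and token throughput rate. The metric format follows the OpenTelemetry semantic conventions for generative AI.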

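Manually configure collection rules

Note

Manually configuring collection rules does not require default service discovery to be enabled.

Log on to the Prometheus console. In the left-side navigation pane, click Integration Center.
Enter "gateway" in the search box, then click Gateway with Inference Extension under Artificial Intelligence.
In the panel that appears on the right, select the target cluster from the Select Container Service Cluster drop-down list, then click OK.
Keep the default values for the other settings in the panel.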

To try this out quickly, you can use the mock application described in Quick Start.

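Custom collection

Manually configured collection rules collect all data plane metrics of the component by default. You can also add a custom collection job to choose which Gateway with Inference Extension metrics are collected. The following example shows a custom configuration for commonly used metrics.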

scrape_configs:
  - job_name: 'ack-gateway'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - envoy-gateway-system
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_managed_by]
        regex: envoy-gateway
        action: keep
    scrape_interval: 15s
    metrics_path: /stats/prometheus
    scheme: http
    metric_relabel_configs:
      - source_labels: [__name__]
        # Prometheus anchors this regex at both ends. Keep it on one line so
        # that no newline ends up inside the alternation.
        regex: (envoy_server_live|envoy_server_uptime|envoy_server_memory_allocated|envoy_server_memory_heap_size|envoy_cluster_membership_healthy|envoy_cluster_membership_total|envoy_cluster_upstream_cx_active|envoy_cluster_upstream_rq_total|envoy_cluster_upstream_cx_rx_bytes_total|envoy_cluster_upstream_cx_tx_bytes_total|envoy_http_downstream_cx_rx_bytes_total|envoy_http_downstream_cx_tx_bytes_total|envoy_cluster_upstream_rq_time_bucket|envoy_cluster_upstream_rq_xx|envoy_http_downstream_rq_total|envoy_http_downstream_cx_total|envoy_http_downstream_rq_time_bucket|envoy_listener_downstream_cx_active|envoy_tcp_downstream_cx_total|envoy_tcp_downstream_cx_rx_bytes_total|envoy_tcp_downstream_cx_tx_bytes_total|envoy_cluster_upstream_cx_total)
        action: keep
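
To check which series a keep rule like the one above would retain before applying it, the matching can be approximated offline. The following Python sketch is an illustration under stated assumptions, not part of the component: the helper names `is_kept` and `filter_samples` and the `SAMPLE` text are hypothetical, and Python's `re.fullmatch` stands in for Prometheus, which anchors relabel regexes at both ends.

```python
import re

# Metric names listed in the example metric_relabel_configs keep rule.
KEPT_METRICS = [
    "envoy_server_live", "envoy_server_uptime",
    "envoy_server_memory_allocated", "envoy_server_memory_heap_size",
    "envoy_cluster_membership_healthy", "envoy_cluster_membership_total",
    "envoy_cluster_upstream_cx_active", "envoy_cluster_upstream_rq_total",
    "envoy_cluster_upstream_cx_rx_bytes_total",
    "envoy_cluster_upstream_cx_tx_bytes_total",
    "envoy_http_downstream_cx_rx_bytes_total",
    "envoy_http_downstream_cx_tx_bytes_total",
    "envoy_cluster_upstream_rq_time_bucket", "envoy_cluster_upstream_rq_xx",
    "envoy_http_downstream_rq_total", "envoy_http_downstream_cx_total",
    "envoy_http_downstream_rq_time_bucket",
    "envoy_listener_downstream_cx_active", "envoy_tcp_downstream_cx_total",
    "envoy_tcp_downstream_cx_rx_bytes_total",
    "envoy_tcp_downstream_cx_tx_bytes_total",
    "envoy_cluster_upstream_cx_total",
]

# Same shape as the rule: one alternation group of literal metric names.
KEEP_PATTERN = re.compile("(" + "|".join(KEPT_METRICS) + ")")


def is_kept(metric_name: str) -> bool:
    """Return True if the keep rule would retain this metric name."""
    # fullmatch mimics the implicit ^...$ anchoring of relabel regexes.
    return KEEP_PATTERN.fullmatch(metric_name) is not None


def filter_samples(exposition_text: str) -> list:
    """Return the sample lines of a text exposition that the rule keeps."""
    kept = []
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first '{' or whitespace character.
        name = re.split(r"[{\s]", line, maxsplit=1)[0]
        if is_kept(name):
            kept.append(line)
    return kept


# Hypothetical sample of /stats/prometheus output, for illustration only.
SAMPLE = """\
# TYPE envoy_server_live gauge
envoy_server_live 1
envoy_cluster_upstream_rq_xx{envoy_response_code_class="2"} 42
envoy_server_hot_restart_epoch 0
"""

if __name__ == "__main__":
    for sample_line in filter_samples(SAMPLE):
        print(sample_line)
```

Because the match is anchored, a name that merely starts with a listed metric (for example, one with an extra suffix) is dropped, so every series you want must be listed explicitly.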

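Metric dashboards

Gateway with Inference Extension also provides matching Grafana dashboards. To view them, go to Operations > Prometheus Monitoring > Others for the cluster.

ACK Gateway GenAI: Shows the metrics of generative AI inference services in the current cluster.
Envoy Global: Monitors the gateway as a whole, mainly covering gateway resource usage, an overview of upstream and downstream connections, and endpoint health.
Envoy Clusters: A dashboard at the Envoy Cluster level. In Envoy, a Cluster represents a set of endpoints. In Gateway with Inference Extension, a Cluster usually corresponds to one routing target, such as the first target Service of the first rule in an HTTPRoute. This dashboard provides more detailed Cluster-level information.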