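This topic describes how to use ASM to define service level objectives (SLOs) for application services.
Prerequisites
- A cluster is added to an ASM instance, and the ASM instance is version 1.15.3 or later.
- An ingress gateway service is deployed.
- Automatic sidecar injection is enabled. For more information, see "Enable automatic injection in multiple ways".
Background information
A service level objective (SLO) is a target value or range of values for a service level. An SLO consists of one or more service level indicators (SLIs). You can define SLOs manually based on Prometheus metrics, but the process is relatively tedious. ASM can generate SLOs together with the matching alert rules, and the ASM console simplifies the configuration. For more information, see "SLO overview" and "SLO CRD field descriptions".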
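Preparations
Deploy the httpbin application in the ACK cluster, and configure the corresponding virtual service and gateway.
Deploy the httpbin application in the ACK cluster
- Create a file named httpbin.yaml with the following content.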
##################################################################################################
# httpbin service
##################################################################################################
apiVersion: v1
kind: ServiceAccount
metadata:
  name: httpbin
---
apiVersion: v1
kind: Service
metadata:
  name: httpbin
  labels:
    app: httpbin
    service: httpbin
spec:
  ports:
  - name: http
    port: 8000
    targetPort: 80
  selector:
    app: httpbin
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpbin
      version: v1
  template:
    metadata:
      labels:
        app: httpbin
        version: v1
    spec:
      serviceAccountName: httpbin
      containers:
      - image: docker.io/kennethreitz/httpbin
        imagePullPolicy: IfNotPresent
        name: httpbin
        ports:
        - containerPort: 80
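- Use kubectl to connect to the ACK cluster so that you can manage the cluster and its applications. For more information, see "Connect to a cluster by using kubectl".
- Run the following command to deploy httpbin in the ACK cluster.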
kubectl apply -f httpbin.yaml
Configure the virtual service and gateway in ASM
- Create a file named httpbin-gateway.yaml with the following content.
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: httpbin-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - "*"
  gateways:
  - httpbin-gateway
  http:
  - route:
    - destination:
        host: httpbin
        port:
          number: 8000
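- Use kubectl to connect to the ASM instance so that you can manage it. For more information, see "Connect to an ASM instance by using kubectl".
- Run the following command to deploy the virtual service and gateway.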
kubectl apply -f httpbin-gateway.yaml
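- In the address bar of your browser, enter http://{IP address of the ingress gateway service}. If the httpbin page is displayed, the httpbin application is deployed.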
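The browser check in the last step can also be scripted. The following is a minimal Python sketch, assuming the ingress gateway serves httpbin at the root path over plain HTTP. The helper names `gateway_url` and `is_reachable` are hypothetical, and `192.0.2.10` is a documentation placeholder, not a real gateway address.

```python
import urllib.request


def gateway_url(ip: str, path: str = "/") -> str:
    """Build the URL that the browser step asks you to open."""
    return f"http://{ip}{path}"


def is_reachable(url: str, opener=urllib.request.urlopen, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200.

    `opener` is injectable so the check can be exercised without a network.
    urlopen raises HTTPError/URLError (both subclasses of OSError) for
    error status codes and connection failures, which are reported as False.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        return False


# Example call (replace the placeholder with your ingress gateway IP address):
#   is_reachable(gateway_url("192.0.2.10"))
```

A result of `False` only says that the page did not answer with HTTP 200 within the timeout. It does not distinguish a missing gateway from a misconfigured virtual service.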
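Define the SLO configuration
This topic generates a service availability SLO for the httpbin service in the default namespace. The target value is 99%, the duration is 30 days, and alerts are configured at two levels: Page and Ticket.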
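To make the 99% target over 30 days more concrete, the arithmetic can be sketched in Python. The function name `error_budget` is hypothetical. The time equivalent assumes a complete outage under steady traffic; the SLO in this topic is measured on the ratio of failed requests, not on wall-clock downtime.

```python
from datetime import timedelta


def error_budget(objective_percent: float, period: timedelta) -> timedelta:
    """Time a service could fail every request without missing the objective.

    Assumption: traffic is uniform, so a 1% share of requests corresponds
    to a 1% share of the period.
    """
    allowed_error_ratio = 1 - objective_percent / 100
    return period * allowed_error_ratio


budget = error_budget(99, timedelta(days=30))
# 1% of 30 days (720 hours) is about 7.2 hours.
print(f"Error budget: {budget.total_seconds() / 3600:.1f} hours")
```

In request terms, the same objective allows at most 1 failed request out of every 100 over the 30-day period.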
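- Log on to the ASM console. In the left-side navigation pane, choose .
- On the Mesh Management page, click the name of the target instance. Then, in the left-side navigation pane, choose .
- In the upper part of the SLO configuration page, select the namespace in which the target service resides (default in this topic). Then, click Create to the right of the target service httpbin.
- In the Basic Information section of the Create page, set the duration to 30 days.
- Click SLO Rule, set the name to asm-slo and the target value to 99, and select availability as the plugin type. Turn on the switch that enables alert rules, set the alert rule name to asm-alert, and then turn on the switches that enable the critical-level and warning-level alert rules.
- Optional: In the lower part of the page, click Preview to view the configuration. After you confirm that the configuration is correct, click OK. For more information about the fields in the configuration file, see "SLO CRD field descriptions".
- After the configuration is complete, click Create in the lower part of the page.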
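Automatically generated Prometheus rules
After the SLO is configured, go to the SLO configuration page and click View Prometheus Rules in the Actions column to the right of the target service httpbin to view the generated rules.

The following is an example of the generated Prometheus rules.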
groups:
- name: asm-slo-sli-recordings-httpbin-asm-slo
  rules:
  - record: slo:sli_error:ratio_rate5m
    expr: "(\n(\n sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[5m])) \n / \n (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[5m])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 5m
  - record: slo:sli_error:ratio_rate30m
    expr: "(\n(\n sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[30m])) \n / \n (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[30m])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 30m
  - record: slo:sli_error:ratio_rate1h
    expr: "(\n(\n sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[1h])) \n / \n (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[1h])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 1h
  - record: slo:sli_error:ratio_rate2h
    expr: "(\n(\n sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[2h])) \n / \n (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[2h])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 2h
  - record: slo:sli_error:ratio_rate6h
    expr: "(\n(\n sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[6h])) \n / \n (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[6h])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 6h
  - record: slo:sli_error:ratio_rate1d
    expr: "(\n(\n sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[1d])) \n / \n (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[1d])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 1d
  - record: slo:sli_error:ratio_rate3d
    expr: "(\n(\n sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[3d])) \n / \n (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[3d])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 3d
  - record: slo:sli_error:ratio_rate30d
    expr: |
      sum_over_time(slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}[30d])
      / ignoring (slo_window)
      count_over_time(slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}[30d])
    labels:
      slo_window: 30d
- name: asm-slo-meta-recordings-httpbin-asm-slo
  rules:
  - record: slo:objective:ratio
    expr: vector(0.99)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:error_budget:ratio
    expr: vector(1-0.99)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:time_period:days
    expr: vector(30)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:current_burn_rate:ratio
    expr: |
      slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
      / on(slo_id, asm_slo, slo_service) group_left
      slo:error_budget:ratio{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:period_burn_rate:ratio
    expr: |
      slo:sli_error:ratio_rate30d{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
      / on(slo_id, asm_slo, slo_service) group_left
      slo:error_budget:ratio{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:period_error_budget_remaining:ratio
    expr: 1 - slo:period_burn_rate:ratio{asm_slo="asm-slo", slo_id="httpbin-asm-slo",
      slo_service="httpbin"}
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: asm_slo_info
    expr: vector(1)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_mode: cli-gen-prom
      slo_objective: "99"
      slo_service: httpbin
      slo_spec: prometheus/v1
      slo_version: dev
- name: asm-slo-alerts-httpbin-asm-slo
  rules:
  - alert: asm-alert
    expr: |
      (
          (slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (14.4 * 0.01))
          and ignoring (slo_window)
          (slo:sli_error:ratio_rate1h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (14.4 * 0.01))
      )
      or ignoring (slo_window)
      (
          (slo:sli_error:ratio_rate30m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (6 * 0.01))
          and ignoring (slo_window)
          (slo:sli_error:ratio_rate6h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (6 * 0.01))
      )
    labels:
      slo_severity: page
    annotations:
      summary: '{{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget burn
        rate is over expected.'
      title: (page) {{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget burn
        rate is too fast.
  - alert: asm-alert
    expr: |
      (
          (slo:sli_error:ratio_rate2h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (3 * 0.01))
          and ignoring (slo_window)
          (slo:sli_error:ratio_rate1d{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (3 * 0.01))
      )
      or ignoring (slo_window)
      (
          (slo:sli_error:ratio_rate6h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (1 * 0.01))
          and ignoring (slo_window)
          (slo:sli_error:ratio_rate3d{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (1 * 0.01))
      )
    labels:
      slo_severity: ticket
    annotations:
      summary: '{{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget burn
        rate is over expected.'
      title: (ticket) {{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget
        burn rate is too fast.
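The recording rules above compute the SLI as the share of requests whose response code matches (5..|429), and fall back to 0 when there is no traffic or no failed request. The same calculation can be sketched in Python over a plain list of status codes. The function name `sli_error_ratio` is hypothetical and is not part of ASM or Prometheus; the sketch counts requests directly instead of using per-second rates.

```python
import re

# Same pattern as the response_code matcher in the generated rules.
# Prometheus anchors label regexes, so fullmatch is used here.
ERROR_CODE = re.compile(r"(5..|429)")


def sli_error_ratio(response_codes) -> float:
    """Share of requests counted as errors (5xx or 429).

    Returns 0.0 for an empty window, mirroring the
    `OR on() vector(0)` fallback in the recording rules.
    """
    total = len(response_codes)
    if total == 0:
        return 0.0
    errors = sum(1 for code in response_codes if ERROR_CODE.fullmatch(str(code)))
    return errors / total


# 97 successful requests and 3 failed ones in the window.
window = [200] * 97 + [500, 503, 429]
print(sli_error_ratio(window))
```

Note that other 4xx codes such as 404 do not count against the SLO under this pattern; only 429 and the 5xx range do.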
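You can import the generated Prometheus rules into Prometheus to execute the SLO. For more information, see "Import the generated rules into Prometheus to execute the SLO".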
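The alert expressions in the generated rules compare error ratios against thresholds such as 14.4 * 0.01, where 0.01 is the error budget ratio (1 - 0.99). The following Python sketch reproduces those thresholds and shows how much of the 30-day budget would be used if the error ratio stayed at the threshold for the longer window of each pair. The factors and windows are read from the rules in this topic; the interpretation as a multi-window burn-rate scheme is an assumption, and the names `threshold` and `budget_consumed` are hypothetical.

```python
OBJECTIVE = 0.99
PERIOD_HOURS = 30 * 24          # 30-day SLO period
ERROR_BUDGET = 1 - OBJECTIVE    # allowed error ratio, about 0.01

# (severity, short window, long window in hours, burn-rate factor),
# as they appear in the generated alert expressions.
ALERT_WINDOWS = [
    ("page", "5m", 1, 14.4),
    ("page", "30m", 6, 6),
    ("ticket", "2h", 24, 3),
    ("ticket", "6h", 72, 1),
]


def threshold(factor: float) -> float:
    """Error ratio above which the alert condition holds."""
    return factor * ERROR_BUDGET


def budget_consumed(factor: float, long_window_hours: float) -> float:
    """Fraction of the period's error budget used if the error ratio
    stays at the threshold for the whole long window."""
    return factor * long_window_hours / PERIOD_HOURS


for severity, short, long_hours, factor in ALERT_WINDOWS:
    print(
        f"{severity:6s} {short:>3s}/{long_hours}h: "
        f"error ratio > {threshold(factor):.3f}, "
        f"budget used in long window = {budget_consumed(factor, long_hours):.0%}"
    )
```

Each alert requires both the short and the long window to exceed the threshold, so a brief spike alone or an error rate that has already recovered does not satisfy the condition.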