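To manage and monitor the service level of an application, you can configure service level objectives (SLOs) and the corresponding alert rules in the ASM console so that the application keeps running at the expected service level. When the measured service level crosses a preset alert threshold, ASM promptly raises alerts of different severity levels based on how severe the failure is. This makes service level management more efficient and shortens response time.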
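Prerequisites
A cluster is added to an ASM instance, and the ASM instance runs version 1.15.3 or later.
Step 1: Deploy the sample httpbin application
Create a file named httpbin.yaml with the following content.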
```yaml
##################################################################################################
# httpbin service
##################################################################################################
apiVersion: v1
kind: ServiceAccount
metadata:
  name: httpbin
---
apiVersion: v1
kind: Service
metadata:
  name: httpbin
  labels:
    app: httpbin
    service: httpbin
spec:
  ports:
  - name: http
    port: 8000
    targetPort: 80
  selector:
    app: httpbin
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpbin
      version: v1
  template:
    metadata:
      labels:
        app: httpbin
        version: v1
    spec:
      serviceAccountName: httpbin
      containers:
      - image: docker.io/kennethreitz/httpbin
        imagePullPolicy: IfNotPresent
        name: httpbin
        ports:
        - containerPort: 80
```
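Use kubectl to connect to the ACK cluster, and run the following command to deploy httpbin in the cluster.
For more information about how to connect to an ACK cluster with kubectl, see Connect to a cluster by using kubectl.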
kubectl apply -f httpbin.yaml
Step 2: Configure the virtual service and gateway rules
Create a file named httpbin-gateway.yaml with the following content.
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: httpbin-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - "*"
  gateways:
  - httpbin-gateway
  http:
  - route:
    - destination:
        host: httpbin
        port:
          number: 8000
```
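Use kubectl to connect to the ASM instance, and run the following command to deploy the virtual service and gateway rules.
For more information about how to connect to an ASM instance with kubectl, see Use the control-plane kubectl to access Istio resources.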
kubectl apply -f httpbin-gateway.yaml
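In the address bar of your browser, enter http://{IP address of the ingress gateway}. For more information about how to obtain the gateway IP address, see Obtain the ingress gateway address. If the httpbin page is displayed, the httpbin application is deployed successfully.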
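If you prefer to run the same check from a terminal instead of a browser, the following minimal Python sketch (standard library only) sends a GET request to the gateway root path and reports whether it answered with HTTP 200. The function name httpbin_reachable is hypothetical, and the address you pass in is a placeholder for your own ingress gateway IP.

```python
import urllib.request


def httpbin_reachable(gateway_address: str, timeout: float = 5.0) -> bool:
    """Return True if GET http://<gateway_address>/ answers with HTTP 200.

    gateway_address is the ingress gateway IP (optionally host:port), i.e.
    the same value you would type into the browser address bar.
    """
    url = f"http://{gateway_address}/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError/HTTPError (both OSError subclasses), refused
        # connections and timeouts all count as "not reachable".
        return False


# Usage (replace with your gateway IP):
# print(httpbin_reachable("203.0.113.10"))
```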
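Step 3: Define the SLO configuration
In this example, a service availability SLO is created for the httpbin service in the default namespace. The objective is 99%, the duration (compliance period) is 30 days, and alerts are configured at two severity levels: Page and Ticket. For more information about SLO concepts, see Overview of SLOs.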
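A 99% objective leaves an error budget of 1 - 0.99 = 1% of requests, which is the same value the generated rule slo:error_budget:ratio records. As a rough intuition only (it assumes uniform traffic and a complete outage), that ratio over a 30-day period corresponds to 0.01 × 30 × 24 × 60 = 432 minutes, or 7.2 hours. The function names in this sketch are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def error_budget_ratio(objective_pct: float) -> float:
    """Fraction of requests allowed to fail, e.g. 99 -> 0.01."""
    return 1 - objective_pct / 100


def error_budget_minutes(objective_pct: float, period_days: int) -> float:
    """Full-outage minutes the budget would allow, assuming uniform traffic."""
    return error_budget_ratio(objective_pct) * period_days * MINUTES_PER_DAY


print(f"error budget ratio: {error_budget_ratio(99):.2%}")
print(f"budget over 30 days: {error_budget_minutes(99, 30):.0f} minutes")
# prints:
# error budget ratio: 1.00%
# budget over 30 days: 432 minutes
```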
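Log on to the ASM console. In the left-side navigation pane, choose Service Mesh > Mesh Management.
On the Mesh Management page, click the name of the target ASM instance. In the left-side navigation pane, choose Observability Management Center > SLO Configuration.
At the top of the SLO Configuration page, set Namespace to the namespace of the target service (default in this example), and then click Create on the right side of the httpbin service.
In the Basic Information section of the Create page, set Duration to 30 days.
Click SLO Rules. Set Name to asm-slo, set Plug-in Type to availability, and set Objective to 99. Turn on Enable Alert Rules, set the alert rule name to asm-alert, and then turn on both Enable Critical-Level Alert Rules (page) and Enable Warning-Level Alert Rules (ticket).
Optional: At the bottom of the page, click Preview to review the configuration. After you confirm that it is correct, click Confirm.
For descriptions of the fields in the configuration file, see SLO CRD field descriptions.
After the configuration is complete, click Create at the bottom of the page.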
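Step 4: Automatically generate Prometheus rules
After the SLO is configured, click View Prometheus Rules on the right side of the httpbin service on the SLO Configuration page to view the generated rules.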
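The recording rules in the generated example reduce to a few ratios: the SLI error ratio per window (requests with response code 5xx or 429 divided by all requests, falling back to 0 when there is no traffic), the 30-day ratio as the mean of the recorded 5m samples, the burn rate (error ratio divided by the error budget), and the remaining budget (1 minus the 30-day burn rate). The following Python sketch mirrors that arithmetic on plain numbers for illustration; the function names and the ERROR_BUDGET constant are hypothetical, and it assumes the 99% objective from step 3.

```python
from typing import Sequence

# Assumption: 99% objective, so the budget is 1 - 0.99 (slo:error_budget:ratio).
ERROR_BUDGET = 1 - 0.99


def sli_error_ratio(error_requests: float, total_requests: float) -> float:
    """Like slo:sli_error:ratio_rate<window>: failed / total requests.

    "Failed" means response_code matches (5..|429). With no traffic the
    PromQL `OR on() vector(0)` fallback yields 0, so this returns 0 too.
    """
    if total_requests <= 0:
        return 0.0
    return error_requests / total_requests


def period_error_ratio(samples_5m: Sequence[float]) -> float:
    """Like slo:sli_error:ratio_rate30d: mean of the recorded 5m samples."""
    return sum(samples_5m) / len(samples_5m)


def burn_rate(error_ratio: float) -> float:
    """Like slo:current_burn_rate:ratio: error ratio / error budget."""
    return error_ratio / ERROR_BUDGET


def error_budget_remaining(period_ratio: float) -> float:
    """Like slo:period_error_budget_remaining:ratio: 1 - period burn rate."""
    return 1 - burn_rate(period_ratio)


ratio = sli_error_ratio(error_requests=3, total_requests=100)
print(f"5m error ratio {ratio:.2%}, burn rate {burn_rate(ratio):.1f}x")
# prints: 5m error ratio 3.00%, burn rate 3.0x

period = period_error_ratio([0.0, 0.004, 0.002])
print(f"error budget remaining: {error_budget_remaining(period):.0%}")
# prints: error budget remaining: 80%
```

A burn rate of 1 means the budget would be used up exactly at the end of the 30-day period; a burn rate of 3 would exhaust it in roughly 10 days.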
Example of the generated Prometheus rules:
```yaml
groups:
- name: asm-slo-sli-recordings-httpbin-asm-slo
  rules:
  - record: slo:sli_error:ratio_rate5m
    expr: "(\n(\n sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[5m])) \n / \n (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[5m])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 5m
  - record: slo:sli_error:ratio_rate30m
    expr: "(\n(\n sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[30m])) \n / \n (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[30m])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 30m
  - record: slo:sli_error:ratio_rate1h
    expr: "(\n(\n sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[1h])) \n / \n (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[1h])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 1h
  - record: slo:sli_error:ratio_rate2h
    expr: "(\n(\n sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[2h])) \n / \n (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[2h])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 2h
  - record: slo:sli_error:ratio_rate6h
    expr: "(\n(\n sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[6h])) \n / \n (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[6h])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 6h
  - record: slo:sli_error:ratio_rate1d
    expr: "(\n(\n sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[1d])) \n / \n (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[1d])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 1d
  - record: slo:sli_error:ratio_rate3d
    expr: "(\n(\n sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\",response_code=~\"(5..|429)\"
      }[3d])) \n / \n (sum(rate(istio_requests_total{ destination_service_name=\"httpbin\",destination_service_namespace=\"default\"
      }[3d])) > 0)\n) OR on() vector(0)\n)"
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
      slo_window: 3d
  - record: slo:sli_error:ratio_rate30d
    expr: |
      sum_over_time(slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}[30d])
      / ignoring (slo_window)
      count_over_time(slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}[30d])
    labels:
      slo_window: 30d
- name: asm-slo-meta-recordings-httpbin-asm-slo
  rules:
  - record: slo:objective:ratio
    expr: vector(0.99)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:error_budget:ratio
    expr: vector(1-0.99)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:time_period:days
    expr: vector(30)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:current_burn_rate:ratio
    expr: |
      slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
      / on(slo_id, asm_slo, slo_service) group_left
      slo:error_budget:ratio{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:period_burn_rate:ratio
    expr: |
      slo:sli_error:ratio_rate30d{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
      / on(slo_id, asm_slo, slo_service) group_left
      slo:error_budget:ratio{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"}
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: slo:period_error_budget_remaining:ratio
    expr: 1 - slo:period_burn_rate:ratio{asm_slo="asm-slo", slo_id="httpbin-asm-slo",
      slo_service="httpbin"}
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_service: httpbin
  - record: asm_slo_info
    expr: vector(1)
    labels:
      asm_slo: asm-slo
      slo_id: httpbin-asm-slo
      slo_mode: cli-gen-prom
      slo_objective: "99"
      slo_service: httpbin
      slo_spec: prometheus/v1
      slo_version: dev
- name: asm-slo-alerts-httpbin-asm-slo
  rules:
  - alert: asm-alert
    expr: |
      (
        (slo:sli_error:ratio_rate5m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (14.4 * 0.01))
        and ignoring (slo_window)
        (slo:sli_error:ratio_rate1h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (14.4 * 0.01))
      )
      or ignoring (slo_window)
      (
        (slo:sli_error:ratio_rate30m{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (6 * 0.01))
        and ignoring (slo_window)
        (slo:sli_error:ratio_rate6h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (6 * 0.01))
      )
    labels:
      slo_severity: page
    annotations:
      summary: '{{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget burn
        rate is over expected.'
      title: (page) {{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget burn
        rate is too fast.
  - alert: asm-alert
    expr: |
      (
        (slo:sli_error:ratio_rate2h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (3 * 0.01))
        and ignoring (slo_window)
        (slo:sli_error:ratio_rate1d{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (3 * 0.01))
      )
      or ignoring (slo_window)
      (
        (slo:sli_error:ratio_rate6h{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (1 * 0.01))
        and ignoring (slo_window)
        (slo:sli_error:ratio_rate3d{asm_slo="asm-slo", slo_id="httpbin-asm-slo", slo_service="httpbin"} > (1 * 0.01))
      )
    labels:
      slo_severity: ticket
    annotations:
      summary: '{{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget burn
        rate is over expected.'
      title: (ticket) {{$labels.slo_service}} {{$labels.asm_slo}} SLO error budget
        burn rate is too fast.
```
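Each asm-alert rule above ORs two conditions, and each condition fires only when both a short window and a long window exceed factor × 0.01 (the `and ignoring (slo_window)` lets series recorded for different windows match). This matches the widely used multiwindow, multi-burn-rate alerting pattern: the long window filters out brief spikes, and the short window lets the alert resolve quickly once errors stop. The following Python sketch re-evaluates that logic on hypothetical per-window error ratios and shows what share of the 30-day budget each condition represents (factor × long window ÷ 720 hours). WindowPair, fires, budget_consumed, and the sample ratios are all illustrative names and values, not part of ASM.

```python
from dataclasses import dataclass
from typing import Dict, List

PERIOD_HOURS = 30 * 24  # 30-day SLO period
BUDGET = 0.01           # the rules use the literal 0.01, i.e. 1 - 0.99


@dataclass(frozen=True)
class WindowPair:
    short: str         # short window, e.g. "5m"
    long: str          # long window, e.g. "1h"
    factor: float      # burn-rate factor in the rule, e.g. 14.4
    long_hours: float  # length of the long window in hours


# Factors and windows copied from the asm-alert rules above.
PAGE = [WindowPair("5m", "1h", 14.4, 1), WindowPair("30m", "6h", 6, 6)]
TICKET = [WindowPair("2h", "1d", 3, 24), WindowPair("6h", "3d", 1, 72)]


def fires(pairs: List[WindowPair], ratios: Dict[str, float]) -> bool:
    """True if, for any pair, BOTH windows exceed factor * budget."""
    return any(
        ratios[p.short] > p.factor * BUDGET and ratios[p.long] > p.factor * BUDGET
        for p in pairs
    )


def budget_consumed(p: WindowPair) -> float:
    """Share of the 30-day budget burned if the rate holds for the long window."""
    return p.factor * p.long_hours / PERIOD_HOURS


# Hypothetical error ratios: a short spike on top of a slow, sustained burn.
ratios = {"5m": 0.30, "1h": 0.05, "30m": 0.10, "6h": 0.02,
          "2h": 0.04, "1d": 0.012, "3d": 0.011}
print("page:", fires(PAGE, ratios))      # page: False
print("ticket:", fires(TICKET, ratios))  # ticket: True

for p in PAGE + TICKET:
    print(f"{p.short}/{p.long} x{p.factor}: {budget_consumed(p):.0%} of budget")
```

With these numbers the 5m spike alone does not page, because the 1h ratio stays below 14.4%, while the sustained 6h/3d burn above 1% raises a ticket. The page conditions correspond to burning 2% and 5% of the budget, and the ticket conditions to 10% each.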
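What to do next
You can import the generated Prometheus rules into Prometheus to evaluate the SLO, and use Grafana to view SLO-related metrics. For more information, see Import the generated rules into Prometheus to execute the SLO and View the SLO in Grafana.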
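Once the rules are loaded, the recorded series can be read back through the standard Prometheus HTTP API (GET /api/v1/query for an instant query). The following Python sketch queries slo:period_error_budget_remaining:ratio and maps each slo_id to its value. It assumes a Prometheus server reachable without authentication at a URL you supply; a managed Prometheus service may use a different endpoint or require credentials. The function names and the example URL are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict


def build_query_url(prometheus_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return prometheus_url.rstrip("/") + "/api/v1/query?" + query


def parse_vector(body: str) -> Dict[str, float]:
    """Map slo_id -> sample value from an instant-vector response body."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    # Each result looks like {"metric": {...}, "value": [<unix_ts>, "<value>"]}.
    return {
        sample["metric"].get("slo_id", ""): float(sample["value"][1])
        for sample in payload["data"]["result"]
    }


def error_budget_remaining(prometheus_url: str, timeout: float = 10.0) -> Dict[str, float]:
    """Fetch the remaining 30-day error budget for every SLO.

    HTTP errors (for example a 400 for a bad query) raise urllib.error.HTTPError.
    """
    url = build_query_url(prometheus_url, "slo:period_error_budget_remaining:ratio")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_vector(resp.read().decode("utf-8"))


# Usage (hypothetical address):
# print(error_budget_remaining("http://prometheus.example:9090"))
```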