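Service Mesh (ASM) lets you configure circuit breaking rules for traffic that goes to external services. When ASM detects that an external service is failing, for example by continuously returning errors or responding slowly, the circuit breaker blocks subsequent outbound requests for a period of time and immediately returns a preset error response. The Sidecar proxy does this automatically, with no changes to application code. This topic uses access to httpbin.org as an example to show how to configure circuit breaking rules for an application that accesses an external service.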
Prerequisites
Procedure
Step 1: Deploy a client application for testing
Deploy the sleep application.
kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sleep
---
apiVersion: v1
kind: Service
metadata:
  name: sleep
  labels:
    app: sleep
    service: sleep
spec:
  ports:
  - port: 80
    name: http
  selector:
    app: sleep
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sleep
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sleep
  template:
    metadata:
      labels:
        app: sleep
    spec:
      terminationGracePeriodSeconds: 0
      serviceAccountName: sleep
      containers:
      - name: sleep
        image: registry.cn-hangzhou.aliyuncs.com/acs/curl:8.1.2
        command: ["/bin/sleep", "infinity"]
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - mountPath: /etc/sleep/tls
          name: secret-volume
      volumes:
      - name: secret-volume
        secret:
          secretName: sleep-secret
          optional: true
EOF
Step 2: Verify access behavior without circuit breaking
Send requests to the /status/503 path of the external service httpbin.org. This endpoint always returns HTTP 503.
kubectl exec -it deploy/sleep -- sh -c \
  "for i in \$(seq 1 100); do \
    curl -s -o /dev/null -w 'Time: %{time_total}s, Status: %{http_code}\n' \
      httpbin.org/status/503; \
    sleep 0.2; \
  done"
Expected output:
Time: 0.524067s, Status: 503
Time: 0.947159s, Status: 503
Time: 0.459089s, Status: 503
Time: 0.264017s, Status: 503
Time: 0.661447s, Status: 503
Time: 0.484715s, Status: 503
Time: 0.952842s, Status: 503
...
Every request reaches httpbin.org and returns status code 503.
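To compare the runs in this topic without reading every line, you can parse the output of the curl loop. The following is a minimal Python sketch that is not part of ASM; it assumes the exact `-w 'Time: %{time_total}s, Status: %{http_code}\n'` format used above, and the helper names `parse_curl_output`, `summarize`, and `first_index_of` are hypothetical. Because it scans with `re.finditer`, it also works when several records have been pasted onto a single line.

```python
import re
from collections import Counter
from typing import Optional

# Matches one record written by: curl -w 'Time: %{time_total}s, Status: %{http_code}\n'
RECORD = re.compile(r"Time: (?P<time>\d+(?:\.\d+)?)s, Status: (?P<status>\d{3})")


def parse_curl_output(text: str) -> list[tuple[float, int]]:
    """Return (time_total in seconds, HTTP status) for every record found in text."""
    return [(float(m["time"]), int(m["status"])) for m in RECORD.finditer(text)]


def summarize(samples: list[tuple[float, int]]) -> tuple[Counter, float]:
    """Count status codes and add up the total time spent waiting for responses."""
    statuses = Counter(status for _, status in samples)
    total_time = sum(t for t, _ in samples)
    return statuses, total_time


def first_index_of(samples: list[tuple[float, int]], status: int) -> Optional[int]:
    """1-based position of the first request that returned `status`, or None."""
    for i, (_, code) in enumerate(samples, start=1):
        if code == status:
            return i
    return None


if __name__ == "__main__":
    output = """Time: 0.524067s, Status: 503
Time: 0.947159s, Status: 503
Time: 0.459089s, Status: 503"""
    samples = parse_curl_output(output)
    statuses, total = summarize(samples)
    print(dict(statuses), round(total, 6), first_index_of(samples, 499))
```

For example, saving the loop output from Step 3 to a file and passing it through `parse_curl_output` and `first_index_of(samples, 499)` gives the position of the first short-circuited request.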
Send requests to the /delay path of httpbin.org with a delay of 2s (/delay/2). Every response from this endpoint consistently takes more than 2s.
kubectl exec -it deploy/sleep -- sh -c \
  "for i in \$(seq 1 100); do \
    curl -s -o /dev/null -w 'Time: %{time_total}s, Status: %{http_code}\n' \
      httpbin.org/delay/2; \
    sleep 0.2; \
  done"
Expected output:
Time: 2.467788s, Status: 200
Time: 4.651051s, Status: 200
Time: 3.222184s, Status: 200
Time: 2.592945s, Status: 200
Time: 2.473543s, Status: 200
Time: 2.464152s, Status: 200
...
Every request takes more than 2s to complete.
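These two examples show that without a circuit breaking rule, the client's Sidecar proxy forwards every request even when the external service keeps returning errors or responding slowly, which continuously consumes network and compute resources.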
Step 3: Deploy a circuit breaking rule and verify again
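In this step, you configure a circuit breaking rule for the sleep application. In the previous step, all 100 requests to httpbin.org were forwarded. With this rule, if 60% or more of the requests from sleep to httpbin.org fail within a 10-second window, or if the number of slow requests (response time above 1s) reaches 10, the breaker opens for the client and returns status code 499 for the next 90 seconds (break_duration).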
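The two trip conditions described above can be written down as a small check. The following Python sketch is a simplified, hypothetical model for reasoning about the thresholds, not ASM's implementation: it assumes that an "error" is any 5xx response, that a request is "slow" when its response time exceeds slow_request_rt, and that no decision is made until min_request_amount requests have been seen. How ASM groups requests into the window_size window is not modeled; the function receives the samples of one window. The names `Thresholds` and `should_break` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Values taken from the breaker_config used in this step; durations in seconds.
    slow_request_rt: float = 1.0
    max_slow_requests: int = 10
    min_request_amount: int = 3
    error_percent: float = 60.0


def should_break(window: list[tuple[float, int]], t: Thresholds = Thresholds()) -> bool:
    """Decide whether the (response_time, status) samples of one window trip the breaker.

    Hypothetical model: 5xx counts as an error, a response time above
    slow_request_rt counts as slow, and nothing trips before
    min_request_amount requests have been observed.
    """
    total = len(window)
    if total < t.min_request_amount:
        return False
    errors = sum(1 for _, status in window if status >= 500)
    slow = sum(1 for rt, _ in window if rt > t.slow_request_rt)
    error_ratio = 100.0 * errors / total
    return error_ratio >= t.error_percent or slow >= t.max_slow_requests


if __name__ == "__main__":
    print(should_break([(0.5, 503)] * 3))  # three failed requests in a row
    print(should_break([(2.5, 200)] * 9))  # nine slow but successful requests
```

Under these assumptions, three consecutive 503 responses are enough to trip the breaker, which matches the 499 responses that start at the fourth request in the verification below.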
Deploy the ServiceEntry and the circuit breaking rule.
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1
kind: ServiceEntry
metadata:
  name: httpbin-org
spec:
  exportTo:
  - '*'
  hosts:
  - httpbin.org
  location: MESH_EXTERNAL
  ports:
  - name: http
    number: 80
    protocol: HTTP
  resolution: DNS
---
apiVersion: istio.alibabacloud.com/v1
kind: ASMCircuitBreaker
metadata:
  name: httpbin-org-breaker
spec:
  workloadSelector:
    labels:
      app: sleep
  applyToTraffic: sidecar_outbound
  configs:
  - target_services:
    - kind: ServiceEntry
      name: httpbin-org
      port: 80
    breaker_config:
      slow_request_rt: 1s
      break_duration: 90s
      window_size: 10s
      max_slow_requests: 10
      min_request_amount: 3
      error_percent:
        value: 60
      custom_response:
        header_to_add:
          x-envoy-circuitbreak: "true"
        body: "hello, break!"
        status_code: 499
EOF
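If you want to try different thresholds, it can be convenient to generate the ASMCircuitBreaker manifest instead of editing the YAML by hand. The following Python sketch is an illustrative helper: the function name `breaker_manifest` and its parameters are hypothetical, it only reproduces the fields shown in the example above, and it emits JSON, which `kubectl apply -f -` also accepts.

```python
import json


def breaker_manifest(
    name: str = "httpbin-org-breaker",
    service_entry: str = "httpbin-org",
    port: int = 80,
    slow_request_rt: str = "1s",
    break_duration: str = "90s",
    window_size: str = "10s",
    max_slow_requests: int = 10,
    min_request_amount: int = 3,
    error_percent: int = 60,
    status_code: int = 499,
) -> dict:
    """Build an ASMCircuitBreaker object with the same structure as the YAML example."""
    return {
        "apiVersion": "istio.alibabacloud.com/v1",
        "kind": "ASMCircuitBreaker",
        "metadata": {"name": name},
        "spec": {
            "workloadSelector": {"labels": {"app": "sleep"}},
            "applyToTraffic": "sidecar_outbound",
            "configs": [
                {
                    "target_services": [
                        {"kind": "ServiceEntry", "name": service_entry, "port": port}
                    ],
                    "breaker_config": {
                        "slow_request_rt": slow_request_rt,
                        "break_duration": break_duration,
                        "window_size": window_size,
                        "max_slow_requests": max_slow_requests,
                        "min_request_amount": min_request_amount,
                        "error_percent": {"value": error_percent},
                        "custom_response": {
                            "header_to_add": {"x-envoy-circuitbreak": "true"},
                            "body": "hello, break!",
                            "status_code": status_code,
                        },
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON so it can be piped into: kubectl apply -f -
    print(json.dumps(breaker_manifest(error_percent=50), indent=2))
```

Saved as, say, gen_breaker.py (a hypothetical file name), it could be applied with `python3 gen_breaker.py | kubectl apply -f -`.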
Verify the 503 requests again.
kubectl exec -it deploy/sleep -- sh -c \
  "for i in \$(seq 1 100); do \
    curl -s -o /dev/null -w 'Time: %{time_total}s, Status: %{http_code}\n' \
      httpbin.org/status/503; \
    sleep 0.2; \
  done"
Expected output:
Time: 1.033321s, Status: 503
Time: 0.492785s, Status: 503
Time: 0.786655s, Status: 503
Time: 0.009272s, Status: 499
Time: 0.009629s, Status: 499
Time: 0.010111s, Status: 499
...
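Starting from the fourth request, the status code changes to 499 and each request completes in about 10 ms, because the Sidecar returns the preset response instead of forwarding the request. Tripping after three failed requests is consistent with min_request_amount: 3.
Verify the delay requests again.
kubectl exec -it deploy/sleep -- sh -c \
  "for i in \$(seq 1 100); do \
    curl -s -o /dev/null -w 'Time: %{time_total}s, Status: %{http_code}\n' \
      httpbin.org/delay/2; \
    sleep 0.2; \
  done"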
Expected output:
Time: 3.293483s, Status: 200
Time: 2.851457s, Status: 200
Time: 2.772694s, Status: 200
Time: 4.012661s, Status: 200
Time: 2.505847s, Status: 200
Time: 4.203690s, Status: 200
Time: 4.063237s, Status: 200
Time: 2.514796s, Status: 200
Time: 2.783456s, Status: 200
Time: 2.303864s, Status: 200
Time: 0.009872s, Status: 499
Time: 0.009720s, Status: 499
Time: 0.009374s, Status: 499
...
After ten slow requests, the status code changes to 499.
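The results of these two tests show that the circuit breaking rule is in effect. It protects the application by avoiding unnecessary waiting and resource consumption.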
Usage recommendations
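Set reasonable thresholds: to avoid false trips, set slow_request_rt and error_percent based on historical P99 latency and error rates.
Combine with monitoring: use Prometheus to monitor the service and determine the circuit breaker state.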
Canary rollout: verify the rule in a test environment first, then roll it out to production.
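The first recommendation, deriving slow_request_rt and error_percent from historical P99 latency and error rate, can be sketched as follows. This Python example is illustrative only: it uses the nearest-rank definition of P99, treats any 5xx response as an error, and both the input format (a list of (latency in seconds, status) pairs, for example exported from your monitoring system) and the function name `p99_and_error_rate` are assumptions.

```python
def p99_and_error_rate(samples: list[tuple[float, int]]) -> tuple[float, float]:
    """Return (P99 latency in seconds, error rate in percent) for historical samples.

    P99 uses the nearest-rank method; any 5xx status counts as an error.
    """
    if not samples:
        raise ValueError("need at least one sample")
    latencies = sorted(rt for rt, _ in samples)
    n = len(latencies)
    rank = (99 * n + 99) // 100  # ceil(0.99 * n) in integer arithmetic, 1-based
    p99 = latencies[rank - 1]
    errors = sum(1 for _, status in samples if status >= 500)
    return p99, 100.0 * errors / n


if __name__ == "__main__":
    # 100 made-up samples: latencies 0.01s..1.00s, two of them failed with 503.
    history = [(i / 100, 503 if i in (50, 75) else 200) for i in range(1, 101)]
    p99, error_rate = p99_and_error_rate(history)
    print(f"P99={p99:.2f}s error_rate={error_rate:.1f}%")
```

The P99 value is a baseline for slow_request_rt and the error rate a baseline for error_percent; setting each threshold with some margin above its baseline keeps normal fluctuations from tripping the breaker.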
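Circuit breaking is an important part of traffic management. Use it together with timeouts, retries, and rate limiting to build a complete resilience strategy.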