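Configure global rate limiting with Gateway with Inference Extension

The Gateway with Inference Extension component supports enabling global rate limiting for a cluster, which keeps the system stable under high concurrency or abnormal traffic. This topic describes how to configure global rate limiting with the Gateway with Inference Extension component and the rate limiting scenarios that it supports.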

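Feature description

Rate limiting is a mechanism that restricts the number of requests sent to a server. It specifies the maximum number of requests that a client can send to the server within a given time period, typically expressed as a number of requests per unit of time, such as 300 requests per minute or 10 requests per second. After global rate limiting is enabled, the Gateway with Inference Extension component automatically deploys a global rate limit service. This service centrally manages and dynamically serves the global rate limiting policies and real-time traffic data. Gateway with Inference Extension interacts with the global rate limit service through a built-in rate limiting filter (such as Rate Limit Filter), obtains the preset thresholds in real time (for example, requests per second or concurrent connections), and limits the rate of incoming requests based on these policies.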
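To make the idea of "N requests per time window" concrete, the following is a minimal local sketch of a fixed-window counter in shell. It only illustrates the concept. The function name `allow_request` and the variables are hypothetical, and nothing here describes how the global rate limit service is actually implemented.

```shell
# Illustrative fixed-window counter; all names here are hypothetical.
LIMIT=3          # maximum requests allowed per window
WINDOW=3600      # window length in seconds
window_start=0   # start time (epoch seconds) of the current window
count=0          # requests counted in the current window

# allow_request NOW
# Returns 0 if a request arriving at epoch second NOW is allowed,
# and 1 if it exceeds the limit for the current window.
allow_request() {
  now=$1
  # Start a new window once the current one has expired.
  if [ $((now - window_start)) -ge "$WINDOW" ]; then
    window_start=$now
    count=0
  fi
  if [ "$count" -lt "$LIMIT" ]; then
    count=$((count + 1))
    return 0
  fi
  return 1
}
```

With these values, the first three calls inside one hour succeed and the fourth is rejected, which mirrors the 200/429 pattern in the scenarios later in this topic.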

Prerequisites

Gateway with Inference Extension is installed, and its version is 1.4.0 or later.
The steps in the preparations topic are completed.

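Procedure

Step 1: Enable global rate limiting

The rate limit service that is automatically deployed for global rate limiting depends on a Redis service as its global store. This topic uses a self-managed Redis service. Alternatively, you can use Tair (Redis OSS-compatible) to quickly create a Redis instance, and then update the relevant configuration in the ack-gateway-config ConfigMap in the envoy-gateway-system namespace. For configuration details, see Envoy Gateway.

Create redis-service.yaml.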

kind: Namespace
apiVersion: v1
metadata:
  name: redis-system
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
  namespace: redis-system
  labels:
    app: redis
spec:
  serviceName: "redis"
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - image: registry-cn-hangzhou.ack.aliyuncs.com/dev/redis:6.0.6-for-ack-gateway
          name: redis
          ports:
            - containerPort: 6379
          resources:
            limits:
              cpu: 1500m
              memory: 512Mi
            requests:
              cpu: 200m
              memory: 256Mi
---
apiVersion: v1
kind: Service
metadata:
  name: redis
  namespace: redis-system
  labels:
    app: redis
spec:
  ports:
    - name: redis
      port: 6379
      protocol: TCP
      targetPort: 6379
  selector:
    app: redis

Create enable-global-rate-limit.yaml.

apiVersion: v1
kind: ConfigMap
metadata:
  name: ack-gateway-config
  namespace: envoy-gateway-system
data:
  ack-gateway.yaml: |
    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: EnvoyGateway
    rateLimit:
      backend:
        type: Redis
        redis:
          url: redis.redis-system.svc.cluster.local:6379

Deploy the Redis service and enable global rate limiting.

kubectl apply -f redis-service.yaml
kubectl apply -f enable-global-rate-limit.yaml
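Before moving on, it can help to confirm that Redis is ready and to look at what is running in the gateway namespace. The sketch below wraps two standard kubectl commands in a hypothetical helper named `check_rate_limit_setup`. The 120-second timeout is an arbitrary choice, and this topic does not state the name of the rate limit service pod, so the second command simply lists all pods in envoy-gateway-system.

```shell
# check_rate_limit_setup (hypothetical helper)
# 1. Waits for the Redis StatefulSet from redis-service.yaml to become ready.
# 2. Lists the pods in the gateway namespace so that you can inspect them.
check_rate_limit_setup() {
  kubectl -n redis-system rollout status statefulset/redis --timeout=120s &&
  kubectl -n envoy-gateway-system get pods
}

# Usage:
#   check_rate_limit_setup
```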

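Step 2: Deploy an HTTPRoute resource

Create an HTTPRoute resource for the tests that follow. The rate limiting rules in later steps are applied to this resource.

Create httproute.yaml.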

---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: http-ratelimit
spec:
  parentRefs:
  - name: eg
  hostnames:
  - ratelimit.example
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - group: ""
      kind: Service
      name: backend
      port: 3000

Deploy the HTTPRoute resource.
```
kubectl apply -f httproute.yaml
```
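If you want to confirm that the Gateway accepted the route, one option is to read the `Accepted` condition from the route status. The helper name `route_accepted` is hypothetical. The command assumes the route was created in the current namespace and relies on the standard Gateway API status layout (`status.parents[].conditions[]`).

```shell
# route_accepted (hypothetical helper)
# Prints the status of the "Accepted" condition reported for http-ratelimit.
route_accepted() {
  kubectl get httproute http-ratelimit \
    -o jsonpath='{.status.parents[*].conditions[?(@.type=="Accepted")].status}'
}

# Usage:
#   route_accepted
```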

Obtain the public IP address of the Gateway.

export GATEWAY_HOST=$(kubectl get gateway/eg -o jsonpath='{.status.addresses[0].value}')
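The test commands in the next step silently target an empty host if the Gateway has no address yet. A small guard such as the following can catch that early. The helper name `require_gateway_host` and its messages are illustrative only.

```shell
# require_gateway_host (hypothetical helper)
# Fails with a message on stderr if GATEWAY_HOST is unset or empty,
# otherwise prints the address that the tests will use.
require_gateway_host() {
  if [ -z "${GATEWAY_HOST:-}" ]; then
    echo "GATEWAY_HOST is empty; the Gateway may not have an address yet" >&2
    return 1
  fi
  echo "Gateway address: $GATEWAY_HOST"
}

# Usage:
#   require_gateway_host || echo "re-run the export command above later"
```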

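Step 3: Scenario demonstrations

Rate limit a specific user

Configure a global rate limiting rule that allows only 3 requests per hour for requests whose x-user-id request header is one.

Create backendtrafficpolicy.yaml.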

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: policy-httproute
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: http-ratelimit
  rateLimit:
    type: Global
    global:
      rules:
      - clientSelectors:
        - headers:
          - name: x-user-id
            value: one
        limit:
          requests: 3
          unit: Hour

Deploy the rate limiting rule.

kubectl apply -f backendtrafficpolicy.yaml

Test how requests with the x-user-id: one request header are rate limited.

for i in {1..4}; do kubectl exec deployment/sleep -it -- curl -I --header "Host: ratelimit.example" --header "x-user-id: one" http://$GATEWAY_HOST/get ; sleep 1; done

Expected output:

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:47:49 GMT
content-length: 504
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 2
x-ratelimit-reset: 731

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:47:50 GMT
content-length: 504
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 1
x-ratelimit-reset: 730

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:47:52 GMT
content-length: 504
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 0
x-ratelimit-reset: 728

HTTP/1.1 429 Too Many Requests
x-envoy-ratelimited: true
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 0
x-ratelimit-reset: 727
date: Tue, 27 May 2025 07:47:52 GMT
transfer-encoding: chunked

The first 3 requests return 200 and the 4th request returns 429, which shows that the rule limits requests with the x-user-id: one request header.
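When you run more than a handful of requests, reading every header block becomes tedious. The following sketch tallies the status codes instead. The helper name `count_status` is hypothetical. It only parses text on standard input, so it works with any `curl -I` output.

```shell
# count_status (hypothetical helper)
# Reads `curl -I` output on stdin and prints one "<status code> <count>"
# line per HTTP status code seen, sorted by code.
count_status() {
  tr -d '\r' |
    awk '/^HTTP\// { n[$2]++ } END { for (c in n) print c, n[c] }' |
    sort
}

# Usage (same request as above, without -it because the output is piped):
#   for i in 1 2 3 4; do
#     kubectl exec deployment/sleep -- curl -sI --header "Host: ratelimit.example" \
#       --header "x-user-id: one" http://$GATEWAY_HOST/get
#   done | count_status
```

For the four responses shown above, the tally would be three 200 responses and one 429 response.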

Test how requests with the x-user-id: two request header are rate limited.

for i in {1..4}; do kubectl exec deployment/sleep -it -- curl -I --header "Host: ratelimit.example" --header "x-user-id: two" http://$GATEWAY_HOST/get ; sleep 1; done

Expected output:

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:50:11 GMT
content-length: 504

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:50:12 GMT
content-length: 504

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:50:14 GMT
content-length: 504

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:50:15 GMT
content-length: 504

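All 4 requests return 200, which shows that the rule does not limit requests with the x-user-id: two request header.

Rate limit each user separately, except the administrator

Update the global rate limiting rule so that requests whose x-user-id request header is admin are not limited, and requests with any other value of that header are limited to 3 requests per hour for each value.

Edit the rate limiting rule.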

kubectl edit BackendTrafficPolicy policy-httproute

Update the rate limiting rule with the following content.

...
  rateLimit:
    type: Global
    global:
      rules:
      - clientSelectors:
        - headers:
          - type: Distinct
            name: x-user-id
          - name: x-user-id
            value: admin
            invert: true
        limit:
          requests: 3
          unit: Hour

After you save and exit, the rate limiting rule takes effect immediately.
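The two header matchers in this rule work together: the `Distinct` matcher keeps a separate counter for each x-user-id value, and the matcher with `invert: true` excludes the value admin. The following sketch models that decision locally. The function name `bucket_for_user` and the key format are hypothetical and serve only to illustrate the expected behavior, not how the gateway stores its counters.

```shell
# bucket_for_user USER (hypothetical helper)
# Prints the counter key that a request with x-user-id: USER would be
# counted under, or prints nothing if the rule does not apply to it.
bucket_for_user() {
  user=$1
  # invert: true on the "admin" matcher excludes the administrator.
  [ "$user" != "admin" ] || return 0
  # type: Distinct gives every other header value its own counter.
  echo "x-user-id=$user"
}

# Usage:
#   bucket_for_user one     # separate counter for user "one"
#   bucket_for_user two     # separate counter for user "two"
#   bucket_for_user admin   # no counter, so no limit
```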

Test how requests with the x-user-id: one request header are rate limited.

for i in {1..4}; do kubectl exec deployment/sleep -it -- curl -I --header "Host: ratelimit.example" --header "x-user-id: one" http://$GATEWAY_HOST/get ; sleep 1; done

Expected output:

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:47:49 GMT
content-length: 504
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 2
x-ratelimit-reset: 731

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:47:50 GMT
content-length: 504
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 1
x-ratelimit-reset: 730

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:47:52 GMT
content-length: 504
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 0
x-ratelimit-reset: 728

HTTP/1.1 429 Too Many Requests
x-envoy-ratelimited: true
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 0
x-ratelimit-reset: 727
date: Tue, 27 May 2025 07:47:52 GMT
transfer-encoding: chunked

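The first 3 requests return 200 and the 4th request returns 429, which shows that the rule limits requests with the x-user-id: one request header.

Test how requests with the x-user-id: two request header are rate limited.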

for i in {1..4}; do kubectl exec deployment/sleep -it -- curl -I --header "Host: ratelimit.example" --header "x-user-id: two" http://$GATEWAY_HOST/get ; sleep 1; done

Expected output:

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:53:38 GMT
content-length: 504
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 2
x-ratelimit-reset: 382

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:53:39 GMT
content-length: 504
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 1
x-ratelimit-reset: 381

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:53:41 GMT
content-length: 504
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 0
x-ratelimit-reset: 379

HTTP/1.1 429 Too Many Requests
x-envoy-ratelimited: true
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 0
x-ratelimit-reset: 378
date: Tue, 27 May 2025 07:53:41 GMT
transfer-encoding: chunked

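The first 3 requests return 200 and the 4th request returns 429, which shows that the rule limits requests with the x-user-id: two request header.

Test how requests with the x-user-id: admin request header are rate limited.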

for i in {1..4}; do kubectl exec deployment/sleep -it -- curl -I --header "Host: ratelimit.example" --header "x-user-id: admin" http://$GATEWAY_HOST/get ; sleep 1; done

Expected output:

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:57:44 GMT
content-length: 506

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:57:45 GMT
content-length: 506

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:57:46 GMT
content-length: 506

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 07:57:47 GMT
content-length: 506

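All 4 requests return 200, which shows that the rule does not limit requests with the x-user-id: admin request header.

Rate limit all requests

Update the global rate limiting rule so that all requests combined are limited to 3 per hour.

Edit the rate limiting rule.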

kubectl edit BackendTrafficPolicy policy-httproute

Update the rate limiting rule with the following content.

...
  rateLimit:
    type: Global
    global:
      rules:
      - limit:
          requests: 3
          unit: Hour

After you save and exit, the rate limiting rule takes effect immediately.
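If you prefer a non-interactive update, for example in a script, a JSON merge patch can apply the same rule without opening an editor. The helper name `limit_all_requests` is hypothetical. The patch body mirrors the YAML above, and because a merge patch replaces lists as a whole, the existing `rules` list is overwritten by the single rule in the patch.

```shell
# limit_all_requests (hypothetical helper)
# Replaces the rules of policy-httproute with one rule that limits
# all requests to 3 per hour, using a JSON merge patch.
limit_all_requests() {
  kubectl patch BackendTrafficPolicy policy-httproute --type merge \
    -p '{"spec":{"rateLimit":{"type":"Global","global":{"rules":[{"limit":{"requests":3,"unit":"Hour"}}]}}}}'
}

# Usage:
#   limit_all_requests
```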

Test how ordinary requests are rate limited.

for i in {1..4}; do kubectl exec deployment/sleep -it -- curl -I --header "Host: ratelimit.example" http://$GATEWAY_HOST/get ; sleep 1; done

Expected output:

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 08:02:53 GMT
content-length: 473
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 2
x-ratelimit-reset: 3427

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 08:02:55 GMT
content-length: 473
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 1
x-ratelimit-reset: 3425

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 08:02:56 GMT
content-length: 473
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 0
x-ratelimit-reset: 3424

HTTP/1.1 429 Too Many Requests
x-envoy-ratelimited: true
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 0
x-ratelimit-reset: 3423
date: Tue, 27 May 2025 08:02:57 GMT
transfer-encoding: chunked

The first 3 requests return 200 and the 4th request returns 429, which shows that the rate limiting rule has taken effect.
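The x-ratelimit-* response headers in these outputs appear to follow the IETF draft convention for RateLimit header fields: `x-ratelimit-limit: 3, 3;w=3600` reads as a limit of 3 under a policy of 3 requests per 3600-second window, `x-ratelimit-remaining` is the quota left in the current window, and `x-ratelimit-reset` is the number of seconds until the window resets. This reading is an assumption based on that convention, not something this topic states. Under that assumption, the following hypothetical helper `ratelimit_summary` extracts the two values a client usually cares about.

```shell
# ratelimit_summary (hypothetical helper)
# Reads response headers on stdin and prints the remaining quota and the
# seconds until reset. If several responses are piped in, the values of
# the last one are printed.
ratelimit_summary() {
  tr -d '\r' | awk -F': *' '
    tolower($1) == "x-ratelimit-remaining" { remaining = $2 }
    tolower($1) == "x-ratelimit-reset"     { reset = $2 }
    END { if (reset != "") printf "remaining=%s reset_in=%ss\n", remaining, reset }
  '
}

# Usage:
#   kubectl exec deployment/sleep -- curl -sI --header "Host: ratelimit.example" \
#     http://$GATEWAY_HOST/get | ratelimit_summary
```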

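Rate limit by client IP address

Update the global rate limiting rule so that each IP address within a CIDR block is separately limited to 3 requests per hour.

Note

For ease of demonstration, this scenario limits the CIDR block 0.0.0.0/0. Adjust the CIDR block to match your environment.

Edit the rate limiting rule.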

kubectl edit BackendTrafficPolicy policy-httproute

Update the rate limiting rule with the following content.

...
  rateLimit:
    type: Global
    global:
      rules:
      - clientSelectors:
        - sourceCIDR:
            value: 0.0.0.0/0
            type: Distinct
        limit:
          requests: 3
          unit: Hour

After you save and exit, the rate limiting rule takes effect immediately.

Test how ordinary requests are rate limited.

for i in {1..4}; do kubectl exec deployment/sleep -it -- curl -I --header "Host: ratelimit.example" http://$GATEWAY_HOST/get ; sleep 1; done

Expected output:

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 08:02:53 GMT
content-length: 473
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 2
x-ratelimit-reset: 3427

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 08:02:55 GMT
content-length: 473
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 1
x-ratelimit-reset: 3425

HTTP/1.1 200 OK
content-type: application/json
x-content-type-options: nosniff
date: Tue, 27 May 2025 08:02:56 GMT
content-length: 473
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 0
x-ratelimit-reset: 3424

HTTP/1.1 429 Too Many Requests
x-envoy-ratelimited: true
x-ratelimit-limit: 3, 3;w=3600
x-ratelimit-remaining: 0
x-ratelimit-reset: 3423
date: Tue, 27 May 2025 08:02:57 GMT
transfer-encoding: chunked

The first 3 requests return 200 and the 4th request returns 429, which shows that rate limiting for the CIDR block 0.0.0.0/0 has taken effect.
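When you narrow the CIDR block from 0.0.0.0/0 to a real range, it can be useful to check locally which client addresses fall inside it. The following sketch does that for IPv4 with plain shell arithmetic. The helper names `ip_to_int` and `in_cidr` are hypothetical, and the check is only a local aid that does not query the gateway.

```shell
# ip_to_int A.B.C.D (hypothetical helper)
# Prints the IPv4 address as a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_cidr IP CIDR (hypothetical helper)
# Returns 0 if IP lies inside CIDR (for example 10.0.0.0/8), 1 otherwise.
in_cidr() {
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  # A /0 prefix matches every address.
  [ "$bits" -ne 0 ] || return 0
  mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Usage:
#   in_cidr 10.1.2.3 10.0.0.0/8 && echo "would be rate limited"
```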

(Optional) Step 4: Clean up test resources

Delete the rate limiting rule.

kubectl delete BackendTrafficPolicy policy-httproute

Delete the other resources.

kubectl delete -f httproute.yaml
kubectl delete -f redis-service.yaml
kubectl delete -f enable-global-rate-limit.yaml
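To confirm that the test resources are gone, you can query them with `--ignore-not-found`, which prints nothing for a resource that no longer exists. The helper name `verify_cleanup` is hypothetical, and the commands assume the policy and the route were created in the current namespace.

```shell
# verify_cleanup (hypothetical helper)
# Prints any of the test resources that still exist. No output means
# the cleanup removed all of them.
verify_cleanup() {
  kubectl get BackendTrafficPolicy policy-httproute --ignore-not-found
  kubectl get httproute http-ratelimit --ignore-not-found
  kubectl get namespace redis-system --ignore-not-found
}

# Usage:
#   verify_cleanup
```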