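Kruise Rollout is a standard extension component for Kubernetes. It works with native workloads (Deployment and StatefulSet) and OpenKruise workloads (CloneSet and Advanced StatefulSet) to implement canary releases, A/B testing releases, and batch releases. This topic uses examples to show how to perform phased releases of cloud-native applications with Kruise Rollout.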
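Prerequisites
A Kubernetes cluster is created. For more information, see Create an ACK managed cluster.
To use A/B testing or canary releases, the cluster must run Kubernetes 1.19 or later.
To use batch releases, the cluster must run Kubernetes 1.16 or later.
kubectl-kruise is installed. For installation instructions, see kubectl-kruise.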
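Introduction to Kruise Rollout
Kruise Rollout is a progressive delivery framework open-sourced by the OpenKruise community. It supports canary releases that shift both traffic and instances, blue-green releases, and A/B testing releases. Based on Prometheus metrics, Kruise Rollout can also automate batching and pausing during a release. It attaches to existing workloads as a bypass component without intrusive changes and is compatible with multiple workload types (Deployment, CloneSet, and StatefulSet). For more information, see Kruise Rollout.
With Kruise Rollout, you only need to create one Rollout resource and apply it to the Kubernetes cluster. Subsequent releases and upgrades of the application require no extra steps, and Kruise Rollout integrates seamlessly and at low cost with Helm and PaaS platforms. The following figure shows the architecture of a phased release implemented with Kruise Rollout.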
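Preparations
Install the Kruise Rollout component.
Log on to the ACK console. In the left-side navigation pane, click Clusters.
On the Clusters page, click the name of the target cluster. In the left-side navigation pane, click Add-ons.
On the Add-ons page, click the Manage Applications tab, and then click Install in the lower-right corner of the ack-kruise card.
In the dialog box that appears, confirm the information and click OK.
Note: ack-kruise 1.8 and later supports the v1beta1 API. For more information, see API Specifications.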
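Deploy the business application (Deployment and Service).
Note: The business application deploys an echoserver service as a Deployment and exposes it externally through Nginx Ingress.
Create the echoserver.yaml file and deploy it by running kubectl apply -f echoserver.yaml.
The following examples show how to perform canary releases, A/B testing releases, and batch releases.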
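Scenario 1: Canary or A/B testing release based on Ingress
Nginx Ingress and MSE Ingress are currently the most common ways to expose services externally. This example shows how to use Kruise Rollout with Nginx Ingress or MSE Ingress to perform a canary or A/B testing release.
Install an Ingress controller and create an Ingress for the business application.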
Nginx Ingress Controller
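Install the Nginx Ingress Controller.
New cluster: When you create the cluster, select Install Nginx Ingress in the Ingress section. For more information, see Create an ACK managed cluster.
Existing cluster: For information about how to install the Nginx Ingress Controller, see Create and use an Nginx Ingress to expose services.
Create the echoserver-ingress.yaml file.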
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echoserver
spec:
  ingressClassName: nginx
  rules:
  - http:
      paths:
      - backend:
          service:
            name: echoserver
            port:
              number: 80
        path: /apis/echo
        pathType: Exact
Deploy the application Ingress.
kubectl apply -f echoserver-ingress.yaml
MSE Ingress Controller
Install the MSE Ingress Controller and create an MseIngressConfig and an IngressClass. For more information, see Access container services through MSE Ingress.
Create the echoserver-ingress.yaml file.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: echoserver
spec:
  # ingressClassName must be set to mse here.
  ingressClassName: mse
  rules:
  - http:
      paths:
      - backend:
          service:
            name: echoserver
            port:
              number: 80
        path: /apis/echo
        pathType: Exact
Deploy the application Ingress.
kubectl apply -f echoserver-ingress.yaml
Verify access.
Obtain the external IP address.
Nginx Ingress
export EXTERNAL_IP=$(kubectl get ingress echoserver -o jsonpath="{.status.loadBalancer.ingress[0].ip}" )
MSE Ingress
export EXTERNAL_IP=$(kubectl get ingress echoserver -o jsonpath="{.status.loadBalancer.ingress[0].hostname}" )
Test access.
curl http://${EXTERNAL_IP}/apis/echo
Expected output:
Hostname: echoserver-75d49c475c-ls2bs

Pod Information:
    node name:      version1
    pod name:       echoserver-75d49c475c-ls2bs
    pod namespace:  default

Server values:
    server_version=nginx: 1.13.3 - lua: 10008
...
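When you script these checks, it helps to pull the node name field out of the echoserver response, because this tutorial uses that field (set through the NODE_NAME environment variable) to tell version1 from version2. The following Python sketch assumes the response layout shown in the expected output above; the function name parse_echo_response is hypothetical and is not part of any Kruise or echoserver tool.

```python
import re
from typing import Dict


def parse_echo_response(body: str) -> Dict[str, str]:
    """Extract the Pod Information fields from an echoserver response body."""
    fields = {}
    for key in ("node name", "pod name", "pod namespace"):
        # Each field is printed as "<key>: <value>"; the value ends at whitespace.
        match = re.search(rf"{re.escape(key)}:\s*(\S+)", body)
        if match:
            fields[key] = match.group(1)
    return fields


sample = """Hostname: echoserver-75d49c475c-ls2bs

Pod Information:
    node name:      version1
    pod name:       echoserver-75d49c475c-ls2bs
    pod namespace:  default
"""
print(parse_echo_response(sample))
# {'node name': 'version1', 'pod name': 'echoserver-75d49c475c-ls2bs', 'pod namespace': 'default'}
```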
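Define the Kruise Rollout release rules.
The following Rollout resource defines the release rules. The release is divided into three batches:
Canary
Batch 1: Canary release. 20% of the traffic is routed to the new version, and the rest goes to the old version.
Batch 2: Shift by traffic ratio. This batch upgrades 50% of the instances and routes 50% of the traffic to the new version.
Batch 3: Upgrade all remaining instances.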
A/B Test
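Batch 1: A/B testing release. Requests with header[User-Agent]=Android are routed to the new version, and all other requests go to the old version.
Batch 2: Shift by Pod ratio. This batch upgrades 50% of the instances.
Batch 3: Upgrade all remaining instances.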
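Batch 1 of the A/B Test plan routes a request to the new version only when its User-Agent header exactly equals Android. The Python sketch below models that decision for a single match entry with Exact header conditions. It is a simplified illustration, not Kruise Rollout's implementation: the names HeaderMatch and routes_to_new_version are hypothetical, and treating several headers in one entry as a logical AND is an assumption based on the Gateway API matching model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HeaderMatch:
    """One header condition, mirroring a headers item in a Rollout step (hypothetical model)."""
    name: str
    value: str
    type: str = "Exact"


def routes_to_new_version(request_headers: Dict[str, str], conditions: List[HeaderMatch]) -> bool:
    """Return True if the request satisfies every header condition (assumed AND semantics)."""
    # HTTP header names are case-insensitive, so compare them in lower case.
    normalized = {k.lower(): v for k, v in request_headers.items()}
    for cond in conditions:
        if cond.type != "Exact":
            raise ValueError(f"this sketch only models Exact matches, got {cond.type!r}")
        # Exact match: the header must be present and its value must be identical.
        if normalized.get(cond.name.lower()) != cond.value:
            return False
    return True


android_only = [HeaderMatch(name="User-Agent", value="Android")]
print(routes_to_new_version({"User-Agent": "Android"}, android_only))      # True
print(routes_to_new_version({"User-Agent": "Mozilla/5.0"}, android_only))  # False
print(routes_to_new_version({}, android_only))                             # False
```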
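Create the rollout.yaml file with the following content.
Canary
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  objectRef:
    workloadRef:
      apiVersion: apps/v1
      kind: Deployment
      name: echoserver
  strategy:
    canary:
      steps:
      # Step 1: 20% of traffic to 1 new-version Pod; wait for manual approval.
      - weight: 20
        replicas: 1
        pause: {}
      # Step 2: 50% of traffic and Pods; continue automatically after 60 seconds.
      - weight: 50
        replicas: 50%
        pause: {duration: 60}
      # Step 3: all traffic and Pods.
      - weight: 100
        replicas: 100%
        pause: {duration: 60}
      trafficRoutings:
      - service: echoserver
        ingress:
          name: echoserver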
A/B Test
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
spec:
  objectRef:
    workloadRef:
      apiVersion: apps/v1
      kind: Deployment
      name: echoserver
  strategy:
    canary:
      steps:
      # Step 1: 1 Pod; route requests with User-Agent: Android to the new version.
      - matches:
        - headers:
          - type: Exact
            name: User-Agent
            value: Android
        pause: {}
        replicas: 1
      # Step 2: 50% of Pods; pause 60 seconds, then continue automatically.
      - matches:
        - headers:
          - type: Exact
            name: User-Agent
            value: Android
        pause: {duration: 60}
        replicas: 50%
      # Step 3: 100% of Pods for the matched traffic.
      - matches:
        - headers:
          - type: Exact
            name: User-Agent
            value: Android
        pause: {duration: 60}
        replicas: 100%
      trafficRoutings:
      - service: echoserver
        ingress:
          name: echoserver
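The replicas field in each step accepts either an absolute Pod count (such as 1) or a percentage of the workload's replicas (such as 50%). The sketch below shows how those values could translate into Pod counts for a Deployment with, for example, 5 replicas. Rounding percentages up and capping the result at the total are assumptions made for illustration; check the Kruise Rollout documentation for the exact rounding behavior. resolve_replicas is a hypothetical helper.

```python
from typing import Union


def resolve_replicas(spec: Union[int, str], total: int) -> int:
    """Convert a step's replicas value (an int or a string like '50%') to a Pod count.

    Assumptions for this sketch: percentages round up, and the result never exceeds total.
    """
    if isinstance(spec, int):
        count = spec
    elif spec.endswith("%"):
        percent = int(spec[:-1])
        # Integer ceiling division: ceil(total * percent / 100) without floating point.
        count = (total * percent + 99) // 100
    else:
        count = int(spec)
    return min(count, total)


steps = [1, "50%", "100%"]  # the replicas values used in rollout.yaml
print([resolve_replicas(s, total=5) for s in steps])  # [1, 3, 5]
```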
Run the following command to apply the Rollout resource to the cluster.
kubectl apply -f rollout.yaml
Run the following command to view the status of the Rollout resource.
kubectl get rollout
Expected output:
NAME STATUS CANARY_STEP CANARY_STATE MESSAGE AGE
rollouts-demo Healthy 3 Completed workload deployment is completed 7s rollout is healthy 32s
STATUS=Healthy in the expected output indicates that the Rollout resource is working properly.
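Upgrade the application version.
Kruise Rollout is a persistent configuration. After it is applied to the cluster, subsequent releases only require changes to the Deployment configuration, and no additional operations on the Rollout resource are needed. For example, to upgrade the echoserver image to version 1.10.3, modify the Deployment and apply it to the cluster by running kubectl apply -f echoserver.yaml. Besides kubectl, you can also apply the Deployment with tools such as Helm or KubeVela.
Modify the echoserver.yaml file to upgrade the echoserver image to 1.10.3.
# echoserver.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echoserver
  ...
spec:
  ...
      containers:
      - name: echoserver
        # On Apple M1 Macs, you can use the e2eteam/echoserver:2.2-linux-arm image instead.
        image: openkruise-registry.cn-shanghai.cr.aliyuncs.com/openkruise/demo:1.10.3
        imagePullPolicy: IfNotPresent
        env:
        - name: NODE_NAME
          # Optional. The value is changed to version2 to make the canary effect easy to see.
          value: version2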
Run the following command to view the status of the Rollout resource.
kubectl get rollouts rollouts-demo -n default
Expected output:
NAME STATUS CANARY_STEP CANARY_STATE MESSAGE AGE
rollouts-demo Progressing 1 StepPaused Rollout is in step(1/3), and you need manually confirm to enter the next step 41m
The STATUS, CANARY_STEP, and CANARY_STATE columns show the progress and the current step of the rollout:
STATUS=Progressing: the canary release is in progress.
CANARY_STEP=1: the rollout is in the first batch.
CANARY_STATE=StepPaused: the current batch has finished, and the rollout waits for manual confirmation before it continues.
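If you automate the wait-and-approve loop, you need to decide from the kubectl get rollouts output whether the rollout is waiting for confirmation. The sketch below parses one data row of the output shown above and applies the rule just described (STATUS=Progressing with CANARY_STATE=StepPaused). Also requiring the "manually confirm" text in MESSAGE is an extra heuristic taken from the sample output, not a documented contract; parse_rollout_row and awaiting_manual_approval are hypothetical names.

```python
from typing import Any, Dict


def parse_rollout_row(line: str) -> Dict[str, Any]:
    """Split one data row of `kubectl get rollouts` into columns.

    NAME, STATUS, CANARY_STEP and CANARY_STATE are single tokens and AGE is the
    last token, so everything in between is treated as MESSAGE.
    """
    parts = line.split()
    if len(parts) < 6:
        raise ValueError(f"unexpected row: {line!r}")
    return {
        "name": parts[0],
        "status": parts[1],
        "canary_step": int(parts[2]),
        "canary_state": parts[3],
        "message": " ".join(parts[4:-1]),
        "age": parts[-1],
    }


def awaiting_manual_approval(row: Dict[str, Any]) -> bool:
    """Heuristic: a progressing rollout paused at a step whose message asks for
    manual confirmation is waiting for `kubectl-kruise rollout approve`."""
    return (
        row["status"] == "Progressing"
        and row["canary_state"] == "StepPaused"
        and "manually confirm" in row["message"]
    )


row = parse_rollout_row(
    "rollouts-demo Progressing 1 StepPaused Rollout is in step(1/3), "
    "and you need manually confirm to enter the next step 41m"
)
print(row["canary_step"], awaiting_manual_approval(row))  # 1 True
```

For a step 2 row whose message reads "wait duration(60 seconds) to enter the next step", the check returns False, because that pause ends on its own.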
Verify the traffic distribution between the new and old versions.
Canary
Access the service 10 times and check the returned node name values.
for i in {1..10}; do curl -s http://${EXTERNAL_IP}/apis/echo | grep 'node name'; sleep 1; done
Expected output:
node name: version1
node name: version1
node name: version2
node name: version1
node name: version2
node name: version1
node name: version1
node name: version1
node name: version1
node name: version1
The ratio of version1 to version2 responses is about 8:2, which matches the 20% traffic weight of the first step.
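Ten requests is a small sample. If each request independently reaches the new version with probability 0.2, the observed share has a standard deviation of about sqrt(0.2 × 0.8 / 10) ≈ 0.13, so anywhere from roughly 1 to 3 version2 responses is unremarkable. The Python sketch below tallies the node name lines printed by the loop, so you can run a longer loop (for example, 100 requests) and compare the share with the configured weight. version_shares is a hypothetical helper, and the sample input is the expected output above.

```python
from collections import Counter
from typing import Dict


def version_shares(output: str) -> Dict[str, float]:
    """Return each version's share of responses from lines like 'node name: version1'."""
    counts = Counter(
        line.split("node name:", 1)[1].strip()
        for line in output.splitlines()
        if "node name:" in line
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {version: n / total for version, n in counts.items()}


sample = "\n".join([
    "node name: version1", "node name: version1", "node name: version2",
    "node name: version1", "node name: version2", "node name: version1",
    "node name: version1", "node name: version1", "node name: version1",
    "node name: version1",
])
print(version_shares(sample))  # {'version1': 0.8, 'version2': 0.2}
```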
Manually advance the rollout to the next step.
kubectl-kruise rollout approve rollouts/rollouts-demo -n default
Watch the rollout status.
kubectl get rollouts rollouts-demo -n default -w
Expected output:
NAME STATUS CANARY_STEP CANARY_STATE MESSAGE AGE
rollouts-demo Progressing 2 StepTrafficRouting Rollout is in step(2/3), and upgrade workload to new version 31m
rollouts-demo Progressing 2 StepMetricsAnalysis Rollout is in step(2/3), and upgrade workload to new version 31m
rollouts-demo Progressing 2 StepPaused Rollout is in step(2/3), and upgrade workload to new version 31m
rollouts-demo Progressing 2 StepPaused Rollout is in step(2/3), and wait duration(60 seconds) to enter the next step 31m
rollouts-demo Progressing 2 StepReady Rollout is in step(2/3), and wait duration(60 seconds) to enter the next step 32m
rollouts-demo Progressing 3 BeforeStepUpgrade Rollout is in step(2/3), and wait duration(60 seconds) to enter the next step 32m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 32m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 32m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 32m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 32m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 32m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 32m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 32m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 32m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 32m
rollouts-demo Progressing 3 StepTrafficRouting Rollout is in step(3/3), and upgrade workload to new version 32m
rollouts-demo Progressing 3 StepMetricsAnalysis Rollout is in step(3/3), and upgrade workload to new version 32m
rollouts-demo Progressing 3 StepPaused Rollout is in step(3/3), and upgrade workload to new version 32m
rollouts-demo Progressing 3 StepReady Rollout is in step(3/3), and upgrade workload to new version 32m
rollouts-demo Progressing 3 Completed Rollout is in step(3/3), and upgrade workload to new version 32m
rollouts-demo Progressing 3 Completed Rollout has been completed and some closing work is being done 32m
rollouts-demo Progressing 3 Completed Rollout progressing has been completed 33m
rollouts-demo Healthy 3 Completed Rollout progressing has been completed 33m
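After approve is run, the Rollout resource enters step 2 and, after the wait time, automatically enters step 3. Finally, STATUS=Healthy and CANARY_STATE=Completed, which indicates that the rollout has finished.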
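The watch output repeats many rows because kubectl prints a new row whenever the Rollout object changes, even if the displayed columns stay the same. To read the progression more easily, the sketch below collapses the rows into the ordered sequence of distinct (CANARY_STEP, CANARY_STATE) pairs. The state names come from the output above; state_progression is a hypothetical helper that relies only on the column layout shown there.

```python
import re
from typing import List, Tuple

# NAME, STATUS, CANARY_STEP (an integer), CANARY_STATE, then the free-text MESSAGE.
ROW = re.compile(r"^(\S+)\s+(\S+)\s+(\d+)\s+(\S+)\s+")


def state_progression(watch_output: str) -> List[Tuple[int, str]]:
    """Reduce `kubectl get rollouts -w` rows to distinct consecutive (step, state) pairs."""
    progression: List[Tuple[int, str]] = []
    for line in watch_output.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # header line (CANARY_STEP is not a number) or blank line
        pair = (int(match.group(3)), match.group(4))
        if not progression or progression[-1] != pair:
            progression.append(pair)
    return progression


sample = """NAME STATUS CANARY_STEP CANARY_STATE MESSAGE AGE
rollouts-demo Progressing 2 StepPaused Rollout is in step(2/3), and wait duration(60 seconds) to enter the next step 31m
rollouts-demo Progressing 2 StepReady Rollout is in step(2/3), and wait duration(60 seconds) to enter the next step 32m
rollouts-demo Progressing 3 BeforeStepUpgrade Rollout is in step(2/3), and wait duration(60 seconds) to enter the next step 32m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 32m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 32m
rollouts-demo Healthy 3 Completed Rollout progressing has been completed 33m
"""
print(state_progression(sample))
# [(2, 'StepPaused'), (2, 'StepReady'), (3, 'BeforeStepUpgrade'), (3, 'StepUpgrade'), (3, 'Completed')]
```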
A/B Test
Send one request without a request header and one request with the header User-Agent: Android.
curl -s http://${EXTERNAL_IP}/apis/echo | grep 'Pod Information:' -A 3
curl -sH "User-Agent: Android" http://${EXTERNAL_IP}/apis/echo | grep 'Pod Information:' -A 3
Expected output:
Pod Information:
    node name:      version1
    pod name:       echoserver-69598f9458-7c66v
    pod namespace:  default
Pod Information:
    node name:      version2
    pod name:       echoserver-fvhzg-687b4b56-qbhc8
    pod namespace:  default
The two requests return version1 and version2 respectively, which shows that header-based routing is in effect.
Manually advance the rollout to the next step.
kubectl-kruise rollout approve rollouts/rollouts-demo -n default
Watch the rollout status.
kubectl get rollouts rollouts-demo -n default -w
Expected output:
NAME STATUS CANARY_STEP CANARY_STATE MESSAGE AGE
rollouts-demo Progressing 2 StepTrafficRouting Rollout is in step(2/3), and upgrade workload to new version 26m
rollouts-demo Progressing 2 StepMetricsAnalysis Rollout is in step(2/3), and upgrade workload to new version 26m
rollouts-demo Progressing 2 StepPaused Rollout is in step(2/3), and upgrade workload to new version 26m
rollouts-demo Progressing 2 StepPaused Rollout is in step(2/3), and wait duration(60 seconds) to enter the next step 26m
rollouts-demo Progressing 2 StepReady Rollout is in step(2/3), and wait duration(60 seconds) to enter the next step 27m
rollouts-demo Progressing 3 BeforeStepUpgrade Rollout is in step(2/3), and wait duration(60 seconds) to enter the next step 27m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 27m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 27m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 27m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 27m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 27m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 27m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 27m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 27m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 27m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 27m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 27m
rollouts-demo Progressing 3 StepTrafficRouting Rollout is in step(3/3), and upgrade workload to new version 27m
rollouts-demo Progressing 3 StepMetricsAnalysis Rollout is in step(3/3), and upgrade workload to new version 27m
rollouts-demo Progressing 3 StepPaused Rollout is in step(3/3), and upgrade workload to new version 27m
rollouts-demo Progressing 3 StepReady Rollout is in step(3/3), and upgrade workload to new version 27m
rollouts-demo Progressing 3 Completed Rollout is in step(3/3), and upgrade workload to new version 27m
rollouts-demo Progressing 3 Completed Rollout has been completed and some closing work is being done 27m
rollouts-demo Progressing 3 Completed Rollout has been completed and some closing work is being done 27m
rollouts-demo Progressing 3 Completed Rollout has been completed and some closing work is being done 27m
rollouts-demo Progressing 3 Completed Rollout has been completed and some closing work is being done 27m
rollouts-demo Progressing 3 Completed Rollout has been completed and some closing work is being done 27m
rollouts-demo Progressing 3 Completed Rollout has been completed and some closing work is being done 27m
rollouts-demo Progressing 3 Completed Rollout has been completed and some closing work is being done 27m
rollouts-demo Progressing 3 Completed Rollout has been completed and some closing work is being done 27m
rollouts-demo Progressing 3 Completed Rollout progressing has been completed 27m
rollouts-demo Healthy 3 Completed Rollout progressing has been completed 27m
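After approve is run, the Rollout resource enters step 2 and, after the wait time, automatically enters step 3. Finally, STATUS=Healthy and CANARY_STATE=Completed, which indicates that the rollout has finished.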
(Optional) If the new version is abnormal, roll back the application.
If you find that the new version is abnormal during the rollout, restore the previous version in the Deployment configuration and deploy it by running kubectl apply -f echoserver.yaml. No changes to the Rollout resource are required.
# echoserver.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echoserver
  ...
spec:
  ...
      containers:
      - name: echoserver
        # On Apple M1 Macs, you can use the e2eteam/echoserver:2.2-linux-arm image instead.
        image: openkruise-registry.cn-shanghai.cr.aliyuncs.com/openkruise/demo:1.10.2
        imagePullPolicy: IfNotPresent
        env:
        - name: NODE_NAME
          value: version1
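Scenario 2: Batch release based on the number of Pods (applications built on microservice frameworks such as Nacos)
Most applications built on microservice frameworks such as Nacos do not need a Service or an Ingress when they are deployed to a Kubernetes cluster, because the framework already handles traffic scheduling. Such applications are better suited to the batch release capability of Kruise Rollout.
Because traffic shifting is handled by the microservice framework, this scenario skips verifying the traffic between the new and old versions and demonstrates only the transitions between Rollout steps.
Define and deploy the Kruise Rollout release rules.
The following Rollout resource defines the release rules (the trafficRoutings field is not required). The release is divided into three batches:
Batch 1: Upgrade 1 Pod.
Batch 2: Upgrade 50% of the Pods.
Batch 3: Upgrade all remaining instances.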
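# Save the following content to the rollout.yaml file.
apiVersion: rollouts.kruise.io/v1alpha1
kind: Rollout
metadata:
  name: rollouts-demo
  annotations:
    rollouts.kruise.io/rolling-style: partition
spec:
  objectRef:
    workloadRef:
      apiVersion: apps/v1
      kind: Deployment
      # Deployment name
      name: echoserver
  strategy:
    canary:
      steps:
      # Step 1: Update 1 Pod, then pause and wait for manual confirmation.
      - replicas: 1
        pause: {}  # Manually decide whether to enter the next batch.
      # Step 2: Update 50% of the Pods.
      - replicas: 50%
        # Pause 60 seconds, then automatically enter the next batch.
        pause: {duration: 60}
      # Step 3: Full release; update all Pods to the new version.
      - replicas: 100%
        pause: {duration: 60}
Run kubectl apply -f rollout.yaml to apply the Rollout resource to the cluster.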
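In this Rollout, pause: {} makes the rollout wait for manual approval, while pause: {duration: 60} lets it continue automatically after 60 seconds. The sketch below summarizes those semantics for the three steps, with the steps written as Python dicts that mirror the YAML (reading the YAML file directly would need a third-party parser such as PyYAML, which this sketch avoids). summarize_steps is a hypothetical helper, and treating a step without a pause field as having no pause is an assumption.

```python
from typing import Any, Dict, List


def summarize_steps(steps: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarize Rollout canary steps written as dicts that mirror the YAML."""
    manual_steps: List[int] = []
    timed_pauses: Dict[int, int] = {}
    for index, step in enumerate(steps, start=1):
        pause = step.get("pause")
        if pause is None:
            continue  # assumption for this sketch: no pause field means no pause
        if "duration" in pause:
            timed_pauses[index] = int(pause["duration"])  # continues after N seconds
        else:
            manual_steps.append(index)  # pause: {} waits for manual approval
    return {
        "manual_approval_steps": manual_steps,
        "timed_pauses_seconds": timed_pauses,
        "ends_with_full_release": bool(steps) and str(steps[-1].get("replicas")) == "100%",
    }


# The three steps from rollout.yaml, as Python dicts.
steps = [
    {"replicas": 1, "pause": {}},
    {"replicas": "50%", "pause": {"duration": 60}},
    {"replicas": "100%", "pause": {"duration": 60}},
]
print(summarize_steps(steps))
# {'manual_approval_steps': [1], 'timed_pauses_seconds': {2: 60, 3: 60}, 'ends_with_full_release': True}
```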
Modify the echoserver.yaml file to upgrade the echoserver image to 1.10.3, and then apply it by running kubectl apply -f echoserver.yaml.
# echoserver.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echoserver
  ...
spec:
  ...
      containers:
      - name: echoserver
        # On Apple M1 Macs, you can use the e2eteam/echoserver:2.2-linux-arm image instead.
        image: openkruise-registry.cn-shanghai.cr.aliyuncs.com/openkruise/demo:1.10.3
        imagePullPolicy: IfNotPresent
        env:
        - name: NODE_NAME
          # Optional. The value is changed to version2 to make the release effect easy to see.
          value: version2
View the status of the Rollout resource.
kubectl get rollouts rollouts-demo -n default
Expected output:
NAME STATUS CANARY_STEP CANARY_STATE MESSAGE AGE
rollouts-demo Progressing 1 StepPaused Rollout is in step(1/3), and you need manually confirm to enter the next step 41m
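The STATUS, CANARY_STEP, and CANARY_STATE columns show the progress and the current step of the rollout:
STATUS=Progressing: the release is in progress.
CANARY_STEP=1: the rollout is in the first batch.
CANARY_STATE=StepPaused: the current batch has finished, and the rollout waits for manual confirmation before it continues.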
Manually advance the rollout to the next step.
kubectl-kruise rollout approve rollouts/rollouts-demo -n default
Watch the rollout status.
kubectl get rollouts rollouts-demo -n default -w
Expected output:
NAME STATUS CANARY_STEP CANARY_STATE MESSAGE AGE
rollouts-demo Progressing 2 StepPaused Rollout is in step(2/3), and wait duration(60 seconds) to enter the next step 45m
rollouts-demo Progressing 2 StepReady Rollout is in step(2/3), and wait duration(60 seconds) to enter the next step 45m
rollouts-demo Progressing 3 BeforeStepUpgrade Rollout is in step(2/3), and wait duration(60 seconds) to enter the next step 45m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(2/3), and wait duration(60 seconds) to enter the next step 45m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 45m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 45m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 45m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 45m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 45m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 45m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 45m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 45m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 46m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 46m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 46m
rollouts-demo Progressing 3 StepUpgrade Rollout is in step(3/3), and upgrade workload to new version 46m
rollouts-demo Progressing 3 StepMetricsAnalysis Rollout is in step(3/3), and upgrade workload to new version 46m
rollouts-demo Progressing 3 StepPaused Rollout is in step(3/3), and upgrade workload to new version 46m
rollouts-demo Progressing 3 StepReady Rollout is in step(3/3), and upgrade workload to new version 46m
rollouts-demo Progressing 3 Completed Rollout is in step(3/3), and upgrade workload to new version 46m
rollouts-demo Progressing 3 Completed Rollout has been completed and some closing work is being done 46m
rollouts-demo Progressing 3 Completed Rollout has been completed and some closing work is being done 46m
rollouts-demo Progressing 3 Completed Rollout has been completed and some closing work is being done 46m
rollouts-demo Progressing 3 Completed Rollout has been completed and some closing work is being done 46m
rollouts-demo Progressing 3 Completed Rollout has been completed and some closing work is being done 46m
rollouts-demo Progressing 3 Completed Rollout has been completed and some closing work is being done 46m
rollouts-demo Progressing 3 Completed Rollout has been completed and some closing work is being done 46m
rollouts-demo Progressing 3 Completed Rollout progressing has been completed 46m
rollouts-demo Healthy 3 Completed Rollout progressing has been completed 46m