Multi-cluster batch release based on ACK One Fleet

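ACK One Fleet supports cross-cluster batch release of Deployments by combining multi-cluster application distribution with Kruise Rollouts. This lets multiple clusters perform canary releases and rollbacks under the same rollout strategy.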

How it works

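With this approach, the Deployment in the Fleet remains the object of multi-cluster scheduling and multi-cluster HPA, while each member cluster performs the batch release according to the Rollout strategy defined in the Fleet.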

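Prerequisites

  • The ack-kruise add-on (version 1.8.3 or later) is installed in each member cluster. If you use open source Kruise Rollouts, version 0.6.2 or later is required. For details, see "Use Kruise Rollout to implement canary releases (canary & A/B testing)".

    If you use a sub-account (RAM user), grant it the following permissions in each member cluster.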

    rules:
    - apiGroups:
      - "rollouts.kruise.io"
      resources:
      - rollouts
      - rollouts/status
      verbs:
      - get
      - list
      - watch
      - create
      - update
      - delete
      - deletecollection
      - patch
  • The latest version of the AMC command-line tool is installed.
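The version requirements above can be checked with a small script before a cluster is onboarded. The following is a minimal Python sketch, not an official tool: the helper names `parse_version` and `meets_minimum` are hypothetical, and it assumes plain dotted numeric versions such as 1.8.3 (an optional leading "v" is tolerated, pre-release suffixes are not handled).

```python
from typing import Tuple

# Minimum versions taken from the prerequisites above.
MIN_ACK_KRUISE = "1.8.3"
MIN_OSS_KRUISE_ROLLOUTS = "0.6.2"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.8.3' or 'v1.8.3' into (1, 8, 3).

    Assumption: every dot-separated part is a plain integer.
    """
    return tuple(int(part) for part in version.strip().lstrip("v").split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if the installed version is at least the minimum version."""
    # Tuples compare element by element, so 0.6.10 is correctly newer than 0.6.2.
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    for candidate in ("1.8.3", "1.7.9"):
        ok = meets_minimum(candidate, MIN_ACK_KRUISE)
        print(f"ack-kruise {candidate}: {'ok' if ok else 'too old'}")
```

Comparing integer tuples instead of raw strings avoids the common mistake of treating "0.6.10" as older than "0.6.2".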

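Preparations

Deploy a demo application in the Fleet and distribute it to multiple clusters.

  1. Create the demo namespace in the Fleet, and make sure the same namespace also exists in the member clusters.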

    kubectl create ns demo
  2. Create the following Deployment and Service resources in the Fleet. Save the YAML below as web-demo.yaml, then apply it.

    kubectl apply -f web-demo.yaml

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: web-demo
      name: web-demo
      namespace: demo
    spec:
      replicas: 0
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app: web-demo
      template:
        metadata:
          labels:
            app: web-demo
        spec:
          containers:
          - env:
            - name: ENV_NAME
              value: cluster-1
            image: registry-cn-hangzhou.ack.aliyuncs.com/acs/web-demo:0.5.0
            imagePullPolicy: Always
            name: web-demo
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            resources:
              requests:
                cpu: 1
                memory: 1Gi
              limits:
                cpu: 1
                memory: 1Gi
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: web-demo-svc
      namespace: demo
    spec:
      ports:
      - port: 80
        protocol: TCP
        targetPort: 8080
      selector:
        app: web-demo
      sessionAffinity: None
      type: ClusterIP
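  3. Create propagation policies in the Fleet to distribute the replicas of the Deployment above to 2 member clusters by dynamic weight.

    Replace ${cluster1_id} and ${cluster2_id} in the YAML with the actual member cluster IDs. Save the YAML below as web-demo-pp.yaml, then apply it.

    kubectl apply -f web-demo-pp.yaml

    apiVersion: policy.one.alibabacloud.com/v1alpha1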
    kind: PropagationPolicy
    metadata:
      name: web-demo-pp
      namespace: demo
    spec:
      preserveResourcesOnDeletion: false
      resourceSelectors:
      - apiVersion: apps/v1
        kind: Deployment
        name: web-demo
      placement:
        replicaScheduling:
          replicaDivisionPreference: Weighted
          replicaSchedulingType: Divided
          weightPreference:
            dynamicWeight: AvailableReplicas
        clusterAffinity:
          clusterNames:
            - ${cluster1_id}
            - ${cluster2_id}
    ---
    apiVersion: policy.one.alibabacloud.com/v1alpha1
    kind: PropagationPolicy
    metadata:
      name: web-demo-svc-pp
      namespace: demo
    spec:
      preserveResourcesOnDeletion: false
      resourceSelectors:
      - apiVersion: v1
        kind: Service
        name: web-demo-svc
      placement:
        replicaScheduling:
          replicaSchedulingType: Duplicated
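The Deployment policy above divides replicas across clusters by weight (replicaSchedulingType: Divided, replicaDivisionPreference: Weighted). As a rough illustration of what a weighted split means, the sketch below divides a replica count in proportion to per-cluster weights. The function name `divide_replicas`, the example weights, and the largest-remainder rounding rule are all assumptions made for illustration; the actual scheduler derives the weights dynamically and may round or break ties differently.

```python
from typing import Dict


def divide_replicas(total: int, weights: Dict[str, int]) -> Dict[str, int]:
    """Split `total` replicas across clusters in proportion to `weights`.

    Uses the largest-remainder method so the parts always sum to `total`.
    This rounding rule is an assumption for illustration only.
    """
    weight_sum = sum(weights.values())
    if total < 0 or weight_sum <= 0:
        raise ValueError("total must be >= 0 and weights must sum to a positive number")

    # Integer share first (rounded down).
    shares = {name: total * w // weight_sum for name, w in weights.items()}

    # Hand out the leftover replicas one by one to the clusters
    # with the largest fractional remainders.
    leftover = total - sum(shares.values())
    by_remainder = sorted(
        weights,
        key=lambda name: (total * weights[name]) % weight_sum,
        reverse=True,
    )
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical weights: both clusters report the same available capacity.
    print(divide_replicas(8, {"cluster-ack-bj": 10, "cluster-idc": 10}))
```

With equal weights, 8 replicas split into 4 and 4, which is the distribution this guide expects after scaling the Deployment.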

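Step 1: Create a Kruise Rollout in the Fleet and distribute it to each member cluster

The following Kruise Rollout defines a batch release in 2 batches: the first batch updates 50% of the replicas of the associated Deployment, and the second batch updates all remaining replicas. Save the YAML below as rollout.yaml, then apply it.

kubectl apply -f rollout.yaml

apiVersion: rollouts.kruise.io/v1beta1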
kind: Rollout
metadata:
  name: rollouts-demo
  namespace: demo
spec:
  workloadRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-demo
  strategy:
    canary:
      enableExtraWorkloadForCanary: false
      steps:
      - replicas: 50%
      - replicas: 100%
---
apiVersion: policy.one.alibabacloud.com/v1alpha1
kind: PropagationPolicy
metadata:
  name: web-demo-rollout-pp
  namespace: demo
spec:
  preserveResourcesOnDeletion: false
  resourceSelectors:
  - apiVersion: rollouts.kruise.io/v1beta1
    kind: Rollout
    name: rollouts-demo
  placement:
    replicaScheduling:
      replicaSchedulingType: Duplicated
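Because the Rollout is duplicated to every member cluster, each cluster applies the two steps (50%, 100%) to its own share of the replicas. The sketch below computes how many updated Pods to expect per cluster after each step. The function names are hypothetical, and rounding fractional results up is an assumption; with the even numbers used in this guide no rounding occurs.

```python
from typing import Dict, List


def batch_targets(replicas: int, steps: List[str]) -> List[int]:
    """Expected number of updated Pods in one cluster after each step.

    `steps` holds percentage strings such as "50%", as in the Rollout above.
    """
    targets = []
    for step in steps:
        percent = int(step.strip().rstrip("%"))
        # Integer ceiling of replicas * percent / 100.
        # Rounding up is an assumption made for this sketch.
        targets.append((replicas * percent + 99) // 100)
    return targets


def plan_for_clusters(
    replicas_per_cluster: Dict[str, int], steps: List[str]
) -> Dict[str, List[int]]:
    """Apply the same steps independently to every cluster's replica count."""
    return {
        cluster: batch_targets(count, steps)
        for cluster, count in replicas_per_cluster.items()
    }


if __name__ == "__main__":
    plan = plan_for_clusters({"cluster-ack-bj": 4, "cluster-idc": 4}, ["50%", "100%"])
    for cluster, targets in plan.items():
        print(cluster, targets)
```

For 4 replicas per cluster, the first step targets 2 updated Pods and the second step targets all 4, in each cluster.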

Step 2: Verify the Rollout batch release

  1. Scale the web-demo Deployment from 0 to 8 replicas.

    kubectl scale deploy web-demo -ndemo --replicas=8
  2. Check how the Deployment replicas are distributed across the member clusters. Each of the 2 clusters runs 4 replicas.

    kubectl amc get pod -ndemo -M

    Expected output:

    NAME                        CLUSTER       CLUSTER_ALIAS    READY   STATUS    RESTARTS   AGE
    web-demo-66d4ff8bb8-b9gmq   c6a8xxxcb36   cluster-ack-bj   1/1     Running   0          16s
    web-demo-66d4ff8bb8-f7rfd   c6a8xxxcb36   cluster-ack-bj   1/1     Running   0          16s
    web-demo-66d4ff8bb8-ntd82   c6a8xxxcb36   cluster-ack-bj   1/1     Running   0          16s
    web-demo-66d4ff8bb8-zwsl6   c6a8xxxcb36   cluster-ack-bj   1/1     Running   0          16s
    web-demo-66d4ff8bb8-b6km7   cdedxxx0a66   cluster-idc      1/1     Running   0          16s
    web-demo-66d4ff8bb8-q2l2d   cdedxxx0a66   cluster-idc      1/1     Running   0          16s
    web-demo-66d4ff8bb8-qpvdq   cdedxxx0a66   cluster-idc      1/1     Running   0          16s
    web-demo-66d4ff8bb8-scwqh   cdedxxx0a66   cluster-idc      1/1     Running   0          16s
  3. Check the Rollout status in the member clusters. Both Rollouts have matched the Deployment successfully.

    kubectl amc get rollouts -ndemo -M

    Expected output:

    NAME            CLUSTER       CLUSTER_ALIAS    STATUS    CANARY_STEP   CANARY_STATE   MESSAGE                            AGE     ADOPTION
    rollouts-demo   c6a8xxxcb36   cluster-ack-bj   Healthy   2             Completed      workload deployment is completed   5m56s   Y
    rollouts-demo   cdedxxx0a66   cluster-idc      Healthy   2             Completed      workload deployment is completed   5m56s   Y
  4. Run the following command to patch the Deployment, which triggers a change and starts the canary release.

    kubectl patch deployment web-demo -ndemo --type='json' -p='[
      {
        "op": "add",
        "path": "/spec/template/spec/containers/0/env",
        "value": [
          {"name": "ENV_NAME", "value": "cluster-3"}
        ]
      }
    ]'
  5. Run kubectl amc get rollouts -ndemo -M again. The Rollouts are now in the Progressing state and CANARY_STEP is 1.

    NAME            CLUSTER       CLUSTER_ALIAS    STATUS        CANARY_STEP   CANARY_STATE   MESSAGE                                                                         AGE     ADOPTION
    rollouts-demo   cdedxxx0a66   cluster-idc      Progressing   1             StepPaused     Rollout is in step(1/2), and you need manually confirm to enter the next step   4d17h   Y
    rollouts-demo   c6a8xxxcb36   cluster-ack-bj   Progressing   1             StepPaused     Rollout is in step(1/2), and you need manually confirm to enter the next step   4d17h   Y
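  6. Check the Pod status. In line with the first batch (50%) defined in the Rollout, replicas in both clusters have been updated: 2 Pods in each cluster.

    kubectl amc get pod -ndemo -M

    Expected output: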

    NAME                        CLUSTER       CLUSTER_ALIAS    READY   STATUS    RESTARTS   AGE
    web-demo-66d4ff8bb8-b9gmq   c6a8xxxcb36   cluster-ack-bj   1/1     Running   0          10m
    web-demo-66d4ff8bb8-f7rfd   c6a8xxxcb36   cluster-ack-bj   1/1     Running   0          10m
    web-demo-66d4ff8bb8-j98hw   c6a8xxxcb36   cluster-ack-bj   1/1     Running   0          7s
    web-demo-66d4ff8bb8-rgbml   c6a8xxxcb36   cluster-ack-bj   1/1     Running   0          7s
    web-demo-66d4ff8bb8-b6km7   cdedxxx0a66   cluster-idc      1/1     Running   0          10m
    web-demo-66d4ff8bb8-q2l2d   cdedxxx0a66   cluster-idc      1/1     Running   0          10m
    web-demo-66d4ff8bb8-vppr2   cdedxxx0a66   cluster-idc      1/1     Running   0          7s
    web-demo-66d4ff8bb8-z6r6l   cdedxxx0a66   cluster-idc      1/1     Running   0          7s
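    You can also run kubectl amc get pod -ndemo -M -oyaml | grep -A1 ENV_NAME to verify the value of ENV_NAME directly.
  7. Run the approve command.

    For a single cluster

    1. Run the approve command for a single cluster.

      This has the same effect as running kubectl kruise rollout approve rollouts/rollouts-demo -ndemo in the corresponding member cluster. Replace ${clusterid} in the following command with the actual member cluster ID.

      kubectl amc rollout approve rollouts/rollouts-demo -ndemo -m ${clusterid}

      Expected output: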

      approving in cluster cdedxxx0a66 ......
      rollout.rollouts.kruise.io/rollouts-demo approved
    2. Check the status. The release in the cluster-idc cluster is complete, while cluster-ack-bj is still in the Progressing state.

      kubectl amc get rollouts -ndemo -M

      Expected output:

      NAME            CLUSTER       CLUSTER_ALIAS    STATUS        CANARY_STEP   CANARY_STATE   MESSAGE                                                                         AGE     ADOPTION
      rollouts-demo   cdedxxx0a66   cluster-idc      Healthy       2             Completed      Rollout progressing has been completed                                          4d18h   Y
      rollouts-demo   c6a8xxxcb36   cluster-ack-bj   Progressing   1             StepPaused     Rollout is in step(1/2), and you need manually confirm to enter the next step   4d18h   Y

    For all clusters

    1. Run the following command to approve the Rollout in all clusters.

      kubectl amc rollout approve rollouts/rollouts-demo -ndemo -M

      Expected output:

      approving in all clusters ......
      approving in cluster cdedxxx0a66 ......
      rollout.rollouts.kruise.io/rollouts-demo approved
      approving in cluster c6a8xxxcb36 ......
      rollout.rollouts.kruise.io/rollouts-demo approved
      You can run kubectl amc get rollouts -ndemo -M to check the status of each cluster.
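When approving cluster by cluster, it helps to know which clusters are still paused. The sketch below parses a saved copy of the kubectl amc get rollouts -ndemo -M table and prints the per-cluster approve command for every Rollout whose CANARY_STATE is StepPaused. It only builds command strings and does not run them. The function names are hypothetical, and the parser assumes that columns are separated by two or more spaces and that no cell is empty, which matches the sample output in this guide but is not a documented guarantee.

```python
import re
from typing import Dict, List

# Sample copied from the single-cluster approve step above (spacing shortened).
SAMPLE_OUTPUT = """\
NAME            CLUSTER       CLUSTER_ALIAS    STATUS        CANARY_STEP   CANARY_STATE   MESSAGE   AGE     ADOPTION
rollouts-demo   cdedxxx0a66   cluster-idc      Healthy       2             Completed      Rollout progressing has been completed   4d18h   Y
rollouts-demo   c6a8xxxcb36   cluster-ack-bj   Progressing   1             StepPaused     Rollout is in step(1/2), and you need manually confirm to enter the next step   4d18h   Y
"""


def parse_table(text: str) -> List[Dict[str, str]]:
    """Parse a whitespace-aligned table into one dict per row.

    Assumption: columns are separated by at least two spaces, so single
    spaces inside the MESSAGE column do not split the cell.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = re.split(r"\s{2,}", lines[0].strip())
    return [dict(zip(header, re.split(r"\s{2,}", line.strip()))) for line in lines[1:]]


def paused_clusters(rows: List[Dict[str, str]]) -> List[str]:
    """Cluster IDs whose Rollout is waiting for manual confirmation."""
    return [row["CLUSTER"] for row in rows if row.get("CANARY_STATE") == "StepPaused"]


def approve_commands(
    cluster_ids: List[str], rollout: str = "rollouts-demo", namespace: str = "demo"
) -> List[str]:
    """Build the per-cluster approve command shown in this guide."""
    return [
        f"kubectl amc rollout approve rollouts/{rollout} -n{namespace} -m {cluster_id}"
        for cluster_id in cluster_ids
    ]


if __name__ == "__main__":
    for command in approve_commands(paused_clusters(parse_table(SAMPLE_OUTPUT))):
        print(command)
```

Approving one cluster at a time and re-checking the table between approvals keeps the blast radius of a bad release limited to a single cluster. For anything beyond a quick check, parsing structured output such as the -oyaml form used earlier in this guide is likely more robust than parsing the aligned table.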