Cluster Scheduling

Last updated: 2022-09-21 16:24:40

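The ADP base provides fine-grained scheduling for the cluster as a whole. In a single configuration file, and from a global view, you can select different workloads by business scenario (for example middleware, core business applications, and non-core business applications) and then apply unified scheduling policies, resource isolation policies, and so on.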

Spec definition

Workload rollout distribution scheduling policy - WorkloadSpread

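WorkloadSpread patches the Pod fields .spec.affinity.*, .spec.topologySpreadConstraints[], and .spec.tolerations[] of the matching workloads. Its main purpose is to provide cluster-wide, selector-based scheduling rules so that each application does not need its own configuration. With Helm charts, a template that cannot be adapted to these settings would otherwise require some rework; the WorkloadSpread API lets you supply the missing scheduling conditions instead.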

# API Version Info (namespaced=false)
apiVersion: opcc.cnx.aliyun-inc.com/v1alpha1
kind: WorkloadSpread
metadata:
  name: <workloadSpreadName>
spec:
  spreadGroups:
  - name: group-a
    targets:
      # targetRef - workload types to match. Valid GVKs are apps/v1 [StatefulSet, Deployment, ReplicaSet] or a CustomResourceDefinition that has associated workloads. An apps/v1 [StatefulSet, Deployment, ReplicaSet] object that has ownerReferences is traced to its owner resource and is not itself matched against the label selector.
    - targetRef:
      - apiGroup: apps
        apiVersion: v1
        kind: StatefulSet
      - apiGroup: apps
        apiVersion: v1
        kind: Deployment
      # labelSelector - label selection (required). ref: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#resources-that-support-set-based-requirements
      labelSelector:
        matchLabels:
          some-res-label: some-res-label-value
        matchExpressions:
        - key: another-node-label-key
          # operator - Represents a key's relationship to a set of values. Valid operators for a label selector are In, NotIn, Exists, and DoesNotExist.
          operator: In
          values:
          - another-node-label-value
      
      # namespaceSelector - additional filtering by namespace
      namespaceSelector:
        matchNames:
        - default
    affinity:
      # nodeAffinity - node affinity matching rules
      # ref: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#node-affinity  
      nodeAffinity:
        # requiredDuringSchedulingIgnoredDuringExecution - hard requirement: a node must match for the Pod to be scheduled on it
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: topology.kubernetes.io/zone
              # operator - Represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt.
              operator: In
              values:
              - zone-a
        # preferredDuringSchedulingIgnoredDuringExecution - soft preference: matching nodes are preferred but not required
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
            - key: another-node-label-key
              # operator - Represents a key's relationship to a set of values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt.
              operator: In
              values:
              - another-node-label-value

      # podAffinity - inter-pod affinity rules
      # ref: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#inter-pod-affinity-and-anti-affinity
      podAffinity:

      # podAntiAffinity - inter-pod anti-affinity rules
      # ref: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#inter-pod-affinity-and-anti-affinity
      podAntiAffinity:

    # topologySpreadConstraints - Spread Constraints for Pods
    # ref: https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/#spread-constraints-for-pods
    topologySpreadConstraints:
    - maxSkew: <integer>
      topologyKey: <string>
      whenUnsatisfiable: <string>
      labelSelector: <object>

    # tolerations - tolerations for node taints
    # ref: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/#concepts
    tolerations:
    - key: "example-key"
      operator: "Exists"
      effect: "NoSchedule"

    
  - name: group-b
    targets:
      # ...

# status - requires the operator controller
status:
  observedGeneration:
  pendingResources:
  - <ns>/statefulset/zzz
  upToDateResources:
  - <ns>/statefulset/xxx
  - <ns>/deployment/yyy
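
As an illustration of how the placeholders in the definition above might be filled in, the following sketch spreads a set of StatefulSets across zones. It is not taken from the product; the object name, the group name, and the tier label are hypothetical. The topologySpreadConstraints values use the standard Kubernetes Pod spec fields, and the target reference is written with the standard Kubernetes spelling apiVersion.

```yaml
# Hypothetical example: spread middleware StatefulSets evenly across zones.
apiVersion: opcc.cnx.aliyun-inc.com/v1alpha1
kind: WorkloadSpread
metadata:
  name: middleware-spread            # hypothetical name
spec:
  spreadGroups:
  - name: middleware                 # hypothetical group name
    targets:
    - targetRef:
      - apiGroup: apps
        apiVersion: v1
        kind: StatefulSet
      labelSelector:
        matchLabels:
          tier: middleware           # hypothetical workload label
      namespaceSelector:
        matchNames:
        - default
    topologySpreadConstraints:
    # Allow a difference of at most one matching Pod between any two zones.
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      # DoNotSchedule keeps the Pod pending rather than violating the constraint;
      # ScheduleAnyway is the other value Kubernetes accepts.
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          tier: middleware
    tolerations:
    - key: "example-key"
      operator: "Exists"
      effect: "NoSchedule"
```

The labelSelector inside topologySpreadConstraints selects the Pods that are counted per zone, so it normally mirrors the labels carried by the Pods of the targeted workloads.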

Usage notes

Pod spec rewrite behavior

  • For Deployment and ReplicaSet workloads, the Pod spec is guaranteed to be patched with the content of the WorkloadSpread spec.

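  • For StatefulSet workloads, if a Pod has already been scheduled, the WorkloadSpread spec makes no change to the Pod spec. In other words, the policy takes effect only when the Pod is first created; a WorkloadSpread that is configured afterwards has no effect.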

Conflicting or duplicate target selectors across multiple WorkloadSpread CRs or .spec.spreadGroups[] entries

  • Multiple WorkloadSpread CRs may select the same target with conflicting conditions. In that case, the CR with the newer .metadata.creationTimestamp is used.

  • Entries in .spec.spreadGroups[] may select the same target with conflicting conditions. In that case, the policy of the first matching group is used (first find).

  • We recommend configuring only one WorkloadSpread, which makes the target selection criteria easier to review.
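
The first-match rule for .spec.spreadGroups[] can be illustrated with a sketch in which two groups overlap. The object name, group names, and the tier label are hypothetical. A Deployment labelled tier: core is selected by both groups, and under the rule described above it would receive the policy of core-apps because that group is listed first.

```yaml
# Sketch only; names and labels are hypothetical.
apiVersion: opcc.cnx.aliyun-inc.com/v1alpha1
kind: WorkloadSpread
metadata:
  name: overlap-demo
spec:
  spreadGroups:
  # Listed first: its policy is the one applied to Deployments labelled tier=core.
  - name: core-apps
    targets:
    - targetRef:
      - apiGroup: apps
        apiVersion: v1
        kind: Deployment
      labelSelector:
        matchLabels:
          tier: core
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
              - zone-a
  # Also selects tier=core Deployments (any value of the tier label matches),
  # but its policy applies only to workloads that no earlier group matched.
  - name: any-tiered-app
    targets:
    - targetRef:
      - apiGroup: apps
        apiVersion: v1
        kind: Deployment
      labelSelector:
        matchExpressions:
        - key: tier
          operator: Exists
    tolerations:
    - key: "example-key"
      operator: "Exists"
      effect: "NoSchedule"
```

Ordering groups from the most specific selector to the most general keeps the outcome of overlaps predictable.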

FAQ

Q: How can I make sure that the WorkloadSpread API object is configured before the StatefulSet workloads in the cluster are created?

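A: During ADP application orchestration, set the spreadGroups array in the Helm values of the base OPCC component. The array is rendered in full under .spec.spreadGroups of the system default WorkloadSpread object.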
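
A values fragment along these lines might look like the sketch below. The placement of spreadGroups as a top-level key is an assumption, since the exact path within the OPCC chart values is not stated here; the group name and the tier label are hypothetical as well.

```yaml
# Hypothetical Helm values fragment for the OPCC component.
# Assumption: spreadGroups is a top-level values key; check the chart's
# own values file for the actual path.
spreadGroups:
- name: stateful-middleware          # hypothetical group name
  targets:
  - targetRef:
    - apiGroup: apps
      apiVersion: v1
      kind: StatefulSet
    labelSelector:
      matchLabels:
        tier: middleware             # hypothetical workload label
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - zone-a
```

Because the default WorkloadSpread object is rendered together with the base component, the policy already exists when application StatefulSets are created later in the orchestration.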

Q: The WorkloadSpread API object was created after the StatefulSet workloads in the cluster. How do I migrate the Pods?

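A: Pods can be migrated only if the PVCs bound to them can move between nodes. If you use the default Yoda-LVM/OpenLocal-LVM StorageClass, the only option is to assess whether the contents of the PV file system can be discarded; if they can, delete the corresponding PVC and Pod.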
