Use a custom scheduler

Container Compute Service (ACS) clusters include a default scheduler. When you submit a pod to a cluster, the scheduler named default-scheduler allocates resources for it by default. To accommodate special resource types and complex scheduling policies, ACS clusters also let you use a custom scheduler. This topic describes how to use a custom scheduler in an ACS cluster and the constraints that apply.

Prerequisites

  • The acs-virtual-node component is installed, and its version is v2.12.0-acs.8 or later.

  • The kube-scheduler component is installed, and its version meets the following requirements.

    ACS cluster version    Required kube-scheduler version
    1.32 and later         All versions are supported.
    1.31                   v1.31.0-aliyun-1.1.2 and later
    1.30                   v1.30.3-aliyun-1.1.2 and later
    1.28                   v1.28.9-aliyun-1.1.2 and later

    The Enable custom labels and schedulers for GPU-HPN nodes option in the kube-scheduler component is enabled by default in recent versions. For more information, see kube-scheduler.

    Note

    When you use a custom scheduler, you need to configure spec.schedulerName for the pod. For more information, see Specify schedulers for pods.
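The kube-scheduler version requirements listed above can be checked mechanically. The following is a minimal sketch, not an official tool: the function names are hypothetical, the version format is assumed to always look like v1.31.0-aliyun-1.1.2, and "later" is assumed to mean a tuple-wise comparison of the Kubernetes part followed by the aliyun part. Cluster versions that are not listed in the requirements are treated as unsupported, which is also an assumption.

```python
import re

# Minimum kube-scheduler versions from the prerequisites, keyed by ACS cluster version.
MIN_SCHEDULER_VERSION = {
    "1.31": "v1.31.0-aliyun-1.1.2",
    "1.30": "v1.30.3-aliyun-1.1.2",
    "1.28": "v1.28.9-aliyun-1.1.2",
}


def parse_version(version):
    """Split 'v1.31.0-aliyun-1.1.2' into ((1, 31, 0), (1, 1, 2))."""
    match = re.fullmatch(
        r"v(\d+)\.(\d+)\.(\d+)-aliyun-(\d+)\.(\d+)\.(\d+)", version
    )
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    numbers = tuple(int(group) for group in match.groups())
    return numbers[:3], numbers[3:]


def meets_requirement(cluster_version, scheduler_version):
    """Return True if the scheduler version satisfies the listed minimum."""
    major, minor = (int(part) for part in cluster_version.split(".")[:2])
    if (major, minor) >= (1, 32):
        return True  # All kube-scheduler versions are supported.
    minimum = MIN_SCHEDULER_VERSION.get(f"{major}.{minor}")
    if minimum is None:
        return False  # Assumption: unlisted cluster versions are unsupported.
    return parse_version(scheduler_version) >= parse_version(minimum)
```

For example, meets_requirement("1.30", "v1.30.3-aliyun-1.1.1") returns False because the aliyun part is lower than the required 1.1.2.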

Procedure

Step 1: Deploy a custom scheduler

Deploy a custom scheduler in your cluster. For more information, see the Kubernetes documentation.

Step 2: Modify the ACS scheduler configuration

  1. Log on to the ACS console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of the target cluster. In the left navigation pane, click Add-ons.

  3. On the Add-ons page, locate Kube Scheduler and click Configuration on its card.

  4. In the dialog box that appears, select Enable custom labels and schedulers for GPU-HPN nodes and click OK.

    For more information, see kube-scheduler.

    The following comparison shows how pods and nodes are affected when this option is disabled (default scheduler) or enabled (custom scheduler).

    Note

    This feature applies only to GPU-HPN pods and nodes. Custom schedulers are not supported for other compute types.

    Pod scheduler name

      • Disabled: Customization is not supported. After a pod is submitted, its spec.schedulerName field is overwritten with default-scheduler.

      • Enabled: You can set the spec.schedulerName of a pod to any value, and the value is not modified after submission.

    Pod scheduling process

      • Disabled: The default ACS scheduler allocates resources for all pods.

      • Enabled: The default ACS scheduler schedules only pods whose spec.schedulerName is default-scheduler. The custom scheduler is responsible for allocating resources for all other pods.

    Constraints on labels, annotations, and taints for GPU-HPN nodes

      • Disabled: ACS constraints apply to adding, modifying, and deleting node labels, annotations, and taints. For more information, see Manage node labels and taints.

      • Enabled: ACS constraints no longer apply to adding, modifying, and deleting node labels, annotations, and taints.

    Constraints on pod affinity scheduling

      • Disabled: ACS constraints apply to the configuration of affinity fields. For more information, see Node Affinity Scheduling.

      • Enabled: ACS constraints no longer apply to the configuration of affinity fields.
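The scheduler-name behavior in the comparison above can be summarized as a small decision function. This is only an illustrative model of the documented behavior for GPU-HPN pods, not ACS code; the function names are hypothetical.

```python
DEFAULT_SCHEDULER = "default-scheduler"


def effective_scheduler_name(requested, custom_enabled):
    """Return the spec.schedulerName a GPU-HPN pod carries after submission.

    requested: the value set in the pod spec (None or "" if unset).
    custom_enabled: whether "Enable custom labels and schedulers for
    GPU-HPN nodes" is selected.
    """
    if not custom_enabled:
        # Option disabled: the field is overwritten with default-scheduler.
        return DEFAULT_SCHEDULER
    # Option enabled: the requested value is kept. An unset field falls
    # back to default-scheduler, as in standard Kubernetes.
    return requested or DEFAULT_SCHEDULER


def handled_by_default_scheduler(requested, custom_enabled):
    """Return True if the default ACS scheduler allocates resources for the pod."""
    return effective_scheduler_name(requested, custom_enabled) == DEFAULT_SCHEDULER
```

With the option enabled, handled_by_default_scheduler("koord-scheduler", True) is False, so a custom scheduler with that name must be running in the cluster or the pod stays unscheduled.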

Step 3: Assign a custom scheduler to a pod

  1. Create a file named dep-with-koordinator.yaml with the following content.

    The file defines a Deployment whose pod template specifies a custom scheduler named koord-scheduler.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: dep-with-koordinator
      labels:
        app: dep-with-koordinator
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: dep-with-koordinator
      template:
        metadata:
          labels:
            app: dep-with-koordinator
            # Specify the compute class as gpu-hpn. Custom schedulers are not supported for other compute types.
            alibabacloud.com/compute-class: gpu-hpn
        spec:
          containers:
          - name: demo
            image: registry.cn-hangzhou-finance.aliyuncs.com/acs/stress:v1.0.4
            command:
            - "sleep"
            - "infinity"
          restartPolicy: Always
          # Specify the scheduler name as koord-scheduler. The name must match the one configured in Step 1.
          schedulerName: koord-scheduler
  2. Apply the Deployment to the ACS cluster.

    kubectl apply -f dep-with-koordinator.yaml
  3. Check the schedulerName in the Pod configuration.

    kubectl get pod -l app=dep-with-koordinator -o custom-columns=NAME:.metadata.name,schedulerName:.spec.schedulerName

    Expected output:

    # The pod specifies the custom scheduler named koord-scheduler.
    NAME                               schedulerName
    dep-with-koordinator-xxxxx-xxxxx   koord-scheduler
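The same check can be scripted when many pods are involved. The sketch below parses the JSON that kubectl get pod -o json prints and reports pods whose spec.schedulerName differs from the expected value. The helper names are hypothetical, and the embedded sample is a trimmed, hand-written stand-in for real kubectl output.

```python
import json


def scheduler_names(pod_list_json):
    """Map each pod name to its spec.schedulerName from a kubectl JSON pod list."""
    pod_list = json.loads(pod_list_json)
    return {
        item["metadata"]["name"]: item["spec"].get("schedulerName")
        for item in pod_list.get("items", [])
    }


def pods_not_using(pod_list_json, expected):
    """Return the sorted names of pods whose scheduler is not the expected one."""
    names = scheduler_names(pod_list_json)
    return sorted(name for name, sched in names.items() if sched != expected)


# Hand-written sample with only the fields the functions read.
SAMPLE = json.dumps({
    "items": [
        {
            "metadata": {"name": "dep-with-koordinator-xxxxx-xxxxx"},
            "spec": {"schedulerName": "koord-scheduler"},
        }
    ]
})

if __name__ == "__main__":
    mismatched = pods_not_using(SAMPLE, "koord-scheduler")
    print("all pods use koord-scheduler" if not mismatched else mismatched)
```

In practice, the input would come from kubectl get pod -l app=dep-with-koordinator -o json rather than from the embedded sample. An empty result means that no pod had its scheduler name overwritten.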

FAQ

PVC scheduling error: Insufficient attachable-volumes-xxx

This error does not occur with the default Kubernetes scheduler. It occurs because some custom schedulers, when processing pods that use PersistentVolumeClaims (PVCs), require the node's CSINode object to exist or the corresponding Container Storage Interface (CSI) driver to have reported capacity information on it. If a node does not meet these conditions, the scheduler may incorrectly determine that resources are insufficient.

To resolve this issue, configure the scheduler to ignore specific CSI drivers. For example, when you use the Volcano scheduler, add the --ignored-provisioners=${csi-driver-name} startup parameter:

# You can specify multiple drivers, separated by commas.
--ignored-provisioners=povplugin.csi.alibabacloud.com

Adjust the parameter based on the CSI driver you are using.

Resource insufficiency errors: resource in cluster is overused or queue resource quota insufficient

This error does not occur with the default Kubernetes scheduler. It occurs because some custom schedulers convert the capacity values of resources other than CPU and memory, such as ephemeral-storage, to milli-units. Virtual nodes report a very large capacity by default, so this conversion can overflow a 64-bit integer. The scheduler then calculates the remaining capacity of the node or cluster as a negative value and reports a resource insufficiency error.

To resolve this issue, submit a ticket to technical support.
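The overflow can be reproduced with plain arithmetic. The sketch below simulates signed 64-bit wrap-around in Python, whose integers do not overflow on their own. It is not taken from any scheduler's source code, and the capacity value of 2**60 is a hypothetical stand-in for a very large capacity reported by a virtual node.

```python
INT64_MAX = 2**63 - 1


def to_int64(value):
    """Wrap an arbitrary Python int to a signed 64-bit two's complement value."""
    value &= 0xFFFFFFFFFFFFFFFF
    return value - 2**64 if value > INT64_MAX else value


def to_milli_int64(quantity):
    """Convert a quantity to milli-units as a language with int64 arithmetic would."""
    return to_int64(quantity * 1000)


if __name__ == "__main__":
    # Hypothetical capacity: 2**60 fits comfortably in an int64 ...
    capacity = 2**60
    # ... but multiplying it by 1000 exceeds INT64_MAX and wraps to a negative value.
    print(to_milli_int64(capacity))
    # The largest quantity that still converts safely:
    print(INT64_MAX // 1000)
```

Any quantity larger than INT64_MAX // 1000 wraps when converted, which is how a node with ample capacity can appear to have negative remaining resources.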