Use shared GPU scheduling with ack-co-scheduler

Shared GPU scheduling lets multiple pods share a single GPU, reducing GPU resource waste and improving utilization in registered clusters.

Prerequisites

Before you begin, ensure that you have:

Billing

Enable Cloud-native AI Suite before using shared GPU scheduling. For pricing details, see Billing information for the Cloud-native AI Suite.

Limitations

  • Do not set the CpuPolicy of nodes that use shared GPU scheduling to static.

  • DaemonSet pods for shared GPU scheduling run at lower priority and may be preempted by higher-priority pods. To prevent eviction, add priorityClassName: system-node-critical to the DaemonSet spec — for example, in gpushare-device-plugin-ds.
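One way to apply the priority class from the last item is a merge patch on the DaemonSet's pod template. The following is a minimal sketch, not an official procedure: it assumes the DaemonSet lives in the kube-system namespace (not stated here, so check your cluster first), and the helper names priority_patch and set_ds_priority are hypothetical.

```shell
# Build a JSON merge patch that sets priorityClassName on the pod template.
# Argument: priority class name.
priority_patch() {
  printf '{"spec":{"template":{"spec":{"priorityClassName":"%s"}}}}' "$1"
}

# Apply the patch to a DaemonSet.
# Arguments: namespace (assumed, verify in your cluster), DaemonSet name, priority class.
set_ds_priority() {
  kubectl -n "$1" patch daemonset "$2" --type merge -p "$(priority_patch "$3")"
}
```

For example, set_ds_priority kube-system gpushare-device-plugin-ds system-node-critical would patch the DaemonSet named above, provided the namespace assumption holds.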

How it works

Two components work together to enable shared GPU scheduling:

  • ack-ai-installer — provides GPU memory isolation and GPU topology-aware scheduling on each node.

  • ack-co-scheduler — lets you define ResourcePolicy custom resources and enables multi-level elastic scheduling across the cluster.

Pods request GPU memory in GiB using the aliyun.com/gpu-mem resource limit. The scheduler places pods on GPU nodes, distributing GPU memory across workloads that share the same physical GPU.
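As a rough illustration of how requests pack onto one GPU, the sketch below estimates how many pods with the same aliyun.com/gpu-mem request fit on a single device. It assumes GPU memory is the only constraint and that requests are whole GiB values; pods_per_gpu is a hypothetical helper, not part of any tool mentioned here.

```shell
# Estimate how many identical pods fit on one GPU, ignoring any overhead.
# Arguments: total GPU memory in GiB, per-pod aliyun.com/gpu-mem request in GiB.
pods_per_gpu() {
  total=$1
  request=$2
  if [ "$request" -le 0 ]; then
    echo "request must be a positive integer" >&2
    return 1
  fi
  # Integer division: leftover memory smaller than one request stays unused.
  echo $(( total / request ))
}
```

With a 15 GiB GPU, pods_per_gpu 15 3 prints 5, while pods_per_gpu 15 4 prints 3 and leaves 3 GiB unallocated.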

Node labels control which sharing mode applies:

Label                        Mode                                                       Isolation
ack.node.gpu.schedule=share  Shared GPU scheduling (on-premises nodes)                  No memory isolation between pods
ack.node.gpu.schedule=cgpu   Shared GPU scheduling with memory isolation (cloud nodes)  GPU memory is isolated between pods

Step 1: Install components

Install both components in the registered cluster.

Install ack-ai-installer

  1. Log on to the Container Service Management Console. In the left navigation pane, click Clusters.

  2. Click the cluster name, then click Applications > Helm in the left navigation pane.

  3. On the Helm page, click Deploy, then search for and install ack-ai-installer.

Install ack-co-scheduler

  1. On the Clusters page, click the cluster name, then click Add-ons in the left navigation pane.

  2. On the Add-ons page, search for ack-co-scheduler and click Install in the lower-right corner of its card.

Step 2: Install the GPU resource query tool

Download kubectl-inspect-cgpu and place it in your PATH.

Linux:

wget http://aliacs-k8s-cn-beijing.oss-cn-beijing.aliyuncs.com/gpushare/kubectl-inspect-cgpu-linux -O /usr/local/bin/kubectl-inspect-cgpu

macOS:

wget http://aliacs-k8s-cn-beijing.oss-cn-beijing.aliyuncs.com/gpushare/kubectl-inspect-cgpu-darwin -O /usr/local/bin/kubectl-inspect-cgpu

Grant execute permissions:

chmod +x /usr/local/bin/kubectl-inspect-cgpu
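kubectl treats an executable named kubectl-inspect-cgpu on the PATH as the plugin behind kubectl inspect cgpu, so a quick check that the file is discoverable can save debugging later. A minimal sketch follows; check_inspect_cgpu is a hypothetical helper name.

```shell
# Report whether the kubectl-inspect-cgpu plugin is discoverable and executable.
check_inspect_cgpu() {
  plugin_path=$(command -v kubectl-inspect-cgpu) || {
    echo "kubectl-inspect-cgpu not found on PATH" >&2
    return 1
  }
  [ -x "$plugin_path" ] || {
    echo "$plugin_path is not executable" >&2
    return 1
  }
  echo "found: $plugin_path"
}
```

Running kubectl plugin list should also show the plugin once it is installed correctly.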

Step 3: Create GPU nodes

Create Elastic GPU Service instances and install GPU drivers and nvidia-container-runtime. See Create and manage node pools.

If you already have GPU nodes with the environment configured, skip this step. If you need a script to install GPU drivers, see Manually upgrade GPU node drivers.

Label GPU nodes to enable shared GPU scheduling:

Node type          Label to add                 Effect
On-premises nodes  ack.node.gpu.schedule=share  Enables shared GPU scheduling without memory isolation
Cloud nodes        ack.node.gpu.schedule=cgpu   Enables shared GPU scheduling with memory isolation

For on-premises nodes, add the label manually using kubectl. For cloud nodes, use the node pool label feature. See Enable scheduling.
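For the on-premises case, the manual labeling step might look like the sketch below. The helper name label_onprem_gpu_node is hypothetical, and the node name you pass is whatever kubectl get nodes reports for your GPU node.

```shell
# Label an on-premises node for shared GPU scheduling without memory isolation.
# Argument: node name as shown by `kubectl get nodes`.
label_onprem_gpu_node() {
  if [ -z "$1" ]; then
    echo "usage: label_onprem_gpu_node <node-name>" >&2
    return 1
  fi
  kubectl label node "$1" ack.node.gpu.schedule=share
}
```

Afterwards, kubectl get nodes -l ack.node.gpu.schedule=share lists the nodes that carry the label.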

Step 4: Deploy a workload with shared GPU scheduling

Check current GPU usage

Run the following command to view GPU usage in the cluster:

kubectl inspect cgpu

Expected output:

NAME                           IPADDRESS       GPU0(Allocated/Total)  GPU Memory(GiB)
cn-zhangjiakou.192.168.66.139  192.168.66.139  0/15                   0/15
---------------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
0/15 (0%)
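For scripting, the cluster-wide summary can be reduced to a single number. The sketch below reads the command output on stdin and prints the unallocated GPU memory in GiB. It assumes the output layout matches the sample above, which is not a guaranteed format; free_gpu_mem is a hypothetical helper.

```shell
# Read `kubectl inspect cgpu` output on stdin and print unallocated GPU memory (GiB).
free_gpu_mem() {
  awk '
    # The summary value is on the line after this heading.
    /Allocated\/Total GPU Memory In Cluster:/ { summary = 1; next }
    summary {
      split($1, parts, "/")   # e.g. "0/15" gives parts[1]=0 and parts[2]=15
      print parts[2] - parts[1]
      exit
    }
  '
}
```

Usage: kubectl inspect cgpu | free_gpu_mem. For the sample output above, this prints 15.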

Deploy a GPU workload

  1. Create a file named GPUtest.yaml with the following content:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: gpu-share-sample
    spec:
      parallelism: 1
      template:
        metadata:
          labels:
            app: gpu-share-sample
        spec:
          schedulerName: ack-co-scheduler
          containers:
          - name: gpu-share-sample
            image: registry.cn-hangzhou.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
            command:
            - python
            - tensorflow-sample-code/tfjob/docker/mnist/main.py
            - --max_steps=100000
            - --data_dir=tensorflow-sample-code/data
            resources:
              limits:
                # Unit: GiB. This pod requests 3 GiB of GPU memory.
                aliyun.com/gpu-mem: 3
            workingDir: /root
          restartPolicy: Never

    Key fields in the manifest:

    Field                            Required  Description
    schedulerName: ack-co-scheduler  Yes       Directs the pod to the co-scheduler for GPU-aware placement
    aliyun.com/gpu-mem               Yes       GPU memory request in GiB. Set this to the amount of GPU memory the pod needs

  2. Apply the manifest:

    kubectl apply -f GPUtest.yaml

  3. Verify GPU memory allocation:

    kubectl inspect cgpu

    Expected output:

    NAME                           IPADDRESS       GPU0(Allocated/Total)  GPU Memory(GiB)
    cn-zhangjiakou.192.168.66.139  192.168.66.139  3/15                   3/15
    ---------------------------------------------------------------------------
    Allocated/Total GPU Memory In Cluster:
    3/15 (20%)

    Check the following fields to confirm scheduling succeeded:

    • GPU0(Allocated/Total): Shows GPU memory in use on each GPU. 3/15 means 3 GiB of 15 GiB is allocated.

    • Allocated/Total GPU Memory In Cluster: Cluster-wide summary. 3/15 (20%) confirms the pod is scheduled and consuming GPU memory.
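If the allocation does not change after you apply the manifest, a reasonable first check is whether the Job's pod was scheduled at all. The sketch below selects the pod by the app label from GPUtest.yaml and assumes the Job was created in your current namespace; the helper names are hypothetical.

```shell
# Show the sample Job's pod and the node it landed on.
show_sample_pod() {
  kubectl get pods -l app=gpu-share-sample -o wide
}

# Print details and recent events for the pod; useful when it stays Pending.
describe_sample_pod() {
  kubectl describe pods -l app=gpu-share-sample
}
```

A pod stuck in Pending usually has an event explaining why it could not be placed, such as no node offering enough aliyun.com/gpu-mem.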

What's next

For an overview of shared GPU scheduling modes and architecture, see Shared GPU scheduling.