Enable the scheduling feature - Container Service for Kubernetes (ACK)

When deploying GPU workloads in an ACK managed cluster Pro, assign GPU scheduling labels (exclusive, shared, or topology-aware) and GPU card-model labels to nodes to improve GPU utilization and steer workloads to the right nodes. Shared and topology-aware scheduling require an ACK managed cluster Pro; card-model scheduling is supported on all cluster types.

Scheduling label overview

GPU scheduling labels identify GPU models and allocation policies for fine-grained resource management.

Scheduling mode	Label value	Use cases
Exclusive scheduling (Default)	`ack.node.gpu.schedule: default`	For performance-critical tasks requiring exclusive GPU access, such as model training and high-performance computing (HPC).
Shared scheduling	`ack.node.gpu.schedule: cgpu` `ack.node.gpu.schedule: core_mem` `ack.node.gpu.schedule: share` `ack.node.gpu.schedule: mps`	Ideal for concurrent lightweight tasks such as multitenancy and inference. Improves GPU utilization. `cgpu`: Shared computing power with isolated GPU memory, based on Alibaba Cloud cGPU. `core_mem`: Isolated computing power and GPU memory. `share`: Shared computing power and GPU memory with no isolation. `mps`: Shared computing power with isolated GPU memory, based on NVIDIA Multi-Process Service (MPS) with Alibaba Cloud cGPU.
Shared scheduling	`ack.node.gpu.placement: binpack` `ack.node.gpu.placement: spread`	Optimizes resource allocation on multi-GPU nodes with `cgpu`, `core_mem`, `share`, or `mps` shared scheduling enabled. `binpack`: (Default) Fills one GPU with Pods before allocating the next, reducing resource fragmentation. Ideal for maximizing utilization or energy savings. `spread`: Spreads Pods across GPUs to reduce single-card failure impact. Suitable for high-availability workloads.
Topology-aware scheduling	`ack.node.gpu.schedule: topology`	Assigns the optimal GPU combination to a Pod based on physical GPU topology within a node. Ideal for tasks sensitive to GPU-to-GPU communication latency.
Card model scheduling	`aliyun.accelerator/nvidia_name: <GPU card name>` `aliyun.accelerator/nvidia_mem: <GPU memory per card>` `aliyun.accelerator/nvidia_count: <total number of GPU cards>`	Schedules jobs to nodes with a specific GPU model or avoids nodes with a specific model. Use the `aliyun.accelerator/nvidia_mem` and `aliyun.accelerator/nvidia_count` labels to set GPU memory and card count for jobs.

Enable scheduling features

A node supports only one GPU scheduling mode (exclusive, shared, or topology-aware) at a time. Enabling a mode sets the extended resources reported by other modes to 0.

Exclusive scheduling

Without GPU scheduling labels, exclusive scheduling is the default mode. A single GPU card is the smallest allocation unit for Pods.

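If you enabled another GPU scheduling mode, deleting the label alone does not restore exclusive scheduling. To restore it, explicitly set the label to `ack.node.gpu.schedule: default`, for example by running `kubectl label node <NODE_NAME> ack.node.gpu.schedule=default --overwrite`.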

Shared scheduling

Shared scheduling is available only for ACK managed cluster Pro. See Limitations.

Install the ack-ai-installer component
1. Log on to the ACK console. In the left navigation pane, click Clusters.
2. On the Clusters page, click the name of your cluster. In the left navigation pane, click Applications > Cloud-native AI Suite.
3. On the Cloud-native AI Suite page, click Deploy, and then select Scheduling Policy Extension (Batch Task Scheduling, GPU Sharing, Topology-aware GPU Scheduling).
  
  To configure the cGPU scheduling policy, see Install and use the cGPU component.
4. On the Cloud-native AI Suite page, click Deploy Cloud-native AI Suite.
  
  In the component list on the Cloud-native AI Suite page, verify that the ack-ai-installer component is installed.
Enable shared scheduling
1. On the Clusters page, click the name of your target cluster. In the left-side navigation pane, choose Nodes > Node Pools.
2. On the Node Pools page, click Create Node Pool, configure the node labels, and then click Confirm.
  
  Keep default values for other settings. See Scheduling label overview for label use cases.
  - Configure basic shared scheduling.
    
    Click the Add icon for Node Labels, set the Key to ack.node.gpu.schedule, and select one of the following label values: cgpu, core_mem, share, or mps (requires installing the MPS Control Daemon component).
  - Configure multi-card shared scheduling.
    
    For nodes with multiple GPU cards, configure multi-card scheduling to optimize resource allocation.
    
    Click the Add icon for Node Labels, set the Key to ack.node.gpu.placement, and select one of the following label values: binpack or spread.

Verify shared scheduling

`cgpu`/`share`/`mps`

Verify that cgpu, share, or mps shared scheduling is enabled. Replace <NODE_NAME> with your node name.

kubectl get nodes <NODE_NAME> -o yaml | grep "aliyun.com/gpu-mem"

Expected output:

aliyun.com/gpu-mem: "60"

A non-zero value confirms cgpu, share, or mps shared scheduling is enabled.

`core_mem`

Verify that core_mem shared scheduling is enabled. Replace <NODE_NAME> with your node name.

kubectl get nodes <NODE_NAME> -o yaml | grep -E 'aliyun\.com/gpu-core\.percentage|aliyun\.com/gpu-mem'

Expected output:

aliyun.com/gpu-core.percentage: "80"
aliyun.com/gpu-mem: "6"

Non-zero aliyun.com/gpu-core.percentage and aliyun.com/gpu-mem values confirm core_mem shared scheduling is enabled.
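
The shared-scheduling checks above all come down to whether a reported extended resource is non-zero. As a minimal sketch, the hypothetical helper `resource_nonzero` below (not part of ACK or kubectl) reads node YAML on standard input and succeeds only if the named resource has a value greater than 0. It assumes the resource appears on a single `name: "value"` line, as in the expected output.

```shell
# Hypothetical helper, not an ACK tool: reads node YAML on stdin and
# exits 0 if any "<resource>:" line carries a value greater than 0.
resource_nonzero() {
  awk -v name="$1" '
    index($0, name ":") {          # literal match, so "." is not a regex wildcard
      v = $0
      sub(/^.*:[ \t]*/, "", v)     # keep only the text after the last colon
      gsub(/"/, "", v)             # drop the YAML quotes
      if (v + 0 > 0) found = 1
    }
    END { exit !found }
  '
}
```

For example, `kubectl get nodes <NODE_NAME> -o yaml | resource_nonzero aliyun.com/gpu-core.percentage && echo "core_mem reported"` prints the message only when the value is non-zero. Because the match includes the trailing colon, a name such as `aliyun.com/gpu` does not accidentally match `aliyun.com/gpu-mem`.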

`binpack`

Use the GPU resource query tool to check GPU resource allocation on the node:

kubectl inspect cgpu

Expected output:

NAME                   IPADDRESS      GPU0(Allocated/Total)  GPU1(Allocated/Total)  GPU2(Allocated/Total)  GPU3(Allocated/Total)  GPU Memory(GiB)
cn-shanghai.192.0.2.109  192.0.2.109  15/15                   9/15                   0/15                   0/15                   24/60
--------------------------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
24/60 (40%)

GPU0 is fully allocated (15/15) before GPU1 (9/15), confirming the binpack strategy is active.

`spread`

Use the GPU resource query tool to check GPU resource allocation on the node:

kubectl inspect cgpu

Expected output:

NAME                   IPADDRESS      GPU0(Allocated/Total)  GPU1(Allocated/Total)  GPU2(Allocated/Total)  GPU3(Allocated/Total)  GPU Memory(GiB)
cn-shanghai.192.0.2.109  192.0.2.109  4/15                   4/15                   0/15                   4/15                   12/60
--------------------------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
12/60 (20%)

Resources are evenly distributed (4/15) across GPU0, GPU1, and GPU3, confirming the spread policy is active.
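
To compare placement results without reading each column by eye, the per-GPU columns can be summarized. The sketch below defines a hypothetical `gpu_alloc_summary` filter (an illustration, not a feature of `kubectl inspect cgpu`) that counts, for each node row, how many GPUs have any memory allocated and how many are fully allocated. It assumes the column layout shown in the expected output: node name, IP address, one Allocated/Total column per GPU, and a final total column.

```shell
# Hypothetical filter: summarize per-GPU allocation from "kubectl inspect cgpu" output.
gpu_alloc_summary() {
  awk '
    # Node rows: first field is a name (not a ratio), last field is the total ratio.
    $1 !~ /^[0-9]+\/[0-9]+$/ && $NF ~ /^[0-9]+\/[0-9]+$/ {
      used = 0; full = 0
      for (i = 3; i < NF; i++) {            # fields 3..NF-1 are the per-GPU columns
        if ($i !~ /^[0-9]+\/[0-9]+$/) continue
        split($i, a, "/")
        if (a[1] + 0 > 0) used++
        if (a[1] + 0 > 0 && a[1] + 0 == a[2] + 0) full++
      }
      printf "%s used_gpus=%d full_gpus=%d\n", $1, used, full
    }'
}
```

Run it as `kubectl inspect cgpu | gpu_alloc_summary`. On the binpack sample row (15/15, 9/15, 0/15, 0/15) it reports `used_gpus=2 full_gpus=1`; on the spread sample row (4/15, 4/15, 0/15, 4/15) it reports `used_gpus=3 full_gpus=0`. Few GPUs in use with earlier ones full suggests binpack, while many partially used GPUs suggests spread.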

Topology-aware scheduling

Topology-aware scheduling is available only for ACK managed cluster Pro. See System component version requirements.

Install the ack-ai-installer component.
Enable topology-aware scheduling
Add a label to enable topology-aware GPU scheduling. Replace <NODE_NAME> with your node name.
```
kubectl label node <NODE_NAME> ack.node.gpu.schedule=topology
```
After you enable topology-aware GPU scheduling on a node, the node no longer accepts GPU workloads that do not use topology-aware scheduling. To restore exclusive scheduling, run `kubectl label node <NODE_NAME> ack.node.gpu.schedule=default --overwrite`.
Verify topology-aware scheduling

Verify that topology-aware scheduling is enabled. Replace <NODE_NAME> with your node name.
```
kubectl get nodes <NODE_NAME> -o yaml | grep aliyun.com/gpu
```
Expected output:
```
aliyun.com/gpu: "2"
```
A non-zero aliyun.com/gpu value confirms topology-aware scheduling is enabled.
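
The value can also be read directly instead of filtering YAML with `grep`. This is a sketch: `topology_gpu_count` and `topology_enabled` are hypothetical helper names, and the `KUBECTL` variable is an assumption added only so the command can be swapped out. The JSONPath expression escapes the dot in the resource name with a backslash, which kubectl requires for keys that contain dots.

```shell
KUBECTL="${KUBECTL:-kubectl}"   # assumption: override to use another binary or a stub

# Print the allocatable aliyun.com/gpu value of a node (empty if not reported).
topology_gpu_count() {
  "$KUBECTL" get node "$1" -o jsonpath='{.status.allocatable.aliyun\.com/gpu}'
}

# Succeed only if that value is a number greater than 0.
topology_enabled() {
  n=$(topology_gpu_count "$1") || return 1
  [ "${n:-0}" -gt 0 ] 2>/dev/null
}
```

For example, `topology_enabled <NODE_NAME> && echo "topology-aware scheduling enabled"` prints the message only when the node reports a non-zero `aliyun.com/gpu` value.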

Card model scheduling

Schedule jobs to nodes with a specified GPU model, or avoid specific models.

View the GPU card model

Query the GPU card model of cluster nodes.

The NVIDIA_NAME field shows the GPU card model.
```
kubectl get nodes -L aliyun.accelerator/nvidia_name
```
Expected output:
```
NAME                        STATUS   ROLES    AGE   VERSION            NVIDIA_NAME
cn-shanghai.192.XX.XX.176   Ready    <none>   17d   v1.26.3-aliyun.1   Tesla-V100-SXM2-32GB
cn-shanghai.192.XX.XX.177   Ready    <none>   17d   v1.26.3-aliyun.1   Tesla-V100-SXM2-32GB
```
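
In larger clusters it can help to tally nodes per card model. The following is a minimal sketch that assumes the six-column layout shown in the expected output, with NVIDIA_NAME as the sixth column. `count_gpu_models` is a hypothetical helper, and rows for nodes without the label (fewer than six columns) are skipped.

```shell
# Hypothetical helper: count nodes per GPU card model from
# "kubectl get nodes -L aliyun.accelerator/nvidia_name" output on stdin.
count_gpu_models() {
  awk 'NR > 1 && NF >= 6 { count[$6]++ }               # skip header and unlabeled nodes
       END { for (m in count) print count[m], m }' | sort -k2
}
```

Run it as `kubectl get nodes -L aliyun.accelerator/nvidia_name | count_gpu_models`. For the two nodes in the expected output, it prints `2 Tesla-V100-SXM2-32GB`.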
Alternative check methods
On the Clusters page, click the name of the target cluster. In the left-side navigation pane, choose Workloads > Pods. For a running Pod (for example, tensorflow-mnist-multigpu-***), click Terminal in the Actions column. Select the target container from the drop-down list and run the following commands.
- Query the card model: nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0 | sed -e 's/ /-/g'
- Query the GPU memory of each card: nvidia-smi --id=0 --query-gpu=memory.total --format=csv,noheader | sed -e 's/ //g'
- Query the total number of GPU cards on the node: nvidia-smi -L | wc -l
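
The `sed` filters in the first two commands only reshape the `nvidia-smi` text, presumably so that it matches the space-free form of the label values. The sketch below isolates that step. The function names and the sample strings in the usage note are illustrative assumptions, not output captured from a real node.

```shell
# Replace spaces with hyphens, as the card-model command does.
name_to_label() { sed -e 's/ /-/g'; }

# Remove spaces entirely, as the GPU-memory command does.
mem_to_label() { sed -e 's/ //g'; }
```

For example, `echo "Tesla V100-SXM2-32GB" | name_to_label` yields `Tesla-V100-SXM2-32GB`, and `echo "32768 MiB" | mem_to_label` yields `32768MiB`.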

Enable card model scheduling

On the Clusters page, click the name of your cluster. In the left navigation pane, click Workloads > Jobs.

On the Jobs page, click Create from YAML. Use the following examples to create an application and enable card model scheduling.

Specify card model

Ensure your application runs on nodes with a specific GPU card model.

In aliyun.accelerator/nvidia_name: "Tesla-V100-SXM2-32GB", replace Tesla-V100-SXM2-32GB with your node's actual card model.

YAML details

apiVersion: batch/v1
kind: Job
metadata:
  name: tensorflow-mnist
spec:
  parallelism: 1
  template:
    metadata:
      labels:
        app: tensorflow-mnist
    spec:
      nodeSelector:
        aliyun.accelerator/nvidia_name: "Tesla-V100-SXM2-32GB" # Runs the application on a Tesla V100-SXM2-32GB GPU.
      containers:
      - name: tensorflow-mnist
        image: registry.cn-beijing.aliyuncs.com/acs/tensorflow-mnist-sample:v1.5
        command:
        - python
        - tensorflow-sample-code/tfjob/docker/mnist/main.py
        - --max_steps=1000
        - --data_dir=tensorflow-sample-code/data
        resources:
          limits:
            nvidia.com/gpu: 1
        workingDir: /root
      restartPolicy: Never

After the job is created, choose Workloads > Pods. The Pod list shows that the Pod is scheduled to a node with the matching GPU card model.
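
The same confirmation can be made from the command line. This is a sketch under stated assumptions: `pod_gpu_model` is a hypothetical helper, the `KUBECTL` variable exists only so the binary can be substituted, the Pod is assumed to be in the current namespace, and the selector `app=tensorflow-mnist` comes from the Pod template labels in the YAML above. It looks up the node of the first matching Pod, then prints that node's `aliyun.accelerator/nvidia_name` label.

```shell
KUBECTL="${KUBECTL:-kubectl}"   # assumption: override to use another binary or a stub

# Print the GPU card model label of the node that runs the first Pod
# matching the given label selector.
pod_gpu_model() {
  node=$("$KUBECTL" get pods -l "$1" -o jsonpath='{.items[0].spec.nodeName}') || return 1
  [ -n "$node" ] || return 1      # Pod not scheduled yet
  "$KUBECTL" get node "$node" -o jsonpath='{.metadata.labels.aliyun\.accelerator/nvidia_name}'
}
```

For example, `pod_gpu_model app=tensorflow-mnist` should print the model named in the `nodeSelector` once the Pod has been scheduled.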

Exclude card model

Use GPU card model labels with node affinity to prevent scheduling on certain card models.

In values: - "Tesla-V100-SXM2-32GB", replace Tesla-V100-SXM2-32GB with your node's actual card model.

YAML details

apiVersion: batch/v1
kind: Job
metadata:
  name: tensorflow-mnist
spec:
  parallelism: 1
  template:
    metadata:
      labels:
        app: tensorflow-mnist
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: aliyun.accelerator/nvidia_name  # Card model scheduling label
                operator: NotIn
                values:
                - "Tesla-V100-SXM2-32GB"            # Prevents the Pod from being scheduled to a node with a Tesla-V100-SXM2-32GB card.
      containers:
      - name: tensorflow-mnist
        image: registry.cn-beijing.aliyuncs.com/acs/tensorflow-mnist-sample:v1.5
        command:
        - python
        - tensorflow-sample-code/tfjob/docker/mnist/main.py
        - --max_steps=1000
        - --data_dir=tensorflow-sample-code/data
        resources:
          limits:
            nvidia.com/gpu: 1
        workingDir: /root
      restartPolicy: Never

After the job is created, the application is not scheduled to nodes where aliyun.accelerator/nvidia_name is Tesla-V100-SXM2-32GB, but can run on GPU nodes with other card models.
