When you run GPU compute jobs in an ACK managed cluster Pro, you can add two kinds of labels to GPU nodes. Scheduling attribute labels select exclusive, shared, or topology-aware scheduling. GPU card model labels support card model scheduling. Together, these labels improve resource utilization and control exactly where workloads are scheduled.
Scheduling label overview
GPU scheduling labels identify GPU models and resource allocation policies, enabling fine-grained resource management and efficient scheduling.
| Scheduling mode | Label value | Use cases |
|---|---|---|
| Exclusive scheduling (default) | ack.node.gpu.schedule: default | Performance-critical tasks that require exclusive access to an entire GPU, such as model training and high-performance computing (HPC). |
| Shared scheduling | ack.node.gpu.schedule: cgpu, core_mem, share, or mps | Improves GPU utilization. Ideal for scenarios with multiple concurrent lightweight tasks, such as multitenancy and inference. |
| | ack.node.gpu.placement: binpack or spread | Optimizes the resource allocation strategy on multi-GPU nodes when shared scheduling is enabled. |
| Topology-aware scheduling | ack.node.gpu.schedule: topology | Automatically assigns the optimal combination of GPUs to a Pod based on the physical GPU topology within a single node. Ideal for tasks that are sensitive to GPU-to-GPU communication latency. |
| Card model scheduling | aliyun.accelerator/nvidia_name: <GPU card model> | Schedules jobs to nodes with a specific GPU model, or keeps them off nodes with a specific model. You can combine card model scheduling with other GPU node labels to specify the GPU memory capacity and total number of GPU cards for GPU jobs. |
Enable scheduling features
A node can have only one GPU scheduling mode (exclusive, shared, or topology-aware) enabled at a time. After you enable a mode, the extended resources reported by other scheduling modes are automatically set to 0.
Exclusive scheduling
If a node has no GPU scheduling labels, exclusive scheduling is the default mode. In this mode, a single GPU card is the smallest allocation unit for Pods.
If another GPU scheduling mode has been enabled on a node, deleting the label does not restore exclusive scheduling. To restore it, explicitly set the label value to default (ack.node.gpu.schedule: default).
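Because each node runs only one mode at a time, it helps to see which label each node carries. `kubectl get nodes -L` adds a label as an extra output column. The sketch below wraps that command and maps each `ack.node.gpu.schedule` value to the mode in the label table. Both function names, `gpu_sched_mode` and `list_gpu_modes`, are hypothetical and are not part of ACK.

```shell
# Hypothetical helper (not an ACK tool): map an ack.node.gpu.schedule label
# value to the GPU scheduling mode it selects, using the values on this page.
gpu_sched_mode() {
  case "$1" in
    default) echo "exclusive" ;;
    cgpu|core_mem|share|mps) echo "shared" ;;
    topology) echo "topology-aware" ;;
    # No label: exclusive by default only if no other mode was enabled before.
    "") echo "unset" ;;
    *) echo "unknown"; return 1 ;;
  esac
}

# Print "<node><TAB><mode>" for every node. Requires kubectl access; -L adds the
# label as an extra column, which is empty when a node does not have the label.
list_gpu_modes() {
  kubectl get nodes -L ack.node.gpu.schedule --no-headers |
    while read -r name _status _roles _age _version value _rest; do
      printf '%s\t%s\n' "$name" "$(gpu_sched_mode "$value")"
    done
}
```

A node reported as `unset` that previously ran another mode is not guaranteed to be back in exclusive mode. For such a node, set the value explicitly with `kubectl label node <NODE_NAME> ack.node.gpu.schedule=default --overwrite`.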
Shared scheduling
Shared scheduling is available only for ACK managed cluster Pro. For more information, see Limitations.
1. Install the ack-ai-installer component.
   - Log on to the ACK console. In the left navigation pane, click Clusters.
   - On the Clusters page, click the name of your cluster. In the left navigation pane, click .
   - On the Cloud-native AI Suite page, click Deploy. Then select Scheduling Policy Extension (Batch Task Scheduling, GPU Sharing, Topology-aware GPU Scheduling). To learn how to configure the compute scheduling policy supported by the cGPU service, see Install and use the cGPU component.
   - Click Deploy Cloud-native AI Suite.
   - In the component list on the Cloud-native AI Suite page, verify that the ack-ai-installer component is installed.
2. Enable shared scheduling.
   - On the Clusters page, click the name of your target cluster. In the left-side navigation pane, choose .
   - On the Node Pools page, click Create Node Pool, configure the node labels, and then click Confirm. You can keep the default values for the other settings. For information about the use cases of node labels, see Scheduling label overview.
     - Configure basic shared scheduling: Click the Add icon for Node Labels, set Key to ack.node.gpu.schedule, and select one of the following label values: cgpu, core_mem, share, or mps. The mps value requires the MPS Control Daemon component.
     - Configure multi-card shared scheduling (optional): If a node has multiple GPU cards and you want to optimize how resources are allocated across them, configure multi-card shared scheduling in addition to basic shared scheduling. Click the Add icon for Node Labels, set Key to ack.node.gpu.placement, and select one of the following label values: binpack or spread.
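The placement label only refines basic shared scheduling, so the two node pool labels must be consistent. The function below is a hypothetical pre-check for the values listed in this step. The name `check_pool_labels` and its messages are illustrative and are not an ACK tool. It accepts a shared `ack.node.gpu.schedule` value and an optional `ack.node.gpu.placement` value.

```shell
# Hypothetical pre-check for the node pool labels described in this step.
# Usage: check_pool_labels <ack.node.gpu.schedule value> [<ack.node.gpu.placement value>]
check_pool_labels() {
  local schedule="$1" placement="${2:-}"
  case "$schedule" in
    cgpu|core_mem|share) ;;
    mps) echo "note: mps requires the MPS Control Daemon component" >&2 ;;
    *) echo "error: '$schedule' is not a shared scheduling value" >&2; return 1 ;;
  esac
  case "$placement" in
    ""|binpack|spread) ;;   # placement is optional
    *) echo "error: placement must be binpack or spread" >&2; return 1 ;;
  esac
  # Print the key=value pairs to enter under Node Labels.
  echo "ack.node.gpu.schedule=$schedule${placement:+ ack.node.gpu.placement=$placement}"
}
```

After the node pool is created, `kubectl get nodes -L ack.node.gpu.schedule,ack.node.gpu.placement` shows which labels were applied to each node.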
3. Verify shared scheduling.

   cgpu, share, or mps: Replace <NODE_NAME> with the name of your target node and run the following command to verify that cgpu, share, or mps shared scheduling is enabled:

       kubectl get nodes <NODE_NAME> -o yaml | grep "aliyun.com/gpu-mem"

   Expected output:

       aliyun.com/gpu-mem: "60"

   A non-zero value for the aliyun.com/gpu-mem field indicates that cgpu, share, or mps shared scheduling is enabled.

   core_mem: Replace <NODE_NAME> with the name of your target node and run the following command to verify that core_mem shared scheduling is enabled:

       kubectl get nodes <NODE_NAME> -o yaml | grep -E 'aliyun\.com/gpu-core\.percentage|aliyun\.com/gpu-mem'

   Expected output:

       aliyun.com/gpu-core.percentage: "80"
       aliyun.com/gpu-mem: "6"

   If the aliyun.com/gpu-core.percentage and aliyun.com/gpu-mem fields are both non-zero, core_mem shared scheduling is enabled.

   binpack: Run the shared GPU resource query tool to check the GPU resource allocation on the node:

       kubectl inspect cgpu

   Expected output:

       NAME                     IPADDRESS    GPU0(Allocated/Total)  GPU1(Allocated/Total)  GPU2(Allocated/Total)  GPU3(Allocated/Total)  GPU Memory(GiB)
       cn-shanghai.192.0.2.109  192.0.2.109  15/15                  9/15                   0/15                   0/15                   24/60
       --------------------------------------------------------------------------------------
       Allocated/Total GPU Memory In Cluster: 24/60 (40%)

   GPU0 is fully allocated (15/15) and GPU1 is partially allocated (9/15). This confirms that the binpack policy is active. That policy fills one GPU completely before it allocates resources on the next.

   spread: Run the shared GPU resource query tool to check the GPU resource allocation on the node:

       kubectl inspect cgpu

   Expected output:

       NAME                     IPADDRESS    GPU0(Allocated/Total)  GPU1(Allocated/Total)  GPU2(Allocated/Total)  GPU3(Allocated/Total)  GPU Memory(GiB)
       cn-shanghai.192.0.2.109  192.0.2.109  4/15                   4/15                   0/15                   4/15                   12/60
       --------------------------------------------------------------------------------------
       Allocated/Total GPU Memory In Cluster: 12/60 (20%)

   Resources are allocated on GPU0 (4/15), GPU1 (4/15), and GPU3 (4/15) rather than being concentrated on a single GPU. This matches the spread policy, which distributes Pods across different GPUs, so the spread policy is in effect.
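The binpack and spread checks above come down to reading the per-GPU Allocated/Total columns of `kubectl inspect cgpu`. The sketch below counts used, full, and partially allocated GPUs per node. The helper `summarize_gpu_alloc` is hypothetical and assumes the column layout shown in the sample outputs. Its verdict rule is an assumption, not an ACK check: at most one partially used GPU reads as binpack, and several reads as spread.

```shell
# Hypothetical helper: summarize per-GPU allocation from `kubectl inspect cgpu`.
# Assumes the layout in the sample output: NAME, IPADDRESS, one Allocated/Total
# column per GPU, then the node's total GPU memory as the last column.
# The binpack-like/spread-like verdict is a rough heuristic, not an ACK check.
summarize_gpu_alloc() {
  awk '
    $3 ~ /^[0-9]+\/[0-9]+$/ {               # node rows only; skips header and totals
      used = 0; full = 0; partial = 0
      for (i = 3; i < NF; i++) {            # stop before the GPU memory column
        split($i, a, "/")
        if (a[1] + 0 == 0) continue         # untouched GPU
        used++
        if (a[1] + 0 >= a[2] + 0) { full++ } else { partial++ }
      }
      if (used <= 1) verdict = "undetermined"
      else if (partial <= 1) verdict = "binpack-like"
      else verdict = "spread-like"
      printf "%s used=%d full=%d partial=%d %s\n", $1, used, full, partial, verdict
    }'
}
```

Pipe the tool's output into it: `kubectl inspect cgpu | summarize_gpu_alloc`. For the binpack sample output, it prints `cn-shanghai.192.0.2.109 used=2 full=1 partial=1 binpack-like`. A single used GPU is reported as `undetermined` because both policies place the first Pod the same way.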
Topology-aware scheduling
Topology-aware scheduling is available only for ACK managed cluster Pro. For more information, see System component version requirements.
1. Enable topology-aware scheduling.
   Replace <NODE_NAME> with the name of your target node and run the following command to add a label to the node and explicitly enable topology-aware GPU scheduling:

       kubectl label node <NODE_NAME> ack.node.gpu.schedule=topology

   If the node already has an ack.node.gpu.schedule label, add --overwrite to the command. Otherwise, kubectl rejects the change.

   After you enable topology-aware GPU scheduling on a node, the node no longer supports non-topology-aware GPU workloads. To restore exclusive scheduling, run the following command to change the label:

       kubectl label node <NODE_NAME> ack.node.gpu.schedule=default --overwrite
2. Verify topology-aware scheduling.
   Replace <NODE_NAME> with the name of your target node and run the following command to verify that topology-aware scheduling is enabled on the node:

       kubectl get nodes <NODE_NAME> -o yaml | grep aliyun.com/gpu

   Expected output:

       aliyun.com/gpu: "2"

   If the aliyun.com/gpu field is not 0, topology-aware scheduling is enabled.
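The same non-zero check applies to every mode on this page. Check aliyun.com/gpu for topology-aware scheduling and aliyun.com/gpu-mem for cgpu, share, or mps. For core_mem, check both aliyun.com/gpu-core.percentage and aliyun.com/gpu-mem. The hypothetical `check_gpu_resources` function below reads these values from `.status.allocatable` with kubectl's JSONPath output instead of grepping the whole node YAML. Reading the allocatable field rather than the capacity field is an assumption of this sketch.

```shell
# Hypothetical helper: report whether a node advertises a non-zero amount of
# each given extended GPU resource in .status.allocatable.
# Usage: check_gpu_resources <NODE_NAME> <resource> [<resource> ...]
# Returns non-zero if any resource is missing or 0.
check_gpu_resources() {
  local node="$1" rc=0 res key value
  shift
  for res in "$@"; do
    # JSONPath needs the dots inside resource names escaped, e.g. aliyun\.com/gpu.
    key="$(printf '%s' "$res" | sed 's/\./\\./g')"
    value="$(kubectl get node "$node" -o jsonpath="{.status.allocatable.${key}}")"
    # An empty value (resource absent or kubectl error) counts as not enabled.
    if [ -n "$value" ] && [ "$value" != "0" ]; then
      printf '%s=%s\n' "$res" "$value"
    else
      printf '%s=0 (not enabled)\n' "$res"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `check_gpu_resources <NODE_NAME> aliyun.com/gpu` checks topology-aware scheduling. `check_gpu_resources <NODE_NAME> aliyun.com/gpu-core.percentage aliyun.com/gpu-mem` checks core_mem shared scheduling.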
Card model scheduling
Schedule jobs to nodes with a specified GPU model, or avoid specific models.
1. View the GPU card model.
   Run the following command to query the GPU card model of the nodes in your cluster. The NVIDIA_NAME column shows the GPU card model.

       kubectl get nodes -L aliyun.accelerator/nvidia_name

   The expected output is similar to the following:

       NAME                        STATUS   ROLES    AGE   VERSION            NVIDIA_NAME
       cn-shanghai.192.XX.XX.176   Ready    <none>   17d   v1.26.3-aliyun.1   Tesla-V100-SXM2-32GB
       cn-shanghai.192.XX.XX.177   Ready    <none>   17d   v1.26.3-aliyun.1   Tesla-V100-SXM2-32GB
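In a larger cluster, a per-model count is easier to read than the full node list. The sketch below summarizes the output of the command above. The function `count_gpu_models` is hypothetical. It assumes the default output with a header row and NVIDIA_NAME as the sixth column, and it reports nodes without the label as `unlabeled`.

```shell
# Hypothetical helper: count nodes per GPU card model from the output of
#   kubectl get nodes -L aliyun.accelerator/nvidia_name
# Skips the header row; nodes without the label (empty NVIDIA_NAME column,
# for example CPU-only nodes) are counted as "unlabeled".
count_gpu_models() {
  awk 'NR > 1 { model = (NF >= 6) ? $6 : "unlabeled"; n[model]++ }
       END    { for (m in n) print m, n[m] }' | sort
}
```

For the sample output above, `kubectl get nodes -L aliyun.accelerator/nvidia_name | count_gpu_models` prints `Tesla-V100-SXM2-32GB 2`. To list only the nodes of one model, filter with a label selector, for example `kubectl get nodes -l aliyun.accelerator/nvidia_name=Tesla-V100-SXM2-32GB`.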
2. Enable card model scheduling.
   - On the Clusters page, click the name of your cluster. In the left navigation pane, click .
   - On the Jobs page, click Create from YAML. Use the following examples to create an application and enable card model scheduling.
   Specify card model
   Use the GPU card model scheduling label to ensure that your application runs only on nodes with a specific card model.
   In the line aliyun.accelerator/nvidia_name: "Tesla-V100-SXM2-32GB", replace Tesla-V100-SXM2-32GB with the actual card model of your node.
   After the job is created, choose from the left-side navigation pane. In the Pod list, confirm that the Pod is scheduled to a node with the matching card model. This shows that scheduling based on the GPU card model label works.
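The sketch below shows one way to pin a Job to a card model with a `nodeSelector` on `aliyun.accelerator/nvidia_name`. The function `gpu_model_job` is hypothetical and prints a Job manifest. The Job name, image, and `nvidia-smi` command are placeholders. The sketch assumes exclusive scheduling, where a Pod requests whole GPUs through the `nvidia.com/gpu` resource.

```shell
# Hypothetical generator for a Job pinned to one GPU card model via nodeSelector.
# Name, image, and command are placeholders; the nvidia.com/gpu request assumes
# exclusive (whole-GPU) scheduling.
# Usage: gpu_model_job <JOB_NAME> <GPU_MODEL> <IMAGE> | kubectl apply -f -
gpu_model_job() {
  local name="$1" model="$2" image="$3"
  cat <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: ${name}
spec:
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        aliyun.accelerator/nvidia_name: "${model}"
      containers:
      - name: ${name}
        image: ${image}
        command: ["nvidia-smi"]
        resources:
          limits:
            nvidia.com/gpu: 1
EOF
}
```

You can paste the printed manifest into Create from YAML, or apply it directly with `gpu_model_job gpu-model-demo Tesla-V100-SXM2-32GB <IMAGE> | kubectl apply -f -`.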
   Exclude card model
   Use the GPU card model scheduling label in a node affinity rule that acts as anti-affinity to prevent your application from running on specific card models.
   In values: - "Tesla-V100-SXM2-32GB", replace Tesla-V100-SXM2-32GB with the actual card model of your node.
   After the job is created, the application is not scheduled to nodes that have the label key aliyun.accelerator/nvidia_name with the value Tesla-V100-SXM2-32GB. It can still be scheduled to GPU nodes with other card models.
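To exclude a model, you can use a required node affinity rule with the `NotIn` operator on `aliyun.accelerator/nvidia_name`. The sketch below prints such a Job manifest. The function `avoid_gpu_model_job` is hypothetical, and the Job name, image, and `nvidia-smi` command are placeholders. The `nvidia.com/gpu` request assumes exclusive (whole-GPU) scheduling.

```shell
# Hypothetical generator for a Job that avoids one GPU card model.
# Name, image, and command are placeholders; nvidia.com/gpu assumes exclusive
# (whole-GPU) scheduling.
# Usage: avoid_gpu_model_job <JOB_NAME> <GPU_MODEL> <IMAGE> | kubectl apply -f -
avoid_gpu_model_job() {
  local name="$1" model="$2" image="$3"
  cat <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: ${name}
spec:
  template:
    spec:
      restartPolicy: Never
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: aliyun.accelerator/nvidia_name
                operator: NotIn
                values:
                - "${model}"
      containers:
      - name: ${name}
        image: ${image}
        command: ["nvidia-smi"]
        resources:
          limits:
            nvidia.com/gpu: 1
EOF
}
```

In Kubernetes, `NotIn` also matches nodes that do not have the `aliyun.accelerator/nvidia_name` label at all. In this sketch, the GPU resource request is what keeps the Pod on GPU nodes.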