By default, the scheduler allocates GPU resources by filling one GPU on a node before moving to another. This approach helps prevent GPU memory fragmentation. However, in some scenarios, you may want to distribute Pods across multiple GPUs to minimize the impact on your workloads if a single GPU fails. This topic describes how to configure the GPU selection policy for GPU sharing.
Prerequisites
How it works
When you use GPU sharing on a node with multiple GPUs, you can choose from two policies for allocating GPUs to Pods:
- Binpack: (Default) The scheduler fills one GPU on a node before allocating resources from another. This approach helps prevent GPU memory fragmentation.
- Spread: The scheduler distributes Pods as evenly as possible across all available GPUs on a node. This approach minimizes the number of workloads affected if a single GPU fails.
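The following example shows a node with two GPUs, each with 15 GiB of GPU memory. Pod1 requests 2 GiB of GPU memory, and Pod2 requests 3 GiB. With Binpack, both Pods are placed on the same GPU, which then has 5 GiB allocated while the other GPU stays empty. With Spread, Pod1 and Pod2 are placed on different GPUs.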
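The two-GPU example can be reproduced with a small toy model. This is a minimal sketch, not the actual ACK scheduler logic: the function name `allocate` and the tie-breaking rule (lowest GPU index wins) are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def allocate(
    requests: List[Tuple[str, int]],
    num_gpus: int,
    capacity_gib: int,
    policy: str,
) -> Tuple[Dict[str, int], List[int]]:
    """Toy placement model (hypothetical, not the real scheduler).

    Returns a pod -> GPU index mapping and the GiB allocated per GPU.
    """
    used = [0] * num_gpus
    placement: Dict[str, int] = {}
    for pod, need in requests:
        # Only GPUs with enough free memory are candidates.
        candidates = [g for g in range(num_gpus) if used[g] + need <= capacity_gib]
        if not candidates:
            raise RuntimeError(f"no GPU can fit {pod} ({need} GiB)")
        if policy == "binpack":
            # Prefer the fullest GPU that still fits; ties go to the lowest index.
            gpu = max(candidates, key=lambda g: (used[g], -g))
        elif policy == "spread":
            # Prefer the emptiest GPU; ties go to the lowest index.
            gpu = min(candidates, key=lambda g: (used[g], g))
        else:
            raise ValueError(f"unknown policy: {policy}")
        used[gpu] += need
        placement[pod] = gpu
    return placement, used


if __name__ == "__main__":
    pods = [("Pod1", 2), ("Pod2", 3)]
    for policy in ("binpack", "spread"):
        placement, used = allocate(pods, num_gpus=2, capacity_gib=15, policy=policy)
        print(policy, placement, used)
    # binpack {'Pod1': 0, 'Pod2': 0} [5, 0]
    # spread {'Pod1': 0, 'Pod2': 1} [2, 3]
```

In this model, Binpack leaves one GPU completely free for a later large request, while Spread ensures that the failure of either GPU affects only one of the two Pods.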
Procedure
By default, the GPU selection policy for a node is Binpack. To use the Spread policy for GPU sharing, follow these steps.
Step 1: Create a node pool
- Log on to the ACK console. In the left navigation pane, click Clusters.
- On the Clusters page, click the name of the target cluster, then open the Node Pools page from the left navigation pane.
- On the Node Pools page, click Create Node Pool in the upper-right corner.
- On the Create Node Pool page, set the parameters and click Confirm. The following table describes the key parameters. For information about other parameters, see Create and manage a node pool.
| Parameter | Description |
| --- | --- |
| Instance type | For Architecture, select GPU-accelerated instance and select an instance type. The Spread policy is effective only on nodes with multiple GPUs, so select an instance type with multiple GPUs. |
| Expected nodes | Specify the initial number of nodes in the node pool. Set this to 0 if you do not want to create nodes immediately. |
| Node label | Add the following two labels: (1) Set Key to ack.node.gpu.schedule and Value to cgpu. This enables GPU sharing and GPU memory isolation. (2) Set Key to ack.node.gpu.placement and Value to spread. This enables the Spread policy for the node. |
Step 2: Submit a job
- Log on to the ACK console. In the left navigation pane, click Clusters.
- On the Clusters page, click the name of your cluster. In the left navigation pane, click .
- In the upper-right corner of the page, click Create from YAML. Copy the following YAML content into the Template editor, modify the content as described in the comments, and click Create.
YAML file description:
- This YAML file defines a Job that uses the TensorFlow MNIST sample. The Job creates three Pods, and each Pod requests 4 GiB of GPU memory.
- A Pod requests 4 GiB of GPU memory by defining aliyun.com/gpu-mem: 4 in the resources.limits section of the Pod specification.
- To observe the effect on a single node, the YAML file adds a nodeSelector, kubernetes.io/hostname: <NODE_NAME>, to schedule the Pods to a specific node.
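The shape of such a Job can be sketched as follows. This is an illustrative manifest built from the description above, not the sample shipped with this topic: the helper `build_job`, the Job name, the container name, and the `<IMAGE>` placeholder are all hypothetical, and using `parallelism` to create the three Pods is an assumption. The sketch emits JSON, which kubectl also accepts as a manifest format.

```python
import json
from typing import Any, Dict


def build_job(node_name: str, image: str, pods: int = 3, gpu_mem_gib: int = 4) -> Dict[str, Any]:
    """Build a Job manifest matching the description (hypothetical names)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": "tensorflow-mnist-spread"},  # hypothetical name
        "spec": {
            # Assumption: run the three Pods side by side.
            "parallelism": pods,
            "template": {
                "spec": {
                    # Pin all Pods to one node to observe the policy.
                    "nodeSelector": {"kubernetes.io/hostname": node_name},
                    "containers": [
                        {
                            "name": "tensorflow-mnist",  # hypothetical name
                            "image": image,
                            # Each Pod requests 4 GiB of GPU memory.
                            "resources": {
                                "limits": {"aliyun.com/gpu-mem": gpu_mem_gib}
                            },
                        }
                    ],
                    "restartPolicy": "Never",
                }
            },
        },
    }


if __name__ == "__main__":
    # Replace both placeholders before submitting the manifest.
    print(json.dumps(build_job("<NODE_NAME>", "<IMAGE>"), indent=2))
```

The container command and image for the TensorFlow MNIST sample are intentionally left out, because they are not stated in this topic.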
Step 3: Verify the Spread policy
Use the GPU inspection tool to query GPU resource allocation on the node:
kubectl inspect cgpu
NAME IPADDRESS GPU0(Allocated/Total) GPU1(Allocated/Total) GPU2(Allocated/Total) GPU3(Allocated/Total) GPU Memory(GiB)
cn-shanghai.192.0.2.109 192.0.2.109 4/15 4/15 0/15 4/15 12/60
--------------------------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
12/60 (20%)
The output shows 4 GiB allocated on each of GPU0, GPU1, and GPU3, which indicates that the three Pods are scheduled to different GPUs on the node. This confirms that the Spread policy is in effect.
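The same check can be automated. The following is a minimal sketch that parses one node row of the output shown above. The helper names `per_gpu_allocated` and `looks_spread` are hypothetical, and the check assumes that the row layout matches the sample and that no other GPU-sharing Pods run on the node.

```python
import re
from typing import List

# Matches an "Allocated/Total" cell such as "4/15".
PAIR = re.compile(r"(\d+)/(\d+)")


def per_gpu_allocated(row: str) -> List[int]:
    """Return the allocated GiB for each GPU in one node row."""
    matches = [PAIR.fullmatch(token) for token in row.split()]
    allocated = [int(m.group(1)) for m in matches if m]
    # The last "Allocated/Total" cell is the node-level total, not a GPU.
    return allocated[:-1]


def looks_spread(row: str, pods: int, gib_per_pod: int) -> bool:
    """True if each Pod appears to occupy its own GPU."""
    allocated = per_gpu_allocated(row)
    return (
        allocated.count(gib_per_pod) == pods
        and sum(allocated) == pods * gib_per_pod
    )


if __name__ == "__main__":
    row = "cn-shanghai.192.0.2.109 192.0.2.109 4/15 4/15 0/15 4/15 12/60"
    print(per_gpu_allocated(row))                      # [4, 4, 0, 4]
    print(looks_spread(row, pods=3, gib_per_pod=4))    # True
```

Under Binpack, the same three Pods would instead show up as a single cell such as 12/15, and the check would return False.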