Configure the GPU selection policy for GPU sharing

By default, the scheduler fills one GPU on a node before it allocates resources from another GPU, which helps prevent GPU memory fragmentation. In some scenarios, however, you may want to distribute Pods across multiple GPUs so that a single GPU failure affects fewer workloads. This topic describes how to configure the GPU selection policy for GPU sharing.

Prerequisites

How it works

When you use GPU sharing on a node with multiple GPUs, you can choose between two policies for allocating GPUs to Pods:

  • Binpack: (Default) The scheduler fills one GPU on a node before allocating resources from another. This approach prevents GPU memory fragmentation.

  • Spread: The scheduler distributes Pods as evenly as possible across all available GPUs on a node. This approach minimizes the number of workloads affected if a single GPU fails.

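Consider a node with two GPUs, each with 15 GiB of GPU memory. Pod1 requests 2 GiB of GPU memory, and Pod2 requests 3 GiB. With Binpack, both Pods are placed on the same GPU, which leaves the other GPU entirely free. With Spread, the two Pods are placed on different GPUs.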
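The two policies can be illustrated with a small simulation of the example above. This is a toy model under stated assumptions, not the ACK scheduler's implementation: the function names `pick_gpu` and `place` are hypothetical, Binpack is approximated as "choose the fitting GPU with the least free memory", Spread as "choose the fitting GPU with the most free memory", and ties go to the lowest GPU index. The real scheduler may break ties differently.

```python
from typing import List


def pick_gpu(free: List[int], request: int, policy: str) -> int:
    """Return the index of the GPU chosen for a request, or -1 if none fits.

    free    -- free memory (GiB) on each GPU of one node
    request -- memory (GiB) requested by the Pod
    policy  -- "binpack" or "spread" (assumed heuristics, see lead-in)
    """
    candidates = [i for i, f in enumerate(free) if f >= request]
    if not candidates:
        return -1
    if policy == "binpack":
        # Assumed heuristic: least free memory that still fits, lowest index on ties.
        return min(candidates, key=lambda i: (free[i], i))
    if policy == "spread":
        # Assumed heuristic: most free memory, lowest index on ties.
        return min(candidates, key=lambda i: (-free[i], i))
    raise ValueError(f"unknown policy: {policy}")


def place(total_per_gpu: List[int], requests: List[int], policy: str) -> List[int]:
    """Place requests one by one and return the GPU index chosen for each."""
    free = list(total_per_gpu)
    placement = []
    for request in requests:
        idx = pick_gpu(free, request, policy)
        placement.append(idx)
        if idx >= 0:
            free[idx] -= request
    return placement


if __name__ == "__main__":
    gpus = [15, 15]   # two GPUs with 15 GiB each
    pods = [2, 3]     # Pod1 requests 2 GiB, Pod2 requests 3 GiB
    print("binpack:", place(gpus, pods, "binpack"))
    print("spread: ", place(gpus, pods, "spread"))
```

In this model, Binpack puts both Pods on GPU 0, while Spread puts Pod1 on GPU 0 and Pod2 on GPU 1.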

Procedure

By default, the GPU selection policy for a node is Binpack. To use the Spread policy for GPU sharing, follow these steps.

Step 1: Create a node pool

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Nodes > Node Pools.

  3. On the Node Pools page, click Create Node Pool in the upper-right corner.

  4. On the Create Node Pool page, set the parameters and click Confirm. The following table describes the key parameters. For information about other parameters, see Create and manage a node pool.

    | Parameter | Description |
    | --- | --- |
    | Instance type | For Architecture, select GPU-accelerated instance, and then select an instance type. The Spread policy takes effect only on nodes with multiple GPUs, so select an instance type that has multiple GPUs. |
    | Expected nodes | Specify the initial number of nodes in the node pool. Set this to 0 if you do not want to create nodes immediately. |
    | Node label | Click the add icon to add the following two labels: (1) Set Key to ack.node.gpu.schedule and Value to cgpu. This enables GPU sharing and GPU memory isolation. (2) Set Key to ack.node.gpu.placement and Value to spread. This enables the Spread policy for the node. |

Step 2: Submit a job

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Workloads > Jobs.

  3. In the upper-right corner of the page, click Create from YAML. Copy the following YAML content into the Template editor, modify the content as described in the comments, and click Create.

    YAML details

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: tensorflow-mnist-spread
    spec:
      parallelism: 3
      template:
        metadata:
          labels:
            app: tensorflow-mnist-spread
        spec:
          nodeSelector:
            kubernetes.io/hostname: <NODE_NAME> # Replace <NODE_NAME> with the name of a GPU node in your cluster, for example, cn-shanghai.192.0.2.109. Pinning the Pods to one node makes the effect easier to observe.
          containers:
          - name: tensorflow-mnist-spread
            image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
            command:
            - python
            - tensorflow-sample-code/tfjob/docker/mnist/main.py
            - --max_steps=100000
            - --data_dir=tensorflow-sample-code/data
            resources:
              limits:
                aliyun.com/gpu-mem: 4 # Requests 4 GiB of GPU memory.
            workingDir: /root
          restartPolicy: Never

    YAML file description:

    • This YAML file defines a Job that uses the TensorFlow MNIST sample. The Job creates three Pods, and each Pod requests 4 GiB of GPU memory.

    • A Pod requests 4 GiB of GPU memory by setting aliyun.com/gpu-mem: 4 in the resources.limits section of the container specification.

    • To observe the effect on a single node, the YAML file adds a nodeSelector, kubernetes.io/hostname: <NODE_NAME>, to schedule the Pods to a specific node.
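If you would rather generate the manifest than edit it by hand, the following sketch builds the same Job as a Python dictionary and prints it as JSON, a format that kubectl also accepts for manifests. The helper names `build_job` and `total_gpu_mem_gib` and their parameters are hypothetical and introduced only for illustration. The node name in the usage section is the example value from the YAML comment.

```python
import json
from typing import Any, Dict


def build_job(node_name: str, parallelism: int = 3, gpu_mem_gib: int = 4) -> Dict[str, Any]:
    """Build the sample Job manifest as a plain dictionary (hypothetical helper)."""
    name = "tensorflow-mnist-spread"
    container = {
        "name": name,
        "image": "registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5",
        "command": [
            "python",
            "tensorflow-sample-code/tfjob/docker/mnist/main.py",
            "--max_steps=100000",
            "--data_dir=tensorflow-sample-code/data",
        ],
        # GPU memory request in GiB, same key as in the YAML above.
        "resources": {"limits": {"aliyun.com/gpu-mem": gpu_mem_gib}},
        "workingDir": "/root",
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "parallelism": parallelism,
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    # Pin all Pods to one node so the GPU placement is easy to observe.
                    "nodeSelector": {"kubernetes.io/hostname": node_name},
                    "containers": [container],
                    "restartPolicy": "Never",
                },
            },
        },
    }


def total_gpu_mem_gib(job: Dict[str, Any]) -> int:
    """GPU memory (GiB) the Job requests when all parallel Pods are running."""
    spec = job["spec"]
    per_pod = sum(
        c["resources"]["limits"].get("aliyun.com/gpu-mem", 0)
        for c in spec["template"]["spec"]["containers"]
    )
    return spec["parallelism"] * per_pod


if __name__ == "__main__":
    job = build_job("cn-shanghai.192.0.2.109")
    print(json.dumps(job, indent=2))
    print("Total GPU memory requested (GiB):", total_gpu_mem_gib(job))
```

With the default values, three Pods at 4 GiB each request 12 GiB in total, which is the figure to expect in the allocation summary when you verify the policy.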

Step 3: Verify the Spread policy

Use the GPU inspection tool to query GPU resource allocation on the node:

kubectl inspect cgpu

NAME                     IPADDRESS    GPU0(Allocated/Total)  GPU1(Allocated/Total)  GPU2(Allocated/Total)  GPU3(Allocated/Total)  GPU Memory(GiB)
cn-shanghai.192.0.2.109  192.0.2.109  4/15                   4/15                   0/15                   4/15                   12/60
--------------------------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
12/60 (20%)

In the output, GPU0, GPU1, and GPU3 each have 4 GiB allocated, which shows that the three Pods are scheduled to different GPUs on the node. This confirms that the Spread policy is in effect.
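The same check can be scripted. The sketch below parses text in the layout shown above and counts how many GPUs on each node have memory allocated. It assumes the output has exactly this shape (node name, IP address, one Allocated/Total pair per GPU, then a node-wide total); other versions of the tool may print different columns. The names `parse_inspect_output` and `gpus_in_use` are hypothetical.

```python
import re
from typing import Dict, List

# Matches an "Allocated/Total" pair such as "4/15".
PAIR = re.compile(r"^(\d+)/(\d+)$")

SAMPLE = """\
NAME                     IPADDRESS    GPU0(Allocated/Total)  GPU1(Allocated/Total)  GPU2(Allocated/Total)  GPU3(Allocated/Total)  GPU Memory(GiB)
cn-shanghai.192.0.2.109  192.0.2.109  4/15                   4/15                   0/15                   4/15                   12/60
--------------------------------------------------------------------------------------
Allocated/Total GPU Memory In Cluster:
12/60 (20%)
"""


def parse_inspect_output(text: str) -> Dict[str, List[int]]:
    """Map each node name to the GiB allocated on each of its GPUs."""
    nodes: Dict[str, List[int]] = {}
    for line in text.splitlines():
        tokens = line.split()
        # A node row needs a name, an IP, at least one GPU pair, and a total.
        if len(tokens) < 4:
            continue
        pairs = [PAIR.match(t) for t in tokens[2:]]
        if not all(pairs):
            continue  # header, separator, or summary line
        # The last pair is the node-wide total; the rest are per-GPU values.
        nodes[tokens[0]] = [int(m.group(1)) for m in pairs[:-1]]
    return nodes


def gpus_in_use(allocated: List[int]) -> int:
    """Count GPUs that have any memory allocated."""
    return sum(1 for gib in allocated if gib > 0)


if __name__ == "__main__":
    for node, allocated in parse_inspect_output(SAMPLE).items():
        print(node, allocated, "GPUs in use:", gpus_in_use(allocated))
```

For the sample Job, seeing as many GPUs in use as there are Pods is consistent with Spread. Under Binpack, you would expect the three 4 GiB requests to share a single 15 GiB GPU.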