Multi-GPU sharing with shared GPU scheduling

ACK managed Pro clusters support GPU sharing, which provides shared GPU scheduling and GPU memory isolation on Kubernetes. This topic describes how to configure a multi-GPU sharing policy.

Prerequisites

Multi-GPU sharing

Important

Currently, multi-GPU sharing supports only GPU memory isolation with shared compute resources. GPU memory isolation with dedicated compute resources is not supported.

During model development, a workload may need multiple GPUs without using their full capacity. Assigning entire GPUs to such a development environment wastes resources. Multi-GPU sharing avoids this by letting the workload use a portion of the memory on each of several GPUs.

A multi-GPU sharing policy allows an application to request N GiB of GPU memory distributed across M GPUs, with each GPU providing N/M GiB. N/M must be an integer, and all M GPUs must be on the same Kubernetes node. For example, if you request 8 GiB of GPU memory across two GPUs, each GPU provides 4 GiB.

  • Single-GPU sharing: A pod requests a fraction of the resources from a single GPU.

  • Multi-GPU sharing: A pod requests resources from multiple GPUs, with each GPU contributing an equal amount.
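The N/M rule above can be expressed as a small validation helper. This is a minimal sketch for illustration only: `per_gpu_memory_gib` is a hypothetical name, and it reproduces only the arithmetic described in this topic, not how the scheduler actually handles a request that does not divide evenly.

```python
def per_gpu_memory_gib(total_gib: int, gpu_count: int) -> int:
    """Return the GiB each GPU contributes to a multi-GPU sharing request.

    Hypothetical helper mirroring the documented rule: N GiB spread across
    M GPUs, each GPU provides N/M GiB, and N/M must be an integer.
    """
    if total_gib <= 0 or gpu_count <= 0:
        raise ValueError("total_gib and gpu_count must be positive")
    if total_gib % gpu_count != 0:
        raise ValueError(
            f"{total_gib} GiB cannot be split evenly across {gpu_count} GPUs"
        )
    return total_gib // gpu_count


print(per_gpu_memory_gib(8, 2))  # 4
try:
    per_gpu_memory_gib(9, 2)
except ValueError as err:
    print(err)  # 9 GiB cannot be split evenly across 2 GPUs
```

Checking divisibility before submitting a workload makes an invalid combination, such as 9 GiB on two GPUs, fail early and with a clear message.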

Configure multi-GPU sharing

  1. Log on to the ACK console. In the left navigation pane, click Clusters.

  2. On the Clusters page, click the name of your cluster. In the left navigation pane, click Workloads > Jobs.

  3. On the Jobs page, click Create from YAML. Copy the following YAML content into the Template editor, and then click Create.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: tensorflow-mnist-multigpu
    spec:
      parallelism: 1
      template:
        metadata:
          labels:
            app: tensorflow-mnist-multigpu
            # Request 2 GPUs. Combined with aliyun.com/gpu-mem: 8 below, each GPU provides 4 GiB.
            aliyun.com/gpu-count: "2"
        spec:
          containers:
          - name: tensorflow-mnist-multigpu
            image: registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5
            command:
            - python
            - tensorflow-sample-code/tfjob/docker/mnist/main.py
            - --max_steps=100000
            - --data_dir=tensorflow-sample-code/data
            resources:
              limits:
                aliyun.com/gpu-mem: 8 # Request a total of 8 GiB of GPU memory.
            workingDir: /root
          restartPolicy: Never

    The key configurations in the YAML file are as follows:

    • The YAML file defines a Job that runs a TensorFlow MNIST sample. The Job requests a total of 8 GiB of GPU memory from two GPUs, with each GPU providing 4 GiB.

    • To request two GPUs, add the aliyun.com/gpu-count: "2" label to the pod template metadata. Quote the value, because Kubernetes label values must be strings.

    • To request 8 GiB of GPU memory, specify aliyun.com/gpu-mem: 8 in the resources.limits section of the container specification.
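If you need variants of this Job with other memory sizes or GPU counts, you can build the manifest programmatically so that the two key settings stay consistent. The following is a minimal sketch: `multigpu_job_manifest` is a hypothetical helper, and the command and workingDir fields from the YAML above are omitted for brevity.

```python
import json


def multigpu_job_manifest(name: str, image: str, total_gib: int, gpu_count: int) -> dict:
    """Build a Job manifest that spreads total_gib of GPU memory across gpu_count GPUs.

    Hypothetical helper: it reproduces only the two settings described above,
    the aliyun.com/gpu-count pod label and the aliyun.com/gpu-mem limit.
    """
    if total_gib <= 0 or gpu_count <= 0:
        raise ValueError("total_gib and gpu_count must be positive")
    if total_gib % gpu_count != 0:
        raise ValueError("total_gib must be divisible by gpu_count")
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "parallelism": 1,
            "template": {
                "metadata": {
                    "labels": {
                        "app": name,
                        # Label values are strings, so the count is "2", not 2.
                        "aliyun.com/gpu-count": str(gpu_count),
                    }
                },
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        # Total GPU memory in GiB, split evenly across the GPUs.
                        "resources": {"limits": {"aliyun.com/gpu-mem": total_gib}},
                    }],
                    "restartPolicy": "Never",
                },
            },
        },
    }


manifest = multigpu_job_manifest(
    "tensorflow-mnist-multigpu",
    "registry.cn-beijing.aliyuncs.com/ai-samples/gpushare-sample:tensorflow-1.5",
    total_gib=8,
    gpu_count=2,
)
print(json.dumps(manifest, indent=2))
```

kubectl apply -f accepts JSON manifests as well as YAML, so the printed output can be saved to a file and applied with kubectl if you prefer the command line to the console.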

Verify multi-GPU sharing

  1. On the Clusters page, click the name of your cluster. In the left navigation pane, click Workloads > Pods.

  2. Find the pod you created, such as tensorflow-mnist-multigpu-***. In the Actions column, click Terminal and run the following command.

    nvidia-smi

    Expected output:

    Wed Jun 14 03:24:14 2023
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla V100-SXM2...  On   | 00000000:00:09.0 Off |                    0 |
    | N/A   38C    P0    61W / 300W |    569MiB /  4309MiB |      2%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   1  Tesla V100-SXM2...  On   | 00000000:00:0A.0 Off |                    0 |
    | N/A   36C    P0    61W / 300W |    381MiB /  4309MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    +-----------------------------------------------------------------------------+

    The output shows that the pod can access two GPUs. Each GPU reports 4,309 MiB of total memory, which corresponds to the requested 4 GiB per GPU rather than the card's 16,160 MiB of physical memory.

  3. Find the pod you created, such as tensorflow-mnist-multigpu-***. In the Actions column, click Logs to view the container logs. You should see the following key information:

    totalMemory: 4.21GiB freeMemory: 3.91GiB
    totalMemory: 4.21GiB freeMemory: 3.91GiB

    These lines confirm that the application sees about 4 GiB of total memory on each GPU (4.21 GiB, which matches the 4,309 MiB reported by nvidia-smi) rather than the 16,160 MiB of physical memory. This indicates that GPU memory isolation is in effect.
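The two verification checks above can also be scripted, for example when you validate several pods. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the regular expressions assume the nvidia-smi table and TensorFlow log formats shown in this topic, and the 0.5 GiB tolerance is an assumption chosen because the reported total (4,309 MiB) is slightly larger than the requested 4 GiB (4,096 MiB).

```python
import re

# Hypothetical patterns based on the sample outputs in this topic.
SMI_MEMORY = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB")    # "569MiB /  4309MiB"
LOG_TOTAL = re.compile(r"totalMemory:\s*([\d.]+)GiB")  # "totalMemory: 4.21GiB"


def smi_totals_mib(smi_output: str) -> list[int]:
    """Total memory (MiB) that nvidia-smi reports for each visible GPU."""
    return [int(total) for _used, total in SMI_MEMORY.findall(smi_output)]


def log_totals_gib(log_output: str) -> list[float]:
    """totalMemory values (GiB) found in the TensorFlow container log."""
    return [float(value) for value in LOG_TOTAL.findall(log_output)]


def looks_isolated(totals_gib: list[float], requested_gib: float,
                   physical_gib: float, tolerance_gib: float = 0.5) -> bool:
    """True if every GPU reports roughly the requested memory, not the full card."""
    if not totals_gib:
        return False  # No GPUs found: nothing to confirm.
    return all(abs(t - requested_gib) <= tolerance_gib and t < physical_gib
               for t in totals_gib)


smi_sample = """\
| N/A   38C    P0    61W / 300W |    569MiB /  4309MiB |      2%      Default |
| N/A   36C    P0    61W / 300W |    381MiB /  4309MiB |      0%      Default |
"""
log_sample = """\
totalMemory: 4.21GiB freeMemory: 3.91GiB
totalMemory: 4.21GiB freeMemory: 3.91GiB
"""

physical_gib = 16160 / 1024  # Physical memory of the V100 card in the sample.
totals_mib = smi_totals_mib(smi_sample)
totals_gib = [mib / 1024 for mib in totals_mib]
print(len(totals_mib))                    # 2
print([round(g, 2) for g in totals_gib])  # [4.21, 4.21]
print(looks_isolated(totals_gib, 4, physical_gib))                  # True
print(looks_isolated(log_totals_gib(log_sample), 4, physical_gib))  # True
```

The conversion also shows why the two outputs agree: 4,309 MiB divided by 1,024 is about 4.21 GiB, the same value that appears in the container logs.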