Use RDMA networks for Pods on Lingjun nodes

This topic describes how to configure and use remote direct memory access (RDMA) on Lingjun nodes in an ACK managed cluster Pro for high-performance container network communication. RDMA technology significantly reduces network latency and increases throughput, making it ideal for demanding scenarios such as high-performance computing (HPC), AI training, and distributed storage.

RDMA

Remote direct memory access (RDMA) is a high-performance network communication technology that addresses the data-processing latency of traditional network transmission. RDMA transfers data directly between the memory of computers over the network, bypassing their operating systems. This enables high-throughput, low-latency communication, which makes RDMA well suited to large-scale parallel computing clusters.

Because data moves directly into the memory of the remote computer without passing through the operating system, RDMA consumes minimal processing power. By avoiding extra memory copies and context switches, it frees up memory bandwidth and CPU cycles, which improves application performance.

Prerequisites

In Kubernetes, a Pod can use one of two network modes:

  • Independent IP mode: Each Pod has a unique IP address (non-hostNetwork mode).

  • Shared network mode: The Pod uses the host node's network directly (hostNetwork mode).

To use the RDMA feature for Pods in independent IP mode (non-hostNetwork), you must meet the following prerequisites:

  • The computing network of the Lingjun bare metal cluster hosting the Lingjun node must use IPv6.

  • You must select the IPv6 mode when you create the Lingjun bare metal cluster.
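The rules above, together with the Enable RDMA for non-hostNetwork option described in the procedure, can be summarized as a small decision function. This is a minimal Python sketch, not part of any ACK API: the RdmaSetup class and pod_can_use_rdma function are hypothetical. The sketch assumes that the RDMA Device Plugin add-on is installed, that the pod requests rdma/hca, and, as the prerequisites imply, that the IPv6 requirement applies only to pods in independent IP mode.

```python
from dataclasses import dataclass


@dataclass
class RdmaSetup:
    """Hypothetical summary of the settings that govern pod access to RDMA."""

    host_network: bool              # the pod spec sets hostNetwork: true
    non_host_network_enabled: bool  # "Enable RDMA for non-hostNetwork" is selected for the add-on
    lingjun_cluster_ipv6: bool      # the Lingjun bare metal cluster was created in IPv6 mode


def pod_can_use_rdma(setup: RdmaSetup) -> bool:
    """Decide whether a pod can use RDMA under the rules described in this topic."""
    if setup.host_network:
        # hostNetwork pods can use RDMA without the non-hostNetwork option.
        return True
    # Pods in independent IP mode need the add-on option AND an IPv6 Lingjun cluster.
    return setup.non_host_network_enabled and setup.lingjun_cluster_ipv6
```

For example, a pod in independent IP mode with the option selected but without an IPv6 Lingjun bare metal cluster gets False, which matches the note that the RDMA configuration does not take effect without IPv6.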

Procedure

  1. Install the RDMA Device Plugin add-on.

    1. On the Clusters page, click the name of your cluster. In the left navigation pane, click Components and Add-ons.

    2. On the Add-ons page, click the Others tab. Find the ack-rdma-device-plugin add-on, then follow the prompts to configure and install it.

The add-on provides the following parameter:

      • Enable RDMA for non-hostNetwork: Controls whether pods that are not in hostNetwork mode can use RDMA. Valid values:

        • False (cleared): Only pods in hostNetwork mode can use the RDMA network.

        • True (selected): Pods that are not in hostNetwork mode can also use the RDMA network. Before you select this option, confirm that the Lingjun bare metal cluster associated with your ACK cluster uses IPv6. Otherwise, the RDMA configuration does not take effect.

  2. Verify that the RDMA Device Plugin is running on each RDMA-capable Lingjun node.

    kubectl get ds ack-rdma-dp-ds -n kube-system

    Expected output. The plugin is running on all targeted nodes when READY and AVAILABLE equal DESIRED:

    NAME             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    ack-rdma-dp-ds   2         2         2       2            2           <none>          xxh
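
  3. Verify that the node has the rdma/hca resource. Replace e01-cn-xxxx with the name of a Lingjun node.

    kubectl get node e01-cn-xxxx -oyaml

    Expected output. The rdma/hca resource is listed under both allocatable and capacity; 1k is Kubernetes quantity notation for 1,000: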

    ...
      allocatable:
        cpu: 189280m
        ephemeral-storage: "3401372677838"
        hugepages-1Gi: "0"
        hugepages-2Mi: "0"
        memory: 2063229768Ki
        nvidia.com/gpu: "8"
        pods: "64"
        rdma/hca: 1k
      capacity:
        cpu: "192"
        ephemeral-storage: 3690725568Ki
        hugepages-1Gi: "0"
        hugepages-2Mi: "0"
        memory: 2112881480Ki
        nvidia.com/gpu: "8"
        pods: "64"
        rdma/hca: 1k
    ...

  4. Apply the following YAML manifest to request the rdma/hca resource for a pod.

    • A request of rdma/hca: 1 is sufficient for a pod to use RDMA.

    • If Enable RDMA for non-hostNetwork is not selected for the RDMA Device Plugin add-on, only pods configured with hostNetwork: true can use RDMA. For this reason, the following manifest sets hostNetwork: true.

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: hps-benchmark
    spec:
      parallelism: 1
      template:
        spec:
          containers:
          - name: hps-benchmark
            image: "****"
            command:
            - sh
            - -c
            - |
              python /workspace/wdl_8gpu_outbrain.py
            resources:
              limits:
                nvidia.com/gpu: 8
                # One unit of rdma/hca is sufficient for the pod to use RDMA.
                rdma/hca: 1
            workingDir: /root
            volumeMounts:
              - name: shm
                mountPath: /dev/shm
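            securityContext:
              capabilities:
                add:
                # SYS_RESOURCE lets the container raise resource limits such as the locked-memory limit.
                # IPC_LOCK lets the container lock (pin) memory, which RDMA memory registration relies on.
                - SYS_RESOURCE
                - IPC_LOCK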
          restartPolicy: Never
          volumes:
            # Memory-backed volume mounted at /dev/shm for shared-memory communication between processes.
            - name: shm
              emptyDir:
                medium: Memory
                sizeLimit: 8Gi
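          # Keep hostNetwork: true unless Enable RDMA for non-hostNetwork is selected (requires an IPv6 Lingjun cluster).
          hostNetwork: true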
          tolerations:
            - operator: Exists
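The checks in steps 2 and 3 can also be scripted. The following is a minimal Python sketch, not a tool provided by ACK. It assumes that you have parsed the JSON output of kubectl get ds ack-rdma-dp-ds -n kube-system -o json and kubectl get node e01-cn-xxxx -o json into dictionaries. The helper names are hypothetical, and the quantity parser is simplified to the common Kubernetes suffixes.

```python
from decimal import Decimal

# Simplified Kubernetes quantity suffixes (binary suffixes are checked first).
_BINARY_SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20, "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40, "Pi": Decimal(2) ** 50, "Ei": Decimal(2) ** 60,
}
_DECIMAL_SUFFIXES = {
    "m": Decimal("0.001"), "k": Decimal(10) ** 3, "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9, "T": Decimal(10) ** 12, "P": Decimal(10) ** 15,
    "E": Decimal(10) ** 18,
}


def parse_quantity(value: str) -> Decimal:
    """Convert a Kubernetes quantity such as "1k", "189280m", or "2063229768Ki" to a Decimal."""
    value = str(value).strip()
    for suffixes in (_BINARY_SUFFIXES, _DECIMAL_SUFFIXES):
        for suffix, factor in suffixes.items():
            if value.endswith(suffix):
                return Decimal(value[: -len(suffix)]) * factor
    return Decimal(value)  # plain number, no suffix


def daemonset_ready(ds: dict) -> bool:
    """Return True when every scheduled pod of the DaemonSet is ready and available."""
    status = ds.get("status", {})
    desired = status.get("desiredNumberScheduled", 0)
    return (
        desired > 0
        and status.get("numberReady", 0) == desired
        and status.get("numberAvailable", 0) == desired
    )


def rdma_hca_allocatable(node: dict) -> Decimal:
    """Return the allocatable rdma/hca units of a node, or 0 if the resource is absent."""
    allocatable = node.get("status", {}).get("allocatable", {})
    return parse_quantity(allocatable.get("rdma/hca", "0"))
```

For the node shown in step 3, rdma_hca_allocatable returns Decimal('1000'). Reading the status fields from JSON avoids depending on the column layout of the kubectl table output.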