Use RDMA networks for Pods on Lingjun nodes - Container Service for Kubernetes (ACK) - Alibaba Cloud Help Center

This topic describes how to configure and use remote direct memory access (RDMA) on Lingjun nodes in an ACK managed cluster Pro for high-performance container network communication. RDMA technology significantly reduces network latency and increases throughput, making it ideal for demanding scenarios such as high-performance computing (HPC), AI training, and distributed storage.

RDMA

Remote direct memory access (RDMA) is a high-performance network communication technology that removes much of the data-processing latency of traditional network transmission. RDMA moves data over the network directly from the memory of one computer into the memory of another, bypassing the operating system and consuming minimal CPU. By avoiding intermediate memory copies and context switches, it frees memory bandwidth and CPU cycles for applications. The result is high-throughput, low-latency communication that suits large-scale parallel computing clusters.

Prerequisites

In Kubernetes, a Pod can use one of two network modes:

Independent IP mode: Each Pod has a unique IP address (non-hostNetwork mode).
Shared network mode: The Pod uses the host node's network directly (hostNetwork mode).

To use the RDMA feature for Pods in independent IP mode (non-hostNetwork), you must meet the following prerequisites:

The computing network of the Lingjun bare metal cluster that hosts the Lingjun nodes must use IPv6.
You must select the IPv6 mode when you create the Lingjun bare metal cluster.

Procedure

Install the RDMA Device Plugin add-on.

On the Clusters page, click the name of your cluster. In the left navigation pane, click Components and Add-ons.

On the Add-ons page, click the Others tab. Find the ack-rdma-device-plugin add-on, then follow the prompts to configure and install it.

The add-on provides the following parameter:

Enable RDMA for non-hostNetwork: Controls whether Pods that do not use hostNetwork mode can use the RDMA network. Valid values:

False (cleared): Only Pods in hostNetwork mode can use the RDMA network.
True (selected): Pods that do not use hostNetwork mode can also use the RDMA network. Before you select this option, confirm that the Lingjun bare metal cluster associated with your ACK cluster uses IPv6. Otherwise, the RDMA configuration does not take effect.
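The rules described above (the IPv6 prerequisite and the Enable RDMA for non-hostNetwork option) can be condensed into a small decision function. This is a minimal sketch for reasoning about a planned deployment; the RdmaSetup type and the pod_can_use_rdma function are hypothetical names, not part of the add-on or any ACK API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RdmaSetup:
    """Hypothetical summary of the settings that decide RDMA access for a Pod."""

    host_network: bool              # The Pod spec sets hostNetwork: true
    non_host_network_enabled: bool  # "Enable RDMA for non-hostNetwork" is selected
    compute_network_ipv6: bool      # The Lingjun bare metal cluster uses IPv6


def pod_can_use_rdma(setup: RdmaSetup) -> bool:
    """Apply the documented rules for whether a Pod can use the RDMA network."""
    if setup.host_network:
        # hostNetwork Pods can use RDMA whether or not the option is selected.
        return True
    # Pods with their own IP need the option selected AND an IPv6 computing
    # network; otherwise the RDMA configuration does not take effect.
    return setup.non_host_network_enabled and setup.compute_network_ipv6


if __name__ == "__main__":
    print(pod_can_use_rdma(RdmaSetup(True, False, False)))  # True
    print(pod_can_use_rdma(RdmaSetup(False, True, False)))  # False
    print(pod_can_use_rdma(RdmaSetup(False, True, True)))   # True
```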

Verify that the RDMA Device Plugin is running on each RDMA-capable Lingjun node.

kubectl get ds ack-rdma-dp-ds -n kube-system

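Expected output. READY should equal DESIRED, which means that a plugin Pod is ready on every node that the DaemonSet targets: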

NAME             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
ack-rdma-dp-ds   2         2         2       2            2           <none>          xxh
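To script this check, for example in a node-readiness pipeline, you can read the DaemonSet status from `kubectl get ds ack-rdma-dp-ds -n kube-system -o json` instead of the table output. The following is a minimal sketch, assuming that JSON has been parsed into a dict; daemonset_is_ready is a hypothetical helper, while the status field names are the standard Kubernetes DaemonSet status fields.

```python
def daemonset_is_ready(ds: dict) -> bool:
    """Return True if every node targeted by the DaemonSet runs an up-to-date, ready Pod."""
    status = ds.get("status", {})
    desired = status.get("desiredNumberScheduled", 0)
    if desired == 0:
        # No node is targeted, so the plugin is not running anywhere.
        return False
    counters = (
        "currentNumberScheduled",
        "numberReady",
        "updatedNumberScheduled",
        "numberAvailable",
    )
    # Missing counters are treated as 0, i.e. not ready.
    return all(status.get(name, 0) == desired for name in counters)


if __name__ == "__main__":
    # Status values that correspond to the expected output above (2 nodes, all ready).
    sample = {
        "status": {
            "desiredNumberScheduled": 2,
            "currentNumberScheduled": 2,
            "numberReady": 2,
            "updatedNumberScheduled": 2,
            "numberAvailable": 2,
        }
    }
    print(daemonset_is_ready(sample))  # True
```

Against a live cluster, you could pipe the kubectl JSON output into a script that calls `daemonset_is_ready(json.load(sys.stdin))`.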

Verify that the node advertises the rdma/hca extended resource. Replace e01-cn-xxxx with the name of one of your Lingjun nodes.

kubectl get node e01-cn-xxxx -oyaml

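Expected output (excerpt). The rdma/hca resource appears under both allocatable and capacity; 1k is Kubernetes quantity notation for 1,000 units: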

...
  allocatable:
    cpu: 189280m
    ephemeral-storage: "3401372677838"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 2063229768Ki
    nvidia.com/gpu: "8"
    pods: "64"
    rdma/hca: 1k
  capacity:
    cpu: "192"
    ephemeral-storage: 3690725568Ki
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 2112881480Ki
    nvidia.com/gpu: "8"
    pods: "64"
    rdma/hca: 1k
...
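Because values such as 1k, 189280m, and 3690725568Ki are Kubernetes quantity strings, a script that checks nodes for rdma/hca must convert them before comparing numbers. The following is a minimal sketch, assuming the node object comes from `kubectl get node <name> -o json`; parse_quantity and rdma_hca_allocatable are hypothetical helpers, and the parser is simplified: it covers the standard decimal and binary suffixes but does no validation.

```python
from decimal import Decimal

# Multipliers for Kubernetes quantity suffixes (binary "Ki"... and decimal "k"...).
_SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20, "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40, "Pi": Decimal(2) ** 50, "Ei": Decimal(2) ** 60,
    "n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3"),
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6, "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12, "P": Decimal(10) ** 15, "E": Decimal(10) ** 18,
}


def parse_quantity(value: str) -> Decimal:
    """Convert a quantity string such as "1k" or "189280m" to a Decimal."""
    # Try two-character suffixes first so "Mi" is not mistaken for "M".
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if value.endswith(suffix):
            return Decimal(value[: -len(suffix)]) * _SUFFIXES[suffix]
    # No suffix: plain number (Decimal also accepts exponent forms like "1e3").
    return Decimal(value)


def rdma_hca_allocatable(node: dict) -> int:
    """Return the node's allocatable rdma/hca units, or 0 if it is not advertised."""
    raw = node.get("status", {}).get("allocatable", {}).get("rdma/hca")
    return int(parse_quantity(str(raw))) if raw is not None else 0


if __name__ == "__main__":
    # Values taken from the expected output above.
    node = {"status": {"allocatable": {"cpu": "189280m", "rdma/hca": "1k"}}}
    print(rdma_hca_allocatable(node))  # 1000
```

A node that returns 0 here has not advertised the resource, so Pods that request rdma/hca cannot be scheduled onto it.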

Apply the following YAML manifest to request the rdma/hca resource for a Pod.

A request of rdma/hca: 1 is sufficient.
If Enable RDMA for non-hostNetwork is not selected for the RDMA Device Plugin add-on, only Pods configured with hostNetwork: true can use RDMA.

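apiVersion: batch/v1
kind: Job
metadata:
  name: hps-benchmark
spec:
  parallelism: 1
  template:
    spec:
      containers:
      - name: hps-benchmark
        image: "****"              # Replace with your workload image.
        command:
        - sh
        - -c
        - |
          python /workspace/wdl_8gpu_outbrain.py
        resources:
          limits:
            nvidia.com/gpu: 8
            rdma/hca: 1            # One unit is enough to use the node's RDMA devices.
        workingDir: /root
        volumeMounts:
          - name: shm
            mountPath: /dev/shm    # Memory-backed shared memory volume.
        securityContext:
          capabilities:
            add:
            - SYS_RESOURCE         # Allows raising resource limits such as RLIMIT_MEMLOCK.
            - IPC_LOCK             # Allows locking (pinning) memory, which RDMA memory registration relies on.
      restartPolicy: Never
      volumes:
        - name: shm
          emptyDir:
            medium: Memory
            sizeLimit: 8Gi
      hostNetwork: true            # Required unless "Enable RDMA for non-hostNetwork" is selected.
      tolerations:
        - operator: Exists         # Tolerates all taints, so the Pod can run on tainted Lingjun nodes.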
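If you generate Jobs like the manifest above from a script, you can build the same object as a dict, check it against the RDMA rules in this topic, and emit JSON, which `kubectl apply -f -` also accepts. The following is a minimal sketch; build_rdma_job and rdma_request_problems are hypothetical helpers, the image is a placeholder, and setting host_network=False is only valid when Enable RDMA for non-hostNetwork is selected and the Lingjun cluster uses IPv6. The check assumes rdma/hca limits are plain integers such as 1 or "1".

```python
import json


def build_rdma_job(name: str, image: str, script: str, gpus: int = 8,
                   host_network: bool = True) -> dict:
    """Build a batch/v1 Job equivalent to the manifest above as a plain dict."""
    container = {
        "name": name,
        "image": image,
        "command": ["sh", "-c", script],
        "workingDir": "/root",
        "resources": {"limits": {"nvidia.com/gpu": gpus, "rdma/hca": 1}},
        "volumeMounts": [{"name": "shm", "mountPath": "/dev/shm"}],
        "securityContext": {"capabilities": {"add": ["SYS_RESOURCE", "IPC_LOCK"]}},
    }
    pod_spec = {
        "containers": [container],
        "restartPolicy": "Never",
        "volumes": [{"name": "shm", "emptyDir": {"medium": "Memory", "sizeLimit": "8Gi"}}],
        "hostNetwork": host_network,
        "tolerations": [{"operator": "Exists"}],
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {"parallelism": 1, "template": {"spec": pod_spec}},
    }


def rdma_request_problems(pod_spec: dict, non_host_network_enabled: bool) -> list:
    """Return reasons why this Pod spec would not get RDMA (empty list if none)."""
    problems = []
    # At least one container must set limits rdma/hca >= 1.
    requested = any(
        int(c.get("resources", {}).get("limits", {}).get("rdma/hca", 0)) >= 1
        for c in pod_spec.get("containers", [])
    )
    if not requested:
        problems.append("no container sets limits rdma/hca >= 1")
    # Without the add-on option, only hostNetwork Pods can use RDMA.
    if not pod_spec.get("hostNetwork", False) and not non_host_network_enabled:
        problems.append("hostNetwork is not true and non-hostNetwork RDMA is not enabled")
    return problems


if __name__ == "__main__":
    job = build_rdma_job("hps-benchmark", "<your-image>",
                         "python /workspace/wdl_8gpu_outbrain.py")
    issues = rdma_request_problems(job["spec"]["template"]["spec"],
                                   non_host_network_enabled=False)
    if issues:
        raise SystemExit("; ".join(issues))
    print(json.dumps(job, indent=2))
```

For example, `python make_job.py | kubectl apply -f -` would submit the generated Job, where make_job.py is whatever file holds the sketch.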