Use eRDMA to accelerate container networks in ACK clusters

Enable RDMA networking in pods for NCCL distributed training or SMC-R transparent TCP acceleration.

This topic describes how to install the ACK eRDMA Controller and enable eRDMA acceleration in pods. It covers two scenarios: GPU distributed training with NCCL and transparent TCP acceleration with SMC-R.

Prerequisites

Ensure you have:

How it works

The ACK eRDMA Controller manages eRDMA-capable elastic network interfaces (ENIs) on cluster nodes. It registers eRDMA devices as the aliyun/erdma extended resource, configures ENI routes, and exposes devices to pods through Kubernetes resource requests.

Pods request eRDMA access by declaring aliyun/erdma in their resource limits. To also accelerate TCP connections without code changes, enable SMC-R through a pod annotation.

Component | Role
ACK eRDMA Controller | Registers eRDMA devices as extended resources; manages ENI routes
eRDMA driver (default / compat / ofed) | Kernel-level driver installed on each node
aliyun/erdma resource | Kubernetes extended resource for pod device access
SMC-R | Linux protocol that transparently replaces TCP with RDMA transport
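
Because the controller registers devices as an extended resource, one way to see which nodes expose eRDMA is to read each node's allocatable resources. The following is a minimal sketch, not part of the official procedure: the helper name nodes_with_erdma is hypothetical, and the kubectl custom-columns expression in the comment is an assumption about how aliyun/erdma appears in node status.

```shell
# Hypothetical helper: reads "NODE COUNT" pairs on stdin and prints the
# nodes whose aliyun/erdma allocatable count is a positive integer.
# kubectl prints <none> when a node does not advertise the resource.
nodes_with_erdma() {
  awk '$2 ~ /^[0-9]+$/ && $2 + 0 > 0 { print $1 }'
}

# Assumed usage against a live cluster (not executed here):
#   kubectl get nodes --no-headers \
#     -o custom-columns='NODE:.metadata.name,ERDMA:.status.allocatable.aliyun/erdma' \
#     | nodes_with_erdma
```

A node missing from the output either has no eRDMA-capable ENI or has not yet been processed by the controller.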

Step 1: Install the ACK eRDMA Controller

Note

If your cluster uses the Terway network plugin, configure a whitelist for ENIs to prevent Terway from modifying eRDMA-capable ENIs.

  1. On the Clusters page, click your cluster name, then click Add-ons.

  2. On the Add-ons page, click the Network tab. Find and install ACK eRDMA Controller with the following settings:

    Note

    When a node has multiple NICs, the controller assigns eRDMA ENI routes a lower priority than NIC routes in the same CIDR block. The default priority of eRDMA ENI routes is 200. Avoid conflicts with this priority when you configure NIC routes manually.

    Setting | Description
    preferDriver (driver type) | The eRDMA driver mode. default: standard acceleration. compat: RoCE-compatible environments. ofed: GPU instances running NCCL. See Enable eRDMA.
    Specifies whether to assign all eRDMA devices of nodes to pods | True: allocates all eRDMA devices on the node to the pod. False: allocates one device based on NUMA topology. False requires the static CPU management policy. See Create and manage node pools.
  3. Verify the controller is running. Go to Workloads > Pods, set the namespace to ack-erdma-controller, and confirm all pods are running.
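
The console check in the last step can also be done from the command line. The sketch below assumes kubectl access to the cluster; the helper name all_pods_running is hypothetical, and it relies on STATUS being the third column of the default kubectl get pods output.

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# (columns: NAME READY STATUS RESTARTS AGE) and succeeds only when at least
# one pod is listed and every pod reports the Running status.
all_pods_running() {
  awk '$3 != "Running" { bad = 1 } END { if (NR == 0 || bad) exit 1 }'
}

# Assumed usage against a live cluster (not executed here):
#   kubectl get pods -n ack-erdma-controller --no-headers | all_pods_running \
#     && echo "ACK eRDMA Controller pods are running"
```

The helper treats an empty pod list as a failure, so a missing installation is not mistaken for a healthy one.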

Step 2: Enable eRDMA in pods

Add the following configurations to your pod spec to enable eRDMA acceleration.

Enable eRDMA device access

Declare the aliyun/erdma resource in the container's resource limits:

spec:
  containers:
  - name: erdma-container
    resources:
      limits:
        aliyun/erdma: 1

After the pod starts, verify eRDMA devices are available:

/# ls /dev/infiniband/
rdma_cm  uverbs0
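
The same check can be scripted, for example as a startup or readiness command. This is a sketch under assumptions: the helper name has_erdma_devices is hypothetical, and it only tests for the two entries shown in the listing above (rdma_cm and at least one uverbs device).

```shell
# Hypothetical helper: succeeds when the given directory (default
# /dev/infiniband) contains rdma_cm and at least one uverbs* entry.
has_erdma_devices() {
  dir=${1:-/dev/infiniband}
  [ -e "$dir/rdma_cm" ] || return 1
  for dev in "$dir"/uverbs*; do
    # An unmatched glob stays literal, so confirm the entry really exists.
    [ -e "$dev" ] && return 0
  done
  return 1
}

if has_erdma_devices; then
  echo "eRDMA devices are visible in this container"
else
  echo "no eRDMA devices found under /dev/infiniband"
fi
```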

Enable transparent TCP acceleration with SMC-R

SMC-R transparently accelerates TCP connections over RDMA without code changes. After enabling eRDMA device access, add the following annotation:

metadata:
  annotations:
    network.alibabacloud.com/erdma-smcr: "true"

Note

  • Both ends of the TCP connection must have SMC-R enabled.

  • Supported only on Alibaba Cloud Linux 3 with kernel version 5.10.134-17 or later. See Alibaba Cloud Linux 3 image release notes.

  • Not supported with preferDriver set to ofed or compat.

  • eRDMA and SMC-R do not support IPv6. SMC-R falls back to TCP for IPv6 connections.
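
The kernel requirement in the note can be checked on a node before SMC-R is enabled. The following sketch compares only the version numbers; it does not confirm that the node runs Alibaba Cloud Linux 3. The helper name kernel_at_least is hypothetical.

```shell
# Hypothetical helper: compares the first four numeric fields of a kernel
# release string (major.minor.patch-build) against a minimum version.
# It checks the version only; it does not verify the distribution.
kernel_at_least() {
  printf '%s %s\n' "$1" "$2" | awk '{
    split($1, have, /[.-]/)
    split($2, want, /[.-]/)
    for (i = 1; i <= 4; i++) {
      if (have[i] + 0 > want[i] + 0) exit 0
      if (have[i] + 0 < want[i] + 0) exit 1
    }
    exit 0
  }'
}

# Compare the running kernel with the minimum stated in the note.
if kernel_at_least "$(uname -r)" 5.10.134-17; then
  echo "kernel version meets the SMC-R minimum"
else
  echo "kernel version is below the SMC-R minimum"
fi
```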

Use cases

Accelerate NCCL communication for GPU distributed training

Use this scenario for GPU-accelerated instances running distributed training. Pods use the ofed driver and request GPU and eRDMA resources.

  1. Install the ACK eRDMA Controller with preferDriver set to ofed.

  2. Add GPU nodes to the node pool. See Create and manage node pools.

  3. Install eRDMA packages in the container image at build time. For Debian or Ubuntu, replace the {OS|ubuntu} and {Version|focal} placeholders with your distribution name and release codename, for example ubuntu and focal:

    wget -qO - https://mirrors.aliyun.com/erdma/GPGKEY | apt-key add - \
      && echo "deb [ arch=amd64 ] https://mirrors.aliyun.com/erdma/apt/{OS|ubuntu} {Version|focal}/erdma main" \
        | tee /etc/apt/sources.list.d/erdma.list \
      && apt update \
      && apt install -y libibverbs1 ibverbs-providers ibverbs-utils librdmacm1

    For Alibaba Cloud Linux or RHEL:

    cat > /etc/yum.repos.d/erdma.repo <<EOF
    [erdma]
    name = ERDMA Repository
    baseurl = http://mirrors.aliyun.com/erdma/yum/redhat/7/erdma/x86_64/
    gpgcheck = 0
    enabled = 1
    EOF
    yum install --disablerepo=* --enablerepo erdma -y libibverbs ibverbs-providers ibverbs-utils librdmacm
  4. Deploy the GPU application. This StatefulSet runs nccl-test across two replicas, each with 8 GPUs and one eRDMA device:

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: nccltest
    spec:
      selector:
        matchLabels:
          app: nccltest
      serviceName: "nccltest"
      replicas: 2
      template:
        metadata:
          labels:
            app: nccltest
        spec:
          hostNetwork: true
          dnsPolicy: ClusterFirstWithHostNet
          containers:
          - env:
            - name: NCCL_SOCKET_IFNAME
              value: "eth0"
            - name: NCCL_DEBUG
              value: "INFO"
            - name: NCCL_IB_GID_INDEX
              value: "1"
            image: <nccl-test-image-with-erdma>
            imagePullPolicy: Always
            name: nccltest
            securityContext:
              privileged: true
            resources:
              limits:
                nvidia.com/gpu: "8"
                aliyun/erdma: "1"
              requests:
                nvidia.com/gpu: "8"
                aliyun/erdma: "1"
  5. Verify NCCL is using eRDMA. The logs should show erdma_0 and erdma_1 as active, confirming eRDMA acceleration.
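
The log check in the last step can be reduced to a search for eRDMA device names. This is a rough sketch: the helper name erdma_devices_in_log is hypothetical, and it only looks for names of the form erdma_<n> without interpreting the rest of the NCCL output. The pod name nccltest-0 follows from the StatefulSet name in the manifest above.

```shell
# Hypothetical helper: reads log text on stdin and prints each distinct
# eRDMA device name (erdma_<n>) it finds, one per line.
erdma_devices_in_log() {
  grep -oE 'erdma_[0-9]+' | sort -u
}

# Assumed usage against a live cluster (not executed here):
#   kubectl logs nccltest-0 | erdma_devices_in_log
```

Empty output suggests that NCCL did not select an eRDMA device for that pod.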

Accelerate application networks transparently with SMC-R

Use this scenario for standard TCP workloads that need eRDMA acceleration without code changes. Set preferDriver to default.

  1. Install the ACK eRDMA Controller with preferDriver set to default.

  2. Deploy the application with the SMC-R annotation. This Deployment enables eRDMA device access and SMC-R acceleration:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: app-with-erdma
      name: app-with-erdma
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: app-with-erdma
      template:
        metadata:
          labels:
            app: app-with-erdma
          annotations:
            network.alibabacloud.com/erdma-smcr: "true"
        spec:
          containers:
          - image: <application image>
            imagePullPolicy: Always
            name: app-with-erdma
            resources:
              limits:
                aliyun/erdma: 1
  3. Verify acceleration. Install smc-tools in the container and run smcss:

    /# smcss
    State          UID   Inode   Local Address           Peer Address            Intf Mode
    ACTIVE         00000 0059964 172.17.192.73:47772     172.17.192.10:80        0000 SMCR

    SMCR in the Mode column confirms eRDMA acceleration is active. If it shows TCP, verify both client and server pods have the SMC-R annotation set to "true".
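
When a pod has many connections, counting the accelerated ones is easier than reading the table by eye. The sketch below assumes the column layout shown in the smcss output above, with Mode as the last column; the helper name count_smcr_connections is hypothetical.

```shell
# Hypothetical helper: reads smcss output on stdin and prints how many
# connections report SMCR in the last (Mode) column.
count_smcr_connections() {
  awk '$NF == "SMCR" { n++ } END { print n + 0 }'
}

# Assumed usage inside the container (not executed here):
#   smcss | count_smcr_connections
```

A result of 0 while connections are open means that every connection fell back to TCP.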

Next steps