Enable RDMA networking in pods for NCCL distributed training or SMC-R transparent TCP acceleration.
Install the ACK eRDMA Controller and enable eRDMA acceleration in pods. Two scenarios are covered: GPU distributed training with NCCL and transparent TCP acceleration with SMC-R.
Prerequisites
Ensure you have:
- An ACK cluster running Kubernetes 1.20 or later. Upgrade the cluster if needed.
- Nodes from instance families that support Elastic RDMA Interface (ERI), added to a node pool.
How it works
The ACK eRDMA Controller manages eRDMA-capable elastic network interfaces (ENIs) on cluster nodes. It registers eRDMA devices as the aliyun/erdma extended resource, configures ENI routes, and exposes devices to pods through Kubernetes resource requests.
Pods request eRDMA access by declaring aliyun/erdma in their resource limits. To also accelerate TCP connections without code changes, enable SMC-R through a pod annotation.
| Component | Role |
|---|---|
| ACK eRDMA Controller | Registers eRDMA devices as extended resources; manages ENI routes |
| eRDMA driver (default / compat / ofed) | Kernel-level driver installed on each node |
| aliyun/erdma resource | Kubernetes extended resource for pod device access |
| SMC-R | Linux protocol that transparently replaces TCP with RDMA transport |
Step 1: Install the ACK eRDMA Controller
If your cluster uses the Terway network plugin, configure a whitelist for ENIs to prevent Terway from modifying eRDMA-capable ENIs.
- On the Clusters page, click your cluster name, then click Add-ons.
- On the Add-ons page, click the Network tab. Find and install ACK eRDMA Controller with the following settings:

  | Setting | Description |
  |---|---|
  | preferDriver (driver type) | The eRDMA driver mode. default: standard acceleration. compat: RoCE-compatible environments. ofed: GPU instances running NCCL. See Enable eRDMA. |
  | Specifies whether to assign all eRDMA devices of nodes to pods | True: allocates all eRDMA devices on the node to the pod. False: allocates one device by NUMA topology. False requires the static CPU management policy. See Create and manage node pools. |

  Note: When a node has multiple NICs, the controller assigns eRDMA ENI routes a lower priority than same-CIDR NIC routes. The default priority is 200. Avoid conflicting with this priority when you configure NIC routes manually.
- Verify the controller is running. Go to Workloads > Pods, set the namespace to ack-erdma-controller, and confirm all pods are running.
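The same check can be scripted from a terminal. The following is a minimal sketch, assuming kubectl is already configured for the cluster. The helper name all_pods_running is hypothetical, and it only inspects the STATUS column of the kubectl get pods output, so it does not prove that each container is ready.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and succeeds only if at least one pod is listed and every pod's STATUS
# (third column) is "Running".
all_pods_running() {
  awk '
    NF == 0 { next }                 # ignore blank lines
    { total++ }
    $3 != "Running" { bad++ }
    END { exit (total > 0 && bad == 0) ? 0 : 1 }
  '
}

# Usage (requires cluster access):
#   kubectl get pods -n ack-erdma-controller --no-headers | all_pods_running \
#     && echo "controller pods are running"
```

An empty namespace counts as a failure here, which helps catch the case where the add-on was never installed.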
Step 2: Enable eRDMA in pods
Add the following configurations to your pod spec to enable eRDMA acceleration.
Enable eRDMA device access
Declare the aliyun/erdma resource in the container's resource limits:
    spec:
      containers:
      - name: erdma-container
        resources:
          limits:
            aliyun/erdma: 1
After the pod starts, verify eRDMA devices are available:
    /# ls /dev/infiniband/
    rdma_cm  uverbs0
Enable transparent TCP acceleration with SMC-R
SMC-R transparently accelerates TCP connections over RDMA without code changes. After enabling eRDMA device access, add the following annotation:
    metadata:
      annotations:
        network.alibabacloud.com/erdma-smcr: "true"
- Both ends of the TCP connection must have SMC-R enabled.
- Supported only on Alibaba Cloud Linux 3 with kernel version 5.10.134-17 or later. See Alibaba Cloud Linux 3 image release notes.
- Not supported when preferDriver is set to ofed or compat.
- eRDMA and SMC-R do not support IPv6. SMC-R falls back to TCP for IPv6 connections.
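The kernel requirement above can be checked before enabling the annotation. This is a rough sketch, assuming GNU sort (for the -V option) and a release string of the form major.minor.patch-build followed by an optional suffix. The helper name kernel_supports_smcr is hypothetical, and it compares the kernel version only; it does not confirm that the node runs Alibaba Cloud Linux 3.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the kernel release string in $1 is at
# least 5.10.134-17 (the minimum stated above). It does not check the
# distribution.
kernel_supports_smcr() (
  required="5.10.134-17"
  # Keep only the leading "major.minor.patch-build" part of the release,
  # dropping any trailing suffix such as a distribution or architecture tag.
  current=$(printf '%s\n' "$1" | sed -E 's/^([0-9]+\.[0-9]+\.[0-9]+-[0-9]+).*/\1/')
  # GNU sort -V orders version strings; the requirement is met when the
  # required version sorts first (or both are equal).
  lowest=$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)
  [ "$lowest" = "$required" ]
)

# Usage on a node or inside a pod:
#   kernel_supports_smcr "$(uname -r)" && echo "kernel is new enough for SMC-R"
```

Because pods share the node's kernel, running the check inside a pod reports the node's kernel release.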
Use cases
Accelerate NCCL communication for GPU distributed training
Use this scenario for GPU-accelerated instances running distributed training. Pods use the ofed driver and request GPU and eRDMA resources.
- Install the ACK eRDMA Controller with preferDriver set to ofed.
- Add GPU nodes to the node pool. See Create and manage node pools.
- Install eRDMA packages in the container image at build time. For Debian or Ubuntu (replace {OS|ubuntu} and {Version|focal} with your values):

      wget -qO - https://mirrors.aliyun.com/erdma/GPGKEY | apt-key add - \
        && echo "deb [ arch=amd64 ] https://mirrors.aliyun.com/erdma/apt/{OS|ubuntu} {Version|focal}/erdma main" \
        | tee /etc/apt/sources.list.d/erdma.list \
        && apt update \
        && apt install -y libibverbs1 ibverbs-providers ibverbs-utils librdmacm1

  For Alibaba Cloud Linux or RHEL:

      cat > /etc/yum.repos.d/erdma.repo <<EOF
      [erdma]
      name = ERDMA Repository
      baseurl = http://mirrors.aliyun.com/erdma/yum/redhat/7/erdma/x86_64/
      gpgcheck = 0
      enabled = 1
      EOF
      yum install --disablerepo=* --enablerepo erdma -y libibverbs ibverbs-providers ibverbs-utils librdmacm

- Deploy the GPU application. This StatefulSet runs nccl-test across two replicas, each with 8 GPUs and one eRDMA device:

      apiVersion: apps/v1
      kind: StatefulSet
      metadata:
        name: nccltest
      spec:
        selector:
          matchLabels:
            app: nccltest
        serviceName: "nccltest"
        replicas: 2
        template:
          metadata:
            labels:
              app: nccltest
          spec:
            hostNetwork: true
            dnsPolicy: ClusterFirstWithHostNet
            containers:
            - env:
              - name: NCCL_SOCKET_IFNAME
                value: "eth0"
              - name: NCCL_DEBUG
                value: "INFO"
              - name: NCCL_IB_GID_INDEX
                value: "1"
              image: <nccl-test-image-with-erdma>
              imagePullPolicy: Always
              name: nccltest
              securityContext:
                privileged: true
              resources:
                limits:
                  nvidia.com/gpu: "8"
                  aliyun/erdma: "1"
                requests:
                  nvidia.com/gpu: "8"
                  aliyun/erdma: "1"

- Verify NCCL is using eRDMA. The logs should show erdma_0 and erdma_1 as active, confirming eRDMA acceleration.
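The log check can be sketched as a small filter. The exact NCCL log format is not shown here, so the hypothetical helper logs_mention_erdma only looks for the two device names mentioned above. A match shows that the names appear in the logs, not that NCCL marks them as active, so read the matching lines as well.

```shell
#!/bin/sh
# Hypothetical helper: reads log text on stdin and succeeds only if both
# device names mentioned above, erdma_0 and erdma_1, appear in it.
logs_mention_erdma() {
  awk '
    /erdma_0/ { seen0 = 1 }
    /erdma_1/ { seen1 = 1 }
    END { exit (seen0 && seen1) ? 0 : 1 }
  '
}

# Usage (requires cluster access; nccltest-0 is the first pod that the
# StatefulSet named nccltest creates):
#   kubectl logs nccltest-0 | logs_mention_erdma && echo "logs mention both eRDMA devices"
```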
Accelerate application networks transparently with SMC-R
Use this scenario for standard TCP workloads that need eRDMA acceleration without code changes. Set preferDriver to default.
- Install the ACK eRDMA Controller with preferDriver set to default.
- Deploy the application with the SMC-R annotation. This Deployment enables eRDMA device access and SMC-R acceleration:

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        labels:
          app: app-with-erdma
        name: app-with-erdma
      spec:
        replicas: 2
        selector:
          matchLabels:
            app: app-with-erdma
        template:
          metadata:
            labels:
              app: app-with-erdma
            annotations:
              network.alibabacloud.com/erdma-smcr: "true"
          spec:
            containers:
            - image: <application image>
              imagePullPolicy: Always
              name: app-with-erdma
              resources:
                limits:
                  aliyun/erdma: 1

- Verify acceleration. Install smc-tools in the container and run smcss:

      /# smcss
      State   UID    Inode    Local Address        Peer Address      Intf  Mode
      ACTIVE  00000  0059964  172.17.192.73:47772  172.17.192.10:80  0000  SMCR

  SMCR in the Mode column confirms eRDMA acceleration is active. If it shows TCP, verify both client and server pods have the SMC-R annotation set to "true".
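When many connections are open, counting the accelerated ones is easier than reading the table. This is a minimal sketch, assuming the smcss layout shown above (one header line, Mode as the last column). The helper name count_smcr_connections is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `smcss` output on stdin and prints how many
# connections report SMCR in the Mode column. Assumes one header line and
# Mode as the last column, as in the sample output above.
count_smcr_connections() {
  awk 'NR > 1 && $NF == "SMCR" { n++ } END { print n + 0 }'
}

# Usage inside the container:
#   smcss | count_smcr_connections
```

A result of 0 while connections are open suggests that they fell back to TCP.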
Next steps
- ACK eRDMA Controller: component reference and configuration details
- Enable eRDMA: driver type selection guide
- Create and manage node pools: add eRDMA-capable nodes and configure CPU policies