Enable eRDMA


Set up eRDMA on a supported instance type by installing the driver and binding an Elastic RDMA Interface (ERI) for ultra-low-latency RDMA networking.

Enable eRDMA on an ECS instance

Step 1: Confirm instance and image support

Only certain instance types and images support eRDMA:

Step 2: Install the eRDMA driver

Important
  • The eRDMA driver is developed and maintained by Alibaba Cloud.

  • Driver installation may take some time.

  • Driver installation packages

    Release notes for eRDMA installation packages (sorted by version from latest to oldest)

    Version

    Release date

    Download URL

    Checksum

    Changes

    1.5.7

    2026-03-31

    http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.5.7.tar.gz

    • MD5: 926e4625813d8f57a2f1dff04ae07087

    • SHA256: cb44f4c4aaaabfbd81224c85a9e7889732bda26cc832ff70ecc0559bcc14165c

    Set MPCC as the default congestion control (CC) algorithm in compat mode.
    Fixed an issue where the Memory Region (MR) length was abnormal when registering a 4 GB MR with 1 GB Hugepages.

    1.5.6

    2026-03-30

    http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.5.6.tar.gz

    • MD5: fdba4eed123b4c16c37f14ebc38cef63

    • SHA256: a0529f5ca597ffb31a960c59b8c13ce854550c3512e6ce4c0542784f8e99fcc2

    Fixed an RDMA_CM connection establishment failure in standard mode caused by an uninitialized iw_ifname or iwcm->ifname.
    Fixed a deadlock that occurred when switching the network namespace (netns) of an ENI.
    Fixed an issue where ibv_create_qp returned -EINVAL in NCCL version 2.29.

    1.5.5

    2026-03-09

    http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.5.5.tar.gz

    • MD5: f19e223bce6e5a87a635f9a08f47366c

    • SHA256: e62dd73078ba2299cc943333748982281b6b5cf6a18724648788a110abba3d35

    Added support for Rocky Linux 9.2.
    Fixed an RDMA_CM connection failure in compat mode that occurred when the ENI was not in init_net.

    1.5.4

    2026-01-04

    http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.5.4.tar.gz

    • MD5: fef18757361ceb339cc64fc3033b5a97

    • SHA256: 25cc5fc527075b1845306594a0e2da120885a9fa500fc85bf4ad8f48b7bf82ab

    Added support for Debian 12.9.
    Added a feature to collect statistics on remaining hardware Memory Translation Table (MTT) entries.

    1.5.3

    2025-12-10

    http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.5.3.tar.gz

    • MD5: 1b793a96d004915c9ddaff2c49072f11

    • SHA256: 99929345c65988b61997e805d85af2e761fa3319e4b5fc0db6781099fe512ba5

    Added support for DeepEP.
    Introduced a QPN domain to control the port number allocation policy in compat mode.

    1.5.2

    2025-11-10

    http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.5.2.tar.gz

    • MD5: da25bc06f1486baf0354cee3bafa1df8

    • SHA256: fc9648498846b4b949cc697f9c9c00eca2766c0e27bc0cc5cfdded5633243a35

    Fixed an issue where a 4,096-byte page size was used during Hugepages registration.

    1.5.0

    2025-09-26

    http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.5.0.tar.gz

    • MD5: 51e5c315543bba7ea5e79dfed00ad000

    • SHA256: 3bca1e6579a32b313e56cdf6add37de3605b5eb1bff8e0f7da81b0c8c3ee186e

    Added support for Alibaba Cloud Linux 4.
    Fixed a loopback failure for Unreliable Datagram (UD) Queue Pairs (QPs).
    Upgraded rdma-core to the community stable version 56.2.
    Optimized the installation package size.


    1.4.6

    2025-06-20

    http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.4.6.tar.gz

    • MD5: bd4b30f40fb02467298400fdc0e43d0a

    • SHA256: 26d555e6d7883f5315f6aab02a9e4c5564e53dd1b9840d3fa65dba59c594f484

    Added support for Rocky Linux 9.3.

    1.4.5

    2025-04-29

    http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.4.5.tar.gz

    • MD5: 37c89059d005aebe5d8bbde530b5bf56

    • SHA256: 83d810301f9141ca6f387a7e0cf99c89f40b27d484a312eb1d1bd605ebf8bc28

    • Added support for user-mode QP flush.

    • Enhanced the rdma-core core library.

    • Added support for Rocky Linux 8.

    1.4.3

    2025-03-13

    http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.4.3.tar.gz

    • MD5: 417d2fb88af5832475c7285187f57c11

    • SHA256: f82c1eb7a5f93387185a6c0ce7a78c39495d8a07a3e2ee8248cec8b9d525ba2f

    • Added support for MLNX_OFED 24.10.

    • Added support for Ubuntu 24.04.

    1.4.0

    2024-09-27

    http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.4.0.tar.gz

    • MD5: 77135d946dddc015000c8f3ea4e6c586

    • SHA256: 8613d3d81e8eb3b78bf840c37cbe02c79f62631df36cdc8b2c7c101f49f5af29

    Optimized performance in heterogeneous GPU scenarios.

    1.3.3

    2023-10-09

    http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.3.3.tar.gz

    • MD5: 51ffb06266255139554275bc86fa4caa

    • SHA256: 5aad6d006662bd902ef5e913fb97d2a6623aadeeacd06f1c3f1c74cbd1f57ded

    Updated with the latest patches.

    1.3.2

    2023-09-08

    http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.3.2.tar.gz

    • MD5: 8492016fc96eece6a60687b0e4ea66dd

    • SHA256: 89ab265dc9fa8d56f1b2d8b13d7f50032390a265eddb2e04eeee3aa86fd169ce

    Updated with the latest patches.

    1.3.1

    2023-08-18

    http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.3.1.tar.gz

    • MD5: b9b90212e6ba49d57b81d3c5d4210deb

    • SHA256: 4ebe31760443613f8f61fcdbef7a85b277dabc59039d048898536ea4fe5d8d4a

    Configured the underlying transport mode in the driver for strict in-order delivery: data packets are committed to memory only in sequence.

    1.3.0

    2023-06-26

    http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.3.0.tar.gz

    • MD5: 2da0c65643b5e2ffb61d75e1b5e5a7ab

    • SHA256: cce03aac0e07d0890884c35ad4f10e9d15f587535d788c8fc97ea268312ad4a9

    • Added support for multi-level page tables for MR registration.

    • Added support for IPv6. Full IPv6 functionality also requires hardware support.

    • Added support for Ubuntu 22.04.

    • Updated with the latest patches.

    1.2.3

    2023-05-30

    http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.2.3.tar.gz

    • MD5: 7496a6324f3872469d7194c2e234b19f

    • SHA256: 16c2de0d90da6906db91c2e2469aaad9e24131c44ce52b9464036f1c3747f8a2

    Updated with the latest patches.

    1.2.2

    2023-05-04

    http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.2.2.tar.gz

    • MD5: f449d3961a41ff6a97a53cfa29e20d6c

    • SHA256: 11fdb4b3c778762ad0bdf2d0327008aa2ecb22dc508c9f9fae3568b41ae5462b

    Added support for Ubuntu 22.04.

    1.2.1

    2023-04-04

    http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.2.1.tar.gz

    • MD5: e080103934da76ce83924da789aecece

    • SHA256: be3a89e57143d7544cf968052250df92f911aebb035f07b06ebeb8c5f13bf976

    Updated with the latest patches.

    1.2.0

    2023-03-09

    http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.2.0.tar.gz

    • MD5: c8d440a6e35ec6d2aaf1a568affea876

    • SHA256: d484997e28e29f862dc580c112b55b389a00faf88dc6aa89eea588ee1369a8ca

    • Added support for compat mode.

    • Updated with the latest patches.

    1.1.0

    2023-01-16

    http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-1.1.0.tar.gz

    • MD5: 1fea69d819919a77384f902213eb681e

    • SHA256: 176c3bb35d5584e8c8e43eba9b1824b8cb2b43a19d802c4e469363ed8e33fea6

    Updated with the latest patches.

  • Install the eRDMA driver

    Install the eRDMA driver automatically during instance creation, or manually on an existing instance.

  • eRDMA kernel driver

    After installing the eRDMA driver, run eadm ver to check the eRDMA kernel driver version. For example, installer package 1.4.5 corresponds to kernel driver version 0.2.38.


    The eRDMA kernel driver supports two installation modes, standard mode and compat mode, which determine the available connection establishment methods. See RDMA_CM for details on eRDMA connection establishment.
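Before installing, it is worth confirming that a downloaded package matches its published checksum. The following Python sketch computes a file's SHA256 and compares it with values copied from the release table above. The helper names (`sha256_of`, `checksum_matches`, `verify_installer`) are illustrative, and only two table entries are included; extend the dictionary for other versions.

```python
import hashlib
from pathlib import Path

# SHA256 values copied from the release-notes table (extend as needed).
PUBLISHED_SHA256 = {
    "erdma_installer-1.5.7.tar.gz": "cb44f4c4aaaabfbd81224c85a9e7889732bda26cc832ff70ecc0559bcc14165c",
    "erdma_installer-1.5.6.tar.gz": "a0529f5ca597ffb31a960c59b8c13ce854550c3512e6ce4c0542784f8e99fcc2",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so a large archive is never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(file_name: str, actual_hex: str, published: dict[str, str]) -> bool:
    """Compare a computed digest with the published one; an unknown file name is an error."""
    if file_name not in published:
        raise KeyError(f"no published SHA256 for {file_name}")
    return actual_hex.lower() == published[file_name].lower()


def verify_installer(path: Path) -> bool:
    """Check a downloaded installer against the SHA256 values recorded above."""
    return checksum_matches(path.name, sha256_of(path), PUBLISHED_SHA256)
```

For example, after downloading erdma_installer-1.5.7.tar.gz from its Download URL, `verify_installer(Path("erdma_installer-1.5.7.tar.gz"))` should return True before you extract the archive. The MD5 values in the table could be checked the same way with `hashlib.md5`, but SHA256 is the stronger check.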

Step 3: Bind an ERI to the instance

Enable the eRDMA interface on the primary ENI during instance creation, or bind an ERI to an existing instance.

Note

You can call the DescribeInstanceTypes API operation to query the maximum number of ERIs that an instance type supports. Check the value of the EriQuantity parameter in the response. A value of 0 indicates that the instance type does not support ERIs.
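The EriQuantity check can be scripted once you have the DescribeInstanceTypes response as JSON (for example, from the Alibaba Cloud CLI or an SDK). This Python sketch assumes the response lists instance types under InstanceTypes.InstanceType, which you should confirm against the API reference; the sample response and the instance type names in it are hypothetical placeholders.

```python
import json

# Hypothetical, trimmed DescribeInstanceTypes response; instance type names are placeholders.
SAMPLE_RESPONSE = json.dumps({
    "InstanceTypes": {
        "InstanceType": [
            {"InstanceTypeId": "ecs.example-eri", "EriQuantity": 2},
            {"InstanceTypeId": "ecs.example-no-eri", "EriQuantity": 0},
        ]
    }
})


def eri_quantities(response_json: str) -> dict[str, int]:
    """Map InstanceTypeId -> EriQuantity; a missing EriQuantity is treated as 0."""
    data = json.loads(response_json)
    entries = data.get("InstanceTypes", {}).get("InstanceType", [])
    return {e["InstanceTypeId"]: int(e.get("EriQuantity", 0)) for e in entries}


def supports_eri(response_json: str, instance_type: str) -> bool:
    """An EriQuantity of 0 (or an instance type absent from the response) means no ERI support."""
    return eri_quantities(response_json).get(instance_type, 0) > 0
```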

  • Create a secondary ENI with the eRDMA interface enabled and bind it to an instance by using an API operation

    Create and bind a secondary ENI by using API operations:

    1. Call an API operation to create an ERI.

      Call the CreateNetworkInterface operation and set the NetworkInterfaceTrafficMode parameter to HighPerformance to create an elastic network interface with the ERI feature enabled.

      Record the ENI ID returned in the NetworkInterfaceId parameter.

    2. Call AttachNetworkInterface. Set NetworkInterfaceId to the ID from the previous step and InstanceId to the target instance ID to bind the ERI-enabled ENI.

      Important

      If your instance type supports multiple ERIs, specify a different NetworkCardIndex for each ERI when you bind them, so that the ERIs are bound to different channels and network bandwidth is maximized. See Network card indexes.
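The two API calls above can be prepared as plain request-parameter dictionaries, one ERI per network card. This Python sketch does not call any SDK. NetworkInterfaceTrafficMode, NetworkInterfaceId, InstanceId, and NetworkCardIndex come from the steps above, while RegionId, VSwitchId, and SecurityGroupId are assumed CreateNetworkInterface request parameters. All IDs and the card index list are hypothetical; take the valid NetworkCardIndex values from your instance type.

```python
from dataclasses import dataclass


@dataclass
class EriRequests:
    """Request parameters for one ERI; NetworkInterfaceId is known only after creation."""
    create: dict[str, object]
    attach: dict[str, object]


def plan_eris(region_id: str, instance_id: str, vswitch_id: str,
              security_group_id: str, card_indexes: list[int]) -> list[EriRequests]:
    """Build CreateNetworkInterface / AttachNetworkInterface parameters, one ERI per network card."""
    if len(set(card_indexes)) != len(card_indexes):
        raise ValueError("bind each ERI to a different NetworkCardIndex")
    plans = []
    for index in card_indexes:
        create = {
            "RegionId": region_id,
            "VSwitchId": vswitch_id,
            "SecurityGroupId": security_group_id,
            "NetworkInterfaceTrafficMode": "HighPerformance",  # enables the ERI feature
        }
        attach = {
            "RegionId": region_id,
            "InstanceId": instance_id,
            "NetworkCardIndex": index,  # distinct per ERI so each lands on its own channel
        }
        plans.append(EriRequests(create, attach))
    return plans


def attach_params(plan: EriRequests, network_interface_id: str) -> dict[str, object]:
    """Merge the NetworkInterfaceId returned by CreateNetworkInterface into the attach request."""
    return {**plan.attach, "NetworkInterfaceId": network_interface_id}
```

Rejecting duplicate card indexes up front encodes the guidance above, and `attach_params` returns a new dictionary rather than mutating the plan, so a plan can be reused if an attach call has to be retried.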

Verify eRDMA configuration

Run ibv_devinfo to check device hardware properties, port statuses, and supported features. If RDMA components work correctly, the ibv_devinfo output shows at least one port as PORT_ACTIVE. See Testing RDMA operations.

Use ibv_devinfo with the -v flag for detailed device information, such as hardware version, maximum message size, queue count, and memory window size.

The ibv_devinfo output indicates one of the following cases:

  • eRDMA is configured correctly: The eRDMA interface is enabled on the ENIs and the driver is installed correctly.

    Note
    • If your instance has multiple ERIs attached, the state of every eRDMA device port should be PORT_ACTIVE.

    • If state shows invalid state, the eRDMA network interface is abnormal. Check whether the secondary ENI is configured correctly, for example by running ifconfig to verify all network interfaces and IP addresses. See Configure a secondary elastic network interface.


  • eRDMA driver is not installed correctly: The driver is missing or improperly installed. See Step 2: Install the eRDMA driver.


  • No ERI is bound to the instance: The driver is installed, but no ERI is enabled on the ENI. See Step 3: Bind an ERI to the instance.


You can also use the diagnose tool to test eRDMA. See Use the diagnose tool to check for RDMA-related issues and evaluate eRDMA performance.
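The ibv_devinfo checks described above can also be automated. This Python sketch parses plain ibv_devinfo text, which prints each device as an `hca_id:` block containing `port:` and `state:` lines, and reports ports that are not PORT_ACTIVE. The sample text and the device names erdma_0 and erdma_1 are illustrative, not output captured from a real instance.

```python
import re

# Trimmed, illustrative ibv_devinfo text (not captured from a real instance).
SAMPLE = (
    "hca_id:\terdma_0\n"
    "\tphys_port_cnt:\t\t\t1\n"
    "\t\tport:\t1\n"
    "\t\t\tstate:\t\t\tPORT_ACTIVE (4)\n"
    "hca_id:\terdma_1\n"
    "\tphys_port_cnt:\t\t\t1\n"
    "\t\tport:\t1\n"
    "\t\t\tstate:\t\t\tPORT_DOWN (1)\n"
)


def port_states(devinfo: str) -> dict[tuple[str, int], str]:
    """Map (hca_id, port number) -> state string such as PORT_ACTIVE."""
    states = {}
    hca, port = None, None
    for raw in devinfo.splitlines():
        line = raw.strip()
        if m := re.match(r"hca_id:\s*(\S+)", line):
            hca, port = m.group(1), None
        elif m := re.match(r"port:\s*(\d+)", line):
            port = int(m.group(1))
        elif (m := re.match(r"state:\s*(.+?)\s*(\(\d+\))?$", line)) and hca and port is not None:
            # Drop the trailing numeric code, e.g. "PORT_ACTIVE (4)" -> "PORT_ACTIVE".
            states[(hca, port)] = m.group(1)
    return states


def summarize(devinfo: str) -> str:
    """One-line verdict: all ports active, which ports are not, or no ports at all."""
    states = port_states(devinfo)
    if not states:
        return "no RDMA device ports listed"
    inactive = [f"{hca}:{port}" for (hca, port), s in states.items() if s != "PORT_ACTIVE"]
    if inactive:
        return "inactive ports: " + ", ".join(inactive)
    return "all ports PORT_ACTIVE"
```

On an instance, you could pass it `subprocess.run(["ibv_devinfo"], capture_output=True, text=True).stdout`. An empty result does not tell you whether the driver or the ERI is missing; use the cases above to tell them apart.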

Test eRDMA network performance

perftest measures the latency and bandwidth of RDMA operations, including send/receive, read, write, and atomic operations. Use it to evaluate RDMA device performance and diagnose network issues. See the perftest documentation.

perftest test suite

The perftest package includes the following bandwidth and latency tests:

  • Send: ib_send_bw (send bandwidth test) and ib_send_lat (send latency test)

  • RDMA Read: ib_read_bw (read bandwidth test) and ib_read_lat (read latency test)

  • RDMA Write: ib_write_bw (write bandwidth test) and ib_write_lat (write latency test)

  • RDMA Atomic: ib_atomic_bw (atomic bandwidth test) and ib_atomic_lat (atomic latency test)

  • Native Ethernet: raw_ethernet_bw (raw Ethernet bandwidth test) and raw_ethernet_lat (raw Ethernet latency test)

Install perftest

Install perftest from the official repository (requires a public IP address) or from a YUM/APT repository.

Official repository
  1. Assign a public IP address to the ECS instance.

  2. Download and install perftest from the official perftest repository.

YUM or APT repository
Note

The perftest versions in software repositories may differ across Linux distributions, which can cause compatibility issues between instances. Use the same distribution on all communicating instances. If that is not possible, install perftest from the official repository.

  • Alibaba Cloud Linux 3, CentOS, or Anolis OS

    sudo yum install perftest -y
  • Ubuntu

    sudo apt install perftest -y
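To pick the install command in a script, you can read the distribution ID from /etc/os-release. In this Python sketch, the ID values assumed for Alibaba Cloud Linux 3 (`alinux`), CentOS (`centos`), and Anolis OS (`anolis`) are assumptions; check /etc/os-release on your image before relying on them.

```python
import shlex

# Assumed /etc/os-release ID values for the distributions named above; verify on your image.
YUM_IDS = {"alinux", "centos", "anolis"}
APT_IDS = {"ubuntu"}


def parse_os_release(text: str) -> dict[str, str]:
    """Parse KEY=value lines; shlex removes the optional shell-style quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = " ".join(shlex.split(value))
    return fields


def perftest_install_command(os_release_text: str) -> str:
    """Return the yum or apt command for perftest, or fail for an unmapped distribution."""
    distro = parse_os_release(os_release_text).get("ID", "")
    if distro in YUM_IDS:
        return "sudo yum install perftest -y"
    if distro in APT_IDS:
        return "sudo apt install perftest -y"
    raise ValueError(f"no repository mapping for ID={distro!r}; install perftest from the official repository")
```

For example, passing the contents of /etc/os-release from an Ubuntu instance returns `sudo apt install perftest -y`.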

perftest usage example

Each test runs as an independent command. For example, ib_send_lat runs a send latency test.

Set the test parameters correctly to control perftest behavior and obtain accurate results. The following guidelines describe commonly used parameters.
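perftest tests run in pairs: start the command on the server instance first, then run the same command on the client with the server's IP address appended. This Python sketch only builds the two command lines. It assumes both instances expose the same device name (the hypothetical `erdma_0`, which you can read from the hca_id field of ibv_devinfo), and the server IP is a placeholder. `-d` is perftest's device-selection option and `-D` is its duration option.

```python
import shlex
from collections.abc import Sequence


def perftest_commands(program: str, device: str, server_ip: str,
                      options: Sequence[str] = ()) -> tuple[list[str], list[str]]:
    """Return (server_argv, client_argv) for one perftest run.

    The server starts first and waits; the client runs the same program with the
    same options plus the server's IP address as the final argument.
    """
    server = [program, "-d", device, *options]
    client = [*server, server_ip]
    return server, client


server, client = perftest_commands("ib_write_bw", "erdma_0", "192.168.0.10", ["-D", "10"])
print("server:", shlex.join(server))  # server: ib_write_bw -d erdma_0 -D 10
print("client:", shlex.join(client))  # client: ib_write_bw -d erdma_0 -D 10 192.168.0.10
```

Deriving the client command from the server command keeps the option lists identical on both ends, which avoids mismatched test parameters between the two sides.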

Common test parameters

Run <subcommand> -h to view test-specific parameters.

Latency test

  • -C, --report-cycles: Reports time in CPU cycles for accurate latency measurement.

  • -H, --report-histogram: Prints all results instead of the default summary, showing the data distribution.

  • -U, --report-unsorted: Prints unsorted results (sorted by default) for raw data distribution analysis.

Bandwidth test

  • -b, --bidirectional: Measures bidirectional bandwidth (default: unidirectional).

  • -N, --noPeak: Disables peak bandwidth calculation (enabled by default).

  • -t, --tx-depth=<dep>: Sets the send queue size. Default: 128.

  • -D, --duration=<sec>: Sets the test duration in seconds.

Send test

  • -r, --rx-depth=<dep>: Sets the receive queue size. Default: 512.

  • -g, --mcg=<num_of_qps>: Sends messages to a multicast group with <num_of_qps> QPs attached.

Other advanced options

  • -u, --qp-timeout=<timeout>: Sets the QP timeout, calculated as 4 usec * 2^(timeout). Default: 14.

  • --force-link=<type>: Forces a specific link type: IB or Ethernet.

  • --use_hugepages: Uses Hugepages for allocation instead of contig or memalign.

  • --rate_limit=<limit>: Sets the maximum packet sending rate. Default unit: Gbps. Use --rate_units to change the unit.
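The -u formula is easy to misread, so here is the arithmetic as a small Python sketch. It follows the 4 usec * 2^(timeout) description above (the verbs documentation states the base unit as 4.096 usec) and, as in the verbs API, treats 0 as "no timeout". With the default value of 14, the timeout is about 65.5 ms.

```python
from typing import Optional


def qp_timeout_usec(timeout: int) -> Optional[int]:
    """Retransmission timeout for perftest's -u value, using 4 usec * 2**timeout.

    The verbs QP timeout attribute is a 5-bit exponent (0..31); verbs treats 0 as
    "wait indefinitely", so None is returned for it.
    """
    if not 0 <= timeout <= 31:
        raise ValueError("timeout must be between 0 and 31")
    if timeout == 0:
        return None
    return 4 * 2 ** timeout


for t in (14, 18):
    print(t, qp_timeout_usec(t), "usec")  # 14 65536 usec, then 18 1048576 usec
```

Because each step of the exponent doubles the wait, raising -u from 14 to 18 multiplies the timeout by 16.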

References