Some enterprise-level instances support eRDMA, which provides an ultra-low-latency, high-throughput, and highly elastic RDMA networking service without requiring changes to your existing network architecture. This topic describes how to enable eRDMA on these instances.
Limitations
Limitation | Description |
Instance type | The following instance types support eRDMA: |
Image |
Note: The available images vary by instance type. The instance buy page displays only the images that are compatible with the selected instance type. |
Number of eRDMA devices | To determine the maximum number of ERIs that an instance type supports, check the value of the EriQuantity parameter in the response of the DescribeInstanceTypes operation. A value of 0 indicates that the instance type does not support ERIs. |
Network limitations |
|
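The EriQuantity check described in the table above can be scripted. The following is a minimal sketch, not an official tool: the helper names `eri_quantity` and `supports_eri` are hypothetical, the sample JSON is illustrative rather than real API output, and the commented invocation assumes that the aliyun CLI is installed and configured.

```shell
#!/usr/bin/env bash
# Sketch: read EriQuantity from a DescribeInstanceTypes response.
# Helper names are hypothetical; the sample JSON below is illustrative only.

# Extract the first EriQuantity value from JSON on stdin (prints nothing if absent).
eri_quantity() {
  grep -o '"EriQuantity"[[:space:]]*:[[:space:]]*[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Return 0 if the given quantity is greater than 0 (ERIs are supported).
supports_eri() {
  [ "${1:-0}" -gt 0 ]
}

# Illustrative response fragment (assumed shape, placeholder instance type).
sample='{"InstanceTypes":{"InstanceType":[{"InstanceTypeId":"ecs.example.large","EriQuantity":2}]}}'
qty="$(printf '%s\n' "$sample" | eri_quantity)"

if supports_eri "$qty"; then
  echo "EriQuantity=$qty, ERI supported"
else
  echo "ERI not supported"
fi

# Against the real API (assumes a configured aliyun CLI; replace the placeholder):
#   aliyun ecs DescribeInstanceTypes --InstanceTypes.1 <instance_type> | eri_quantity
```

A text match is used instead of a JSON parser so that the sketch has no extra dependencies. If a JSON tool is available on your system, parsing the response properly is more robust.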
Configure eRDMA for an enterprise-level instance
On instance creation
If the operating system does not support automatic installation of the eRDMA driver, or if the automatic installation fails, you can install the driver manually or by using a script after the instance is created. For more information, see Configure eRDMA for an existing instance.
The eRDMA driver installation begins after the instance starts and may take some time.
Create an enterprise-level instance that supports the ERI feature. Note the following parameters when you create the instance. For information about other parameters, see Create an instance by using the wizard.
Instances & Images: Select an instance type that supports eRDMA and install the eRDMA driver.
Instance: See Limitations.
Images: Select Public Image.
Extension: Select eRDMA Driver. The eRDMA driver is automatically installed when the instance starts. When you create an Arm-based instance that uses an Alibaba Cloud Linux image, you can also select performance-acceleration extensions. For more information, see Application performance acceleration.
Important: To use the extension feature, you must have the AliyunECSExtensionsFullAccess system policy. An Alibaba Cloud account has this permission by default. If you are a RAM user, ask an Alibaba Cloud account administrator to grant you this permission. For more information, see Extensions.
ENIs: To the right of Primary ENI, enable the ERI feature to bind an ERI to the ECS instance.
Note: When you create an enterprise-level instance, you can enable the ERI feature only for the primary ENI. To configure eRDMA for a secondary ENI, enable the ERI feature for the secondary ENI in the console or by calling an API operation. For more information, see Elastic RDMA network interface (ERI).
For an existing instance
Confirm that the instance type is on the list of eRDMA-supported instance types.
You must select an instance type that supports eRDMA.
Verify that eRDMA is correctly configured on the instance.
For more information, see Verify the eRDMA configuration.
If eRDMA is not configured on the instance, follow the steps below to install the eRDMA driver and bind an ERI to the instance.
Install the eRDMA driver for the instance.
If you did not select eRDMA Driver when you created the instance, you must install the eRDMA driver manually or by using a script.
Script-based method: This method downloads the latest stable driver package by default.
Manual method: You can download a specific version of the driver package.
One-click script installation
Run the following command to download the installation script, which fetches the latest stable version of the driver package.
curl -O http://mirrors.cloud.aliyuncs.com/erdma/env_setup.sh
Run the following command to run the installation script.
sudo /bin/bash env_setup.sh > /var/log/erdma_install.log 2>&1
The script automatically installs the required software dependencies and the eRDMA driver. Wait for the installation to complete.
Note: If the driver installation fails, check the installation log at /var/log/erdma_install.log.
Manual installation
Run the following command to update prerequisite packages.
Alibaba Cloud Linux 3, CentOS, or Anolis OS:
sudo yum update -y
Ubuntu: No update is required. You can skip this step.
Run the following commands in sequence to check the latest kernel package version and the running kernel version.
rpm -qa | grep kernel    # Check the latest kernel package version.
uname -r                 # Check the running kernel version.
In the following sample output, the versions are inconsistent: the latest installed kernel package is 5.10.134-16.1.al8.x86_64, but the running kernel is 5.10.134-15.al8.x86_64. In this case, you must restart the instance to apply the new kernel. After the restart, run the uname -r command again to confirm that the running kernel version is the same as the latest kernel package version.
[ecs-user@iZbp1xxxxxxxxxxxxx ~]$ rpm -qa | grep kernel
kernel-tools-5.10.134-16.1.al8.x86_64
kernel-5.10.134-16.1.al8.x86_64
kernel-modules-extra-5.10.134-16.1.al8.x86_64
kernel-hotfix-13383560-5.10.134-15-1.0-20230724161633.al8.x86_64
kernel-devel-5.10.134-16.1.al8.x86_64
kernel-5.10.134-15.al8.x86_64
kernel-modules-5.10.134-16.1.al8.x86_64
kernel-tools-libs-5.10.134-16.1.al8.x86_64
kernel-core-5.10.134-16.1.al8.x86_64
kernel-core-5.10.134-15.al8.x86_64
kernel-modules-internal-5.10.134-16.1.al8.x86_64
kernel-devel-5.10.134-15.al8.x86_64
kernel-modules-5.10.134-15.al8.x86_64
kernel-headers-5.10.134-16.1.al8.x86_64
[ecs-user@iZbp1xxxxxxxxxxxxx ~]$ uname -r
5.10.134-15.al8.x86_64
Run the following command to install dependency packages.
For x86 instances, perform the following operations:
Alibaba Cloud Linux 3, CentOS, or Anolis OS:
sudo yum install gcc-c++ dkms cmake kernel-devel kernel-headers libnl3 libnl3-devel
Ubuntu:
sudo apt-get install dkms cmake libnl-3-dev libnl-route-3-dev linux-headers-generic
For Arm instances, the driver is built from source, and the required dependency packages are numerous and may change. You can therefore skip this step and run the installation script directly. If the installation script fails, it lists the dependency packages that are missing. Install them as prompted, and then run the installation script again.
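The kernel consistency check from the earlier step can be automated on RPM-based distributions. The following is a minimal sketch: the helper names `latest_kernel_pkg` and `kernel_is_current` are hypothetical, and the sketch assumes GNU `sort` (for the `-V` version sort) and that the main kernel package is named `kernel-<version>`, as in the sample output above.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a restart is needed before installing the driver.
# Helper names are hypothetical; assumes GNU sort and "kernel-<version>" package names.

# Print the newest version among "kernel-<version>" packages.
# Reads package names (as printed by `rpm -qa`) on stdin.
latest_kernel_pkg() {
  grep -E '^kernel-[0-9]' | sed 's/^kernel-//' | sort -V | tail -n 1
}

# Return 0 if the running kernel ($1) equals the newest installed package version ($2).
kernel_is_current() {
  [ "$1" = "$2" ]
}

# Run the check only where rpm is available (skipped on Ubuntu and other systems).
if command -v rpm >/dev/null 2>&1; then
  latest="$(rpm -qa | latest_kernel_pkg)"
  running="$(uname -r)"
  if kernel_is_current "$running" "$latest"; then
    echo "Running kernel $running matches the latest installed kernel package."
  else
    echo "Running kernel is $running but the latest installed package is $latest: restart the instance."
  fi
fi
```

With the sample package list shown earlier, `latest_kernel_pkg` selects 5.10.134-16.1.al8.x86_64, which differs from the running 5.10.134-15.al8.x86_64, so a restart would be reported.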
Run the following command to download the driver installation package.
Obtain the package from an internal URL:
wget http://mirrors.cloud.aliyuncs.com/erdma/erdma_installer-latest.tar.gz
Obtain the package from a public URL:
wget https://mirrors.aliyun.com/erdma/erdma_installer-latest.tar.gz
By default, the latest version of the driver installation package is downloaded. You can also download a specific version based on your requirements. For the release information of different eRDMA installer package versions, see Step 2: Install the eRDMA driver for an ECS instance.
Run the following command to decompress the installation package and change to the directory of the decompressed package.
tar -xvf erdma_installer-latest.tar.gz && cd erdma_installer
Run the following command to install the driver.
Method 1: Interactive installation. You are prompted to confirm the uninstallation and automatic download steps.
sudo sh install.sh
Method 2: Run the installation without confirmation prompts.
sudo sh install.sh --batch
Check the output to confirm the installation result.
If the following information is returned, the driver is successfully installed.
================================= Installation Information ======
architecture : x86_64
distributor : alinux
release : 3
binary path : RPMS/alinux3
build from source : erdma(N) rdma-core(N) eadm(N)
install requires : gcc make patch gcc-c++ dkms cmake kernel-
build requires : gcc make patch gcc-c++ dkms cmake kernel-
vel redhat-rpm-config rpm-build libnl3-devel ninja-build perl-gen
hon3-docutils
temp dir : /tmp/ERDMA.120833
log file : /tmp/install-erdma.log.120833
====================================================================
Checking for Linux headers availability...
Verifying dependencies...
Removing erdma package...
Start to uninstall rdma-core libraries...
Installing erdma
Reloading erdma modules
Installation finished successfully.
Update erdma.ko in initramfs
If the following information is returned, the driver installation failed. Follow the instructions in the prompt, and then try to install the driver again.
========================== Installation Information ========================================
architecture : x86_64
distributor : alinux
release : 3
binary path : RPMS/alinux3
build from source : erdma(Y) rdma-core(Y) eadm(Y)
install requires : gcc make patch gcc-c++ dkms cmake kernel-headers kernel-devel
build requires : gcc make patch gcc-c++ dkms cmake kernel-headers kernel-devel gcc make patch gcc-c++ dkms cmake kernel-headers kernel-devel redhat-rpm-config rpm-build libnl3-devel ninja-build perl-generators pandoc systemd-devel valgrind-devel kernel-rpm-macros python3-Cython python3-docutils
temp dir : /tmp/ERDMA.172873
log file : /tmp/install-erdma.log.172873
===========================================================================================
Checking for Linux headers availability...
Verifying dependencies...
ERDMA requires the following Packages(s) to be installed:
libnl3-devel ninja-build perl-generators pandoc systemd-devel valgrind-devel python3-Cython python3-docutils
Note: If you are using CentOS 7 and a package that is missing during driver reinstallation cannot be obtained by using yum, you may need to run the yum install -y epel-release command to install the EPEL repository. You can then obtain the required package.
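The success and failure outputs shown above can be told apart with a small filter. The following is a minimal sketch that relies only on the two marker strings in the sample outputs ("Installation finished successfully" and "requires the following Packages"). The function name `check_install_output` is hypothetical, and the installer may print other messages that this sketch reports as "unknown".

```shell
#!/usr/bin/env bash
# Sketch: classify install.sh output read from stdin.
# The function name is hypothetical; matching relies on the sample outputs only.

check_install_output() {
  local out missing
  out="$(cat)"
  if printf '%s\n' "$out" | grep -q 'Installation finished successfully'; then
    echo "ok"
  elif printf '%s\n' "$out" | grep -q 'requires the following Packages'; then
    # Collect everything after "to be installed:" as the missing package list.
    missing="$(printf '%s\n' "$out" \
      | sed -n '/requires the following Packages/,$p' \
      | sed '1s/.*to be installed://' \
      | xargs)"
    echo "missing: $missing"
  else
    echo "unknown"
  fi
}

# Example with a shortened copy of the failure output shown above.
printf '%s\n' \
  'Verifying dependencies...' \
  'ERDMA requires the following Packages(s) to be installed:' \
  'libnl3-devel ninja-build' \
  | check_install_output

# Possible use with the installer (run from the erdma_installer directory):
#   sudo sh install.sh --batch 2>&1 | check_install_output
```

The reported package list can then be passed to the package manager before the installer is run again.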
Bind an ERI to the instance.
You can bind an ERI to an instance in one of the following ways.
Note: To determine the maximum number of ERIs that an instance type supports, check the value of the EriQuantity parameter in the response of the DescribeInstanceTypes operation. A value of 0 indicates that the instance type does not support ERIs.
- Modify the attributes of a bound ENI to enable the eRDMA interface
- Create an ERI and bind it to an ECS instance
  - To create an ERI, see Create a standalone ERI.
  - To bind an ERI to an ECS instance, see Bind a secondary ENI.
- Create a secondary ENI with the eRDMA interface enabled and bind it to an instance by using an API operation
  Create and bind a secondary ENI by using API operations:
  - Call the CreateNetworkInterface operation to create an ENI. To create an ENI with an ERI enabled, set the NetworkInterfaceTrafficMode parameter to HighPerformance. Record the ENI ID returned in the NetworkInterfaceId parameter.
  - Call the AttachNetworkInterface operation. Set NetworkInterfaceId to the ID from the previous step and InstanceId to the ID of the target instance to bind the ERI-enabled ENI.
    Important: If your instance type supports multiple ERIs, specify a different NetworkCardIndex for each ERI when you bind them. This ensures that the ERIs are bound to different channels and maximizes network bandwidth. For more information, see Network card indexes.
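The two API calls above could be issued from the command line. The following is a sketch under stated assumptions: it assumes that the aliyun CLI is installed and configured, and that, besides the parameters named in this topic, CreateNetworkInterface also needs a region, vSwitch, and security group. The helper names are hypothetical and every ID is a placeholder. The script only prints the commands (a dry run) so that you can review them before running anything.

```shell
#!/usr/bin/env bash
# Sketch: build the CreateNetworkInterface and AttachNetworkInterface calls
# as aliyun CLI commands. Helper names are hypothetical; IDs are placeholders.

# Build the call that creates an ERI-enabled ENI.
# Args: region_id vswitch_id security_group_id
create_eri_cmd() {
  echo "aliyun ecs CreateNetworkInterface --RegionId $1 --VSwitchId $2 --SecurityGroupId $3 --NetworkInterfaceTrafficMode HighPerformance"
}

# Build the call that binds the ENI to an instance.
# Args: region_id eni_id instance_id [network_card_index]
attach_eri_cmd() {
  local cmd="aliyun ecs AttachNetworkInterface --RegionId $1 --NetworkInterfaceId $2 --InstanceId $3"
  # NetworkCardIndex is optional; pass a different index for each ERI.
  if [ -n "${4:-}" ]; then
    cmd="$cmd --NetworkCardIndex $4"
  fi
  echo "$cmd"
}

# Dry run with placeholder values: prints the commands without calling the API.
create_eri_cmd "<region_id>" "<vswitch_id>" "<security_group_id>"
attach_eri_cmd "<region_id>" "<eni_id>" "<instance_id>" 1
```

Replace `<eni_id>` with the NetworkInterfaceId value returned by the first call before you run the second command.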
Test eRDMA write latency
You can install perftest on two eRDMA-enabled enterprise-level instances and use ib_write_lat to test write latency. For more information about perftest, see the perftest test suite.
Prerequisites
Prepare two enterprise-level instances with eRDMA configured. This means each instance must have the eRDMA software stack installed and the ERI feature enabled for its ENI. One instance serves as the server and the other serves as the client.
Ensure the two instances can communicate with each other over the internal network. For more information, see Enable internal communication between ECS instances.
Procedure
Remotely connect to each of the two instances.
For more information, see Connect to a Linux instance by using Workbench.
Verify that eRDMA is correctly configured on both instances.
For more information, see Verify the eRDMA configuration.
On each instance, run the following commands to install the perftest tool.
Install perftest from the official repository (requires a public IP address) or from a YUM/APT repository.
Official repository
- Assign a public IP address to the ECS instance.
- Download and install perftest from the official perftest repository.
YUM or APT repository
Note: The perftest versions in software repositories may differ across Linux distributions, which can cause compatibility issues. Use the same distribution on all communicating instances. If this is not possible, install perftest from the official repository.
- Alibaba Cloud Linux 3, CentOS, or Anolis OS
  sudo yum install perftest -y
- Ubuntu
  sudo apt install perftest -y
Test the eRDMA network latency.
On the server instance, run the following command to start ib_write_lat as a server that listens for connections from the client.
ib_write_lat -R -a -F
-R: uses the RDMA Connection Manager (RDMA_CM) to establish a connection.
Important: For CPU-based instance types that support eRDMA, the eRDMA kernel driver is installed in Standard mode by default, which supports only the RDMA_CM connection establishment method. For more information, see Connection establishment method.
By default, perftest uses an out-of-band (OOB) connection. When you run perftest on a CPU-based instance, specify the -R parameter on both the server and the client to use the RDMA_CM connection establishment method. Otherwise, the connection may fail.
You can also use the command line to make the RDMA_CM and OOB connection establishment methods compatible. For more information, see Modify the connection establishment mode of eRDMA and bRPC to achieve compatibility. After you enable compatibility, the -R parameter is no longer required.
-a: runs tests for all message sizes from 2 bytes to 2^23 bytes. This allows you to test the impact of different message sizes on latency.
-F: suppresses the CPU frequency warning. With this option, perftest does not show a warning even if the cpufreq_ondemand module is loaded, so the test runs without being interrupted by CPU frequency checks.
On the client instance, run the following command to start ib_write_lat and connect to the server.
ib_write_lat -R -a -F <server_ip>
Replace <server_ip> with the private IP address of the server's ERI-enabled ENI. To obtain the IP address, see View IP addresses.
View the test results.
After the test on the client is complete, ib_write_lat outputs test configuration information, connection information, and performance test results, including latency statistics such as minimum, maximum, and average latency.
[root@xxx ~]# ib_write_lat -R -a -F 172.17.0.131
                    RDMA_Write Latency Test
 Dual-port       : OFF          Device         : erdma_0
 Number of qps   : 1            Transport type : IW
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: OFF
 ibv_wr* API     : OFF
 TX depth        : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 0
 Max inline data : 96[B]
 rdma_cm QPs     : ON
 Data ex. method : rdma_cm
 local address: LID 0000 QPN 0x000b PSN 0xaf51c5
 GID: 00:22:62:22:161:180:00:00:00:00:00:00:00:00:00:00
 remote address: LID 0000 QPN 0x000a PSN 0x347ff0
 GID: 00:22:62:09:142:189:00:00:00:00:00:00:00:00:00:00
 #bytes  #iterations  t_min[usec]  t_max[usec]  t_typical[usec]  t_avg[usec]  t_stdev[usec]  99% percentile[usec]  99.9% percentile[usec]
 2       1000         12.26        16.79        12.59            12.65        0.19           14.32                 16.79
 4       1000         12.10        18.10        12.54            12.60        0.24           14.57                 18.10
 8       1000         12.04        18.75        12.42            12.48        0.31           14.09                 18.75
 16      1000         12.45        14.77        12.77            12.81        0.11           14.10                 14.77
 32      1000         12.28        15.55        12.75            12.81        0.21           14.67                 15.55
 64      1000         12.41        16.10        12.79            12.84        0.16           14.31                 16.10
 128     1000         12.86        16.22        13.19            13.21        0.15           14.78                 16.22
 256     1000         13.11        20.85        13.53            13.64        0.35           15.43                 20.85
 512     1000         13.09        16.54        13.44            13.47        0.17           15.16                 16.54
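To compare runs, it can help to reduce the results table to message size and average latency. The following is a minimal sketch that assumes the column order shown in the sample output above, where t_avg[usec] is the sixth column. The function name `lat_avg_by_size` is hypothetical, and the sample rows are copied from the output above.

```shell
#!/usr/bin/env bash
# Sketch: extract "<bytes> <t_avg>" pairs from ib_write_lat output on stdin.
# The function name is hypothetical; assumes t_avg[usec] is the sixth column.

lat_avg_by_size() {
  # Result rows start with a numeric message size; header and info lines do not.
  awk '$1 ~ /^[0-9]+$/ && NF >= 6 { print $1, $6 }'
}

# Example with two rows copied from the sample output.
printf '%s\n' \
  ' #bytes #iterations t_min[usec] t_max[usec] t_typical[usec] t_avg[usec] t_stdev[usec] 99% percentile[usec] 99.9% percentile[usec]' \
  ' 2 1000 12.26 16.79 12.59 12.65 0.19 14.32 16.79' \
  ' 512 1000 13.09 16.54 13.44 13.47 0.17 15.16 16.54' \
  | lat_avg_by_size

# Possible use on the client instance:
#   ib_write_lat -R -a -F <server_ip> | lat_avg_by_size
```

If your perftest version prints different columns, adjust the field number to match its header line.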