SMC issues


This topic covers common Shared Memory Communications (SMC) issues on Alibaba Cloud Linux 3 and how to resolve them.

SMC does not improve application performance over TCP

An SMC connection may fall back to TCP, in which case it runs without Remote Direct Memory Access (RDMA) acceleration. To check for fallback, run smcss -a. If the Mode column shows TCP, look up the cause code in SMC falls back to TCP and RDMA cannot accelerate communications.

Even when SMC is active, performance gains are not guaranteed. Common reasons:

  • CPU-bound workload. If the application spends most of its time on computation rather than network I/O, switching to RDMA has little effect.

  • RDMA header overhead. RDMA packets carry more header bytes than TCP packets. Less of each packet is therefore payload, and throughput is slightly lower at the same bandwidth. To mitigate this, enable Jumbo Frames, which spread the header overhead over a larger payload.

  • Short-lived connections. SMC connection setup involves slow-path operations such as creating and requesting RDMA resources. This overhead outweighs the benefit for workloads dominated by short-lived connections.

  • Insufficient resources. SMC requires memory and elastic RDMA interface (ERI) resources tied to the ECS instance specifications. When resources run out, SMC falls back to TCP. For resource requirements, see Enable and configure SMC.

Communication fails after SMC is enabled

After you enable SMC-R on an Alibaba Cloud Linux 3 ECS instance, some Internet addresses respond to ping but cannot be accessed: cURL requests fail while ICMP pings succeed. Disabling SMC-R resolves the issue.

This happens when a remote server echoes the TCP options it receives back to the client. RFC 9293 requires it to ignore options it does not support instead. When the server echoes the SMC TCP option in its SYN-ACK, the local end incorrectly identifies the peer as SMC-capable. The resulting SMC handshake mismatch causes the connection to fail.
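The check that goes wrong here can be sketched in C. This is a minimal sketch, not Alibaba Cloud's code. It assumes the layout the upstream Linux kernel uses for the SMC TCP option: experimental option kind 254, length 6, followed by the 32-bit magic value 0xE2D4C3D9. The function name tcp_opts_have_smc is hypothetical. The test only looks for these bytes in the peer's SYN-ACK, so a server that copies the client's SYN options into its SYN-ACK passes it even though it does not run SMC.

```c
#include <stddef.h>
#include <stdint.h>

/* Option values as used by the upstream Linux kernel (assumed here). */
#define TCPOPT_EOL            0
#define TCPOPT_NOP            1
#define TCPOPT_EXP            254          /* experimental option kind */
#define TCPOPT_SMC_MAGIC      0xE2D4C3D9u  /* 32-bit SMC experiment ID */
#define TCPOLEN_EXP_SMC_BASE  6            /* kind + length + 4-byte magic */

/* Hypothetical helper: returns 1 if the TCP option area (the bytes after the
 * fixed 20-byte TCP header) carries the SMC option, 0 otherwise. */
int tcp_opts_have_smc(const uint8_t *opts, size_t len)
{
    size_t i = 0;
    while (i < len) {
        uint8_t kind = opts[i];
        if (kind == TCPOPT_EOL)
            break;                      /* end of option list */
        if (kind == TCPOPT_NOP) {
            i++;                        /* one-byte padding */
            continue;
        }
        if (i + 1 >= len)
            break;                      /* truncated: no length byte */
        uint8_t olen = opts[i + 1];
        if (olen < 2 || i + olen > len)
            break;                      /* malformed length: stop parsing */
        if (kind == TCPOPT_EXP && olen >= TCPOLEN_EXP_SMC_BASE && (olen % 2) == 0) {
            uint32_t magic = ((uint32_t)opts[i + 2] << 24) |
                             ((uint32_t)opts[i + 3] << 16) |
                             ((uint32_t)opts[i + 4] << 8) |
                             (uint32_t)opts[i + 5];
            if (magic == TCPOPT_SMC_MAGIC)
                return 1;               /* peer looks SMC-capable */
        }
        i += olen;
    }
    return 0;
}
```

For example, consider the option bytes of a SYN-ACK that contains NOP, NOP, and an echoed SMC option. The function returns 1 for them, which is the false positive described above.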

To diagnose the problem, run a communication link check.

To fix it, configure SMC negotiation control based on BPF policies and disable SMC on the problematic link.

SMC is not enabled after running smc_run

Running smc_run ./foo does not create SMC connections. smcr l shows no link groups, and smcss -a shows either no SMC connections or a one-sided TCP fallback.

smc_run uses LD_PRELOAD to load an smc-tools library. The library intercepts socket(2) calls and changes the socket family and protocol so that TCP sockets are created as SMC sockets. The dynamic linker applies LD_PRELOAD, so this mechanism does not work for statically linked applications.
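The argument rewriting that such a preload library performs can be sketched as follows. This is a minimal illustration, not the smc-tools source. The helper name map_to_smc is hypothetical. The sketch assumes these values match the kernel:

  • AF_SMC = 43, the protocol family number shown in dmesg
  • SMC protocol number 0 for IPv4
  • SMC protocol number 1 for IPv6

```c
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef AF_SMC
#define AF_SMC 43          /* "NET: Registered protocol family 43" */
#endif
#define SMCPROTO_SMC  0    /* SMC over IPv4 (assumed value) */
#define SMCPROTO_SMC6 1    /* SMC over IPv6 (assumed value) */

/* Hypothetical helper: decides whether a socket(2) call should be turned into
 * an SMC socket. Returns 1 and fills *out_domain / *out_protocol if so,
 * 0 if the call must be passed through unchanged. */
int map_to_smc(int domain, int type, int protocol,
               int *out_domain, int *out_protocol)
{
    /* Ignore flag bits that can be OR-ed into the type argument. */
    int base_type = type & ~(SOCK_NONBLOCK | SOCK_CLOEXEC);

    if ((domain != AF_INET && domain != AF_INET6) || base_type != SOCK_STREAM)
        return 0;                      /* only TCP-style stream sockets */
    if (protocol != 0 && protocol != IPPROTO_TCP)
        return 0;

    *out_domain = AF_SMC;
    *out_protocol = (domain == AF_INET) ? SMCPROTO_SMC : SMCPROTO_SMC6;
    return 1;
}
```

A preload library would call a helper like this from its own socket() definition. It would then forward the rewritten arguments to the real socket(), which it obtains with dlsym(RTLD_NEXT, "socket"). That symbol interposition is done by the dynamic linker, which is why a statically linked binary never reaches the wrapper.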

For statically linked applications, enable SMC at the kernel level instead, through the net.smc.tcp2smc sysctl parameter:

sysctl net.smc.tcp2smc

For details, see Enable and configure SMC.
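To find out in advance whether smc_run can work for a given binary, check whether the binary is dynamically linked. The sketch below looks for a PT_INTERP program header, which names the dynamic loader; statically linked executables normally have none. It assumes a 64-bit ELF file in the host's byte order. The function name elf_has_interp is hypothetical. The file command reports the same information.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the ELF64 file at path has a PT_INTERP
 * program header (dynamically linked), 0 if it has none (statically linked),
 * or -1 if the file cannot be read as a native-endian ELF64 file. */
int elf_has_interp(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    int result = -1;
    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0 &&
        eh.e_ident[EI_CLASS] == ELFCLASS64 &&
        eh.e_phentsize == sizeof(Elf64_Phdr)) {
        result = 0;  /* valid header: treat as static until PT_INTERP is found */
        for (unsigned i = 0; i < eh.e_phnum; i++) {
            Elf64_Phdr ph;
            long off = (long)(eh.e_phoff + (Elf64_Off)i * sizeof ph);
            if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
                result = -1;  /* truncated program header table */
                break;
            }
            if (ph.p_type == PT_INTERP) {
                result = 1;
                break;
            }
        }
    }
    fclose(f);
    return result;
}
```

If elf_has_interp returns 0 for the application binary, LD_PRELOAD has no effect on it, and the kernel-level net.smc.tcp2smc approach is the option to use.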

Ports 65500–65515 become unusable after SMC is enabled

After loading SMC modules, bind(2) calls on ports 65500–65515 return EADDRINUSE.

SMC-R with elastic Remote Direct Memory Access (eRDMA) reserves these 16 ports for out-of-band (OOB) connections in the net namespace where ERIs reside. Run dmesg to confirm:

smc: smc: load SMC module with reserve_mode
NET: Registered protocol family 43
smc: netns <netns ID> reserved ports [65500 ~ 65515] for eRDMA OOB
smc: adding ib device erdma_0 with port count 1
smc: ib device erdma_0 port 1 has pnetid

If these ports are already in use when SMC modules load, the modules cannot use eRDMA devices.
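Before you load the SMC modules, you can check whether anything already holds these ports. The sketch below is one possible approach. It tries to bind(2) a TCP socket to each port on 0.0.0.0 and counts EADDRINUSE results. It only sees TCP sockets in the net namespace it runs in, so run it in the namespace where the ERIs reside. Once SMC has reserved the range, every port reports as busy by design. The function names probe_tcp_port and count_busy_smc_oob_ports are hypothetical.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Tries to bind a TCP socket to 0.0.0.0:port in the current net namespace.
 * Returns 0 if the bind succeeds, otherwise the errno value (for example
 * EADDRINUSE). The probe socket is closed immediately. */
int probe_tcp_port(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    int err = 0;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0)
        err = errno;   /* save before close() can overwrite errno */
    close(fd);
    return err;
}

/* Counts how many of the 16 ports reserved for eRDMA OOB connections
 * (65500-65515) are already in use. */
int count_busy_smc_oob_ports(void)
{
    int busy = 0;
    for (unsigned p = 65500; p <= 65515; p++)
        if (probe_tcp_port((uint16_t)p) == EADDRINUSE)
            busy++;
    return busy;
}
```

A nonzero count before the modules are loaded means some process must release those ports first.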

To release the ports, unload the SMC modules. See Use SMC in Alibaba Cloud ECS.

SMC falls back to TCP with IPv6 addresses

After enabling SMC for applications that use IPv6, smcss shows TCP fallback with cause code 0x03030000 or 0x09990000.

Alibaba Cloud eRDMA devices and SMC do not support IPv6. Apply one of the following workarounds before enabling SMC for new connections.

Disable IPv6 for all interfaces:

sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1

Disable IPv6 for a specific interface:

sudo sysctl -w net.ipv6.conf.<NetInName>.disable_ipv6=1

Replace <NetInName> with the interface name.

On kernel version 5.10.134-17.3 or later, use IPv4-mapped IPv6 addresses (of the form ::ffff:a.b.c.d).
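An application that keeps AF_INET6 sockets can reach an IPv4 peer through an IPv4-mapped address. The sketch below builds such a sockaddr_in6 from a dotted-quad string. The helper name make_v4mapped_sockaddr is hypothetical. Mapped addresses only work if the socket does not set IPV6_V6ONLY.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fills *out with the IPv4-mapped IPv6 address
 * ::ffff:<ipv4> and the given port. Returns 0 on success, -1 if ipv4 is not
 * a valid dotted-quad address. */
int make_v4mapped_sockaddr(const char *ipv4, uint16_t port, struct sockaddr_in6 *out)
{
    struct in_addr v4;
    if (inet_pton(AF_INET, ipv4, &v4) != 1)
        return -1;

    memset(out, 0, sizeof *out);
    out->sin6_family = AF_INET6;
    out->sin6_port = htons(port);
    /* Bytes 0-9 stay zero, bytes 10-11 are 0xff, bytes 12-15 hold the IPv4
     * address, which inet_pton already stored in network byte order. */
    out->sin6_addr.s6_addr[10] = 0xff;
    out->sin6_addr.s6_addr[11] = 0xff;
    memcpy(&out->sin6_addr.s6_addr[12], &v4.s_addr, 4);
    return 0;
}
```

Pass the result to connect(2) on an AF_INET6 socket as usual. For example, 192.168.99.22 becomes ::ffff:192.168.99.22.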

SMC performance is lower than TCP at the PPS limit

When network load reaches the maximum packets per second (PPS) rate for the ECS instance type, applications using SMC with eRDMA show lower queries per second (QPS) than those using TCP.

To check whether you have hit the PPS limit:

RDMA generates more packets per request than TCP, so it reaches the PPS limit sooner. This only happens under extreme load, such as benchmark stress tests; production traffic rarely reaches the PPS ceiling.

If the PPS limit is the bottleneck, do not use SMC for that workload.
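The effect of the PPS ceiling can be estimated with simple arithmetic. The highest sustainable request rate is the PPS limit divided by the number of packets each request generates. The sketch below encodes that relation. The function name is hypothetical, and the packets-per-request values you pass in must come from your own measurements.

```c
/* Upper bound on queries per second (QPS) imposed by a packets-per-second
 * (PPS) limit. Each request costs packets_per_request packets, so the
 * instance cannot serve more than pps_limit / packets_per_request requests
 * per second. */
double max_qps_at_pps_limit(double pps_limit, double packets_per_request)
{
    if (packets_per_request <= 0.0)
        return 0.0;   /* invalid input: report 0 instead of dividing by zero */
    return pps_limit / packets_per_request;
}
```

For example, take a hypothetical limit of 1,000,000 PPS. A workload that needs 2 packets per request over TCP tops out at 500,000 QPS. One that needs 4 packets per request over RDMA tops out at 250,000 QPS.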

SMC falls back to TCP and RDMA cannot accelerate communications

After enabling SMC, smcss -a shows the connection fell back to TCP. The connection still works, but without RDMA acceleration.

Identify the cause code

Run smcss -a to get the fallback cause code:

State          UID   Inode   Local Address           Peer Address            Intf Mode
ACTIVE         00000 0156721 192.168.99.21:60188     192.168.99.22:8090      0000 TCP 0x03010000
ACTIVE         00000 1202539 172.16.4.189:44780      172.16.4.190:1811       0000 SMCR

In the first entry, TCP in the mode column indicates a fallback. The cause code is 0x03010000. In the second entry, SMCR confirms a successful SMC-R connection.

If two cause codes appear (for example, 0x05000000 and 0x03030001), the first is from the local host and the second from the peer. Most fallbacks are caused by the peer.
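When you collect smcss -a output from many hosts, a small parser can extract the mode and cause codes. The sketch below makes two assumptions:

  • The output uses the default column layout shown above, where Mode is the seventh whitespace-separated column.
  • Two codes may appear either as separate tokens or joined by '/'.

The function name parse_smcss_line is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* for strtok_r */
#include <stdlib.h>
#include <string.h>

/* Parses one data line of `smcss -a` output (State, UID, Inode, Local Address,
 * Peer Address, Intf, Mode, ...). Copies the Mode column into mode
 * (mode_len must be at least 1) and stores up to max_codes cause codes.
 * Returns the number of cause codes found, or -1 if there is no Mode column. */
int parse_smcss_line(const char *line, char *mode, size_t mode_len,
                     unsigned long *codes, int max_codes)
{
    char buf[512];
    strncpy(buf, line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    int col = 0, ncodes = 0;
    char *save = NULL;
    for (char *tok = strtok_r(buf, " \t\n", &save); tok != NULL;
         tok = strtok_r(NULL, " \t\n", &save), col++) {
        if (col == 6) {
            strncpy(mode, tok, mode_len - 1);   /* TCP, SMCR, ... */
            mode[mode_len - 1] = '\0';
        } else if (col > 6) {
            /* Split on '/' in case two codes share one token (assumption). */
            char *save2 = NULL;
            for (char *c = strtok_r(tok, "/", &save2); c != NULL && ncodes < max_codes;
                 c = strtok_r(NULL, "/", &save2))
                if (strncmp(c, "0x", 2) == 0)
                    codes[ncodes++] = strtoul(c, NULL, 16);
        }
    }
    return col > 6 ? ncodes : -1;
}
```

For the first sample line above, it returns 1 with mode TCP and code 0x03010000. For the second, it returns 0 with mode SMCR.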

Cause code reference

Cause code

Description

Cause and solution

0x01010000

Insufficient memory for SMC data structures

Host memory cannot accommodate SMC connection resources. Free memory by stopping unnecessary processes.

0x02010000

CLC or LLC message timeout during connection setup

Cause 1: RDMA network interface cards (RNICs) or RDMA links failed, causing LLC message timeouts. Make sure the RNICs work correctly. Cause 2: Ethernet NICs or TCP/IP networks failed, causing CLC message timeouts. Make sure the Ethernet NICs work correctly.

0x02020000

LLC timeout for RDMA link establishment

Not in use.

0x03000000

Cannot obtain correct IP addresses

The IP address for the CLC socket cannot be retrieved when creating an SMC connection proposal. Make sure the TCP-based CLC connection and corresponding devices work correctly.

0x03010000

Peer does not support SMC

The peer did not include the SMC TCP option in its SYN or SYN-ACK packet during the TCP handshake. Make sure SMC is enabled on both hosts, and run smcss to check the SMC connection status.

0x03020000

IPsec not supported by SMC

Do not use IPsec with SMC connections.

0x03030000

No SMC-D or SMC-R devices available

Cause 1: No RDMA devices available. Run smcr d to check. For eRDMA, make sure ERIs are configured in the ECS console and drivers are installed. Cause 2: With multiple NICs, the NIC used for SMC-R is not eRDMA-capable. Run ibv_devinfo to get eRDMA device GUIDs and ip addr to get NIC MAC addresses, then compare them. Cause 3: RDMA devices running in exclusive mode. Run rdma system to check. If netns exclusive appears, move the device with rdma dev set <device> netns <namespace>. For RDMA over Converged Ethernet (RoCE) or Internet Wide Area RDMA Protocol (iWARP) devices, also move the Ethernet devices. Cause 4: A client attempted to replace an AF_INET6 connection with SMC. eRDMA uses SMCv2, which does not support AF_INET6. Switch the application to AF_INET.

0x03030001

No SMC-D devices available

Alibaba Cloud does not provide SMC-D devices. Contact technical support.

0x03030002

No SMC-R devices available

Cause 1: The selected RDMA device became invalid during connection setup. Run smcr d to check. For eRDMA, make sure ERIs are configured and drivers are installed. Cause 2: With multiple NICs, the NIC used for SMC-R is not eRDMA-capable. Run ibv_devinfo and ip addr to compare device GUIDs and NIC MAC addresses. Cause 3: RDMA devices in exclusive mode. Run rdma system to check. Move the device to the correct net namespace with rdma dev set.

0x03030003

SMC-D devices do not support ISMv2

Alibaba Cloud does not provide SMC-D devices. Contact technical support.

0x03030004

Peer does not support SMCv2 extension

The local host uses SMCv2, but the peer does not. eRDMA and RoCE v2 use SMCv2. Make sure both hosts use the same type of RDMA device. Run smcr d to check device types. The Type column shows values such as RoCE_Express, RoCE_Express2, or 0x107f (Alibaba Cloud eRDMA).

0x03030005

Peer does not support SMC-D v2 extension

Alibaba Cloud does not provide SMC-D devices. Contact technical support.

0x03030006

Peer has no system enterprise ID (SEID)

Not in use.

0x03030007

No SMC-D v2 devices available

Alibaba Cloud does not provide SMC-D devices. Contact technical support.

0x03030008

Peer has no user-defined enterprise ID (UEID)

SMCv2 requires a UEID. Run smcr ueid {show | add | del} to configure the same UEID on both hosts.

0x03030009

SMC version negotiation failed

The negotiated SMC version changed during the CLC handshake. Make sure both hosts run the same operating system distribution.

0x0303000a

Max connections per LGR negotiation failed

SMCv2.1 negotiates the maximum number of connections per link group (LGR). A fallback occurs if the negotiated value is zero or exceeds the local maximum. Make sure both hosts run the same operating system distribution.

0x0303000b

Max links per LGR negotiation failed

SMCv2.1 negotiates the maximum number of links per LGR. A fallback occurs if the negotiated value is zero or exceeds the local maximum. Make sure both hosts run the same operating system distribution.

0x0303000c

SMC vendor feature negotiation failed

The vendor feature changed during the CLC handshake. Make sure both hosts run the same operating system distribution, and do not change the vendor options sysctl during connection establishment. On kernel 5.10.134-015, this is net.smc.vendor_exp_options. On kernel 5.10.134-016 or later, it is net.smc.experiment_vendor_options.

0x03040000

Local and peer use different SMC device modes (SMC-D vs. SMC-R)

Alibaba Cloud does not provide SMC-D devices. Contact technical support.

0x03050000

Peer has RMBE eyecatcher

Not in use for Linux.

0x03060000

MSG_FASTOPEN not supported by SMC

Remove the MSG_FASTOPEN flag when creating SMC sockets.

0x03070000

Different IP prefix or subnet between hosts

RoCEv1 devices use SMCv1, which only supports same-subnet communication. Make sure both hosts are in the same subnet. eRDMA devices use SMCv2 and are not subject to this restriction.

0x03080000

Cannot obtain the VLAN ID

SMC cannot retrieve the VLAN ID for the device during connection setup. Make sure the TCP connection and Ethernet devices work correctly.

0x03090000

Cannot register the VLAN ID with an ISM device

Alibaba Cloud does not provide SMC-D devices. Contact technical support.

0x030a0000

No SMC-R RDMA links in the link group

The connection could not get a link from its LGR. Run smcr d to check RNIC status. For eRDMA, make sure ERIs are configured and drivers are installed.

0x030b0000

Client cannot find the server's RDMA links

The client searches for RDMA links using the queue pair number (QPN), global identifier (GID), and MAC address provided by the server. If no matching links are found, the connection cannot use RDMA. Run smcr d to check RNIC status. For eRDMA, make sure ERIs are configured and drivers are installed.

0x030c0000

SMC version negotiation failed

The negotiated SMC version is unacceptable. Make sure both hosts run the same operating system distribution.

0x030d0000

Maximum number of SMC-D DMBs reached

Alibaba Cloud does not provide SMC-D devices. Contact technical support.

0x030e0000

SMC-R V2 connection failed

During SMCv2 connection setup, the client cannot find routing information for the peer IP addresses. Make sure the TCP connection, Ethernet NICs, IP configuration, and routing configuration are correct and reachable.

0x030f0000

Indirect connection flag mismatch

During SMCv2 connection setup, the client detects a mismatch between the server's gateway flag and local routing information. Make sure the TCP connection, Ethernet NICs, IP configuration, and routing configuration are correct and use the same network path.

0x04000000

Server and client use different link groups

The server reuses an LGR, but the client wants to create a new one. Run smcr d to check RNIC status. For eRDMA, make sure ERIs are configured and drivers are installed.

0x05000000

Peer rejected the handshake

The peer sent a CLC message rejecting the RDMA connection. Run smcss, find the connection by its 5-tuple, and check the peer's cause code.

0x09990000

RDMA resource creation failed

RDMA resources could not be created or initialized. Use an RDMA monitoring tool to check error statistics. For eRDMA, run eadm stat.

0x09990001

RDMA RToken failed

This is an SMC protocol stack issue. Contact technical support.

0x09990002

RDMA queue pair (QP) initialization failed

SMC calls InfiniBand (IB) verbs interfaces to initialize the QP, and an error occurred. Run smcr d to check SMC-R devices. For eRDMA, make sure ERIs are configured and drivers are installed.

0x09990003

Memory region (MR) registration failed

The number or size of MRs exceeds the RDMA device specifications. Run ibv_devinfo -d <device> -v | grep max_mr to check the limits: max_mr is the maximum count and max_mr_size is the maximum size. This typically means the MR count limit was reached. Reduce the number of SMC connections.

0x09990004

SMC flow control credit initialization failed

RNICs or RDMA links failed, preventing credit messages from being sent. Make sure RNICs work correctly.
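To turn raw cause codes into readable text in scripts or monitoring, you can encode the table as a lookup. The sketch below copies the short descriptions from the table. The names smc_causes and smc_cause_desc are hypothetical. Codes not listed in the table map to a generic string.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One row of the cause code reference table. */
struct smc_cause {
    uint32_t code;
    const char *desc;
};

static const struct smc_cause smc_causes[] = {
    {0x01010000, "Insufficient memory for SMC data structures"},
    {0x02010000, "CLC or LLC message timeout"},
    {0x02020000, "LLC timeout for RDMA link establishment"},
    {0x03000000, "Cannot obtain correct IP addresses"},
    {0x03010000, "Peer does not support SMC"},
    {0x03020000, "IPsec not supported by SMC"},
    {0x03030000, "No SMC-D or SMC-R devices available"},
    {0x03030001, "No SMC-D devices available"},
    {0x03030002, "No SMC-R devices available"},
    {0x03030003, "SMC-D devices do not support ISMv2"},
    {0x03030004, "Peer does not support SMCv2 extension"},
    {0x03030005, "Peer does not support SMC-D v2 extension"},
    {0x03030006, "Peer has no system enterprise ID (SEID)"},
    {0x03030007, "No SMC-D v2 devices available"},
    {0x03030008, "Peer has no user-defined enterprise ID (UEID)"},
    {0x03030009, "SMC version negotiation failed"},
    {0x0303000a, "Max connections per LGR negotiation failed"},
    {0x0303000b, "Max links per LGR negotiation failed"},
    {0x0303000c, "SMC vendor feature negotiation failed"},
    {0x03040000, "Local and peer use different SMC device modes (SMC-D vs. SMC-R)"},
    {0x03050000, "Peer has RMBE eyecatcher"},
    {0x03060000, "MSG_FASTOPEN not supported by SMC"},
    {0x03070000, "Different IP prefix or subnet between hosts"},
    {0x03080000, "Cannot obtain the VLAN ID"},
    {0x03090000, "Cannot register the VLAN ID with an ISM device"},
    {0x030a0000, "No SMC-R RDMA links in the link group"},
    {0x030b0000, "Client cannot find the server's RDMA links"},
    {0x030c0000, "SMC version negotiation failed"},
    {0x030d0000, "Maximum number of SMC-D DMBs reached"},
    {0x030e0000, "SMC-R V2 connection failed"},
    {0x030f0000, "Indirect connection flag mismatch"},
    {0x04000000, "Server and client use different link groups"},
    {0x05000000, "Peer rejected the handshake"},
    {0x09990000, "RDMA resource creation failed"},
    {0x09990001, "RDMA RToken failed"},
    {0x09990002, "RDMA queue pair (QP) initialization failed"},
    {0x09990003, "Memory region (MR) registration failed"},
    {0x09990004, "SMC flow control credit initialization failed"},
};

/* Returns the description for code, or a generic string if the code is not
 * in the table. A linear scan is enough for a table this small. */
const char *smc_cause_desc(uint32_t code)
{
    for (size_t i = 0; i < sizeof smc_causes / sizeof smc_causes[0]; i++)
        if (smc_causes[i].code == code)
            return smc_causes[i].desc;
    return "Unknown cause code";
}
```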

Network O&M tools show unexpected data after SMC is enabled

After enabling SMC, tools such as tcpdump, Wireshark, ss, and netstat show network traffic that does not match expectations, or fail to capture expected traffic.

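SMC-R transfers data over RDMA, which bypasses the kernel TCP/IP stack. These tools inspect only the traffic and socket state in that stack. As a result, they see the TCP connection that SMC uses for its handshake but not the data carried over RDMA.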

Use RDMA-specific tools instead. See Monitor and check eRDMA.

SMC module is unusable on GPU-accelerated or SCC instances

The SMC module loaded on a GPU-accelerated or Super Computing Cluster (SCC) instance does not function.

These instance types have Mellanox OpenFabrics Enterprise Distribution (OFED) drivers installed. The OFED stack includes its own SMC module, which loads automatically but does not work. OFED also changes the RDMA function symbols, so the kernel's own SMC module fails to load with an Unknown symbol error.

SMC cannot be used on GPU-accelerated or SCC instances with Alibaba Cloud Linux 3.

Some SOL_SOCKET and SOL_TCP options do not work after enabling SMC

After replacing TCP with SMC, some setsockopt and getsockopt options at the SOL_SOCKET or SOL_TCP level cannot be configured, cannot be retrieved, or do not work as expected.

SMC uses shared buffers and a different protocol stack design than TCP. Some socket options are incompatible with this design.

Support levels:

  • Y: Fully supported. The option can be set, retrieved, and works as expected.

  • M: Configurable but may not work as expected due to design differences between SMC and TCP.

  • N: Not supported. Using the option causes a TCP fallback with cause code 0x03060000 or 0x03010001.

SOL_SOCKET options

Option                            Support
SO_DEBUG                          Y
SO_REUSEADDR                      Y
SO_TYPE                           Y
SO_ERROR                          Y
SO_DONTROUTE                      M
SO_BROADCAST                      M
SO_SNDBUF                         Y
SO_RCVBUF                         Y
SO_SNDBUFFORCE                    Y
SO_RCVBUFFORCE                    Y
SO_KEEPALIVE                      M
SO_OOBINLINE                      M
SO_NO_CHECK                       M
SO_PRIORITY                       M
SO_LINGER                         Y
SO_BSDCOMPAT                      M
SO_REUSEPORT                      Y
SO_PASSCRED                       M
SO_PEERCRED                       M
SO_RCVLOWAT                       M
SO_SNDLOWAT                       M
SO_RCVTIMEO_OLD                   Y
SO_SNDTIMEO_OLD                   Y
SO_SECURITY_AUTHENTICATION        N
SO_SECURITY_ENCRYPTION_TRANSPORT  N
SO_SECURITY_ENCRYPTION_NETWORK    N
SO_BINDTODEVICE                   N
SO_ATTACH_FILTER                  M
SO_DETACH_FILTER                  M
SO_PEERNAME                       Y
SO_ACCEPTCONN                     M
SO_PEERSEC                        N
SO_PASSSEC                        M
SO_MARK                           M
SO_PROTOCOL                       Y
SO_DOMAIN                         Y
SO_RXQ_OVFL                       M
SO_WIFI_STATUS                    M
SO_PEEK_OFF                       N
SO_NOFCS                          M
SO_LOCK_FILTER                    Y
SO_SELECT_ERR_QUEUE               M
SO_BUSY_POLL                      M
SO_MAX_PACING_RATE                M
SO_BPF_EXTENSIONS                 Y
SO_INCOMING_CPU                   M
SO_ATTACH_BPF                     M
SO_ATTACH_REUSEPORT_CBPF          M
SO_ATTACH_REUSEPORT_EBPF          N
SO_CNX_ADVICE                     M
SO_MEMINFO                        M
SO_INCOMING_NAPI_ID               M
SO_COOKIE                         Y
SO_PEERGROUPS                     N
SO_ZEROCOPY                       N
SO_TXTIME                         M
SO_BINDTOIFINDEX                  N
SO_TIMESTAMP_OLD                  M
SO_TIMESTAMPNS_OLD                M
SO_TIMESTAMPING_OLD               M
SO_TIMESTAMP_NEW                  M
SO_TIMESTAMPNS_NEW                M
SO_TIMESTAMPING_NEW               M
SO_RCVTIMEO_NEW                   Y
SO_SNDTIMEO_NEW                   Y
SO_DETACH_REUSEPORT_BPF           N

SOL_TCP options

Option                    Support
TCP_NODELAY               Y
TCP_MAXSEG                M
TCP_CORK                  Y
TCP_KEEPIDLE              M
TCP_KEEPINTVL             M
TCP_KEEPCNT               M
TCP_SYNCNT                M
TCP_LINGER2               M
TCP_DEFER_ACCEPT          Y
TCP_WINDOW_CLAMP          M
TCP_INFO                  M
TCP_QUICKACK              M
TCP_CONGESTION            M
TCP_MD5SIG                Y
TCP_THIN_LINEAR_TIMEOUTS  M
TCP_THIN_DUPACK           M
TCP_USER_TIMEOUT          M
TCP_REPAIR                M
TCP_REPAIR_QUEUE          M
TCP_QUEUE_SEQ             M
TCP_REPAIR_OPTIONS        M
TCP_FASTOPEN              N
TCP_TIMESTAMP             M
TCP_NOTSENT_LOWAT         M
TCP_CC_INFO               M
TCP_SAVE_SYN              Y
TCP_SAVED_SYN             Y
TCP_REPAIR_WINDOW         M
TCP_FASTOPEN_CONNECT      N
TCP_ULP                   N
TCP_MD5SIG_EXT            Y
TCP_FASTOPEN_KEY          N
TCP_FASTOPEN_NO_COOKIE    N
TCP_ZEROCOPY_RECEIVE      N
TCP_CM_INQ/TCP_INQ        M
TCP_TX_DELAY              M