Accelerate TCP applications with NetACC


NetACC is a user-mode library that provides socket-compatible interfaces and uses eRDMA to accelerate TCP applications. No changes to application code are required.

Important

NetACC is in public preview.

Use cases

NetACC is suited to workloads with high network overhead:

  • High packets per second (PPS) workloads, especially small-packet traffic: NetACC reduces CPU overhead and improves throughput, for example when Redis processes requests.

  • Latency-sensitive workloads: eRDMA delivers lower latency than TCP, which speeds up network responses.

  • Frequent short-lived connections: NetACC speeds up connection establishment and reduces the time needed to create connections.

Install NetACC

  • Installation methods

    • Use the eRDMA driver to install NetACC

      NetACC is automatically installed with the eRDMA driver. See Enable eRDMA in the "Use eRDMA" topic.

    • Separately install NetACC

      To install a specific version of NetACC or use it temporarily on an ECS instance, run:

      curl -fsSL https://netacc-release.oss-cn-hangzhou.aliyuncs.com/release/netacc_download_install.sh | sudo sh
  • Configuration file and optimized parameters

    After installation, the /etc/netacc.conf configuration file is generated automatically. You can tune the following parameters based on your workload:

    • NACC_SOR_MSG_SIZE: message buffer size, in bytes.

    • NACC_RDMA_MR_MIN_INC_SIZE: size, in bytes, of the first Memory Region (MR) registered by RDMA.

    • NACC_RDMA_MR_MAX_INC_SIZE: maximum size, in bytes, of an MR registered by RDMA.

    • NACC_SOR_CONN_PER_QP: maximum number of connections per Queue Pair (QP).

    • NACC_SOR_IO_THREADS: number of NetACC I/O threads.

    The following is a sample /etc/netacc.conf configuration file:

    [netacc]
    # The buffer size, in bytes. If the data blocks you send are large, increase this value to improve performance. Reduce it to save memory.
    # int
    NACC_SOR_MSG_SIZE=16384
    
    # The size of the first MR registered by RDMA, in bytes. You can reduce the size to save memory.
    # Set this parameter to NACC_SOR_MSG_SIZE x 2^N, where N >= 0.
    NACC_RDMA_MR_MIN_INC_SIZE=16384
    
    # The maximum size of an MR registered by RDMA, in bytes. Valid range: 1 MB to 512 MB. You can reduce the size to save memory.
    # Set this parameter to NACC_RDMA_MR_MIN_INC_SIZE x 2^N, where N >= 0.
    NACC_RDMA_MR_MAX_INC_SIZE=8388608
    
    # The maximum number of connections per QP. Increase the value to enable QP reuse and improve performance. Set it to 1 to disable QP reuse, for example when applications require distinct local port numbers.
    # int
    NACC_SOR_CONN_PER_QP=1
    
    # The number of NetACC threads. If the throughput is high, increase the value.
    # int
    NACC_SOR_IO_THREADS=1
    
    # The expiration time of empty QPs. Unit: milliseconds. A value of 0 specifies that the empty QPs immediately expire. A value of -1 specifies that the empty QPs never expire.
    NACC_EMPTY_QP_EXPIRE_MS=60000
    
    # The maximum number of empty QPs allowed.
    NACC_EMPTY_QP_MAX_ALL=100
    
    # The maximum number of empty QPs allowed for each destination address.
    NACC_EMPTY_QP_MAX_PER=10
    
    # The probability of using RDMA to establish connections. Valid values: 0 to 100.
    NACC_CONNECT_RDMA_PERCENT=100
    
    # Specifies whether RDMA is enabled by default.
    NACC_ENABLE_RDMA_DEFAULT=1
    
    # The log level.
    # 0: TRACE
    # 1: DEBUG
    # 2: INFO
    # 3: WARN
    # 4: ERROR
    # 5: FATAL
    NACC_LOG_LEVEL=3
    
    # The log path.
    NACC_LOG_PATH="/tmp/netacc.log"
    
    # The following parameters are infrequently used or do not need to be configured.
    
    # The thread affinity.
    # string
    NACC_SOR_AFFINITY=""
    
    # Specifies whether to preferentially use TCP to establish a connection.
    # bool
    NACC_CONN_TCP_FIRST=0
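You can check the size rules in the comments above before you start an application. The following POSIX shell sketch reads a netacc.conf-style file and runs three checks:

  • NACC_RDMA_MR_MIN_INC_SIZE is NACC_SOR_MSG_SIZE × 2^N.

  • NACC_RDMA_MR_MAX_INC_SIZE is NACC_RDMA_MR_MIN_INC_SIZE × 2^N.

  • The maximum MR size falls within 1 MB to 512 MB.

The helper names are hypothetical, and the parser assumes simple KEY=VALUE lines. Treating 1 MB and 512 MB as binary units (1048576 and 536870912 bytes) is also an assumption. NetACC itself may apply other checks.

```shell
# Sketch: check the MR size rules stated in the comments of /etc/netacc.conf.
# conf_get, is_pow2_multiple, and check_mr_sizes are hypothetical helper names.

# Print the value assigned to KEY ($1) in a netacc.conf-style file ($2).
# Comment lines and lines without "=" are skipped; the last assignment wins.
conf_get() {
  awk -v k="$1" '
    /^[ \t]*#/ { next }
    index($0, "=") == 0 { next }
    {
      name = $0; sub(/=.*/, "", name); gsub(/[ \t]/, "", name)
      if (name == k) { val = $0; sub(/^[^=]*=/, "", val); gsub(/^[ \t]+|[ \t]+$/, "", val) }
    }
    END { print val }' "$2"
}

# Succeed if $1 equals $2 * 2^N for some N >= 0 (both must be positive integers).
is_pow2_multiple() {
  local v r
  for v in "$1" "$2"; do
    case $v in ''|*[!0-9]*) return 1 ;; esac
  done
  [ "$2" -gt 0 ] && [ "$1" -ge "$2" ] && [ $(( $1 % $2 )) -eq 0 ] || return 1
  r=$(( $1 / $2 ))
  # A power of two has exactly one bit set, so r & (r - 1) is 0.
  [ $(( r & (r - 1) )) -eq 0 ]
}

# Check the MR settings in a netacc.conf file (default: /etc/netacc.conf).
# Prints one line per violated rule to stderr; returns non-zero if any rule fails.
check_mr_sizes() {
  local file msg min max rc
  file=${1:-/etc/netacc.conf}
  msg=$(conf_get NACC_SOR_MSG_SIZE "$file")
  min=$(conf_get NACC_RDMA_MR_MIN_INC_SIZE "$file")
  max=$(conf_get NACC_RDMA_MR_MAX_INC_SIZE "$file")
  rc=0
  if ! is_pow2_multiple "$min" "$msg"; then
    echo "NACC_RDMA_MR_MIN_INC_SIZE=$min is not NACC_SOR_MSG_SIZE ($msg) x 2^N" >&2
    rc=1
  fi
  if ! is_pow2_multiple "$max" "$min"; then
    echo "NACC_RDMA_MR_MAX_INC_SIZE=$max is not NACC_RDMA_MR_MIN_INC_SIZE ($min) x 2^N" >&2
    rc=1
  fi
  # Range check: 1 MB to 512 MB, assuming binary units (1048576 to 536870912 bytes).
  case $max in ''|*[!0-9]*) max=0 ;; esac
  if [ "$max" -lt 1048576 ] || [ "$max" -gt 536870912 ]; then
    echo "NACC_RDMA_MR_MAX_INC_SIZE=$max is outside 1 MB to 512 MB" >&2
    rc=1
  fi
  return "$rc"
}
```

Source the functions, then run check_mr_sizes /etc/netacc.conf. The sample file above passes: 16384 = 16384 × 2^0, and 8388608 = 16384 × 2^9.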

Use NetACC

Use NetACC in applications by running the netacc_run command or setting the LD_PRELOAD environment variable. Read the Usage notes section before proceeding.

Run the netacc_run command

netacc_run loads NetACC at application startup. Prefix your application command with netacc_run to start the application with NetACC enabled.

netacc_run accepts options such as -t (number of I/O threads) and -p (maximum connections per QP). Options specified on the netacc_run command line override the corresponding settings in the configuration file.

netacc_run command parameters

netacc_run -h
Usage: netacc_run [ OPTIONS ] COMMAND

Run COMMAND using NetACC for TCP sockets

OPTIONS:
   -f <path>   set config file, default /etc/netacc.conf
   -p <num>    set max connections per QP, default 1
   -t <num>    set netacc io threads, default 4
   -s <num>    set netacc message size, default 16384
   -F <num>    fast connect mode, default 0
   -d          enable debug mode
   -T          use TCP first in connect
   -P <num>    polling cq time ms
   -A <str>    affinity CPU list, 0 | 1-3 | 1,3,4
   -i <num>    set cq comp_vector, default 0
   -h          display this message
   -v          display version info
  • Examples:

    The following examples use Redis. Prefix a Redis command with netacc_run to start Redis with NetACC.

    • Start Redis with NetACC:

      netacc_run redis-server
    • Start redis-benchmark with NetACC:

      netacc_run redis-benchmark
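The precedence rule described above can be written as a small function that shows which value takes effect for a setting. This is a hypothetical helper that mirrors the documented behavior. It is not NetACC's own option handling. The fallback value, used when neither the command line nor the configuration file sets a key, is an assumption you pass in, for example a default listed by netacc_run -h.

```shell
# Hypothetical helper illustrating the documented precedence rule:
# a value given on the netacc_run command line wins over the configuration file.
# $1 = command-line value (empty if not given), $2 = config key, $3 = config file,
# $4 = fallback used when neither is set (an assumption; pass the value you expect).
effective_setting() {
  local conf_value=""
  if [ -n "$1" ]; then
    echo "$1"
    return 0
  fi
  if [ -r "$3" ]; then
    # Simple KEY=VALUE lookup; the last assignment of the key wins.
    conf_value=$(awk -F= -v k="$2" '$1 == k { v = $2 } END { print v }' "$3")
  fi
  echo "${conf_value:-$4}"
}
```

With the sample configuration file (NACC_SOR_IO_THREADS=1), effective_setting "" NACC_SOR_IO_THREADS /etc/netacc.conf 4 prints 1. effective_setting 4 NACC_SOR_IO_THREADS /etc/netacc.conf 4, which corresponds to netacc_run -t 4, prints 4.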

Configure the LD_PRELOAD environment variable

The LD_PRELOAD environment variable lists shared libraries that the dynamic linker loads before all others when a program starts. Set LD_PRELOAD to the path of the NetACC library so that NetACC is loaded into the application automatically.

  1. Query the location of the NetACC dynamic library:

    ldconfig -p | grep netacc

    Sample output:

    [Screenshot: ldconfig output showing the location of the NetACC dynamic library]

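  2. Set the LD_PRELOAD environment variable to the path of the NetACC shared library:

    LD_PRELOAD=/lib64/libnetacc-preload.so your_application

    Replace your_application with the target application. If the library path returned in step 1 differs from /lib64/libnetacc-preload.so, use that path instead.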

    Examples (Redis):

    • Start Redis with NetACC:

      LD_PRELOAD=/lib64/libnetacc-preload.so redis-server
    • Start redis-benchmark with NetACC:

      LD_PRELOAD=/lib64/libnetacc-preload.so redis-benchmark

    Configure the LD_PRELOAD environment variable in a script

    To automatically load NetACC for frequently used applications, or to manage multiple applications with a script, set LD_PRELOAD in a wrapper script. For example, create a script named run_with_netacc.sh:

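    #!/bin/bash
    # Preload NetACC and replace this shell with the target command.
    # Quoting "$@" passes every argument through unchanged, including arguments that contain spaces.
    LD_PRELOAD=/lib64/libnetacc-preload.so exec "$@"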

    Make the script executable (chmod +x run_with_netacc.sh), then start an application with NetACC:

    ./run_with_netacc.sh your_application

    Examples (Redis):

    • Start Redis with NetACC:

      ./run_with_netacc.sh redis-server
    • Start redis-benchmark with NetACC:

      ./run_with_netacc.sh redis-benchmark
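Steps 1 and 2 can be combined in a single wrapper. This version looks up the library with ldconfig -p instead of hard-coding /lib64/libnetacc-preload.so. If it cannot find the library, it runs the application without NetACC. This is a sketch:

  • The function names and the NETACC_PRELOAD_LIB override variable are hypothetical.

  • The parser assumes the usual ldconfig -p line format: name (flags) => path.

```shell
# Sketch: locate the NetACC preload library and run a command with it,
# falling back to plain TCP when the library is missing.
# find_netacc_preload, run_with_netacc, and NETACC_PRELOAD_LIB are hypothetical names.

# Read `ldconfig -p` output on stdin; print the path of the first libnetacc-preload.so entry.
find_netacc_preload() {
  awk '$1 ~ /^libnetacc-preload\.so/ { print $NF; exit }'
}

# Run "$@" with NetACC preloaded. NETACC_PRELOAD_LIB, if set, overrides the ldconfig lookup.
run_with_netacc() {
  local lib
  lib=${NETACC_PRELOAD_LIB:-$(ldconfig -p 2>/dev/null | find_netacc_preload)}
  if [ -z "$lib" ] || [ ! -e "$lib" ]; then
    echo "run_with_netacc: NetACC library not found; running without NetACC" >&2
    "$@"
    return
  fi
  LD_PRELOAD=$lib "$@"
}
```

For example, source the functions and run run_with_netacc redis-server --port 6379 --protected-mode no. The fallback to plain TCP is a convenience. Remove it if you would rather fail fast when NetACC is missing.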

Monitor NetACC

netacc_ss is the NetACC monitoring tool. Run the netacc_ss command to check the data transfer status of NetACC-accelerated TCP processes. You can run it on both the server and client.

netacc_ss command

netacc_ss -h
Usage:
 netacc_ss: [-p] <pid> [options]...
 Show monitoring information of specified netacc process

Options:
 -c   clear unused sock file
 -h   display this help
 -s   display specified monitoring metric[s]. [all|cfg|cnt|mem|qp|sock]
      all: all monitoring information
      cfg: configuration information
      cnt: counter information[default]
      mem: memory information
      qp : queue pair information
      sock: socket information
 -v   display netacc version

Examples:
 netacc_ss -p 12345 -s mem,cnt

Query data transfer status of NetACC-accelerated TCP processes:

netacc_ss -s all -p <Process ID>
Note

To query a process ID, run ps -ef | grep <Process name>.
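You can wrap the two steps above, finding the process ID and then running netacc_ss, in one function. This sketch matches the exact command name in ps -eo pid=,comm= output. It therefore skips the grep process that ps -ef | grep <Process name> also lists. The helper names are hypothetical. Where pgrep is installed, pgrep -x <Process name> is a simpler alternative.

```shell
# Sketch: find a process ID by exact command name and query NetACC monitoring data.
# first_pid_by_name and netacc_status are hypothetical helper names.

# Read `ps -eo pid=,comm=` output on stdin; print the PID of the first process named $1.
# Exact matching on the command-name column skips the grep process itself.
# Note: Linux truncates the command name (comm) to 15 characters.
first_pid_by_name() {
  awk -v name="$1" '$2 == name { print $1; exit }'
}

# Show all netacc_ss metrics for the first process named $1 (for example, redis-server).
netacc_status() {
  local pid
  pid=$(ps -eo pid=,comm= 2>/dev/null | first_pid_by_name "$1")
  if [ -z "$pid" ]; then
    echo "netacc_status: no process named $1" >&2
    return 1
  fi
  netacc_ss -p "$pid" -s all
}
```

For example, netacc_status redis-server runs netacc_ss -p <PID> -s all for the first redis-server process it finds.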

Usage notes

  • Only TCP connections established through ENIs with the eRDMA Interface (ERI) feature enabled are converted to RDMA connections. Other connections remain TCP.

    Note

    If either end of the connection does not have an ERI-enabled ENI, NetACC falls back to TCP.

  • RDMA socket file descriptors cannot be passed to other processes through kernel IPC mechanisms, such as SCM_RIGHTS messages over UNIX domain sockets.

    Note

    RDMA connections are bound to specific QPs that cannot be shared among processes.

  • NetACC does not support IPv6. To prevent conflicts, disable IPv6 by running sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1.

  • NetACC does not support hot updates. Stop all NetACC-accelerated processes before updating NetACC.

  • NetACC does not support certain TCP socket options, such as SO_REUSEPORT, SO_ZEROCOPY, and TCP_INQ.

  • NetACC depends on glibc and cannot accelerate applications that do not use glibc, such as Go (Golang) programs.

  • Before using NetACC, set the maximum lockable memory to unlimited by running ulimit -l unlimited.

    Note

    If the ulimit -l value is too small, RDMA may fail to register MRs that exceed the limit.

  • When a NetACC-accelerated application listens on a TCP port, NetACC also listens on an RDMA port (TCP port + 20000) for RDMA data transfer.

    Note

    If the RDMA port is occupied or outside the valid range, the connection fails. Allocate ports properly to avoid conflicts.

  • A child process does not inherit socket connections established by the parent process after a fork() call.

    Note

    The child process must establish a new socket connection.

  • QP reuse is disabled by default.

    • To enable QP reuse, set the maximum number of connections per QP to a value greater than 1. Use either the NACC_SOR_CONN_PER_QP parameter in the configuration file or the -p option of netacc_run.

    • QP reuse reduces the number of QPs, management overhead, and resource consumption, improving communication efficiency in high-concurrency scenarios.

    • With QP reuse enabled, multiple RDMA connections may share a local port number because port numbers identify QPs, not individual connections.

      Note

      If your applications require distinct local port numbers (e.g., for different services), disable QP reuse to avoid port conflicts.
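Several of the usage notes involve simple arithmetic or checks that you can run before starting an application. The following sketch covers three of them:

  • It computes the RDMA port (TCP port + 20000). It rejects TCP ports above 45535, because their RDMA port would exceed 65535. Reading "outside the valid range" this way is an inference.

  • It checks that the output of ulimit -l is unlimited.

  • It computes a lower bound on the number of QPs from the maximum number of connections per QP.

The helper names are hypothetical.

```shell
# Sketch of pre-flight checks based on the NetACC usage notes.
# netacc_rdma_port, memlock_ok, and min_qps are hypothetical helper names.

# Print the RDMA port that NetACC listens on for TCP listening port $1 (TCP port + 20000).
# Fails when the result would exceed 65535, the largest port number,
# which happens for any TCP port above 45535.
netacc_rdma_port() {
  case $1 in ''|*[!0-9]*) return 1 ;; esac
  if [ "$1" -lt 1 ] || [ $(( $1 + 20000 )) -gt 65535 ]; then
    echo "netacc_rdma_port: TCP port $1 has no valid RDMA port" >&2
    return 1
  fi
  echo $(( $1 + 20000 ))
}

# Succeed only if $1 (the output of `ulimit -l`) is "unlimited".
memlock_ok() {
  [ "$1" = unlimited ]
}

# Lower bound on the number of QPs for $1 connections when each QP carries at most
# $2 connections (NACC_SOR_CONN_PER_QP or netacc_run -p): ceil($1 / $2).
min_qps() {
  echo $(( ($1 + $2 - 1) / $2 ))
}
```

Some examples:

  • Redis listening on TCP port 6379 needs RDMA port 26379 to be free. You can check this with ss -ltn.

  • memlock_ok "$(ulimit -l)" fails until you run ulimit -l unlimited.

  • With 100 connections and NACC_SOR_CONN_PER_QP=8, at least 13 QPs are needed. With the default of 1, each connection uses its own QP.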

Use NetACC in Redis applications

Benefits of NetACC for Redis applications

  • Improved system throughput

    NetACC reduces CPU overhead and improves throughput for high-PPS Redis workloads.

  • Accelerated network responses

    NetACC uses the low latency of eRDMA to speed up Redis network responses.

NetACC used in Redis performance benchmarks

Redis-benchmark is a built-in Redis utility that measures server performance by simulating concurrent client requests.

Test scenario

Use NetACC with redis-benchmark to simulate 100 clients across 4 threads sending 5 million SET requests.

Common parameters used together with the redis-server command

The redis-server command starts the Redis server. Run redis-server -h to view available parameters. Example:

redis-server --port 6379 --protected-mode no
  • --port 6379: The port to start Redis on. Default: 6379.

  • --protected-mode no: Disables protected mode. When protected mode is enabled and Redis has neither an explicit bind address nor a password configured, Redis accepts connections only from loopback addresses, such as 127.0.0.1, and rejects external connections. Setting this option to no allows connections from all IP addresses.

    Important

    Disabling protected mode in a production environment exposes the server to security risks. Use caution in open network environments.

Common command parameters used together with redis-benchmark

redis-benchmark is a Redis stress testing tool that simulates multiple clients sending requests. Run redis-benchmark --help to view available parameters. Example:

redis-benchmark -h 172.17.0.90 -p 6379 -c 100 -n 5000000 -r 10000 --threads 4 -d 512 -t set
  • -h 172.17.0.90: The Redis server hostname or IP address. In this example, set to 172.17.0.90.

  • -p 6379: The Redis port. Default: 6379.

    Note

    To check the Redis port, run sudo grep "^port" <path to redis.conf>. By default, redis.conf is located at /etc/redis.conf, so the command is sudo grep "^port" /etc/redis.conf.

  • -c 100: The number of concurrent connections.

  • -n 5000000: The total number of requests.

  • -r 10000: The range of random keys. The SET command uses random integers from 0 to 9999 as part of the keys.

  • --threads 4: The number of threads. By default, redis-benchmark uses one thread.

  • -d 512: The data size per SET request in bytes.

  • -t set: Runs only the specified tests. -t takes the names of the commands to benchmark. In this example, only the SET command is benchmarked.

This command uses 4 threads to establish 100 concurrent connections to the Redis server at 172.17.0.90 and send 5 million SET requests with 512-byte payloads using random keys from 0 to 9999.
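The load generated by this command can be worked out from its parameters: n = 5,000,000 requests (-n), c = 100 connections (-c), 4 threads (--threads), and d = 512 bytes per value (-d). These figures assume that requests and connections are spread evenly, which redis-benchmark does not guarantee exactly. The payload volume counts only the 512-byte values and excludes keys and protocol overhead.

```latex
\begin{aligned}
\text{requests per connection} &= \frac{n}{c} = \frac{5\,000\,000}{100} = 50\,000 \\
\text{connections per thread} &= \frac{c}{\text{threads}} = \frac{100}{4} = 25 \\
\text{SET payload volume} &= n \times d = 5\,000\,000 \times 512\ \text{B} = 2.56 \times 10^{9}\ \text{B} \approx 2.38\ \text{GiB}
\end{aligned}
```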

Common metrics in redis-benchmark benchmark results

  • Throughput summary:

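    rps: the number of requests that the Redis server processes per second, that is, the total number of requests divided by the elapsed time in seconds. For example, 332933.81 requests per second means that the server processed about 332,934 requests each second.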

  • Latency summary (unit: milliseconds):

    • avg: the average response time across all requests.

    • min: the minimum response time across all requests.

    • p50: 50% of requests complete within this latency.

    • p95: 95% of requests complete within this latency.

    • p99: 99% of requests complete within this latency.

    • max: the maximum response time across all requests.
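You can reproduce both summary metrics from raw numbers. This sketch computes throughput as total requests divided by elapsed seconds. It computes a nearest-rank percentile by sorting the latencies and taking the value at rank ⌈p/100 × n⌉. The helper names are hypothetical. Your numbers may not match redis-benchmark exactly for two reasons:

  • redis-benchmark records latencies in a histogram, so its reported percentiles can differ slightly from this exact method.

  • Its throughput uses a more precise elapsed time than the rounded value it prints.

```shell
# Sketch: recompute the two summary metrics from raw data.
# rps and percentile are hypothetical helper names.

# Throughput: total requests ($1) divided by elapsed seconds ($2), rounded to a whole number.
rps() {
  awk -v n="$1" -v s="$2" 'BEGIN { printf "%.0f\n", n / s }'
}

# Nearest-rank percentile $1 (0-100] of the latency values read from stdin, one per line:
# sort ascending and take the value at rank ceil(p / 100 * count).
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      r = p * NR / 100
      k = int(r); if (k < r) k++
      if (k < 1) k = 1
      print v[k]
    }'
}
```

For example, rps 5000000 6.52 prints 766871. That is close to the 767341.94 requests per second in the NetACC sample result. The difference comes from rounding the elapsed time to 6.52 seconds.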

Prerequisites

Create two eRDMA-capable ECS instances. Select Auto-install eRDMA Driver and eRDMA Interface to enable ERI for the primary ENI. Use one instance as the Redis server and the other as the client.

Instance configurations:

  • Image: Alibaba Cloud Linux 3

  • Instance type: ecs.g8ae.4xlarge

  • Primary ENI private IP address: 172.17.0.90 (server) and 172.17.0.91 (client). Replace with actual values in your benchmark.

    Note
    • This benchmark enables ERI on the primary ENIs. 172.17.0.90 is the primary ENI private IP of the Redis server instance.

    • If ERI is enabled on secondary ENIs, use their private IP addresses instead. See Enable eRDMA in the "Use eRDMA" topic.

Example on how to configure specific parameters during ECS instance creation

Note the following parameters when creating the instances. For other parameters, see Custom launch ECS instances.

  • Instances & Images: Select an instance type that supports eRDMA and install the eRDMA driver.

    • Instance: For more information, see Limitations.

    • Images: Select Public Image.

    • Extension: Select eRDMA Driver. The eRDMA driver is automatically installed when the instance starts. When you create an Arm instance that uses an Alibaba Cloud Linux image, you can also select performance-acceleration extensions. For more information, see Application performance acceleration.

      [Screenshot: eRDMA Driver selected under Extension]

      Important

      To use the extension feature, you must have the AliyunECSExtensionsFullAccess system policy. Alibaba Cloud accounts have this permission by default. If you are a RAM user, contact an Alibaba Cloud account administrator to grant this permission to your RAM user. For more information, see Extensions.

  • ENIs: To the right of Primary ENI, enable the ERI feature to bind an ERI to the ECS instance.

    [Screenshot: ERI feature enabled for the primary ENI]

    Note

    When you create an enterprise-level instance, you can enable the ERI feature only for the primary ENI. If you need to configure eRDMA for a secondary ENI, you can enable the ERI feature for the secondary ENI in the console or by calling an API operation. For more information, see Elastic RDMA network interface card (ERI).

Procedure

  1. Connect to both ECS instances (Redis server and client).

    See Use Workbench to log on to a Linux instance over SSH.

  2. Verify that the eRDMA driver is installed.

    After the instances start, run ibv_devinfo to check the eRDMA driver status.

    • The following output indicates the driver is installed.

      [Screenshot: ibv_devinfo output when the eRDMA driver is installed]

    • The following output indicates the driver is being installed. Wait a few minutes and try again.

      [Screenshot: ibv_devinfo output while the eRDMA driver is being installed]

  3. Install Redis on both instances:

    sudo yum install -y redis

    The following output indicates Redis is installed.

    [Screenshot: output of a successful Redis installation]

  4. Benchmark Redis performance with redis-benchmark.

    Perform a benchmark by using NetACC
    1. On the Redis server instance, start Redis with NetACC:

      netacc_run redis-server --port 6379 --protected-mode no
      Note

      The following output indicates Redis started successfully.

      [Screenshot: redis-server startup output]

    2. On the Redis client instance, start redis-benchmark with NetACC:

       netacc_run redis-benchmark -h 172.17.0.90 -p 6379 -c 100 -n 5000000 -r 10000 --threads 4 -d 512 -t set
      Note

      Sample Redis benchmark result

      ====== SET ======                                                      
        5000000 requests completed in 6.52 seconds
        100 parallel clients
        512 bytes payload
        keep alive: 1
        host configuration "save": 3600 1 300 100 60 10000
        host configuration "appendonly": no
        multi-thread: yes
        threads: 4
      
      Latency by percentile distribution:
      0.000% <= 0.039 milliseconds (cumulative count 3)
      50.000% <= 0.127 milliseconds (cumulative count 2677326)
      75.000% <= 0.143 milliseconds (cumulative count 3873096)
      87.500% <= 0.151 milliseconds (cumulative count 4437348)
      93.750% <= 0.159 milliseconds (cumulative count 4715347)
      96.875% <= 0.175 milliseconds (cumulative count 4890339)
      98.438% <= 0.183 milliseconds (cumulative count 4967042)
      99.609% <= 0.191 milliseconds (cumulative count 4991789)
      99.902% <= 0.207 milliseconds (cumulative count 4995847)
      99.951% <= 0.263 milliseconds (cumulative count 4997733)
      99.976% <= 0.303 milliseconds (cumulative count 4998853)
      99.988% <= 0.343 milliseconds (cumulative count 4999403)
      99.994% <= 0.367 milliseconds (cumulative count 4999704)
      99.997% <= 0.391 milliseconds (cumulative count 4999849)
      99.998% <= 2.407 milliseconds (cumulative count 4999924)
      99.999% <= 5.407 milliseconds (cumulative count 4999962)
      100.000% <= 6.847 milliseconds (cumulative count 4999981)
      100.000% <= 8.423 milliseconds (cumulative count 4999991)
      100.000% <= 8.919 milliseconds (cumulative count 4999996)
      100.000% <= 9.271 milliseconds (cumulative count 4999998)
      100.000% <= 9.471 milliseconds (cumulative count 4999999)
      100.000% <= 9.583 milliseconds (cumulative count 5000000)
      100.000% <= 9.583 milliseconds (cumulative count 5000000)
      
      Cumulative distribution of latencies:
      18.820% <= 0.103 milliseconds (cumulative count 941003)
      99.917% <= 0.207 milliseconds (cumulative count 4995847)
      99.977% <= 0.303 milliseconds (cumulative count 4998853)
      99.998% <= 0.407 milliseconds (cumulative count 4999879)
      99.998% <= 0.503 milliseconds (cumulative count 4999903)
      99.998% <= 0.703 milliseconds (cumulative count 4999904)
      99.998% <= 0.807 milliseconds (cumulative count 4999905)
      99.998% <= 0.903 milliseconds (cumulative count 4999906)
      99.998% <= 1.007 milliseconds (cumulative count 4999908)
      99.998% <= 1.103 milliseconds (cumulative count 4999909)
      99.998% <= 1.207 milliseconds (cumulative count 4999912)
      99.998% <= 1.407 milliseconds (cumulative count 4999913)
      99.998% <= 1.503 milliseconds (cumulative count 4999915)
      99.998% <= 1.607 milliseconds (cumulative count 4999916)
      99.998% <= 1.703 milliseconds (cumulative count 4999917)
      99.998% <= 1.807 milliseconds (cumulative count 4999918)
      99.998% <= 1.903 milliseconds (cumulative count 4999919)
      99.998% <= 2.103 milliseconds (cumulative count 4999920)
      99.999% <= 3.103 milliseconds (cumulative count 4999931)
      99.999% <= 4.103 milliseconds (cumulative count 4999944)
      99.999% <= 5.103 milliseconds (cumulative count 4999958)
      99.999% <= 6.103 milliseconds (cumulative count 4999971)
      100.000% <= 7.103 milliseconds (cumulative count 4999984)
      100.000% <= 8.103 milliseconds (cumulative count 4999989)
      100.000% <= 9.103 milliseconds (cumulative count 4999996)
      100.000% <= 10.103 milliseconds (cumulative count 5000000)
      
      Summary:
        throughput summary: 767341.94 requests per second
        latency summary (msec):
                avg       min       p50       p95       p99       max
              0.126     0.032     0.127     0.167     0.183     9.583

      The Summary shows ~770,000 requests/second. For metric details, see Common metrics in redis-benchmark benchmark results.

    Use netacc_ss to monitor the Redis server during the benchmark

    During the benchmark, use netacc_ss on the Redis server instance to monitor it.

    • Query the Redis process ID (redis-server):

      ps -ef | grep redis-server

      The following output shows the redis-server process ID is 114379.

      [Screenshot: ps output showing the redis-server process with ID 114379]

    • Query Redis connection and data transfer status:

      netacc_ss -p 114379 -s all
      Note

      Replace 114379 with the actual Redis process ID. See netacc_ss command.

      The output shows an RDMA connection because ERI is enabled on both instances. The rightmost four columns show message counts and volumes.

      [Screenshot: netacc_ss -s all output for the redis-server process]

    Perform a benchmark without NetACC
    1. On the Redis server instance, start Redis:

      redis-server --port 6379 --protected-mode no --save
      Note

      Replace 6379 with your actual port. See Common parameters used together with the redis-server command.

      The following output indicates Redis started successfully.

      [Screenshot: redis-server startup output]

    2. On the Redis client instance, start redis-benchmark:

       redis-benchmark -h 172.17.0.90 -c 100 -n 5000000 -r 10000 --threads 4 -d 512 -t set
      Note

      Sample Redis benchmark result

      ====== SET ======                                                         
        5000000 requests completed in 15.02 seconds
        100 parallel clients
        512 bytes payload
        keep alive: 1
        host configuration "save": 
        host configuration "appendonly": no
        multi-thread: yes
        threads: 4
      
      Latency by percentile distribution:
      0.000% <= 0.055 milliseconds (cumulative count 27)
      50.000% <= 0.287 milliseconds (cumulative count 2635010)
      75.000% <= 0.335 milliseconds (cumulative count 3782931)
      87.500% <= 0.367 milliseconds (cumulative count 4459136)
      93.750% <= 0.391 milliseconds (cumulative count 4720397)
      96.875% <= 0.415 milliseconds (cumulative count 4855130)
      98.438% <= 0.439 milliseconds (cumulative count 4936478)
      99.219% <= 0.455 milliseconds (cumulative count 4965765)
      99.609% <= 0.471 milliseconds (cumulative count 4984031)
      99.805% <= 0.487 milliseconds (cumulative count 4993326)
      99.902% <= 0.495 milliseconds (cumulative count 4995579)
      99.951% <= 0.511 milliseconds (cumulative count 4997659)
      99.976% <= 0.551 milliseconds (cumulative count 4998848)
      99.988% <= 0.599 milliseconds (cumulative count 4999468)
      99.994% <= 0.631 milliseconds (cumulative count 4999722)
      99.997% <= 0.663 milliseconds (cumulative count 4999862)
      99.998% <= 0.695 milliseconds (cumulative count 4999924)
      99.999% <= 0.759 milliseconds (cumulative count 4999964)
      100.000% <= 0.807 milliseconds (cumulative count 4999982)
      100.000% <= 1.935 milliseconds (cumulative count 4999993)
      100.000% <= 2.071 milliseconds (cumulative count 4999996)
      100.000% <= 2.111 milliseconds (cumulative count 4999998)
      100.000% <= 2.119 milliseconds (cumulative count 4999999)
      100.000% <= 2.143 milliseconds (cumulative count 5000000)
      100.000% <= 2.143 milliseconds (cumulative count 5000000)
      
      Cumulative distribution of latencies:
      0.028% <= 0.103 milliseconds (cumulative count 1377)
      0.985% <= 0.207 milliseconds (cumulative count 49228)
      60.094% <= 0.303 milliseconds (cumulative count 3004705)
      96.325% <= 0.407 milliseconds (cumulative count 4816230)
      99.938% <= 0.503 milliseconds (cumulative count 4996887)
      99.991% <= 0.607 milliseconds (cumulative count 4999546)
      99.999% <= 0.703 milliseconds (cumulative count 4999927)
      100.000% <= 0.807 milliseconds (cumulative count 4999982)
      100.000% <= 0.903 milliseconds (cumulative count 4999987)
      100.000% <= 1.903 milliseconds (cumulative count 4999990)
      100.000% <= 2.007 milliseconds (cumulative count 4999995)
      100.000% <= 2.103 milliseconds (cumulative count 4999997)
      100.000% <= 3.103 milliseconds (cumulative count 5000000)
      
      Summary:
        throughput summary: 332955.97 requests per second
        latency summary (msec):
                avg       min       p50       p95       p99       max
              0.292     0.048     0.287     0.399     0.447     2.143

      The Summary shows ~330,000 requests/second. For metric details, see Common metrics in redis-benchmark benchmark results.
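To compare the two runs, you can save each redis-benchmark output to a file and extract the throughput summary line. This sketch assumes the default (non-quiet) output format shown above. The helper names and file names are hypothetical.

```shell
# Sketch: compare a NetACC benchmark run with a plain TCP run.
# throughput_of, ratio, and pct_reduction are hypothetical helper names.

# Read redis-benchmark output on stdin; print the value on the "throughput summary" line.
throughput_of() {
  awk '/throughput summary:/ { print $3; exit }'
}

# Print $1 / $2 with two decimals (for example, NetACC throughput / TCP throughput).
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Print the percentage reduction from baseline $1 to value $2, with one decimal.
pct_reduction() {
  awk -v base="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", (1 - cur / base) * 100 }'
}

# Example workflow (hypothetical file names; start redis-server with or without
# netacc_run to match each client run):
#   netacc_run redis-benchmark -h 172.17.0.90 -p 6379 -c 100 -n 5000000 -r 10000 --threads 4 -d 512 -t set > with_netacc.txt
#   redis-benchmark -h 172.17.0.90 -p 6379 -c 100 -n 5000000 -r 10000 --threads 4 -d 512 -t set > without_netacc.txt
#   ratio "$(throughput_of < with_netacc.txt)" "$(throughput_of < without_netacc.txt)"
```

Applied to the sample results above:

  • ratio 767341.94 332955.97 prints 2.30.

  • pct_reduction 0.292 0.126 prints 56.8 for average latency.

  • pct_reduction 0.447 0.183 prints 59.1 for p99 latency.

The maximum latency is higher in the NetACC run (9.583 ms versus 2.143 ms). The two server runs also used different save settings, as the host configuration "save" lines in the outputs show.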