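Accelerate network performance between service mesh Pods with eRDMA

Shared Memory Communication (SMC), provided by Alibaba Cloud Linux 3, is a high-performance kernel network protocol stack that is compatible with the socket layer and uses remote direct memory access (RDMA) to significantly improve network communication performance. However, when you use SMC to optimize network performance in a native ECS environment, you must carefully maintain the SMC whitelist and the configuration in each container network namespace to prevent SMC from unexpectedly falling back to TCP. ASM provides SMC optimization in a controlled network environment (within a cluster): it automatically optimizes traffic between service mesh Pods, so you do not need to manage the SMC configuration yourself.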

Prerequisites

A cluster is added to the ASM instance.

Limits

Note

SMC-based network performance optimization in ASM is currently in beta.

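Procedure

Step 1: Initialize the node environment

SMC uses eRDMA network interfaces to accelerate network performance. Before you enable it, prepare the nodes as follows.

  1. Upgrade the Alibaba Cloud Linux 3 kernel to 5.10.134-16.3 or later.

    Note

    For known issues with specific kernel versions and how to fix them, see the Known issues section.

    1. Run uname -r to check the current kernel version. If the version is 5.10.134-16.3 or later, no further action is required and you can skip the kernel upgrade.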

      $ uname -r
      5.10.134-16.3.al8.x86_64
    2. Check the kernel versions that are available for installation.

      $ sudo yum search kernel --showduplicates | grep kernel-5.10
      Last metadata expiration check: 3:01:27 ago on Tue 09 Apr 2024 07:40:15 AM CST.
      kernel-5.10.134-15.1.al8.x86_64 : The Linux kernel, based on version 5.10.134, heavily modified with backports
      kernel-5.10.134-15.2.al8.x86_64 : The Linux kernel, based on version 5.10.134, heavily modified with backports
      kernel-5.10.134-15.al8.x86_64 : The Linux kernel, based on version 5.10.134, heavily modified with backports
      kernel-5.10.134-16.1.al8.x86_64 : The Linux kernel, based on version 5.10.134, heavily modified with backports
      kernel-5.10.134-16.2.al8.x86_64 : The Linux kernel, based on version 5.10.134, heavily modified with backports
      kernel-5.10.134-16.3.al8.x86_64 : The Linux kernel, based on version 5.10.134, heavily modified with backports
      kernel-5.10.134-16.al8.x86_64 : The Linux kernel, based on version 5.10.134, heavily modified with backports
      [...]
    3. Install either the latest kernel version or a specific kernel version.

      1. Install the latest kernel version:

        $ sudo yum update kernel
      2. Or install a specific kernel version, for example kernel-5.10.134-16.3.al8.x86_64:

        $ sudo yum install kernel-5.10.134-16.3.al8.x86_64
    4. Restart the node. After the system restarts, run uname -r to verify that the kernel has been upgraded to the expected version.
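
The version check in this step can also be scripted. The following is a minimal sketch, assuming GNU sort with the -V (version sort) option is available on the node and that kernel releases carry a distribution suffix of the form .al8.x86_64, as in the listing above. kernel_at_least is a hypothetical helper name, not part of any Alibaba Cloud tooling.

```shell
# Hypothetical helper: succeeds if kernel release $1 is at least version $2.
kernel_at_least() {
    local current minimum
    # Strip the distribution suffix (assumed to start with ".al"), so that
    # "5.10.134-16.al8.x86_64" becomes "5.10.134-16" and is not misordered.
    current=${1%%.al*}
    minimum=$2
    # After version sorting, the minimum must come first (or be equal).
    [ "$(printf '%s\n%s\n' "$minimum" "$current" | sort -V | head -n 1)" = "$minimum" ]
}

# Example usage on a node:
#   kernel_at_least "$(uname -r)" 5.10.134-16.3 && echo "kernel OK" || echo "upgrade required"
```

Stripping the suffix before comparing matters because a release without a minor number, such as 5.10.134-16.al8.x86_64, would otherwise sort after 5.10.134-16.3.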

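  2. Configure an elastic network interface (ENI) whitelist for Terway, the ACK network plug-in, so that it does not take over the secondary eRDMA network interface that you are about to add. For details, see Configure a whitelist for elastic network interfaces (ENIs).

  3. Create a secondary eRDMA network interface for each node in the cluster and bind it to the node. For details, see Configure eRDMA on an existing instance.

    Note

    You only need to complete the steps that create and bind the secondary eRDMA network interface. In this scenario, the secondary eRDMA network interface must be in the same subnet as the primary network interface. The next step describes how to configure the secondary eRDMA network interface.

  4. Configure the secondary eRDMA network interface on the node.

    1. Save the following script to any directory on the node, and then make it executable: sudo chmod +x asm_erdma_eth_config.sh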

      asm_erdma_eth_config.sh

      #!/bin/bash
      
      #
      # Params
      #
      mode=
      mac=
      ipv4=
      mask=
      gateway=
      state=  # UP/DOWN
      
      #
      # Functions
      #
      function find_erdma_eth
      {
              echo "$(rdma link show | awk '{print $NF}')"
      }
      
      function get_erdma_eth_info
      {
              e=$1
              echo "Find ethernet device with erdma: $e"
      
              # UP/DOWN
              ip link show $e | grep -q "state UP" && state="UP" || state="DOWN"
      
              # MAC address
              mac=$(ip a show dev $e | grep ether | awk '{print $2}')
      
              # IPv4 address
              ipv4=$(curl --silent --show-error --connect-timeout 5 \
                      http://100.100.100.200/latest/meta-data/network/interfaces/macs/"$mac"/primary-ip-address \
                      2>&1)
              if [ $? -ne 0 ]; then
                      echo "failed to retrieve $e IPv4 address. Error: $ipv4"
                      exit 1
              fi
      
              # Mask
              mask=$(curl --silent --show-error --connect-timeout 5 \
                      http://100.100.100.200/latest/meta-data/network/interfaces/macs/"$mac"/netmask \
                      2>&1)
              if [ $? -ne 0 ]; then
                      echo "failed to retrieve $e network mask. Error: $mask"
                      exit 1
              fi
      
              # Gateway
              gateway=$(curl --silent --show-error --connect-timeout 5 \
                      http://100.100.100.200/latest/meta-data/network/interfaces/macs/"$mac"/gateway \
                      2>&1)
              if [ $? -ne 0 ]; then
                      echo "failed to retrieve $e gateway. Error: $gateway"
                      exit 1
              fi
              echo "- state <$state>, IPv4 <$ipv4>, mask <$mask>, gateway <$gateway>"
      }
      
      function set_erdma_eth
      {
              local eths=()
      
              # find all eths with erdma
              eths=($(find_erdma_eth))
              if [ ${#eths[@]} -eq 0 ]; then
                      echo "Can't find ethernet device with erdma capability"
                      exit 1
              fi
      
              for e in ${eths[@]}
              do
                      if [[ $e == "eth0" ]]; then
                              echo "Skip eth0, no need to configure."
                              continue
                      fi
                      get_erdma_eth_info $e
                      echo "Config.."
      
                      # Enable
                      if [ "$state" == "DOWN" ]; then
                              ip link set $e up 1>/dev/null 2>&1
                              if ip link show $e | grep -q "state UP" ; then
                                      echo "- successed to set $e UP"
                              else
                                      echo "- failed to set $e UP"
                                      exit 1
                              fi
                      else
                              echo "- $e has been activated, nothing to do."
                      fi
      
                      # Set IP & mask
                      if ! ip addr show $e | grep -q "inet\b"; then
                              local eth0_metric=$(ip route | grep "dev eth0 proto kernel scope link" \
                                                      | awk '/metric/ {print $NF}')
                              ip addr add $ipv4/$mask dev $e metric $((eth0_metric + 1)) 1>/dev/null 2>&1
                              if [ $? -eq 0 ]; then
                                      echo "- successed to configure $e IPv4/mask and direct route"
                              else
                                      echo "- failed to configure $e IPv4/mask and direct route"
                              fi
                      else
                              echo "- $e has been configured with IPv4(s), nothing to do."
                      fi
      
                      echo "Complete all configurations of $e"
              done
      }
      
      function reset_erdma_eth
      {
              local eths=()
      
              # Find all eths with erdma
              eths=($(find_erdma_eth))
              if [ ${#eths[@]} -eq 0 ]; then
                      echo "Can't find ethernet device with erdma capability"
                      exit 1
              fi
      
              for e in ${eths[@]}
              do
                      if [[ $e == "eth0" ]]; then
                              echo "Skip eth0, no need to configure."
                              continue
                      fi
                      get_erdma_eth_info $e
                      echo "Reset.."
      
                      # Remove IPv4
                      ip addr flush dev $e scope global 1>/dev/null 2>&1
                      if [ $? -eq 0 ]; then
                              echo "- successed to flush $e IPv4(s)"
                      else
                              echo "- failed to flush $e IPv4(s)"
                      fi
      
                      # Disable
                      ip link set $e down 1>/dev/null 2>&1
                      if [ $? -eq 0 ]; then
                              echo "- successed to set $e DOWN"
                      else
                              echo "- failed to set $e DOWN"
                      fi
                      echo "Complete all resets of $e"
              done
      }
      
      print_help() {
              echo "Usage: $0 [option]"
              echo "Options:"
              echo "  -s            Enable eRDMA-cap Eth and configure its IPv4"
              echo "  -r            Disable eRDMA-cap Eth and remove all its IPv4"
              echo "  -h, --help    Show this help message"
      }
      
      while [ "$1" != "" ]; do
              case $1 in
                      -s)
                              set_erdma_eth
                              exit 0
                              ;;
                      -r)
                              reset_erdma_eth
                              exit 0
                              ;;
                      -h | --help)
                              print_help
                              exit 0
                              ;;
                      *)
                              echo "Invalid option: $1"
                              print_help
                              exit 1
                              ;;
              esac
              shift
      done
      
      if [ -z "$1" ]; then
              print_help
              exit 1
      fi
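      Note

      This script is intended only for configuring the secondary eRDMA network interface in this context. It is not suitable for other network interface configuration scenarios.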
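
The script above finds eRDMA-capable interfaces by taking the last field of each line printed by rdma link show. The following is a minimal sketch of a slightly stricter variant that prints only the name following the netdev keyword, so links without a bound network device are skipped. parse_rdma_netdevs is a hypothetical helper name, and the sample line in the usage comment is an illustrative assumption about the output format, not captured from a real node.

```shell
# Hypothetical helper: reads `rdma link show` output on stdin and prints the
# network device bound to each RDMA link, one per line.
parse_rdma_netdevs() {
    awk '{
        # Look for the "netdev" keyword and print the field that follows it.
        for (i = 1; i < NF; i++)
            if ($i == "netdev") print $(i + 1)
    }'
}

# Example usage on a node:
#   rdma link show | parse_rdma_netdevs
# For an assumed line such as
#   link erdma_0/1 state ACTIVE physical_state LINK_UP netdev eth2
# the helper prints the device name that follows "netdev".
```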

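    2. Run sudo ./asm_erdma_eth_config.sh -s to set the newly added secondary eRDMA network interface to the UP state and assign it an IPv4 address. The expected output is similar to the following: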

      $ sudo ./asm_erdma_eth_config.sh -s
      Find ethernet device with erdma: eth2
      - state <DOWN>, IPv4 <192.168.x.x>, mask <255.255.255.0>, gateway <192.168.x.x>
      Config..
      - successed to set eth2 UP
      - successed to configure eth2 IPv4/mask and direct route
      Complete all configurations of eth2
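    3. (Optional) The preceding configuration of the secondary eRDMA network interface must be repeated each time the node restarts. To apply it automatically at startup, create a systemd service as follows.

      1. Add the following asm_erdma_eth_config.service file to the /etc/systemd/system directory on the node, and replace /path/to/asm_erdma_eth_config.sh with the actual path of the asm_erdma_eth_config.sh script on the node.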

        asm_erdma_eth_config.service

        [Unit]
        Description=Run asm_erdma_eth_config.sh script after network is up
        Wants=network-online.target
        After=network-online.target
        
        [Service]
        Type=oneshot
        ExecStart=/bin/sh /path/to/asm_erdma_eth_config.sh -s
        RemainAfterExit=yes
        
        [Install]
        WantedBy=multi-user.target
      2. Enable asm_erdma_eth_config.service.

        sudo systemctl daemon-reload
        sudo systemctl enable asm_erdma_eth_config.service

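        From then on, the network interface is configured automatically when the node starts. After the node starts, run sudo systemctl status asm_erdma_eth_config.service to check the status of the service. The expected state is active, and the output is similar to the following.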

        # sudo systemctl status asm_erdma_eth_config.service
        ● asm_erdma_eth_config.service - Run asm_erdma_eth_config.sh script after network is up
           Loaded: loaded (/etc/systemd/system/asm_erdma_eth_config.service; enabled; vendor preset: enabled)
           Active: active (exited) since [time]
         Main PID: 1689 (code=exited, status=0/SUCCESS)
            Tasks: 0 (limit: 403123)
           Memory: 0B
           CGroup: /system.slice/asm_erdma_eth_config.service
        
        [time] <hostname> sh[1689]: Find ethernet device with erdma: eth2
        [time] <hostname> systemd[1]: Starting Run asm_erdma_eth_config.sh script after network is up...
        [time] <hostname> sh[1689]: - state <DOWN>, IPv4 <192.168.x.x>, mask <255.255.255.0>, gateway <192.168.x.x>
        [time] <hostname> sh[1689]: Config..
        [time] <hostname> sh[1689]: - successed to set eth2 UP
        [time] <hostname> sh[1689]: - successed to configure eth2 IPv4/mask and direct route
        [time] <hostname> sh[1689]: Complete all configurations of eth2
        [time] <hostname> systemd[1]: Started Run asm_erdma_eth_config.sh script after network is up.
      3. Conversely, if you no longer need asm_erdma_eth_config.service, run sudo systemctl disable asm_erdma_eth_config.service so that it no longer runs at startup.

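Step 2: Deploy the test application

  1. Enable automatic sidecar injection for the default namespace, which is used for the test. For details, see Enable automatic injection.

  2. Create a file named fortioserver.yaml with the following content.

    fortioserver.yaml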

    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: fortioserver
    spec:
      ports:
      - name: http-echo
        port: 8080
        protocol: TCP
      - name: tcp-echoa
        port: 8078
        protocol: TCP
      - name: grpc-ping
        port: 8079
        protocol: TCP
      selector:
        app: fortioserver
      type: ClusterIP
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: fortioserver
      name: fortioserver
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: fortioserver
      template:
        metadata:
          labels:
            app: fortioserver
          annotations:
            sidecar.istio.io/inject: "true"
            sidecar.istio.io/proxyCPULimit: 2000m
            proxy.istio.io/config: |
              concurrency: 2 
        spec:
          shareProcessNamespace: true
          containers:
          - name: captured
            image: fortio/fortio:latest_release
            ports:
            - containerPort: 8080
              protocol: TCP
            - containerPort: 8078
              protocol: TCP
            - containerPort: 8079
              protocol: TCP
          - name: anolis
            securityContext:
              runAsUser: 0
            image: openanolis/anolisos:latest
            args:
            - /bin/sleep
            - 3650d
    ---
    apiVersion: v1
    kind: Service
    metadata:
      annotations:
          service.beta.kubernetes.io/alibaba-cloud-loadbalancer-health-check-switch: "off"
      name: fortioclient
    spec:
      ports:
      - name: http-report
        port: 8080
        protocol: TCP
      selector:
        app: fortioclient
      type: LoadBalancer
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: fortioclient
      name: fortioclient
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: fortioclient
      template:
        metadata:
          annotations:
            sidecar.istio.io/inject: "true"
            sidecar.istio.io/proxyCPULimit: 4000m
            proxy.istio.io/config: |
               concurrency: 4
          labels:
            app: fortioclient
        spec:
          shareProcessNamespace: true
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - fortioserver
                topologyKey: "kubernetes.io/hostname"
          containers:
          - name: captured
            volumeMounts:
            - name: shared-data
              mountPath: /var/lib/fortio
            image: fortio/fortio:latest_release
            ports:
            - containerPort: 8080
              protocol: TCP
          - name: anolis
            securityContext:
              runAsUser: 0
            image: openanolis/anolisos:latest
            args:
            - /bin/sleep
            - 3650d
          volumes:
          - name: shared-data
            emptyDir: {}
    
  3. Using the kubeconfig file of the ACK cluster, run the following command to deploy the test application.

    kubectl apply -f fortioserver.yaml
  4. Run the following command to check the status of the test application.

    kubectl get pods | grep fortio

    Expected output:

    NAME                            READY   STATUS    RESTARTS      
    fortioclient-8569b98544-9qqbj   3/3     Running   0
    fortioserver-7cd5c46c49-mwbtq   3/3     Running   0

    The output shows that both applications have started normally.
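
The readiness check above can be automated by parsing the same output. The following is a minimal sketch, assuming the column layout shown in the expected output (NAME, READY, STATUS, RESTARTS). all_pods_ready is a hypothetical helper name.

```shell
# Hypothetical helper: reads `kubectl get pods` output on stdin and succeeds
# only if at least one Pod is listed and every listed Pod is Running with all
# of its containers ready (READY column of the form "n/n").
all_pods_ready() {
    awk '
        NF == 0      { next }                  # skip blank lines
        $1 == "NAME" { next }                  # skip the header row
        {
            seen = 1
            split($2, r, "/")                  # READY column, for example "3/3"
            if (r[1] != r[2] || $3 != "Running") bad = 1
        }
        END { exit (seen && !bad) ? 0 : 1 }
    '
}

# Example usage:
#   kubectl get pods | grep fortio | all_pods_ready && echo "test application is ready"
```

As an alternative to parsing, kubectl wait --for=condition=Ready pod -l app=fortioserver --timeout=120s blocks until the matching Pod is ready; the 120s timeout here is an arbitrary illustrative value.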

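Step 3: Run the test in the base environment to obtain baseline results

After the fortio application starts, it listens on port 8080. Accessing this port opens the fortio console page. To generate test traffic, forward the fortioclient port to the machine you are working on, and then open the fortio console page on that machine.

  1. Using the kubeconfig file of the ACK cluster, run the following command to forward port 8080 of the fortioclient Service to local port 8080.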

    kubectl port-forward service/fortioclient 8080:8080
  2. In a browser, open http://localhost:8080/fortio to access the fortio client console, and then modify the configuration.

    Set the parameters on the page as described in the following table.

    Parameter                           Example value
    URL                                 http://fortioserver:8080/echo
    QPS                                 100000
    Duration                            30s
    Threads/Simultaneous connections    64
    Payload                             The following 128-byte string:

    xhsyL4ELNoUUbC3WEyvaz0qoHcNYUh0j2YHJTpltJueyXlSgf7xkGqc5RcSJBtqUENNjVHNnGXmoMyILWsrZL1O2uordH6nLE7fY6h5TfTJCZtff3Wib8YgzASha8T8g
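
If you prefer to test with a different payload of the same size, one can be generated on the command line. The following is a minimal sketch, assuming /dev/urandom and a head implementation that supports -c are available. make_payload is a hypothetical helper name.

```shell
# Hypothetical helper: prints N random alphanumeric bytes (default: 128).
make_payload() {
    local size
    size=${1:-128}
    # Read a bounded chunk of random bytes (16x the target size leaves a wide
    # margin), keep only alphanumeric characters, then cut to the exact size.
    head -c "$((size * 16))" /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | head -c "$size"
}

# Example usage:
#   make_payload 128    # paste the result into the Payload field
```

Use the same payload for the baseline run and the accelerated run so that the two results remain comparable.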

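  3. After the configuration is complete, click Start at the bottom of the page to begin the test. The test is finished when the progress bar completes.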

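    After the test finishes, the page displays the results. The following figure is for reference only. Actual results depend on your environment.

    [Figure: sample fortio result page for the baseline test]

    In the result chart, the horizontal axis is the request latency and the vertical axis is the number of completed requests. The distribution of the bars along the horizontal axis shows how request latency is distributed, and the purple curve shows the number of requests completed within each response-time range. The top of the chart also lists the P50, P75, P90, P99, and P99.9 request latencies. After you obtain the baseline data, enable SMC for the applications to prepare for performance verification with SMC acceleration.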

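Step 4: Enable SMC acceleration for the ASM instance and workloads

  1. Using the kubeconfig file of the service mesh, edit the mesh configuration and add smcEnabled: true to enable SMC acceleration.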

    $ kubectl edit asmmeshconfig
    
    apiVersion: istio.alibabacloud.com/v1beta1
    kind: ASMMeshConfig
    metadata:
      name: default
    spec:
      ambientConfiguration:
        redirectMode: ""
        waypoint: {}
        ztunnel: {}
      cniConfiguration:
        enabled: true
        repair: {}
      smcEnabled: true
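  2. Using the kubeconfig file of the ACK cluster, modify the fortioserver and fortioclient Deployments to add an annotation to their Pods.

    After acceleration is enabled for the mesh instance, you must also enable it for the workloads. To do so, add an annotation with the key smc.asm.alibabacloud.com/enabled and the value true to the Pods. Acceleration must be enabled for the workloads on both ends of the connection that you want to optimize.

    1. Edit the fortioclient Deployment.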

      $ kubectl edit deployment fortioclient
      
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        ......
        name: fortioclient
      spec:
        ......
        template:
          metadata:
            ......
            annotations:
              smc.asm.alibabacloud.com/enabled: "true"
              
    2. Edit the fortioserver Deployment.

      $ kubectl edit deployment fortioserver
      
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        ......
        name: fortioserver
      spec:
        ......
        template:
          metadata:
            ......
            annotations:
              smc.asm.alibabacloud.com/enabled: "true"
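
The interactive edits in this step can also be applied non-interactively with kubectl patch. The following is a minimal sketch, assuming the ASMMeshConfig resource is named default as in the example above. mesh_smc_patch and workload_smc_patch are hypothetical helper names that only build the JSON merge patch bodies.

```shell
# Hypothetical helper: merge patch that sets spec.smcEnabled on the ASMMeshConfig.
mesh_smc_patch() {
    printf '{"spec":{"smcEnabled":%s}}' "$1"
}

# Hypothetical helper: merge patch that sets the SMC annotation on the Pod
# template of a Deployment. Existing annotations are preserved by the merge.
workload_smc_patch() {
    printf '{"spec":{"template":{"metadata":{"annotations":{"smc.asm.alibabacloud.com/enabled":"%s"}}}}}' "$1"
}

# Example usage:
#   With the kubeconfig file of the service mesh:
#     kubectl patch asmmeshconfig default --type merge -p "$(mesh_smc_patch true)"
#   With the kubeconfig file of the ACK cluster:
#     for d in fortioclient fortioserver; do
#         kubectl patch deployment "$d" --type merge -p "$(workload_smc_patch true)"
#     done
```

Passing false instead of true produces the corresponding patches for turning the setting off again, on the assumption that the same fields control both directions.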
              

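Step 5: Run the test in the accelerated environment to view the optimized results

Modifying the Deployments restarts the workloads. Therefore, repeat the fortioclient port forwarding described in Step 3, start the test again, and wait for it to finish before you view the results.

[Figure: sample fortio result page with SMC acceleration enabled]

Compared with the results before acceleration, latency decreases and QPS increases noticeably after ASM SMC acceleration is enabled.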

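Known issues

  1. After SMC acceleration is enabled on Alibaba Cloud Linux 3 with kernel version 5.10.134-16.3, restarting a Pod causes the system to report an error similar to unregister_netdevice: waiting for eth* to become free. Usage count = *, and the Pod cannot be deleted.

    This issue occurs because the smc kernel module does not correctly release its reference count on the network interface. Install the following hotfix to resolve it. Later kernel versions do not require this step. For more information about kernel hotfixes, see Kernel hotfix operations.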

    $ sudo yum install kernel-hotfix-16664924-5.10.134-16.3
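
Because the hotfix applies only to the 5.10.134-16.3 kernel, a node bootstrap script can guard the installation on the running kernel release. The following is a minimal sketch. needs_smc_hotfix is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds only if the given kernel release is a
# 5.10.134-16.3 build, the version affected by the known issue above.
needs_smc_hotfix() {
    case $1 in
        5.10.134-16.3|5.10.134-16.3.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage on a node:
#   if needs_smc_hotfix "$(uname -r)"; then
#       sudo yum install kernel-hotfix-16664924-5.10.134-16.3
#   fi
```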