Run gRPC+Verbs applications on eRDMA nodes in ACK

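In an eRDMA environment, you can replace conventional gRPC-only communication with RDMA-based communication, such as the gRPC+Verbs protocol. RDMA transfers data over the network more efficiently and reduces communication latency between parameter servers and workers, which speeds up the overall distributed training process.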

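Prerequisites

  • Arena is installed in hostNetwork mode. For details, see Configure the Arena client.

  • kubectl is connected to the cluster. For details, see Obtain the cluster KubeConfig and connect to the cluster by using kubectl.

  • The eRDMA Device Plugin is deployed in the cluster. For details, see Subsequent operations.

    YAML file for deploying the eRDMA Device Plugin: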

    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: rdma-devices
      namespace: kube-system
    data:
      config.json: |
        {
            "mode" : "hca",
            "deviceType" : "eRDMA"
        }
    ---
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: rdma-sriov-dp-ds
      namespace: kube-system
      labels:
        app: rdma-device-plugin
    spec:
      selector:
        matchLabels:
          app: rdma-device-plugin
      template:
        metadata:
          labels:
            app: rdma-device-plugin
            name: rdma-sriov-dp-ds
        spec:
          hostNetwork: true
          nodeSelector:
            aliyun.accelerator/erdma: "true"
          tolerations:
          - key: CriticalAddonsOnly
            operator: Exists
          containers:
          - image: registry-cn-beijing.ack.aliyuncs.com/acs/k8s-rdma-sriov-dev-plugin:v1.0.0-b3dcbc5-aliyun
            name: k8s-rdma-sriov-dp-ds
            imagePullPolicy: Always
            resources:
              limits:
                memory: "300Mi"
                cpu: "300m"
              requests:
                memory: "300Mi"
                cpu: "300m"
            securityContext:
              privileged: true
            volumeMounts:
              - name: device-plugin
                mountPath: /var/lib/kubelet/device-plugins
              - name: config
                mountPath: /k8s-rdma-sriov-dev-plugin
          volumes:
            - name: device-plugin
              hostPath:
                path: /var/lib/kubelet/device-plugins
            - name: config
              configMap:
                name: rdma-devices
                items:
                - key: config.json
                  path: config.json
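
After the manifest above is applied, it can be useful to confirm that the Device Plugin pods are ready on every eRDMA node. The following is a minimal sketch and not part of the original procedure. It assumes that the output of `kubectl get daemonset rdma-sriov-dp-ds -n kube-system -o json` is available as text. The helper name `daemonset_ready` and the sample status values are hypothetical; `desiredNumberScheduled` and `numberReady` are standard DaemonSet status fields.

```python
import json


def daemonset_ready(daemonset_json: str) -> bool:
    """Return True when every scheduled pod of the DaemonSet reports ready."""
    status = json.loads(daemonset_json).get("status", {})
    desired = status.get("desiredNumberScheduled", 0)
    ready = status.get("numberReady", 0)
    # A DaemonSet that matches no nodes (desired == 0) is treated as not ready,
    # because no node would then run the device plugin.
    return desired > 0 and ready == desired


# Hypothetical sample, shaped like the output of:
#   kubectl get daemonset rdma-sriov-dp-ds -n kube-system -o json
sample = json.dumps({
    "metadata": {"name": "rdma-sriov-dp-ds", "namespace": "kube-system"},
    "status": {"desiredNumberScheduled": 2, "numberReady": 2},
})
print(daemonset_ready(sample))
```

If the function returns False, a likely first check is whether the target nodes carry the `aliyun.accelerator/erdma: "true"` label that the DaemonSet's nodeSelector requires.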
    

Procedure

The following steps use a tf_cnn_benchmark job as an example.

  1. Submit a TFJob that uses eRDMA.

    arena submit tfjob --name=tf-ps-benchmark \
    --gpus=8 --workers=1 --ps=1 \
    --rdma \
    --hostNetwork true \
    --psImage=registry.cn-beijing.aliyuncs.com/acs/tf-benchmark:1.0 \
    --image=registry.cn-beijing.aliyuncs.com/acs/tf-benchmark:1.0 \
    "CUDA_VISIBLE_DEVICES= python benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
      --server_protocol=grpc+verbs \
      --model=resnet50 \
      --batch_size=16 \
      --data_format=NHWC"
  2. Query the eRDMA NICs.

    $ ibv_devinfo
    hca_id:	rocep156s0
    	transport:			eRDMA
    	fw_ver:				0.2.0
    	node_guid:			0216:3eff:fe2c:b8f3
    	sys_image_guid:			0216:3eff:fe2c:b8f3
    	vendor_id:			0x1ded
    	vendor_part_id:			4223
    	hw_ver:				0x0
    	phys_port_cnt:			1
    		port:	1
    			state:			PORT_DOWN (1)
    			max_mtu:		1024 (3)
    			active_mtu:		1024 (3)
    			sm_lid:			0
    			port_lid:		0
    			port_lmc:		0x00
    			link_layer:		Ethernet
    
    hca_id:	rocep26s0
    	transport:			eRDMA
    	fw_ver:				0.2.0
    	node_guid:			0216:3eff:fe10:f8b0
    	sys_image_guid:			0216:3eff:fe10:f8b0
    	vendor_id:			0x1ded
    	vendor_part_id:			4223
    	hw_ver:				0x0
    	phys_port_cnt:			1
    		port:	1
    			state:			PORT_ACTIVE (4)
    			max_mtu:		1024 (3)
    			active_mtu:		1024 (3)
    			sm_lid:			0
    			port_lid:		0
    			port_lmc:		0x00
    			link_layer:		Ethernet
  3. Monitor eRDMA traffic.

    $ eadm stat -d rocep26s0 -l
    Monitoring rocep26s0...    (press CTRL-C to stop)
    
     15:59:56  rx:           0 B/s     0 p/s          tx:           0 B/s     0 p/s
    
    
     rocep26s0  /  traffic statistics
    
                               rx         |       tx
    --------------------------------------+------------------
      bytes                    11.06 GiB  |       11.18 GiB
    --------------------------------------+------------------
              max            52.43 MiB/s  |     52.10 MiB/s
          average             4.03 MiB/s  |      4.07 MiB/s
              min                  0 B/s  |           0 B/s
    --------------------------------------+------------------
      packets                    8406769  |         8546764
    --------------------------------------+------------------
              max              38990 p/s  |       37488 p/s
          average               2988 p/s  |        3038 p/s
              min                  0 p/s  |           0 p/s
    --------------------------------------+------------------
      time                 46.88 minutes

    The preceding output shows that real-time eRDMA traffic can be monitored.
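
The outputs of steps 2 and 3 can also be checked programmatically. The sketch below is an illustration under stated assumptions, not part of the original procedure. It picks the devices whose port state is PORT_ACTIVE from `ibv_devinfo` text (in the example, `rocep26s0`, the device passed to `eadm stat -d`), and checks that the reported average rate is consistent with the reported byte total. The function names `active_devices` and `total_gib` are hypothetical, and the sample text is an abbreviated copy of the output in step 2.

```python
import re


def active_devices(ibv_devinfo_output: str) -> list[str]:
    """Return the hca_id of every device that has a port in PORT_ACTIVE state."""
    active = []
    current = None
    for line in ibv_devinfo_output.splitlines():
        header = re.match(r"\s*hca_id:\s*(\S+)", line)
        if header:
            # A new device section starts here.
            current = header.group(1)
        elif current and re.match(r"\s*state:\s*PORT_ACTIVE\b", line):
            if current not in active:
                active.append(current)
    return active


def total_gib(average_mib_per_s: float, minutes: float) -> float:
    """Total volume in GiB implied by an average rate sustained over a duration."""
    return average_mib_per_s * minutes * 60 / 1024


# Abbreviated copy of the ibv_devinfo output from step 2.
sample = """\
hca_id: rocep156s0
    transport: eRDMA
        port: 1
            state: PORT_DOWN (1)

hca_id: rocep26s0
    transport: eRDMA
        port: 1
            state: PORT_ACTIVE (4)
"""

print(active_devices(sample))
# rx average of 4.03 MiB/s over 46.88 minutes, taken from step 3.
print(round(total_gib(4.03, 46.88), 2))
```

With the values from step 3, the rx average implies roughly 11.07 GiB and the tx average (4.07 MiB/s) roughly 11.18 GiB, which agrees with the reported totals of 11.06 GiB and 11.18 GiB once rounding of the displayed averages is taken into account.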