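Deploy the DeepSeek-R1 model on RDS Custom AI nodes - Alibaba Cloud Help Center

RDS Custom provides an image with the DeepSeek-R1 model already deployed, so you can quickly build a DeepSeek-R1 inference service on an RDS Custom instance. The image works out of the box, with no additional configuration required.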

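Procedure

Step 1: Create an RDS Custom AI node instance

Important

RDS Custom AI nodes are currently available only to whitelisted users. To use them, contact us.

To create an RDS Custom instance, see Create an RDS Custom instance. The key parameters are described below.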

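Parameter	Description
Instance type	Select AI node as the architecture, then select one of the following instance types. rds.nv8.8xlarge.8cm: uses NVIDIA H20 GPUs and is suited to LLMs with 70B parameters or more; recommended for the full DeepSeek-R1 model (671B parameters). rds.ns8.8xlarge.8cm: uses NVIDIA L20 GPUs and is suited to LLMs with fewer than 70B parameters; recommended for the distilled DeepSeek-R1 model (32B parameters). Note: When you deploy the DeepSeek-R1 model, we recommend that you purchase two RDS Custom instances to build a primary/secondary cluster and interconnect them over eRDMA.
Image	Select DeepSeek R1 from the default images. RDS Custom provides an image package that contains DeepSeek-R1 and all the resources it requires.
System disk	Select an ESSD of at least 200 GiB with performance level PL0 or PL1.
Data disk	Select a PL1 ESSD of at least 1000 GiB. Deploying the DeepSeek-R1 model requires at least 1000 GiB on a PL1 ESSD to provide enough storage space and good I/O performance. If storage runs low, you can extend the existing disk or attach a new data disk.
Security group	The selected security group must open port 8000 so that the DeepSeek service can be accessed later. To create and configure a security group, see Manage security groups and Manage security group rules.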

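Step 2: Obtain the data disk device name and the eth1 IP address of the RDS Custom instance

Connect to the RDS Custom instance.
Run the following command to obtain the device name of the data disk.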
```
fdisk -l
```
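If only one data disk is attached, its device name is usually /dev/nvme1n1.
(Optional) For a primary/secondary cluster, run the following command on the primary node to obtain the IP address of its eth1 network interface.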
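To pick out the data disk in a script rather than reading the `fdisk -l` listing by eye, one option is to filter whole disks by size. The following is a minimal sketch and not part of the RDS Custom image: the helper name `large_disks` is hypothetical, and it assumes three input columns (name, size in bytes, type) as printed by `lsblk -b -d -n -o NAME,SIZE,TYPE`.

```shell
# Hypothetical helper: print the /dev paths of whole disks that are at least
# MIN_GIB GiB in size. Reads "NAME SIZE_IN_BYTES TYPE" lines from stdin.
large_disks() {
    min_gib=$1
    awk -v min_gib="$min_gib" '
        $3 == "disk" && $2 >= min_gib * 1024 * 1024 * 1024 { print "/dev/" $1 }
    '
}
```

For example, `lsblk -b -d -n -o NAME,SIZE,TYPE | large_disks 1000` would list only disks that meet the 1000 GiB data disk requirement.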
```
ifconfig eth1
```
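When the address is needed in a script, it may be easier to read it from the `ip` command than to parse `ifconfig` output. The sketch below assumes the iproute2 `ip` command is installed and that the one-line format of `ip -4 -o addr show` places the CIDR address in the fourth field; `first_ipv4` is a hypothetical helper name.

```shell
# Hypothetical helper: extract the first IPv4 address (without the prefix
# length) from the one-line output of `ip -4 -o addr show dev <iface>`.
first_ipv4() {
    awk '$3 == "inet" { split($4, parts, "/"); print parts[1]; exit }'
}
```

A possible use is `PRIMARY_IP=$(ip -4 -o addr show dev eth1 | first_ipv4)`, where `PRIMARY_IP` is an illustrative variable name.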

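Step 3: Start the DeepSeek-R1 model

Note

A primary/secondary cluster and a single-node instance are started differently. Follow the procedure that matches your deployment.

Primary/secondary cluster

Start the DeepSeek-R1 model on the primary node.

Run the following command to initialize the environment.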
```
sh init_ds_env.sh eth1 /dev/nvme1n1
```
Run the following command to start the DeepSeek-R1 model.

```
docker run \
    --entrypoint sh -dit \
    --network host \
    --name ds-node \
    --device nvidia.com/gpu=all \
    --ipc host \
    -e NCCL_DEBUG=TRACE \
    -e NCCL_SOCKET_IFNAME=eth1 -e SGLANG_SET_CPU_AFFINITY=1 \
    -e NCCL_IB_DISABLE=0 -e NCCL_IB_HCA=erdma_0:1 -e SGL_ENABLE_JIT_DEEPGEMM=1 \
    --privileged \
    --pids-limit 65535 --restart unless-stopped \
    -v "/mnt/deepseek-ai/DeepSeek-R1:/sgl-workspace/deepseek-ai/DeepSeek-R1" \
    "docker.io/lmsysorg/sglang:v0.4.4.post1" \
    -c "python3 sglang/scripts/export_deepseek_nextn.py \
        --input-dir /sgl-workspace/deepseek-ai/DeepSeek-R1 \
        --output-dir /sgl-workspace/deepseek-ai/DeepSeek-R1-NextN \
    && python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1 \
        --tp 16 --dist-init-addr <primary-node-eth1-IP>:5000 --nnodes 2 --node-rank 0 \
        --trust-remote-code --port 8000 --context-length 163840 \
        --enable-metrics --enable-torch-compile --torch-compile-max-bs 8 \
        --speculative-algo EAGLE --speculative-draft /sgl-workspace/deepseek-ai/DeepSeek-R1-NextN \
        --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 \
        --enable-flashinfer-mla --reasoning-parser deepseek-r1 --host 0.0.0.0 \
        2>&1 | tee /sgl-workspace/server.log"
```

Note

Replace <primary-node-eth1-IP> in the command with the eth1 IP address of the primary node that you obtained in Step 2.
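Because the address is pasted into a long command, a quick format check before launching can catch a placeholder that was left unreplaced. The following is a minimal sketch; `is_ipv4` is a hypothetical helper, and it checks only the dotted-quad format, not whether the address is reachable.

```shell
# Hypothetical helper: succeed only if the argument looks like a dotted-quad
# IPv4 address with every octet in the range 0-255.
is_ipv4() {
    printf '%s\n' "$1" | awk -F. '
        NF != 4 { exit 1 }
        {
            for (i = 1; i <= 4; i++)
                if ($i !~ /^[0-9]+$/ || $i > 255) exit 1
        }
    '
}
```

For example, `is_ipv4 "$PRIMARY_IP" || echo "unexpected address" >&2`, where `PRIMARY_IP` is an illustrative variable holding the value you intend to substitute.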

Start the DeepSeek-R1 model on the secondary node.

Run the following command to initialize the environment.
```
sh init_ds_env.sh eth1 /dev/nvme1n1
```
Run the following command to start the DeepSeek-R1 model.

```
docker run \
    --entrypoint sh -dit \
    --network host \
    --name ds-node \
    --device nvidia.com/gpu=all \
    --ipc host \
    -e NCCL_DEBUG=TRACE \
    -e NCCL_SOCKET_IFNAME=eth1 -e SGLANG_SET_CPU_AFFINITY=1 \
    -e NCCL_IB_DISABLE=0 -e NCCL_IB_HCA=erdma_0:1 -e SGL_ENABLE_JIT_DEEPGEMM=1 \
    --privileged \
    --pids-limit 65535 --restart unless-stopped \
    -v "/mnt/deepseek-ai/DeepSeek-R1:/sgl-workspace/deepseek-ai/DeepSeek-R1" \
    "docker.io/lmsysorg/sglang:v0.4.4.post1" \
    -c "python3 sglang/scripts/export_deepseek_nextn.py \
        --input-dir /sgl-workspace/deepseek-ai/DeepSeek-R1 \
        --output-dir /sgl-workspace/deepseek-ai/DeepSeek-R1-NextN \
    && python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1 \
        --tp 16 --dist-init-addr <primary-node-eth1-IP>:5000 --nnodes 2 --node-rank 1 \
        --trust-remote-code --port 8000 --context-length 163840 \
        --enable-metrics --enable-torch-compile --torch-compile-max-bs 8 \
        --speculative-algo EAGLE --speculative-draft /sgl-workspace/deepseek-ai/DeepSeek-R1-NextN \
        --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 \
        --enable-flashinfer-mla --reasoning-parser deepseek-r1 \
        2>&1 | tee /sgl-workspace/server.log"
```

Note

Replace <primary-node-eth1-IP> in the command with the eth1 IP address of the primary node that you obtained in Step 2.
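The multi-node flags are the same on both nodes except for `--node-rank`, which is 0 on the primary node and 1 on the secondary node. The sketch below shows how they fit together. It assumes 8 GPUs per node, which is inferred from the single-node command using `--tp 8` rather than stated in this topic, and `dist_args` is a hypothetical helper name.

```shell
# Hypothetical helper: print the multi-node flags for sglang.launch_server.
# Arguments: primary-node IP, rank of this node (0 = primary, 1 = secondary).
# Assumes 2 nodes with 8 GPUs each, so tensor parallelism is 2 * 8 = 16.
dist_args() {
    primary_ip=$1
    node_rank=$2
    nnodes=2
    gpus_per_node=8
    printf '%s\n' "--tp $((nnodes * gpus_per_node)) --dist-init-addr ${primary_ip}:5000 --nnodes ${nnodes} --node-rank ${node_rank}"
}
```

Both nodes pass the primary node's address to `--dist-init-addr`, so the secondary node needs the IP address obtained on the primary node, not its own.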

Single-node instance

Run the following command to initialize the environment.
```
sh init_ds_env.sh eth1 /dev/nvme1n1
```
Run the following command to start the DeepSeek-R1 model.

```
docker run \
    --replace \
    --entrypoint sh -dit \
    --network host \
    --name ds-node \
    --device nvidia.com/gpu=all \
    --ipc host \
    -e NCCL_DEBUG=TRACE \
    -e NCCL_SOCKET_IFNAME=eth1 -e SGLANG_SET_CPU_AFFINITY=1 \
    -e NCCL_IB_DISABLE=0 -e NCCL_IB_HCA=erdma_0:1 -e SGL_ENABLE_JIT_DEEPGEMM=1 \
    --privileged \
    --pids-limit 65535 --restart unless-stopped \
    -v "/mnt/deepseek-ai/DeepSeek-R1:/sgl-workspace/deepseek-ai/DeepSeek-R1" \
    "docker.io/lmsysorg/sglang:v0.4.4.post1" \
    -c "python3 sglang/scripts/export_deepseek_nextn.py \
        --input-dir /sgl-workspace/deepseek-ai/DeepSeek-R1 \
        --output-dir /sgl-workspace/deepseek-ai/DeepSeek-R1-NextN \
    && python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1 \
        --tp 8 --trust-remote-code --port 8000 --context-length 163840 \
        --enable-metrics --enable-torch-compile --torch-compile-max-bs 8 \
        --speculative-algo EAGLE --speculative-draft /sgl-workspace/deepseek-ai/DeepSeek-R1-NextN \
        --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 \
        --enable-flashinfer-mla --reasoning-parser deepseek-r1 \
        --mem-fraction-static 0.9 --host 0.0.0.0 \
        2>&1 | tee /sgl-workspace/server.log"
```

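Step 4: View the model startup log

Run the following command to view the startup log of the DeepSeek-R1 model. For a primary/secondary cluster, run it on the primary node.

```
docker exec -it ds-node bash -c "tail -f /sgl-workspace/server.log"
```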
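Loading a model of this size can take a while, so instead of watching the log you may prefer to poll until the service responds. The following is a minimal sketch of a generic retry loop; `wait_until` is a hypothetical helper, and the `/health` endpoint mentioned after the code is an assumption about the SGLang server, not something this topic documents.

```shell
# Hypothetical helper: run a probe command up to ATTEMPTS times, sleeping
# DELAY seconds between tries; succeed as soon as the probe succeeds.
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        attempts=$((attempts - 1))
        if [ "$attempts" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

For example, `wait_until 120 30 curl -sf -o /dev/null http://localhost:8000/health` would probe every 30 seconds for up to an hour. Adjust the attempt count and delay to your environment.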


Model verification

Run the following command to verify the model. For a primary/secondary cluster, run it on the primary node.

```
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "deepseek-ai/DeepSeek-R1",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is the capital of France?"}
        ],
        "stream": false
    }'
```
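The endpoint returns a JSON document in the OpenAI-compatible chat completions format. To print only the model's reply, the response can be piped through a small filter. The sketch below assumes `python3` is available on the host and that the reply is at `choices[0].message.content`; `extract_answer` is a hypothetical helper name.

```shell
# Hypothetical helper: read a chat-completions JSON response on stdin and
# print the assistant's reply (choices[0].message.content).
extract_answer() {
    python3 -c 'import json, sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
}
```

To use it, add `-s` to the curl command above so that progress output is suppressed, then append `| extract_answer`.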


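(Optional) Performance test of the full DeepSeek-R1 model

Performance test

Important

The stress testing tool and its parameters significantly affect the results, so take both into account when you analyze the data.

This topic uses sglang.bench_serving for testing. When --backend sglang is specified, the tool clears the KV cache, so the stress test runs with no cache hits at all.

Associate an elastic IP address (EIP) with the RDS Custom instance and connect to the instance through the EIP. For details, see Connect to an RDS Custom instance.

In the container that runs the DeepSeek-R1 model (ds-node in this topic), run the following stress test script.

Note

For a primary/secondary cluster, run the script on the primary node.

Important

The following script covers datasets from 4K to 128K. Modify it to suit your environment.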
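The size labels in the script appear to denote the total tokens per request: each test pairs a fixed 1024-token output with an input of the label minus 1024 tokens, for example 3072 for 4k and 130048 for 128k. This reading is inferred from the values in the script, not stated by the tool. A minimal sketch of the calculation, with `bench_input_len` as a hypothetical helper name:

```shell
# Hypothetical helper: input length in tokens for a given size label,
# e.g. 4 for the "4k" test. The output length is fixed at 1024 tokens,
# so input = label * 1024 - 1024.
bench_input_len() {
    echo $(( $1 * 1024 - 1024 ))
}
```

Note that `--random-range-ratio 0.9` lets the actual lengths vary below these nominal values.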

Stress test script

```
# Run every combination of dataset size (4k to 128k) and concurrency.
# Input length = label * 1024 - 1024 tokens (3072 for 4k ... 130048 for 128k);
# the number of prompts is twice the concurrency.
for size_k in 4 8 16 32 64 128; do
    input_len=$(( size_k * 1024 - 1024 ))
    for concurrency in 1 10 15 20 30; do
        echo "start bench ${size_k}k ${concurrency}"
        python3 -m sglang.bench_serving --backend sglang \
            --model deepseek-ai/DeepSeek-R1 --port 8000 \
            --dataset-name=random --random-input="${input_len}" \
            --random-output=1024 \
            --max-concurrency="${concurrency}" \
            --num-prompts="$(( concurrency * 2 ))" \
            --random-range-ratio 0.9 \
            --dataset-path ~/ShareGPT_V3_unfiltered_cleaned_split.json \
            --output-file "./${size_k}k_${concurrency}.json"
    done
done
```
