This topic describes how to use Arena to deploy a TensorFlow model as an inference service.
Procedure
- Run the following command to check the GPU resources available in the cluster.
arena top node
Expected output:
NAME                      IPADDRESS      ROLE    STATUS  GPU(Total)  GPU(Allocated)
cn-beijing.192.168.0.100  192.168.0.100  <none>  Ready   1           0
cn-beijing.192.168.0.101  192.168.0.101  <none>  Ready   1           0
cn-beijing.192.168.0.99   192.168.0.99   <none>  Ready   1           0
---------------------------------------------------------------------------------------------------
Allocated/Total GPUs of nodes which own resource nvidia.com/gpu In Cluster:
0/3 (0.0%)
The output shows that the cluster has three GPU nodes that can be used to deploy the model.
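If you want to cross-check the GPU resources with kubectl, the following optional sketch queries one of the nodes listed above (the node name is taken from the preceding output) and prints the lines that mention the nvidia.com/gpu resource.
# Optional cross-check; node name taken from the arena top node output above.
kubectl describe node cn-beijing.192.168.0.100 | grep nvidia.com/gpu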
- Upload the prepared model to OSS. For more information, see Upload files.
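If you use the ossutil CLI, the upload can look like the following sketch. The bucket name and the local directory are placeholders; the target prefix must match the --model-path used when the model is served later (in this example the bucket is mounted at /models and the model path is /models/tensorflow).
# Hypothetical bucket name and local SavedModel directory.
ossutil cp -r ./1623831335 oss://examplebucket/tensorflow/1623831335/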
- Use the following YAML file to create a PersistentVolume (PV) and a PersistentVolumeClaim (PVC).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: model-csi-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: ossplugin.csi.alibabacloud.com
    volumeHandle: model-csi-pv # Must be the same as the name of the PV.
    volumeAttributes:
      bucket: "Your Bucket"
      url: "Your oss url"
      akId: "Your Access Key Id"
      akSecret: "Your Access Key Secret"
      otherOpts: "-o max_stat_cache_size=0 -o allow_other"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
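The following is a minimal sketch for applying the manifests, assuming they are saved in a file named pv-pvc.yaml (hypothetical file name). The PV is cluster-scoped; create the PVC in the namespace used by the inference service (inference in this walkthrough).
kubectl apply -f pv-pvc.yaml -n inference
# The PVC should report a Bound status once it is matched to the PV.
kubectl get pvc model-pvc -n inference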
- Run the following command to deploy the model with TensorFlow Serving.
arena serve tensorflow \
  --name=bert-tfserving \
  --model-name=chnsenticorp \
  --gpus=1 \
  --image=tensorflow/serving:1.15.0-gpu \
  --data=model-pvc:/models \
  --model-path=/models/tensorflow \
  --version-policy=specific:1623831335
Expected output:
configmap/bert-tfserving-202106251556-tf-serving created
configmap/bert-tfserving-202106251556-tf-serving labeled
configmap/bert-tfserving-202106251556-tensorflow-serving-cm created
service/bert-tfserving-202106251556-tensorflow-serving created
deployment.apps/bert-tfserving-202106251556-tensorflow-serving created
INFO[0003] The Job bert-tfserving has been submitted successfully
INFO[0003] You can run `arena get bert-tfserving --type tf-serving` to check the job status
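For reference, TensorFlow Serving loads model versions from numeric sub-directories under the configured model path, and --version-policy=specific:1623831335 pins the service to version 1623831335. A directory layout consistent with the command above would look like the following sketch (assumed, not taken from the original environment):
# /models is the mount point of model-pvc; /models/tensorflow is --model-path.
# /models/tensorflow/
# └── 1623831335/          <- version selected by --version-policy
#     ├── saved_model.pb
#     └── variables/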
- Run the following command to check the status of the TensorFlow Serving deployment.
arena serve list
Expected output:
NAME            TYPE        VERSION       DESIRED  AVAILABLE  ADDRESS        PORTS
bert-tfserving  Tensorflow  202106251556  1        1          172.16.95.171  GRPC:8500,RESTFUL:8501
- Run the following command to query the details of the inference service.
arena serve get bert-tfserving
Expected output:
Name: bert-tfserving
Namespace: inference
Type: Tensorflow
Version: 202106251556
Desired: 1
Available: 1
Age: 4m
Address: 172.16.95.171
Port: GRPC:8500,RESTFUL:8501
Instances:
NAME                                                             STATUS   AGE  READY  RESTARTS  NODE
----                                                             ------   ---  -----  --------  ----
bert-tfserving-202106251556-tensorflow-serving-8554d58d67-jd2z9  Running  4m   1/1    0         cn-beijing.192.168.0.88
The output shows that the model is successfully deployed with TensorFlow Serving and that the service exposes two API ports: 8500 (gRPC) and 8501 (RESTful).
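Before you expose the service, you can optionally verify it from within the cluster. The following sketch port-forwards the ClusterIP service (the service name is taken from the output of the deployment step) and queries the TensorFlow Serving model-status endpoint for the model name used at deploy time.
kubectl -n inference port-forward svc/bert-tfserving-202106251556-tensorflow-serving 8501:8501
# In another terminal:
curl http://localhost:8501/v1/models/chnsenticorp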
- Configure an Internet-facing Ingress. For more information, see Create an Ingress.
Note: An inference service deployed with arena serve tensorflow is exposed only through a ClusterIP by default and cannot be accessed directly. You must create an Ingress for the inference service with the following settings:
- Set the namespace to inference.
- Set the service port to 8501 (the RESTful port).
- After the Ingress is created, you can obtain its address from the Rules column on the Ingresses page.

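The console workflow referenced above is the documented way to create the Ingress. For readers who prefer manifests, the following is a rough equivalent sketch; the Ingress name and host are hypothetical, and the API version and Ingress class depend on your cluster version and Ingress controller.
kubectl apply -n inference -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: bert-tfserving-ingress   # hypothetical name
spec:
  rules:
  - host: bert.example.com       # hypothetical host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: bert-tfserving-202106251556-tensorflow-serving
            port:
              number: 8501       # the RESTful port
EOF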
- Use the obtained Ingress address to run the following command to call the inference service. For more information about TensorFlow Serving, see the TensorFlow Serving API documentation.
curl "http://<Ingress address>"
Expected output:
{
  "model_version_status": [
    {
      "version": "1623831335",
      "state": "AVAILABLE",
      "status": {
        "error_code": "OK",
        "error_message": ""
      }
    }
  ]
}
The output shows that the call to the inference service is successful, which means that the inference service has been deployed.
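To send an actual prediction request, you can use the TensorFlow Serving REST predict endpoint for the model name used at deploy time, as in the following sketch. The request body depends entirely on how the BERT SavedModel was exported (input tensor names, tokenization, and shapes), so the payload below is only a placeholder.
# The instances payload is a placeholder; fill it in according to the model's signature.
curl -X POST -d '{"instances": [ ... ]}' \
  "http://<Ingress address>/v1/models/chnsenticorp:predict"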