TensorFlow - Platform for AI (PAI) - Alibaba Cloud Help Center

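The TensorFlow Processor built into EAS deploys models in the standard TensorFlow SavedModel format as online services. This topic describes how to deploy and call such a model service.

Background information

Keras and Checkpoint models must be converted to the SavedModel format before they can be deployed. For details, see Export a TensorFlow model as a SavedModel. Models optimized by PAI-Blade can run directly.

TensorFlow Processor versions

TensorFlow Processors are available for multiple TensorFlow versions, in both CPU and GPU variants. Unless you have specific requirements, use the latest version when you deploy a service. Newer TensorFlow versions remain compatible with earlier ones and generally perform better. The following table lists the Processor name for each TensorFlow version.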

Processor name	TensorFlow version	GPU supported
tensorflow_cpu_1.12	TensorFlow 1.12	No
tensorflow_cpu_1.14	TensorFlow 1.14	No
tensorflow_cpu_1.15	TensorFlow 1.15	No
tensorflow_cpu_2.3	TensorFlow 2.3	No
tensorflow_cpu_2.4	TensorFlow 2.4	No
tensorflow_cpu_2.7	TensorFlow 2.7	No
tensorflow_gpu_1.12	TensorFlow 1.12	Yes
tensorflow_gpu_1.14	TensorFlow 1.14	Yes
tensorflow_gpu_1.15	TensorFlow 1.15	Yes
tensorflow_gpu_2.4	TensorFlow 2.4	Yes
tensorflow_gpu_2.7	TensorFlow 2.7	Yes
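
The naming pattern in the table can be captured in a small helper. This is only an illustrative sketch: the function name `processor_name` is hypothetical and not part of any EAS tool, and the set of names is copied from the table as it stands here, so it may be out of date.

```python
# Processor names copied from the table above.
PROCESSORS = {
    "tensorflow_cpu_1.12", "tensorflow_cpu_1.14", "tensorflow_cpu_1.15",
    "tensorflow_cpu_2.3", "tensorflow_cpu_2.4", "tensorflow_cpu_2.7",
    "tensorflow_gpu_1.12", "tensorflow_gpu_1.14", "tensorflow_gpu_1.15",
    "tensorflow_gpu_2.4", "tensorflow_gpu_2.7",
}


def processor_name(tf_version: str, gpu: bool = False) -> str:
    """Return the Processor name for a TensorFlow version (hypothetical helper)."""
    device = "gpu" if gpu else "cpu"
    name = f"tensorflow_{device}_{tf_version}"
    if name not in PROCESSORS:
        # For example, the table lists no GPU Processor for TensorFlow 2.3.
        raise ValueError(f"no Processor listed for {name}")
    return name


print(processor_name("2.7", gpu=True))
```

Validating against the listed names catches combinations that the table does not include, such as a GPU Processor for TensorFlow 2.3.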

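Step 1: Deploy the service

(Optional) Configure a warm-up request file.
Some TensorFlow models must load model files or parameters into memory on the first call, which can take a long time. As a result, the first few requests to the model service may have a long response time (RT), or may even fail with 408 (request computation timed out) or 450 (request dropped because the queue length was exceeded). To avoid jitter during rolling updates, add the warm-up parameters when you deploy the service, so that a service instance receives traffic only after warm-up completes. For details, see Advanced configuration: model service warm-up.

Deploy the service.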

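When you deploy a TensorFlow model service with the eascmd client, set processor to the appropriate Processor name from the table above. The following is a sample configuration file.

{
  "name": "tf_serving_test",
  "model_path": "http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/model.tar.gz",
  "processor": "tensorflow_cpu_1.15",
  "warm_up_data_path":"oss://path/to/warm_up_test.bin", // Path of the request file used for model warm-up.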
  "metadata": {
    "instance": 1,
    "cpu": 1,
    "memory": 4000
  }
}
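
If you generate the configuration file from a script, a sketch like the following may help. It only reproduces the fields shown in the sample above. The function name `build_service_config` is hypothetical, and the values are the placeholders from the sample, not real resources. Because `json.dumps` emits strict JSON, the output carries no inline comment.

```python
import json


def build_service_config(name, model_path, processor,
                         warm_up_data_path=None,
                         instance=1, cpu=1, memory=4000):
    """Assemble the service configuration shown above (hypothetical helper)."""
    config = {
        "name": name,
        "model_path": model_path,
        "processor": processor,
        "metadata": {"instance": instance, "cpu": cpu, "memory": memory},
    }
    # The warm-up file is optional, so add the key only when a path is given.
    if warm_up_data_path is not None:
        config["warm_up_data_path"] = warm_up_data_path
    return config


config = build_service_config(
    name="tf_serving_test",
    model_path="http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/model.tar.gz",
    processor="tensorflow_cpu_1.15",
    warm_up_data_path="oss://path/to/warm_up_test.bin",
)
config_text = json.dumps(config, indent=2)
print(config_text)
```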

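For details on deploying services with the client tool, see Service deployment: EASCMD & DSW.

You can also deploy a TensorFlow model service in the console. For details, see Service deployment: console.

After the TensorFlow service is deployed, go to the Elastic Algorithm Service (EAS) page, find the service that you want to call, and click Invocation Information in the Service Type column to view the service endpoint and the token used for authentication.

Step 2: Call the service

The TensorFlow service uses protobuf, not plain text, for its input and output, whereas online debugging supports only plain-text data. You therefore cannot use the online debugging feature of the console.

EAS provides SDKs that wrap the request and response data and include built-in direct-connection and fault-tolerance mechanisms. We recommend that you use an SDK to build and send requests, as follows.

Query the model structure.

For a model in the standard SavedModel format, sending an empty request to the service returns the model structure in JSON format.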

// Send an empty request.
$ curl 1828488879222***.cn-shanghai.pai-eas.aliyuncs.com/api/predict/mnist_saved_model_example -H 'Authorization: YTg2ZjE0ZjM4ZmE3OTc0NzYxZDMyNmYzMTJjZTQ1***'

// The model structure is returned.
{
    "inputs": [
        {
            "name": "images",
            "shape": [
                -1,
                784
            ],
            "type": "DT_FLOAT"
        }
    ],
    "outputs": [
        {
            "name": "scores",
            "shape": [
                -1,
                10
            ],
            "type": "DT_FLOAT"
        }
    ],
    "signature_name": "predict_images"
}

Note

For models in the frozen pb format, the model structure cannot be retrieved.
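
The returned structure gives you what you need to build a request: the signature name, and each input's name, shape, and type. The sketch below parses the sample response shown above using only the standard library. It assumes that a dimension of -1 marks a variable batch dimension (the usual TensorFlow convention), and `describe_inputs` is a hypothetical helper name.

```python
import json
from math import prod

# Sample response copied from the model structure shown above.
MODEL_STRUCTURE = """
{
    "inputs": [{"name": "images", "shape": [-1, 784], "type": "DT_FLOAT"}],
    "outputs": [{"name": "scores", "shape": [-1, 10], "type": "DT_FLOAT"}],
    "signature_name": "predict_images"
}
"""


def describe_inputs(structure_text, batch_size=1):
    """Return the signature name and, for each input, a concrete shape."""
    structure = json.loads(structure_text)
    feeds = []
    for tensor in structure["inputs"]:
        # Assumption: -1 is a variable batch dimension, filled with batch_size.
        shape = [batch_size if dim == -1 else dim for dim in tensor["shape"]]
        feeds.append({
            "name": tensor["name"],
            "shape": shape,
            "type": tensor["type"],
            "num_values": prod(shape),  # length of the flat value list to send
        })
    return structure["signature_name"], feeds


signature, feeds = describe_inputs(MODEL_STRUCTURE)
print(signature, feeds)
```

For this sample, the result indicates a request against the `predict_images` signature with one input named `images`, carrying 784 float values per example.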

Send an inference request.

The following example uses the Python SDK to send a request to the model service.

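#!/usr/bin/env python

from eas_prediction import PredictClient
from eas_prediction import TFRequest

if __name__ == '__main__':
    # The endpoint and service name come from the service's invocation information.
    client = PredictClient('http://1828488879222***.cn-shanghai.pai-eas.aliyuncs.com', 'mnist_saved_model_example')
    # Token used to authenticate requests to the service.
    client.set_token('YTg2ZjE0ZjM4ZmE3OTc0NzYxZDMyNmYzMTJjZTQ1****')
    client.init()

    # Build a request for the 'predict_images' signature.
    req = TFRequest('predict_images')
    # Feed the 'images' input: shape [1, 784], float type, 784 values.
    req.add_feed('images', [1, 784], TFRequest.DT_FLOAT, [1] * 784)
    # Send the same request repeatedly and print each response.
    for _ in range(1000000):
        resp = client.predict(req)
        print(resp)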

For details on the parameters used in the code, see Python SDK usage instructions.
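
The example above only prints each response. To work with the output, you would typically read the `scores` values as a flat list. The Python SDK documentation describes a `get_values` method on the response for this purpose, but verify that against the SDK version you use. The sketch below assumes you already have that flat list and that each row of 10 values holds per-class scores, which is inferred from the output shape `[-1, 10]` rather than stated by the source. The helper `top_class` and the sample scores are made up for illustration.

```python
def top_class(scores, num_classes=10):
    """Split a flat, row-major score list into rows and return each row's argmax."""
    if len(scores) % num_classes != 0:
        raise ValueError("score count is not a multiple of num_classes")
    rows = [scores[i:i + num_classes] for i in range(0, len(scores), num_classes)]
    # Index of the largest score in each row.
    return [max(range(num_classes), key=row.__getitem__) for row in rows]


# Made-up scores for a single example; not real model output.
example_scores = [0.01, 0.02, 0.05, 0.7, 0.02, 0.05, 0.05, 0.04, 0.03, 0.03]
print(top_class(example_scores))
```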

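You can also build service requests yourself. For details, see Request format.

Request format

The TensorFlow Processor uses the protobuf format for input and output. When you send requests with an SDK, the SDK wraps the request for you, so you only need to call the functions that the SDK provides. If you prefer to build service requests yourself, you can generate the relevant code from the following pb definition. For details, see Construct TensorFlow service requests.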

syntax = "proto3";
option cc_enable_arenas = true;
option java_package = "com.aliyun.openservices.eas.predict.proto";
option java_outer_classname = "PredictProtos";
enum ArrayDataType {
  // Not a legal value for DataType. Used to indicate a DataType field
  // has not been set.
  DT_INVALID = 0;
  // Data types that all computation devices are expected to be
  // capable to support.
  DT_FLOAT = 1;
  DT_DOUBLE = 2;
  DT_INT32 = 3;
  DT_UINT8 = 4;
  DT_INT16 = 5;
  DT_INT8 = 6;
  DT_STRING = 7;
  DT_COMPLEX64 = 8;  // Single-precision complex.
  DT_INT64 = 9;
  DT_BOOL = 10;
  DT_QINT8 = 11;     // Quantized int8.
  DT_QUINT8 = 12;    // Quantized uint8.
  DT_QINT32 = 13;    // Quantized int32.
  DT_BFLOAT16 = 14;  // Float32 truncated to 16 bits.  Only for cast ops.
  DT_QINT16 = 15;    // Quantized int16.
  DT_QUINT16 = 16;   // Quantized uint16.
  DT_UINT16 = 17;
  DT_COMPLEX128 = 18;  // Double-precision complex.
  DT_HALF = 19;
  DT_RESOURCE = 20;
  DT_VARIANT = 21;  // Arbitrary C++ data types.
}
// Dimensions of an array.
message ArrayShape {
  repeated int64 dim = 1 [packed = true];
}
// Protocol buffer representing an array.
message ArrayProto {
  // Data Type.
  ArrayDataType dtype = 1;
  // Shape of the array.
  ArrayShape array_shape = 2;
  // DT_FLOAT.
  repeated float float_val = 3 [packed = true];
  // DT_DOUBLE.
  repeated double double_val = 4 [packed = true];
  // DT_INT32, DT_INT16, DT_INT8, DT_UINT8.
  repeated int32 int_val = 5 [packed = true];
  // DT_STRING.
  repeated bytes string_val = 6;
  // DT_INT64.
  repeated int64 int64_val = 7 [packed = true];
  // DT_BOOL.
  repeated bool bool_val = 8 [packed = true];
}
// PredictRequest specifies which TensorFlow model to run, as well as
// how inputs are mapped to tensors and how outputs are filtered before
// returning to user.
message PredictRequest {
  // A named signature to evaluate. If unspecified, the default signature
  // will be used.
  string signature_name = 1;
  // Input tensors.
  // Names of input tensor are alias names. The mapping from aliases to real
  // input tensor names is expected to be stored as named generic signature
  // under the key "inputs" in the model export.
  // Each alias listed in a generic signature named "inputs" should be provided
  // exactly once in order to run the prediction.
  map<string, ArrayProto> inputs = 2;
  // Output filter.
  // Names specified are alias names. The mapping from aliases to real output
  // tensor names is expected to be stored as named generic signature under
  // the key "outputs" in the model export.
  // Only tensors specified here will be run/fetched and returned, with the
  // exception that when none is specified, all tensors specified in the
  // named signature will be run/fetched and returned.
  repeated string output_filter = 3;
}
// Response for PredictRequest on successful run.
message PredictResponse {
  // Output tensors.
  map<string, ArrayProto> outputs = 1;
}
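
In the definition above, an `ArrayProto` carries its values as a flat list in the field that matches its dtype, with the dimensions stored separately in `array_shape`. The sketch below illustrates that layout with plain dictionaries instead of generated protobuf classes. The dtype-to-field mapping follows the comments in the definition. Row-major flattening is an assumption based on the usual TensorFlow convention, and `array_fields` is a hypothetical helper name.

```python
from math import prod

# Value field for each dtype, per the comments in the ArrayProto definition.
VALUE_FIELD = {
    "DT_FLOAT": "float_val",
    "DT_DOUBLE": "double_val",
    "DT_INT32": "int_val",
    "DT_INT16": "int_val",
    "DT_INT8": "int_val",
    "DT_UINT8": "int_val",
    "DT_STRING": "string_val",
    "DT_INT64": "int64_val",
    "DT_BOOL": "bool_val",
}


def flatten(nested):
    """Yield the leaves of a nested list in row-major order."""
    if isinstance(nested, (list, tuple)):
        for item in nested:
            yield from flatten(item)
    else:
        yield nested


def infer_shape(nested):
    """Infer dimensions by following the first element at each nesting level."""
    shape = []
    while isinstance(nested, (list, tuple)):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return shape


def array_fields(nested, dtype):
    """Return a dict mirroring the ArrayProto fields for a nested list."""
    shape = infer_shape(nested)
    values = list(flatten(nested))
    if len(values) != prod(shape):
        raise ValueError("ragged input: value count does not match shape")
    return {"dtype": dtype, "array_shape": {"dim": shape}, VALUE_FIELD[dtype]: values}


print(array_fields([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], "DT_FLOAT"))
```

Checking the value count against the product of the dimensions catches ragged inputs before a request is built.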