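TensorFlow

The built-in TensorFlow Processor in EAS lets you deploy models in the standard TensorFlow SavedModel format as online services. This topic describes how to deploy and call a TensorFlow model service.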

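Background information

Keras and Checkpoint models must be converted to the SavedModel format before deployment. For details, see How to export a TensorFlow model as a SavedModel. Models optimized by PAI-Blade can be run directly.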

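TensorFlow Processor versions

TensorFlow Processors are available for multiple TensorFlow versions, in both GPU and CPU variants. If you have no special requirements, use the latest version when you deploy a service. Newer versions stay compatible with the features of earlier versions and generally perform better. The following table lists the Processor name for each TensorFlow version.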

Processor name         TensorFlow version    Supports GPU
tensorflow_cpu_1.12    TensorFlow 1.12       No
tensorflow_cpu_1.14    TensorFlow 1.14       No
tensorflow_cpu_1.15    TensorFlow 1.15       No
tensorflow_cpu_2.3     TensorFlow 2.3        No
tensorflow_cpu_2.4     TensorFlow 2.4        No
tensorflow_cpu_2.7     TensorFlow 2.7        No
tensorflow_gpu_1.12    TensorFlow 1.12       Yes
tensorflow_gpu_1.14    TensorFlow 1.14       Yes
tensorflow_gpu_1.15    TensorFlow 1.15       Yes
tensorflow_gpu_2.4     TensorFlow 2.4        Yes
tensorflow_gpu_2.7     TensorFlow 2.7        Yes
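
The naming pattern in the table can be captured in a small helper, which is handy when a deployment script has to pick a Processor by version and device. This is a minimal sketch: the function `processor_name` and the `SUPPORTED` set are illustrative names, not part of EAS, and the set simply mirrors the Processor names listed above.

```python
# Processor names copied from the table above; update this set if the table changes.
SUPPORTED = {
    "tensorflow_cpu_1.12", "tensorflow_cpu_1.14", "tensorflow_cpu_1.15",
    "tensorflow_cpu_2.3", "tensorflow_cpu_2.4", "tensorflow_cpu_2.7",
    "tensorflow_gpu_1.12", "tensorflow_gpu_1.14", "tensorflow_gpu_1.15",
    "tensorflow_gpu_2.4", "tensorflow_gpu_2.7",
}


def processor_name(tf_version: str, use_gpu: bool = False) -> str:
    """Build the Processor name for a TensorFlow version (hypothetical helper)."""
    device = "gpu" if use_gpu else "cpu"
    name = f"tensorflow_{device}_{tf_version}"
    # Reject combinations that are not listed in the table.
    if name not in SUPPORTED:
        raise ValueError(f"No built-in Processor named {name!r}")
    return name


if __name__ == "__main__":
    print(processor_name("1.15"))
    print(processor_name("2.7", use_gpu=True))
```

Note that the table lists no GPU Processor for TensorFlow 2.3, so `processor_name("2.3", use_gpu=True)` raises `ValueError` in this sketch.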

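Step 1: Deploy the service

  1. Optional: Configure a warm-up request file.

    For some TensorFlow models, the first call has to load model files or parameters into memory, which can take a long time. As a result, the response time (RT) of the first few requests can be high, and requests may even fail with 408 (request computation timed out) or 450 (request dropped because the queue length was exceeded). To keep the service from jittering during rolling updates, add the warm-up parameters when you deploy the service, so that a service instance starts receiving traffic only after warm-up is complete. For details, see Advanced configuration: Model service warm-up.

  2. Deploy the service.

    When you use the eascmd client to deploy a TensorFlow model service, set the processor field to one of the Processor names listed above. A sample service configuration file follows.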

    {
      "name": "tf_serving_test",
      "model_path": "http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/model.tar.gz",
      "processor": "tensorflow_cpu_1.15",
      "warm_up_data_path":"oss://path/to/warm_up_test.bin", // Path of the request file used for model warm-up.
      "metadata": {
        "instance": 1,
        "cpu": 1,
        "memory": 4000
      }
    }

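    For details about deploying services with the client tool, see Service deployment: EASCMD & DSW.

    You can also deploy a TensorFlow model service in the console. For details, see Service deployment: Console.

  3. After the TensorFlow service is deployed, go to the PAI EAS Model Online Service page, find the service you want to call, and click Invocation Information in the Service Type column to view the endpoint and the token used for service authentication.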
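
The configuration file from step 2 can also be generated by a script, which avoids hand-editing JSON when several services differ in only a few fields. The sketch below uses only the Python standard library. `build_service_config` is a hypothetical helper, and the field names and values are the ones shown in the sample file. The inline `//` comment from the sample is left out because the standard `json` module emits plain JSON without comments.

```python
import json


def build_service_config(name, model_path, processor, warm_up_data_path=None,
                         instance=1, cpu=1, memory=4000):
    """Return the service configuration as a dict (hypothetical helper)."""
    config = {
        "name": name,
        "model_path": model_path,
        "processor": processor,
        "metadata": {"instance": instance, "cpu": cpu, "memory": memory},
    }
    # The warm-up file is optional, so only add the field when a path is given.
    if warm_up_data_path:
        config["warm_up_data_path"] = warm_up_data_path
    return config


if __name__ == "__main__":
    config = build_service_config(
        name="tf_serving_test",
        model_path="http://examplebucket.oss-cn-shanghai.aliyuncs.com/models/model.tar.gz",
        processor="tensorflow_cpu_1.15",
        warm_up_data_path="oss://path/to/warm_up_test.bin",
    )
    print(json.dumps(config, indent=2))
```

The printed JSON can be saved to a file and passed to eascmd as described in the client deployment guide.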

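Step 2: Call the service

The input and output of a TensorFlow service are in protobuf format rather than plain text. Online debugging currently supports only plain-text input and output, so you cannot use the online debugging feature in the console.

EAS provides SDKs in different versions that wrap the request and response data and include built-in direct-connection and fault-tolerance mechanisms. We recommend that you use an SDK to build and send requests, as follows.

  1. Query the model structure.

    For a model in the standard SavedModel format, sending an empty request to the service returns the model structure in JSON format.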

    // Send an empty request.
    $ curl 1828488879222***.cn-shanghai.pai-eas.aliyuncs.com/api/predict/mnist_saved_model_example -H 'Authorization: YTg2ZjE0ZjM4ZmE3OTc0NzYxZDMyNmYzMTJjZTQ1***'
    
    // The model structure is returned.
    {
        "inputs": [
            {
                "name": "images",
                "shape": [
                    -1,
                    784
                ],
                "type": "DT_FLOAT"
            }
        ],
        "outputs": [
            {
                "name": "scores",
                "shape": [
                    -1,
                    10
                ],
                "type": "DT_FLOAT"
            }
        ],
        "signature_name": "predict_images"
    }

    Note

    The model structure cannot be retrieved for models in the frozen pb format.

  2. Send an inference request.

    The following example sends a model request with the Python SDK.

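    #!/usr/bin/env python

    from eas_prediction import PredictClient
    from eas_prediction import TFRequest

    if __name__ == '__main__':
        # Endpoint and service name, taken from the invocation information of the service.
        client = PredictClient('http://1828488879222***.cn-shanghai.pai-eas.aliyuncs.com', 'mnist_saved_model_example')
        # Token used for service authentication.
        client.set_token('YTg2ZjE0ZjM4ZmE3OTc0NzYxZDMyNmYzMTJjZTQ1****')
        client.init()

        # The signature name and the input name, shape, and type match the model structure returned in step 1.
        req = TFRequest('predict_images')
        req.add_feed('images', [1, 784], TFRequest.DT_FLOAT, [1] * 784)
        # Send the same request repeatedly; reduce the count for a quick test.
        for x in range(0, 1000000):
            resp = client.predict(req)
            print(resp)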

    For a description of the parameters used in the code, see Python SDK usage instructions.
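
The structure returned by the empty request in step 1 contains what is needed to fill in the `TFRequest` arguments in step 2. The following sketch, which uses only the standard library, reads that JSON and derives the signature name plus a concrete shape and element count for each input. `feed_specs` is a hypothetical helper, and replacing the `-1` dimension with a batch size is an assumption about how the variable dimension is meant to be used.

```python
import json
from math import prod

# Model structure as returned by the empty request shown in step 1.
MODEL_INFO = """
{
  "inputs": [{"name": "images", "shape": [-1, 784], "type": "DT_FLOAT"}],
  "outputs": [{"name": "scores", "shape": [-1, 10], "type": "DT_FLOAT"}],
  "signature_name": "predict_images"
}
"""


def feed_specs(model_info: dict, batch_size: int = 1):
    """Return (signature_name, [(input_name, shape, dtype, element_count), ...])."""
    specs = []
    for tensor in model_info["inputs"]:
        # Assumption: -1 marks the variable (batch) dimension.
        shape = [batch_size if dim == -1 else dim for dim in tensor["shape"]]
        specs.append((tensor["name"], shape, tensor["type"], prod(shape)))
    return model_info["signature_name"], specs


if __name__ == "__main__":
    signature, specs = feed_specs(json.loads(MODEL_INFO))
    print(signature)
    for name, shape, dtype, count in specs:
        print(name, shape, dtype, count)
```

For the MNIST example this yields the signature `predict_images` and one input named `images` with shape `[1, 784]` and 784 values, which matches the arguments passed to `TFRequest` and `add_feed` in the SDK example.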

You can also construct service requests yourself. For details, see Request format.

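Request format

The input and output of the TensorFlow processor are in protobuf format. When you send requests with an SDK, the SDK wraps the request, so you only need to build it with the functions that the SDK provides. If you want to construct service requests yourself, you can generate the required code from the following pb definition. For details, see Construct a TensorFlow service request.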

syntax = "proto3";
option cc_enable_arenas = true;
option java_package = "com.aliyun.openservices.eas.predict.proto";
option java_outer_classname = "PredictProtos";
enum ArrayDataType {
  // Not a legal value for DataType. Used to indicate a DataType field
  // has not been set.
  DT_INVALID = 0;
  // Data types that all computation devices are expected to be
  // capable to support.
  DT_FLOAT = 1;
  DT_DOUBLE = 2;
  DT_INT32 = 3;
  DT_UINT8 = 4;
  DT_INT16 = 5;
  DT_INT8 = 6;
  DT_STRING = 7;
  DT_COMPLEX64 = 8;  // Single-precision complex.
  DT_INT64 = 9;
  DT_BOOL = 10;
  DT_QINT8 = 11;     // Quantized int8.
  DT_QUINT8 = 12;    // Quantized uint8.
  DT_QINT32 = 13;    // Quantized int32.
  DT_BFLOAT16 = 14;  // Float32 truncated to 16 bits.  Only for cast ops.
  DT_QINT16 = 15;    // Quantized int16.
  DT_QUINT16 = 16;   // Quantized uint16.
  DT_UINT16 = 17;
  DT_COMPLEX128 = 18;  // Double-precision complex.
  DT_HALF = 19;
  DT_RESOURCE = 20;
  DT_VARIANT = 21;  // Arbitrary C++ data types.
}
// Dimensions of an array.
message ArrayShape {
  repeated int64 dim = 1 [packed = true];
}
// Protocol buffer representing an array.
message ArrayProto {
  // Data Type.
  ArrayDataType dtype = 1;
  // Shape of the array.
  ArrayShape array_shape = 2;
  // DT_FLOAT.
  repeated float float_val = 3 [packed = true];
  // DT_DOUBLE.
  repeated double double_val = 4 [packed = true];
  // DT_INT32, DT_INT16, DT_INT8, DT_UINT8.
  repeated int32 int_val = 5 [packed = true];
  // DT_STRING.
  repeated bytes string_val = 6;
  // DT_INT64.
  repeated int64 int64_val = 7 [packed = true];
  // DT_BOOL.
  repeated bool bool_val = 8 [packed = true];
}
// PredictRequest specifies which TensorFlow model to run, as well as
// how inputs are mapped to tensors and how outputs are filtered before
// returning to user.
message PredictRequest {
  // A named signature to evaluate. If unspecified, the default signature
  // will be used.
  string signature_name = 1;
  // Input tensors.
  // Names of input tensor are alias names. The mapping from aliases to real
  // input tensor names is expected to be stored as named generic signature
  // under the key "inputs" in the model export.
  // Each alias listed in a generic signature named "inputs" should be provided
  // exactly once in order to run the prediction.
  map<string, ArrayProto> inputs = 2;
  // Output filter.
  // Names specified are alias names. The mapping from aliases to real output
  // tensor names is expected to be stored as named generic signature under
  // the key "outputs" in the model export.
  // Only tensors specified here will be run/fetched and returned, with the
  // exception that when none is specified, all tensors specified in the
  // named signature will be run/fetched and returned.
  repeated string output_filter = 3;
}
// Response for PredictRequest on successful run.
message PredictResponse {
  // Output tensors.
  map<string, ArrayProto> outputs = 1;
}
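
To make the definition above concrete, the sketch below hand-encodes a `PredictRequest` with `DT_FLOAT` inputs into protobuf wire format using only the Python standard library. It is meant to illustrate how the fields map to bytes, not to replace code generated by `protoc`, which is the usual way to build these messages. The helper names (`varint`, `field`, `encode_float_array`, `encode_predict_request`) are made up for this example.

```python
import struct

DT_FLOAT = 1  # ArrayDataType.DT_FLOAT in the definition above


def varint(value: int) -> bytes:
    """Encode an integer as a protobuf varint (negatives as 64-bit two's complement)."""
    value &= 0xFFFFFFFFFFFFFFFF
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def field(number: int, payload: bytes) -> bytes:
    """Encode a length-delimited field (wire type 2)."""
    return varint((number << 3) | 2) + varint(len(payload)) + payload


def encode_float_array(shape, values) -> bytes:
    """Encode an ArrayProto holding DT_FLOAT data."""
    dims = b"".join(varint(d) for d in shape)
    array_shape = field(1, dims)                       # ArrayShape.dim = 1 (packed)
    floats = struct.pack(f"<{len(values)}f", *values)  # little-endian 32-bit floats
    return (
        varint((1 << 3) | 0) + varint(DT_FLOAT)        # dtype = 1 (varint)
        + field(2, array_shape)                        # array_shape = 2
        + field(3, floats)                             # float_val = 3 (packed)
    )


def encode_predict_request(signature_name, inputs) -> bytes:
    """Encode a PredictRequest; inputs maps name -> (shape, values)."""
    body = field(1, signature_name.encode("utf-8"))    # signature_name = 1
    for name, (shape, values) in inputs.items():
        # Each map entry is a nested message with key = 1 and value = 2.
        entry = field(1, name.encode("utf-8")) + field(2, encode_float_array(shape, values))
        body += field(2, entry)                        # inputs = 2
    return body


if __name__ == "__main__":
    payload = encode_predict_request("predict_images", {"images": ([1, 784], [1.0] * 784)})
    print(len(payload), "bytes")
```

The resulting bytes would form the body of the request sent to the service. The response is a serialized `PredictResponse`, which needs the matching decoding step that this sketch omits.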