End-to-End Text Recognition Prediction with PAI

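PAI-EasyVision provides training and prediction for end-to-end text recognition and supports multi-machine distributed training and prediction. This topic describes how to use PAI-EasyVision with an existing trained model to run an offline end-to-end text recognition prediction task.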

Data format

End-to-end text recognition prediction

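Based on an existing file list, you can start an offline end-to-end text recognition prediction task with a PAI command, as in the following example. You can run the PAI command from the SQL Script component, from the MaxCompute client, or from a DataWorks development node. For details, see the topics on connecting with the local client (odpscmd) and on developing ODPS SQL tasks.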

pai -name ev_predict_ext
             -Dmodel_path='your model path'
             -Dmodel_type='text_spotter'
             -Dinput_oss_file='oss://path/to/your/filelist.txt'
             -Doutput_oss_file='oss://path/to/your/result.txt'
             -Dimage_type='url'
             -Dnum_worker=2
             -DcpuRequired=800
             -DgpuRequired=100
             -Dbuckets='your OSS directory'
             -Darn='your role ARN'
             -DossHost='your OSS domain name'

For details about each parameter, see the parameter description.
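
When the same prediction job is launched repeatedly with different paths, it can help to render the command text from a parameter dictionary. The following is a minimal sketch, not part of PAI itself: `build_pai_command` is a hypothetical helper, and the quoting rule (numbers bare, everything else single-quoted) is an assumption taken from the example command above.

```python
def build_pai_command(name, params):
    """Render a PAI command string from a component name and a parameter dict.

    Hypothetical helper for illustration; it only formats text and does not
    submit anything to MaxCompute.
    """
    parts = ["pai -name {}".format(name)]
    for key, value in params.items():
        # Assumption: numeric values are written bare, all other values are
        # wrapped in single quotes, mirroring the example command.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            rendered = str(value)
        else:
            rendered = "'{}'".format(value)
        parts.append("-D{}={}".format(key, rendered))
    return "\n    ".join(parts)


if __name__ == "__main__":
    print(build_pai_command("ev_predict_ext", {
        "model_type": "text_spotter",
        "input_oss_file": "oss://path/to/your/filelist.txt",
        "output_oss_file": "oss://path/to/your/result.txt",
        "image_type": "url",
        "num_worker": 2,
    }))
```

The generated text can then be pasted into whichever client you use to run PAI commands.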

Output

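Each line of the result file contains the original image path followed by the model's prediction result as a JSON string, as in the following example.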

oss://path/to/your/image1.jpg,  JSON-format result string
oss://path/to/your/image1.jpg,  JSON-format result string
oss://path/to/your/image1.jpg,  JSON-format result string
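
A result line in this layout can be split into its path and its JSON payload with a few lines of Python. This is a minimal sketch under one assumption: the image path itself contains no comma, so the first comma on the line is the separator. `parse_result_line` is a hypothetical helper name.

```python
import json


def parse_result_line(line):
    """Split one result line into (image_path, prediction_dict).

    Assumption: the image path contains no comma, so everything after the
    first comma is the JSON string.
    """
    # Split on the first comma only, because the JSON payload has commas too.
    path, _, payload = line.partition(",")
    return path.strip(), json.loads(payload.strip())


if __name__ == "__main__":
    sample = 'oss://path/to/your/image1.jpg,  {"detection_texts": ["abc"]}'
    image_path, prediction = parse_result_line(sample)
    print(image_path, prediction["detection_texts"])
```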

An example of the JSON-format result string is as follows.

{
  "detection_keypoints": [[[243.57516479492188, 198.84210205078125], [243.91038513183594, 247.62425231933594], [385.5513916015625, 246.61660766601562], [385.2197570800781, 197.79345703125]], [[292.2718200683594, 114.44700622558594], [292.2237243652344, 164.684814453125], [571.1962890625, 164.931640625], [571.2444458007812, 114.67433166503906]]],
  "detection_boxes": [[243.5308074951172, 197.69570922851562, 385.59625244140625, 247.7247772216797], [292.1929931640625, 114.28043365478516, 571.2748413085938, 165.09771728515625]],
  "detection_scores": [0.9942291975021362, 0.9940272569656372],
  "detection_classes": [1, 1],
  "detection_class_names": ["text", "text"],
  "detection_texts_ids": [[1, 2, 2008, 12], [1, 2, 2008, 12]],
  "detection_texts": ["这是示例", "这是示例"],
  "detection_texts_scores": [0.88, 0.88]
}

The fields are described in the following table.

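Parameter	Description	Shape	Data type
detection_boxes	Detected text boxes. The coordinate order is [top, left, bottom, right].	[num_detections, 4]	FLOAT
detection_scores	Text detection probability.	[num_detections]	FLOAT
detection_classes	Category ID of the text region.	[num_detections]	INT
detection_class_names	Category name of the text region.	[num_detections]	STRING
detection_keypoints	The four corner points of each detected text region. Each point is given as (y, x).	[num_detections, 4, 2]	FLOAT
detection_texts_ids	Category IDs of the recognized single-line text.	[num_detections, max_text_length]	INT
detection_texts	Single-line text recognition result.	[num_detections]	STRING
detection_texts_scores	Single-line text recognition probability.	[num_detections]	FLOAT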
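
Because the per-detection fields share the leading `num_detections` dimension, they can be walked in parallel. The following minimal sketch keeps only detections whose detection and recognition probabilities both clear a threshold. `confident_texts` and the default thresholds of 0.5 are hypothetical choices for illustration, not values recommended by PAI.

```python
def confident_texts(prediction, min_det_score=0.5, min_text_score=0.5):
    """Return the detections whose two probabilities both meet the thresholds.

    `prediction` is a dict decoded from one JSON result string. The default
    thresholds are illustrative assumptions.
    """
    kept = []
    # The four lists are aligned by detection index, so zip pairs them up.
    for box, det_score, text, text_score in zip(
        prediction["detection_boxes"],
        prediction["detection_scores"],
        prediction["detection_texts"],
        prediction["detection_texts_scores"],
    ):
        if det_score >= min_det_score and text_score >= min_text_score:
            kept.append({
                "text": text,
                "box": box,
                "det_score": det_score,
                "text_score": text_score,
            })
    return kept


if __name__ == "__main__":
    example = {
        "detection_boxes": [[243.5, 197.7, 385.6, 247.7], [292.2, 114.3, 571.3, 165.1]],
        "detection_scores": [0.99, 0.30],
        "detection_texts": ["first", "second"],
        "detection_texts_scores": [0.88, 0.90],
    }
    for item in confident_texts(example):
        print(item["text"], item["box"])
```

Filtering on both scores is a design choice: a well-localized box can still carry a poorly recognized string, and the reverse can also occur.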