How to call OCR through RPA (Robotic Process Automation, Alibaba Cloud Help Center)

element_text

element_text(element, element_index=1, engine='google', window=None, eliminate_spaces=False)

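Method description

Uses OCR to read all text within the control's region.

Parameters

element<str>: control name

element_index<int>: position (index) of the control

engine<str>: OCR engine

Options:

google : Google
aliyun : Alibaba Cloud
paddle : PaddlePaddle

eliminate_spaces<bool>: whether to remove spaces from the recognition result

window<object>: window object that contains the control

Return value

Returns the recognition result <str>

Sample call - rpa.ai.ocr.element_text

# Notes:
# 1. Before using this method, capture the target control with the control capture feature.
# 2. At run time, make sure the page that contains the control is open.
# Sample call: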
page = rpa.app.chrome.create('www.taobao.com')
text = rpa.ai.ocr.element_text('淘宝logo-chrome',engine='paddle')
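
The engine parameter accepts three values, and which one reads a given control best may vary. The helper below is a minimal sketch, not part of the RPA SDK: read_with_fallback is a hypothetical name, and it receives the reader function as an argument so it can be tried without the SDK installed. It assumes that a failed recognition either raises an exception or returns empty text, which this page does not specify.

```python
from typing import Callable, Optional, Sequence


def read_with_fallback(
    read_text: Callable[..., str],
    element: str,
    engines: Sequence[str] = ("paddle", "aliyun", "google"),
) -> Optional[str]:
    """Try each engine in order and return the first non-empty result."""
    for engine in engines:
        try:
            text = read_text(element, engine=engine)
        except Exception:
            # Assumption: an unavailable or unlicensed engine raises; try the next one.
            continue
        if text and text.strip():
            return text
    return None
```

Inside an RPA project this could be called as read_with_fallback(rpa.ai.ocr.element_text, '淘宝logo-chrome').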

click

click(element, keyword, element_index=1, keyword_index=1, engine='google', button='left', offset_x=0, offset_y=0, window=None, timeout=15)

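Method description

Within the control's region, uses OCR to locate the sub-region that contains keyword, then simulates a mouse click at the center point of that sub-region.

Parameters

element<str>: control name

keyword<str>: keyword

element_index<int>: position (index) of the control

keyword_index<int>: position (index) of the keyword

engine<str>: OCR engine

Options:

google : Google
aliyun : Alibaba Cloud
paddle : PaddlePaddle

button<str>: mouse button

Options:

left : left button
right : right button

offset_x<int>: horizontal offset

offset_y<int>: vertical offset

window<object>: window object that contains the control

timeout<int>: timeout

Sample call - rpa.ai.ocr.click

# Notes:
# 1. Before using this method, capture the target control with the control capture feature.
# 2. At run time, make sure the page that contains the control is open.
# 3. This method recognizes the given keyword text on the specified control, takes the recognized region as the origin, moves the mouse by the configured offsets, and then performs a simulated click.
# Sample call: this example recognizes the keyword "文档" on the page element, then moves the mouse onto the keyword and performs a simulated click: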
page = rpa.app.chrome.create('www.aliyun.com')
rpa.ai.ocr.click('阿里云右上角banner-chrome','文档',engine='paddle',offset_x=0,offset_y=0)
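
The click target described above is the center of the keyword's sub-region, shifted by offset_x and offset_y. The arithmetic can be sketched as follows. Box and click_point are hypothetical names used only for illustration; the sketch assumes pixel coordinates with x growing to the right and y growing downward, and the SDK's internal region format is not documented on this page.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Hypothetical bounding box of the recognized keyword, in pixels."""
    left: int
    top: int
    width: int
    height: int


def click_point(box: Box, offset_x: int = 0, offset_y: int = 0) -> Tuple[int, int]:
    """Return the center of the box shifted by the given offsets."""
    center_x = box.left + box.width // 2
    center_y = box.top + box.height // 2
    return center_x + offset_x, center_y + offset_y


# With zero offsets the point is the center of the keyword region.
print(click_point(Box(left=100, top=40, width=60, height=20)))
```

A negative offset_x moves the point to the left of the keyword, which is how a nearby input box or icon without recognizable text can be reached.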

double_click

double_click(element, keyword, element_index=1, keyword_index=1, engine='google', offset_x=0, offset_y=0, window=None, timeout=15)

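Method description

Within the control's region, uses OCR to locate the sub-region that contains keyword, then simulates a mouse double-click at the center point of that sub-region.

Parameters

element<str>: control name

keyword<str>: keyword

element_index<int>: position (index) of the control

keyword_index<int>: position (index) of the keyword

engine<str>: OCR engine

Options:

google : Google
aliyun : Alibaba Cloud
paddle : PaddlePaddle

offset_x<int>: horizontal offset

offset_y<int>: vertical offset

window<object>: window object that contains the control

timeout<int>: timeout

Sample call - rpa.ai.ocr.double_click

# Notes:
# 1. Before using this method, capture the target control with the control capture feature.
# 2. At run time, make sure the page that contains the control is open.
# 3. This method recognizes the given keyword text on the specified control, takes the recognized region as the origin, moves the mouse by the configured offsets, and then performs a simulated double-click.
# Sample call: this example recognizes the keyword "文档" on the page element, then moves the mouse onto the keyword and performs a simulated double-click: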
page = rpa.app.chrome.create('www.aliyun.com')
rpa.ai.ocr.double_click('阿里云右上角banner-chrome','文档',engine='paddle',offset_x=0,offset_y=0)

input_text

input_text(element, keyword, value, element_index=1, keyword_index=1, engine='google', simulate=False, offset_x=0, offset_y=0, window=None, wait_mili_seconds=20, timeout=15)

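Method description

Within the control's region, uses OCR to locate the sub-region that contains keyword, then simulates keyboard input at the center point of that sub-region.

Parameters

element<str>: control name

keyword<str>: keyword

value<str>: content to input

element_index<int>: position (index) of the control

keyword_index<int>: position (index) of the keyword

engine<str>: OCR engine

Options:

google : Google
aliyun : Alibaba Cloud
paddle : PaddlePaddle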

simulate<bool>: whether to use simulated input (see wait_mili_seconds)

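offset_x<int>: horizontal offset

offset_y<int>: vertical offset

window<object>: window object that contains the control

wait_mili_seconds<int>: interval between typed characters, in milliseconds; only takes effect with simulated input. Default 20, maximum 100. A value that is too large may cause a timeout.

timeout<int>: timeout

Sample call - rpa.ai.ocr.input_text

# Notes:
# 1. Before using this method, capture the target control with the control capture feature.
# 2. At run time, make sure the page that contains the control is open.
# 3. This method recognizes the given keyword text on the specified control, takes the recognized region as the origin, moves the mouse by the configured offsets, and then simulates typing the given text.
# Sample call: this example recognizes the keyword "百度" on the page element, moves the mouse 350 pixels to the left, and then simulates typing the given content: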
page = rpa.app.chrome.create('www.baidu.com')
rpa.ai.ocr.input_text('百度一下-chrome','百度','阿里云RPA',engine='paddle',offset_x=-350)
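
The warning that a large wait_mili_seconds may cause a timeout follows from simple arithmetic: typing time grows with the length of value. The sketch below estimates it. The function names are hypothetical, timeout is assumed to be in seconds (this page gives no unit), and whether the SDK clamps or rejects values above 100 is not stated, so the clamp here is only an illustration.

```python
def clamp_wait_ms(wait_ms: int, maximum: int = 100) -> int:
    """Keep the per-character interval within 0..maximum milliseconds."""
    return max(0, min(wait_ms, maximum))


def estimated_typing_seconds(value: str, wait_ms: int = 20) -> float:
    """Rough lower bound on typing time: one interval per character."""
    return len(value) * clamp_wait_ms(wait_ms) / 1000.0


def fits_timeout(value: str, wait_ms: int = 20, timeout: float = 15) -> bool:
    """True if the estimated typing time stays below the timeout (assumed seconds)."""
    return estimated_typing_seconds(value, wait_ms) < timeout


# A long value typed at the maximum interval may not fit the default timeout.
print(fits_timeout("x" * 200, wait_ms=100, timeout=15))
```

For long values, lowering wait_mili_seconds or raising timeout keeps the estimate within bounds.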

mouse_move

mouse_move(element, keyword, element_index=1, keyword_index=1, engine='google', offset_x=0, offset_y=0, window=None, timeout=15)

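Method description

Within the control's region, uses OCR to locate the sub-region that contains keyword, then moves the mouse to the center point of that sub-region.

Parameters

element<str>: control name

keyword<str>: keyword

element_index<int>: position (index) of the control

keyword_index<int>: position (index) of the keyword

engine<str>: OCR engine

Options:

google : Google
aliyun : Alibaba Cloud
paddle : PaddlePaddle

offset_x<int>: horizontal offset

offset_y<int>: vertical offset

window<object>: window object that contains the control

timeout<int>: timeout

Sample call - rpa.ai.ocr.mouse_move

# Notes:
# 1. Before using this method, capture the target control with the control capture feature.
# 2. At run time, make sure the page that contains the control is open.
# 3. This method recognizes the given keyword text on the specified control, takes the recognized region as the origin, and moves the mouse by the configured offsets.
# Sample call: this example recognizes the keyword "文档" on the page element, then moves the mouse onto the keyword: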
page = rpa.app.chrome.create('www.aliyun.com')
rpa.ai.ocr.mouse_move('阿里云右上角banner-chrome','文档',engine='paddle',offset_x=0,offset_y=0)
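
keyword_index chooses which occurrence of the keyword to act on when the recognized text contains it more than once. A rough sketch of that selection is shown below. It assumes the index is 1-based, as the default value of 1 suggests, and that occurrences are ordered as the OCR engine returns them; both points are assumptions, and pick_occurrence is a hypothetical helper, not an SDK function.

```python
from typing import List, Optional


def pick_occurrence(lines: List[str], keyword: str, keyword_index: int = 1) -> Optional[int]:
    """Return the position in `lines` of the keyword_index-th line containing keyword."""
    hits = [i for i, line in enumerate(lines) if keyword in line]
    if 1 <= keyword_index <= len(hits):
        return hits[keyword_index - 1]
    # Fewer occurrences than requested: nothing to act on.
    return None
```

With recognized lines such as ["文档 下载", "控制台", "帮助文档"], keyword_index=2 would select the third line, the second one that contains "文档".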

text

text(image_path, engine='aliyun', app_code='', detail=False, eliminate_spaces=False)

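Note

If you select Alibaba Cloud as the OCR engine, you must purchase the specified Alibaba Cloud OCR service before use.

Method description

Recognizes the text in an image.

Parameters

image_path<str>: path to the image

engine<str>: OCR engine

Options:

google : Google
aliyun : Alibaba Cloud
paddle : PaddlePaddle

app_code<str>: appcode for OCR text recognition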

detail<bool>: whether to return detailed information about the recognized text

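eliminate_spaces<bool>: whether to remove spaces from the text recognition result (only effective when detail is False)

Return value

Returns the recognition result <str>

Sample call - rpa.ai.ocr.text

# Notes: none
# Sample call: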
image_path = r'D:\2_测试文件归档\OCR文字识别.jpg'
text = rpa.ai.ocr.text(image_path,engine='paddle')
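
OCR engines often insert spaces between CJK characters, which is what eliminate_spaces addresses. The function below is a minimal sketch of comparable post-processing done by hand, for example when the result was obtained with eliminate_spaces=False. strip_spaces is a hypothetical helper; the exact set of characters the SDK removes is not documented here, so treating ASCII and full-width spaces as the targets is an assumption.

```python
def strip_spaces(text: str) -> str:
    """Remove ASCII spaces and full-width (U+3000) spaces, keeping line breaks."""
    return text.replace(" ", "").replace("\u3000", "")


# Spaces between characters are dropped; the line structure is preserved.
print(strip_spaces("阿 里 云\u3000RPA"))
```

Line breaks are kept on purpose so that multi-line results can still be split per line afterwards.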

id_card

id_card(image_path)

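Method description

Recognizes an ID card.

Parameters

image_path<str>: path to the ID card image

Return value

Returns the recognition result <CardFront>

Sample call - rpa.ai.ocr.id_card

# Note: the OCR capability used by the built-in SDK must be purchased separately; before use, confirm in Console - Authorization Management - AI that it has been authorized.
# Sample call: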
image_path = r'D:\2_测试文件归档\OCR身份证识别.png'
id_card_data = rpa.ai.ocr.id_card(image_path)

vat_invoice

vat_invoice(image_path, app_code='')

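Method description

Recognizes a VAT invoice.

Parameters

image_path<str>: path to the image

app_code<str>: appcode for OCR invoice recognition

Sample call - rpa.ai.ocr.vat_invoice

# Notes:
# 1. The OCR capability used by the built-in SDK must be purchased separately; before use, confirm in Console - Authorization Management - AI that it has been authorized.
# 2. This method can also call the related Alibaba Cloud API service by passing an appcode. For how to obtain and use an appcode, see: https://help.aliyun.com/document_detail/157953.html
# Sample call: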
image_path = r'D:\2_测试文件归档\OCR发票识别.png'
vat_invoice_data = rpa.ai.ocr.vat_invoice(image_path,app_code='')
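
Since an appcode is a credential, it may be preferable to keep it out of the script. The sketch below reads it from an environment variable and falls back to an empty string, which matches the documented default of app_code=''. The variable name ALIYUN_OCR_APPCODE and the helper load_app_code are hypothetical choices for this example.

```python
import os


def load_app_code(env_var: str = "ALIYUN_OCR_APPCODE") -> str:
    """Return the appcode from the environment, or '' if it is not set."""
    # The variable name is an assumption; use whatever your deployment defines.
    return os.environ.get(env_var, "").strip()
```

The result could then be passed as rpa.ai.ocr.vat_invoice(image_path, app_code=load_app_code()).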

business_license

business_license(image_path)

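Method description

Recognizes a business license.

Parameters

image_path<str>: path to the business license image

Return value

Returns the recognition result <json>

Sample call - rpa.ai.ocr.business_license

# Note: the OCR capability used by the built-in SDK must be purchased separately; before use, confirm in Console - Authorization Management - AI that it has been authorized.
# Sample call: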
image_path = r'D:\2_测试文件归档\OCR营业执照识别.png'
business_license_data = rpa.ai.ocr.business_license(image_path)
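
The return type of this and the following certificate methods is given as <json>, and this page does not say whether that means a JSON string or an already parsed object. The helper below is a defensive sketch that accepts either form. as_dict is a hypothetical name, and the key used in the usage comment is a placeholder rather than a documented field.

```python
import json
from typing import Any, Dict


def as_dict(result: Any) -> Dict[str, Any]:
    """Normalize an OCR result to a dict, parsing it first if it is JSON text."""
    if isinstance(result, (str, bytes)):
        result = json.loads(result)
    if not isinstance(result, dict):
        raise TypeError(f"expected a JSON object, got {type(result).__name__}")
    return result


# Usage with a placeholder key (not a documented field name):
# data = as_dict(business_license_data)
# value = data.get("some_field")
```

Using dict.get for field access avoids a KeyError when a field could not be recognized.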

house_cert

house_cert(image_path)

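Method description

Recognizes a house ownership certificate.

Parameters

image_path<str>: path to the house ownership certificate image

Return value

Returns the recognition result <json>

Sample call - rpa.ai.ocr.house_cert

# Note: the OCR capability used by the built-in SDK must be purchased separately; before use, confirm in Console - Authorization Management - AI that it has been authorized.
# Sample call: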
image_path = r'D:\2_测试文件归档\OCR房产证识别.png'
house_cert_data = rpa.ai.ocr.house_cert(image_path)

bank_card

bank_card(image_path)

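Method description

Recognizes a bank card.

Parameters

image_path<str>: path to the bank card image

Return value

Returns the recognition result <json>

Sample call - rpa.ai.ocr.bank_card

# Note: the OCR capability used by the built-in SDK must be purchased separately; before use, confirm in Console - Authorization Management - AI that it has been authorized.
# Sample call: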
image_path = r'D:\2_测试文件归档\OCR银行卡.png'
bank_card_data = rpa.ai.ocr.bank_card(image_path)

drivers_license

drivers_license(image_path)

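Method description

Recognizes a driver's license.

Parameters

image_path<str>: path to the driver's license image

Return value

Returns the recognition result <json>

Sample call - rpa.ai.ocr.drivers_license

# Note: the OCR capability used by the built-in SDK must be purchased separately; before use, confirm in Console - Authorization Management - AI that it has been authorized.
# Sample call: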
image_path = r'D:\2_测试文件归档\OCR驾驶证识别.png'
drivers_license_data = rpa.ai.ocr.drivers_license(image_path)

vehicle_license

vehicle_license(image_path)

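Method description

Recognizes a vehicle license.

Parameters

image_path<str>: path to the vehicle license image

Return value

Returns the recognition result <json>

Sample call - rpa.ai.ocr.vehicle_license

# Note: the OCR capability used by the built-in SDK must be purchased separately; before use, confirm in Console - Authorization Management - AI that it has been authorized.
# Sample call: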
image_path = r'D:\2_测试文件归档\OCR行驶证识别.png'
vehicle_license_data = rpa.ai.ocr.vehicle_license(image_path)

passport

passport(image_path)

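Method description

Recognizes a passport.

Parameters

image_path<str>: path to the passport image

Return value

Returns the recognition result <json>

Sample call - rpa.ai.ocr.passport

# Note: the OCR capability used by the built-in SDK must be purchased separately; before use, confirm in Console - Authorization Management - AI that it has been authorized.
# Sample call: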
image_path = r'D:\2_测试文件归档\OCR护照识别.png'
passport_data = rpa.ai.ocr.passport(image_path)

real_estate_cert

real_estate_cert(image_path)

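Method description

Recognizes a real estate certificate.

Parameters

image_path<str>: path to the real estate certificate image

Return value

Returns the recognition result <json>

Sample call - rpa.ai.ocr.real_estate_cert

# Note: the OCR capability used by the built-in SDK must be purchased separately; before use, confirm in Console - Authorization Management - AI that it has been authorized.
# Sample call: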
image_path = r'D:\2_测试文件归档\OCR不动产证识别.png'
real_estate_cert_data = rpa.ai.ocr.real_estate_cert(image_path)

food_permit

food_permit(image_path)

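Method description

Recognizes a food business permit.

Parameters

image_path<str>: path to the food business permit image

Return value

Returns the recognition result <json>

Sample call - rpa.ai.ocr.food_permit

# Note: the OCR capability used by the built-in SDK must be purchased separately; before use, confirm in Console - Authorization Management - AI that it has been authorized.
# Sample call: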
image_path = r'D:\2_测试文件归档\OCR食品经营许可.png'
food_permit_data = rpa.ai.ocr.food_permit(image_path)

bank_account_permit

bank_account_permit(image_path)

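Method description

Recognizes a bank account opening permit.

Parameters

image_path<str>: path to the bank account opening permit image

Return value

Returns the recognition result <json>

Sample call - rpa.ai.ocr.bank_account_permit

# Note: the OCR capability used by the built-in SDK must be purchased separately; before use, confirm in Console - Authorization Management - AI that it has been authorized.
# Sample call: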
image_path = r'D:\2_测试文件归档\OCR银行开户许可.png'
bank_account_permit_data = rpa.ai.ocr.bank_account_permit(image_path)

car_invoice

car_invoice(image_path)

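Method description

Recognizes a motor vehicle invoice.

Parameters

image_path<str>: path to the motor vehicle invoice image

Return value

Returns the recognition result <json>

Sample call - rpa.ai.ocr.car_invoice

# Note: the OCR capability used by the built-in SDK must be purchased separately; before use, confirm in Console - Authorization Management - AI that it has been authorized.
# Sample call: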
image_path = r'D:\2_测试文件归档\OCR机动车发票.png'
car_invoice_data = rpa.ai.ocr.car_invoice(image_path)

train_ticket

train_ticket(image_path)

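Method description

Recognizes a train ticket.

Parameters

image_path<str>: path to the train ticket image

Return value

Returns the recognition result <json>

Sample call - rpa.ai.ocr.train_ticket

# Note: the OCR capability used by the built-in SDK must be purchased separately; before use, confirm in Console - Authorization Management - AI that it has been authorized.
# Sample call: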
image_path = r'D:\2_测试文件归档\OCR火车票.png'
train_ticket_data = rpa.ai.ocr.train_ticket(image_path)
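
The certificate and ticket methods above all take a single image_path, so processing a folder of scans comes down to a loop. The sketch below is one way to write it. recognize_folder is a hypothetical helper, the recognizer is passed in as a callable such as rpa.ai.ocr.train_ticket, and the assumption that a failed recognition raises an exception is not confirmed by this page.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Sequence, Tuple


def recognize_folder(
    folder: str,
    recognize: Callable[[str], Any],
    suffixes: Sequence[str] = (".png", ".jpg", ".jpeg"),
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Run `recognize` on every image in `folder`; return (results, failures)."""
    results: Dict[str, Any] = {}
    failures: Dict[str, str] = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip sub-folders and files that are not images.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        try:
            results[path.name] = recognize(str(path))
        except Exception as exc:
            # Record the error and continue with the remaining files.
            failures[path.name] = str(exc)
    return results, failures
```

Collecting failures separately lets one unreadable image be reported without stopping the whole batch.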

table

table(file_path, index=1)

Method description

Recognizes a table.

Parameters

file_path<str>: path of the image file

index<int>: index of the table within the page

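Return value

Returns the table content as a two-dimensional array <list>

Sample call - rpa.ai.ocr.table

# Note: the OCR capability used by the built-in SDK must be purchased separately; before use, confirm in Console - Authorization Management - AI that it has been authorized.
# When the image contains multiple tables, set the index parameter to select the single table to recognize; index starts at 1.
# Sample call: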
image_path = r'D:\2_测试文件归档\表格图片.png'
table_data = rpa.ai.ocr.table(image_path,index=2)
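
A two-dimensional array is convenient to iterate but awkward to query by column name. The sketch below converts it to a list of dicts, assuming the first row of the recognized table is a header row and that cells are plain strings. Neither is guaranteed by this page, and rows_to_records is a hypothetical helper.

```python
from typing import Dict, List


def rows_to_records(table: List[List[str]]) -> List[Dict[str, str]]:
    """Treat the first row as column names and map each later row to a dict."""
    if not table:
        return []
    header, *rows = table
    # zip stops at the shorter sequence, so ragged rows do not raise.
    return [dict(zip(header, row)) for row in rows]


sample = [["name", "qty"], ["apple", "3"], ["pear", "5"]]
for record in rows_to_records(sample):
    print(record["name"], record["qty"])
```

If the recognized table has no header row, the raw two-dimensional array is the safer representation.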

tables

tables(file_path)

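Method description

Recognizes all tables on the page; intended for pages that contain multiple tables.

Parameters

file_path<str>: path of the image file

Return value

Returns the recognition result as a three-dimensional array: the first dimension is the index of the table within the page, and the other two dimensions hold the table content <object>

Sample call - rpa.ai.ocr.tables

# Note: the OCR capability used by the built-in SDK must be purchased separately; before use, confirm in Console - Authorization Management - AI that it has been authorized.
# The return value looks like [table 1 content (2D list), table 2 content (2D list)]
# Sample call: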
image_path = r'D:\2_测试文件归档\表格图片.png'
table_data = rpa.ai.ocr.tables(image_path)
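
Because the result is a list of two-dimensional lists, a single table can be picked out and exported with the standard library. The sketch below shows one possible approach. pick_table and table_to_csv are hypothetical helpers; pick_table uses a 1-based index to mirror the index parameter of table, on the assumption that both methods number tables in the same order.

```python
import csv
import io
from typing import List

Table = List[List[str]]


def pick_table(tables: List[Table], index: int = 1) -> Table:
    """Return the index-th table, counting from 1."""
    if not 1 <= index <= len(tables):
        raise IndexError(f"table index {index} out of range 1..{len(tables)}")
    return tables[index - 1]


def table_to_csv(table: Table) -> str:
    """Serialize one table to CSV text; cells containing commas are quoted."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(table)
    return buffer.getvalue()
```

Writing the returned text to a file with open(path, "w", encoding="utf-8", newline="") keeps line endings consistent across platforms.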

is_key_existing

is_key_existing(element, keyword, element_index=1, engine='google', window=None)

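Method description

Checks whether the keyword exists in the OCR recognition result.

Parameters

element<str>: control name

keyword<str>: keyword

element_index<int>: position (index) of the control

engine<str>: OCR engine

Options:

google : Google
aliyun : Alibaba Cloud
paddle : PaddlePaddle

window<object>: window object that contains the control

Return value

Returns whether the keyword exists in the OCR recognition result <bool>

Sample call - rpa.ai.ocr.is_key_existing

# Notes:
# 1. Before using this method, capture the target control with the control capture feature.
# 2. At run time, make sure the page that contains the control is open.
# Sample call: this example checks whether the text recognized from the element contains the keyword "文档":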
page = rpa.app.chrome.create('www.aliyun.com')
flag = rpa.ai.ocr.is_key_existing('阿里云右上角banner-chrome',keyword='文档',engine='paddle')
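
Unlike click or mouse_move, is_key_existing has no timeout parameter in its signature, so waiting for a keyword to appear on a slowly loading page needs a loop on the caller's side. The sketch below is one way to poll. wait_for_keyword is a hypothetical helper, and the check is passed in as a zero-argument callable so that the loop does not depend on the SDK.

```python
import time
from typing import Callable


def wait_for_keyword(
    check: Callable[[], bool],
    attempts: int = 5,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` up to `attempts` times, pausing `interval` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            # No pause after the final attempt.
            sleep(interval)
    return False
```

In an RPA project the check could be lambda: rpa.ai.ocr.is_key_existing('阿里云右上角banner-chrome', keyword='文档', engine='paddle').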