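This document explains how to authorize the Alibaba Cloud docmind service to access OSS resources that you host. Once authorized, docmind reads the specified files from your OSS directory, processes them, and writes the results to a fixed location. Because both the source documents and the results stay in your own OSS resources, this improves file privacy and saves you the cost of transferring the results separately.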
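Solution overview
This solution is for users whose documents have privacy requirements (documents must not leave over the public network): document processing and the parsing results must both stay in a storage space that the user specifies, and must not be accessed over public network traffic. The OSS hosting solution covers this scenario. If you have your own OSS resources, you can host a specified OSS bucket together with its access permissions, and apply finer-grained permission control to the stored objects:
1. Authorizing OSS gives you fine-grained control over file access permissions.
2. Hosting OSS keeps file processing off the public network.
3. Hosting OSS gives you permission control over the processing results.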
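Solution deployment and verification
Create an authorization policy in Alibaba Cloud RAM
1. Log on to the Alibaba Cloud RAM console, choose "Identities", choose "Roles", and then choose to create a role:
Select "Alibaba Cloud Account" as the type: resources can then be accessed by assuming the RAM role.
Set the role name to "AliyunDocmindAccessingOssRole" and complete the creation.
2. After the role is created, go to "Permissions", choose "Policies", and choose "Create Policy". You can give the policy a custom name (this document uses testDocmindAccessOss):
You can use either the visual editor or the script editor. The following uses the script editor with a least-privilege policy as an example (for details on OSS permissions, see the tutorial "Use RAM Policy to control access to OSS"):
The policy below applies to the bucket named "your-bucket-name". It grants the read permission "GetObject" on files under the "your-bucket-directory" directory. That is, docmind is allowed to access your-bucket-directory and has no permission on any other path.
The docmind service writes processed results under [your-bucket-name]/[your-uid], where your-uid is the UID of your main account. This path is granted the read permission "GetObject" and the write permission "PutObject": GetObject is needed for the URLs returned when you call the query API, and PutObject is needed to write the parsing results to the path. That is, docmind is allowed to access the [your-uid] folder and has no permission on any other path.
Because results must be written to the designated folder once OSS is hosted, the resource "acs:oss:*:*:[your-bucket-name]/[your-uid]/*" must include the PutObject permission.
If the AliyunDocmindAccessingOssRole role is deleted or modified, the existing authorization expires after 12 hours by default.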
{
"Statement": [
{
"Effect": "Allow",
"Action": [
"oss:GetObject"
],
"Resource": "acs:oss:*:*:[your-bucket-name]/[your-bucket-directory]/*"
},
{
"Effect": "Allow",
"Action": [
"oss:PutObject",
"oss:GetObject"
],
"Resource": "acs:oss:*:*:[your-bucket-name]/[your-uid]/*"
}
],
"Version": "1"
}
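If you manage several buckets, the policy document above can be generated rather than edited by hand. The following is a minimal sketch, not an official tool: the function name build_docmind_oss_policy and its parameters are hypothetical, and it only reproduces the two statements shown above with your values substituted.

```python
import json


def build_docmind_oss_policy(bucket: str, source_dir: str, uid: str) -> dict:
    """Build the least-privilege policy shown above for one bucket.

    bucket, source_dir, and uid correspond to [your-bucket-name],
    [your-bucket-directory], and [your-uid] in the policy text.
    """
    return {
        "Statement": [
            {
                # Read-only access to the directory that holds the input files.
                "Effect": "Allow",
                "Action": ["oss:GetObject"],
                "Resource": f"acs:oss:*:*:{bucket}/{source_dir}/*",
            },
            {
                # Read and write access to the results folder named after the UID.
                "Effect": "Allow",
                "Action": ["oss:PutObject", "oss:GetObject"],
                "Resource": f"acs:oss:*:*:{bucket}/{uid}/*",
            },
        ],
        "Version": "1",
    }


if __name__ == "__main__":
    # Placeholder values; replace them with your own bucket, directory, and UID.
    policy = build_docmind_oss_policy(
        "your-bucket-name", "your-bucket-directory", "your-uid"
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the script editor when creating the policy.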
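The same policy can also be created in the visual editor by selecting the read-file, write-file, and get-file-access permissions:
3. After the role is created, select the role, open "Permissions", choose "Grant Permission", select the custom policy you created (testDocmindAccessOss in this document), and confirm to add it:
4. After the permission is added, edit the trust policy under "Trust Policy":
The trust policy allows the service whose Service value is "docmind-api.aliyuncs.com" to obtain the permissions of this role (here, the OSS permissions created above).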
{
"Statement": [
{
"Action": "sts:AssumeRole",
"Effect": "Allow",
"Principal": {
"Service": [
"docmind-api.aliyuncs.com"
]
}
}
],
"Version": "1"
}
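Before saving, it can help to check that a trust policy document really lists the docmind service as a trusted principal. The sketch below works on the policy as a plain Python dict; the function name trusts_service is hypothetical, and the check only covers the fields that appear in the trust policy above.

```python
def trusts_service(trust_policy: dict, service: str) -> bool:
    """Return True if any Allow statement lets `service` call sts:AssumeRole."""
    for statement in trust_policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        # Action and Service may each be a single string or a list of strings.
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRole" not in actions:
            continue
        services = statement.get("Principal", {}).get("Service", [])
        if isinstance(services, str):
            services = [services]
        if service in services:
            return True
    return False


if __name__ == "__main__":
    policy = {
        "Statement": [
            {
                "Action": "sts:AssumeRole",
                "Effect": "Allow",
                "Principal": {"Service": ["docmind-api.aliyuncs.com"]},
            }
        ],
        "Version": "1",
    }
    print(trusts_service(policy, "docmind-api.aliyuncs.com"))
```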
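At this point, the trust policy has been created in your Alibaba Cloud account, and the specified OSS directory is authorized for access by the docmind service.
Process files in the specified OSS directory with the docmind service
Taking document intelligent parsing as an example, you only need to add two parameters, ossBucket and ossEndpoint, to the submit API call.
ossEndpoint over internal network traffic: currently, OSS in the same region in Hangzhou can be configured with oss-cn-hangzhou-internal.aliyuncs.com; other regions are not yet supported.
ossEndpoint over public network traffic: currently supported for all regions.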
// JSON is assumed to come from fastjson; adjust the import if you use another JSON library.
import com.alibaba.fastjson.JSON;
import com.aliyun.docmind_api20220711.Client;
import com.aliyun.docmind_api20220711.models.*;
import com.aliyun.teaopenapi.models.Config;

public class SubmitDocStructureJobExample {
    public static void main(String[] args) throws Exception {
        submit();
    }

    public static void submit() throws Exception {
        // Initialize the Credentials client with the default credential chain.
        com.aliyun.credentials.Client credentialClient = new com.aliyun.credentials.Client();
        Config config = new Config()
                // Read the AccessKey ID from the configured credentials.
                .setAccessKeyId(credentialClient.getAccessKeyId())
                // Read the AccessKey Secret from the configured credentials.
                .setAccessKeySecret(credentialClient.getAccessKeySecret());
        // Service endpoint. Both IPv4 and IPv6 are supported; for IPv6 use
        // docmind-api-dualstack.cn-hangzhou.aliyuncs.com
        config.endpoint = "docmind-api.cn-hangzhou.aliyuncs.com";
        Client client = new Client(config);
        SubmitDocStructureJobRequest request = new SubmitDocStructureJobRequest();
        request.fileName = "example.pdf";
        // fileUrl accepts either of the following:
        // 1. A URL reachable over the public network, e.g. https://example.com/example.pdf
        // 2. A file under the authorized directory, e.g.
        //    https://ossBucket.ossEndpoint/[your-bucket-directory]/<file name>
        request.fileUrl = "https://example.com/example.pdf";
        // Pass in [your-bucket-name].
        request.ossBucket = "docmind-trust";
        request.ossEndpoint = "oss-cn-hangzhou.aliyuncs.com";
        SubmitDocStructureJobResponse response = client.submitDocStructureJob(request);
        System.out.println(JSON.toJSON(response.getBody()));
    }
}
from alibabacloud_docmind_api20220711.client import Client as docmind_api20220711Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_docmind_api20220711 import models as docmind_api20220711_models
from alibabacloud_tea_util.client import Client as UtilClient
from alibabacloud_credentials.client import Client as CredClient

if __name__ == '__main__':
    # Initialize the Credentials client with the default credential chain.
    cred = CredClient()
    config = open_api_models.Config(
        # Read the AccessKey ID from the configured credentials.
        access_key_id=cred.get_credential().get_access_key_id(),
        # Read the AccessKey Secret from the configured credentials.
        access_key_secret=cred.get_credential().get_access_key_secret()
    )
    # Service endpoint.
    config.endpoint = 'docmind-api.cn-hangzhou.aliyuncs.com'
    client = docmind_api20220711Client(config)
    request = docmind_api20220711_models.SubmitDocStructureJobRequest(
        # file_url: URL of the file.
        file_url='https://example.com/example.pdf',
        # file_name: file name. The name must include the file type.
        file_name='123.pdf',
        # file_name_extension: file extension. Provide either this or file_name.
        file_name_extension='pdf',
        # oss_bucket: your own OSS bucket.
        oss_bucket='docmind-trust',
        # oss_endpoint: your own OSS endpoint.
        oss_endpoint='oss-cn-hangzhou.aliyuncs.com'
    )
    try:
        # When running this sample, print the API return value yourself.
        response = client.submit_doc_structure_job(request)
        # The return value is nested as body -> data -> specific attributes.
        # Print what your business needs; this example prints the returned job ID.
        # Attribute names all start with a lowercase letter.
        print(response.body.data)
    except Exception as error:
        # Print error here if needed.
        UtilClient.assert_as_string(error.message)
Sample success response:
{
"RequestId": "43A29C77-405E-4CC0-BC55-EE694AD0****",
"Data": {
"Id": "docmind-20241209-b15f****"
}
}
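The ossEndpoint notes and the fileUrl pattern https://ossBucket.ossEndpoint/[your-bucket-directory]/<file name> from this section can be combined into two small helpers. This is a sketch under stated assumptions: the function names are hypothetical, the public endpoint form oss-<region>.aliyuncs.com is inferred from the oss-cn-hangzhou.aliyuncs.com value used above, and percent-encoding the path is a general URL convention rather than a documented docmind requirement.

```python
from urllib.parse import quote

# Only Hangzhou is listed as supporting internal network traffic in this document.
INTERNAL_ENDPOINTS = {"cn-hangzhou": "oss-cn-hangzhou-internal.aliyuncs.com"}


def choose_oss_endpoint(region: str, use_internal: bool) -> str:
    """Pick the value for ossEndpoint for a region such as 'cn-hangzhou'."""
    if use_internal:
        if region not in INTERNAL_ENDPOINTS:
            raise ValueError(f"internal endpoint is not supported for {region}")
        return INTERNAL_ENDPOINTS[region]
    # Assumed public endpoint pattern, matching the value used in the samples.
    return f"oss-{region}.aliyuncs.com"


def build_file_url(bucket: str, endpoint: str, directory: str, file_name: str) -> str:
    """Build a fileUrl of the form https://ossBucket.ossEndpoint/directory/file."""
    path = f"{quote(directory.strip('/'))}/{quote(file_name)}"
    return f"https://{bucket}.{endpoint}/{path}"


if __name__ == "__main__":
    endpoint = choose_oss_endpoint("cn-hangzhou", use_internal=False)
    print(build_file_url("docmind-trust", endpoint, "your-bucket-directory", "example.pdf"))
```

The returned values would be passed as ossEndpoint and fileUrl in the submit request shown earlier.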
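Call the GetDocStructureResult API to query document intelligent parsing results
Usage example
Taking the Java SDK as an example, the sample code for calling the result query API of document intelligent parsing is shown below. Call the API and pass the query ID through the id parameter.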
// JSON is assumed to come from fastjson; adjust the import if you use another JSON library.
import com.alibaba.fastjson.JSON;
import com.aliyun.docmind_api20220711.Client;
import com.aliyun.docmind_api20220711.models.*;
import com.aliyun.teaopenapi.models.Config;

public class GetDocStructureResultExample {
    public static void main(String[] args) throws Exception {
        getResult();
    }

    public static void getResult() throws Exception {
        // Initialize the Credentials client with the default credential chain.
        com.aliyun.credentials.Client credentialClient = new com.aliyun.credentials.Client();
        Config config = new Config()
                // Read the AccessKey ID from the configured credentials.
                .setAccessKeyId(credentialClient.getAccessKeyId())
                // Read the AccessKey Secret from the configured credentials.
                .setAccessKeySecret(credentialClient.getAccessKeySecret());
        // Service endpoint. Both IPv4 and IPv6 are supported; for IPv6 use
        // docmind-api-dualstack.cn-hangzhou.aliyuncs.com
        config.endpoint = "docmind-api.cn-hangzhou.aliyuncs.com";
        Client client = new Client(config);
        GetDocStructureResultRequest resultRequest = new GetDocStructureResultRequest();
        // The ID returned by the submit API.
        resultRequest.id = "docmind-20241209-824b****";
        GetDocStructureResultResponse response = client.getDocStructureResult(resultRequest);
        System.out.println(JSON.toJSON(response.getBody()));
    }
}
from alibabacloud_docmind_api20220711.client import Client as docmind_api20220711Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_docmind_api20220711 import models as docmind_api20220711_models
from alibabacloud_tea_util.client import Client as UtilClient
from alibabacloud_credentials.client import Client as CredClient

if __name__ == '__main__':
    # Initialize the Credentials client with the default credential chain.
    cred = CredClient()
    config = open_api_models.Config(
        # Read the AccessKey ID from the configured credentials.
        access_key_id=cred.get_credential().get_access_key_id(),
        # Read the AccessKey Secret from the configured credentials.
        access_key_secret=cred.get_credential().get_access_key_secret()
    )
    # Service endpoint.
    config.endpoint = 'docmind-api.cn-hangzhou.aliyuncs.com'
    client = docmind_api20220711Client(config)
    request = docmind_api20220711_models.GetDocStructureResultRequest(
        # id: the ID returned by the submit API.
        id='docmind-20241209-824b****'
    )
    try:
        # When running this sample, print the API return value yourself.
        response = client.get_doc_structure_result(request)
        # The return value is nested as body -> data -> specific attributes.
        # Attribute names all start with a lowercase letter.
        # response.body.completed reports the async job status; use it to decide
        # whether to keep polling.
        print(response.body.completed)
        # The full result. It is recommended to convert response.body.data to JSON
        # first and then read the values you need from it.
        print(response.body)
    except Exception as error:
        # Print error here if needed.
        UtilClient.assert_as_string(error.message)
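Because the job is asynchronous, the query above usually has to be repeated until completed is true. The following is a minimal polling sketch; wait_for_result and its fetch parameter are hypothetical names, and the interval and attempt limit are arbitrary defaults rather than values recommended by the service.

```python
import time
from typing import Any, Callable


def wait_for_result(
    fetch: Callable[[], Any],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
) -> Any:
    """Call fetch() until the returned body reports completed, then return it.

    fetch is expected to return an object with a `completed` attribute,
    such as response.body from the query call above.
    """
    for attempt in range(max_attempts):
        body = fetch()
        if body.completed:
            return body
        # Sleep between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            time.sleep(interval_seconds)
    raise TimeoutError(f"job did not complete after {max_attempts} attempts")


# Usage with the client and request from the sample above:
# body = wait_for_result(lambda: client.get_doc_structure_result(request).body)
```

Passing the query as a callable keeps the loop independent of the SDK, so it can be exercised with a stub before real credentials are configured.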
Sample success response:
{
"Status": "Success",
"RequestId": "73134E1A-E281-1B2C-A105-D0ECFE2DFail",
"Completed": true,
"Data": {
"docInfo": {
"docType": "pdf",
"orignalDocName": "1.pdf",
"pages": [
{
"imageType": "JPEG",
"imageUrl": "http://docmind-trust.oss-cn-hangzhou.aliyuncs.com/19547627052365xx/publicDocStructure/docmind-20241209-f3007ea79d9a403a94c6ad624a4c852a/0.png?Expires=1733770459&OSSAccessKeyId=STS.XXX&Signature=VQkABf%2BCGEPpycMAFvMgMVV6W9U%3D&security-token=XXXXXXXXXX%2BgVWTjjTYBXMJC3fbNuDz2IHhMdHlvBuwXtv4%2BmW1T7v0Zlrh%2FTJRARErIWsxr9aNL9gCsZdI0ZWE4P%2BZW5qe%2BEE2%2FVjTZvqaLEcibIfrZfvCyESOm8gZ43br9cxi7QlWhKufnoJV7b9MRLGLaBHg8c7UwHAZ5r9IAPnb8LOukNgWQ4lDdF011oAFx%2BwgdgOadupTNt0aB0gelkrJP%2FNqsesKeApMybMslYbCcx%2Fdrc6fN6ilU5iVR%2Bb1%2B5K4%2Bom%2Bf7oHCXAUAu0%2FXbrePqoY0NnpwYqkrBqhIq%2FP5lPt0s%2BfYmp%2FsyhBCOvpOSSPbSZB2VE0RsRFbXDxQV8EYWxylurjnXvF%2BQxCnzp8uGin%2B2svzW55hiCFd2%2FzgUNuD0nrkDPnttVZ7%2Fl%2FYn9SLsRtGk7ToQ3rLd9GztUC8UsfzjZt2X3Z%2BGoABb2wsEAWkEeTkjHp5EdxaGWL0W0CQnmLOkWcRVb3H%2BRr7CVvzdTEPCJf7%2Bh8POBNbm8tciF0vcfLjGUs3%2FeiKBJtGaxK3SubvQJe99OiRFY0kcj%2Bjl5SQH%2B8Qy%2B5j5DzcqwhNdS1cMNbfIz9HbU5sU24CfYnwAELt9dtge7lMeccgAA%3D%3D",
"angle": null,
"imageWidth": 1273,
"imageHeight": 1801,
"pageIdCurDoc": 1,
"pageIdAllDocs": 1
}
]
},
"styles": [
{
"styleId": 0,
"underline": false,
"deleteLine": false,
"bold": true,
"italic": false,
"fontSize": 15,
"fontName": "黑体",
"color": "000000",
"charScale": 0.95
},
{
"styleId": 1,
"underline": false,
"deleteLine": false,
"bold": false,
"italic": false,
"fontSize": 12,
"fontName": "微软雅黑",
"color": "000000",
"charScale": 1
}
],
"layouts": [
{
"text": "测试标题",
"index": 0,
"uniqueId": "xxxx9816e77caea338df554b80ab95c7",
"alignment": "center",
"pageNum": [
0
],
"pos": [
{
"x": 405,
"y": 192
},
{
"x": 860,
"y": 191
},
{
"x": 860,
"y": 236
},
{
"x": 406,
"y": 237
}
],
"type": "title",
"subType": "doc_title"
},
{
"text": "本段为测试内容",
"index": 1,
"uniqueId": "xxxx8606c213c01c12d70f98dcfb2525",
"alignment": "left",
"pageNum": [
0
],
"pos": [
{
"x": 187,
"y": 311
},
{
"x": 1075,
"y": 311
},
{
"x": 1076,
"y": 373
},
{
"x": 187,
"y": 373
}
],
"type": "text",
"subType": "para",
"lineHeight": 7,
"firstLinesChars": 30,
"blocks": [
{
"text": "本段",
"pos": null,
"styleId": 0
},
{
"text": "为测试内容",
"pos": null,
"styleId": 1
}
]
}
],
"logics": {
"docTree": [
{
"uniqueId": "xxxx9816e77caea338df554b80ab95c7",
"level": 0,
"link": {
"下级": [],
"包含": []
},
"backlink": {
"上级": [
"ROOT"
]
}
}
],
"paragraphKVs": null,
"tableKVs": null
}
}
}
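Once the response above is available as JSON, the layout text and the page image URLs can be pulled out with a few lines. This sketch assumes the result has been converted to a plain dict shaped like the sample response; the function names are hypothetical, and only fields that appear in the sample are read.

```python
from typing import List, Tuple


def extract_layout_text(result: dict) -> List[Tuple[str, str, str]]:
    """Return (type, subType, text) for each layout, ordered by its index."""
    data = result.get("Data") or {}
    layouts = sorted(data.get("layouts") or [], key=lambda item: item.get("index", 0))
    return [
        (layout.get("type"), layout.get("subType"), layout.get("text", ""))
        for layout in layouts
    ]


def page_image_urls(result: dict) -> List[str]:
    """Return the imageUrl of every page listed under Data.docInfo.pages."""
    doc_info = (result.get("Data") or {}).get("docInfo") or {}
    return [page["imageUrl"] for page in doc_info.get("pages") or [] if page.get("imageUrl")]


if __name__ == "__main__":
    # A trimmed-down stand-in for the sample response, with placeholder values.
    sample = {
        "Completed": True,
        "Data": {
            "docInfo": {"pages": [{"imageUrl": "http://example.com/0.png"}]},
            "layouts": [
                {"text": "body", "index": 1, "type": "text", "subType": "para"},
                {"text": "Title", "index": 0, "type": "title", "subType": "doc_title"},
            ],
        },
    }
    for layout_type, sub_type, text in extract_layout_text(sample):
        print(layout_type, sub_type, text)
    print(page_image_urls(sample))
```

In the hosted setup, the image URLs point into the [your-bucket-name]/[your-uid] path of your own bucket, as in the sample response.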