OSS Hosting Support

Last updated: 2025-02-28 07:43:25

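This document describes how to let the Alibaba Cloud DocMind service access OSS resources that you host and authorize. DocMind reads specified files from an OSS directory, processes them, and writes the results to a fixed location. Because both the source documents and the saved results stay in your own OSS resources, this improves file privacy and removes the cost of transferring the results yourself.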

Solution Overview

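Some users have privacy requirements for their documents (documents must not travel over the public network): both the processing and the parsing results must stay in a storage space the user designates, and must not be accessed over public network traffic. The OSS hosting solution supports this scenario. Users who have their own OSS resources can delegate a specified OSS bucket with specified access permissions, and can apply finer-grained permission control to the stored objects: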

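1. Fine-grained access control over files, through OSS authorization;

2. File processing that stays off the public network, through hosted OSS;

3. Permission control over the processing results, through hosted OSS.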


Deployment and Verification

Create an authorization policy in Alibaba Cloud RAM

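1. Log on to the RAM console, choose Identities > Roles, and click Create Role:

  • Set the type to "Alibaba Cloud Account" (resources are accessed by assuming the RAM role).

  • Set the role name to "AliyunDocmindAccessingOssRole" and complete the creation.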


2. After the role is created, go to Permissions > Policies and click Create Policy. You can choose any policy name (this document uses testDocmindAccessOss):

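You can use either the visual editor or the script editor. The example below uses the script editor with least-privilege access (for more about OSS access permissions, see the tutorials "Use RAM policies to control access to OSS" and "Common examples of RAM policies"):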

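  • The policy below applies to the bucket named "your-bucket-name" and grants read access ("GetObject") to the files under the "your-bucket-directory" directory. In other words, your-bucket-directory is made accessible to DocMind, and all other paths have no permissions.

  • DocMind stores the processed results under [your-bucket-name]/[your-uid], where your-uid is the UID of your Alibaba Cloud account (main account). This path is granted read ("GetObject") and write ("PutObject") permissions: GetObject is used for the URLs returned when you call the query API, and PutObject is used to write the parsing results to the path. In other words, the [your-uid] folder is made accessible to DocMind, and all other paths have no permissions.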
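The two rules above map directly onto the script-editor policy in this section. As a minimal sketch (not an official tool), the helper below assembles that policy document for a given bucket, input directory and account UID; the function name `build_docmind_oss_policy` and the sample values `input-docs` and `1234567890` are hypothetical.

```python
import json


def build_docmind_oss_policy(bucket: str, input_dir: str, uid: str) -> dict:
    """Build a least-privilege RAM policy for DocMind (hypothetical helper).

    - Read-only access (oss:GetObject) to the input directory.
    - Read/write access (oss:GetObject, oss:PutObject) to the [uid] result directory.
    """
    # Drop leading/trailing slashes so the resource reads bucket/dir/*
    input_dir = input_dir.strip("/")
    return {
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["oss:GetObject"],
                "Resource": f"acs:oss:*:*:{bucket}/{input_dir}/*",
            },
            {
                "Effect": "Allow",
                "Action": ["oss:PutObject", "oss:GetObject"],
                "Resource": f"acs:oss:*:*:{bucket}/{uid}/*",
            },
        ],
        "Version": "1",
    }


if __name__ == "__main__":
    # Hypothetical values; replace them with your bucket, directory and account UID.
    policy = build_docmind_oss_policy("docmind-trust", "input-docs", "1234567890")
    print(json.dumps(policy, indent=4))
```

The printed JSON can then be pasted into the script editor when you create the custom policy.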

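Important
  • Because hosted OSS requires DocMind to write the results to the specified folder, the resource "acs:oss:*:*:[your-bucket-name]/[your-uid]/*" must include the PutObject permission.

  • If AliyunDocmindAccessingOssRole is deleted or modified, the existing authorization becomes invalid after 12 hours by default.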

{
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "oss:GetObject"
            ],
            "Resource": "acs:oss:*:*:[your-bucket-name]/[your-bucket-directory]/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "oss:PutObject",
                "oss:GetObject"
            ],
            "Resource": "acs:oss:*:*:[your-bucket-name]/[your-uid]/*"
        }
    ],
    "Version": "1"
}
  • Alternatively, use the visual editor and select the permissions to read files, write files, and obtain file access.

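To sanity-check which object keys DocMind can reach under the script-editor policy above, the following sketch applies a deliberately simplified model of RAM evaluation: it only looks at Allow statements, treats `*` as a wildcard via Python's `fnmatch`, and assumes the `acs:oss:*:*:` resource prefix used in this document. It ignores Deny statements and Condition blocks, so it is a local pre-check only. The bucket, directory and UID values are hypothetical.

```python
from fnmatch import fnmatchcase

# Same shape as the example policy; bucket, directory and UID are hypothetical.
POLICY = {
    "Statement": [
        {"Effect": "Allow", "Action": ["oss:GetObject"],
         "Resource": "acs:oss:*:*:docmind-trust/input-docs/*"},
        {"Effect": "Allow", "Action": ["oss:PutObject", "oss:GetObject"],
         "Resource": "acs:oss:*:*:docmind-trust/1234567890/*"},
    ],
    "Version": "1",
}


def _as_list(value):
    """Action/Resource may be a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def is_allowed(policy: dict, action: str, bucket: str, key: str) -> bool:
    """Simplified check: True if some Allow statement matches the action and object ARN."""
    arn = f"acs:oss:*:*:{bucket}/{key}"
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue  # Deny statements are not modeled in this sketch
        action_ok = any(fnmatchcase(action, a) for a in _as_list(stmt.get("Action", [])))
        resource_ok = any(fnmatchcase(arn, r) for r in _as_list(stmt.get("Resource", [])))
        if action_ok and resource_ok:
            return True
    return False


if __name__ == "__main__":
    for action, key in [
        ("oss:GetObject", "input-docs/example.pdf"),
        ("oss:PutObject", "input-docs/example.pdf"),
        ("oss:PutObject", "1234567890/publicDocStructure/0.png"),
        ("oss:GetObject", "private/report.pdf"),
    ]:
        print(action, key, is_allowed(POLICY, action, "docmind-trust", key))
```

With this policy, DocMind can read but not write the input directory, can write under the UID directory, and cannot touch any other path.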

3. Select the role you created, go to its Permissions tab, click Grant Permission, select the custom policy you created (testDocmindAccessOss in this document), and confirm.


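4. After adding the permission, edit the policy on the Trust Policy tab:

The trust policy allows the service "docmind-api.aliyuncs.com" to assume the role and obtain its permissions (here, the OSS permissions created above).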

{
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "docmind-api.aliyuncs.com"
        ]
      }
    }
  ],
  "Version": "1"
}
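Before saving, you can check locally that the trust policy names `docmind-api.aliyuncs.com` as a principal allowed to call `sts:AssumeRole`. The sketch below is a hypothetical helper (`trusts_service`) that only inspects the JSON structure shown above; it does not query RAM and is not a substitute for RAM's own evaluation.

```python
# The trust policy from this step, as a Python dict.
TRUST_POLICY = {
    "Statement": [
        {
            "Action": "sts:AssumeRole",
            "Effect": "Allow",
            "Principal": {"Service": ["docmind-api.aliyuncs.com"]},
        }
    ],
    "Version": "1",
}


def _as_list(value):
    """Policy fields may hold a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def trusts_service(policy: dict, service: str) -> bool:
    """True if an Allow statement lets `service` call sts:AssumeRole (structural check only)."""
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in _as_list(stmt.get("Action", [])):
            continue
        if service in _as_list(stmt.get("Principal", {}).get("Service", [])):
            return True
    return False


if __name__ == "__main__":
    print(trusts_service(TRUST_POLICY, "docmind-api.aliyuncs.com"))
```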


You have now created the trust policy in your Alibaba Cloud account and authorized the DocMind service to access the specified OSS directories.

Process files in a specified OSS directory with DocMind

Taking document intelligent parsing as an example, you only need to add the two parameters ossBucket and ossEndpoint to the submission request.

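Important

ossEndpoint over the internal network: currently supported only for OSS in the same region (Hangzhou), using oss-cn-hangzhou-internal.aliyuncs.com; other regions are not supported yet.

ossEndpoint over the public network: currently supported in all regions.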
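The endpoint rules above can be captured in a small helper. This is a sketch under two assumptions: the function name `choose_oss_endpoint` is hypothetical, and public endpoints follow the standard OSS naming `oss-<region>.aliyuncs.com`. The restriction of internal endpoints to cn-hangzhou is taken from the note above and may change.

```python
def choose_oss_endpoint(region: str, internal: bool = False) -> str:
    """Pick an ossEndpoint value for the DocMind request (hypothetical helper).

    Internal endpoints are accepted only for cn-hangzhou, per the note above;
    public endpoints use the standard oss-<region>.aliyuncs.com form.
    """
    if internal:
        if region != "cn-hangzhou":
            raise ValueError(f"internal ossEndpoint is not supported for region {region!r}")
        return "oss-cn-hangzhou-internal.aliyuncs.com"
    return f"oss-{region}.aliyuncs.com"


if __name__ == "__main__":
    print(choose_oss_endpoint("cn-hangzhou", internal=True))
    print(choose_oss_endpoint("cn-beijing"))
```

Failing fast on an unsupported internal endpoint surfaces the configuration error before a job is submitted.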

Java:
import com.alibaba.fastjson.JSON;
import com.aliyun.docmind_api20220711.Client;
import com.aliyun.docmind_api20220711.models.*;
import com.aliyun.teaopenapi.models.Config;

public class SubmitDocStructureJobDemo {

    public static void main(String[] args) throws Exception {
        submit();
    }

    public static void submit() throws Exception {
        // Initialize the Credentials client with the default credential chain.
        com.aliyun.credentials.Client credentialClient = new com.aliyun.credentials.Client();
        Config config = new Config()
            // AccessKey ID obtained through credentials
            .setAccessKeyId(credentialClient.getAccessKeyId())
            // AccessKey Secret obtained through credentials
            .setAccessKeySecret(credentialClient.getAccessKeySecret());
        // Service endpoint; IPv4 and IPv6 are both supported. For IPv6, use docmind-api-dualstack.cn-hangzhou.aliyuncs.com
        config.endpoint = "docmind-api.cn-hangzhou.aliyuncs.com";
        Client client = new Client(config);
        SubmitDocStructureJobRequest request = new SubmitDocStructureJobRequest();
        request.fileName = "example.pdf";
        // fileUrl accepts either:
        //   1. A publicly accessible URL, e.g. https://example.com/example.pdf
        //   2. A file under the authorized OSS directory, e.g. https://ossBucket.ossEndpoint/[your-bucket-directory]/[file-name]
        request.fileUrl = "https://example.com/example.pdf";
        // Your bucket name ([your-bucket-name])
        request.ossBucket = "docmind-trust";
        // Endpoint of your OSS bucket
        request.ossEndpoint = "oss-cn-hangzhou.aliyuncs.com";
        SubmitDocStructureJobResponse response = client.submitDocStructureJob(request);
        System.out.println(JSON.toJSON(response.getBody()));
    }
}

Python:
from alibabacloud_docmind_api20220711.client import Client as docmind_api20220711Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_docmind_api20220711 import models as docmind_api20220711_models
from alibabacloud_credentials.client import Client as CredClient

if __name__ == '__main__':
    # Initialize the Credentials client with the default credential chain.
    cred = CredClient()
    config = open_api_models.Config(
        # AccessKey ID obtained through credentials
        access_key_id=cred.get_credential().get_access_key_id(),
        # AccessKey Secret obtained through credentials
        access_key_secret=cred.get_credential().get_access_key_secret()
    )
    # Service endpoint
    config.endpoint = 'docmind-api.cn-hangzhou.aliyuncs.com'
    client = docmind_api20220711Client(config)
    request = docmind_api20220711_models.SubmitDocStructureJobRequest(
        # file_url: URL of the file (public URL, or a file under the authorized OSS directory)
        file_url='https://example.com/example.pdf',
        # file_name: file name; must include the file type
        file_name='123.pdf',
        # file_name_extension: file extension; an alternative to file_name (either one is enough)
        file_name_extension='pdf',
        # oss_bucket: your own OSS bucket ([your-bucket-name])
        oss_bucket='docmind-trust',
        # oss_endpoint: endpoint of your own OSS bucket
        oss_endpoint='oss-cn-hangzhou.aliyuncs.com'
    )
    try:
        response = client.submit_doc_structure_job(request)
        # The response is structured as body -> data -> attributes, and attribute names start with a lowercase letter.
        # This example prints the returned business data, which contains the task id.
        print(response.body.data)
    except Exception as error:
        # SDK errors (TeaException) carry a message attribute; fall back to str(error) otherwise
        print(getattr(error, 'message', str(error)))

Sample success response:

{
  "RequestId": "43A29C77-405E-4CC0-BC55-EE694AD0****",
  "Data": {
    "Id": "docmind-20241209-b15f****"
  }  
}
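For the second form of fileUrl in the submission examples (a file in the authorized directory, `https://ossBucket.ossEndpoint/[your-bucket-directory]/...`), the URL can be assembled from the same values passed as ossBucket and ossEndpoint. The sketch below also reads the task Id from a raw JSON response shaped like the sample above. Both helper names are hypothetical, and percent-encoding the object key is an assumption about how file names containing spaces or non-ASCII characters should be passed.

```python
import json
from urllib.parse import quote


def build_oss_file_url(bucket: str, endpoint: str, directory: str, file_name: str) -> str:
    """Build https://<bucket>.<endpoint>/<directory>/<file_name> (hypothetical helper)."""
    key = f"{directory.strip('/')}/{file_name}"
    # Percent-encode the object key but keep the "/" separators.
    return f"https://{bucket}.{endpoint}/{quote(key, safe='/')}"


def extract_job_id(response_text: str) -> str:
    """Read Data.Id from a raw JSON submission response (hypothetical helper)."""
    return json.loads(response_text)["Data"]["Id"]


if __name__ == "__main__":
    print(build_oss_file_url("docmind-trust", "oss-cn-hangzhou.aliyuncs.com",
                             "input-docs", "example.pdf"))
    sample = '{"RequestId": "43A29C77-405E-4CC0-BC55-EE694AD0****", "Data": {"Id": "docmind-20241209-b15f****"}}'
    print(extract_job_id(sample))
```

The extracted Id is the value to pass as `id` to GetDocStructureResult.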

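Call the GetDocStructureResult API to query document parsing results

Examples

The following sample code calls the result query API of document intelligent parsing through the SDK. Pass the task ID (the Id returned by the submission API) in the id parameter.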

Java:
import com.alibaba.fastjson.JSON;
import com.aliyun.docmind_api20220711.Client;
import com.aliyun.docmind_api20220711.models.*;
import com.aliyun.teaopenapi.models.Config;

public class GetDocStructureResultDemo {

    public static void main(String[] args) throws Exception {
        queryResult();
    }

    public static void queryResult() throws Exception {
        // Initialize the Credentials client with the default credential chain.
        com.aliyun.credentials.Client credentialClient = new com.aliyun.credentials.Client();
        Config config = new Config()
            // AccessKey ID obtained through credentials
            .setAccessKeyId(credentialClient.getAccessKeyId())
            // AccessKey Secret obtained through credentials
            .setAccessKeySecret(credentialClient.getAccessKeySecret());
        // Service endpoint; IPv4 and IPv6 are both supported. For IPv6, use docmind-api-dualstack.cn-hangzhou.aliyuncs.com
        config.endpoint = "docmind-api.cn-hangzhou.aliyuncs.com";
        Client client = new Client(config);
        GetDocStructureResultRequest resultRequest = new GetDocStructureResultRequest();
        // Task ID returned by the submission API
        resultRequest.id = "docmind-20241209-824b****";
        GetDocStructureResultResponse response = client.getDocStructureResult(resultRequest);
        System.out.println(JSON.toJSON(response.getBody()));
    }
}

Python:
from alibabacloud_docmind_api20220711.client import Client as docmind_api20220711Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_docmind_api20220711 import models as docmind_api20220711_models
from alibabacloud_credentials.client import Client as CredClient

if __name__ == '__main__':
    # Initialize the Credentials client with the default credential chain.
    cred = CredClient()
    config = open_api_models.Config(
        # AccessKey ID obtained through credentials
        access_key_id=cred.get_credential().get_access_key_id(),
        # AccessKey Secret obtained through credentials
        access_key_secret=cred.get_credential().get_access_key_secret()
    )
    # Service endpoint
    config.endpoint = 'docmind-api.cn-hangzhou.aliyuncs.com'
    client = docmind_api20220711Client(config)
    request = docmind_api20220711_models.GetDocStructureResultRequest(
        # id: the task ID returned by the submission API
        id='docmind-20241209-824b****'
    )
    try:
        response = client.get_doc_structure_result(request)
        # The response is structured as body -> data -> attributes, and attribute names start with a lowercase letter.
        # response.body.completed tells whether the asynchronous task has finished, i.e. whether to keep polling.
        print(response.body.completed)
        # Full result. It is recommended to convert response.body.data to JSON first, then read the values you need.
        print(response.body)
    except Exception as error:
        # SDK errors (TeaException) carry a message attribute; fall back to str(error) otherwise
        print(getattr(error, 'message', str(error)))
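Parsing is asynchronous, and the Python query example notes that `response.body.completed` tells you whether to keep polling. A minimal polling loop might look like the following. `fetch` is a hypothetical callable you supply (for example, one that calls `client.get_doc_structure_result(request)` and returns `(response.body.completed, response.body)`), and the 10-second interval and 600-second timeout are illustrative defaults, not documented limits.

```python
import time
from typing import Any, Callable, Tuple


def poll_until_completed(
    fetch: Callable[[], Tuple[bool, Any]],
    interval_s: float = 10.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch() until it reports completion, then return its payload.

    fetch returns (completed, payload); sleep is injectable so the loop can be tested.
    Raises TimeoutError if the task is still incomplete after timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        completed, payload = fetch()
        if completed:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError("document parsing did not complete before the timeout")
        sleep(interval_s)
```

Once the loop returns, check the `Status` field of the payload before reading `Data`.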

Sample success response:

{
    "Status": "Success",
    "RequestId": "73134E1A-E281-1B2C-A105-D0ECFE2DFail",
    "Completed": true,
    "Data": {
        "docInfo": {
            "docType": "pdf",
            "orignalDocName": "1.pdf",
            "pages": [
                {
                    "imageType": "JPEG",
                    "imageUrl": "http://docmind-trust.oss-cn-hangzhou.aliyuncs.com/19547627052365xx/publicDocStructure/docmind-20241209-f3007ea79d9a403a94c6ad624a4c852a/0.png?Expires=1733770459&OSSAccessKeyId=STS.XXX&Signature=VQkABf%2BCGEPpycMAFvMgMVV6W9U%3D&security-token=XXXXXXXXXX%2BgVWTjjTYBXMJC3fbNuDz2IHhMdHlvBuwXtv4%2BmW1T7v0Zlrh%2FTJRARErIWsxr9aNL9gCsZdI0ZWE4P%2BZW5qe%2BEE2%2FVjTZvqaLEcibIfrZfvCyESOm8gZ43br9cxi7QlWhKufnoJV7b9MRLGLaBHg8c7UwHAZ5r9IAPnb8LOukNgWQ4lDdF011oAFx%2BwgdgOadupTNt0aB0gelkrJP%2FNqsesKeApMybMslYbCcx%2Fdrc6fN6ilU5iVR%2Bb1%2B5K4%2Bom%2Bf7oHCXAUAu0%2FXbrePqoY0NnpwYqkrBqhIq%2FP5lPt0s%2BfYmp%2FsyhBCOvpOSSPbSZB2VE0RsRFbXDxQV8EYWxylurjnXvF%2BQxCnzp8uGin%2B2svzW55hiCFd2%2FzgUNuD0nrkDPnttVZ7%2Fl%2FYn9SLsRtGk7ToQ3rLd9GztUC8UsfzjZt2X3Z%2BGoABb2wsEAWkEeTkjHp5EdxaGWL0W0CQnmLOkWcRVb3H%2BRr7CVvzdTEPCJf7%2Bh8POBNbm8tciF0vcfLjGUs3%2FeiKBJtGaxK3SubvQJe99OiRFY0kcj%2Bjl5SQH%2B8Qy%2B5j5DzcqwhNdS1cMNbfIz9HbU5sU24CfYnwAELt9dtge7lMeccgAA%3D%3D",
                    "angle": null,
                    "imageWidth": 1273,
                    "imageHeight": 1801,
                    "pageIdCurDoc": 1,
                    "pageIdAllDocs": 1
                }
            ]
        },
        "styles": [
            {
                "styleId": 0,
                "underline": false,
                "deleteLine": false,
                "bold": true,
                "italic": false,
                "fontSize": 15,
                "fontName": "黑体",
                "color": "000000",
                "charScale": 0.95
            },
            {
                "styleId": 1,
                "underline": false,
                "deleteLine": false,
                "bold": false,
                "italic": false,
                "fontSize": 12,
                "fontName": "微软雅黑",
                "color": "000000",
                "charScale": 1
            }
        ],
        "layouts": [
            {
                "text": "测试标题",
                "index": 0,
                "uniqueId": "xxxx9816e77caea338df554b80ab95c7",
                "alignment": "center",
                "pageNum": [
                    0
                ],
                "pos": [
                    {
                        "x": 405,
                        "y": 192
                    },
                    {
                        "x": 860,
                        "y": 191
                    },
                    {
                        "x": 860,
                        "y": 236
                    },
                    {
                        "x": 406,
                        "y": 237
                    }
                ],
                "type": "title",
                "subType": "doc_title"
            },
            {
                "text": "本段为测试内容",
                "index": 1,
                "uniqueId": "xxxx8606c213c01c12d70f98dcfb2525",
                "alignment": "left",
                "pageNum": [
                    0
                ],
                "pos": [
                    {
                        "x": 187,
                        "y": 311
                    },
                    {
                        "x": 1075,
                        "y": 311
                    },
                    {
                        "x": 1076,
                        "y": 373
                    },
                    {
                        "x": 187,
                        "y": 373
                    }
                ],
                "type": "text",
                "subType": "para",
                "lineHeight": 7,
                "firstLinesChars": 30,
                "blocks": [
                    {
                        "text": "本段",
                        "pos": null,
                        "styleId": 0
                    },
                    {
                        "text": "为测试内容",
                        "pos": null,
                        "styleId": 1
                    }
                ]
            }
        ],
        "logics": {
            "docTree": [
                {
                    "uniqueId": "xxxx9816e77caea338df554b80ab95c7",
                    "level": 0,
                    "link": {
                        "下级": [],
                        "包含": []
                    },
                    "backlink": {
                        "上级": [
                            "ROOT"
                        ]
                    }
                }
            ],
            "paragraphKVs": null,
            "tableKVs": null
        }
    }
}
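In the sample response, `imageUrl` is a signed OSS URL in your own bucket (`docmind-trust`), and its object key starts with `19547627052365xx/publicDocStructure/<task id>/`. The first path segment appears to be the (masked) account UID, that is, the `[your-uid]` directory granted PutObject earlier. The sketch below splits such a URL into bucket, endpoint, object key and the `Expires` timestamp, and checks that the key lies under a given UID prefix. The helper names are hypothetical, and it assumes virtual-hosted-style URLs (`<bucket>.<endpoint>/<key>`) as in the sample.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def describe_result_url(url: str) -> dict:
    """Split a signed result URL into bucket, endpoint, object key and expiry (hypothetical helper)."""
    parts = urlsplit(url)
    # Virtual-hosted style: the first host label is the bucket name.
    bucket, _, endpoint = parts.netloc.partition(".")
    expires = parse_qs(parts.query).get("Expires", [None])[0]
    return {
        "bucket": bucket,
        "endpoint": endpoint,
        "key": unquote(parts.path.lstrip("/")),
        "expires": int(expires) if expires is not None else None,
    }


def is_under_uid_prefix(key: str, uid: str) -> bool:
    """True if the object key lies in the [your-uid]/ directory."""
    return key.startswith(uid.rstrip("/") + "/")


if __name__ == "__main__":
    url = ("http://docmind-trust.oss-cn-hangzhou.aliyuncs.com/19547627052365xx/publicDocStructure/"
           "docmind-20241209-f3007ea79d9a403a94c6ad624a4c852a/0.png"
           "?Expires=1733770459&OSSAccessKeyId=STS.XXX")
    info = describe_result_url(url)
    print(info)
    print(is_under_uid_prefix(info["key"], "19547627052365xx"))
```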