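Document Extraction

This document describes how to call the document extraction API. Before calling it, read the API usage guide.

Overview

The document extraction API automatically extracts key information from many types of documents and tables and returns it as generic key-value (KV) structured content.

The API is asynchronous. First call SubmitDocumentExtractJob to submit an asynchronous job, then call GetDocumentExtractResult to poll for the result. Poll every 10 seconds for at most 120 minutes; if no completed result is returned within 120 minutes, treat the job as timed out.

After a job has been submitted, its result can be queried for 24 hours after processing finishes. After 24 hours the result can no longer be queried.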
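The submit-then-poll flow described above can be sketched as a small loop. This is a minimal illustration, not SDK code: `fetch_result` is a hypothetical callable that wraps GetDocumentExtractResult and returns the response body as a dict keyed like the JSON samples (`Completed`, `Status`), and the injectable `sleep` and `clock` parameters exist only so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Optional

POLL_INTERVAL_SECONDS = 10      # suggested polling interval
MAX_POLL_SECONDS = 120 * 60     # treat the job as timed out after 120 minutes


def poll_until_complete(
    fetch_result: Callable[[], dict],
    interval: float = POLL_INTERVAL_SECONDS,
    timeout: float = MAX_POLL_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[dict]:
    """Poll until the body reports Completed == True or the timeout elapses.

    fetch_result is a hypothetical wrapper around GetDocumentExtractResult.
    Returns the final response body, or None if the job timed out.
    """
    deadline = clock() + timeout
    while True:
        body = fetch_result()
        if body.get("Completed") is True:
            return body
        # Stop if the next poll would fall after the deadline.
        if clock() + interval > deadline:
            return None
        sleep(interval)
```

A caller would pass a function that queries the job ID returned by the submission step, and treat a `None` return value as a processing timeout.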

Procedure

Step 1: Call SubmitDocumentExtractJob to submit the asynchronous job

Request parameters

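| Name | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| FileUrl | string | | URL of a single document. Supports PDF files of up to 1,000 pages and up to 100 MB, and single images of up to 20 MB. To upload a local file instead, the SDK provides a separate input parameter that accepts a file stream. | https://example.com/example.pdf |
| FileName | string | | File name, including the file-type extension. Specify either FileName or FileNameExtension. | example.pdf |
| FileNameExtension | string | | File type. Specify either FileNameExtension or FileName. Supported types: pdf, jpg, jpeg, png, bmp, gif. | pdf |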

Note

Supported document formats: PDF and images. Supported image types are jpg, jpeg, png, bmp, and gif.
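Because the job is asynchronous, an unsupported file type may only surface later as a failed job, so it can be worth checking the input before submitting. The helper below is a sketch of such a client-side check based on the types listed above. `resolve_extension` is a hypothetical name, and preferring FileNameExtension when both values are supplied is an assumption, not documented API behavior.

```python
import os
from typing import Optional

# File types listed in the request parameter table.
SUPPORTED_EXTENSIONS = {"pdf", "jpg", "jpeg", "png", "bmp", "gif"}


def resolve_extension(file_name: Optional[str] = None,
                      file_name_extension: Optional[str] = None) -> str:
    """Return the lower-cased file type, or raise ValueError if it is unusable."""
    if file_name_extension:
        ext = file_name_extension.lower().lstrip(".")
    elif file_name:
        # "example.pdf" -> ".pdf" -> "pdf"
        ext = os.path.splitext(file_name)[1].lower().lstrip(".")
    else:
        raise ValueError("provide either FileName or FileNameExtension")
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext!r}")
    return ext
```

For example, `resolve_extension(file_name="example.pdf")` yields `"pdf"`, while a `.docx` name raises `ValueError` before any request is sent.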

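Response parameters

| Name | Type | Description | Example |
| --- | --- | --- | --- |
| RequestId | string | Unique ID of the request. | 43A29C77-405E-4CC0-BC55-EE694AD0**** |
| Data | object | Returned data. | {"Id": "docmind-20220712-b15f****"} |
| Id | string | Business order ID, the unique identifier used by the query API in the next step. | docmind-20220712-b15f**** |
| Code | string | Status code. | 200 |
| Message | string | Detailed information. | Message |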

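Usage examples

This API can be called in two ways: by uploading a local document or by passing a document URL.

  • Local document upload: taking the Java SDK as an example, the sample code below calls submitDocumentExtractJobAdvance and uploads the local document through the fileUrlObject parameter.

    Note

    For how to obtain and use AccessKey information, see the SDK guide for each language in the SDK overview.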

    Java:

    import com.aliyun.docmind_api20220711.models.*;
    import com.aliyun.teaopenapi.models.Config;
    import com.aliyun.docmind_api20220711.Client;
    import com.aliyun.teautil.models.RuntimeOptions;
    import java.io.File;
    import java.io.FileInputStream;

    public static void submit() throws Exception {
        // Initialize the Credentials client with the default credentials.
        com.aliyun.credentials.Client credentialClient = new com.aliyun.credentials.Client();
        Config config = new Config()
            // Read the AccessKey ID from the credentials configuration.
            .setAccessKeyId(credentialClient.getAccessKeyId())
            // Read the AccessKey Secret from the credentials configuration.
            .setAccessKeySecret(credentialClient.getAccessKeySecret());
        // Endpoint. IPv4 and IPv6 are both supported; for IPv6 use docmind-api-dualstack.cn-hangzhou.aliyuncs.com
        config.endpoint = "docmind-api.cn-hangzhou.aliyuncs.com";
        Client client = new Client(config);
        // Create a RuntimeOptions instance to hold runtime settings.
        RuntimeOptions runtime = new RuntimeOptions();
        SubmitDocumentExtractJobAdvanceRequest advanceRequest = new SubmitDocumentExtractJobAdvanceRequest();
        File file = new File("D:\\example.pdf");
        advanceRequest.fileUrlObject = new FileInputStream(file);
        advanceRequest.fileName = "example.pdf";
        // Send the request, then handle the response or any exception.
        SubmitDocumentExtractJobResponse response = client.submitDocumentExtractJobAdvance(advanceRequest, runtime);
    }

    Node.js:

    const Client = require('@alicloud/docmind-api20220711');
    const Credential = require('@alicloud/credentials');
    const Util = require('@alicloud/tea-util');
    const fs = require('fs');

    const submitFile = async () => {
      // Initialize the Credentials client with the default credentials.
      const cred = new Credential.default();
      const client = new Client.default({
        // Endpoint. IPv4 and IPv6 are both supported; for IPv6 use docmind-api-dualstack.cn-hangzhou.aliyuncs.com
        endpoint: 'docmind-api.cn-hangzhou.aliyuncs.com',
        // Read the AccessKey ID from the credentials configuration.
        accessKeyId: cred.credential.accessKeyId,
        // Read the AccessKey Secret from the credentials configuration.
        accessKeySecret: cred.credential.accessKeySecret,
        type: 'access_key',
        regionId: 'cn-hangzhou',
      });

      const advanceRequest = new Client.SubmitDocumentExtractJobAdvanceRequest();
      // Pass the local file as a read stream.
      const file = fs.createReadStream('./example.pdf');
      advanceRequest.fileUrlObject = file;
      advanceRequest.fileName = 'example.pdf';
      const runtimeObject = new Util.RuntimeOptions({});
      const response = await client.submitDocumentExtractJobAdvance(advanceRequest, runtimeObject);
      return response.body;
    };

    Python:

    from alibabacloud_docmind_api20220711.client import Client as docmind_api20220711Client
    from alibabacloud_tea_openapi import models as open_api_models
    from alibabacloud_docmind_api20220711 import models as docmind_api20220711_models
    from alibabacloud_tea_util.client import Client as UtilClient
    from alibabacloud_tea_util import models as util_models
    from alibabacloud_credentials.client import Client as CredClient

    def submit_file():
        cred = CredClient()
        config = open_api_models.Config(
            # Read the AccessKey ID from the credentials configuration.
            access_key_id=cred.get_access_key_id(),
            # Read the AccessKey Secret from the credentials configuration.
            access_key_secret=cred.get_access_key_secret()
        )
        # Endpoint
        config.endpoint = 'docmind-api.cn-hangzhou.aliyuncs.com'
        client = docmind_api20220711Client(config)
        request = docmind_api20220711_models.SubmitDocumentExtractJobAdvanceRequest(
            # file_url_object: local file stream
            file_url_object=open("./example.pdf", "rb"),
            # file_name: file name, which must include the file-type extension
            file_name='example.pdf',
            # file_name_extension: file type; specify this or file_name, not both
            # file_name_extension='pdf'
        )
        runtime = util_models.RuntimeOptions()
        try:
            # If you copy and run this code, print the API return value yourself.
            response = client.submit_document_extract_job_advance(request, runtime)
            # The return value is nested as body -> data -> attribute; print whatever you need.
            # Attribute names start with a lowercase letter. This example prints the returned job ID.
            print(response.body.data.id)
        except Exception as error:
            # Print the error if needed.
            UtilClient.assert_as_string(str(error))

    Go:

    import (
    	"fmt"
    	"os"

    	openClient "github.com/alibabacloud-go/darabonba-openapi/v2/client"
    	"github.com/alibabacloud-go/docmind-api-20220711/client"
    	"github.com/alibabacloud-go/tea-utils/v2/service"
    	"github.com/aliyun/credentials-go/credentials"
    )

    func submit() {
    	// Initialize the Credentials client with the default credentials.
    	credential, err := credentials.NewCredential(nil)
    	if err != nil {
    		panic(err)
    	}
    	// Read the AccessKey ID from the credentials configuration.
    	accessKeyId, err := credential.GetAccessKeyId()
    	if err != nil {
    		panic(err)
    	}
    	// Read the AccessKey Secret from the credentials configuration.
    	accessKeySecret, err := credential.GetAccessKeySecret()
    	if err != nil {
    		panic(err)
    	}
    	// Endpoint. IPv4 and IPv6 are both supported; for IPv6 use docmind-api-dualstack.cn-hangzhou.aliyuncs.com
    	var endpoint string = "docmind-api.cn-hangzhou.aliyuncs.com"
    	config := openClient.Config{AccessKeyId: accessKeyId, AccessKeySecret: accessKeySecret, Endpoint: &endpoint}
    	// Initialize the client.
    	cli, err := client.NewClient(&config)
    	if err != nil {
    		panic(err)
    	}
    	// Open the local document to upload.
    	filePath := "D:\\example.pdf"
    	fileName := "example.pdf"
    	f, err := os.Open(filePath)
    	if err != nil {
    		panic(err)
    	}
    	defer f.Close()
    	// Initialize the request.
    	request := client.SubmitDocumentExtractJobAdvanceRequest{
    		FileName:      &fileName,
    		FileUrlObject: f,
    	}
    	// Create a RuntimeOptions instance to hold runtime settings.
    	options := service.RuntimeOptions{}
    	response, err := cli.SubmitDocumentExtractJobAdvance(&request, &options)
    	if err != nil {
    		panic(err)
    	}
    	// Print the result.
    	fmt.Println(response.Body.String())
    }

    C#:

    using Newtonsoft.Json;
    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.IO;
    using System.Threading.Tasks;

    using Tea;
    using Tea.Utils;

    public static void SubmitFile()
    {
        // Initialize the Credentials client with the default credentials.
        var akCredential = new Aliyun.Credentials.Client(null);
        AlibabaCloud.OpenApiClient.Models.Config config = new AlibabaCloud.OpenApiClient.Models.Config
        {
            // Read the AccessKey ID from the credentials configuration.
            AccessKeyId = akCredential.GetAccessKeyId(),
            // Read the AccessKey Secret from the credentials configuration.
            AccessKeySecret = akCredential.GetAccessKeySecret(),
        };
        // Endpoint
        config.Endpoint = "docmind-api.cn-hangzhou.aliyuncs.com";
        AlibabaCloud.SDK.Docmind_api20220711.Client client = new AlibabaCloud.SDK.Docmind_api20220711.Client(config);
        // Reading the file requires an additional dependency: AlibabaCloud.DarabonbaStream
        Stream bodyStream = AlibabaCloud.DarabonbaStream.StreamUtil.ReadFromFilePath("<YOUR-FILE-PATH>");
        AlibabaCloud.SDK.Docmind_api20220711.Models.SubmitDocumentExtractJobAdvanceRequest request = new AlibabaCloud.SDK.Docmind_api20220711.Models.SubmitDocumentExtractJobAdvanceRequest
        {
            FileUrlObject = bodyStream,
            FileNameExtension = "pdf"
        };
        AlibabaCloud.TeaUtil.Models.RuntimeOptions runtime = new AlibabaCloud.TeaUtil.Models.RuntimeOptions();
        try
        {
            // If you copy and run this code, print the API return value yourself.
            client.SubmitDocumentExtractJobAdvance(request, runtime);
        }
        catch (TeaException error)
        {
            // Print the error if needed.
            AlibabaCloud.TeaUtil.Common.AssertAsString(error.Message);
        }
        catch (Exception _error)
        {
            TeaException error = new TeaException(new Dictionary<string, object>
            {
                { "message", _error.Message }
            });
            // Print the error if needed.
            AlibabaCloud.TeaUtil.Common.AssertAsString(error.Message);
        }
    }
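  • Passing a document URL: taking the Java SDK as an example, the sample code below calls submitDocumentExtractJob and passes the document URL through the fileUrl parameter. The URL must be publicly accessible and downloadable, must not be subject to cross-origin restrictions, and must not contain special escaped characters.

    Note

    For how to obtain and use AccessKey information, see the SDK guide for each language in the SDK overview.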

    Java:

    import com.aliyun.docmind_api20220711.models.*;
    import com.aliyun.teaopenapi.models.Config;
    import com.aliyun.docmind_api20220711.Client;

    public static void submit() throws Exception {
        // Initialize the Credentials client with the default credentials.
        com.aliyun.credentials.Client credentialClient = new com.aliyun.credentials.Client();
        Config config = new Config()
            // Read the AccessKey ID from the credentials configuration.
            .setAccessKeyId(credentialClient.getAccessKeyId())
            // Read the AccessKey Secret from the credentials configuration.
            .setAccessKeySecret(credentialClient.getAccessKeySecret());
        // Endpoint. IPv4 and IPv6 are both supported; for IPv6 use docmind-api-dualstack.cn-hangzhou.aliyuncs.com
        config.endpoint = "docmind-api.cn-hangzhou.aliyuncs.com";
        Client client = new Client(config);
        // Build the request for the asynchronous submission API.
        SubmitDocumentExtractJobRequest request = new SubmitDocumentExtractJobRequest();
        request.fileName = "example.pdf";
        request.fileUrl = "https://example.com/example.pdf";
        SubmitDocumentExtractJobResponse response = client.submitDocumentExtractJob(request);
    }

    Node.js:

    const Client = require('@alicloud/docmind-api20220711');
    const Credential = require('@alicloud/credentials');

    const submitUrl = async () => {
      // Initialize the Credentials client with the default credentials.
      const cred = new Credential.default();
      const client = new Client.default({
        // Endpoint. IPv4 and IPv6 are both supported; for IPv6 use docmind-api-dualstack.cn-hangzhou.aliyuncs.com
        endpoint: 'docmind-api.cn-hangzhou.aliyuncs.com',
        // Read the AccessKey ID from the credentials configuration.
        accessKeyId: cred.credential.accessKeyId,
        // Read the AccessKey Secret from the credentials configuration.
        accessKeySecret: cred.credential.accessKeySecret,
        type: 'access_key',
        regionId: 'cn-hangzhou'
      });

      const request = new Client.SubmitDocumentExtractJobRequest();
      request.fileName = 'example.pdf';
      request.fileUrl = 'https://example.com/example.pdf';
      const response = await client.submitDocumentExtractJob(request);

      return response.body;
    };

    Python:

    from alibabacloud_docmind_api20220711.client import Client as docmind_api20220711Client
    from alibabacloud_tea_openapi import models as open_api_models
    from alibabacloud_docmind_api20220711 import models as docmind_api20220711_models
    from alibabacloud_tea_util.client import Client as UtilClient
    from alibabacloud_credentials.client import Client as CredClient


    def submit_url():
        cred = CredClient()
        config = open_api_models.Config(
            # Read the AccessKey ID from the credentials configuration.
            access_key_id=cred.get_access_key_id(),
            # Read the AccessKey Secret from the credentials configuration.
            access_key_secret=cred.get_access_key_secret()
        )
        # Endpoint
        config.endpoint = 'docmind-api.cn-hangzhou.aliyuncs.com'
        client = docmind_api20220711Client(config)
        request = docmind_api20220711_models.SubmitDocumentExtractJobRequest(
            # file_url: URL of the file
            file_url='https://example.com/example.pdf',
            # file_name: file name, which must include the file-type extension
            file_name='example.pdf',
            # file_name_extension: file type; specify this or file_name, not both
            # file_name_extension='pdf'
        )
        try:
            # If you copy and run this code, print the API return value yourself.
            response = client.submit_document_extract_job(request)
            # The return value is nested as body -> data -> attribute; print whatever you need.
            # Attribute names start with a lowercase letter. This example prints the returned job ID.
            print(response.body.data.id)
        except Exception as error:
            # Print the error if needed.
            UtilClient.assert_as_string(str(error))

    Go:

    import (
    	"fmt"

    	openClient "github.com/alibabacloud-go/darabonba-openapi/v2/client"
    	"github.com/alibabacloud-go/docmind-api-20220711/client"
    	"github.com/aliyun/credentials-go/credentials"
    )

    func submit() {
    	// Initialize the Credentials client with the default credentials.
    	credential, err := credentials.NewCredential(nil)
    	if err != nil {
    		panic(err)
    	}
    	// Read the AccessKey ID from the credentials configuration.
    	accessKeyId, err := credential.GetAccessKeyId()
    	if err != nil {
    		panic(err)
    	}
    	// Read the AccessKey Secret from the credentials configuration.
    	accessKeySecret, err := credential.GetAccessKeySecret()
    	if err != nil {
    		panic(err)
    	}
    	// Endpoint. IPv4 and IPv6 are both supported; for IPv6 use docmind-api-dualstack.cn-hangzhou.aliyuncs.com
    	var endpoint string = "docmind-api.cn-hangzhou.aliyuncs.com"
    	config := openClient.Config{AccessKeyId: accessKeyId, AccessKeySecret: accessKeySecret, Endpoint: &endpoint}
    	// Initialize the client.
    	cli, err := client.NewClient(&config)
    	if err != nil {
    		panic(err)
    	}
    	// File URL
    	fileURL := "https://example.com/example.pdf"
    	// File name
    	fileName := "example.pdf"
    	// Initialize the request.
    	request := client.SubmitDocumentExtractJobRequest{
    		FileUrl:  &fileURL,
    		FileName: &fileName,
    	}
    	response, err := cli.SubmitDocumentExtractJob(&request)
    	if err != nil {
    		panic(err)
    	}
    	// Print the result.
    	fmt.Println(response.Body.String())
    }

    C#:

    using Newtonsoft.Json;
    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.IO;
    using System.Threading.Tasks;

    using Tea;
    using Tea.Utils;

    public static void SubmitUrl()
    {
        // Initialize the Credentials client with the default credentials.
        var akCredential = new Aliyun.Credentials.Client(null);
        AlibabaCloud.OpenApiClient.Models.Config config = new AlibabaCloud.OpenApiClient.Models.Config
        {
            // Read the AccessKey ID from the credentials configuration.
            AccessKeyId = akCredential.GetAccessKeyId(),
            // Read the AccessKey Secret from the credentials configuration.
            AccessKeySecret = akCredential.GetAccessKeySecret(),
        };
        // Endpoint
        config.Endpoint = "docmind-api.cn-hangzhou.aliyuncs.com";
        AlibabaCloud.SDK.Docmind_api20220711.Client client = new AlibabaCloud.SDK.Docmind_api20220711.Client(config);
        AlibabaCloud.SDK.Docmind_api20220711.Models.SubmitDocumentExtractJobRequest request = new AlibabaCloud.SDK.Docmind_api20220711.Models.SubmitDocumentExtractJobRequest
        {
            FileUrl = "https://example.com/example.pdf",
            FileNameExtension = "pdf"
        };
        try
        {
            // If you copy and run this code, print the API return value yourself.
            client.SubmitDocumentExtractJob(request);
        }
        catch (TeaException error)
        {
            // Print the error if needed.
            AlibabaCloud.TeaUtil.Common.AssertAsString(error.Message);
        }
        catch (Exception _error)
        {
            TeaException error = new TeaException(new Dictionary<string, object>
            {
                { "message", _error.Message }
            });
            // Print the error if needed.
            AlibabaCloud.TeaUtil.Common.AssertAsString(error.Message);
        }
    }

    PHP:

    use AlibabaCloud\SDK\Docmindapi\V20220711\Docmindapi;
    use AlibabaCloud\SDK\Docmindapi\V20220711\Models\SubmitDocumentExtractJobRequest;
    use Darabonba\OpenApi\Models\Config;
    use AlibabaCloud\Tea\Utils\Utils\RuntimeOptions;
    use AlibabaCloud\Tea\Exception\TeaUnableRetryError;
    use AlibabaCloud\Credentials\Credential;

    // Initialize the Credentials client with the default credentials.
    $bearerToken = new Credential();
    $config = new Config();
    // Endpoint. IPv4 and IPv6 are both supported; for IPv6 use docmind-api-dualstack.cn-hangzhou.aliyuncs.com
    $config->endpoint = "docmind-api.cn-hangzhou.aliyuncs.com";
    // Read the AccessKey ID from the credentials configuration.
    $config->accessKeyId = $bearerToken->getCredential()->getAccessKeyId();
    // Read the AccessKey Secret from the credentials configuration.
    $config->accessKeySecret = $bearerToken->getCredential()->getAccessKeySecret();
    $config->type = "access_key";
    $config->regionId = "cn-hangzhou";
    $client = new Docmindapi($config);
    $request = new SubmitDocumentExtractJobRequest();

    $runtime = new RuntimeOptions();
    $runtime->maxIdleConns = 3;
    $runtime->connectTimeout = 10000;
    $runtime->readTimeout = 10000;

    $request->fileName = "example.pdf";
    $request->fileUrl = "https://example.com/example.pdf";

    try {
      $response = $client->submitDocumentExtractJob($request, $runtime);
      var_dump($response->toMap());
    } catch (TeaUnableRetryError $e) {
      var_dump($e->getMessage());
      var_dump($e->getErrorInfo());
      var_dump($e->getLastException());
      var_dump($e->getLastRequest());
    }

Example of a successful response (JSON):

{
  "RequestId": "43A29C77-405E-4CC0-BC55-EE694AD00655",
  "Data": {
    "Id": "docmind-20220712-b15fe420"
  }  
}
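The only value the next step needs from this response is `Data.Id`. As a sketch, assuming the response is available as raw JSON text shaped like the sample above, it could be pulled out as follows. `extract_job_id` is a hypothetical helper; the SDK samples read the same value from `response.body.data.id` instead.

```python
import json


def extract_job_id(response_text: str) -> str:
    """Return Data.Id from a SubmitDocumentExtractJob response given as JSON text."""
    body = json.loads(response_text)
    data = body.get("Data") or {}
    job_id = data.get("Id")
    if not job_id:
        # Surface the service's own Code and Message when no Id came back.
        raise ValueError(
            f"no job Id in response: Code={body.get('Code')}, Message={body.get('Message')}"
        )
    return job_id
```

The returned string is what the query API expects as its Id parameter.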

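Step 2: Poll GetDocumentExtractResult for the extraction result

The Id input of the query API is the Id returned by the submission API. A query returns one of three outcomes: processing, succeeded, or failed. Poll every 10 seconds for at most 120 minutes, and stop polling once Completed is explicitly returned as true or the maximum polling time is exceeded.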
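The three outcomes can be told apart from the Completed and Status fields alone. The sketch below assumes the response body is a dict keyed like the JSON samples in this document. `JobState` and `classify` are hypothetical names, and treating any completed job whose Status is not Success as failed is an assumption.

```python
from enum import Enum


class JobState(Enum):
    PROCESSING = "processing"
    SUCCESS = "success"
    FAILED = "failed"


def classify(body: dict) -> JobState:
    """Map a GetDocumentExtractResult response body to one of three states."""
    if not body.get("Completed"):
        # Completed is false (or absent): keep polling.
        return JobState.PROCESSING
    if body.get("Status") == "Success":
        return JobState.SUCCESS
    # Completed with any other Status is treated as a failure here.
    return JobState.FAILED
```

Only `JobState.PROCESSING` calls for another poll; the other two states are final.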

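Request parameters

| Name | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| Id | string | | Business order ID to query, taken from the response of the submission API. | docmind-20220712-b15f**** |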

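Response parameters

| Name | Type | Description | Example |
| --- | --- | --- | --- |
| RequestId | string | Unique ID of the request. | 43A29C77-405E-4CC0-BC55-EE694AD0**** |
| Completed | boolean | Whether the asynchronous job has finished. false means the job is still being processed; true means it has finished with a definite result, either success or failure. | true |
| Status | string | Final status once the job has finished. Success means processing succeeded; Fail means processing failed. | Success |
| Data | string | Returned data: the generic KV structured content, as a JSON data structure. | - |
| Code | string | Status code. | 200 |
| Message | string | Detailed information. | Message |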
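The table lists Data as a string, while the success sample later in this document shows it as a nested object, and the Python sample suggests converting `response.body.data` to JSON before reading values. A defensive sketch that accepts either form is shown below. `load_data` is a hypothetical helper, and which form a given SDK version actually returns is not confirmed here.

```python
import json
from typing import Any, Optional


def load_data(data: Any) -> Optional[dict]:
    """Return the Data payload as a dict, whether it arrives as JSON text or as a dict."""
    if data is None:
        # No Data yet, for example while the job is still processing.
        return None
    if isinstance(data, (str, bytes)):
        return json.loads(data)
    if isinstance(data, dict):
        return data
    raise TypeError(f"unexpected Data type: {type(data).__name__}")
```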

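Usage examples

Taking the Java SDK as an example, the sample code below calls getDocumentExtractResult and passes the job ID to query through the Id parameter.

Note

For how to obtain and use AccessKey information, see the SDK guide for each language in the SDK overview.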

Java:

import com.aliyun.docmind_api20220711.models.*;
import com.aliyun.teaopenapi.models.Config;
import com.aliyun.docmind_api20220711.Client;

public static void query() throws Exception {
    // Initialize the Credentials client with the default credentials.
    com.aliyun.credentials.Client credentialClient = new com.aliyun.credentials.Client();
    Config config = new Config()
        // Read the AccessKey ID from the credentials configuration.
        .setAccessKeyId(credentialClient.getAccessKeyId())
        // Read the AccessKey Secret from the credentials configuration.
        .setAccessKeySecret(credentialClient.getAccessKeySecret());
    // Endpoint. IPv4 and IPv6 are both supported; for IPv6 use docmind-api-dualstack.cn-hangzhou.aliyuncs.com
    config.endpoint = "docmind-api.cn-hangzhou.aliyuncs.com";
    Client client = new Client(config);
    GetDocumentExtractResultRequest resultRequest = new GetDocumentExtractResultRequest();
    // Id returned by the submission API.
    resultRequest.id = "docmind-20220902-824b****";
    GetDocumentExtractResultResponse response = client.getDocumentExtractResult(resultRequest);
}

Node.js:

const Client = require('@alicloud/docmind-api20220711');
const Credential = require('@alicloud/credentials');

const getResult = async () => {
  // Initialize the Credentials client with the default credentials.
  const cred = new Credential.default();
  const client = new Client.default({
    // Endpoint. IPv4 and IPv6 are both supported; for IPv6 use docmind-api-dualstack.cn-hangzhou.aliyuncs.com
    endpoint: 'docmind-api.cn-hangzhou.aliyuncs.com',
    // Read the AccessKey ID from the credentials configuration.
    accessKeyId: cred.credential.accessKeyId,
    // Read the AccessKey Secret from the credentials configuration.
    accessKeySecret: cred.credential.accessKeySecret,
    type: 'access_key',
    regionId: 'cn-hangzhou'
  });

  const resultRequest = new Client.GetDocumentExtractResultRequest();
  // Id returned by the submission API.
  resultRequest.id = "docmind-20220902-824b****";
  const response = await client.getDocumentExtractResult(resultRequest);

  return response.body;
};

Python:

from alibabacloud_docmind_api20220711.client import Client as docmind_api20220711Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_docmind_api20220711 import models as docmind_api20220711_models
from alibabacloud_tea_util.client import Client as UtilClient
from alibabacloud_credentials.client import Client as CredClient

def query():
    cred = CredClient()
    config = open_api_models.Config(
        # Read the AccessKey ID from the credentials configuration.
        access_key_id=cred.get_access_key_id(),
        # Read the AccessKey Secret from the credentials configuration.
        access_key_secret=cred.get_access_key_secret()
    )
    # Endpoint
    config.endpoint = 'docmind-api.cn-hangzhou.aliyuncs.com'
    client = docmind_api20220711Client(config)
    request = docmind_api20220711_models.GetDocumentExtractResultRequest(
        # id: the Id returned by the submission API
        id='docmind-20220902-824b****'
    )
    try:
        # If you copy and run this code, print the API return value yourself.
        response = client.get_document_extract_result(request)
        # The return value is nested as body -> data -> attribute; attribute names start with a lowercase letter.
        # response.body.completed tells you whether you need to keep polling.
        print(response.body.completed)
        # Consider converting response.body.data to JSON first, then reading the values you need from it.
        print(response.body.data)
    except Exception as error:
        # Print the error if needed.
        UtilClient.assert_as_string(str(error))

Go:

import (
	"fmt"

	openClient "github.com/alibabacloud-go/darabonba-openapi/v2/client"
	"github.com/alibabacloud-go/docmind-api-20220711/client"
	"github.com/aliyun/credentials-go/credentials"
)

func query() {
	// Initialize the Credentials client with the default credentials.
	credential, err := credentials.NewCredential(nil)
	if err != nil {
		panic(err)
	}
	// Read the AccessKey ID from the credentials configuration.
	accessKeyId, err := credential.GetAccessKeyId()
	if err != nil {
		panic(err)
	}
	// Read the AccessKey Secret from the credentials configuration.
	accessKeySecret, err := credential.GetAccessKeySecret()
	if err != nil {
		panic(err)
	}
	// Endpoint. IPv4 and IPv6 are both supported; for IPv6 use docmind-api-dualstack.cn-hangzhou.aliyuncs.com
	var endpoint string = "docmind-api.cn-hangzhou.aliyuncs.com"
	config := openClient.Config{AccessKeyId: accessKeyId, AccessKeySecret: accessKeySecret, Endpoint: &endpoint}
	// Initialize the client.
	cli, err := client.NewClient(&config)
	if err != nil {
		panic(err)
	}
	// Id returned by the submission API.
	id := "docmind-20220925-76b1****"
	// Call the query API.
	request := client.GetDocumentExtractResultRequest{Id: &id}
	response, err := cli.GetDocumentExtractResult(&request)
	if err != nil {
		panic(err)
	}
	// Print the query result.
	fmt.Println(response.Body.String())
}

C#:

using Newtonsoft.Json;
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

using Tea;
using Tea.Utils;

public static void GetResult()
{
    // Initialize the Credentials client with the default credentials.
    var akCredential = new Aliyun.Credentials.Client(null);
    AlibabaCloud.OpenApiClient.Models.Config config = new AlibabaCloud.OpenApiClient.Models.Config
    {
        // Read the AccessKey ID from the credentials configuration.
        AccessKeyId = akCredential.GetAccessKeyId(),
        // Read the AccessKey Secret from the credentials configuration.
        AccessKeySecret = akCredential.GetAccessKeySecret(),
    };
    // Endpoint
    config.Endpoint = "docmind-api.cn-hangzhou.aliyuncs.com";
    AlibabaCloud.SDK.Docmind_api20220711.Client client = new AlibabaCloud.SDK.Docmind_api20220711.Client(config);
    AlibabaCloud.SDK.Docmind_api20220711.Models.GetDocumentExtractResultRequest request = new AlibabaCloud.SDK.Docmind_api20220711.Models.GetDocumentExtractResultRequest
    {
        // Id returned by the submission API.
        Id = "docmind-20220902-824b****"
    };
    AlibabaCloud.TeaUtil.Models.RuntimeOptions runtime = new AlibabaCloud.TeaUtil.Models.RuntimeOptions();
    try
    {
        // If you copy and run this code, print the API return value yourself.
        client.GetDocumentExtractResult(request);
    }
    catch (TeaException error)
    {
        // Print the error if needed.
        AlibabaCloud.TeaUtil.Common.AssertAsString(error.Message);
    }
    catch (Exception _error)
    {
        TeaException error = new TeaException(new Dictionary<string, object>
        {
            { "message", _error.Message }
        });
        // Print the error if needed.
        AlibabaCloud.TeaUtil.Common.AssertAsString(error.Message);
    }
}

PHP:

use AlibabaCloud\SDK\Docmindapi\V20220711\Docmindapi;
use AlibabaCloud\SDK\Docmindapi\V20220711\Models\GetDocumentExtractResultRequest;
use Darabonba\OpenApi\Models\Config;
use AlibabaCloud\Tea\Utils\Utils\RuntimeOptions;
use AlibabaCloud\Tea\Exception\TeaUnableRetryError;
use AlibabaCloud\Credentials\Credential;

// Initialize the Credentials client with the default credentials.
$bearerToken = new Credential();
$config = new Config();
// Endpoint. IPv4 and IPv6 are both supported; for IPv6 use docmind-api-dualstack.cn-hangzhou.aliyuncs.com
$config->endpoint = "docmind-api.cn-hangzhou.aliyuncs.com";
// Read the AccessKey ID from the credentials configuration.
$config->accessKeyId = $bearerToken->getCredential()->getAccessKeyId();
// Read the AccessKey Secret from the credentials configuration.
$config->accessKeySecret = $bearerToken->getCredential()->getAccessKeySecret();
$config->type = "access_key";
$config->regionId = "cn-hangzhou";
$client = new Docmindapi($config);
$request = new GetDocumentExtractResultRequest();
// Id returned by the submission API.
$request->id = "docmind-20220902-824b****";

$runtime = new RuntimeOptions();
$runtime->maxIdleConns = 3;
$runtime->connectTimeout = 10000;
$runtime->readTimeout = 10000;

try {
  $response = $client->getDocumentExtractResult($request, $runtime);
  var_dump($response->toMap());
} catch (TeaUnableRetryError $e) {
  var_dump($e->getMessage());
  var_dump($e->getErrorInfo());
  var_dump($e->getLastException());
  var_dump($e->getLastRequest());
}

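Query results

A query returns one of three outcomes: processing, succeeded, or failed. A sample response for each outcome follows.

  • While the job is still processing, the response looks like this: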

    {
      "RequestId": "2AABD2C2-D24F-12F7-875D-683A27C3****",
      "Completed": false,
      "Code": "DocProcessing",
      "Message": "Document processing",
      "HostId": "ocr-api.cn-hangzhou.aliyuncs.com",
      "Recommend": "https://next.api.aliyun.com/troubleshoot?q=DocProcessing&product=docmind-api"
    }

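    While the job is processing, Completed is false, meaning the job has not finished. Keep polling until Completed is explicitly returned as true or the maximum polling time is exceeded.

  • When processing fails, the response looks like this: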

    {
      "RequestId": "A8EF3A36-1380-1116-A39E-B377BE27****",
      "Completed": true,
      "Status": "Fail",
      "Code": "UrlNotLegal",
      "Message": "Failed to process the document.  The document url you provided is not legal.",
      "HostId": "docmind-api.cn-hangzhou.aliyuncs.com",
      "Recommend": "https://next.api.aliyun.com/troubleshoot?q=IDP.UrlNotLegal&product=docmind-api"
    }

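    On failure, Completed is true, meaning the job has finished, and Status is Fail, meaning processing failed. The failure Code and a detailed Message are returned as well. See the error codes page for a description of each code.

  • When processing succeeds, the response looks like this: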

    {
    	"Status": "Success",
    	"RequestId": "73134E1A-E281-1B2C-A105-D0ECFE2D****",
    	"Completed": true,
    	"Data": {
    		"status": "success",
    		"errorCode": null,
    		"errorMessage": null,
    		"result": {
    			"kvListInfo": [
    				[
    					[{
    							"value": [
    								"019W"
    							],
    							"key": [
    								"Voyage"
    							],
    							"extInfo": {
    								"table_id": "adf1d2f40b208d4923764d2ea6175365"
    							}
    						},
    						{
    							"value": [
    								"Ningbo"
    							],
    							"key": [
    								"POL"
    							],
    							"extInfo": {
    								"table_id": "adf1d2f40b208d4923764d2ea6175365"
    							}
    						},
    						{
    							"value": [
    								"2022-05-3110:00"
    							],
    							"key": [
    								"ETD"
    							],
    							"extInfo": {
    								"table_id": "adf1d2f40b208d4923764d2ea6175365"
    							}
    						},
    						{
    							"value": [
    								"Piraeus"
    							],
    							"key": [
    								"POD"
    							],
    							"extInfo": {
    								"table_id": "adf1d2f40b208d4923764d2ea6175365"
    							}
    						},
    						{
    							"value": [
    								"2022-06-2007:00"
    							],
    							"key": [
    								"ETA"
    							],
    							"extInfo": {
    								"table_id": "adf1d2f40b208d4923764d2ea6175365"
    							}
    						}
    					],
    					[{
    							"value": [
    								""
    							],
    							"key": [
    								"Voyage"
    							],
    							"extInfo": {
    								"table_id": "adf1d2f40b208d4923764d2ea6175365"
    							}
    						},
    						{
    							"value": [
    								"Piraeus"
    							],
    							"key": [
    								"POL"
    							],
    							"extInfo": {
    								"table_id": "adf1d2f40b208d4923764d2ea6175365"
    							}
    						},
    						{
    							"value": [
    								""
    							],
    							"key": [
    								"ETD"
    							],
    							"extInfo": {
    								"table_id": "adf1d2f40b208d4923764d2ea6175365"
    							}
    						},
    						{
    							"value": [
    								"Algeciras"
    							],
    							"key": [
    								"POD"
    							],
    							"extInfo": {
    								"table_id": "adf1d2f40b208d4923764d2ea6175365"
    							}
    						},
    						{
    							"value": [
    								""
    							],
    							"key": [
    								"ETA"
    							],
    							"extInfo": {
    								"table_id": "adf1d2f40b208d4923764d2ea6175365"
    							}
    						}
    					]
    				]
    			],
    			"kvInfo": [{
    				"value": [
    					"Ningbo"
    				],
    				"key": [
    					"接货地"
    				],
    				"extInfo": {
    					"valueLayoutId": "7248c73597b46266b9c84505f2bab8fe",
    					"valueConfidence": 0.9994202852249146,
    					"keyConfidence": 0.9719930092493693,
    					"keyLayoutId": "7248c73597b46266b9c84505f2bab8fe"
    				}
    			}],
    			"pageInfo": [{
    					"imageWidth": 1917,
    					"imageUrl": "http://docmind-api-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/idp/ab4fe775d9dd423182f30db57c62d379/example1.jpg?Expires=1661221931&OSSAccessKeyId=XX&Signature=YY",
    					"angle": 0.0,
    					"pageIdCurDoc": 1,
    					"imageType": "JPEG",
    					"imageHeight": 2713,
    					"pageIdAllDocs": 1
    				},
    				{
    					"imageWidth": 1917,
    					"imageUrl": "http://docmind-api-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/idp/ab4fe775d9dd423182f30db57c62d379/example2.jpg?Expires=1661221931&OSSAccessKeyId=XX&Signature=YY",
    					"angle": 0.0,
    					"pageIdCurDoc": 2,
    					"imageType": "JPEG",
    					"imageHeight": 2713,
    					"pageIdAllDocs": 2
    				},
    				{
    					"imageWidth": 1917,
    					"imageUrl": "http://docmind-api-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/idp/ab4fe775d9dd423182f30db57c62d379/example3.jpg?Expires=1661221931&OSSAccessKeyId=XX&Signature=YY",
    					"angle": 0.0,
    					"pageIdCurDoc": 3,
    					"imageType": "JPEG",
    					"imageHeight": 2713,
    					"pageIdAllDocs": 3
    				}
    			]
    		}
    	}
    }

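    On success, Completed is true, meaning the job has finished, and Status is Success, meaning processing succeeded. The extraction result is in the Data node, whose fields are listed below:

    | Name | Type | Example | Description |
    | --- | --- | --- | --- |
    | status | String | init | Status value: init (initializing), processing (in progress), or success (succeeded). |
    | result | JSONObject | - | KV extraction result. |
    | kvListInfo | array of arrays | Note: there may be multiple tables. | KV list information, usually produced for table KVs. kvListInfo is an array nested in an array: the result may cover several tables, so each table is a collection of kvInfo groups. |
    | kvInfo | array | - | Paragraph KV information. |
    | valuePos | array | - | Coordinates of the value; there may be more than one. |
    | width | int | 863 | |
    | x | int | 410 | x coordinate. |
    | y | int | 837 | y coordinate. |
    | pageId | int | 0 | Page number. |
    | height | int | 45 | |
    | existCorrection | boolean | false | Whether a correction was applied. |
    | existTranscoding | boolean | true | Whether transcoding was applied. |
    | originalValue | array | 某公司 (a company name) | Original value before processing. |
    | keyPos | array | - | Coordinates of the key. |
    | width | int | 863 | |
    | x | int | 410 | x coordinate. |
    | y | int | 837 | y coordinate. |
    | pageId | int | 0 | Page number. |
    | height | int | 45 | |
    | keyDesc | array | 甲方名称 (Party A name) | Description of the key. |
    | value | array | 某公司 (a company name) | Final extracted value after processing. |
    | key | array | firstPartyName | English code of the key. |
    | extInfo | object | - | Extended information. |
    | valueConfidence | double | 0.9994202852249146 | Confidence of the value. |
    | keyConfidence | double | 0.9719930092493693 | Confidence of the key. |
    | extractFrom | String | - | Extraction source; nlp by default. |
    | pageInfo | array | - | List of document pages. |
    | imageType | string | JPEG | Image type the page was converted to. |
    | imageUrl | string | - | URL of the image the page was converted to. |
    | angle | float | 90 | Rotation angle of the converted page image, measured counterclockwise. |
    | imageWidth | int | 1917 | Width of the page image. |
    | imageHeight | int | 1917 | Height of the page image. |
    | pageIdCurDoc | int | 0 | Index of the page within the current document, starting from 0. |
    | pageIdAllDocs | int | 0 | Index of the page across all documents, starting from 0. |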
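Since every key and value in the result is wrapped in an array, a small flattening step can make the data easier to consume. The sketch below follows the nesting shown in the success sample, where kvListInfo holds tables, each table holds rows, and each row holds KV items. `kv_info_to_dict` and `kv_list_info_to_tables` are hypothetical helpers, and keeping only the first element of each key and value array is a simplifying assumption that drops any additional elements.

```python
from typing import Dict, List


def _first(values: list) -> str:
    """Return the first element of a key/value array, or "" if it is empty."""
    return values[0] if values else ""


def kv_info_to_dict(kv_items: List[dict]) -> Dict[str, str]:
    """Flatten a list of KV items into {key: value}, keeping first elements only."""
    return {_first(item.get("key", [])): _first(item.get("value", []))
            for item in kv_items}


def kv_list_info_to_tables(kv_list_info: List[List[List[dict]]]) -> List[List[Dict[str, str]]]:
    """Convert kvListInfo (tables -> rows -> KV items) into a list of tables of row dicts."""
    return [[kv_info_to_dict(row) for row in table] for table in kv_list_info]
```

Applied to the success sample, the single table in kvListInfo becomes two row dicts keyed by Voyage, POL, ETD, POD, and ETA, and kvInfo becomes a one-entry dict.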