Incremental Data Import API for High-Real-Time Services

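1. Description

Only services on the real-time pipeline are currently supported. If the call fails with an "index build version not supported" exception, contact your administrator.

To check import progress, use the data processing status query API.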

2. Parameters

2.1. Request parameters

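| Parameter | Type | Required | Description |
|---|---|---|---|
| ServiceId | Long | | Service ID |
| DataType | String | | Data type. document: a document, imported through the full document pipeline. json_line: JSON lines, where each line is one record that is imported directly, without parsing or chunking. |
| Documents | List<Document> | | Documents to import |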

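Document:

| Parameter | Type | Required | Description |
|---|---|---|---|
| FilePath | String | | File URL |
| FileName | String | | File name |
| FileExtension | String | | File extension |
| DocId | String | Required when DataType is document | Document ID; unique identifier |
| Version | String | Required when DataType is document | Version number. Versions are compared with compareTo, so use strings that share the same format and length. |
| BizParams | Map | | Document metadata, such as tags or user. If not empty, it is written to the index: each key is an index field name and each value is the value to import. |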
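The note on Version matters because compareTo orders strings lexicographically, not numerically. The sketch below illustrates the pitfall, assuming the service compares versions the way Java's String.compareTo does; `make_version` and `is_newer` are hypothetical helpers, not part of the SDK.

```python
from datetime import datetime


def make_version(ts: datetime) -> str:
    """Format a timestamp as a fixed-width, 14-digit version string (hypothetical helper)."""
    return ts.strftime("%Y%m%d%H%M%S")


def is_newer(candidate: str, current: str) -> bool:
    """Compare two version strings lexicographically.

    Operands of different length are rejected, because lexicographic order
    only matches numeric order when both strings have the same width.
    """
    if len(candidate) != len(current):
        raise ValueError("version strings must have the same length")
    return candidate > current


# Unequal widths sort incorrectly: "9" is lexicographically greater than "10".
print("9" > "10")  # True

# Fixed-width timestamps sort chronologically.
print(is_newer(make_version(datetime(2024, 1, 2)), "20240101000000"))  # True
```

A fixed-width timestamp such as yyyyMMddHHmmss is one convenient choice; a zero-padded counter works equally well as long as the width never changes.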

Request example:

{
  "ServiceId": "1",
  "DataType": "document",
  "Documents": [
    {
      "FileName": "test.pdf",
      "FilePath": "https://xxxx/test.pdf",
      "FileExtension": "pdf",
      "DocId": "1",
      "Version": "20240101000000",
      "BizParams": {
        "title": "测试"
      }
    }
  ]
}
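Before sending a request, it can help to assemble and check the body locally. The following is a minimal sketch that mirrors the parameter tables above; `document_from_url` and `build_import_request` are hypothetical helpers, the example URL is a placeholder, and deriving FileName and FileExtension from the URL path is an assumption for illustration, not documented service behavior.

```python
import json
import posixpath
from urllib.parse import urlparse

VALID_DATA_TYPES = {"document", "json_line"}


def document_from_url(file_path, doc_id=None, version=None, biz_params=None):
    """Build one Documents entry, deriving FileName and FileExtension from the URL path."""
    name = posixpath.basename(urlparse(file_path).path)
    extension = posixpath.splitext(name)[1].lstrip(".")
    doc = {"FilePath": file_path, "FileName": name, "FileExtension": extension}
    if doc_id is not None:
        doc["DocId"] = doc_id
    if version is not None:
        doc["Version"] = version
    if biz_params:
        doc["BizParams"] = biz_params
    return doc


def build_import_request(service_id, data_type, documents):
    """Assemble the request body and enforce the rules stated in the tables."""
    if data_type not in VALID_DATA_TYPES:
        raise ValueError(f"unsupported DataType: {data_type!r}")
    if data_type == "document":
        # DocId and Version are required when DataType is "document".
        for doc in documents:
            missing = [key for key in ("DocId", "Version") if not doc.get(key)]
            if missing:
                raise ValueError(f"{doc.get('FilePath')}: missing {missing}")
    return {"ServiceId": service_id, "DataType": data_type, "Documents": list(documents)}


doc = document_from_url(
    "https://example.com/files/test.pdf",  # placeholder URL
    doc_id="1",
    version="20240101000000",
    biz_params={"title": "test"},
)
body = build_import_request(1, "document", [doc])
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Failing fast on a missing DocId or Version avoids a round trip to the service for an error that can be caught locally.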

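2.2. Response parameters

| Parameter | Type | Description |
|---|---|---|
| Code | Integer | Error code |
| Success | Boolean | Whether the request succeeded |
| RequestId | String | Request ID |
| Data | Long | Import batch ID |
| Msg | String | Response message |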

Response example:

{
  "RequestId": "5321A649-91B5-1B40-B331-B88F21D5AA27",
  "Data": 1,
  "Code": 200,
  "Success": true
}
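A caller will usually want the batch ID from Data, for example to look up import progress later. The sketch below shows one way to unpack the response, assuming the body has been decoded into a plain dict with the field names from the table; `ImportFailed` and `extract_batch_id` are hypothetical names, and treating Success as the only success signal is an assumption.

```python
class ImportFailed(Exception):
    """Raised when the response reports Success = false (hypothetical error type)."""


def extract_batch_id(body: dict) -> int:
    """Return the import batch ID, or raise with the diagnostic fields attached."""
    if not body.get("Success"):
        raise ImportFailed(
            f"Code={body.get('Code')}, Msg={body.get('Msg')}, "
            f"RequestId={body.get('RequestId')}"
        )
    return body["Data"]


# The response example from this section, decoded into a dict.
sample = {
    "RequestId": "5321A649-91B5-1B40-B331-B88F21D5AA27",
    "Data": 1,
    "Code": 200,
    "Success": True,
}
print(extract_batch_id(sample))  # 1
```

Including RequestId in the error message makes it easier to hand a failed call to the administrator for troubleshooting.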

3. SDK usage

Alibaba Cloud SDK documentation:

https://help.aliyun.com/zh/sdk/developer-reference/?spm=a2c4g.11186623.0.0.2d865a33cm2q5I

3.1. Java SDK

3.1.1. Maven dependencies

Add the following Maven dependencies to use the Java SDK:

<dependency>
    <groupId>com.aliyun</groupId>
    <artifactId>alinlp20200629</artifactId>
    <version>2.7.0</version>
</dependency>


<!-- Add this dependency if you hit java.lang.NoSuchMethodError: com.aliyun.credentials.Client.getCredential()Lcom/aliyun/credentials/models/CredentialModel; -->
<dependency>
    <groupId>com.aliyun</groupId>
    <artifactId>credentials-java</artifactId>
    <version>0.3.0</version>
</dependency>

3.1.2. Example call

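import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.serializer.SerializerFeature;
import com.aliyun.alinlp20200629.Client;
import com.aliyun.alinlp20200629.models.PostMSServiceDataImportRequest;
import com.aliyun.alinlp20200629.models.PostMSServiceDataImportResponse;
import com.aliyun.teaopenapi.models.Config;

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class DataImportSample {

    /**
     * Initializes the account Client with an AccessKey ID and AccessKey Secret.
     *
     * @param accessKeyId     the AccessKey ID
     * @param accessKeySecret the AccessKey Secret
     * @return the initialized Client
     * @throws Exception if the client cannot be created
     */
    public static Client createClient(String accessKeyId, String accessKeySecret) throws Exception {
        Config config = new Config()
                // Required: your AccessKey ID
                .setAccessKeyId(accessKeyId)
                // Required: your AccessKey Secret
                .setAccessKeySecret(accessKeySecret);
        // For the endpoint, see https://api.aliyun.com/product/alinlp
        config.endpoint = "alinlp.cn-beijing.aliyuncs.com";
        return new Client(config);
    }

    public static void main(String[] args) throws Exception {
        postMSServiceDataImport();
    }

    public static void postMSServiceDataImport() throws Exception {
        // Leaked project code can expose your AccessKey and endanger every resource under the account.
        // This sample is for reference only; the more secure STS approach is recommended.
        // For other authentication options, see https://help.aliyun.com/document_detail/378657.html
        // The AccessKey pair is read from environment variables rather than hard-coded.
        Client client = createClient(
                System.getenv("ALIBABA_CLOUD_ACCESS_KEY_ID"),
                System.getenv("ALIBABA_CLOUD_ACCESS_KEY_SECRET"));
        PostMSServiceDataImportRequest.PostMSServiceDataImportRequestDocuments document = getDocument();
        PostMSServiceDataImportRequest request = new PostMSServiceDataImportRequest()
                .setServiceId(1L)
                .setDataType("document")
                .setDocuments(Arrays.asList(document));
        try {
            PostMSServiceDataImportResponse response = client.postMSServiceDataImport(request);
            System.out.println(JSON.toJSONString(response.getBody(), SerializerFeature.PrettyFormat));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Builds the single document entry sent in the request.
    private static PostMSServiceDataImportRequest.PostMSServiceDataImportRequestDocuments getDocument() {
        PostMSServiceDataImportRequest.PostMSServiceDataImportRequestDocuments document =
                new PostMSServiceDataImportRequest.PostMSServiceDataImportRequestDocuments();
        document.setFileName("test.pdf");
        document.setFilePath("https://xxxx/test.pdf");
        document.setFileExtension("pdf");
        document.setDocId("1");
        document.setVersion("20240101000000");
        // BizParams: keys are index field names, values are the values to import.
        Map<String, Object> bizParams = new HashMap<>();
        bizParams.put("title", "测试");
        document.setBizParams(bizParams);
        return document;
    }
}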

3.2. Python SDK

3.2.1. pip installation

pip install alibabacloud_alinlp20200629==2.8.0

3.2.2. Example call

import os

from alibabacloud_alinlp20200629.client import Client as Alinlp20200629Client
from alibabacloud_alinlp20200629.models import PostMSServiceDataImportRequest, PostMSServiceDataImportRequestDocuments
from alibabacloud_tea_openapi import models as open_api_models


def create_client():
    # The AccessKey pair is read from environment variables rather than hard-coded.
    config = open_api_models.Config(
        # Required: your AccessKey ID
        access_key_id=os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID'],
        # Required: your AccessKey Secret
        access_key_secret=os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET']
    )
    # For the endpoint, see https://api.aliyun.com/product/alinlp
    config.endpoint = 'alinlp-share.cn-beijing.aliyuncs.com'
    return Alinlp20200629Client(config)


def data_import():
    client = create_client()
    request = PostMSServiceDataImportRequest()
    request.service_id = 1
    request.data_type = "document"
    # One document entry; doc_id and version are required for the "document" data type.
    document = PostMSServiceDataImportRequestDocuments(
        file_name="test.pdf",
        file_path="https://xxxx/test.pdf",
        file_extension="pdf",
        doc_id="1",
        version="20240101000000",
        biz_params={"title": "测试"}
    )
    request.documents = [document]
    response = client.post_msservice_data_import(request)
    print(response.body)


if __name__ == '__main__':
    data_import()