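Upload documents through the API

You can upload private-domain documents to a Bailian knowledge base so that your LLM applications can answer questions about that private domain. Bailian supports uploading documents through either the console or the API. This topic describes how to upload documents to Bailian through the API.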

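Prerequisites for using the API: the service is activated and the Bailian SDK is installed. For details, see API Overview.
Structured data cannot be uploaded through the API; upload it through the console instead. You can associate a knowledge base with ApsaraDB RDS so that a structured knowledge base is updated automatically. For details, see Create a knowledge base.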

Procedure

Uploading an unstructured document to Bailian through the API takes four steps:

  1. Call the ApplyFileUploadLease operation to request a document upload lease

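    Call the ApplyFileUploadLease operation to obtain the URL used to upload the document (the lease), along with the parameters that the upload requires.

    The Md5 request parameter of this operation is the MD5 value of the document and is used to verify that the document is intact. You can generate it with Java's MessageDigest class or Python's hashlib module.
    The values of the response fields Data.Param.Method and Data.Param.Url, and of the X-bailian-extra and Content-Type fields in Data.Param.Headers, are used in the next step to upload the document to Bailian's temporary storage.
    The value of the Data.Param.Url response field (the lease) is valid only for a period on the order of minutes. Upload the document promptly so that the link does not expire before the upload.

    A sample response from a successful call to ApplyFileUploadLease: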
    {
      "RequestId": "778C0B3B-59C2-5FC1-A947-36EDD1xxxxxx",
      "Success": true,
      "Message": "",
      "Code": "success",
      "Status": "200",
      "Data": {
        "FileUploadLeaseId": "1e6a159107384782be5e45ac4759b247.1719325231035",
        "Type": "HTTP",
        "Param": {
          "Method": "PUT",
          "Url": "https://bailian-datahub-data-origin-prod.oss-cn-hangzhou.aliyuncs.com/1005426495169178/10024405/68abd1dea7b6404d8f7d7b9f7fbd332d.1716698936847.pdf?Expires=1716699536&OSSAccessKeyId=TestID&Signature=HfwPUZo4pR6DatSDym0zFKVh9Wg%3D",
          "Headers": "        \"X-bailian-extra\": \"MTAwNTQyNjQ5NTE2OTE3OA==\",\n        \"Content-Type\": \"application/pdf\""
        }
      }
    }
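
The Md5 request parameter described in this step can be produced with Python's hashlib module. The following is a minimal sketch, not part of the Bailian SDK: the helper name file_md5 and the chunk size are illustrative choices, and it assumes that the operation expects the lowercase hexadecimal digest of the document.

```python
import hashlib


def file_md5(file_path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of the file at file_path."""
    digest = hashlib.md5()
    with open(file_path, "rb") as f:
        # Read in chunks so that large documents are not loaded into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Pass the returned string as the Md5 value when you request the lease.
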
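  2. Upload the document to Bailian's temporary storage

    Use the lease and the related parameters returned in the previous step to upload the document to Bailian's temporary storage. Sample code follows.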

    Python

    Sample code

    # This sample code is for reference only. Do not use it directly in a production environment.
    import requests
    
    def upload_file(pre_signed_url, file_path):
        try:
            # Set the request headers
            headers = {
                "X-bailian-extra": "Replace with the value of the X-bailian-extra field in Data.Param.Headers returned by your ApplyFileUploadLease call in the previous step",
                "Content-Type": "Replace with the value of the Content-Type field in Data.Param.Headers returned by your ApplyFileUploadLease call in the previous step"
            }
    
            # Read the document and upload it
            with open(file_path, 'rb') as file:
                # The request method used for the upload must match the value of the Data.Param.Method field returned by your ApplyFileUploadLease call in the previous step
                response = requests.put(pre_signed_url, data=file, headers=headers)
    
            # Check the response status code
            if response.status_code == 200:
                print("File uploaded successfully.")
            else:
                print(f"Failed to upload the file. ResponseCode: {response.status_code}")
    
        except Exception as e:
            print(f"An error occurred: {str(e)}")
    
    def upload_file_link(pre_signed_url, source_url_string):
        try:
            # Set the request headers
            headers = {
                "X-bailian-extra": "Replace with the value of the X-bailian-extra field in Data.Param.Headers returned by your ApplyFileUploadLease call in the previous step",
                "Content-Type": "Replace with the value of the Content-Type field in Data.Param.Headers returned by your ApplyFileUploadLease call in the previous step"
            }
    
            # Fetch the source document from OSS with a GET request
            source_response = requests.get(source_url_string)
            if source_response.status_code != 200:
                raise RuntimeError("Failed to get source file.")
    
            # The request method used for the upload must match the value of the Data.Param.Method field returned by your ApplyFileUploadLease call in the previous step
            response = requests.put(pre_signed_url, data=source_response.content, headers=headers)
    
            # Check the response status code
            if response.status_code == 200:
                print("File uploaded successfully.")
            else:
                print(f"Failed to upload the file. ResponseCode: {response.status_code}")
    
        except Exception as e:
            print(f"An error occurred: {str(e)}")
    
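    if __name__ == "__main__":
    
        pre_signed_url_or_http_url = "Replace with the value of the Data.Param.Url field returned by your ApplyFileUploadLease call in the previous step"
    
        # The document can come from the local file system: upload a local document to Bailian's temporary storage
        file_path = "Replace with the actual local path of the document to upload"
        upload_file(pre_signed_url_or_http_url, file_path)
    
        # The document can also come from OSS
        # file_path = "Replace with the actual publicly accessible OSS URL of the document to upload"
        # upload_file_link(pre_signed_url_or_http_url, file_path)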
    

    Java

    Sample code

    // This sample code is for reference only. Do not use it directly in a production environment.
    import java.io.BufferedInputStream;
    import java.io.DataOutputStream;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    
    public class UploadFile{
    
        public static void uploadFile(String preSignedUrl, String filePath) {
            HttpURLConnection connection = null;
            try {
                // Create the URL object
                URL url = new URL(preSignedUrl);
                connection = (HttpURLConnection) url.openConnection();
    
                // The request method used for the upload must match the value of the Data.Param.Method field returned by your ApplyFileUploadLease call in the previous step
                connection.setRequestMethod("PUT");
    
                // Enable output on the connection, because it is used to upload the document
                connection.setDoOutput(true);
    
                connection.setRequestProperty("X-bailian-extra", "Replace with the value of the X-bailian-extra field in Data.Param.Headers returned by your ApplyFileUploadLease call in the previous step");
                connection.setRequestProperty("Content-Type", "Replace with the value of the Content-Type field in Data.Param.Headers returned by your ApplyFileUploadLease call in the previous step");
    
                // Read the document and upload it over the connection
                try (DataOutputStream outStream = new DataOutputStream(connection.getOutputStream());
                     FileInputStream fileInputStream = new FileInputStream(filePath)) {
                    byte[] buffer = new byte[4096];
                    int bytesRead;
    
                    while ((bytesRead = fileInputStream.read(buffer)) != -1) {
                        outStream.write(buffer, 0, bytesRead);
                    }
    
                    outStream.flush();
                }
    
                // Check the response
                int responseCode = connection.getResponseCode();
                if (responseCode == HttpURLConnection.HTTP_OK) {
                    // Handle a successful upload
                    System.out.println("File uploaded successfully.");
                } else {
                    // Handle a failed upload
                    System.out.println("Failed to upload the file. ResponseCode: " + responseCode);
                }
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                if (connection != null) {
                    connection.disconnect();
                }
            }
        }
    
        public static void uploadFileLink(String preSignedUrl, String sourceUrlString) {
            HttpURLConnection connection = null;
            try {
                // Create the URL object
                URL url = new URL(preSignedUrl);
                connection = (HttpURLConnection) url.openConnection();
    
                // The request method used for the upload must match the value of the Data.Param.Method field returned by your ApplyFileUploadLease call in the previous step
                connection.setRequestMethod("PUT");
    
                // Enable output on the connection, because it is used to upload the document
                connection.setDoOutput(true);
    
                connection.setRequestProperty("X-bailian-extra", "Replace with the value of the X-bailian-extra field in Data.Param.Headers returned by your ApplyFileUploadLease call in the previous step");
                connection.setRequestProperty("Content-Type", "Replace with the value of the Content-Type field in Data.Param.Headers returned by your ApplyFileUploadLease call in the previous step");
    
                URL sourceUrl = new URL(sourceUrlString);
                HttpURLConnection sourceConnection = (HttpURLConnection) sourceUrl.openConnection();
    
                // Fetch the source document from OSS with a GET request
                sourceConnection.setRequestMethod("GET");
                // Get the response code; 200 means the request succeeded
                int sourceFileResponseCode = sourceConnection.getResponseCode();
    
                if (sourceFileResponseCode != HttpURLConnection.HTTP_OK){
                    throw new RuntimeException("Failed to get source file.");
                }
                // Read the document from OSS and upload it over the connection
                try (DataOutputStream outStream = new DataOutputStream(connection.getOutputStream());
                     InputStream in = new BufferedInputStream(sourceConnection.getInputStream())) {
                    byte[] buffer = new byte[4096];
                    int bytesRead;
    
                    while ((bytesRead = in.read(buffer)) != -1) {
                        outStream.write(buffer, 0, bytesRead);
                    }
    
                    outStream.flush();
                }
    
                // Check the response
                int responseCode = connection.getResponseCode();
                if (responseCode == HttpURLConnection.HTTP_OK) {
                    // The document was uploaded successfully
                    System.out.println("File uploaded successfully.");
                } else {
                    // The document upload failed
                    System.out.println("Failed to upload the file. ResponseCode: " + responseCode);
                }
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                if (connection != null) {
                    connection.disconnect();
                }
            }
        }
    
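        public static void main(String[] args) {
    
            String preSignedUrlOrHttpUrl = "Replace with the value of the Data.Param.Url field returned by your ApplyFileUploadLease call in the previous step";
    
            // The document can come from the local file system: upload a local document to Bailian's temporary storage
            String filePath = "Replace with the actual local path of the document to upload";
            uploadFile(preSignedUrlOrHttpUrl, filePath);
    
            // The document can also come from OSS
            // String sourceUrl = "Replace with the actual publicly accessible OSS URL of the document to upload";
            // uploadFileLink(preSignedUrlOrHttpUrl, sourceUrl);
        }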
    }
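
Instead of pasting the lease values into the code by hand, the upload request can be assembled from the Data.Param object of the ApplyFileUploadLease response. The sketch below assumes that Data.Param has already been parsed into a Python dict and that Headers is a mapping of header names to values (if your client returns Headers as a raw string, parse it first). The helper name request_args_from_lease is hypothetical and is not part of the Bailian SDK.

```python
def request_args_from_lease(param):
    """Build keyword arguments for requests.request() from Data.Param."""
    headers = param["Headers"]
    return {
        "method": param["Method"],  # the HTTP method required by the lease
        "url": param["Url"],        # the pre-signed URL (the lease)
        "headers": {
            "X-bailian-extra": headers["X-bailian-extra"],
            "Content-Type": headers["Content-Type"],
        },
    }
```

With the requests library used in the Python sample, the upload then becomes requests.request(data=file, **request_args_from_lease(param)), where file is the document opened in binary mode. Taking the method from the lease keeps the code consistent with Data.Param.Method without hard-coding PUT.
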
    
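  3. Call the AddFile operation to add the document to Bailian's Data Management

    After the previous step succeeds, the document is held in Bailian's temporary storage for 12 hours. Call the AddFile operation within that window to complete the upload (that is, to move the document into Bailian's Data Management).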

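  4. Call the DescribeFile operation to poll the parsing status of the added document

    After the AddFile call succeeds, Bailian starts uploading and parsing the document. This takes some time. You can check the latest status of the document on the Data Management page of Bailian or by calling the DescribeFile operation. When the upload is complete, the Data.Status field in the DescribeFile response has the value PARSE_SUCCESS.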
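
The polling in step 4 can be sketched as a small loop. This is an illustration under stated assumptions, not SDK code: get_status is a hypothetical function that you supply, which calls DescribeFile and returns the Data.Status string. The polling interval and timeout defaults are arbitrary. Only PARSE_SUCCESS comes from this topic; any failure statuses must be passed in by the caller after checking the DescribeFile reference.

```python
import time


def wait_for_parse(get_status, interval_seconds=5.0, timeout_seconds=600.0,
                   failure_statuses=(), sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns PARSE_SUCCESS.

    get_status is a caller-supplied function (hypothetical here) that calls
    DescribeFile and returns the Data.Status string of the response.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status == "PARSE_SUCCESS":
            return status
        if status in failure_statuses:
            raise RuntimeError(f"Document parsing ended with status {status}")
        if clock() >= deadline:
            raise TimeoutError(
                f"Document still in status {status} after {timeout_seconds} seconds")
        # Wait before asking again to avoid calling the API in a tight loop
        sleep(interval_seconds)
```

The sleep and clock parameters exist only so that the loop can be exercised without real waiting; in normal use, leave them at their defaults.
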

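Next steps

Related documents

Besides uploading documents through the API, you can also upload them through the console. For details, see Knowledge base.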