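Upload files through the API

You can upload files from your private domain to Bailian as a knowledge base, so that large model applications can answer questions about that domain. Bailian supports uploading files through either the console or the API. This topic describes how to upload files to Bailian by using the API.

Workflow

Follow the steps below to upload a file to Bailian through the API.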

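  1. Call the ApplyFileUploadLease operation to apply for a file upload lease.

    Calling ApplyFileUploadLease requests a file upload lease, that is, it returns the HTTP link used to upload the file. The request parameter Md5 is the MD5 value of the file and is used to verify that the file is intact. You can generate this value with the Java MessageDigest class.

    For details, see ApplyFileUploadLease (apply for a document upload lease).

    Note

    The HTTP link returned by ApplyFileUploadLease is valid only for a matter of minutes. Upload the file promptly; once the link expires, the upload fails.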
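The text above suggests Java's MessageDigest for the Md5 value; the same digest can be produced in Python with the standard hashlib module. This is a minimal sketch, assuming the Md5 parameter expects the lowercase hexadecimal MD5 digest of the file contents. The helper name file_md5 is hypothetical and not part of any Bailian SDK.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in chunks so that large files do not need to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string would then be passed as the Md5 request parameter when applying for the lease.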

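  2. Use the file upload lease to upload the file to the Bailian file server.

    Using the HTTP link and the accompanying parameters, upload the file to the Bailian file server. At this point, the file is only staged temporarily in Bailian's storage space.

    1. After a successful call to ApplyFileUploadLease, you receive a response similar to the following. In the Param object, Url is the temporary HTTP URL used to upload the file, Method is the HTTP method to use, and Headers contains the key-value pairs that must be set in the request headers.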

      {
        "RequestId": "778C0B3B-xxxx-5FC1-A947-36EDD13606AB",
        "Success": true,
        "Message": "",
        "Code": "success",
        "Status": "200",
        "Data": {
          "FileUploadLeaseId": "1e6a159107384782be5e45ac4759b247.1719325231035",
          "Type": "HTTP",
          "Param": {
            "Method": "PUT",
            "Url": "https://bailian-datahub-data-origin-prod.oss-cn-hangzhou.aliyuncs.com/1005426495169178/10024405/68abd1dea7b6404d8f7d7b9f7fbd332d.1716698936847.pdf?Expires=1716699536&OSSAccessKeyId=TestID&Signature=HfwPUZo4pR6DatSDym0zFKVh9Wg%3D",
            "Headers": {
              "X-bailian-extra": "MTAwNTQyNjQ5NTE2OTE3OA==",
              "Content-Type": "application/pdf"
            }
          }
        }
      }
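As an illustration of how the fields in this response might be consumed, the sketch below pulls Method, Url, and Headers out of Data.Param and estimates how long the link remains valid. It rests on two assumptions: that Headers arrives as a JSON object of header names and values, and that the Expires query parameter in the URL is a Unix timestamp in seconds. The URL in SAMPLE_RESPONSE is a hypothetical placeholder, and both function names are made up for this example.

```python
import time
from urllib.parse import parse_qs, urlparse

# Trimmed-down response. The URL is a hypothetical placeholder, and Headers is
# assumed to be a JSON object mapping header names to values.
SAMPLE_RESPONSE = {
    "Data": {
        "FileUploadLeaseId": "1e6a159107384782be5e45ac4759b247.1719325231035",
        "Type": "HTTP",
        "Param": {
            "Method": "PUT",
            "Url": "https://example.com/upload/file.pdf?Expires=1716699536&OSSAccessKeyId=TestID&Signature=abc",
            "Headers": {
                "X-bailian-extra": "MTAwNTQyNjQ5NTE2OTE3OA==",
                "Content-Type": "application/pdf",
            },
        },
    }
}


def lease_upload_params(response):
    """Return (method, url, headers) from an ApplyFileUploadLease response."""
    param = response["Data"]["Param"]
    return param["Method"], param["Url"], param["Headers"]


def seconds_until_expiry(url, now=None):
    """Return the seconds left before the Expires timestamp in the URL.

    Assumes Expires is a Unix timestamp in seconds; a negative result means
    the link has already expired.
    """
    expires = int(parse_qs(urlparse(url).query)["Expires"][0])
    current = time.time() if now is None else now
    return expires - current
```

Checking the remaining time before starting a large upload is one way to decide whether to apply for a fresh lease first.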
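    2. The following code shows how to upload the file to the Bailian file server.

      Note

      This code is for illustration only. Do not use it directly in a production environment.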

      Python example:

      import requests
      
      def upload_file(pre_signed_url, file_path):
          try:
              # Set the request headers to the values returned in Data.Param.Headers
              # by ApplyFileUploadLease (the values below are sample values)
              headers = {
                  "X-bailian-extra": "NTQ0MzUyMDc2MzgzNzcwMw==",
                  "Content-Type": "application/pdf"
              }
      
              # Open the local file and upload it with an HTTP PUT
              with open(file_path, 'rb') as file:
                  response = requests.put(pre_signed_url, data=file, headers=headers)
      
              # Check the response status code
              if response.status_code == 200:
                  print("File uploaded successfully.")
              else:
                  print(f"Failed to upload the file. ResponseCode: {response.status_code}")
      
          except Exception as e:
              print(f"An error occurred: {e}")
      
      def upload_file_link(pre_signed_url, source_url_string):
          try:
              # Set the request headers to the values returned in Data.Param.Headers
              # by ApplyFileUploadLease (the values below are sample values)
              headers = {
                  "X-bailian-extra": "NTQ0MzUyMDc2MzgzNzcwMw==",
                  "Content-Type": "application/pdf"
              }
      
              # Download the source file
              source_response = requests.get(source_url_string)
              if source_response.status_code != 200:
                  raise RuntimeError("Failed to get source file.")
      
              # Upload the downloaded content with an HTTP PUT
              response = requests.put(pre_signed_url, data=source_response.content, headers=headers)
      
              # Check the response status code
              if response.status_code == 200:
                  print("File uploaded successfully.")
              else:
                  print(f"Failed to upload the file. ResponseCode: {response.status_code}")
      
          except Exception as e:
              print(f"An error occurred: {e}")
      
      if __name__ == "__main__":
          pre_signed_url_or_http_url = "https://bailian-datahub-data-origin-prod.oss-cn-beijing.aliyuncs.com/1005426495169178/10036719/2070f50790a8482b985c36691cc7b093.1725003661081.pdf?Expires=1725004261&OSSAccessKeyId=LTAI5tKzNnKPFwCJSCpx****&Signature=OPgdNJ%2BMU%2FLtRjBzXiUjVYQsphw%3D"
      
          # Upload a file from a network location (file_path holds the source URL here)
          file_path = "https://test-lxg-quanxian.oss-cn-beijing.aliyuncs.com/%E6%B5%8B%E8%AF%95-%E6%96%B0%E9%97%BB.pdf?Expires=1725010144&OSSAccessKeyId=TMP.3KfyS1Pyk8YQ4F9fTYGhVpRXe9QJbRfFrKiP6ujzXWr2zu77Pmb8syzh8nLBZkSUskbdLd9KsNTC6RpeUt8pzScnJ9****&Signature=4jxj7hfJTnHWeM49dcd9sWWkXWs%3D"
          upload_file_link(pre_signed_url_or_http_url, file_path)
      
          # Upload a local file
          # file_path = "/Users/legolas/Downloads/测试-新闻.pdf"
          # upload_file(pre_signed_url_or_http_url, file_path)

      Java example:

      // This code is a sample only and has not been fully tested. Do not use it directly in a production environment.
      import java.io.BufferedInputStream;
      import java.io.DataOutputStream;
      import java.io.FileInputStream;
      import java.io.InputStream;
      import java.net.HttpURLConnection;
      import java.net.URL;
      
      public class UploadFile {

          public static void uploadFile(String preSignedUrl, String filePath) {
              HttpURLConnection connection = null;
              try {
                  // Create the URL object
                  URL url = new URL(preSignedUrl);
                  connection = (HttpURLConnection) url.openConnection();
      
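                  // Use the PUT method; the pre-signed URL is issued for uploading the file with PUT
                  connection.setRequestMethod("PUT");
      
                  // Enable output on the connection because it is used to send the file body
                  connection.setDoOutput(true);
      
                  // Set the request headers to the values returned in Data.Param.Headers by ApplyFileUploadLease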
                  connection.setRequestProperty("X-bailian-extra", "NTQ0MzUyMDc2MzgzNzcwMw==");
                  connection.setRequestProperty("Content-Type", "application/pdf");
      
                  // Read the file and upload it through the connection
                  try (DataOutputStream outStream = new DataOutputStream(connection.getOutputStream());
                       FileInputStream fileInputStream = new FileInputStream(filePath)) {
                      byte[] buffer = new byte[4096];
                      int bytesRead;
      
                      while ((bytesRead = fileInputStream.read(buffer)) != -1) {
                          outStream.write(buffer, 0, bytesRead);
                      }
      
                      outStream.flush();
                  }
      
                  // Check the response code
                  int responseCode = connection.getResponseCode();
                  if (responseCode == HttpURLConnection.HTTP_OK) {
                      // The file was uploaded successfully
                      System.out.println("File uploaded successfully.");
                  } else {
                      // The upload failed
                      System.out.println("Failed to upload the file. ResponseCode: " + responseCode);
                  }
              } catch (Exception e) {
                  e.printStackTrace();
              } finally {
                  if (connection != null) {
                      connection.disconnect();
                  }
              }
          }
      
          public static void uploadFileLink(String preSignedUrl, String sourceUrlString) {
              HttpURLConnection connection = null;
              try {
                  // Create the URL object
                  URL url = new URL(preSignedUrl);
                  connection = (HttpURLConnection) url.openConnection();
      
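                  // Use the PUT method; the pre-signed URL is issued for uploading the file with PUT
                  connection.setRequestMethod("PUT");
      
                  // Enable output on the connection because it is used to send the file body
                  connection.setDoOutput(true);
      
                  // Set the request headers to the values returned in Data.Param.Headers by ApplyFileUploadLease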
                  connection.setRequestProperty("X-bailian-extra", "NTQ0MzUyMDc2MzgzNzcwMw==");
                  connection.setRequestProperty("Content-Type", "application/pdf");
      
                  URL sourceUrl = new URL(sourceUrlString);
                  HttpURLConnection sourceConnection = (HttpURLConnection) sourceUrl.openConnection();
      
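                  // Use the GET method to fetch the source file
                  sourceConnection.setRequestMethod("GET");
                  // Get the response code; 200 means the request succeeded
                  int sourceFileResponseCode = sourceConnection.getResponseCode();
      
                  // Stop if the source file cannot be fetched
                  if (sourceFileResponseCode != HttpURLConnection.HTTP_OK) {
                      throw new RuntimeException("Failed to get source file.");
                  }
      
                  // Read the source file and upload it through the connection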
                  try (DataOutputStream outStream = new DataOutputStream(connection.getOutputStream());
                       InputStream in = new BufferedInputStream(sourceConnection.getInputStream())) {
                      byte[] buffer = new byte[4096];
                      int bytesRead;
      
                      while ((bytesRead = in.read(buffer)) != -1) {
                          outStream.write(buffer, 0, bytesRead);
                      }
      
                      outStream.flush();
                  }
      
                  // Check the response code
                  int responseCode = connection.getResponseCode();
                  if (responseCode == HttpURLConnection.HTTP_OK) {
                      // The file was uploaded successfully
                      System.out.println("File uploaded successfully.");
                  } else {
                      // The upload failed
                      System.out.println("Failed to upload the file. ResponseCode: " + responseCode);
                  }
              } catch (Exception e) {
                  e.printStackTrace();
              } finally {
                  if (connection != null) {
                      connection.disconnect();
                  }
              }
          }
      
          public static void main(String[] args) {
              String preSignedUrlOrHttpUrl = "https://bailian-datahub-data-origin-prod.oss-cn-beijing.aliyuncs.com/1005426495169178/10036719/2070f50790a8482b985c36691cc7b093.1725003661081.pdf?Expires=1725004261&OSSAccessKeyId=LTAI5tKzNnKPFwCJSCpx****&Signature=OPgdNJ%2BMU%2FLtRjBzXiUjVYQsphw%3D";
      
              // The following code uploads a file from a network location. Replace filePath with the URL of your own file.
              String filePath = "https://test-lxg-quanxian.oss-cn-beijing.aliyuncs.com/%E6%B5%8B%E8%AF%95-%E6%96%B0%E9%97%BB.pdf?Expires=1725010144&OSSAccessKeyId=TMP.3KfyS1Pyk8YQ4F9fTYGhVpRXe9QJbRfFrKiP6ujzXWr2zu77Pmb8syzh8nLBZkSUskbdLd9KsNTC6RpeUt8pzScnJ9****&Signature=4jxj7hfJTnHWeM49dcd9sWWkXWs%3D";
              uploadFileLink(preSignedUrlOrHttpUrl,filePath);
      
              // The following code uploads a local file
              //String filePath = "/Users/legolas/Downloads/测试-新闻.pdf";
              //uploadFile(preSignedUrlOrHttpUrl, filePath);
          }
      }

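      Replace the following parameter values to match your own setup:

      Parameter                        Description
      connection.setRequestProperty    Sets the HTTP request headers (the headers dictionary in the Python example). Use the Headers value returned by ApplyFileUploadLease.
      preSignedUrlOrHttpUrl            The HTTP link obtained after applying for the file upload lease (pre_signed_url_or_http_url in the Python example). Use the Url value returned by ApplyFileUploadLease.
      filePath                         The path of the file to upload (file_path in the Python example).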
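The samples above hard-code the header values. As an alternative, the request can be assembled directly from the Data.Param object so that Method, Url, and Headers always match the lease. The sketch below does this with Python's standard urllib.request module rather than the requests library used above. It assumes Headers is a mapping of header names to values, and the function names build_upload_request and upload_bytes are hypothetical.

```python
import urllib.request


def build_upload_request(param, body):
    """Create a urllib Request from the Data.Param object of the lease response.

    `param` is expected to hold "Method", "Url", and "Headers" (assumed to be
    a mapping); `body` is the file content as bytes.
    """
    return urllib.request.Request(
        url=param["Url"],
        data=body,
        headers=dict(param["Headers"]),
        method=param["Method"],
    )


def upload_bytes(param, body, timeout=60):
    """Send the upload request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for error responses such as 403,
    which is what an expired link would typically produce.
    """
    request = build_upload_request(param, body)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A caller would read the file into bytes, for example with open(path, "rb").read(), and pass it as body together with response["Data"]["Param"].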

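  3. Call the AddFile operation to add the file to Bailian.

    After the file is uploaded to the Bailian file server, call the AddFile operation to add it to Bailian.

    For details, see AddFile (add a document).

  4. Call the DescribeFile operation to query the details of a file that has been added to Bailian, such as its parsing status.

    After the file is added to Bailian, the system automatically starts parsing it. Parsing is queued, so if the queue is long, the file may wait some time before parsing completes. To check parsing progress, log on to the Alibaba Cloud Bailian large model service platform and open the Data Center > Data Management page, or call the DescribeFile operation to query the document's parsing status.

    For details, see DescribeFile (query document status).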
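Because parsing is queued, a client usually has to poll DescribeFile until the file reaches a final state. The sketch below shows one way to structure that loop. It does not call the Bailian SDK: get_status is a hypothetical callable that you would implement as a DescribeFile call returning the status string, and the state names PARSE_SUCCESS and PARSE_FAILED are assumptions to be checked against the DescribeFile reference.

```python
import time

# Assumed terminal states; verify the exact names in the DescribeFile reference.
TERMINAL_STATES = ("PARSE_SUCCESS", "PARSE_FAILED")


def wait_for_parsing(get_status, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll get_status() until it returns a terminal state or the timeout elapses.

    get_status: hypothetical zero-argument callable wrapping DescribeFile.
    interval:   seconds to wait between polls.
    timeout:    maximum total seconds to wait before giving up.
    sleep:      injectable sleep function, which makes the loop easy to test.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"File still in state {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

A fixed interval keeps the sketch short; a longer interval or a backoff may be preferable when the parsing queue is long.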

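Next steps

After the data is imported, you need to create a knowledge index for it. The steps are as follows:

  1. Create an index

  2. Query the status of the index creation task

  3. Query the list of documents in the index

Related documents

Besides the API, you can also upload files through the console. For details, see the data import instructions.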