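Quick Start

Document splitting adapted to the output format of Alibaba Cloud intelligent document parsing. It produces better chunks than ordinary splitting.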

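Introduction

Unlike plain Markdown, the output of Alibaba Cloud intelligent document parsing is a special format that combines text with layout information. This splitter chunks documents directly in that format, which gives better results than converting them to Markdown first and then applying a generic splitter.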

Quick Invocation

Prerequisites

Code Examples

API-KEY Setup

export DASHSCOPE_API_KEY=YOUR_DASHSCOPE_API_KEY

Synchronous Call Example

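import os
import logging
import json
import requests


DASHSCOPE_API_KEY = os.environ.get("DASHSCOPE_API_KEY")
if DASHSCOPE_API_KEY is None:
    logging.error("DASHSCOPE_API_KEY is not set")
    raise ValueError("DASHSCOPE_API_KEY is not set")
headers = {
    "Content-Type": "application/json",
    "Accept-Encoding": "utf-8",
    "Authorization": "Bearer " + DASHSCOPE_API_KEY,
}
service_url = (
    "https://dashscope.aliyuncs.com"
    + "/api/v1/indices/component/configed_transformations/spliter"
)

# Placeholder: replace with the output of Alibaba Cloud intelligent document parsing.
idp_result = "<intelligent document parsing result>"

my_input = dict()
my_input["text"] = idp_result     # document parsing result to be split
my_input["file_type"] = "idp"     # input is in the intelligent document parsing format
my_input["chunk_size"] = 512      # target size of each chunk
my_input["overlap_size"] = 100    # overlap between adjacent chunks

response = requests.post(
    service_url, data=json.dumps(my_input), headers=headers, timeout=30
)
response.raise_for_status()  # fail fast on HTTP errors
result = response.json()
print(json.dumps(result, ensure_ascii=False, indent=2))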
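For repeated use, the same request can be wrapped in a small helper. This is a sketch only: the function names `build_split_payload` and `split_idp_document` are hypothetical, the check that `ret == 0` means success is inferred from the sample output rather than documented, and the parameter checks are local assumptions, not documented service limits.

```python
import json
from typing import Any, Dict

# Endpoint copied from the synchronous example above.
SERVICE_URL = (
    "https://dashscope.aliyuncs.com"
    "/api/v1/indices/component/configed_transformations/spliter"
)


def build_split_payload(
    text: str, chunk_size: int = 512, overlap_size: int = 100
) -> Dict[str, Any]:
    """Build the request body with the same fields as the example above."""
    # Local sanity checks (assumptions, not documented service limits).
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_size < chunk_size:
        raise ValueError("overlap_size must be >= 0 and smaller than chunk_size")
    return {
        "text": text,
        "file_type": "idp",
        "chunk_size": chunk_size,
        "overlap_size": overlap_size,
    }


def split_idp_document(
    text: str,
    api_key: str,
    chunk_size: int = 512,
    overlap_size: int = 100,
    timeout: float = 30.0,
) -> Dict[str, Any]:
    """POST the payload and return the decoded JSON response."""
    # Imported here so build_split_payload can be used without requests installed.
    import requests

    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    payload = build_split_payload(text, chunk_size, overlap_size)
    response = requests.post(
        SERVICE_URL, data=json.dumps(payload), headers=headers, timeout=timeout
    )
    response.raise_for_status()  # surface HTTP errors early
    result = response.json()
    # Assumption: "ret": 0 (as in the sample output) indicates success.
    if result.get("ret") != 0:
        raise RuntimeError(f"split request failed: {result}")
    return result
```

Keeping payload construction separate from the network call makes the request body easy to inspect or unit-test without calling the service.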
Synchronous Call Output

{
  "chunkService": {
    "chunkResult": [
      {
        "chunk_id": 1,
        "content": "- This is a paragraph of text Heading 3- This is another paragraph of text",
        "hier_title": "Heading 2|>Heading 3",
        "nid": "",
        "table_desc": "",
        "title": "Heading 2"
      }
    ]
  },
  "ret": 0,
  "session_id": 1703746646370
}
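The chunks are nested under `chunkService.chunkResult`. A minimal sketch for flattening them into simple records, assuming `|>` separates heading levels in `hier_title` (inferred from the sample value, not from a documented specification). `extract_chunks` is a hypothetical helper, and `SAMPLE_RESPONSE` is an illustrative payload shaped like the output above.

```python
import json
from typing import Any, Dict, List

# Illustrative response with the same structure as the sample output.
SAMPLE_RESPONSE = """
{
  "chunkService": {
    "chunkResult": [
      {
        "chunk_id": 1,
        "content": "- This is a paragraph of text Heading 3- This is another paragraph of text",
        "hier_title": "Heading 2|>Heading 3",
        "nid": "",
        "table_desc": "",
        "title": "Heading 2"
      }
    ]
  },
  "ret": 0,
  "session_id": 1703746646370
}
"""


def extract_chunks(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten chunkResult into records with the heading path as a list."""
    chunks = []
    for item in response.get("chunkService", {}).get("chunkResult", []):
        hier = item.get("hier_title", "")
        chunks.append(
            {
                "chunk_id": item.get("chunk_id"),
                # Assumption: "|>" separates levels of the heading hierarchy.
                "title_path": hier.split("|>") if hier else [],
                "content": item.get("content", ""),
            }
        )
    return chunks


if __name__ == "__main__":
    for chunk in extract_chunks(json.loads(SAMPLE_RESPONSE)):
        print(chunk["chunk_id"], " / ".join(chunk["title_path"]))
```

Keeping the heading path as a list makes it straightforward to prepend section context to each chunk before indexing or embedding.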