SplitDoc: Text Chunking and Vectorization

Description: Splits the input text into chunks and, optionally, converts each chunk into an embedding vector.

Request syntax

POST /v3/openapi/apps/{app_group_identity}/actions/knowledge-split

{app_group_identity} is the name of the application.
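The following is a minimal sketch of how a client might fill in the path template for a given application. The helper name build_split_path and the application name "my_app" are hypothetical. Percent-encoding the name is a defensive assumption, because this page does not say which characters application names may contain. The endpoint host and request authentication are also not described in this section, so the sketch does not cover them.

```python
from urllib.parse import quote

# Path template copied from the request syntax above.
SPLIT_PATH = "/v3/openapi/apps/{app_group_identity}/actions/knowledge-split"


def build_split_path(app_group_identity: str) -> str:
    """Return the SplitDoc request path for one application (hypothetical helper)."""
    if not app_group_identity:
        raise ValueError("app_group_identity must be a non-empty application name")
    # Percent-encode every reserved character so the name stays a single path segment.
    return SPLIT_PATH.format(app_group_identity=quote(app_group_identity, safe=""))


if __name__ == "__main__":
    # "my_app" is a hypothetical application name.
    print(build_split_path("my_app"))
```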

Request parameters

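SplitDoc

| Parameter     | Type    | Description                                      | Remarks                                |
|---------------|---------|--------------------------------------------------|----------------------------------------|
| title         | String  | Title of the data                                | Optional                               |
| content       | String  | Content to be processed                          | Required                               |
| use_embedding | Boolean | Whether to vectorize the chunks: true = yes, false = no | Defaults to false if omitted    |
| model         | String  | Embedding model to use                           |                                        |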

Request body example

{
  "title": "Sample title",
  "content": "Sample text",
  "use_embedding": true
}
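A body like the one above can also be built programmatically. This is a sketch under stated assumptions, and the function name build_split_body is hypothetical. It enforces the one documented requirement, that content is required. It always sends use_embedding, which takes the documented default of false when the caller does not set it. It adds title and model only when they are given. This page does not list valid model names, so the caller must supply one that the service accepts.

```python
import json
from typing import Optional


def build_split_body(content: str, title: Optional[str] = None,
                     use_embedding: bool = False,
                     model: Optional[str] = None) -> str:
    """Serialize a SplitDoc request body as a JSON string (hypothetical helper)."""
    if not content:
        raise ValueError("content is required")
    # use_embedding is always sent; False matches the documented default.
    body = {"content": content, "use_embedding": use_embedding}
    if title is not None:
        body["title"] = title
    if model is not None:
        # Valid model names are not listed in this document.
        body["model"] = model
    # ensure_ascii=False keeps non-ASCII text (e.g. Chinese) readable in the payload.
    return json.dumps(body, ensure_ascii=False)


if __name__ == "__main__":
    print(build_split_body("Sample text", title="Sample title", use_embedding=True))
```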

Response parameters

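| Field  | Type              | Description                                          |
|--------|-------------------|------------------------------------------------------|
| chunks | List<ChunkContext> | List of chunk objects produced by splitting the text |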

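ChunkContext

| Field     | Type   | Description                                                          |
|-----------|--------|----------------------------------------------------------------------|
| chunk_id  | String | ID of the chunk                                                      |
| chunk     | String | Text of the chunk                                                    |
| embedding | String | Embedding vector of the chunk, as a comma-separated list of numbers  |
| type      | String | Chunk type: text for text data, image for image data                 |
| img_url   | String | URL of the image; must be present when the chunk is of type image    |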
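To make the table concrete, here is one way a client could map a ChunkContext object into a typed record. This is a sketch, and the names ChunkContextRecord and parse_chunk are hypothetical. The table documents embedding as a String, and the response example shows it as comma-separated numbers, so the sketch splits it into floats. The sketch also assumes that embedding may be absent, as in the third chunk of the response example, and that img_url appears only on image chunks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChunkContextRecord:
    chunk_id: str
    chunk: str
    type: str                                 # "text" or "image"
    embedding: Optional[List[float]] = None   # None when the response has no vector
    img_url: Optional[str] = None             # expected only when type == "image"


def parse_chunk(raw: dict) -> ChunkContextRecord:
    """Convert one ChunkContext JSON object into a typed record (hypothetical helper)."""
    emb = raw.get("embedding")
    # The embedding string is split on commas; float() tolerates surrounding spaces.
    vector = [float(x) for x in emb.split(",")] if emb else None
    return ChunkContextRecord(
        chunk_id=raw["chunk_id"],
        chunk=raw["chunk"],
        type=raw["type"],
        embedding=vector,
        img_url=raw.get("img_url"),
    )
```

The literal embedding strings in this page's response example contain "..." as an elision, so they would not parse as written. A real response carries the full list of numbers.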

Response body example

{
  "request_id": "111111111",
  "status": "OK",
  "errors": [],
  "result": [
    {
      "chunk_id": "1",
      "chunk": "Sample chunk text 1",
      "embedding": "-0.010441,-0.002826,-0.022911,0.000847,0.025610,0.019213,-0.019912,0.008210,0.011974,-0.010120,-0.003866,-0.008091,-0.006889,-0.034774,...-0.012572,0.009668,0.010963,-0.005273,-0.005072,-0.002190,-0.001554,-0.000058",
      "type": "text"
    },
    {
      "chunk_id": "2",
      "chunk": "Sample chunk text 2",
      "embedding": "-0.010441,-0.002826,-0.022911,0.000847,0.025610,0.019213,-0.019912,0.008210,0.011974,-0.010120,-0.003866,-0.008091,-0.006889,-0.034774,...-0.012572,0.009668,0.010963,-0.005273,-0.005072,-0.002190,-0.001554,-0.000058",
      "type": "image",
      "img_url": "http://127.0.0.1"
    },
    {
      "chunk_id": "3",
      "chunk": "Sample chunk text 3",
      "type": "text"
    }
  ]
}
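Below is a sketch of reading the chunk list out of a response shaped like the example above. The function name extract_chunks is hypothetical. The example response places the chunks under result, but the response-parameter table names the field chunks. The sketch therefore reads result and falls back to chunks. That fallback is an assumption, not documented behavior. The sketch also treats any status other than "OK" as a failure and surfaces errors.

```python
import json
from typing import List


def extract_chunks(response_text: str) -> List[dict]:
    """Return the ChunkContext objects from a SplitDoc response (hypothetical helper)."""
    resp = json.loads(response_text)
    if resp.get("status") != "OK":
        raise RuntimeError(f"SplitDoc failed: {resp.get('errors')}")
    # The example uses "result"; the table says "chunks". Accept either (assumption).
    chunks = resp.get("result", resp.get("chunks"))
    return chunks if chunks is not None else []


if __name__ == "__main__":
    sample = json.dumps({
        "request_id": "111111111",
        "status": "OK",
        "errors": [],
        "result": [
            {"chunk_id": "1", "chunk": "Sample chunk text 1", "embedding": "0.1,-0.2", "type": "text"},
            {"chunk_id": "3", "chunk": "Sample chunk text 3", "type": "text"},
        ],
    })
    for c in extract_chunks(sample):
        # Not every chunk carries an embedding (see chunk 3 in the example).
        print(c["chunk_id"], c["type"], "embedding" in c)
```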

Note

The embedding vectors produced for the text chunks have 1536 dimensions.
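Because the dimension is fixed, a client can validate each decoded vector before storing it, for example in a vector index. This is a minimal sketch, and the function name parse_embedding is hypothetical. The expected dimension is a parameter in case a different model returns vectors of another size. That is an assumption, since this page states only the 1536-dimensional case.

```python
from typing import List

# Dimension stated in this document for SplitDoc embedding vectors.
EXPECTED_DIM = 1536


def parse_embedding(embedding: str, expected_dim: int = EXPECTED_DIM) -> List[float]:
    """Split a comma-separated embedding string and check its length (hypothetical helper)."""
    vector = [float(x) for x in embedding.split(",")]
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(vector)}")
    return vector
```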