ChatWithKnowledgeBaseStream


This operation combines a knowledge base with a large language model (LLM) to provide intelligent Q&A. You can consume the streaming interface through Server-Sent Events (SSE) or the Java asynchronous SDK.
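For readers consuming the stream over SSE without an SDK, the following is a minimal sketch of splitting a `text/event-stream` body into event payloads. It only follows the generic SSE framing rules (`data:` fields, events separated by blank lines). The helper name `iter_sse_data` is hypothetical, and the assumption that each event carries a JSON document shaped like the response described in this reference is not confirmed by this page.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of every complete SSE event in `lines`."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; empty payloads are skipped.
            data = "\n".join(buffer)
            buffer = []
            if data:
                yield data
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one optional space may follow the colon
        if field == "data":
            buffer.append(value)
    # An event that is not terminated by a blank line is incomplete and is dropped.


# Hypothetical stream content; the JSON shape of each event is an assumption.
sample = [
    ": keep-alive\n",
    'data: {"Status": "success", "Message": "Successful"}\n',
    "\n",
    "data: first line\n",
    "data: second line\n",
    "\n",
]
events = list(iter_sse_data(sample))
first = json.loads(events[0])
print(first["Status"], len(events))
```

In practice the lines would come from an HTTP client that iterates over the response body; that part is omitted here because it depends on the client library and on request signing.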

Operation description

Use this API to retrieve answers from a large language model based on content from a specified knowledge base. You can customize the request by configuring various parameters, including the database instance ID, knowledge retrieval parameters, and model inference parameters. The API includes a default system prompt template, and you can also specify a custom one.

  • DBInstanceId: Required. The ID of the database instance.

  • KnowledgeParams: Optional. Parameters for knowledge retrieval, such as retrieval content and the merge policy.

  • ModelParams: Required. Parameters for model inference, such as the message list and the model name.

  • PromptParams: Optional. A custom system prompt template.
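The parameter groups above can be sketched as a plain request body. This is an illustration only: the helper `build_chat_request` is hypothetical, the field names are taken from the request parameter table in this reference, and how the body is serialized and signed depends on the SDK or HTTP client you use.

```python
import json


def build_chat_request(db_instance_id, region_id, model, messages,
                       collections=None, top_k=None):
    """Assemble a request body using the parameter names from this reference."""
    if not db_instance_id:
        raise ValueError("DBInstanceId is required")
    if not model:
        raise ValueError("ModelParams.Model is required")
    if not messages:
        raise ValueError("ModelParams.Messages is required")
    body = {
        "DBInstanceId": db_instance_id,
        "RegionId": region_id,
        "ModelParams": {"Model": model, "Messages": messages},
    }
    # KnowledgeParams is optional; without it the call is chat-only.
    if collections:
        knowledge = {"SourceCollection": collections}
        if top_k is not None:
            knowledge["TopK"] = top_k
        body["KnowledgeParams"] = knowledge
    return body


request = build_chat_request(
    db_instance_id="gp-xxxxxxxxx",
    region_id="cn-hangzhou",
    model="qwen-plus",
    messages=[{"Role": "user", "Content": "What is AnalyticDB for PostgreSQL?"}],
    collections=[{
        "Collection": "cloud_index_adb_50943_prod",
        "Namespace": "ddstar_vector",
        "NamespacePassword": "namespacePassword",
    }],
    top_k=10,
)
chat_only = build_chat_request("gp-xxxxxxxxx", "cn-hangzhou", "qwen-plus",
                               [{"Role": "user", "Content": "Hello"}])
print(json.dumps(request, indent=2))
```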

Try it now

You can try this API in OpenAPI Explorer without signing requests manually. A successful call generates SDK sample code that matches your parameters, which you can download and run locally with built-in credential protection.


RAM authorization

The table below describes the authorization required to call this API. You can define it in a Resource Access Management (RAM) policy. The table's columns are detailed below:

  • Action: The value to use in the Action element of a RAM policy statement to grant permission to perform the operation.

  • API: The API that you can call to perform the action.

  • Access level: The predefined level of access granted for each API. Valid values: create, list, get, update, and delete.

  • Resource type: The type of the resource that supports authorization to perform the action. It indicates if the action supports resource-level permission. The specified resource must be compatible with the action. Otherwise, the policy will be ineffective.

    • For APIs with resource-level permissions, required resource types are marked with an asterisk (*). Specify the corresponding Alibaba Cloud Resource Name (ARN) in the Resource element of the policy.

    • For APIs without resource-level permissions, it is shown as All Resources. Use an asterisk (*) in the Resource element of the policy.

  • Condition key: The condition keys defined by the service. The key allows for granular control, applying to either actions alone or actions associated with specific resources. In addition to service-specific condition keys, Alibaba Cloud provides a set of common condition keys applicable across all RAM-supported services.

  • Dependent action: The dependent actions required to run the action. To complete the action, the RAM user or the RAM role must have the permissions to perform all dependent actions.

Action

Access level

Resource type

Condition key

Dependent action

gpdb:ChatWithKnowledgeBaseStream

get

*DBInstance

acs:gpdb:{#regionId}:{#accountId}:dbinstance/{#DBInstanceId}

None

None
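As an illustration of the table above, a RAM policy that allows this operation on a single instance might be generated as follows. This is a sketch: the helper `build_ram_policy` and the account ID are hypothetical placeholders, and the policy uses the general RAM policy layout (Version, Statement, Effect, Action, Resource) with the action name and resource pattern from the table.

```python
import json


def build_ram_policy(region_id, account_id, db_instance_id):
    """Return a RAM policy document that allows ChatWithKnowledgeBaseStream on one instance."""
    # Resource pattern from the table: acs:gpdb:{#regionId}:{#accountId}:dbinstance/{#DBInstanceId}
    resource = f"acs:gpdb:{region_id}:{account_id}:dbinstance/{db_instance_id}"
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["gpdb:ChatWithKnowledgeBaseStream"],
                "Resource": [resource],
            }
        ],
    }


# The account ID below is a placeholder, not a real account.
policy = build_ram_policy("cn-hangzhou", "123456789012****", "gp-xxxxxxxxx")
print(json.dumps(policy, indent=2))
```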

Request syntax

POST / HTTP/1.1

Request parameters

Parameter

Type

Required

Description

Example

DBInstanceId

string

Yes

The instance ID.

Note

You can call the DescribeDBInstances operation to query the IDs of all AnalyticDB for PostgreSQL instances in a specified region.

gp-xxxxxxxxx

RegionId

string

Yes

The instance's region ID.

cn-hangzhou

KnowledgeParams

object

No

Parameters for knowledge retrieval. If omitted, the API performs a chat-only operation.

MergeMethod

string

No

Specifies the method for merging results from multiple knowledge bases. Default: RRF. Valid values:

  • RRF

  • Weight

"RRF"

MergeMethodArgs

object

No

The arguments for the result merging method.

Rrf

object

No

Parameters for the RRF merge method.

K

integer

No

The constant k used in the reciprocal rank fusion (RRF) formula 1/(k + rank_i). The value must be an integer greater than 1.

60

Weight

object

No

Parameters for the Weight merge method.

Weights

array

No

An array of weights for each SourceCollection.

number

No

The weight for a SourceCollection.

0.01

RerankFactor

number

No

Specifies the factor for reranking vector search results. The value must be greater than 1 and less than or equal to 5.

Note
  • Reranking may be inefficient if document chunks are sparse.

  • The number of items to rerank, calculated as ceil(TopK * RerankFactor), should not exceed 50.

5.0

RerankModel

object

No

The rerank model to use.

Name

string

No

The name of the rerank model.

qwen3-rerank

Instruct

string

No

An instruction for the rerank model.

Given a web search query, retrieve relevant passages that answer the query

SourceCollection

array<object>

Yes

An array of knowledge bases to search.

object

No

An item in the SourceCollection array.

Collection

string

Yes

The name of the collection to search.

cloud_index_adb_50943_prod

Namespace

string

No

The namespace that contains the collection.

Note

You can call the ListNamespaces operation to view available namespaces.

ddstar_vector

NamespacePassword

string

Yes

The password for the specified namespace.

Note

This value is specified in the CreateNamespace operation.

namespacePassword

QueryParams

object

No

Parameters for the knowledge base query.

Filter

string

No

A filter expression to apply to the search, similar to a SQL WHERE clause.

method_id='e41695f0-2851-40ac-b21d-dd337b60d71c'

GraphEnhance

boolean

No

Specifies whether to enable knowledge graph enhancement. Default value: false.

true

GraphSearchArgs

object

No

The parameters for knowledge graph search.

GraphTopK

integer

No

The number of top entities and relationship edges to return. Default value: 60.

60

HybridSearch

string

No

Specifies the hybrid search algorithm. If omitted, the system performs a basic score comparison of vector search and full-text retrieval results.

Valid values:

  • RRF: Reciprocal rank fusion. Configure the k parameter in HybridSearchArgs.

  • Weight: Weighted score fusion. Use the alpha parameter in HybridSearchArgs to control the balance between vector and full-text search scores.

  • Cascaded: First performs full-text retrieval, then runs a vector search on the results.

Cascaded

HybridSearchArgs

object

No

The arguments for the specified hybrid search algorithm. Supports RRF and Weight.

  • RRF: Specifies the constant k in the score calculation formula 1/(k+rank_i). k must be an integer greater than 1. Format:

{ 
   "RRF": {
    "k": 60
   }
}
  • Weight: Calculates the final score using the formula alpha * vector_score + (1 - alpha) * text_score. The alpha parameter balances the scores, ranging from 0 (full-text only) to 1 (vector only). Format:

{ 
   "Weight": {
    "alpha": 0.5
   }
}

any

No

{"RRF":{"k":60}}

Metrics

string

No

The distance metric for vector search. Valid values:

  • l2: Euclidean distance.

  • ip: Inner product.

  • cosine: Cosine similarity.

cosine

RecallWindow

array

No

The recall window. Specifies a window of context to include around retrieved chunks. The value must be a two-element array [A, B], where -10 <= A <= 0 and 0 <= B <= 10.

Note
  • This parameter is useful when document chunks are small and a search might miss important surrounding context.

  • The window is applied after reranking.

integer

No

An integer that specifies a bound of the recall window. The first element of the array represents the number of chunks to include before the retrieved chunk, and the second element represents the number of chunks to include after.

[-1,1]

RerankFactor

number

No

The rerank factor. If specified, the system reranks the results from the vector search. The value must be greater than 1 and less than or equal to 5.

Note
  • Reranking may be inefficient if document chunks are sparse.

  • The number of items to rerank, calculated as ceil(TopK * RerankFactor), should not exceed 50.

2.0

RerankModel

object

No

The rerank model to use.

Name

string

No

The name of the rerank model.

qwen3-rerank

Instruct

string

No

An instruction for the rerank model.

Given a web search query, retrieve relevant passages that answer the query

TopK

integer

No

The number of top results to return from this collection.

101

UseFullTextRetrieval

boolean

No

Specifies whether to use full-text retrieval for hybrid search. If false (the default), only vector search is performed.

true

TopK

integer

No

The total number of top results to return after merging results from all collections.

10

PromptParams

string

No

A template for the system prompt. It must include placeholders such as {{text_chunks}}, {{user_system_prompt}}, {{graph_entities}}, and {{graph_relations}}. If omitted, the default system prompt template is used.

"Answer the question based on the following knowledge: {{ text_chunks }}"

ModelParams

object

Yes

An object that contains the parameters for the large language model (LLM) call.

MaxTokens

integer

No

The maximum number of tokens to generate.

8192

Messages

array<object>

Yes

A list of messages in the conversation.

object

Yes

An object representing a single message.

Content

string

Yes

The content of the message.

You are a helpful assistant.

Role

string

Yes

The role of the message author. Valid values:

  • system

  • user

  • assistant

user

Model

string

Yes

The name of the large language model (LLM) to use. For a list of available models, see the Model Studio documentation.

qwen-plus

N

integer

No

The number of candidate responses to generate.

1

PresencePenalty

number

No

The presence penalty. A value between -2.0 and 2.0.

1.0

Seed

integer

No

The random seed for sampling.

42

Stop

array

No

A list of stop sequences.

string

No

A stop sequence.

"\n"

Temperature

number

No

The sampling temperature. A value between 0 and 2.

0.6

Tools

array<object>

No

A list of tools the model can call.

object

No

The details of a tool.

Function

object

No

The function information.

Description

string

No

A description of the function tool.

Gets the weather.

Name

string

No

The name of the function tool.

get_weather

Parameters

any

No

The parameters of the function, described as a JSON Schema object.

{"type": "object", ...}

TopP

number

No

The nucleus sampling probability threshold. A value between 0 and 1.

0.9

IncludeKnowledgeBaseResults

boolean

No

Specifies whether to include the retrieved knowledge base results in the response. Default value: false.

false
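The formulas and numeric limits in the table above can be sketched in a few lines. This is an illustration under stated assumptions, not the service's implementation: the helper names are hypothetical, ranks are assumed to start at 1, RRF scores are summed across result lists as in the standard formulation, and raising an error when the rerank count exceeds 50 is a design choice of this sketch.

```python
import math


def rrf_merge(rankings, k=60):
    """Fuse ranked ID lists with reciprocal rank fusion: sum of 1 / (k + rank_i)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # assumption: ranks start at 1
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def weighted_score(vector_score, text_score, alpha=0.5):
    """Weight fusion: alpha * vector_score + (1 - alpha) * text_score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * vector_score + (1.0 - alpha) * text_score


def rerank_candidates(top_k, rerank_factor):
    """Number of items sent to the reranker: ceil(TopK * RerankFactor)."""
    if not 1.0 < rerank_factor <= 5.0:
        raise ValueError("RerankFactor must be greater than 1 and at most 5")
    count = math.ceil(top_k * rerank_factor)
    if count > 50:
        raise ValueError("ceil(TopK * RerankFactor) should not exceed 50")
    return count


merged = rrf_merge([["a", "b", "c"], ["b", "c", "d"]], k=60)
print(merged)
print(weighted_score(0.8, 0.4, alpha=0.5))
print(rerank_candidates(10, 2.5))
```

Documents that appear in both lists ("b" and "c") accumulate two reciprocal-rank terms and therefore rank ahead of documents that appear in only one list.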

Response elements

Element

Type

Description

Example

object

The response schema.

RequestId

string

The request ID.

ABB39CC3-4488-4857-905D-2E4A051D0521

MultiCollectionRecallResult

object

The retrieval results from multiple knowledge bases.

Entities

array

A list of retrieved entities.

string

The details of a retrieved entity.

{'entities': []}

Matches

array<object>

A list of retrieved matches.

object

A retrieved match.

Content

string

The document content.

ADBPG vector database.

FileName

string

The file name.

a14b0221-e3f2-4cf2-96cd-b3c293510770.jpg

FileURL

string

The public URL of the retrieved image. By default, the URL is valid for 2 hours.

You can use the UrlExpiration parameter to specify a custom validity period.

http://dailyshort-sh.oss-cn-shanghai.aliyuncs.com/vod-8efba5/f06147795c6c71f080605420848d0302/0ca34d5743a84bf7c68f489a60715dac-ld.mp4

Id

string

The unique ID of the vector record.

Note

If this parameter is left empty, the database automatically generates a UUID. If you provide an ID that conflicts with an existing one, the existing record is updated with the data from the request.

273e3fc7-8f56-4167-a1bb-d35d2f3b9043

LoaderMetadata

any

Metadata from the document loader, captured during document ingestion.

{"page":1}

Metadata

object

The user-defined metadata.

any

{"update_time":"1754446789199","is_publish":"1"}

RerankScore

number

The rerank score.

0.12

RetrievalSource

integer

The source of the match. 1 indicates vector search, 2 indicates full-text search, and 3 indicates hybrid recall.

1

Score

number

The similarity score. The score is calculated based on the distance metric specified when the index was created (l2/ip/cosine).

10

Vector

array

The vector data.

number

A value in the vector.

[]

Relations

array

A list of relationship edges.

string

The details of a relationship edge.

{'relations': []}

RequestId

string

The request ID.

ABB39CC3-4488-4857-905D-2E4A051D0521

Status

string

The status of the API call. Valid values:

  • success: The call succeeded.

  • fail: The call failed.

success

Tokens

integer

The number of tokens consumed.

42

Usage

object

The token usage statistics for retrieval, including the tokens consumed for embedding.

EmbeddingTokens

integer

The number of tokens used for embedding.

Note

A token is the smallest unit created by splitting the input text. A token can be a unit such as a word, a phrase, a punctuation mark, or a character.

158

ChatCompletion

object

The model response.

Choices

array<object>

The streaming output content.

object

An item in the streaming output.

FinishReason

string

The reason the model stopped generating output.

finish

Index

integer

The index of the choice.

0

Message

object

The response from the large language model (LLM).

Content

string

The message content.

The weather in Hangzhou is sunny.

Role

string

The role of the message author. Valid values:

  • system

  • user

  • assistant

user

ToolCalls

array<object>

The tool call responses.

object

A tool call response.

Id

string

The ID of the tool call.

"chatcmpl-c1bebafa-cc48-44e2-88c6-1a3572952f8e"

Function

object

Details of the function that the model wants to call.

Arguments

string

The arguments for the function call, generated by the model in JSON format.

{"city":"hangzhou"}

Name

string

The name of the function to call.

"get_weather"

Index

integer

The index of the tool in the Input parameter of the request, starting from 0.

1

ReasoningContent

string

The model's chain of thought (CoT) content.

The logical reasoning process

Created

integer

The creation time, in Unix timestamp format.

1758529748

Id

string

The response ID.

273e3fc7-8f56-4167-a1bb-d35d2f3b9043

Model

string

The name of the model used.

qwen-plus

Usage

object

The token usage statistics for the completion.

CompletionTokens

integer

The number of tokens in the generated response.

42

PromptTokens

integer

The number of tokens in the input prompt.

42

PromptTokensDetails

object

Details about the prompt token usage.

CachedTokens

integer

The number of prompt tokens served from the cache.

24

TotalTokens

integer

The total number of tokens.

42

Message

string

The response message.

Successful

Status

string

The status of the request. Valid values:

  • success: The request succeeded.

  • fail: The request failed.

success

Examples

Success response

JSON format

{
  "RequestId": "ABB39CC3-4488-4857-905D-2E4A051D0521",
  "MultiCollectionRecallResult": {
    "Entities": [
      "{'entities': []}"
    ],
    "Matches": [
      {
        "Content": "ADBPG vector database.",
        "FileName": "a14b0221-e3f2-4cf2-96cd-b3c293510770.jpg",
        "FileURL": "http://dailyshort-sh.oss-cn-shanghai.aliyuncs.com/vod-8efba5/f06147795c6c71f080605420848d0302/0ca34d5743a84bf7c68f489a60715dac-ld.mp4",
        "Id": "273e3fc7-8f56-4167-a1bb-d35d2f3b9043",
        "LoaderMetadata": "{\"page\":1}",
        "Metadata": {
          "key": "{\"update_time\":\"1754446789199\",\"is_publish\":\"1\"}"
        },
        "RerankScore": 0.12,
        "RetrievalSource": 1,
        "Score": 10,
        "Vector": [
          0
        ]
      }
    ],
    "Relations": [
      "{'relations': []}"
    ],
    "RequestId": "ABB39CC3-4488-4857-905D-2E4A051D0521",
    "Status": "success",
    "Tokens": 42,
    "Usage": {
      "EmbeddingTokens": 158
    }
  },
  "ChatCompletion": {
    "Choices": [
      {
        "FinishReason": "finish",
        "Index": 0,
        "Message": {
          "Content": "The weather in Hangzhou is sunny.",
          "Role": "user",
          "ToolCalls": [
            {
              "Id": "chatcmpl-c1bebafa-cc48-44e2-88c6-1a3572952f8e",
              "Function": {
                "Arguments": "{\"city\":\"hangzhou\"}",
                "Name": "get_weather"
              },
              "Index": 1
            }
          ],
          "ReasoningContent": "The logical reasoning process"
        }
      }
    ],
    "Created": 1758529748,
    "Id": "273e3fc7-8f56-4167-a1bb-d35d2f3b9043",
    "Model": "qwen-plus",
    "Usage": {
      "CompletionTokens": 42,
      "PromptTokens": 42,
      "PromptTokensDetails": {
        "CachedTokens": 24
      },
      "TotalTokens": 42
    }
  },
  "Message": "Successful",
  "Status": "success"
}
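A response shaped like the example above can be reduced to the fields most callers need. The following is a minimal sketch: the helper `summarize_response` is hypothetical, the sample payload is a trimmed-down stand-in for the example, and missing sections are tolerated because the retrieval results may be absent (for instance in a chat-only call).

```python
def summarize_response(response):
    """Extract the answer text, source file names, and token total from a response dict."""
    completion = response.get("ChatCompletion") or {}
    choices = completion.get("Choices") or []
    message = (choices[0].get("Message") or {}) if choices else {}
    recall = response.get("MultiCollectionRecallResult") or {}
    matches = recall.get("Matches") or []
    usage = completion.get("Usage") or {}
    return {
        "status": response.get("Status"),
        "answer": (message.get("Content") or "").strip(),
        "sources": [match.get("FileName") for match in matches],
        "total_tokens": usage.get("TotalTokens"),
    }


# Trimmed sample payload using field names from the response elements table.
sample_response = {
    "Status": "success",
    "MultiCollectionRecallResult": {
        "Matches": [{"FileName": "a14b0221-e3f2-4cf2-96cd-b3c293510770.jpg", "Score": 10}],
    },
    "ChatCompletion": {
        "Choices": [{"Index": 0, "Message": {"Content": "The weather in Hangzhou is sunny.\n"}}],
        "Usage": {"TotalTokens": 42},
    },
}
summary = summarize_response(sample_response)
empty = summarize_response({"Status": "fail"})
print(summary)
```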

Error codes

See Error Codes for a complete list.

Release notes

See Release Notes for a complete list.