ChatWithKnowledgeBaseStream - Alibaba Cloud Help Center

Provides AI chat services by combining knowledge bases with large language models. This is a streaming API that you call through Server-Sent Events (SSE) or the Java asynchronous SDK.

Operation description

This API allows you to interact with a large language model by using specified knowledge base collections to obtain answers based on knowledge base content. You can configure various parameters to customize requests, including but not limited to the database instance ID, knowledge retrieval parameters, and model inference parameters. A default system prompt template is provided, and you can also customize the system prompt.

DBInstanceId: Required. Specifies the database instance ID.
KnowledgeParams: Optional. Contains knowledge retrieval parameters such as retrieval content and merge strategy.
ModelParams: Required. Contains model inference parameters such as the message list and model name.
PromptParams: Optional. Specifies a custom system prompt template.
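
As a rough illustration of how these parameters fit together, the following sketch assembles a request body as a plain Python dictionary. The field names follow the request parameter table on this page. The helper name `build_chat_request` and all sample values are hypothetical, and the sketch neither signs nor sends the request.

```python
from typing import Optional


def build_chat_request(
    db_instance_id: str,
    region_id: str,
    question: str,
    collection: str,
    namespace_password: str,
    model: str = "qwen-plus",
    prompt_template: Optional[str] = None,
) -> dict:
    """Assemble a request body (hypothetical helper, not part of any SDK)."""
    body = {
        "DBInstanceId": db_instance_id,
        "RegionId": region_id,
        # Omit KnowledgeParams entirely to perform chat without retrieval.
        "KnowledgeParams": {
            "SourceCollection": [
                {
                    "Collection": collection,
                    "NamespacePassword": namespace_password,
                }
            ],
            "TopK": 10,
        },
        "ModelParams": {
            "Model": model,
            "Messages": [{"Role": "user", "Content": question}],
        },
    }
    # The custom system prompt is optional; add it only when provided.
    if prompt_template is not None:
        body["PromptParams"] = prompt_template
    return body
```

A call such as `build_chat_request("gp-xxxxxxxxx", "cn-hangzhou", "What is AnalyticDB?", "my_collection", "namespacePassword")` returns a dictionary that can then be passed to whichever SDK or HTTP client you use.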

Try it now

Try this API in OpenAPI Explorer without signing requests manually. After a successful call, OpenAPI Explorer generates SDK code that matches your parameters. You can download the code, which has built-in credential security, for local use.

RAM authorization

The table below describes the authorization required to call this API. You can define it in a Resource Access Management (RAM) policy. The table's columns are detailed below:

Action: The actions can be used in the Action element of RAM permission policy statements to grant permissions to perform the operation.
API: The API that you can call to perform the action.
Access level: The predefined level of access granted for each API. Valid values: create, list, get, update, and delete.
Resource type: The type of the resource that supports authorization to perform the action. It indicates if the action supports resource-level permission. The specified resource must be compatible with the action. Otherwise, the policy will be ineffective.
- For APIs with resource-level permissions, required resource types are marked with an asterisk (*). Specify the corresponding Alibaba Cloud Resource Name (ARN) in the Resource element of the policy.
- For APIs without resource-level permissions, it is shown as All Resources. Use an asterisk (*) in the Resource element of the policy.
Condition key: The condition keys defined by the service. The key allows for granular control, applying to either actions alone or actions associated with specific resources. In addition to service-specific condition keys, Alibaba Cloud provides a set of common condition keys applicable across all RAM-supported services.
Dependent action: The dependent actions required to run the action. To complete the action, the RAM user or the RAM role must have the permissions to perform all dependent actions.

Action	Access level	Resource type	Condition key	Dependent action
gpdb:ChatWithKnowledgeBaseStream	get	*DBInstance acs:gpdb:{#regionId}:{#accountId}:dbinstance/{#DBInstanceId}	None
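
The following sketch shows one way to turn the action and resource listed above into a RAM policy document. It assumes the standard RAM policy layout (`Version`, `Statement`, `Effect`, `Action`, `Resource`). The helper name `build_ram_policy` and the sample account ID are hypothetical.

```python
import json


def build_ram_policy(region_id: str, account_id: str, db_instance_id: str) -> dict:
    """Build a policy that allows ChatWithKnowledgeBaseStream on one instance."""
    # Fill in the ARN pattern from the table:
    # acs:gpdb:{#regionId}:{#accountId}:dbinstance/{#DBInstanceId}
    resource = f"acs:gpdb:{region_id}:{account_id}:dbinstance/{db_instance_id}"
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "gpdb:ChatWithKnowledgeBaseStream",
                "Resource": resource,
            }
        ],
    }


# Sample values are placeholders only.
policy = build_ram_policy("cn-hangzhou", "1234567890123456", "gp-xxxxxxxxx")
print(json.dumps(policy, indent=2))
```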

Request syntax

POST / HTTP/1.1

Request parameters

Parameter	Type	Required	Description	Example
DBInstanceId	string	Yes	The instance ID. Note You can call the DescribeDBInstances operation to query the IDs of all AnalyticDB for PostgreSQL instances in a region.	gp-xxxxxxxxx
RegionId	string	Yes	The ID of the region where the instance resides.	cn-hangzhou
KnowledgeParams	object	No	The knowledge retrieval parameter object. If this parameter is not specified, only chat is performed.
MergeMethod	string	No	The method for merging results from multiple knowledge bases. Default value: RRF. Valid values: RRF and Weight.	"RRF"
MergeMethodArgs	object	No	The parameters for merging results from multiple knowledge bases.
Rrf	object	No	The configurable parameters when MergeMethod is set to RRF.
K	integer	No	The constant `k` used in the reciprocal rank fusion (RRF) formula `1/(k + rank_i)`. The value must be an integer greater than 1.	60
Weight	object	No	The configurable parameters when MergeMethod is set to Weight.
Weights	array	No	An array of weights for each `SourceCollection`.
	number	No	The weight for a `SourceCollection`.	0.01
RerankFactor	number	No	The reranking factor. If this parameter is specified, the retrieval results are reranked. Valid values: 1 < RerankFactor <= 5. Note Reranking is slow when document chunks are sparse. The number of reranked items (TopK × RerankFactor, rounded up) should not exceed 50.	5.0
RerankModel	object	No	The reranking model parameters for performing an additional reranking on the merged results from multiple retrieval paths.
Name	string	No	The reranking model name. Valid values: qwen3-rerank, gte-rerank-v2.	qwen3-rerank
Instruct	string	No	This parameter can be set when RerankModel.Name is set to qwen3-rerank. Specifies a custom ranking task type description to guide the model to adopt different ranking strategies.	Given a web search query, retrieve relevant passages that answer the query
SourceCollection	array<object>	Yes	The knowledge base.
	array<object>	No
Collection	string	Yes	The name of the collection to recall.	cloud_index_adb_50943_prod
Namespace	string	No	The namespace. Note You can call the ListNamespaces operation to query the list.	ddstar_vector
NamespacePassword	string	Yes	The password of the namespace. Note This value is specified in the CreateNamespace operation.	namespacePassword
QueryParams	object	No	The parameters related to retrieval from this knowledge base.
Filter	string	No	A filter expression to apply to the search, similar to a SQL `WHERE` clause.	method_id='e41695f0-2851-40ac-b21d-dd337b60d71c'
GraphEnhance	boolean	No	Specifies whether to enable knowledge graph enhancement. Default value: `false`.	true
GraphSearchArgs	object	No	The parameters for knowledge graph search.
GraphTopK	integer	No	The number of top entities and relationship edges to return. Default value: `60`.	60
HybridSearch	string	No	Specifies the hybrid search algorithm. If omitted, the system performs a basic score comparison of vector search and full-text retrieval results. Valid values: `RRF`: Reciprocal rank fusion. Configure the `k` parameter in `HybridSearchArgs`. `Weight`: Weighted score fusion. Use the `alpha` parameter in `HybridSearchArgs` to control the balance between vector and full-text search scores. `Cascaded`: First performs full-text retrieval, then runs a vector search on the results.	Cascaded
HybridSearchArgs	object	No	The arguments for the specified hybrid search algorithm. Supports `RRF` and `Weight`. `RRF`: Specifies the constant `k` in the score calculation formula `1/(k+rank_i)`. `k` must be an integer greater than 1. Format: `{ "RRF": { "k": 60 } }`. `Weight`: Calculates the final score using the formula `alpha * vector_score + (1 - alpha) * text_score`. The `alpha` parameter balances the scores, ranging from 0 (full-text only) to 1 (vector only). Format: `{ "Weight": { "alpha": 0.5 } }`.
	any	No	Parameter values for the dual-path retrieval algorithm	{"RRF":{"k":60}}
Metrics	string	No	The distance metric for vector search. Valid values: `l2`: Euclidean distance. `ip`: Inner product. `cosine`: Cosine similarity.	cosine
RecallWindow	array	No	The recall window. Specifies a window of context to include around retrieved chunks. The value must be a two-element array `[A, B]`, where -10 <= A <= 0 and 0 <= B <= 10. Note This parameter is useful when document chunks are small and a search might miss important surrounding context. The window is applied after reranking.
	integer	No	An integer that specifies a bound of the recall window. The first element of the array represents the number of chunks to include before the retrieved chunk, and the second element represents the number of chunks to include after. Note This parameter is recommended when document chunks are finely split and retrieval may miss important context. The window is applied after reranking.	[-1,1]
RerankFactor	number	No	The rerank factor. If specified, the system reranks the results from the vector search. The value must be greater than 1 and less than or equal to 5. Note Reranking may be inefficient if document chunks are sparse. The number of items to rerank, calculated as `ceil(TopK * RerankFactor)`, should not exceed 50.	2.0
RerankModel	object	No	The rerank model to use.
Name	string	No	The name of the rerank model.	qwen3-rerank
Instruct	string	No	An instruction for the rerank model.	Given a web search query, retrieve relevant passages that answer the query
RerankMetadataFields	string	No
TopK	integer	No	The number of top results to return from this collection.	101
UseFullTextRetrieval	boolean	No	Specifies whether to use full-text retrieval for hybrid search. If `false` (the default), only vector search is performed.	true
TopK	integer	No	The number of top results to return after merging recall results from multiple vector collections.	10
PromptParams	string	No	The system prompt template. The template must include {{ text_chunks }}, {{ user_system_prompt }}, {{ graph_entities }}, and {{ graph_relations }}. If not specified, this part does not take effect.	"Answer the question based on the following knowledge: {{ text_chunks }}"
ModelParams	object	Yes	The large language model (LLM) invocation parameter object.
MaxTokens	integer	No	The maximum number of tokens to generate.	8192
Messages	array<object>	Yes	The message list.
	object	Yes	The message list.
Content	string	Yes	The message content.	You are a helpful assistant.
Role	string	Yes	The message role. Valid values: system, user, and assistant.	user
Model	string	Yes	The name of the large language model to use. For valid values, see Model Studio documentation.	qwen-plus
N	integer	No	The number of candidate responses to generate.	1
PresencePenalty	number	No	The presence penalty coefficient. Valid values: -2.0 to 2.0.	1.0
Seed	integer	No	The random seed.	42
Stop	array	No	The list of stop words.
	string	No	The stop word.	"\n"
Temperature	number	No	The sampling temperature. Valid values: 0 to 2.	0.6
Tools	array<object>	No	The tool list.
	array<object>	No	The tool details.
Function	object	No	The function information.
Description	string	No	A description of the function tool.	Get weather.
Name	string	No	The name of the function tool.	get_weather
Parameters	any	No	The parameters of the function, described as a JSON Schema object.	{"type": "object", ...}
TopP	number	No	The nucleus sampling probability threshold. Valid values: 0 to 1.	0.9
IncludeKnowledgeBaseResults	boolean	No	Specifies whether to return recall results. Default value: false.	false
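
The scoring formulas and limits quoted in the table can be sketched as follows. This is an illustration of the arithmetic only, not the service's implementation: the function names are hypothetical, ranks are assumed to start at 1, and the RRF contributions of an item are assumed to be summed across result lists, which is the usual reading of `1/(k + rank_i)`.

```python
import math
from typing import Dict, List


def rrf_merge(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Reciprocal rank fusion: add 1/(k + rank_i) for each list an item appears in."""
    if k <= 1:
        raise ValueError("k must be an integer greater than 1")
    scores: Dict[str, float] = {}
    for ranking in rankings:
        # Assumption: the best result in each list has rank 1.
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] = scores.get(item_id, 0.0) + 1.0 / (k + rank)
    return scores


def weighted_score(vector_score: float, text_score: float, alpha: float) -> float:
    """Weighted fusion: alpha * vector_score + (1 - alpha) * text_score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * vector_score + (1.0 - alpha) * text_score


def rerank_candidate_count(top_k: int, rerank_factor: float) -> int:
    """Number of items sent to the reranker: ceil(TopK * RerankFactor)."""
    if not 1.0 < rerank_factor <= 5.0:
        raise ValueError("RerankFactor must satisfy 1 < RerankFactor <= 5")
    return math.ceil(top_k * rerank_factor)
```

For example, `rerank_candidate_count(10, 5.0)` gives 50, which sits exactly at the recommended upper bound, while a `TopK` of 20 with the same factor would exceed it.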

Response elements

Element	Type	Description	Example
	object	Schema of Response
RequestId	string	The request ID.	ABB39CC3-4488-4857-905D-2E4A051D0521
MultiCollectionRecallResult	object	The recall information from multiple knowledge bases.
Entities	array	The entity details.
	string	The entity details.	{'entities': []}
Matches	array<object>	The recall items.
	array<object>	The recall items.
Content	string	The document content.	AnalyticDB PostgreSQL vector database.
FileName	string	The file name.	a14b0221-e3f2-4cf2-96cd-b3c293510770.jpg
FileURL	string	The public URL of the retrieved image. By default, the URL is valid for 2 hours. You can use the `UrlExpiration` parameter to specify a custom validity period.	http://dailyshort-sh.oss-cn-shanghai.aliyuncs.com/vod-8efba5/f06147795c6c71f080605420848d0302/0ca34d5743a84bf7c68f489a60715dac-ld.mp4
Id	string	The unique ID of the vector record. Note If this parameter is left empty, the database automatically generates a UUID. If you provide an ID that conflicts with an existing one, the existing record is updated with the data from the request.	273e3fc7-8f56-4167-a1bb-d35d2f3b9043
LoaderMetadata	any	Metadata from the document loader, captured during document ingestion.	{"page":1}
Metadata	object	The user-defined metadata.
	any	Metadata value	{"update_time":"1754446789199","is_publish":"1"}
RerankScore	number	The rerank score.	0.12
RetrievalSource	integer	The source of the match. `1` indicates vector search, `2` indicates full-text search, and `3` indicates hybrid recall.	1
Score	number	The similarity score. The score is calculated based on the distance metric specified when the index was created (`l2/ip/cosine`).	10
Vector	array	The vector data.
	number	A value in the vector.	[]
Relations	array	The relation names.
	string	The relationship edge details.	{'relations': []}
RequestId	string	The request ID.	ABB39CC3-4488-4857-905D-2E4A051D0521
Status	string	The API execution status. Valid values: success: The execution is successful. fail: The execution failed.	success
Tokens	integer	The number of tokens consumed.	42
Usage	object	The number of tokens or entries consumed by document understanding or embedding.
EmbeddingTokens	integer	The number of tokens used for embedding. Note A token is the smallest unit created by splitting the input text. A token can be a unit such as a word, a phrase, a punctuation mark, or a character.	158
ChatCompletion	object	The model response.
Choices	array<object>	The text content generated in real time.
	array<object>	The text content generated in real time.
FinishReason	string	The reason the model stopped generating output.	finish
Index	integer	The index of the choice.	0
Message	object	The response from the large language model (LLM).
Content	string	The message content.	The weather in Hangzhou is sunny.
Role	string	The role of the message author. Valid values: `system`, `user`, and `assistant`.	user
ToolCalls	array<object>	The tool call responses.
	array<object>	A tool call response.
Id	string	The ID of the tool call.	chatcmpl-c1bebafa-cc48-44e2-88c6-1a3572952f8e
Function	object	Details of the function that the model wants to call.
Arguments	string	The arguments for the function call, generated by the model in JSON format.	{"city":"hangzhou"}
Name	string	The name of the function to call.	get_weather
Index	integer	The index of the tool in the `Input` parameter of the request, starting from 0.	1
ReasoningContent	string	The model's chain of thought (CoT) content.	Logical reasoning process
Created	integer	The creation time.	1758529748
Id	string	The response ID.	273e3fc7-8f56-4167-a1bb-d35d2f3b9043
Model	string	The name of the model used.	qwen-plus
Usage	object	The number of tokens used by the large language model output.
CompletionTokens	integer	The number of tokens in the generated response.	42
PromptTokens	integer	The number of tokens in the input prompt.	42
PromptTokensDetails	object	Details about the prompt token usage.
CachedTokens	integer	The number of prompt tokens served from the cache.	24
TotalTokens	integer	The total number of tokens.	42
Message	string	The response message.	Successful
Status	string	The status. Valid values: success: Successful. fail: Failed.	success
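
As a small sketch of how a client might read the recall results described above, the following helper walks `MultiCollectionRecallResult.Matches` and translates the `RetrievalSource` code into a label. The function and constant names are hypothetical, and the input is assumed to be a response already decoded into a Python dictionary.

```python
from typing import Any, Dict, List

# Codes taken from the RetrievalSource description in the table.
RETRIEVAL_SOURCE_LABELS = {
    1: "vector search",
    2: "full-text search",
    3: "hybrid recall",
}


def summarize_matches(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one compact record per recall item; empty if no recall data is present."""
    recall = response.get("MultiCollectionRecallResult") or {}
    summary = []
    for match in recall.get("Matches") or []:
        summary.append(
            {
                "id": match.get("Id"),
                "file": match.get("FileName"),
                # Unrecognized codes are reported as "unknown" instead of failing.
                "source": RETRIEVAL_SOURCE_LABELS.get(match.get("RetrievalSource"), "unknown"),
                "score": match.get("Score"),
                "rerank_score": match.get("RerankScore"),
            }
        )
    return summary
```

Recall results are only returned when `IncludeKnowledgeBaseResults` is set to true in the request, so the helper tolerates a missing `MultiCollectionRecallResult` object.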

Examples

Success response

JSON format

{
  "RequestId": "ABB39CC3-4488-4857-905D-2E4A051D0521",
  "MultiCollectionRecallResult": {
    "Entities": [
      "{'entities': []}"
    ],
    "Matches": [
      {
        "Content": "AnalyticDB PostgreSQL vector database.",
        "FileName": "a14b0221-e3f2-4cf2-96cd-b3c293510770.jpg",
        "FileURL": "http://dailyshort-sh.oss-cn-shanghai.aliyuncs.com/vod-8efba5/f06147795c6c71f080605420848d0302/0ca34d5743a84bf7c68f489a60715dac-ld.mp4",
        "Id": "273e3fc7-8f56-4167-a1bb-d35d2f3b9043",
        "LoaderMetadata": "{\"page\":1}",
        "Metadata": {
          "key": "{\"update_time\":\"1754446789199\",\"is_publish\":\"1\"}"
        },
        "RerankScore": 0.12,
        "RetrievalSource": 1,
        "Score": 10,
        "Vector": [
          0
        ]
      }
    ],
    "Relations": [
      "{'relations': []}"
    ],
    "RequestId": "ABB39CC3-4488-4857-905D-2E4A051D0521",
    "Status": "success",
    "Tokens": 42,
    "Usage": {
      "EmbeddingTokens": 158
    }
  },
  "ChatCompletion": {
    "Choices": [
      {
        "FinishReason": "finish",
        "Index": 0,
        "Message": {
          "Content": "The weather in Hangzhou is sunny.",
          "Role": "user",
          "ToolCalls": [
            {
              "Id": "chatcmpl-c1bebafa-cc48-44e2-88c6-1a3572952f8e",
              "Function": {
                "Arguments": "{\"city\":\"hangzhou\"}",
                "Name": "get_weather"
              },
              "Index": 1
            }
          ],
          "ReasoningContent": "Logical reasoning process"
        }
      }
    ],
    "Created": 1758529748,
    "Id": "273e3fc7-8f56-4167-a1bb-d35d2f3b9043",
    "Model": "qwen-plus",
    "Usage": {
      "CompletionTokens": 42,
      "PromptTokens": 42,
      "PromptTokensDetails": {
        "CachedTokens": 24
      },
      "TotalTokens": 42
    }
  },
  "Message": "Successful",
  "Status": "success"
}
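
Because this operation streams its output over SSE, a client typically has to split the stream into events and stitch the generated text back together. The sketch below shows one way to do that. It rests on two assumptions that this page does not confirm: each SSE event carries a `data:` field whose payload is a JSON object shaped like the example above, and the `Content` of successive chunks can be concatenated to form the full answer. The function names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_payloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each SSE event from an iterable of text lines."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # Per the SSE format, a single leading space after the colon is dropped.
            if value.startswith(" "):
                value = value[1:]
            data_parts.append(value)
        elif line == "" and data_parts:
            # A blank line terminates the current event.
            yield json.loads("\n".join(data_parts))
            data_parts = []
        # Other fields (event:, id:, retry:) and comment lines are ignored here.
    if data_parts:
        yield json.loads("\n".join(data_parts))


def collect_answer(payloads: Iterable[dict]) -> str:
    """Concatenate ChatCompletion.Choices[].Message.Content across all chunks."""
    parts = []
    for payload in payloads:
        completion = payload.get("ChatCompletion") or {}
        for choice in completion.get("Choices") or []:
            content = (choice.get("Message") or {}).get("Content")
            if content:
                parts.append(content)
    return "".join(parts)
```

In practice, `lines` would come from a streaming HTTP response body read line by line. If a request sets `N` greater than 1, the chunks should be grouped by the `Index` of each choice before concatenation, which this sketch does not do.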

Error codes

See Error Codes for a complete list.

Release notes

See Release Notes for a complete list.