| Parameter | Type | Description | Example |
|---|---|---|---|
| | object | Parameters for the AI agent template. | |
| Greeting | string | The welcome message. The change takes effect in the next call session. If this parameter is not set, no welcome message is played. | 你好 |
| EnableIntelligentSegment | boolean | Specifies whether to enable intelligent segmentation. If you enable this feature, short, consecutive speech segments from the user are merged into a complete sentence. Default value: | true |
| AsrConfig | object | The automatic speech recognition (ASR) configurations. | |
| AsrMaxSilence | integer | The sentence segmentation threshold. If a silence lasts longer than this threshold, the system determines that the sentence is complete. Valid values: 200 to 1200. Unit: ms. Default value: 400. | 400 |
| AsrLanguageId | string | The language ID for ASR. Valid values: | zh_mandarin |
| AsrHotWords | array | The list of hotwords for ASR. You can specify up to 128 hotwords. | |
| | string | A hotword. Each hotword must be 1 to 10 characters in length. | 检查 |
| VadLevel | integer | The interruption threshold for voice activity detection (VAD). Valid values: 0 to 11. Default value: 11. | 11 |
| CustomParams | string | The passthrough parameters for proprietary ASR. | mode=fast&sample=16000&format=wav |
| VadDuration | integer | The minimum duration threshold for VAD, which controls interruption sensitivity. Valid values: 200 to 2000. Unit: ms. A value from 200 to 500 corresponds to 1 to 4 words. A value of 0 disables this feature. By default, this parameter is empty and does not take effect. | 300 |
| LlmConfig | object | The configurations of the large language model (LLM). | |
| LlmHistoryLimit | integer | The maximum number of conversational turns to retain in the history of the LLM or multimodal large language model (MLLM). Default value: 10. | 10 |
| LlmHistory | array | The conversation history of the LLM or MLLM. | |
| | object | A single conversational turn. | |
| Role | string | The role of the participant in the conversation. Valid values: | user |
| Content | string | The text of the turn, which records what the role said in the conversation. | 你好 |
| LlmSystemPrompt | string | The system prompt for the LLM after the call is initiated. | 你是一位友好且乐于助人的助手,专注于为用户提供准确的信息和建议。 |
| BailianAppParams | string | The parameters for Alibaba Cloud Model Studio, specified as a JSON string. For more information about the parameter format, see Alibaba Cloud Model Studio parameters. | "{\"biz_params\":{\"user_defined_params\":{\"your_plugin_id\":{\"article_index\":2}}},\"memory_id\":\"your_memory_id\",\"image_list\":[\"https://your_image_url\"],\"rag_options\":{\"pipeline_ids\":[\"your_id\"],\"file_ids\":[\"文档ID1\",\"文档ID2\"],\"metadata_filter\":{\"name\":\"张三\"},\"structured_filter\":{\"key1\":\"value1\",\"key2\":\"value2\"},\"tags\":[\"标签1\",\"标签2\"]}}" |
| OpenAIExtraQuery | string | The additional query parameters for an LLM that is compatible with the OpenAI protocol. Specify each parameter in the key=value format, and separate multiple parameters with ampersands (&). | api-version=2024-02-01&api-key=sk-xxx |
| LlmCompleteReply | boolean | If you enable this feature, the system sends the complete LLM-generated result to the client after generation is complete. | true |
| FunctionMap | array | The list of function mappings, which map AI agent capabilities to LLM functions. This feature is supported only when function calling is used with custom LLMs that are compatible with the OpenAI protocol. | |
| | object | A single mapping rule. | |
| Function | string | The name of the built-in AI agent function provided by Alibaba Cloud. Supported value: hangup. | hangup |
| MatchFunction | string | The name of the LLM function that corresponds to this function. You define this name, and it is used to call the corresponding function in the LLM. For more information about the protocol for custom LLMs, see Standard LLM API. | hangup |
| OutputMinLength | integer | The minimum length of text output. Unit: characters. Text shorter than this length is cached until it can be concatenated with subsequent text. Valid values: 0 to 100. A value of 0 or empty indicates that this parameter does not take effect. Default value: empty. | 5 |
| OutputMaxDelay | string | The maximum delay for text output. If this threshold is exceeded, the cached text is forcibly output. Valid values: 1000 to 10000. Unit: ms. A value of 0 or empty indicates that this parameter does not take effect. Default value: empty. | 2000 |
| HistorySyncWithTTS | boolean | Specifies whether to keep the LLM message history consistent with the TTS playback content. If you enable this feature, the saved LLM messages match the content that TTS actually played. Default value: false. | false |
| TtsConfig | object | The text-to-speech (TTS) configurations. | |
| VoiceId | string | The voice ID. The change takes effect on the next sentence. If you do not specify this parameter, the voice ID configured in the AI agent template is used. This parameter is valid only for preset TTS voices. The value can be up to 64 characters in length. For more information about the valid values, see Intelligent speech effect samples. | longcheng_v2 |
| VoiceIdList | array | The list of available voices. | |
| | string | A voice ID. | zhixiaoxia |
| PronunciationRules | array | The TTS pronunciation rules. You can specify up to 20 rules. The rules are applied in sequence. | |
| | object | A TTS pronunciation rule. | |
| Word | string | The word to be replaced. The value must be a string of up to 10 Chinese characters and cannot contain spaces. | 大栅栏 |
| Pronunciation | string | The target pronunciation. The value must be a string of up to 10 Chinese characters and cannot contain spaces. | 大石烂儿 |
| Type | string | The type of the pronunciation rule. Valid value: | replacement |
| ModelId | string | Supported only for MiniMax. Valid values: | speech-01-turbo |
| LanguageId | string | Supported only for MiniMax. This parameter enhances recognition of specific minority languages and dialects: after you set it, speech performance in the specified minority language or dialect improves. Default value: empty. If the minority language type is unknown, you can set this parameter to | Chinese |
| Emotion | string | Supported only for MiniMax. The following seven emotions are supported: | happy |
| SpeechRate | number | The speech rate. This parameter is supported on all platforms. For both CosyVoice and MiniMax, the default value is 1.0 and the valid values are 0.5 to 2.0. | 1.0 |
| InterruptConfig | object | The speech interruption policy configurations. | |
| EnableVoiceInterrupt | boolean | Specifies whether to enable speech interruption. Default value: true. | true |
| InterruptWords | array | The specific words or phrases that trigger a conversation interruption. | |
| | string | A word or phrase that triggers a conversation interruption. | 打断一下 |
| Eagerness | string | | |
| NoInterruptMode | string | The ASR processing policy in … Default value: cache. | cache |
| KeepInterruptWordsForLLM | boolean | | true |
| TurnDetectionConfig | object | The configurations for conversational turn detection. | |
| TurnEndWords | array | The keywords that mark the end of a user's conversational turn. | |
| | string | A keyword that marks the end of a user's conversational turn. | 我说完了 |
| Mode | string | The mode for conversational turn detection. Valid values: … Default value: | Semantic |
| SemanticWaitDuration | integer | The pause duration in AI mode that is used to determine whether a conversational turn has ended. Unit: ms. Default value: -1. Note: This parameter does not take effect in Normal mode. | -1 |
| Eagerness | string | | Low |
| GreetingDelay | integer | The delay before the welcome message is played. Valid values: 0 to 5000. Unit: ms. Default value: 0. | 0 |
| AmbientSoundConfig | object | The ambient sound configurations. | |
| ResourceId | string | The ID of the ambient sound. You can obtain the ID from the advanced configurations of the AI agent in the console. | f67901c595834************ |
| Volume | integer | The volume of the ambient sound. Valid values: 0 to 100. A value of 0 mutes the ambient sound. | 50 |
| ExperimentalConfig | string | The parameters for experimental features. If you need these features, contact technical support. | "" |
| AutoSpeechConfig | object | The configurations for the automatic speech module of the AI agent, which covers prompts played during LLM response delays and inquiries played during prolonged user silence. | |
| UserIdle | object | The configurations for inquiries played during prolonged user silence. | |
| WaitTime | integer | The silence duration threshold. This parameter is required. If the user stays silent longer than this threshold, an inquiry is triggered. Valid values: 5000 to 600000. Unit: ms. | 5000 |
| MaxRepeats | integer | The maximum number of inquiries. This parameter is required. Valid values: 0 to 10. After the maximum number of inquiries is reached, no more inquiries are triggered and the call is disconnected. | 5 |
| Messages | array | The collection of inquiry prompts. You can specify up to 10 prompts, each up to 100 characters in length. The probabilities of all prompts must sum to 1 (100%). | |
| | object | A single inquiry prompt. | |
| Text | string | The text of the inquiry prompt. The text can be up to 100 characters in length. | 您还在吗? |
| Probability | number | The probability that this prompt is selected. Valid values: 0 to 1, which correspond to 0% to 100%. | 0.5 |
| HangupEndWord | string | | |
| LlmPending | object | The configurations for prompts played during LLM response delays. | |
| WaitTime | integer | The wait time threshold for LLM responses. This parameter is required. If the LLM takes longer than this threshold to respond, a prompt is played. Valid values: 500 to 10000. Unit: ms. Configure this parameter based on the actual response latency of your LLM. | 3000 |
| Mode | string | | |
| Messages | array | The collection of prompts. You can specify up to 10 prompts, each up to 100 characters in length. The probabilities of all prompts must sum to 1 (100%). | |
| | object | A single prompt. | |
| Text | string | The text of the prompt. The text can be up to 100 characters in length. | 稍等一下 |
| Probability | number | The probability that this prompt is selected. Valid values: 0 to 1, which correspond to 0% to 100%. | 0.5 |
| MaxIdleTime | integer | The maximum wait time for interaction with the AI agent. If this time is exceeded, the AI agent goes offline. Unit: seconds. Default value: 600. | 600 |
| BackChannelingConfig | object | Important: This parameter is deprecated. Use BackChannelingConfigs instead. | |
| Enabled | boolean | | |
| TriggerStage | string | | |
| Probability | number | | |
| Words | object | | |
| Text | string | | |
| Probability | number | | |
| BackChannelingConfigs | array | The configurations for the back-channeling module. If you enable this feature, the system randomly plays short affirmative phrases at specific trigger points. | |
| | object | A single back-channeling configuration. | |
| Enabled | boolean | Specifies whether to enable back-channeling. This parameter is required. Valid values: true and false. | true |
| TriggerStage | string | The trigger point for back-channeling. Valid value: | pause_detected |
| Probability | number | The trigger probability. This parameter is required. Valid values: 0.0 to 1.0. | 0.5 |
| Words | array | The collection of back-channeling phrases. You can specify up to 10 phrases, each up to 20 characters in length. The probabilities of all phrases must sum to 1.0. | |
| | object | A single back-channeling phrase. | |
| Text | string | The text of the phrase. This parameter is required. The text can be up to 20 characters in length, and multiple languages are supported. | 嗯嗯 |
| Probability | number | The probability that this phrase is selected. This parameter is required. Valid values: 0.0 to 1.0. | 0.3 |
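
The rows above describe a nested JSON object. The following minimal sketch, using only Python's standard `json` and `urllib.parse` modules, assembles a small configuration and shows how the two string-typed fields that carry encoded content could be built: `BailianAppParams` (a JSON string) and `OpenAIExtraQuery` (`key=value` pairs joined with `&`). The variable name `agent_config` is hypothetical because the table does not name the enclosing parameter, the nesting is assumed from the row order above, and how the payload is submitted to the service is not covered here.

```python
import json
from urllib.parse import urlencode

# BailianAppParams is typed as a string, so the nested parameters are
# JSON-encoded first. The keys mirror the example value in the table.
bailian_params = json.dumps(
    {"memory_id": "your_memory_id", "rag_options": {"pipeline_ids": ["your_id"]}},
    ensure_ascii=False,
)

# OpenAIExtraQuery: key=value pairs separated by "&".
extra_query = urlencode({"api-version": "2024-02-01", "api-key": "sk-xxx"})

# Hypothetical name: the table does not name the enclosing parameter.
# Both LLM string fields are included only to show how each is encoded.
agent_config = {
    "Greeting": "你好",
    "AsrConfig": {"AsrMaxSilence": 400, "AsrHotWords": ["检查"]},
    "LlmConfig": {
        "LlmHistoryLimit": 10,
        "BailianAppParams": bailian_params,
        "OpenAIExtraQuery": extra_query,
    },
    "TtsConfig": {"VoiceId": "longcheng_v2", "SpeechRate": 1.0},
    "InterruptConfig": {"EnableVoiceInterrupt": True, "InterruptWords": ["打断一下"]},
    "AutoSpeechConfig": {
        "UserIdle": {
            "WaitTime": 5000,
            "MaxRepeats": 5,
            "Messages": [{"Text": "您还在吗?", "Probability": 1.0}],
        }
    },
}

# ensure_ascii=False keeps Chinese text readable in the serialized payload.
payload = json.dumps(agent_config, ensure_ascii=False)
print(payload)
```

Encoding `BailianAppParams` with `json.dumps` rather than writing the escaped string by hand avoids the quoting mistakes that the backslash-heavy example value invites.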
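
Several rows state hard limits: value ranges, list sizes, text lengths, and probabilities that must sum to 1. A client-side pre-check such as the following sketch can catch violations before a request is sent. The function names (`validate`, `check_range`, `check_weighted_items`) are hypothetical, only a subset of the documented limits is checked, and the service itself remains the authority on what it accepts.

```python
import math


def check_range(name: str, value: float, low: float, high: float) -> list[str]:
    """Return an error message if value falls outside [low, high]."""
    if not low <= value <= high:
        return [f"{name}={value} is outside {low} to {high}"]
    return []


def check_weighted_items(name: str, items: list[dict], max_items: int, max_len: int) -> list[str]:
    """Check Messages/Words-style lists: item count, text length, probabilities."""
    errors = []
    if len(items) > max_items:
        errors.append(f"{name}: at most {max_items} items, got {len(items)}")
    for i, item in enumerate(items):
        if len(item["Text"]) > max_len:
            errors.append(f"{name}[{i}].Text exceeds {max_len} characters")
        errors += check_range(f"{name}[{i}].Probability", item["Probability"], 0.0, 1.0)
    total = sum(item["Probability"] for item in items)
    # Compare with a tolerance because probabilities are floats.
    if items and not math.isclose(total, 1.0, abs_tol=1e-9):
        errors.append(f"{name}: probabilities sum to {total}, expected 1.0")
    return errors


def validate(config: dict) -> list[str]:
    """Check a subset of the documented limits; an empty list means none were violated."""
    errors: list[str] = []
    asr = config.get("AsrConfig", {})
    if "AsrMaxSilence" in asr:
        errors += check_range("AsrMaxSilence", asr["AsrMaxSilence"], 200, 1200)
    hotwords = asr.get("AsrHotWords", [])
    if len(hotwords) > 128:
        errors.append(f"AsrHotWords: at most 128 hotwords, got {len(hotwords)}")
    for word in hotwords:
        if not 1 <= len(word) <= 10:
            errors.append(f"AsrHotWords: {word!r} must be 1 to 10 characters")
    auto = config.get("AutoSpeechConfig", {})
    idle = auto.get("UserIdle")
    if idle is not None:
        errors += check_range("UserIdle.WaitTime", idle["WaitTime"], 5000, 600000)
        errors += check_range("UserIdle.MaxRepeats", idle["MaxRepeats"], 0, 10)
        errors += check_weighted_items("UserIdle.Messages", idle.get("Messages", []), 10, 100)
    pending = auto.get("LlmPending")
    if pending is not None:
        errors += check_range("LlmPending.WaitTime", pending["WaitTime"], 500, 10000)
        errors += check_weighted_items("LlmPending.Messages", pending.get("Messages", []), 10, 100)
    for i, entry in enumerate(config.get("BackChannelingConfigs", [])):
        errors += check_range(f"BackChannelingConfigs[{i}].Probability", entry["Probability"], 0.0, 1.0)
        errors += check_weighted_items(f"BackChannelingConfigs[{i}].Words", entry.get("Words", []), 10, 20)
    return errors


print(validate({"AsrConfig": {"AsrMaxSilence": 100}}))
# ['AsrMaxSilence=100 is outside 200 to 1200']
```

Returning a list of messages instead of raising on the first problem lets a caller report every violation in one pass.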
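
`BackChannelingConfigs` uses two levels of probability: an entry-level `Probability` that decides whether a phrase is played at a trigger point, and per-phrase `Probability` values that weight which phrase is chosen. The table does not describe how the service performs these draws. Purely to illustrate what the two levels mean, here is a sketch using Python's `random` module; `maybe_back_channel` is a hypothetical helper, not the service's implementation.

```python
import random
from typing import Optional


def maybe_back_channel(config: dict, rng: random.Random) -> Optional[str]:
    """Illustrative model: return a phrase to play at a trigger point, or None."""
    if not config.get("Enabled", False):
        return None
    # Level 1: the entry-level Probability decides whether to trigger at all.
    # rng.random() returns a float in [0.0, 1.0), so 1.0 always triggers.
    if rng.random() >= config["Probability"]:
        return None
    # Level 2: each phrase's Probability acts as a selection weight.
    words = config["Words"]
    texts = [w["Text"] for w in words]
    weights = [w["Probability"] for w in words]
    return rng.choices(texts, weights=weights, k=1)[0]


entry = {
    "Enabled": True,
    "TriggerStage": "pause_detected",
    "Probability": 0.5,
    "Words": [
        {"Text": "嗯嗯", "Probability": 0.3},
        {"Text": "好的", "Probability": 0.7},
    ],
}
rng = random.Random(42)  # seeded so repeated runs give the same sequence
print([maybe_back_channel(entry, rng) for _ in range(5)])
```

Under this model, the chance that a given trigger point plays "嗯嗯" is 0.5 × 0.3 = 0.15.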
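
`PronunciationRules` are applied in sequence, so an earlier rule can change the text that a later rule sees. The sketch below illustrates that ordering for rules of type `replacement` by modeling each rule as a plain string substitution. This is an assumption made only for illustration: the table does not say how the TTS engine applies the rules internally, and `apply_pronunciation_rules` is a hypothetical helper.

```python
def apply_pronunciation_rules(text: str, rules: list[dict]) -> str:
    """Apply replacement rules in list order (illustrative model only)."""
    if len(rules) > 20:
        raise ValueError("PronunciationRules allows at most 20 rules")
    for rule in rules:
        # Only the "replacement" rule type appears in the table.
        if rule.get("Type") == "replacement":
            text = text.replace(rule["Word"], rule["Pronunciation"])
    return text


rules = [{"Word": "大栅栏", "Pronunciation": "大石烂儿", "Type": "replacement"}]
print(apply_pronunciation_rules("我们去大栅栏吧", rules))  # 我们去大石烂儿吧
```

Because each rule operates on the output of the previous one, reordering two rules whose `Word` and `Pronunciation` values overlap can produce a different result.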