|
Parameter |
Type |
Description |
Example |
||||||
|---|---|---|---|---|---|---|---|---|---|
|
object |
The parameters for the AI agent template. |
||||||||
| Greeting |
string |
The welcome message the AI agent plays when joining the session. Changes apply to subsequent sessions. If omitted, no welcome message is played. |
你好 |
||||||
| WakeUpQuery |
string |
A user-provided command that the AI agent responds to immediately after the call starts. |
今天天气怎么样? |
||||||
| MaxIdleTime |
integer |
The maximum idle duration in seconds before the AI agent disconnects. If the agent receives no user interaction within this period, it ends the task. Default: 600. |
600 |
||||||
| UserOnlineTimeout |
integer |
The duration in seconds the AI agent waits for a user to join. If the user does not join within this time, the agent terminates the task. Default: 60. |
60 |
||||||
| UserOfflineTimeout |
integer |
The duration in seconds the AI agent waits before terminating the task after a user leaves the session. Default: 5. |
5 |
||||||
| EnablePushToTalk |
boolean |
Specifies whether to enable push-to-talk mode. Default: |
false |
||||||
| GracefulShutdown |
boolean |
Specifies whether to enable graceful shutdown. If enabled, the AI agent completes its current utterance before disconnecting when the task is stopped, speaking for at most 10 more seconds. Default: |
false |
||||||
| Volume |
integer |
The speaking volume of the AI agent.
|
100 |
||||||
| WorkflowOverrideParams |
string |
A JSON string containing parameters to override the default workflow configuration. |
{} |
||||||
| AvatarUrl |
string |
The URL of the avatar to display during voice calls. If omitted, no avatar is displayed. |
http://example.com/a.jpg |
||||||
| AvatarUrlType |
string |
The type of the avatar URL. By default, this parameter is not set. |
USER |
||||||
| EnableIntelligentSegment |
boolean |
Specifies whether to enable intelligent segmentation. When enabled, short user utterances are merged into a single sentence. Default: |
true |
||||||
| AsrConfig |
object |
Configuration for automatic speech recognition (ASR). |
|||||||
| AsrLanguageId |
string |
The language for ASR. Valid values:
|
zh_mandarin |
||||||
| AsrMaxSilence |
integer |
The maximum duration of silence in milliseconds before the ASR engine finalizes an utterance. A pause longer than this value signals a sentence break. Range: 200–1200. Default: 400. |
400 |
||||||
| AsrHotWords |
array |
A list of hotwords to improve ASR accuracy. You can specify a maximum of 128 hotwords. |
|||||||
|
string |
A hotword string. Length: 1 to 10 characters. |
检查 |
|||||||
| VadLevel |
integer |
The Voice Activity Detection (VAD) threshold for interruptions. Range: 0–11. Default: 11.
|
11 |
||||||
| CustomParams |
string |
Passthrough parameters for proprietary ASR integrations. |
mode=fast&sample=16000&format=wav |
||||||
| VadDuration |
integer |
The minimum duration in milliseconds of continuous user speech required to trigger an interruption. This controls interruption sensitivity. A value of 0 disables this feature. Range: 200–2000. A common range is 200–500 ms, which typically corresponds to 1 to 4 Chinese characters. If omitted, this feature is disabled. |
300 |
||||||
| TtsConfig |
object |
Configuration for text-to-speech (TTS). |
|||||||
| VoiceId |
string |
The ID of the preset TTS voice. Changes apply to the next utterance. If omitted, the voice from the AI agent template is used. The ID can be a maximum of 64 characters. For available voices, see Intelligent Voice Samples. |
longcheng_v2 |
||||||
| VoiceIdList |
array |
A list of available voices. |
|||||||
|
string |
A voice. |
zhixiaoxia |
|||||||
| PronunciationRules |
array |
A list of TTS pronunciation rules, executed in order. You can specify a maximum of 20 rules. |
|||||||
|
object |
A TTS pronunciation rule. |
||||||||
| Word |
string |
The word to be replaced. It must be 1 to 9 Chinese characters long and cannot contain spaces. |
一一零 |
||||||
| Pronunciation |
string |
The replacement pronunciation. It must be 1 to 9 Chinese characters long and cannot contain spaces. |
幺幺零 |
||||||
| Type |
string |
The type of pronunciation rule. Valid value:
|
replacement |
||||||
| ModelId |
string |
This parameter applies only to the Minimax provider. Valid values:
|
speech-01-turbo |
||||||
| LanguageId |
string |
This parameter applies only to the Minimax provider. It enhances recognition for specific low-resource languages and dialects. If the language is unknown, set this to |
Chinese |
||||||
| Emotion |
string |
This parameter applies only to the Minimax provider. Supported emotions include:
|
happy |
||||||
| SpeechRate |
number |
The speech rate, where a value of 1.0 is normal speed. The supported range can vary by provider. For CosyVoice, the range is 0.5 to 2.0 (default: 1.0). For Minimax, the range is 0.5 to 2.0 (default: 1.0). |
1.0 |
||||||
| LlmConfig |
object |
Configuration for the large language model (LLM). |
|||||||
| LlmHistory |
array |
The conversation history context for the LLM/MLLM. |
|||||||
|
object |
A single conversational turn. |
||||||||
| Role |
string |
The role of the participant in the conversation. Valid values:
|
user |
||||||
| Content |
string |
The text content of the message from this role. |
你好 |
||||||
| LlmHistoryLimit |
integer |
The maximum number of recent conversational turns to include in the LLM/MLLM context. Default: 10. |
10 |
||||||
| LlmSystemPrompt |
string |
The system prompt for the LLM after the call starts. |
你是一位友好且乐于助人的助手,专注于为用户提供准确的信息和建议。 |
||||||
| BailianAppParams |
string |
Parameters for Alibaba Cloud Model Studio, provided as a JSON string. For the parameter format, see Alibaba Cloud Model Studio Parameters. |
"{\"biz_params\":{\"user_defined_params\":{\"your_plugin_id\":{\"article_index\":2}}},\"memory_id\":\"your_memory_id\",\"image_list\":[\"https://your_image_url\"],\"rag_options\":{\"pipeline_ids\":[\"your_id\"],\"file_ids\":[\"文档ID1\",\"文档ID2\"],\"metadata_filter\":{\"name\":\"张三\"},\"structured_filter\":{\"key1\":\"value1\",\"key2\":\"value2\"},\"tags\":[\"标签1\",\"标签2\"]}}" |
||||||
| OpenAIExtraQuery |
string |
Additional query parameters for an OpenAI-compatible LLM. Parameters must be provided as a URL query string (e.g., |
api-version=2024-02-01&api-key=sk-xxx |
||||||
| LlmCompleteReply |
boolean |
When set to |
true |
||||||
| FunctionMap |
array |
Maps built-in agent functions to custom LLM functions. Currently, this only supports function calling for custom, OpenAI-compatible LLMs. |
|||||||
|
object |
A single mapping rule. |
||||||||
| Function |
string |
The name of a built-in function provided by the AI agent system. Currently, only |
hangup |
||||||
| MatchFunction |
string |
The name of the custom LLM function that maps to the agent's built-in function. For details on the custom LLM protocol, see LLM Standard Interface. |
hangup |
||||||
| OutputMinLength |
integer |
The minimum number of characters in a text chunk before it is sent to the TTS engine. Shorter chunks are buffered. Range: 0–100. A value of |
5 |
||||||
| OutputMaxDelay |
integer |
The maximum delay in milliseconds before buffered text is sent to the TTS engine, even if |
2000 |
||||||
| HistorySyncWithTTS |
boolean |
Specifies whether the LLM message history is synchronized with the content played by the TTS. Default: Note
When a user interrupts the agent, the |
false |
||||||
| AvatarConfig |
object |
Configuration for the avatar. This takes effect only if the workflow includes an avatar node. |
|||||||
| AvatarId |
string |
The model ID of the avatar. |
5257 |
||||||
| InterruptConfig |
object |
Configuration for the speech interruption policy. |
|||||||
| EnableVoiceInterrupt |
boolean |
Specifies whether to enable speech interruption. Default: |
true |
||||||
| InterruptWords |
array |
A list of specific words or phrases that trigger an interruption. |
|||||||
|
string |
A specific word or phrase that triggers an interruption. |
打断一下 |
|||||||
| NoInterruptMode |
string |
Specifies how to handle user speech that occurs during a non-interruptible section of the agent's utterance.
Default: |
cache |
||||||
| KeepInterruptWordsForLLM |
boolean |
Specifies whether to include the interrupt words in the text sent to the LLM. Default: Note
For example, if "hold on" is an interrupt word and the user says "hold on, what is the weather like today?", setting this to |
true |
||||||
| VoiceprintConfig |
object |
Configuration for voiceprint recognition. |
|||||||
| UseVoiceprint |
boolean |
Specifies whether to enable voiceprint recognition. Default: |
false |
||||||
| VoiceprintId |
string |
The unique identifier for the voiceprint. This is not set by default. The ID must correspond to a voiceprint registered using the voiceprint registration API. For more information, see Register a voiceprint. |
zhixiaoxia |
||||||
| RegistrationMode |
string |
The voiceprint registration mode. Default:
|
Explicit |
||||||
| TurnDetectionConfig |
object |
Configuration for conversational turn detection. |
|||||||
| TurnEndWords |
array |
A list of keywords used to determine the end of a user's conversational turn. |
|||||||
|
string |
A keyword used to determine the end of a user's conversational turn. |
我说完了 |
|||||||
| Mode |
string |
The conversational turn detection mode.
|
Semantic |
||||||
| SemanticWaitDuration |
integer |
The pause detection time in AI mode, in milliseconds. Default: -1.
Note
This parameter has no effect in |
-1 |
||||||
| Eagerness |
string |
Controls the agent's response speed after detecting a user pause. This parameter applies only in
This field is empty by default. |
High |
||||||
| ExperimentalConfig |
string |
Parameters for experimental features. Contact support for assistance. |
"" |
||||||
| VcrConfig |
object |
Configuration for video content recognition. This enables the system to send callbacks to the client about events detected in the video stream. |
|||||||
| StillFrameMotion |
object |
Configuration for still frame detection. |
|||||||
| Enabled |
boolean |
Specifies whether to enable still frame detection. Default: |
false |
||||||
| CallbackDelay |
integer |
The duration in milliseconds that a frame must remain still before a notification is sent. If not specified, the setting from the console is used. Range: 200–5000. |
3000 |
||||||
| InvalidFrameMotion |
object |
Configuration for invalid frame detection. |
|||||||
| Enabled |
boolean |
Specifies whether to enable invalid frame detection. |
false |
||||||
| CallbackDelay |
integer |
The duration in milliseconds that an invalid frame must persist before a notification is sent. If not specified, the setting from the console is used. Range: 200–5000. |
3000 |
||||||
| PeopleCount |
object |
Configuration for the people counting feature. |
|||||||
| Enabled |
boolean |
Specifies whether to enable people counting. Default: |
false |
||||||
| Equipment |
object |
Configuration for device identification. |
|||||||
| Enabled |
boolean |
Specifies whether to enable device identification. Default: |
false |
||||||
| HeadMotion |
object |
Configuration for head motion detection. |
|||||||
| Enabled |
boolean |
Specifies whether to enable head motion detection. Default: |
false |
||||||
| LookAway |
object |
Configuration for look-away detection. |
|||||||
| Enabled |
boolean |
Specifies whether to enable look-away detection. Default: |
true |
||||||
| AmbientSoundConfig |
object |
Configuration for ambient sound during the call. |
|||||||
| ResourceId |
string |
The ID of the ambient sound resource. You can obtain this ID from the advanced settings of the agent configuration in the console. |
f67901c595834************ |
||||||
| Volume |
integer |
The volume of the ambient sound. Range: 0–100. A value of 0 disables the sound. |
50 |
||||||
| AutoSpeechConfig |
object |
Configuration for the agent's automatic speech, including prompts for LLM latency and long periods of user silence. |
|||||||
| UserIdle |
object |
Configuration for prompts to play when the user is silent for an extended period. |
|||||||
| WaitTime |
integer |
The silence duration threshold in milliseconds. If the user is silent for longer than this period, a prompt is triggered. Range: 5000–600000. This is a required field. |
5000 |
||||||
| MaxRepeats |
integer |
The maximum number of times the prompt can be repeated. Range: 0–10. This is a required field. If the limit is exceeded, the call is terminated. |
5 |
||||||
| Messages |
array |
A collection of prompt messages. A maximum of 10 messages are supported, each up to 100 characters. The sum of all probabilities must be 100%. |
|||||||
|
object |
The structure of a prompt message. |
||||||||
| Text |
string |
The text of the prompt message, up to 100 characters. |
您还在吗? |
||||||
| Probability |
number |
The probability of this message being selected. Range: 0–1, corresponding to 0%–100%. |
0.5 |
||||||
| HangupEndWord |
string |
A farewell message played before hanging up due to user inactivity. |
|||||||
| LlmPending |
object |
Configuration for prompts to play during LLM response latency. |
|||||||
| WaitTime |
integer |
The wait time threshold in milliseconds for LLM responses. If the threshold is exceeded, a prompt is played. Range: 500–10000. This is a required field. Set this value based on the actual performance of your LLM. |
3000 |
||||||
| Mode |
string |
The mode for handling LLM latency prompts. |
|||||||
| Messages |
array |
A collection of prompt messages. A maximum of 10 messages are supported, each up to 100 characters. The sum of all probabilities must be 100%. |
|||||||
|
object |
The structure of a prompt message. |
||||||||
| Text |
string |
The text of the prompt message, up to 100 characters. |
稍等一下 |
||||||
| Probability |
number |
The probability of this message being selected. Range: 0–1, corresponding to 0%–100%. |
0.5 |
||||||
| BackChannelingConfigs |
array |
Configuration for back-channeling. When enabled, the system plays short, responsive phrases at specific trigger points. |
|||||||
|
object |
A single back-channeling configuration. |
||||||||
| Enabled |
boolean |
Specifies whether to enable this back-channeling rule. This is a required field. |
true |
||||||
| TriggerStage |
string |
The trigger for the back-channeling. Valid value:
|
pause_detected |
||||||
| Probability |
number |
The trigger probability. Range: 0.0–1.0. This is a required field. |
0.5 |
||||||
| Words |
array |
A collection of acknowledgment phrases. You can specify a maximum of 10 phrases. Each phrase must be 20 characters or less, and the sum of their probabilities must be 1.0. |
|||||||
|
object |
Configuration for a responsive phrase. |
||||||||
| Text |
string |
The text of the phrase, up to 20 characters. Multiple languages are supported. This is a required field. |
嗯嗯 |
||||||
| Probability |
number |
The trigger probability of this phrase. Range: 0.0–1.0. This is a required field. |
0.3 |
||||||
| BackChannelingConfig |
array |
Important: Deprecated. Use BackChannelingConfigs instead. |
|||||||
|
object |
A single back-channeling configuration. |
||||||||
| Enabled |
boolean |
Specifies whether to enable back-channeling. This is a required field. Valid values: true and false. |
true |
||||||
| TriggerStage |
string |
The trigger for the back-channeling. Valid values:
|
pause_detected |
||||||
| Probability |
number |
The trigger probability. Range: 0.0–1.0. This is a required field. |
0.5 |
||||||
| Words |
array |
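A collection of acknowledgment phrases. You can specify a maximum of 10 phrases. Each phrase must be 20 characters or less, and the sum of their probabilities must be 1.0. |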
|||||||
|
object |
Configuration for a responsive phrase. |
||||||||
| Text |
string |
The text of the phrase, up to 20 characters. Multiple languages are supported. This is a required field. |
嗯嗯 |
||||||
| Probability |
number |
本短语的触发概率,范围 0.0–1.0,必填。 |
0.3 |