|
header object (Required)
Properties
action string (Required)
The command type. Fixed at run-task.
task_id string (Required)
A UUID-format task ID generated by the client. Subsequent events are correlated by this ID.
streaming string (Required)
The streaming mode. Fixed at duplex.
|
{
"header": {
"action": "run-task",
"task_id": "2bf83b9a-baeb-4fda-8d9a-xxxxxxxxxxxx",
"streaming": "duplex"
},
"payload": {
"task_group": "audio",
"task": "asr",
"function": "recognition",
"model": "fun-asr-realtime",
"parameters": {
"format": "pcm",
"sample_rate": 16000
},
"input": {}
}
}
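The run-task command shown above can be assembled programmatically. The following is a minimal Python sketch, not an official client: the helper name build_run_task and its keyword arguments are hypothetical, and the field values are copied from the example. It generates the UUID-format task_id on the client, as the task_id description requires, and serializes the message to JSON.

```python
import json
import uuid


def build_run_task(model="fun-asr-realtime", audio_format="pcm",
                   sample_rate=16000, **optional_parameters):
    """Return a run-task command as a JSON string (hypothetical helper)."""
    message = {
        "header": {
            "action": "run-task",            # fixed value
            "task_id": str(uuid.uuid4()),    # client-generated UUID
            "streaming": "duplex",           # value used in the example above
        },
        "payload": {
            "task_group": "audio",           # fixed value
            "task": "asr",                   # fixed value
            "function": "recognition",       # fixed value
            "model": model,
            "parameters": {
                "format": audio_format,
                "sample_rate": sample_rate,
                # Optional settings such as heartbeat or vocabulary_id
                **optional_parameters,
            },
            "input": {},
        },
    }
    return json.dumps(message)
```

Keep the generated task_id on the client side, because subsequent events for the task are correlated by this ID.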
|
|
payload object (Required)
Properties
task_group string (Required)
The task group. Fixed at audio.
task string (Required)
The task type. Fixed at asr.
function string (Required)
The function type. Fixed at recognition.
parameters object (Required)
Speech recognition parameters.
Properties
format string (Required)
The audio format.
Valid values:
- pcm
- wav
- mp3
- opus
- speex
- aac
- amr
sample_rate integer (Required)
The sample rate, in Hz.
Valid values:
vocabulary_id string (Optional)
The custom vocabulary (hotword) list ID.
language_hints array[string] (Optional)
The spoken language of the audio. No default; the model auto-detects the language when this parameter is omitted.
Only the first value in the array is used; additional values are ignored.
Valid values:
- zh: Chinese
- en: English
- ja: Japanese
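Because only the first entry of language_hints takes effect, it can help to resolve the effective hint on the client before sending the request. The sketch below is illustrative only: the function name effective_language_hint is hypothetical, and the supported set contains just the values listed above.

```python
# Values listed in this section; treat the set as an assumption.
SUPPORTED_LANGUAGE_HINTS = {"zh", "en", "ja"}


def effective_language_hint(language_hints):
    """Return the hint the service would use, or None for auto-detection."""
    if not language_hints:
        # Parameter omitted or empty: the model auto-detects the language.
        return None
    first = language_hints[0]  # additional values are ignored
    if first not in SUPPORTED_LANGUAGE_HINTS:
        raise ValueError(f"unsupported language hint: {first!r}")
    return first
```

For example, passing ["en", "zh"] resolves to "en", so listing a second language has no effect on recognition.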
semantic_punctuation_enabled boolean (Optional)
Whether to enable semantic punctuation.
Default: false.
Semantic punctuation provides higher accuracy and is suitable for meeting transcription. VAD (Voice Activity Detection)-based segmentation has lower latency and is suitable for conversational scenarios.
max_sentence_silence integer (Optional)
The VAD silence threshold, in milliseconds. When the silence after a speech segment exceeds this threshold, the system marks the sentence as ended.
Default: 1300.
Valid range: [200, 6000].
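The constraints documented so far can be checked on the client before the command is sent. This is a minimal sketch under stated assumptions: check_parameters is a hypothetical helper, it covers only format, sample_rate, and max_sentence_silence, and the service remains the authority on what it accepts.

```python
VALID_FORMATS = {"pcm", "wav", "mp3", "opus", "speex", "aac", "amr"}


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def check_parameters(parameters):
    """Return a list of problems found in a parameters dict (empty if none)."""
    problems = []

    if parameters.get("format") not in VALID_FORMATS:
        problems.append("format must be one of " + ", ".join(sorted(VALID_FORMATS)))

    sample_rate = parameters.get("sample_rate")
    if not _is_int(sample_rate) or sample_rate <= 0:
        problems.append("sample_rate must be a positive integer, in Hz")

    # Optional; the documented default is 1300 ms.
    silence = parameters.get("max_sentence_silence", 1300)
    if not _is_int(silence) or not 200 <= silence <= 6000:
        problems.append("max_sentence_silence must be an integer in [200, 6000]")

    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every invalid field at once.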
multi_threshold_mode_enabled boolean (Optional)
Whether to enable multi-threshold mode. When enabled, this mode prevents VAD from producing overly long segments.
Default: false.
heartbeat boolean (Optional)
Whether to enable heartbeat packets.
Default: false.
- true: Keeps the connection alive even when the client continuously sends silent audio.
- false (default): The connection times out and disconnects after 60 seconds, even when the client continuously sends silent audio.
speech_noise_threshold float (Optional)
Important
Only Fun-ASR supports this parameter.
The speech-noise discrimination threshold. Adjusts VAD sensitivity.
Valid range: [-1.0, 1.0].
Behavior:
- Values closer to -1 lower the noise threshold. Noise is more likely to be classified as speech, which can cause additional noise to be transcribed.
- Values closer to +1 raise the noise threshold. Speech is more likely to be classified as noise, which can cause some speech to be filtered out.
This is an advanced parameter, and changes can significantly affect recognition quality. Recommendations:
|