Fun-ASR client events

Client events are WebSocket commands that the client sends to the Fun-ASR real-time speech recognition service: run-task starts a recognition task, and finish-task ends it. Each section below describes the message structure and field semantics of one event.

User guide: For model details and selection guidance, see Speech-to-text.

Event sequence: For the event interaction diagram, see WebSocket API.

run-task

Description: Starts a speech recognition task. Sets the model, audio format, sample rate, and other parameters.

When to send: Immediately after the WebSocket connection is established.

Response: The client can send audio only after the service returns a task-started event.

header object (Required)

Properties

action string (Required)

The command type. Fixed at run-task.

task_id string (Required)

A UUID-format task ID generated by the client. Subsequent events are correlated by this ID.

streaming string (Required)

Fixed at duplex.

{
    "header": {
        "action": "run-task",
        "task_id": "2bf83b9a-baeb-4fda-8d9a-xxxxxxxxxxxx",
        "streaming": "duplex"
    },
    "payload": {
        "task_group": "audio",
        "task": "asr",
        "function": "recognition",
        "model": "fun-asr-realtime",
        "parameters": {
            "format": "pcm",
            "sample_rate": 16000
        },
        "input": {}
    }
}
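
The message above can also be assembled in code. The following is a minimal Python sketch, not an official SDK call: the helper name build_run_task is hypothetical, and its default model and parameter values are simply copied from the sample message.

```python
import json
import uuid


def build_run_task(model="fun-asr-realtime", audio_format="pcm",
                   sample_rate=16000, **extra_parameters):
    """Return (task_id, message_text) for a run-task event.

    build_run_task is a hypothetical helper, not part of any SDK.
    """
    task_id = str(uuid.uuid4())  # client-generated UUID; reuse it in finish-task
    parameters = {"format": audio_format, "sample_rate": sample_rate}
    parameters.update(extra_parameters)  # e.g. vocabulary_id, language_hints
    message = {
        "header": {
            "action": "run-task",
            "task_id": task_id,
            "streaming": "duplex",
        },
        "payload": {
            "task_group": "audio",
            "task": "asr",
            "function": "recognition",
            "model": model,
            "parameters": parameters,
            "input": {},
        },
    }
    return task_id, json.dumps(message)
```

Returning the task_id alongside the serialized message keeps the ID available for the later finish-task event.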

payload object (Required)

Properties

task_group string (Required)

The task group. Fixed at audio.

task string (Required)

The task type. Fixed at asr.

function string (Required)

The function type. Fixed at recognition.

model string (Required)

The model name.

input object (Required)

Fixed at {}.

parameters object (Required)

Speech recognition parameters.

Properties

format string (Required)

The audio format.

Valid values:

  • pcm

  • wav

  • mp3

  • opus

  • speex

  • aac

  • amr

sample_rate integer (Required)

The sample rate, in Hz.

Valid values:

  • 8 kHz models support only 8000 Hz; all other models support any sample rate.

vocabulary_id string (Optional)

The custom vocabulary (hotword) list ID.

language_hints array[string] (Optional)

The spoken language of the audio. No default; the model auto-detects the language when this parameter is omitted.

Only the first value in the array is used; additional values are ignored.

Valid values:

  • zh: Chinese

  • en: English

  • ja: Japanese
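
Because only the first array entry is used, a client can simply send a single-element list, or omit the field to rely on auto-detection. A minimal Python sketch, assuming a hypothetical helper named language_hints_parameters; the accepted codes are the three listed above.

```python
SUPPORTED_LANGUAGE_HINTS = {"zh", "en", "ja"}  # values listed in this section


def language_hints_parameters(language=None):
    """Return the payload.parameters fragment for language_hints (hypothetical helper)."""
    if language is None:
        return {}  # omit the field so the model auto-detects the language
    if language not in SUPPORTED_LANGUAGE_HINTS:
        raise ValueError(f"unsupported language hint: {language!r}")
    return {"language_hints": [language]}  # one entry; additional entries would be ignored
```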

semantic_punctuation_enabled boolean (Optional)

Whether to enable semantic punctuation.

Default: false.

  • true: Enables semantic punctuation and disables VAD-based segmentation.

  • false (default): Enables VAD-based segmentation and disables semantic punctuation.

Semantic punctuation provides higher accuracy and suits meeting transcription. Segmentation based on voice activity detection (VAD) has lower latency and suits conversational scenarios.

max_sentence_silence integer (Optional)

Important
  • Effective only when semantic_punctuation_enabled is false.

The VAD silence threshold, in milliseconds. When the silence after a speech segment exceeds this threshold, the system marks the sentence as ended.

Default: 1300.

Valid range: [200, 6000].

multi_threshold_mode_enabled boolean (Optional)

Important
  • Effective only when semantic_punctuation_enabled is false.

Whether to enable multi-threshold mode. When enabled, this mode prevents VAD from producing overly long segments.

Default: false.
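
The dependency between semantic_punctuation_enabled and the two VAD-only parameters can be checked on the client before run-task is sent. The following Python sketch is illustrative only: segmentation_parameters is a hypothetical helper, and raising an error for VAD settings that would have no effect is a design choice of this sketch, not documented service behavior.

```python
def segmentation_parameters(semantic_punctuation=False,
                            max_sentence_silence=None,
                            multi_threshold_mode=None):
    """Build the segmentation-related part of payload.parameters."""
    params = {"semantic_punctuation_enabled": semantic_punctuation}
    vad_only = {
        "max_sentence_silence": max_sentence_silence,
        "multi_threshold_mode_enabled": multi_threshold_mode,
    }
    for name, value in vad_only.items():
        if value is None:
            continue  # not set by the caller; the service default applies
        if semantic_punctuation:
            # These settings take effect only with VAD-based segmentation.
            raise ValueError(f"{name} has no effect when semantic punctuation is enabled")
        params[name] = value
    silence = params.get("max_sentence_silence")
    if silence is not None and not 200 <= silence <= 6000:
        raise ValueError("max_sentence_silence must be within [200, 6000] ms")
    return params
```

Failing early on the client makes an ineffective setting visible instead of leaving it silently ignored.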

heartbeat boolean (Optional)

Whether to enable heartbeat packets.

Default: false.

  • true: Keeps the connection alive even when the client continuously sends silent audio.

  • false (default): The connection times out and disconnects after 60 seconds, even when the client continuously sends silent audio.

speech_noise_threshold float (Optional)

Important
  • Only Fun-ASR supports this parameter.

The speech-noise discrimination threshold. Adjusts VAD sensitivity.

Valid range: [-1.0, 1.0].

Behavior:

  • Values closer to -1 lower the noise threshold. Noise is more likely to be classified as speech, which can cause additional noise to be transcribed.

  • Values closer to +1 raise the noise threshold. Speech is more likely to be classified as noise, which can cause some speech to be filtered out.

This is an advanced parameter, and changes can significantly affect recognition quality. Recommendations:

  • Test thoroughly to verify the effect before applying changes.

  • Adjust in small increments based on the actual audio environment (suggested step: 0.1).
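
The small-increment recommendation can be sketched as a helper that moves the threshold in steps of 0.1 and clamps the result to the valid range. The function name step_speech_noise_threshold is hypothetical, and the starting value must come from the caller because this section does not state a default.

```python
def step_speech_noise_threshold(current, steps, step=0.1):
    """Move the threshold by `steps` increments and clamp it to [-1.0, 1.0].

    Positive steps raise the noise threshold (more audio is treated as noise);
    negative steps lower it (more audio is treated as speech).
    """
    value = round(current + steps * step, 2)  # rounding avoids floating-point drift
    return max(-1.0, min(1.0, value))
```

For example, step_speech_noise_threshold(0.2, -1) moves one increment toward treating more audio as speech.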

finish-task

Description: Notifies the service that all audio has been sent and requests that the service finish the task.

When to send: After all audio data has been sent.

Response: The service returns a task-finished event.

header object (Required)

Properties

action string (Required)

The command type. Fixed at finish-task.

task_id string (Required)

A UUID-format task ID generated by the client. Must match the task_id used in the corresponding run-task event.

streaming string (Required)

Fixed at duplex.

{
    "header": {
        "action": "finish-task",
        "task_id": "2bf83b9a-baeb-4fda-8d9a-xxxxxxxxxxxx",
        "streaming": "duplex"
    },
    "payload": {
        "input": {}
    }
}
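
A matching builder for this message is short. The sketch below assumes a hypothetical helper named build_finish_task that receives the task_id generated for the corresponding run-task event.

```python
import json


def build_finish_task(task_id):
    """Return the finish-task message text for an existing task (hypothetical helper)."""
    message = {
        "header": {
            "action": "finish-task",
            "task_id": task_id,  # must equal the task_id sent in run-task
            "streaming": "duplex",
        },
        "payload": {"input": {}},
    }
    return json.dumps(message)
```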

payload object (Required)

Properties

input object (Required)

Fixed at {}.
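
Taken together, the two client events bracket the audio stream: run-task, wait for task-started, audio, finish-task, wait for task-finished. The Python sketch below illustrates that ordering under several assumptions that this page does not confirm: send and recv are hypothetical callables wrapping an already-open WebSocket connection, server events are assumed to be JSON objects that carry their name in header.event, and audio is assumed to be sent as raw bytes. Error handling (for example, a failure event or a closed connection) is omitted.

```python
import json
import uuid


def run_session(send, recv, audio_chunks, parameters):
    """Drive one recognition task over an already-open connection.

    send and recv are hypothetical callables supplied by the caller,
    for example thin wrappers around a WebSocket library.
    """
    task_id = str(uuid.uuid4())
    header = {"task_id": task_id, "streaming": "duplex"}
    send(json.dumps({
        "header": {**header, "action": "run-task"},
        "payload": {"task_group": "audio", "task": "asr",
                    "function": "recognition", "model": "fun-asr-realtime",
                    "parameters": parameters, "input": {}},
    }))

    def wait_for(event_name):
        # Assumption: server events are JSON with the event name in header.event.
        received = []
        while True:
            event = json.loads(recv())
            received.append(event)
            if event.get("header", {}).get("event") == event_name:
                return received

    wait_for("task-started")          # audio must not be sent before this event
    for chunk in audio_chunks:
        send(chunk)                   # assumption: raw audio bytes per frame
    send(json.dumps({
        "header": {**header, "action": "finish-task"},
        "payload": {"input": {}},
    }))
    return wait_for("task-finished")  # every event received up to task-finished
```

Passing the transport in as two callables keeps the sketch independent of any particular WebSocket library and lets the ordering be exercised with a fake connection.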