Server events of the Realtime API (Alibaba Cloud Model Studio)

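This topic describes the server events of the Qwen-Omni-Realtime API.

Related documentation: Real-time multimodal.

error

An error message returned by the server.

event_id string

The unique identifier of this event.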

{
  "event_id": "event_RoUu4T8yExPMI37GKwaOC",
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "code": "invalid_value",
    "message": "Invalid modalities: ['audio']. Supported combinations are: ['text'] and ['audio', 'text'].",
    "param": "session.modalities"
  }
}

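type string

The event type. Always error.

error object

Details of the error.

Properties

type string

The error type.

code string

The error code.

message string

The error message.

param string

The parameter related to the error, such as session.modalities.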
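
As an illustration of the fields above, the following minimal sketch turns an error event into a one-line log message. The helper name describe_error is hypothetical, and treating param as optional is an assumption rather than documented behavior.

```python
import json


def describe_error(raw: str) -> str:
    """Return a one-line summary of an error event, or '' for any other event."""
    event = json.loads(raw)
    if event.get("type") != "error":
        return ""
    err = event.get("error", {})
    # Assumption: param may be missing when the error is not tied to a parameter.
    param = err.get("param") or "n/a"
    return f"[{err.get('type')}/{err.get('code')}] {err.get('message')} (param: {param})"


sample = json.dumps({
    "event_id": "event_RoUu4T8yExPMI37GKwaOC",
    "type": "error",
    "error": {
        "type": "invalid_request_error",
        "code": "invalid_value",
        "message": "Invalid modalities: ['audio'].",
        "param": "session.modalities",
    },
})
summary = describe_error(sample)
print(summary)
```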

session.created

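The first event the server returns after the client connects. It contains the default configuration for this connection.

event_id string

The unique identifier of this event.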

{
    "event_id": "event_RdvlSpbBb2ssyBjYrDHjt",
    "type": "session.created",
    "session": {
        "object": "realtime.session",
        "model": "qwen3-omni-flash-realtime",
        "modalities": [
            "text",
            "audio"
        ],
        "voice": "Cherry",
        "input_audio_format": "pcm16",
        "output_audio_format": "pcm24",
        "input_audio_transcription": {
            "model": "gummy-realtime-v1"
        },
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.5,
            "prefix_padding_ms": 300,
            "silence_duration_ms": 800,
            "create_response": true,
            "interrupt_response": true
        },
        "tools": [],
        "tool_choice": "auto",
        "temperature": 0.8,
        "id": "sess_Ov7GOXoNXhNjlxXtOGKQS"
    }
}

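type string

The event type. Always session.created.

session object

The session configuration.

Properties

object string

Always realtime.session.

model string

The model in use.

modalities array

The output modalities of the model.

voice string

The voice used for the audio the model generates.

input_audio_format string

The format of the input audio. Always pcm16.

output_audio_format string

The format of the audio the model outputs:

Qwen3-Omni-Flash-Realtime: only pcm24 is supported.
Qwen-Omni-Turbo-Realtime: only pcm16 is supported.

input_audio_transcription object

The speech transcription configuration.

Properties

model string

The speech transcription model. Always gummy-realtime-v1.

turn_detection object

The voice activity detection (VAD) configuration.

Properties

type string

The server-side VAD type. Always server_vad.

threshold float

The VAD detection threshold.

silence_duration_ms integer

The duration of silence, in milliseconds, used to detect the end of speech.

temperature float

The temperature parameter of the model.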
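
The model-specific output format rule above can be checked on the client when session.created arrives. This is a sketch only: the function name is hypothetical, and the lowercase model ID for Qwen-Omni-Turbo-Realtime is an assumption modeled on the qwen3-omni-flash-realtime ID shown in the example.

```python
# Expected output format per model-ID prefix.
# Assumption: the Turbo model ID follows the same lowercase pattern as the Flash one.
EXPECTED_OUTPUT_FORMAT = {
    "qwen3-omni-flash-realtime": "pcm24",
    "qwen-omni-turbo-realtime": "pcm16",
}


def output_format_matches(session: dict) -> bool:
    """Check that output_audio_format agrees with the model named in the session."""
    model = session.get("model", "")
    for prefix, fmt in EXPECTED_OUTPUT_FORMAT.items():
        if model.startswith(prefix):
            return session.get("output_audio_format") == fmt
    return True  # Unknown model: nothing to compare against.


session = {"model": "qwen3-omni-flash-realtime", "output_audio_format": "pcm24"}
print(output_format_matches(session))
```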

session.updated

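After receiving a session.update request from the client, the server returns this event if the request is processed successfully. If an error occurs, the server returns an error event instead.

event_id string

The unique identifier of this event.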

{
    "event_id": "event_X1HsXS4b4uptp6yo1LgKd",
    "type": "session.updated",
    "session": {
        "id": "sess_Aih6vAcY5Ddt6jwFx1tCa",
        "object": "realtime.session",
        "model": "qwen3-omni-flash-realtime",
        "modalities": [
            "text",
            "audio"
        ],
        "instructions": "你是个人助理小云，请你准确且友好地解答用户的问题，始终以乐于助人的态度回应。",
        "voice": "Cherry",
        "input_audio_format": "pcm16",
        "output_audio_format": "pcm24",
        "input_audio_transcription": {
            "model": "gummy-realtime-v1"
        },
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.1,
            "prefix_padding_ms": 500,
            "silence_duration_ms": 900,
            "create_response": true,
            "interrupt_response": true
        },
        "temperature": 0.8,
        "max_response_output_token": "inf",
        "max_tokens": 16384,
        "repetition_penalty": 1.05,
        "presence_penalty": 0.0,
        "top_k": 50,
        "top_p": 1.0,
        "seed":-1
    }
}

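type string

The event type. Always session.updated.

session object

The session configuration.

Properties

temperature float

The temperature parameter of the model.

modalities array

The output modalities of the model.

voice string

The voice used for the audio the model generates.

instructions string

The goal and role of the model.

input_audio_format string

The format of the input audio. Always pcm16.

output_audio_format string

The format of the output audio: pcm24 for Qwen3-Omni-Flash-Realtime, pcm16 for Qwen-Omni-Turbo-Realtime.

input_audio_transcription object

The speech transcription configuration.

Properties

model string

The speech transcription model. Always gummy-realtime-v1.

turn_detection object

The voice activity detection (VAD) configuration.

Properties

type string

The server-side VAD type. Always server_vad.

threshold float

The VAD detection threshold.

silence_duration_ms integer

The duration of silence, in milliseconds, used to detect the end of speech.

top_p float

The probability threshold for nucleus sampling.

top_k integer

The size of the candidate set for sampling during generation.

max_tokens integer

The maximum number of tokens the model returns for this request.

repetition_penalty float

Controls repetition within consecutive sequences during generation.

presence_penalty float

Controls how much the model repeats content in its output.

seed integer

The random seed, which determines how consistent the results are across requests.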
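
Because session.updated echoes the resulting configuration, a client can compare it with what it asked for. The sketch below assumes the client keeps the session payload it sent in session.update; the function name and the sample values are illustrative only.

```python
def unapplied_fields(requested: dict, applied: dict, path: str = "") -> list:
    """List dotted paths whose requested value differs from the session.updated value."""
    diffs = []
    for key, want in requested.items():
        here = f"{path}.{key}" if path else key
        got = applied.get(key)
        if isinstance(want, dict) and isinstance(got, dict):
            # Recurse into nested objects such as turn_detection.
            diffs.extend(unapplied_fields(want, got, here))
        elif want != got:
            diffs.append(here)
    return diffs


# Hypothetical request and response payloads.
requested = {"voice": "Cherry",
             "turn_detection": {"threshold": 0.1, "silence_duration_ms": 900}}
applied = {"voice": "Cherry",
           "turn_detection": {"type": "server_vad", "threshold": 0.1,
                              "silence_duration_ms": 800}}
diffs = unapplied_fields(requested, applied)
print(diffs)
```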

input_audio_buffer.speech_started

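In VAD mode, the server returns this event when it detects the start of speech in the audio buffer.

Until the server has detected speech, this event may be triggered each time audio is appended to the buffer.

event_id string

The unique identifier of this event.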

{
    "event_id": "event_Pvp8nEhsQuGCQbFJ9x58n",
    "type": "input_audio_buffer.speech_started",
    "audio_start_ms": 3647,
    "item_id": "item_YbAiGvK2H7YaS34o4R6Ba"
}

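type string

The event type. Always input_audio_buffer.speech_started.

audio_start_ms integer

The number of milliseconds from when audio was first written to the buffer to when speech was first detected.

item_id string

The ID of the user message item that will be created when speech stops.

A user message item appends the user's input to the conversation history so that the model can use it for later inference and generation.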

input_audio_buffer.speech_stopped

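In VAD mode, the server returns this event when it detects the end of speech in the audio buffer.

The server also returns a conversation.item.created event to create the corresponding user message item.

event_id string

The unique identifier of this event.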

{
    "event_id": "event_UhQiqNVRsgUiq4KUS5Xb5",
    "type": "input_audio_buffer.speech_stopped",
    "audio_end_ms": 4453,
    "item_id": "item_YbAiGvK2H7YaS34o4R6Ba"
}

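type string

The event type. Always input_audio_buffer.speech_stopped.

audio_end_ms integer

The number of milliseconds from the start of the session to the moment speech stopped.

item_id string

The ID of the user message item that will be created.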
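
The speech_started and speech_stopped events share an item_id, so a client can pair them to estimate how long the user spoke. The sketch below assumes that audio_start_ms and audio_end_ms are measured on the same time base, which the field descriptions do not state outright; the class name is hypothetical.

```python
class SpeechTimer:
    """Pair speech_started and speech_stopped events by item_id."""

    def __init__(self):
        self._starts = {}  # item_id -> audio_start_ms

    def handle(self, event: dict):
        """Return the speech duration in ms on speech_stopped, otherwise None."""
        kind = event.get("type")
        if kind == "input_audio_buffer.speech_started":
            self._starts[event["item_id"]] = event["audio_start_ms"]
        elif kind == "input_audio_buffer.speech_stopped":
            start = self._starts.pop(event["item_id"], None)
            if start is not None:
                # Assumption: both timestamps use the same origin.
                return event["audio_end_ms"] - start
        return None


timer = SpeechTimer()
timer.handle({"type": "input_audio_buffer.speech_started",
              "audio_start_ms": 3647, "item_id": "item_YbAiGvK2H7YaS34o4R6Ba"})
duration_ms = timer.handle({"type": "input_audio_buffer.speech_stopped",
                            "audio_end_ms": 4453, "item_id": "item_YbAiGvK2H7YaS34o4R6Ba"})
print(duration_ms)
```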

input_audio_buffer.committed

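The server returns this event when the input audio buffer is committed.

In VAD mode, the server automatically commits the audio buffer and returns this event when it detects that the user has stopped speaking.
In Manual mode, the server returns this event after the client sends an input_audio_buffer.commit event.

event_id string

The unique identifier of this event.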

{
    "event_id": "event_Iy6sUzL1nmdFgshFYxJEz",
    "type": "input_audio_buffer.committed",
    "item_id": "item_YbAiGvK2H7YaS34o4R6Ba"
}

type string

The event type. Always input_audio_buffer.committed.

item_id string

The ID of the user message item that will be created.

input_audio_buffer.cleared

The server returns this event after the client sends an input_audio_buffer.clear event.

event_id string

The unique identifier of this event.

{
  "event_id": "event_RoUu4T8yExPMI37GKwaOC",
  "type": "input_audio_buffer.cleared"
}

type string

The event type. Always input_audio_buffer.cleared.

conversation.item.created

The server returns this event when a conversation item is created.

event_id string

The unique identifier of this event.

{
    "event_id": "event_JEfkrr9gO3Ny7Xcv9bGVd",
    "type": "conversation.item.created",
    "item": {
        "id": "item_YbAiGvK2H7YaS34o4R6Ba",
        "object": "realtime.item",
        "type": "message",
        "status": "in_progress",
        "role": "assistant",
        "content": [
            {
                "type": "input_audio"
            }
        ]
    }
}

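type string

The event type. Always conversation.item.created.

item object

The item to add to the conversation.

Properties

id string

The unique ID of the conversation item.

object string

Always realtime.item.

status string

The status of the conversation item.

role string

The role of the message.

content array

The content of the message.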

conversation.item.input_audio_transcription.completed

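This event carries the transcription of the user audio written to the buffer. Transcription is handled by a separate speech recognition model (currently fixed to gummy-realtime-v1).

The text produced by the speech recognition model may differ from how the Qwen-Omni-Realtime model understood the audio, and is for reference only.

event_id string

The unique identifier of this event.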

{
    "event_id": "event_FrrZcxiDfTB9LD9p4pVng",
    "type": "conversation.item.input_audio_transcription.completed",
    "item_id": "item_YbAiGvK2H7YaS34o4R6Ba",
    "content_index": 0,
    "transcript": "喂，你好。"
}

type string

The event type. Always conversation.item.input_audio_transcription.completed.

item_id string

The ID of the user message item.

content_index integer

Currently always 0.

transcript string

The transcribed text.

conversation.item.input_audio_transcription.failed

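When input audio transcription is enabled and transcription of the user audio fails, the server returns this event. It is separate from the error event so that clients can identify it easily.

event_id string

The unique identifier of this event.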

{
  "type": "conversation.item.input_audio_transcription.failed",
  "item_id": "<item_id>",
  "content_index": 0,
  "error": {
    "code": "<code>",
    "message": "<message>",
    "param": "<param>"
  }
}

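type string

The event type. Always conversation.item.input_audio_transcription.failed.

item_id string

The ID of the user message item.

content_index integer

Currently always 0.

error object

The error information.

Properties

code string

The error code.

message string

The error message.

param string

The parameter related to the error.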
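
Since transcription success and failure arrive as separate events keyed by item_id, a client can sort them into two maps. The following is a rough sketch; the function name and the sample error values are hypothetical.

```python
COMPLETED = "conversation.item.input_audio_transcription.completed"
FAILED = "conversation.item.input_audio_transcription.failed"


def collect_transcriptions(events: list) -> tuple:
    """Split transcription events into (transcripts, failures), keyed by item_id."""
    transcripts, failures = {}, {}
    for event in events:
        if event.get("type") == COMPLETED:
            transcripts[event["item_id"]] = event["transcript"]
        elif event.get("type") == FAILED:
            failures[event["item_id"]] = event.get("error", {}).get("message", "")
    return transcripts, failures


events = [
    {"type": COMPLETED, "item_id": "item_a", "content_index": 0, "transcript": "hello"},
    # Hypothetical failure payload; the real code and message values are not documented here.
    {"type": FAILED, "item_id": "item_b", "content_index": 0,
     "error": {"code": "example_code", "message": "example message", "param": ""}},
]
transcripts, failures = collect_transcriptions(events)
print(transcripts, failures)
```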

response.created

The server returns this event when it creates a new model response.

event_id string

The unique identifier of this event.

{
    "event_id": "event_XuDavMzQN3KKepqGu3KRh",
    "type": "response.created",
    "response": {
        "id": "resp_HaVOPdbmX6vifiV5pAfJY",
        "object": "realtime.response",
        "conversation_id": "conv_FjJaccpnvwHNo9cPVuzGc",
        "status": "in_progress",
        "modalities": [
            "text",
            "audio"
        ],
        "voice": "Cherry",
        "output_audio_format": "pcm24",
        "output": []
    }
}

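type string

The event type. Always response.created.

response object

The response object.

Properties

id string

The unique ID of the response.

conversation_id string

The unique ID of the current conversation.

object string

The object type. Always realtime.response for this event.

status string

The status of the response. One of completed, failed, in_progress, or incomplete.

modalities array

The modalities of the response.

voice string

The voice used for the audio the model generates.

output array

Currently empty for this event.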

response.done

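The server returns this event when response generation is complete. The response object in this event contains all output items except the raw audio data.

event_id string

The unique identifier of this event.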

{
    "event_id": "event_CSaxRRYLvbrfexDXAEuDG",
    "type": "response.done",
    "response": {
        "id": "resp_HaVOPdbmX6vifiV5pAfJY",
        "object": "realtime.response",
        "conversation_id": "conv_FjJaccpnvwHNo9cPVuzGc",
        "status": "completed",
        "modalities": [
            "text",
            "audio"
        ],
        "voice": "Cherry",
        "output_audio_format": "pcm24",
        "output": [
            {
                "id": "item_Ls6MtCUWO7LM4E59QziNv",
                "object": "realtime.item",
                "type": "message",
                "status": "completed",
                "role": "assistant",
                "content": [
                    {
                        "type": "audio",
                        "transcript": "你好呀！有什么我可以帮你的吗？"
                    }
                ]
            }
        ],
        "usage": {
            "total_tokens": 377,
            "input_tokens": 336,
            "output_tokens": 41,
            "input_tokens_details": {
                "text_tokens": 228,
                "audio_tokens": 108
            },
            "output_tokens_details": {
                "text_tokens": 9,
                "audio_tokens": 32
            }
        }
    }
}

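type string

The event type. Always response.done.

response object

The response object.

Properties

id string

The unique ID of the response.

conversation_id string

The unique ID of the current conversation.

object string

The object type. Always realtime.response for this event.

status string

The status of the response.

modalities array

The modalities of the response.

voice string

The voice used for the audio the model generates.

output array

The output items of the response.

Properties

id string

The ID of the output item.

type string

The type of the output item. Currently always message.

object string

The object type of the output item. Currently always realtime.item.

status string

The status of the output item.

role string

The role of the output item.

content array

The content of the output item.

Properties

type string

The type of the output content: text when the output is text only, audio when the output includes audio.

text string

The output text.

transcript string

The text transcribed from the audio.

usage object

The token usage of this response.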
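
The usage object in the example above breaks tokens down by direction and by modality. A small sketch of regrouping that breakdown per modality follows; the function name is hypothetical, and missing detail fields are treated as zero by assumption.

```python
def summarize_usage(usage: dict) -> dict:
    """Regroup a response.done usage object into per-modality token totals."""
    inp = usage.get("input_tokens_details", {})
    out = usage.get("output_tokens_details", {})
    return {
        "text_tokens": inp.get("text_tokens", 0) + out.get("text_tokens", 0),
        "audio_tokens": inp.get("audio_tokens", 0) + out.get("audio_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }


# Values copied from the response.done example.
usage = {
    "total_tokens": 377,
    "input_tokens": 336,
    "output_tokens": 41,
    "input_tokens_details": {"text_tokens": 228, "audio_tokens": 108},
    "output_tokens_details": {"text_tokens": 9, "audio_tokens": 32},
}
usage_summary = summarize_usage(usage)
print(usage_summary)
```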

response.text.delta

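The server returns this event when the output modalities contain only text and the model incrementally generates new text.

event_id string

The unique identifier of this event.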

{
    "delta": "喂",
    "event_id": "event_TH49MauuPmRo1RGaMSlP7",
    "type": "response.text.delta",
    "response_id": "resp_PrRSvPVpnCExdUOGHHLuP",
    "item_id": "item_L8IRm9kRXFpxoOjDqDC96",
    "output_index": 0,
    "content_index": 0
}

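type string

The event type. Always response.text.delta.

delta string

The incremental text.

response_id string

The ID of the response.

item_id string

The ID of the message item. Use it to associate events that belong to the same message item.

output_index integer

The index of the output item in the response. Currently always 0.

content_index integer

The index of the content part within the output item. Currently always 0.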

response.text.done

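The server returns this event when the output modalities contain only text and the model finishes generating text.

This event is also returned when the response is interrupted, incomplete, or cancelled.

event_id string

The unique identifier of this event.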

{
  "event_id": "event_B1lIeE2Nac33zn5V7h2mm",
  "type": "response.text.done",
  "response_id": "resp_B1lIdtjF4Noqpn5NOjznj",
  "item_id": "item_B1lIdJsAJlJiFs8ztWpJt",
  "output_index": 0,
  "content_index": 0,
  "text": "How can I assist you today?"
}

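type string

The event type. Always response.text.done.

response_id string

The ID of the response.

item_id string

The ID of the message item.

output_index integer

The index of the output item in the response.

content_index integer

The index of the content part within the output item.

text string

The complete text output by the model.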
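
The delta and done events for text can be combined on the client: deltas are appended as they arrive, and the text field of response.text.done supplies the final value. A minimal sketch, with a hypothetical class name and made-up delta boundaries:

```python
class TextAccumulator:
    """Accumulate response.text.delta events per item_id."""

    def __init__(self):
        self._parts = {}  # item_id -> list of delta strings
        self._final = {}  # item_id -> text from response.text.done

    def handle(self, event: dict) -> None:
        kind = event.get("type")
        if kind == "response.text.delta":
            self._parts.setdefault(event["item_id"], []).append(event["delta"])
        elif kind == "response.text.done":
            self._final[event["item_id"]] = event["text"]

    def text(self, item_id: str) -> str:
        """Prefer the final text; fall back to the deltas received so far."""
        if item_id in self._final:
            return self._final[item_id]
        return "".join(self._parts.get(item_id, []))


acc = TextAccumulator()
acc.handle({"type": "response.text.delta", "item_id": "item_1", "delta": "How can I "})
acc.handle({"type": "response.text.delta", "item_id": "item_1", "delta": "assist you today?"})
partial = acc.text("item_1")
acc.handle({"type": "response.text.done", "item_id": "item_1",
            "text": "How can I assist you today?"})
final = acc.text("item_1")
print(partial == final)
```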

response.audio.delta

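The server returns this event when the output modalities include audio and the model incrementally generates new audio data.

event_id string

The unique identifier of this event.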

{
  "event_id": "event_B1osWMZBtrEQbiIwW0qHQ",
  "type": "response.audio.delta",
  "response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
  "item_id": "item_OFaPGtzfWCPyGzxnuEX9i",
  "output_index": 0,
  "content_index": 0,
  "delta": "{base64 audio}"
}

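type string

The event type. Always response.audio.delta.

response_id string

The ID of the response.

item_id string

The ID of the message item.

output_index integer

The index of the output item in the response.

content_index integer

The index of the content part within the output item.

delta string

The incremental audio data output by the model, Base64-encoded.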
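
Because delta is Base64-encoded, the client has to decode each chunk before playing or storing it. The sketch below only decodes and concatenates the bytes; it makes no assumption about sample rate or bit depth, and the function name is hypothetical.

```python
import base64


def append_audio(buffer: bytearray, event: dict) -> int:
    """Decode a response.audio.delta event into buffer; return the bytes added."""
    if event.get("type") != "response.audio.delta":
        return 0
    chunk = base64.b64decode(event["delta"])
    buffer.extend(chunk)
    return len(chunk)


audio = bytearray()
# Four placeholder bytes stand in for real audio data.
fake_delta = base64.b64encode(b"\x00\x01\x02\x03").decode("ascii")
added = append_audio(audio, {"type": "response.audio.delta", "delta": fake_delta})
print(added, len(audio))
```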

response.audio.done

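The server returns this event when the output modalities include audio and the model finishes generating audio data.

This event is also returned when the response is interrupted, incomplete, or cancelled.

event_id string

The unique identifier of this event.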

{
    "event_id": "event_Le1TDl7VfyHQxl47DtGxI",
    "type": "response.audio.done",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
    "output_index": 0,
    "content_index": 0
}

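type string

The event type. Always response.audio.done.

response_id string

The ID of the response.

item_id string

The ID of the message item.

output_index integer

The index of the output item in the response.

content_index integer

The index of the content part within the output item.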

response.audio_transcript.delta

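The server returns this event when the output modalities include audio and the model incrementally generates the text that corresponds to the new audio.

event_id string

The unique identifier of this event.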

{
    "event_id": "event_BksW7fOwnyavZdDxIzZYM",
    "type": "response.audio_transcript.delta",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
    "output_index": 0,
    "content_index": 0,
    "delta": "有什么"
}

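type string

The event type. Always response.audio_transcript.delta.

response_id string

The ID of the response.

item_id string

The ID of the message item.

output_index integer

The index of the output item in the response.

content_index integer

The index of the content part within the output item.

delta string

The incremental text.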

response.audio_transcript.done

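The server returns this event when the output modalities include audio and the model has finished generating the transcript of the audio.

event_id string

The unique identifier of this event.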

{
    "event_id": "event_X49tL2WerT4WjxcmH16lS",
    "type": "response.audio_transcript.done",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
    "output_index": 0,
    "content_index": 0,
    "transcript": "你好呀！有什么我可以帮你的吗？"
}

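type string

The event type. Always response.audio_transcript.done.

response_id string

The ID of the response.

item_id string

The ID of the message item.

output_index integer

The index of the output item in the response.

content_index integer

The index of the content part within the output item.

transcript string

The complete text.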

response.output_item.added

The server returns this event when a new output item is created during response generation.

event_id string

The unique identifier of this event.

{
    "event_id": "event_DsCO341DEVtiATtCB6BUY",
    "type": "response.output_item.added",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "output_index": 0,
    "item": {
        "id": "item_Ls6MtCUWO7LM4E59QziNv",
        "object": "realtime.item",
        "type": "message",
        "status": "in_progress",
        "role": "assistant",
        "content": []
    }
}

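type string

The event type. Always response.output_item.added.

response_id string

The ID of the response.

output_index integer

The index of the output item in the response.

item object

The output item.

Properties

id string

The unique ID of the output item.

object string

Always realtime.item.

status string

The status of the output item.

role string

The role that sent the message.

content array

The content of the message.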

response.output_item.done

The server returns this event when a new output item is complete.

event_id string

The unique identifier of this event.

{
    "event_id": "event_MEu5nlLw1LsOguHiehIP8",
    "type": "response.output_item.done",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "output_index": 0,
    "item": {
        "id": "item_Ls6MtCUWO7LM4E59QziNv",
        "object": "realtime.item",
        "type": "message",
        "status": "completed",
        "role": "assistant",
        "content": [
            {
                "type": "audio",
                "text": "你好呀！有什么我可以帮你的吗？"
            }
        ]
    }
}

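type string

The event type. Always response.output_item.done.

response_id string

The ID of the response.

output_index integer

The index of the output item in the response.

item object

The output item.

Properties

id string

The unique ID of the output item.

object string

Always realtime.item.

status string

The status of the output item.

role string

The role that sent the message.

content array

The content of the message.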
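
In the examples in this topic, an audio content part carries its text under text in response.output_item.done and under transcript in response.done. A client that reads either event may therefore want to check both keys. This is a defensive sketch, not documented behavior, and the function name is hypothetical.

```python
def item_text(item: dict) -> str:
    """Join the text of every content part of an output item."""
    pieces = []
    for part in item.get("content", []):
        # Check both keys, since the examples use "text" and "transcript" for audio parts.
        pieces.append(part.get("text") or part.get("transcript") or "")
    return "".join(pieces)


done_item = {"id": "item_1", "type": "message", "role": "assistant",
             "content": [{"type": "audio", "text": "Hello there."}]}
response_item = {"id": "item_1", "type": "message", "role": "assistant",
                 "content": [{"type": "audio", "transcript": "Hello there."}]}
print(item_text(done_item), item_text(response_item))
```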

response.content_part.added

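The server returns this event when a new content part is added to an assistant message item during response generation.

event_id string

The unique identifier of this event.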

{
    "event_id": "event_AVBOmrgY3C8bjlRajfSUT",
    "type": "response.content_part.added",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
    "output_index": 0,
    "content_index": 0,
    "part": {
        "type": "audio",
        "text": ""
    }
}

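type string

The event type. Always response.content_part.added.

response_id string

The ID of the response.

item_id string

The ID of the message item.

output_index integer

The index of the output item in the response. Currently always 0.

content_index integer

The index of the content part within the output item. Currently always 0.

part object

The content part that was added.

Properties

type string

The type of the content part.

text string

The text of the content part.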

response.content_part.done

The server returns this event when a content part of an assistant message item finishes streaming.

event_id string

The unique identifier of this event.

{
    "event_id": "event_Il8HD19v58Qr5IBkw7LtN",
    "type": "response.content_part.done",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
    "output_index": 0,
    "content_index": 0,
    "part": {
        "type": "audio",
        "text": "你好呀！有什么我可以帮你的吗？"
    }
}

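type string

The event type. Always response.content_part.done.

response_id string

The ID of the response.

item_id string

The ID of the message item.

output_index integer

The index of the output item in the response. Currently always 0.

content_index integer

The index of the content part within the item's content array. Currently always 0.

part object

The content part that was completed.

Properties

type string

The type of the content part.

text string

The text of the content part.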
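
Every server event in this topic carries a type field, so a client can route events by that value. The following is one possible sketch of such a router; the EventRouter class and its methods are hypothetical and are not part of any SDK.

```python
from collections import defaultdict


class EventRouter:
    """Route server events to handlers registered per event type."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event_type: str):
        """Decorator that registers a handler for one event type."""
        def register(fn):
            self._handlers[event_type].append(fn)
            return fn
        return register

    def dispatch(self, event: dict) -> int:
        """Call every handler for the event's type; return how many ran."""
        handlers = self._handlers.get(event.get("type"), [])
        for fn in handlers:
            fn(event)
        return len(handlers)


router = EventRouter()
seen = []


@router.on("response.done")
def on_done(event):
    seen.append(event["response"]["id"])


handled = router.dispatch({"type": "response.done",
                           "response": {"id": "resp_HaVOPdbmX6vifiV5pAfJY"}})
ignored = router.dispatch({"type": "input_audio_buffer.cleared"})
print(handled, ignored, seen)
```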