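| Parameter | Type | Description |
| --- | --- | --- |
| type | string | Always response.done. |
| response | object | The response object. |
| response.id | string | Unique ID of the response. |
| response.object | string | Object type. Always realtime.response for this event. |
| response.output | array | Output of the response. |
| response.usage | object | Billing information for this speech synthesis request. |
| response.usage.characters | integer | Number of billable characters for Qwen3-TTS Realtime. |
| response.usage.total_tokens | integer | Total length, in tokens, of the input and the output (synthesized audio) for Qwen-TTS Realtime. |
| response.usage.input_tokens | integer | Total length of the input, in tokens, for Qwen-TTS Realtime. |
| response.usage.output_tokens | integer | Total length of the output, in tokens, for Qwen-TTS Realtime. |
| response.usage.input_tokens_details | object | Breakdown of the input length, in tokens, for Qwen-TTS Realtime. |
| response.usage.input_tokens_details.text_tokens | integer | Total length of the input text, in tokens, for Qwen-TTS Realtime. |
| response.usage.output_tokens_details | object | Breakdown of the output length, in tokens, for Qwen-TTS Realtime. |
| response.usage.output_tokens_details.text_tokens | integer | Total length of the output text, in tokens, for Qwen-TTS Realtime. |
| response.usage.output_tokens_details.audio_tokens | integer | Total length of the output audio, in tokens, for Qwen-TTS Realtime. Audio is converted to tokens at a rate of 50 tokens per second of audio. Audio shorter than 1 second is counted as 50 tokens. |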

Qwen3-TTS Realtime
{
"event_id": "event_Aemy83XqHFFDDSeJIDn6N",
"type": "response.done",
"response": {
"id": "resp_LFeR42yXZ9SxUAeXjmyTz",
"object": "realtime.response",
"conversation_id": "",
"status": "completed",
"modalities": [
"text",
"audio"
],
"voice": "Cherry",
"output": [
{
"id": "item_Ae1lv2XmRljRSG96L8Zm1",
"object": "realtime.item",
"type": "message",
"status": "completed",
"role": "assistant",
"content": [
{
"type": "audio",
"transcript": ""
}
]
}
],
"usage": {
"characters": 25
}
}
}

Qwen-TTS Realtime
{
"event_id": "event_xxx",
"type": "response.done",
"response": {
"id": "resp_xxx",
"object": "realtime.response",
"conversation_id": "",
"status": "completed",
"modalities": [
"text",
"audio"
],
"voice": "Cherry",
"output": [
{
"id": "item_FIrYGaNVK3rbIZqeY4QjM",
"object": "realtime.item",
"type": "message",
"status": "completed",
"role": "assistant",
"content": [
{
"type": "audio",
"transcript": ""
}
]
}
],
"usage": {
"total_tokens": 67,
"input_tokens": 3,
"output_tokens": 64,
"input_tokens_details": {
"text_tokens": 3
},
"output_tokens_details": {
"text_tokens": 0,
"audio_tokens": 64
}
}
}
}
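
The two example events differ only in the shape of response.usage: Qwen3-TTS Realtime reports characters, while Qwen-TTS Realtime reports token counts. A client that handles both could branch on which fields are present. The following is a minimal sketch under that assumption; the function name summarize_usage and the keys of the returned dictionary are hypothetical and not part of the API.

```python
from typing import Any


def summarize_usage(event: dict[str, Any]) -> dict[str, Any]:
    """Flatten the usage block of a response.done event (hypothetical helper)."""
    if event.get("type") != "response.done":
        raise ValueError(f"expected a response.done event, got {event.get('type')!r}")

    usage = event.get("response", {}).get("usage", {})

    # Qwen3-TTS Realtime: usage carries a character count only.
    if "characters" in usage:
        return {"unit": "characters", "total": usage["characters"]}

    # Qwen-TTS Realtime: usage carries token counts plus per-direction details.
    output_details = usage.get("output_tokens_details", {})
    return {
        "unit": "tokens",
        "total": usage.get("total_tokens", 0),
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
        "audio_tokens": output_details.get("audio_tokens", 0),
    }


# Usage values copied from the Qwen3-TTS Realtime example event.
qwen3_event = {"type": "response.done", "response": {"usage": {"characters": 25}}}
print(summarize_usage(qwen3_event))  # {'unit': 'characters', 'total': 25}
```

Missing token fields default to 0 here so that a partially populated usage object does not raise; a stricter client might prefer to fail instead.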
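
The audio_tokens rule (50 tokens per second of audio, with audio shorter than 1 second counted as 50 tokens) can be sketched as simple arithmetic. The source does not say how fractional seconds above 1 second are rounded, so the code below assumes linear scaling rounded up; that rounding, and both function names, are assumptions rather than documented behavior.

```python
import math

TOKENS_PER_SECOND = 50  # rate stated in the audio_tokens description
MIN_AUDIO_TOKENS = 50   # audio shorter than 1 second is counted as 50 tokens


def estimate_audio_tokens(duration_seconds: float) -> int:
    """Estimate audio_tokens for a clip of the given length (hypothetical helper)."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Assumption: fractional seconds scale linearly and are rounded up.
    scaled = math.ceil(duration_seconds * TOKENS_PER_SECOND)
    return max(MIN_AUDIO_TOKENS, scaled)


def approximate_duration_seconds(audio_tokens: int) -> float:
    """Invert the rate. A value of 50 may correspond to any clip of up to 1 second."""
    return audio_tokens / TOKENS_PER_SECOND


print(estimate_audio_tokens(0.4))  # 50, because of the 1-second minimum
print(estimate_audio_tokens(2.0))  # 100
```

Under the same linear assumption, the audio_tokens value of 64 in the Qwen-TTS Realtime example would correspond to roughly 64 / 50 = 1.28 seconds of synthesized audio.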