qwen3-livetranslate-flash translates audio and video through the OpenAI-compatible chat completions endpoint. All requests are streamed.
Note: The DashScope interface is not supported.
Supported models
- qwen3-livetranslate-flash
- qwen3-livetranslate-flash-2025-12-01
Prerequisites
Before you begin, complete the following:
- Install the OpenAI SDK (for Python or Node.js)
Endpoints
| Region | SDK base_url | HTTP endpoint |
|---|---|---|
| Singapore | https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1 | POST https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1/chat/completions |
| Beijing | https://dashscope.aliyuncs.com/compatible-mode/v1 | POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions |
Model Studio has released a workspace-specific domain for the Singapore region: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com. The new dedicated domain delivers superior performance and higher stability for inference requests. We recommend migrating from https://dashscope-intl.aliyuncs.com to the new domain, although the existing domain remains fully functional.
{WorkspaceId} is your workspace ID, which you can find on the Workspace Details page in the Model Studio console.
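As a small illustration of the endpoint table, the following sketch builds the SDK base_url and the HTTP endpoint for a region. The function names and the region keys are hypothetical; only the URL patterns come from the table above.

```python
# Hypothetical helper: the URL patterns are copied from the endpoint table.
BASE_URLS = {
    "singapore": "https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1",
    "beijing": "https://dashscope.aliyuncs.com/compatible-mode/v1",
}


def base_url_for(region, workspace_id=None):
    """Return the SDK base_url for a region, filling in the workspace ID if needed."""
    template = BASE_URLS[region.lower()]
    if "{WorkspaceId}" in template:
        if not workspace_id:
            raise ValueError("The Singapore endpoint requires a workspace ID")
        return template.replace("{WorkspaceId}", workspace_id)
    return template


def http_endpoint_for(region, workspace_id=None):
    """Return the full chat completions URL used for raw HTTP requests."""
    return base_url_for(region, workspace_id) + "/chat/completions"
```

The result of base_url_for can be passed as base_url when constructing the OpenAI client.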
Quick start
The following examples translate an audio file and return both translated text and audio through streaming. Replace the base_url if you use the Beijing region.
Python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the Singapore region URL. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
    base_url="https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3-livetranslate-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
                        "format": "wav",
                    },
                }
            ],
        }
    ],
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    stream=True,
    stream_options={"include_usage": True},
    extra_body={"translation_options": {"source_lang": "zh", "target_lang": "en"}},
)

for chunk in completion:
    print(chunk)
Node.js
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.DASHSCOPE_API_KEY,
  // The following is the Singapore region URL. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
  baseURL: "https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1",
});

async function main() {
  const completion = await client.chat.completions.create({
    model: "qwen3-livetranslate-flash",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "input_audio",
            input_audio: {
              data: "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
              format: "wav",
            },
          },
        ],
      },
    ],
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" },
    stream: true,
    stream_options: { include_usage: true },
    translation_options: { source_lang: "zh", target_lang: "en" },
  });

  for await (const chunk of completion) {
    console.log(JSON.stringify(chunk));
  }
}

main();
curl
# The following is the Singapore region URL. Replace {WorkspaceId} with your actual workspace ID. URLs vary by region.
curl -X POST https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-livetranslate-flash",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "input_audio",
            "input_audio": {
              "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
              "format": "wav"
            }
          }
        ]
      }
    ],
    "modalities": ["text", "audio"],
    "audio": {
      "voice": "Cherry",
      "format": "wav"
    },
    "stream": true,
    "stream_options": {
      "include_usage": true
    },
    "translation_options": {
      "source_lang": "zh",
      "target_lang": "en"
    }
  }'
Video input
To translate video instead of audio, set the content type to video_url:
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "video_url",
                "video_url": {
                    "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
                },
            }
        ],
    },
]
All other parameters remain the same.
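Because the two input types differ only in the shape of the content item, a single helper can build either one. The following is a minimal sketch; build_messages and its kind argument are hypothetical names, not part of the API.

```python
def build_messages(url, kind="audio", audio_format="wav"):
    """Build the one-element messages list for an audio or video input."""
    if kind == "audio":
        item = {
            "type": "input_audio",
            "input_audio": {"data": url, "format": audio_format},
        }
    elif kind == "video":
        item = {"type": "video_url", "video_url": {"url": url}}
    else:
        raise ValueError(f"Unsupported input kind: {kind!r}")
    # The API accepts exactly one user message.
    return [{"role": "user", "content": [item]}]
```

The returned list can be passed directly as the messages argument in the quick start examples.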
Request body
Required parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | Model name. Valid values: qwen3-livetranslate-flash, qwen3-livetranslate-flash-2025-12-01. |
| messages | array | An array of messages. Only one user message is supported. |
| stream | boolean | Must be true. The default is false, but only streaming output is supported. |
| translation_options | object | Translation configuration. See Translation options. This is a non-standard OpenAI parameter. In the Python SDK, pass it inside extra_body. In Node.js or HTTP, pass it at the top level. |
Optional parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| modalities | array | ["text"] | Output modality. Set to ["text", "audio"] to receive both text and audio output, or ["text"] for text only. |
| audio | object | - | Output audio configuration. Required when modalities includes "audio". See Audio output options. |
| stream_options | object | - | Streaming configuration. See Stream options. |
| max_tokens | integer | Model maximum | The maximum number of tokens to generate. Generation stops at this limit or when complete. |
| seed | integer | - | Random seed for reproducibility. The same seed produces identical output for identical requests. Range: [0, 2^31-1]. |
Sampling parameters
For translation accuracy, keep these parameters at their default values.
| Parameter | Type | Default | Range | Notes |
|---|---|---|---|---|
| temperature | float | 0.000001 | [0, 2) | Controls output diversity. |
| top_p | float | 0.8 | (0, 1.0] | Nucleus sampling threshold. |
| presence_penalty | float | 0 | [-2.0, 2.0] | Reduces repetition when positive. |
| top_k | integer | 1 | >= 0 | Candidate set size. If the value is None or greater than 100, top_k is disabled and only top_p takes effect. Non-standard OpenAI parameter. Python SDK: use extra_body. |
| repetition_penalty | float | 1.05 | > 0 | Penalizes repeated sequences. Non-standard OpenAI parameter. Python SDK: use extra_body. |
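If you do override these values, a client-side range check can catch mistakes before a request is sent. The sketch below encodes the ranges from the table above; check_sampling is a hypothetical helper, and the service may apply additional validation of its own.

```python
def check_sampling(params):
    """Return a list of range violations for the sampling parameters in `params`."""
    # Each rule is (predicate, human-readable range), taken from the table above.
    rules = {
        "temperature": (lambda v: 0 <= v < 2, "[0, 2)"),
        "top_p": (lambda v: 0 < v <= 1.0, "(0, 1.0]"),
        "presence_penalty": (lambda v: -2.0 <= v <= 2.0, "[-2.0, 2.0]"),
        "top_k": (lambda v: v >= 0, ">= 0"),
        "repetition_penalty": (lambda v: v > 0, "> 0"),
    }
    problems = []
    for name, (is_valid, allowed) in rules.items():
        value = params.get(name)
        # None means "not set" (for top_k it disables the filter), so skip it.
        if value is not None and not is_valid(value):
            problems.append(f"{name}={value} is outside {allowed}")
    return problems
```

An empty list means every supplied parameter is within its documented range.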
Message object
The messages array must contain exactly one object with role set to user.
Properties of content array items:
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | input_audio for audio input, video_url for video input. |
| input_audio | object | When type is input_audio | Audio input. See the input_audio object below. |
| video_url | object | When type is video_url | Video input. See the video_url object below. |
input_audio object:
| Field | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | URL of the audio file, or a Base64 data URL. For local files, see Input a Base64-encoded local file. |
| format | string | Yes | Audio format, such as mp3 or wav. |
video_url object:
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Public URL of the video file, or a Base64 data URL. For local files, see Input a Base64-encoded local file. |
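Both data and url accept a Base64 data URL in place of a public URL. The sketch below shows one way to build such a string from a local file. It assumes the standard data:<mime>;base64,<payload> form; the exact prefix the service expects is not stated in this section, so check Input a Base64-encoded local file before relying on it. to_data_url is a hypothetical helper name.

```python
import base64
import mimetypes


def to_data_url(path, default_mime="application/octet-stream"):
    """Read a local file and return its contents as a Base64 data URL string."""
    # Guess the MIME type from the file extension; fall back to a generic type.
    mime, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    # Assumption: the standard data URL layout is accepted by the service.
    return f"data:{mime or default_mime};base64,{encoded}"
```

The returned string would go in input_audio.data for audio or video_url.url for video.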
Translation options
| Field | Type | Required | Description |
|---|---|---|---|
| source_lang | string | No | Source language. The examples in this document pass language codes such as zh; see Supported languages for the accepted values. If omitted, the language is auto-detected. |
| target_lang | string | Yes | Target language. The examples in this document pass language codes such as en; see Supported languages for the accepted values. |
Note: translation_options is a non-standard OpenAI parameter. In Node.js or HTTP, pass it at the top level of the request body. In the Python SDK, pass it inside extra_body:
extra_body={"translation_options": {"source_lang": "zh", "target_lang": "en"}}
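The same rule applies to top_k and repetition_penalty. If you keep one flat, HTTP-style request body and want to reuse it with the Python SDK, a small adapter can move the non-standard keys into extra_body. This is a sketch only; to_sdk_kwargs is a hypothetical helper, and the set of non-standard names is taken from this document.

```python
# Parameters this document identifies as non-standard for the OpenAI API.
NON_STANDARD = {"translation_options", "top_k", "repetition_penalty"}


def to_sdk_kwargs(body):
    """Split a flat HTTP-style request body into keyword arguments for the Python SDK."""
    kwargs = {k: v for k, v in body.items() if k not in NON_STANDARD}
    extra = {k: v for k, v in body.items() if k in NON_STANDARD}
    if extra:
        # The Python SDK merges extra_body into the JSON request body.
        kwargs["extra_body"] = extra
    return kwargs
```

With a client constructed as in the quick start, the call would then be client.chat.completions.create(**to_sdk_kwargs(body)).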
Audio output options
Required when modalities is ["text", "audio"].
| Field | Type | Required | Description |
|---|---|---|---|
| voice | string | Yes | Voice for the output audio. See Supported voices. |
| format | string | Yes | Output audio format. Only wav is supported. |
Stream options
| Field | Type | Default | Description |
|---|---|---|---|
| include_usage | boolean | false | When true, the final chunk includes token usage details. |
Response
The API returns a series of streaming chunks, each as a chat.completion.chunk object. Chunks fall into three categories: text, audio, and token usage.
Text chunk
Contains incremental translated text in choices[0].delta.content:
{
  "id": "chatcmpl-c22a54b8-40cc-4a1d-988b-f84cdf86868f",
  "choices": [
    {
      "delta": {
        "content": " of",
        "role": null,
        "audio": null
      },
      "finish_reason": null,
      "index": 0
    }
  ],
  "created": 1764755440,
  "model": "qwen3-livetranslate-flash",
  "object": "chat.completion.chunk"
}
Audio chunk
Contains incremental Base64-encoded audio in choices[0].delta.audio.data:
{
  "id": "chatcmpl-c22a54b8-40cc-4a1d-988b-f84cdf86868f",
  "choices": [
    {
      "delta": {
        "content": null,
        "role": null,
        "audio": {
          "data": "///+//7////+////////////AAAAAAAAAAABA......",
          "expires_at": 1764755440,
          "id": "audio_c22a54b8-40cc-4a1d-988b-f84cdf86868f"
        }
      },
      "finish_reason": null,
      "index": 0
    }
  ],
  "created": 1764755440,
  "model": "qwen3-livetranslate-flash",
  "object": "chat.completion.chunk"
}
Token usage chunk
Returned as the final chunk when include_usage is true. The choices array is empty, and usage contains the token breakdown:
{
  "id": "chatcmpl-c22a54b8-40cc-4a1d-988b-f84cdf86868f",
  "choices": [],
  "created": 1764755440,
  "model": "qwen3-livetranslate-flash",
  "object": "chat.completion.chunk",
  "usage": {
    "completion_tokens": 242,
    "prompt_tokens": 415,
    "total_tokens": 657,
    "completion_tokens_details": {
      "accepted_prediction_tokens": null,
      "audio_tokens": 191,
      "reasoning_tokens": null,
      "rejected_prediction_tokens": null,
      "text_tokens": 51
    },
    "prompt_tokens_details": {
      "audio_tokens": 415,
      "cached_tokens": null,
      "text_tokens": 0,
      "video_tokens": null
    }
  }
}
Note: For video input, prompt_tokens_details.audio_tokens includes the audio tokens extracted from the video. video_tokens reports the video-specific token count.
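The three chunk types can be handled in one loop. The sketch below assumes each chunk has been converted to a plain dict (for example with the Python SDK's chunk.model_dump()) and that each audio data field is an independently decodable Base64 string; neither assumption is confirmed by this document. collect_stream is a hypothetical helper name.

```python
import base64


def collect_stream(chunks):
    """Accumulate text, audio bytes, and usage from chat.completion.chunk dicts."""
    text_parts, audio_parts, usage = [], [], None
    for chunk in chunks:
        # The final usage chunk carries usage and has an empty choices list.
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices") or []:
            delta = choice.get("delta") or {}
            if delta.get("content"):
                text_parts.append(delta["content"])
            audio = delta.get("audio") or {}
            if audio.get("data"):
                # Assumption: each data field is a complete Base64 string.
                audio_parts.append(base64.b64decode(audio["data"]))
    return "".join(text_parts), b"".join(audio_parts), usage
```

With the quick start client, this could be called as collect_stream(c.model_dump() for c in completion). How the returned audio bytes should be written to a playable file depends on the audio encoding, which this section does not specify.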
Response fields
| Field | Type | Description |
|---|---|---|
| id | string | The request identifier. Identical across all chunks. |
| choices | array | Generated content. Empty in the final usage chunk. |
| choices[].delta.content | string | Incremental translated text. null in audio chunks. |
| choices[].delta.audio | object | Incremental audio data. null in text chunks. |
| choices[].delta.audio.data | string | Base64-encoded audio segment. |
| choices[].delta.audio.id | string | Unique identifier for the output audio. |
| choices[].delta.audio.expires_at | integer | Timestamp when the request was created. |
| choices[].delta.role | string | Message role. Present only in the first chunk. |
| choices[].finish_reason | string | stop when generation completes normally, length when truncated by max_tokens, null while in progress. |
| choices[].index | integer | Always 0. |
| created | integer | Unix timestamp for the request. Identical across all chunks. |
| model | string | The model name. |
| object | string | Always chat.completion.chunk. |
| usage | object | Token consumption. Present only in the final chunk when include_usage is true. |
| usage.prompt_tokens | integer | Total input tokens. |
| usage.completion_tokens | integer | Total output tokens. |
| usage.total_tokens | integer | Sum of prompt_tokens and completion_tokens. |
| usage.completion_tokens_details.audio_tokens | integer | Output audio tokens. |
| usage.completion_tokens_details.text_tokens | integer | Output text tokens. |
| usage.prompt_tokens_details.audio_tokens | integer | Input audio tokens. For video input, this includes audio extracted from the video. |
| usage.prompt_tokens_details.text_tokens | integer | Input text tokens. Always 0. |
| usage.prompt_tokens_details.video_tokens | integer | Input video tokens. Present only for video input. |
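To make the usage fields concrete, the following sketch flattens a usage object into per-modality counts and checks the documented relationship total_tokens = prompt_tokens + completion_tokens. summarize_usage and its output keys are hypothetical; null counts are treated as 0.

```python
def summarize_usage(usage):
    """Flatten a usage object into per-modality counts, treating null as 0."""
    prompt = usage.get("prompt_tokens_details") or {}
    completion = usage.get("completion_tokens_details") or {}
    summary = {
        "input_audio": prompt.get("audio_tokens") or 0,
        "input_video": prompt.get("video_tokens") or 0,
        "input_text": prompt.get("text_tokens") or 0,
        "output_audio": completion.get("audio_tokens") or 0,
        "output_text": completion.get("text_tokens") or 0,
        "total": usage["total_tokens"],
    }
    # total_tokens is documented as prompt_tokens + completion_tokens.
    summary["consistent"] = (
        usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
    )
    return summary
```

Applied to the token usage chunk shown earlier, the totals line up: 415 + 242 = 657 overall, and 191 + 51 = 242 for the output.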
Fields fixed to null
The following fields are present in the response for OpenAI compatibility but always return null:
reasoning_content, function_call, refusal, tool_calls, logprobs, service_tier, system_fingerprint
Usage notes
- Streaming only. Set stream to true. Non-streaming calls are unsupported.
- Single message. The messages array accepts one user message only.
- Non-standard parameters. translation_options, top_k, and repetition_penalty are not in the standard OpenAI API. Python SDK: pass in extra_body. Node.js/HTTP: include at top level.
- Sampling defaults. Defaults for temperature, top_p, top_k, presence_penalty, and repetition_penalty are optimized for translation accuracy. Changing them may degrade quality.
- Output audio format. Only wav is supported.
- Automatic language detection. If source_lang is omitted, the input language is auto-detected.
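Several of these notes can be checked on the client before a request is sent. The sketch below validates a flat, HTTP-style request body against the constraints stated in this document; check_request is a hypothetical helper, and the service remains the authority on what it accepts.

```python
def check_request(body):
    """Return a list of problems found in an HTTP-style request body."""
    problems = []
    if body.get("stream") is not True:
        problems.append("stream must be true")
    messages = body.get("messages") or []
    if len(messages) != 1 or messages[0].get("role") != "user":
        problems.append("messages must contain exactly one user message")
    if not (body.get("translation_options") or {}).get("target_lang"):
        problems.append("translation_options.target_lang is required")
    # modalities defaults to ["text"], in which case audio is not needed.
    if "audio" in (body.get("modalities") or ["text"]):
        audio = body.get("audio") or {}
        if not audio.get("voice"):
            problems.append("audio.voice is required for audio output")
        if audio.get("format") != "wav":
            problems.append("audio.format must be wav")
    return problems
```

An empty list means the body satisfies the checks above; source_lang is deliberately not checked because it is optional.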