Qwen3-Omni-Captioner is an open-source model built on Qwen3-Omni. It automatically generates accurate and comprehensive descriptions for complex audio (speech, ambient sounds, music, and sound effects) without prompts. The model identifies speaker emotions, musical elements (style, instruments), and sensitive information. It is well suited to audio content analysis, security audits, intent recognition, and video editing.
Supported models
Token conversion rule for audio: Total tokens = Audio duration (in seconds) × 12.5. If the audio duration is less than one second, it is counted as one second.
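The rule above can be sketched as a small estimator. The helper name estimate_audio_tokens is hypothetical, and rounding fractional results up is an assumption: the rule does not say how fractional token counts are handled.

```python
import math

TOKENS_PER_SECOND = 12.5  # conversion rate stated in the rule above


def estimate_audio_tokens(duration_seconds: float) -> int:
    """Estimate the token count for one audio clip (hypothetical helper)."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Audio shorter than one second is counted as one second.
    billable_seconds = max(duration_seconds, 1.0)
    # Assumption: fractional token counts are rounded up.
    return math.ceil(billable_seconds * TOKENS_PER_SECOND)


print(estimate_audio_tokens(0.4))  # 13 (counted as 1 second: 12.5, rounded up)
print(estimate_audio_tokens(60))   # 750
```

Treat the result as an estimate only; the usage field returned by the API is the authoritative count.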
Getting started
Prerequisites
- If you use an SDK to make calls, install the latest version of the SDK.
Qwen3-Omni-Captioner supports API calls only. Online testing in the Model Studio console is not available.
These code samples analyze online audio via a URL, not local files. To pass local files, see Pass local file (Base64 encoding or file path). For audio file limits, see Limitations.
OpenAI compatible
Python
import os
from openai import OpenAI
client = OpenAI(
# If environment variable not configured, replace with your API key: api_key="sk-xxx"
# API keys differ by region. To get an API key, see https://help.aliyun.com/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# Beijing region base_url. For Singapore, use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen3-omni-30b-a3b-captioner",
messages=[
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/xvappi/%E8%A3%85%E4%BF%AE%E5%99%AA%E9%9F%B3.wav"
}
}
]
}
]
)
print(completion.choices[0].message.content)
Node.js
import OpenAI from "openai";
const openai = new OpenAI(
{
// If environment variable not configured, replace with your API key: apiKey: "sk-xxx"
// API keys for the Singapore and Beijing regions are different. To get an API key, see https://help.aliyun.com/en/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// Beijing region base_url. For Singapore, use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
}
);
const completion = await openai.chat.completions.create({
model: "qwen3-omni-30b-a3b-captioner",
messages: [
{
"role": "user",
"content": [{
"type": "input_audio",
"input_audio": {
"data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/xvappi/%E8%A3%85%E4%BF%AE%E5%99%AA%E9%9F%B3.wav"
}
}]
}]
});
console.log(completion.choices[0].message.content)
curl
# ======= Important =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://help.aliyun.com/en/model-studio/get-api-key
# Beijing region base_url. For Singapore, use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-omni-30b-a3b-captioner",
"messages": [
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/xvappi/%E8%A3%85%E4%BF%AE%E5%99%AA%E9%9F%B3.wav"
}
}
]
}
]
}'
DashScope
Python
import dashscope
import os
# If you use a model in the Singapore region, uncomment the following line and replace {WorkspaceId} with your actual workspace ID.
# dashscope.base_http_api_url = "https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1"
messages = [
{
"role": "user",
"content": [
{"audio": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/xvappi/%E8%A3%85%E4%BF%AE%E5%99%AA%E9%9F%B3.wav"}
]
}
]
response = dashscope.MultiModalConversation.call(
# API keys differ by region. To get an API key, see https://help.aliyun.com/en/model-studio/get-api-key
# If environment variable not configured, replace with your API key: api_key="sk-xxx"
api_key=os.getenv('DASHSCOPE_API_KEY'),
model="qwen3-omni-30b-a3b-captioner",
messages=messages
)
print("Output:")
print(response["output"]["choices"][0]["message"].content[0]["text"])
Java
import java.util.Arrays;
import java.util.Collections;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
import com.alibaba.dashscope.utils.Constants;
public class Main {
// If you use a model in the Singapore region, uncomment the following line and replace {WorkspaceId} with your actual workspace ID.
// static {Constants.baseHttpApiUrl="https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1";}
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(
Collections.singletonMap("audio", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/xvappi/%E8%A3%85%E4%BF%AE%E5%99%AA%E9%9F%B3.wav")))
.build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// If environment variable not configured, replace with your API key: .apiKey("sk-xxx")
// API keys for the Singapore and Beijing regions are different. To get an API key, see https://help.aliyun.com/en/model-studio/get-api-key
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen3-omni-30b-a3b-captioner")
.message(userMessage)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println("Output:\n" + result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
# ======= Important =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://help.aliyun.com/en/model-studio/get-api-key
# Beijing region base_url. For Singapore, use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Delete this comment before execution ===
curl -X POST 'https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-omni-30b-a3b-captioner",
"input":{
"messages":[
{
"role": "user",
"content": [
{"audio": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/xvappi/%E8%A3%85%E4%BF%AE%E5%99%AA%E9%9F%B3.wav"}
]
}
]
}
}'
How it works
- Single-turn interaction: Each request is an independent analysis task. Multi-turn conversation is not supported.
- Fixed task: The model generates audio descriptions in English only. You cannot use instructions (e.g., system messages) to change behavior, output format, or content focus.
- Audio input only: The model accepts audio only. Text prompts are not needed. The message parameter format is fixed.
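Because every request is a single user turn that carries exactly one audio item, the request body can be produced by a small builder. The sketch below follows the OpenAI-compatible shape used in the samples on this page; the helper name build_captioner_messages and the example.com URL are hypothetical.

```python
from typing import Any


def build_captioner_messages(audio: str) -> list[dict[str, Any]]:
    """Build the single-turn, audio-only message list (hypothetical helper).

    `audio` is a public URL or a Base64 data URL. No system message and no
    text part are added, because the model does not accept instructions.
    """
    if not audio:
        raise ValueError("audio must be a non-empty URL or data URL")
    return [
        {
            "role": "user",
            "content": [
                {"type": "input_audio", "input_audio": {"data": audio}}
            ],
        }
    ]


# Placeholder URL for illustration only.
messages = build_captioner_messages("https://example.com/sample.wav")
print(messages[0]["content"][0]["type"])  # input_audio
```

The returned list can be passed as the messages argument of client.chat.completions.create in the OpenAI compatible samples.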
Streaming output
Streaming output returns intermediate results as they are generated instead of waiting for the complete response, so you can start reading the description sooner. This reduces wait time.
OpenAI compatible
Set stream to true to enable streaming output.
Python
import os
from openai import OpenAI
# Initialize the OpenAI client
client = OpenAI(
# If environment variable not configured, replace with your API key: api_key="sk-xxx"
# API keys differ by region. To get an API key, see https://help.aliyun.com/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# Beijing region base_url. For Singapore, use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen3-omni-30b-a3b-captioner",
messages=[
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/xvappi/%E8%A3%85%E4%BF%AE%E5%99%AA%E9%9F%B3.wav"
}
}
]
}
],
stream=True,
stream_options={"include_usage": True},
)
for chunk in completion:
    # If stream_options.include_usage is True, the last chunk has an empty choices list; read the token usage from chunk.usage.
    if chunk.choices:
        # delta.content can be None or empty, so print only actual text.
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")
    elif chunk.usage:
        print(f"\nUsage: {chunk.usage}")
Node.js
import OpenAI from "openai";
const openai = new OpenAI(
{
// If environment variable not configured, replace with your API key: apiKey: "sk-xxx"
// API keys for the Singapore and Beijing regions are different. To get an API key, see https://help.aliyun.com/en/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// Beijing region base_url. For Singapore, use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
}
);
const completion = await openai.chat.completions.create({
model: "qwen3-omni-30b-a3b-captioner",
messages: [
{
"role": "user",
"content": [{
"type": "input_audio",
"input_audio": {
"data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/xvappi/%E8%A3%85%E4%BF%AE%E5%99%AA%E9%9F%B3.wav"
},
}]
}],
stream: true,
stream_options: {
include_usage: true
},
});
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
console.log(chunk.choices[0].delta.content);
} else {
console.log(chunk.usage);
}
}
curl
# ======= Important =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://help.aliyun.com/en/model-studio/get-api-key
# Beijing region base_url. For Singapore, use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-omni-30b-a3b-captioner",
"messages": [
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/xvappi/%E8%A3%85%E4%BF%AE%E5%99%AA%E9%9F%B3.wav"
}
}
]
}
],
"stream":true,
"stream_options":{
"include_usage":true
}
}'
DashScope
Call via DashScope SDK or HTTP. Set parameters based on your method:
- Python SDK: Set the stream parameter to True.
- Java SDK: Use the streamCall method.
- HTTP: In the header, set X-DashScope-SSE to enable.
By default, streaming output is non-incremental: each returned chunk contains all previously generated content. For incremental output, set incremental_output (incrementalOutput in Java) to true.
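To make the difference between the two modes concrete, the sketch below turns a non-incremental (cumulative) sequence of chunks into increments on the client side. It runs on simulated strings rather than real API responses, and the helper name to_increments is hypothetical.

```python
from typing import Iterable, Iterator


def to_increments(cumulative_chunks: Iterable[str]) -> Iterator[str]:
    """Yield only the newly added text from cumulative chunks (hypothetical helper)."""
    seen = ""
    for chunk in cumulative_chunks:
        if chunk.startswith(seen):
            delta = chunk[len(seen):]
        else:
            # Fallback: the chunk does not extend the previous text, so emit it whole.
            delta = chunk
        seen = chunk
        if delta:
            yield delta


# Simulated non-incremental stream: each chunk repeats everything generated so far.
simulated = ["A dog", "A dog barks", "A dog barks twice."]
print(list(to_increments(simulated)))  # ['A dog', ' barks', ' twice.']
```

In practice, enabling incremental output on the request is simpler than post-processing; the sketch only illustrates what each mode delivers.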
Python
import dashscope
import os
# If you use a model in the Singapore region, uncomment the following line and replace {WorkspaceId} with your actual workspace ID.
# dashscope.base_http_api_url = "https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1"
messages = [
{
"role": "user",
"content": [
{"audio": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/xvappi/%E8%A3%85%E4%BF%AE%E5%99%AA%E9%9F%B3.wav"}
]
}
]
response = dashscope.MultiModalConversation.call(
# API keys differ by region. To get an API key, see https://help.aliyun.com/en/model-studio/get-api-key
# If environment variable not configured, replace with your API key: api_key="sk-xxx"
api_key=os.getenv('DASHSCOPE_API_KEY'),
model="qwen3-omni-30b-a3b-captioner",
messages=messages,
stream=True,
incremental_output=True
)
full_content = ""
print("Streaming output:")
for chunk in response:
    content = chunk["output"]["choices"][0]["message"].content
    # Skip chunks that carry no content.
    if content:
        print(content[0]["text"])
        full_content += content[0]["text"]
print(f"Full content: {full_content}")
Java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import io.reactivex.Flowable;
import com.alibaba.dashscope.utils.Constants;
public class Main {
// If you use a model in the Singapore region, uncomment the following line and replace {WorkspaceId} with your actual workspace ID.
// static {Constants.baseHttpApiUrl="https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1";}
public static void streamCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
// qwen3-omni-30b-a3b-captioner supports only one audio file as input.
.content(Arrays.asList(
new HashMap<String, Object>(){{put("audio", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/xvappi/%E8%A3%85%E4%BF%AE%E5%99%AA%E9%9F%B3.wav");}}
)).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// If environment variable not configured, replace with your API key: .apiKey("sk-xxx")
// API keys for the Singapore and Beijing regions are different. To get an API key, see https://help.aliyun.com/en/model-studio/get-api-key
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen3-omni-30b-a3b-captioner")
.message(userMessage)
.incrementalOutput(true)
.build();
Flowable<MultiModalConversationResult> result = conv.streamCall(param);
result.blockingForEach(item -> {
try {
List<com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult.Output.Choice.Message.Content> content = item.getOutput().getChoices().get(0).getMessage().getContent();
// Check if content exists and is not empty.
if (content != null && !content.isEmpty()) {
System.out.println(content.get(0).get("text"));
}
} catch (Exception e){
System.exit(0);
}
});
}
public static void main(String[] args) {
try {
streamCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
# ======= Important =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://help.aliyun.com/en/model-studio/get-api-key
# Beijing region base_url. For Singapore, use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Delete this comment before execution ===
curl -X POST 'https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
"model": "qwen3-omni-30b-a3b-captioner",
"input":{
"messages":[
{
"role": "user",
"content": [
{"audio": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/xvappi/%E8%A3%85%E4%BF%AE%E5%99%AA%E9%9F%B3.wav"}
]
}
]
},
"parameters": {
"incremental_output": true
}
}'
Pass local file (Base64 encoding or file path)
Two methods are available to upload local files:
- Use Base64 encoding
- Direct file path (Recommended for greater transmission stability)
Upload methods:
Pass by file path
Pass the file path directly to the model. Supported by DashScope Python and Java SDKs only, not HTTP. See the table below for path format by language and OS.
Pass by Base64 encoding
Convert the file to a Base64 string and pass it to the model.
Limits:
- Recommended: pass the file path directly for greater transmission stability. For files under 1 MB, Base64 encoding also works.
- When passing by file path, audio files must be under 10 MB.
- When using Base64, the encoded string must be under 10 MB. Note: Base64 increases file size.
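Because Base64 encodes every 3 bytes as 4 characters, a file can be within the file path limit and still exceed the Base64 limit. The sketch below checks both limits before upload. The helper names are hypothetical, and treating "10 MB" as 10 × 1024 × 1024 bytes is an assumption.

```python
MAX_BYTES = 10 * 1024 * 1024        # assumption: "10 MB" means 10 MiB
BASE64_COMFORT_BYTES = 1024 * 1024  # files under 1 MB are described as fine for Base64


def base64_length(raw_size: int) -> int:
    """Length in characters of the padded Base64 encoding of raw_size bytes."""
    return 4 * ((raw_size + 2) // 3)


def check_local_audio(raw_size: int) -> dict:
    """Report which upload methods fit the stated limits (hypothetical helper)."""
    return {
        "file_path_ok": raw_size < MAX_BYTES,
        "base64_ok": base64_length(raw_size) < MAX_BYTES,
        "base64_comfortable": raw_size < BASE64_COMFORT_BYTES,
    }


# An 8 MiB file fits the file path limit, but its Base64 form does not.
print(check_local_audio(8 * 1024 * 1024))
# {'file_path_ok': True, 'base64_ok': False, 'base64_comfortable': False}
```

The raw size can be read with os.path.getsize before deciding how to send the file.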
Pass by file path
File path passing is supported by DashScope Python and Java SDKs only, not HTTP.
Python
import dashscope
import os
# If you use a model in the Singapore region, uncomment the following line and replace {WorkspaceId} with your actual workspace ID.
# dashscope.base_http_api_url = "https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1"
# Replace ABSOLUTE_PATH/welcome.mp3 with the absolute path of your local audio file.
# The full path of the local file must be prefixed with file:// to ensure a valid path, for example: file:///home/images/test.mp3
audio_file_path = "file://ABSOLUTE_PATH/welcome.mp3"
messages = [
{
"role": "user",
# Pass the file path prefixed with file:// in the audio parameter.
"content": [{"audio": audio_file_path}],
}
]
response = dashscope.MultiModalConversation.call(
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
# API keys differ by region. To get an API key, see https://help.aliyun.com/en/model-studio/get-api-key
api_key=os.getenv('DASHSCOPE_API_KEY'),
model="qwen3-omni-30b-a3b-captioner",
messages=messages)
print("Output:")
print(response["output"]["choices"][0]["message"].content[0]["text"])
Java
import java.util.Arrays;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
import com.alibaba.dashscope.utils.Constants;
public class Main {
// If you use a model in the Singapore region, uncomment the following line and replace {WorkspaceId} with your actual workspace ID.
// static {Constants.baseHttpApiUrl="https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1";}
public static void callWithLocalFile()
throws ApiException, NoApiKeyException, UploadFileException {
// Replace ABSOLUTE_PATH/welcome.mp3 with the absolute path of your local audio file.
// The full path of the local file must be prefixed with file:// to ensure a valid path, for example: file:///home/images/test.mp3
// The current test system is macOS. If you use Windows, use "file:///ABSOLUTE_PATH/welcome.mp3" instead.
String localFilePath = "file://ABSOLUTE_PATH/welcome.mp3";
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
new HashMap<String, Object>(){{put("audio", localFilePath);}}
))
.build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// If environment variable not configured, replace with your API key: .apiKey("sk-xxx")
// API keys for the Singapore and Beijing regions are different. To get an API key, see https://help.aliyun.com/en/model-studio/get-api-key
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen3-omni-30b-a3b-captioner")
.message(userMessage)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println("Output:\n" + result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
callWithLocalFile();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
Pass by Base64 encoding
OpenAI compatible
Python
import os
from openai import OpenAI
import base64
client = OpenAI(
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
# API keys differ by region. To get an API key, see https://help.aliyun.com/en/model-studio/get-api-key
api_key=os.getenv('DASHSCOPE_API_KEY'),
# Beijing region base_url. For Singapore, use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
def encode_audio(audio_path):
    with open(audio_path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode("utf-8")
# Replace ABSOLUTE_PATH/welcome.mp3 with the absolute path of your local audio file.
audio_file_path = "xxx/ABSOLUTE_PATH/welcome.mp3"
base64_audio = encode_audio(audio_file_path)
completion = client.chat.completions.create(
model="qwen3-omni-30b-a3b-captioner",
messages=[
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
# When passing a local file with Base64 encoding, you must use the data: prefix to ensure a valid file URL.
# The "base64" keyword must be included before the Base64-encoded data (base64_audio), otherwise an error will occur.
"data": f"data:;base64,{base64_audio}"
},
}
],
},
]
)
print(completion.choices[0].message.content)
Node.js
import OpenAI from "openai";
import { readFileSync } from 'fs';
const openai = new OpenAI(
{
// If you have not configured the environment variable, replace the following line with your Model Studio API key: apiKey: "sk-xxx"
// API keys for the Singapore and Beijing regions are different. To get an API key, see https://help.aliyun.com/en/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// Beijing region base_url. For Singapore, use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
}
);
const encodeAudio = (audioPath) => {
const audioFile = readFileSync(audioPath);
return audioFile.toString('base64');
};
// Replace ABSOLUTE_PATH/welcome.mp3 with the absolute path of your local audio file.
const base64Audio = encodeAudio("xxx/ABSOLUTE_PATH/welcome.mp3")
const completion = await openai.chat.completions.create({
model: "qwen3-omni-30b-a3b-captioner",
messages: [
{
"role": "user",
"content": [{
"type": "input_audio",
"input_audio": { "data": `data:;base64,${base64Audio}`}
}]
}]
});
console.log(completion.choices[0].message.content);
curl
- For information about how to convert a file to a Base64-encoded string, see the code sample.
- For demonstration purposes, the "data:;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5...." Base64 string is truncated. In practice, you must pass the complete encoded string.
# ======= Important =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://help.aliyun.com/en/model-studio/get-api-key
# Beijing region base_url. For Singapore, use: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-omni-30b-a3b-captioner",
"messages": [
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "data:;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5...."
}
}
]
}
]
}'
DashScope
Python
import os
import base64
import dashscope
from dashscope import MultiModalConversation
# If you use a model in the Singapore region, uncomment the following line and replace {WorkspaceId} with your actual workspace ID.
# dashscope.base_http_api_url = "https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1"
# Encoding function: Converts a local file to a Base64-encoded string
def encode_audio(audio_file_path):
    with open(audio_file_path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode("utf-8")
# Replace ABSOLUTE_PATH/welcome.mp3 with the absolute path of your local audio file.
audio_file_path = "xxx/ABSOLUTE_PATH/welcome.mp3"
base64_audio = encode_audio(audio_file_path)
messages = [
{
"role": "user",
# When passing a local file with Base64 encoding, you must use the data: prefix to ensure a valid file URL.
# The "base64" keyword must be included before the Base64-encoded data (base64_audio), otherwise an error will occur.
"content": [{"audio":f"data:;base64,{base64_audio}"}],
}
]
response = MultiModalConversation.call(
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
# API keys differ by region. To get an API key, see https://help.aliyun.com/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="qwen3-omni-30b-a3b-captioner",
messages=messages,
)
print(response.output.choices[0].message.content[0]["text"])
Java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
// If you use a model in the Singapore region, uncomment the following line and replace {WorkspaceId} with your actual workspace ID.
// static {Constants.baseHttpApiUrl="https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1";}
private static String encodeAudioToBase64(String audioPath) throws IOException {
Path path = Paths.get(audioPath);
byte[] audioBytes = Files.readAllBytes(path);
return Base64.getEncoder().encodeToString(audioBytes);
}
public static void callWithLocalFile()
throws ApiException, NoApiKeyException, UploadFileException,IOException{
// Replace ABSOLUTE_PATH/welcome.mp3 with the actual path of your local file.
String localFilePath = "ABSOLUTE_PATH/welcome.mp3";
String base64Audio = encodeAudioToBase64(localFilePath);
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
// When passing a local file with Base64 encoding, you must use the data: prefix to ensure a valid file URL.
// The "base64" keyword must be included before the Base64-encoded data (base64_audio), otherwise an error will occur.
.content(Arrays.asList(
new HashMap<String, Object>(){{put("audio", "data:;base64," + base64Audio);}}
))
.build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
.model("qwen3-omni-30b-a3b-captioner")
// API keys for the Singapore and Beijing regions are different. To get an API key, see https://help.aliyun.com/en/model-studio/get-api-key
// If environment variable not configured, replace with your API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.message(userMessage)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println("Output:\n" + result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
callWithLocalFile();
} catch (ApiException | NoApiKeyException | UploadFileException | IOException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
- For information about how to convert a file to a Base64-encoded string, see the code sample.
- For demonstration purposes, the "data:;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5...." Base64 string is truncated. In practice, you must pass the complete encoded string.
# ======= Important =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://help.aliyun.com/en/model-studio/get-api-key
# The following is the URL for the Beijing region. If you use a model in the Singapore region, replace the URL with: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Delete this comment before execution ===
curl -X POST 'https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-omni-30b-a3b-captioner",
"input":{
"messages":[
{
"role": "user",
"content": [
{"audio": "data:;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5...."}
]
}
]
}
}'
API reference
For Qwen3-Omni-Captioner parameters, see Text generation.
Error codes
If the model call fails and returns an error message, see Error codes for resolution.
FAQ
Limitations
Audio file limits:
- Duration: Up to 40 minutes.
- Number of files: Only one audio file is supported per request.
- File formats: AMR, WAV (CodecID: GSM_MS), WAV (PCM), 3GP, 3GPP, AAC, and MP3.
- File input methods: Public URL, Base64 encoding, or local file path.
- File size:
  - Public URL: No more than 1 GB.
  - File path: The audio file must be smaller than 10 MB.
  - Base64 encoding: The encoded Base64 string must be smaller than 10 MB. For more information, see Pass local file.
To compress a file, see How to compress an audio file to the required size?
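A client-side pre-flight check against the duration and format limits might look like the sketch below. The helper name preflight_problems is hypothetical, and inferring the format from the file extension is an assumption: an extension cannot confirm the WAV codec (GSM_MS or PCM), and the service may inspect the file content instead.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

MAX_DURATION_SECONDS = 40 * 60  # up to 40 minutes
# Assumption: formats are inferred from the file extension only.
ALLOWED_EXTENSIONS = {".amr", ".wav", ".3gp", ".3gpp", ".aac", ".mp3"}


def preflight_problems(audio_ref: str, duration_seconds: float) -> list[str]:
    """Return a list of limit violations for an http(s):// or file:// reference."""
    problems = []
    suffix = PurePosixPath(urlparse(audio_ref).path).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {suffix or '(none)'}")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append(f"duration {duration_seconds} s exceeds {MAX_DURATION_SECONDS} s")
    return problems


# Placeholder references for illustration only.
print(preflight_problems("https://example.com/clip.WAV", 120))   # []
print(len(preflight_problems("file:///home/user/talk.flac", 3000)))  # 2
```

The duration must come from your own tooling; this sketch does not read audio metadata.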