Qwen-Omni_大模型服务平台百炼(Model Studio)-阿里云帮助中心

Qwen-Omni 系列模型支持输入多种模态的数据，包括视频、音频、图片、文本，并输出音频与文本。

模型介绍与计费

相比于 Qwen-VL 与 Qwen-Audio 模型，Qwen-Omni 模型可以：

理解视频文件中的视觉与音频信息；
理解多种模态的数据；
输出音频。

在视觉理解、音频理解等能力上，Qwen-Omni 模型也表现出色。

商业版模型

开源版模型

相较于开源版，商业版模型具有最新的能力和改进。

模型名称	版本	上下文长度	最大输入	最大输出	免费额度（注）
		（Token数）
qwen-omni-turbo 当前等同qwen-omni-turbo-2025-03-26	稳定版	32,768	30,720	2,048	各100万Token（不区分模态）有效期：百炼开通后180天内
qwen-omni-turbo-latest 始终等同最新快照版	最新版
qwen-omni-turbo-2025-03-26 又称qwen-omni-turbo-0326	快照版
qwen-omni-turbo-2025-01-19 又称qwen-omni-turbo-0119

商业版模型的免费额度用完后，输入与输出的计费规则如下：

稳定版模型qwen-omni-turbo支持Batch调用，费用为以下价格的50%。注：Batch调用不支持抵扣免费额度。

输入计费项	单价（每千 Token）
输入：文本	0.0004元
输入：音频	0.025元
输入：图片/视频	0.0015元

输出计费项

单价（每千 Token）

输出：文本

0.0016元（输入仅包含文本时）

0.0045元（输入包含图片/音频/视频时）

输出：文本+音频

0.05元（音频）

输出的文本不计费。

计费示例：某次请求输入了1000 Token 的文本和1000 Token 的图片，输出了1000 Token 的文本和1000 Token 的音频，则该请求花费：0.0004元（文本输入）+ 0.0015元（图片输入）+ 0.05元（音频输出）= 0.0519元。在Batch调用模式下，该请求花费按50%计收，为0.02595元。

模型名称

上下文长度

最大输入

最大输出

免费额度

（注）

（Token数）

qwen2.5-omni-7b

32,768

30,720

2,048

100万Token（不区分模态）

有效期：百炼开通后180天内

开源版模型的免费额度用完后，输入与输出的计费规则如下：

输入计费项	单价（每千 Token）
输入：文本	0.0006元
输入：音频	0.038元
输入：图片/视频	0.002元

输出计费项

单价（每千 Token）

输出：文本

0.0024元（输入仅包含文本时）

0.006元（输入包含图片/音频/视频时）

输出：文本+音频

0.076元（音频）

输出的文本不计费。

计费示例：某次请求输入了1000 Token 的文本和1000 Token 的图片，输出了1000 Token 的文本和1000 Token 的音频，则该请求花费：0.0006元（文本输入）+ 0.002元（图片输入）+ 0.076元（音频输出）= 0.0786元。

音频、图片与视频转换为Token数的规则

音频

图片

视频

每1秒的音频对应25个 Token 。若音频时长不足1秒，则按25个 Token 计算。

每28x28像素对应1个 Token，1张图最少需要 4个 Token，最多需要 1280个 Token。您可以通过运行以下代码来估计传入图片的 Token。

Python

Node.js

Python

        

          
      
        
import math
# 使用以下命令安装Pillow库：pip install Pillow
from PIL import Image

def token_calculate(image_path):
    # 打开指定的PNG图片文件
    image = Image.open(image_path)
    # 获取图片的原始尺寸
    height = image.height
    width = image.width
    # 将高度调整为28的整数倍
    h_bar = round(height / 28) * 28
    # 将宽度调整为28的整数倍
    w_bar = round(width / 28) * 28
    # 图像的Token下限：4个Token
    min_pixels = 28 * 28 * 4
    # 图像的Token上限：1280个Token
    max_pixels = 1280 * 28 * 28
    # 对图像进行缩放处理，调整像素的总数在范围[min_pixels,max_pixels]内
    if h_bar * w_bar > max_pixels:
        # 计算缩放因子beta，使得缩放后的图像总像素数不超过max_pixels
        beta = math.sqrt((height * width) / max_pixels)
        # 重新计算调整后的高度，确保为28的整数倍
        h_bar = math.floor(height / beta / 28) * 28
        # 重新计算调整后的宽度，确保为28的整数倍
        w_bar = math.floor(width / beta / 28) * 28
    elif h_bar * w_bar < min_pixels:
        # 计算缩放因子beta，使得缩放后的图像总像素数不低于min_pixels
        beta = math.sqrt(min_pixels / (height * width))
        # 重新计算调整后的高度，确保为28的整数倍
        h_bar = math.ceil(height * beta / 28) * 28
        # 重新计算调整后的宽度，确保为28的整数倍
        w_bar = math.ceil(width * beta / 28) * 28
    print(f"缩放后的图像尺寸为：高度为{h_bar}，宽度为{w_bar}")
    # 计算图像的Token数：总像素除以28 * 28
    token = int((h_bar * w_bar) / (28 * 28))
    # 系统会自动添加<|vision_bos|>和<|vision_eos|>视觉标记（各1个Token）
    total_token = token + 2
    print(f"图像的Token数为{total_token}")    
    return total_token
if __name__ == "__main__":
    total_token = token_calculate(image_path="test.png")

Node.js

        

          
      
        
// 使用以下命令安装sharp: npm install sharp
import sharp from 'sharp';

async function tokenCalculate(imagePath) {
    // 打开指定的PNG图片文件
    const image = sharp(imagePath);
    const metadata = await image.metadata();

    // 获取图片的原始尺寸
    const height = metadata.height;
    const width = metadata.width;

    // 将高度调整为28的整数倍
    let hBar = Math.round(height / 28) * 28;
    // 将宽度调整为28的整数倍
    let wBar = Math.round(width / 28) * 28;

    // 图像的Token下限：4个Token
    const minPixels = 28 * 28 * 4;
    // 图像的Token上限：1280个Token
    const maxPixels = 1280 * 28 * 28;

    // 对图像进行缩放处理，调整像素的总数在范围[min_pixels,max_pixels]内
    if (hBar * wBar > maxPixels) {
        // 计算缩放因子beta，使得缩放后的图像总像素数不超过max_pixels
        const beta = Math.sqrt((height * width) / maxPixels);
        // 重新计算调整后的高度，确保为28的整数倍
        hBar = Math.floor(height / beta / 28) * 28;
        // 重新计算调整后的宽度，确保为28的整数倍
        wBar = Math.floor(width / beta / 28) * 28;
    } else if (hBar * wBar < minPixels) {
        // 计算缩放因子beta，使得缩放后的图像总像素数不低于min_pixels
        const beta = Math.sqrt(minPixels / (height * width));
        // 重新计算调整后的高度，确保为28的整数倍
        hBar = Math.ceil(height * beta / 28) * 28;
        // 重新计算调整后的宽度，确保为28的整数倍
        wBar = Math.ceil(width * beta / 28) * 28;
    }
    console.log(`缩放后的图像尺寸为：高度为${hBar}，宽度为${wBar}`);
    // 计算图像的Token数：总像素除以28 * 28
    const token = Math.floor((hBar * wBar) / (28 * 28));
    // 系统会自动添加<|vision_bos|>和<|vision_eos|>视觉标记（各占1个Token）
    console.log(`图像的总Token数为${token + 2}`);
    const totalToken = token + 2;
    return totalToken;
}

// 将test.png替换为本地的图像路径
tokenCalculate('test.png').catch(err => {
    console.error('Error processing image:', err);
});

视频文件的 Token 分为 video_tokens（视觉）与 audio_tokens（音频）。

video_tokens

计算过程较为复杂。请参见以下代码：

Python

Node.js

Python

        

          
      
        
# 使用前安装：pip install opencv-python
import math
import os
import logging
import cv2

logger = logging.getLogger(__name__)

# 固定参数
FRAME_FACTOR = 2
IMAGE_FACTOR = 28
# 视频帧的长宽比
MAX_RATIO = 200

# 视频帧的 Token 下限
VIDEO_MIN_PIXELS = 128 * 28 * 28
# 视频帧的 Token 上限
VIDEO_MAX_PIXELS = 768 * 28 * 28

# Qwen-Omni 模型 FPS 为 2
FPS = 2
# 最少抽取帧数
FPS_MIN_FRAMES = 4
# 最大抽取帧数
FPS_MAX_FRAMES = 512

# 视频输入的最大像素值
VIDEO_TOTAL_PIXELS = 65536 * 28 * 28

def round_by_factor(number, factor):
    return round(number / factor) * factor

def ceil_by_factor(number, factor):
    return math.ceil(number / factor) * factor

def floor_by_factor(number, factor):
    return math.floor(number / factor) * factor

def get_video(video_path):
    cap = cv2.VideoCapture(video_path)
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    video_fps = cap.get(cv2.CAP_PROP_FPS)
    cap.release()
    return frame_height, frame_width, total_frames, video_fps

def smart_nframes(total_frames, video_fps):
    min_frames = ceil_by_factor(FPS_MIN_FRAMES, FRAME_FACTOR)
    max_frames = floor_by_factor(min(FPS_MAX_FRAMES, total_frames), FRAME_FACTOR)
    duration = total_frames / video_fps if video_fps != 0 else 0
    if duration - int(duration) > (1 / FPS):
        total_frames = math.ceil(duration * video_fps)
    else:
        total_frames = math.ceil(int(duration) * video_fps)
    nframes = total_frames / video_fps * FPS
    nframes = int(min(min(max(nframes, min_frames), max_frames), total_frames))
    if not (FRAME_FACTOR <= nframes <= total_frames):
        raise ValueError(f"nframes should in interval [{FRAME_FACTOR}, {total_frames}], but got {nframes}.")
    return nframes

def smart_resize(height, width, nframes, factor=IMAGE_FACTOR):
    min_pixels = VIDEO_MIN_PIXELS
    total_pixels = VIDEO_TOTAL_PIXELS
    max_pixels = max(min(VIDEO_MAX_PIXELS, total_pixels / nframes * FRAME_FACTOR), int(min_pixels * 1.05))
    if max(height, width) / min(height, width) > MAX_RATIO:
        raise ValueError(f"absolute aspect ratio must be smaller than {MAX_RATIO}, got {max(height, width) / min(height, width)}")
    h_bar = max(factor, round_by_factor(height, factor))
    w_bar = max(factor, round_by_factor(width, factor))
    if h_bar * w_bar > max_pixels:
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = floor_by_factor(height / beta, factor)
        w_bar = floor_by_factor(width / beta, factor)
    elif h_bar * w_bar < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = ceil_by_factor(height * beta, factor)
        w_bar = ceil_by_factor(width * beta, factor)
    return h_bar, w_bar

def video_token_calculate(video_path):
    height, width, total_frames, video_fps = get_video(video_path)
    nframes = smart_nframes(total_frames, video_fps)
    resized_height, resized_width = smart_resize(height, width, nframes)
    video_token = int(math.ceil(nframes / FPS) * resized_height / 28 * resized_width / 28)
    video_token += 2  # 视觉标记
    return video_token

if __name__ == "__main__":
    video_path = "spring_mountain.mp4"  # 你的视频路径
    video_token = video_token_calculate(video_path)
    print("video_tokens:", video_token)

Node.js

        

          
      
        
// 使用前请安装：npm install node-ffprobe @ffprobe-installer/ffprobe
import ffprobeInstaller from '@ffprobe-installer/ffprobe';
import ffprobe from 'node-ffprobe';
// 设置 ffprobe 路径
ffprobe.FFPROBE_PATH = ffprobeInstaller.path;

// 获取视频信息
async function getVideoInfo(videoPath) {
  try {
    const probeData = await ffprobe(videoPath);
    const videoStream = probeData.streams.find(s => s.codec_type === 'video');
    if (!videoStream) throw new Error('未找到视频流');
    const { width, height, nb_frames: totalFrames, avg_frame_rate } = videoStream;
    const [numerator, denominator] = avg_frame_rate.split('/');
    const frameRate = parseFloat(numerator) / parseFloat(denominator);
    return { width, height, totalFrames: parseInt(totalFrames), frameRate };
  } catch (error) {
    console.error('获取视频信息失败:', error.message);
    throw error;
  }
}

// 常量配置
const CONFIG = {
  FRAME_FACTOR: 2,
  IMAGE_FACTOR: 28,
  MAX_RATIO: 200,
  VIDEO_MIN_PIXELS: 128 * 28 * 28,
  VIDEO_MAX_PIXELS: 768 * 28 * 28,
  FPS: 2,
  FPS_MIN_FRAMES: 4,
  FPS_MAX_FRAMES: 512,
  VIDEO_TOTAL_PIXELS: 65536 * 28 * 28,
};

// 按因子取整工具
function byFactor(number, factor, mode = 'round') {
  if (mode === 'ceil') return Math.ceil(number / factor) * factor;
  if (mode === 'floor') return Math.floor(number / factor) * factor;
  return Math.round(number / factor) * factor;
}

// 计算抽帧数
function smartNFrames(ele, totalFrames, frameRate) {
  const fps = ele.fps || CONFIG.FPS;
  const minFrames = byFactor(ele.min_frames || CONFIG.FPS_MIN_FRAMES, CONFIG.FRAME_FACTOR, 'ceil');
  const maxFrames = byFactor(
    ele.max_frames || Math.min(CONFIG.FPS_MAX_FRAMES, totalFrames),
    CONFIG.FRAME_FACTOR,
    'floor'
  );
  const duration = frameRate ? totalFrames / frameRate : 0;
  let totalFramesAdjusted = duration % 1 > (1 / fps)
    ? Math.ceil(duration * frameRate)
    : Math.ceil(Math.floor(duration) * frameRate);
  const nframes = (totalFramesAdjusted / frameRate) * fps;
  const finalNFrames = Math.min(
    Math.max(nframes, minFrames),
    Math.min(maxFrames, totalFramesAdjusted)
  );
  if (finalNFrames < CONFIG.FRAME_FACTOR || finalNFrames > totalFramesAdjusted) {
    throw new Error(`帧数应在 ${CONFIG.FRAME_FACTOR} ~ ${totalFramesAdjusted} 之间, 当前: ${finalNFrames}`);
  }
  return Math.floor(finalNFrames);
}

// 智能调整分辨率
async function smartResize(ele, videoInfo) {
  const { height, width, totalFrames, frameRate } = videoInfo;
  const minPixels = CONFIG.VIDEO_MIN_PIXELS;
  const nframes = smartNFrames(ele, totalFrames, frameRate);
  const maxPixels = Math.max(
    Math.min(CONFIG.VIDEO_MAX_PIXELS, CONFIG.VIDEO_TOTAL_PIXELS / nframes * CONFIG.FRAME_FACTOR),
    Math.floor(minPixels * 1.05)
  );
  const ratio = Math.max(height, width) / Math.min(height, width);
  if (ratio > CONFIG.MAX_RATIO) throw new Error(`宽高比 ${ratio} 超过限制 ${CONFIG.MAX_RATIO}`);
  let hBar = Math.max(CONFIG.IMAGE_FACTOR, byFactor(height, CONFIG.IMAGE_FACTOR));
  let wBar = Math.max(CONFIG.IMAGE_FACTOR, byFactor(width, CONFIG.IMAGE_FACTOR));
  if (hBar * wBar > maxPixels) {
    const beta = Math.sqrt((height * width) / maxPixels);
    hBar = byFactor(height / beta, CONFIG.IMAGE_FACTOR, 'floor');
    wBar = byFactor(width / beta, CONFIG.IMAGE_FACTOR, 'floor');
  } else if (hBar * wBar < minPixels) {
    const beta = Math.sqrt(minPixels / (height * width));
    hBar = byFactor(height * beta, CONFIG.IMAGE_FACTOR, 'ceil');
    wBar = byFactor(width * beta, CONFIG.IMAGE_FACTOR, 'ceil');
  }
  return { hBar, wBar };
}

// 计算 Token 数量
async function tokenCalculate(videoPath) {
  const messages = [{ content: [{ video: videoPath, FPS: CONFIG.FPS }] }];
  const visionInfos = extractVisionInfo(messages);
  const videoInfo = await getVideoInfo(videoPath);
  const { hBar, wBar } = await smartResize(visionInfos[0], videoInfo);
  const { totalFrames, frameRate } = videoInfo;
  const numFrames = smartNFrames(visionInfos[0], totalFrames, frameRate);
  const videoToken = Math.ceil(numFrames / 2) * Math.floor(hBar / 28) * Math.floor(wBar / 28) + 2;
  return videoToken;
}

// 解析视觉信息
function extractVisionInfo(conversations) {
  const visionInfos = [];
  if (!Array.isArray(conversations)) conversations = [conversations];
  conversations.forEach(conversation => {
    if (!Array.isArray(conversation)) conversation = [conversation];
    conversation.forEach(message => {
      if (Array.isArray(message.content)) {
        message.content.forEach(ele => {
          if (ele.image || ele.image_url || ele.video || ['image', 'image_url', 'video'].includes(ele.type)) {
            visionInfos.push(ele);
          }
        });
      }
    });
  });
  return visionInfos;
}

async function main() {
  try {
    const videoPath = "spring_mountain.mp4"; // 替换为本地路径
    const videoToken = await tokenCalculate(videoPath);
    console.log('视频 tokens:', videoToken);
  } catch (error) {
    console.error('Error:', error.message);
  }
}

main();

audio_tokens
每1秒的音频对应25个 Token 。若音频时长不足1秒，则按25个 Token 计算。

使用方法

输入

支持的输入模态

支持以下输入组合：

文本输入
图片+文本输入
音频+文本输入
视频（包括图像列表与视频文件形式）+文本输入

无法在一个 User Message中输入多种非文本模态的数据。

输入多模态数据的方式

输入的图片、音频、视频文件支持 Base64 编码与公网 URL 进行传入。以下示例代码均以传入公网 URL 为例，如果需要传入 Base64 编码，请参见输入 Base64 编码的本地文件。

输出

当前仅支持以流式输出的形式调用 Qwen-Omni 模型。

支持的输出模态

输出可以包含文本与音频数据，您可以通过modalities参数控制。

输出模态	`modalities`参数值	回复风格

输出模态

modalities参数值

回复风格

文本

["text"]（默认值）

比较书面化，回复内容较为正式。

文本+音频

["text","audio"]

比较口语化，回复内容包含语气词，会引导用户与其进一步交流。

输出模态包括音频时不支持设定 System Message。

输出的音频为 Base64 编码数据，您需要在接收后进行解码，解码方法请参见解析输出的Base64 编码的音频数据。

支持输出的音频语言

当前输出音频仅支持汉语（普通话）和英语。

支持的音频音色

输出音频的音色与文件格式（只支持设定为"wav"）通过audio参数来配置，如：audio={"voice": "Cherry", "format": "wav"}，其中商业版模型voice参数可选值为：["Cherry", "Serena", "Ethan", "Chelsie"]，开源版模型voice参数可选值为：["Ethan", "Chelsie"]。

音色名称	音色效果

音色名称	音色效果
Cherry 不支持开源版模型。
Serena 不支持开源版模型。
Ethan
Chelsie

开始使用

前提条件

Qwen-Omni 系列模型仅支持 OpenAI 兼容方式调用。您需要已获取API Key并配置API Key到环境变量。如果通过 OpenAI SDK 调用，需要安装SDK（建议参考该文档安装最新SDK，否则可能运行失败）。

OpenAI Python SDK 最低版本为 1.52.0， Node.js SDK 最低版本为 4.68.0。

文本输入

Qwen-Omni 模型支持接收纯文本作为输入。当前只支持以流式输出的方式进行调用。

OpenAI 兼容

Python

Node.js

curl

Python

        

          
      
        
import os
from openai import OpenAI

client = OpenAI(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen-omni-turbo",
    messages=[{"role": "user", "content": "你是谁"}],
    # 设置输出数据的模态，当前支持两种：["text","audio"]、["text"]
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream 必须设置为 True，否则会报错
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)

Node.js

        

          
      
        
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // 若没有配置环境变量，请用百炼API Key将下行替换为：apiKey: "sk-xxx",
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen-omni-turbo",
    messages: [
        { role: "user", content: "你是谁？" }
    ],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

curl

        

          
      
        
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-omni-turbo",
    "messages": [
        {
            "role": "user", 
            "content": "你是谁？"
        }
    ],
    "stream":true,
    "stream_options":{
        "include_usage":true
    },
    "modalities":["text","audio"],
    "audio":{"voice":"Cherry","format":"wav"}
}'

图片+文本输入

Qwen-Omni 模型支持传入多张图片。对输入图片的要求如下：

单个图片文件的大小不超过10 MB;
图片数量受模型图文总 Token 上限（即最大输入）的限制，所有图片的总 Token 数必须小于模型的最大输入;
图片的宽度和高度均应大于10像素，宽高比不应超过200:1或1:200。

支持输入的图片类型请参见支持的图像。

当前只支持以流式输出的方式进行调用。

以下示例代码以传入图片公网 URL 为例，传入本地图片请参见：输入 Base64 编码的本地文件。

OpenAI 兼容

Python

Node.js

curl

Python

        

          
      
        
import os
from openai import OpenAI

client = OpenAI(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen-omni-turbo",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
                    },
                },
                {"type": "text", "text": "图中描绘的是什么景象？"},
            ],
        },
    ],
    # 设置输出数据的模态，当前支持两种：["text","audio"]、["text"]
    modalities=["text", "audio"],
    audio={"voice": "Chelsie", "format": "wav"},
    # stream 必须设置为 True，否则会报错
    stream=True,
    stream_options={
        "include_usage": True
    }
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)

Node.js

        

          
      
        
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // 若没有配置环境变量，请用百炼API Key将下行替换为：apiKey: "sk-xxx",
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen-omni-turbo",  //模型列表：https://help.aliyun.com/zh/model-studio/getting-started/models
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "image_url",
                "image_url": { "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg" },
            },
            { "type": "text", "text": "图中描绘的是什么景象？" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

curl

        

          
      
        
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-omni-turbo",
    "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
          }
        },
        {
          "type": "text",
          "text": "图中描绘的是什么景象？"
        }
      ]
    }
  ],
    "stream":true,
    "stream_options":{
        "include_usage":true
    },
    "modalities":["text","audio"],
    "audio":{"voice":"Cherry","format":"wav"}
}'

音频+文本输入

只可以输入一个音频文件，大小不能超过 10 MB，时长最长 3 分钟。当前只支持以流式输出的方式进行调用。

以下示例代码以传入音频公网 URL 为例，传入本地音频请参见：输入 Base64 编码的本地文件。

OpenAI 兼容

Python

Node.js

curl

Python

        

          
      
        
import os
from openai import OpenAI

client = OpenAI(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen-omni-turbo-0119",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
                        "format": "wav",
                    },
                },
                {"type": "text", "text": "这段音频在说什么"},
            ],
        },
    ],
    # 设置输出数据的模态，当前支持两种：["text","audio"]、["text"]
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream 必须设置为 True，否则会报错
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    print(chunk)
    # if chunk.choices:
    #     print(chunk.choices[0].delta)
    # else:
    #     print(chunk.usage)

Node.js

        

          
      
        
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // 若没有配置环境变量，请用百炼API Key将下行替换为：apiKey: "sk-xxx",
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen-omni-turbo",  //模型列表：https://help.aliyun.com/zh/model-studio/getting-started/models
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "input_audio",
                "input_audio": { "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav", "format": "wav" },
            },
            { "type": "text", "text": "这段音频在说什么" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

curl

        

          
      
        
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-omni-turbo",
    "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "input_audio",
          "input_audio": {
            "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
            "format": "wav"
          }
        },
        {
          "type": "text",
          "text": "这段音频在说什么"
        }
      ]
    }
  ],
    "stream":true,
    "stream_options":{
        "include_usage":true
    },
    "modalities":["text","audio"],
    "audio":{"voice":"Cherry","format":"wav"}
}'

视频+文本输入

视频的传入方式可以为图片列表形式或视频文件形式（可理解视频中的音频）。当前只支持以流式输出的方式进行调用。

图片列表形式
最少传入4张图片，最多可传入80张图片。
视频文件形式
视频文件只能输入一个，大小限制为 150 MB，时长限制为 40s。
视频文件中的视觉信息与音频信息会分开计费。

以下示例代码以传入视频公网 URL 为例，传入本地视频请参见：输入 Base64 编码的本地文件。

图片列表形式

OpenAI 兼容

Python

Node.js

curl

Python

        

          
      
        
import os
from openai import OpenAI

client = OpenAI(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen-omni-turbo",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video",
                    "video": [
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg",
                    ],
                },
                {"type": "text", "text": "描述这个视频的具体过程"},
            ],
        }
    ],
    # 设置输出数据的模态，当前支持两种：["text","audio"]、["text"]
    modalities=["text", "audio"],
    audio={"voice": "Chelsie", "format": "wav"},
    # stream 必须设置为 True，否则会报错
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)

Node.js

        

          
      
        
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // 若没有配置环境变量，请用百炼API Key将下行替换为：apiKey: "sk-xxx",
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen-omni-turbo",  //模型列表：https://help.aliyun.com/zh/model-studio/getting-started/models
    messages: [{
        role: "user",
        content: [
            {
                type: "video",
                video: [
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
                ]
            },
            {
                type: "text",
                text: "描述这个视频的具体过程"
            }
        ]
    }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Chelsie", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

curl

        

          
      
        
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-omni-turbo",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "video",
                    "video": [
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
                    ]
                },
                {
                    "type": "text",
                    "text": "描述这个视频的具体过程"
                }
            ]
        }
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "modalities": ["text", "audio"],
    "audio": {
        "voice": "Cherry",
        "format": "wav"
    }
}'

视频文件形式（可理解视频中的音频）

OpenAI 兼容

Python

Node.js

curl

Python

        

          
      
        
import os
from openai import OpenAI

client = OpenAI(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen-omni-turbo",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video_url",
                    "video_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
                    },
                },
                {"type": "text", "text": "视频的内容是什么?"},
            ],
        },
    ],
    # 设置输出数据的模态，当前支持两种：["text","audio"]、["text"]
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream 必须设置为 True，否则会报错
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)

Node.js

        

          
      
        
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // 若没有配置环境变量，请用百炼API Key将下行替换为：apiKey: "sk-xxx",
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen-omni-turbo",  //模型列表：https://help.aliyun.com/zh/model-studio/getting-started/models
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "video_url",
                "video_url": { "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4" },
            },
            { "type": "text", "text": "视频的内容是什么?" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});


for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

curl

        

          
      
        
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-omni-turbo",
    "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "video_url",
          "video_url": {
            "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
          }
        },
        {
          "type": "text",
          "text": "视频的内容是什么"
        }
      ]
    }
  ],
    "stream":true,
    "stream_options": {
        "include_usage": true
    },
    "modalities":["text","audio"],
    "audio":{"voice":"Cherry","format":"wav"}
}'

多轮对话

您在使用 Qwen-Omni 模型的多轮对话功能时，需要注意：

Assistant Message
添加到 messages 数组中的 Assistant Message 只可以包含文本数据。
User Message
一条 User Message 只可以包含文本和一种模态的数据，在多轮对话中您可以在不同的 User Message 中输入不同模态的数据。

OpenAI 兼容

Python

Node.js

curl

Python

        

          
      
        
import os
from openai import OpenAI

client = OpenAI(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen-omni-turbo",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3",
                        "format": "mp3",
                    },
                },
                {"type": "text", "text": "这段音频在说什么"},
            ],
        },
        {
            "role": "assistant",
            "content": [{"type": "text", "text": "这段音频在说：欢迎使用阿里云"}],
        },
        {
            "role": "user",
            "content": [{"type": "text", "text": "介绍一下这家公司？"}],
        },
    ],
    # 设置输出数据的模态，当前支持两种：["text","audio"]、["text"]
    modalities=["text"],
    # stream 必须设置为 True，否则会报错
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)

Node.js

        

          
      
        
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // 若没有配置环境变量，请用百炼API Key将下行替换为：apiKey: "sk-xxx",
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen-omni-turbo",  //模型列表：https://help.aliyun.com/zh/model-studio/getting-started/models
    messages: [
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3",
                        "format": "mp3",
                    },
                },
                { "type": "text", "text": "这段音频在说什么" },
            ],
        },
        {
            "role": "assistant",
            "content": [{ "type": "text", "text": "这段音频在说：欢迎使用阿里云" }],
        },
        {
            "role": "user",
            "content": [{ "type": "text", "text": "介绍一下这家公司？" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text"]
});


for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

curl

        

          
      
        
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model": "qwen-omni-turbo",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "input_audio",
          "input_audio": {
            "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
          }
        },
        {
          "type": "text",
          "text": "这段音频在说什么"
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "这段音频在说：欢迎使用阿里云"
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "介绍一下这家公司？"
        }
      ]
    }
  ],
  "stream": true,
  "stream_options": {
    "include_usage": true
  },
  "modalities": ["text"]
}'

解析输出的Base64 编码的音频数据

Qwen-Omni 模型输出的音频为流式输出的 Base64 编码数据。您可以在模型生成过程中维护一个字符串变量，将每个返回片段的 Base64 编码添加到字符串变量后，待生成结束后进行 Base64 解码，得到音频文件；也可以将每个返回片段的 Base64 编码数据实时解码并播放。

Python

Node.js

Python

        

          
      
        
# Installation instructions for pyaudio:
# APPLE Mac OS X
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python-pyaudio python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio

import os
from openai import OpenAI
import base64
import numpy as np
import soundfile as sf

client = OpenAI(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen-omni-turbo",
    messages=[{"role": "user", "content": "你是谁"}],
    # 设置输出数据的模态，当前支持两种：["text","audio"]、["text"]
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream 必须设置为 True，否则会报错
    stream=True,
    stream_options={"include_usage": True},
)

# 方式1: 待生成结束后再进行解码
audio_string = ""
for chunk in completion:
    if chunk.choices:
        if hasattr(chunk.choices[0].delta, "audio"):
            try:
                audio_string += chunk.choices[0].delta.audio["data"]
            except Exception as e:
                print(chunk.choices[0].delta.audio["transcript"])
    else:
        print(chunk.usage)

wav_bytes = base64.b64decode(audio_string)
audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
sf.write("audio_assistant_py.wav", audio_np, samplerate=24000)

# 方式2: 边生成边解码(使用方式2请将方式1的代码进行注释)
# # 初始化 PyAudio
# import pyaudio
# import time
# p = pyaudio.PyAudio()
# # 创建音频流
# stream = p.open(format=pyaudio.paInt16,
#                 channels=1,
#                 rate=24000,
#                 output=True)

# for chunk in completion:
#     if chunk.choices:
#         if hasattr(chunk.choices[0].delta, "audio"):
#             try:
#                 audio_string = chunk.choices[0].delta.audio["data"]
#                 wav_bytes = base64.b64decode(audio_string)
#                 audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
#                 # 直接播放音频数据
#                 stream.write(audio_np.tobytes())
#             except Exception as e:
#                 print(chunk.choices[0].delta.audio["transcript"])

# time.sleep(0.8)
# # 清理资源
# stream.stop_stream()
# stream.close()
# p.terminate()

Node.js

        

          
      
        
// 运行前的准备工作:
// Windows/Mac/Linux 通用:
// 1. 确保已安装 Node.js (建议版本 >= 14)
// 2. 运行以下命令安装必要的依赖:
//    npm install openai wav
// 
// 如果要使用实时播放功能 (方式2), 还需要:
// Windows:
//    npm install speaker
// Mac:
//    brew install portaudio
//    npm install speaker
// Linux (Ubuntu/Debian):
//    sudo apt-get install libasound2-dev
//    npm install speaker

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // 若没有配置环境变量，请用百炼API Key将下行替换为：apiKey: "sk-xxx",
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen-omni-turbo",  //模型列表：https://help.aliyun.com/zh/model-studio/getting-started/models
    messages: [
        {
            "role": "user",
            "content": "你是谁？"
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

// 方式1: 待生成结束后再进行解码
// 需要安装: npm install wav
import { createWriteStream } from 'node:fs';  // node:fs 是 Node.js 内置模块，无需安装
import { Writer } from 'wav';

async function convertAudio(audioString, audioPath) {
    try {
        // 解码Base64字符串为Buffer
        const wavBuffer = Buffer.from(audioString, 'base64');
        // 创建WAV文件写入流
        const writer = new Writer({
            sampleRate: 24000,  // 采样率
            channels: 1,        // 单声道
            bitDepth: 16        // 16位深度
        });
        // 创建输出文件流并建立管道连接
        const outputStream = createWriteStream(audioPath);
        writer.pipe(outputStream);

        // 写入PCM数据并结束写入
        writer.write(wavBuffer);
        writer.end();

        // 使用Promise等待文件写入完成
        await new Promise((resolve, reject) => {
            outputStream.on('finish', resolve);
            outputStream.on('error', reject);
        });

        // 添加额外等待时间确保音频完整
        await new Promise(resolve => setTimeout(resolve, 800));

        console.log(`音频文件已成功保存为 ${audioPath}`);
    } catch (error) {
        console.error('处理过程中发生错误:', error);
    }
}

let audioString = "";
for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        if (chunk.choices[0].delta.audio) {
            if (chunk.choices[0].delta.audio["data"]) {
                audioString += chunk.choices[0].delta.audio["data"];
            }
        }
    } else {
        console.log(chunk.usage);
    }
}
// 执行转换
convertAudio(audioString, "audio_assistant_mjs.wav");


// 方式2: 边生成边实时播放
// 需要先按照上方系统对应的说明安装必要组件
// import Speaker from 'speaker'; // 引入音频播放库

// // 创建扬声器实例（配置与 WAV 文件参数一致）
// const speaker = new Speaker({
//     sampleRate: 24000,  // 采样率
//     channels: 1,        // 声道数
//     bitDepth: 16,       // 位深
//     signed: true        // 有符号 PCM
// });
// for await (const chunk of completion) {
//     if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
//         if (chunk.choices[0].delta.audio) {
//             if (chunk.choices[0].delta.audio["data"]) {
//                 const pcmBuffer = Buffer.from(chunk.choices[0].delta.audio.data, 'base64');
//                 // 直接写入扬声器播放
//                 speaker.write(pcmBuffer);
//             }
//         }
//     } else {
//         console.log(chunk.usage);
//     }
// }
// speaker.on('finish', () => console.log('播放完成'));
// speaker.end(); // 根据实际 API 流结束情况调用

输入 Base64 编码的本地文件

图片

音频

视频

以保存在本地的eagle.png为例。

Python

Node.js

Python

        

          
      
        
import os
from openai import OpenAI
import base64

client = OpenAI(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)


#  Base64 编码格式
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


base64_image = encode_image("eagle.png")

completion = client.chat.completions.create(
    model="qwen-omni-turbo",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                },
                {"type": "text", "text": "图中描绘的是什么景象？"},
            ],
        },
    ],
    # 设置输出数据的模态，当前支持两种：["text","audio"]、["text"]
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream 必须设置为 True，否则会报错
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)

Node.js

        

          
      
        
import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        // 若没有配置环境变量，请用百炼API Key将下行替换为：apiKey: "sk-xxx",
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);

const encodeImage = (imagePath) => {
    const imageFile = readFileSync(imagePath);
    return imageFile.toString('base64');
};
const base64Image = encodeImage("eagle.png")

const completion = await openai.chat.completions.create({
    model: "qwen-omni-turbo",  //模型列表：https://help.aliyun.com/zh/model-studio/getting-started/models
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "image_url",
                "image_url": { "url": `data:image/png;base64,${base64Image}` },
            },
            { "type": "text", "text": "图中描绘的是什么景象？" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

以保存在本地的welcome.mp3为例。

Python

Node.js

Python

        

          
      
        
import os
from openai import OpenAI
import base64
import numpy as np
import soundfile as sf
import requests

client = OpenAI(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)


def encode_audio(audio_path):
    with open(audio_path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode("utf-8")


base64_audio = encode_audio("welcome.mp3")

completion = client.chat.completions.create(
    model="qwen-omni-turbo",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": f"data:;base64,{base64_audio}",
                        "format": "mp3",
                    },
                },
                {"type": "text", "text": "这段音频在说什么"},
            ],
        },
    ],
    # 设置输出数据的模态，当前支持两种：["text","audio"]、["text"]
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream 必须设置为 True，否则会报错
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)

Node.js

        

          
      
        
import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        // 若没有配置环境变量，请用百炼API Key将下行替换为：apiKey: "sk-xxx",
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);

const encodeAudio = (audioPath) => {
    const audioFile = readFileSync(audioPath);
    return audioFile.toString('base64');
};
const base64Audio = encodeAudio("welcome.mp3")

const completion = await openai.chat.completions.create({
    model: "qwen-omni-turbo",  //模型列表：https://help.aliyun.com/zh/model-studio/getting-started/models
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "input_audio",
                "input_audio": { "data": `data:;base64,${base64Audio}`, "format": "mp3" },
            },
            { "type": "text", "text": "这段音频在说什么" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

视频文件

图片列表

以保存在本地的spring_mountain.mp4为例。

Python

Node.js

Python

        

          
      
        
import os
from openai import OpenAI
import base64
import numpy as np
import soundfile as sf

client = OpenAI(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)


#  Base64 编码格式
def encode_video(video_path):
    with open(video_path, "rb") as video_file:
        return base64.b64encode(video_file.read()).decode("utf-8")


base64_video = encode_video("spring_mountain.mp4")

completion = client.chat.completions.create(
    model="qwen-omni-turbo",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video_url",
                    "video_url": {"url": f"data:;base64,{base64_video}"},
                },
                {"type": "text", "text": "她在唱什么"},
            ],
        },
    ],
    # 设置输出数据的模态，当前支持两种：["text","audio"]、["text"]
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream 必须设置为 True，否则会报错
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)

Node.js

        

          
      
        
import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        // 若没有配置环境变量，请用百炼API Key将下行替换为：apiKey: "sk-xxx",
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);

const encodeVideo = (videoPath) => {
    const videoFile = readFileSync(videoPath);
    return videoFile.toString('base64');
};
const base64Video = encodeVideo("spring_mountain.mp4")

const completion = await openai.chat.completions.create({
    model: "qwen-omni-turbo",  //模型列表：https://help.aliyun.com/zh/model-studio/getting-started/models
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "video_url",
                "video_url": { "url": `data:;base64,${base64Video}` },
            },
            { "type": "text", "text": "她在唱什么" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

以保存在本地的football1.jpg、football2.jpg、football3.jpg与football4.jpg为例。

Python

Node.js

Python

        

          
      
        
import os
from openai import OpenAI
import base64
import numpy as np
import soundfile as sf

client = OpenAI(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)


#  Base64 编码格式
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


base64_image_1 = encode_image("football1.jpg")
base64_image_2 = encode_image("football2.jpg")
base64_image_3 = encode_image("football3.jpg")
base64_image_4 = encode_image("football4.jpg")

completion = client.chat.completions.create(
    model="qwen-omni-turbo",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video",
                    "video": [
                        f"data:image/jpeg;base64,{base64_image_1}",
                        f"data:image/jpeg;base64,{base64_image_2}",
                        f"data:image/jpeg;base64,{base64_image_3}",
                        f"data:image/jpeg;base64,{base64_image_4}",
                    ],
                },
                {"type": "text", "text": "描述这个视频的具体过程"},
            ],
        }
    ],
    # 设置输出数据的模态，当前支持两种：["text","audio"]、["text"]
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream 必须设置为 True，否则会报错
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)

Node.js

        

          
      
        
import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        // 若没有配置环境变量，请用百炼API Key将下行替换为：apiKey: "sk-xxx",
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);

const encodeImage = (imagePath) => {
    const imageFile = readFileSync(imagePath);
    return imageFile.toString('base64');
  };
const base64Image1 = encodeImage("football1.jpg")
const base64Image2 = encodeImage("football2.jpg")
const base64Image3 = encodeImage("football3.jpg")
const base64Image4 = encodeImage("football4.jpg")

const completion = await openai.chat.completions.create({
    model: "qwen-omni-turbo",  //模型列表：https://help.aliyun.com/zh/model-studio/getting-started/models
    messages: [{
        role: "user",
        content: [
            {
                type: "video",
                video: [
                    `data:image/jpeg;base64,${base64Image1}`,
                    `data:image/jpeg;base64,${base64Image2}`,
                    `data:image/jpeg;base64,${base64Image3}`,
                    `data:image/jpeg;base64,${base64Image4}`
                ]
            },
            {
                type: "text",
                text: "描述这个视频的具体过程"
            }
        ]
    }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

常见问题

Q：如何给 Qwen-Omni 模型设置角色？

A：Qwen-Omni 模型在输出模态包括音频时不支持设定 System Message，即使您在 System Message 中说明：“你是XXX...”等角色信息，Qwen-Omni 的自我认知依然会是通义千问。您可以在 messages 数组的开头手动添加用于角色设定的 User Message 和 Assistant Message，达到给 Qwen-Omni 模型设置角色的效果。请参见以下代码：

OpenAI 兼容

Python

Node.js

curl

Python

        

          
      
        
import os
from openai import OpenAI

client = OpenAI(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen-omni-turbo",
    messages=[
        {"role": "user", "content": "你是一个商场的导购员，你负责的商品有手机、电脑、冰箱"},
        {"role": "assistant", "content": "好的，我记住了你的设定。"},
        {"role": "user", "content": "你是谁"},
    ],
    # 设置输出数据的模态，当前支持两种：["text","audio"]、["text"]
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream 必须设置为 True，否则会报错
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)

Node.js

        

          
      
        
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // 若没有配置环境变量，请用百炼API Key将下行替换为：apiKey: "sk-xxx",
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen-omni-turbo",
    messages: [
        { role: "user", content: "你是一个商场的导购员，你负责的商品有手机、电脑、冰箱" },
        { role: "assistant", content: "好的，我记住了你的设定。" },
        { role: "user", content: "你是谁？" }
    ],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

curl

        

          
      
        
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-omni-turbo",
    "messages": [
        {
            "role": "user", 
            "content": "你是一个商场的导购员，你负责的商品有手机、电脑、冰箱"
        },
        {
            "role": "assistant", 
            "content": "好的，我记住了你的设定。"
        },
        {
            "role": "user", 
            "content": "你是谁？"
        }
    ],
    "stream":true,
    "stream_options":{
        "include_usage":true
    },
    "modalities":["text","audio"],
    "audio":{"voice":"Cherry","format":"wav"}
}'

错误码

如果模型调用失败并返回报错信息，请参见错误信息进行解决。