如何使用Qwen-VL模型_大模型服务平台百炼(Model Studio)-阿里云帮助中心

通义千问VL模型可以根据您传入的图片或视频来进行回答。

在线体验：视觉模型（北京或新加坡）

应用场景

图像问答：描述图像中的内容或者对其进行分类打标，如识别人物、地点、动植物等。
数学题目解答：解答图像中的数学问题，适用于中小学、大学以及成人教育阶段。
视频理解：分析视频内容，如对具体事件进行定位并获取时间戳，或生成关键时间段的摘要。
物体定位：支持二维和三维定位，可用于判断物体方位、视角变化、遮挡关系。
文档解析：将图像类的文档（如扫描件/图片PDF）解析为 QwenVL HTML或 Markdown格式，该格式不仅能精准识别文本，还能获取图像、表格等元素的位置信息。
视觉编码：可通过图像或视频生成代码，可用于将设计图、网站截图等生成HTML、CSS、JS 代码。
文字识别与信息抽取：识别图像中的文字、公式，或者抽取票据、证件、表单中的信息，支持格式化输出文本；Qwen3-VL模型支持的语言已增加至33种。
支持的语言
Qwen3-VL模型：支持的语言共33种，分别为中文、日语、韩语、印尼语、越南语、泰语、英语、法语、德语、俄语、葡萄牙语、西班牙语、意大利语、瑞典语、丹麦语、捷克语、挪威语、荷兰语、芬兰语、土耳其语、波兰语、斯瓦希里语、罗马尼亚语、塞尔维亚语、希腊语、哈萨克语、乌兹别克语、宿务语、阿拉伯语、乌尔都语、波斯语、印地语 / 天城语、希伯来语。
Qwen2.5-VL模型：支持的语言共11种，分别为中文、英语、日语、韩语、阿拉伯语、越南语、法语、德语、意大利语、西班牙语和俄语。

为提高模型效果，建议您根据实际的业务需求选择应用示例的推荐提示词。

模型列表与计费

阿里云百炼平台提供了商业版和开源版两种模型；相对于开源版，商业版模型会持续更新和升级，具有最新的能力。

qwen-vl-max、qwen-vl-plus模型已支持上下文缓存（ Context Cache ）和结构化输出。

中国大陆（北京）

商业版模型

模型名称	版本	模式	上下文长度	最大输入	最长思维链	最大输出	输入成本	输出成本思维链+输出	免费额度（注）
			（Token数）				（每千Token）
qwen3-vl-plus 当前与qwen3-vl-plus-2025-09-23能力相同 Batch调用半价	稳定版	思考	262,144	258,048 单图最大16384	81,920	32,768	阶梯计价，请参见表格下方说明。		各100万Token 有效期：百炼开通后90天内
		非思考		260,096 单图最大16384	-
qwen3-vl-plus-2025-09-23	快照版	思考		258,048 单图最大16384	81,920
		非思考		260,096 单图最大16384	-
qwen3-vl-flash 当前与qwen3-vl-flash-2025-10-15能力相同 Batch调用半价	稳定版	思考		258,048 单图最大16384	81,920
		非思考		260,096 单图最大16384	-
qwen3-vl-flash-2025-10-15	快照版	思考		258,048 单图最大16384	81,920
		非思考		260,096 单图最大16384	-

以上模型根据本次请求输入的 Token数，采取阶梯计费。思考模式与非思考模式的输入输出价格相同。

qwen3-vl-plus系列

单次请求的输入Token数	输入价格（每千Token）	输出价格（每千Token）
0<Token≤32K	0.001元	0.01元
32K<Token≤128K	0.0015元	0.015元
128K<Token≤256K	0.003元	0.03元

qwen3-vl-flash系列

单次请求的输入Token数	输入价格（每千Token）	输出价格（每千Token）
0<Token≤32K	0.00015元	0.0015元
32K<Token≤128K	0.0003元	0.003元
128K<Token≤256K	0.0006元	0.006元

更多模型

通义千问VL-Max系列

模型名称	版本	上下文长度	最大输入	最大输出	输入成本	输出成本	免费额度（注）
		（Token数）			（每千Token）
qwen-vl-max 相比qwen-vl-plus再次提升视觉推理和指令遵循能力，在更多复杂任务中提供最佳性能。当前与qwen-vl-max-2025-08-13能力相同	稳定版	131,072	129,024 单图最大16384	8,192	0.0016元 Batch调用半价	0.004元 Batch调用半价	各100万Token 有效期：百炼开通后90天内
qwen-vl-max-latest 始终与最新快照版能力相同	最新版
qwen-vl-max-2025-08-13 又称qwen-vl-max-0813 视觉理解指标全面提升，数学、推理、物体识别、多语言处理能力显著增强。	快照版				0.0016元	0.004元
qwen-vl-max-2025-04-08 又称qwen-vl-max-0408 增强数学和推理能力					0.003元	0.009元
qwen-vl-max-2025-04-02 又称qwen-vl-max-0402 显著提高解决复杂数学问题的准确性
qwen-vl-max-2025-01-25 又称qwen-vl-max-0125 升级至Qwen2.5-VL系列，扩展上下文至128k，显著增强图像和视频的理解能力
qwen-vl-max-2024-12-30 又称qwen-vl-max-1230		32,768	30,720 单图最大16384	2,048	0.003元	0.009元
qwen-vl-max-2024-11-19 又称qwen-vl-max-1119
qwen-vl-max-2024-10-30 又称qwen-vl-max-1030					0.02元
qwen-vl-max-2024-08-09 又称qwen-vl-max-0809

通义千问VL-Plus系列

模型名称	版本	上下文长度	最大输入	最大输出	输入成本	输出成本	免费额度（注）
		（Token数）			（每千Token）
qwen-vl-plus 当前与qwen-vl-plus-2025-08-15能力相同	稳定版	131,072	129,024 单图最大16384	8,192	0.0008元 Batch调用半价	0.002元 Batch调用半价	各100万Token 有效期：百炼开通后90天内
qwen-vl-plus-latest 始终与最新快照版能力相同	最新版
qwen-vl-plus-2025-08-15 又称qwen-vl-plus-0815 在物体识别与定位、多语言处理的能力上有显著提升	快照版				0.0008元	0.002元
qwen-vl-plus-2025-07-10 又称qwen-vl-plus-0710 进一步提升监控视频内容的理解能力		32,768	30,720 单图最大16384		0.00015元	0.0015元
qwen-vl-plus-2025-05-07 又称qwen-vl-plus-0507 显著提升数学、推理、监控视频内容的理解能力		131,072	129,024 单图最大16384		0.0015元	0.0045元
qwen-vl-plus-2025-01-25 又称qwen-vl-plus-0125 升级至Qwen2.5-VL系列，扩展上下文至128k，显著增强图像和视频理解能力
qwen-vl-plus-2025-01-02 又称qwen-vl-plus-0102		32,768	30,720 单图最大16384	2,048	0.0015元	0.0045元
qwen-vl-plus-2024-08-09 又称qwen-vl-plus-0809

开源版模型

模型名称	模式	上下文长度	最大输入	最大思维链长度	最大回复长度	输入成本	输出成本思维链+输出	免费额度（注）
模型名称	模式	（Token数）				（每千Token）		免费额度（注）
qwen3-vl-235b-a22b-thinking	仅思考模式	131,072	126,976	81,920	32,768	0.002元	0.02元	各100万 Token 有效期：百炼开通后90天内
qwen3-vl-235b-a22b-instruct	仅非思考模式		129,024	-		0.002元	0.008元
qwen3-vl-32b-thinking	仅思考模式		126,976	81,920		0.002元	0.02元
qwen3-vl-32b-instruct	仅非思考模式		129,024	-		0.002元	0.008元
qwen3-vl-30b-a3b-thinking	仅思考模式		126,976	81,920		0.00075元	0.0075元
qwen3-vl-30b-a3b-instruct	仅非思考模式		129,024	-		0.00075元	0.003元
qwen3-vl-8b-thinking	仅思考模式		126,976	81,920		0.0005元	0.005元
qwen3-vl-8b-instruct	仅非思考模式		129,024	-		0.0005元	0.002元

更多模型

模型名称	上下文长度	最大输入	最大输出	输入成本	输出成本	免费额度（注）
	（Token数）			（每千Token）
qwen2.5-vl-72b-instruct	131,072	129,024 单图最大16384	8,192	0.016元	0.048元	各100万Token 有效期：百炼开通后90天内
qwen2.5-vl-32b-instruct				0.008元	0.024元
qwen2.5-vl-7b-instruct				0.002元	0.005元
qwen2.5-vl-3b-instruct				0.0012元	0.0036元
qwen2-vl-72b-instruct	32,768	30,720 单图最大16384	2,048	0.016元	0.048元
qwen2-vl-7b-instruct	32,000	30,000 单图最大16384	2,000	目前仅供免费体验。免费额度用完后不可调用，建议改用qwen-vl-max、qwen-vl-plus模型。		各10万Token 有效期：百炼开通后90天内
qwen2-vl-2b-instruct				限时免费
qwen-vl-v1	8,000	6,000 单图最大1280	1,500	目前仅供免费体验。免费额度用完后不可调用，建议改用qwen-vl-max、qwen-vl-plus模型。
qwen-vl-chat-v1

国际（新加坡）

商业版模型

模型名称	版本	模式	上下文长度	最大输入	最长思维链	最大输出	输入成本	输出成本思维链+输出	免费额度（注）
			（Token数）				（每千Token）
qwen3-vl-plus 当前与qwen3-vl-plus-2025-09-23能力相同	稳定版	思考	262,144	258,048 单图最大16384	81,920	32,768	阶梯计价，请参见表格下方说明。		无免费额度
		非思考		260,096 单图最大16384	-
qwen3-vl-plus-2025-09-23	快照版	思考		258,048 单图最大16384	81,920
		非思考		260,096 单图最大16384	-
qwen3-vl-flash 当前与qwen3-vl-flash-2025-10-15能力相同	稳定版	思考		258,048 单图最大16384	81,920
		非思考		260,096 单图最大16384	-
qwen3-vl-flash-2025-10-15	快照版	思考		258,048 单图最大16384	81,920
		非思考		260,096 单图最大16384	-

以上模型根据本次请求输入的 Token数，采取阶梯计费。思考模式与非思考模式的输入输出价格相同。

qwen3-vl-plus系列

单次请求的输入Token数	输入价格（每千Token）	输出价格（每千Token）
0<Token≤32K	0.001468元	0.011743元
32K<Token≤128K	0.002202元	0.017614 元
128K<Token≤256K	0.004404元	0.035228元

qwen3-vl-flash系列

单次请求的输入Token数	输入价格（每千Token）	输出价格（每千Token）
0<Token≤32K	0.000367元	0.002936元
32K<Token≤128K	0.00055元	0.004404元
128K<Token≤256K	0.000881元	0.007046元

更多模型

通义千问VL-Max系列

模型名称	版本	上下文长度	最大输入	最大输出	输入成本	输出成本	免费额度（注）
		（Token数）			（每千Token）
qwen-vl-max 相比qwen-vl-plus再次提升视觉推理和指令遵循能力，在更多复杂任务中提供最佳性能。当前与qwen-vl-max-2025-08-13能力相同	稳定版	131,072	129,024 单图最大16384	8,192	0.005871元 Batch调用半价	0.023486 Batch调用半价	无免费额度
qwen-vl-max-latest 始终与最新快照版能力相同	最新版				0.005871元	0.023486元
qwen-vl-max-2025-08-13 又称qwen-vl-max-0813 视觉理解指标全面提升，数学、推理、物体识别、多语言处理能力显著增强。	快照版
qwen-vl-max-2025-04-08 又称qwen-vl-max-0408 属于Qwen2.5-VL系列模型，扩展上下文至128k，显著增强数学和推理能力。

通义千问VL-Plus系列

模型名称	版本	上下文长度	最大输入	最大输出	输入成本	输出成本	免费额度（注）
		（Token数）			（每千Token）
qwen-vl-plus 当前与qwen-vl-plus-2025-08-15能力相同	稳定版	131,072	129,024 单图最大16384	8,192	0.001541元 Batch调用半价	0.004624元 Batch调用半价	无免费额度
qwen-vl-plus-latest 始终与最新快照版能力相同	最新版				0.001541元	0.004624元
qwen-vl-plus-2025-08-15 又称qwen-vl-plus-0815 在物体识别与定位、多语言处理的能力上有显著提升	快照版
qwen-vl-plus-2025-05-07 又称qwen-vl-plus-0507 显著提升数学、推理、监控视频内容的理解能力
qwen-vl-plus-2025-01-25 又称qwen-vl-plus-0125 属于Qwen2.5-VL系列模型，扩展上下文至128k，显著增强图像和视频的理解能力。

开源版模型

模型名称	模式	上下文长度	最大输入	最大思维链长度	最大回复长度	输入成本	输出成本思维链+输出	免费额度（注）
模型名称	模式	（Token数）				（每千Token）		免费额度（注）
qwen3-vl-235b-a22b-thinking	仅思考模式		126,976	81,920		0.005137元	0.061650元	无免费额度
qwen3-vl-235b-a22b-instruct	仅非思考模式		129,024	-		0.005137元	0.020550元
qwen3-vl-32b-thinking	仅思考模式	131,072	126,976	81,920	32,768	0.005137元	0.06165元
qwen3-vl-32b-instruct	仅非思考模式		129,024	-		0.005137元	0.02055元
qwen3-vl-30b-a3b-thinking	仅思考模式		126,976	81,920		0.001468元	0.017614元
qwen3-vl-30b-a3b-instruct	仅非思考模式		129,024	-		0.001468元	0.005871元
qwen3-vl-8b-thinking	仅思考模式		126,976	81,920		0.001321元	0.015412元
qwen3-vl-8b-instruct	仅非思考模式		129,024	-		0.001321元	0.005137元

更多模型

模型名称	上下文长度	最大输入	最大输出	输入成本	输出成本	免费额度（注）
	（Token数）			（每百万Token）
qwen2.5-vl-72b-instruct	131,072	129,024 单图最大16384	8,192	0.02055元	0.06165元	无免费额度
qwen2.5-vl-32b-instruct				0.010275元	0.030825元
qwen2.5-vl-7b-instruct				0.002569元	0.007706元
qwen2.5-vl-3b-instruct				0.001541元	0.004624元

图像与视频转换为Token的规则

图像

Qwen3-VL、qwen-vl-max-0813以后及qwen-vl-plus-0710以后更新的模型：每 32x32 像素对应一个Token，一张图最少需要4个Token。
其余模型：每28x28像素对应一个Token，一张图最少需要4个Token。

您可以通过以下代码估算图像的Token：

import math
# 使用以下命令安装Pillow库：pip install Pillow
from PIL import Image

def token_calculate(image_path):
    # 打开指定的PNG图片文件
    image = Image.open(image_path)

    # 获取图片的原始尺寸
    height = image.height
    width = image.width
    
    # Qwen3-VL、qwen-vl-max-0815、qwen-vl-max-0813以后和qwen-vl-plus-0815以后更新的模型：将宽高都调整为32的整数倍以后更新的模型：将宽高都调整为32的整数倍
    # 其余模型：将宽高都调整为28的整数倍
    h_bar = round(height / 32) * 32 
    w_bar = round(width / 32) * 32
    
    # 图像的Token下限：4个Token
    min_pixels = 32 * 32 * 4
    # 图像的Token上限：1280个Token
    max_pixels = 1280 * 32 * 32
        
    # 对图像进行缩放处理，调整像素的总数在范围[min_pixels,max_pixels]内
    if h_bar * w_bar > max_pixels:
        # 计算缩放因子beta，使得缩放后的图像总像素数不超过max_pixels
        beta = math.sqrt((height * width) / max_pixels)
        # 重新计算调整后的宽高，对于Qwen3-VL、qwen-vl-max-0815以后及qwen-vl-plus-0815以后更新的模型：将宽高都调整为32的整数倍，对于其他模型，确保为28的整数倍
        h_bar = math.floor(height / beta / 32) * 32
        w_bar = math.floor(width / beta / 32) * 32
    elif h_bar * w_bar < min_pixels:
        # 计算缩放因子beta，使得缩放后的图像总像素数不低于min_pixels
        beta = math.sqrt(min_pixels / (height * width))
        # 重新计算调整后的高度，对于Qwen3-VL、qwen-vl-max-0815以后及qwen-vl-plus-0815以后更新的模型，确保为32的整数倍，对于其他模型，确保为28的整数倍
        h_bar = math.ceil(height * beta / 32) * 32
        w_bar = math.ceil(width * beta / 32) * 32
    return h_bar, w_bar

# 将test.png替换为本地的图像路径
h_bar, w_bar = token_calculate("test.png")
print(f"缩放后的图像尺寸为：高度为{h_bar}，宽度为{w_bar}")

# 计算图像的Token数：对于Qwen3-VL、qwen-vl-max-0815以后及qwen-vl-plus-0815以后更新的模型，Token数 = 总像素除以32 * 32，对于其他模型，Token数 = 总像素除以28 * 28
token = int((h_bar * w_bar) / (32 * 32))

# 系统会自动添加<|vision_bos|>和<|vision_eos|>视觉标记（各计1个Token）
print(f"图像的Token数为{token + 2}")

视频

您可以通过以下代码估算视频的Token：

# 使用前安装：pip install opencv-python
import math
import os
import logging
import cv2

logger = logging.getLogger(__name__)

FRAME_FACTOR = 2
# 对于Qwen3-VL、qwen-vl-max-0813以后及qwen-vl-plus-0815qwen-vl-plus-0710，IMAGE_FACTOR为32
# 对于其他模型，IMAGE_FACTOR为28
IMAGE_FACTOR = 28
# 视频帧的长宽比
MAX_RATIO = 200

# 视频帧的 Token 下限
VIDEO_MIN_PIXELS = 128 * 32 * 32
# 视频帧的 Token 上限
VIDEO_MAX_PIXELS = 768 * 32 * 32

# 用户未传入FPS参数，则fps使用默认值
FPS = 2.0
# 最少抽取帧数
FPS_MIN_FRAMES = 4
# 最大抽取帧数，使用Qwen3-VL和Qwen2.5-vl模型时，请FPS_MAX_FRAMES将设置为512，其他模型则设置为80
FPS_MAX_FRAMES = 512

# 视频输入的最大像素值，
# 使用Qwen3-VL和Qwen2.5-vl模型时，请将VIDEO_TOTAL_PIXELS设置为65536 * 32 * 32和65536 * 28 * 28，其他模型则设置为24576 * 28 * 28
VIDEO_TOTAL_PIXELS = int(float(os.environ.get('VIDEO_MAX_PIXELS', 65536 * 28 * 28)))

def round_by_factor(number: int, factor: int) -> int:
    """返回与”number“最接近的整数，该整数可被”factor“整除。"""
    return round(number / factor) * factor

def ceil_by_factor(number: int, factor: int) -> int:
    """返回大于或等于“number”且可被“factor”整除的最小整数。"""
    return math.ceil(number / factor) * factor

def floor_by_factor(number: int, factor: int) -> int:
    """返回小于或等于“number”且可被“factor”整除的最大整数。"""
    return math.floor(number / factor) * factor

def smart_nframes(ele,total_frames,video_fps):
    """用于计算抽取的视频帧数。

    Args:
        ele (dict): 包含视频配置的字典格式
            - fps: fps用于控制提取模型输入帧的数量。
        total_frames (int): 视频的原始总帧数。
        video_fps (int | float): 视频的原始帧率

    Raises:
        nframes应该在[FRAME_FACTOR，total_frames]间隔内，否则会报错

    Returns:
        用于模型输入的视频帧数。
    """
    assert not ("fps" in ele and "nframes" in ele), "Only accept either `fps` or `nframes`"
    fps = ele.get("fps", FPS)
    min_frames = ceil_by_factor(ele.get("min_frames", FPS_MIN_FRAMES), FRAME_FACTOR)
    max_frames = floor_by_factor(ele.get("max_frames", min(FPS_MAX_FRAMES, total_frames)), FRAME_FACTOR)
    duration = total_frames / video_fps if video_fps != 0 else 0
    if duration-int(duration)>(1/fps):
        total_frames = math.ceil(duration * video_fps)
    else:
        total_frames = math.ceil(int(duration)*video_fps)
    nframes = total_frames / video_fps * fps
    if nframes > total_frames:
        logger.warning(f"smart_nframes: nframes[{nframes}] > total_frames[{total_frames}]")
    nframes = int(min(min(max(nframes, min_frames), max_frames), total_frames))
    if not (FRAME_FACTOR <= nframes and nframes <= total_frames):
        raise ValueError(f"nframes should in interval [{FRAME_FACTOR}, {total_frames}], but got {nframes}.")

    return nframes

def get_video(video_path):
    # 获取视频信息
    cap = cv2.VideoCapture(video_path)

    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    # 获取视频高度
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))

    video_fps = cap.get(cv2.CAP_PROP_FPS)
    return frame_height,frame_width,total_frames,video_fps

def smart_resize(ele,path,factor = IMAGE_FACTOR):
    # 获取原视频的宽和高
    height, width, total_frames, video_fps = get_video(path)
    # 视频帧的Token下限
    min_pixels = VIDEO_MIN_PIXELS
    total_pixels = VIDEO_TOTAL_PIXELS
    # 抽取的视频帧数
    nframes = smart_nframes(ele, total_frames, video_fps)
    max_pixels = max(min(VIDEO_MAX_PIXELS, total_pixels / nframes * FRAME_FACTOR),int(min_pixels * 1.05))

    # 视频的长宽比不应超过200:1或1:200
    if max(height, width) / min(height, width) > MAX_RATIO:
        raise ValueError(
            f"absolute aspect ratio must be smaller than {MAX_RATIO}, got {max(height, width) / min(height, width)}"
        )

    h_bar = max(factor, round_by_factor(height, factor))
    w_bar = max(factor, round_by_factor(width, factor))
    if h_bar * w_bar > max_pixels:
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = floor_by_factor(height / beta, factor)
        w_bar = floor_by_factor(width / beta, factor)
    elif h_bar * w_bar < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = ceil_by_factor(height * beta, factor)
        w_bar = ceil_by_factor(width * beta, factor)
    return h_bar, w_bar


def token_calculate(video_path, fps):
    # 传入视频路径和fps抽帧参数
    messages = [{"content": [{"video": video_path, "fps":fps}]}]
    vision_infos = extract_vision_info(messages)[0]

    resized_height, resized_width=smart_resize(vision_infos,video_path)

    height, width, total_frames,video_fps = get_video(video_path)
    num_frames = smart_nframes(vision_infos,total_frames,video_fps)
    print(f"原视频尺寸：{height}*{width}，输入模型的尺寸：{resized_height}*{resized_width}，视频总帧数:{total_frames}，fps等于{fps}时，抽取的总帧数：{num_frames}",end="，")
    video_token = int(math.ceil(num_frames / 2) * resized_height / 28 * resized_width / 28)
    video_token += 2 # 系统会自动添加<|vision_bos|>和<|vision_eos|>视觉标记（各计1个Token）
    return video_token

def extract_vision_info(conversations):
    vision_infos = []
    if isinstance(conversations[0], dict):
        conversations = [conversations]
    for conversation in conversations:
        for message in conversation:
            if isinstance(message["content"], list):
                for ele in message["content"]:
                    if (
                        "image" in ele
                        or "image_url" in ele
                        or "video" in ele
                        or ele.get("type","") in ("image", "image_url", "video")
                    ):
                        vision_infos.append(ele)
    return vision_infos


video_token = token_calculate("xxx/test.mp4", 1)
print("视频tokens:", video_token)

模型选型建议

Qwen3-VL-Plus模型的视觉理解能力最强，Qwen-VL-Max（基于Qwen2.5-VL）次之；Qwen-VL-Plus（基于Qwen2.5-VL）模型在效果、成本上比较均衡，如果您暂时不确定使用某种模型，可以优先尝试使用通义千问Qwen3-VL-Plus模型。
若图像中涉及复杂的数学推理问题，建议开启Qwen3-VL-Plus的Thinking模式解决。开启思考模式的Qwen3-VL-Plus模型支持视觉输入及思维链输出，在数学、编程、视觉分析、创作以及通用任务上表现更强。
若处理文字提取任务，可使用通义千问OCR模型解决。通义千问OCR是文字提取专有模型，能够识别多种文字，专注于文档、表格、试题、手写体文字等类型图像的文字提取能力。

快速开始

前提条件

已获取API Key并配置API Key到环境变量。
如果通过 SDK 进行调用，需安装最新版SDK，其中 DashScope Python SDK 版本不低于1.24.6，DashScope Java SDK 版本不低于 2.21.10。

下面是理解在线图像（通过URL指定，非本地图像）的示例代码。了解如何传入本地文件和图像限制。

OpenAI兼容

Python

import os
from openai import OpenAI

client = OpenAI(
    # 若没有配置环境变量，请用阿里云百炼API Key将下行替换为：api_key="sk-xxx",
    # 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3-vl-plus", # 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
                    },
                },
                {"type": "text", "text": "图中描绘的是什么景象?"},
            ],
        },
    ],
)
print(completion.choices[0].message.content)

返回结果

这是一张在海滩上拍摄的照片。照片中，一个人和一只狗坐在沙滩上，背景是大海和天空。人和狗似乎在互动，狗的前爪搭在人的手上。阳光从画面的右侧照射过来，给整个场景增添了一种温暖的氛围。

Node.js

import OpenAI from "openai";

const openai = new OpenAI({
  // 若没有配置环境变量，请用百炼API Key将下行替换为：apiKey: "sk-xxx"
 // 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
  apiKey: process.env.DASHSCOPE_API_KEY,
  // 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1
  baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
});

async function main() {
  const response = await openai.chat.completions.create({
    model: "qwen3-vl-plus",  // 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
    messages: [
      {
        role: "user",
        content: [{
            type: "image_url",
            image_url: {
              "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
            }
          },
          {
            type: "text",
            text: "图中描绘的是什么景象?"
          }
        ]
      }
    ]
  });
  console.log(response.choices[0].message.content);
}
main()

返回结果

这是一张在海滩上拍摄的照片。照片中，一位穿着格子衬衫的女性坐在沙滩上，与一只戴着项圈的黄色拉布拉多犬互动。背景是大海和天空，阳光洒在她们身上，营造出温暖的氛围。

curl

# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
# === 执行时请删除该注释 ===

curl --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
  "model": "qwen3-vl-plus",
  "messages": [
  {
    "role": "user",
    "content": [
      {"type": "image_url", "image_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"}},
      {"type": "text", "text": "图中描绘的是什么景象?"}
    ]
  }]
}'

返回结果

{
  "choices": [
    {
      "message": {
        "content": "这张图片展示了一位女士和一只狗在海滩上互动。女士坐在沙滩上，微笑着与狗握手。背景是大海和天空，阳光洒在她们身上，营造出温暖的氛围。狗戴着项圈，显得很温顺。",
        "role": "assistant"
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null
    }
  ],
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 1270,
    "completion_tokens": 54,
    "total_tokens": 1324
  },
  "created": 1725948561,
  "system_fingerprint": null,
  "model": "qwen3-vl-plus",
  "id": "chatcmpl-0fd66f46-b09e-9164-a84f-3ebbbedbac15"
}

DashScope

Python

import os
import dashscope
# 若使用新加坡地域的模型，请取消下列注释
# dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

messages = [
{
    "role": "user",
    "content": [
    {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},
    {"text": "图中描绘的是什么景象?"}]
}]
response = dashscope.MultiModalConversation.call(
    # 若没有配置环境变量， 请用百炼API Key将下行替换为： api_key ="sk-xxx"
    # 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
    api_key = os.getenv('DASHSCOPE_API_KEY'),
    model = 'qwen3-vl-plus',  # 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
    messages = messages
)
print(response.output.choices[0].message.content[0]["text"])

返回结果

是一张在海滩上拍摄的照片。照片中有一位女士和一只狗。女士坐在沙滩上，微笑着与狗互动。狗戴着项圈，似乎在与女士握手。背景是大海和天空，阳光洒在她们身上，营造出温馨的氛围。

Java

import java.util.Arrays;
import java.util.Collections;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
import com.alibaba.dashscope.utils.Constants;

public class Main {

// 若使用新加坡地域的模型，请取消下列注释
//    static {Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";}
        
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"),
                        Collections.singletonMap("text", "图中描绘的是什么景象?"))).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                 // 若没有配置环境变量，请用百炼API Key将下行替换为：.apiKey("sk-xxx")
                 // 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen3-vl-plus")  // 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
                .messages(Arrays.asList(userMessage))
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }
    public static void main(String[] args) {
        try {
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

返回结果

这是一张在海滩上拍摄的照片。照片中有一个穿着格子衬衫的人和一只戴着项圈的狗。人和狗面对面坐着，似乎在互动。背景是大海和天空，阳光洒在他们身上，营造出温暖的氛围。

curl

# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下为北京地域url，若使用新加坡地域的模型，需将url替换为：https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# === 执行时请删除该注释 ===

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen3-vl-plus",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
                    {"text": "图中描绘的是什么景象?"}
                ]
            }
        ]
    }
}'

返回结果

{
  "output": {
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": [
            {
              "text": "这是一张在海滩上拍摄的照片。照片中有一个穿着格子衬衫的人和一只戴着项圈的狗。他们坐在沙滩上，背景是大海和天空。阳光从画面的右侧照射过来，给整个场景增添了一种温暖的氛围。"
            }
          ]
        }
      }
    ]
  },
  "usage": {
    "output_tokens": 55,
    "input_tokens": 1271,
    "image_tokens": 1247
  },
  "request_id": "ccf845a3-dc33-9cda-b581-20fe7dc23f70"
}

开启/关闭思考模式

qwen3-vl-plus、qwen3-vl-flash系列模型属于混合思考模型，模型可以在思考后回复，也可直接回复；通过enable_thinking参数控制是否开启思考模式：
- true：开启思考模式
- false（默认）：关闭思考模式
qwen3-vl-235b-a22b-thinking等带thinking后缀的属于仅思考模型，模型总会在回复前进行思考，且无法关闭。

重要

模型配置：为保持最佳效果，不建议对Qwen3-VL设置 System Message 和修改默认超参数。
优先使用流式输出： 开启思考模式时，支持流式和非流式两种输出方式。为避免超时，建议您优先使用流式输出方式。
限制思考长度：深度思考模型有时会输出冗长的推理过程，可使用 thinking_budget 参数限制思考过程的长度。若模型思考过程生成的 Token 数超过thinking_budget，推理内容会进行截断并立刻开始生成最终回复内容。thinking_budget 默认值为模型的最大思维链长度，请参见模型列表与价格。

OpenAI 兼容

enable_thinking非 OpenAI 标准参数，若使用 OpenAI Python SDK 请通过 extra_body传入.

Python

from openai import OpenAI
import os

# 初始化OpenAI客户端
client = OpenAI(
    # 若没有配置环境变量，请用阿里云百炼API Key将下行替换为：api_key="sk-xxx",
    # 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

reasoning_content = ""  # 定义完整思考过程
answer_content = ""     # 定义完整回复
is_answering = False   # 判断是否结束思考过程并开始回复
enable_thinking = True
# 创建聊天完成请求
completion = client.chat.completions.create(
    model="qwen3-vl-plus",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"
                    },
                },
                {"type": "text", "text": "这道题怎么解答？"},
            ],
        },
    ],
    stream=True,
    # enable_thinking 参数开启思考过程，thinking_budget 参数设置最大推理过程 Token 数
    # qwen3-vl-plus、 qwen3-vl-plus-2025-09-23可通过enable_thinking开启或关闭思考、对于qwen3-vl-235b-a22b-thinking，enable_thinking仅支持开启，其他Qwen-VL模型均不适用
    extra_body={
        'enable_thinking': enable_thinking,
        "thinking_budget": 500},

    # 解除以下注释会在最后一个chunk返回Token使用量
    # stream_options={
    #     "include_usage": True
    # }
)

if enable_thinking:
    print("\n" + "=" * 20 + "思考过程" + "=" * 20 + "\n")

for chunk in completion:
    # 如果chunk.choices为空，则打印usage
    if not chunk.choices:
        print("\nUsage:")
        print(chunk.usage)
    else:
        delta = chunk.choices[0].delta
        # 打印思考过程
        if hasattr(delta, 'reasoning_content') and delta.reasoning_content != None:
            print(delta.reasoning_content, end='', flush=True)
            reasoning_content += delta.reasoning_content
        else:
            # 开始回复
            if delta.content != "" and is_answering is False:
                print("\n" + "=" * 20 + "完整回复" + "=" * 20 + "\n")
                is_answering = True
            # 打印回复过程
            print(delta.content, end='', flush=True)
            answer_content += delta.content

# print("=" * 20 + "完整思考过程" + "=" * 20 + "\n")
# print(reasoning_content)
# print("=" * 20 + "完整回复" + "=" * 20 + "\n")
# print(answer_content)

Node.js

import OpenAI from "openai";

// 初始化 openai 客户端
const openai = new OpenAI({
     // 若没有配置环境变量，请用阿里云百炼API Key将下行替换为：api_key="sk-xxx",
    // 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
    apiKey: process.env.DASHSCOPE_API_KEY, 
    // 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1'
});

let reasoningContent = '';
let answerContent = '';
let isAnswering = false;
let enableThinking = true;

let messages = [
    {
        role: "user",
        content: [
        { type: "image_url", image_url: { "url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg" } },
        { type: "text", text: "解答这道题" },
    ]
}]

async function main() {
    try {
        const stream = await openai.chat.completions.create({
            model: 'qwen3-vl-plus',
            messages: messages,
            stream: true,
          // 注意：在 Node.js SDK，enableThinking 这样的非标准参数作为顶层属性传递的，无需放在 extra_body 中
          enable_thinking: enableThinking,
          thinking_budget: 500

        });

        if (enableThinking){console.log('\n' + '='.repeat(20) + '思考过程' + '='.repeat(20) + '\n');}

        for await (const chunk of stream) {
            if (!chunk.choices?.length) {
                console.log('\nUsage:');
                console.log(chunk.usage);
                continue;
            }

            const delta = chunk.choices[0].delta;

            // 处理思考过程
            if (delta.reasoning_content) {
                process.stdout.write(delta.reasoning_content);
                reasoningContent += delta.reasoning_content;
            }
            // 处理正式回复
            else if (delta.content) {
                if (!isAnswering) {
                    console.log('\n' + '='.repeat(20) + '完整回复' + '='.repeat(20) + '\n');
                    isAnswering = true;
                }
                process.stdout.write(delta.content);
                answerContent += delta.content;
            }
        }
    } catch (error) {
        console.error('Error:', error);
    }
}

main();

curl

# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
# === 执行时请删除该注释 ===

curl --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen3-vl-plus",
    "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"
          }
        },
        {
          "type": "text",
          "text": "请解答这道题"
        }
      ]
    }
  ],
    "stream":true,
    "stream_options":{"include_usage":true},
    "enable_thinking": true,
    "thinking_budget": 500
}'

DashScope

Python

import os
import dashscope
from dashscope import MultiModalConversation

# 若使用新加坡地域的模型，请取消下列注释
# dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"
enable_thinking=True
messages = [
    {
        "role": "user",
        "content": [
            {"image": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"},
            {"text": "解答这道题？"}
        ]
    }
]

response = MultiModalConversation.call(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx",
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model="qwen3-vl-plus",
    messages=messages,
    stream=True,
    # enable_thinking 参数开启思考过程
    # qwen3-vl-plus、 qwen3-vl-plus-2025-09-23可通过enable_thinking开启或关闭思考、对于qwen3-vl-235b-a22b-thinking，enable_thinking仅支持开启；其他Qwen-VL模型均不适用
    enable_thinking=enable_thinking,
    # thinking_budget 参数设置最大推理过程 Token 数，仅对qwen-vl-plus、 qwen3-vl-plus-2025-09-23，qwen3-vl-235b-a22b-thinking
    thinking_budget=50,

)

# 定义完整思考过程
reasoning_content = ""
# 定义完整回复
answer_content = ""
# 判断是否结束思考过程并开始回复
is_answering = False
if enable_thinking:
    print("=" * 20 + "思考过程" + "=" * 20)

for chunk in response:
    # 如果思考过程与回复皆为空，则忽略
    message = chunk.output.choices[0].message
    reasoning_content_chunk = message.get("reasoning_content", None)
    if (chunk.output.choices[0].message.content == [] and
        reasoning_content_chunk == ""):
        pass
    else:
        # 如果当前为思考过程
        if reasoning_content_chunk != None and chunk.output.choices[0].message.content == []:
            print(chunk.output.choices[0].message.reasoning_content, end="")
            reasoning_content += chunk.output.choices[0].message.reasoning_content
        # 如果当前为回复
        elif chunk.output.choices[0].message.content != []:
            if not is_answering:
                print("\n" + "=" * 20 + "完整回复" + "=" * 20)
                is_answering = True
            print(chunk.output.choices[0].message.content[0]["text"], end="")
            answer_content += chunk.output.choices[0].message.content[0]["text"]

# 如果您需要打印完整思考过程与完整回复，请将以下代码解除注释后运行
# print("=" * 20 + "完整思考过程" + "=" * 20 + "\n")
# print(f"{reasoning_content}")
# print("=" * 20 + "完整回复" + "=" * 20 + "\n")
# print(f"{answer_content}")

Java

// dashscope SDK的版本 >= 2.21.10
import java.util.*;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.exception.InputRequiredException;
import java.lang.System;
import com.alibaba.dashscope.utils.Constants;

public class Main {

    // 若使用新加坡地域的模型，请取消下列注释
    //  static {Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";}

    private static final Logger logger = LoggerFactory.getLogger(Main.class);
    private static StringBuilder reasoningContent = new StringBuilder();
    private static StringBuilder finalContent = new StringBuilder();
    private static boolean isFirstPrint = true;

    private static void handleGenerationResult(MultiModalConversationResult message) {
        String re = message.getOutput().getChoices().get(0).getMessage().getReasoningContent();
        String reasoning = Objects.isNull(re)?"":re; // 默认值

        List<Map<String, Object>> content = message.getOutput().getChoices().get(0).getMessage().getContent();
        if (!reasoning.isEmpty()) {
            reasoningContent.append(reasoning);
            if (isFirstPrint) {
                System.out.println("====================思考过程====================");
                isFirstPrint = false;
            }
            System.out.print(reasoning);
        }

        if (Objects.nonNull(content) && !content.isEmpty()) {
            Object text = content.get(0).get("text");
            finalContent.append(content.get(0).get("text"));
            if (!isFirstPrint) {
                System.out.println("\n====================完整回复====================");
                isFirstPrint = true;
            }
            System.out.print(text);
        }
    }
    public static MultiModalConversationParam buildMultiModalConversationParam(MultiModalMessage Msg)  {
        return MultiModalConversationParam.builder()
                // 若没有配置环境变量，请用百炼API Key将下行替换为：.apiKey("sk-xxx")
                // 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen3-vl-plus")
                .messages(Arrays.asList(Msg))
                .enableThinking(true)
                .thinkingBudget(500)
                .incrementalOutput(true)
                .build();
    }

    public static void streamCallWithMessage(MultiModalConversation conv, MultiModalMessage Msg)
            throws NoApiKeyException, ApiException, InputRequiredException, UploadFileException {
        MultiModalConversationParam param = buildMultiModalConversationParam(Msg);
        Flowable<MultiModalConversationResult> result = conv.streamCall(param);
        result.blockingForEach(message -> {
            handleGenerationResult(message);
        });
    }
    public static void main(String[] args) {
        try {
            MultiModalConversation conv = new MultiModalConversation();
            MultiModalMessage userMsg = MultiModalMessage.builder()
                    .role(Role.USER.getValue())
                    .content(Arrays.asList(Collections.singletonMap("image", "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"),
                            Collections.singletonMap("text", "请解答这道题")))
                    .build();
            streamCallWithMessage(conv, userMsg);
//             打印最终结果
//            if (reasoningContent.length() > 0) {
//                System.out.println("\n====================完整回复====================");
//                System.out.println(finalContent.toString());
//            }
        } catch (ApiException | NoApiKeyException | UploadFileException | InputRequiredException e) {
            logger.error("An exception occurred: {}", e.getMessage());
        }
        System.exit(0);
    }
}

curl

# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1
# === 执行时请删除该注释 ===

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
    "model": "qwen3-vl-plus",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"},
                    {"text": "请解答这道题"}
                ]
            }
        ]
    },
    "parameters":{
        "enable_thinking": true,
        "incremental_output": true,
        "thinking_budget": 50
    }
}'

多轮对话（参考历史对话信息）

通义千问VL模型可以参考历史对话信息实现多轮对话，您需要维护一个messages 数组，将每一轮的对话历史以及新的指令添加到 messages 数组中。

OpenAI兼容

Python

from openai import OpenAI
import os

client = OpenAI(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx" 
    # 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)
messages = [
        {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
                },
            },
            {"type": "text", "text": "图中描绘的是什么景象？"},
        ],
    }
]
completion = client.chat.completions.create(
    model="qwen3-vl-plus",  # 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
    messages=messages,
    )
print(f"第一轮输出：{completion.choices[0].message.content}")
assistant_message = completion.choices[0].message
messages.append(assistant_message.model_dump())
messages.append({
        "role": "user",
        "content": [
        {
            "type": "text",
            "text": "做一首诗描述这个场景"
        }
        ]
    })
completion = client.chat.completions.create(
    model="qwen3-vl-plus",
    messages=messages,
    )
print(f"第二轮输出：{completion.choices[0].message.content}")

返回结果

第一轮输出：这是一张在海滩上拍摄的照片。照片中，一位穿着格子衬衫的女士坐在沙滩上，与一只戴着项圈的金毛犬互动。背景是大海和天空，阳光洒在她们身上，营造出温暖的氛围。
第二轮输出：沙滩上，阳光洒，
女子与犬，笑语哗。
海浪轻拍，风儿吹，
快乐时光，心儿醉。

Node.js

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // 若没有配置环境变量，请用百炼API Key将下行替换为：apiKey: "sk-xxx",
       // 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);

let messages = [
    {
        role: "user",
	content: [
        { type: "image_url", image_url: { "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg" } },
        { type: "text", text: "图中描绘的是什么景象？" },
    ]
}]
async function main() {
    let response = await openai.chat.completions.create({
        model: "qwen3-vl-plus",  // 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
        messages: messages
    });
    console.log(`第一轮输出：${response.choices[0].message.content}`);
    messages.push(response.choices[0].message);
    messages.push({"role": "user", "content": "做一首诗描述这个场景"});
    response = await openai.chat.completions.create({
        model: "qwen3-vl-plus",
        messages: messages
    });
    console.log(`第二轮输出：${response.choices[0].message.content}`);
}

main()

返回结果

第一轮输出：这是一张在海滩上拍摄的照片。照片中有一个穿着格子衬衫的人和一只戴着项圈的狗。人和狗面对面坐着，似乎在互动。背景是大海和天空，阳光从画面的右侧照射过来，营造出温暖的氛围。
第二轮输出：沙滩上，人与狗，  
面对面，笑语稠。  
海风轻拂，阳光柔，  
心随波浪，共潮头。  

项圈闪亮，情意浓，  
格子衫下，心相通。  
海天一色，无尽空，  
此刻温馨，永铭中。

curl

# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
# === 执行时请删除该注释 ===

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
  "model": "qwen3-vl-plus",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
          }
        },
        {
          "type": "text",
          "text": "图中描绘的是什么景象？"
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "这是一个女孩和一只狗。"
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "写一首诗描述这个场景"
        }
      ]
    }
  ]
}'

返回结果

{
    "choices": [
        {
            "message": {
                "content": "海风轻拂笑颜开，  \n沙滩上与犬相陪。  \n夕阳斜照人影短，  \n欢乐时光心自醉。",
                "role": "assistant"
            },
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null
        }
    ],
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 1295,
        "completion_tokens": 32,
        "total_tokens": 1327
    },
    "created": 1726324976,
    "system_fingerprint": null,
    "model": "qwen3-vl-plus",
    "id": "chatcmpl-3c953977-6107-96c5-9a13-c01e328b24ca"
}

DashScope

Python

import os
import dashscope 
from dashscope import MultiModalConversation

# 若使用新加坡地域的模型，请取消下列注释
# dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

messages = [
    {
        "role": "user",
        "content": [
            {
                "image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
            },
            {"text": "图中描绘的是什么景象？"},
        ],
    }
]
response = MultiModalConversation.call(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx",
    # 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen3-vl-plus',   # 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
    messages=messages)
print(f"模型第一轮输出：{response.output.choices[0].message.content[0]['text']}")
messages.append(response['output']['choices'][0]['message'])
user_msg = {"role": "user", "content": [{"text": "做一首诗描述这个场景"}]}
messages.append(user_msg)
response = MultiModalConversation.call(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx",
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen3-vl-plus',
    messages=messages)
print(f"模型第二轮输出：{response.output.choices[0].message.content[0]['text']}")

返回结果

模型第一轮输出：这是一张在海滩上拍摄的照片。照片中有一个穿着格子衬衫的人和一只戴着项圈的狗。人和狗面对面坐着，似乎在互动。背景是大海和天空，阳光洒在他们身上，营造出温暖的氛围。
模型第二轮输出：在阳光照耀的海滩上，人与狗共享欢乐时光。

Java

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

public class Main {
    // 若使用新加坡地域的模型，请取消下列注释
   // static {Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";}
    private static final String modelName = "qwen3-vl-plus";  // 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
    public static void MultiRoundConversationCall() throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(Collections.singletonMap("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"),
                        Collections.singletonMap("text", "图中描绘的是什么景象？"))).build();
        List<MultiModalMessage> messages = new ArrayList<>();
        messages.add(userMessage);
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // 若没有配置环境变量，请用百炼API Key将下行替换为：.apiKey("sk-xxx")
                // 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))                
                .model(modelName)
                .messages(messages)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println("第一轮输出："+result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));        // add the result to conversation
        messages.add(result.getOutput().getChoices().get(0).getMessage());
        MultiModalMessage msg = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(Collections.singletonMap("text", "做一首诗描述这个场景"))).build();
        messages.add(msg);
        param.setMessages((List)messages);
        result = conv.call(param);
        System.out.println("第二轮输出："+result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));    }

    public static void main(String[] args) {
        try {
            MultiRoundConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

返回结果

第一轮输出：这是一张在海滩上拍摄的照片。照片中有一个穿着格子衬衫的人和一只戴着项圈的狗。人和狗面对面坐着，似乎在互动。背景是大海和天空，阳光洒在他们身上，营造出温暖的氛围。
第二轮输出：在阳光洒满的海滩上，人与狗共享欢乐时光。

curl

# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1
# === 执行时请删除该注释 ===

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen3-vl-plus",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},
                    {"text": "图中描绘的是什么景象？"}
                ]
            },
            {
                "role": "assistant",
                "content": [
                    {"text": "图中是一名女子和一只拉布拉多犬在沙滩上玩耍。"}
                ]
            },
            {
                "role": "user",
                "content": [
                    {"text": "写一首七言绝句描述这个场景"}
                ]
            }
        ]
    }
}'

返回结果

{
    "output": {
        "choices": [
            {
                "finish_reason": "stop",
                "message": {
                    "role": "assistant",
                    "content": [
                        {
                            "text": "海浪轻拍沙滩边，女孩与狗同嬉戏。阳光洒落笑颜开，快乐时光永铭记。"
                        }
                    ]
                }
            }
        ]
    },
    "usage": {
        "output_tokens": 27,
        "input_tokens": 1298,
        "image_tokens": 1247
    },
    "request_id": "bdf5ef59-c92e-92a6-9d69-a738ecee1590"
}

流式输出

大模型接收到输入后，会逐步生成中间结果，最终结果由这些中间结果拼接而成。这种一边生成一边输出中间结果的方式称为流式输出。采用流式输出时，您可以在模型进行输出的同时阅读，减少等待模型回复的时间。

OpenAI兼容

通过 OpenAI 兼容方式开启流式输出十分方便，只需在请求参数中设置stream参数为true即可。

流式输出默认不会返回本次请求所使用的 Token 量。您可以通过设置stream_options参数为{"include_usage": True}，使最后一个返回的 chunk 包含本次请求所使用的 Token 量。

Python

from openai import OpenAI
import os

client = OpenAI(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx"
    # 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    
    # 以下为北京地域url，若使用新加坡地域的模型，需将url替换为：https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3-vl-plus",  # 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
    messages=[
        {"role": "user",
         "content": [{"type": "image_url",
                    "image_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},},
                    {"type": "text", "text": "图中描绘的是什么景象？"}]}],
    stream=True
)
full_content = ""
print("流式输出内容为：")
for chunk in completion:
    # 如果stream_options.include_usage为True，则最后一个chunk的choices字段为空列表，需要跳过（可以通过chunk.usage获取 Token 使用量）
    if chunk.choices and chunk.choices[0].delta.content != "":
        full_content += chunk.choices[0].delta.content
        print(chunk.choices[0].delta.content)
print(f"完整内容为：{full_content}")

返回结果

流式输出内容为：

图
中
描绘
的是
一个
女人
......
温暖
和谐
的
氛围
。
完整内容为：图中描绘的是一个女人和一只狗在海滩上互动的场景。女人坐在沙滩上，微笑着与狗握手，显得非常开心。背景是大海和天空，阳光洒在她们身上，营造出一种温暖和谐的氛围。

Node.js

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // 若没有配置环境变量，请用百炼API Key将下行替换为：apiKey: "sk-xxx"
        // 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // 以下为北京地域url，若使用新加坡地域的模型，需将url替换为：https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);

const completion = await openai.chat.completions.create({
    model: "qwen3-vl-plus",  // 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
    messages: [
        {role: "user",
        content: [{"type": "image_url",
                    "image_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},},
                    {"type": "text", "text": "图中描绘的是什么景象？"}]}],
    stream: true,
});

let fullContent = ""
console.log("流式输出内容为：")
for await (const chunk of completion) {
    // 如果stream_options.include_usage为true，则最后一个chunk的choices字段为空数组，需要跳过（可以通过chunk.usage获取 Token 使用量）
    if (chunk.choices[0] && chunk.choices[0].delta.content != null) {
      fullContent += chunk.choices[0].delta.content;
      console.log(chunk.choices[0].delta.content);
    }
}
console.log(`完整输出内容为：${fullContent}`)

返回结果

流式输出内容为：

图中描绘的是
一个女人和一只
狗在海滩上
互动的景象。
......
在她们身上，
营造出温暖和谐
的氛围。
完整内容为：图中描绘的是一个女人和一只狗在海滩上互动的景象。女人穿着格子衬衫，坐在沙滩上，微笑着与狗握手。狗戴着项圈，看起来很开心。背景是大海和天空，阳光洒在她们身上，营造出温暖和谐的氛围。

curl

# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
# === 执行时请删除该注释 ===

curl --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen3-vl-plus",
    "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
          }
        },
        {
          "type": "text",
          "text": "图中描绘的是什么景象？"
        }
      ]
    }
  ],
    "stream":true,
    "stream_options":{"include_usage":true}
}'

返回结果

data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen3-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[{"finish_reason":null,"delta":{"content":"图"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen3-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[{"delta":{"content":"中"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen3-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

......

data: {"choices":[{"delta":{"content":"分拍摄的照片。整体氛围显得非常"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen3-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[{"finish_reason":"stop","delta":{"content":"和谐而温馨。"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen3-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":1276,"completion_tokens":85,"total_tokens":1361},"created":1721823635,"system_fingerprint":null,"model":"qwen3-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: [DONE]

DashScope

可通过DashScope SDK或HTTP方式调用通义千问VL模型，体验流式输出的功能。根据不同的调用方式需设置相应的参数来实现流式输出：

Python SDK方式：设置stream参数为True。
Java SDK方式：需要通过streamCall接口调用。
HTTP方式：需要在Header中指定X-DashScope-SSE为enable。

流式输出的内容默认是非增量式（即每次返回的内容都包含之前生成的内容），如果您需要使用增量式流式输出，请设置incremental_output（Java 为incrementalOutput）参数为 true 。

Python

import os
from dashscope import MultiModalConversation
import dashscope

# 若使用新加坡地域的模型，请取消下列注释
# dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

messages = [
    {
        "role": "user",
        "content": [
            {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},
            {"text": "图中描绘的是什么景象?"}
        ]
    }
]
responses = MultiModalConversation.call(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx"
    # 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model='qwen3-vl-plus',  # 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
    messages=messages,
    stream=True,
    incremental_output=True
    )
full_content = ""
print("流式输出内容为：")
for response in responses:
    if response["output"]["choices"][0]["message"].content:
        print(response["output"]["choices"][0]["message"].content[0]["text"])
        full_content += response["output"]["choices"][0]["message"].content[0]["text"]
print(f"完整内容为：{full_content}")

返回结果

流式输出内容为：
图中描绘的是
一个人和一只狗
在海滩上互动
......
阳光洒在他们
身上，营造出
温暖和谐的氛围
。
完整内容为：图中描绘的是一个人和一只狗在海滩上互动的景象。这个人穿着格子衬衫，坐在沙滩上，与一只戴着项圈的金毛猎犬握手。背景是海浪和天空，阳光洒在他们身上，营造出温暖和谐的氛围。

Java

import java.util.Arrays;
import java.util.Collections;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import io.reactivex.Flowable;
import com.alibaba.dashscope.utils.Constants;

public class Main {

    // 若使用新加坡地域的模型，请取消下列注释
    //  static {Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";}

    public static void streamCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(Collections.singletonMap("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"),
                        Collections.singletonMap("text", "图中描绘的是什么景象？"))).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // 若没有配置环境变量，请用百炼API Key将下行替换为：.apiKey("sk-xxx")
                // 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen3-vl-plus")  // 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
                .messages(Arrays.asList(userMessage))
                .incrementalOutput(true)
                .build();
        Flowable<MultiModalConversationResult> result = conv.streamCall(param);
        result.blockingForEach(item -> {
            try {
                var content = item.getOutput().getChoices().get(0).getMessage().getContent();
                    // 判断content是否存在且不为空
                if (content != null &&  !content.isEmpty()) {
                    System.out.println(content.get(0).get("text"));
                    }
            } catch (Exception e) {
                System.out.println(e.getMessage());
            }
        });
    }

    public static void main(String[] args) {
        try {
            streamCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

返回结果

图
中
描绘
的是
一个
女人
和
一只
狗
在
海滩
......
营造
出
一种
温暖
和谐
的
氛围
。

curl

# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下为北京地域url，若使用新加坡地域的模型，需将url替换为：https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# === 执行时请删除该注释 ===

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
    "model": "qwen3-vl-plus",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},
                    {"text": "图中描绘的是什么景象？"}
                ]
            }
        ]
    },
    "parameters": {
        "incremental_output": true
    }
}'

返回结果

iid:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"这张"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"input_tokens":1276,"output_tokens":1,"image_tokens":1247},"request_id":"00917f72-d927-9344-8417-2c4088d64c16"}

id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"图片"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"input_tokens":1276,"output_tokens":2,"image_tokens":1247},"request_id":"00917f72-d927-9344-8417-2c4088d64c16"}

......

id:17
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"的欣赏。这是一个温馨的画面，展示了"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"input_tokens":1276,"output_tokens":112,"image_tokens":1247},"request_id":"00917f72-d927-9344-8417-2c4088d64c16"}

id:18
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"人与动物之间深厚的情感纽带。"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"input_tokens":1276,"output_tokens":120,"image_tokens":1247},"request_id":"00917f72-d927-9344-8417-2c4088d64c16"}

id:19
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[],"role":"assistant"},"finish_reason":"stop"}]},"usage":{"input_tokens":1276,"output_tokens":121,"image_tokens":1247},"request_id":"00917f72-d927-9344-8417-2c4088d64c16"}

开启高分辨率模式

对于需要关注大量细节的图像，可设置vl_high_resolution_images参数为True以开启高分辨率模式。将单图的默认Token 上限（1280 或 2560）提升至 16384。

模式

单图Token上限

适用场景

成本与延迟

开启（vl_high_resolution_images = True）

16384

内容丰富、需要关注细节的场景

较高

关闭（vl_high_resolution_images = False（默认值））

Qwen3-VL、qwen-vl-max-0813及之后、qwen-vl-plus-0710及之后更新的模型：2560
其他Qwen-VL模型：1280

细节较少、对速度要求高或成本敏感的场景。

较低

支持设置vl_high_resolution_images的模型

QVQ
Qwen3-VL
qwen-vl-max商业版：qwen-vl-max-0809及之后、qwen-vl-plus-0809及之后的模型
Qwen-VL开源：qwen2-vl、qwen2.5-vl、qwen3-vl

OpenAI 兼容

vl_high_resolution_images非 OpenAI 标准参数，若使用 OpenAI Python SDK 请通过 extra_body传入.

Python

import os
import time
from openai import OpenAI

client = OpenAI(
    # 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # 以下为北京地域url，若使用新加坡地域的模型，需将url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
def test_resolution(high_resolution=False):
    completion = client.chat.completions.create(
        model="qwen3-vl-plus",
        messages=[
           {"role": "user","content": [
               {"type": "image_url","image_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250212/earbrt/vcg_VCG211286867973_RF.jpg"},},
               {"type": "text", "text": "这些图描绘了什么内容？"},
                ],
            }
        ],
        extra_body={'enable_thinking': False,
                    "vl_high_resolution_images":high_resolution}

    )
    usage_info= completion.usage.prompt_tokens
    return {
        'usage_info': usage_info
    }

# 输出对比结果
print("\n==================== Token用量对比 ====================")
# 测试低分辨率
result_low = test_resolution(high_resolution=False)
# 等待一下避免API限制
time.sleep(2)
# 测试高分辨率
result_high = test_resolution(high_resolution=True)


if result_low['usage_info'] and result_high['usage_info']:
    low_tokens = result_low['usage_info']
    high_tokens = result_high['usage_info']
    print(f"高分辨率模式-输入总Tokens: {high_tokens}")
    print(f"低分辨率模式-输入总Tokens: {low_tokens}")
    print(f"差异: {high_tokens - low_tokens} tokens")

Node.js

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // 若没有配置环境变量，请用百炼API Key将下行替换为：apiKey: "sk-xxx"
        // 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // 以下为北京地域url，若使用新加坡地域的模型，需将url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);

async function test_resolution(high_resolution) {
    const response = await openai.chat.completions.create({
        model: "qwen3-vl-plus",
        messages: [
        {role: "user",content: [
            {type: "image_url",image_url: {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250212/earbrt/vcg_VCG211286867973_RF.jpg"}},
            {type: "text", text: "这些图描绘了什么内容？" },
        ]}],
        enable_thinking: false,
        vl_high_resolution_images:high_resolution
    });
    return response.usage.prompt_tokens;
}


// 测试低分辨率和高分辨率
(async function main() {
    console.log("\n==================== Token用量对比 ====================")
    const result_low = await test_resolution(false);
    const result_high = await test_resolution(true);

    console.log("高分辨率 输入总Token 数量:",result_high);
    console.log("低分辨率 输入总Token 数量:", result_low);
    console.log("差异:", result_high-result_low);

})();

curl

# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
# === 执行时请删除该注释 ===

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
  "model": "qwen3-vl-plus",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250212/earbrt/vcg_VCG211286867973_RF.jpg"
          }
        },
        {
          "type": "text",
          "text": "这些图描绘了什么内容？"
        }
      ]
    }
  ],
  "enable_thinking": false,
  "vl_high_resolution_images":true
}'

Token使用量对比

==================== Token用量对比 ====================
高分辨率 总输入总Token 数量: 4073
低分辨率 总输入总Token 数量: 2518
差异: 1555

DashScope

Python

import os
import time

import dashscope

# 若使用新加坡地域的模型，请取消下列注释
# dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

messages = [
    {
        "role": "user",
        "content": [
            {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250212/earbrt/vcg_VCG211286867973_RF.jpg"},
            {"text": "这张图表现了什么内容?"}
        ]
    }
]

def test_resolution(high_resolution=False):
    """测试不同分辨率设置的结果"""

    response = dashscope.MultiModalConversation.call(
        # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx"
        # 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
        api_key=os.getenv('DASHSCOPE_API_KEY'),
        model='qwen3-vl-plus',
        enable_thinking=False,
        messages=messages,
        vl_high_resolution_images=high_resolution
    )
    return {
        'usage_info': response.usage
    }

# 输出对比结果
print("\n==================== Token用量对比 ====================")
# 测试低分辨率
result_low = test_resolution(high_resolution=False)
# 等待一下避免API限制
time.sleep(2)
# 测试高分辨率
result_high = test_resolution(high_resolution=True)

if result_low['usage_info'] and result_high['usage_info']:
    low_tokens = result_low['usage_info'].input_tokens_details['image_tokens']
    high_tokens = result_high['usage_info'].input_tokens_details['image_tokens']
    print(f"高分辨率模式-图像Tokens: {high_tokens}")
    print(f"低分辨率模式-图像Tokens: {low_tokens}")
    print(f"差异: {high_tokens - low_tokens} tokens ")

Java

// dashscope SDK的版本 >= 2.21.10
import java.util.Arrays;
import java.util.Collections;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

public class Main {

    // 若使用新加坡地域的模型，请取消下列注释
    // static {Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";}

    public static Integer simpleMultiModalConversationCall(boolean highResolution)
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250212/earbrt/vcg_VCG211286867973_RF.jpg"),
                        Collections.singletonMap("text", "这张图表现了什么内容?"))).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // 若没有配置环境变量，请用百炼API Key将下行替换为：.apiKey("sk-xxx")
                // 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen3-vl-plus")
                .enableThinking(false)
                .messages(Arrays.asList(userMessage))
                .vlHighResolutionImages(highResolution)
                .build();
        MultiModalConversationResult result = conv.call(param);
        return result.getUsage().getImageTokens();
    }

    public static void main(String[] args) {
        try {
            // 调用高分辨率模式
            Integer highResToken = simpleMultiModalConversationCall(true);
            // 调用低分辨率模式
            Integer lowResToken = simpleMultiModalConversationCall(false);

            // 输出对比结果
            System.out.println("=== Token使用对比 ===");
            System.out.println("高分辨率模式Token数：" + highResToken);
            System.out.println("低分辨率模式Token数：" + lowResToken);
            System.out.println("差异：" + (highResToken - lowResToken));
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下为北京地域url，若使用新加坡地域的模型，需将url替换为：https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# === 执行时请删除该注释 ===

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen3-vl-plus",
    "input":{
        "messages":[
            {
             "role": "user",
             "content": [
               {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250212/earbrt/vcg_VCG211286867973_RF.jpg"},
               {"text": "这张图表现了什么内容?"}
                ]
            }
        ]
    },
    "parameters": {
        "vl_high_resolution_images": true,
        "enable_thinking": false
    }
}'

Token使用量对比

==================== Token用量对比 ====================
高分辨率模式-图像tokens: 4058
低分辨率模式-图像tokens: 2503
差异: 1555 tokens

多图像输入

通义千问VL 模型支持请求传入多张图片，只需在content数组中包含多个图片对象即可。

图片数量受模型图文总 Token 上限（即最大输入）的限制，所有图片的总 Token 数必须小于模型的最大输入。

OpenAI兼容

Python

import os
from openai import OpenAI

client = OpenAI(
    # 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # 以下为北京地域url，若使用新加坡地域的模型，需将url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen3-vl-plus", # 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
    messages=[
       {"role": "user","content": [
           {"type": "image_url","image_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},},
           {"type": "image_url","image_url": {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"},},
           {"type": "text", "text": "这些图描绘了什么内容？"},
            ],
        }
    ],
)

print(completion.choices[0].message.content)

返回结果

图1中是一位女士和一只拉布拉多犬在海滩上互动的场景。女士穿着格子衬衫，坐在沙滩上，与狗进行握手的动作，背景是海浪和天空，整个画面充满了温馨和愉快的氛围。

图2中是一只老虎在森林中行走的场景。老虎的毛色是橙色和黑色条纹相间，它正向前迈步，周围是茂密的树木和植被，地面上覆盖着落叶，整个画面给人一种野生自然的感觉。

Node.js

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // 若没有配置环境变量，请用百炼API Key将下行替换为：apiKey: "sk-xxx"
        // 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // 以下为北京地域url，若使用新加坡地域的模型，需将url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);

async function main() {
    const response = await openai.chat.completions.create({
        model: "qwen3-vl-plus",  // 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
        messages: [
          {role: "user",content: [
            {type: "image_url",image_url: {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"}},
            {type: "image_url",image_url: {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"}},
            {type: "text", text: "这些图描绘了什么内容？" },
        ]}]
    });
    console.log(response.choices[0].message.content);
}

main()

返回结果

第一张图片中，一个人和一只狗在海滩上互动。人穿着格子衬衫，狗戴着项圈，他们似乎在握手或击掌。

第二张图片中，一只老虎在森林中行走。老虎的毛色是橙色和黑色条纹，背景是绿色的树木和植被。

curl

# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
# === 执行时请删除该注释 ===

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
  "model": "qwen3-vl-plus",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
          }
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"
          }
        },
        {
          "type": "text",
          "text": "这些图描绘了什么内容？"
        }
      ]
    }
  ]
}'

返回结果

{
  "choices": [
    {
      "message": {
        "content": "图1中是一位女士和一只拉布拉多犬在海滩上互动的场景。女士穿着格子衬衫，坐在沙滩上，与狗进行握手的动作，背景是海景和日落的天空，整个画面显得非常温馨和谐。\n\n图2中是一只老虎在森林中行走的场景。老虎的毛色是橙色和黑色条纹相间，它正向前迈步，周围是茂密的树木和植被，地面上覆盖着落叶，整个画面充满了自然的野性和生机。",
        "role": "assistant"
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null
    }
  ],
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 2497,
    "completion_tokens": 109,
    "total_tokens": 2606
  },
  "created": 1725948561,
  "system_fingerprint": null,
  "model": "qwen-vl-max",
  "id": "chatcmpl-0fd66f46-b09e-9164-a84f-3ebbbedbac15"
}

DashScope

Python

import os
import dashscope

# 若使用新加坡地域的模型，请取消下列注释
# dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

messages = [
    {
        "role": "user",
        "content": [
            {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},
            {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"},
            {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/hbygyo/rabbit.jpg"},
            {"text": "这些图描绘了什么内容?"}
        ]
    }
]

response = dashscope.MultiModalConversation.call(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx"
    # 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen3-vl-plus', # 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
    messages=messages
)

print(response.output.choices[0].message.content[0]["text"])

返回结果

这些图片展示了一些动物和自然场景。第一张图片中，一个人和一只狗在海滩上互动。第二张图片是一只老虎在森林中行走。第三张图片是一只卡通风格的兔子在草地上跳跃。

Java

import java.util.Arrays;
import java.util.Collections;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import java.util.HashMap;
import com.alibaba.dashscope.utils.Constants;

public class Main {

// 若使用新加坡地域的模型，请取消下列注释
//  static {Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";}

    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"),
                        Collections.singletonMap("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"),
                        Collections.singletonMap("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/hbygyo/rabbit.jpg"),
                        Collections.singletonMap("text", "这些图描绘了什么内容?"))).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // 若没有配置环境变量，请用百炼API Key将下行替换为：apiKey: "sk-xxx"
               // 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen3-vl-plus")  // 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
                .messages(Arrays.asList(userMessage))
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));    }
    public static void main(String[] args) {
        try {
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

返回结果

这些图片展示了一些动物和自然场景。

1. 第一张图片：一个女人和一只狗在海滩上互动。女人穿着格子衬衫，坐在沙滩上，狗戴着项圈，伸出爪子与女人握手。
2. 第二张图片：一只老虎在森林中行走。老虎的毛色是橙色和黑色条纹，背景是树木和树叶。
3. 第三张图片：一只卡通风格的兔子在草地上跳跃。兔子是白色的，耳朵是粉红色的，背景是蓝天和黄色的花朵。

curl

# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下为北京地域url，若使用新加坡地域的模型，需将url替换为：https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# === 执行时请删除该注释 ===

curl --location 'https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen3-vl-plus",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},
                    {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"},
                    {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/hbygyo/rabbit.jpg"},
                    {"text": "这些图展现了什么内容？"}
                ]
            }
        ]
    }
}'

返回结果

{
  "output": {
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": [
            {
              "text": "这些图片展示了一些动物和自然场景。第一张图片中，一个人和一只狗在海滩上互动。第二张图片是一只老虎在森林中行走。第三张图片是一只卡通风格的兔子在草地上跳跃。"
            }
          ]
        }
      }
    ]
  },
  "usage": {
    "output_tokens": 81,
    "input_tokens": 1277,
    "image_tokens": 1247
  },
  "request_id": "ccf845a3-dc33-9cda-b581-20fe7dc23f70"
}

视频理解

部分通义千问VL模型支持对视频内容的理解，文件形式包括图像列表（视频帧）或视频文件。

建议使用性能较优的最新版或近期快照版模型理解视频文件。

视频文件

视频文件限制

视频大小：
- 公网URL：
  - Qwen3-VL、qwen-vl-max、qwen-vl-max-latest、qwen-vl-max-2025-08-13、qwen-vl-max-2025-04-08：不超过 2GB；
  - qwen-vl-plus系列及qwen-vl-max-2025-04-08之前的更新的模型：不超过 1GB；
  - 其他模型不超过 150MB。
- Base64编码：经Base64编码后的视频需小于10MB；
- 本地文件路径：视频本身需小于100MB。

视频时长：
- Qwen3-VL、qwen-vl-max、qwen-vl-max-latest、qwen-vl-max-2025-08-13及qwen-vl-max-2025-04-08：2秒至20分钟；
- 其他Qwen2.5-VL系列模型：2秒至10分钟；
- 其他模型：2秒至40秒。
视频格式： MP4、AVI、MKV、MOV、FLV、WMV 等。
视频尺寸：无特定限制，模型处理前会被调整到约60万像素数，更大尺寸的视频文件不会有更好的理解效果。
暂时不支持对视频文件的音频进行理解。

视频抽帧说明

通义千问VL模型通过抽帧来分析视频，抽帧频率决定了模型分析的精细度，不同SDK的控制方式如下：

使用 DashScope SDK：
- 可通过设置 fps 参数来控制抽帧频率，表示每隔 $\frac{1}{f p s}$ 秒抽取一帧图像， (0.1, 10)，默认值为2.0。
- 建议为高速运动场景（如体育赛事、动作电影）设置较大的fps值，为内容静态或较长的视频设置较小的fps值。
使用 OpenAI SDK：抽帧频率固定为每隔0.5秒抽取一帧，无法通过参数修改。

以下是理解在线视频（通过URL指定）的示例代码。了解如何传入本地文件。

OpenAI兼容

使用OpenAI SDK或HTTP方式向通义千问VL模型直接输入视频文件时，需要将用户消息中的"type"参数设为"video_url"。

Python

import os
from openai import OpenAI

client = OpenAI(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx",
    # 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen3-vl-plus",
    messages=[
        {"role": "user","content": [{
            # 直接传入视频文件时，请将type的值设置为video_url
            # 使用OpenAI SDK时，视频文件默认每间隔0.5秒抽取一帧，且不支持修改，如需自定义抽帧频率，请使用DashScope SDK.
            "type": "video_url",            
            "video_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"}},
            {"type": "text","text": "这段视频的内容是什么?"}]
         }]
)
print(completion.choices[0].message.content)

Node.js

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // 若没有配置环境变量，请用百炼API Key将下行替换为：apiKey: "sk-xxx"
        // 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);

async function main() {
    const response = await openai.chat.completions.create({
        model: "qwen3-vl-plus",
        messages: [
        {role: "user",content: [
            // 直接传入视频文件时，请将type的值设置为video_url
            // 使用OpenAI SDK时，视频文件默认每间隔0.5秒抽取一帧，且不支持修改，如需自定义抽帧频率，请使用DashScope SDK.
            {type: "video_url", video_url: {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"}},
            {type: "text", text: "这段视频的内容是什么?" },
        ]}]
    });
    console.log(response.choices[0].message.content);
}

main()

curl

# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
# === 执行时请删除该注释 ===

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen3-vl-plus",
    "messages": [
    {"role": "user","content": [{"type": "video_url","video_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"}},
    {"type": "text","text": "这段视频的内容是什么?"}]}]
}'

DashScope

Python

import dashscope
import os

# 若使用新加坡地域的模型，请取消下列注释
# dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

messages = [
    {"role": "user",
        "content": [
            # fps 可参数控制视频抽帧频率，表示每隔 1/fps 秒抽取一帧，完整用法请参见：https://help.aliyun.com/zh/model-studio/use-qwen-by-calling-api?#2ed5ee7377fum
            {"video": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4","fps":2},
            {"text": "这段视频的内容是什么?"}
        ]
    }
]

response = dashscope.MultiModalConversation.call(
    # 若没有配置环境变量， 请用百炼API Key将下行替换为： api_key ="sk-xxx"
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen3-vl-plus',
    messages=messages
)

print(response.output.choices[0].message.content[0]["text"])

Java

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
import com.alibaba.dashscope.utils.Constants;

public class Main {

    // 若使用新加坡地域的模型，请取消下列注释
    //  static {Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";}
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        // fps 可参数控制视频抽帧频率，表示每隔 1/fps 秒抽取一帧，完整用法请参见：https://help.aliyun.com/zh/model-studio/use-qwen-by-calling-api?#2ed5ee7377fum
        Map<String, Object> params = new HashMap<>();
        params.put("video", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4");
        params.put("fps", 2);
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        params,
                        Collections.singletonMap("text", "这段视频的内容是什么?"))).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // 若没有配置环境变量，请用百炼API Key将下行替换为：.apiKey("sk-xxx")
                // 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen3-vl-plus")
                .messages(Arrays.asList(userMessage))
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }
    public static void main(String[] args) {
        try {
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下为北京地域url，若使用新加坡地域的模型，需将url替换为：https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# === 执行时请删除该注释 ===

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen3-vl-plus",
    "input":{
        "messages":[
            {"role": "user","content": [{"video": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4","fps":2},
            {"text": "这段视频的内容是什么?"}]}]}
}'

图像列表

图像列表数量限制

Qwen3-VL及Qwen2.5-VL模型：最少传入4张图片，最多512张图片
其他模型：最少传入4张图片，最多80张图片

视频抽帧说明

以图像列表（即预先抽取的视频帧）传入时，可通过fps参数告知模型视频帧之间的时间间隔，帮助模型更准确地理解事件的顺序、持续时间和动态变化。

使用 DashScope SDK：
可在调用 Qwen2.5-VL、Qwen3-VL模型 时设置fps参数，表示视频帧是每隔 $\frac{1}{f p s}$ 秒从原始视频中抽取的。
使用 OpenAI SDK：
无法设置 fps参数，模型将默认视频帧是按照每 0.5 秒一帧的频率抽取的。

以下是理解在线视频帧（通过URL指定）的示例代码。了解如何传入本地文件。

OpenAI兼容

使用OpenAI SDK或HTTP方式向通义千问VL模型输入图片列表形式的视频时，需要将用户消息中的"type"参数设为"video"。

Python

import os
from openai import OpenAI


client = OpenAI(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx",
    # 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
     # 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen3-vl-plus", # 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
    messages=[{"role": "user","content": [
        # 传入图像列表时，用户消息中的"type"参数为"video"，
        # 使用OpenAI SDK时，图像列表默认是以每隔0.5秒从视频中抽取出来的，且不支持修改。如需自定义抽帧频率，请使用DashScope SDK.
        {"type": "video","video": ["https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                           "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                           "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                           "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"]},
        {"type": "text","text": "描述这个视频的具体过程"},
    ]}]
)
print(completion.choices[0].message.content)

Node.js

// 确保之前在 package.json 中指定了 "type": "module"
import OpenAI from "openai";

const openai = new OpenAI({
    // 若没有配置环境变量，请用百炼API Key将下行替换为：apiKey: "sk-xxx",
    // 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
    apiKey: process.env.DASHSCOPE_API_KEY,
    // 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1
    baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
});

async function main() {
    const response = await openai.chat.completions.create({
        model: "qwen3-vl-plus",  // 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
        messages: [{
            role: "user",
            content: [
                {
                    // 传入图像列表时，用户消息中的"type"参数为"video"
                    // 使用OpenAI SDK时，图像列表默认是以每隔0.5秒从视频中抽取出来的，且不支持修改。如需自定义抽帧频率，请使用DashScope SDK.
                    type: "video",
                    video: [
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
                    ]
                },
                {
                    type: "text",
                    text: "描述这个视频的具体过程"
                }
            ]
        }]
    });
    console.log(response.choices[0].message.content);
}

main();

curl

# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
# === 执行时请删除该注释 ===

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen3-vl-plus",
    "messages": [{"role": "user",
                "content": [{"type": "video",
                "video": ["https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                           "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                           "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                           "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"]},
                {"type": "text",
                "text": "描述这个视频的具体过程"}]}]
}'

DashScope

Python

import os
# dashscope版本需要不低于1.20.10
import dashscope

# 若使用新加坡地域的模型，请取消下列注释
# dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

messages = [{"role": "user",
             "content": [
                  # 若模型属于Qwen2.5-VL系列且传入图像列表时，可设置fps参数，表示图像列表是由原视频每隔 1/fps 秒抽取的
                 {"video":["https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                           "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                           "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                           "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"],
                   "fps":2},
                 {"text": "描述这个视频的具体过程"}]}]
response = dashscope.MultiModalConversation.call(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model='qwen3-vl-plus', # 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
    messages=messages
)
print(response["output"]["choices"][0]["message"].content[0]["text"])

Java

// DashScope SDK版本需要不低于2.18.3
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

public class Main {

    // 若使用新加坡地域的模型，请取消下列注释
    // static {Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";}

    private static final String MODEL_NAME = "qwen3-vl-plus"; // 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
    public static void videoImageListSample() throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        //  若模型属于Qwen2.5-VL或Qwen3-VL模型且传入的是图像列表时，可设置fps参数，表示图像列表是由原视频每隔 1/fps 秒抽取的
        Map<String, Object> params = new HashMap<>();
        params.put("video", Arrays.asList("https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"));
        params.put("fps", 2);
        MultiModalMessage userMessage = MultiModalMessage.builder()
                .role(Role.USER.getValue())
                .content(Arrays.asList(
                        params,
                        Collections.singletonMap("text", "描述这个视频的具体过程")))
                .build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // 若没有配置环境变量，请用百炼API Key将下行替换为：.apiKey("sk-xxx")
                // 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model(MODEL_NAME)
                .messages(Arrays.asList(userMessage)).build();
        MultiModalConversationResult result = conv.call(param);
        System.out.print(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }
    public static void main(String[] args) {
        try {
            videoImageListSample();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下为北京地域url，若使用新加坡地域的模型，需将url替换为：https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# === 执行时请删除该注释 ===

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
  "model": "qwen3-vl-plus",
  "input": {
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "video": [
              "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
              "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
              "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
              "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
            ],
            "fps":2
                 
          },
          {
            "text": "描述这个视频的具体过程"
          }
        ]
      }
    ]
  }
}'

传入本地文件（Base64 编码或文件路径）

通义千问VL 提供两种本地文件上传方式：

Base64 编码上传
文件路径直接上传（传输更稳定，推荐方式）

上传方式：

Base64 编码上传

将文件转换为 Base64 编码字符串，再传入模型。适用于 OpenAI 和 DashScope SDK 及 HTTP 方式

传入 Base64 编码字符串的步骤（以图像为例）

文件编码：将本地图像转换为 Base64 编码；

图像转换为 Base64 编码的示例代码

#  编码函数： 将本地文件转换为 Base64 编码的字符串
import base64
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# 将xxxx/eagle.png替换为你本地图像的绝对路径
base64_image = encode_image("xxx/eagle.png")

构建Data URL：格式如下：data:[MIME_type];base64,{base64_image}；
1. MIME_type需替换为实际的媒体类型，确保与支持的图像表格中MIME Type 的值匹配（如image/jpeg、image/png）；
2. base64_image为上一步生成的 Base64 字符串；
调用模型：通过image或image_url参数传递Data URL并调用模型。

文件路径上传

直接向模型传入本地文件路径。仅 DashScope Python 和 Java SDK 支持，不支持 DashScope HTTP 和OpenAI 兼容方式。

请您参考下表，结合您的编程语言与操作系统指定文件的路径。

指定文件路径（以图像为例）

系统	SDK	传入的文件路径	示例
Linux或macOS系统	Python SDK	file://{文件的绝对路径}	file:///home/images/test.png
Linux或macOS系统	Java SDK	file://{文件的绝对路径}	file:///home/images/test.png
Windows系统	Python SDK	file://{文件的绝对路径}	file://D:/images/test.png
Windows系统	Java SDK	file:///{文件的绝对路径}	file:///D:/images/test.png

使用限制：

建议优先选择文件路径上传（稳定性更高），1MB以下的文件也可使用 Base64 编码；
直接传入文件路径时，单张图像或视频帧（图像列表）本身需小于 10MB，单个视频需小于100MB；
Base64编码方式传入时，由于Base64编码会增加数据体积，需保证编码后的单个图像或视频需小于 10MB。

如需压缩文件体积请参见如何将图像或视频压缩到满足要求的大小？

图像

文件路径传入

传入文件路径仅支持 DashScope Python 和 Java SDK方式调用，不支持 DashScope HTTP 和OpenAI 兼容方式。

Python

import os
from dashscope import MultiModalConversation
import dashscope 

# 若使用新加坡地域的模型，请取消下列注释
# dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

# 将xxx/eagle.png替换为你本地图像的绝对路径
local_path = "xxx/eagle.png"
image_path = f"file://{local_path}"
messages = [
                {'role':'user',
                'content': [{'image': image_path},
                            {'text': '图中描绘的是什么景象?'}]}]
response = MultiModalConversation.call(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx"
    # 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen3-vl-plus',  # 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
    messages=messages)
print(response["output"]["choices"][0]["message"].content[0]["text"])

Java

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

public class Main {
    // 若使用新加坡地域的模型，请取消下列注释
    // static {Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";}
    public static void callWithLocalFile(String localPath)
            throws ApiException, NoApiKeyException, UploadFileException {
        String filePath = "file://"+localPath;
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(new HashMap<String, Object>(){{put("image", filePath);}},
                        new HashMap<String, Object>(){{put("text", "图中描绘的是什么景象？");}})).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // 若没有配置环境变量，请用百炼API Key将下行替换为：.apiKey("sk-xxx")
                // 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen3-vl-plus")  // 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
                .messages(Arrays.asList(userMessage))
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));}

    public static void main(String[] args) {
        try {
            // 将xxx/eagle.png替换为你本地图像的绝对路径
            callWithLocalFile("xxx/eagle.png");
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

Base64 编码传入

OpenAI兼容

Python

from openai import OpenAI
import os
import base64


#  编码函数： 将本地文件转换为 Base64 编码的字符串
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# 将xxxx/eagle.png替换为你本地图像的绝对路径
base64_image = encode_image("xxx/eagle.png")

client = OpenAI(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx"
    # 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    # 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen3-vl-plus", # 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    # 需要注意，传入Base64，图像格式（即image/{format}）需要与支持的图片列表中的Content Type保持一致。"f"是字符串格式化的方法。
                    # PNG图像：  f"data:image/png;base64,{base64_image}"
                    # JPEG图像： f"data:image/jpeg;base64,{base64_image}"
                    # WEBP图像： f"data:image/webp;base64,{base64_image}"
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"}, 
                },
                {"type": "text", "text": "图中描绘的是什么景象?"},
            ],
        }
    ],
)
print(completion.choices[0].message.content)

Node.js

import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        // 若没有配置环境变量，请用百炼API Key将下行替换为：apiKey: "sk-xxx"
        // 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);

const encodeImage = (imagePath) => {
    const imageFile = readFileSync(imagePath);
    return imageFile.toString('base64');
  };
// 将xxx/eagle.png替换为你本地图像的绝对路径
const base64Image = encodeImage("xxx/eagle.png")
async function main() {
    const completion = await openai.chat.completions.create({
        model: "qwen3-vl-plus",  // 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
        messages: [
            {"role": "user",
             "content": [{"type": "image_url",
                            // 需要注意，传入Base64，图像格式（即image/{format}）需要与支持的图片列表中的Content Type保持一致。
                           // PNG图像：  data:image/png;base64,${base64Image}
                          // JPEG图像： data:image/jpeg;base64,${base64Image}
                         // WEBP图像： data:image/webp;base64,${base64Image}
                        "image_url": {"url": `data:image/png;base64,${base64Image}`},},
                        {"type": "text", "text": "图中描绘的是什么景象?"}]}]
    });
    console.log(completion.choices[0].message.content);
}

main();

curl

将文件转换为 Base64 编码的字符串的方法可参见示例代码；
为了便于展示，代码中的"data:image/png;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA..." ，该Base64 编码字符串是截断的。在实际使用中，请务必传入完整的编码字符串。

# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
# === 执行时请删除该注释 ===

curl --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
  "model": "qwen-vl-max",
  "messages": [
  {
    "role": "user",
    "content": [
      {"type": "image_url", "image_url": {"url": "data:image/png;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA..."}},
      {"type": "text", "text": "图中描绘的是什么景象?"}
    ]
  }]
}'

DashScope

Python

import base64
import os
from dashscope import MultiModalConversation
import dashscope 
# 若使用新加坡地域的模型，请取消下列注释
# dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

#  编码函数： 将本地文件转换为 Base64 编码的字符串
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


# 将xxxx/eagle.png替换为你本地图像的绝对路径
base64_image = encode_image("xxxx/eagle.png")

messages = [
    {
        "role": "user",
        "content": [
            # 需要注意，传入Base64，图像格式（即image/{format}）需要与支持的图片列表中的Content Type保持一致。"f"是字符串格式化的方法。
            # PNG图像：  f"data:image/png;base64,{base64_image}"
            # JPEG图像： f"data:image/jpeg;base64,{base64_image}"
            # WEBP图像： f"data:image/webp;base64,{base64_image}"
            {"image": f"data:image/png;base64,{base64_image}"},
            {"text": "图中描绘的是什么景象?"},
        ],
    },
]
response = MultiModalConversation.call(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen3-vl-plus",  # 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
    messages=messages,
)
print(response["output"]["choices"][0]["message"].content[0]["text"])

Java

import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Base64;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import com.alibaba.dashscope.aigc.multimodalconversation.*;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

public class Main {

// 若使用新加坡地域的模型，请取消下列注释
//  static {Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";}

    private static String encodeImageToBase64(String imagePath) throws IOException {
        Path path = Paths.get(imagePath);
        byte[] imageBytes = Files.readAllBytes(path);
        return Base64.getEncoder().encodeToString(imageBytes);
    }

    public static void callWithLocalFile(String localPath) throws ApiException, NoApiKeyException, UploadFileException, IOException {

        String base64Image = encodeImageToBase64(localPath); // Base64编码

        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        new HashMap<String, Object>() {{ put("image", "data:image/png;base64," + base64Image); }},
                        new HashMap<String, Object>() {{ put("text", "图中描绘的是什么景象？"); }}
                )).build();

        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen3-vl-plus")
                .messages(Arrays.asList(userMessage))
                .build();

        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }

    public static void main(String[] args) {
        try {
            // 将 xxx/eagle.png 替换为你本地图像的绝对路径
            callWithLocalFile("xxx/eagle.png");
        } catch (ApiException | NoApiKeyException | UploadFileException | IOException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

将文件转换为 Base64 编码的字符串的方法可参见示例代码；
为了便于展示，代码中的"data:image/png;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA..." ，该Base64 编码字符串是截断的。在实际使用中，请务必传入完整的编码字符串。

curl代码：
# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下为北京地域url，若使用新加坡地域的模型，需将url替换为：https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# === 执行时请删除该注释 ===

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen3-vl-plus",
    "input":{
        "messages":[
            {
             "role": "user",
             "content": [
               {"image": "data:image/png;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA..."},
               {"text": "图中描绘的是什么景象?"}
                ]
            }
        ]
    }
}'

视频文件

以保存在本地的test.mp4为例。

文件路径传入

传入文件路径仅支持 DashScope Python 和 Java SDK方式调用，不支持 DashScope HTTP 和OpenAI 兼容方式。

Python

import os
from dashscope import MultiModalConversation
import dashscope 

# 若使用新加坡地域的模型，请取消下列注释
# dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

# 将xxxx/test.mp4替换为你本地视频的绝对路径
local_path = "xxx/test.mp4"
video_path = f"file://{local_path}"
messages = [
                {'role':'user',
                # fps参数控制视频抽帧数量，表示每隔1/fps 秒抽取一帧
                'content': [{'video': video_path,"fps":2},
                            {'text': '这段视频描绘的是什么景象?'}]}]
response = MultiModalConversation.call(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx"
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen3-vl-plus',  
    messages=messages)
print(response["output"]["choices"][0]["message"].content[0]["text"])

Java

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

public class Main {
// 若使用新加坡地域的模型，请取消下列注释
//  static {Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";}
    public static void callWithLocalFile(String localPath)
            throws ApiException, NoApiKeyException, UploadFileException {
        String filePath = "file://"+localPath;
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(new HashMap<String, Object>()
                                       {{
                                           put("video", filePath);// fps参数控制视频抽帧数量，表示每隔1/fps 秒抽取一帧
                                           put("fps", 2);
                                       }}, 
                        new HashMap<String, Object>(){{put("text", "这段视频描绘的是什么景象？");}})).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen3-vl-plus")  
                .messages(Arrays.asList(userMessage))
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));}

    public static void main(String[] args) {
        try {
            // 将xxxx/test.mp4替换为你本地视频的绝对路径
            callWithLocalFile("xxx/test.mp4");
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

Base64 编码传入

OpenAI兼容

Python

from openai import OpenAI
import os
import base64


# 编码函数： 将本地文件转换为 Base64 编码的字符串
def encode_video(video_path):
    with open(video_path, "rb") as video_file:
        return base64.b64encode(video_file.read()).decode("utf-8")

# 将xxxx/test.mp4替换为你本地视频的绝对路径
base64_video = encode_video("xxx/test.mp4")
client = OpenAI(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx"
    # 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    # 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen3-vl-plus",  
    messages=[
        {
            "role": "user",
            "content": [
                {
                    # 直接传入视频文件时，请将type的值设置为video_url
                    "type": "video_url",
                    "video_url": {"url": f"data:video/mp4;base64,{base64_video}"},
                },
                {"type": "text", "text": "这段视频描绘的是什么景象?"},
            ],
        }
    ],
)
print(completion.choices[0].message.content)

Node.js

import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        // 若没有配置环境变量，请用百炼API Key将下行替换为：apiKey: "sk-xxx"
        // 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);

const encodeVideo = (videoPath) => {
    const videoFile = readFileSync(videoPath);
    return videoFile.toString('base64');
  };
// 将xxxx/test.mp4替换为你本地视频的绝对路径
const base64Video = encodeVideo("xxx/test.mp4")
async function main() {
    const completion = await openai.chat.completions.create({
        model: "qwen3-vl-plus", 
        messages: [
            {"role": "user",
             "content": [{
                 // 直接传入视频文件时，请将type的值设置为video_url
                "type": "video_url", 
                "video_url": {"url": `data:video/mp4;base64,${base64Video}`}},
                 {"type": "text", "text": "这段视频描绘的是什么景象?"}]}]
    });
    console.log(completion.choices[0].message.content);
}

main();

curl

将文件转换为 Base64 编码的字符串的方法可参见示例代码；
为了便于展示，代码中的"data:video/mp4;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA..." ，该Base64 编码字符串是截断的。在实际使用中，请务必传入完整的编码字符串。

# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
# === 执行时请删除该注释 ===

curl --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
  "model": "qwen-vl-max",
  "messages": [
  {
    "role": "user",
    "content": [
      {"type": "video_url", "video_url": {"url": "data:video/mp4;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA..."}},
      {"type": "text", "text": "图中描绘的是什么景象?"}
    ]
  }]
}'

DashScope

Python

import base64
import os
import dashscope 
from dashscope import MultiModalConversation

# 若使用新加坡地域的模型，请取消下列注释
# dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

# 编码函数： 将本地文件转换为 Base64 编码的字符串
def encode_video(video_path):
    with open(video_path, "rb") as video_file:
        return base64.b64encode(video_file.read()).decode("utf-8")

# 将xxxx/test.mp4替换为你本地视频的绝对路径
base64_video = encode_video("xxxx/test.mp4")

messages = [{'role':'user',
            # fps参数控制视频抽帧数量，表示每隔1/fps 秒抽取一帧
             'content': [{'video': f"data:video/mp4;base64,{base64_video}","fps":2},
                            {'text': '这段视频描绘的是什么景象?'}]}]
response = MultiModalConversation.call(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx"
    # 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen3-vl-plus',
    messages=messages)

print(response["output"]["choices"][0]["message"].content[0]["text"])

Java

import java.io.IOException;
import java.util.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import com.alibaba.dashscope.aigc.multimodalconversation.*;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

public class Main {
// 若使用新加坡地域的模型，请取消下列注释
//  static {Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";}
    private static String encodeVideoToBase64(String videoPath) throws IOException {
        Path path = Paths.get(videoPath);
        byte[] videoBytes = Files.readAllBytes(path);
        return Base64.getEncoder().encodeToString(videoBytes);
    }

    public static void callWithLocalFile(String localPath)
            throws ApiException, NoApiKeyException, UploadFileException, IOException {

        String base64Video = encodeVideoToBase64(localPath); // Base64编码

        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(new HashMap<String, Object>()
                                       {{
                                           put("video", "data:video/mp4;base64," + base64Video);// fps参数控制视频抽帧数量，表示每隔1/fps 秒抽取一帧
                                           put("fps", 2);
                                       }},
                        new HashMap<String, Object>(){{put("text", "这段视频描绘的是什么景象？");}})).build();

        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // 若没有配置环境变量，请用百炼API Key将下行替换为：.apiKey("sk-xxx")
                // 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen3-vl-plus")
                .messages(Arrays.asList(userMessage))
                .build();

        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }

    public static void main(String[] args) {
        try {
            // 将 xxx/test.mp4 替换为你本地图像的绝对路径
            callWithLocalFile("xxx/test.mp4");
        } catch (ApiException | NoApiKeyException | UploadFileException | IOException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

将文件转换为 Base64 编码的字符串的方法可参见示例代码；
为了便于展示，代码中的"f"data:video/mp4;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA..." ，该Base64 编码字符串是截断的。在实际使用中，请务必传入完整的编码字符串。

# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下为北京地域url，若使用新加坡地域的模型，需将url替换为：https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# === 执行时请删除该注释 ===

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen3-vl-plus",
    "input":{
        "messages":[
            {
             "role": "user",
             "content": [
               {"video": "data:video/mp4;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA..."},
               {"text": "图中描绘的是什么景象?"}
                ]
            }
        ]
    }
}'

图像列表

以保存在本地的football1.jpg、football2.jpg、football3.jpg、football4.jpg为例。

文件路径传入

传入文件路径仅支持 DashScope Python 和 Java SDK方式调用，不支持 DashScope HTTP 和OpenAI 兼容方式。

Python

import os
from dashscope import MultiModalConversation
import dashscope 

# 若使用新加坡地域的模型，请取消下列注释
# dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

local_path1 = "football1.jpg"
local_path2 = "football2.jpg"
local_path3 = "football3.jpg"
local_path4 = "football4.jpg"

image_path1 = f"file://{local_path1}"
image_path2 = f"file://{local_path2}"
image_path3 = f"file://{local_path3}"
image_path4 = f"file://{local_path4}"

messages = [{'role':'user',
                # 若模型属于Qwen2.5-VL、Qwen3-VL，且传入图像列表时，可设置fps参数，表示图像列表是由原视频每隔 1/fps 秒抽取的，其他模型设置则不生效
             'content': [{'video': [image_path1,image_path2,image_path3,image_path4],"fps":2},
                            {'text': '这段视频描绘的是什么景象?'}]}]
response = MultiModalConversation.call(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx"
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen3-vl-plus',  # 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
    messages=messages)

print(response["output"]["choices"][0]["message"].content[0]["text"])

Java

// DashScope SDK版本需要不低于2.18.3
import java.util.Arrays;
import java.util.Map;
import java.util.Collections;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;


public class Main {

// 若使用新加坡地域的模型，请取消下列注释
//  static {Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";}

    private static final String MODEL_NAME = "qwen3-vl-plus";  // 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
    public static void videoImageListSample(String localPath1, String localPath2, String localPath3, String localPath4)
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        String filePath1 = "file://" + localPath1;
        String filePath2 = "file://" + localPath2;
        String filePath3 = "file://" + localPath3;
        String filePath4 = "file://" + localPath4;
        Map<String, Object> params = Map.of(
                "video", Arrays.asList(filePath1,filePath2,filePath3,filePath4),
                // 若模型属于Qwen2.5-VL系列且传入图像列表时，可设置fps参数，表示图像列表是由原视频每隔 1/fps 秒抽取的，其他模型设置则不生效
                "fps",2);
        MultiModalMessage userMessage = MultiModalMessage.builder()
                .role(Role.USER.getValue())
                .content(Arrays.asList(params,
                        Collections.singletonMap("text", "描述这个视频的具体过程")))
                .build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // 新加坡和北京地域的API Key不同。获取API Key：https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model(MODEL_NAME)
                .messages(Arrays.asList(systemMessage, userMessage)).build();
        MultiModalConversationResult result = conv.call(param);
        System.out.print(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }
    public static void main(String[] args) {
        try {
            videoImageListSample(
                    "xxx/football1.jpg",
                    "xxx/football2.jpg",
                    "xxx/football3.jpg",
                    "xxx/football4.jpg");
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

Base64 编码传入

OpenAI兼容

Python

import os
from openai import OpenAI
import base64

# 编码函数： 将本地文件转换为 Base64 编码的字符串
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image1 = encode_image("football1.jpg")
base64_image2 = encode_image("football2.jpg")
base64_image3 = encode_image("football3.jpg")
base64_image4 = encode_image("football4.jpg")
client = OpenAI(
    # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key="sk-xxx",
    # 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen3-vl-plus",  # 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
    messages=[
    {"role": "user","content": [
        {"type": "video","video": [
            f"data:image/jpeg;base64,{base64_image1}",
            f"data:image/jpeg;base64,{base64_image2}",
            f"data:image/jpeg;base64,{base64_image3}",
            f"data:image/jpeg;base64,{base64_image4}",]},
        {"type": "text","text": "描述这个视频的具体过程"},
    ]}]
)
print(completion.choices[0].message.content)

Node.js

import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        // 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
        // 若没有配置环境变量，请用百炼API Key将下行替换为：apiKey: "sk-xxx"
        apiKey: process.env.DASHSCOPE_API_KEY,
        // 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);

const encodeImage = (imagePath) => {
    const imageFile = readFileSync(imagePath);
    return imageFile.toString('base64');
  };
  
const base64Image1 = encodeImage("football1.jpg")
const base64Image2 = encodeImage("football2.jpg")
const base64Image3 = encodeImage("football3.jpg")
const base64Image4 = encodeImage("football4.jpg")
async function main() {
    const completion = await openai.chat.completions.create({
        model: "qwen3-vl-plus", // 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
        messages: [
            {"role": "user",
             "content": [{"type": "video",
                            // 需要注意，传入Base64，图像格式（即image/{format}）需要与支持的图片列表中的Content Type保持一致。
                           // PNG图像：  data:image/png;base64,${base64Image}
                          // JPEG图像： data:image/jpeg;base64,${base64Image}
                         // WEBP图像： data:image/webp;base64,${base64Image}
                        "video": [
                            `data:image/jpeg;base64,${base64Image1}`,
                            `data:image/jpeg;base64,${base64Image2}`,
                            `data:image/jpeg;base64,${base64Image3}`,
                            `data:image/jpeg;base64,${base64Image4}`]},
                        {"type": "text", "text": "这段视频描绘的是什么景象?"}]}]
    });
    console.log(completion.choices[0].message.content);
}

main();

curl

将文件转换为 Base64 编码的字符串的方法可参见示例代码；
为了便于展示，代码中的"data:image/png;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA..." ，该Base64 编码字符串是截断的。在实际使用中，请务必传入完整的编码字符串。

# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下是北京地域base_url，如果使用新加坡地域的模型，需要将base_url替换为：https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
# === 执行时请删除该注释 ===

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen3-vl-plus",
    "messages": [{"role": "user",
                "content": [{"type": "video",
                "video": [
                          "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA...",
                          "data:image/jpeg;base64,nEpp6jpnP57MoWSyOWwrkXMJhHRCWYeFYb...",
                          "data:image/jpeg;base64,JHWQnJPc40GwQ7zERAtRMK6iIhnWw4080s...",
                          "data:image/jpeg;base64,adB6QOU5HP7dAYBBOg/Fb7KIptlbyEOu58..."
                          ]},
                {"type": "text",
                "text": "描述这个视频的具体过程"}]}]
}'

DashScope

Python

import base64
import os
from dashscope import MultiModalConversation
import dashscope 

# 若使用新加坡地域的模型，请取消下列注释
# dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

#  编码函数： 将本地文件转换为 Base64 编码的字符串
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image1 = encode_image("football1.jpg")
base64_image2 = encode_image("football2.jpg")
base64_image3 = encode_image("football3.jpg")
base64_image4 = encode_image("football4.jpg")


messages = [{'role':'user',
             'content': [
                    {'video':
                         [f"data:image/png;base64,{base64_image1}",
                          f"data:image/png;base64,{base64_image2}",
                          f"data:image/png;base64,{base64_image3}",
                          f"data:image/png;base64,{base64_image4}"
                         ]
                    },
                    {'text': '请描绘这个视频的具体过程?'}]}]
response = MultiModalConversation.call(
    # 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model='qwen3-vl-plus',  # 此处以qwen3-vl-plus为例，可按需更换模型名称。模型列表：https://help.aliyun.com/zh/model-studio/models
    messages=messages)

print(response["output"]["choices"][0]["message"].content[0]["text"])

Java

import java.io.IOException;
import java.util.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import com.alibaba.dashscope.aigc.multimodalconversation.*;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

public class Main {

// 若使用新加坡地域的模型，请取消下列注释
// static {Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";}

    private static String encodeImageToBase64(String imagePath) throws IOException {
        Path path = Paths.get(imagePath);
        byte[] imageBytes = Files.readAllBytes(path);
        return Base64.getEncoder().encodeToString(imageBytes);
    }

    public static void videoImageListSample(String localPath1,String localPath2,String localPath3,String localPath4)
            throws ApiException, NoApiKeyException, UploadFileException, IOException {

        String base64Image1 = encodeImageToBase64(localPath1); // Base64编码
        String base64Image2 = encodeImageToBase64(localPath2);
        String base64Image3 = encodeImageToBase64(localPath3);
        String base64Image4 = encodeImageToBase64(localPath4);

        MultiModalConversation conv = new MultiModalConversation();
        Map<String, Object> params = Map.of(
                "video", Arrays.asList(
                        "data:image/jpeg;base64," + base64Image1,
                        "data:image/jpeg;base64," + base64Image2,
                        "data:image/jpeg;base64," + base64Image3,
                        "data:image/jpeg;base64," + base64Image4),
                // 若模型属于Qwen2.5-VL系列且传入图像列表时，可设置fps参数，表示图像列表是由原视频每隔 1/fps 秒抽取的，其他模型设置则不生效
                    "fps",2
        );
        MultiModalMessage userMessage = MultiModalMessage.builder()
                .role(Role.USER.getValue())
                .content(Arrays.asList(params,
                        Collections.singletonMap("text", "描述这个视频的具体过程")))
                .build();

        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen3-vl-plus")
                .messages(Arrays.asList(userMessage))
                .build();

        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }

    public static void main(String[] args) {
        try {
            // 将 xxx/football1.png 等替换为你本地图像的绝对路径
            videoImageListSample(
                    "xxx/football1.jpg",
                    "xxx/football2.jpg",
                    "xxx/football3.jpg",
                    "xxx/football4.jpg"
            );
        } catch (ApiException | NoApiKeyException | UploadFileException | IOException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

将文件转换为 Base64 编码的字符串的方法可参见示例代码；
为了便于展示，代码中的"data:image/png;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA..." ，该Base64 编码字符串是截断的。在实际使用中，请务必传入完整的编码字符串。

# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下为北京地域url，若使用新加坡地域的模型，需将url替换为：https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# === 执行时请删除该注释 ===

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
  "model": "qwen3-vl-plus",
  "input": {
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "video": [
                      "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAA...",
                      "data:image/jpeg;base64,nEpp6jpnP57MoWSyOWwrkXMJhHRCWYeFYb...",
                      "data:image/jpeg;base64,JHWQnJPc40GwQ7zERAtRMK6iIhnWw4080s...",
                      "data:image/jpeg;base64,adB6QOU5HP7dAYBBOg/Fb7KIptlbyEOu58..."
            ],
            "fps":2     
          },
          {
            "text": "描述这个视频的具体过程"
          }
        ]
      }
    ]
  }
}'

使用限制

支持的图像

模型支持的图像格式如下表：

图像格式	常见扩展名	MIME Type
BMP	.bmp	image/bmp
JPEG	.jpe, .jpeg, .jpg	image/jpeg
PNG	.png	image/png
TIFF	.tif, .tiff	image/tiff
WEBP	.webp	image/webp
HEIC	.heic	image/heic

图像大小限制

单个图像的大小不超过10 MB。如果传入 Base64编码的图像，需保证编码后的字符串小于10MB，详情请参见传入本地文件。如需压缩文件体积请参见如何将图像或视频压缩到满足要求的大小？
对单图的像素总数无严格限制，图像的宽度和高度均应大于10像素，宽高比不应超过200:1或1:200。
模型在进行图像理解前会对图像进行缩放处理。过大的图像不会有更好的理解效果。
点击查看推荐的像素值
- 输入qwen-vl-max、qwen-vl-max-latest、qwen-vl-max-1230、qwen-vl-max-1119、qwen-vl-max-1030、qwen-vl-max-0809、qwen-vl-plus-latest、qwen-vl-plus-0102、qwen-vl-plus-0809、qwen2-vl-72b-instruct、qwen2-vl-7b-instruct与qwen2-vl-2b-instruct模型的单张图像，像素数推荐不超过 1200万，可以支持标准的 4K 图像。
- 输入qwen-vl-plus模型的单张图像，像素数推荐不超过 1003520。

图像输入方式

图像的URL链接：需确保URL可被公网访问。
说明
可将图像上传到OSS或上传到阿里云百炼的免费临时存储空间，获取公网 URL。
- 如果要传入的是OSS中读写权限为私有的图像，可使用外网endpoint生成签名URL。该URL允许他人临时访问文件，具体请参见使用预签名URL下载或预览文件。
- 由于OSS内网与百炼服务不互通，请勿使用OSS内网URL
本地图像文件：传入 Base64 编码数据或直接传入本地文件的路径。

应用示例

看图做题

Prompt技巧：您可以通过“思维链”Prompt方法解决复杂的数学问题，这是一种通过引导模型生成推理过程或帮助模型拆解复杂任务并逐步推理的方式，让模型在生成推理结果前生成更多的推理依据，从而提升模型在复杂问题上的表现。

输入示例

示例代码

输出示例

提示词：请你分步骤解答这道题，输出对这道题的思考判断过程。

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen3-vl-plus",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://img.alicdn.com/imgextra/i2/O1CN01e99Hxt1evMlWM6jUL_!!6000000003933-0-tps-1294-760.jpg"},
                    {"text": "请你分步骤解答这道题，输出对这道题的思考判断过程。"}
                ]
            }
        ]
    }
}'

信息抽取

通义千问VL模型支持抽取票据证件表单中的信息，并以结构化的形式返回。

Prompt技巧：

使用分隔符强调需要提取的字段
明确输出格式，例如JSON格式
在提示词中明确禁止可能的```json```代码段，如“请你以JSON格式输出，不要输出```json```代码段”

输入示例

示例代码

输出示例

提示词：提取图中的：['发票代码','发票号码','到站','燃油费','票价','乘车日期','开车时间','车次','座号']，请你以JSON格式输出，不要输出```json```代码段”。

# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下为北京地域url，若使用新加坡地域的模型，需将url替换为：https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# === 执行时请删除该注释 ===

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen3-vl-plus",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "http://duguang-labelling.oss-cn-shanghai.aliyuncs.com/demo_ocr/receipt_zh_demo.jpg"},
                    {"text": "提取图中的：['发票代码','发票号码','到站','燃油费','票价','乘车日期','开车时间','车次','座号']，请你以JSON格式输出，不要输出```json```代码段”。"}
                ]
            }
        ]
    }
}'

{
    "发票代码": "221021325353",
    "发票号码": "10283819",
    "到站": "开发区",
    "燃油费": "2.0",
    "票价": "8.00<全>",
    "乘车日期": "2013-06-29",
    "开车时间": "流水",
    "车次": "040",
    "座号": "371"
}

物体定位

模型支持对物体进行定位：

Qwen2.5-VL模型返回的坐标均相对于缩放后的图像左上角的绝对值，单位为像素。可参考Qwen2.5-VL中的代码将坐标映射到原图中。Qwen2.5-VL模型480*480～ 2560*2560分辨率范围内，物体定位效果较为鲁棒，在此范围之外可能会偶发bbox漂移现象。
Qwen3-VL模型返回的坐标将为相对坐标，坐标值会归一化到0-999。

推荐Prompt

Prompt技巧

定位方式	支持的输出方式	推荐Prompt
Box定位	JSON或纯文本	检测图中所有{物体}并以{JSON/纯文本}格式输出其bbox的坐标
Point定位	JSON或XML	以点的形式定位图中所有{物体}，以{JSON/XML}格式输出其point坐标

Prompt改进思路

当检测密集排列的物体时，如Prompt为“检测图中所有人”，模型可能会混淆了“每个人”和“所有人”的语义，从而仅输出将所有人物包含在内的框，可以通过下列提示词向模型强调检测每个对象：
- Box定位：定位图中每一个{某类物体}并描述其各自的{某种特征}，以{JSON/纯文本}格式输出其bbox坐标。
- Point定位：以点的形式定位图中每一个{某类物体}并描述各自的{某种特征}，以{JSON/XML}格式输出其point坐标
定位结果中可能会出现```json```或者```xml```等无关内容，可在Prompt中明确禁止该内容输出，如“请你以JSON格式输出，不要输出```json```代码段”。

输入示例	示例代码	输出示例
3D 定位提示词：查找此图像中的所有汽车。对于每辆车，提供其3D边界框。所需的输出格式为JSON：`[{“bbox_3d"：[x_center，y_center，z_center，x_size，y_size，z_size，roll，pitch，yaw]，“label"：“category"}]`。 Qwen3-VL模型新增功能。	curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \ -H "Authorization: Bearer $DASHSCOPE_API_KEY" \ -H 'Content-Type: application/json' \ -d '{ "model": "qwen3-vl-plus-2025-09-23", "input":{ "messages":[ { "role": "user", "content": [ {"image": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-cn-shanghai.aliyuncs.com/Qwen3VL/demo/grounding/3d_grounding/autonomous_driving.jpg"}, {"text": "Find all cars in this image. For each car, provide its 3D bounding box. The output format required is JSON: `[{\"bbox_3d\":[x_center, y_center, z_center, x_size, y_size, z_size, roll, pitch, yaw],\"label\":\"category\"}]`."} ] } ] } }'	`[ {"bbox_3d": [2.81, 0.56, 6.41, 4.17, 1.7, 1.88, 0.29, 0.5, 0.29], "label":"car" }, {"bbox_3d": [2.66, 0.56, 13.72, 4.7, 1.59, 1.78, 0.89, 0.49, 0.89], "label":"car"}]`
Box定位：提示词：定位每一个蛋糕的位置，并描述其各自的特征，以JSON格式输出所有的bbox的坐标，不要输出```json```代码段。	# ======= 重要提示 ======= # 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key # 以下为北京地域url，若使用新加坡地域的模型，需将url替换为：https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation # === 执行时请删除该注释 === curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \ -H "Authorization: Bearer $DASHSCOPE_API_KEY" \ -H 'Content-Type: application/json' \ -d '{ "model": "qwen3-vl-plus", "input":{ "messages":[ { "role": "user", "content": [ {"image": "https://img.alicdn.com/imgextra/i3/O1CN01I1CXf21UR0Ld20Yzs_!!6000000002513-2-tps-1024-1024.png"}, {"text": "用一个个框定位图像每一个蛋糕的位置并描述其各自的特征，以JSON格式输出所有的bbox的坐标，不要输出```json```代码段"} ] } ], "vl_high_resolution_images":true, "temperature":0, "top_k":"1", "seed":"3407" } }'	Box定位方式会返回矩形框的左上角和右下角的坐标。 `[ { "bbox": [60, 395, 204, 578], "description": "巧克力蛋糕，顶部覆盖红色糖霜和彩色糖粒" }, { "bbox": [248, 381, 372, 542], "description": "粉色糖霜的蛋糕，顶部有白色和蓝色的糖粒" } .... ]`
Point定位：提示词：以点的形式定位图中见义勇为的人，并以XML格式输出结果，不要输出```xml```代码段。	# ======= 重要提示 ======= # 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key # 以下为北京地域url，若使用新加坡地域的模型，需将url替换为：https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation # === 执行时请删除该注释 === curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \ -H "Authorization: Bearer $DASHSCOPE_API_KEY" \ -H 'Content-Type: application/json' \ -d '{ "model": "qwen3-vl-plus", "input":{ "messages":[ { "role": "user", "content": [ {"image": "https://img.alicdn.com/imgextra/i1/O1CN01ILRlNK1gvU5xqbaxb_!!6000000004204-49-tps-1138-640.webp"}, {"text": "以点的形式定位图中见义勇为的人，并以XML格式输出结果，不要输出```xml```代码段。"} ] } ], "vl_high_resolution_images":true, "temperature":0, "top_k":"1", "seed":"3407" } }'	以Point定位的方式会返回矩形框中心点的坐标。 `< points x1 = "284" y1 = "305" alt = "见义勇为的人" > 见义勇为的人 < /points>`

文档解析

模型支持将图像类的文档（如扫描件/图片PDF）解析为 QwenVL HTML或Markdown（仅 Qwen3-VL支持）格式，该格式不仅能精准识别文本，还能获取图像、表格等元素的位置信息。

Prompt技巧：您需要在提示词中引导模型输出QwenVL HTML，否则将解析为不带位置信息的HTML格式的文本：

推荐系统提示词："You are an AI specialized in recognizing and extracting text from images. Your mission is to analyze the image document and generate the result in QwenVL Document Parser HTML format using specified tags while maintaining user privacy and data integrity."
推荐用户提示词："QwenVL HTML"或"QwenVL Markdown"

输入示例

示例代码

输出示例

为防止输出结果过长导致超时的风险，以下为使用流式输出的代码：

# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下为北京地域url，若使用新加坡地域的模型，需将url替换为：https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# === 执行时请删除该注释 ===

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
    "model": "qwen3-vl-plus",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://img.alicdn.com/imgextra/i3/O1CN01nVbWzy1vx3iInC3z0_!!6000000006238-0-tps-1430-2022.jpg"},
                    {"text": "QwenVL HTML"}
                ]
            }
        ],
    "parameters": {
        "incremental_output":true
    }
    }
}'

完整的输出结果如下：

```html
<html><body>
<h2 data-bbox=\"91 95 223 120\"> 1 Introduction</h2> 
 <p data-bbox=\"91 128 742 296\">The sparks of artificial general intelligence (AGI) are increasingly visible through the fast development of large foundation models, notably large language models (LLMs) (Brown et al., 2020; OpenAI, 2023; 2024; Gemini Team, 2024; Anthropic, 2023a,b; 2024; Bai et al., 2023; Yang et al., 2024a; Touvron et al., 2023a,b; Dubey et al., 2024). The continuous advancement in model and data scaling, combined with the paradigm of large-scale pre-training followed by high-quality supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) (Ouyang et al., 2022), has enabled large language models (LLMs) to develop emergent capabilities in language understanding, generation, and reasoning. Building on this foundation, recent breakthroughs in inference time scaling, particularly demonstrated by o1 (OpenAI, 2024b), have enhanced LLMs’ capacity for deep thinking through step-by-step reasoning and reflection. These developments have elevated the potential of language models, suggesting they may achieve significant breakthroughs in scientific exploration as they continue to demonstrate emergent capabilities indicative of more general artificial intelligence.</p> 
 <p data-bbox=\"91 296 742 448\">Besides the fast development of model capabilities, the recent two years have witnessed a burst of open (open-weight) large language models in the LLM community, for example, the Llama series (Touvron et al., 2023a,b; Dubey et al., 2024), Mistral series (Jiang et al., 2023a; 2024a), and our Qwen series (Bai et al., 2023; Yang et al., 2024a; Qwen Team, 2024a; Hui et al., 2024; Qwen Team, 2024c; Yang et al., 2024b). The open-weight models have democratized the access of large language models to common users and developers, enabling broader research participation, fostering innovation through community collaboration, and accelerating the development of AI applications across diverse domains.</p> 
 <p data-bbox=\"91 448 742 586\">Recently, we release the details of our latest version of the Qwen series, Qwen2.5. In terms of the openweight part, we release pre-trained and instruction-tuned models of 7 sizes, including $0.5 \\mathrm{~B}, 1.5 \\mathrm{~B}, 3 \\mathrm{~B}, 7 \\mathrm{~B}$, $14 \\mathrm{~B}, 32 \\mathrm{~B}$, and $72 \\mathrm{~B}$, and we provide not only the original models in bfloat16 precision but also the quantized models in different precisions. Specifically, the flagship model Qwen2.5-72B-Instruct demonstrates competitive performance against the state-of-the-art open-weight model, Llama-3-405B-Instruct, which is around 5 times larger. Additionally, we also release the proprietary models of Mixture-of-Experts (MoE, Lepikhin et al., 2020; Fedus et al., 2022; Zoph et al., 2022), namely Qwen2.5-Turbo and Qwen2.5-Plus ${ }^{1}$, which performs competitively against GPT-4o-mini and GPT-4o respectively.</p> 
 <p data-bbox=\"91 586 742 868\">In this technical report, we introduce Qwen2.5, the result of our continuous endeavor to create better LLMs. Below, we show the key features of the latest version of Qwen: </p> 
 <ul data-bbox=\"136 614 742 868\"><li data-bbox=\"136 614 742 712\">Better in Size: Compared with Qwen2, in addition to $0.5 \\mathrm{~B}, 1.5 \\mathrm{~B}, 7 \\mathrm{~B}$, and $72 \\mathrm{~B}$ models, Qwen2.5 brings back the $3 \\mathrm{~B}, 14 \\mathrm{~B}$, and $32 \\mathrm{~B}$ models, which are more cost-effective for resource-limited scenarios and are under-represented in the current field of open foundation models. Qwen2.5Turbo and Qwen2.5-Plus offer a great balance among accuracy, latency, and cost.</li><li data-bbox=\"136 708 742 784\">Better in Data: The pre-training and post-training data have been improved significantly. The pre-training data increased from 7 trillion tokens to 18 trillion tokens, with focus on knowledge, coding, and mathematics. The pre-training is staged to allow transitions among different mixtures. The post-training data amounts to 1 million examples, across the stage of supervised finetuning (SFT, Ouyang et al., 2022), direct preference optimization (DPO, Raffelov et al., 2023), and group relative policy optimization (GRPO, Shao et al., 2024).</li><li data-bbox=\"136 780 742 868\">Better in Use: Several key limitations of Qwen2 in use have been eliminated, including larger generation length (from 2K tokens to 8K tokens), better support for structured input and output, (e.g., tables and JSON), and easier tool use. In addition, Qwen2.5-Turbo supports a context length of up to 1 million tokens.</li></ul> 
 <h2 data-bbox=\"91 892 338 920\"> 2 Architecture &amp; Tokenizer</h2> 
 <p data-bbox=\"91 926 742 978\">Basically, the Qwen2.5 series include dense models for opensource, namely Qwen2.5-0.5B / 1.5B / 3B / $7 \\mathrm{~B} / 14 \\mathrm{~B} / 32 \\mathrm{~B} / 72 \\mathrm{~B}$, and MoE models for API service, namely Qwen2.5-Turbo and Qwen2.5-Plus. Below, we provide details about the architecture of models.</p> 
 <p data-bbox=\"91 982 742 1070\">For dense models, we maintain the Transformer-based decoder architecture (Vaswani et al., 2017; Radford et al., 2018) as Qwen2 (Yang et al., 2024a). The architecture incorporates several key components: Grouped Query Attention (GQA, Ainslie et al., 2023) for efficient KV cache utilization, SwiGLU activation function (Dauphin et al., 2017) for non-linear activation, Rotary Positional Embeddings (RoPE, Su</p> 
 <hr/> 
 <section class=\"footnotes\" data-bbox=\"91 1028 742 1070\"><ol class=\"footnotes-list\" data-bbox=\"91 1028 742 1070\"><li class=\"footnote-item\" data-bbox=\"91 1028 742 1070\"><p data-bbox=\"91 1028 742 1070\">${ }^{1}$ Qwen2.5-Turbo is identified as qwen-turbo-2024-11-01 and Qwen2.5-Plus is identified as qwen-plus-2024-xx-xx (to be released) in the API.</p></li></ol></section> 
</body>
```html

视频理解

模型具有感知时间信息的能力，能从视频中搜索具体事件，或对不同时间段进行要点总结。

Prompt技巧：

明确任务需求：
- 指定视频理解的时间范围，如“请你描述下列视频中的一系列事件”或者“请你描述00:05:00 至 00:10:00时间段中的一系列事件”
- 事件计数：如“统计视频中‘知识点讲解’场景出现的次数及总时长，并记录事件的起始和结束时间戳”
- 动作或者画面定位：如“视频00:03:25附近5秒内是否有‘选手失误’事件？要求精确到最近0.5秒”
- 长视频分段处理：如“将下列2小时会议视频按每3分钟生成一个摘要（含时间戳），重点标注‘提问环节’和‘决议通过’事件”
明确输出要求或者格式：
- JSON结构约束：“在Prompt中要求模型以JSON格式返回时间戳（start_time、end_time）、事件类型（category）、具体事件（event）”
- 时间格式表示：“请使用HH:mm:ss或秒数（如：20秒）表示时间戳”

输入示例

示例代码

输出示例

提示词：请你描述下视频中的人物的一系列动作，以JSON格式输出开始时间（start_time）、结束事件（end_time）、事件（event），请使用HH:mm:ss表示时间戳，不要输出```json```代码段。

您可以通过设置fps参数来控制对视频抽帧的频率，视频文件将以每隔 $\frac{1}{f p s}$ 秒抽取一帧再进行内容的理解，详细用法请参见视频理解。

# ======= 重要提示 =======
# 新加坡和北京地域的API Key不同。获取API Key：https://help.aliyun.com/zh/model-studio/get-api-key
# 以下为北京地域url，若使用新加坡地域的模型，需将url替换为：https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation
# === 执行时请删除该注释 ===

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen3-vl-plus",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"video": "https://cloud.video.taobao.com/vod/C6gCj5AJ3Qrd_UQ9kaMVRY9Ig9G-WToxVYSPRdNXCao.mp4","fps": 8.0},
                    {"text": "请你描述下视频中的人物的一系列动作，以JSON格式输出开始时间（start_time）、结束事件（end_time）、事件（event），请使用HH:mm:ss表示 时间戳，不要输出```json```代码段。"}
                ]
            }
        ]
    }
}'

[
    {
        "start_time": "00:00:00.00",
        "end_time": "00:00:04.00",
        "event": "人物从画面左侧走向桌子，手中拿着一个纸箱。"
    },
    {
        "start_time": "00:00:04.00",
        "end_time": "00:00:06.00",
        "event": "人物将纸箱放在桌子上。"
    },
    {
        "start_time": "00:00:06.00",
        "end_time": "00:00:10.00",
        "event": "人物用右手拿起一个扫描枪，对准纸箱上的条形码进行扫描。"
    },
    {
        "start_time": "00:00:10.00",
        "end_time": "00:00:12.00",
        "event": "人物将扫描枪放回原位。"
    },
    {
        "start_time": "00:00:12.00",
        "end_time": "00:00:15.00",
        "event": "人物用双手拿起纸箱，将其移至一旁。"
    },
    {
        "start_time": "00:00:15.00",
        "end_time": "00:00:20.00",
        "event": "人物用右手拿起一支笔，在桌上的笔记本上记录信息。"
    }
]

API参考

关于通义千问VL模型的输入输出参数，请参见通义千问。

常见问题

如何将图像或视频压缩到满足要求的大小？

通义千问VL 对输入的文件有大小限制，可通过以下方法压缩。

图像压缩方法

在线工具：使用 CompressJPEG 或 TinyPng 等在线工具进行压缩。
本地软件：使用 Photoshop 等软件，在导出时调整质量。

代码实现：

# pip install pillow

from PIL import Image
def compress_image(input_path, output_path, quality=85):
    with Image.open(input_path) as img:
        img.save(output_path, "JPEG", optimize=True, quality=quality)

# 传入本地图像
compress_image("/xxx/before-large.jpeg","/xxx/after-min.jpeg")

视频压缩方法

在线工具：使用 FreeConvert 等在线工具进行压缩。
本地软件：使用 HandBrake 等软件。

代码实现：使用FFmpeg工具，更多用法请参见FFmpeg官网。

# 基础转换命令（万能模板）
# -i，作用：输入文件路径，常用值示例：input.mp4
# -vcodec，作用 视频编码器 ，一般取值有libx264（通用推荐）、libx265（压缩率更高）、
# -crf，作用：控制视频质量，取值范围：[18-28]，数值越小，质量越高，文件体积越大。
# --preset，作用：控制编码速度与压缩效率的平衡。一般取值有 slow、fast、faster
# -y，作用：覆盖已存在文件(无需值)
# output.mp4，作用：输出文件路径

ffmpeg -i input.mp4 -vcodec libx264 -crf 28 -preset slow output.mp4

通义千问VL是否支持批量提交任务？

目前 qwen-vl-max、qwen-vl-max-latest、qwen-vl-plus、qwen-vl-plus-latest 模型兼容OpenAI Batch 接口，支持以文件方式批量提交任务并异步执行。对于处理大规模数据且对时效性有一定容忍度的场景，推荐使用Batch调用，费用仅为实时调用的 50%。

通义千问VL是否支持处理PDF、XLSX、XLS、DOC等文本文件？

不支持，通义千问VL模型属于视觉理解模型，只能处理图像格式的文件，无法直接处理文本文件。如需，可参考以下替代方案：

将文本文件转换为图片格式，但是注意转换后的图像尺寸可能过大，建议您将文件切分为多张图像，使用多图像输入的方式传入模型。
Qwen-Long支持处理文本文件，可用于解析文件内容。

错误码

如果模型调用失败并返回报错信息，请参见错误信息进行解决。