Qwen-VL (通义千问VL)

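Qwen-VL models answer questions based on the images or videos you pass in.

You can try image understanding online in the Model Plaza (模型广场). Video understanding is currently available only through the API.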

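Application examples

Extracting text from an image

Input example: [image]

Traditional OCR tool output: [image] The content extracted by a traditional OCR tool is incomplete and its formatting is jumbled.

Vision-language model output: [image] The large model not only extracts the text in the image accurately but also returns it in a well-organized form.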

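Recognizing objects, people, expressions, actions, and scenes in an image, and understanding what the image means

Example 1

Input: [image: measuring height] What is in the picture?

Output: The picture shows a mother measuring her child's height. The mother is crouching on the floor and marking the child's height on the wall with a pen. The child is standing next to a yellow height chart and looks very happy. The background is a bright room with some furniture and decorations.

Example 2

Input: [image] What is the picture trying to express?

Output: By showing the whole iceberg, both the part above the water and the part below it, the picture conveys the meaning of the idiom "the tip of the iceberg". The idiom usually describes the surface appearance of something, while its true nature or deeper problems are hidden beneath the surface, just as most of an iceberg's volume is hidden underwater. The picture may be saying that when people face a problem or a situation, they should not look only at the surface but should dig into the truth and essence behind it.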

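Solving problems and translating from an image

Example 1

Input: [image: problem 10] Solve this problem.

Output: [image]

Example 2

Input: [image] Translate the text in the image into Chinese.

Output: [image]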

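Generating front-end code from an image

Input example

[image]

Prompt: Convert this into an interface built with HTML, CSS, and JavaScript.

Output example

Model output

To convert this interface into HTML, CSS, and JavaScript, you can follow these steps: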

<!DOCTYPE html>
<html lang="zh-CN">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>阿里云百炼</title>
    <link rel="stylesheet" href="styles.css">
</head>
<body>
    <div class="container">
        <h1>你好,欢迎使用阿里云百炼</h1>
        <div class="search-bar">
            <div class="selected-model">
                已选择
                <div class="model-selector">
                    <img src="model-icon.png" alt="Model Icon">
                    通义千问-Max
                    <span class="dropdown-icon">▼</span>
                </div>
            </div>
            <input type="text" placeholder="输入你的问题开启模型体验吧" class="search-input">
            <button class="search-button">发送</button>
        </div>
        <div class="features">
            <div class="feature feature-1">
                <h2>模型调用</h2>
                <p>结合业务需求,选择适合的大模型,支持API/SDK的调用方式</p>
                <img src="feature-1-icon.png" alt="Feature 1 Icon">
                <div class="feature-buttons">
                    <a href="#" class="button">立即调用</a>
                    <a href="#" class="button">查看教程</a>
                </div>
            </div>
            <div class="feature feature-2">
                <h2>应用创建</h2>
                <p>基于Assistant API,支持调用Prompt、插件、RAG等多种能力,完成大模型应用的创建。</p>
                <img src="feature-2-icon.png" alt="Feature 2 Icon">
                <div class="feature-buttons">
                    <a href="#" class="button">立即创建</a>
                    <a href="#" class="button">查看教程</a>
                </div>
            </div>
        </div>
    </div>
    <script src="script.js"></script>
</body>
</html>

CSS (styles.css)

body {
    font-family: Arial, sans-serif;
    background: linear-gradient(to bottom, #f8f9fa, #e9ecef);
    color: #343a40;
}

.container {
    max-width: 1200px;
    margin: 0 auto;
    padding: 20px;
}

h1 {
    text-align: center;
    margin-bottom: 30px;
}

.search-bar {
    display: flex;
    align-items: center;
    margin-bottom: 30px;
}

.selected-model {
    display: flex;
    align-items: center;
    margin-right: 10px;
}

.model-selector {
    display: flex;
    align-items: center;
    border: 1px solid #ced4da;
    border-radius: 5px;
    padding: 5px 10px;
    cursor: pointer;
}

.model-selector img {
    width: 20px;
    height: 20px;
    margin-right: 5px;
}

.dropdown-icon {
    margin-left: 5px;
}

.search-input {
    flex: 1;
    padding: 10px;
    border: 1px solid #ced4da;
    border-radius: 5px;
    margin-right: 10px;
}

.search-button {
    padding: 10px 20px;
    background-color: #6c757d;
    color: #fff;
    border: none;
    border-radius: 5px;
    cursor: pointer;
}

.features {
    display: flex;
    justify-content: space-between;
}

.feature {
    flex: 1;
    padding: 20px;
    border-radius: 5px;
    margin: 10px;
}

.feature-1 {
    background-color: #5a5aff;
    color: #fff;
}

.feature-2 {
    background-color: #48dbfb;
    color: #fff;
}

.feature h2 {
    margin-bottom: 10px;
}

.feature p {
    margin-bottom: 20px;
}

.feature img {
    width: 100px;
    height: 100px;
    margin: 0 auto;
}

.feature-buttons {
    display: flex;
    justify-content: space-between;
}

.button {
    padding: 5px 10px;
    border: none;
    border-radius: 5px;
    color: #fff;
    text-decoration: none;
}

.button:hover {
    background-color: #495057;
}

JavaScript (script.js)

document.addEventListener('DOMContentLoaded', function() {
    const modelSelector = document.querySelector('.model-selector');
    const dropdownIcon = document.querySelector('.dropdown-icon');

    modelSelector.addEventListener('click', function() {
        dropdownIcon.classList.toggle('open');
    });
});

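This HTML, CSS, and JavaScript code creates an interface similar to the one shown in the image. You can adjust the styles and functionality as needed.

The front-end interface rendered from the code provided by the model is shown below:

[image]

Summarizing video content and answering questions about a video

Video

Conversation example

Input: Please describe the content of the video.

Output: The video shows a woman cleaning the floor of a living room with a vacuum cleaner. She starts at the entrance of the living room and works along the edges of the floor and around the furniture, making sure every corner is cleaned thoroughly. She uses the suction of the vacuum cleaner to pick up dust and debris, leaving the floor clean and tidy. While cleaning, she also pays attention to details such as the dust under the sofa and the table, which she vacuums as well. Finally, she unplugs the vacuum cleaner and places it next to the sofa, completing the cleaning.

Input: What color is the vacuum cleaner?

Output: The vacuum cleaner is red.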

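Qwen-VL models

Commercial models

Qwen-VL models are billed by the total number of input and output tokens. Images are converted to tokens as follows. An image with a resolution of 512×512 pixels is roughly equal to 334 tokens, and images of other resolutions are converted proportionally. The smallest unit is 28×28 pixels, that is, each 28×28-pixel block corresponds to one token; if the width or height of an image is not a multiple of 28, it is rounded up to the next multiple of 28 before the calculation. An image counts as at least 4 tokens and at most 1,280 tokens (for qwen-vl-max, qwen-vl-max-0809, and qwen-vl-plus-0809, a single image can be up to 16,384 tokens).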
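The conversion rule can be approximated in a few lines of Python. This is a rough sketch under stated assumptions, not the service's billing code: the names `estimate_image_tokens` and `estimate_cost_cny` are illustrative, and clamping the result to the stated minimum and maximum is an assumption about how the limits apply. Rounding each side up to a multiple of 28 gives 361 tokens for a 512×512 image, slightly more than the approximate 334 quoted in the rule (334 matches the plain area ratio 512×512 / (28×28)), so treat the `usage` field of a real response as the authoritative count.

```python
import math

PATCH = 28  # one token per 28x28-pixel block, per the billing rule


def estimate_image_tokens(width: int, height: int, max_tokens: int = 1280) -> int:
    """Estimate the token count of one image (illustrative sketch only)."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Round each side up to a whole number of 28-pixel blocks.
    cols = math.ceil(width / PATCH)
    rows = math.ceil(height / PATCH)
    # Assumption: values outside the stated range are clamped to the limits.
    return max(4, min(cols * rows, max_tokens))


def estimate_cost_cny(total_tokens: int, price_per_1k: float) -> float:
    """Cost in CNY for a request, given the price per 1,000 tokens."""
    return total_tokens / 1000 * price_per_1k
```

For instance, a request that totals 1,325 tokens at 0.02 CNY per 1,000 tokens comes to roughly 0.0265 CNY. Pass `max_tokens=16384` for the models that accept the larger per-image limit.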

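Each model is listed with its description, context length, maximum input, and maximum output (all in tokens), its input/output price (per 1,000 tokens), and its free quota.

qwen-vl-max

Description: This update supports a 32k context and strengthens image understanding and video reasoning. It recognizes multilingual text and handwriting in images more reliably.

Context length: 32k

Max input: 30k

Max output: 2k

Input/output price: 0.02 CNY

Free quota: 1 million tokens. Valid for 30 days after Bailian (百炼) is activated.

qwen-vl-max-0809

Description: A snapshot of qwen-vl-max from August 9, 2024. A snapshot is maintained until one month after the release date of the next snapshot (to be determined). This model is equivalent to Qwen2-VL-72B.

Context length: 32k

Max input: 30k

Max output: 2k

Input/output price: 0.02 CNY

Free quota: 1 million tokens. For users who had already activated Bailian, valid for 30 days starting at 00:00 on August 23. For users who newly activate Bailian, valid for 30 days after activation.

qwen-vl-plus-0809

Description: A snapshot of qwen-vl-plus from August 9, 2024.

Context length: 32k

Max input: 30k

Max output: 2k

Input/output price: 0.008 CNY

Free quota: 1 million tokens. Valid for 30 days after Bailian is activated.

qwen-vl-max-0201

Description: A snapshot of qwen-vl-max from February 1, 2024. A snapshot is maintained until one month after the release date of the next snapshot (to be determined).

Context length: 8k

Max input: 6k

Max output: 2k

Input/output price: 0.02 CNY

Free quota: 1 million tokens. Valid for 30 days after Bailian is activated.

qwen-vl-plus

Description: Greatly improved detail recognition and text recognition. Supports images with more than one million pixels and arbitrary aspect ratios, and performs strongly across a wide range of vision tasks.

Listed values: 2k; 0.008 CNY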

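Open-source models

Each model is listed with its context length, maximum input, and maximum output (all in tokens), its input and output cost (per 1,000 tokens), and its free quota.

qwen-vl-v1

Context length: 8,000

Max input: 6,000

Max output: 1,500

Input cost and output cost: Currently available for free trial only. Once the free quota is used up, the model can no longer be called; watch for future updates.

Free quota: 100,000 tokens. Valid for 180 days after Bailian (百炼) is activated.

qwen-vl-chat-v1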

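How to use

Before you start, obtain an API key and configure it as an environment variable. If you call the models through the OpenAI SDK or the DashScope SDK, you also need to install the SDK.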
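Before running any of the samples, it can help to confirm that the key is actually visible to your process. The following is a minimal sketch: the helper name `load_api_key` is hypothetical, while `DASHSCOPE_API_KEY` is the variable name used by the samples in this document.

```python
import os


def load_api_key(env_var: str = "DASHSCOPE_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = os.getenv(env_var, "").strip()
    if not key:
        # Fail early with a readable message instead of a later HTTP 401.
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return key
```

The returned value can be passed as `api_key` when constructing the client.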

Simple example

OpenAI compatible

You can call Qwen-VL models through the OpenAI SDK or through the OpenAI-compatible HTTP interface.

Python

Sample code

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen-vl-max",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
                    },
                },
                {"type": "text", "text": "这是什么"},
            ],
        }
    ],
)

print(completion.model_dump_json())

Response

{
  "id": "chatcmpl-4b5a3bb9-221f-9687-bdd7-a7d56aae44df",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "这张图片展示了一位女士和一只狗在海滩上互动。女士坐在沙滩上,微笑着与狗握手。背景是海浪和天空,阳光洒在她们身上,营造出温馨的氛围。狗戴着项圈,显得很温顺。",
        "role": "assistant",
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1725948492,
  "model": "qwen-vl-max",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 55,
    "prompt_tokens": 1270,
    "total_tokens": 1325
  }
}

curl

Sample code

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
  "model": "qwen-vl-max",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
          }
        },
        {
          "type": "text",
          "text": "这是什么"
        }
      ]
    }
  ]
}'

Response

{
  "choices": [
    {
      "message": {
        "content": "这张图片展示了一位女士和一只狗在海滩上互动。女士坐在沙滩上,微笑着与狗握手。背景是大海和天空,阳光洒在她们身上,营造出温暖的氛围。狗戴着项圈,显得很温顺。",
        "role": "assistant"
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null
    }
  ],
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 1270,
    "completion_tokens": 54,
    "total_tokens": 1324
  },
  "created": 1725948561,
  "system_fingerprint": null,
  "model": "qwen-vl-max",
  "id": "chatcmpl-0fd66f46-b09e-9164-a84f-3ebbbedbac15"
}
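The OpenAI-compatible response body shown above can be unpacked with a few lines of Python. This is a small sketch that assumes only the fields visible in the sample (`choices[0].message.content` and `usage.total_tokens`); the function name `extract_reply` and the shortened `SAMPLE` payload are illustrative.

```python
import json


def extract_reply(response_json: str) -> tuple[str, int]:
    """Return (reply text, total tokens) from a chat.completion JSON string."""
    data = json.loads(response_json)
    message = data["choices"][0]["message"]
    return message["content"], data["usage"]["total_tokens"]


# Shortened stand-in with the same shape as the sample response.
SAMPLE = json.dumps(
    {
        "choices": [
            {
                "message": {"content": "A woman and a dog on a beach.", "role": "assistant"},
                "finish_reason": "stop",
                "index": 0,
            }
        ],
        "usage": {"prompt_tokens": 1270, "completion_tokens": 54, "total_tokens": 1324},
    }
)

reply, total = extract_reply(SAMPLE)
print(reply, total)
```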

DashScope

You can call Qwen-VL models through the DashScope SDK or over HTTP.

Python

Sample code

from http import HTTPStatus
import dashscope

messages = [
    {
        "role": "user",
        "content": [
            {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
            {"text": "这是什么?"}
        ]
    }
]

response = dashscope.MultiModalConversation.call(
    model='qwen-vl-max',
    messages=messages
)

if response.status_code == HTTPStatus.OK:
    print(response)
else:
    print(response.code)
    print(response.message)

Response

{
    "status_code": 200,
    "request_id": "3a031529-707f-9b7d-968c-172e7533debc",
    "code": "",
    "message": "",
    "output": {
        "text": null,
        "finish_reason": null,
        "choices": [
            {
                "finish_reason": "stop",
                "message": {
                    "role": "assistant",
                    "content": [
                        {
                            "text": "这是一张在海滩上拍摄的照片。照片中有一位女士和一只狗。女士穿着格子衬衫,坐在沙滩上,微笑着与狗互动。狗戴着项圈,似乎在与女士握手。背景是大海和天空,阳光洒在她们身上,营造出温暖的氛围。"
                        }
                    ]
                }
            }
        ]
    },
    "usage": {
        "input_tokens": 1271,
        "output_tokens": 63,
        "image_tokens": 1247
    }
}

Java

Sample code

// Copyright (c) Alibaba, Inc. and its affiliates.

import java.util.Arrays;
import java.util.Collections;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
public class Main {
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"),
                        Collections.singletonMap("text", "这是什么?"))).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                .model("qwen-vl-max")
                .message(userMessage)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(JsonUtils.toJson(result));
    }

    public static void main(String[] args) {
        try {
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

Response

{
  "requestId": "dcb38a0f-fd69-9071-bcde-c4530f9a7559",
  "usage": {
    "input_tokens": 1271,
    "output_tokens": 58
  },
  "output": {
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": [
            {
              "text": "这是一张在海滩上拍摄的照片。照片中有一位女士和一只狗。女士穿着格子衬衫,坐在沙滩上,与狗互动。狗戴着项圈,看起来很开心。背景是大海和天空,阳光洒在她们身上,营造出温暖的氛围。"
            }
          ]
        }
      }
    ]
  }
}

curl

Sample code

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen-vl-max",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
                    {"text": "这是什么?"}
                ]
            }
        ]
    }
}'

Response

{
  "output": {
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": [
            {
              "text": "这是一张在海滩上拍摄的照片。照片中有一个穿着格子衬衫的人和一只戴着项圈的狗。他们坐在沙滩上,背景是大海和天空。阳光从画面的右侧照射过来,给整个场景增添了一种温暖的氛围。"
            }
          ]
        }
      }
    ]
  },
  "usage": {
    "output_tokens": 55,
    "input_tokens": 1271,
    "image_tokens": 1247
  },
  "request_id": "ccf845a3-dc33-9cda-b581-20fe7dc23f70"
}
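In the DashScope responses shown above, the reply is a list of content parts rather than a single string. A minimal sketch for pulling out the text follows; the function name `extract_dashscope_text` and the shortened `SAMPLE` payload are illustrative, and the sketch assumes each part is a dictionary that may carry a `text` key, as in the samples.

```python
def extract_dashscope_text(response: dict) -> str:
    """Join the text parts of the first choice in a DashScope-style response."""
    message = response["output"]["choices"][0]["message"]
    # Parts without a "text" key are skipped.
    return "".join(part.get("text", "") for part in message["content"])


# Shortened stand-in with the same shape as the sample response.
SAMPLE = {
    "output": {
        "choices": [
            {
                "finish_reason": "stop",
                "message": {
                    "role": "assistant",
                    "content": [{"text": "A photo taken "}, {"text": "on a beach."}],
                },
            }
        ]
    },
    "usage": {"output_tokens": 55, "input_tokens": 1271, "image_tokens": 1247},
}

print(extract_dashscope_text(SAMPLE))
```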

Multiple image input

You can pass several images to a Qwen-VL model in a single request. The following code shows how to pass them in.
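When the number of images varies, the user message can be built programmatically. The sketch below produces the OpenAI-compatible content layout used in this document (one `image_url` part per image followed by one `text` part); the helper name `build_image_message` is hypothetical.

```python
def build_image_message(image_urls: list[str], prompt: str) -> dict:
    """Build one user message holding several images and a text prompt."""
    content = [{"type": "image_url", "image_url": {"url": url}} for url in image_urls]
    # The text prompt goes last, after all image parts.
    content.append({"type": "text", "text": prompt})
    return {"role": "user", "content": content}
```

The result can be passed as the single element of the `messages` list in a chat completion request.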

OpenAI compatible

You can call Qwen-VL models through the OpenAI SDK or through the OpenAI-compatible HTTP interface.

Python

Sample code

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen-vl-max",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"
                    }
                },
                {
                    "type": "text",
                    "text": "这些是什么"
                }
            ]
        }
    ]
)

print(completion.model_dump_json())

Response

{
  "id": "chatcmpl-4b5a3bb9-221f-9687-bdd7-a7d56aae44df",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "图1中是一位女士和一只拉布拉多犬在海滩上互动的场景。女士穿着格子衬衫,坐在沙滩上,与狗进行握手的动作,背景是海浪和天空,整个画面充满了温馨和愉快的氛围。\n\n图2中是一只老虎在森林中行走的场景。老虎的毛色是橙色和黑色相间的条纹,它正向前迈步,周围是茂密的树木和植被,地面上覆盖着落叶,整个画面给人一种野生自然的感觉。",
        "role": "assistant",
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1725948492,
  "model": "qwen-vl-max",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 106,
    "prompt_tokens": 2497,
    "total_tokens": 2603
  }
}

curl

Sample code

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
  "model": "qwen-vl-max",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
          }
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"
          }
        },
        {
          "type": "text",
          "text": "这些是什么"
        }
      ]
    }
  ]
}'

Response

{
  "choices": [
    {
      "message": {
        "content": "图1中是一位女士和一只拉布拉多犬在海滩上互动的场景。女士穿着格子衬衫,坐在沙滩上,与狗进行握手的动作,背景是海景和日落的天空,整个画面显得非常温馨和谐。\n\n图2中是一只老虎在森林中行走的场景。老虎的毛色是橙色和黑色条纹相间,它正向前迈步,周围是茂密的树木和植被,地面上覆盖着落叶,整个画面充满了自然的野性和生机。",
        "role": "assistant"
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null
    }
  ],
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 2497,
    "completion_tokens": 109,
    "total_tokens": 2606
  },
  "created": 1725948561,
  "system_fingerprint": null,
  "model": "qwen-vl-max",
  "id": "chatcmpl-0fd66f46-b09e-9164-a84f-3ebbbedbac15"
}

DashScope

You can call Qwen-VL models through the DashScope SDK or over HTTP.

Python

Sample code

from http import HTTPStatus
import dashscope

messages = [
    {
        "role": "user",
        "content": [
            {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
            {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"},
            {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/rabbit.png"},
            {"text": "这些是什么?"}
        ]
    }
]

response = dashscope.MultiModalConversation.call(
    model='qwen-vl-plus',
    messages=messages
)

if response.status_code == HTTPStatus.OK:
    print(response)
else:
    print(response.code)
    print(response.message)

Response

{
    "status_code": 200,
    "request_id": "3a031529-707f-9b7d-968c-172e7533debc",
    "code": "",
    "message": "",
    "output": {
        "text": null,
        "finish_reason": null,
        "choices": [
            {
                "finish_reason": "stop",
                "message": {
                    "role": "assistant",
                    "content": [
                        {
                            "text": "图1中是一名女子和狗在沙滩上玩耍。\n图2是孟加拉虎的插画,它正向镜头走来。\n图3里是一只可爱的小白兔。"
                        }
                    ]
                }
            }
        ]
    },
    "usage": {
        "input_tokens": 3743,
        "output_tokens": 41,
        "image_tokens": 3697
    }
}

Java

Sample code

import java.util.Arrays;
import java.util.Collections;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
public class Main {
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"),
                        Collections.singletonMap("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"),
                        Collections.singletonMap("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/rabbit.png"),
                        Collections.singletonMap("text", "这些是什么?"))).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                .model("qwen-vl-plus")
                .message(userMessage)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(JsonUtils.toJson(result));
    }

    public static void main(String[] args) {
        try {
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

Response

{
  "requestId": "dcb38a0f-fd69-9071-bcde-c4530f9a7559",
  "usage": {
    "input_tokens": 3740,
    "output_tokens": 48
  },
  "output": {
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": [
            {
              "text": "图1中是一名女子和一只大金毛在沙滩上玩耍。\n图2是孟加拉虎的写实照片,老虎正向镜头走来。\n图3是一幅插画,主要展示了一只兔子。"
            }
          ]
        }
      }
    ]
  }
}

curl

Sample code

curl --location 'https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-vl-plus",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
                    {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"},
                    {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/rabbit.png"},
                    {"text": "这些是什么?"}
                ]
            }
        ]
    }
}'

Response

{
  "output": {
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": [
            {
              "text": "这张图片显示了一位女士和她的狗在海滩上。她们似乎正在享受彼此的陪伴,狗狗坐在沙滩上伸出爪子与女士握手或互动。背景是美丽的日落景色,海浪轻轻拍打着海岸线。\n\n请注意,我提供的描述基于图像中可见的内容,并不包括任何超出视觉信息之外的信息。如果您需要更多关于这个场景的具体细节,请告诉我!"
            }
          ]
        }
      }
    ]
  },
  "usage": {
    "output_tokens": 81,
    "input_tokens": 1277,
    "image_tokens": 1247
  },
  "request_id": "ccf845a3-dc33-9cda-b581-20fe7dc23f70"
}

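Multi-turn conversation

Qwen-VL models can take the conversation history into account when replying. The following sample code shows how to call a Qwen-VL model through either the OpenAI-compatible interface or DashScope to hold a multi-turn conversation.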
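The core of a multi-turn conversation is that every request carries all earlier turns. The sketch below isolates that bookkeeping using the OpenAI-compatible message layout; the class name `Conversation` and the injected `send` callable are hypothetical, so the history logic can be exercised without a network call.

```python
from typing import Callable, Optional


class Conversation:
    """Keeps the message history so each request carries the earlier turns."""

    def __init__(self, send: Callable[[list], str]):
        # `send` takes the full message list and returns the reply text.
        self._send = send
        self.messages: list = []

    def ask(self, text: str, image_url: Optional[str] = None) -> str:
        content = []
        if image_url is not None:
            content.append({"type": "image_url", "image_url": {"url": image_url}})
        content.append({"type": "text", "text": text})
        self.messages.append({"role": "user", "content": content})
        reply = self._send(self.messages)
        # Store the reply so the next turn can refer back to it.
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

With the OpenAI SDK client used elsewhere in this document, `send` could wrap `client.chat.completions.create(model=..., messages=msgs)` and return `choices[0].message.content`.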

OpenAI compatible

You can call Qwen-VL models through the OpenAI SDK or through the OpenAI-compatible HTTP interface to hold a multi-turn conversation.

Python

Sample code

from openai import OpenAI
import os


def get_response():
    client = OpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    )
    # First turn: send the image together with a question.
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
                    },
                },
                {"type": "text", "text": "这是什么"},
            ],
        }
    ]
    completion = client.chat.completions.create(
        model="qwen-vl-plus",
        messages=messages,
    )
    print(f"模型第一轮输出:\n{completion.model_dump()}")
    # Append the assistant reply so the next request carries the history.
    assistant_message = completion.choices[0].message
    messages.append(assistant_message.model_dump())
    # Second turn: a text-only follow-up question.
    messages.append(
        {
            "role": "user",
            "content": [{"type": "text", "text": "做一首诗描述这个场景"}],
        }
    )
    completion = client.chat.completions.create(
        model="qwen-vl-plus",
        messages=messages,
    )
    print(f"模型第二轮输出:\n{completion.model_dump()}")


if __name__ == "__main__":
    get_response()

Response

模型第一轮输出:
{
  "id": "chatcmpl-afcdac19-9bbf-942b-91ad-04252fe1722c",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": None,
      "message": {
        "content": "图中是一名女子和她的狗在沙滩上互动。狗狗坐在地上,伸出爪子像是要握手或者击掌的样子。这名女士穿着格子衬衫,似乎正在与狗狗进行亲密的接触,并且面带微笑。背景是海洋和日出或日落时分的天空。这是一幅描绘人与宠物之间温馨时刻的画面。",
        "role": "assistant",
        "function_call": None,
        "tool_calls": None
      }
    }
  ],
  "created": 1721820065,
  "model": "qwen-vl-plus",
  "object": "chat.completion",
  "service_tier": None,
  "system_fingerprint": None,
  "usage": {
    "completion_tokens": 75,
    "prompt_tokens": 1276,
    "total_tokens": 1351
  }
}
模型第二轮输出:
{
  "id": "chatcmpl-3090adc4-91da-95d9-8482-49240d47099a",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": None,
      "message": {
        "content": "朝阳照海涛,\n沙岸女伴犬。\n欢笑共此时,\n友情深似海。\n\n手握同游步,\n默契心间留。\n潮起潮又落,\n此景久难忘。",
        "role": "assistant",
        "function_call": None,
        "tool_calls": None
      }
    }
  ],
  "created": 1721820068,
  "model": "qwen-vl-plus",
  "object": "chat.completion",
  "service_tier": None,
  "system_fingerprint": None,
  "usage": {
    "completion_tokens": 44,
    "prompt_tokens": 1366,
    "total_tokens": 1410
  }
}

curl

Sample code

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
  "model": "qwen-vl-max",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
          }
        },
        {
          "type": "text",
          "text": "这是什么"
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "这是一个女孩和一只狗。"
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "写一首七言绝句描述这个场景"
        }
      ]
    }
  ]
}'

Response

{
    "choices": [
        {
            "message": {
                "content": "海风轻拂笑颜开,  \n沙滩上与犬相陪。  \n夕阳斜照人影短,  \n欢乐时光心自醉。",
                "role": "assistant"
            },
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null
        }
    ],
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 1295,
        "completion_tokens": 32,
        "total_tokens": 1327
    },
    "created": 1726324976,
    "system_fingerprint": null,
    "model": "qwen-vl-max",
    "id": "chatcmpl-3c953977-6107-96c5-9a13-c01e328b24ca"
}

DashScope

You can call Qwen-VL models through the DashScope SDK or over HTTP to hold a multi-turn conversation.

Python

Sample code

from dashscope import MultiModalConversation


def conversation_call():
    # First turn: send the image together with a question.
    messages = [
        {
            "role": "user",
            "content": [
                {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
                {"text": "这是什么?"},
            ],
        }
    ]
    response = MultiModalConversation.call(
        model="qwen-vl-plus",
        messages=messages,
    )
    print(f"模型第一轮输出:{response}")
    # Append the assistant reply so the second request carries the history.
    messages.append(response["output"]["choices"][0]["message"])
    user_msg = {
        "role": "user",
        "content": [{"text": "做一首诗描述这个场景"}],
    }
    messages.append(user_msg)
    response = MultiModalConversation.call(
        model="qwen-vl-plus",
        messages=messages,
    )
    print(f"模型第二轮输出:{response}")


if __name__ == "__main__":
    conversation_call()

Response

模型第一轮输出:
{
  "status_code": 200,
  "request_id": "0468708b-85fb-95d8-a502-bb9098b31b37",
  "code": "",
  "message": "",
  "output": {
    "text": null,
    "finish_reason": null,
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": [
            {
              "text": "这张图片显示了一位女士和她的狗在海滩上。她们似乎正在享受彼此的陪伴,狗狗坐在沙滩上伸出爪子与女士握手或互动。背景是美丽的日落景色,海浪轻轻拍打着海岸线。\n\n请注意,我提供的描述基于图像中可见的内容,并不包括任何超出视觉信息之外的信息。如果您需要更多关于这个场景的具体细节,请告诉我!"
            }
          ]
        }
      }
    ]
  },
  "usage": {
    "input_tokens": 1277,
    "output_tokens": 81,
    "image_tokens": 1247
  }
}
模型第二轮输出:
{
  "status_code": 200,
  "request_id": "8f236443-7b01-9bad-87be-abff5d3887de",
  "code": "",
  "message": "",
  "output": {
    "text": null,
    "finish_reason": null,
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": [
            {
              "text": "夕阳染红了天边,\n波涛轻抚着沙岸。\n人犬共坐此情深,\n\n手握着手心相牵。\n欢笑回荡于风间,\n这一刻永恒不变。\n爱意如潮水般涌动,\n在这片金色的大地上蔓延。"
            }
          ]
        }
      }
    ]
  },
  "usage": {
    "input_tokens": 1373,
    "output_tokens": 59,
    "image_tokens": 1247
  }
}

Java

Sample code

// Copyright (c) Alibaba, Inc. and its affiliates.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
public class Main {
    private static final String modelName = "qwen-vl-plus";
    public static void MultiRoundConversationCall() throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage systemMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
                .content(Arrays.asList(Collections.singletonMap("text", "You are a helpful assistant."))).build();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(Collections.singletonMap("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"),
                        Collections.singletonMap("text", "这是什么?"))).build();
        List<MultiModalMessage> messages = new ArrayList<>();
        messages.add(systemMessage);
        messages.add(userMessage);
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                .model(modelName)
                .messages(messages)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(JsonUtils.toJson(result));
        // add the result to conversation
        messages.add(result.getOutput().getChoices().get(0).getMessage());
        MultiModalMessage msg = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(Collections.singletonMap("text", "做一首诗描述这个场景"))).build();
        messages.add(msg);
        // new messages
        param.setMessages((List)messages);
        result = conv.call(param);
        System.out.println(JsonUtils.toJson(result));
    }

    public static void main(String[] args) {
        try {
            MultiRoundConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

Response

{
  "requestId": "1239e1de-dbd3-9b46-b508-421023ed3053",
  "usage": {
    "input_tokens": 1277,
    "output_tokens": 81
  },
  "output": {
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": [
            {
              "text": "这张图片显示了一位女士和她的狗在海滩上。她们似乎正在享受彼此的陪伴,狗狗坐在沙滩上伸出爪子与女士握手或互动。背景是美丽的日落景色,海浪轻轻拍打着海岸线。\n\n请注意,我提供的描述基于图像中可见的内容,并不包括任何超出视觉信息之外的信息。如果您需要更多关于这个场景的具体细节,请告诉我!"
            }
          ]
        }
      }
    ]
  }
}
{
  "requestId": "045fe96f-26c4-9cfd-b0ad-ec5f1f4033ce",
  "usage": {
    "input_tokens": 1373,
    "output_tokens": 59
  },
  "output": {
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": [
            {
              "text": "夕阳染红了天边,\n波涛轻抚着沙岸。\n人犬共坐此情深,\n\n手握着手心相牵。\n欢笑回荡于风间,\n这一刻永恒不变。\n爱意如潮水般涌动,\n在这片金色的大地上蔓延。"
            }
          ]
        }
      }
    ]
  }
}

curl

Sample code

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen-vl-max",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
                    {"text": "这是什么?"}
                ]
            },
            {
                "role": "assistant",
                "content": [
                    {"text": "这是一个女孩和一只狗。"}
                ]
            },
            {
                "role": "user",
                "content": [
                    {"text": "写一首七言绝句描述这个场景"}
                ]
            }
        ]
    }
}'

Response

{
    "output": {
        "choices": [
            {
                "finish_reason": "stop",
                "message": {
                    "role": "assistant",
                    "content": [
                        {
                            "text": "海浪轻拍沙滩边,女孩与狗同嬉戏。阳光洒落笑颜开,快乐时光永铭记。"
                        }
                    ]
                }
            }
        ]
    },
    "usage": {
        "output_tokens": 27,
        "input_tokens": 1298,
        "image_tokens": 1247
    },
    "request_id": "bdf5ef59-c92e-92a6-9d69-a738ecee1590"
}
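
The multi-round request above works by resending the whole conversation: each earlier assistant reply is appended to `messages` before the next user turn. A minimal Python sketch of that bookkeeping follows, using the DashScope message format from the curl example. The helper name `add_turn` is hypothetical and not part of any SDK.

```python
import copy


def add_turn(messages, assistant_text, next_user_text):
    """Return a new history with the assistant reply and the next user turn appended."""
    history = copy.deepcopy(messages)  # leave the caller's list untouched
    history.append({"role": "assistant", "content": [{"text": assistant_text}]})
    history.append({"role": "user", "content": [{"text": next_user_text}]})
    return history


first_round = [
    {
        "role": "user",
        "content": [
            {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
            {"text": "这是什么?"},
        ],
    }
]

# The assistant text would normally come from the first response.
second_round = add_turn(first_round, "这是一个女孩和一只狗。", "写一首七言绝句描述这个场景")
print([m["role"] for m in second_round])
```

The resulting list can be sent as `input.messages` in the next request. The image only needs to appear in the turn where it was first supplied.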

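Streaming output

The model does not produce its final result in one step. It generates intermediate results incrementally, and the final result is their concatenation. With non-streaming output, you wait until generation finishes and receive the concatenated result in a single response. With streaming output, intermediate results are returned as they are generated, so you can read the reply while the model is still producing it, which shortens the wait.

OpenAI compatible

You can call the Qwen-VL model through the OpenAI SDK or the OpenAI-compatible HTTP interface to try streaming output.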

Python

Sample code

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen-vl-plus",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
                    }
                },
                {
                    "type": "text",
                    "text": "这是什么"
                }
            ]
        }
    ],
    stream=True,
    stream_options={"include_usage": True}
)

for chunk in completion:
    print(chunk.model_dump())

Response

{'id': 'chatcmpl-6cf91cc7-1121-9977-b4bc-5e7d1fbfd693', 'choices': [{'delta': {'content': '', 'function_call': None, 'role': 'assistant', 'tool_calls': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1721823365, 'model': 'qwen-vl-plus', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': None, 'usage': None}
{'id': 'chatcmpl-6cf91cc7-1121-9977-b4bc-5e7d1fbfd693', 'choices': [{'delta': {'content': '图', 'function_call': None, 'role': None, 'tool_calls': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1721823365, 'model': 'qwen-vl-plus', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': None, 'usage': None}
{'id': 'chatcmpl-6cf91cc7-1121-9977-b4bc-5e7d1fbfd693', 'choices': [{'delta': {'content': '中', 'function_call': None, 'role': None, 'tool_calls': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1721823365, 'model': 'qwen-vl-plus', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': None, 'usage': None}
{'id': 'chatcmpl-6cf91cc7-1121-9977-b4bc-5e7d1fbfd693', 'choices': [{'delta': {'content': '是一名', 'function_call': None, 'role': None, 'tool_calls': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1721823365, 'model': 'qwen-vl-plus', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': None, 'usage': None}
{'id': 'chatcmpl-6cf91cc7-1121-9977-b4bc-5e7d1fbfd693', 'choices': [{'delta': {'content': '女子和她的狗在', 'function_call': None, 'role': None, 'tool_calls': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1721823365, 'model': 'qwen-vl-plus', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': None, 'usage': None}
{'id': 'chatcmpl-6cf91cc7-1121-9977-b4bc-5e7d1fbfd693', 'choices': [{'delta': {'content': '沙滩上互动。狗狗坐在地上,', 'function_call': None, 'role': None, 'tool_calls': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1721823365, 'model': 'qwen-vl-plus', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': None, 'usage': None}
{'id': 'chatcmpl-6cf91cc7-1121-9977-b4bc-5e7d1fbfd693', 'choices': [{'delta': {'content': '伸出爪子像是要握手或者击', 'function_call': None, 'role': None, 'tool_calls': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1721823365, 'model': 'qwen-vl-plus', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': None, 'usage': None}
{'id': 'chatcmpl-6cf91cc7-1121-9977-b4bc-5e7d1fbfd693', 'choices': [{'delta': {'content': '掌的样子。这名女士穿着格子', 'function_call': None, 'role': None, 'tool_calls': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1721823365, 'model': 'qwen-vl-plus', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': None, 'usage': None}
{'id': 'chatcmpl-6cf91cc7-1121-9977-b4bc-5e7d1fbfd693', 'choices': [{'delta': {'content': '衬衫,似乎正在与狗狗进行亲密', 'function_call': None, 'role': None, 'tool_calls': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1721823365, 'model': 'qwen-vl-plus', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': None, 'usage': None}
{'id': 'chatcmpl-6cf91cc7-1121-9977-b4bc-5e7d1fbfd693', 'choices': [{'delta': {'content': '的接触,并且面带微笑。', 'function_call': None, 'role': None, 'tool_calls': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1721823365, 'model': 'qwen-vl-plus', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': None, 'usage': None}
{'id': 'chatcmpl-6cf91cc7-1121-9977-b4bc-5e7d1fbfd693', 'choices': [{'delta': {'content': '背景是海洋和日出或日', 'function_call': None, 'role': None, 'tool_calls': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1721823365, 'model': 'qwen-vl-plus', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': None, 'usage': None}
{'id': 'chatcmpl-6cf91cc7-1121-9977-b4bc-5e7d1fbfd693', 'choices': [{'delta': {'content': '落时分的天空。这是一', 'function_call': None, 'role': None, 'tool_calls': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1721823365, 'model': 'qwen-vl-plus', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': None, 'usage': None}
{'id': 'chatcmpl-6cf91cc7-1121-9977-b4bc-5e7d1fbfd693', 'choices': [{'delta': {'content': '幅描绘人与宠物之间温馨时刻', 'function_call': None, 'role': None, 'tool_calls': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1721823365, 'model': 'qwen-vl-plus', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': None, 'usage': None}
{'id': 'chatcmpl-6cf91cc7-1121-9977-b4bc-5e7d1fbfd693', 'choices': [{'delta': {'content': '的画面。', 'function_call': None, 'role': None, 'tool_calls': None}, 'finish_reason': 'stop', 'index': 0, 'logprobs': None}], 'created': 1721823365, 'model': 'qwen-vl-plus', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': None, 'usage': None}
{'id': 'chatcmpl-6cf91cc7-1121-9977-b4bc-5e7d1fbfd693', 'choices': [], 'created': 1721823365, 'model': 'qwen-vl-plus', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': None, 'usage': {'completion_tokens': 75, 'prompt_tokens': 1276, 'total_tokens': 1351}}
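
The chunks printed above can be joined back into the full reply by concatenating each `delta.content`. The sketch below works on plain dicts shaped like the `chunk.model_dump()` output. The function name `assemble_chunks` is hypothetical, and the sample data is a shortened stand-in for a real stream.

```python
def assemble_chunks(chunks):
    """Concatenate delta content from chat.completion.chunk dicts; return (text, usage)."""
    parts = []
    usage = None
    for chunk in chunks:
        # With include_usage enabled, the last chunk carries usage and an empty "choices" list.
        if chunk.get("usage") is not None:
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            content = choice["delta"].get("content")
            if content:
                parts.append(content)
    return "".join(parts), usage


sample = [
    {"choices": [{"delta": {"content": "", "role": "assistant"}, "finish_reason": None}], "usage": None},
    {"choices": [{"delta": {"content": "图"}, "finish_reason": None}], "usage": None},
    {"choices": [{"delta": {"content": "中"}, "finish_reason": "stop"}], "usage": None},
    {"choices": [], "usage": {"completion_tokens": 75, "prompt_tokens": 1276, "total_tokens": 1351}},
]

text, usage = assemble_chunks(sample)
print(text, usage["total_tokens"])
```

In the sample code above, the same function could be fed `chunk.model_dump()` for each chunk yielded by `completion`.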

curl

Sample code

curl --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-vl-plus",
    "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"
          }
        },
        {
          "type": "text",
          "text": "这是什么"
        }
      ]
    }
  ],
    "stream":true,
    "stream_options":{"include_usage":true}
}'

Response

data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[{"finish_reason":null,"delta":{"content":"图"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[{"delta":{"content":"中"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[{"delta":{"content":"是一名"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[{"delta":{"content":"女子和她的狗在"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[{"delta":{"content":"沙滩上互动。狗狗坐在地上,"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[{"delta":{"content":"伸出爪子像是要握手或者击"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[{"delta":{"content":"掌的样子。这名女士穿着格子"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[{"delta":{"content":"衬衫,似乎正在与狗狗进行亲密"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[{"delta":{"content":"的接触,并且面带微笑。"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[{"delta":{"content":"他们背后的海浪拍打着海岸线"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[{"delta":{"content":",天空看起来很明亮但有些模糊"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[{"delta":{"content":",可能是日出或日落时"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[{"delta":{"content":"分拍摄的照片。整体氛围显得非常"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[{"finish_reason":"stop","delta":{"content":"和谐而温馨。"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":1276,"completion_tokens":85,"total_tokens":1361},"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: [DONE]
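
When calling the HTTP interface without an SDK, the stream arrives as `data:` lines like those above and ends with `data: [DONE]`. The following is a minimal sketch of reading such lines, assuming they have already been split into strings. The function name `read_sse_text` is hypothetical.

```python
import json


def read_sse_text(lines):
    """Join the delta content carried by 'data:' lines until the [DONE] marker."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip the blank separator lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        for choice in event.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)


sample_lines = [
    'data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"finish_reason":null}],"usage":null}',
    '',
    'data: {"choices":[{"delta":{"content":"图"},"index":0,"finish_reason":null}],"usage":null}',
    '',
    'data: {"choices":[{"finish_reason":"stop","delta":{"content":"中"},"index":0}],"usage":null}',
    '',
    'data: {"choices":[],"usage":{"prompt_tokens":1276,"completion_tokens":85,"total_tokens":1361}}',
    '',
    'data: [DONE]',
]

print(read_sse_text(sample_lines))
```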

DashScope

You can call the Qwen-VL model through the DashScope SDK or over HTTP to try streaming output.

Python

Sample code

from dashscope import MultiModalConversation


def simple_multimodal_conversation_call():
    """Simple single round multimodal conversation call.
    """
    messages = [
        {
            "role": "user",
            "content": [
                {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
                {"text": "这是什么?"}
            ]
        }
    ]
    responses = MultiModalConversation.call(
        model='qwen-vl-plus',
        messages=messages,
        stream=True,
        incremental_output=True
        )
    for response in responses:
        print(response)


if __name__ == '__main__':
    simple_multimodal_conversation_call()

Response

{"status_code": 200, "request_id": "124a9f95-0a92-9ae7-8462-517724722b2b", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "这张"}]}}]}, "usage": {"input_tokens": 1276, "output_tokens": 1, "image_tokens": 1247}}
{"status_code": 200, "request_id": "124a9f95-0a92-9ae7-8462-517724722b2b", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "图片"}]}}]}, "usage": {"input_tokens": 1276, "output_tokens": 2, "image_tokens": 1247}}
{"status_code": 200, "request_id": "124a9f95-0a92-9ae7-8462-517724722b2b", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "显示"}]}}]}, "usage": {"input_tokens": 1276, "output_tokens": 3, "image_tokens": 1247}}
{"status_code": 200, "request_id": "124a9f95-0a92-9ae7-8462-517724722b2b", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "了一位女士和一只"}]}}]}, "usage": {"input_tokens": 1276, "output_tokens": 8, "image_tokens": 1247}}
{"status_code": 200, "request_id": "124a9f95-0a92-9ae7-8462-517724722b2b", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "狗在海滩上。她们似乎正在"}]}}]}, "usage": {"input_tokens": 1276, "output_tokens": 16, "image_tokens": 1247}}
{"status_code": 200, "request_id": "124a9f95-0a92-9ae7-8462-517724722b2b", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "互动,可能是在玩耍或训练中"}]}}]}, "usage": {"input_tokens": 1276, "output_tokens": 24, "image_tokens": 1247}}
{"status_code": 200, "request_id": "124a9f95-0a92-9ae7-8462-517724722b2b", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "握手。背景是美丽的日落景色"}]}}]}, "usage": {"input_tokens": 1276, "output_tokens": 32, "image_tokens": 1247}}
{"status_code": 200, "request_id": "124a9f95-0a92-9ae7-8462-517724722b2b", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": ",海浪轻轻拍打着海岸线"}]}}]}, "usage": {"input_tokens": 1276, "output_tokens": 40, "image_tokens": 1247}}
{"status_code": 200, "request_id": "124a9f95-0a92-9ae7-8462-517724722b2b", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "。\n\n这位女士穿着格子衬衫,并"}]}}]}, "usage": {"input_tokens": 1276, "output_tokens": 48, "image_tokens": 1247}}
{"status_code": 200, "request_id": "124a9f95-0a92-9ae7-8462-517724722b2b", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "且戴着一个手镯。她坐在"}]}}]}, "usage": {"input_tokens": 1276, "output_tokens": 56, "image_tokens": 1247}}
{"status_code": 200, "request_id": "124a9f95-0a92-9ae7-8462-517724722b2b", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "沙滩上与她的宠物进行着愉快"}]}}]}, "usage": {"input_tokens": 1276, "output_tokens": 64, "image_tokens": 1247}}
{"status_code": 200, "request_id": "124a9f95-0a92-9ae7-8462-517724722b2b", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "的时光。这只狗看起来是一只"}]}}]}, "usage": {"input_tokens": 1276, "output_tokens": 72, "image_tokens": 1247}}
{"status_code": 200, "request_id": "124a9f95-0a92-9ae7-8462-517724722b2b", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "拉布拉多犬或其他类似的品种,"}]}}]}, "usage": {"input_tokens": 1276, "output_tokens": 80, "image_tokens": 1247}}
{"status_code": 200, "request_id": "124a9f95-0a92-9ae7-8462-517724722b2b", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "它也戴着手套以保护它的"}]}}]}, "usage": {"input_tokens": 1276, "output_tokens": 88, "image_tokens": 1247}}
{"status_code": 200, "request_id": "124a9f95-0a92-9ae7-8462-517724722b2b", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "爪子并保持清洁。\n\n这个场景"}]}}]}, "usage": {"input_tokens": 1276, "output_tokens": 96, "image_tokens": 1247}}
{"status_code": 200, "request_id": "124a9f95-0a92-9ae7-8462-517724722b2b", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "充满了友谊、爱以及对大自然美景"}]}}]}, "usage": {"input_tokens": 1276, "output_tokens": 104, "image_tokens": 1247}}
{"status_code": 200, "request_id": "124a9f95-0a92-9ae7-8462-517724722b2b", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "的欣赏。这是一个温馨的画面,展示了"}]}}]}, "usage": {"input_tokens": 1276, "output_tokens": 112, "image_tokens": 1247}}
{"status_code": 200, "request_id": "124a9f95-0a92-9ae7-8462-517724722b2b", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "人与动物之间深厚的情感纽带。"}]}}]}, "usage": {"input_tokens": 1276, "output_tokens": 120, "image_tokens": 1247}}
{"status_code": 200, "request_id": "124a9f95-0a92-9ae7-8462-517724722b2b", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "stop", "message": {"role": "assistant", "content": []}}]}, "usage": {"input_tokens": 1276, "output_tokens": 121, "image_tokens": 1247}}
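
With `incremental_output=True`, as in the sample above, each response carries only a new fragment, and the last one has an empty `content` list. The sketch below joins the fragments. It operates on plain dicts shaped like the printed output; whether SDK response objects accept the same indexing is an assumption to verify. The function name `join_incremental` is hypothetical.

```python
def join_incremental(responses):
    """Concatenate the text fragments from DashScope-style incremental responses."""
    parts = []
    for response in responses:
        for choice in response["output"]["choices"]:
            # An empty content list (as in the final chunk) simply adds nothing.
            for item in choice["message"]["content"]:
                parts.append(item.get("text", ""))
    return "".join(parts)


sample = [
    {"output": {"choices": [{"finish_reason": "null",
                             "message": {"role": "assistant", "content": [{"text": "这张"}]}}]}},
    {"output": {"choices": [{"finish_reason": "null",
                             "message": {"role": "assistant", "content": [{"text": "图片"}]}}]}},
    {"output": {"choices": [{"finish_reason": "stop",
                             "message": {"role": "assistant", "content": []}}]}},
]

print(join_incremental(sample))
```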

Java

Sample code

import java.util.Arrays;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
import io.reactivex.Flowable;

public class Main {
    public static void streamCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        // must create mutable map.
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(new HashMap<String, Object>(){{put("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg");}},
                        new HashMap<String, Object>(){{put("text", "这是什么");}})).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                .model("qwen-vl-plus")
                .message(userMessage)
                .incrementalOutput(true)
                .build();
        Flowable<MultiModalConversationResult> result = conv.streamCall(param);
        result.blockingForEach(item -> {
            System.out.println(JsonUtils.toJson(item));
        });
    }

    public static void main(String[] args) {
        try {
            streamCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

Response

{"requestId":"8471902a-9936-9f56-9b84-e786007d633a","usage":{"input_tokens":1275,"output_tokens":1},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"这张"}]}}]}}
{"requestId":"8471902a-9936-9f56-9b84-e786007d633a","usage":{"input_tokens":1275,"output_tokens":2},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"图片"}]}}]}}
{"requestId":"8471902a-9936-9f56-9b84-e786007d633a","usage":{"input_tokens":1275,"output_tokens":3},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"显示"}]}}]}}
{"requestId":"8471902a-9936-9f56-9b84-e786007d633a","usage":{"input_tokens":1275,"output_tokens":8},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"了一位女士和一只"}]}}]}}
{"requestId":"8471902a-9936-9f56-9b84-e786007d633a","usage":{"input_tokens":1275,"output_tokens":16},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"狗在海滩上互动。她们似乎"}]}}]}}
{"requestId":"8471902a-9936-9f56-9b84-e786007d633a","usage":{"input_tokens":1275,"output_tokens":24},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"正在享受彼此的陪伴,狗狗坐在"}]}}]}}
{"requestId":"8471902a-9936-9f56-9b84-e786007d633a","usage":{"input_tokens":1275,"output_tokens":32},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"沙滩上伸出爪子与这位女士"}]}}]}}
{"requestId":"8471902a-9936-9f56-9b84-e786007d633a","usage":{"input_tokens":1275,"output_tokens":40},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"握手或玩耍。\n\n背景中可以看到海"}]}}]}}
{"requestId":"8471902a-9936-9f56-9b84-e786007d633a","usage":{"input_tokens":1275,"output_tokens":48},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"浪拍打着海岸线,并且有"}]}}]}}
{"requestId":"8471902a-9936-9f56-9b84-e786007d633a","usage":{"input_tokens":1275,"output_tokens":56},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"日落时分柔和光线照射下的"}]}}]}}
{"requestId":"8471902a-9936-9f56-9b84-e786007d633a","usage":{"input_tokens":1275,"output_tokens":64},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"天空。这给人一种宁静而温馨的感觉"}]}}]}}
{"requestId":"8471902a-9936-9f56-9b84-e786007d633a","usage":{"input_tokens":1275,"output_tokens":72},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":",可能是在傍晚或者清晨的时候拍摄"}]}}]}}
{"requestId":"8471902a-9936-9f56-9b84-e786007d633a","usage":{"input_tokens":1275,"output_tokens":80},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"的照片。这种场景通常象征着友谊"}]}}]}}
{"requestId":"8471902a-9936-9f56-9b84-e786007d633a","usage":{"input_tokens":1275,"output_tokens":88},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"、爱以及人与宠物之间的深厚"}]}}]}}
{"requestId":"8471902a-9936-9f56-9b84-e786007d633a","usage":{"input_tokens":1275,"output_tokens":92},"output":{"choices":[{"finish_reason":"stop","message":{"role":"assistant","content":[{"text":"情感连接。"}]}}]}}

curl

Sample code

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
    "model": "qwen-vl-plus",
    "input":{
        "messages":[
            {
                "role": "system",
                "content": [
                    {"text": "You are a helpful assistant."}
                ]
            },
            {
                "role": "user",
                "content": [
                    {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"},
                    {"text": "这个图片是哪里?"}
                ]
            }
        ]
    },
    "parameters": {
        "incremental_output": true
    }
}'

Response

id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"这张"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"input_tokens":1278,"output_tokens":1,"image_tokens":1247},"request_id":"8b037000-c670-94cd-88d4-13318ddce1d0"}

id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"照片"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"input_tokens":1278,"output_tokens":2,"image_tokens":1247},"request_id":"8b037000-c670-94cd-88d4-13318ddce1d0"}

......

id:10
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"拍打着海岸线以及远处的地平"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"input_tokens":1278,"output_tokens":56,"image_tokens":1247},"request_id":"8b037000-c670-94cd-88d4-13318ddce1d0"}

id:11
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"线上有阳光照射过来。"}],"role":"assistant"},"finish_reason":"stop"}]},"usage":{"input_tokens":1278,"output_tokens":63,"image_tokens":1247},"request_id":"8b037000-c670-94cd-88d4-13318ddce1d0"}
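
Each event in the stream above is a block of `id:`, `event:`, and `data:` fields separated from the next by a blank line. Lines that begin with a colon, such as `:HTTP_STATUS/200`, are comments in the server-sent events format. The following is a minimal parsing sketch, assuming the whole stream is available as one string. The function name `parse_sse_events` is hypothetical.

```python
import json


def parse_sse_events(text):
    """Split an SSE stream into events and decode each JSON 'data' field."""
    events = []
    for block in text.strip().split("\n\n"):
        event = {}
        for line in block.splitlines():
            if not line or line.startswith(":"):
                continue  # comment line such as ":HTTP_STATUS/200"
            field, _, value = line.partition(":")
            event[field] = value.strip()
        if "data" in event:
            event["data"] = json.loads(event["data"])
            events.append(event)
    return events


sample = (
    'id:1\nevent:result\n:HTTP_STATUS/200\n'
    'data:{"output":{"choices":[{"message":{"content":[{"text":"这张"}],"role":"assistant"},"finish_reason":"null"}]}}\n\n'
    'id:2\nevent:result\n:HTTP_STATUS/200\n'
    'data:{"output":{"choices":[{"message":{"content":[{"text":"照片"}],"role":"assistant"},"finish_reason":"stop"}]}}\n'
)

events = parse_sse_events(sample)
reply = "".join(
    item["text"]
    for e in events
    for choice in e["data"]["output"]["choices"]
    for item in choice["message"]["content"]
)
print(len(events), reply)
```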

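Using local files

The sample code below shows how to call the Qwen-VL model on a local file through either the OpenAI-compatible interface or DashScope. The sample image used in the code is test.png.

OpenAI compatible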

Python

Sample code

from openai import OpenAI
import os
import base64

# Encode the image file as a Base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

def get_response(image_path):
    base64_image = encode_image(image_path)
    client = OpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    )
    completion = client.chat.completions.create(
        model="qwen-vl-plus",
        messages=[
            {
              "role": "user",
              "content": [
                {
                  "type": "image_url",
                  "image_url": {
                    # The MIME type must match the file format; test.png is a PNG image
                    "url": f"data:image/png;base64,{base64_image}"
                  }
                },
                {
                  "type": "text",
                  "text": "这是什么"
                }
              ]
            }
          ]
        )
    print(completion.model_dump_json())

if __name__=='__main__':
    get_response("test.png")

Response

{
  "id": "chatcmpl-7399dbeb-af0c-9fcb-9083-0a836669476d",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "这是一只在天空中飞翔的鹰。它有着广阔的翅膀,正在翱翔于云层之间。这种鸟类通常与自由、力量和高瞻远瞩等概念相关联。",
        "role": "assistant",
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1725948726,
  "model": "qwen-vl-plus",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 41,
    "prompt_tokens": 1253,
    "total_tokens": 1294
  }
}
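
The Base64 step in the sample above can be generalized so that the MIME type in the data URL follows the file extension instead of being hard-coded. The sketch below uses only the Python standard library. The function name `to_data_url` is hypothetical, and the temporary file stands in for a real image.

```python
import base64
import mimetypes
import os
import tempfile


def to_data_url(image_path):
    """Return a data URL whose MIME type is inferred from the file extension."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {image_path!r}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# Demonstration with a placeholder file that only contains the PNG signature bytes.
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
    tmp.write(b"\x89PNG\r\n\x1a\n")
data_url = to_data_url(tmp.name)
os.remove(tmp.name)
print(data_url.split(",")[0])
```

The returned string can be used as the `url` value of an `image_url` content item.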

HTTP

Sample code

import os
import base64
import requests

# Encode the image file as a Base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

def get_response(image_path):
    base64_image = encode_image(image_path)
    api_key = os.getenv("DASHSCOPE_API_KEY")
    headers = {
       "Content-Type": "application/json",
       "Authorization": f"Bearer {api_key}"
       }
    payload = {
        "model": "qwen-vl-plus",
        "messages": [
            {
            "role": "user",
            "content": [
                {
                "type": "image_url",
                "image_url": {
                    # The MIME type must match the file format; test.png is a PNG image
                    "url": f"data:image/png;base64,{base64_image}"
                }
                },
                {
                "type": "text",
                "text": "这是什么"
                }
            ]
            }
        ]
        }
    response = requests.post("https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions", headers=headers, json=payload)
    print(response.json()) 

if __name__=='__main__':
    get_response(image_path="test.png")

Response

{
  "choices": [
    {
      "message": {
        "content": "这是一只在天空中飞翔的鹰。它有着广阔的翅膀,正在翱翔于云层之间。这种鸟类通常被认为是力量、自由和雄心壮志的象征,在各种文化中有重要的地位。",
        "role": "assistant"
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": None
    }
  ],
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 1254,
    "completion_tokens": 45,
    "total_tokens": 1299
  },
  "created": 1721732005,
  "system_fingerprint": None,
  "model": "qwen-vl-plus",
  "id": "chatcmpl-13b925d1-ef79-9c15-b890-0079a096d7d3"
}

DashScope

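Refer to the table below to build the file path for your SDK and operating system.

| System | SDK | File path to pass in | Example |
| --- | --- | --- | --- |
| Linux or macOS | Python SDK, Java SDK | file://{absolute path of the file} | file:///home/images/test.png |
| Windows | Python SDK | file://{absolute path of the file} | file://D:/images/test.png |
| Windows | Java SDK | file:///{absolute path of the file} | file:///D:/images/test.png |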
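
The path rules above can be captured in a small helper. This is a sketch only: the function name `dashscope_file_url` and its `sdk` and `system` parameters are hypothetical, and the Windows Java case assumes the prefix `file:///` is followed by the full path including the drive letter.

```python
def dashscope_file_url(absolute_path, sdk="python", system="linux"):
    """Build the file URL form for a given SDK and operating system.

    sdk: "python" or "java"; system: "linux", "macos", or "windows".
    """
    if system == "windows" and sdk == "java":
        return "file:///" + absolute_path
    # Every other combination uses file:// followed by the absolute path.
    return "file://" + absolute_path


print(dashscope_file_url("/home/images/test.png"))
print(dashscope_file_url("D:/images/test.png", sdk="python", system="windows"))
print(dashscope_file_url("D:/images/test.png", sdk="java", system="windows"))
```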

Python

Sample code

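import os

from dashscope import MultiModalConversation


def call_with_local_file(local_path):
    # The documented form is file://{absolute path}, so resolve relative paths first.
    image_path = f"file://{os.path.abspath(local_path)}"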
    messages = [{'role': 'system',
                 'content': [{'text': 'You are a helpful assistant.'}]}, 
                 {'role':'user',
                  'content': [{'image': image_path},
                              {'text': '这是什么'}]}]
    response = MultiModalConversation.call(model='qwen-vl-plus', messages=messages)
    print(response)


if __name__ == '__main__':
    call_with_local_file("test.png")

Response

{
  "status_code": 200,
  "request_id": "65061d9a-d4d9-9d31-9caa-d1c5d9eb3d54",
  "code": "",
  "message": "",
  "output": {
    "text": null,
    "finish_reason": null,
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": [
            {
              "text": "这是一只在天空中飞翔的鹰。它有着广阔的翅膀,正在翱翔于云层之间。这种鸟类通常被认为是力量、自由和雄心壮志的象征,在各种文化中有重要的地位。"
            }
          ]
        }
      }
    ]
  },
  "usage": {
    "input_tokens": 1254,
    "output_tokens": 45,
    "image_tokens": 1225
  }
}

Java

Sample code

// Copyright (c) Alibaba, Inc. and its affiliates.

import java.util.Arrays;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;

public class Main {
    public static void callWithLocalFile(String localPath)
            throws ApiException, NoApiKeyException, UploadFileException {
        // localPath must be an absolute path. On Windows, use the "file:///" prefix instead.
        String filePath = "file://"+localPath;
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(new HashMap<String, Object>(){{put("image", filePath);}},
                        new HashMap<String, Object>(){{put("text", "这是什么?");}})).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                .model("qwen-vl-plus")
                .message(userMessage)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(JsonUtils.toJson(result));
    }

    public static void main(String[] args) {
        try {
            // Absolute path on Linux or macOS; replace it with your own file.
            callWithLocalFile("/home/images/test.png");
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

Response

{
  "requestId": "c1fde568-c7fe-951c-a7fd-0c356fe04c1d",
  "usage": {
    "input_tokens": 1255,
    "output_tokens": 38
  },
  "output": {
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": [
            {
              "text": "这是一只在天空中飞翔的鹰。它有着广阔的翅膀,正在翱翔于云层之间。这种景象常常象征着自由、力量和勇气等正面意义。"
            }
          ]
        }
      }
    ]
  }
}

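Video understanding

The qwen-vl-max, qwen-vl-max-0809, and qwen-vl-plus-0809 models support video understanding. You can pass a video file directly or pass the video as a list of images. The following limits apply:

  • If you pass a list of images, it can contain at most 768 images.

  • If you pass a video file:

    • Video file size: 150 MB at most.

    • Video file format: MP4, AVI, MKV, MOV, FLV, WMV, and others.

    • Video duration: videos of 40 seconds or less give the best results.

    • Video dimensions: no limit, but the video is resized to about 600k pixels, so larger dimensions do not improve understanding.

    • Understanding the audio track of a video file is not currently supported.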
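The two input forms and their limits can be sketched as a helper that builds the `messages` list before any call is made. `build_video_messages` and its parameters are hypothetical names used only for illustration; the message layout mirrors the examples in this document, and reading "150MB" as 150 × 1024 × 1024 bytes is an assumption.

```python
MAX_FRAMES = 768                       # limit for an image list, as stated above
MAX_VIDEO_BYTES = 150 * 1024 * 1024    # "150MB"; binary megabytes are an assumption


def build_video_messages(video, prompt, video_size_bytes=None):
    """Build a single-turn messages list for a video question.

    video is either one URL or path string (a video file) or a list of
    image URLs (frames). video_size_bytes is optional and only checked
    when the caller knows the file size.
    """
    if isinstance(video, str):
        if video_size_bytes is not None and video_size_bytes > MAX_VIDEO_BYTES:
            raise ValueError("video file exceeds the 150 MB limit")
        video_item = {"video": video}
    else:
        frames = list(video)
        if not frames:
            raise ValueError("image list is empty")
        if len(frames) > MAX_FRAMES:
            raise ValueError(f"image list has {len(frames)} items; the limit is {MAX_FRAMES}")
        video_item = {"video": frames}
    return [{"role": "user", "content": [video_item, {"text": prompt}]}]


if __name__ == "__main__":
    frames = [
        "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg",
        "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png",
    ]
    print(build_video_messages(frames, "视频的内容是什么?"))
```

Checking the limits on the client side avoids sending a request that the service would reject.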

To pass a local video file, use the dashscope Python SDK, version 1.20.7 or later. For the file path format, see the DashScope section.

Python

from http import HTTPStatus
import dashscope


def simple_multimodal_conversation_call():
    """Simple single round multimodal conversation call.
    """
    messages = [
        {
            "role": "user",
            "content": [
                # Pass the video as a video file
                {"video": "https://cloud.video.taobao.com/vod/S8T54f_w1rkdfLdYjL3S5zKN9CrhkzuhRwOhF313tIQ.mp4"},
                # Or pass it as a list of images
                # {"video":[
                #     "https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg",
                #     "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"
                #     ]},
                {"text": "视频的内容是什么?"}
            ]
        }
    ]
    response = dashscope.MultiModalConversation.call(
        model='qwen-vl-max',
        messages=messages
        )
    if response.status_code == HTTPStatus.OK:
        print(response)
    else:
        print(response.code)  # The error code.
        print(response.message)  # The error message.


if __name__ == '__main__':
    simple_multimodal_conversation_call()

curl

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen-vl-max",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"video": "https://cloud.video.taobao.com/vod/S8T54f_w1rkdfLdYjL3S5zKN9CrhkzuhRwOhF313tIQ.mp4"},
                    {"text": "这是什么?"}
                ]
            }
        ]
    }
}'

Running the code above returns the following result.

{
  "status_code": 200,
  "request_id": "a6772f55-5509-9c2c-bcca-3b9132ed6f63",
  "code": "",
  "message": "",
  "output": {
    "text": null,
    "finish_reason": null,
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": [
            {
              "text": "视频的内容是一个人使用阿里云的通义千问模型进行对话的演示。在视频中,用户向模型输入了“你好”作为问候语,模型回应了“你好!有什么我能为你效劳的吗?”这个演示展示了通义千问模型的对话功能,以及它如何与用户进行交互。"
            }
          ]
        }
      }
    ]
  },
  "usage": {
    "input_tokens": 5205,
    "output_tokens": 69,
    "video_tokens": 5180
  }
}

Supported images

| Image format | Content Type | File extension |
| --- | --- | --- |
| BMP | image/bmp | .bmp |
| DIB | image/bmp | .dib |
| ICNS | image/icns | .icns |
| ICO | image/x-icon | .ico |
| JPEG | image/jpeg | .jfif, .jpe, .jpeg, .jpg |
| JPEG2000 | image/jp2 | .j2c, .j2k, .jp2, .jpc, .jpf, .jpx |
| PNG | image/png | .apng, .png |
| SGI | image/sgi | .bw, .rgb, .rgba, .sgi |
| TIFF | image/tiff | .tif, .tiff |
| WEBP | image/webp | .webp |

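Input images are subject to the following limits:

  • The image file size must not exceed 10 MB.

  • For the qwen-vl-max, qwen-vl-max-0809, and qwen-vl-plus-0809 models, a single image can contain at most 12M pixels in total, which covers standard 4K images. For the qwen-vl-max-0201 and qwen-vl-plus models, a single image can contain at most 1048576 pixels in total, the pixel count of a 1024 × 1024 image.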
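The limits above can be checked before sending an image. The sketch below is a hypothetical helper, not part of any SDK: it takes the width, height, and file size as plain numbers, reads "12M" as 12,000,000 pixels and "10MB" as 10 × 1024 × 1024 bytes (both assumptions), and returns a list of problems found.

```python
from typing import List

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # "10MB"; binary megabytes are an assumption

# Per-model pixel limits from the text above; 12M is read as 12,000,000.
PIXEL_LIMITS = {
    "qwen-vl-max": 12_000_000,
    "qwen-vl-max-0809": 12_000_000,
    "qwen-vl-plus-0809": 12_000_000,
    "qwen-vl-max-0201": 1_048_576,
    "qwen-vl-plus": 1_048_576,
}


def check_image(model: str, width: int, height: int, size_bytes: int) -> List[str]:
    """Return a list of limit violations; an empty list means the image fits."""
    if model not in PIXEL_LIMITS:
        raise ValueError(f"no pixel limit listed for model: {model}")
    problems = []
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append(f"file size {size_bytes} bytes exceeds 10 MB")
    pixels = width * height
    if pixels > PIXEL_LIMITS[model]:
        problems.append(f"{pixels} pixels exceeds the limit of {PIXEL_LIMITS[model]} for {model}")
    return problems


if __name__ == "__main__":
    # A 4K frame (3840 x 2160) fits qwen-vl-max but not qwen-vl-plus.
    print(check_image("qwen-vl-max", 3840, 2160, 5 * 1024 * 1024))
    print(check_image("qwen-vl-plus", 3840, 2160, 5 * 1024 * 1024))
```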

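FAQ

Can I delete an uploaded image?

A: After the model finishes generating text, the Bailian server deletes the image automatically. You do not need to delete it manually.

API reference

For the input and output parameters of the Tongyi Qianwen VL models, see "Use Tongyi Qianwen through the API".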