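Tongyi Qianwen VL (Qwen-VL)

Last updated: 2025-01-14 19:43:21

The Tongyi Qianwen VL (Qwen-VL) models answer questions about the images you pass in.

To try image understanding online, visit the Vision Models page.

How to use

Before you begin, obtain an API key and configure it as an environment variable. If you call the model through the OpenAI SDK or the DashScope SDK, also install the latest SDK version. The DashScope Python SDK must be version 1.20.7 or later.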
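The prerequisites above can be checked before the first call. The following is a minimal sketch, not an official tool: it assumes the environment variable is named `DASHSCOPE_API_KEY` (the name used by the examples in this document), and the helper names `parse_version`, `meets_minimum`, and `check_prerequisites` are hypothetical.

```python
import os
from importlib import metadata
from typing import Tuple

MIN_DASHSCOPE_VERSION = "1.20.7"  # minimum DashScope Python SDK version stated above


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.20.7' into (1, 20, 7); only the leading digits of each part are used."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_DASHSCOPE_VERSION) -> bool:
    """Return True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def check_prerequisites(env=os.environ) -> list:
    """Return a list of problems found; an empty list means the setup looks ready."""
    problems = []
    if not env.get("DASHSCOPE_API_KEY"):
        problems.append("DASHSCOPE_API_KEY is not set")
    try:
        installed = metadata.version("dashscope")
    except metadata.PackageNotFoundError:
        installed = None  # DashScope SDK not installed; acceptable if only the OpenAI SDK is used
    if installed is not None and not meets_minimum(installed):
        problems.append(f"dashscope {installed} is older than {MIN_DASHSCOPE_VERSION}")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Problem:", problem)
```

Running the script prints nothing when the key is set and the installed SDK is recent enough.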

Simple example

OpenAI compatible

You can call the Qwen-VL model through the OpenAI SDK or an OpenAI-compatible HTTP request.

Python

import os
from openai import OpenAI


client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)

completion = client.chat.completions.create(
    model="qwen-vl-max-latest",
    messages=[
        {"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"}},
            {"type": "text", "text": "这是什么?"}
        ]}
    ]
)

print(completion.choices[0].message.content)

Response

这是一张在海滩上拍摄的照片。照片中,一个人和一只狗坐在沙滩上,背景是大海和天空。人和狗似乎在互动,狗的前爪搭在人的手上。阳光从画面的右侧照射过来,给整个场景增添了一种温暖的氛围。

Node.js

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // If the environment variable is not set, replace the next line with your Bailian API key: apiKey: "sk-xxx"
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);

async function main() {
    const response = await openai.chat.completions.create({
        model: "qwen-vl-max-latest",
        messages: [{role: "user",content: [
            { type: "image_url",image_url: {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"}},
            { type: "text", text: "这是什么?" }
        ]}]
    });
    console.log(response.choices[0].message.content);
}

main()

Response

这是一张在海滩上拍摄的照片。照片中,一位穿着格子衬衫的女性坐在沙滩上,与一只戴着项圈的黄色拉布拉多犬互动。背景是大海和天空,阳光洒在她们身上,营造出温暖的氛围。

curl

curl --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
  "model": "qwen-vl-max",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "image_url", "image_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"}},
      {"type": "text", "text": "这是什么?"}
    ]
  }]
}'

Response

{
  "choices": [
    {
      "message": {
        "content": "这张图片展示了一位女士和一只狗在海滩上互动。女士坐在沙滩上,微笑着与狗握手。背景是大海和天空,阳光洒在她们身上,营造出温暖的氛围。狗戴着项圈,显得很温顺。",
        "role": "assistant"
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null
    }
  ],
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 1270,
    "completion_tokens": 54,
    "total_tokens": 1324
  },
  "created": 1725948561,
  "system_fingerprint": null,
  "model": "qwen-vl-max",
  "id": "chatcmpl-0fd66f46-b09e-9164-a84f-3ebbbedbac15"
}
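In the response body above, the reply text sits under `choices[0].message.content` and the token counts under `usage`. The following is a minimal sketch of reading those fields from the raw JSON. The function name `summarize_chat_completion` is hypothetical, and `SAMPLE` is a trimmed stand-in whose reply text is a placeholder rather than real model output.

```python
import json

# Trimmed stand-in for the response shown above; the reply text is a placeholder.
SAMPLE = """
{
  "choices": [
    {
      "message": {"content": "example reply", "role": "assistant"},
      "finish_reason": "stop",
      "index": 0
    }
  ],
  "usage": {"prompt_tokens": 1270, "completion_tokens": 54, "total_tokens": 1324}
}
"""


def summarize_chat_completion(raw: str) -> dict:
    """Pull the reply text, finish reason, and token counts out of a response body."""
    data = json.loads(raw)
    choice = data["choices"][0]
    usage = data.get("usage") or {}
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }


if __name__ == "__main__":
    print(summarize_chat_completion(SAMPLE))
```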

DashScope

You can call the Qwen-VL model through the DashScope SDK or over HTTP.

Python

import os
import dashscope

messages = [
    {
        "role": "user",
        "content": [
            {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},
            {"text": "这是什么?"}
        ]
    }
]

response = dashscope.MultiModalConversation.call(
    # If the environment variable is not set, replace the next line with your Bailian API key: api_key="sk-xxx"
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen-vl-max-latest',
    messages=messages
)

print(response.output.choices[0].message.content[0]["text"])

Response

这是一张在海滩上拍摄的照片。照片中有一位女士和一只狗。女士坐在沙滩上,微笑着与狗互动。狗戴着项圈,似乎在与女士握手。背景是大海和天空,阳光洒在她们身上,营造出温馨的氛围。

Java

import java.util.Arrays;
import java.util.Collections;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
public class Main {
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"),
                        Collections.singletonMap("text", "这是什么?"))).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                .model("qwen-vl-max-latest")
                .message(userMessage)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }
    public static void main(String[] args) {
        try {
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

Response

这是一张在海滩上拍摄的照片。照片中有一个穿着格子衬衫的人和一只戴着项圈的狗。人和狗面对面坐着,似乎在互动。背景是大海和天空,阳光洒在他们身上,营造出温暖的氛围。

curl

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen-vl-max",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},
                    {"text": "这是什么?"}
                ]
            }
        ]
    }
}'

Response

{
  "output": {
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": [
            {
              "text": "这是一张在海滩上拍摄的照片。照片中有一个穿着格子衬衫的人和一只戴着项圈的狗。他们坐在沙滩上,背景是大海和天空。阳光从画面的右侧照射过来,给整个场景增添了一种温暖的氛围。"
            }
          ]
        }
      }
    ]
  },
  "usage": {
    "output_tokens": 55,
    "input_tokens": 1271,
    "image_tokens": 1247
  },
  "request_id": "ccf845a3-dc33-9cda-b581-20fe7dc23f70"
}
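The DashScope response differs in shape from the OpenAI-compatible one: `message.content` is a list of parts, and `usage` reports `image_tokens` separately. The following is a minimal sketch of reading it, with hypothetical helper names. It assumes that `input_tokens` includes `image_tokens`, which the sample figures above suggest but this document does not state. The reply text in `SAMPLE` is a placeholder.

```python
# Trimmed stand-in for the response shown above; the reply text is a placeholder.
SAMPLE = {
    "output": {
        "choices": [
            {
                "finish_reason": "stop",
                "message": {"role": "assistant", "content": [{"text": "example reply"}]},
            }
        ]
    },
    "usage": {"output_tokens": 55, "input_tokens": 1271, "image_tokens": 1247},
}


def extract_dashscope_text(response: dict) -> str:
    """Join the text parts of the first choice's message content."""
    content = response["output"]["choices"][0]["message"]["content"]
    return "".join(part["text"] for part in content if "text" in part)


def non_image_input_tokens(response: dict) -> int:
    """Input tokens not attributed to images (assumes input_tokens includes image_tokens)."""
    usage = response["usage"]
    return usage["input_tokens"] - usage.get("image_tokens", 0)


if __name__ == "__main__":
    print(extract_dashscope_text(SAMPLE))
    print(non_image_input_tokens(SAMPLE))
```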

Multiple image input

You can pass multiple images to the Qwen-VL model in a single request. The following code shows how to pass them in.
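The two interfaces in this document expect different shapes for the `content` list: the OpenAI-compatible one uses `image_url` and `text` parts with a `type` field, while DashScope uses bare `image` and `text` keys. The following is a minimal sketch of building either list from a list of URLs. The helper names are hypothetical, and the images are listed before the text prompt only because the examples in this document do so.

```python
from typing import Dict, List


def build_openai_content(image_urls: List[str], prompt: str) -> List[Dict]:
    """Build an OpenAI-compatible content list: one image_url part per image, then the text."""
    content: List[Dict] = [
        {"type": "image_url", "image_url": {"url": url}} for url in image_urls
    ]
    content.append({"type": "text", "text": prompt})
    return content


def build_dashscope_content(image_urls: List[str], prompt: str) -> List[Dict]:
    """Build a DashScope content list: one image entry per image, then the text."""
    content: List[Dict] = [{"image": url} for url in image_urls]
    content.append({"text": prompt})
    return content


if __name__ == "__main__":
    urls = [
        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg",
        "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png",
    ]
    # The result can be used as the "content" of a single user message.
    print(build_openai_content(urls, "这些是什么?"))
    print(build_dashscope_content(urls, "这些是什么?"))
```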

OpenAI compatible

You can call the Qwen-VL model through the OpenAI SDK or an OpenAI-compatible HTTP request.

Python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen-vl-max-latest",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
                    },
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"
                    },
                },
                {"type": "text", "text": "这些是什么?"},
            ],
        }
    ],
)

print(completion.choices[0].message.content)

Response

图1中是一位女士和一只拉布拉多犬在海滩上互动的场景。女士穿着格子衬衫,坐在沙滩上,与狗进行握手的动作,背景是海浪和天空,整个画面充满了温馨和愉快的氛围。

图2中是一只老虎在森林中行走的场景。老虎的毛色是橙色和黑色条纹相间,它正向前迈步,周围是茂密的树木和植被,地面上覆盖着落叶,整个画面给人一种野生自然的感觉。

Node.js

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // If the environment variable is not set, replace the next line with your Bailian API key: apiKey: "sk-xxx"
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);

async function main() {
    const response = await openai.chat.completions.create({
        model: "qwen-vl-max-latest",
        messages: [{role: "user",content: [
            { type: "image_url",image_url: {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"}},
            { type: "image_url",image_url: {"url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"}},
            { type: "text", text: "这些是什么?" },
        ]}]
    });
    console.log(response.choices[0].message.content);
}

main()

Response

第一张图片中,一个人和一只狗在海滩上互动。人穿着格子衬衫,狗戴着项圈,他们似乎在握手或击掌。

第二张图片中,一只老虎在森林中行走。老虎的毛色是橙色和黑色条纹,背景是绿色的树木和植被。

curl

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
  "model": "qwen-vl-max",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
          }
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"
          }
        },
        {
          "type": "text",
          "text": "这些是什么?"
        }
      ]
    }
  ]
}'

Response

{
  "choices": [
    {
      "message": {
        "content": "图1中是一位女士和一只拉布拉多犬在海滩上互动的场景。女士穿着格子衬衫,坐在沙滩上,与狗进行握手的动作,背景是海景和日落的天空,整个画面显得非常温馨和谐。\n\n图2中是一只老虎在森林中行走的场景。老虎的毛色是橙色和黑色条纹相间,它正向前迈步,周围是茂密的树木和植被,地面上覆盖着落叶,整个画面充满了自然的野性和生机。",
        "role": "assistant"
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null
    }
  ],
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 2497,
    "completion_tokens": 109,
    "total_tokens": 2606
  },
  "created": 1725948561,
  "system_fingerprint": null,
  "model": "qwen-vl-max",
  "id": "chatcmpl-0fd66f46-b09e-9164-a84f-3ebbbedbac15"
}

DashScope

You can call the Qwen-VL model through the DashScope SDK or over HTTP.

Python

import os
import dashscope

messages = [
    {
        "role": "user",
        "content": [
            {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},
            {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"},
            {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/hbygyo/rabbit.jpg"},
            {"text": "这些图是什么?"}
        ]
    }
]

response = dashscope.MultiModalConversation.call(
    # If the environment variable is not set, replace the next line with your Bailian API key: api_key="sk-xxx"
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen-vl-max-latest',
    messages=messages
)

print(response.output.choices[0].message.content[0]["text"])

Response

这些图片展示了一些动物和自然场景。第一张图片中,一个人和一只狗在海滩上互动。第二张图片是一只老虎在森林中行走。第三张图片是一只卡通风格的兔子在草地上跳跃。

Java

import java.util.Arrays;
import java.util.Collections;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
public class Main {
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"),
                        Collections.singletonMap("image", "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"),
                        Collections.singletonMap("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/hbygyo/rabbit.jpg"),
                        Collections.singletonMap("text", "这些图是什么?"))).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                .model("qwen-vl-max-latest")
                .message(userMessage)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }
    public static void main(String[] args) {
        try {
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

Response

这些图片展示了一些动物和自然场景。

1. 第一张图片:一个女人和一只狗在海滩上互动。女人穿着格子衬衫,坐在沙滩上,狗戴着项圈,伸出爪子与女人握手。
2. 第二张图片:一只老虎在森林中行走。老虎的毛色是橙色和黑色条纹,背景是树木和树叶。
3. 第三张图片:一只卡通风格的兔子在草地上跳跃。兔子是白色的,耳朵是粉红色的,背景是蓝天和黄色的花朵。

这些图片展示了不同的动物和自然环境。

curl

curl --location 'https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-vl-plus",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},
                    {"image": "https://dashscope.oss-cn-beijing.aliyuncs.com/images/tiger.png"},
                    {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/hbygyo/rabbit.jpg"},
                    {"text": "这些是什么?"}
                ]
            }
        ]
    }
}'

Response

{
  "output": {
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": [
            {
              "text": "这张图片显示了一位女士和她的狗在海滩上。她们似乎正在享受彼此的陪伴,狗狗坐在沙滩上伸出爪子与女士握手或互动。背景是美丽的日落景色,海浪轻轻拍打着海岸线。\n\n请注意,我提供的描述基于图像中可见的内容,并不包括任何超出视觉信息之外的信息。如果您需要更多关于这个场景的具体细节,请告诉我!"
            }
          ]
        }
      }
    ]
  },
  "usage": {
    "output_tokens": 81,
    "input_tokens": 1277,
    "image_tokens": 1247
  },
  "request_id": "ccf845a3-dc33-9cda-b581-20fe7dc23f70"
}

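Multi-turn conversation (referring to conversation history)

The Qwen-VL model can refer to earlier turns of a conversation when it replies. The following examples show how to implement multi-turn conversation through either the OpenAI-compatible interface or DashScope.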
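Multi-turn conversation comes down to resending the whole message list on every request, with the model's previous reply and the new user message appended. The following is a minimal sketch of that bookkeeping in the OpenAI-compatible message format. The helper name `add_turn` is hypothetical, and the sketch makes no network call.

```python
from typing import Dict, List


def add_turn(history: List[Dict], assistant_text: str, next_user_text: str) -> List[Dict]:
    """Return a new history with the assistant reply and the next user message appended."""
    return history + [
        {"role": "assistant", "content": [{"type": "text", "text": assistant_text}]},
        {"role": "user", "content": [{"type": "text", "text": next_user_text}]},
    ]


if __name__ == "__main__":
    first_turn = [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"}},
                {"type": "text", "text": "这是什么?"},
            ],
        }
    ]
    # The assistant text would normally come from the first response.
    second_turn = add_turn(first_turn, "这是一个女孩和一只狗。", "写一首诗描述这个场景")
    for message in second_turn:
        print(message["role"])
```

Because the image stays in the first user message, it is sent again with each later request in the conversation.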

OpenAI compatible

You can call the Qwen-VL model through the OpenAI SDK or an OpenAI-compatible HTTP request to try multi-turn conversation.

Python

from openai import OpenAI
import os


client = OpenAI(
    # If the environment variable is not set, replace the next line with your Bailian API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
)
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
                },
            },
            {"type": "text", "text": "这是什么?"},
        ],
    }
]
completion = client.chat.completions.create(
    model="qwen-vl-max-latest",
    messages=messages,
)
print(f"第一轮输出:{completion.choices[0].message.content}")
# Append the assistant's reply to the history so the next turn can refer to it
assistant_message = completion.choices[0].message
messages.append(assistant_message.model_dump())
# Second user turn; the prompt asks for a poem describing the scene
messages.append(
    {
        "role": "user",
        "content": [{"type": "text", "text": "做一首诗描述这个场景"}],
    }
)
completion = client.chat.completions.create(
    model="qwen-vl-max-latest",
    messages=messages,
)
print(f"第二轮输出:{completion.choices[0].message.content}")

Response

第一轮输出:这是一张在海滩上拍摄的照片。照片中,一位穿着格子衬衫的女士坐在沙滩上,与一只戴着项圈的金毛犬互动。背景是大海和天空,阳光洒在她们身上,营造出温暖的氛围。
第二轮输出:沙滩上,阳光洒,
女子与犬,笑语哗。
海浪轻拍,风儿吹,
快乐时光,心儿醉。

Node.js

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // If the environment variable is not set, replace the next line with your Bailian API key: apiKey: "sk-xxx"
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);

let messages = [{
    role: "user", content: [
        { type: "image_url", image_url: { "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg" } },
        { type: "text", text: "这是什么?" },
    ]
}]
async function main() {
    let response = await openai.chat.completions.create({
        model: "qwen-vl-max-latest",
        messages: messages
    });
    console.log(`第一轮输出:${response.choices[0].message.content}`);
    messages.push(response.choices[0].message);
    messages.push({"role": "user", "content": "做一首诗描述这个场景"});
    response = await openai.chat.completions.create({
        model: "qwen-vl-max-latest",
        messages: messages
    });
    console.log(`第二轮输出:${response.choices[0].message.content}`);
}

main()

Response

第一轮输出:这是一张在海滩上拍摄的照片。照片中有一个穿着格子衬衫的人和一只戴着项圈的狗。人和狗面对面坐着,似乎在互动。背景是大海和天空,阳光从画面的右侧照射过来,营造出温暖的氛围。
第二轮输出:沙滩上,人与狗,  
面对面,笑语稠。  
海风轻拂,阳光柔,  
心随波浪,共潮头。  

项圈闪亮,情意浓,  
格子衫下,心相通。  
海天一色,无尽空,  
此刻温馨,永铭中。

curl

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
  "model": "qwen-vl-max",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
          }
        },
        {
          "type": "text",
          "text": "这是什么?"
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "这是一个女孩和一只狗。"
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "写一首诗描述这个场景"
        }
      ]
    }
  ]
}'

Response

{
    "choices": [
        {
            "message": {
                "content": "海风轻拂笑颜开,  \n沙滩上与犬相陪。  \n夕阳斜照人影短,  \n欢乐时光心自醉。",
                "role": "assistant"
            },
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null
        }
    ],
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 1295,
        "completion_tokens": 32,
        "total_tokens": 1327
    },
    "created": 1726324976,
    "system_fingerprint": null,
    "model": "qwen-vl-max",
    "id": "chatcmpl-3c953977-6107-96c5-9a13-c01e328b24ca"
}

DashScope

You can call the Qwen-VL model through the DashScope SDK or over HTTP to try multi-turn conversation.

Python

import os
from dashscope import MultiModalConversation

messages = [
    {
        "role": "user",
        "content": [
            {
                "image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
            },
            {"text": "这是什么?"},
        ],
    }
]
response = MultiModalConversation.call(
    # If the environment variable is not set, replace the next line with your Bailian API key: api_key="sk-xxx"
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen-vl-max-latest',
    messages=messages)
print(f"模型第一轮输出:{response.output.choices[0].message.content[0]['text']}")
messages.append(response['output']['choices'][0]['message'])
user_msg = {"role": "user", "content": [{"text": "做一首诗描述这个场景"}]}
messages.append(user_msg)
response = MultiModalConversation.call(
    # If the environment variable is not set, replace the next line with your Bailian API key: api_key="sk-xxx"
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen-vl-max-latest',
    messages=messages)
print(f"模型第二轮输出:{response.output.choices[0].message.content[0]['text']}")

Response

模型第一轮输出:这是一张在海滩上拍摄的照片。照片中有一个穿着格子衬衫的人和一只戴着项圈的狗。人和狗面对面坐着,似乎在互动。背景是大海和天空,阳光洒在他们身上,营造出温暖的氛围。
模型第二轮输出:在阳光照耀的海滩上,人与狗共享欢乐时光。

Java

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
public class Main {
    private static final String modelName = "qwen-vl-max-latest";
    public static void MultiRoundConversationCall() throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage systemMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
                .content(Arrays.asList(Collections.singletonMap("text", "You are a helpful assistant."))).build();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(Collections.singletonMap("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"),
                        Collections.singletonMap("text", "这是什么?"))).build();
        List<MultiModalMessage> messages = new ArrayList<>();
        messages.add(systemMessage);
        messages.add(userMessage);
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // If the environment variable is not set, replace the next line with your Bailian API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model(modelName)
                .messages(messages)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println("第一轮输出:"+result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
        // Add the model's reply to the conversation history
        messages.add(result.getOutput().getChoices().get(0).getMessage());
        MultiModalMessage msg = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(Collections.singletonMap("text", "做一首诗描述这个场景"))).build();
        messages.add(msg);
        param.setMessages((List)messages);
        result = conv.call(param);
        System.out.println("第二轮输出:"+result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }

    public static void main(String[] args) {
        try {
            MultiRoundConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

Response

第一轮输出:这是一张在海滩上拍摄的照片。照片中有一个穿着格子衬衫的人和一只戴着项圈的狗。人和狗面对面坐着,似乎在互动。背景是大海和天空,阳光洒在他们身上,营造出温暖的氛围。
第二轮输出:在阳光洒满的海滩上,人与狗共享欢乐时光。

curl

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen-vl-max",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},
                    {"text": "这是什么?"}
                ]
            },
            {
                "role": "assistant",
                "content": [
                    {"text": "这是一只狗和一个女孩。"}
                ]
            },
            {
                "role": "user",
                "content": [
                    {"text": "写一首七言绝句描述这个场景"}
                ]
            }
        ]
    }
}'

Response

{
    "output": {
        "choices": [
            {
                "finish_reason": "stop",
                "message": {
                    "role": "assistant",
                    "content": [
                        {
                            "text": "海浪轻拍沙滩边,女孩与狗同嬉戏。阳光洒落笑颜开,快乐时光永铭记。"
                        }
                    ]
                }
            }
        ]
    },
    "usage": {
        "output_tokens": 27,
        "input_tokens": 1298,
        "image_tokens": 1247
    },
    "request_id": "bdf5ef59-c92e-92a6-9d69-a738ecee1590"
}

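Streaming output

A large model does not produce its final result all at once. It generates intermediate results step by step, and the final result is their concatenation. With non-streaming output, you wait until generation finishes and receive the concatenated result in one piece. With streaming output, intermediate results are returned as they are generated, so you can read the reply while the model is still writing it and spend less time waiting.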
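The concatenation described above can be sketched as follows. The code works on plain dictionaries shaped like the streamed chunks of the OpenAI-compatible interface, where each chunk carries a text fragment under `choices[0].delta.content`. The helper names `iter_delta_text` and `join_stream` are hypothetical, and `SAMPLE_CHUNKS` is a trimmed stand-in for real chunks.

```python
from typing import Dict, Iterable, Iterator

# Trimmed stand-in for streamed chunks; only the fields used below are kept.
SAMPLE_CHUNKS = [
    {"choices": [{"delta": {"content": "", "role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "这"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "是一"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "张"}, "finish_reason": "stop"}]},
]


def iter_delta_text(chunks: Iterable[Dict]) -> Iterator[str]:
    """Yield each non-empty text fragment in arrival order."""
    for chunk in chunks:
        for choice in chunk.get("choices") or []:
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


def join_stream(chunks: Iterable[Dict]) -> str:
    """Concatenate all fragments into the final reply."""
    return "".join(iter_delta_text(chunks))


if __name__ == "__main__":
    # Print fragments as they arrive, then the assembled text.
    for fragment in iter_delta_text(SAMPLE_CHUNKS):
        print(fragment, end="", flush=True)
    print()
    print(join_stream(SAMPLE_CHUNKS))  # prints 这是一张
```

With the OpenAI Python SDK, each streamed chunk object can be converted to such a dictionary with `chunk.model_dump()` before being passed to these helpers.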

OpenAI compatible

You can call the Qwen-VL model through the OpenAI SDK or an OpenAI-compatible HTTP request to try streaming output.

Python

from openai import OpenAI
import os

client = OpenAI(
    # If the environment variable is not set, replace the next line with your Bailian API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen-vl-max-latest",
    messages=[
        {"role": "user",
        "content": [{"type": "image_url",
                    "image_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},},
                    {"type": "text", "text": "这是什么?"}]}],
    stream=True
)
print("流式输出内容为:")
for chunk in completion:
    print(chunk.model_dump_json())

Response

流式输出内容为:
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"","function_call":null,"refusal":null,"role":"assistant","tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"这","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"是一","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"张","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"在","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"海滩上拍摄的照片","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"。照片中,","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"一位女士和一只","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"狗坐在沙滩上","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":",女士正在与","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"狗互动,似乎","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"在握手或击","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"掌。背景是","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"大海和天空,","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"阳光从画面的","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"右侧照射过来,","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"营造出温暖的","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":null,"index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}
{"id":"chatcmpl-a6f201a2-0f82-9260-ae13-23243b78239c","choices":[{"delta":{"content":"氛围。","function_call":null,"refusal":null,"role":null,"tool_calls":null},"finish_reason":"stop","index":0,"logprobs":null}],"created":1731683175,"model":"qwen-vl-max-latest","object":"chat.completion.chunk","service_tier":null,"system_fingerprint":null,"usage":null}

Node.js

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // If the environment variable is not set, replace the next line with your Bailian API key: apiKey: "sk-xxx"
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);

const completion = await openai.chat.completions.create({
    model: "qwen-vl-max-latest",
    messages: [
        {"role": "user",
        "content": [{"type": "image_url",
                    "image_url": {"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},},
                    {"type": "text", "text": "这是什么?"}]}],
    stream: true,
});

console.log("Streaming output:");
for await (const chunk of completion) {
        console.log(JSON.stringify(chunk));
}

Response

{"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"finish_reason":null,"delta":{"content":"这"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"是一"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"张"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"在"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"海滩上拍摄的照片"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"。照片中,"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"一位女士和一只"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"狗坐在沙滩上"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":",女士正在与"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"狗互动,似乎"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"在握手或击"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"掌。背景是"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"大海和天空,"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"阳光从画面的"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"右侧照射过来,"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"营造出温暖的"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}
{"choices":[{"delta":{"content":"氛围。"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1731942585,"system_fingerprint":null,"model":"qwen-vl-max-latest","id":"chatcmpl-9d466dd4-7cbe-9eb1-a429-c0d5a81f2674"}

curl

curl --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-vl-plus",
    "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
          }
        },
        {
          "type": "text",
          "text": "这是什么?"
        }
      ]
    }
  ],
    "stream":true,
    "stream_options":{"include_usage":true}
}'

Response

data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[{"finish_reason":null,"delta":{"content":"图"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[{"delta":{"content":"中"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

......

data: {"choices":[{"delta":{"content":"分拍摄的照片。整体氛围显得非常"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[{"finish_reason":"stop","delta":{"content":"和谐而温馨。"},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":1276,"completion_tokens":85,"total_tokens":1361},"created":1721823635,"system_fingerprint":null,"model":"qwen-vl-plus","id":"chatcmpl-9a9ec75a-3109-9910-b79e-7bcbce81c8f9"}

data: [DONE]
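
As a minimal sketch (not part of any official SDK), the `data:` lines of such a stream could be assembled into the full reply as shown here. The helper name `collect_stream` is hypothetical. The sketch assumes that every event line starts with `data:`, that the stream ends with `data: [DONE]`, and that the usage-only chunk carries an empty `choices` list, as in the sample output. The sample lines are shortened versions of the chunks above.

```python
import json


def collect_stream(lines):
    """Join the delta contents of OpenAI-compatible SSE lines.

    Returns (text, usage); usage is None if no usage chunk was seen.
    """
    parts = []
    usage = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and anything that is not an event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            parts.append(choice["delta"].get("content") or "")
    return "".join(parts), usage


# Shortened sample chunks (most fields omitted for brevity).
sample = [
    'data: {"choices":[{"delta":{"content":"","role":"assistant"},"index":0,"finish_reason":null}],"usage":null}',
    "",
    'data: {"choices":[{"delta":{"content":"图"},"index":0,"finish_reason":null}],"usage":null}',
    "",
    'data: {"choices":[{"delta":{"content":"中"},"index":0,"finish_reason":"stop"}],"usage":null}',
    "",
    'data: {"choices":[],"usage":{"prompt_tokens":1276,"completion_tokens":85,"total_tokens":1361}}',
    "",
    "data: [DONE]",
]
text, usage = collect_stream(sample)
print(text)
print(usage["total_tokens"])
```

The usage chunk only appears when the request sets `stream_options` with `include_usage` enabled, as the curl request above does.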

DashScope

You can call the Qwen-VL model through the DashScope SDK or over HTTP and receive the output as a stream.

Python

import os
from dashscope import MultiModalConversation

messages = [
    {
        "role": "user",
        "content": [
            {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},
            {"text": "这是什么?"}
        ]
    }
]
responses = MultiModalConversation.call(
    # If the environment variable is not configured, replace the next line with your Bailian API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model='qwen-vl-max-latest',
    messages=messages,
    stream=True,
    incremental_output=True
    )
print("Streaming output:")
for response in responses:
    print(response["output"])

Response

{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "这"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "是一"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "张"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "在"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "海滩上拍摄的照片"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "。照片中有一位"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "女士和一只狗"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "。女士坐在沙滩"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "上,微笑着"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "与狗互动。"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "狗戴着项圈"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": ",似乎在与"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "女士握手。背景"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "是大海和天空"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": ",阳光洒在"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "她们身上,营造"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "出温馨的氛围"}]}}]}
{"text": null, "finish_reason": null, "choices": [{"finish_reason": "stop", "message": {"role": "assistant", "content": [{"text": "。"}]}}]}
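
With `incremental_output=True`, each chunk carries only the newly generated text, so the caller concatenates the pieces. The sketch below illustrates this under the assumption that each streamed output is dict-like and shaped like the printed lines above. `join_incremental` is a hypothetical helper name, and the sample data is a shortened copy of that output.

```python
def join_incremental(outputs):
    """Concatenate the text pieces of incremental DashScope outputs.

    Returns (text, finish_reason of the last chunk).
    """
    parts = []
    finish_reason = None
    for output in outputs:
        choice = output["choices"][0]
        # The final chunk may carry an empty content list.
        for item in choice["message"]["content"]:
            if "text" in item:
                parts.append(item["text"])
        finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


# Shortened sample: intermediate chunks report the string "null" as finish_reason.
sample_outputs = [
    {"choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "这"}]}}]},
    {"choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "是一"}]}}]},
    {"choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "张"}]}}]},
    {"choices": [{"finish_reason": "stop", "message": {"role": "assistant", "content": []}}]},
]
full_text, reason = join_incremental(sample_outputs)
print(full_text, reason)
```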

Java

import java.util.Arrays;
import java.util.HashMap;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
import io.reactivex.Flowable;

public class Main {
    public static void streamCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        // must create mutable map.
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(new HashMap<String, Object>(){{put("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg");}},
                        new HashMap<String, Object>(){{put("text", "这是什么");}})).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // If the environment variable is not configured, replace the next line with your Bailian API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen-vl-max-latest")
                .message(userMessage)
                .incrementalOutput(true)
                .build();
        Flowable<MultiModalConversationResult> result = conv.streamCall(param);
        result.blockingForEach(item -> {
            try {
                System.out.println(JsonUtils.toJson(item));
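            } catch (Exception e) {
                // Report the failure instead of exiting silently with a success status
                System.err.println(e.getMessage());
                System.exit(1);
            }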
        });
    }

    public static void main(String[] args) {
        try {
            streamCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

Response

{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":1},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"这"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":2},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"是一"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":3},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"张"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":4},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"在"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":8},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"海滩上拍摄的照片"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":12},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"。照片中,"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":16},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"一位穿着格子"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":20},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"衬衫的女士坐在"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":24},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"沙滩上,与"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":28},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"一只戴着项圈"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":32},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"的金毛犬"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":36},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"互动。背景是"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":40},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"大海和天空,"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":44},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"阳光洒在她们"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":48},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"身上,营造出"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":52},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"温暖的氛围。"}]}}]}}
{"requestId":"a750aa02-e0c9-9c10-8542-2f38bf9de326","usage":{"input_tokens":1270,"output_tokens":53},"output":{"choices":[{"finish_reason":"stop","message":{"role":"assistant","content":[]}}]}}

curl

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
    "model": "qwen-vl-plus",
    "input":{
        "messages":[
            {
                "role": "system",
                "content": [
                    {"text": "You are a helpful assistant."}
                ]
            },
            {
                "role": "user",
                "content": [
                    {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"},
                    {"text": "这是什么?"}
                ]
            }
        ]
    },
    "parameters": {
        "incremental_output": true
    }
}'

Response

id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"这张"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"input_tokens":1276,"output_tokens":1,"image_tokens":1247},"request_id":"00917f72-d927-9344-8417-2c4088d64c16"}

id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"图片"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"input_tokens":1276,"output_tokens":2,"image_tokens":1247},"request_id":"00917f72-d927-9344-8417-2c4088d64c16"}

......

id:17
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"的欣赏。这是一个温馨的画面,展示了"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"input_tokens":1276,"output_tokens":112,"image_tokens":1247},"request_id":"00917f72-d927-9344-8417-2c4088d64c16"}

id:18
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"人与动物之间深厚的情感纽带。"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"input_tokens":1276,"output_tokens":120,"image_tokens":1247},"request_id":"00917f72-d927-9344-8417-2c4088d64c16"}

id:19
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[],"role":"assistant"},"finish_reason":"stop"}]},"usage":{"input_tokens":1276,"output_tokens":121,"image_tokens":1247},"request_id":"00917f72-d927-9344-8417-2c4088d64c16"}
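
The event blocks above follow the server-sent events layout: `id:`, `event:`, a comment line that starts with `:`, and a `data:` line holding JSON. A minimal parsing sketch follows. `parse_sse_events` is a hypothetical helper, it assumes one `data:` line per event and blank lines between events, and the sample is a shortened version of the response above.

```python
import json


def parse_sse_events(raw):
    """Split an SSE body into events and decode each data field as JSON."""
    events = []
    for block in raw.strip().split("\n\n"):
        event = {}
        for line in block.splitlines():
            if not line or line.startswith(":"):
                continue  # lines starting with ":" are SSE comments
            field, _, value = line.partition(":")
            # Per the SSE format, a single leading space in the value is dropped.
            event[field] = value[1:] if value.startswith(" ") else value
        if "data" in event:
            event["data"] = json.loads(event["data"])
            events.append(event)
    return events


raw = """id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"这张"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"output_tokens":1}}

id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"图片"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"output_tokens":2}}

id:19
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[],"role":"assistant"},"finish_reason":"stop"}]},"usage":{"output_tokens":121}}
"""

events = parse_sse_events(raw)
reply = "".join(
    item["text"]
    for e in events
    for item in e["data"]["output"]["choices"][0]["message"]["content"]
)
print(reply)
print(events[-1]["data"]["usage"]["output_tokens"])
```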

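Using local files

The sample code below calls the Qwen-VL model on a local file, through either the OpenAI-compatible interface or DashScope. The sample image used by the code is test.png.

OpenAI compatible

To process a local image file with the OpenAI SDK or the OpenAI-compatible HTTP interface:

  • Encode the image file: read the local image file and encode it as Base64.

  • Pass the Base64 data: pass the encoded data in the image_url parameter, in the form data:image/{format};base64,{base64_image}, where:

    • image/{format}: the format of the local image. Set it to the Content Type value listed for that format in the supported-images table. For example, for a jpg image, use image/jpeg.

    • base64_image: the Base64 data of the image.

  • Call the model: call the Qwen-VL model and process the returned result.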
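
The first two steps can be sketched as one helper that picks the content type from the file extension and returns the data URL. This is an illustration only: `to_data_url` and `MIME_TYPES` are hypothetical names, and the mapping covers just PNG, JPEG and WEBP as an assumption of this sketch. The supported-images table remains the authoritative list.

```python
import base64
import tempfile
from pathlib import Path

# Assumption: a small extension-to-Content-Type map; extend it per the supported-images table.
MIME_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
}


def to_data_url(image_path):
    """Read a local image and return it as data:image/{format};base64,{base64_image}."""
    path = Path(image_path)
    suffix = path.suffix.lower()
    if suffix not in MIME_TYPES:
        raise ValueError(f"unsupported image extension: {suffix!r}")
    encoded = base64.b64encode(path.read_bytes()).decode("utf-8")
    return f"data:{MIME_TYPES[suffix]};base64,{encoded}"


# Demo with a throwaway file that holds only the 8-byte PNG signature.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "test.png"
    sample.write_bytes(b"\x89PNG\r\n\x1a\n")
    url = to_data_url(sample)
print(url)
```

The returned string is what goes into the `url` field of the `image_url` item.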

Python

from openai import OpenAI
import os
import base64


# Encode the image file as a Base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


base64_image = encode_image("test.png")
client = OpenAI(
    # If the environment variable is not configured, replace the next line with your Bailian API key: api_key="sk-xxx"
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen-vl-max-latest",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    # Note: when passing Base64 data, the image format (image/{format}) must match the Content Type in the supported-images list. The "f" prefix marks a Python f-string.
                    # PNG image:  f"data:image/png;base64,{base64_image}"
                    # JPEG image: f"data:image/jpeg;base64,{base64_image}"
                    # WEBP image: f"data:image/webp;base64,{base64_image}"
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                },
                {"type": "text", "text": "这是什么?"},
            ],
        }
    ],
)
print(completion.choices[0].message.content)

Response

这是一只飞翔的鹰。鹰是一种猛禽,通常具有强壮的翅膀和锐利的爪子,擅长在高空翱翔和捕猎。图片中的鹰展翅高飞,背景是蓝天白云,显得非常壮观。

Node.js

import OpenAI from "openai";
import { readFileSync } from 'fs';


const openai = new OpenAI(
    {
        // If the environment variable is not configured, replace the next line with your Bailian API key: apiKey: "sk-xxx"
        apiKey: process.env.DASHSCOPE_API_KEY,
        baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
    }
);

// Read the local image file and encode it as a Base64 string
const encodeImage = (imagePath) => {
    const imageFile = readFileSync(imagePath);
    return imageFile.toString('base64');
};
const base64Image = encodeImage("test.png");
async function main() {
    const completion = await openai.chat.completions.create({
        model: "qwen-vl-max-latest",
        messages: [
            {"role": "user",
            "content": [{"type": "image_url",
                        // Note: when passing Base64 data, the image format (image/{format}) must match the Content Type in the supported-images list.
                        // PNG image:  data:image/png;base64,${base64Image}
                        // JPEG image: data:image/jpeg;base64,${base64Image}
                        // WEBP image: data:image/webp;base64,${base64Image}
                        "image_url": {"url": `data:image/png;base64,${base64Image}`},},
                        {"type": "text", "text": "这是什么?"}]}]
    });
    console.log(completion.choices[0].message.content);
} 

main();

Response

这是一只飞翔的鹰。鹰是一种猛禽,通常具有强壮的翅膀和锐利的爪子,能够在高空翱翔并捕猎猎物。图片中的鹰展翅高飞,背景是蓝天白云,显得非常壮观。

HTTP

import os
import base64
import requests

# Encode the image file as a Base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


base64_image = encode_image("test.png")
# If the environment variable is not configured, replace the next line with your Bailian API key: api_key="sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY")
headers = {"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"}
payload = {
    "model": "qwen-vl-max-latest",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    # Note: when passing Base64 data, the image format (image/{format}) must match the Content Type in the supported-images list. The "f" prefix marks a Python f-string.
                    # PNG image:  f"data:image/png;base64,{base64_image}"
                    # JPEG image: f"data:image/jpeg;base64,{base64_image}"
                    # WEBP image: f"data:image/webp;base64,{base64_image}"
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                },
                {"type": "text", "text": "这是什么?"},
            ],
        }
    ],
}
response = requests.post(
    "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions",
    headers=headers,
    json=payload,
)
print(response.json()["choices"][0]["message"]["content"])

Response

这是一只飞翔的鹰。鹰是一种猛禽,通常具有强壮的翅膀和锐利的爪子,能够在高空翱翔并捕猎猎物。图片中的鹰展翅高飞,背景是蓝天白云,显得非常壮观。

DashScope

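When you process a local image file with the DashScope SDK, you pass in a file path. Build the path according to your SDK and operating system, as shown in the table below.

System          SDK         File path to pass in          Example
Linux or macOS  Python SDK  file://{absolute file path}   file:///home/images/test.png
Linux or macOS  Java SDK    file://{absolute file path}   file:///home/images/test.png
Windows         Python SDK  file://{absolute file path}   file://D:/images/test.png
Windows         Java SDK    file:///{absolute file path}  file:///D:/images/test.png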
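
For the Python SDK rows of the table, the path string is `file://` followed by the absolute path written with forward slashes. A minimal sketch of that rule, where `to_file_uri` is a hypothetical helper and not a DashScope SDK function:

```python
from pathlib import Path, PurePath, PurePosixPath, PureWindowsPath


def to_file_uri(path: PurePath) -> str:
    """Build the file:// string for the Python SDK from an absolute path."""
    if not path.is_absolute():
        raise ValueError(f"expected an absolute path, got {path}")
    # as_posix() renders Windows paths with forward slashes (D:/images/test.png).
    return "file://" + path.as_posix()


print(to_file_uri(PurePosixPath("/home/images/test.png")))
print(to_file_uri(PureWindowsPath("D:/images/test.png")))

# Typical use on the current machine: resolve a relative path first.
local_uri = to_file_uri(Path("test.png").resolve())
```

The Java SDK row for Windows uses three slashes instead of two, so this sketch applies to the Python SDK only.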

Python

import os
from dashscope import MultiModalConversation

local_path = "test.png"
image_path = f"file://{local_path}"
messages = [{'role': 'system',
                'content': [{'text': 'You are a helpful assistant.'}]}, 
                {'role':'user',
                'content': [{'image': image_path},
                            {'text': '这是什么'}]}]
response = MultiModalConversation.call(
    # If the environment variable is not configured, replace the next line with your Bailian API key: api_key="sk-xxx"
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen-vl-max-latest', 
    messages=messages)
print(response["output"]["choices"][0]["message"].content[0]["text"])

Response

这是一只飞翔的鹰。鹰是一种猛禽,通常具有强壮的翅膀和锐利的爪子,能够在高空翱翔并捕猎猎物。图片中的鹰展翅高飞,背景是蓝天白云,显得非常壮观。

Java

import java.util.Arrays;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;

public class Main {
    public static void callWithLocalFile(String localPath)
            throws ApiException, NoApiKeyException, UploadFileException {
        String filePath = "file://"+localPath;
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(new HashMap<String, Object>(){{put("image", filePath);}},
                        new HashMap<String, Object>(){{put("text", "这是什么?");}})).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // If the environment variable is not configured, replace the next line with your Bailian API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen-vl-max-latest")
                .message(userMessage)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }

    public static void main(String[] args) {
        try {
            callWithLocalFile("test.png");
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

Response

这是一只飞翔的鹰。鹰是一种猛禽,通常具有强壮的翅膀和锐利的爪子,擅长在高空翱翔和捕猎。图片中的鹰展翅高飞,背景是蓝天白云,显得非常壮观。

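Video understanding

Currently only the following models support video understanding: qwen-vl-max-latest, qwen-vl-max-0809, qwen-vl-max-1030, qwen-vl-max-1119, qwen-vl-plus-latest, qwen-vl-plus-0809, and qwen2-vl-7b-instruct. These models accept a video as a list of images.

Pass at least 4 and at most 80 images.
To input a video file directly, submit a ticket to apply.
Only the Python SDK supports passing local files; for how to pass them, see DashScope.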
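
The limit of 4 to 80 images can be checked before a request is built. The sketch below assumes the OpenAI-compatible content layout (a `video` item holding the image list, followed by a `text` item). `select_frames` and `build_video_content` are hypothetical helper names, the URLs are placeholders, and evenly subsampling an overly long list is this sketch's own choice rather than documented behavior.

```python
MIN_FRAMES = 4   # minimum number of images, per the limit stated above
MAX_FRAMES = 80  # maximum number of images, per the limit stated above


def select_frames(frame_urls):
    """Return between MIN_FRAMES and MAX_FRAMES frames, subsampling evenly if needed."""
    frames = list(frame_urls)
    if len(frames) < MIN_FRAMES:
        raise ValueError(f"need at least {MIN_FRAMES} frames, got {len(frames)}")
    if len(frames) <= MAX_FRAMES:
        return frames
    step = len(frames) / MAX_FRAMES
    return [frames[int(i * step)] for i in range(MAX_FRAMES)]


def build_video_content(frame_urls, prompt):
    """Build the user message content: one video item plus one text item."""
    return [
        {"type": "video", "video": select_frames(frame_urls)},
        {"type": "text", "text": prompt},
    ]


urls = [f"https://example.com/frame{i}.jpg" for i in range(200)]  # placeholder URLs
content = build_video_content(urls, "Describe what happens in this video")
print(len(content[0]["video"]))
```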

OpenAI compatible

You can implement video understanding through the OpenAI SDK or over HTTP.

Python

import os
from openai import OpenAI

client = OpenAI(
    # If the environment variable is not configured, replace the next line with your Bailian API key: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen-vl-max-latest",
    messages=[{"role": "user","content": [
        {"type": "video","video": ["https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                           "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                           "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                           "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"]},
        {"type": "text","text": "描述这个视频的具体过程"},
    ]}]
)
print(completion.choices[0].message.content)

Response

这个视频展示了一场足球比赛的瞬间。具体过程如下:

1. **背景**:视频是在一个大型体育场拍摄的,观众席上坐满了观众,灯光明亮,气氛热烈。
2. **球员**:场上有两队球员,一队穿着红色球衣,另一队穿着蓝色球衣。守门员穿着绿色球衣。
3. **动作**:一名穿着红色球衣的球员在禁区内准备射门。守门员试图扑救,但未能成功。
4. **进球**:红色球衣的球员成功将球踢入球门,球网被球击中,显示出进球的瞬间。

整个过程充满了紧张和激动,展示了足球比赛中的精彩瞬间。

Node.js

// Make sure "type": "module" is set in package.json
import OpenAI from "openai";

const openai = new OpenAI({
    // If the environment variable is not configured, replace the next line with your Bailian API key: apiKey: "sk-xxx",
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: "https://dashscope.aliyuncs.com/compatible-mode/v1"
});

async function main() {
    const response = await openai.chat.completions.create({
        model: "qwen-vl-max-latest",
        messages: [{
            role: "user",
            content: [
                {
                    type: "video",
                    video: [
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
                    ]
                },
                {
                    type: "text",
                    text: "描述这个视频的具体过程"
                }
            ]
        }]
    });
    console.log(response.choices[0].message.content);
}

main();

Response

这个视频展示了一场足球比赛的瞬间。具体过程如下:

1. **背景**:视频是在一个大型体育场拍摄的,观众席上坐满了观众,灯光明亮,气氛热烈。
2. **球员**:场上有两队球员,一队穿着红色球衣,另一队穿着蓝色球衣。守门员穿着绿色球衣。
3. **动作**:一名穿着红色球衣的球员在禁区内准备射门。他将球踢向球门。
4. **守门员**:守门员看到球飞来,迅速做出反应,向球的方向扑去,试图将球扑出。
5. **进球**:尽管守门员尽力扑救,但球还是飞进了球门,网子被球撞得晃动。

这个视频捕捉到了足球比赛中进球的精彩瞬间,展示了球员的技巧和守门员的反应。

curl

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen-vl-max-latest",
    "messages": [{"role": "user",
                "content": [{"type": "video",
                "video": ["https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                           "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                           "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                           "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"]},
                {"type": "text",
                "text": "描述这个视频的具体过程"}]}]
}'

Response

{
  "choices": [
    {
      "message": {
        "content": "这个视频展示了一场足球比赛的瞬间。具体过程如下:\n\n1. **背景**:视频是在一个大型体育场内拍摄的,观众席上坐满了观众,灯光明亮,气氛热烈。\n2. **球员**:场上有两支队伍,一支穿着红色球衣,另一支穿着蓝色球衣。守门员穿着绿色球衣。\n3. **动作**:一名身穿红色球衣的球员在禁区内接到传球,准备射门。守门员迅速反应,向球的方向扑去,试图阻止进球。\n4. **射门**:红色球衣的球员果断射门,球飞向球门。\n5. **扑救**:守门员尽力扑救,但球还是飞进了球门,球网被球撞得晃动。\n\n整个过程充满了紧张和刺激,展示了足球比赛中的精彩瞬间。",
        "role": "assistant"
      },
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null
    }
  ],
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 1466,
    "completion_tokens": 181,
    "total_tokens": 1647
  },
  "created": 1728710375,
  "system_fingerprint": null,
  "model": "qwen-vl-max-latest",
  "id": "chatcmpl-73b2b130-b29a-99db-9eda-4cd45f27d4e0"
}

DashScope

You can implement video understanding through the DashScope SDK or over HTTP.

Python

import os
# Requires dashscope version 1.20.10 or later
import dashscope

messages = [{"role": "user",
             "content": [
                 {"video":["https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                           "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                           "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                           "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"]},
                 {"text": "描述这个视频的具体过程"}]}]
response = dashscope.MultiModalConversation.call(
    # If the environment variable is not configured, replace the next line with your Bailian API key: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model='qwen-vl-max-latest',
    messages=messages
)
print(response["output"]["choices"][0]["message"].content[0]["text"])

Response

这个视频展示了一场足球比赛的瞬间。具体过程如下:

1. **背景**:视频是在一个大型体育场内拍摄的,观众席上坐满了观众,灯光明亮,气氛热烈。
2. **球员**:场上有两支队伍,一支穿着红色球衣,另一支穿着蓝色球衣。守门员穿着绿色球衣。
3. **动作**:一名身穿红色球衣的球员在禁区内准备射门。他将球踢向球门。
4. **守门员**:守门员看到球飞来,迅速做出反应,向球的方向扑去,试图将球扑出。
5. **进球**:尽管守门员尽力扑救,但球还是飞进了球门,网子被球撞得晃动。

这个视频捕捉到了足球比赛中进球的精彩瞬间,展示了球员的技巧和守门员的反应。

Java

// Requires DashScope SDK version 2.16.7 or later
import java.util.Arrays;
import java.util.Collections;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;

public class Main {
    private static final String MODEL_NAME = "qwen-vl-max-latest";
    public static void videoImageListSample() throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage systemMessage = MultiModalMessage.builder()
                .role(Role.SYSTEM.getValue())
                .content(Arrays.asList(Collections.singletonMap("text", "You are a helpful assistant.")))
                .build();
        MultiModalMessage userMessage = MultiModalMessage.builder()
                .role(Role.USER.getValue())
                .content(Arrays.asList(Collections.singletonMap("video", Arrays.asList("https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                                "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                                "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                                "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg")),
                        Collections.singletonMap("text", "Describe the process shown in this video")))
                .build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                .model(MODEL_NAME).message(systemMessage)
                .message(userMessage).build();
        MultiModalConversationResult result = conv.call(param);
        System.out.print(result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }
    public static void main(String[] args) {
        try {
            videoImageListSample();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

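Result

This video shows a moment from a football match. The process is as follows:

1. **Background**: The video was shot inside a large stadium. The stands are full of spectators, the lights are bright, and the atmosphere is lively.
2. **Players**: There are two teams of players on the pitch, one in red jerseys and the other in blue. The goalkeeper wears a green jersey.
3. **Action**: A player in a red jersey prepares to shoot inside the penalty area and kicks the ball toward the goal.
4. **Goalkeeper**: Seeing the ball coming, the goalkeeper reacts quickly and dives toward it, trying to push it away.
5. **Goal**: Despite the goalkeeper's best effort, the ball flies into the goal and shakes the net.

This video captures the exciting moment of a goal in a football match, showing the player's skill and the goalkeeper's reaction.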

curl

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
  "model": "qwen-vl-max-latest",
  "input": {
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "video": [
              "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
              "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
              "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
              "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
            ]
          },
          {
            "text": "Describe the process shown in this video"
          }
        ]
      }
    ]
  }
}'

Result

{
  "output": {
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": [
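            {
              "text": "This video shows a moment from a football match. The process is as follows:\n\n1. **Background**: The video was shot in a large stadium. The stands are full of spectators, the lights are bright, and the atmosphere is lively.\n2. **Players**: There are two teams of players on the pitch, one in red jerseys and the other in blue. The goalkeeper wears a green jersey.\n3. **Action**: A player in a red jersey receives a pass from a teammate inside the penalty area and prepares to shoot.\n4. **Shot**: The player in red shoots hard with the right foot, and the ball flies toward the goal.\n5. **Save attempt**: The goalkeeper reacts quickly and dives toward the ball, trying to push it away.\n6. **Goal**: Despite the goalkeeper's best effort, the ball flies into the goal, and the goalkeeper fails to stop it.\n\nThe whole sequence is tense and exciting, showing a highlight moment of a football match."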
            }
          ]
        }
      }
    ]
  },
  "usage": {
    "output_tokens": 191,
    "video_tokens": 1440,
    "input_tokens": 1466
  },
  "request_id": "c728d1e0-79ad-9076-8589-7f072e96bccf"
}
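
As a small illustration of the response structure shown above, the following Python sketch pulls the generated text and the token usage out of a DashScope-style response body. The sample payload is an abbreviated copy of that response with the text shortened, and `summarize_response` is a hypothetical helper name, not part of any SDK.

```python
import json

# Abbreviated copy of the response above (reply text shortened for readability).
SAMPLE_RESPONSE = """
{
  "output": {
    "choices": [
      {
        "finish_reason": "stop",
        "message": {
          "role": "assistant",
          "content": [{"text": "..."}]
        }
      }
    ]
  },
  "usage": {"output_tokens": 191, "video_tokens": 1440, "input_tokens": 1466},
  "request_id": "c728d1e0-79ad-9076-8589-7f072e96bccf"
}
"""


def summarize_response(body: str) -> dict:
    """Extract the reply text and token counts from a response body (hypothetical helper)."""
    data = json.loads(body)
    message = data["output"]["choices"][0]["message"]
    # "content" is a list of parts; join the text of every part that carries one.
    text = "".join(part.get("text", "") for part in message["content"])
    usage = data.get("usage", {})
    input_tokens = usage.get("input_tokens", 0)
    output_tokens = usage.get("output_tokens", 0)
    return {
        "text": text,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
    }


summary = summarize_response(SAMPLE_RESPONSE)
print(summary["input_tokens"], summary["output_tokens"], summary["total_tokens"])
```

Summing `input_tokens` and `output_tokens` here is only a convenience for logging; it is not a field returned by the service in this response format.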

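Application examples

Extracting text from images

Input example: [image]

Traditional OCR tool output example: [image] The content extracted by a traditional OCR tool is incomplete and the formatting is messy.

Vision-language model output example: [image] The qwen-vl-max model not only extracts the text in the image accurately but also outputs it in a tidy format.

Sample code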

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen-vl-max",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://img.alicdn.com/imgextra/i2/O1CN01y1lotD1hINrPr60fs_!!6000000004254-0-tps-1384-3024.jpg"},
                    {"text": "What is this?"}
                ]
            }
        ]
    }
}'
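
The application examples in this section all send the same request body and differ only in the image URL and the prompt. As a rough sketch of that pattern in Python, the helper below builds the JSON body used by the curl command above. `build_request_body` is a hypothetical name, and the code only constructs and prints the payload; it does not call the service.

```python
import json


def build_request_body(image_url: str, prompt: str, model: str = "qwen-vl-max") -> dict:
    """Build a DashScope multimodal request body with one image and one text prompt."""
    return {
        "model": model,
        "input": {
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {"image": image_url},
                        {"text": prompt},
                    ],
                }
            ]
        },
    }


body = build_request_body(
    "https://img.alicdn.com/imgextra/i2/O1CN01y1lotD1hINrPr60fs_!!6000000004254-0-tps-1384-3024.jpg",
    "What is this?",
)
# ensure_ascii=False keeps any non-ASCII prompt text readable in the printed JSON.
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON can be passed as the `-d` argument of the curl command, which keeps the image URL and prompt as the only values that change between examples.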

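Recognizing objects, people, expressions, actions, and scenes in images and understanding what the image means

Input example: [image: measuring height] What is in the picture?

Output example: The picture shows a mother measuring her child's height. The mother is squatting on the floor and marking the child's height on the wall with a pen. The child is standing next to a yellow height chart and looks very happy. The background is a bright room with some furniture and decorations.

Sample code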

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen-vl-max",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://img.alicdn.com/imgextra/i1/O1CN01lMJMCv29E5O54B7y5_!!6000000008035-0-tps-1024-683.jpg"},
                    {"text": "What is in the picture?"}
                ]
            }
        ]
    }
}'

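Input example: [image] What is the picture trying to express?

Output example: By showing the whole iceberg, including the parts above and below the water, this picture conveys the meaning of the idiom "the tip of the iceberg". The idiom is usually used to describe the surface appearance of things, while the true nature or deeper problems are often hidden beneath the surface, just as most of an iceberg's volume is hidden under water. The picture may be saying that when facing a problem or situation, people should not look only at the surface but should dig into the truth and essence behind it.

Sample code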

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen-vl-max",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://img.alicdn.com/imgextra/i1/O1CN01iIByHc1z7zxtfOhuy_!!6000000006668-0-tps-2334-1558.jpg"},
                    {"text": "What is the picture trying to express?"}
                ]
            }
        ]
    }
}'

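Solving problems and translating text from images

Input example: [image: problem 10] Solve this problem.

Output example: [image]

Sample code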

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen-vl-max",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://img.alicdn.com/imgextra/i1/O1CN01gDEY8M1W114Hi3XcN_!!6000000002727-0-tps-1024-406.jpg"},
                    {"text": "Solve this problem."}
                ]
            }
        ]
    }
}'

Input example: [image] Translate the text in the image into Chinese.

Output example: [image]

Sample code

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen-vl-max",
    "input":{
        "messages":[
            {
                "role": "user",
                "content": [
                    {"image": "https://img.alicdn.com/imgextra/i2/O1CN01tLcqfB1ghHmWEifx5_!!6000000004173-0-tps-1332-406.jpg"},
                    {"text": "Translate the text in the image into Chinese."}
                ]
            }
        ]
    }
}'

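Generating front-end code from an image

Input example: [image]

Prompt: Convert this into an interface built with HTML, CSS, and JavaScript.

Output example (model output):

To convert this interface into HTML, CSS, and JavaScript, you can follow these steps: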

<!DOCTYPE html>
<html lang="zh-CN">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>阿里云百炼</title>
    <link rel="stylesheet" href="styles.css">
</head>
<body>
    <div class="container">
        <h1>你好,欢迎使用阿里云百炼</h1>
        <div class="search-bar">
            <div class="selected-model">
                已选择
                <div class="model-selector">
                    <img src="model-icon.png" alt="Model Icon">
                    通义千问-Max
                    <span class="dropdown-icon">▼</span>
                </div>
            </div>
            <input type="text" placeholder="输入你的问题开启模型体验吧" class="search-input">
            <button class="search-button">发送</button>
        </div>
        <div class="features">
            <div class="feature feature-1">
                <h2>模型调用</h2>
                <p>结合业务需求,选择适合的大模型,支持API/SDK的调用方式</p>
                <img src="feature-1-icon.png" alt="Feature 1 Icon">
                <div class="feature-buttons">
                    <a href="#" class="button">立即调用</a>
                    <a href="#" class="button">查看教程</a>
                </div>
            </div>
            <div class="feature feature-2">
                <h2>应用创建</h2>
                <p>基于Assistant API,支持调用Prompt、插件、RAG等多种能力,完成大模型应用的创建。</p>
                <img src="feature-2-icon.png" alt="Feature 2 Icon">
                <div class="feature-buttons">
                    <a href="#" class="button">立即创建</a>
                    <a href="#" class="button">查看教程</a>
                </div>
            </div>
        </div>
    </div>
    <script src="script.js"></script>
</body>
</html>

CSS (styles.css)

body {
    font-family: Arial, sans-serif;
    background: linear-gradient(to bottom, #f8f9fa, #e9ecef);
    color: #343a40;
}

.container {
    max-width: 1200px;
    margin: 0 auto;
    padding: 20px;
}

h1 {
    text-align: center;
    margin-bottom: 30px;
}

.search-bar {
    display: flex;
    align-items: center;
    margin-bottom: 30px;
}

.selected-model {
    display: flex;
    align-items: center;
    margin-right: 10px;
}

.model-selector {
    display: flex;
    align-items: center;
    border: 1px solid #ced4da;
    border-radius: 5px;
    padding: 5px 10px;
    cursor: pointer;
}

.model-selector img {
    width: 20px;
    height: 20px;
    margin-right: 5px;
}

.dropdown-icon {
    margin-left: 5px;
}

.search-input {
    flex: 1;
    padding: 10px;
    border: 1px solid #ced4da;
    border-radius: 5px;
    margin-right: 10px;
}

.search-button {
    padding: 10px 20px;
    background-color: #6c757d;
    color: #fff;
    border: none;
    border-radius: 5px;
    cursor: pointer;
}

.features {
    display: flex;
    justify-content: space-between;
}

.feature {
    flex: 1;
    padding: 20px;
    border-radius: 5px;
    margin: 10px;
}

.feature-1 {
    background-color: #5a5aff;
    color: #fff;
}

.feature-2 {
    background-color: #48dbfb;
    color: #fff;
}

.feature h2 {
    margin-bottom: 10px;
}

.feature p {
    margin-bottom: 20px;
}

.feature img {
    width: 100px;
    height: 100px;
    margin: 0 auto;
}

.feature-buttons {
    display: flex;
    justify-content: space-between;
}

.button {
    padding: 5px 10px;
    border: none;
    border-radius: 5px;
    color: #fff;
    text-decoration: none;
}

.button:hover {
    background-color: #495057;
}

JavaScript (script.js)

document.addEventListener('DOMContentLoaded', function() {
    const modelSelector = document.querySelector('.model-selector');
    const dropdownIcon = document.querySelector('.dropdown-icon');

    modelSelector.addEventListener('click', function() {
        dropdownIcon.classList.toggle('open');
    });
});

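This HTML, CSS, and JavaScript code creates an interface similar to the one shown in the image. You can adjust the styles and behavior as needed.

The front-end interface rendered from the code provided by the model is shown below:

[image]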

Supported images

Image format   Content type    File extensions
BMP            image/bmp       .bmp
DIB            image/bmp       .dib
ICNS           image/icns      .icns
ICO            image/x-icon    .ico
JPEG           image/jpeg      .jfif, .jpe, .jpeg, .jpg
JPEG2000       image/jp2       .j2c, .j2k, .jp2, .jpc, .jpf, .jpx
PNG            image/png       .apng, .png
SGI            image/sgi       .bw, .rgb, .rgba, .sgi
TIFF           image/tiff      .tif, .tiff
WEBP           image/webp      .webp

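Input images are subject to the following limits:

  • The image file size must not exceed 10 MB.

  • For the qwen-vl-max, qwen-vl-max-latest, qwen-vl-max-0809, qwen-vl-plus-latest, qwen-vl-plus-0809, and qwen2-vl-7b-instruct models, a single image may contain at most 12M pixels in total, which is enough for a standard 4K image. For the qwen-vl-max-0201 and qwen-vl-plus models, a single image may contain at most 1,048,576 pixels in total, the pixel count of an image that is 1024 pixels wide and 1024 pixels high.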
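
The format and size limits above can be checked before an image is sent. The following is a minimal sketch under stated assumptions: "10 MB" is read as 10 * 1024 * 1024 bytes, "12M" is read as 12,000,000 pixels, and any model not named in the lower-limit group is given the 12M limit. `check_image` is a hypothetical helper, and the caller supplies the file size and dimensions.

```python
import os

# Extensions taken from the supported-formats table above.
SUPPORTED_EXTENSIONS = {
    ".bmp", ".dib", ".icns", ".ico",
    ".jfif", ".jpe", ".jpeg", ".jpg",
    ".j2c", ".j2k", ".jp2", ".jpc", ".jpf", ".jpx",
    ".apng", ".png",
    ".bw", ".rgb", ".rgba", ".sgi",
    ".tif", ".tiff", ".webp",
}
MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" means 10 * 1024 * 1024 bytes
MAX_PIXELS_DEFAULT = 12_000_000    # assumption: "12M" means 12,000,000 pixels
MAX_PIXELS_LEGACY = 1_048_576      # 1024 x 1024
LEGACY_MODELS = {"qwen-vl-max-0201", "qwen-vl-plus"}


def check_image(filename, size_bytes, width, height, model):
    """Return a list of problems; an empty list means the image passes these checks."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file larger than 10 MB")
    limit = MAX_PIXELS_LEGACY if model in LEGACY_MODELS else MAX_PIXELS_DEFAULT
    if width * height > limit:
        problems.append(f"{width * height} pixels exceeds the {limit} limit for {model}")
    return problems


# A 4K image (3840 x 2160 = 8,294,400 pixels) fits the larger limit but not the smaller one.
print(check_image("photo.jpg", 2_000_000, 3840, 2160, "qwen-vl-max"))
print(check_image("photo.jpg", 2_000_000, 3840, 2160, "qwen-vl-plus"))
```

Returning a list of problems rather than raising on the first failure lets a caller report every violated limit at once.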

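Model list, billing, and free quota

Commercial models

Qwen-VL models are billed on the total number of input and output tokens.
Rule for converting images to tokens: a 512x512-pixel image is approximately 334 tokens, and images of other resolutions are converted proportionally. The smallest unit is 28x28 pixels, that is, each 28x28-pixel block corresponds to one token. If the width or height of an image is not a multiple of 28, it is rounded up to the next multiple of 28. An image counts as at least 4 tokens.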
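
To make the conversion rule concrete, here is a small sketch that estimates image tokens in two ways: a plain proportional estimate (one token per 28x28 = 784 pixels, which reproduces the figure of about 334 for a 512x512 image), and a variant that first rounds each side up to a multiple of 28 as the rule describes. The function names are hypothetical, and the service may apply resizing or per-image caps that are not modeled here, so treat the numbers as estimates rather than billing figures.

```python
import math

PATCH = 28      # side length, in pixels, of the block that maps to one token
MIN_TOKENS = 4  # minimum number of tokens per image stated in the rule


def proportional_tokens(width: int, height: int) -> int:
    """Estimate tokens as pixel count divided by 28 * 28, with the stated minimum."""
    return max(MIN_TOKENS, (width * height) // (PATCH * PATCH))


def rounded_up_tokens(width: int, height: int) -> int:
    """Round each side up to a multiple of 28, then count the 28x28 blocks."""
    cols = math.ceil(width / PATCH)
    rows = math.ceil(height / PATCH)
    return max(MIN_TOKENS, cols * rows)


print(proportional_tokens(512, 512))  # 334
print(rounded_up_tokens(512, 512))    # 361
```

The two estimates differ because 512 is not a multiple of 28: rounding up gives a 532x532 grid of 19 x 19 blocks, while the proportional figure divides the raw pixel count.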

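Columns: model name, version, context length (tokens), maximum input (tokens), maximum output (tokens), input cost (per 1,000 tokens), output cost (per 1,000 tokens), free quota (note)

qwen-vl-max
  Description: Further improves visual reasoning and instruction following compared with qwen-vl-plus, and delivers the best performance on more complex tasks. Currently equivalent to qwen-vl-max-2024-11-19.
  Version: stable
  Context length: 32,768
  Maximum input: 30,720 (up to 16384 per image)
  Maximum output: 2,048
  Input cost: 0.003 yuan (Batch calls: 0.0015 yuan)
  Output cost: 0.009 yuan (Batch calls: 0.0045 yuan)
  Free quota: 1 million tokens, valid for 180 days after Bailian is activated

qwen-vl-max-latest
  Description: Always equivalent to the latest snapshot.
  Version: latest
  Input cost: 0.003 yuan
  Output cost: 0.009 yuan

qwen-vl-max-2024-12-30
  Description: Also known as qwen-vl-max-1230.
  Version: snapshot

qwen-vl-max-2024-11-19
  Description: Also known as qwen-vl-max-1119.

qwen-vl-max-2024-10-30
  Description: Also known as qwen-vl-max-1030.
  Cost: 0.02 yuan

qwen-vl-max-2024-08-09
  Description: Also known as qwen-vl-max-0809. This version extends the context to 32k and improves image understanding, so it recognizes multilingual text and handwriting in images more reliably.

qwen-vl-max-2024-02-01
  Description: Also known as qwen-vl-max-0201.
  Context length: 8,000
  Maximum input: 6,000 (up to 1280 per image)
  Maximum output: 2,000

qwen-vl-plus
  Description: Greatly improves detail recognition and text recognition, and supports images with more than one million pixels and any aspect ratio. Delivers excellent performance across a wide range of visual tasks.
  Version: stable
  Context length: 8,000
  Maximum input: 6,000 (up to 1280 per image)
  Input cost: 0.0015 yuan (Batch calls: 0.00075 yuan)
  Output cost: 0.0045 yuan (Batch calls: 0.00225 yuan)

qwen-vl-plus-latest
  Description: Always equivalent to the latest snapshot.
  Version: latest
  Context length: 32,000
  Maximum input: 30,000 (up to 16384 per image)
  Maximum output: 2,048
  Input cost: 0.0015 yuan
  Output cost: 0.0045 yuan

qwen-vl-plus-2025-01-02
  Description: Also known as qwen-vl-plus-0102. Greatly improves instruction following, image understanding, and math capabilities.
  Version: snapshot

qwen-vl-plus-2024-08-09
  Description: Also known as qwen-vl-plus-0809.

qwen-vl-plus-2023-12-01
  Context length: 8,000
  Maximum input: 6,000
  Maximum output: 2,000
  Cost: 0.008 yuan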

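Open-source models

QVQ is an experimental research model developed by the Qwen team. It focuses on improving visual reasoning, especially mathematical reasoning.

Columns: model name, context length (tokens), maximum input (tokens), maximum output (tokens), input cost (per 1,000 tokens), output cost (per 1,000 tokens), free quota (note)

qvq-72b-preview
  Context length: 32,768
  Maximum input: 16,384 (up to 16384 per image)
  Maximum output: 16,384
  Input cost: 0.012 yuan
  Output cost: 0.036 yuan
  Free quota: 100,000 tokens, valid for 180 days after Bailian is activated

Columns: model name, context length (tokens), maximum input (tokens), maximum output (tokens), input cost (per 1,000 tokens), output cost (per 1,000 tokens), free quota (note)

qwen2-vl-72b-instruct
  Context length: 32,768
  Maximum input: 30,720 (up to 16384 per image)
  Maximum output: 2,048
  Cost: Currently available for free trial only. The model cannot be called after the free quota is used up. Stay tuned for updates.
  Free quota: 100,000 tokens, valid for 180 days after Bailian is activated

qwen2-vl-7b-instruct
  Context length: 32,000
  Maximum input: 30,000 (up to 16384 per image)
  Maximum output: 2,000

qwen2-vl-2b-instruct
  Cost: Free for a limited time

qwen-vl-v1
  Context length: 8,000
  Maximum input: 6,000 (up to 1280 per image)
  Maximum output: 1,500
  Cost: Currently available for free trial only. The model cannot be called after the free quota is used up. Stay tuned for updates.

qwen-vl-chat-v1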

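FAQ

  1. Can I delete images that I have uploaded?

    After the model finishes generating text, the Bailian server deletes the image automatically. You do not need to delete it manually.

  2. Do Qwen-VL models support understanding video content?

    Yes. To use this feature, submit a ticket to apply for access and to get usage instructions.

  3. Can Qwen-VL models process text files such as PDF, Excel, and DOC?

    No. Qwen-VL models are visual understanding models. They can process only image files, not text files. You can use the Qwen-Long model to parse document content.

  4. Do Qwen-VL models support submitting tasks in batches?

    Currently, the qwen-vl-max and qwen-vl-plus models are compatible with the OpenAI Batch interface and support submitting tasks in batches as files. Tasks run asynchronously and return results within 24 hours. Batch calls cost 50% of the price of real-time calls.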
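
As a worked example of the pricing arithmetic, the sketch below estimates the cost of one call from its token usage. It uses the qwen-vl-max prices listed in the pricing table of this document (0.003 yuan per 1,000 input tokens and 0.009 yuan per 1,000 output tokens) and the 50% batch rate stated above. The function name and the example token counts are illustrative, and prices change over time, so check the current table before relying on the numbers.

```python
from decimal import Decimal

# Prices per 1,000 tokens for qwen-vl-max, copied from the pricing table in this document.
INPUT_PRICE = Decimal("0.003")
OUTPUT_PRICE = Decimal("0.009")
BATCH_FACTOR = Decimal("0.5")  # batch calls are billed at 50% of the real-time price


def estimate_cost(input_tokens: int, output_tokens: int, batch: bool = False) -> Decimal:
    """Return the estimated cost in yuan for one call (hypothetical helper)."""
    cost = (Decimal(input_tokens) * INPUT_PRICE + Decimal(output_tokens) * OUTPUT_PRICE) / 1000
    return cost * BATCH_FACTOR if batch else cost


print(estimate_cost(1466, 191))              # 0.006117
print(estimate_cost(1466, 191, batch=True))  # 0.0030585
```

`Decimal` is used instead of `float` so that small per-call amounts add up without binary rounding noise when many calls are summed.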

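API reference

For the input and output parameters of Qwen-VL models, see Qwen.

Error codes

If a model call fails and returns an error message, see Error codes for troubleshooting.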
