通义千问Audio是阿里云研发的大规模音频语言模型,能够接受多种音频(包括说话人语音、自然声音、音乐、歌声)和文本作为输入,并输出文本。通义千问Audio不仅能对输入的音频进行转录,还具备更深层次的语义理解、情感分析、音频事件检测、语音聊天等能力。
功能介绍
识别多种音频
示例场景
语音翻译 | 音频问答 | ||
输入 |
| 输入 |
|
输出 |
| 输出 |
|
基于音频进行创作 | 语音聊天 | ||
输入 |
| 输入 | |
输出 | 输出 |
|
支持的模型
建议优先使用开源版模型qwen2-audio-instruct,它是目前最新的模型,能力最强。同时,qwen2-audio-instruct还支持使用音频本身进行对话,适用于语音聊天场景。如果您的项目依赖更早的模型,您也可以考虑测试以及迁移至最新模型。
开始使用
您需要已获取API-KEY并配置API-KEY到环境变量。如果通过SDK调用,还需要安装DashScope SDK。
基本示例
Python
请求示例
import dashscope
messages = [
{
"role": "user",
"content": [
{"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"},
{"text": "这段音频在说什么?"}
]
}
]
response = dashscope.MultiModalConversation.call(
model="qwen2-audio-instruct",
messages=messages,
result_format="message"
)
print(response)
响应示例
{
"status_code": 200,
"request_id": "dd6b6c89-8550-9151-ac11-df31f17b026e",
"code": "",
"message": "",
"output": {
"text": null,
"finish_reason": null,
"choices": [
{
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": [
{
"text": "这段音频说的是:'欢迎使用阿里云'"
}
]
}
}
]
},
"usage": {
"input_tokens": 33,
"output_tokens": 10,
"audio_tokens": 85
}
}
Java
请求示例
import java.util.Arrays;
import java.util.Collections;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
public class Main {
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(
Collections.singletonMap("audio", "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"),
Collections.singletonMap("text", "这段音频在说什么?")))
.build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
.model("qwen2-audio-instruct")
.message(userMessage)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(JsonUtils.toJson(result));
}
public static void main(String[] args) {
try {
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
响应示例
{
"requestId": "d3b95bb7-6b8c-916d-a81f-d4e348efaf55",
"usage": {
"input_tokens": 33,
"output_tokens": 10
},
"output": {
"choices": [
{
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": [
{
"text": "这段音频说的是:'欢迎使用阿里云'"
}
]
}
}
]
}
}
curl
请求示例
curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen2-audio-instruct",
"input":{
"messages":[
{
"role": "system",
"content": [
{"text": "You are a helpful assistant."}
]
},
{
"role": "user",
"content": [
{"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"},
{"text": "这段音频在说什么?"}
]
}
]
}
}'
响应示例
{
"output": {
"choices": [
{
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": [
{
"text": "这段音频说的是:'欢迎使用阿里云'"
}
]
}
}
]
},
"usage": {
"audio_tokens": 85,
"output_tokens": 10,
"input_tokens": 33
},
"request_id": "1341c517-71bf-94f5-862a-18fbddb332e9"
}
使用本地文件
您可以参考以下示例代码,调用通义千问 Audio模型处理本地文件。以下代码使用的示例音频文件为:welcome.mp3
Python
请求示例
from dashscope import MultiModalConversation
# 请用您的本地音频的绝对路径替换 ABSOLUTE_PATH/welcome.mp3
audio_file_path = "file://ABSOLUTE_PATH/welcome.mp3"
messages = [
{
"role": "system",
"content": [{"text": "You are a helpful assistant."}]},
{
"role": "user",
"content": [{"audio": audio_file_path}, {"text": "音频里在说什么?"}],
}
]
response = MultiModalConversation.call(model="qwen2-audio-instruct", messages=messages)
print(response)
响应示例
{
"status_code": 200,
"request_id": "dd6b6c89-8550-9151-ac11-df31f17b026e",
"code": "",
"message": "",
"output": {
"text": null,
"finish_reason": null,
"choices": [
{
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": [
{
"text": "这段音频说的是:'欢迎使用阿里云'"
}
]
}
}
]
},
"usage": {
"input_tokens": 33,
"output_tokens": 10,
"audio_tokens": 85
}
}
Java
请求示例
import java.util.Arrays;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
public class Main {
public static void callWithLocalFile()
throws ApiException, NoApiKeyException, UploadFileException {
// 请用您本地文件的实际路径替换掉ABSOLUTE_PATH/welcome.mp3
// 当前测试系统为macOS。如果您使用Windows系统,请用"file:///ABSOLUTE_PATH/welcome.mp3"代替。
String localFilePath = "file://ABSOLUTE_PATH/welcome.mp3";
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(new HashMap<String, Object>(){{put("audio", localFilePath);}},
new HashMap<String, Object>(){{put("text", "这段音频在说什么?");}}))
.build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
.model("qwen2-audio-instruct")
.message(userMessage)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(JsonUtils.toJson(result));
}
public static void main(String[] args) {
try {
callWithLocalFile();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
响应示例
{
"requestId": "d3efa06a-2126-94e4-9db8-52f550eebbe2",
"usage": {
"input_tokens": 33,
"output_tokens": 10
},
"output": {
"choices": [
{
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": [
{
"text": "这段音频说的是:'欢迎使用阿里云'"
}
]
}
}
]
}
}
多轮对话
通义千问Audio模型可以参考历史对话信息进行回复。您可以参考以下示例代码,实现多轮对话的功能。
Python
请求示例
from dashscope import MultiModalConversation
messages = [
{
"role": "user",
"content": [
{"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"},
{"text": "这段音频在说什么?"},
]
}
]
response = MultiModalConversation.call(model='qwen2-audio-instruct', messages=messages)
print("第1次回复:", response)
# 将模型回复到messages中,并添加新的用户消息
messages.append({
'role': response.output.choices[0].message.role,
'content': response.output.choices[0].message.content
})
messages.append({
"role": "user",
"content": [
{"text": "简单介绍这家公司。"}
]
})
response = MultiModalConversation.call(model='qwen2-audio-instruct', messages=messages)
print("第2次回复:", response)
响应示例
第1次回复: {"status_code": 200, "request_id": "03084263-bc78-985d-9357-583f355d6a80", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "stop", "message": {"role": "assistant", "content": [{"text": "这段音频说的是:'欢迎使用阿里云'"}]}}]}, "usage": {"input_tokens": 33, "output_tokens": 10, "audio_tokens": 85}}
第2次回复: {"status_code": 200, "request_id": "27ebd962-f67c-9ca5-9510-be26d7988ca6", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "stop", "message": {"role": "assistant", "content": [{"text": "阿里巴巴集团是一家总部位于中国杭州的全球领先的电子商务和科技公司,成立于1999年。阿里巴巴集团旗下拥有包括淘宝、天猫、支付宝、菜鸟网络等在内的多个知名业务,涉及电商、金融、物流、云计算等多个领域。阿里巴巴在全球范围内开展业务,业务覆盖超过200个国家和地区,员工数量超过10万人。"}]}}]}, "usage": {"input_tokens": 56, "output_tokens": 74, "audio_tokens": 85}}
Java
请求示例
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
public class Main {
private static final String modelName = "qwen2-audio-instruct";
public static void MultiRoundConversationCall() throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage systemMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
.content(Arrays.asList(Collections.singletonMap("text", "You are a helpful assistant."))).build();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(Collections.singletonMap("audio", "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"),
Collections.singletonMap("text", "这段音频在说什么?"))).build();
List<MultiModalMessage> messages = new ArrayList<>();
messages.add(systemMessage);
messages.add(userMessage);
MultiModalConversationParam param = MultiModalConversationParam.builder()
.model(modelName)
.messages(messages)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println("第一轮回复:"+JsonUtils.toJson(result));
// add the result to conversation
messages.add(result.getOutput().getChoices().get(0).getMessage());
MultiModalMessage msg = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(Collections.singletonMap("text", "介绍一下这家公司?"))).build();
messages.add(msg);
// new messages
param.setMessages((List)messages);
result = conv.call(param);
System.out.println("第二轮回复:"+JsonUtils.toJson(result));
}
public static void main(String[] args) {
try {
MultiRoundConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
响应示例
第一轮回复:{"requestId":"2ddc6246-453e-96ed-b047-837f42c239d4","usage":{"input_tokens":33,"output_tokens":10},"output":{"choices":[{"finish_reason":"stop","message":{"role":"assistant","content":[{"text":"这段音频说的是:'欢迎使用阿里云'"}]}}]}}
第二轮回复:{"requestId":"463c4851-1b5f-95f8-8484-f35339a0c567","usage":{"input_tokens":55,"output_tokens":74},"output":{"choices":[{"finish_reason":"stop","message":{"role":"assistant","content":[{"text":"阿里巴巴集团是一家总部位于中国杭州的全球领先的电子商务和科技公司,成立于1999年。阿里巴巴集团旗下拥有包括淘宝、天猫、支付宝、菜鸟网络等在内的多个知名业务,涉及电商、金融、物流、云计算等多个领域。阿里巴巴在全球范围内开展业务,业务覆盖超过200个国家和地区,员工数量超过10万人。"}]}}]}}
curl
请求示例
curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen2-audio-instruct",
"input":{
"messages":[
{
"role": "system",
"content": [
{"text": "You are a helpful assistant."}
]
},
{
"role": "user",
"content": [
{"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"},
{"text": "这段音频在说什么?"}
]
},
{
"role": "assistant",
"content": [
{"text": "这段音频说的是:'欢迎使用阿里云'"}
]
},
{
"role": "user",
"content": [
{"text": "简单介绍这家公司。"}
]
}
]
}
}'
响应示例
{
"output": {
"choices": [
{
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": [
{
"text": "阿里巴巴集团是一家总部位于中国杭州的全球领先的电子商务和科技公司,成立于1999年。阿里巴巴集团旗下拥有包括淘宝、天猫、支付宝、菜鸟网络等在内的多个知名业务,旗下员工数量超过10万人,业务覆盖了全球200多个国家和地区。\n\n阿里巴巴的愿景是让世界各地的企业都能够平等地进行贸易,让小企业通过数字化技术实现更好的发展。阿里巴巴秉持开放、合作、共赢的理念,致力于打造一个开放、包容、公平的数字经济生态系统,为全球数字经济的发展做出贡献。\n\n除了电子商务和科技业务外,阿里巴巴还涉足云计算、数字娱乐、金融等多个领域,不断壮大和发展自己的业务版图。阿里巴巴在全球范围内开展业务,旨在通过科技创新和优质的服务,推动全球经济的可持续发展。"
}
]
}
}
]
},
"usage": {
"audio_tokens": 85,
"output_tokens": 156,
"input_tokens": 55
},
"request_id": "9621b9e4-9933-9e2d-a7a8-ad02305a009c"
}
流式输出
模型并不是一次性生成最终结果,而是逐步地生成中间结果,最终结果由中间结果拼接而成。使用非流式输出方式需要等待模型生成结束后再将生成的中间结果拼接后返回,而流式输出可以实时地将中间结果返回,您可以在模型进行输出的同时进行阅读,减少等待模型回复的时间。
Python
请求示例
import dashscope
messages = [
{
"role": "user",
"content": [
{"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"},
{"text": "这段音频在说什么?"}
]
}
]
response = dashscope.MultiModalConversation.call(
model="qwen2-audio-instruct",
messages=messages,
stream=True,
incremental_output=True,
result_format="message"
)
for chunk in response:
print(chunk)
响应示例
{"status_code": 200, "request_id": "f877cf8d-6109-9e84-b68f-9c618e0975fe", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "这段"}]}}]}, "usage": {"input_tokens": 33, "output_tokens": 1, "audio_tokens": 85}}
{"status_code": 200, "request_id": "f877cf8d-6109-9e84-b68f-9c618e0975fe", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "音频"}]}}]}, "usage": {"input_tokens": 33, "output_tokens": 2, "audio_tokens": 85}}
{"status_code": 200, "request_id": "f877cf8d-6109-9e84-b68f-9c618e0975fe", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "说的是"}]}}]}, "usage": {"input_tokens": 33, "output_tokens": 3, "audio_tokens": 85}}
{"status_code": 200, "request_id": "f877cf8d-6109-9e84-b68f-9c618e0975fe", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": []}}]}, "usage": {"input_tokens": 33, "output_tokens": 3, "audio_tokens": 85}}
{"status_code": 200, "request_id": "f877cf8d-6109-9e84-b68f-9c618e0975fe", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": ":'欢迎"}]}}]}, "usage": {"input_tokens": 33, "output_tokens": 5, "audio_tokens": 85}}
{"status_code": 200, "request_id": "f877cf8d-6109-9e84-b68f-9c618e0975fe", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "使用"}]}}]}, "usage": {"input_tokens": 33, "output_tokens": 6, "audio_tokens": 85}}
{"status_code": 200, "request_id": "f877cf8d-6109-9e84-b68f-9c618e0975fe", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "阿里"}]}}]}, "usage": {"input_tokens": 33, "output_tokens": 7, "audio_tokens": 85}}
{"status_code": 200, "request_id": "f877cf8d-6109-9e84-b68f-9c618e0975fe", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "云"}]}}]}, "usage": {"input_tokens": 33, "output_tokens": 8, "audio_tokens": 85}}
{"status_code": 200, "request_id": "f877cf8d-6109-9e84-b68f-9c618e0975fe", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": []}}]}, "usage": {"input_tokens": 33, "output_tokens": 8, "audio_tokens": 85}}
{"status_code": 200, "request_id": "f877cf8d-6109-9e84-b68f-9c618e0975fe", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": []}}]}, "usage": {"input_tokens": 33, "output_tokens": 8, "audio_tokens": 85}}
{"status_code": 200, "request_id": "f877cf8d-6109-9e84-b68f-9c618e0975fe", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "null", "message": {"role": "assistant", "content": [{"text": "'"}]}}]}, "usage": {"input_tokens": 33, "output_tokens": 10, "audio_tokens": 85}}
{"status_code": 200, "request_id": "f877cf8d-6109-9e84-b68f-9c618e0975fe", "code": "", "message": "", "output": {"text": null, "finish_reason": null, "choices": [{"finish_reason": "stop", "message": {"role": "assistant", "content": []}}]}, "usage": {"input_tokens": 33, "output_tokens": 10, "audio_tokens": 85}}
Java
请求示例
import java.util.Arrays;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
import io.reactivex.Flowable;
public class Main {
public static void streamCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
// must create mutable map.
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(new HashMap<String, Object>(){{put("audio", "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3");}},
new HashMap<String, Object>(){{put("text", "这段音频在说什么?");}})).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
.model("qwen2-audio-instruct")
.message(userMessage)
.incrementalOutput(true)
.build();
Flowable<MultiModalConversationResult> result = conv.streamCall(param);
result.blockingForEach(item -> {
System.out.println(JsonUtils.toJson(item));
});
}
public static void main(String[] args) {
try {
streamCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
响应示例
{"requestId":"92d46d3c-5c0a-96d3-8969-8845820cd6ce","usage":{"input_tokens":33,"output_tokens":1},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"这段"}]}}]}}
{"requestId":"92d46d3c-5c0a-96d3-8969-8845820cd6ce","usage":{"input_tokens":33,"output_tokens":2},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"音频"}]}}]}}
{"requestId":"92d46d3c-5c0a-96d3-8969-8845820cd6ce","usage":{"input_tokens":33,"output_tokens":3},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"说的是"}]}}]}}
{"requestId":"92d46d3c-5c0a-96d3-8969-8845820cd6ce","usage":{"input_tokens":33,"output_tokens":3},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[]}}]}}
{"requestId":"92d46d3c-5c0a-96d3-8969-8845820cd6ce","usage":{"input_tokens":33,"output_tokens":5},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":":'欢迎"}]}}]}}
{"requestId":"92d46d3c-5c0a-96d3-8969-8845820cd6ce","usage":{"input_tokens":33,"output_tokens":6},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"使用"}]}}]}}
{"requestId":"92d46d3c-5c0a-96d3-8969-8845820cd6ce","usage":{"input_tokens":33,"output_tokens":7},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"阿里"}]}}]}}
{"requestId":"92d46d3c-5c0a-96d3-8969-8845820cd6ce","usage":{"input_tokens":33,"output_tokens":8},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"云"}]}}]}}
{"requestId":"92d46d3c-5c0a-96d3-8969-8845820cd6ce","usage":{"input_tokens":33,"output_tokens":8},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[]}}]}}
{"requestId":"92d46d3c-5c0a-96d3-8969-8845820cd6ce","usage":{"input_tokens":33,"output_tokens":8},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[]}}]}}
{"requestId":"92d46d3c-5c0a-96d3-8969-8845820cd6ce","usage":{"input_tokens":33,"output_tokens":10},"output":{"choices":[{"finish_reason":"null","message":{"role":"assistant","content":[{"text":"'"}]}}]}}
{"requestId":"92d46d3c-5c0a-96d3-8969-8845820cd6ce","usage":{"input_tokens":33,"output_tokens":10},"output":{"choices":[{"finish_reason":"stop","message":{"role":"assistant","content":[]}}]}}
curl
请求示例
curl -X POST 'https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
"model": "qwen2-audio-instruct",
"input":{
"messages":[
{
"role": "system",
"content": [
{"text": "You are a helpful assistant."}
]
},
{
"role": "user",
"content": [
{"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"},
{"text": "这段音频在说什么?"}
]
}
]
},
"parameters": {
"incremental_output": true
}
}'
响应示例
id:1
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"这段"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"audio_tokens":85,"input_tokens":33,"output_tokens":1},"request_id":"1f297a6c-f8b1-90c4-8b64-3d70796a78f1"}
id:2
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"音频"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"audio_tokens":85,"input_tokens":33,"output_tokens":2},"request_id":"1f297a6c-f8b1-90c4-8b64-3d70796a78f1"}
......
id:10
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[],"role":"assistant"},"finish_reason":"null"}]},"usage":{"audio_tokens":85,"input_tokens":33,"output_tokens":8},"request_id":"1f297a6c-f8b1-90c4-8b64-3d70796a78f1"}
id:11
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[{"text":"'"}],"role":"assistant"},"finish_reason":"null"}]},"usage":{"audio_tokens":85,"input_tokens":33,"output_tokens":10},"request_id":"1f297a6c-f8b1-90c4-8b64-3d70796a78f1"}
id:12
event:result
:HTTP_STATUS/200
data:{"output":{"choices":[{"message":{"content":[],"role":"assistant"},"finish_reason":"stop"}]},"usage":{"audio_tokens":85,"input_tokens":33,"output_tokens":10},"request_id":"1f297a6c-f8b1-90c4-8b64-3d70796a78f1"}
语音对话
您可以直接用语音向模型发出指令,无需输入文本指令。例如:如果音频中包含内容“这种环境下适合做什么”,模型会回复适合做的事情,而不是返回这段语音的文本。
当前仅qwen2-audio-instruct模型支持语音聊天。
Python
请求示例
import dashscope
messages = [
{
"role": "user",
"content": [
{"audio": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/kvkadk/%E6%8E%A8%E8%8D%90%E4%B9%A6.wav"}
]
}
]
response = dashscope.MultiModalConversation.call(model='qwen2-audio-instruct', messages=messages)
print(response)
响应示例
{
"status_code": 200,
"request_id": "84f2bfbe-71a0-9291-b711-bdc615b5aea3",
"code": "",
"message": "",
"output": {
"text": null,
"finish_reason": null,
"choices": [
{
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": [
{
"text": "当然可以,不过需要先了解你的兴趣方向。你喜欢哪种类型的文学作品呢?比如小说、散文、诗歌还是戏剧?"
}
]
}
}
]
},
"usage": {
"input_tokens": 28,
"output_tokens": 28,
"audio_tokens": 237
}
}
Java
请求示例
import java.util.Arrays;
import java.util.Collections;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
public class Main {
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(Collections.singletonMap("audio","https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/kvkadk/%E6%8E%A8%E8%8D%90%E4%B9%A6.wav")))
.build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
.model("qwen2-audio-instruct")
.message(userMessage)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(JsonUtils.toJson(result));
}
public static void main(String[] args) {
try {
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
响应示例
{
"requestId": "2ae264d3-507e-9580-9100-afee4cd3a005",
"usage": {
"input_tokens": 28,
"output_tokens": 20
},
"output": {
"choices": [
{
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": [
{
"text": "当然可以,不过需要先了解你的兴趣方向。你更喜欢哪种类型的文学作品呢?比如小说、散文、诗歌等。"
}
]
}
}
]
}
}
curl
请求示例
curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen2-audio-instruct",
"input":{
"messages":[
{
"role": "system",
"content": [
{"text": "You are a helpful assistant."}
]
},
{
"role": "user",
"content": [
{"audio": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/kvkadk/%E6%8E%A8%E8%8D%90%E4%B9%A6.wav"} ]
}
]
}
}'
响应示例
{
"output": {
"choices": [
{
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": [
{
"text": "当然可以,不过需要先了解你的兴趣方向。你更喜欢哪种类型的文学作品呢?比如小说、散文、诗歌等。"
}
]
}
}
]
},
"usage": {
"audio_tokens": 237,
"output_tokens": 29,
"input_tokens": 28
},
"request_id": "ae407255-2fed-9e5a-90e6-6dab3178e913"
}
支持的音频文件
音频文件大小不超过10 MB。
音频的时长建议不超过30秒,如果超过30秒,模型会自动截取前30秒的音频。
音频文件的格式支持大部分常见编码的音频格式,例如AMR、WAV(CodecID: GSM_MS)、WAV(PCM)、3GP、3GPP、AAC、MP3等。
音频中支持的语言包括中文、英语、粤语、法语、意大利语、西班牙语、德语和日语。
API参考
关于模型调用的输入输出参数,请参见通过API使用通义千问。