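Qwen-TTS is the speech synthesis model in the Tongyi Qianwen (Qwen) series. It accepts Chinese, English, and mixed Chinese-English text as input and streams audio as output.
Supported models
Qwen-TTS takes text and a voice parameter as input and outputs audio. The model has the following characteristics:
Natural: voices sound real and natural, reach a human-like level in pauses, tone, and prosody, and adapt the speaking tone to the input text;
Stable: speech generation is stable and reliable, including for long and complex Chinese and English sentences;
Fast: speech is generated quickly, with a theoretical first-packet latency under 400 ms;
Streaming: audio can be output as a stream.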
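The first-packet figure above is a theoretical value. As a minimal sketch of how it could be measured in practice, the helper below times how long a stream takes to deliver its first chunk. The names `first_chunk_latency` and `fake_stream` are hypothetical, and the fake stream only simulates latency with `time.sleep`; it does not call Qwen-TTS.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def first_chunk_latency(make_stream: Callable[[], Iterable[T]]) -> Tuple[float, T, Iterator[T]]:
    """Return (seconds until first chunk, first chunk, remaining stream).

    The timer starts before the stream is created, so the measurement is
    valid whether the request is sent at call time or on first iteration.
    """
    start = time.perf_counter()
    stream = iter(make_stream())
    first = next(stream)
    return time.perf_counter() - start, first, stream


def fake_stream() -> Iterator[str]:
    # Stand-in for a streaming response; the sleep simulates network latency.
    for i in range(3):
        time.sleep(0.05)
        yield f"chunk-{i}"


if __name__ == "__main__":
    latency, first, rest = first_chunk_latency(fake_stream)
    print(f"first chunk {first!r} after {latency * 1000:.0f} ms")
    print("remaining chunks:", list(rest))
```

To time a real request, `make_stream` would wrap the streaming SDK call, for example in a `lambda`.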
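Model name | Version | Context length (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1,000 tokens) | Output cost (per 1,000 tokens) | Free quota |
qwen-tts (currently equivalent to qwen-tts-2025-04-10) | Stable | 8,192 | 512 | 7,680 | 0.0016 yuan | 0.01 yuan | 1 million tokens each; valid for 180 days after Bailian is activated |
qwen-tts-latest (always equivalent to the latest snapshot) | Latest | Same as above | Same as above | Same as above | Same as above | Same as above | Same as above |
qwen-tts-2025-04-10 | Snapshot | Same as above | Same as above | Same as above | Same as above | Same as above | Same as above |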
Rule for converting audio to tokens: each second of audio corresponds to 50 tokens. Audio shorter than 1 second is counted as 50 tokens.
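As a rough illustration of the rule above, the sketch below converts an audio duration into output tokens and an estimated output cost. The function names are hypothetical, and the price is the 0.01 yuan per 1,000 output tokens listed in the model table. Rounding fractional seconds up to the next whole token is an assumption, since the rule only states the per-second rate and the 1-second minimum.

```python
import math

TOKENS_PER_SECOND = 50       # from the rule: 1 second of audio = 50 tokens
MIN_TOKENS = 50              # audio shorter than 1 second counts as 50 tokens
OUTPUT_PRICE_PER_1K = 0.01   # yuan per 1,000 output tokens (model table)


def estimate_audio_tokens(duration_seconds: float) -> int:
    """Estimate output tokens for a clip (hypothetical helper).

    Assumption: a fractional token is rounded up to a whole token.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # round() removes floating-point noise (e.g. 1.1 * 50) before the ceiling.
    raw_tokens = round(duration_seconds * TOKENS_PER_SECOND, 6)
    return max(MIN_TOKENS, math.ceil(raw_tokens))


def estimate_output_cost(duration_seconds: float) -> float:
    """Estimated output cost in yuan for a clip of the given duration."""
    return estimate_audio_tokens(duration_seconds) / 1000 * OUTPUT_PRICE_PER_1K


if __name__ == "__main__":
    for seconds in (0.4, 1.0, 10.0):
        tokens = estimate_audio_tokens(seconds)
        cost = estimate_output_cost(seconds)
        print(f"{seconds:>5.1f} s -> {tokens} tokens, about {cost:.4f} yuan")
```

Under these assumptions, a 10-second clip comes to 500 output tokens, which also shows why the 7,680-token output limit corresponds to a few minutes of audio at most.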
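Features
Feature | Description |
Access methods | Python, Java, HTTP |
Streaming output | Supported |
Streaming input | Not supported |
Synthesized audio format | |
Synthesized audio sample rate | 24 kHz |
Timestamps | Not supported |
Languages | Chinese, English, mixed Chinese-English |
Voice cloning | Not supported |
SSML | Not supported |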
Supported voices
Chelsie (female) | Cherry (female) | Ethan (male) | Serena (female) |
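Quick start
Before you begin, obtain an API key and set it as an environment variable. If you call the service through the DashScope SDK, install the latest version of the SDK: the DashScope Java SDK must be version 2.19.0 or later, and the DashScope Python SDK must be version 1.23.1 or later.
Use text to specify the text and voice to specify the voice, then retrieve the synthesized speech from the url returned in the response.
The URL is valid for 24 hours.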
# DashScope Python SDK version must be at least 1.23.1
import os

import dashscope
import requests

text = "那我来给大家推荐一款T恤,这款呢真的是超级好看,这个颜色呢很显气质,而且呢也是搭配的绝佳单品,大家可以闭眼入,真的是非常好看,对身材的包容性也很好,不管啥身材的宝宝呢,穿上去都是很好看的。推荐宝宝们下单哦。"
response = dashscope.audio.qwen_tts.SpeechSynthesizer.call(
    model="qwen-tts",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text=text,
    voice="Cherry",
)
audio_url = response.output.audio["url"]
save_path = "downloaded_audio.wav"  # Custom save path

try:
    # Download the synthesized audio from the returned URL
    download = requests.get(audio_url, timeout=60)
    download.raise_for_status()  # Check that the request succeeded
    with open(save_path, "wb") as f:
        f.write(download.content)
    print(f"Audio file saved to: {save_path}")
except Exception as e:
    print(f"Download failed: {e}")
// DashScope SDK version must be at least 2.19.0
import com.alibaba.dashscope.aigc.multimodalconversation.AudioParameters;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.URL;

public class Main {
    private static final String MODEL = "qwen-tts";

    public static void call() throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                .model(MODEL)
                .text("Today is a wonderful day to build something people love!")
                .voice(AudioParameters.Voice.CHERRY)
                .build();
        MultiModalConversationResult result = conv.call(param);
        String audioUrl = result.getOutput().getAudio().getUrl();
        System.out.print(audioUrl);

        // Download the audio file to the local disk
        try (InputStream in = new URL(audioUrl).openStream();
             FileOutputStream out = new FileOutputStream("downloaded_audio.wav")) {
            byte[] buffer = new byte[1024];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
            }
            System.out.println("\nAudio file downloaded to: downloaded_audio.wav");
        } catch (Exception e) {
            System.out.println("\nError while downloading the audio file: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        try {
            call();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}
curl -X POST 'https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "qwen-tts",
    "input": {
      "text": "那我来给大家推荐一款T恤,这款呢真的是超级好看,这个颜色呢很显气质,而且呢也是搭配的绝佳单品,大家可以闭眼入,真的是非常好看,对身材的包容性也很好,不管啥身材的宝宝呢,穿上去都是很好看的。推荐宝宝们下单哦。",
      "voice": "Chelsie"
    }
  }'
Real-time playback
Qwen-TTS can stream audio data as Base64-encoded output. The last data packet contains the URL of the complete audio.
# coding=utf-8
#
# Installation instructions for pyaudio:
# APPLE Mac OS X
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python-pyaudio python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio
import base64
import os
import time

import dashscope
import pyaudio

p = pyaudio.PyAudio()
# Open an output stream: 16-bit mono audio at 24 kHz
stream = p.open(format=pyaudio.paInt16,
                channels=1,
                rate=24000,
                output=True)

text = "你好啊,我是通义千问"
responses = dashscope.audio.qwen_tts.SpeechSynthesizer.call(
    model="qwen-tts",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text=text,
    voice="Ethan",
    stream=True
)

for chunk in responses:
    audio_string = chunk["output"]["audio"]["data"]
    audio_bytes = base64.b64decode(audio_string)
    # Play the decoded audio data directly; skip packets with no audio
    if audio_bytes:
        stream.write(audio_bytes)

# Brief pause before closing the stream
time.sleep(0.8)
# Release resources
stream.stop_stream()
stream.close()
p.terminate()
// DashScope SDK version must be at least 2.19.0
import com.alibaba.dashscope.aigc.multimodalconversation.AudioParameters;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import io.reactivex.Flowable;
import javax.sound.sampled.*;
import java.util.Base64;

public class Main {
    private static final String MODEL = "qwen-tts";

    public static void streamCall()
            throws ApiException, NoApiKeyException, UploadFileException, LineUnavailableException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                .model(MODEL)
                .text("Today is a wonderful day to build something people love!")
                .voice(AudioParameters.Voice.CHERRY)
                .build();

        // 1. Configure the audio format (adjust to the format returned by the API)
        AudioFormat format = new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED,
                24000, // Sample rate in Hz (must match the format returned by the API)
                16,    // Sample size in bits
                1,     // Number of channels
                2,     // Frame size in bytes (16 bits x 1 channel)
                24000, // Frame rate (equals the sample rate for PCM)
                false  // bigEndian flag: false means little-endian
        );

        // 2. Open the output line once and reuse it for every chunk
        DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
        try (SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info)) {
            line.open(format);
            line.start();

            Flowable<MultiModalConversationResult> result = conv.streamCall(param);
            result.blockingForEach(r -> {
                // 3. Decode the Base64 audio data and play it as it arrives
                String base64Data = r.getOutput().getAudio().getData();
                if (base64Data != null && !base64Data.isEmpty()) {
                    byte[] audioBytes = Base64.getDecoder().decode(base64Data);
                    line.write(audioBytes, 0, audioBytes.length);
                }
            });
            // Wait until all buffered audio has been played
            line.drain();
        }
    }

    public static void main(String[] args) {
        try {
            streamCall();
        } catch (ApiException | NoApiKeyException | UploadFileException | LineUnavailableException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}
curl -X POST 'https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H 'Content-Type: application/json' \
  -H 'X-DashScope-SSE: enable' \
  -d '{
    "model": "qwen-tts",
    "input": {
      "text": "那我来给大家推荐一款T恤,这款呢真的是超级好看,这个颜色呢很显气质,而且呢也是搭配的绝佳单品,大家可以闭眼入,真的是非常好看,对身材的包容性也很好,不管啥身材的宝宝呢,穿上去都是很好看的。推荐宝宝们下单哦。",
      "voice": "Chelsie"
    }
  }'
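Streamed chunks can also be collected into a local file instead of being played. The sketch below decodes Base64 chunks and writes them to a single WAV file with Python's standard `wave` module. It assumes the decoded data is raw 16-bit mono PCM at 24 kHz, which is how the Python playback example treats it; `write_chunks_to_wav` is a hypothetical helper, and the demo feeds it synthetic silence rather than real API output.

```python
import base64
import wave
from typing import Iterable

SAMPLE_RATE = 24000  # Hz, per the feature table
SAMPLE_WIDTH = 2     # bytes per sample; assumes 16-bit PCM
CHANNELS = 1         # assumes mono audio


def write_chunks_to_wav(b64_chunks: Iterable[str], target) -> int:
    """Decode Base64 PCM chunks and write them to one WAV file.

    `target` is a file path or a writable, seekable binary file object.
    Returns the number of audio frames written.
    """
    frames = 0
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(SAMPLE_RATE)
        for chunk in b64_chunks:
            if not chunk:  # skip packets that carry no audio data
                continue
            pcm = base64.b64decode(chunk)
            wav_file.writeframes(pcm)
            frames += len(pcm) // (SAMPLE_WIDTH * CHANNELS)
    return frames


if __name__ == "__main__":
    # Synthetic stand-in for streamed data: two chunks of 0.5 s of silence.
    half_second = base64.b64encode(b"\x00\x00" * (SAMPLE_RATE // 2)).decode("ascii")
    count = write_chunks_to_wav([half_second, "", half_second], "streamed_audio.wav")
    print(f"wrote {count} frames ({count / SAMPLE_RATE:.1f} s)")
```

With the SDK, the chunks passed in would be the `chunk["output"]["audio"]["data"]` values from the streaming loop. If the URL in the last packet is all you need, downloading that file is simpler.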
API reference
See Speech Synthesis - Qwen.
FAQ
Q: How long is the audio file link valid?
A: The audio file link expires after 24 hours.