Qwen Realtime Speech Synthesis provides low-latency, streaming text input and streaming audio output. It offers a range of human-like voices, supports multilingual and dialect synthesis, can produce multiple languages with a single voice, adapts its tone automatically, and handles complex text fluently.
Core features
Generates high-fidelity speech in real time, with natural-sounding output in Chinese, English, and other languages
Provides voice cloning for quickly creating personalized voices
Supports streaming input and output, with low-latency responses for real-time interaction
Allows fine-grained control of speech via adjustable rate, pitch, volume, and bitrate
Compatible with mainstream audio formats, with output sample rates up to 48 kHz
Scope
Supported regions and models:
China (Beijing)
Qwen3-TTS-VC-Realtime: qwen3-tts-vc-realtime-2025-11-27 (snapshot)
Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime (stable, currently identical to qwen3-tts-flash-realtime-2025-09-18), qwen3-tts-flash-realtime-2025-11-27 (latest snapshot), qwen3-tts-flash-realtime-2025-09-18 (snapshot)
Qwen-TTS-Realtime: qwen-tts-realtime (stable, currently identical to qwen-tts-realtime-2025-07-15), qwen-tts-realtime-latest (latest, currently identical to qwen-tts-realtime-2025-07-15), qwen-tts-realtime-2025-07-15 (snapshot)
International (Singapore)
Qwen3-TTS-VC-Realtime: qwen3-tts-vc-realtime-2025-11-27 (snapshot)
Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime (stable, currently identical to qwen3-tts-flash-realtime-2025-09-18), qwen3-tts-flash-realtime-2025-11-27 (latest snapshot), qwen3-tts-flash-realtime-2025-09-18 (snapshot)
For more information, see the model list.
Model selection
| Scenario | Recommended model | Why | Notes |
| --- | --- | --- | --- |
| Brand voice customization / personalized voice cloning | qwen3-tts-vc-realtime-2025-11-27 | Supports voice cloning to build a human-like brand voiceprint | Default voices are not supported |
| Intelligent customer service and chatbots | qwen3-tts-flash-realtime-2025-11-27 | Streaming input/output with adjustable rate and pitch for natural interaction; multiple audio output formats suit different devices | Default voices only; voice cloning is not supported |
| Multilingual content broadcasting | qwen3-tts-flash-realtime-2025-11-27 | Supports many languages and Chinese dialects, covering global content distribution | Default voices only; voice cloning is not supported |
| Audiobooks and content production | qwen3-tts-flash-realtime-2025-11-27 | Adjustable volume, rate, and pitch for fine-grained production of audiobooks, podcasts, and similar content | Default voices only; voice cloning is not supported |
| Livestream e-commerce and short-video dubbing | qwen3-tts-flash-realtime-2025-11-27 | Supports compressed mp3/opus formats for bandwidth-constrained scenarios; adjustable parameters cover different dubbing styles | Default voices only; voice cloning is not supported |
For more details, see the model feature comparison.
Quick start
Before running the code, you must obtain and configure an API key. If you call the service through an SDK, you also need to install the latest DashScope SDK.
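For example, on Linux or macOS the API key can be supplied through the environment variable that the sample code below reads (the key value shown is a placeholder):

```shell
# Make the DashScope API key available to the examples below.
# Note: keys differ between the Beijing and Singapore regions.
export DASHSCOPE_API_KEY="sk-xxx"
```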
Synthesizing speech with a default voice
The following examples show how to synthesize speech with a default voice (see the voice list).
Using the DashScope SDK
Python
The DashScope Python SDK version must be at least 1.25.2.
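If you want to verify this requirement programmatically, a plain dotted-version comparison is enough for DashScope's release numbering; the helper below is an illustrative sketch, not part of the SDK:

```python
def meets_minimum(installed: str, minimum: str = "1.25.2") -> bool:
    """Compare dotted version strings numerically, so that '1.25.10' >= '1.25.2'."""
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(installed) >= parse(minimum)

# Usage: import dashscope; assert meets_minimum(dashscope.__version__)
print(meets_minimum("1.25.10"))  # True (numeric, not lexicographic, comparison)
```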
server_commit mode
import os
import base64
import threading
import time
import dashscope
from dashscope.audio.qwen_tts_realtime import *
qwen_tts_realtime: QwenTtsRealtime = None
text_to_synthesize = [
'对吧~我就特别喜欢这种超市,',
'尤其是过年的时候',
'去逛超市',
'就会觉得',
'超级超级开心!',
'想买好多好多的东西呢!'
]
DO_VIDEO_TEST = False
def init_dashscope_api_key():
"""
Set your DashScope API-key. More information:
https://github.com/aliyun/alibabacloud-bailian-speech-demo/blob/master/PREREQUISITES.md
"""
# API keys differ between the Singapore and Beijing regions. Obtain an API key: https://help.aliyun.com/zh/model-studio/get-api-key
if 'DASHSCOPE_API_KEY' in os.environ:
dashscope.api_key = os.environ[
'DASHSCOPE_API_KEY'] # load API-key from environment variable DASHSCOPE_API_KEY
else:
dashscope.api_key = 'your-dashscope-api-key' # set API-key manually
class MyCallback(QwenTtsRealtimeCallback):
def __init__(self):
self.complete_event = threading.Event()
self.file = open('result_24k.pcm', 'wb')
def on_open(self) -> None:
print('connection opened, init player')
def on_close(self, close_status_code, close_msg) -> None:
self.file.close()
print('connection closed with code: {}, msg: {}, destroy player'.format(close_status_code, close_msg))
def on_event(self, response: dict) -> None:
try:
global qwen_tts_realtime
type = response['type']
if 'session.created' == type:
print('start session: {}'.format(response['session']['id']))
if 'response.audio.delta' == type:
recv_audio_b64 = response['delta']
self.file.write(base64.b64decode(recv_audio_b64))
if 'response.done' == type:
print(f'response {qwen_tts_realtime.get_last_response_id()} done')
if 'session.finished' == type:
print('session finished')
self.complete_event.set()
except Exception as e:
print('[Error] {}'.format(e))
return
def wait_for_finished(self):
self.complete_event.wait()
if __name__ == '__main__':
init_dashscope_api_key()
print('Initializing ...')
callback = MyCallback()
qwen_tts_realtime = QwenTtsRealtime(
model='qwen3-tts-flash-realtime',
callback=callback,
# URL for the Beijing region. For models in the Singapore region, replace it with: wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime
url='wss://dashscope.aliyuncs.com/api-ws/v1/realtime'
)
qwen_tts_realtime.connect()
qwen_tts_realtime.update_session(
voice = 'Cherry',
response_format = AudioFormat.PCM_24000HZ_MONO_16BIT,
mode = 'server_commit'
)
for text_chunk in text_to_synthesize:
print(f'send text: {text_chunk}')
qwen_tts_realtime.append_text(text_chunk)
time.sleep(0.1)
qwen_tts_realtime.finish()
callback.wait_for_finished()
print('[Metric] session: {}, first audio delay: {}'.format(
qwen_tts_realtime.get_session_id(),
qwen_tts_realtime.get_first_audio_delay(),
))
commit mode
import base64
import os
import threading
import dashscope
from dashscope.audio.qwen_tts_realtime import *
qwen_tts_realtime: QwenTtsRealtime = None
text_to_synthesize = [
'这是第一句话。',
'这是第二句话。',
'这是第三句话。',
]
DO_VIDEO_TEST = False
def init_dashscope_api_key():
"""
Set your DashScope API-key. More information:
https://github.com/aliyun/alibabacloud-bailian-speech-demo/blob/master/PREREQUISITES.md
"""
# API keys differ between the Singapore and Beijing regions. Obtain an API key: https://help.aliyun.com/zh/model-studio/get-api-key
if 'DASHSCOPE_API_KEY' in os.environ:
dashscope.api_key = os.environ[
'DASHSCOPE_API_KEY'] # load API-key from environment variable DASHSCOPE_API_KEY
else:
dashscope.api_key = 'your-dashscope-api-key' # set API-key manually
class MyCallback(QwenTtsRealtimeCallback):
def __init__(self):
super().__init__()
self.response_counter = 0
self.complete_event = threading.Event()
self.file = open(f'result_{self.response_counter}_24k.pcm', 'wb')
def reset_event(self):
self.response_counter += 1
self.file = open(f'result_{self.response_counter}_24k.pcm', 'wb')
self.complete_event = threading.Event()
def on_open(self) -> None:
print('connection opened, init player')
def on_close(self, close_status_code, close_msg) -> None:
print('connection closed with code: {}, msg: {}, destroy player'.format(close_status_code, close_msg))
def on_event(self, response: dict) -> None:
try:
global qwen_tts_realtime
type = response['type']
if 'session.created' == type:
print('start session: {}'.format(response['session']['id']))
if 'response.audio.delta' == type:
recv_audio_b64 = response['delta']
self.file.write(base64.b64decode(recv_audio_b64))
if 'response.done' == type:
print(f'response {qwen_tts_realtime.get_last_response_id()} done')
self.complete_event.set()
self.file.close()
if 'session.finished' == type:
print('session finished')
self.complete_event.set()
except Exception as e:
print('[Error] {}'.format(e))
return
def wait_for_response_done(self):
self.complete_event.wait()
if __name__ == '__main__':
init_dashscope_api_key()
print('Initializing ...')
callback = MyCallback()
qwen_tts_realtime = QwenTtsRealtime(
model='qwen3-tts-flash-realtime',
callback=callback,
# URL for the Beijing region. For models in the Singapore region, replace it with: wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime
url='wss://dashscope.aliyuncs.com/api-ws/v1/realtime'
)
qwen_tts_realtime.connect()
qwen_tts_realtime.update_session(
voice = 'Cherry',
response_format = AudioFormat.PCM_24000HZ_MONO_16BIT,
mode = 'commit'
)
print(f'send text: {text_to_synthesize[0]}')
qwen_tts_realtime.append_text(text_to_synthesize[0])
qwen_tts_realtime.commit()
callback.wait_for_response_done()
callback.reset_event()
print(f'send text: {text_to_synthesize[1]}')
qwen_tts_realtime.append_text(text_to_synthesize[1])
qwen_tts_realtime.commit()
callback.wait_for_response_done()
callback.reset_event()
print(f'send text: {text_to_synthesize[2]}')
qwen_tts_realtime.append_text(text_to_synthesize[2])
qwen_tts_realtime.commit()
callback.wait_for_response_done()
qwen_tts_realtime.finish()
print('[Metric] session: {}, first audio delay: {}'.format(
qwen_tts_realtime.get_session_id(),
qwen_tts_realtime.get_first_audio_delay(),
))
Java
The DashScope Java SDK version must be at least 2.21.16.
server_commit mode
// DashScope SDK version must be at least 2.21.16
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.AudioSystem;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
public class Main {
static String[] textToSynthesize = {
"对吧~我就特别喜欢这种超市",
"尤其是过年的时候",
"去逛超市",
"就会觉得",
"超级超级开心!",
"想买好多好多的东西呢!"
};
// Real-time PCM audio player
public static class RealtimePcmPlayer {
private int sampleRate;
private SourceDataLine line;
private AudioFormat audioFormat;
private Thread decoderThread;
private Thread playerThread;
private AtomicBoolean stopped = new AtomicBoolean(false);
private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();
// The constructor initializes the audio format and audio line
public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
this.sampleRate = sampleRate;
this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
line = (SourceDataLine) AudioSystem.getLine(info);
line.open(audioFormat);
line.start();
decoderThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
String b64Audio = b64AudioBuffer.poll();
if (b64Audio != null) {
byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
RawAudioBuffer.add(rawAudio);
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
playerThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
byte[] rawAudio = RawAudioBuffer.poll();
if (rawAudio != null) {
try {
playChunk(rawAudio);
} catch (IOException e) {
throw new RuntimeException(e);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
decoderThread.start();
playerThread.start();
}
// Play one audio chunk, blocking until playback completes
private void playChunk(byte[] chunk) throws IOException, InterruptedException {
if (chunk == null || chunk.length == 0) return;
int bytesWritten = 0;
while (bytesWritten < chunk.length) {
bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
}
int audioLength = chunk.length / (this.sampleRate*2/1000);
// Wait for the buffered audio to finish playing
Thread.sleep(audioLength - 10);
}
public void write(String b64Audio) {
b64AudioBuffer.add(b64Audio);
}
public void cancel() {
b64AudioBuffer.clear();
RawAudioBuffer.clear();
}
public void waitForComplete() throws InterruptedException {
while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
Thread.sleep(100);
}
line.drain();
}
public void shutdown() throws InterruptedException {
stopped.set(true);
decoderThread.join();
playerThread.join();
if (line != null && line.isRunning()) {
line.drain();
line.close();
}
}
}
public static void main(String[] args) throws InterruptedException, LineUnavailableException, FileNotFoundException {
QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
.model("qwen3-tts-flash-realtime")
// URL for the Beijing region. For models in the Singapore region, replace it with: wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime
.url("wss://dashscope.aliyuncs.com/api-ws/v1/realtime")
// API keys differ between the Singapore and Beijing regions. Obtain an API key: https://help.aliyun.com/zh/model-studio/get-api-key
.apikey(System.getenv("DASHSCOPE_API_KEY"))
.build();
AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
// Create the real-time audio player
RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
@Override
public void onOpen() {
// Handle connection opened
}
@Override
public void onEvent(JsonObject message) {
String type = message.get("type").getAsString();
switch(type) {
case "session.created":
// Handle session created
break;
case "response.audio.delta":
String recvAudioB64 = message.get("delta").getAsString();
// Play the audio in real time
audioPlayer.write(recvAudioB64);
break;
case "response.done":
// Handle response done
break;
case "session.finished":
// Handle session finished
completeLatch.get().countDown();
break;
default:
break;
}
}
@Override
public void onClose(int code, String reason) {
// Handle connection closed
}
});
qwenTtsRef.set(qwenTtsRealtime);
try {
qwenTtsRealtime.connect();
} catch (NoApiKeyException e) {
throw new RuntimeException(e);
}
QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
.voice("Cherry")
.responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
.mode("server_commit")
.build();
qwenTtsRealtime.updateSession(config);
for (String text:textToSynthesize) {
qwenTtsRealtime.appendText(text);
Thread.sleep(100);
}
qwenTtsRealtime.finish();
completeLatch.get().await();
qwenTtsRealtime.close();
// Wait for playback to finish and shut down the player
audioPlayer.waitForComplete();
audioPlayer.shutdown();
System.exit(0);
}
}
commit mode
// DashScope SDK version must be at least 2.21.16
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.AudioSystem;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Base64;
import java.util.Queue;
import java.util.Scanner;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
public class commit {
// Real-time PCM audio player
public static class RealtimePcmPlayer {
private int sampleRate;
private SourceDataLine line;
private AudioFormat audioFormat;
private Thread decoderThread;
private Thread playerThread;
private AtomicBoolean stopped = new AtomicBoolean(false);
private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();
// The constructor initializes the audio format and audio line
public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
this.sampleRate = sampleRate;
this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
line = (SourceDataLine) AudioSystem.getLine(info);
line.open(audioFormat);
line.start();
decoderThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
String b64Audio = b64AudioBuffer.poll();
if (b64Audio != null) {
byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
RawAudioBuffer.add(rawAudio);
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
playerThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
byte[] rawAudio = RawAudioBuffer.poll();
if (rawAudio != null) {
try {
playChunk(rawAudio);
} catch (IOException e) {
throw new RuntimeException(e);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
decoderThread.start();
playerThread.start();
}
// Play one audio chunk, blocking until playback completes
private void playChunk(byte[] chunk) throws IOException, InterruptedException {
if (chunk == null || chunk.length == 0) return;
int bytesWritten = 0;
while (bytesWritten < chunk.length) {
bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
}
int audioLength = chunk.length / (this.sampleRate*2/1000);
// Wait for the buffered audio to finish playing
Thread.sleep(audioLength - 10);
}
public void write(String b64Audio) {
b64AudioBuffer.add(b64Audio);
}
public void cancel() {
b64AudioBuffer.clear();
RawAudioBuffer.clear();
}
public void waitForComplete() throws InterruptedException {
// Wait until all buffered audio has been decoded and played
while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
Thread.sleep(100);
}
// Drain the audio line
line.drain();
}
public void shutdown() throws InterruptedException {
stopped.set(true);
decoderThread.join();
playerThread.join();
if (line != null && line.isRunning()) {
line.drain();
line.close();
}
}
}
public static void main(String[] args) throws InterruptedException, LineUnavailableException, FileNotFoundException {
Scanner scanner = new Scanner(System.in);
QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
.model("qwen3-tts-flash-realtime")
// URL for the Beijing region. For models in the Singapore region, replace it with: wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime
.url("wss://dashscope.aliyuncs.com/api-ws/v1/realtime")
// API keys differ between the Singapore and Beijing regions. Obtain an API key: https://help.aliyun.com/zh/model-studio/get-api-key
.apikey(System.getenv("DASHSCOPE_API_KEY"))
.build();
AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
// Create the real-time audio player
RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
// File file = new File("result_24k.pcm");
// FileOutputStream fos = new FileOutputStream(file);
@Override
public void onOpen() {
System.out.println("connection opened");
System.out.println("Type text and press Enter to send; type 'quit' to exit");
}
@Override
public void onEvent(JsonObject message) {
String type = message.get("type").getAsString();
switch(type) {
case "session.created":
System.out.println("start session: " + message.get("session").getAsJsonObject().get("id").getAsString());
break;
case "response.audio.delta":
String recvAudioB64 = message.get("delta").getAsString();
byte[] rawAudio = Base64.getDecoder().decode(recvAudioB64);
// fos.write(rawAudio);
// Play the audio in real time
audioPlayer.write(recvAudioB64);
break;
case "response.done":
System.out.println("response done");
// Wait for audio playback to finish
try {
audioPlayer.waitForComplete();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
// Prepare for the next input
completeLatch.get().countDown();
break;
case "session.finished":
System.out.println("session finished");
if (qwenTtsRef.get() != null) {
System.out.println("[Metric] response: " + qwenTtsRef.get().getResponseId() +
", first audio delay: " + qwenTtsRef.get().getFirstAudioDelay() + " ms");
}
completeLatch.get().countDown();
break;
default:
break;
}
}
@Override
public void onClose(int code, String reason) {
System.out.println("connection closed code: " + code + ", reason: " + reason);
try {
// fos.close();
// Wait for playback to finish and shut down the player
audioPlayer.waitForComplete();
audioPlayer.shutdown();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
});
qwenTtsRef.set(qwenTtsRealtime);
try {
qwenTtsRealtime.connect();
} catch (NoApiKeyException e) {
throw new RuntimeException(e);
}
QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
.voice("Cherry")
.responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
.mode("commit")
.build();
qwenTtsRealtime.updateSession(config);
// Read user input in a loop
while (true) {
System.out.print("Enter text to synthesize: ");
String text = scanner.nextLine();
// Exit the program when the user types quit
if ("quit".equalsIgnoreCase(text.trim())) {
System.out.println("Closing connection...");
qwenTtsRealtime.finish();
completeLatch.get().await();
break;
}
// Skip empty input
if (text.trim().isEmpty()) {
continue;
}
// Re-initialize the countdown latch
completeLatch.set(new CountDownLatch(1));
// Send the text
qwenTtsRealtime.appendText(text);
qwenTtsRealtime.commit();
// Wait for this synthesis to finish
completeLatch.get().await();
}
// Clean up resources
audioPlayer.waitForComplete();
audioPlayer.shutdown();
scanner.close();
System.exit(0);
}
}
Using the WebSocket API
Prepare the environment
Install pyaudio for your operating system.
macOS
brew install portaudio && pip install pyaudio
Debian/Ubuntu
sudo apt-get install python3-pyaudio or pip install pyaudio
CentOS
sudo yum install -y portaudio portaudio-devel && pip install pyaudio
Windows
pip install pyaudio
After installation, install the websocket dependencies via pip:
pip install websocket-client==1.8.0 websockets
Create the client
Create a new Python file locally named tts_realtime_client.py and copy the following code into it.
Choose a synthesis mode
The Realtime API supports the following two modes:
server_commit mode
The client only sends text; the server intelligently decides how to segment the text and when to synthesize it. Suited to low-latency scenarios that do not need manual control over synthesis pacing, such as GPS navigation.
commit mode
The client first appends text to a buffer, then explicitly triggers the server to synthesize the buffered text. Suited to scenarios that need fine-grained control over sentence breaks and pauses, such as news broadcasting.
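The two modes differ only in which client events drive synthesis. The sketch below builds those event payloads; the event type names are the ones listed in the interaction-flow section later in this document, but the exact payload layout (such as a top-level "text" field on append) is an assumption made for illustration:

```python
import json

def append_event(text: str) -> dict:
    # Both modes stream text into the server-side buffer with append events.
    # The "text" field name is an assumption for illustration.
    return {"type": "input_text_buffer.append", "text": text}

def commit_event() -> dict:
    # commit mode only: the client decides when the buffered text is synthesized.
    return {"type": "input_text_buffer.commit"}

def finish_event() -> dict:
    # server_commit mode: the client just signals end of input; segmentation and
    # synthesis timing are left entirely to the server.
    return {"type": "session.finish"}

# commit mode sends append + commit per segment; server_commit mode sends only
# append events followed by a single session.finish.
print(json.dumps(commit_event()))
```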
server_commit mode
In the same directory as tts_realtime_client.py, create another Python file named server_commit.py and copy the following code into it. Run server_commit.py to hear the audio generated by the Realtime API in real time.
commit mode
In the same directory as tts_realtime_client.py, create another Python file named commit.py and copy the following code into it. Run commit.py and enter text to synthesize as many times as you like. Press Enter without entering any text and you will hear the audio returned by the Realtime API through your speaker.
Synthesizing speech with a cloned voice
The following example shows how to use a custom voice created through voice cloning, producing output that closely resembles the original speaker. It reuses the "server_commit mode" sample from the DashScope SDK section of Synthesizing speech with a default voice, replacing the voice parameter with the cloned voice.
Key principle: the model used for voice cloning (target_model) must be identical to the model used for subsequent synthesis (model); otherwise synthesis fails. The example uses a local audio file, voice.mp3, for voice cloning; replace it when you run the code.
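Because a mismatch only surfaces as a failed synthesis at runtime, it can be worth guarding up front. The helper below is a hypothetical sketch, not part of the SDK:

```python
def check_voice_model(enrollment_target_model: str, synthesis_model: str) -> None:
    # The model used for voice cloning (target_model) must equal the model
    # used for synthesis (model); otherwise synthesis fails.
    if enrollment_target_model != synthesis_model:
        raise ValueError(
            f"target_model {enrollment_target_model!r} does not match "
            f"synthesis model {synthesis_model!r}"
        )

# Matching models pass silently:
check_voice_model("qwen3-tts-vc-realtime-2025-11-27",
                  "qwen3-tts-vc-realtime-2025-11-27")
```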
Python
# Requires DashScope SDK >= 1.23.9 and Python >= 3.10
# coding=utf-8
# Installation instructions for pyaudio:
# APPLE Mac OS X
# brew install portaudio
# pip install pyaudio
# Debian/Ubuntu
# sudo apt-get install python-pyaudio python3-pyaudio
# or
# pip install pyaudio
# CentOS
# sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
# python -m pip install pyaudio
import pyaudio
import os
import requests
import base64
import pathlib
import threading
import time
import dashscope # requires DashScope Python SDK >= 1.23.9
from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat
# ======= Constants =======
DEFAULT_TARGET_MODEL = "qwen3-tts-vc-realtime-2025-11-27" # voice cloning and synthesis must use the same model
DEFAULT_PREFERRED_NAME = "guanyu"
DEFAULT_AUDIO_MIME_TYPE = "audio/mpeg"
VOICE_FILE_PATH = "voice.mp3" # relative path of the local audio file used for voice cloning
TEXT_TO_SYNTHESIZE = [
'对吧~我就特别喜欢这种超市,',
'尤其是过年的时候',
'去逛超市',
'就会觉得',
'超级超级开心!',
'想买好多好多的东西呢!'
]
def create_voice(file_path: str,
target_model: str = DEFAULT_TARGET_MODEL,
preferred_name: str = DEFAULT_PREFERRED_NAME,
audio_mime_type: str = DEFAULT_AUDIO_MIME_TYPE) -> str:
"""
Create a voice and return the voice parameter.
"""
# API keys differ between the Singapore and Beijing regions. Obtain an API key: https://help.aliyun.com/zh/model-studio/get-api-key
# If the environment variable is not set, replace the next line with your Model Studio API key: api_key = "sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY")
file_path_obj = pathlib.Path(file_path)
if not file_path_obj.exists():
raise FileNotFoundError(f"Audio file not found: {file_path}")
base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
data_uri = f"data:{audio_mime_type};base64,{base64_str}"
# URL for the Beijing region. For models in the Singapore region, replace it with: https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
url = "https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization"
payload = {
"model": "qwen-voice-enrollment", # do not change this value
"input": {
"action": "create",
"target_model": target_model,
"preferred_name": preferred_name,
"audio": {"data": data_uri}
}
}
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json"
}
resp = requests.post(url, json=payload, headers=headers)
if resp.status_code != 200:
raise RuntimeError(f"Failed to create voice: {resp.status_code}, {resp.text}")
try:
return resp.json()["output"]["voice"]
except (KeyError, ValueError) as e:
raise RuntimeError(f"Failed to parse voice response: {e}")
def init_dashscope_api_key():
"""
Initialize the API key for the dashscope SDK.
"""
# API keys differ between the Singapore and Beijing regions. Obtain an API key: https://help.aliyun.com/zh/model-studio/get-api-key
# If the environment variable is not set, replace the next line with your Model Studio API key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
# ======= Callback =======
class MyCallback(QwenTtsRealtimeCallback):
"""
Custom streaming TTS callback.
"""
def __init__(self):
self.complete_event = threading.Event()
self._player = pyaudio.PyAudio()
self._stream = self._player.open(
format=pyaudio.paInt16, channels=1, rate=24000, output=True
)
def on_open(self) -> None:
print('[TTS] connection opened')
def on_close(self, close_status_code, close_msg) -> None:
self._stream.stop_stream()
self._stream.close()
self._player.terminate()
print(f'[TTS] connection closed code={close_status_code}, msg={close_msg}')
def on_event(self, response: dict) -> None:
try:
event_type = response.get('type', '')
if event_type == 'session.created':
print(f'[TTS] session started: {response["session"]["id"]}')
elif event_type == 'response.audio.delta':
audio_data = base64.b64decode(response['delta'])
self._stream.write(audio_data)
elif event_type == 'response.done':
print(f'[TTS] response done, Response ID: {qwen_tts_realtime.get_last_response_id()}')
elif event_type == 'session.finished':
print('[TTS] session finished')
self.complete_event.set()
except Exception as e:
print(f'[Error] exception while handling callback event: {e}')
def wait_for_finished(self):
self.complete_event.wait()
# ======= Main =======
if __name__ == '__main__':
init_dashscope_api_key()
print('[System] Initializing Qwen TTS Realtime ...')
callback = MyCallback()
qwen_tts_realtime = QwenTtsRealtime(
model=DEFAULT_TARGET_MODEL,
callback=callback,
# URL for the Beijing region. For models in the Singapore region, replace it with: wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime
url='wss://dashscope.aliyuncs.com/api-ws/v1/realtime'
)
qwen_tts_realtime.connect()
qwen_tts_realtime.update_session(
voice=create_voice(VOICE_FILE_PATH), # use the cloned voice as the voice parameter
response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
mode='server_commit'
)
for text_chunk in TEXT_TO_SYNTHESIZE:
print(f'[send text]: {text_chunk}')
qwen_tts_realtime.append_text(text_chunk)
time.sleep(0.1)
qwen_tts_realtime.finish()
callback.wait_for_finished()
print(f'[Metric] session_id={qwen_tts_realtime.get_session_id()}, '
f'first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s')
Java
You need the Gson dependency. If you use Maven or Gradle, add it as follows:
Maven
Add the following to pom.xml:
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.13.1</version>
</dependency>
Gradle
Add the following to build.gradle:
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
// Java SDK version must be at least 2.20.9
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import javax.sound.sampled.*;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.*;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
public class Main {
// ===== Constants =====
// Voice cloning and synthesis must use the same model
private static final String TARGET_MODEL = "qwen3-tts-vc-realtime-2025-11-27";
private static final String PREFERRED_NAME = "guanyu";
// Relative path of the local audio file used for voice cloning
private static final String AUDIO_FILE = "voice.mp3";
private static final String AUDIO_MIME_TYPE = "audio/mpeg";
private static String[] textToSynthesize = {
"对吧~我就特别喜欢这种超市",
"尤其是过年的时候",
"去逛超市",
"就会觉得",
"超级超级开心!",
"想买好多好多的东西呢!"
};
// Build a data URI
public static String toDataUrl(String filePath) throws IOException {
byte[] bytes = Files.readAllBytes(Paths.get(filePath));
String encoded = Base64.getEncoder().encodeToString(bytes);
return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
}
// Call the API to create a voice
public static String createVoice() throws Exception {
// API keys differ between the Singapore and Beijing regions. Obtain an API key: https://help.aliyun.com/zh/model-studio/get-api-key
// If the environment variable is not set, replace the next line with your Model Studio API key: String apiKey = "sk-xxx"
String apiKey = System.getenv("DASHSCOPE_API_KEY");
String jsonPayload =
"{"
+ "\"model\": \"qwen-voice-enrollment\"," // do not change this value
+ "\"input\": {"
+ "\"action\": \"create\","
+ "\"target_model\": \"" + TARGET_MODEL + "\","
+ "\"preferred_name\": \"" + PREFERRED_NAME + "\","
+ "\"audio\": {"
+ "\"data\": \"" + toDataUrl(AUDIO_FILE) + "\""
+ "}"
+ "}"
+ "}";
HttpURLConnection con = (HttpURLConnection) new URL("https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization").openConnection();
con.setRequestMethod("POST");
con.setRequestProperty("Authorization", "Bearer " + apiKey);
con.setRequestProperty("Content-Type", "application/json");
con.setDoOutput(true);
try (OutputStream os = con.getOutputStream()) {
os.write(jsonPayload.getBytes(StandardCharsets.UTF_8));
}
int status = con.getResponseCode();
System.out.println("HTTP status code: " + status);
try (BufferedReader br = new BufferedReader(
new InputStreamReader(status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(),
StandardCharsets.UTF_8))) {
StringBuilder response = new StringBuilder();
String line;
while ((line = br.readLine()) != null) {
response.append(line);
}
System.out.println("Response body: " + response);
if (status == 200) {
JsonObject jsonObj = new Gson().fromJson(response.toString(), JsonObject.class);
return jsonObj.getAsJsonObject("output").get("voice").getAsString();
}
throw new IOException("Failed to create voice: " + status + " - " + response);
}
}
// Real-time PCM audio player
public static class RealtimePcmPlayer {
private int sampleRate;
private SourceDataLine line;
private AudioFormat audioFormat;
private Thread decoderThread;
private Thread playerThread;
private AtomicBoolean stopped = new AtomicBoolean(false);
private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();
// The constructor initializes the audio format and audio line
public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
this.sampleRate = sampleRate;
this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
line = (SourceDataLine) AudioSystem.getLine(info);
line.open(audioFormat);
line.start();
decoderThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
String b64Audio = b64AudioBuffer.poll();
if (b64Audio != null) {
byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
RawAudioBuffer.add(rawAudio);
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
playerThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
byte[] rawAudio = RawAudioBuffer.poll();
if (rawAudio != null) {
try {
playChunk(rawAudio);
} catch (IOException e) {
throw new RuntimeException(e);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
decoderThread.start();
playerThread.start();
}
// Play one audio chunk, blocking until playback completes
private void playChunk(byte[] chunk) throws IOException, InterruptedException {
if (chunk == null || chunk.length == 0) return;
int bytesWritten = 0;
while (bytesWritten < chunk.length) {
bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
}
int audioLength = chunk.length / (this.sampleRate*2/1000);
// Wait for the buffered audio to finish playing
Thread.sleep(audioLength - 10);
}
public void write(String b64Audio) {
b64AudioBuffer.add(b64Audio);
}
public void cancel() {
b64AudioBuffer.clear();
RawAudioBuffer.clear();
}
public void waitForComplete() throws InterruptedException {
while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
Thread.sleep(100);
}
line.drain();
}
public void shutdown() throws InterruptedException {
stopped.set(true);
decoderThread.join();
playerThread.join();
if (line != null && line.isRunning()) {
line.drain();
line.close();
}
}
}
public static void main(String[] args) throws Exception {
QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
.model(TARGET_MODEL)
// URL for the Beijing region. For models in the Singapore region, replace it with: wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime
.url("wss://dashscope.aliyuncs.com/api-ws/v1/realtime")
// API keys differ between the Singapore and Beijing regions. Obtain an API key: https://help.aliyun.com/zh/model-studio/get-api-key
// If the environment variable is not set, replace the next line with your Model Studio API key: .apikey("sk-xxx")
.apikey(System.getenv("DASHSCOPE_API_KEY"))
.build();
AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
// Create the real-time audio player
RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
@Override
public void onOpen() {
// Handle connection opened
}
@Override
public void onEvent(JsonObject message) {
String type = message.get("type").getAsString();
switch(type) {
case "session.created":
// Handle session created
break;
case "response.audio.delta":
String recvAudioB64 = message.get("delta").getAsString();
// Play the audio in real time
audioPlayer.write(recvAudioB64);
break;
case "response.done":
// Handle response done
break;
case "session.finished":
// Handle session finished
completeLatch.get().countDown();
break;
default:
break;
}
}
@Override
public void onClose(int code, String reason) {
// Handle connection closed
}
});
qwenTtsRef.set(qwenTtsRealtime);
try {
qwenTtsRealtime.connect();
} catch (NoApiKeyException e) {
throw new RuntimeException(e);
}
QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
.voice(createVoice()) // Replace the voice parameter with the custom voice generated by voice cloning
.responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
.mode("server_commit")
.build();
qwenTtsRealtime.updateSession(config);
for (String text:textToSynthesize) {
qwenTtsRealtime.appendText(text);
Thread.sleep(100);
}
qwenTtsRealtime.finish();
completeLatch.get().await();
// Wait for playback to finish, then shut down the player
audioPlayer.waitForComplete();
audioPlayer.shutdown();
System.exit(0);
}
}
For more sample code, see GitHub.
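In the Java example above, each response.audio.delta event carries a base64-encoded chunk of raw PCM that the player decodes before playback. That decoding step alone can be sketched in Python (a minimal illustration; the `type` and `delta` field names follow the event payload used in the Java callback):

```python
import base64

def extract_pcm(event):
    """Decode the base64 payload of a response.audio.delta event into raw
    PCM bytes; other event types carry no audio, so return None."""
    if event.get("type") != "response.audio.delta":
        return None
    return base64.b64decode(event["delta"])
```

The resulting bytes are 16-bit mono PCM at the sample rate chosen in the session config, ready to be fed to an audio output device or written to a file.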
Interaction Flow
server_commit mode
Set session.mode in the session.update event to "server_commit" to enable this mode; the server then decides how to segment the text and when to synthesize it.
The interaction proceeds as follows:
1. The client sends a session.update event; the server responds with session.created and session.updated events.
2. The client sends input_text_buffer.append events to append text to the server-side buffer.
3. The server decides how to segment the text and when to synthesize it, returning response.created, response.output_item.added, response.content_part.added, and response.audio.delta events.
4. When the response is complete, the server returns response.audio.done, response.content_part.done, response.output_item.done, and response.done.
5. The server sends session.finished to end the session.
Lifecycle | Client events | Server events |
Session initialization | session.update Configure the session | session.created Session created session.updated Session configuration updated |
User text input | input_text_buffer.append Append text on the server input_text_buffer.commit Immediately synthesize the text buffered on the server session.finish Tell the server no more text will be sent | input_text_buffer.committed Server received the committed text |
Server audio output | None | response.created Server starts generating a response response.output_item.added A new output item was added to the response response.content_part.added A new content part was added to the assistant message response.audio.delta Audio generated incrementally by the model response.content_part.done The text or audio content of the assistant message finished streaming response.output_item.done The entire output item of the assistant message finished streaming response.audio.done Audio generation complete response.done Response complete |
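The client-side half of the server_commit flow above can be sketched as an ordered list of event payloads (an illustrative sketch only: the nested `session` object and its `mode` field mirror the config shown in the Java example, and sending each payload over the WebSocket is left out):

```python
def server_commit_client_events(text_chunks):
    """Build the ordered client-side event payloads for server_commit mode:
    configure the session, append each text chunk, then signal the end of
    input with session.finish. The server handles segmentation and timing."""
    events = [{"type": "session.update",
               "session": {"mode": "server_commit"}}]
    for chunk in text_chunks:
        events.append({"type": "input_text_buffer.append", "text": chunk})
    events.append({"type": "session.finish"})
    return events
```

Note that no input_text_buffer.commit appears here: in server_commit mode the server decides when to synthesize the buffered text.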
commit mode
Set session.mode in the session.update event to "commit" to enable this mode; the client must explicitly commit the text buffer to the server to get a response.
The interaction proceeds as follows:
1. The client sends a session.update event; the server responds with session.created and session.updated events.
2. The client sends input_text_buffer.append events to append text to the server-side buffer.
3. The client sends an input_text_buffer.commit event to commit the buffer to the server, followed by a session.finish event to indicate no more text will be sent.
4. The server responds with response.created and starts generating the response.
5. The server returns response.output_item.added, response.content_part.added, and response.audio.delta events.
6. When the response is complete, the server returns response.audio.done, response.content_part.done, response.output_item.done, and response.done.
7. The server sends session.finished to end the session.
Lifecycle | Client events | Server events |
Session initialization | session.update Configure the session | session.created Session created session.updated Session configuration updated |
User text input | input_text_buffer.append Append text to the buffer input_text_buffer.commit Commit the buffer to the server input_text_buffer.clear Clear the buffer | input_text_buffer.committed Server received the committed text |
Server audio output | None | response.created Server starts generating a response response.output_item.added A new output item was added to the response response.content_part.added A new content part was added to the assistant message response.audio.delta Audio generated incrementally by the model response.content_part.done The text or audio content of the assistant message finished streaming response.output_item.done The entire output item of the assistant message finished streaming response.audio.done Audio generation complete response.done Response complete |
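For contrast with server_commit, the commit-mode client sequence can be sketched the same way (again illustrative: the `session` payload shape is an assumption mirroring the event names above):

```python
def commit_mode_client_events(text_chunks):
    """Build the ordered client-side event payloads for commit mode: unlike
    server_commit, the client explicitly commits the buffer before signaling
    the end of input with session.finish."""
    events = [{"type": "session.update", "session": {"mode": "commit"}}]
    for chunk in text_chunks:
        events.append({"type": "input_text_buffer.append", "text": chunk})
    events.append({"type": "input_text_buffer.commit"})
    events.append({"type": "session.finish"})
    return events
```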
API Reference
Model Feature Comparison
Feature | qwen3-tts-vc-realtime-2025-11-27 | qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18 | qwen-tts-realtime, qwen-tts-realtime-latest, qwen-tts-realtime-2025-07-15 |
Supported languages | Chinese, English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | Chinese (Mandarin plus the Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, and Cantonese dialects, varying by voice), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | Chinese, English |
Audio formats | pcm, wav, mp3, opus | pcm | |
Audio sample rates | 16kHz, 24kHz, 48kHz | 24kHz | |
Voice cloning | | | |
SSML | | | |
LaTeX | | | |
Volume control | | | |
Speech-rate control | | | |
Pitch control | | | |
Bitrate control | | | |
Timestamps | | | |
Emotion setting | | | |
Streaming input | | | |
Streaming output | | | |
Rate limits | Requests per minute (RPM): 180 | qwen3-tts-flash-realtime-2025-11-27 RPM: 180; qwen3-tts-flash-realtime and qwen3-tts-flash-realtime-2025-09-18 RPM: 10 | RPM: 10; Tokens per minute (TPM): 100,000 |
Access methods | Java/Python SDK, WebSocket API | | |
Price | China (Beijing): CNY 1 per 10,000 characters; International (Singapore): CNY 0.954101 per 10,000 characters | China (Beijing): | |
Supported Voices
Qwen3-TTS-Flash-Realtime
The supported voices vary by model; set the request parameter voice to the corresponding value in the "voice parameter" column of the voice list.
Qwen-TTS-Realtime
All models share the same voices; set the request parameter voice to the corresponding value in the "voice parameter" column of the voice list.