Speech Synthesis - Model Studio (Bailian) - Alibaba Cloud Help Center

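Speech synthesis, also called text-to-speech (TTS), converts text into natural-sounding speech. TTS models are trained with machine learning on large amounts of recorded speech. From these recordings they learn a language's prosody, intonation, and pronunciation rules, so that given new input text they can generate speech that sounds like a real person.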

Sample scenarios and voices

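The samples for the Chinese voices are spoken in Chinese. Their texts are given below in English translation.
- Chat avatar, casual chat. cosyvoice-v1 (longxiaochun): "Honestly, I don't know why either, but when it first came out it was called a Taiwanese hand-grabbed pancake. Now, I guess, here on the mainland they've changed it, and they all make it like that griddle-cooked egg-stuffed pancake. Yeah, that's kind of the feel."
- Phone customer service, service reminder. cosyvoice-v1 (loongstella): "Hello, this is the bank's billing department. If you have any questions, you can call our customer service hotline. Your bill is currently overdue. Could you please take care of it as soon as possible?"
- Livestream sales, T-shirt recommendation. cosyvoice-v1 (loongstella): "Let me recommend a T-shirt to everyone. This one is really gorgeous, the color looks very classy, and it's a perfect piece to mix and match. You can buy it with your eyes closed. It really looks great and it's forgiving on every figure. Whatever your body type, darlings, it looks good on you. Go ahead and place your orders!"
- Audiobooks, poetry recitation. cosyvoice-v1 (longyue), a classical poem by Su Shi: "When did the bright moon first appear? Cup in hand, I ask the blue sky. I do not know, in the palaces of heaven, what year it is tonight. I would ride the wind back there, yet I fear the jade towers and crystal halls are too cold up high. I rise and dance with my shadow in the clear light. How could that compare to life among people?"
- Voice navigation, navigation prompt. cosyvoice-v1 (longshuo): "It looks like you missed the last turn. No problem, we'll re-plan your route. Please continue straight ahead when it is safe to do so."
- News broadcast, news report. cosyvoice-v1 (longfei): "A typical case: on November 28, 2020, the victim, Mr. Wang, saw someone in a game advertising in-game currency at a low price."
- English, passing on a secret. cosyvoice-v1 (longjielidou): "Listen here, boy. I'm gonna teach you the secret formula on one condition. You can never let it fall into the hands of Plankton."
- Voice assistant, asking for clarification. cosyvoice-v1 (longxiaobai): "Sorry, I didn't understand what you meant. Could you say that again?"
- Video dubbing, everyday parenting. cosyvoice-v1 (longlaotie): "Good morning, all you moms and dads! Today is another day full of energy! Hmm? No, wait, it's another day of 'battle'!"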

Choose a model

Model Studio supports two families of speech synthesis models: CosyVoice and Sambert.

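If you want a voice that sounds closer to a real person, has its own character, or feels down-to-earth, choose CosyVoice. CosyVoice is built on a new-generation generative speech model that predicts emotion, intonation, and prosody from context, which makes its speech more human-like.
If you want to synthesize speech while the text is still arriving (for example, voicing an LLM's streamed output in real time so that a virtual avatar can speak), choose CosyVoice.
Sambert does not support streaming input: it needs the complete text before synthesis can start.
CosyVoice supports streaming input with streaming output, as well as non-streaming input with streaming or non-streaming output. Sambert supports only non-streaming input with streaming or non-streaming output.
Choose Sambert if you have any of the following requirements:
- Synthesizing languages other than Chinese and English (Spanish, Italian, and so on).
- Controlling phrasing, pauses, emotion, pronunciation, and so on with SSML markup.
- Receiving, along with the audio stream, a timestamp for each Chinese character or English word in the audio, for example to drive a virtual avatar's lip movements or to create subtitles for dubbed video.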
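The guidance above can be summarized as a small decision function. This is a hypothetical helper for illustration only (TtsRequirements and choose_model_family are not part of any SDK). Following the comparison table, it treats "zh" and "en" as the languages CosyVoice covers and defaults to CosyVoice otherwise.

```python
from dataclasses import dataclass

# Languages covered by CosyVoice voices, per the feature comparison table.
COSYVOICE_LANGUAGES = {"zh", "en"}


@dataclass
class TtsRequirements:
    streaming_input: bool = False   # text arrives incrementally, e.g. from an LLM
    needs_ssml: bool = False        # SSML control of pauses, pronunciation, etc.
    needs_timestamps: bool = False  # per-character/word timestamps in the audio
    language: str = "zh"            # "zh", "en", or another language code


def choose_model_family(req: TtsRequirements) -> str:
    """Return "cosyvoice" or "sambert" following the selection guidance."""
    needs_sambert = (
        req.needs_ssml
        or req.needs_timestamps
        or req.language not in COSYVOICE_LANGUAGES
    )
    if req.streaming_input and needs_sambert:
        # Only CosyVoice accepts streaming input, so the needs conflict.
        raise ValueError(
            "Streaming input requires CosyVoice, but the other requirements need Sambert"
        )
    return "sambert" if needs_sambert else "cosyvoice"
```

When the requirements conflict, the function raises instead of silently picking one. In that case you would have to buffer the full text and use Sambert, or drop the Sambert-only feature.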

CosyVoice voices

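Model	voice parameter	Voice	Use cases	Language	Default sample rate (Hz)	Default audio format
cosyvoice-v1	longxiaochun	Long Xiaochun	Voice assistant, navigation prompts, chat avatar	Chinese + English	22050	mp3
cosyvoice-v1	longxiaoxia	Long Xiaoxia	Voice assistant, chat avatar	Chinese	22050	mp3
cosyvoice-v1	longxiaocheng	Long Xiaocheng	Voice assistant, navigation prompts, chat avatar	Chinese + English	22050	mp3
cosyvoice-v1	longxiaobai	Long Xiaobai	Chat avatar, audiobooks, voice assistant	Chinese	22050	mp3
cosyvoice-v1	longlaotie	Long Laotie	News broadcasting, audiobooks, voice assistant, livestream sales, navigation prompts	Chinese (Northeastern accent)	22050	mp3
cosyvoice-v1	longshu	Long Shu	Audiobooks, voice assistant, navigation prompts, news broadcasting, intelligent customer service	Chinese	22050	mp3
cosyvoice-v1	longshuo	Long Shuo	Voice assistant, navigation prompts, news broadcasting, customer service/collections	Chinese	22050	mp3
cosyvoice-v1	longjing	Long Jing	Voice assistant, navigation prompts, news broadcasting, customer service/collections	Chinese	22050	mp3
cosyvoice-v1	longmiao	Long Miao	Customer service/collections, navigation prompts, audiobooks, voice assistant	Chinese	22050	mp3
cosyvoice-v1	longyue	Long Yue	Voice assistant, poetry recitation, audiobook narration, navigation prompts, news broadcasting, customer service/collections	Chinese	22050	mp3
cosyvoice-v1	longyuan	Long Yuan	Audiobooks, voice assistant, chat avatar	Chinese	22050	mp3
cosyvoice-v1	longfei	Long Fei	Meeting announcements, news broadcasting, audiobooks	Chinese	22050	mp3
cosyvoice-v1	longjielidou	Long Jielidou	News broadcasting, audiobooks, chat assistant	Chinese + English	22050	mp3
cosyvoice-v1	longtong	Long Tong	Audiobooks, navigation prompts, chat avatar	Chinese	22050	mp3
cosyvoice-v1	longxiang	Long Xiang	News broadcasting, audiobooks, navigation prompts	Chinese	22050	mp3
cosyvoice-v1	loongstella	Stella	Voice assistant, livestream sales, navigation prompts, customer service/collections, audiobooks	Chinese + English	22050	mp3
cosyvoice-v1	loongbella	Bella	Voice assistant, customer service/collections, news broadcasting, navigation prompts	Chinese	22050	mp3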

Sambert voices

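Model	Voice	Timestamp support	Use cases	Style	Language	Default sample rate
sambert-zhinan-v1	Zhinan	Yes	General	Advertising male voice	Chinese + English	48 kHz
sambert-zhiqi-v1	Zhiqi	Yes	General	Gentle female voice	Chinese + English	48 kHz
sambert-zhichu-v1	Zhichu	Yes	News broadcasting	Food-documentary narrator male voice	Chinese + English	48 kHz
sambert-zhide-v1	Zhide	Yes	News broadcasting	News male voice	Chinese + English	48 kHz
sambert-zhijia-v1	Zhijia	Yes	News broadcasting	Standard female voice	Chinese + English	48 kHz
sambert-zhiru-v1	Zhiru	Yes	News broadcasting	News female voice	Chinese + English	48 kHz
sambert-zhiqian-v1	Zhiqian	Yes	Dubbing/commentary, news broadcasting	Information-style female voice	Chinese + English	48 kHz
sambert-zhixiang-v1	Zhixiang	Yes	Dubbing/commentary	Magnetic male voice	Chinese + English	48 kHz
sambert-zhiwei-v1	Zhiwei	Yes	Product introductions	Girlish female voice	Chinese + English	48 kHz
sambert-zhihao-v1	Zhihao	Yes	General	Consultant male voice	Chinese + English	16 kHz
sambert-zhijing-v1	Zhijing	Yes	General	Stern female voice	Chinese + English	16 kHz
sambert-zhiming-v1	Zhiming	Yes	General	Humorous male voice	Chinese + English	16 kHz
sambert-zhimo-v1	Zhimo	Yes	General	Expressive male voice	Chinese + English	16 kHz
sambert-zhina-v1	Zhina	Yes	General	Zhejiang-accented Mandarin female voice	Chinese + English	16 kHz
sambert-zhishu-v1	Zhishu	Yes	General	Information-style male voice	Chinese + English	16 kHz
sambert-zhistella-v1	Zhisha	Yes	General	Intellectual female voice	Chinese + English	16 kHz
sambert-zhiting-v1	Zhiting	Yes	General	Radio host female voice	Chinese + English	16 kHz
sambert-zhixiao-v1	Zhixiao	Yes	General	Information-style female voice	Chinese + English	16 kHz
sambert-zhiya-v1	Zhiya	Yes	General	Stern female voice	Chinese + English	16 kHz
sambert-zhiye-v1	Zhiye	Yes	General	Young male voice	Chinese + English	16 kHz
sambert-zhiying-v1	Zhiying	Yes	General	Soft, cute child voice	Chinese + English	16 kHz
sambert-zhiyuan-v1	Zhiyuan	Yes	General	Caring big-sister voice	Chinese + English	16 kHz
sambert-zhiyue-v1	Zhiyue	Yes	Customer service	Gentle female voice	Chinese + English	16 kHz
sambert-zhigui-v1	Zhigui	Yes	Product introductions	Livestream female voice	Chinese + English	16 kHz
sambert-zhishuo-v1	Zhishuo	Yes	Digital avatar	Natural male voice	Chinese + English	16 kHz
sambert-zhimiao-emo-v1	Zhimiao (multi-emotion)	Yes	Product introductions, digital avatar, livestreaming	Female voice with multiple emotions	Chinese + English	16 kHz
sambert-zhimao-v1	Zhimao	Yes	Product introductions, dubbing/commentary, digital avatar, livestreaming	Livestream female voice	Chinese + English	16 kHz
sambert-zhilun-v1	Zhilun	Yes	Dubbing/commentary	Suspense narration	Chinese + English	16 kHz
sambert-zhifei-v1	Zhifei	Yes	Dubbing/commentary	Energetic narration	Chinese + English	16 kHz
sambert-zhida-v1	Zhida	Yes	News broadcasting	Standard male voice	Chinese + English	16 kHz
sambert-camila-v1	Camila	No	General	Spanish female voice	Spanish	16 kHz
sambert-perla-v1	Perla	No	General	Italian female voice	Italian	16 kHz
sambert-indah-v1	Indah	No	General	Indonesian female voice	Indonesian	16 kHz
sambert-clara-v1	Clara	No	General	French female voice	French	16 kHz
sambert-hanna-v1	Hanna	No	General	German female voice	German	16 kHz
sambert-beth-v1	Beth	Yes	General	Consultant female voice	American English	16 kHz
sambert-betty-v1	Betty	Yes	General	Customer service female voice	American English	16 kHz
sambert-cally-v1	Cally	Yes	General	Natural female voice	American English	16 kHz
sambert-cindy-v1	Cindy	Yes	General	Conversational female voice	American English	16 kHz
sambert-eva-v1	Eva	Yes	General	Companion female voice	American English	16 kHz
sambert-donna-v1	Donna	Yes	General	Educational female voice	American English	16 kHz
sambert-brian-v1	Brian	Yes	General	Customer service male voice	American English	16 kHz
sambert-waan-v1	Waan	No	General	Thai female voice	Thai	16 kHz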

Feature comparison

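Feature	CosyVoice speech synthesis	Sambert speech synthesis
Access methods	SDK (Python and Java only; see CosyVoice API details); WebSocket API (see WebSocket API)	SDK (Python and Java only; see Sambert API details); WebSocket API (see Accessing the speech synthesis service over WebSocket)
SSML	Not supported	Supported (see SSML markup language)
Streaming input	Supported	Not supported
Streaming output	Supported	Supported
Output audio format	Set with the `format` parameter (see CosyVoice API details): pcm, wav, mp3	Set with the `format` parameter (see Sambert API details): pcm, wav, mp3
Output sample rate	Set with the `format` parameter (see CosyVoice API details): 8 kHz, 16 kHz, 22.05 kHz, 24 kHz, 44.1 kHz, 48 kHz	Set with the `sample_rate` parameter (see Sambert API details): 16 kHz, 48 kHz. Using the model's default sample rate is recommended; if the requested rate differs, the service resamples as needed.
Volume	Supported; adjust with the `volume` parameter (see CosyVoice API details)	Supported; adjust with the `volume` parameter (see Sambert API details)
Speech rate	Supported; adjust with the `speech_rate` parameter (see CosyVoice API details)	Supported; adjust with the `rate` parameter (see Sambert API details)
Pitch	Supported; adjust with the `pitch_rate` parameter (see CosyVoice API details)	Supported; adjust with the `pitch` parameter (see Sambert API details)
Timestamps	Not supported	Supported; enable with the `word_timestamp_enabled` and `phoneme_timestamp_enabled` parameters (see Sambert API details)
Languages	Depends on the voice: Chinese, English, Chinese with a Northeastern accent	Depends on the model: Chinese, English, American English, Italian, Spanish, Indonesian, French, German, Thai
Voice cloning	Supported	Not supported
Text length limit	Streaming input: each text fragment sent must not exceed 2,000 characters, and all fragments together must not exceed 200,000 characters. Non-streaming input: the total text must not exceed 2,000 characters. Character counting: each Chinese character counts as 2 characters; each English letter, punctuation mark, or space within a sentence counts as 1 character.	Maximum 10,000 characters. Character counting: each Chinese character, English letter, punctuation mark, or space within a sentence counts as 1 character.
Price	CNY 2 per 10,000 characters, billed by the number of characters to synthesize (each Chinese character counts as 2 characters; English letters and punctuation marks count as 1 character each)	CNY 1 per 10,000 characters, billed by the number of characters to synthesize (each Chinese character counts as 2 characters; English letters and punctuation marks count as 1 character each). Content inside SSML tags is not billed.
Free quota	2,000 characters per month per model per root account	30,000 characters per month per model per root account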
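To check whether a text fits the CosyVoice limits, and to estimate what it will cost, you can apply the counting rule from the table in code. This sketch makes two assumptions. First, it treats a "Chinese character" as a code point in the CJK Unified Ideographs blocks. Second, it assumes the monthly free quota is used up before billing starts. Because of these assumptions, treat the results as estimates and rely on your bill for exact figures. The function names are hypothetical helpers, not SDK APIs.

```python
# Limits and prices for CosyVoice, taken from the comparison table.
NON_STREAMING_LIMIT = 2000       # characters per non-streaming request
FRAGMENT_LIMIT = 2000            # characters per streaming fragment
STREAMING_TOTAL_LIMIT = 200_000  # characters across all streaming fragments
PRICE_YUAN_PER_10K = 2.0
FREE_CHARS_PER_MONTH = 2000      # per model, per root account


def is_hanzi(ch: str) -> bool:
    """Approximate "Chinese character" as CJK Unified Ideographs (+ Extension A)."""
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def cosyvoice_char_count(text: str) -> int:
    """Chinese characters count as 2; letters, punctuation, and spaces as 1."""
    return sum(2 if is_hanzi(ch) else 1 for ch in text)


def fits_non_streaming_limit(text: str) -> bool:
    return cosyvoice_char_count(text) <= NON_STREAMING_LIMIT


def estimate_cost_yuan(chars_this_month: int) -> float:
    """Estimated monthly cost, assuming the free quota is consumed first."""
    billable = max(0, chars_this_month - FREE_CHARS_PER_MONTH)
    return billable / 10_000 * PRICE_YUAN_PER_10K
```

For example, the sample sentence 今天天气怎么样？ counts as 15 characters: seven Chinese characters at 2 each, plus the full-width question mark at 1.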

Quick start

Try it online

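On the speech synthesis page, select "CosyVoice speech synthesis model" and click Try Now. Then choose a voice, enter your own text, and listen to the synthesized speech online.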

Sample code

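Before running the samples, obtain an API key and configure it as an environment variable. If you call the service through an SDK, you also need to install the DashScope SDK.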
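You may prefer to fail fast with a clear message when no key is configured, rather than getting an error from the first request. To do that, resolve the key yourself before calling the SDK. This sketch assumes the environment variable is named DASHSCOPE_API_KEY, which, as far as we know, is the name the DashScope SDK reads by default. resolve_api_key is a hypothetical helper.

```python
import os
from typing import Optional


def resolve_api_key(explicit: Optional[str] = None) -> str:
    """Return an explicitly passed key, else the DASHSCOPE_API_KEY environment variable."""
    key = explicit or os.environ.get("DASHSCOPE_API_KEY")
    if not key:
        raise RuntimeError(
            "No API key found: set the DASHSCOPE_API_KEY environment variable "
            "or pass the key explicitly"
        )
    return key
```

For example, assign `dashscope.api_key = resolve_api_key()` at startup, before creating any synthesizer.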

CosyVoice

Save synthesized audio to a file

Python

import dashscope
from dashscope.audio.tts_v2 import *

# If the API key is not configured as an environment variable, uncomment the next line and replace apiKey with your own API key
# dashscope.api_key = "apiKey"
model = "cosyvoice-v1"
voice = "longxiaochun"

synthesizer = SpeechSynthesizer(model=model, voice=voice)
audio = synthesizer.call("今天天气怎么样？")
print('requestId: ', synthesizer.get_last_request_id())
with open('output.mp3', 'wb') as f:
    f.write(audio)

Java

import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisAudioFormat;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisParam;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesizer;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

public class Tts2File {
  private static String model = "cosyvoice-v1";
  private static String voice = "longxiaochun";

  public static void synthesizeAndSaveAudio() {
    SpeechSynthesisParam param =
        SpeechSynthesisParam.builder()
            // If the API key is not configured as an environment variable, uncomment the next line and replace apikey with your own API key
            // .apiKey(apikey)
            .model(model)
            .voice(voice)
            .build();
    SpeechSynthesizer synthesizer = new SpeechSynthesizer(param, null);
    ByteBuffer audio = synthesizer.call("今天天气怎么样？");
    File file = new File("output.mp3");
    System.out.print("requestId: " + synthesizer.getLastRequestId());
    try (FileOutputStream fos = new FileOutputStream(file)) {
      fos.write(audio.array());
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    synthesizeAndSaveAudio();
    System.exit(0);
  }
}

Convert LLM-generated text to speech in real time and play it through the speaker

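The following code takes the text that the Qwen LLM (qwen-turbo) streams back in real time, synthesizes it into speech as it arrives, and plays the audio on the local device.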

Python

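Before running the Python sample, install the third-party audio playback library pyaudio with pip (see the installation notes at the top of the sample).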

# coding=utf-8
# Installation instructions for pyaudio:
# APPLE Mac OS X
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python-pyaudio python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio

import pyaudio
import dashscope
from dashscope.audio.tts_v2 import *


from http import HTTPStatus
from dashscope import Generation

# If the API key is not configured as an environment variable, uncomment the next line and replace apiKey with your own API key
# dashscope.api_key = "apiKey"
model = "cosyvoice-v1"
voice = "longxiaochun"


class Callback(ResultCallback):
    _player = None
    _stream = None

    def on_open(self):
        print("websocket is open.")
        self._player = pyaudio.PyAudio()
        self._stream = self._player.open(
            format=pyaudio.paInt16, channels=1, rate=22050, output=True
        )

    def on_complete(self):
        print("speech synthesis task complete successfully.")

    def on_error(self, message: str):
        print(f"speech synthesis task failed, {message}")

    def on_close(self):
        print("websocket is closed.")
        # stop player
        self._stream.stop_stream()
        self._stream.close()
        self._player.terminate()

    def on_event(self, message):
        print(f"recv speech synthsis message {message}")

    def on_data(self, data: bytes) -> None:
        print("audio result length:", len(data))
        self._stream.write(data)


def synthesizer_with_llm():
    callback = Callback()
    synthesizer = SpeechSynthesizer(
        model=model,
        voice=voice,
        format=AudioFormat.PCM_22050HZ_MONO_16BIT,
        callback=callback,
    )

    messages = [{"role": "user", "content": "请介绍一下你自己"}]
    responses = Generation.call(
        model="qwen-turbo",
        messages=messages,
        result_format="message",  # set result format as 'message'
        stream=True,  # enable stream output
        incremental_output=True,  # enable incremental output 
    )
    for response in responses:
        if response.status_code == HTTPStatus.OK:
            print(response.output.choices[0]["message"]["content"], end="")
            synthesizer.streaming_call(response.output.choices[0]["message"]["content"])
        else:
            print(
                "Request id: %s, Status code: %s, error code: %s, error message: %s"
                % (
                    response.request_id,
                    response.status_code,
                    response.code,
                    response.message,
                )
            )
    synthesizer.streaming_complete()
    print('requestId: ', synthesizer.get_last_request_id())


if __name__ == "__main__":
    synthesizer_with_llm()
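If you want to keep the audio from the LLM example instead of (or in addition to) playing it, you can accumulate the raw PCM chunks delivered to on_data. Then wrap them in a WAV header with Python's standard wave module. This sketch assumes the PCM_22050HZ_MONO_16BIT format used above (22,050 Hz, mono, 16-bit). PcmCollector is a hypothetical helper, not part of the DashScope SDK.

```python
import io
import wave

# Must match the format requested from the synthesizer (PCM_22050HZ_MONO_16BIT).
SAMPLE_RATE = 22050
CHANNELS = 1
SAMPLE_WIDTH = 2  # 16-bit samples = 2 bytes


class PcmCollector:
    """Accumulates raw PCM chunks, e.g. passed in from a callback's on_data()."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def on_data(self, data: bytes) -> None:
        self._buf.extend(data)

    def duration_seconds(self) -> float:
        # bytes / (samples per second * channels * bytes per sample)
        return len(self._buf) / (SAMPLE_RATE * CHANNELS * SAMPLE_WIDTH)

    def to_wav_bytes(self) -> bytes:
        """Wrap the collected PCM in a WAV container so ordinary players can open it."""
        out = io.BytesIO()
        with wave.open(out, "wb") as wf:
            wf.setnchannels(CHANNELS)
            wf.setsampwidth(SAMPLE_WIDTH)
            wf.setframerate(SAMPLE_RATE)
            wf.writeframes(bytes(self._buf))
        return out.getvalue()
```

For example, call `collector.on_data(data)` inside `Callback.on_data`. Once synthesis has completed, write `collector.to_wav_bytes()` to output.wav. If you change the `format` parameter, update the three constants to match, or the file will play at the wrong speed.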

Java

import com.alibaba.dashscope.aigc.generation.Generation;
import com.alibaba.dashscope.aigc.generation.GenerationParam;
import com.alibaba.dashscope.aigc.generation.GenerationResult;
import com.alibaba.dashscope.audio.tts.SpeechSynthesisResult;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisAudioFormat;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisParam;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesizer;
import com.alibaba.dashscope.common.Message;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.Flowable;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.sound.sampled.*;

public class Main {
    private static String model = "cosyvoice-v1";
    private static String voice = "longxiaochun";
    public static void process() throws NoApiKeyException, InputRequiredException {
        // Playback thread
        class PlaybackRunnable implements Runnable {
            // Set the audio format. Configure it according to your device, the
            // synthesized audio parameters, and your platform. Here it is set to
            // 22050 Hz, 16-bit, mono; choose another sample rate and format based
            // on the model's sample rate and your device's compatibility.
            private AudioFormat af = new AudioFormat(22050, 16, 1, true, false);
            private DataLine.Info info = new DataLine.Info(SourceDataLine.class, af);
            private SourceDataLine targetSource = null;
            private AtomicBoolean runFlag = new AtomicBoolean(true);
            private ConcurrentLinkedQueue<ByteBuffer> queue =
                    new ConcurrentLinkedQueue<>();

            // Prepare the player
            public void prepare() throws LineUnavailableException {
                targetSource = (SourceDataLine) AudioSystem.getLine(info);
                targetSource.open(af, 4096);
                targetSource.start();
            }

            public void put(ByteBuffer buffer) {
                queue.add(buffer);
            }

            // Stop playback
            public void stop() {
                runFlag.set(false);
            }

            @Override
            public void run() {
                if (targetSource == null) {
                    return;
                }

                while (runFlag.get()) {
                    if (queue.isEmpty()) {
                        try {
                            Thread.sleep(100);
                        } catch (InterruptedException e) {
                        }
                        continue;
                    }

                    ByteBuffer buffer = queue.poll();
                    if (buffer == null) {
                        continue;
                    }

                    byte[] data = buffer.array();
                    targetSource.write(data, 0, data.length);
                }

                // Play all remaining cache
                if (!queue.isEmpty()) {
                    ByteBuffer buffer = null;
                    while ((buffer = queue.poll()) != null) {
                        byte[] data = buffer.array();
                        targetSource.write(data, 0, data.length);
                    }
                }
                // Release the player
                targetSource.drain();
                targetSource.stop();
                targetSource.close();
            }
        }

        // Create a subclass inheriting from ResultCallback<SpeechSynthesisResult>
        // to implement the callback interface
        class ReactCallback extends ResultCallback<SpeechSynthesisResult> {
            private PlaybackRunnable playbackRunnable = null;

            public ReactCallback(PlaybackRunnable playbackRunnable) {
                this.playbackRunnable = playbackRunnable;
            }

            // Callback when the service side returns the streaming synthesis result
            @Override
            public void onEvent(SpeechSynthesisResult result) {
                // Get the binary data of the streaming result via getAudioFrame
                if (result.getAudioFrame() != null) {
                    // Stream the data to the player
                    playbackRunnable.put(result.getAudioFrame());
                }
            }

            // Callback when the service side completes the synthesis
            @Override
            public void onComplete() {
                // Notify the playback thread to end
                playbackRunnable.stop();
            }

            // Callback when an error occurs
            @Override
            public void onError(Exception e) {
                // Tell the playback thread to end
                System.out.println(e);
                playbackRunnable.stop();
            }
        }

        PlaybackRunnable playbackRunnable = new PlaybackRunnable();
        try {
            playbackRunnable.prepare();
        } catch (LineUnavailableException e) {
            throw new RuntimeException(e);
        }
        Thread playbackThread = new Thread(playbackRunnable);
        // Start the playback thread
        playbackThread.start();
        /*******  Call the Generative AI Model to get streaming text *******/
        // Prepare for the LLM call
        Generation gen = new Generation();
        Message userMsg = Message.builder()
                .role(Role.USER.getValue())
                .content("请介绍一下你自己")
                .build();
        GenerationParam genParam =
                GenerationParam.builder()
                        // If the API key is not configured as an environment variable, uncomment the next line and replace apikey with your own API key
                        // .apiKey("apikey")
                        .model("qwen-turbo")
                        .messages(Arrays.asList(userMsg))
                        .resultFormat(GenerationParam.ResultFormat.MESSAGE)
                        .topP(0.8)
                        .incrementalOutput(true)
                        .build();
        // Prepare the speech synthesis task
        SpeechSynthesisParam param =
                SpeechSynthesisParam.builder()
                        // If the API key is not configured as an environment variable, uncomment the next line and replace apikey with your own API key
                        // .apiKey("apikey")
                        .model(model)
                        .voice(voice)
                        .format(SpeechSynthesisAudioFormat
                                .PCM_22050HZ_MONO_16BIT)
                        .build();
        SpeechSynthesizer synthesizer =
                new SpeechSynthesizer(param, new ReactCallback(playbackRunnable));
        Flowable<GenerationResult> result = gen.streamCall(genParam);
        result.blockingForEach(message -> {
            String text =
                    message.getOutput().getChoices().get(0).getMessage().getContent();
            System.out.println("LLM output：" + text);
            synthesizer.streamingCall(text);
        });
        synthesizer.streamingComplete();
        System.out.print("requestId: " + synthesizer.getLastRequestId());
        try {
            // Wait for the playback thread to finish playing all
            playbackThread.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws NoApiKeyException, InputRequiredException {
        process();
        System.exit(0);
    }
}

For a description of the API parameters, see CosyVoice API details.

The samples above call the service through the DashScope SDK, which supports only Python and Java. To develop in another programming language, see WebSocket API.

For code samples covering more common scenarios, see GitHub.
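In the LLM examples above, every incremental text chunk from qwen-turbo is passed straight to streaming_call. You may want more control over fragment size, for example to stay safely under the 2,000-character per-fragment limit for streaming input. One option is to buffer the chunks and send them at sentence boundaries. SentenceBuffer is a hypothetical helper, not an SDK API. Its set of sentence-ending punctuation is an assumption, and so is its approximation of "Chinese character" as the CJK Unified Ideographs blocks.

```python
from typing import List, Optional

SENTENCE_END = set("。！？；.!?;\n")  # assumed sentence-ending punctuation
FRAGMENT_LIMIT = 2000               # CosyVoice per-fragment limit (streaming input)


def char_cost(ch: str) -> int:
    """Chinese characters count as 2 characters, everything else as 1."""
    cp = ord(ch)
    return 2 if (0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF) else 1


class SentenceBuffer:
    """Collects streamed text and releases it in sentence-sized fragments."""

    def __init__(self, limit: int = FRAGMENT_LIMIT) -> None:
        self.limit = limit
        self._chars: List[str] = []
        self._cost = 0

    def _take(self) -> str:
        fragment = "".join(self._chars)
        self._chars, self._cost = [], 0
        return fragment

    def feed(self, chunk: str) -> List[str]:
        """Add a chunk; return the fragments that are ready to send."""
        ready = []
        for ch in chunk:
            cost = char_cost(ch)
            if self._cost + cost > self.limit:  # would exceed the limit: send first
                ready.append(self._take())
            self._chars.append(ch)
            self._cost += cost
            if ch in SENTENCE_END:              # sentence finished: send it
                ready.append(self._take())
        return ready

    def flush(self) -> Optional[str]:
        """Return any leftover text, or None if the buffer is empty."""
        return self._take() if self._chars else None
```

In synthesizer_with_llm(), you could send each fragment returned by `buffer.feed(text_chunk)` with `synthesizer.streaming_call(...)`. Then, before `streaming_complete()`, send the result of `buffer.flush()` if it is not None. Splitting on "." also splits decimals such as 3.5, so adjust the punctuation set to your content.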

Sambert

Save synthesized audio to a file

Python

import dashscope
from dashscope.audio.tts import SpeechSynthesizer

# If the API key is not configured as an environment variable, uncomment the next line and replace apiKey with your own API key
# dashscope.api_key = "apiKey"
result = SpeechSynthesizer.call(model='sambert-zhichu-v1',
                                # If you change the language of the text, make sure the model supports it. Each model supports different languages; see the "Language" column of the Sambert voice list.
                                text='今天天气怎么样',
                                sample_rate=48000,
                                format='wav')

if result.get_audio_data() is not None:
    with open('output.wav', 'wb') as f:
        f.write(result.get_audio_data())
print(' get response: %s' % (result.get_response()))

Java

import com.alibaba.dashscope.audio.tts.SpeechSynthesizer;
import com.alibaba.dashscope.audio.tts.SpeechSynthesisParam;
import com.alibaba.dashscope.audio.tts.SpeechSynthesisResult;
import com.alibaba.dashscope.audio.tts.SpeechSynthesisAudioFormat;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.common.Status;

import java.io.*;
import java.nio.ByteBuffer;

public class Main {

    public static void SyncAudioDataToFile() {
        SpeechSynthesizer synthesizer = new SpeechSynthesizer();
        SpeechSynthesisParam param = SpeechSynthesisParam.builder()
          // If the API key is not configured as an environment variable, uncomment the next line and replace apikey with your own API key
          // .apiKey(apikey)
          .model("sambert-zhichu-v1")
          // If you change the language of the text, make sure the model supports it. Each model supports different languages; see the "Language" column of the Sambert voice list.
          .text("今天天气怎么样")
          .sampleRate(48000)
          .format(SpeechSynthesisAudioFormat.WAV)
          .build();

        File file = new File("output.wav");
        // Call call() with param to get the synthesized audio
        ByteBuffer audio = synthesizer.call(param);
        try (FileOutputStream fos = new FileOutputStream(file)) {
            fos.write(audio.array());
            System.out.println("synthesis done!");
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        SyncAudioDataToFile();
        System.exit(0);
    }
}

Play synthesized audio through the speaker

Synthesize speech and play the audio returned in real time on the local device.

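Before running the Python sample, install the third-party audio playback library pyaudio with pip (see the installation notes at the top of the sample).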

Python

# coding=utf-8
#
# Installation instructions for pyaudio:
# APPLE Mac OS X
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python-pyaudio python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio

import dashscope
import sys
import pyaudio
from dashscope.api_entities.dashscope_response import SpeechSynthesisResponse
from dashscope.audio.tts import ResultCallback, SpeechSynthesizer, SpeechSynthesisResult

# If the API key is not configured as an environment variable, uncomment the next line and replace apiKey with your own API key
# dashscope.api_key = "apiKey"

class Callback(ResultCallback):
    _player = None
    _stream = None

    def on_open(self):
        print('Speech synthesizer is opened.')
        self._player = pyaudio.PyAudio()
        self._stream = self._player.open(
            format=pyaudio.paInt16,
            channels=1,
            rate=48000,
            output=True)

    def on_complete(self):
        print('Speech synthesizer is completed.')

    def on_error(self, response: SpeechSynthesisResponse):
        print('Speech synthesizer failed, response is %s' % (str(response)))

    def on_close(self):
        print('Speech synthesizer is closed.')
        self._stream.stop_stream()
        self._stream.close()
        self._player.terminate()

    def on_event(self, result: SpeechSynthesisResult):
        if result.get_audio_frame() is not None:
            # len() gives the number of audio bytes; sys.getsizeof() would include Python object overhead
            print('audio result length:', len(result.get_audio_frame()))
            self._stream.write(result.get_audio_frame())

        if result.get_timestamp() is not None:
            print('timestamp result:', str(result.get_timestamp()))

callback = Callback()
SpeechSynthesizer.call(model='sambert-zhichu-v1',
                       text='今天天气怎么样',
                       sample_rate=48000,
                       format='pcm',
                       callback=callback)
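Sambert can return timestamps alongside the audio, which is what makes subtitle generation possible. The sketch below turns collected timestamp results into SRT subtitles. The input shape it expects is an assumption made for illustration: begin_time and end_time in milliseconds, plus a words list whose entries have text fields. Before relying on it, check the actual structure returned by result.get_timestamp() in Sambert API details, and enable word_timestamp_enabled in your request. Both functions are hypothetical helpers.

```python
from typing import Dict, List


def ms_to_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def sentences_to_srt(sentences: List[Dict]) -> str:
    """Build SRT text with one cue per sentence.

    ASSUMED input shape (verify against Sambert API details):
    {"begin_time": int_ms, "end_time": int_ms, "words": [{"text": str, ...}, ...]}
    """
    cues = []
    for index, sentence in enumerate(sentences, start=1):
        # Chinese words are concatenated without spaces; adjust for English text.
        text = "".join(word["text"] for word in sentence.get("words", []))
        start = ms_to_srt_time(sentence["begin_time"])
        end = ms_to_srt_time(sentence["end_time"])
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive SRT cues.
    return "\n".join(cues)
```

Timestamp results may arrive incrementally. If so, keep only the final result for each sentence before building the subtitles.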

Java

import com.alibaba.dashscope.audio.tts.SpeechSynthesizer;
import com.alibaba.dashscope.audio.tts.SpeechSynthesisAudioFormat;
import com.alibaba.dashscope.audio.tts.SpeechSynthesisParam;
import com.alibaba.dashscope.audio.tts.SpeechSynthesisResult;
import com.alibaba.dashscope.common.ResultCallback;

import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

import javax.sound.sampled.*;

public class Main {

    public static void StreamAuidoDataToSpeaker() {
        CountDownLatch latch = new CountDownLatch(1);
        SpeechSynthesizer synthesizer = new SpeechSynthesizer();
        SpeechSynthesisParam param =
                SpeechSynthesisParam.builder()
                        // If the API key is not configured as an environment variable, uncomment the next line and replace apikey with your own API key
                        // .apiKey("apikey")
                        .text("今天天气怎么样")
                        .model("sambert-zhichu-v1")
                        .sampleRate(48000)
                        .format(SpeechSynthesisAudioFormat.PCM) // Use PCM or MP3 for streaming synthesis
                        .build();

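        // Playback thread
        class PlaybackRunnable implements Runnable {
            // Set the audio format according to your device, the synthesized audio parameters, and your platform.
            // Here it is 48 kHz, 16-bit, mono; choose another sample rate and format based on the model's sample rate and your device's compatibility.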
            private AudioFormat af = new AudioFormat(48000, 16, 1, true, false);
            private DataLine.Info info = new DataLine.Info(SourceDataLine.class, af);
            private SourceDataLine targetSource = null;
            private AtomicBoolean runFlag = new AtomicBoolean(true);
            private ConcurrentLinkedQueue<ByteBuffer> queue = new ConcurrentLinkedQueue<>();

            // Prepare the player
            public void prepare() throws LineUnavailableException {
                targetSource = (SourceDataLine) AudioSystem.getLine(info);
                targetSource.open(af, 4096);
                targetSource.start();
            }

            public void put(ByteBuffer buffer) {
                queue.add(buffer);
            }

            // Stop playback
            public void stop() {
                runFlag.set(false);
            }

            @Override
            public void run() {
                if (targetSource == null) {
                    return;
                }

                while (runFlag.get()) {
                    if (queue.isEmpty()) {
                        try {
                            Thread.sleep(100);
                        } catch (InterruptedException e) {

                        }
                        continue;
                    }

                    ByteBuffer buffer = queue.poll();
                    if (buffer == null) {
                        continue;
                    }

                    byte[] data = buffer.array();
                    targetSource.write(data, 0, data.length);
                }

                // Play all remaining buffered audio
                if (!queue.isEmpty()) {
                    ByteBuffer buffer = null;
                    while ((buffer = queue.poll()) != null) {
                        byte[] data = buffer.array();
                        targetSource.write(data, 0, data.length);
                    }
                }

                // Release the player
                targetSource.drain();
                targetSource.stop();
                targetSource.close();
            }
        }

        // Create a subclass of ResultCallback<SpeechSynthesisResult> to implement the callback interface
        class ReactCallback extends ResultCallback<SpeechSynthesisResult> {
            private PlaybackRunnable playbackRunnable = null;

            public ReactCallback(PlaybackRunnable playbackRunnable) {
                this.playbackRunnable = playbackRunnable;
            }

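            // Called when the service returns a streaming synthesis result
            @Override
            public void onEvent(SpeechSynthesisResult result) {
                // Get the binary data of the streaming result via getAudioFrame
                if (result.getAudioFrame() != null) {
                    // Stream the data to the player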
                    playbackRunnable.put(result.getAudioFrame());
                }
            }

            // Called when the service finishes synthesis
            @Override
            public void onComplete() {
                // Tell the playback thread to finish
                playbackRunnable.stop();
                latch.countDown();
            }

            // Called when an error occurs
            @Override
            public void onError(Exception e) {
                // Tell the playback thread to finish
                System.out.println(e);
                playbackRunnable.stop();
                latch.countDown();
            }
        }

        PlaybackRunnable playbackRunnable = new PlaybackRunnable();
        try {
            playbackRunnable.prepare();
        } catch (LineUnavailableException e) {
            throw new RuntimeException(e);
        }
        Thread playbackThread = new Thread(playbackRunnable);
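        // Start the playback thread
        playbackThread.start();
        // call() with a callback does not block the current thread
        synthesizer.call(param, new ReactCallback(playbackRunnable));
        // Wait for synthesis to finish
        try {
            latch.await();
            // Wait for the playback thread to play all remaining audio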
            playbackThread.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        StreamAuidoDataToSpeaker();
        System.exit(0);
    }
}

For a description of the API parameters, see Sambert API details.

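The samples above call the service through the DashScope SDK, which supports only Python and Java. To develop in another programming language, see Accessing the speech synthesis service over WebSocket.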

FAQ

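1. What should I do if the synthesized speech mispronounces a word? How can I control the pronunciation of polyphonic characters?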

You can try the following:

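- Replace the polyphonic character with another character that has the intended pronunciation. This is the quickest fix.
- Use SSML: if you are using a Sambert model, SSML markup gives you precise control over pronunciation. CosyVoice does not support SSML.
- Scan the DingTalk group QR code on GitHub, join the group, and contact the product R&D team to request an improvement.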
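The first option can be automated by preprocessing text before it is sent for synthesis. The sketch below applies a substitution table that you maintain yourself. The table contents are a hypothetical example: they replace 行长 with the homophone 航长 so that 行 is read háng. apply_pronunciation_fixes is not an SDK function.

```python
import re
from typing import Dict

# Hypothetical user-maintained table: original text -> homophone text the TTS reads correctly.
PRONUNCIATION_FIXES = {"行长": "航长"}


def apply_pronunciation_fixes(text: str, fixes: Dict[str, str]) -> str:
    """Replace each key in `fixes` with its homophone before synthesis.

    Longer keys are tried first, so a phrase-level fix wins over an
    overlapping single-character fix at the same position.
    """
    if not fixes:
        return text
    keys = sorted(fixes, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: fixes[m.group(0)], text)
```

Only the text sent to the synthesizer changes. If you also display the text, such as in subtitles, keep the original version for display.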

2. Rate limits

CosyVoice speech synthesis:

Model	Task submission API limit (requests per second)
cosyvoice-v1	3

CosyVoice voice cloning:

Model	Task submission API limit (requests per second)
cosyvoice-v1	3

Sambert speech synthesis:

Model service	Task submission API limit (requests per second)
Sambert models	20
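To stay within these limits when submitting many tasks from one process, you can throttle submissions on the client side. The sketch below spaces calls at least 1/rps seconds apart. It is a simple fixed-interval throttle rather than a token bucket, and MinIntervalLimiter is a hypothetical helper. It only coordinates threads within a single process. If several processes or machines share one account, you need a shared limiter instead.

```python
import threading
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Client-side throttle that spaces submissions at least 1/rps seconds apart.

    A simple fixed-interval limiter (no bursts), not a token bucket.
    """

    def __init__(self, rps: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if rps <= 0:
            raise ValueError("rps must be positive")
        self._interval = 1.0 / rps
        self._clock = clock    # injectable for testing
        self._sleep = sleep
        self._next_allowed: Optional[float] = None
        self._lock = threading.Lock()

    def acquire(self) -> None:
        """Block until another task may be submitted."""
        with self._lock:  # sleeping under the lock serializes concurrent callers
            now = self._clock()
            if self._next_allowed is not None and self._next_allowed > now:
                self._sleep(self._next_allowed - now)
                now = self._next_allowed
            self._next_allowed = now + self._interval
```

For example, create `limiter = MinIntervalLimiter(rps=3)` once. Then call `limiter.acquire()` before each CosyVoice `synthesizer.call(...)`.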