API reference


Speech Synthesis converts input text into binary audio data. This topic provides general information that applies to the SDK documents in this directory.


Billing and concurrency limits

Features

  • Supports PCM, WAV, and MP3 audio encoding formats.

  • Supports adjustments for speech rate, pitch, and volume.

  • Supports various voices for different scenarios and styles. For more information, see Voice list.

  • Synthesizes up to 300 characters at a time. A Chinese character, an English letter, a punctuation mark, or a space between words is counted as one character. Text that exceeds 300 characters is truncated.

  • Supports only UTF-8 encoded text.

  • Supports multi-emotion voices. For more information, see the <emotion> tag in SSML markup language. Tags are not counted as characters.
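Because text beyond 300 characters is truncated, longer input has to be split before it is sent. The following is a minimal sketch of one way to do that, not part of the SDK. It assumes that Python's `len()` (one per Unicode code point) matches the character counting described above, and that anything in angle brackets is a tag that is not counted. The names `billable_length` and `split_text` are hypothetical.

```python
import re

MAX_CHARS = 300  # per-request limit stated in the feature list

_TAG_RE = re.compile(r"<[^>]+>")  # assumption: anything in angle brackets is a tag


def billable_length(text: str) -> int:
    """Count characters after removing tags, which are not counted."""
    return len(_TAG_RE.sub("", text))


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split plain text into chunks of at most `limit` characters.

    Chunks end at sentence punctuation where possible. A single sentence
    longer than `limit` is cut at the limit.
    """
    chunks = []
    current = ""
    # Split after Chinese or English sentence-ending punctuation.
    for sentence in re.split(r"(?<=[。！？.!?])", text):
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if len(current) + len(sentence) > limit:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as a separate synthesis request. This sketch handles plain text only. Text that contains SSML tags would need tag-aware splitting so that a tag pair is never cut in half.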

Note
  • Word-level phoneme boundaries: When Speech Synthesis outputs audio, it can also return the timestamp for each Chinese character or English word in the audio. You can use this timestamp information to drive the lip movements of a virtual human or create captions for video dubbing. For more information, see Speech Synthesis timestamp feature.

  • For information about voices for literary scenarios, see API reference.

  • To use the Android or iOS SDK, see API reference for mobile devices.

Intelligent access to the nearest region

Speech Synthesis supports intelligent access to the nearest region using the domain name nls-gateway.aliyuncs.com.

We recommend that end users connect to the nearest region. The system automatically resolves the domain name to the server in the nearest region based on the client's location. For example, a request from Beijing is routed to a server in the China (Beijing) region, which is equivalent to using the domain name nls-gateway-cn-beijing.aliyuncs.com.

Endpoint

Access type

Description

URL

Public access (defaults to the China (Shanghai) region)

All servers can use the public access URL. The public access URL is set by default in the SDK.

  • China (Shanghai): wss://nls-gateway-cn-shanghai.aliyuncs.com/ws/v1

  • China (Beijing): wss://nls-gateway-cn-beijing.aliyuncs.com/ws/v1

  • China (Shenzhen): wss://nls-gateway-cn-shenzhen.aliyuncs.com/ws/v1

Internal access from an ECS instance

If you use an Alibaba Cloud ECS instance in the China (Shanghai), China (Beijing), or China (Shenzhen) region, you can use the internal access URL. ECS instances in the classic network cannot access AnyTunnel and therefore cannot reach the service over the internal network. To use AnyTunnel, create a VPC and access the service from within the VPC.

Note

  • Internal access does not incur public data transfer costs for the ECS instance.

  • For more information about network types, see Network types.

  • China (Shanghai): ws://nls-gateway-cn-shanghai-internal.aliyuncs.com:80/ws/v1

  • China (Beijing): ws://nls-gateway-cn-beijing-internal.aliyuncs.com:80/ws/v1

  • China (Shenzhen): ws://nls-gateway-cn-shenzhen-internal.aliyuncs.com:80/ws/v1
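The endpoint table above can be captured in a small lookup so that application code does not hard-code URLs. This is an illustrative sketch only. The region keys (`cn-shanghai` and so on) and the function name `gateway_url` are hypothetical labels chosen to match the host names, and the URLs are copied from the table.

```python
# WebSocket endpoints copied from the endpoint table above.
ENDPOINTS = {
    "public": {
        "cn-shanghai": "wss://nls-gateway-cn-shanghai.aliyuncs.com/ws/v1",
        "cn-beijing": "wss://nls-gateway-cn-beijing.aliyuncs.com/ws/v1",
        "cn-shenzhen": "wss://nls-gateway-cn-shenzhen.aliyuncs.com/ws/v1",
    },
    "internal": {
        "cn-shanghai": "ws://nls-gateway-cn-shanghai-internal.aliyuncs.com:80/ws/v1",
        "cn-beijing": "ws://nls-gateway-cn-beijing-internal.aliyuncs.com:80/ws/v1",
        "cn-shenzhen": "ws://nls-gateway-cn-shenzhen-internal.aliyuncs.com:80/ws/v1",
    },
}


def gateway_url(region: str = "cn-shanghai", internal: bool = False) -> str:
    """Return the WebSocket URL for a region.

    Set internal=True only when the caller runs on an ECS instance
    inside a VPC in the same region.
    """
    access_type = "internal" if internal else "public"
    try:
        return ENDPOINTS[access_type][region]
    except KeyError:
        raise ValueError(f"unsupported region: {region}") from None
```

The default of China (Shanghai) mirrors the public access default described in the table.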

Interaction flow

[Figure: interaction flow between the client and the server over WebSocket]
Note
  • The preceding figure shows the interaction flow for WebSocket. For the interaction flow of the RESTful API, see RESTful API.

  • In addition to the audio stream, the server response header includes the `task_id` parameter, which is the unique identifier for the request.

  • If you want to play the audio stream returned by the server in real time, use an audio player that supports stream playback, such as FFmpeg, PyAudio (Python), AudioFormat (Java), and MediaSource (JavaScript).

  1. Authentication

    When the client establishes a WebSocket connection with the server, it uses a token for authentication. For more information about how to obtain a token, see Obtain a token.

  2. Start synthesis

    The client sends a speech synthesis request. The following table describes the request parameters.

    Parameter

    Type

    Required

    Description

    appkey

    String

    Yes

    The AppKey of the project that you created in the console.

    text

    String

    Yes

    The text to synthesize. The text must be UTF-8 encoded and cannot exceed 300 characters. Add a space between English words.

    Note

    To use the multi-emotion feature of a voice, add the <emotion> SSML tag to the text. For more information, see <emotion>.

    If you use the <emotion> tag for a voice that does not support multiple emotions, the `Illegal ssml text` error is reported.

    voice

    String

    No

    The voice that is used for synthesis. Default value: xiaoyun.

    format

    String

    No

    The audio encoding format. Valid values: pcm, wav, and mp3. Default value: pcm.

    sample_rate

    Integer

    No

    The audio sample rate. Default value: 16000 Hz.

    volume

    Integer

    No

    The volume. Valid values: 0 to 100. Default value: 50.

    speech_rate

    Integer

    No

    The speech rate. Valid values: -500 to 500. Default value: 0.

    The values -500, 0, and 500 correspond to speed multipliers of 0.5, 1.0, and 2.0.

    1. -500 indicates 0.5 times the default speed.

    2. 0 indicates the default speed (1.0×). The default speed is the synthesis speed of the model output. It varies slightly by voice and is approximately four characters per second.

    3. 500 indicates 2.0 times the default speed.

    To convert a speed multiplier to a speech_rate value, calculate (1 - 1/multiplier) / coefficient:

    1. 0.8× speed: (1 - 1/0.8) / 0.002 = -125

    2. 1.2× speed: (1 - 1/1.2) / 0.001 ≈ 166

    Note
    1. For speeds less than 1.0×, a coefficient of 0.002 is used.

    2. For speeds greater than 1.0×, a coefficient of 0.001 is used.

    The result of this calculation is approximate.

    pitch_rate

    Integer

    No

    The pitch. Valid values: -500 to 500. Default value: 0.

    enable_subtitle

    Boolean

    No

    Enables word-level timestamps. For more information, see Speech Synthesis timestamp feature.

  3. Receive synthesized data

    The server returns the synthesized audio as binary data. The SDK receives and processes the binary data.

  4. End synthesis

    After the synthesis is complete, the server sends a SynthesisCompleted event notification. The following code provides an example.

    {
        "header": {
            "message_id": "05450bf69c53413f8d88aed1ee60****",
            "task_id": "640bc797bb684bd6960185651307****",
            "namespace": "SpeechSynthesizer",
            "name": "SynthesisCompleted",
            "status": 20000000,
            "status_message": "GATEWAY|SUCCESS|Success."
        }
    }
    Note

    The examples in the documentation save the synthesized audio to a file. To play the audio in real time with low latency, we recommend that you use stream playback. This lets you play the audio data as you receive it, which reduces delay.

  5. Handle synthesis failures

    If the synthesis task fails, you will receive a TaskFailed notification. The following code provides an example. After you receive a TaskFailed notification, the underlying connection is closed.

    {
       "header":{
          "namespace":"Default",
          "name":"TaskFailed",
          "status":41020001,
          "message_id":"62c126f7d9b340deb82b5b7eaca0****",
          "task_id":"4552df26d1f547aab9a2c4a94678****",
          "status_text":"TTS:TtsClientError:[tts]Engine return error code: 418"
       }
    }
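The request parameters and the two notifications above can be sketched in Python as follows. This is an illustration under stated assumptions, not the SDK's implementation. All function and class names are hypothetical, the flat dictionary is not the wire format (which this topic does not describe), and the `enable_subtitle` default of `False` is assumed. The speech rate conversion applies the coefficients from the parameter table and truncates toward zero, which reproduces the 166 in the 1.2× example.

```python
import json
from fractions import Fraction


def speech_rate_from_multiplier(multiplier: float) -> int:
    """Convert a speed multiplier (0.5 to 2.0) to a speech_rate value."""
    if not 0.5 <= multiplier <= 2.0:
        raise ValueError("multiplier must be between 0.5 and 2.0")
    m = Fraction(str(multiplier))  # exact arithmetic avoids float rounding noise
    coefficient = Fraction("0.002") if m < 1 else Fraction("0.001")
    return int((1 - 1 / m) / coefficient)  # int() truncates toward zero


def build_synthesis_params(appkey, text, *, voice="xiaoyun", audio_format="pcm",
                           sample_rate=16000, volume=50, speech_rate=0,
                           pitch_rate=0, enable_subtitle=False):
    """Check the documented value ranges and return the parameters as a dict."""
    if audio_format not in ("pcm", "wav", "mp3"):
        raise ValueError(f"unsupported format: {audio_format}")
    if not 0 <= volume <= 100:
        raise ValueError("volume must be between 0 and 100")
    for name, value in (("speech_rate", speech_rate), ("pitch_rate", pitch_rate)):
        if not -500 <= value <= 500:
            raise ValueError(f"{name} must be between -500 and 500")
    return {
        "appkey": appkey,
        "text": text,
        "voice": voice,
        "format": audio_format,
        "sample_rate": sample_rate,
        "volume": volume,
        "speech_rate": speech_rate,
        "pitch_rate": pitch_rate,
        "enable_subtitle": enable_subtitle,  # assumed default, not documented here
    }


class SynthesisFailed(Exception):
    """Raised when the server sends a TaskFailed notification."""

    def __init__(self, status, task_id, detail):
        super().__init__(f"task {task_id} failed with status {status}: {detail}")
        self.status = status
        self.task_id = task_id
        self.detail = detail


def is_synthesis_complete(raw_message: str) -> bool:
    """Return True for SynthesisCompleted, raise SynthesisFailed for TaskFailed."""
    header = json.loads(raw_message).get("header", {})
    name = header.get("name")
    if name == "TaskFailed":
        # The failure example carries status_text; the success example carries status_message.
        detail = header.get("status_text") or header.get("status_message", "")
        raise SynthesisFailed(header.get("status"), header.get("task_id"), detail)
    return name == "SynthesisCompleted"
```

A caller would pass each text message received from the server to `is_synthesis_complete` and stop reading once it returns `True`. Binary audio frames are handled separately.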

Voice list

| Name | voice parameter value | Type | Scenario | Supported languages | Supported sample rates (Hz) | Supports word/sentence-level timestamps | Supports retroflex finals | Voice quality |
|---|---|---|---|---|---|---|---|---|
| Abin | abin | Mandarin with Cantonese accent | Conversational digital human | Chinese and Chinese-English mixed scenarios | 8K/16K/24K/48K | No | No | Standard Edition |
| Zhixiaobai | zhixiaobai | Mandarin female voice | Conversational digital human | Chinese and Chinese-English mixed scenarios | 8K/16K/24K/48K | No | Yes | Standard Edition |
| Zhixiaoxia | zhixiaoxia | Mandarin female voice | Conversational digital human | Chinese and Chinese-English mixed scenarios | 8K/16K/24K/48K | No | Yes | Standard Edition |
| Zhixiaomei | zhixiaomei | Mandarin female voice | Live streaming digital human | Chinese and Chinese-English mixed scenarios | 8K/16K/24K | Yes | Yes | Standard Edition |
| Zhigui | zhigui | Mandarin female voice | Live streaming digital human | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Standard Edition |
| Zhishuo | zhishuo | Mandarin male voice | Customer service digital human | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Standard Edition |
| Aixia | aixia | Mandarin female voice | Customer service digital human | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Standard Edition |
| Cally | cally | American English female voice | Spoken English conversational digital human | English-only scenarios | 8K/16K | Yes | Yes | Standard Edition |
| Zhifeng_emo | zhifeng_emo | Multi-emotion male voice | General scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K/24K | Yes | Yes | Standard Edition |
| Zhibing_emo | zhibing_emo | Multi-emotion male voice | General scenarios | Chinese-only scenarios | 8K/16K/24K | Yes | Yes | Standard Edition |
| Zhimiao_emo | zhimiao_emo | Multi-emotion female voice | Chinese-English scenarios | Chinese and English scenarios | 8K/16K | Yes | Yes | Standard Edition |
| Zhimi_emo | zhimi_emo | Multi-emotion female voice | General scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | No | Standard Edition |
| Zhiyan_emo | zhiyan_emo | Multi-emotion female voice | General scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | No | Standard Edition |
| Zhibei_emo | zhibei_emo | Multi-emotion child voice | General scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | No | Standard Edition |
| Zhitian_emo | zhitian_emo | Multi-emotion female voice | General scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | No | Standard Edition |
| Xiaoyun | xiaoyun | Standard female voice | General scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | No | No | Lite Edition |
| Xiaogang | xiaogang | Standard male voice | General scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | No | No | Lite Edition |
| Ruoxi | ruoxi | Gentle female voice | General scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K/24K | No | No | Standard Edition |
| Siqi | siqi | Gentle female voice | General scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K/24K | Yes | No | Standard Edition |
| Sijia | sijia | Standard female voice | General scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K/24K | No | No | Standard Edition |
| Sicheng | sicheng | Standard male voice | General scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K/24K | Yes | No | Standard Edition |
| Aiqi | aiqi | Gentle female voice | General scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | No | Standard Edition |
| Aijia | aijia | Standard female voice | General scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | No | Standard Edition |
| Aicheng | aicheng | Standard male voice | General scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | No | Standard Edition |
| Aida | aida | Standard male voice | General scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | No | Standard Edition |
| Ninger | ninger | Standard female voice | General scenarios | Chinese-only scenarios | 8K/16K/24K | No | No | Standard Edition |
| Ruilin | ruilin | Standard female voice | General scenarios | Chinese-only scenarios | 8K/16K/24K | No | No | Standard Edition |
| Siyue | siyue | Gentle female voice | Customer service scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K/24K | Yes | No | Standard Edition |
| Aiya | aiya | Stern female voice | Customer service scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | No | Standard Edition |
| Aimei | aimei | Sweet female voice | Customer service scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | No | Standard Edition |
| Aiyu | aiyu | Natural female voice | Customer service scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | No | Standard Edition |
| Aiyue | aiyue | Gentle female voice | Customer service scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | No | Standard Edition |
| Aijing | aijing | Stern female voice | Customer service scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | No | Standard Edition |
| Xiaomei | xiaomei | Sweet female voice | Customer service scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K/24K | No | No | Standard Edition |
| Aina | aina | Female voice with Zhejiang accent | Customer service scenarios | Chinese-only scenarios | 8K/16K | Yes | No | Standard Edition |
| Yina | yina | Female voice with Zhejiang accent | Customer service scenarios | Chinese-only scenarios | 8K/16K/24K | No | No | Standard Edition |
| Sijing | sijing | Stern female voice | Customer service scenarios | Chinese-only scenarios | 8K/16K/24K | Yes | No | Standard Edition |
| Sitong | sitong | Child voice | Child voice scenarios | Chinese-only scenarios | 8K/16K/24K | No | No | Standard Edition |
| Xiaobei | xiaobei | Lolita female voice | Child voice scenarios | Chinese-only scenarios | 8K/16K/24K | Yes | No | Standard Edition |
| Aitong | aitong | Child voice | Child voice scenarios | Chinese-only scenarios | 8K/16K | Yes | No | Standard Edition |
| Aiwei | aiwei | Lolita female voice | Child voice scenarios | Chinese-only scenarios | 8K/16K | Yes | No | Standard Edition |
| Aibao | aibao | Lolita female voice | Child voice scenarios | Chinese-only scenarios | 8K/16K | Yes | No | Standard Edition |
| Harry | harry | British English male voice | English scenarios | English scenarios | 8K/16K | No | No | Standard Edition |
| Abby | abby | American English female voice | English scenarios | English scenarios | 8K/16K | Yes | No | Standard Edition |
| Andy | andy | American English male voice | English scenarios | English scenarios | 8K/16K | Yes | No | Standard Edition |
| Eric | eric | British English male voice | English scenarios | English scenarios | 8K/16K | Yes | No | Standard Edition |
| Emily | emily | British English female voice | English scenarios | English scenarios | 8K/16K | Yes | No | Standard Edition |
| Luna | luna | British English female voice | English scenarios | English scenarios | 8K/16K | Yes | No | Standard Edition |
| Luca | luca | British English male voice | English scenarios | English scenarios | 8K/16K | Yes | No | Standard Edition |
| Wendy | wendy | British English female voice | English scenarios | English scenarios | 8K/16K/24K | No | No | Standard Edition |
| William | william | British English male voice | English scenarios | English scenarios | 8K/16K/24K | No | No | Standard Edition |
| Olivia | olivia | British English female voice | English scenarios | English scenarios | 8K/16K/24K | No | No | Standard Edition |
| Shanshan | shanshan | Cantonese female voice | Dialect scenarios | Standard Cantonese (Simplified) and Cantonese-English mixed scenarios | 8K/16K/24K | No | No | Standard Edition |
| Aiyuan | aiyuan | Confidante | Literary scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Premium Edition |
| Aiying | aiying | Cute and soft child voice | Literary scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Premium Edition |
| Aixiang | aixiang | Magnetic male voice | Literary scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Premium Edition |
| Aimo | aimo | Emotional male voice | Literary scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Premium Edition |
| Aiye | aiye | Young male voice | Literary scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Premium Edition |
| Aiting | aiting | Radio-style female voice | Literary scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Premium Edition |
| Aifan | aifan | Emotional female voice | Literary scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Premium Edition |
| Lydia | lydia | Bilingual English-Chinese female voice | English scenarios | English and English-Chinese mixed scenarios | 8K/16K | Yes | No | Standard Edition |
| Xiaoyue | chuangirl | Sichuanese female voice | Dialect scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | No | No | Standard Edition |
| Aishuo | aishuo | Natural male voice | Customer service scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | No | Standard Edition |
| Qingqing | qingqing | Taiwan (China) Mandarin female voice | Dialect scenarios | Chinese-only scenarios | 8K/16K | No | No | Standard Edition |
| Cuijie | cuijie | Northeastern Mandarin female voice | Dialect scenarios | Chinese-only scenarios | 8K/16K | Yes | Yes | Standard Edition |
| Xiaoze | xiaoze | Male voice with a strong Hunan accent | Dialect scenarios | Chinese-only scenarios | 8K/16K | No | No | Standard Edition |
| Ainan | ainan | Advertisement-style male voice | Literary scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Premium Edition |
| Aihao | aihao | News-style male voice | Literary scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Premium Edition |
| Aiming | aiming | Humorous male voice | Literary scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Premium Edition |
| Aixiao | aixiao | News-style female voice | Literary scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Premium Edition |
| Aichu | aichu | Food documentary-style male voice | Literary scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Premium Edition |
| Aiqian | aiqian | News-style female voice | Literary scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Premium Edition |
| Zhi Xiang | tomoka | Japanese female voice | Multilingual scenarios | Japanese-only scenarios | 8K/16K | Yes | No | Standard Edition |
| Tomoya | tomoya | Japanese male voice | Multilingual scenarios | Japanese-only scenarios | 8K/16K | Yes | No | Standard Edition |
| Annie | annie | American English female voice | English scenarios | English-only scenarios | 8K/16K | Yes | No | Standard Edition |
| Aishu | aishu | News-style male voice | Literary scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Premium Edition |
| Airu | airu | Newscast female voice | Literary scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Premium Edition |
| Jiajia | jiajia | Cantonese female voice | Dialect scenarios | Standard Cantonese (Simplified) and Cantonese-English mixed scenarios | 8K/16K | Yes | No | Standard Edition |
| Indah | indah | Indonesian female voice | Multilingual scenarios | Indonesian-only scenarios | 8K/16K | No | No | Standard Edition |
| Peach | taozi | Cantonese female voice | Dialect scenarios | Standard Cantonese (Simplified) and Cantonese-English mixed scenarios | 8K/16K | Yes | No | Standard Edition |
| Guijie | guijie | Friendly female voice | General scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Standard Edition |
| Stella | stella | Intellectual female voice | General scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Standard Edition |
| Stanley | stanley | Calm male voice | General scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Standard Edition |
| Kenny | kenny | Calm male voice | General scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Standard Edition |
| Rosa | rosa | Natural female voice | General scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Standard Edition |
| Farah | farah | Malay female voice | Multilingual scenarios | Malay-only scenarios | 8K/16K | No | No | Standard Edition |
| Mashu | mashu | Children's drama male voice | General scenarios | General scenarios | 8K/16K | Yes | No | Standard Edition |
| Zhiqi | zhiqi | Gentle female voice | Ultra-high definition scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K/24K/48K | Yes | No | Premium Edition |
| Zhichu | zhichu | Food documentary-style male voice | Ultra-high definition scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K/24K/48K | Yes | Yes | Premium Edition |
| Xiaoxian | xiaoxian | Friendly female voice | Live streaming scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Standard Edition |
| Yuer | yuer | Children's drama female voice | General scenarios | Chinese-only scenarios | 8K/16K | Yes | No | Standard Edition |
| Maoxiaomei | maoxiaomei | Energetic female voice | Live streaming scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Standard Edition |
| Zhixiang | zhixiang | Magnetic male voice | Ultra-high definition scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K/24K/48K | Yes | No | Premium Edition |
| Zhijia | zhijia | Standard female voice | Ultra-high definition scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K/24K/48K | Yes | No | Premium Edition |
| Zhinan | zhinan | Advertisement-style male voice | Ultra-high definition scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K/24K/48K | Yes | No | Premium Edition |
| Zhiqian | zhiqian | News-style female voice | Ultra-high definition scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K/24K/48K | Yes | No | Premium Edition |
| Zhiru | zhiru | Newscast female voice | Ultra-high definition scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K/24K/48K | Yes | No | Premium Edition |
| Zhide | zhide | Newscast male voice | Ultra-high definition scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K/24K/48K | Yes | No | Premium Edition |
| Zhifei | zhifei | Passionate commentary voice | Ultra-high definition scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | No | Premium Edition |
| Aifei | aifei | Passionate commentary voice | Live streaming scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Standard Edition |
| Yaqun | yaqun | Store broadcast voice | Live streaming scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Standard Edition |
| Qiaowei | qiaowei | Store broadcast voice | Live streaming scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Standard Edition |
| Dahu | dahu | Northeastern Mandarin male voice | Dialect scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Standard Edition |
| ava | ava | American girl | English scenarios | English-only scenarios | 8K/16K | Yes | No | Standard Edition |
| Zhilun | zhilun | Suspense commentary voice | Ultra-high definition scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | No | Premium Edition |
| Ailun | ailun | Suspense commentary voice | Live streaming scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | Yes | Standard Edition |
| Jielidou | jielidou | Soothing child voice | Child voice scenarios | Chinese-only scenarios | 8K/16K | Yes | Yes | Standard Edition |
| Zhiwei | zhiwei | Lolita female voice | Ultra-high definition scenarios | Chinese-only scenarios | 8K/16K/24K/48K | Yes | No | Premium Edition |
| Laotie | laotie | Buddy from Northeast China | Live streaming scenarios | Chinese-only scenarios | 8K/16K | Yes | Yes | Standard Edition |
| Laomei | laomei | Female hawking voice | Live streaming scenarios | Chinese-only scenarios | 8K/16K | Yes | Yes | Standard Edition |
| Aikan | aikan | Tianjin dialect male voice | Dialect scenarios | Chinese-only scenarios | 8K/16K | Yes | Yes | Standard Edition |
| Tala | tala | Filipino female voice | Multilingual scenarios | Filipino-only scenarios | 8K/16K | No | No | Standard Edition |
| Zhitian | zhitian | Sweet female voice | General scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | No | Premium Edition |
| Zhiqing | zhiqing | A girl speaking a dialect from Taiwan (China) | Dialect scenarios | Chinese-only scenarios | 8K/16K | Yes | No | Premium Edition |
| Tien | tien | Vietnamese female voice | Multilingual scenarios | Vietnamese-only scenarios | 8K/16K | No | No | Standard Edition |
| Becca | becca | American English customer service female voice | American English | English-only scenarios | 8K/16K | No | No | Standard Edition |
| Kyong | Kyong | Korean female voice | Korean scenarios | Korean | 8K/16K | No | No | Standard Edition |
| masha | masha | Russian female voice | Russian scenarios | Russian | 8K/16K | No | No | Standard Edition |
| camila | camila | Spanish female voice | Spanish scenarios | Spanish | 8K/16K | No | No | Standard Edition |
| perla | perla | Italian female voice | Italian scenarios | Italian | 8K/16K | No | No | Standard Edition |
| Zhimao | zhimao | Mandarin female voice | Live streaming | Chinese | 8K/16K | Yes | No | Standard Edition |
| Zhiyuan | zhiyuan | Mandarin female voice | General scenarios | Chinese | 8K/16K | Yes | No | Standard Edition |
| Zhiya | zhiya | Mandarin female voice | Customer service | Chinese | 8K/16K | Yes | No | Standard Edition |
| Zhiyue | zhiyue | Mandarin female voice | General scenarios | Chinese | 8K/16K | Yes | No | Standard Edition |
| Zhida | zhida | Mandarin male voice | General scenarios | Chinese and Chinese-English mixed scenarios | 8K/16K | Yes | No | Standard Edition |
| Zhistella | zhistella | Mandarin female voice | General scenarios | Chinese | 8K/16K | Yes | No | Standard Edition |
| Kelly | kelly | Hong Kong Cantonese female voice | Dialect scenarios | Hong Kong Cantonese | 8K/16K | Yes | No | Standard Edition |
| clara | clara | French female voice | General scenarios | French | 8K/16K | No | No | Standard Edition |
| hanna | hanna | German female voice | General scenarios | German | 8K/16K | No | No | Standard Edition |
| waan | waan | Thai female voice | General scenarios | Thai | 8K/16K | No | No | Standard Edition |
| betty | betty | American English female voice | General scenarios | American English | 8K/16K | Yes | No | Standard Edition |
| beth | beth | American English female voice | General scenarios | American English | 8K/16K | Yes | No | Standard Edition |
| cindy | cindy | American English female voice | General scenarios | American English | 8K/16K | Yes | No | Standard Edition |
| donna | donna | American English female voice | General scenarios | American English | 8K/16K | Yes | No | Standard Edition |
| eva | eva | American English female voice | General scenarios | American English | 8K/16K | Yes | No | Standard Edition |
| brian | brian | American English male voice | General scenarios | American English | 8K/16K | Yes | No | Standard Edition |
| david | david | American English male voice | General scenarios | American English | 8K/16K/24K | Yes | No | Standard Edition |
| abby_ecmix | abby_ecmix | American English female voice | General scenarios | English and English-Chinese mixed scenarios | 8K/16K/24K | Yes | No | Standard Edition |
| annie_ecmix | annie_ecmix | American English female voice | General scenarios | English and English-Chinese mixed scenarios | 8K/16K/24K | Yes | No | Standard Edition |
| andy_ecmix | andy_ecmix | American English male voice | General scenarios | English and English-Chinese mixed scenarios | 8K/16K/24K | Yes | No | Standard Edition |
| ava_ecmix | ava_ecmix | American English female voice | General scenarios | English and English-Chinese mixed scenarios | 8K/16K/24K | Yes | No | Standard Edition |
| betty_ecmix | betty_ecmix | American English female voice | General scenarios | English and English-Chinese mixed scenarios | 8K/16K/24K | Yes | No | Standard Edition |
| beth_ecmix | beth_ecmix | American English female voice | General scenarios | English and English-Chinese mixed scenarios | 8K/16K/24K | Yes | No | Standard Edition |
| brian_ecmix | brian_ecmix | American English male voice | General scenarios | English and English-Chinese mixed scenarios | 8K/16K/24K | Yes | No | Standard Edition |
| cindy_ecmix | cindy_ecmix | American English female voice | General scenarios | English and English-Chinese mixed scenarios | 8K/16K/24K | Yes | No | Standard Edition |
| cally_ecmix | cally_ecmix | American English female voice | General scenarios | English and English-Chinese mixed scenarios | 8K/16K/24K | Yes | No | Standard Edition |
| donna_ecmix | donna_ecmix | American English female voice | General scenarios | English and English-Chinese mixed scenarios | 8K/16K/24K | Yes | No | Standard Edition |
| david_ecmix | david_ecmix | American English male voice | General scenarios | English and English-Chinese mixed scenarios | 8K/16K/24K | Yes | No | Standard Edition |
| eva_ecmix | eva_ecmix | American English female voice | General scenarios | English and English-Chinese mixed scenarios | 8K/16K/24K | Yes | No | Standard Edition |

Support for multi-emotion voices

Only multi-emotion voice models support emotion selection. The following table lists the supported emotions. The supported emotion categories vary by voice. The main categories include the following: neutral, happy, angry, sad, fear, hate, surprise, arousal, serious, disgust, jealousy, embarrassed, frustrated, affectionate, gentle, newscast, customer-service, story, and living.

| Voice name | voice parameter value | Emotion category |
|---|---|---|
| Zhifeng_emo | zhifeng_emo | angry, fear, happy, neutral, sad, surprise |
| Zhibing_emo | zhibing_emo | angry, fear, happy, neutral, sad, surprise |
| Zhimiao_emo | zhimiao_emo | serious, sad, disgust, jealousy, embarrassed, happy, fear, surprise, neutral, frustrated, affectionate, gentle, angry, newscast, customer-service, story, living |
| Zhimi_emo | zhimi_emo | angry, fear, happy, hate, neutral, sad, surprise |
| Zhiyan_emo | zhiyan_emo | neutral, happy, angry, sad, fear, hate, surprise, arousal |
| Zhibei_emo | zhibei_emo | neutral, happy, angry, sad, fear, hate, surprise |
| Zhitian_emo | zhitian_emo | neutral, happy, angry, sad, fear, hate, surprise |
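Because the supported emotion categories differ by voice, a client can check a requested emotion locally before adding an <emotion> tag and avoid an `Illegal ssml text` error. The following sketch encodes the table above as a lookup. The names `SUPPORTED_EMOTIONS` and `supports_emotion` are hypothetical, and the data reflects only the voices listed in this topic.

```python
_BASIC = {"angry", "fear", "happy", "neutral", "sad", "surprise"}

# Emotion categories per voice parameter value, copied from the table above.
SUPPORTED_EMOTIONS = {
    "zhifeng_emo": _BASIC,
    "zhibing_emo": _BASIC,
    "zhimiao_emo": {
        "serious", "sad", "disgust", "jealousy", "embarrassed", "happy",
        "fear", "surprise", "neutral", "frustrated", "affectionate",
        "gentle", "angry", "newscast", "customer-service", "story", "living",
    },
    "zhimi_emo": _BASIC | {"hate"},
    "zhiyan_emo": _BASIC | {"hate", "arousal"},
    "zhibei_emo": _BASIC | {"hate"},
    "zhitian_emo": _BASIC | {"hate"},
}


def supports_emotion(voice: str, emotion: str) -> bool:
    """Return True if the voice is a listed multi-emotion voice that supports the emotion."""
    return emotion in SUPPORTED_EMOTIONS.get(voice, set())
```

A voice that is not in the lookup, such as `xiaoyun`, is treated as not supporting any emotion.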

Service status codes

Each service response contains a `status` field, which is the service status code. The following tables describe the status codes.

General-purpose error codes

Status code

Status message

Cause

Solution

40000000

The default client error code. This code corresponds to multiple error messages.

Invalid parameters or call logic was used.

Compare your code with the sample code in the official documentation to test and verify it.

40000001

The token 'xxx' has expired.

The token 'xxx' is invalid

Invalid parameters or call logic was used. This is a general-purpose client error code that usually indicates an incorrect token, such as an expired or invalid token.

Compare your code with the sample code in the official documentation to test and verify it.

40000002

Gateway:MESSAGE_INVALID:Can't process message in state'FAILED'!

The message is invalid or incorrect.

Compare your code with the sample code in the official documentation to test and verify it.

40000003

PARAMETER_INVALID

Failed to decode url params

The parameters passed by the user are incorrect. This error is common for RESTful API calls.

Compare your code with the sample code in the official documentation to test and verify it.

40000005

Gateway:TOO_MANY_REQUESTS:Too many requests!

Too many concurrent requests.

If you are using the Free Edition, you can upgrade to a commercial version to increase the concurrency.

If you are already using a commercial version, you can purchase a concurrency resource plan to increase your concurrency quota.

40000009

Invalid wav header!

The message header is invalid.

If you send a WAV audio file and set the format parameter to wav, check whether the WAV header of the audio file is correct. If the header is incorrect, the server may reject the request.

40000009

Too large wav header!

The WAV header of the transmitted audio is invalid.

You can send the audio stream in a format such as PCM or OPUS. If you use the WAV format, make sure that the WAV header of the audio file contains the correct data length.

40000010

Gateway:FREE_TRIAL_EXPIRED:The free trial has expired!

The trial period has ended, and the commercial version is not activated or your account has an overdue payment.

You can log on to the console to check the service activation status and your account balance.

40010001

Gateway:NAMESPACE_NOT_FOUND:RESTful url path illegal

The operation or parameter is not supported.

Check whether the parameters passed in the call are consistent with the requirements in the official documentation. You can compare them with the error message to identify and set the correct parameters.

For example, if you are using a curl command to make a RESTful API request, check whether the URL you constructed is valid.

40010003

Gateway:DIRECTIVE_INVALID:[xxx]

A general-purpose client-side error code.

This error indicates that the client passed an incorrect parameter or instruction. Detailed error messages are available for different operations. You can refer to the corresponding documentation to set the parameters correctly.

40010004

Gateway:CLIENT_DISCONNECT:Client disconnected before task finished!

The client actively terminated the connection before the request was processed.

No action is required. To avoid this error, close the connection only after the server responds.

40010005

Gateway:TASK_STATE_ERROR:Got stop directive while task is stopping!

The client sent a directive that is not supported in the current task state.

Compare your code with the sample code in the official documentation to test and verify it.

40020105

Meta:APPKEY_NOT_EXIST:Appkey not exist!

A non-existent Appkey was used.

Verify that the Appkey exists. You can log on to the console and view the project configuration to find the Appkey.

40020106

Meta:APPKEY_UID_MISMATCH:Appkey and user mismatch!

The Appkey and token passed in the call were not created under the same Alibaba Cloud account (UID), so they do not match.

Check whether you are using resources from two different accounts. Do not use an Appkey from Account A with a token generated from Account B.

403

Forbidden

The token is invalid. For example, the token does not exist or has expired.

Set a valid token. Tokens have an expiration period. You must obtain a new token before the current one expires.

41000003

MetaInfo doesn't have end point info

Failed to retrieve the routing information for this Appkey.

Check whether you are using resources from two different accounts. Do not use an Appkey from Account A with a token generated from Account B.

41010101

UNSUPPORTED_SAMPLE_RATE

The sample rate is not supported.

Real-time speech recognition currently supports only audio with a sample rate of 8000 Hz or 16000 Hz.

41040201

Realtime:GET_CLIENT_DATA_TIMEOUT:Client data does not send continuously!

Failed to retrieve data from the client due to a timeout.

When you call real-time speech recognition, the client must send data at a real-time rate and close the connection promptly after the data is sent.

50000000

GRPC_ERROR:Grpc error!

An exception caused by factors such as machine load or network issues. This error usually occurs randomly.

You can retry the call to resolve the issue.

50000001

GRPC_ERROR:Grpc error!

An exception caused by factors such as machine load or network issues. This error usually occurs randomly.

You can retry the call to resolve the issue.

52010001

GRPC_ERROR:Grpc error!

An exception caused by factors such as machine load or network issues. This error usually occurs randomly.

You can retry the call to resolve the issue.
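
The rows for 50000000, 50000001, and 52010001 above all describe random failures and recommend retrying the call. A minimal Python sketch of that pattern follows. It assumes a hypothetical SpeechServiceError exception that carries the status code; the SDKs may surface errors differently, and the attempt count and delays are illustrative values, not limits documented by the service.

```python
import time

# Status codes that the table above describes as random, load- or
# network-related failures whose listed solution is to retry the call.
RETRYABLE_STATUS_CODES = {50000000, 50000001, 52010001}


class SpeechServiceError(Exception):
    """Hypothetical exception carrying the status code of a failed call."""

    def __init__(self, status_code, message=""):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code


def call_with_retry(request, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run request() and retry only on the transient status codes.

    request is any zero-argument callable that raises SpeechServiceError
    on failure. max_attempts and base_delay are assumed, illustrative values.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except SpeechServiceError as err:
            transient = err.status_code in RETRYABLE_STATUS_CODES
            if not transient or attempt == max_attempts:
                # Client-side errors (4xxxxxxx) need a code or config fix,
                # so they are re-raised immediately.
                raise
            # Exponential backoff: base_delay, 2 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Client-side codes such as 40020105 are re-raised at once because a retry cannot fix a wrong Appkey or parameter.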

Speech synthesis/Long-text speech synthesis error codes

Status code

Status message

Cause

Solution

40000001

Gateway:ACCESS_DENIED:No privilege to this voice!

An incorrect voice (speaker) name was set.

Refer to the official documentation to set a valid voice.

40000004

Gateway:IDLE_TIMEOUT:Websocket session is idle for too long time,the last directive is 'StartSynthesis'!

After a connection is established, the server returns this error message if no data is sent for more than 10 seconds.

Close the connection promptly after the request is processed. This error may also occur if the server is under high instantaneous pressure and cannot return data in time. In this case, you can retry the request to resolve the issue.

40010003

Gateway:DIRECTIVE_INVALID:No text specified!

No valid text for synthesis was set.

You can refer to the sample code in the official documentation to set the text for synthesis.

41020001

Speech synthesis client error

Multiple error messages may be returned. Adjust your code based on the specific error message.

  • If the message "Engine return error code: 424." is returned, the background music or concatenated recording does not conform to the required format. You can set the correct background music as described in the documentation.

  • If the message "Engine return error code:418" is returned, an unsupported speaker name was passed.

  • If the message "Engine return error code: 413" is returned, the SSML format used is incorrect.

  • If the message "Request json illegal,failed to parse request." is returned, the passed JSON format is invalid.

  • If the message "SSML text length should be less than 300." is returned, the synthesis text is too long. You must use the long-text speech synthesis operation.

51020001

TTS:TtsServerError

An exception caused by factors such as machine load or network issues. This error usually occurs randomly.

You can retry the call to resolve the issue.
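
Because status code 41020001 covers several distinct messages, client code may want to turn the message text into a specific cause before logging or alerting. The Python sketch below does this for the messages listed above. The function name explain_client_error and the returned strings are illustrative, and the matching assumes the server returns the messages in the form shown in the table.

```python
import re

# Engine error codes and causes, as listed for status code 41020001 above.
ENGINE_ERROR_CAUSES = {
    424: "background music or concatenated recording has an invalid format",
    418: "unsupported speaker name",
    413: "incorrect SSML format",
}


def explain_client_error(message):
    """Map a 41020001 status message to a short cause description.

    Assumption: the message text matches the forms shown in the table;
    anything else is reported as unrecognized rather than guessed.
    """
    # Tolerate both "code: 424." and "code:418" spacing variants.
    match = re.search(r"Engine return error code:\s*(\d+)", message)
    if match:
        code = int(match.group(1))
        return ENGINE_ERROR_CAUSES.get(
            code, f"unrecognized engine error code {code}"
        )
    if "Request json illegal" in message:
        return "request JSON is invalid"
    if "SSML text length should be less than 300" in message:
        return "text too long; use long-text speech synthesis"
    return "unrecognized message"
```

For example, explain_client_error("Engine return error code:418") returns "unsupported speaker name".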