Custom Voice captures voice samples from a target speaker and uses the deep learning capabilities of large language models (LLMs) to replicate that individual's vocal characteristics, such as timbre, intonation, and rhythm, and to generate realistic personalized speech. This makes voice interactions in your calling system sound more natural and lifelike. This topic describes how to create and use a custom voice.
Creation process
If you use the Custom Voice feature, you are responsible for ensuring that you own or have legal rights to the voice materials you provide. As the data processor, you must comply with applicable laws and regulations, such as the Personal Information Protection Law, and obtain explicit consent from the individual whose voice was recorded. Alibaba Cloud acts as your agent and processes the voice data only according to your instructions, without infringing on any third party’s legitimate rights. This feature does not support using voices of celebrities. Alibaba Cloud is not liable for any losses arising from disputes over the rights to the voice or text you upload. All other matters not covered here are governed by the and the Artificial Intelligence Cloud Call Service terms referenced therein.
-
Log on to the Artificial Intelligence Cloud Call Service console.
-
In the left navigation pane, choose , then click Create Voice.
-
Follow the on-screen instructions to specify your audio file details, then click Create to complete the process.
Note
-
You are responsible for ownership and lawful usage rights of the voice you provide. Voices of celebrities are not supported.
-
Upload mono or stereo audio with a 16-bit depth, a sample rate above 16000 Hz, a duration of 10 to 20 seconds, and at least one continuous speech segment longer than 5 seconds.
-
Supported formats: WAV, MP3, M4A. File size must not exceed 10 MB.
In the form, enter a Voice Name (required), upload an Audio File (required), and set Smart Cloning to Yes or No (default is Yes; enabling it produces a more natural and fluent cloned voice). Optionally, add Notes (up to 200 characters).
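Before uploading, you may want to check a recording against the audio requirements in the note above. The following is a minimal sketch for WAV files only, using Python's standard `wave` module. The function name `check_wav` and the constants are hypothetical and not part of the service. The sketch also assumes that exactly 16000 Hz is acceptable and that 10 MB means 10 × 1024 × 1024 bytes; neither is confirmed by this topic. It does not check for a continuous speech segment longer than 5 seconds, and it does not handle MP3 or M4A files.

```python
import os
import wave

# Limits taken from the audio requirements in this topic.
MAX_BYTES = 10 * 1024 * 1024         # assumption: 10 MB = 10 * 1024 * 1024 bytes
MIN_RATE_HZ = 16000                  # assumption: exactly 16000 Hz is accepted
MIN_SECONDS, MAX_SECONDS = 10.0, 20.0


def check_wav(path):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []

    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 10 MB")

    # Read the WAV header fields needed for the remaining checks.
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()   # bytes per sample
        rate = wav.getframerate()
        frames = wav.getnframes()

    if channels not in (1, 2):
        problems.append(f"expected mono or stereo, got {channels} channels")
    if sample_width != 2:
        problems.append(f"expected 16-bit samples, got {sample_width * 8}-bit")
    if rate < MIN_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is below {MIN_RATE_HZ} Hz")

    duration = frames / rate
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(f"duration {duration:.1f} s is outside 10 to 20 s")

    return problems
```

Calling `check_wav("sample.wav")` returns an empty list when the header-level checks pass. The console remains the authority on whether a file is accepted.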
-
After creation, go to the Custom Voice page and click Preview.
-
If the result meets your requirements, go to the Custom Voice page and click Publish.
-
A dialog box appears with the message "After publishing, this voice can be associated with LLM applications." Click Publish to confirm.
Usage process
Prerequisites: You have created and published a custom voice.
-
Log on to the Artificial Intelligence Cloud Call Service console.
-
In the left navigation pane, choose .
-
On the LLM Application Management page, click Create LLM Application, or click Edit for an existing LLM application.
-
On the LLM application creation or edit page, click Select Calling Voice. In the dialog box that appears, choose Alibaba Cloud CosyVoice Custom Voice. From the Voice Style dropdown, select your created custom voice. Click OK to confirm.
In the calling voice configuration panel, you can also adjust the Speech Rate slider (–200 to 200) and the Volume slider (0 to 100), and choose whether to enable Audio Mixing and Background Sound. After configuring, enter text in the preview area and click Preview Audio to hear the result.
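If you keep calling voice settings in your own scripts or configuration files before entering them in the console, the slider ranges above can be enforced locally. The following is a minimal sketch; the class `CallingVoiceSettings` and its field names are hypothetical and do not correspond to any API of the service. Only the ranges (Speech Rate from -200 to 200, Volume from 0 to 100) come from this topic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallingVoiceSettings:
    """Hypothetical local record of the slider values described in this topic."""
    speech_rate: int   # Speech Rate slider: -200 to 200
    volume: int        # Volume slider: 0 to 100

    def __post_init__(self):
        # Reject values that the console sliders could not represent.
        if not -200 <= self.speech_rate <= 200:
            raise ValueError("speech_rate must be between -200 and 200")
        if not 0 <= self.volume <= 100:
            raise ValueError("volume must be between 0 and 100")
```

For example, `CallingVoiceSettings(speech_rate=50, volume=80)` is accepted, while `CallingVoiceSettings(speech_rate=250, volume=80)` raises `ValueError`.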