SubmitAudioProduceJob


This operation converts text into speech and saves the result as a high-quality audio file.

Operation description

This is an asynchronous API. After you submit a job, you receive a job ID, and the job is processed in the background. You can get the result through a callback notification or by querying the job status with the GetSmartJobResult API.
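The polling approach can be sketched in Python as follows. This is a minimal sketch, not SDK code: `get_job_state` is a hypothetical callable that you supply. It is assumed to call GetSmartJobResult for the given job ID (for example, through an Alibaba Cloud SDK client) and return the State field. The terminal states Finished and Failed come from the State values listed under Response elements. The interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable


def wait_for_job(
    job_id: str,
    get_job_state: Callable[[str], str],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a job until it reaches a terminal state and return that state.

    get_job_state is a hypothetical callback: it is assumed to call
    GetSmartJobResult with job_id and return the State field.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_job_state(job_id)
        # Created and Executing mean the job is still in progress.
        if state in ("Finished", "Failed"):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} is still {state} after {timeout_s} s")
        sleep(interval_s)


if __name__ == "__main__":
    # Fake state sequence standing in for real GetSmartJobResult calls.
    fake_states = iter(["Created", "Executing", "Finished"])
    print(wait_for_job("demo-job", lambda _id: next(fake_states), sleep=lambda _s: None))
    # prints "Finished"
```

Passing `sleep` in as a parameter keeps the loop easy to test. If you configure a callback address through the NotifyAddress field in UserData, you can avoid polling altogether.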


RAM authorization

The following table lists the authorization information required to call this operation. You can use it to grant permissions in a Resource Access Management (RAM) policy. The columns of the table are described below:

  • Action: The action that you can specify in the Action element of a RAM policy statement to grant permission to perform the operation.

  • API: The API that you can call to perform the action.

  • Access level: The predefined level of access granted for each API. Valid values: create, list, get, update, and delete.

  • Resource type: The type of resource on which you can authorize the action. It indicates whether the action supports resource-level permissions. The specified resource must be compatible with the action; otherwise, the policy does not take effect.

    • For APIs with resource-level permissions, required resource types are marked with an asterisk (*). Specify the corresponding Alibaba Cloud Resource Name (ARN) in the Resource element of the policy.

    • For APIs without resource-level permissions, the resource type is shown as All Resources. Use an asterisk (*) in the Resource element of the policy.

  • Condition key: The condition keys defined by the service. Condition keys enable fine-grained control and can apply to actions alone or to actions on specific resources. In addition to service-specific condition keys, Alibaba Cloud provides a set of common condition keys that apply to all RAM-supported services.

  • Dependent action: The dependent actions required to run the action. To complete the action, the RAM user or the RAM role must have the permissions to perform all dependent actions.

  • Action: ice:SubmitAudioProduceJob

  • Access level: not specified

  • Resource type: *All Resource (Resource element: *)

  • Condition key: None

  • Dependent action: None
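As an illustration, a minimal custom RAM policy that grants this action might look like the following. It is built as a Python dict and serialized to JSON. Because the action supports only All Resource, the Resource element is "*". The Version value "1" and the Effect/Action/Resource statement layout follow the standard RAM policy syntax. No Condition element is included because no condition keys are listed for this action.

```python
import json

# Minimal RAM policy that allows a RAM user or role to submit audio production jobs.
# SubmitAudioProduceJob has no resource-level permissions, so Resource is "*".
policy = {
    "Version": "1",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "ice:SubmitAudioProduceJob",
            "Resource": "*",
        }
    ],
}

# Use the printed JSON as the policy document when you create a custom policy.
print(json.dumps(policy, indent=2))
```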

Request parameters

Parameter

Type

Required

Description

Example

EditingConfig

string

Yes

The audio production configuration:

  • voice: The voice (speaker) to use, for example, Siqi.

  • customizedVoice: The ID of a custom voice created through voice cloning.

  • format: The output file format. Supported formats: PCM, WAV, and MP3.

  • volume: The volume. The value ranges from 0 to 100. Default: 50.

  • speech_rate: The speech rate. The value ranges from -500 to 500. Default: 0.
    • Values of -500, 0, and 500 correspond to 0.5x, 1.0x, and 2.0x speed, respectively.

    • To convert a speed multiplier m to a speech_rate value, compute (1 - 1/m) / k, where k = 0.002 if m is less than 1 and k = 0.001 if m is greater than 1. For example:
      • 0.8x: (1 - 1/0.8) / 0.002 = -125.

      • 1.2x: (1 - 1/1.2) / 0.001 ≈ 166.67, which is truncated to 166.

  • pitch_rate: The pitch rate. The value ranges from -500 to 500. Default: 0.

Important: If you specify both voice and customizedVoice, customizedVoice takes precedence.

{"voice":"Siqi","format":"MP3","volume":50}

OutputConfig

string

Yes

The output location in Object Storage Service (OSS), specified as a JSON string that contains the bucket and object fields.

For example, to save the output audio to http://my_bucket.oss-cn-shanghai.aliyuncs.com/target_audio.mp3, set this parameter to { "bucket": "my_bucket", "object": "target_audio" }.

InputConfig

string

Yes

The text to synthesize. The maximum length is 10,000 characters. Supports SSML.

Test text

Title

string

No

The title of the job. If you do not provide a title, the system automatically generates one based on the current date.

  • Cannot exceed 128 bytes.

  • Must be UTF-8 encoded.


Description

string

No

The description of the job.

  • Cannot exceed 1,024 bytes.

  • Must be UTF-8 encoded.


UserData

string

No

Custom data in JSON format. Maximum length: 512 bytes. You can use the NotifyAddress field to specify a custom callback address.

{"NotifyAddress":"http://xx.xx.xxx"} or {"NotifyAddress":"https://xx.xx.xxx"} or {"NotifyAddress":"ice-callback-demo"}

Overwrite

boolean

No

Specifies whether to overwrite an existing OSS file.

true
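The following Python sketch assembles the request parameters described above. It illustrates the documented rules under stated assumptions and is not SDK code. The helper names are hypothetical. The speech_rate conversion applies the documented formula and truncates toward zero so that it matches the 1.2x to 166 example. The OutputConfig helper mirrors the example in which target_audio.mp3 maps to the object "target_audio". How the resulting dict is sent depends on the SDK or client you use.

```python
import json
from urllib.parse import urlparse


def speech_rate_from_multiplier(m: float) -> int:
    """Convert a speed multiplier (0.5x to 2.0x) into a speech_rate value.

    Applies (1 - 1/m) / k with k = 0.002 below 1.0x and k = 0.001 otherwise.
    Rounding to 6 decimals absorbs floating-point noise before truncating
    toward zero, which reproduces the documented 0.8x -> -125 and 1.2x -> 166.
    """
    if not 0.5 <= m <= 2.0:
        raise ValueError("speed multiplier must be between 0.5 and 2.0")
    k = 0.002 if m < 1 else 0.001
    return int(round((1 - 1 / m) / k, 6))


def build_editing_config(voice: str, fmt: str = "MP3", volume: int = 50,
                         speed: float = 1.0, pitch_rate: int = 0) -> str:
    """Serialize EditingConfig as the JSON string that the parameter expects."""
    if fmt not in ("PCM", "WAV", "MP3"):
        raise ValueError("format must be PCM, WAV, or MP3")
    if not 0 <= volume <= 100:
        raise ValueError("volume must be between 0 and 100")
    if not -500 <= pitch_rate <= 500:
        raise ValueError("pitch_rate must be between -500 and 500")
    return json.dumps({
        "voice": voice,
        "format": fmt,
        "volume": volume,
        "speech_rate": speech_rate_from_multiplier(speed),
        "pitch_rate": pitch_rate,
    })


def output_config_from_oss_url(url: str) -> str:
    """Derive OutputConfig from an OSS URL, following the documented example:
    the bucket is the first label of the host name, and the object key
    drops the file extension."""
    parsed = urlparse(url)
    bucket = parsed.hostname.split(".", 1)[0]
    directory, _, filename = parsed.path.lstrip("/").rpartition("/")
    stem = filename.rsplit(".", 1)[0]
    obj = f"{directory}/{stem}" if directory else stem
    return json.dumps({"bucket": bucket, "object": obj})


def check_utf8_bytes(text: str, max_bytes: int, name: str) -> str:
    """Enforce byte limits such as Title (128), Description (1,024), UserData (512)."""
    if len(text.encode("utf-8")) > max_bytes:
        raise ValueError(f"{name} exceeds {max_bytes} bytes in UTF-8")
    return text


if __name__ == "__main__":
    text = "Test text"
    if len(text) > 10_000:
        raise ValueError("InputConfig exceeds 10,000 characters")
    params = {
        "InputConfig": text,
        "EditingConfig": build_editing_config("Siqi", speed=1.2),
        "OutputConfig": output_config_from_oss_url(
            "http://my_bucket.oss-cn-shanghai.aliyuncs.com/target_audio.mp3"),
        # "demo title" is a hypothetical value; omit Title to get the date-based default.
        "Title": check_utf8_bytes("demo title", 128, "Title"),
        "UserData": check_utf8_bytes(
            json.dumps({"NotifyAddress": "https://xx.xx.xxx"}), 512, "UserData"),
        "Overwrite": True,
    }
    print(json.dumps(params, indent=2))
```

The helpers validate only the ranges stated in this table. The service may apply further checks, such as valid voice names, that are not reproduced here.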

Response elements

Element

Type

Description

Example

object

The response body.

RequestId

string

The request ID.

******11-DB8D-4A9A-875B-275798******

JobId

string

The job ID.

****20b48fb04483915d4f2cd8ac****

State

string

The job status.

  • Created

  • Executing

  • Finished

  • Failed

Created

MediaId

string

The ID of the output media asset.

****2bcbfcfa30fccb36f72dca22****

Use the job ID with the GetSmartJobResult API to query detailed information about a text-to-speech job. The following example shows a sample response from the GetSmartJobResult API for a successful job.

Note

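By default, the TTS engine splits the text into segments at punctuation marks such as commas and periods. Each segment appears as a separate entry, with its begin_time and end_time, in the AiResult field of the response.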

{
  "RequestId": "******2D-443C-5043-B0E4-867070******",
  "JobId": "******042d5e4db6866f6289d1******",
  "State": "Finished",
  "SmartJobInfo": {
    "Title": "default_title_2022-01-21T06:15:07Z",
    "JobType": "TextToSpeech",
    "CreateTime": "2022-01-21T06:15:07Z",
    "ModifiedTime": "2022-01-21T06:15:07Z",
    "InputConfig": {
      "InputFile": "Speaking of Guo Degang, he is extremely popular now. Tickets are often expensive but sell out instantly. He also participates in various crosstalk variety shows to comment on new performers."
    },
    "EditingConfig": "{\"format\":\"MP3\",\"pitch_rate\":0,\"sample_rate\":16000,\"speech_rate\":0,\"voice\":\"Siqi\",\"volume\":50}",
    "OutputConfig": {
      "Bucket": "your-bucket",
      "Object": "your-audio"
    }
  },
  "JobResult": {
    "MediaId": "******bf47c94e82b3b2014361******",
    "AiResult": "[{\"text\":\"Speaking of Guo Degang,\",\"begin_time\":0,\"end_time\":846},{\"text\":\"he is extremely popular now.\",\"begin_time\":846,\"end_time\":3386},{\"text\":\"Tickets are often expensive\",\"begin_time\":3386,\"end_time\":4402},{\"text\":\"but sell out instantly.\",\"begin_time\":4402,\"end_time\":6265},{\"text\":\"He also participates in various crosstalk variety shows to comment on new performers.\",\"begin_time\":6265,\"end_time\":10330}]"
  }
}
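In the GetSmartJobResult response, AiResult (like EditingConfig) is a JSON document encoded as a string, so it must be decoded a second time. The following minimal Python sketch assumes that the response has already been parsed into a dict. The helper name is hypothetical. begin_time and end_time are assumed to be in milliseconds, which fits the sample values.

```python
import json
from typing import List, Tuple


def parse_ai_result(job_result: dict) -> List[Tuple[str, int, int]]:
    """Return (text, begin_time, end_time) for each segment in JobResult.AiResult.

    AiResult is itself a JSON string, so it is decoded with a second json.loads.
    """
    segments = json.loads(job_result["AiResult"])
    return [(seg["text"], seg["begin_time"], seg["end_time"]) for seg in segments]


if __name__ == "__main__":
    # First two segments of the sample JobResult above.
    job_result = {
        "MediaId": "******bf47c94e82b3b2014361******",
        "AiResult": '[{"text":"Speaking of Guo Degang,","begin_time":0,"end_time":846},'
                    '{"text":"he is extremely popular now.","begin_time":846,"end_time":3386}]',
    }
    for text, begin, end in parse_ai_result(job_result):
        # Times are assumed to be milliseconds and are printed as seconds.
        print(f"{begin / 1000:.3f}s-{end / 1000:.3f}s  {text}")
```

Per-segment timings like these can be used, for example, to align captions with the generated audio.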

Examples

Success response

JSON format

{
  "RequestId": "******11-DB8D-4A9A-875B-275798******",
  "JobId": "****20b48fb04483915d4f2cd8ac****",
  "State": "Created",
  "MediaId": "****2bcbfcfa30fccb36f72dca22****"
}

Error codes

See Error Codes for a complete list.

Release notes

See Release Notes for a complete list.