Access TTS models

Integrate your own TTS model into a real-time workflow by deploying an HTTPS streaming endpoint that accepts text and returns audio.

How it works

  1. Configure the TTS node in the console with your HTTPS URL, token, and sample rate.

  2. Start the real-time workflow. For each TTS request, the workflow sends a POST with the text, voice, sample rate, and session metadata to your endpoint.

  3. Your TTS server generates audio and streams it back as an HTTP response.

  4. The workflow forwards audio chunks to downstream nodes in real time.
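To exercise your endpoint before wiring it into the console, you can imitate step 2 yourself. The following is a minimal sketch using only the Python standard library. The helper names `build_tts_request` and `stream_to_file` are hypothetical, the `Content-Type: application/json` header is an assumption about what the workflow sends, and the body fields are the ones listed under Request parameters.

```python
import json
import urllib.request


def build_tts_request(url, text, sample_rate, token=None, voice_id=None, extend_data=None):
    """Build a POST request shaped like the one the workflow sends."""
    body = {
        "Text": text,
        "SampleRate": sample_rate,
        # ExtendData is a JSON string, not a nested object.
        "ExtendData": json.dumps(extend_data or {}),
    }
    if token is not None:
        body["Token"] = token
    if voice_id is not None:
        body["VoiceId"] = voice_id
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def stream_to_file(request, out_path, chunk_size=4096):
    """Send the request and write the streamed audio to out_path as it arrives."""
    total = 0
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            total += len(chunk)
    return total
```

For example, `stream_to_file(build_tts_request(url, "Hello", 48000), "out.pcm")` would save the returned audio for inspection, where `url` is your own endpoint.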

Prerequisites

Before you begin:

  • Your TTS model is accessible over the Internet via HTTPS.

  • Your HTTP server supports streaming responses.

  • You have created a workflow template that contains a TTS node.

Configure the TTS node

Configure these parameters in the TTS node:

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| Request URL | String | Yes | HTTPS URL of your TTS model endpoint. | https://www.abc.com |
| Token | String | No | Authorization token sent with each request. | AUJH-pfnTNMPBm6iWXcJAcWsrscb5KYaLitQhHBLKrI |
| Sample rate | Integer | Yes | Audio sample rate in Hz. Valid values: 8000, 16000, 24000, 48000. | 48000 |

Note

Only mono S16LE (signed 16-bit little-endian) PCM audio is supported. If your model outputs a different sample format or more than one channel, convert its output to mono S16LE before streaming it.
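As an illustration of what the conversion to mono S16LE involves, here is a minimal sketch that assumes your model emits floating-point samples in the range [-1.0, 1.0]. The function names `float_to_s16le` and `stereo_to_mono` are hypothetical, and a production server would more likely do this with a numeric library.

```python
import struct


def float_to_s16le(samples):
    """Convert float samples in [-1.0, 1.0] to signed 16-bit little-endian bytes."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip out-of-range values
        ints.append(int(round(s * 32767)))
    # "<" selects little-endian, "h" is a signed 16-bit integer.
    return struct.pack("<%dh" % len(ints), *ints)


def stereo_to_mono(left, right):
    """Downmix two float channels to one by averaging them sample by sample."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]
```

Each mono sample occupies 2 bytes, so one second of audio at 48000 Hz is 96000 bytes.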

Request parameters

At runtime, the workflow sends a POST request to your endpoint with the following JSON body:

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| Text | String | Yes | Text to synthesize into speech. | Hello |
| VoiceId | String | No | Voice identifier for the TTS model. | <your-voice-id> |
| SampleRate | Integer | Yes | Sample rate in Hz. Matches the value configured in the console. | 48000 |
| Token | String | No | Authorization token. Matches the value configured in the console. | AUJH-pfnTNMPBm6iWXcJAcWsrscb5KYaLitQhHBLKrI |
| ExtendData | String | Yes | JSON string with session metadata and custom business data. | See ExtendData fields. |
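A server may want to check the incoming body against this table before running inference. The sketch below is one possible way to do so. The function name `validate_tts_request` is hypothetical, and the set of accepted sample rates is taken from the console's valid values.

```python
VALID_SAMPLE_RATES = {8000, 16000, 24000, 48000}
REQUIRED_FIELDS = ("Text", "SampleRate", "ExtendData")


def validate_tts_request(data):
    """Return a list of problems found in the parsed request body (empty if none)."""
    problems = [f"missing required field: {name}"
                for name in REQUIRED_FIELDS if name not in data]
    rate = data.get("SampleRate")
    if rate is not None and (not isinstance(rate, int) or rate not in VALID_SAMPLE_RATES):
        problems.append(f"unsupported sample rate: {rate}")
    if "Text" in data and not isinstance(data["Text"], str):
        problems.append("Text must be a string")
    return problems
```

How the workflow reacts to a non-200 response is not specified here, so treat any error status you return for a failed check as an assumption to verify against your own setup.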

ExtendData fields

ExtendData contains the following fields:

| Field | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| InstanceId | String | Yes | ID of the intelligent agent instance. | 68e00b6640e*****3e943332fee7 |
| ChannelId | String | Yes | ID of the communication channel. | 123 |
| SentenceId | Integer | Yes | Q&A session ID. All responses to a single user inquiry share the same SentenceId. | 3 |
| Emotion | String | No | Emotion for the synthesized speech. Valid values: neutral, happy, sad. If omitted, no emotion is applied. | happy |
| UserData | String | No | Custom business data passed at intelligent agent instance startup. | {"aaaa":"bbbb"} |

Example ExtendData value:

{
  "InstanceId": "68e00b6640e*****3e943332fee7",
  "ChannelId": "123",
  "SentenceId": 3,
  "Emotion": "happy",
  "UserData": "{\"aaaa\":\"bbbb\"}"
}
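Because ExtendData arrives as a JSON string, and UserData inside it is itself a string, reading the custom data takes two decoding steps. The following sketch shows one way to unpack it. The function name `parse_extend_data` is hypothetical, and treating UserData as JSON is an assumption that holds only if you passed JSON at instance startup.

```python
import json


def parse_extend_data(raw):
    """Decode the ExtendData JSON string; also decode UserData if it is JSON."""
    extend = json.loads(raw) if raw else {}
    user_data = extend.get("UserData")
    if isinstance(user_data, str):
        try:
            extend["UserData"] = json.loads(user_data)
        except json.JSONDecodeError:
            pass  # keep UserData as the original string if it is not JSON
    return extend
```

In the sample server, this would be applied to the value read from the `ExtendData` key of the request body.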

Response requirements

Your TTS server must return audio that matches the requested voice and sample rate in a streaming HTTP response. Write each audio chunk to the response as soon as it is generated: lower streaming latency directly reduces the end-to-end latency of the workflow.
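Chunk size is one factor in streaming latency, because a chunk cannot be sent until all of its samples exist. As a rough sketch of the arithmetic for mono S16LE audio (2 bytes per sample), the helpers below compute how many bytes cover a given duration and split a buffer accordingly. The names `chunk_size_bytes` and `split_pcm` are hypothetical, and the 20 ms default is an arbitrary illustrative value, not a platform requirement.

```python
BYTES_PER_SAMPLE = 2  # mono S16LE: one channel, 16 bits per sample


def chunk_size_bytes(sample_rate, chunk_ms):
    """Number of bytes that hold chunk_ms milliseconds of mono S16LE audio."""
    return sample_rate * BYTES_PER_SAMPLE * chunk_ms // 1000


def split_pcm(pcm, sample_rate, chunk_ms=20):
    """Yield successive chunks of pcm, each at most chunk_ms long (chunk_ms > 0)."""
    size = chunk_size_bytes(sample_rate, chunk_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

By the same arithmetic, a 4096-byte chunk at 48000 Hz holds 2048 samples, or about 42.7 ms of audio.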

Sample TTS server

Python

The following sample uses aiohttp to implement a streaming TTS server that exposes the /stream-audio endpoint.

from aiohttp import web


async def stream_audio(request):
    data = await request.json()
    text = data.get('Text', "")
    token = data.get('Token', None)
    sample_rate = data.get('SampleRate', 48000)
    extend_data = data.get('ExtendData', "")
    print(f"text:{text}, token:{token}, sample_rate:{sample_rate}, extend_data:{extend_data}")
    # Validate the token here.

    response = web.StreamResponse(
        status=200,
        reason='OK',
        headers={'Content-Type': 'audio/mpeg'}
    )

    # Start the streaming response.
    await response.prepare(request)

    # generate_tts_data is an async generator that yields audio chunks.
    async for chunk in generate_tts_data(text, sample_rate):
        await response.write(chunk)

    # Signal the end of the response.
    await response.write_eof()

    return response


async def generate_tts_data(text: str, sample_rate: int):
    # Replace this with your TTS model inference logic.
    # This example reads audio data from a local PCM file.
    file_path = '/your_dir/sample.pcm'
    with open(file_path, 'rb') as f:
        while True:
            chunk = f.read(4096)  # Read 4 KB per chunk.
            if not chunk:
                break
            yield chunk

app = web.Application()
app.add_routes([web.post('/stream-audio', stream_audio)])

if __name__ == '__main__':
    web.run_app(app)

In production, replace generate_tts_data with your model's inference logic. To run the sample as is, point /your_dir/sample.pcm at a local mono S16LE PCM file recorded at the configured sample rate.
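The sample leaves token validation as a placeholder comment. One possible way to fill it in, assuming the expected token is the value you configured in the console and that your server stores it securely, is a constant-time comparison. The function name `token_is_valid` is hypothetical.

```python
import hmac


def token_is_valid(received, expected):
    """Compare the request's Token with the configured one in constant time."""
    if not isinstance(received, str) or not isinstance(expected, str):
        return False
    return hmac.compare_digest(received.encode("utf-8"), expected.encode("utf-8"))
```

In stream_audio, a failed check could return an error response before response.prepare is called. Which status code the workflow expects in that case is not documented here, so that choice is an assumption.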

References