Quick start for interactive messaging

更新时间:
复制 MD 格式

Create an interactive messaging agent to enable AI-powered conversations with speech recognition and text-to-speech capabilities.

Before you begin, ensure you meet the following requirements:

  • Intelligent Media Services (IMS) subscription is upgraded to IMS Enterprise Standard Edition or Ultimate Edition. To upgrade subscribed service, go to the IMS Subscription page.

  • Real-time Conversational AI is enabled. To enable the feature, go to the buy page.

Step 1: Create an interactive messaging workflow

  1. Log in to the Intelligent Media Service (IMS) console and click Create Workflow Template.

  2. Select the Messaging workflow type and configure the workflow nodes as needed.

    Note

    To enable speech recognition or text-to-speech capabilities, you must configure the following nodes:

    • Configure the Speech-to-text node to enable push-to-talk speech recognition.

    • Configure the Text-to-speech node to enable spoken responses.

    Speech-to-text

    Converts voice input into text. Supports multiple languages.

    Available model versions include System Preset ASR (recommended for mixed Chinese-English scenarios due to higher accuracy), Qwen3-ASR-Realtime, Fun-ASR-Realtime, and NLS-ASR (recommended for ultra-low latency scenarios). The default silence time is 400 ms. Requirements for uploaded hotword and sensitive word files: TXT format, up to 500 words, up to 10 characters per word, up to 100 KB, and UTF-8 encoding without BOM.

    • Preset: The preset model allows you to select a language model, set the silence time, and configure custom hotwords.

      • Model: Select a language model that fits your scenario.

      • Silent Time: How long the agent waits for user input before timing out.

      • Custom Hotword: Hotwords improve recognition accuracy for domain-specific vocabulary. For more information, see Speech recognition hotwords.

      • Sensitive Words: After you configure sensitive words, the system automatically redacts any detected sensitive word with asterisks (*) in the client-side subtitles. For more information, see Custom sensitive words.

    • Third-party Plug-in: Currently, you can select iFLYTEK Speech Recognition. To obtain the required parameters, go to iFLYTEK Real-time Speech Dictation.

    Text-to-speech

    Converts text into speech so that users can hear the agent's response.

    The configuration dialog also includes: Version selection (Text-to-Speech 2.0 and Text-to-Speech 1.0 (Legacy)), Voice selection (such as Yunfeng), Volume setting (range 0–100), and a Preview Content input area (up to 200 characters). Click Play to preview the speech.

    You can choose a text-to-speech model that suits your application scenario, including Preset Template, Self-developed Template, Third-party Plug-in, or Alibaba Cloud Model Studio.

    • System Preset Template: Includes System Default TTS, CosyVoice, and Qwen3-TTS.

    • Self-developed Template: Integrate your own model via a standard protocol. TTS Standard Interface.

    • Third-party Plug-in: Only MiniMax Speech Model is supported. Use the latest version. MiniMax Speech Model.

    • Model Studio: For custom voice cloning, integrate with Alibaba Cloud Model Studio. Voice Cloning.

    LLM

    The LLM processes STT output to understand and generate natural language responses.

    Configure the System Persona (role, objectives, capabilities, response requirements, and constraints; up to 3,072 characters) and Conversation Memory Rounds (0–30). More rounds retain more context but may increase processing time.

    Supported LLM providers: Qwen (system preset), Alibaba Cloud Model Studio, Tongyi Xingchen, and OpenAI-compliant self-developed models.

    Alibaba Cloud Model Studio

    A platform for large model development and application building. Connect through the Model center or the Application center.

    Tongyi Xingchen

    Tongyi Xingchen enables deeply personalized agents with unique personas, combinable with avatar-based real-time voice interaction.

    • ModelId: Tongyi Xingchen offers five models: xingchen-lite, xingchen-base, xingchen-plus, xingchen-plus-v2, and xingchen-max.

    • API key: Go to the Tongyi Xingchen console to create and obtain an API key.

    Self-developed model

    Connect your self-developed large model using the OpenAI specification.

    Enter the following parameters:

    Parameter

    Description

    Example

    ModelId

    The model name. Maps to the model field in the OpenAI specification.

    abc

    API key

    The API authentication credential. Maps to the api_key field in the OpenAI specification.

    AUJH-pfnTNMPBm6iWXcJAcWsrscb5KYaLitQhHBLKrI

    Model URL (HTTPS)

    The service endpoint URL. Maps to the base_url field in the OpenAI specification.

    http://www.abc.com

    LLM Standard Interface.

  3. Click Save to create the interactive messaging workflow.

Step 2: Create an interactive messaging agent

  1. Log in to the Intelligent Media Service (IMS) console and click Create AI Agent.

  2. Configure the basic information and bind an interactive messaging workflow.

    Set Workflow Type to Interactive Messaging, select the target workflow from the Workflow ID drop-down list, and configure the Interactive Messaging Application.

  3. Create an Interactive Messaging Application.

    Note

    An interactive messaging application serves as a communication bridge that enables the conversational features.

    In the Create Application dialog box, select a Region (for example, China (Shanghai)), enter an Application Name (2 to 16 characters), set the Message Storage Duration (default is 30 days), enable the Callback Settings and Security Audit switches as needed, and then click Create.

  4. Configure the Interactive Messaging Application and click Submit to create the interactive messaging agent.

Step 3: Test the agent

After creating the agent, test it by scanning the demo QR code.

  1. Generate a demo QR code in the console.

    In the left-side navigation pane, click AI Agents. Find the target agent and click Demo QR Code in the Actions column. Select an expiration time (1 hour, 7 hours, 24 hours, or 3 days) and click Generate.

  2. Scan the QR code with DingTalk, WeChat, or a browser, or copy the demo URL into your browser to use the H5 demo.

    In the Demo QR Code dialog, the APP QR code is on the left and the H5 QR code is on the right. The H5 Demo URL, Experience Token, and Expiration Time are displayed at the bottom.

Agent integration

Collect the following parameters for integration. For detailed instructions, see Integrate an interactive messaging agent.

  • Region ID: The region of your workflow and agent. Find your Region ID in the region selector at the top-left of the console.

    Region name

    Region ID

    China (Hangzhou)

    cn-hangzhou

    China (Shanghai)

    cn-shanghai

    China (Beijing)

    cn-beijing

    China (Shenzhen)

    cn-shenzhen

    Singapore

    ap-southeast-1

  • AppId and AppKey of the interactive messaging application:

    On the Agent Management details page, you can find the associated AppId in the Interactive Messaging Application field within the Workflow Configuration section.

    On the Application Management page of the ApsaraVideo Live console, click the target application to open its details panel. Find the AppId and AppKey fields and click the copy icon next to them to copy their values.

  • AccessKey ID and AccessKey secret: To obtain them, see Create an AccessKey pair.