Quick start for interactive messaging-Intelligent Media Services(IMS)-阿里云帮助中心

Step 1: Create an interactive messaging workflow

Log in to the Intelligent Media Service (IMS) console and click Create Workflow Template.

Select the Messaging workflow type and configure the workflow nodes as needed.

Note

To enable speech recognition or text-to-speech capabilities, you must configure the following nodes:

Configure the Speech-to-text node to enable push-to-talk speech recognition.
Configure the Text-to-speech node to enable spoken responses.

Speech-to-text

Converts voice input into text. Supports multiple languages.

Available model versions include System Preset ASR (recommended for mixed Chinese-English scenarios due to higher accuracy), Qwen3-ASR-Realtime, Fun-ASR-Realtime, and NLS-ASR (recommended for ultra-low latency scenarios). The default silence time is 400 ms. Requirements for uploaded hotword and sensitive word files: TXT format, up to 500 words, up to 10 characters per word, up to 100 KB, and UTF-8 encoding without BOM.

Preset: The preset model allows you to select a language model, set the silence time, and configure custom hotwords.
- Model: Select a language model that fits your scenario.
- Silent Time: How long the agent waits for user input before timing out.
- Custom Hotword: Hotwords improve recognition accuracy for domain-specific vocabulary. For more information, see Speech recognition hotwords.
- Sensitive Words: After you configure sensitive words, the system automatically redacts any detected sensitive word with asterisks (*) in the client-side subtitles. For more information, see Custom sensitive words.
Third-party Plug-in: Currently, you can select iFLYTEK Speech Recognition. To obtain the required parameters, go to iFLYTEK Real-time Speech Dictation.

Text-to-speech

Converts text into speech so that users can hear the agent's response.

The configuration dialog also includes: Version selection (Text-to-Speech 2.0 and Text-to-Speech 1.0 (Legacy)), Voice selection (such as Yunfeng), Volume setting (range 0–100), and a Preview Content input area (up to 200 characters). Click Play to preview the speech.

You can choose a text-to-speech model that suits your application scenario, including Preset Template, Self-developed Template, Third-party Plug-in, or Alibaba Cloud Model Studio.

System Preset Template: Includes System Default TTS, CosyVoice, and Qwen3-TTS.
Self-developed Template: Integrate your own model via a standard protocol. TTS Standard Interface.
Third-party Plug-in: Only MiniMax Speech Model is supported. Use the latest version. MiniMax Speech Model.
Model Studio: For custom voice cloning, integrate with Alibaba Cloud Model Studio. Voice Cloning.

LLM

The LLM processes STT output to understand and generate natural language responses.

Configure the System Persona (role, objectives, capabilities, response requirements, and constraints; up to 3,072 characters) and Conversation Memory Rounds (0–30). More rounds retain more context but may increase processing time.

Supported LLM providers: Qwen (system preset), Alibaba Cloud Model Studio, Tongyi Xingchen, and OpenAI-compliant self-developed models.

Alibaba Cloud Model Studio

A platform for large model development and application building. Connect through the Model center or the Application center.

Model center: In the Model Marketplace, select a model and copy its code as the ModelId.
Application Center: Create an agent application in Alibaba Cloud Model Studio, then obtain the AppId.
Go to the Key Management page to create and copy an API key.

Note

Best practices for integrating Alibaba Cloud Model Studio and Real-time Conversational AI.

Tongyi Xingchen

Tongyi Xingchen enables deeply personalized agents with unique personas, combinable with avatar-based real-time voice interaction.

ModelId: Tongyi Xingchen offers five models: xingchen-lite, xingchen-base, xingchen-plus, xingchen-plus-v2, and xingchen-max.
API key: Go to the Tongyi Xingchen console to create and obtain an API key.

Self-developed model

Connect your self-developed large model using the OpenAI specification.

Enter the following parameters:

Parameter	Description	Example
ModelId	The model name. Maps to the `model` field in the OpenAI specification.	abc
API key	The API authentication credential. Maps to the `api_key` field in the OpenAI specification.	AUJH-pfnTNMPBm6iWXcJAcWsrscb5KYaLitQhHBLKrI
Model URL (HTTPS)	The service endpoint URL. Maps to the `base_url` field in the OpenAI specification.	http://www.abc.com

LLM Standard Interface.

Click Save to create the interactive messaging workflow.

Step 2: Create an interactive messaging agent

Log in to the Intelligent Media Service (IMS) console and click Create AI Agent.
Configure the basic information and bind an interactive messaging workflow.

Set Workflow Type to Interactive Messaging, select the target workflow from the Workflow ID drop-down list, and configure the Interactive Messaging Application.
Create an Interactive Messaging Application.

Note
An interactive messaging application serves as a communication bridge that enables the conversational features.

In the Create Application dialog box, select a Region (for example, China (Shanghai)), enter an Application Name (2 to 16 characters), set the Message Storage Duration (default is 30 days), enable the Callback Settings and Security Audit switches as needed, and then click Create.
Configure the Interactive Messaging Application and click Submit to create the interactive messaging agent.

Step 3: Test the agent

After creating the agent, test it by scanning the demo QR code.

Generate a demo QR code in the console.

In the left-side navigation pane, click AI Agents. Find the target agent and click Demo QR Code in the Actions column. Select an expiration time (1 hour, 7 hours, 24 hours, or 3 days) and click Generate.
Scan the QR code with DingTalk, WeChat, or a browser, or copy the demo URL into your browser to use the H5 demo.

In the Demo QR Code dialog, the APP QR code is on the left and the H5 QR code is on the right. The H5 Demo URL, Experience Token, and Expiration Time are displayed at the bottom.

Agent integration

Collect the following parameters for integration. For detailed instructions, see Integrate an interactive messaging agent.

Region ID: The region of your workflow and agent. Find your Region ID in the region selector at the top-left of the console.

Region name	Region ID
China (Hangzhou)	cn-hangzhou
China (Shanghai)	cn-shanghai
China (Beijing)	cn-beijing
China (Shenzhen)	cn-shenzhen
Singapore	ap-southeast-1

AppId and AppKey of the interactive messaging application:

On the Agent Management details page, you can find the associated AppId in the Interactive Messaging Application field within the Workflow Configuration section.

On the Application Management page of the ApsaraVideo Live console, click the target application to open its details panel. Find the AppId and AppKey fields and click the copy icon next to them to copy their values.
AccessKey ID and AccessKey secret: To obtain them, see Create an AccessKey pair.