Integration solution with UI - Intelligent Media Services (IMS) - Alibaba Cloud Help Center

The Real-time Conversational AI solution provides ready-to-use UI components that help you quickly build interactive AI applications.

Overview

Built on AICallKit SDK, this solution provides UI components for audio and video applications. You can reuse the functional modules in AUI Kits to quickly build real-time interactive AI applications, reducing development time and costs while ensuring app quality and stability. To integrate AUI Kits for Real-time Conversational AI, see the following topics:

For server-side development, see Server-side integration and API references.

Demo experience

For the demo, see Try the demo.

Features

Feature	Description
Real-time workflow	Build AI agent workflows using a visual, drag-and-drop interface. Speech-to-text: built-in Tongyi Qwen capability; iFLYTEK speech-to-text as a third-party plugin. Text-to-speech: built-in Tongyi Qwen capability; custom text-to-speech modules connected through standard protocols; MiniMax speech capabilities as a third-party plugin. Large language model (text-to-text): built-in Tongyi Qwen capability; AI models from the Model Hub or Application Center in Alibaba Cloud Model Studio; custom large language models integrated using OpenAI standards. Digital human: Xiangxin digital human capabilities as a third-party plugin. Video frame extraction. Multimodal large language model: built-in Tongyi Qwen capability; custom multimodal large language models integrated using OpenAI standards.
AI agent outbound calls	The AI agent dials users directly using carrier lines for telemarketing and notifications. For details, see Outbound and inbound phone calls Quick Start.
Custom AI agent appearance	Upload an image to represent your AI agent during voice calls.
AI agent emotion recognition	The AI agent detects the user’s current emotion and replies with emotional context.
Welcome message	Set a welcome message in the console. The AI agent speaks it when a conversation starts.
Proactive announcements	Your business server uses OpenAPI to trigger audio-video output from the AI agent.
Real-time captions	Display conversation text in real time on the end user’s interface.
Intelligent noise reduction	The AI agent filters background noise from the user side. When multiple people speak at once, it captures the loudest voice first.
Intelligent interruption detection	The AI agent detects when a user tries to interrupt during a conversation.
Intelligent sentence segmentation	The AI agent splits long or complex sentences automatically to improve readability.
Per-sentence audio callbacks	Configure audio callbacks in the console to store real-time audio data in OSS.
Walkie-talkie mode	Enable walkie-talkie mode at startup or during a call. Press a button to talk with the AI agent.
ASR hotwords	Define business-specific hotwords to improve speech recognition accuracy.
Voiceprint noise reduction	In group conversations, the AI agent identifies and preserves the main speaker’s voiceprint while reducing irrelevant noise.
Live agent takeover	When the AI agent cannot handle a request or a key decision is needed, transfer the conversation to a live agent.
Graceful shutdown	When your business server stops an AI agent, let it finish its current response before shutting down to avoid abrupt interruptions.
Data archiving	Convert AI agent conversations to text and retrieve them through APIs. Store audio-video recordings in OSS or ApsaraVideo VOD.
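
The graceful shutdown behavior described in the table can be illustrated with a small, self-contained model. This is only a sketch of the idea, not the AICallKit SDK or the IMS OpenAPI: the `AgentSession` class, its fields, and the `run_demo` helper are hypothetical names introduced for illustration. It contrasts a graceful stop, where the agent finishes the response it is currently speaking, with an abrupt stop, where unspoken sentences are discarded.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AgentSession:
    """Hypothetical stand-in for a running AI agent; not an AICallKit type."""

    pending_sentences: list[str] = field(default_factory=list)
    spoken: list[str] = field(default_factory=list)
    running: bool = True

    def speak_next(self) -> None:
        # Play one queued sentence of the current response, if any.
        if self.running and self.pending_sentences:
            self.spoken.append(self.pending_sentences.pop(0))

    def stop(self, graceful: bool) -> None:
        if graceful:
            # Graceful: drain the rest of the current response first.
            while self.pending_sentences:
                self.speak_next()
        else:
            # Abrupt: discard whatever has not been spoken yet.
            self.pending_sentences.clear()
        self.running = False


def run_demo(graceful: bool) -> list[str]:
    """Stop an agent mid-response and return what the user actually heard."""
    session = AgentSession(
        pending_sentences=["Your order shipped.", "It arrives Friday."]
    )
    session.speak_next()  # the agent is mid-response when the stop arrives
    session.stop(graceful=graceful)
    return session.spoken


if __name__ == "__main__":
    print("graceful:", run_demo(graceful=True))
    print("abrupt:  ", run_demo(graceful=False))
```

In this model the graceful path lets the user hear both sentences, while the abrupt path cuts off after the first. A real integration would rely on the stop options exposed by the service rather than a local queue.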