Integration solution with UI

更新时间:
复制 MD 格式

The Real-time Conversational AI solution provides ready-to-use UI components that help you quickly build interactive AI applications.

Overview

Built on AICallKit SDK, this solution provides UI components for audio and video applications. You can reuse the functional modules in AUI Kits to quickly build real-time interactive AI applications, reducing development time and costs while ensuring app quality and stability. To integrate AUI Kits for Real-time Conversational AI, see the following topics:

For server-side development, see Server-side integration and API references.

Demo experience

For the demo, see Try the demo.

Features

Feature

Description

Real-time workflow

Build AI agent workflows using a visual, drag-and-drop interface.

  • Speech-to-text:

    • Built-in Tongyi Qwen capability.

    • Integrate iFLYTEK speech-to-text as a third-party plugin.

  • Text-to-speech:

    • Built-in Tongyi Qwen capability.

    • Connect your custom text-to-speech module using standard protocols.

    • Integrate MiniMax speech capabilities as a third-party plugin.

  • Large language model (text-to-text):

    • Built-in Tongyi Qwen capability.

    • Select AI models from the Model Hub or Application Center in Alibaba Cloud Model Studio.

    • Integrate your custom large language model using OpenAI standards.

  • Digital human

    • Integrate Xiangxin digital human capabilities as a third-party plugin.

  • Video frame extraction

  • Multimodal large language model:

    • Built-in Tongyi Qwen capability.

    • Integrate your custom multimodal large language model using OpenAI standards.

AI agent outbound calls

The AI agent dials users directly using carrier lines for telemarketing and notifications. Outbound and inbound phone calls Quick Start.

Custom AI agent appearance

Upload an image to represent your AI agent during voice calls.

AI agent emotion recognition

The AI agent detects the user’s current emotion and replies with emotional context.

Welcome message

Set a welcome message in the console. The AI agent speaks it when a conversation starts.

Proactive announcements

Your business server uses OpenAPI to trigger audio-video output from the AI agent.

Real-time captions

Display conversation text in real time on the end user’s interface.

Intelligent noise reduction

The AI agent filters background noise from the user side. When multiple people speak at once, it captures the loudest voice first.

Intelligent interruption detection

The AI agent detects when a user tries to interrupt during a conversation.

Intelligent Sentence Segmentation

The AI agent splits long or complex sentences automatically to improve readability.

Per-sentence audio callbacks

Configure audio callbacks in the console to store real-time audio data in OSS.

Walkie-talkie mode

Enable walkie-talkie mode at startup or during a call. Press a button to talk with the AI agent.

ASR hotwords

Define business-specific hotwords to improve speech recognition accuracy.

Voiceprint noise reduction

In group conversations, the AI agent identifies and preserves the main speaker’s voiceprint while reducing irrelevant noise.

Live agent takeover

When the AI agent cannot handle a request or a key decision is needed, transfer the conversation to a live agent.

Graceful shutdown

When your business server stops an AI agent, let it finish its current response before shutting down to avoid abrupt interruptions.

Data archiving

Convert AI agent conversations to text and retrieve them through APIs. Store audio-video recordings in OSS or ApsaraVideo VOD.