AI companionship

更新时间:
复制 MD 格式

Build AI companionship applications with Alibaba Cloud Real-time Conversational AI, supporting audio-only and avatar-based interactions.

Background

AI companionship products span role-playing, emotional chat, and psychological therapy. While most AI chat apps rely on offline text or voice in IM-style interfaces, multimodal models like GPT-4o are driving real-time voice and video interactions for more immersive virtual experiences.

Alibaba Cloud integrates third-party LLMs and TTS to deliver real-time interactive companionship with dynamic, evolving storylines. Users can both consume and create content for a personalized experience.

Options

Interaction modes

Real-time Conversational AI supports two interaction modes. Select a mode by specifying the call type when creating your agent, then integrate the corresponding SDK. Try the demo first. To integrate the service, see Quick start for audio/video calls.

Audio-only call

Avatar call

Example

555d2e763e3c49c23ac59cb7060d2a44

lQDPJxjZw5Ame9nNC6zNBaCw89zk0Od4uB8HWJitduNrAA_1440_2988

Interaction

  • User: Audio

  • AI companion: Audio

  • User: Audio

  • AI companion: Video

Cost

Low

Medium

Client SDKs

For more information about SDK integration, see Developer guide.

SDK

Description

Web SDK

Recommended

  • Desktop browsers, such as Chrome.

  • Mobile H5, such as Alipay H5, DingTalk H5, and WeChat mini program H5.

  • In-app WebView.

Note
  • Do not use native mobile browsers because some devices are not compatible with Web Real-Time Communication (WebRTC).

  • Native components of WeChat mini programs are not supported. Use WeChat mini program H5. For more information about integration, see Integrate the Web SDK in a WeChat mini program.

Android/iOS SDK

Recommended: Applications that run on the Android or iOS operating system.

Others

If you want to develop on a Windows or macOS desktop, search for the DingTalk group ID 106730016696 to join the group and contact us.

Basic features

Personalized calls

Customize each user's call experience by configuring call startup parameters at call initiation.

Setting

Description

Modifiable during call?

LLM prompt

Pass user-specific information in the initial prompt for a more personalized companionship experience.

Yes

ASR language

Set the speech recognition language (such as Chinese or English).

Yes

TTS voice

Set the AI's voice and timbre.

Yes

Avatar

If using a VideoAgent with multiple avatars, you can specify which one to use for the call.

No

Welcome message

Set a custom welcome message for each user, such as "Hi, Alice, it's great to see you again!"

No

Knowledge base

If you need a knowledge base, perform the following steps:

  1. Use Alibaba Cloud Model Studio to create an agent and publish it to Real-time Conversational AI. For more information about publishing an agent, see Publish a Real-time Conversational AI agent from Alibaba Cloud Model Studio.

  2. Set up the question library in Alibaba Cloud Model Studio. For more information about how to set up a question library, see Quick Start.

User information pass-through model

If multiple users are online during a call, the LLM must be able to accurately distinguish which user sent the current input. Real-time Conversational AI lets you pass information to the LLM. You can use this feature to pass custom information, such as a UserID, to the model. For more information, see Pass business parameters to an Alibaba Cloud Model Studio LLM.

Detect and handle user silence

You can listen for the intent_recognized parameter in the callback to obtain the time of each user utterance. For more information, see Agent callbacks. This lets you handle cases where a user is silent for a long time. Common handling methods are as follows:

Conversation archiving

Save audio data and text transcripts from companionship sessions. For instructions, see Data archiving.

Advanced features

Spoken language assessment (Per-sentence)

Real-time Conversational AI can record each user utterance as a separate audio file, saved in real time to your OSS bucket for pronunciation assessment.

Note

Real-time Conversational AI provides per-sentence audio recording but not the assessment feature itself. To configure per-sentence audio callbacks, see AI agent callbacks.