Audio and video calls - Intelligent Media Services (IMS) - Alibaba Cloud Help Center

Overview

The AICallKit SDK provides low-code solutions for integrating AI agents with real-time audio and video capabilities, so that enterprises can quickly add AI agent calls to their applications.

Benefits

Rapid integration and development: The AICallKit SDK offers pre-built interfaces that allow developers to implement real-time conversational AI with minimal coding.
Cross-platform support: The AICallKit SDK is compatible with iOS, Android, and Web, providing unified APIs that ensure consistent functionality and user experience across platforms.
Rich features: Beyond basic call functionality, the AICallKit SDK supports agent status display, real-time subtitles, and intelligent interruption. These features can be configured as needed if you use the integration solution without UI.

Integration solutions

The AICallKit SDK offers two integration solutions:

Integration solution with UI: A low-code solution that includes UI components for audio and video applications. You can run a demo with simple configurations and integrate the UI components into your project.
Integration solution without UI: The AICallKit SDK encapsulates the AI agent and real-time communication (RTC) capabilities behind real-time conversational AI, which reduces development workload. This solution is ideal if you want to build a custom UI without managing the underlying implementation.
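
To illustrate what the integration solution without UI leaves to your application, the following sketch binds a custom view to agent status changes (listening, thinking, and speaking, as listed in the features table). Every name in it (`AgentState`, `FakeCall`, `onAgentStateChanged`, `bindStatusLabel`) is hypothetical and chosen for illustration only; none is taken from the AICallKit SDK. The `FakeCall` class stands in for the SDK so that the example runs on its own.

```typescript
// Hypothetical types for illustration only; these are NOT AICallKit SDK APIs.
type AgentState = "listening" | "thinking" | "speaking";

type StateListener = (state: AgentState) => void;

// Stand-in for an SDK call object so the sketch runs without the SDK.
class FakeCall {
  private listeners: StateListener[] = [];

  onAgentStateChanged(listener: StateListener): void {
    this.listeners.push(listener);
  }

  // Simulates the SDK reporting a new agent state.
  emit(state: AgentState): void {
    for (const listener of this.listeners) {
      listener(state);
    }
  }
}

// Custom UI layer: maps each state to the text the app wants to display.
const STATUS_LABELS: Record<AgentState, string> = {
  listening: "Listening...",
  thinking: "Thinking...",
  speaking: "Speaking...",
};

// Subscribes a render function to state changes; the app owns all rendering.
function bindStatusLabel(call: FakeCall, render: (text: string) => void): void {
  call.onAgentStateChanged((state) => render(STATUS_LABELS[state]));
}

// Usage: collect rendered labels in an array instead of touching a real view.
const renderedLabels: string[] = [];
const demoCall = new FakeCall();
bindStatusLabel(demoCall, (text) => {
  renderedLabels.push(text);
});
demoCall.emit("listening");
demoCall.emit("thinking");
demoCall.emit("speaking");
console.log(renderedLabels.join(" | "));
```

In a real project, the render function would update your own view, and the subscription would be made through whatever status callback the SDK exposes on your platform.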

AICallKit SDK features

Feature	Description	iOS & Android	Web
Voice call	Users can talk with AI agents and receive instant feedback and services.	✔️	✔️
Avatar call	Users can make video calls with avatars for more realistic interactions.	✔️	✔️
Vision call	During video calls, the agent provides feedback based on the user's voice and camera feed.	✔️	✔️
Agent status	Displays the agent status, including listening, thinking, and speaking.	✔️	✔️
Real-time subtitles	Transcribes the dialogue between the agent and the user in real time and displays it on the client.	✔️	✔️
Manual interruption	Sends an instruction to the agent to stop it from speaking.	✔️	✔️
Intelligent interruption	The agent automatically detects the user's intent to interrupt the conversation.	✔️	✔️
Voice	Configures the agent voice. For supported voices, see Intelligent voice demos.	✔️	✔️
Push-to-talk mode	Users can switch to push-to-talk mode at the start of a call or during it, and then press a button to talk.	✔️	✔️
Voiceprint recognition	In multi-speaker scenarios, the agent identifies the voiceprint of the main speaker to accurately capture their speech and minimize background interference.	✔️	❌
Custom message	Sends custom messages through the RTC custom message channel.	✔️	✔️
Local device management	Users can turn off the speaker and mute the microphone during a call.	✔️	✔️
Callbacks	Retrieves information such as the main speaker's volume and network status through callbacks.	✔️	✔️
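
The table above can be encoded as a lookup so that a cross-platform application hides controls for features a platform does not support, such as voiceprint recognition on Web. This is a minimal sketch: the `Platform` and `Feature` types and the `isSupported` helper are hypothetical names, not part of the AICallKit SDK, and the data only mirrors the table.

```typescript
// Hypothetical names for illustration; the data mirrors the feature table above.
type Platform = "ios" | "android" | "web";

type Feature =
  | "voiceCall"
  | "avatarCall"
  | "visionCall"
  | "agentStatus"
  | "realtimeSubtitles"
  | "manualInterruption"
  | "intelligentInterruption"
  | "voice"
  | "pushToTalk"
  | "voiceprintRecognition"
  | "customMessage"
  | "localDeviceManagement"
  | "callbacks";

// Features the table marks as unavailable, listed per platform.
const UNSUPPORTED: Record<Platform, readonly Feature[]> = {
  ios: [],
  android: [],
  web: ["voiceprintRecognition"],
};

// Returns true unless the table marks the feature as unavailable on the platform.
function isSupported(platform: Platform, feature: Feature): boolean {
  return UNSUPPORTED[platform].indexOf(feature) === -1;
}

// Usage: decide whether to show a voiceprint toggle on each platform.
const platforms: Platform[] = ["ios", "android", "web"];
for (const platform of platforms) {
  console.log(platform, isSupported(platform, "voiceprintRecognition"));
}
```

Listing only the exceptions keeps the lookup short while most features are available everywhere; if the table changes, only the `UNSUPPORTED` entries need updating.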