Integrate AI agents for audio and video calls by using the AICallKit SDK.
Overview
The AICallKit SDK provides low-code solutions to integrate AI agents with real-time audio and video capabilities, enabling enterprises to rapidly build AI agent communication into their applications.
Benefits
- Rapid integration and development: The AICallKit SDK offers pre-built interfaces that allow developers to implement real-time conversational AI with minimal coding.
- Cross-platform support: The AICallKit SDK is compatible with iOS, Android, and Web, providing unified APIs that ensure consistent functionality and user experience across platforms.
- Rich features: Beyond basic call functionality, the AICallKit SDK supports agent status display, real-time subtitles, and intelligent interruption. These features can be configured as needed if you use the integration solution without UI.
Integration solutions
The AICallKit SDK offers two integration solutions:
- Integration solution with UI: A low-code solution that includes UI components for audio and video applications. You can run a demo with simple configurations and integrate the UI components into your project.
- Integration solution without UI: The AICallKit SDK encapsulates real-time conversational AI capabilities to reduce development workload for AI agents and real-time communication (RTC). This solution is ideal if you want to customize the UI without managing the underlying implementation.
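As a rough illustration of the split that the integration solution without UI implies, the sketch below keeps call logic behind a small interface and lets a custom UI subscribe to its events. All names here (`CallEngine`, `FakeCallEngine`, `CallEvent`, `bindStatusLabel`) are hypothetical and are not the AICallKit SDK API; the fake engine only stands in for whatever the SDK actually provides.

```typescript
// Hypothetical names throughout: this is NOT the AICallKit SDK API.
type CallEvent =
  | { kind: "connected" }
  | { kind: "ended"; reason: string };

type Listener = (event: CallEvent) => void;

// The custom UI depends only on this interface, never on RTC details.
interface CallEngine {
  start(): void;
  hangUp(): void;
  on(listener: Listener): () => void; // returns an unsubscribe function
}

// In-memory stand-in for an SDK-backed engine, for illustration only.
class FakeCallEngine implements CallEngine {
  private listeners: Listener[] = [];
  private active = false;

  start(): void {
    if (this.active) return; // ignore repeated starts
    this.active = true;
    this.emit({ kind: "connected" });
  }

  hangUp(): void {
    if (!this.active) return; // nothing to hang up
    this.active = false;
    this.emit({ kind: "ended", reason: "local-hangup" });
  }

  on(listener: Listener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  private emit(event: CallEvent): void {
    // Copy the array so listeners can unsubscribe while being notified.
    this.listeners.slice().forEach((l) => l(event));
  }
}

// A custom UI layer: it turns engine events into display text.
function bindStatusLabel(
  engine: CallEngine,
  render: (text: string) => void
): () => void {
  return engine.on((event) => {
    render(
      event.kind === "connected"
        ? "In call"
        : `Call ended (${event.reason})`
    );
  });
}
```

With this shape, the UI code can be exercised against the fake engine and later pointed at an adapter around the real SDK without changing the rendering logic.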
AICallKit SDK features
| Feature | Description | iOS & Android | Web |
| --- | --- | --- | --- |
| Voice call | Users can talk with AI agents and receive instant feedback and services. | ✔️ | ✔️ |
| Avatar call | Users can make video calls with avatars for more realistic interactions. | ✔️ | ✔️ |
| Vision call | During video calls, the agent provides feedback based on the user's voice and camera feed. | ✔️ | ✔️ |
| Agent status | Displays the agent status, including listening, thinking, and speaking. | ✔️ | ✔️ |
| Real-time subtitles | Transcribes the dialogue between the agent and the user in real time and displays it on the client. | ✔️ | ✔️ |
| Manual interruption | Sends an instruction to the agent to stop it from speaking. | ✔️ | ✔️ |
| Intelligent interruption | The agent automatically detects the user's intent to interrupt the conversation. | ✔️ | ✔️ |
| Voice | Configures the agent voice. For supported voices, see Intelligent voice demos. | ✔️ | ✔️ |
| Push-to-talk mode | Users can switch to push-to-talk mode at the beginning of or during a call, and press the button to talk. | ✔️ | ✔️ |
| Voiceprint recognition | In multi-speaker scenarios, the agent identifies the voiceprint of the main speaker to accurately capture their speech and minimize background interference. | ✔️ | ❌ |
| Custom message | Sends custom messages through the RTC custom message channel. | ✔️ | ✔️ |
| Local device management | Users can turn off the speaker and mute the microphone during a call. | ✔️ | ✔️ |
| Callbacks | Retrieves information such as the main speaker's volume and network status through callbacks. | ✔️ | ✔️ |
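
To make the agent status, real-time subtitles, and manual interruption features more concrete, here is a minimal sketch of the client-side state a custom UI might keep. The event shapes and names (`UiEvent`, `CallUiState`, `reduce`, `sentenceId`, `isFinal`) are assumptions made for illustration and do not reflect the SDK's actual callback signatures. The rule that an interruption returns the agent to listening is also an assumption.

```typescript
// Hypothetical event and state shapes; not the AICallKit SDK's callbacks.
type AgentStatus = "listening" | "thinking" | "speaking";
type Speaker = "user" | "agent";

interface SubtitleLine {
  sentenceId: number; // assumed identifier shared by partial and final results
  speaker: Speaker;
  text: string;
  isFinal: boolean;
}

type UiEvent =
  | { kind: "status"; status: AgentStatus }
  | { kind: "subtitle"; line: SubtitleLine }
  | { kind: "interrupt" }; // the user asked the agent to stop speaking

interface CallUiState {
  status: AgentStatus;
  subtitles: SubtitleLine[];
}

const initialState: CallUiState = { status: "listening", subtitles: [] };

// Pure reducer: returns a new state for each event and never mutates the old one.
function reduce(state: CallUiState, event: UiEvent): CallUiState {
  switch (event.kind) {
    case "status":
      return { ...state, status: event.status };
    case "subtitle": {
      const line = event.line;
      let replaced = false;
      // A later result for the same sentence and speaker replaces the earlier text.
      const subtitles = state.subtitles.map((existing) => {
        const same =
          existing.sentenceId === line.sentenceId &&
          existing.speaker === line.speaker;
        if (same) replaced = true;
        return same ? line : existing;
      });
      if (!replaced) subtitles.push(line);
      return { ...state, subtitles };
    }
    case "interrupt":
      // Assumption: an interrupted agent goes back to listening.
      return state.status === "speaking"
        ? { ...state, status: "listening" }
        : state;
  }
}
```

Keeping this logic in a pure function means the same reducer could back an iOS, Android, or Web view model, with only the rendering layer differing per platform.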