Audio and video calls


Integrate AI agents for audio and video calls by using the AICallKit SDK.

Overview

The AICallKit SDK provides low-code solutions to integrate AI agents with real-time audio and video capabilities, enabling enterprises to rapidly build AI agent communication into their applications.

Benefits

  • Rapid integration and development: The AICallKit SDK offers pre-built interfaces that allow developers to implement real-time conversational AI with minimal coding.

  • Cross-platform support: The AICallKit SDK is compatible with iOS, Android, and Web, providing unified APIs that ensure consistent functionality and user experience across platforms.

  • Rich features: Beyond basic call functionality, the AICallKit SDK supports agent status display, real-time subtitles, and intelligent interruption. These features can be configured as needed if you use the integration solution without UI.

Integration solutions

The AICallKit SDK offers two integration solutions:

  • Integration solution with UI: A low-code solution that includes UI components for audio and video applications. You can run the demo after minimal configuration and then integrate the UI components into your project.

  • Integration solution without UI: The AICallKit SDK encapsulates real-time conversational AI capabilities to reduce development workload for AI agents and real-time communication (RTC). This solution is ideal if you want to customize the UI without managing the underlying implementation.
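To make the split of responsibilities in the integration solution without UI more concrete, the following sketch shows a custom view that only reacts to events while a separate engine object owns the call. `FakeEngine`, `CallEngineListener`, `CustomCallView`, and every method name here are hypothetical placeholders invented for illustration. They are not the AICallKit API, so check the SDK reference for the actual class and callback names.

```typescript
// Hypothetical sketch: none of these names come from the AICallKit SDK.
type AgentState = "listening" | "thinking" | "speaking";
type Speaker = "user" | "agent";

// Callbacks that a custom UI would implement.
interface CallEngineListener {
  onAgentStateChanged(state: AgentState): void;
  onSubtitle(speaker: Speaker, text: string): void;
}

// Stand-in for an SDK engine that hides the agent and RTC details.
class FakeEngine {
  private listeners: CallEngineListener[] = [];

  addListener(listener: CallEngineListener): void {
    this.listeners.push(listener);
  }

  // The simulate* methods play the role of events arriving from the network.
  simulateAgentState(state: AgentState): void {
    for (const l of this.listeners) l.onAgentStateChanged(state);
  }

  simulateSubtitle(speaker: Speaker, text: string): void {
    for (const l of this.listeners) l.onSubtitle(speaker, text);
  }
}

// The custom UI layer: it keeps display state and nothing else.
class CustomCallView implements CallEngineListener {
  statusLabel = "Idle";
  transcript: string[] = [];

  onAgentStateChanged(state: AgentState): void {
    this.statusLabel = `Agent is ${state}`;
  }

  onSubtitle(speaker: Speaker, text: string): void {
    this.transcript.push(`${speaker}: ${text}`);
  }
}

const engine = new FakeEngine();
const view = new CustomCallView();
engine.addListener(view);
engine.simulateAgentState("speaking");
engine.simulateSubtitle("agent", "Hello, how can I help?");
```

In this arrangement the view never touches media streams or agent logic, which is the kind of separation the solution without UI is described as providing.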

AICallKit SDK features

| Feature | Description | iOS & Android | Web |
| --- | --- | --- | --- |
| Voice call | Users can talk with AI agents and receive instant feedback and services. | ✔️ | ✔️ |
| Avatar call | Users can make video calls with avatars for more realistic interactions. | ✔️ | ✔️ |
| Vision call | During video calls, the agent provides feedback based on the user's voice and camera feed. | ✔️ | ✔️ |
| Agent status | Displays the agent status, including listening, thinking, and speaking. | ✔️ | ✔️ |
| Real-time subtitles | Transcribes the dialogue between the agent and the user in real time and displays it on the client. | ✔️ | ✔️ |
| Manual interruption | Sends an instruction to the agent to stop it from speaking. | ✔️ | ✔️ |
| Intelligent interruption | The agent automatically detects the user's intent to interrupt the conversation. | ✔️ | ✔️ |
| Voice | Configures the agent voice. For supported voices, see Intelligent voice demos. | ✔️ | ✔️ |
| Push-to-talk mode | Users can switch to push-to-talk mode at the beginning of or during a call, and press the button to talk. | ✔️ | ✔️ |
| Voiceprint recognition | In multi-speaker scenarios, the agent identifies the voiceprint of the main speaker to accurately capture their speech and minimize background interference. | ✔️ | |
| Custom message | Sends custom messages through the RTC custom message channel. | ✔️ | ✔️ |
| Local device management | Users can turn off the speaker and mute the microphone during a call. | ✔️ | ✔️ |
| Callbacks | Retrieves information such as the main speaker's volume and network status through callbacks. | ✔️ | ✔️ |
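Several of the features above interact on the client: agent status, manual interruption, push-to-talk mode, and microphone muting. The sketch below models that interaction as a small state holder. It is a hypothetical illustration, not AICallKit code: the class `CallSessionModel`, its method names, and the assumption that an interrupted agent returns to the listening state are all invented for this example.

```typescript
// Hypothetical model; names and transitions are illustrative assumptions,
// not the AICallKit API.
type AgentStatus = "listening" | "thinking" | "speaking";

class CallSessionModel {
  private agentStatus: AgentStatus = "listening";
  private pushToTalk = false; // false means the microphone is always open
  private talkButtonDown = false;
  private micMuted = false;

  // Called when the agent reports a new status.
  onAgentStatus(status: AgentStatus): void {
    this.agentStatus = status;
  }

  getAgentStatus(): AgentStatus {
    return this.agentStatus;
  }

  // Push-to-talk can be switched on or off at any point in the call.
  setPushToTalk(enabled: boolean): void {
    this.pushToTalk = enabled;
    this.talkButtonDown = false;
  }

  pressTalkButton(): void {
    if (this.pushToTalk) this.talkButtonDown = true;
  }

  releaseTalkButton(): void {
    this.talkButtonDown = false;
  }

  setMicMuted(muted: boolean): void {
    this.micMuted = muted;
  }

  // Whether the user's audio should currently reach the agent.
  isUserAudioLive(): boolean {
    if (this.micMuted) return false;
    return this.pushToTalk ? this.talkButtonDown : true;
  }

  // Manual interruption only has an effect while the agent is speaking.
  // Assumption: the agent goes back to listening afterward.
  interruptAgent(): boolean {
    if (this.agentStatus !== "speaking") return false;
    this.agentStatus = "listening";
    return true;
  }
}

const session = new CallSessionModel();
session.setPushToTalk(true);
const liveBeforePress = session.isUserAudioLive();
session.pressTalkButton();
const liveWhilePressed = session.isUserAudioLive();
session.onAgentStatus("speaking");
const interrupted = session.interruptAgent();
```

Keeping these flags in one place makes it easier for a custom UI to decide, for example, when to enable an interrupt button or show a "hold to talk" prompt.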