This topic describes how to integrate an audio and video intelligent agent into your Android app using the AICallKit SDK.
Prerequisites
Android Studio 4.1.3 or later
Gradle 7.0.2 or later
JDK 11, which is bundled with Android Studio
Workflow
Your app obtains an RTC token from your app server and then calls call(rtcToken) to start a call. During the call, you can use the AICallKit API to implement interactive features such as subtitles and interrupting the agent.
AICallKit relies on real-time audio and video capabilities and already includes the functionality of the AliVCSDK_ARTC SDK. If your business scenario also requires live streaming and on-demand video, you can use an ApsaraVideo for MediaBox SDK, such as AliVCSDK_Standard or AliVCSDK_InteractiveLive. To learn how to combine the SDKs, see Select and download an SDK.
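As a rough illustration of the first workflow step (obtaining the RTC token from your app server), the sketch below requests a token over HTTP. The /rtc-token path, the user_id query parameter, and the rtc_token response field are all hypothetical; substitute whatever contract your own app server exposes. A production app would normally use a real JSON parser rather than the regular expression used here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: the endpoint path, query parameter, and JSON field name are assumptions.
final class RtcTokenClient {
    // Matches a hypothetical response body such as {"rtc_token":"<value>"}.
    private static final Pattern TOKEN_FIELD =
            Pattern.compile("\"rtc_token\"\\s*:\\s*\"([^\"]+)\"");

    // Builds the request URL for the hypothetical token endpoint.
    static String buildUrl(String baseUrl, String userId) {
        try {
            return baseUrl + "/rtc-token?user_id=" + URLEncoder.encode(userId, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Extracts the token value from the response body.
    static String extractToken(String responseBody) {
        Matcher matcher = TOKEN_FIELD.matcher(responseBody);
        if (!matcher.find()) {
            throw new IllegalArgumentException("No rtc_token field in response");
        }
        return matcher.group(1);
    }

    // Performs a blocking GET request and returns the token.
    static String fetchToken(String baseUrl, String userId) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(buildUrl(baseUrl, userId)).openConnection();
        try {
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Token request failed: HTTP " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
                return extractToken(out.toString("UTF-8"));
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Because fetchToken blocks, call it from a background thread on Android and pass the result to call(rtcToken) once it returns.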
Integrate the SDK
Add the Alibaba Cloud Maven repository to your project-level build.gradle file.
allprojects {
    repositories {
        google()
        jcenter()
        maven { url 'https://maven.aliyun.com/repository/google' }
        maven { url 'https://maven.aliyun.com/repository/public' }
    }
}
In your app-level build.gradle file, add the ARTCAICallKit dependency.
dependencies {
    implementation 'com.aliyun.aio:AliVCSDK_ARTC:x.x.x' // Replace x.x.x with a version that is compatible with your project.
    implementation 'com.aliyun.auikits.android:ARTCAICallKit:x.x.x'
    implementation 'com.alivc.live.component:PluginAEC:2.0.0'
}
Note
Latest ARTC SDK version: 7.10.0.
Latest AICallKit SDK version: 2.11.0.
SDK development
Step 1: Request audio and video permissions
Check if your app has microphone and camera permissions. If not, prompt the user to grant them. You must implement this permission request logic in your app. For a code example, see PermissionUtils.java.
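The check itself can be sketched in plain Java as below. CallPermissions and its methods are hypothetical helpers, not part of AICallKit; the two strings are the values of Android's Manifest.permission.RECORD_AUDIO and Manifest.permission.CAMERA constants. The grant lookup is passed in as a predicate so the logic stays independent of any Android class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper: works out which call permissions still need to be requested.
final class CallPermissions {
    // String values of Manifest.permission.RECORD_AUDIO and Manifest.permission.CAMERA.
    static final String RECORD_AUDIO = "android.permission.RECORD_AUDIO";
    static final String CAMERA = "android.permission.CAMERA";

    static final List<String> REQUIRED =
            Collections.unmodifiableList(Arrays.asList(RECORD_AUDIO, CAMERA));

    // Returns the required permissions for which isGranted reports false.
    static List<String> missing(Predicate<String> isGranted) {
        List<String> result = new ArrayList<>();
        for (String permission : REQUIRED) {
            if (!isGranted.test(permission)) {
                result.add(permission);
            }
        }
        return result;
    }
}
```

In an app, the predicate could wrap ContextCompat.checkSelfPermission(context, permission) == PackageManager.PERMISSION_GRANTED. If the returned list is empty, no prompt is needed.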
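PermissionX.init(this)
    .permissions(PermissionUtils.getPermissions())
    .request((allGranted, grantedList, deniedList) -> {
        // Continue only if allGranted is true; otherwise, tell the user which permissions in deniedList are needed.
    });
Step 2: Create and initialize the engine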
Create and initialize the ARTCAICallEngine. The following code shows an example:
String userId = "123"; // We recommend using the user ID from your app's login.
ARTCAICallEngineImpl engine = new ARTCAICallEngineImpl(this, userId);
// R.id.avatar_layer and R.id.preview_layer below are hypothetical view IDs; use the containers from your own layout.
// If the agent is a digital human (AvatarAgent), configure its view container.
if (aiAgentType == AvatarAgent) {
    ViewGroup avatarlayer = findViewById(R.id.avatar_layer);
    engine.setAgentView(
            avatarlayer,
            new ViewGroup.LayoutParams(ViewGroup.LayoutParams.MATCH_PARENT,
                    ViewGroup.LayoutParams.MATCH_PARENT)
    );
}
// If the agent provides visual understanding (VisionAgent), configure the local video preview container.
else if (aiAgentType == VisionAgent) {
    ViewGroup previewLayer = findViewById(R.id.preview_layer);
    engine.setLocalView(previewLayer,
            new FrameLayout.LayoutParams(ViewGroup.LayoutParams.MATCH_PARENT,
                    ViewGroup.LayoutParams.MATCH_PARENT)
    );
}
// If the agent is a video call agent (VideoAgent), configure both the agent view and the local preview.
else if (aiAgentType == VideoAgent) {
    ARTCAICallEngine.ARTCAICallVideoCanvas remoteCanvas = new ARTCAICallEngine.ARTCAICallVideoCanvas();
    remoteCanvas.zOrderOnTop = false;
    remoteCanvas.zOrderMediaOverlay = false;
    ViewGroup avatarlayer = findViewById(R.id.avatar_layer);
    engine.setAgentView(
            avatarlayer,
            new ViewGroup.LayoutParams(ViewGroup.LayoutParams.MATCH_PARENT,
                    ViewGroup.LayoutParams.MATCH_PARENT), remoteCanvas
    );
    ViewGroup previewLayer = findViewById(R.id.preview_layer);
    engine.setLocalView(previewLayer,
            new FrameLayout.LayoutParams(ViewGroup.LayoutParams.MATCH_PARENT,
                    ViewGroup.LayoutParams.MATCH_PARENT)
    );
}
Step 3: Implement callbacks
Implement the necessary engine callbacks to handle various events. For details on the callback interface, see the API reference.
protected ARTCAICallEngine.IARTCAICallEngineCallback mCallEngineCallback = new ARTCAICallEngine.IARTCAICallEngineCallback() {
    @Override
    public void onErrorOccurs(ARTCAICallEngine.AICallErrorCode errorCode) {
        // An error occurred. End the call.
        engine.handup();
    }

    @Override
    public void onCallBegin() {
        // The call starts (the user has joined the session).
    }

    @Override
    public void onCallEnd() {
        // The call ends (the user has left the session).
    }

    @Override
    public void onAICallEngineRobotStateChanged(ARTCAICallEngine.ARTCAICallRobotState oldRobotState, ARTCAICallEngine.ARTCAICallRobotState newRobotState) {
        // The agent's state changes.
    }

    @Override
    public void onUserSpeaking(boolean isSpeaking) {
        // The user starts or stops speaking.
    }

    @Override
    public void onUserAsrSubtitleNotify(String text, boolean isSentenceEnd, int sentenceId, VoicePrintStatusCode voicePrintStatusCode) {
        // Received subtitles for the user's recognized speech (ASR).
    }

    @Override
    public void onAIAgentSubtitleNotify(String text, boolean end, int userAsrSentenceId) {
        // Received subtitles for the agent's response.
    }

    @Override
    public void onNetworkStatusChanged(String uid, ARTCAICallEngine.ARTCAICallNetworkQuality quality) {
        // The network status changes.
    }

    @Override
    public void onVoiceVolumeChanged(String uid, int volume) {
        // A user's voice volume changes.
    }

    @Override
    public void onVoiceIdChanged(String voiceId) {
        // The voice for the current call changes.
    }

    @Override
    public void onVoiceInterrupted(boolean enable) {
        // The voice interruption setting for the current call changes.
    }

    @Override
    public void onAgentVideoAvailable(boolean available) {
        // Whether the agent's video is available (being published).
    }

    @Override
    public void onAgentAudioAvailable(boolean available) {
        // Whether the agent's audio is available (being published).
    }

    @Override
    public void onAgentAvatarFirstFrameDrawn() {
        // The first video frame of the digital human is rendered.
    }

    @Override
    public void onUserOnLine(String uid) {
        // A remote user comes online.
    }
};
engine.setEngineCallback(mCallEngineCallback);
Step 4: Create and initialize ARTCAICallConfig
For details, see the ARTCAICallConfig reference.
ARTCAICallEngine.ARTCAICallConfig artcaiCallConfig = new ARTCAICallEngine.ARTCAICallConfig();
artcaiCallConfig.agentId = "XXX"; // Required. The agent ID.
artcaiCallConfig.region = "cn-shanghai"; // Required. The agent region.
artcaiCallConfig.agentType = VoiceAgent; // Specify the agent type: voice-only, digital human, visual understanding, or video call.
engine.init(artcaiCallConfig);
Region name | Region ID
China (Hangzhou) | cn-hangzhou
China (Shanghai) | cn-shanghai
China (Beijing) | cn-beijing
China (Shenzhen) | cn-shenzhen
Singapore | ap-southeast-1
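Because region is a required field, it can help to validate the value before calling init. The sketch below encodes the table above as a lookup. AgentRegions is a hypothetical helper, and it assumes the table is the complete set of regions your agents run in; extend the map if that is not the case.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: region names and IDs copied from the table above.
final class AgentRegions {
    private static final Map<String, String> ID_BY_NAME;

    static {
        Map<String, String> regions = new LinkedHashMap<>();
        regions.put("China (Hangzhou)", "cn-hangzhou");
        regions.put("China (Shanghai)", "cn-shanghai");
        regions.put("China (Beijing)", "cn-beijing");
        regions.put("China (Shenzhen)", "cn-shenzhen");
        regions.put("Singapore", "ap-southeast-1");
        ID_BY_NAME = Collections.unmodifiableMap(regions);
    }

    // Returns the region ID for a display name, or null if the name is not in the table.
    static String idForName(String regionName) {
        return ID_BY_NAME.get(regionName);
    }

    // Returns true if the given region ID appears in the table.
    static boolean isKnownId(String regionId) {
        return ID_BY_NAME.containsValue(regionId);
    }
}
```

A caller could check AgentRegions.isKnownId(artcaiCallConfig.region) and fail fast with a clear message instead of discovering a typo only when the call fails to connect.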
Step 5: Start a call with the agent
Call call() to start a call with the agent. To obtain an authentication token, see Generate an ARTC authentication token. After the call starts, you can implement in-call features such as subtitles and interrupting the agent. For more information, see Implement features.
engine.call(token);
// After the call is connected, the following callback is triggered.
public void onCallBegin() {
    // The call starts (the user has joined the session).
}
Step 6: Implement in-call features
During the call, you can implement features such as subtitles and interrupting the agent. For more information, see Implement features.
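As one possible shape for the subtitle feature, the sketch below collects the text delivered by onUserAsrSubtitleNotify and onAIAgentSubtitleNotify, keyed by sentence ID. SubtitleBoard is a hypothetical class. It rests on two assumptions that this topic does not confirm: that each user ASR callback carries the full text recognized so far for its sentenceId, and that each agent callback carries a fragment to append. Check the API reference and swap the replace and append behavior if your SDK version differs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical subtitle store; the replace/append semantics are assumptions, not confirmed SDK behavior.
final class SubtitleBoard {
    private final Map<Integer, String> userText = new LinkedHashMap<>();
    private final Map<Integer, StringBuilder> agentText = new LinkedHashMap<>();

    // Assumption: text is the full recognition result so far, so it replaces the stored line.
    void onUserAsr(String text, int sentenceId) {
        userText.put(sentenceId, text);
    }

    // Assumption: text is an incremental fragment, so it is appended to the stored reply.
    void onAgentText(String text, int userAsrSentenceId) {
        agentText.computeIfAbsent(userAsrSentenceId, id -> new StringBuilder()).append(text);
    }

    // Returns the current user subtitle for a sentence, or an empty string if none was received.
    String userLine(int sentenceId) {
        return userText.getOrDefault(sentenceId, "");
    }

    // Returns the current agent subtitle for a sentence, or an empty string if none was received.
    String agentLine(int userAsrSentenceId) {
        StringBuilder reply = agentText.get(userAsrSentenceId);
        return reply == null ? "" : reply.toString();
    }
}
```

The engine callbacks would forward their arguments to onUserAsr and onAgentText, and the UI would render userLine and agentLine for the latest sentence ID.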
Step 7: End the call
Call handup() to end the call.
engine.handup();
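Since handup() may be reached from more than one place (for example, a hang-up button and the onErrorOccurs callback), a small guard can ensure it runs only once per call. HangUpGuard below is a hypothetical helper; this topic does not say whether the SDK tolerates repeated handup() calls, so the guard is simply a conservative choice.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper: runs the supplied hang-up action at most once.
final class HangUpGuard {
    private final AtomicBoolean ended = new AtomicBoolean(false);
    private final Runnable hangUpAction;

    HangUpGuard(Runnable hangUpAction) {
        this.hangUpAction = hangUpAction;
    }

    // Runs the action on the first call and returns true; later calls do nothing and return false.
    boolean endOnce() {
        if (ended.compareAndSet(false, true)) {
            hangUpAction.run();
            return true;
        }
        return false;
    }
}
```

In the app, the guard could be created as new HangUpGuard(engine::handup), with every exit path calling endOnce() instead of calling handup() directly.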