Use AICallKit SDK to capture frames from a user's camera and send them to a large language model (LLM) for inspection and analysis.
Before you begin
- The examples in this topic show how to capture frames by calling the API.
- You must integrate AICallKit SDK in advance. For more information, see Integrate AICallKit SDK for Android, Integrate AICallKit SDK for iOS, and Integrate AICallKit SDK for Web.
- Frame capture requires AICallKit SDK V2.1.0 or later.
Feature description
During a call with a visual understanding agent, you can call the frame capture API to capture images from the user's camera for inspection. AICallKit SDK provides two capture modes that automatically capture frames and push them to the LLM for analysis. This feature suits scenarios such as industrial inspection and AI glasses.
How it works
To implement frame capture, call startVisionCustomCapture. The isSingle parameter selects one of two modes:
- One-time frame capture: captures frames once when a specific event occurs. For example, when a user clicks a button, the captured camera frames are pushed to the LLM for processing.
| Parameter | Type | Description |
|---|---|---|
| isSingle | Boolean | The frame capture mode. true: one-time frame capture. false (default): regular frame capture. |
| text | String | The text prompt sent to the multimodal LLM. |
| eachDuration | Int | The frame capture duration. Unit: seconds. |
| num | Int | The number of images to capture. |
| userData | String | Custom business information passed along with the text and frames to the LLM. |
Example: If eachDuration is 1 second and num is 2, the system starts timing when the API is called, evenly captures 2 frames from the video within that 1 second, and sends them to the LLM for analysis.
- Regular frame capture: automatically captures camera images at regular intervals over a specified duration and sends them to the LLM for processing.
| Parameter | Type | Description |
|---|---|---|
| isSingle | Boolean | The frame capture mode. true: one-time frame capture. false (default): regular frame capture. |
| text | String | The text prompt sent to the multimodal LLM. |
| duration | Int | The total frame capture duration. Unit: seconds. |
| eachDuration | Int | The frame capture interval. Unit: seconds. |
| num | Int | The number of images to capture each time. |
| userData | String | Custom business information passed along with the text and frames to the LLM. |
| enableASR | Boolean | Specifies whether to send ASR-recognized speech to the LLM as input. true: the recognized speech is sent. false (default): the recognized speech is not sent. Note: This parameter is available only in AICallKit SDK V2.2.0 or later. |
Example: If the duration is 100 seconds, with a capture interval of 2 seconds and 2 images per cycle, the system captures 2 evenly distributed frames every 2 seconds throughout the 100-second period starting from when the API is called and sends them to the LLM for analysis. The frame capture mode and userData remain unchanged during this process.
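Note: In regular frame capture mode, user speech is not processed by default. To send ASR-recognized speech to the LLM, set enableASR to true (AICallKit SDK V2.2.0 or later).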
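The arithmetic behind the two examples can be sketched in TypeScript. This is a minimal illustration, not SDK code. captureSchedule and frameOffsetsInWindow are hypothetical helpers. The documentation says only that frames are captured "evenly", so placing each frame at the midpoint of an equal slice of its window is an assumption made here. The frame counts do not depend on that assumption: 100 s / 2 s gives 50 capture cycles, and 50 × 2 gives 100 frames.

```typescript
// Hypothetical helpers (not part of AICallKit SDK) that estimate when frames are
// taken. Assumption: "evenly" means each frame sits at the midpoint of an equal
// slice of its capture window.
interface CaptureParams {
  isSingle: boolean;     // true: one-time capture; false: regular capture
  duration?: number;     // total capture duration in seconds (regular mode only)
  eachDuration: number;  // one-time: capture window; regular: capture interval (seconds)
  num: number;           // frames per window
}

// Offsets (seconds) of `num` evenly spaced frames inside [start, start + length).
function frameOffsetsInWindow(start: number, length: number, num: number): number[] {
  const slice = length / num;
  const offsets: number[] = [];
  for (let i = 0; i < num; i++) {
    offsets.push(start + slice * (i + 0.5));
  }
  return offsets;
}

// Returns one array of frame offsets per capture window, measured from the start of capture.
function captureSchedule(p: CaptureParams): number[][] {
  if (p.eachDuration <= 0 || p.num <= 0) {
    throw new Error("eachDuration and num must be positive in this sketch");
  }
  if (p.isSingle) {
    // One-time mode: a single window of eachDuration seconds.
    return [frameOffsetsInWindow(0, p.eachDuration, p.num)];
  }
  // Regular mode: one window per interval, for as many full intervals as fit (sketch behavior).
  const cycles = Math.floor((p.duration ?? 0) / p.eachDuration);
  const schedule: number[][] = [];
  for (let c = 0; c < cycles; c++) {
    schedule.push(frameOffsetsInWindow(c * p.eachDuration, p.eachDuration, p.num));
  }
  return schedule;
}

// One-time example: a 1-second window with 2 frames.
console.log(JSON.stringify(captureSchedule({ isSingle: true, eachDuration: 1, num: 2 }))); // [[0.25,0.75]]

// Regular example: 100 seconds, every 2 seconds, 2 frames each time.
const regular = captureSchedule({ isSingle: false, duration: 100, eachDuration: 2, num: 2 });
const totalFrames = regular.reduce((n, w) => n + w.length, 0);
console.log(regular.length, totalFrames); // 50 100
```

With eachDuration set to 5, as in the sample code, the same 100 seconds yield 20 windows of 2 frames (40 frames in total).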
Sample code
Android
// Call startVisionCustomCapture after the call starts: start custom frame capture in the onCallBegin callback.
// For the frame capture parameters, see the description of ARTCAICallVisionCustomCaptureRequest.
public void onCallBegin() {
    // Constructor arguments, in order: text prompt, frame capture mode (isSingle),
    // frame capture interval in seconds (eachDuration), number of images to capture each time (num),
    // frame capture duration in seconds (duration), and custom business information (userData).
    mARTCAICallEngine.startVisionCustomCapture(
            new ARTCAICallEngine.ARTCAICallVisionCustomCaptureRequest("XXX", false, 5, 2, 100, ""));
}

// End frame capture if needed.
mARTCAICallEngine.stopVisionCustomCapture();
iOS
// Call startVisionCustomCapture after the call starts: start custom frame capture in the onCallBegin callback.
// For the frame capture parameters, see the description of ARTCAICallVisionCustomCaptureRequest.
public func onCallBegin() {
    // The call has started. Configure regular frame capture.
    let req = ARTCAICallVisionCustomCaptureRequest()
    req.isSingle = false      // regular frame capture
    req.text = "xxx"          // text prompt for the multimodal LLM
    req.userData = "{}"       // custom business information
    req.duration = 100        // capture for 100 seconds
    req.eachDuration = 5      // capture every 5 seconds
    req.num = 2               // 2 images per capture
    _ = self.engine.startVisionCustomCapture(req: req)
}

// End frame capture if needed.
_ = self.engine.stopVisionCustomCapture()
Web
// Call startVisionCustomCapture after the call starts.
// For the frame capture parameters, see the description of AICallVisionCustomCaptureRequest.
engine.startVisionCustomCapture({
  isSingle: false,   // regular frame capture
  text: 'xxx',       // text prompt for the multimodal LLM
  userData: '{}',    // custom business information
  duration: 100,     // capture for 100 seconds
  eachDuration: 5,   // capture every 5 seconds
  num: 2,            // 2 images per capture
});

// End frame capture if needed.
engine.stopVisionCustomCapture();
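The samples above start regular frame capture only. The following TypeScript sketch shows how the same Web call might be parameterized for the other cases described earlier. The first case is a one-time capture fired from an event such as a button click. The second is a regular capture that also forwards ASR-recognized speech (enableASR, V2.2.0 or later). VisionCaptureRequest, VisionCaptureEngine, captureOnce, startInspection, and RecordingEngine are hypothetical names introduced here. The two interfaces only mirror the fields and methods used in the Web sample and are not the SDK's published type definitions. The parameter values are placeholders.

```typescript
// Assumption: structural types that mirror the fields and methods used in the Web
// sample; they are NOT AICallKit SDK's own type definitions.
interface VisionCaptureRequest {
  isSingle: boolean;
  text: string;
  userData?: string;
  duration?: number;     // regular mode only, seconds
  eachDuration: number;  // seconds
  num: number;
  enableASR?: boolean;   // AICallKit SDK V2.2.0 or later
}

interface VisionCaptureEngine {
  startVisionCustomCapture(req: VisionCaptureRequest): unknown;
  stopVisionCustomCapture(): unknown;
}

// Hypothetical helper: one-time capture, e.g. called from a button's click handler.
function captureOnce(engine: VisionCaptureEngine, prompt: string): VisionCaptureRequest {
  const req: VisionCaptureRequest = {
    isSingle: true,
    text: prompt,
    userData: '{}',
    eachDuration: 1, // capture window: 1 second
    num: 2,          // 2 frames within that window
  };
  engine.startVisionCustomCapture(req);
  return req;
}

// Hypothetical helper: regular capture that can also forward recognized speech.
function startInspection(
  engine: VisionCaptureEngine,
  prompt: string,
  withSpeech: boolean,
): VisionCaptureRequest {
  const req: VisionCaptureRequest = {
    isSingle: false,
    text: prompt,
    userData: '{}',
    duration: 100,   // capture for 100 seconds
    eachDuration: 5, // every 5 seconds
    num: 2,          // 2 frames each time
  };
  if (withSpeech) {
    req.enableASR = true; // set only when requested; the field needs V2.2.0 or later
  }
  engine.startVisionCustomCapture(req);
  return req;
}

// Stand-in engine that records calls, so the helpers can be tried without the SDK.
class RecordingEngine implements VisionCaptureEngine {
  requests: VisionCaptureRequest[] = [];
  stopCount = 0;
  startVisionCustomCapture(req: VisionCaptureRequest): void {
    this.requests.push(req);
  }
  stopVisionCustomCapture(): void {
    this.stopCount++;
  }
}

const demo = new RecordingEngine();
captureOnce(demo, 'Is the valve open?');
startInspection(demo, 'Report any visible defects.', true);
demo.stopVisionCustomCapture();
console.log(demo.requests.length, demo.stopCount); // 2 1
```

In a real page you would pass the engine instance from the Web sample instead of RecordingEngine, assuming it accepts the same object shape as shown in that sample.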