Frame capture


Use AICallKit SDK to capture frames from a user's camera and send them to a large language model (LLM) for inspection and analysis.

Before you begin

Feature description

During a call with a visual understanding agent, you can call the frame capture API to capture camera images for inspection. AICallKit SDK provides two capture modes, each of which captures images and pushes them to the LLM for analysis. This feature is suitable for scenarios such as industrial inspection and AI glasses applications.

How it works

To implement frame capture, call startVisionCustomCapture. AICallKit SDK provides two modes:

  • One-time frame capture: captures images when a specific event occurs. For example, when a user clicks a button, camera images are pushed to the LLM for processing.

    Parameter    | Type    | Description
    -------------|---------|------------------------------------------------------------
    isSingle     | Boolean | The frame capture mode. true: one-time frame capture. false (default): regular frame capture.
    text         | String  | The text prompt sent to the multimodal LLM.
    eachDuration | Int     | The frame capture duration. Unit: seconds.
    num          | Int     | The number of images to capture.
    userData     | String  | Custom business information passed along with the text and frames to the LLM.

    Example: If the frame capture duration is 1 second and the number of images is 2, the system starts timing when the API is called, evenly captures 2 frames from the video data within that 1 second, and sends them to the LLM for analysis.

  • Regular frame capture: automatically captures camera images at regular intervals over a specified duration and sends them to the LLM for processing.

    Parameter    | Type    | Description
    -------------|---------|------------------------------------------------------------
    isSingle     | Boolean | The frame capture mode. true: one-time frame capture. false (default): regular frame capture.
    text         | String  | The text prompt sent to the multimodal LLM.
    duration     | Int     | The frame capture duration. Unit: seconds.
    eachDuration | Int     | The frame capture interval. Unit: seconds.
    num          | Int     | The number of images to capture each time.
    userData     | String  | Custom business information passed along with the text and frames to the LLM.
    enableASR    | Boolean | Specifies whether to send ASR-recognized speech to the LLM as input. Valid values: true, false (default). Note: This parameter is available only in AICallKit SDK V2.2.0 or later.

    Example: If the duration is 100 seconds, the capture interval is 2 seconds, and the number of images per cycle is 2, the system starts timing when the API is called. For the next 100 seconds, it captures 2 evenly distributed frames in every 2-second interval and sends them to the LLM for analysis. The frame capture mode and userData remain unchanged during this process.

    Note

    In regular frame capture mode, user voice is not processed by default.
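The arithmetic behind the two examples above can be sketched as follows. This is only an illustration: estimateCapturePlan is a hypothetical helper, not part of AICallKit SDK, and it assumes that only complete capture intervals count (a trailing partial interval is ignored). The SDK's actual scheduling may differ.

```javascript
// Hypothetical helper (not an AICallKit SDK API): estimates how many capture
// cycles and frames a request produces, based on the parameter descriptions.
function estimateCapturePlan({ isSingle = false, duration = 0, eachDuration, num }) {
  if (!Number.isInteger(eachDuration) || eachDuration <= 0) {
    throw new RangeError("eachDuration must be a positive integer (seconds)");
  }
  if (!Number.isInteger(num) || num <= 0) {
    throw new RangeError("num must be a positive integer");
  }
  if (isSingle) {
    // One-time mode: a single capture window of eachDuration seconds.
    return { cycles: 1, totalFrames: num };
  }
  if (!Number.isInteger(duration) || duration <= 0) {
    throw new RangeError("duration must be a positive integer (seconds)");
  }
  // Assumption: a trailing partial interval does not produce frames.
  const cycles = Math.floor(duration / eachDuration);
  return { cycles, totalFrames: cycles * num };
}

// Regular mode example from the text: 100 s duration, 2 s interval, 2 images per cycle.
const regularPlan = estimateCapturePlan({ duration: 100, eachDuration: 2, num: 2 });

// One-time mode example from the text: 1 s window, 2 images.
const singlePlan = estimateCapturePlan({ isSingle: true, eachDuration: 1, num: 2 });

console.log(regularPlan, singlePlan);
```

An estimate like this can help size LLM usage before choosing an interval, since the frame count grows with duration / eachDuration.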

Sample code

Android

// Call startVisionCustomCapture after the call starts, for example in the onCallBegin callback.
// For frame capture parameters, see the description of ARTCAICallVisionCustomCaptureRequest.
public void onCallBegin() {
    // Request arguments, in order: text prompt, frame capture mode, frame capture interval,
    // number of images to capture each time, frame capture duration, and custom business information.
    mARTCAICallEngine.startVisionCustomCapture(new ARTCAICallEngine.ARTCAICallVisionCustomCaptureRequest("XXX", false, 5, 2, 100, ""));
}

// End frame capture if needed.
mARTCAICallEngine.stopVisionCustomCapture();

iOS

// Call startVisionCustomCapture after the call starts, for example in the onCallBegin callback.
// For frame capture parameters, see the description of ARTCAICallVisionCustomCaptureRequest.
public func onCallBegin() {
    // The call has started: configure and start regular frame capture.
    let req = ARTCAICallVisionCustomCaptureRequest()
    req.isSingle = false
    req.text = "xxx"
    req.userData = "{}"
    req.duration = 100
    req.eachDuration = 5
    req.num = 2
    _ = self.engine.startVisionCustomCapture(req: req)
}

// End frame capture if needed.
_ = self.engine.stopVisionCustomCapture()

Web

// Call startVisionCustomCapture after the call is initiated.
// For frame capture parameters, see the description about AICallVisionCustomCaptureRequest.
engine.startVisionCustomCapture({
  isSingle: false,
  text: 'xxx',
  userData: '{}',
  duration: 100,
  eachDuration: 5,
  num: 2,
});

// End frame capture if needed.
engine.stopVisionCustomCapture();
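
The Web sample above shows regular frame capture only. As a rough sketch of the one-time mode described earlier, the following builds request objects for both modes and triggers a one-time capture from an event handler. The helper names (buildOneTimeCapture, buildRegularCapture, onInspectClick) and the stub engine are hypothetical. The sketch also assumes that userData carries a JSON string and that the Web request accepts an enableASR field as listed in the parameter table (AICallKit SDK V2.2.0 or later); verify both against the AICallVisionCustomCaptureRequest description.

```javascript
// Hypothetical helper: builds a one-time capture request (isSingle = true).
function buildOneTimeCapture(text, windowSeconds, num, businessInfo = {}) {
  return {
    isSingle: true,
    text,
    eachDuration: windowSeconds, // capture duration in one-time mode, in seconds
    num,
    userData: JSON.stringify(businessInfo), // assumption: userData is a JSON string
  };
}

// Hypothetical helper: builds a regular capture request (isSingle = false).
function buildRegularCapture(text, duration, interval, num, options = {}) {
  const { businessInfo = {}, enableASR = false } = options;
  const request = {
    isSingle: false,
    text,
    duration,
    eachDuration: interval, // capture interval in regular mode, in seconds
    num,
    userData: JSON.stringify(businessInfo),
  };
  // Only set enableASR when requested, since older SDK versions do not support it.
  if (enableASR) {
    request.enableASR = true;
  }
  return request;
}

// Stand-in for the real engine so that this sketch runs on its own.
const stubEngine = {
  calls: [],
  startVisionCustomCapture(req) {
    this.calls.push(req);
  },
};

// Example event handler, for instance bound to an "Inspect" button.
function onInspectClick(engine) {
  engine.startVisionCustomCapture(
    buildOneTimeCapture("Describe any visible defects.", 1, 2, { station: "line-3" })
  );
}

onInspectClick(stubEngine);
console.log(stubEngine.calls[0]);
```

In an application, pass the real engine instance to onInspectClick instead of stubEngine.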