Real-time captions

Use the AICallKit SDK to retrieve real-time captions for user speech and agent responses.

Usage notes

Real-time caption preview

Real-time user captions

User input recognized by the agent is displayed in real time on the UI.

Real-time agent captions

Content generated by the large language model (LLM) is displayed in real time on the UI.

Note

This feature and its user interface (UI) are built into the UI-integrated solution. For more information, see the UI-integrated solution.

Implement the real-time caption feature

After a call starts, captions for user speech are returned through the onUserSubtitleNotify callback, and captions for agent responses are returned through the onVoiceAgentSubtitleNotify callback. Callback names differ slightly by platform, as the sample code shows.

onUserSubtitleNotify details

While a user is speaking, the agent returns multiple notifications with recognition results. For real-time captions, render each result on the UI as it arrives. For a non-real-time, chat-style conversation view, render the text on the UI only when isSentenceEnd=true, at which point it is the final result.

| Parameter | Description |
| --- | --- |
| text | The speech text recognized by the agent. Each notification contains the complete result for the current sentence. |
| isSentenceEnd | Whether the current text is the final result for the sentence. If true, the sentence has ended and the agent will respond. |
| sentenceId | The ID of the sentence that the current text belongs to. Increments with each question. |
| voiceprintResult | The Voice Activity Detection (VAD) feedback result. If voiceprint denoising (EnableVoicePrint) or AI VAD (VadLevel greater than 0) is enabled, the possible values are listed below. |

voiceprintResult values:

0: VAD for voiceprint denoising and AI VAD are disabled.

1: VAD for voiceprint denoising is enabled, but voiceprint registration is not complete.

2: VAD for voiceprint denoising is enabled, and the main speaker is detected.

3: VAD for voiceprint denoising is enabled, but the main speaker is not detected. In this state, the agent does not respond.

4: AI VAD is enabled, and the main speaker is detected.

5: AI VAD is enabled, but the main speaker is not detected. In this state, the agent does not respond.

Note

Voiceprint denoising is not supported on the web client.
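
As an illustration of how a client might act on these values, the sketch below maps a voiceprintResult code to whether the agent will respond, so the UI can style the caption differently. The function names and the use of plain numbers are assumptions made for this example. Each SDK exposes its own type for this value (for example, ARTCAICallVoiceprintResult on iOS), so the comparison may need adapting.

```typescript
// Hypothetical helper, not part of AICallKit.
// The numeric codes follow the voiceprintResult values listed above.
function agentIgnoresSpeech(voiceprintResult: number): boolean {
  // 3: voiceprint denoising is enabled, main speaker not detected.
  // 5: AI VAD is enabled, main speaker not detected.
  return voiceprintResult === 3 || voiceprintResult === 5;
}

// Example policy (an assumption): dim captions the agent will not respond to.
function captionStyle(voiceprintResult: number): "normal" | "dimmed" {
  return agentIgnoresSpeech(voiceprintResult) ? "dimmed" : "normal";
}

console.log(captionStyle(2), captionStyle(5)); // normal dimmed
```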

For example, if a user says "How is the weather today?", the following callbacks might be received:

text="today" isSentenceEnd=false sentenceId=1

text="Today's weather" isSentenceEnd=false sentenceId=1

text="How is the weather today?" isSentenceEnd=true sentenceId=1
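
The handling of this sequence can be sketched as follows. This is a minimal, SDK-independent example: the UserCaptionStore class and its method names are hypothetical, and only the three fields (text, isSentenceEnd, sentenceId) come from the callback. Because each notification carries the complete text for the current sentence, the store overwrites the entry for that sentence instead of appending to it.

```typescript
// Hypothetical helper, not part of AICallKit: tracks user captions by sentence ID.
interface UserCaption {
  text: string;
  final: boolean;
}

class UserCaptionStore {
  private captions: { [sentenceId: number]: UserCaption } = {};

  // Call this from the user caption callback with the values it delivers.
  update(text: string, isSentenceEnd: boolean, sentenceId: number): void {
    // Each notification holds the full sentence so far, so overwrite.
    this.captions[sentenceId] = { text: text, final: isSentenceEnd };
  }

  // Real-time captions: the latest text, whether or not the sentence has ended.
  liveText(sentenceId: number): string {
    const caption = this.captions[sentenceId];
    return caption ? caption.text : "";
  }

  // Chat-style view: only sentences whose final result has arrived.
  finalSentences(): string[] {
    const result: string[] = [];
    for (const key of Object.keys(this.captions)) {
      const caption = this.captions[Number(key)];
      if (caption.final) result.push(caption.text);
    }
    return result;
  }
}

// Replaying the three example notifications:
const store = new UserCaptionStore();
store.update("today", false, 1);
store.update("Today's weather", false, 1);
console.log(store.liveText(1)); // Today's weather
console.log(store.finalSentences().length); // 0
store.update("How is the weather today?", true, 1);
console.log(store.finalSentences()[0]); // How is the weather today?
```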

onVoiceAgentSubtitleNotify details

When the agent responds, the response is split into multiple notifications. The client must merge the text from these notifications and render the merged text character by character on the UI.

| Parameter | Description |
| --- | --- |
| text | The agent response text. |
| isSentenceEnd | Whether the current text is the last part of the response. If true, the response is complete and the agent switches to listening. |
| userAsrSentenceId | The sentence ID of the user's question. |

For example, in response to the user's previous question, the agent might reply, "The weather today is sunny and bright, with a moderate temperature, perfect for outdoor activities." The following callbacks might be received:

text="The weather today is sunny and bright, " isSentenceEnd=false userAsrSentenceId=1

text="with a moderate temperature, perfect for outdoor activities." isSentenceEnd=false userAsrSentenceId=1

text="" isSentenceEnd=true userAsrSentenceId=1
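
The merge-and-render step for a sequence like this can be sketched as follows. The AgentCaptionBuffer class is hypothetical and independent of the SDK. It also assumes that a change in userAsrSentenceId marks the start of a new response, which the source does not state explicitly. Unlike user captions, each notification carries only a fragment, so the fragments are concatenated.

```typescript
// Hypothetical helper, not part of AICallKit: merges agent caption fragments.
class AgentCaptionBuffer {
  private merged = "";
  private shown = 0;
  private currentId: number | null = null;
  complete = false;

  // Call this from the agent caption callback with the values it delivers.
  append(text: string, isSentenceEnd: boolean, userAsrSentenceId: number): void {
    if (userAsrSentenceId !== this.currentId) {
      // Assumption: a new user sentence ID means a new response begins.
      this.currentId = userAsrSentenceId;
      this.merged = "";
      this.shown = 0;
    }
    this.merged += text; // Fragments are appended, not overwritten.
    this.complete = isSentenceEnd;
  }

  // Reveals one more character; call on a timer for character-by-character display.
  // This steps by UTF-16 code unit; a production renderer may prefer graphemes.
  nextFrame(): string {
    if (this.shown < this.merged.length) this.shown += 1;
    return this.merged.slice(0, this.shown);
  }

  fullText(): string {
    return this.merged;
  }
}

// Replaying the three example notifications:
const buffer = new AgentCaptionBuffer();
buffer.append("The weather today is sunny and bright, ", false, 1);
buffer.append("with a moderate temperature, perfect for outdoor activities.", false, 1);
buffer.append("", true, 1);
console.log(buffer.complete); // true
console.log(buffer.nextFrame()); // T
console.log(buffer.nextFrame()); // Th
```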

Sample code

Android

// Define the callback (only the caption-related methods are shown).
ARTCAICallEngine.IARTCAICallEngineCallback mCallEngineCallbackWrapper = new ARTCAICallEngine.IARTCAICallEngineCallback() {
    @Override
    public void onUserAsrSubtitleNotify(String text, boolean isSentenceEnd, int sentenceId, VoicePrintStatusCode voicePrintStatusCode) {
        // Sync user speech recognized by ASR.
    }

    @Override
    public void onAIAgentSubtitleNotify(String text, boolean end, int userAsrSentenceId) {
        // Sync the agent's response.
    }
};

// Add the callback to the engine.
mARTCAICallEngine.setEngineCallback(mCallEngineCallbackWrapper);

iOS

// Set the delegate for the engine.
self.engine.delegate = self

func onUserSubtitleNotify(text: String, isSentenceEnd: Bool, sentenceId: Int, voiceprintResult: ARTCAICallVoiceprintResult) {
    // Called with the agent's recognition result for the user's speech.
}

func onVoiceAgentSubtitleNotify(text: String, isSentenceEnd: Bool, userAsrSentenceId: Int) {
    // Called with a part of the agent's response text.
}

Web

// Add callbacks to the engine.
engine.on('userSubtitleNotify', (subtitle, voiceprintResult) => {
  // Called with the agent's recognition result for the user's speech.
  console.log('AICallUserSubtitleNotify', subtitle.text, subtitle.end, subtitle.sentenceId, voiceprintResult);
});
engine.on('agentSubtitleNotify', (subtitle) => {
  // Called with a part of the agent's response text.
  console.log('AICallAgentSubtitleNotify', subtitle.text, subtitle.end, subtitle.sentenceId);
});

Harmony

const listener = new ARTCAICallEngineListener();

listener.onUserSubtitleNotifyCallback = (data: ARTCAICallUserSubtitleResult) => {
  // Called with the agent's recognition result for the user's speech.
};
listener.onVoiceAgentSubtitleNotifyCallback = (data: ARTCAICallAgentSubtitleResult) => {
  // Called with a part of the agent's response text.
};

...

this.aiCallSDK.listener = listener;