This article describes how to implement vision capabilities, such as video chat and photo question answering, with the RTOS SDK (License mode).
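1. Development preparation
1.1. Prerequisites
This article builds on the prerequisite document "Implement Chat Capabilities with the RTOS SDK (License Mode)". Work through that document and complete its environment setup before reading this one.
It is assumed that you have already integrated the SDK and that the voice interaction flow runs correctly.
This best practice reuses some of the pseudocode and interaction flows from "Implement Chat Capabilities with the RTOS SDK (License Mode)".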
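1.2. Enable the vision module
Create a multimodal application in the Bailian console. The created application is shown in the figure below.
Click "Configure Application", then enable the Video Call agent, the Photo Q&A agent, or both.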
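2. Developing the on-device vision module
The vision module provides two capabilities: photo question answering (VQA below) and video call (LIVE AI below). While LIVE AI is not active, any visual question such as "What's in front of me?" triggers VQA. Once LIVE AI is active, VQA is no longer triggered and LIVE AI handles all image-based questions.
2.1. SDK directory and file structure
After you unpack the SDK package, the files and directories relevant to this article are laid out as follows.
For how to obtain the SDK package, see the document "License Mode C SDK".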
aliyun_sdk/
├── include
│ ├── c_utils
│ │ └── ...
│ ├── lib_c_mmi_vl.h
│ └── ...
├── libcmd_vl.a
└── ...
To enable the vision module, include the lib_c_mmi_vl.h header file and link the libcmd_vl.a static library.
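2.2. Hardware integration
Before you use the SDK to implement the vision module, connect the SDK to the camera hardware so that captured images can be passed to the SDK.
The following sample code captures an image and hands it to the SDK. The camera-related functions (prefixed with dummy_) are placeholders for illustration only: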
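#include <stdint.h>
#include <string.h>
#include "lib_c_mmi_vl.h"

#define DUMMY_CAMERA_PIC_WIDTH  320
#define DUMMY_CAMERA_PIC_HEIGHT 240

// Feed the image captured by the camera to the SDK.
// Returns the value reported by c_mmi_put_vl_data_base64(), or 0 if no image was sent.
int dummy_camera_put(void)
{
    // Static so that about 75 KiB + 100 KiB of buffers do not live on the RTOS task stack
    static uint8_t pic_data[DUMMY_CAMERA_PIC_WIDTH * DUMMY_CAMERA_PIC_HEIGHT];
    // base64 grows data by 4/3; the extra bytes cover padding and the trailing '\0'
    static uint8_t base64_data[DUMMY_CAMERA_PIC_WIDTH * DUMMY_CAMERA_PIC_HEIGHT * 4 / 3 + 4];
    uint32_t pic_size;
    uint32_t base64_size;
    uint32_t input_size = 0;

    // Check whether the camera is running
    if (dummy_camera_is_open()) {
        // Capture one JPEG image from the camera
        pic_size = dummy_hw_camera_get(pic_data, sizeof(pic_data));
        if (pic_size > 0) {
            // Encode the JPEG data as a NUL-terminated base64 string
            base64_size = dummy_jpeg2base64(base64_data, sizeof(base64_data), pic_data, pic_size);
            if (base64_size > 0) {
                // Pass the base64 string to the SDK.
                // The size is the memory the string actually occupies, including the terminating '\0'.
                // The last argument is logged by the SDK as finish_flag.
                input_size = c_mmi_put_vl_data_base64(base64_data, strlen((char *)base64_data) + 1, 1);
            }
        }
    }
    return (int)input_size;
}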
Note: The uploaded base64 string must not contain line breaks or any other non-base64 characters. It also must not include the data:image/jpeg;base64 prefix. Pass only the bare base64 string.
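dummy_jpeg2base64() is only a placeholder. The hypothetical function below is a sketch of what it might do. It encodes binary JPEG data with the standard base64 alphabet from RFC 4648 and emits no line breaks and no data:image/jpeg;base64 prefix, which matches the requirement above. It also NUL-terminates the output so the caller can pass strlen() + 1 as the size. The function is not part of the SDK. If your platform already ships a base64 encoder, check that it does not wrap lines, because MIME-style encoders insert a line break every 76 characters.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for dummy_jpeg2base64(): encodes src_len bytes as
 * standard base64 (RFC 4648 alphabet, '=' padding) with no line breaks and
 * no data-URI prefix, then NUL-terminates the result.
 * Returns the string length excluding the NUL, or 0 if dst is too small. */
static uint32_t b64_encode(uint8_t *dst, uint32_t dst_size,
                           const uint8_t *src, uint32_t src_len)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    uint32_t out_len = (src_len + 2) / 3 * 4;  /* 4 chars per 3-byte group, rounded up */
    uint32_t i, o = 0;

    if (dst == NULL || dst_size < out_len + 1) /* +1 for the trailing NUL */
        return 0;

    for (i = 0; i + 2 < src_len; i += 3) {     /* full 3-byte groups */
        uint32_t v = ((uint32_t)src[i] << 16) | ((uint32_t)src[i + 1] << 8) | src[i + 2];
        dst[o++] = (uint8_t)tbl[(v >> 18) & 0x3F];
        dst[o++] = (uint8_t)tbl[(v >> 12) & 0x3F];
        dst[o++] = (uint8_t)tbl[(v >> 6) & 0x3F];
        dst[o++] = (uint8_t)tbl[v & 0x3F];
    }
    if (i < src_len) {                         /* 1 or 2 leftover bytes need padding */
        uint32_t v = (uint32_t)src[i] << 16;
        if (i + 1 < src_len)
            v |= (uint32_t)src[i + 1] << 8;
        dst[o++] = (uint8_t)tbl[(v >> 18) & 0x3F];
        dst[o++] = (uint8_t)tbl[(v >> 12) & 0x3F];
        dst[o++] = (i + 1 < src_len) ? (uint8_t)tbl[(v >> 6) & 0x3F] : (uint8_t)'=';
        dst[o++] = '=';
    }
    dst[o] = '\0';
    return o;
}
```

In dummy_camera_put(), a call such as b64_encode(base64_data, sizeof(base64_data), pic_data, pic_size) could take the place of dummy_jpeg2base64(). A return value of 0 for a non-empty image means the buffer was too small, and the frame should be dropped rather than sent truncated.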
2.3. Implementing VQA
2.3.1. VQA initialization
Sample VQA initialization code:
int dummy_vl_callback(uint32_t event, void* param)
{
    (void)param;
    switch (event) {
    case C_MMI_VL_EVENT_VQA_START:
        // To save power, open the camera and take the photo only when this event fires.
        // For lower latency, start the camera in advance and just take the photo here.
        UTIL_LOG_I("start vqa");
        dummy_camera_put();
        break;
    case C_MMI_VL_EVENT_VQA_END:
        // Depending on the use case, the application decides whether to close the camera.
        UTIL_LOG_I("end vqa");
        break;
    case C_MMI_VL_EVENT_TIMEOUT:
        // Fired when a VQA request times out.
        UTIL_LOG_I("VL timeout");
        break;
    // ...
    // Handle other events
    // ...
    default:
        UTIL_LOG_E("unknown VL event");
        break;
    }
    return UTIL_SUCCESS;
}
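int dummy_vl_init(void)
{
    vl_config_t vl_config = {
        // vl_mode is a bit mask, so modes can be OR-ed together:
        // C_MMI_VL_MODE_VQA | C_MMI_VL_MODE_LIVE_AI enables both VQA and LIVE AI.
        // C_MMI_VL_MODE_NONE has the lowest priority and serves only as a placeholder.
        .vl_mode = C_MMI_VL_MODE_VQA,
        // Currently a placeholder that accepts any value; the supported picture
        // formats are the enum values in the header file.
        .pic_format = C_MMI_VL_PIC_FORMAT_JPG,
        // Required
        .data_type = C_MMI_VL_DATA_BASE64,
        // Currently a placeholder that accepts any value.
        // Images actually passed in must be wider and taller than 10 px,
        // with an aspect ratio no more extreme than 200:1 or 1:200.
        .frame_size = C_MMI_VL_FRAMESIZE_128x128,
        // The Bailian cloud currently accepts single images of at most 180 KiB.
        // base64 grows data by 4/3 (180 KiB x 4/3 = 240 KiB), so a 240 KiB
        // buffer is recommended to hold the encoded base64 string.
        .buffer_size = 240 * 1024,
        // 0 means wait indefinitely
        .timeout_ms = 0,
        // The Bailian gateway currently recommends an fps of 2
        .fps = 2,
        .event_callback = dummy_vl_callback
    };
    c_mmi_set_vl_params(&vl_config);
    return UTIL_SUCCESS;
}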
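The comments in the VQA callback above describe two camera strategies. The low-power strategy opens the camera only when C_MMI_VL_EVENT_VQA_START fires. The low-latency strategy keeps it running in advance. The sketch below models that choice as a tiny state machine. Every name in it (cam_policy_t, cam_state_t, vqa_policy_on_start(), vqa_policy_on_end()) is hypothetical and not part of lib_c_mmi_vl.h. A real callback would call vqa_policy_on_start() before dummy_camera_put() and call vqa_policy_on_end() on C_MMI_VL_EVENT_VQA_END, replacing the is_open updates with actual driver calls.

```c
#include <stdbool.h>

/* Hypothetical camera strategies for VQA; not SDK types. */
typedef enum {
    CAM_POLICY_LOW_POWER,   /* open on demand, close after each VQA */
    CAM_POLICY_LOW_LATENCY  /* keep the camera open between VQA requests */
} cam_policy_t;

typedef struct {
    cam_policy_t policy;
    bool is_open;           /* stands in for the real camera power state */
} cam_state_t;

/* On VQA start: make sure the camera is open before taking the photo. */
static void vqa_policy_on_start(cam_state_t *cam)
{
    if (!cam->is_open) {
        cam->is_open = true;   /* real code would power up the sensor here */
    }
}

/* On VQA end: only the low-power strategy closes the camera again. */
static void vqa_policy_on_end(cam_state_t *cam)
{
    if (cam->policy == CAM_POLICY_LOW_POWER) {
        cam->is_open = false;  /* real code would power down the sensor here */
    }
}
```

The trade-off is the one the original comments point to. Low power pays the sensor start-up time on every question, while low latency keeps the camera drawing power between questions.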
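Note: VQA must be initialized before the SDK starts any network communication. We recommend calling the VQA initialization function right after SDK initialization (c_mmi_sdk_init).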
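The buffer_size and frame_size comments in dummy_vl_init() encode two rules that are easy to check in code. First, base64 turns every 3 input bytes into 4 characters, so a 180 KiB image becomes 245,760 bytes of base64, which is exactly 240 KiB. Second, a captured image must be larger than 10 px in each dimension, with an aspect ratio within 200:1. The helpers below are hypothetical and are not SDK functions; they only restate those two rules so a camera driver could validate frames before handing them to the SDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: length of the base64 encoding of n bytes, excluding
 * any terminating NUL (every 3 input bytes become 4 chars, rounded up). */
static uint32_t b64_encoded_len(uint32_t n)
{
    return (n + 2) / 3 * 4;
}

/* Hypothetical helper: checks the documented frame constraints, i.e. both
 * sides larger than 10 px and an aspect ratio no more extreme than 200:1. */
static bool vl_frame_dims_ok(uint32_t width, uint32_t height)
{
    if (width <= 10 || height <= 10)
        return false;
    /* Multiply in 64 bits so large dimensions cannot overflow. */
    return (uint64_t)width <= (uint64_t)height * 200u &&
           (uint64_t)height <= (uint64_t)width * 200u;
}
```

For example, the 320x240 frames used by dummy_camera_put() pass the dimension check, while a 2201x11 strip would be rejected.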
The initialization log looks like this:
...
[UT][I][c_dev_gen_get_token_req]plaintext [164][{"appId":"<YOUR APP ID>","deviceName":"<YOUR DEVICE NAME>","payMode":"LICENSE","requestTime":"1753327457730","sdkVersion":"0.3.2","tokenType":"MMI"}]
[UT][I][c_dev_gen_get_token_req]req_str [420][{"appId":"<YOUR APP ID>","deviceName":"<YOUR DEVICE NAME>","nonce":"<NONCE>","requestTime":"1753327457730","sdkVersion":"0.3.2","tokenType":"MMI","signature":"<SIGNATURE>"}]
[UT][I][c_mmi_analyze_get_token_rsp]rsp_str [589][{"nonce":"<NONCE>","responseTime":"1753327458081","appId":"<YOUR APP ID>","deviceName":"<YOUR DEVICE NAME>","requestIp":"<YOUR IP>","signature":"<SIGNATURE>"}]
[UT][I][c_mmi_analyze_get_token_rsp]nonce [<NONCE>]
[UT][I][dummy_mmi_event_callback]C_MMI_EVENT_DATA_INIT
[UT][I][dummy_wss_init]work
[UT][I][dummy_wss_connect]wss update
[UT][I][dummy_wss_connect]request[239][GET <wss_api> HTTP/1.1
Host: <WSS HOST>
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: FhVlQeR4S1N06+1/SU79XA==
Sec-WebSocket-Version: 13
<WSS HEADER>
]
[UT][I][dummy_wss_connect]Reading response...
[UT][I][dummy_wss_connect]response[MMI][187][HTTP/1.1 101 Switching Protocols
upgrade: websocket
connection: upgrade
sec-websocket-accept: sqchBdVDX8kKBgi90/PFl5+/4VI=
date: Thu, 24 Jul 2025 08:25:24 GMT
server: istio-envoy
]
[UT][I][dummy_wss_connect][MMI]done
// VL module log starts here
[UT][D][c_mmi_set_vl_params]mode 2
[UT][D][c_mmi_set_vl_params]VL set params success
[UT][D][c_mmi_set_upstream_type]upstream_type[AudioAndVideo]
[UT][I][c_mmi_ringbuffer_init]rb_p[0x12f91bab8] rb_rb[0x12f91bb28] write_mode[1] list_flag[1]
[UT][D][_vl_set_data_ringbuffer]ringbuffer_init size : [204800]
[UT][I][c_mmi_cmd_register]_command_list count [1][visual_qa]
[UT][I][_vl_init]c_mmi_vl_init
// VL module log ends here
[UT][I][dummy_wss_thread_start][dummy_wss_task_send] send task [0x135e27180]
[UT][I][dummy_wss_thread_start][dummy_wss_task_recv] recv task [0x135e271a0]
[UT][I][_gen_cmd_start]task_id [<TASK ID>]
...
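2.3.2. VQA interaction
After initialization, the SDK drives the state transitions automatically in response to requests. A sample interaction log: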
[UT][I][_on_payload_event_state_change]recv [Listening] [0-4]
[UT][D][dummy_button1_down]
[UT][I][dummy_player_stop]
[UT][I][c_mmi_speech_start]done [1-4] [4]
[UT][I][_send_cmd_req2spk]ready to send [1-4]
[UT][I][_send_cmd_speech]send [SendSpeech] [1-5] [0]
[UT][I][dummy_recorder_start]
[UT][I][_on_payload_event_speech_start]recv [SpeechStarted][ASR Start] [1-5]
[UT][I][dummy_mmi_event_callback]event [C_MMI_EVENT_ASR_START]
[UT][D][dummy_button1_up]
[UT][I][dummy_recorder_stop]
[UT][I][c_mmi_speech_end]done [1-5] [20]
[UT][I][_send_cmd_stop_speech]send [StopSpeech] [1-5] [0]
[UT][I][_on_payload_event_speech_content]recv [SpeechContent][ASR Text] [1-5]
[UT][D][dummy_mmi_event_callback]ASR C [What's in front of me?]
[UT][I][_on_payload_event_speech_end]recv [SpeechEnded][ASR End] [1-6]
[UT][I][dummy_recorder_stop]
[UT][I][_on_payload_event_state_change]recv [Thinking] [1-7]
[UT][I][_on_payload_event_respond_content]recv [RespondingContent][LLM Text] [1-7]
[UT][D][_mmi_event_callback]LLM C []
[UT][I][_on_payload_event_respond_content]disable speech when get command
[UT][I][_on_payload_event_respond_content]commands [[{"name":"visual_qa","params":[{"name":"shot","value":"我前面","normValue":"True"}]}]]
[UT][I][dummy_vl_callback]start vqa
[UT][I][dummy_camera_open]dummy_camera_open
[UT][I][_on_payload_event_state_change]recv [Listening] [0-4]
[UT][I][dummy_camera_task]take photo success
[UT][D][dummy_camera_task]remain pic count: 1
[UT][D][c_mmi_set_param_req2rsp]req2rsp [159718][{"input":{"directive":"RequestToRespond","dialog_id":"<DIALOG ID>","type":"prompt","text":""},"parameters":{"images":[{"type":"base64","value":"/9j/4AAQSkZJRgABAQAASABIAAD/4QB"]
[UT][I][dummy_vl_callback]end vqa
[UT][I][dummy_camera_close]dummy_camera_close
[UT][D][dummy_mmi_event_callback]disable record when ASR complete
[UT][I][c_recorder_stop]
[UT][I][_send_cmd_req2rsp]send [RequestToRespond] [1-7] [0]
[UT][I][_on_payload_event_state_change]recv [Responding][Audio Start] [1-8]
[UT][I][dummy_mmi_event_callback]enable player when dialog start
[UT][I][dummy_player_start]
[UT][I][_on_payload_event_respond_start]recv [RespondingStarted][Audio Start] [1-8]
[UT][I][_on_payload_event_respond_content]recv [RespondingContent][LLM Text] [1-8]
[UT][D][dummy_mmi_event_callback]LLM C [In front of you there are computers and desks, and other people working.]
[UT][I][_on_payload_event_respond_end]recv [RespondingEnded][Audio End] [1-9]
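2.4. Implementing LIVE AI
2.4.1. LIVE AI initialization
LIVE AI initialization uses mostly the same parameters as VQA; only the mode setting differs. Sample LIVE AI initialization code: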
int dummy_vl_callback(uint32_t event, void* param)
{
    (void)param;
    switch (event) {
    // ...
    // Handle other events
    // ...
    case C_MMI_VL_EVENT_TIMEOUT:
        // Also fired when a LIVE AI request times out
        UTIL_LOG_I("VL timeout");
        break;
    case C_MMI_VL_EVENT_LIVEAI_START:
        // Fired when LIVE AI starts
        UTIL_LOG_I("live ai start");
        break;
    case C_MMI_VL_EVENT_LIVEAI_ACTION:
        // Fired when LIVE AI needs image data
        UTIL_LOG_I("liveai action");
        dummy_camera_put();
        break;
    case C_MMI_VL_EVENT_LIVEAI_STOP:
        // Fired when LIVE AI exits
        UTIL_LOG_I("live ai stop");
        break;
    default:
        UTIL_LOG_E("unknown VL event");
        break;
    }
    return UTIL_SUCCESS;
}
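int dummy_vl_init(void)
{
    vl_config_t vl_config = {
        // vl_mode is a bit mask, so modes can be OR-ed together:
        // C_MMI_VL_MODE_VQA | C_MMI_VL_MODE_LIVE_AI enables both VQA and LIVE AI.
        // C_MMI_VL_MODE_NONE has the lowest priority and serves only as a placeholder.
        .vl_mode = C_MMI_VL_MODE_VQA | C_MMI_VL_MODE_LIVE_AI,
        // Currently a placeholder that accepts any value; the supported picture
        // formats are the enum values in the header file.
        .pic_format = C_MMI_VL_PIC_FORMAT_JPG,
        // Required
        .data_type = C_MMI_VL_DATA_BASE64,
        // Currently a placeholder that accepts any value.
        // Images actually passed in must be wider and taller than 10 px,
        // with an aspect ratio no more extreme than 200:1 or 1:200.
        .frame_size = C_MMI_VL_FRAMESIZE_128x128,
        // The Bailian cloud currently accepts single images of at most 180 KiB.
        // base64 grows data by 4/3 (180 KiB x 4/3 = 240 KiB), so a 240 KiB
        // buffer is recommended to hold the encoded base64 string.
        .buffer_size = 240 * 1024,
        // 0 means wait indefinitely
        .timeout_ms = 0,
        // The Bailian gateway currently recommends an fps of 2
        .fps = 2,
        .event_callback = dummy_vl_callback
    };
    c_mmi_set_vl_params(&vl_config);
    return UTIL_SUCCESS;
}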
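Note: LIVE AI must be initialized before the SDK starts any network communication. We recommend calling the LIVE AI initialization function right after SDK initialization (c_mmi_sdk_init).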
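The LIVE AI callback above captures a frame every time C_MMI_VL_EVENT_LIVEAI_ACTION fires. An application might want to cap captures at the configured fps itself, for example to limit JPEG encoding work on a slow MCU. A small pacer could then gate the call to dummy_camera_put(). This is a hypothetical helper, not an SDK API, and it is unnecessary if the SDK already paces LIVEAI_ACTION to the configured fps. The active flag would be set on C_MMI_VL_EVENT_LIVEAI_START and cleared on C_MMI_VL_EVENT_LIVEAI_STOP.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical frame pacer for the LIVEAI_ACTION handler; not an SDK type. */
typedef struct {
    bool active;           /* true between LIVEAI_START and LIVEAI_STOP */
    uint32_t interval_ms;  /* minimum gap between frames: 1000 / fps */
    uint32_t last_ms;      /* timestamp of the last captured frame */
    bool has_frame;        /* false until the first frame of the session */
} liveai_pacer_t;

/* Call on C_MMI_VL_EVENT_LIVEAI_START with the fps from vl_config_t. */
static void liveai_pacer_start(liveai_pacer_t *p, uint32_t fps)
{
    p->active = true;
    p->interval_ms = fps ? 1000u / fps : 0u;
    p->last_ms = 0;
    p->has_frame = false;
}

/* Call on C_MMI_VL_EVENT_LIVEAI_STOP. */
static void liveai_pacer_stop(liveai_pacer_t *p)
{
    p->active = false;
}

/* Returns true if a frame should be captured at now_ms (a millisecond tick).
 * Unsigned subtraction keeps the comparison correct across tick wrap-around. */
static bool liveai_pacer_should_capture(liveai_pacer_t *p, uint32_t now_ms)
{
    if (!p->active)
        return false;
    if (p->has_frame && (uint32_t)(now_ms - p->last_ms) < p->interval_ms)
        return false;
    p->has_frame = true;
    p->last_ms = now_ms;
    return true;
}
```

With fps = 2 the pacer allows one frame every 500 ms. The base64 frames in the sample LIVE AI log are roughly 180 KB each, so that rate corresponds to about 360 KB/s of upstream image data. That figure is worth checking against the device's network budget.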
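2.4.2. LIVE AI interaction
After initialization, the SDK drives the state transitions automatically in response to requests. A sample interaction log: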
[UT][I][_on_payload_event_state_change]recv [Listening] [0-4]
[UT][D][dummy_button1_down]
[UT][I][dummy_player_stop]
[UT][I][c_mmi_speech_start]done [1-4] [4]
[UT][I][_send_cmd_req2spk]ready to send [1-4]
[UT][I][_send_cmd_speech]send [SendSpeech] [1-5] [0]
[UT][I][dummy_recorder_start]done
[UT][I][_on_payload_event_speech_start]recv [SpeechStarted][ASR Start] [1-5]
[UT][I][dummy_mmi_event_callback]event [C_MMI_EVENT_ASR_START]
[UT][D][dummy_button1_up]
[UT][I][dummy_recorder_stop]done
[UT][I][c_mmi_speech_end]done [1-5] [20]
[UT][I][_send_cmd_stop_speech]send [StopSpeech] [1-5] [0]
[UT][I][_on_payload_event_speech_content]recv [SpeechContent][ASR Text] [1-5]
[UT][D][dummy_mmi_event_callback]ASR C [Start a video chat.]
[UT][I][_on_payload_event_speech_end]recv [SpeechEnded][ASR End] [1-6]
[UT][I][dummy_recorder_stop]already
[UT][I][_on_payload_event_state_change]recv [Thinking] [1-7]
[UT][D][_on_payload_event_state_change]prepare player rb
[UT][I][_on_payload_event_respond_content]recv [RespondingContent][LLM Text] [1-7]
[UT][D][dummy_mmi_event_callback]LLM C []
[UT][I][_on_payload_event_respond_content]disable speech when get command
[UT][I][_on_payload_event_respond_content]commands [[{"name":"open_videochat","params":[]}]]
[UT][D][c_mmi_vl_work]mmi_vl_liveai_start : success
[UT][I][_on_payload_event_state_change]recv [Listening] [0-4]
[UT][D][dummy_mmi_event_callback]disable record when ASR complete
[UT][I][dummy_recorder_stop]already
[UT][I][_send_cmd_req2rsp]send [RequestToRespond] [1-7] [0]
[UT][I][_on_payload_event_respond_content]recv [RespondingContent][LLM Text] [1-7]
[UT][D][dummy_mmi_event_callback]LLM C [Hi, I'm here to see the world with you]
[UT][I][_on_payload_event_respond_content]commands [[{"name":"switch_video_call_success"}]]
[UT][I][dummy_camera_open]dummy_camera_open
[UT][I][dummy_vl_callback]live ai start
[UT][I][_on_payload_event_state_change]recv [Responding][Audio Start] [1-8]
[UT][I][dummy_mmi_event_callback]enable player when dialog start
[UT][I][dummy_player_start]
[UT][I][_on_payload_event_respond_start]recv [RespondingStarted][Audio Start] [1-8]
[UT][I][dummy_vl_callback]start liveai
[UT][D][_vl_wdg_start]c_mmi_vl_watch_dog_start
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_camera_task]take photo success
[UT][D][dummy_camera_task]remain pic count: 3
[UT][D][_vl_put_data]put data size: [186785], put data finish_flag: [1], put data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][D][_vl_get_send_frame_params]get frame len: [186785], frame data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][I][_on_payload_event_respond_end]recv [RespondingEnded][Audio End] [1-9]
[UT][D][_on_payload_event_respond_end]recv audio data size [56320]
[UT][I][_send_cmd_update_info]send [Update Info] [1-9] [0]
[UT][I][dummy_vl_callback]start liveai
[UT][D][c_mmi_get_player_data]disable speech_work
[UT][I][AudioQueueCallback]disable player when no data
[UT][I][dummy_player_stop]done
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_camera_task]take photo success
[UT][D][dummy_camera_task]remain pic count: 4
[UT][D][_vl_put_data]put data size: [185001], put data finish_flag: [1], put data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][D][_vl_get_send_frame_params]get frame len: [185001], frame data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][I][_send_cmd_update_info]send [Update Info] [0-10] [0]
[UT][I][dummy_vl_callback]start liveai
[UT][I][_mem_debug_task_entry]mem used 807060/813544
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_camera_task]take photo success
[UT][D][dummy_camera_task]remain pic count: 5
[UT][D][_vl_put_data]put data size: [185349], put data finish_flag: [1], put data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][D][_vl_get_send_frame_params]get frame len: [185349], frame data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][I][_send_cmd_update_info]send [Update Info] [0-10] [0]
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_camera_task]take photo success
[UT][D][dummy_camera_task]remain pic count: 6
[UT][D][_vl_put_data]put data size: [183609], put data finish_flag: [1], put data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][D][_vl_get_send_frame_params]get frame len: [183609], frame data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][I][_send_cmd_update_info]send [Update Info] [0-10] [0]
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_camera_task]take photo success
[UT][D][dummy_camera_task]remain pic count: 7
[UT][D][_vl_put_data]put data size: [179813], put data finish_flag: [1], put data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][D][_vl_get_send_frame_params]get frame len: [179813], frame data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][I][_send_cmd_update_info]send [Update Info] [0-10] [0]
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_camera_task]take photo success
[UT][D][dummy_camera_task]remain pic count: 8
[UT][D][_vl_put_data]put data size: [179765], put data finish_flag: [1], put data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][D][_vl_get_send_frame_params]get frame len: [179765], frame data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][I][_send_cmd_update_info]send [Update Info] [0-10] [0]
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_camera_task]take photo success
[UT][D][dummy_camera_task]remain pic count: 9
[UT][D][_vl_put_data]put data size: [177753], put data finish_flag: [1], put data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][D][_vl_get_send_frame_params]get frame len: [177753], frame data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][I][_send_cmd_update_info]send [Update Info] [0-10] [0]
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_vl_callback]start liveai
[UT][I][_mem_debug_task_entry]mem used 807060/813544
[UT][I][dummy_camera_task]take photo success
[UT][D][dummy_camera_task]remain pic count: 10
[UT][D][_vl_put_data]put data size: [180733], put data finish_flag: [1], put data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][D][_vl_get_send_frame_params]get frame len: [180733], frame data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][I][_send_cmd_update_info]send [Update Info] [0-10] [0]
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_camera_task]take photo success
[UT][D][dummy_camera_task]remain pic count: 12
[UT][D][_vl_put_data]put data size: [179589], put data finish_flag: [1], put data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][D][_vl_get_send_frame_params]get frame len: [179589], frame data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][I][_send_cmd_update_info]send [Update Info] [0-10] [0]
[UT][I][dummy_vl_callback]start liveai
[UT][D][dummy_button1_down]count 0
[UT][I][dummy_player_stop]already
[UT][I][c_mmi_speech_start]done [1-10] [4]
[UT][I][_send_cmd_req2spk]send [RequestToSpeak] [1-11] [0]
[UT][I][_on_payload_event_request_accept]recv [RequestAccepted] [1-3]
[UT][I][_on_payload_event_state_change]recv [Listening] [1-4]
[UT][I][_send_cmd_speech]send [SendSpeech] [1-5] [0]
[UT][D][dummy_mmi_event_callback]enable recorder when send speech
[UT][I][dummy_recorder_start]done
[UT][I][_on_payload_event_speech_start]recv [SpeechStarted][ASR Start] [1-5]
[UT][I][dummy_mmi_event_callback]event [C_MMI_EVENT_ASR_START]
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_camera_task]take photo success
[UT][D][dummy_camera_task]remain pic count: 13
[UT][D][_vl_put_data]put data size: [179073], put data finish_flag: [1], put data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][D][_vl_get_send_frame_params]get frame len: [179073], frame data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][I][_send_cmd_update_info]send [Update Info] [1-5] [0]
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_camera_task]take photo success
[UT][D][dummy_camera_task]remain pic count: 14
[UT][D][_vl_put_data]put data size: [179049], put data finish_flag: [1], put data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][D][_vl_get_send_frame_params]get frame len: [179049], frame data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][I][_send_cmd_update_info]send [Update Info] [1-5] [0]
[UT][D][dummy_button1_up]
[UT][I][dummy_recorder_stop]done
[UT][I][c_mmi_speech_end]done [1-5] [20]
[UT][I][_send_cmd_stop_speech]send [StopSpeech] [1-5] [0]
[UT][I][dummy_vl_callback]start liveai
[UT][I][_on_payload_event_speech_content]recv [SpeechContent][ASR Text] [1-5]
[UT][D][dummy_mmi_event_callback]ASR C [What am I holding in my hand?]
[UT][I][_on_payload_event_speech_end]recv [SpeechEnded][ASR End] [1-6]
[UT][D][dummy_mmi_event_callback]disable record when ASR complete
[UT][I][dummy_recorder_stop]already
[UT][I][_on_payload_event_state_change]recv [Thinking] [1-7]
[UT][D][_on_payload_event_state_change]prepare player rb
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_camera_task]take photo success
[UT][D][dummy_camera_task]remain pic count: 15
[UT][D][_vl_put_data]put data size: [180329], put data finish_flag: [1], put data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][D][_vl_get_send_frame_params]get frame len: [180329], frame data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][I][_send_cmd_update_info]send [Update Info] [1-7] [0]
[UT][I][dummy_vl_callback]start liveai
[UT][I][_heartbeat_sync]send heartbeat [1-7] [40]
[UT][I][_send_cmd_heartbeat]send [Heartbeat] [1-7] [0]
[UT][I][_on_payload_event_heartbeat]recv [HeartBeat] [1-7]
[UT][D][dummy_mmi_event_callback]heartbeat
[UT][I][dummy_vl_callback]start liveai
[UT][I][_mem_debug_task_entry]mem used 807060/813544
[UT][I][dummy_camera_task]take photo success
[UT][D][dummy_camera_task]remain pic count: 16
[UT][D][_vl_put_data]put data size: [180865], put data finish_flag: [1], put data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][D][_vl_get_send_frame_params]get frame len: [180865], frame data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][I][_send_cmd_update_info]send [Update Info] [1-7] [0]
[UT][I][dummy_vl_callback]start liveai
[UT][I][_on_payload_event_respond_content]recv [RespondingContent][LLM Text] [1-7]
[UT][D][dummy_mmi_event_callback]LLM C [You're holding a phone.]
[UT][I][_on_payload_event_state_change]recv [Responding][Audio Start] [1-8]
[UT][I][dummy_mmi_event_callback]enable player when dialog start
[UT][I][dummy_player_start]
[UT][I][_on_payload_event_respond_start]recv [RespondingStarted][Audio Start] [1-8]
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_camera_task]take photo success
[UT][D][dummy_camera_task]remain pic count: 17
[UT][D][_vl_put_data]put data size: [180569], put data finish_flag: [1], put data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][D][_vl_get_send_frame_params]get frame len: [180569], frame data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][I][_send_cmd_update_info]send [Update Info] [1-8] [0]
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_camera_task]take photo success
[UT][D][dummy_camera_task]remain pic count: 18
[UT][D][_vl_put_data]put data size: [180765], put data finish_flag: [1], put data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][D][_vl_get_send_frame_params]get frame len: [180765], frame data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][I][_send_cmd_update_info]send [Update Info] [1-8] [0]
[UT][I][dummy_vl_callback]start liveai
[UT][I][_on_payload_event_respond_end]recv [RespondingEnded][Audio End] [1-9]
[UT][D][_on_payload_event_respond_end]recv audio data size [57600]
[UT][D][c_mmi_get_player_data]disable speech_work
[UT][I][AudioQueueCallback]disable player when no data
[UT][I][dummy_player_stop]done
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_camera_task]take photo success
[UT][D][dummy_camera_task]remain pic count: 19
[UT][D][_vl_put_data]put data size: [180605], put data finish_flag: [1], put data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][D][_vl_get_send_frame_params]get frame len: [180605], frame data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][I][_send_cmd_update_info]send [Update Info] [0-10] [0]
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_camera_task]take photo success
[UT][D][dummy_camera_task]remain pic count: 20
[UT][D][_vl_put_data]put data size: [180705], put data finish_flag: [1], put data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][D][_vl_get_send_frame_params]get frame len: [180705], frame data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][I][_send_cmd_update_info]send [Update Info] [0-10] [0]
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_vl_callback]start liveai
[UT][I][_mem_debug_task_entry]mem used 807060/813544
[UT][I][dummy_camera_task]take photo success
[UT][D][dummy_camera_task]remain pic count: 21
[UT][D][_vl_put_data]put data size: [180001], put data finish_flag: [1], put data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][D][_vl_get_send_frame_params]get frame len: [180001], frame data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][I][_send_cmd_update_info]send [Update Info] [0-10] [0]
[UT][I][dummy_vl_callback]start liveai
[UT][D][dummy_button1_down]count 0
[UT][I][dummy_player_stop]already
[UT][I][c_mmi_speech_start]done [1-10] [4]
[UT][I][_send_cmd_req2spk]send [RequestToSpeak] [1-11] [0]
[UT][I][_on_payload_event_request_accept]recv [RequestAccepted] [1-3]
[UT][I][_on_payload_event_state_change]recv [Listening] [1-4]
[UT][I][_send_cmd_speech]send [SendSpeech] [1-5] [0]
[UT][D][dummy_mmi_event_callback]enable recorder when send speech
[UT][I][dummy_recorder_start]done
[UT][I][_on_payload_event_speech_start]recv [SpeechStarted][ASR Start] [1-5]
[UT][I][dummy_mmi_event_callback]event [C_MMI_EVENT_ASR_START]
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_camera_task]take photo success
[UT][D][dummy_camera_task]remain pic count: 23
[UT][D][_vl_put_data]put data size: [180433], put data finish_flag: [1], put data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][D][_vl_get_send_frame_params]get frame len: [180433], frame data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][I][_send_cmd_update_info]send [Update Info] [1-5] [0]
[UT][I][dummy_vl_callback]start liveai
[UT][D][dummy_button1_up]
[UT][I][dummy_recorder_stop]done
[UT][I][c_mmi_speech_end]done [1-5] [20]
[UT][I][_send_cmd_stop_speech]send [StopSpeech] [1-5] [0]
[UT][I][_on_payload_event_speech_content]recv [SpeechContent][ASR Text] [1-5]
[UT][D][dummy_mmi_event_callback]ASR C [Exit the video chat.]
[UT][I][_on_payload_event_speech_end]recv [SpeechEnded][ASR End] [1-6]
[UT][D][dummy_mmi_event_callback]disable record when ASR complete
[UT][I][dummy_recorder_stop]already
[UT][I][_on_payload_event_state_change]recv [Thinking] [1-7]
[UT][D][_on_payload_event_state_change]prepare player rb
[UT][I][dummy_vl_callback]start liveai
[UT][I][dummy_camera_task]take photo success
[UT][D][dummy_camera_task]remain pic count: 24
[UT][D][_vl_put_data]put data size: [181725], put data finish_flag: [1], put data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][D][_vl_get_send_frame_params]get frame len: [181725], frame data : [/9j/4AAQSkZJRgABAQAASABIAAD/4QBM]
[UT][I][_send_cmd_update_info]send [Update Info] [1-7] [0]
[UT][I][_on_payload_event_respond_content]recv [RespondingContent][LLM Text] [1-7]
[UT][D][dummy_mmi_event_callback]LLM C []
[UT][I][_on_payload_event_respond_content]commands [[{"name":"quit_videochat","params":[]}]]
[UT][I][_on_payload_event_state_change]recv [Listening] [0-4]
[UT][D][dummy_mmi_event_callback]disable record when ASR complete
[UT][I][dummy_recorder_stop]already
[UT][I][_send_cmd_req2rsp]send [RequestToRespond] [1-7] [0]
[UT][I][_on_payload_event_respond_content]recv [RespondingContent][LLM Text] [1-7]
[UT][D][dummy_mmi_event_callback]LLM C [I'll close my eyes then, bye-bye]
[UT][I][_on_payload_event_respond_content]commands [[{"name":"exit_video_call_success"}]]
[UT][I][dummy_camera_close]c_camera_close
[UT][I][dummy_vl_callback]live ai stop
[UT][I][_on_payload_event_state_change]recv [Responding][Audio Start] [1-8]
[UT][I][dummy_mmi_event_callback]enable player when dialog start
[UT][I][dummy_player_start]
[UT][I][_on_payload_event_respond_start]recv [RespondingStarted][Audio Start] [1-8]
[UT][I][_on_payload_event_respond_end]recv [RespondingEnded][Audio End] [1-9]
[UT][D][_on_payload_event_respond_end]recv audio data size [53760]