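In scenarios such as live streaming, online meetings, voice chat, and smart assistants, a continuous audio stream must be converted to text in real time to provide live captions, generate meeting notes, or respond to voice commands. The Qwen real-time speech recognition service receives the audio stream over the WebSocket protocol and transcribes it in real time.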
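Supported models
China (Beijing)
| Model | Version | Supported languages | Supported sample rates | Unit price | Free quota (note) | 
| qwen3-asr-flash-realtime (currently equivalent to qwen3-asr-flash-realtime-2025-10-27) | Stable | Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish | 8 kHz, 16 kHz | CNY 0.00033/second | 36,000 seconds (10 hours); valid for 90 days after Model Studio is activated | 
| qwen3-asr-flash-realtime-2025-10-27 | Snapshot | Same as above | Same as above | Same as above | Same as above | 
International (Singapore)
| Model | Version | Supported languages | Supported sample rates | Unit price | 
| qwen3-asr-flash-realtime (currently equivalent to qwen3-asr-flash-realtime-2025-10-27) | Stable | Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish | 8 kHz, 16 kHz | CNY 0.00066/second | 
| qwen3-asr-flash-realtime-2025-10-27 | Snapshot | Same as above | Same as above | Same as above | 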
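As a rough illustration of how the per-second prices and the free quota combine, the sketch below estimates the charge for a given amount of audio. The function name `estimate_cost`, the region keys, and the assumption that any remaining free quota is simply subtracted from the billable seconds are illustrative only; actual billing rules (rounding, quota accounting) may differ.

```python
from decimal import Decimal

# Unit prices copied from the tables above (CNY per second of audio).
PRICE_PER_SECOND = {
    "beijing": Decimal("0.00033"),
    "singapore": Decimal("0.00066"),
}


def estimate_cost(seconds: int, region: str = "beijing", free_seconds_left: int = 0) -> Decimal:
    """Return the estimated charge in CNY for `seconds` of audio.

    Assumption: remaining free quota is deducted from the duration first.
    """
    if seconds < 0 or free_seconds_left < 0:
        raise ValueError("durations must be non-negative")
    billable = max(0, seconds - free_seconds_left)
    return PRICE_PER_SECOND[region] * billable


if __name__ == "__main__":
    # One hour of audio in each region, with no free quota left.
    print(estimate_cost(3600, "beijing"))
    print(estimate_cost(3600, "singapore"))
    # Ten hours of audio fully covered by a 36,000-second free quota.
    print(estimate_cost(36000, "beijing", free_seconds_left=36000))
```

`Decimal` is used instead of `float` so that very small per-second prices do not pick up binary rounding noise when multiplied by long durations.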
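Features
| Feature | Description | 
| Access methods | Java/Python SDK, WebSocket API | 
| Multilingual | Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish | 
| Language identification | ✅ | 
| Specifying the language to recognize | ✅ If the language of the audio is known, it can be specified through a request parameter (the samples in this document set `language`) | 
| Punctuation prediction | ✅ | 
| Streaming input/output | ✅ | 
| VAD (Voice Activity Detection) | ✅ | 
| Speaker diarization | ❌ | 
| Audio input method | Binary audio stream | 
| Audio formats | pcm, opus | 
| Audio channels | Mono | 
| Audio sample rates | 8000 Hz, 16000 Hz | 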
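The audio requirements in the table (mono, 8000 Hz or 16000 Hz, pcm) can be checked before streaming. The following is a minimal sketch using only the Python standard library; it assumes the source is a WAV container and that the service expects 16-bit samples (the WebSocket samples in this document refer to PCM16, but the table itself does not state a bit depth). The helper name `extract_pcm` is hypothetical.

```python
import wave
from typing import Tuple

SUPPORTED_SAMPLE_RATES = (8000, 16000)  # from the feature table


def extract_pcm(wav_path: str) -> Tuple[bytes, int]:
    """Validate a WAV file and return (raw PCM bytes, sample rate)."""
    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("audio must be mono")
        if wav.getframerate() not in SUPPORTED_SAMPLE_RATES:
            raise ValueError(f"unsupported sample rate: {wav.getframerate()} Hz")
        # Assumption: 16-bit samples (2 bytes per sample).
        if wav.getsampwidth() != 2:
            raise ValueError("audio must use 16-bit samples")
        return wav.readframes(wav.getnframes()), wav.getframerate()
```

The returned bytes are the headerless PCM payload, which is the form the samples in this document read from `your_audio_file.pcm`.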
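Quick start
Using the DashScope SDK
Java
- Install the SDK and make sure the DashScope SDK version is 2.21.14 or later. 
- Get an API key. Configuring the API key as an environment variable is recommended so that it is not hard-coded. 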
- Run the sample code. For more sample code, see GitHub.

    import com.alibaba.dashscope.audio.omni.*;
    import com.alibaba.dashscope.exception.NoApiKeyException;
    import com.google.gson.JsonObject;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    import javax.sound.sampled.LineUnavailableException;
    import java.io.File;
    import java.io.FileInputStream;
    import java.util.Arrays;
    import java.util.Base64;
    import java.util.Collections;
    import java.util.concurrent.atomic.AtomicReference;

    public class Qwen3AsrRealtimeUsage {
        private static final Logger log = LoggerFactory.getLogger(Qwen3AsrRealtimeUsage.class);
        private static final int AUDIO_CHUNK_SIZE = 1024; // Audio chunk size in bytes
        private static final int SLEEP_INTERVAL_MS = 30;  // Sleep interval in milliseconds

        public static void main(String[] args) throws InterruptedException, LineUnavailableException {
            OmniRealtimeParam param = OmniRealtimeParam.builder()
                    .model("qwen3-asr-flash-realtime")
                    // URL for the Beijing region. For models in the Singapore region, replace it with:
                    // wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime
                    .url("wss://dashscope.aliyuncs.com/api-ws/v1/realtime")
                    // API keys differ between the Singapore and Beijing regions.
                    // Get an API key: https://help.aliyun.com/zh/model-studio/get-api-key
                    // If the environment variable is not set, replace the next line with: .apikey("sk-xxx")
                    .apikey(System.getenv("DASHSCOPE_API_KEY"))
                    .build();

            OmniRealtimeConversation conversation = null;
            final AtomicReference<OmniRealtimeConversation> conversationRef = new AtomicReference<>(null);
            conversation = new OmniRealtimeConversation(param, new OmniRealtimeCallback() {
                @Override
                public void onOpen() {
                    System.out.println("connection opened");
                }

                @Override
                public void onEvent(JsonObject message) {
                    String type = message.get("type").getAsString();
                    switch (type) {
                        case "session.created":
                            System.out.println("start session: " + message.get("session").getAsJsonObject().get("id").getAsString());
                            break;
                        case "conversation.item.input_audio_transcription.completed":
                            System.out.println("transcription: " + message.get("transcript").getAsString());
                            break;
                        case "response.audio_transcript.delta":
                            System.out.println("got llm response delta: " + message.get("delta").getAsString());
                            break;
                        case "input_audio_buffer.speech_started":
                            System.out.println("======VAD Speech Start======");
                            break;
                        case "input_audio_buffer.speech_stopped":
                            System.out.println("======VAD Speech Stop======");
                            break;
                        case "response.done":
                            System.out.println("======RESPONSE DONE======");
                            if (conversationRef.get() != null) {
                                System.out.println("[Metric] response: " + conversationRef.get().getResponseId()
                                        + ", first text delay: " + conversationRef.get().getFirstTextDelay()
                                        + " ms, first audio delay: " + conversationRef.get().getFirstAudioDelay() + " ms");
                            }
                            break;
                        default:
                            break;
                    }
                }

                @Override
                public void onClose(int code, String reason) {
                    System.out.println("connection closed code: " + code + ", reason: " + reason);
                }
            });
            conversationRef.set(conversation);

            try {
                conversation.connect();
            } catch (NoApiKeyException e) {
                throw new RuntimeException(e);
            }

            OmniRealtimeTranscriptionParam transcriptionParam = new OmniRealtimeTranscriptionParam();
            transcriptionParam.setLanguage("zh");
            transcriptionParam.setInputAudioFormat("pcm");
            transcriptionParam.setInputSampleRate(16000);
            // Corpus text, optional. If available, setting it is recommended to improve recognition.
            // transcriptionParam.setCorpusText("这是一段脱口秀表演");

            OmniRealtimeConfig config = OmniRealtimeConfig.builder()
                    .modalities(Collections.singletonList(OmniRealtimeModality.TEXT))
                    .transcriptionConfig(transcriptionParam)
                    .build();
            conversation.updateSession(config);

            String filePath = "your_audio_file.pcm";
            File audioFile = new File(filePath);
            if (!audioFile.exists()) {
                log.error("Audio file not found: {}", filePath);
                return;
            }

            try (FileInputStream audioInputStream = new FileInputStream(audioFile)) {
                byte[] audioBuffer = new byte[AUDIO_CHUNK_SIZE];
                int bytesRead;
                int totalBytesRead = 0;
                log.info("Starting to send audio data from: {}", filePath);

                // Read and send audio data in chunks
                while ((bytesRead = audioInputStream.read(audioBuffer)) != -1) {
                    totalBytesRead += bytesRead;
                    // Encode only the bytes actually read; the last chunk may be shorter than the buffer
                    String audioB64 = Base64.getEncoder().encodeToString(Arrays.copyOf(audioBuffer, bytesRead));
                    // Send audio chunk to conversation
                    conversation.appendAudio(audioB64);
                    // Add small delay to simulate real-time audio streaming
                    Thread.sleep(SLEEP_INTERVAL_MS);
                }
                log.info("Finished sending audio data. Total bytes sent: {}", totalBytesRead);
            } catch (Exception e) {
                log.error("Error sending audio from file: {}", filePath, e);
            }

            // When enableTurnDetection is false, uncomment the next line
            // conversation.commit();
            conversation.createResponse(null, null);
            conversation.close(1000, "bye");
            System.exit(0);
        }
    }
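Python
- Install the SDK and make sure the DashScope SDK version is 1.24.8 or later. 
- Get an API key. Configuring the API key as an environment variable is recommended so that it is not hard-coded. 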
- Run the sample code. For more sample code, see GitHub.

    import logging
    import os
    import base64
    import signal
    import sys
    import time

    import dashscope
    from dashscope.audio.qwen_omni import *
    from dashscope.audio.qwen_omni.omni_realtime import TranscriptionParams


    def setup_logging():
        """Configure log output."""
        logger = logging.getLogger('dashscope')
        logger.setLevel(logging.DEBUG)
        handler = logging.StreamHandler(sys.stdout)
        handler.setLevel(logging.DEBUG)
        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
        handler.setFormatter(formatter)
        logger.addHandler(handler)
        logger.propagate = False
        return logger


    def init_api_key():
        """Initialize the API key."""
        # API keys differ between the Singapore and Beijing regions.
        # Get an API key: https://help.aliyun.com/zh/model-studio/get-api-key
        # If the environment variable is not set, replace the next line with: dashscope.api_key = "sk-xxx"
        dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY', 'YOUR_API_KEY')
        if dashscope.api_key == 'YOUR_API_KEY':
            print('[Warning] Using placeholder API key, set DASHSCOPE_API_KEY environment variable.')


    class MyCallback(OmniRealtimeCallback):
        """Callback handler for real-time recognition."""

        def __init__(self, conversation):
            self.conversation = conversation
            # Map each server event type to its handler
            self.handlers = {
                'session.created': self._handle_session_created,
                'conversation.item.input_audio_transcription.completed': self._handle_final_text,
                'conversation.item.input_audio_transcription.text': self._handle_stash_text,
                'input_audio_buffer.speech_started': lambda r: print('======Speech Start======'),
                'input_audio_buffer.speech_stopped': lambda r: print('======Speech Stop======'),
                'response.done': self._handle_response_done
            }

        def on_open(self):
            print('Connection opened')

        def on_close(self, code, msg):
            print(f'Connection closed, code: {code}, msg: {msg}')

        def on_event(self, response):
            try:
                handler = self.handlers.get(response['type'])
                if handler:
                    handler(response)
            except Exception as e:
                print(f'[Error] {e}')

        def _handle_session_created(self, response):
            print(f"Start session: {response['session']['id']}")

        def _handle_final_text(self, response):
            print(f"Final recognized text: {response['transcript']}")

        def _handle_stash_text(self, response):
            print(f"Got stash result: {response['stash']}")

        def _handle_response_done(self, response):
            print('======RESPONSE DONE======')
            print(f"[Metric] response: {self.conversation.get_last_response_id()}, "
                  f"first text delay: {self.conversation.get_last_first_text_delay()}, "
                  f"first audio delay: {self.conversation.get_last_first_audio_delay()}")


    def read_audio_chunks(file_path, chunk_size=3200):
        """Read the audio file chunk by chunk."""
        with open(file_path, 'rb') as f:
            while chunk := f.read(chunk_size):
                yield chunk


    def send_audio(conversation, file_path, delay=0.1):
        """Send the audio data."""
        if not os.path.exists(file_path):
            raise FileNotFoundError(f"Audio file {file_path} does not exist.")

        print("Processing audio file... Press 'Ctrl+C' to stop.")
        for chunk in read_audio_chunks(file_path):
            audio_b64 = base64.b64encode(chunk).decode('ascii')
            conversation.append_audio(audio_b64)
            time.sleep(delay)

        # When enable_turn_detection is False, uncomment the following lines
        # conversation.commit()
        # print("Audio commit sent.")


    def main():
        setup_logging()
        init_api_key()

        audio_file_path = "./your_audio_file.pcm"
        conversation = OmniRealtimeConversation(
            model='qwen3-asr-flash-realtime',
            # URL for the Beijing region. For models in the Singapore region, replace it with:
            # wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime
            url='wss://dashscope.aliyuncs.com/api-ws/v1/realtime',
            callback=MyCallback(conversation=None)  # Pass None for now and inject it below
        )

        # Inject the conversation into its own callback
        conversation.callback.conversation = conversation

        def handle_exit(sig, frame):
            print('Ctrl+C pressed, exiting...')
            conversation.close()
            sys.exit(0)

        signal.signal(signal.SIGINT, handle_exit)

        conversation.connect()

        transcription_params = TranscriptionParams(
            language='zh',
            sample_rate=16000,
            input_audio_format="pcm"
            # Corpus text for the input audio, used to assist recognition
            # corpus_text=""
        )

        conversation.update_session(
            output_modalities=[MultiModality.TEXT],
            enable_input_audio_transcription=True,
            transcription_params=transcription_params
        )

        try:
            send_audio(conversation, audio_file_path)
            time.sleep(3)  # Wait for the response
        except Exception as e:
            print(f"Error occurred: {e}")
        finally:
            conversation.close()
            print("Audio processing completed.")


    if __name__ == '__main__':
        main()
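The Python sample above reads 3200-byte chunks and sleeps 0.1 s between them, which matches 0.1 s of 16-bit mono audio at 16 kHz. The sketch below shows that arithmetic so the chunk size can be recomputed for another sample rate such as 8 kHz. The helper names `chunk_size_bytes` and `iter_chunks` are hypothetical, and the 16-bit sample width is an assumption about the PCM input.

```python
from typing import Iterator


def chunk_size_bytes(sample_rate: int, chunk_ms: int,
                     sample_width: int = 2, channels: int = 1) -> int:
    """Bytes of raw PCM that correspond to `chunk_ms` milliseconds of audio.

    Assumption: 16-bit (2-byte) mono samples unless stated otherwise.
    """
    return sample_rate * sample_width * channels * chunk_ms // 1000


def iter_chunks(pcm: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive slices of `pcm`; the last one may be shorter."""
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]


if __name__ == "__main__":
    print(chunk_size_bytes(16000, 100))  # 3200, the value used in the sample
    print(chunk_size_bytes(8000, 100))   # 1600
```

Pairing the chunk duration with an equal sleep interval keeps the send rate close to real time, which is what the `delay=0.1` argument in the sample does.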
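Using the WebSocket API
The following samples show how to send a local audio file over a WebSocket connection and obtain the recognition result.
- Get an API key. For security, configuring the API key as an environment variable is recommended. 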
- Write and run the code: implement the complete flow of authentication, connection, sending audio, and receiving results (for details, see Interaction flow).

Python

Before running the sample, make sure the dependencies are installed with the following commands:

    pip uninstall websocket-client
    pip uninstall websocket
    pip install websocket-client

Do not name the sample file websocket.py; otherwise the following error may occur: AttributeError: module 'websocket' has no attribute 'WebSocketApp'. Did you mean: 'WebSocket'?

    # pip install websocket-client
    import os
    import time
    import json
    import threading
    import base64
    import websocket
    import logging
    import logging.handlers
    from datetime import datetime

    logger = logging.getLogger(__name__)
    logger.setLevel(logging.DEBUG)

    # API keys differ between the Singapore and Beijing regions.
    # Get an API key: https://help.aliyun.com/zh/model-studio/get-api-key
    # If the environment variable is not set, replace the next line with: API_KEY="sk-xxx"
    API_KEY = os.environ.get("DASHSCOPE_API_KEY", "sk-xxx")
    QWEN_MODEL = "qwen3-asr-flash-realtime"
    # baseUrl for the Beijing region. For models in the Singapore region, replace it with:
    # wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime
    baseUrl = "wss://dashscope.aliyuncs.com/api-ws/v1/realtime"
    url = f"{baseUrl}?model={QWEN_MODEL}"
    print(f"Connecting to server: {url}")

    # Note: in non-VAD mode, keep the total duration of continuously sent audio under 60 s
    enableServerVad = True
    is_running = True  # Flag that keeps the sender thread running

    headers = [
        "Authorization: Bearer " + API_KEY,
        "OpenAI-Beta: realtime=v1"
    ]


    def init_logger():
        formatter = logging.Formatter('%(asctime)s|%(levelname)s|%(message)s')
        f_handler = logging.handlers.RotatingFileHandler(
            "omni_tester.log", maxBytes=100 * 1024 * 1024, backupCount=3
        )
        f_handler.setLevel(logging.DEBUG)
        f_handler.setFormatter(formatter)

        console = logging.StreamHandler()
        console.setLevel(logging.DEBUG)
        console.setFormatter(formatter)

        logger.addHandler(f_handler)
        logger.addHandler(console)


    def on_open(ws):
        logger.info("Connected to server.")

        # Session update events
        event_manual = {
            "event_id": "event_123",
            "type": "session.update",
            "session": {
                "modalities": ["text"],
                "input_audio_format": "pcm",
                "sample_rate": 16000,
                "input_audio_transcription": {
                    # Language identifier, optional. Set it if the language is known.
                    "language": "zh"
                    # Corpus, optional. If available, setting it is recommended to improve recognition.
                    # "corpus": {
                    #     "text": ""
                    # }
                },
                "turn_detection": None
            }
        }
        event_vad = {
            "event_id": "event_123",
            "type": "session.update",
            "session": {
                "modalities": ["text"],
                "input_audio_format": "pcm",
                "sample_rate": 16000,
                "input_audio_transcription": {
                    "language": "zh"
                },
                "turn_detection": {
                    "type": "server_vad",
                    "threshold": 0.2,
                    "silence_duration_ms": 800
                }
            }
        }
        if enableServerVad:
            logger.info(f"Sending event: {json.dumps(event_vad, indent=2)}")
            ws.send(json.dumps(event_vad))
        else:
            logger.info(f"Sending event: {json.dumps(event_manual, indent=2)}")
            ws.send(json.dumps(event_manual))


    def on_message(ws, message):
        global is_running
        try:
            data = json.loads(message)
            logger.info(f"Received event: {json.dumps(data, ensure_ascii=False, indent=2)}")
            if data.get("type") == "conversation.item.input_audio_transcription.completed":
                logger.info(f"Final transcript: {data.get('transcript')}")
                logger.info("Closing WebSocket connection after completion...")
                is_running = False  # Stop the audio sender thread
                ws.close()
        except json.JSONDecodeError:
            logger.error(f"Failed to parse message: {message}")


    def on_error(ws, error):
        logger.error(f"Error: {error}")


    def on_close(ws, close_status_code, close_msg):
        logger.info(f"Connection closed: {close_status_code} - {close_msg}")


    def send_audio(ws, local_audio_path):
        time.sleep(3)  # Wait for the session update to complete
        global is_running

        with open(local_audio_path, 'rb') as audio_file:
            logger.info(f"File read started: {datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]}")
            while is_running:
                audio_data = audio_file.read(3200)  # ~0.1s PCM16/16kHz
                if not audio_data:
                    logger.info(f"File read finished: {datetime.now().strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]}")
                    if not enableServerVad and ws.sock and ws.sock.connected:
                        commit_event = {
                            "event_id": "event_789",
                            "type": "input_audio_buffer.commit"
                        }
                        ws.send(json.dumps(commit_event))
                    break

                if not ws.sock or not ws.sock.connected:
                    logger.info("WebSocket is closed; stop sending audio.")
                    break

                encoded_data = base64.b64encode(audio_data).decode('utf-8')
                eventd = {
                    "event_id": f"event_{int(time.time() * 1000)}",
                    "type": "input_audio_buffer.append",
                    "audio": encoded_data
                }
                ws.send(json.dumps(eventd))
                logger.info(f"Sending audio event: {eventd['event_id']}")
                time.sleep(0.1)  # Simulate real-time capture


    # Initialize logging
    init_logger()
    logger.info(f"Connecting to WebSocket server at {url}...")

    local_audio_path = "your_audio_file.pcm"
    ws = websocket.WebSocketApp(
        url,
        header=headers,
        on_open=on_open,
        on_message=on_message,
        on_error=on_error,
        on_close=on_close
    )

    thread = threading.Thread(target=send_audio, args=(ws, local_audio_path))
    thread.start()
    ws.run_forever()

Java

Before running the sample, make sure the Java-WebSocket dependency is installed. The sample also imports org.json.JSONObject, so the org.json library must be on the classpath as well.

Maven

    <dependency>
        <groupId>org.java-websocket</groupId>
        <artifactId>Java-WebSocket</artifactId>
        <version>1.5.6</version>
    </dependency>

Gradle

    implementation 'org.java-websocket:Java-WebSocket:1.5.6'

    import org.java_websocket.client.WebSocketClient;
    import org.java_websocket.handshake.ServerHandshake;
    import org.json.JSONObject;

    import java.net.URI;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Base64;
    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.logging.*;

    public class QwenASRRealtimeClient {
        private static final Logger logger = Logger.getLogger(QwenASRRealtimeClient.class.getName());

        // API keys differ between the Singapore and Beijing regions.
        // Get an API key: https://help.aliyun.com/zh/model-studio/get-api-key
        // If the environment variable is not set, replace the next line with:
        // private static final String API_KEY = "sk-xxx"
        private static final String API_KEY = System.getenv().getOrDefault("DASHSCOPE_API_KEY", "sk-xxx");
        private static final String MODEL = "qwen3-asr-flash-realtime";

        // Controls whether VAD mode is used
        private static final boolean enableServerVad = true;

        private static final AtomicBoolean isRunning = new AtomicBoolean(true);
        private static WebSocketClient client;

        public static void main(String[] args) throws Exception {
            initLogger();

            // baseUrl for the Beijing region. For models in the Singapore region, replace it with:
            // wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime
            String baseUrl = "wss://dashscope.aliyuncs.com/api-ws/v1/realtime";
            String url = baseUrl + "?model=" + MODEL;
            logger.info("Connecting to server: " + url);

            client = new WebSocketClient(new URI(url)) {
                @Override
                public void onOpen(ServerHandshake handshake) {
                    logger.info("Connected to server.");
                    sendSessionUpdate();
                }

                @Override
                public void onMessage(String message) {
                    try {
                        JSONObject data = new JSONObject(message);
                        String eventType = data.optString("type");
                        logger.info("Received event: " + data.toString(2));

                        // Final result received: stop the sender thread and close the connection
                        if ("conversation.item.input_audio_transcription.completed".equals(eventType)) {
                            logger.info("Final transcript: " + data.optString("transcript"));
                            logger.info("Closing WebSocket connection after completion...");
                            isRunning.set(false); // Stop the audio sender thread
                            if (this.isOpen()) {
                                this.close(1000, "ASR completed");
                            }
                        }
                    } catch (Exception e) {
                        logger.severe("Failed to parse message: " + message);
                    }
                }

                @Override
                public void onClose(int code, String reason, boolean remote) {
                    logger.info("Connection closed: " + code + " - " + reason);
                }

                @Override
                public void onError(Exception ex) {
                    logger.severe("Error: " + ex.getMessage());
                }
            };

            // Add request headers
            client.addHeader("Authorization", "Bearer " + API_KEY);
            client.addHeader("OpenAI-Beta", "realtime=v1");

            client.connectBlocking(); // Block until the connection is established

            // Replace with the path of the audio file to recognize
            String localAudioPath = "your_audio_file.pcm";
            Thread audioThread = new Thread(() -> {
                try {
                    sendAudio(localAudioPath);
                } catch (Exception e) {
                    logger.severe("Audio sending thread error: " + e.getMessage());
                }
            });
            audioThread.start();
        }

        /** Session update event (VAD enabled or disabled). */
        private static void sendSessionUpdate() {
            JSONObject eventNoVad = new JSONObject()
                    .put("event_id", "event_123")
                    .put("type", "session.update")
                    .put("session", new JSONObject()
                            .put("modalities", new String[]{"text"})
                            .put("input_audio_format", "pcm")
                            .put("sample_rate", 16000)
                            .put("input_audio_transcription", new JSONObject()
                                    .put("language", "zh"))
                            .put("turn_detection", JSONObject.NULL) // Manual mode
                    );

            JSONObject eventVad = new JSONObject()
                    .put("event_id", "event_123")
                    .put("type", "session.update")
                    .put("session", new JSONObject()
                            .put("modalities", new String[]{"text"})
                            .put("input_audio_format", "pcm")
                            .put("sample_rate", 16000)
                            .put("input_audio_transcription", new JSONObject()
                                    .put("language", "zh"))
                            .put("turn_detection", new JSONObject()
                                    .put("type", "server_vad")
                                    .put("threshold", 0.2)
                                    .put("silence_duration_ms", 800))
                    );

            if (enableServerVad) {
                logger.info("Sending event (VAD):\n" + eventVad.toString(2));
                client.send(eventVad.toString());
            } else {
                logger.info("Sending event (Manual):\n" + eventNoVad.toString(2));
                client.send(eventNoVad.toString());
            }
        }

        /** Send the audio file as a stream. */
        private static void sendAudio(String localAudioPath) throws Exception {
            Thread.sleep(3000); // Wait for the session to be ready
            byte[] allBytes = Files.readAllBytes(Paths.get(localAudioPath));
            logger.info("File read started");

            int offset = 0;
            while (isRunning.get() && offset < allBytes.length) {
                int chunkSize = Math.min(3200, allBytes.length - offset);
                byte[] chunk = new byte[chunkSize];
                System.arraycopy(allBytes, offset, chunk, 0, chunkSize);
                offset += chunkSize;

                if (client != null && client.isOpen()) {
                    String encoded = Base64.getEncoder().encodeToString(chunk);
                    JSONObject eventd = new JSONObject()
                            .put("event_id", "event_" + System.currentTimeMillis())
                            .put("type", "input_audio_buffer.append")
                            .put("audio", encoded);

                    client.send(eventd.toString());
                    logger.info("Sending audio event: " + eventd.getString("event_id"));
                } else {
                    break; // Do not keep sending after the connection is closed
                }

                Thread.sleep(100); // Simulate real-time sending
            }

            logger.info("File read finished");

            // A commit is required in non-VAD mode
            if (!enableServerVad && client != null && client.isOpen()) {
                JSONObject commitEvent = new JSONObject()
                        .put("event_id", "event_789")
                        .put("type", "input_audio_buffer.commit");
                client.send(commitEvent.toString());
                logger.info("Sent commit event for manual mode.");
            }
        }

        /** Initialize logging. */
        private static void initLogger() {
            logger.setLevel(Level.ALL);
            Logger rootLogger = Logger.getLogger("");
            for (Handler h : rootLogger.getHandlers()) {
                rootLogger.removeHandler(h);
            }

            Handler consoleHandler = new ConsoleHandler();
            consoleHandler.setLevel(Level.ALL);
            consoleHandler.setFormatter(new SimpleFormatter());
            logger.addHandler(consoleHandler);
        }
    }

Node.js

Before running the sample, make sure the dependency is installed with the following command:

    npm install ws

    /**
     * Qwen-ASR Realtime WebSocket client (Node.js version)
     * Features:
     * - Supports VAD mode and Manual mode
     * - Sends session.update to start the session
     * - Continuously sends audio chunks with input_audio_buffer.append
     * - Sends input_audio_buffer.commit in Manual mode
     * - Closes the connection and stops sending after the completed event is received
     */

    const WebSocket = require('ws');
    const fs = require('fs');

    // ===== Configuration =====
    // API keys differ between the Singapore and Beijing regions.
    // Get an API key: https://help.aliyun.com/zh/model-studio/get-api-key
    // If the environment variable is not set, replace the next line with: const API_KEY = "sk-xxx"
    const API_KEY = process.env.DASHSCOPE_API_KEY || 'sk-xxx';
    const MODEL = 'qwen3-asr-flash-realtime';
    const enableServerVad = true; // true for VAD mode, false for Manual mode
    const localAudioPath = 'your_audio_file.pcm'; // Path of a PCM16, 16 kHz audio file

    // baseUrl for the Beijing region. For models in the Singapore region, replace it with:
    // wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime
    const baseUrl = 'wss://dashscope.aliyuncs.com/api-ws/v1/realtime';
    const url = `${baseUrl}?model=${MODEL}`;

    console.log(`Connecting to server: ${url}`);

    // ===== State control =====
    let isRunning = true;

    // ===== Open the connection =====
    const ws = new WebSocket(url, {
        headers: {
            'Authorization': `Bearer ${API_KEY}`,
            'OpenAI-Beta': 'realtime=v1'
        }
    });

    // ===== Event bindings =====
    ws.on('open', () => {
        console.log('[WebSocket] Connected to server.');
        sendSessionUpdate();
        // Start sending audio
        sendAudio(localAudioPath);
    });

    ws.on('message', (message) => {
        try {
            const data = JSON.parse(message);
            console.log('[Received Event]:', JSON.stringify(data, null, 2));

            // Completed event received
            if (data.type === 'conversation.item.input_audio_transcription.completed') {
                console.log(`[Final Transcript] ${data.transcript}`);
                console.log('[Action] Closing WebSocket connection after completion...');

                isRunning = false; // Stop sending audio
                if (ws.readyState === WebSocket.OPEN) {
                    ws.close(1000, 'ASR completed');
                }
            }
        } catch (e) {
            console.error('[Error] Failed to parse message:', message);
        }
    });

    ws.on('close', (code, reason) => {
        console.log(`[WebSocket] Connection closed: ${code} - ${reason}`);
    });

    ws.on('error', (err) => {
        console.error('[WebSocket Error]', err);
    });

    // ===== Session update =====
    function sendSessionUpdate() {
        const eventNoVad = {
            event_id: 'event_123',
            type: 'session.update',
            session: {
                modalities: ['text'],
                input_audio_format: 'pcm',
                sample_rate: 16000,
                input_audio_transcription: {
                    language: 'zh'
                },
                turn_detection: null
            }
        };

        const eventVad = {
            event_id: 'event_123',
            type: 'session.update',
            session: {
                modalities: ['text'],
                input_audio_format: 'pcm',
                sample_rate: 16000,
                input_audio_transcription: {
                    language: 'zh'
                },
                turn_detection: {
                    type: 'server_vad',
                    threshold: 0.2,
                    silence_duration_ms: 800
                }
            }
        };

        if (enableServerVad) {
            console.log('[Send Event] VAD Mode:\n', JSON.stringify(eventVad, null, 2));
            ws.send(JSON.stringify(eventVad));
        } else {
            console.log('[Send Event] Manual Mode:\n', JSON.stringify(eventNoVad, null, 2));
            ws.send(JSON.stringify(eventNoVad));
        }
    }

    // ===== Send the audio file as a stream =====
    function sendAudio(audioPath) {
        setTimeout(() => {
            console.log(`[File Read Start] ${audioPath}`);
            const buffer = fs.readFileSync(audioPath);

            let offset = 0;
            const chunkSize = 3200; // About 0.1 s of PCM16 audio

            function sendChunk() {
                if (!isRunning) return;

                if (offset >= buffer.length) {
                    console.log('[File Read End]');
                    if (!enableServerVad && ws.readyState === WebSocket.OPEN) {
                        const commitEvent = {
                            event_id: 'event_789',
                            type: 'input_audio_buffer.commit'
                        };
                        ws.send(JSON.stringify(commitEvent));
                        console.log('[Send Commit Event]');
                    }
                    return;
                }

                if (ws.readyState !== WebSocket.OPEN) {
                    console.log('[Stop] WebSocket is not open.');
                    return;
                }

                const chunk = buffer.slice(offset, offset + chunkSize);
                offset += chunkSize;

                const encoded = chunk.toString('base64');
                const appendEvent = {
                    event_id: `event_${Date.now()}`,
                    type: 'input_audio_buffer.append',
                    audio: encoded
                };

                ws.send(JSON.stringify(appendEvent));
                console.log(`[Send Audio Event] ${appendEvent.event_id}`);

                setTimeout(sendChunk, 100); // Simulate real-time sending
            }

            sendChunk();
        }, 3000); // Wait for the session configuration to complete
    }
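The three WebSocket samples above exchange the same small set of JSON events. As a compact summary, the sketch below builds those client events and extracts the final transcript from a server message, without opening a connection. The field names and values are copied from the samples; the function names (`session_update_event`, `append_event`, `commit_event`, `final_transcript`) are hypothetical helpers, not part of any SDK.

```python
import base64
import json
from typing import Optional


def session_update_event(language: str = "zh", sample_rate: int = 16000,
                         server_vad: bool = True) -> dict:
    """Build the session.update event used by the samples above."""
    turn_detection = (
        {"type": "server_vad", "threshold": 0.2, "silence_duration_ms": 800}
        if server_vad
        else None  # Manual mode: the client must send a commit event itself
    )
    return {
        "event_id": "event_123",
        "type": "session.update",
        "session": {
            "modalities": ["text"],
            "input_audio_format": "pcm",
            "sample_rate": sample_rate,
            "input_audio_transcription": {"language": language},
            "turn_detection": turn_detection,
        },
    }


def append_event(chunk: bytes, event_id: str) -> dict:
    """Wrap one raw audio chunk as a Base64-encoded append event."""
    return {
        "event_id": event_id,
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(chunk).decode("ascii"),
    }


def commit_event(event_id: str = "event_789") -> dict:
    """Build the commit event that Manual (non-VAD) mode sends after the audio."""
    return {"event_id": event_id, "type": "input_audio_buffer.commit"}


def final_transcript(message: str) -> Optional[str]:
    """Return the transcript if `message` is the completed event, else None."""
    data = json.loads(message)
    if data.get("type") == "conversation.item.input_audio_transcription.completed":
        return data.get("transcript")
    return None
```

Each returned dict can be passed through `json.dumps` and sent with the WebSocket client of choice, as the samples do with `ws.send`.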
API reference
Rate limits
| Model | Calls per second (RPS) | 
| qwen3-asr-flash-realtime | 20 | 
| qwen3-asr-flash-realtime-2025-10-27 | 20 | 
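A client that opens many sessions may want to stay under the 20 RPS limit on its own side instead of relying on server-side rejection. The following is a minimal sliding-window sketch; the class name `SlidingWindowLimiter` is hypothetical, and treating each new session as one "call" is an assumption, since this section does not define what counts toward the limit.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any `period`-second window."""

    def __init__(self, max_calls: int = 20, period: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._calls: Deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits in the window, else False."""
        now = self._clock()
        # Drop timestamps that have left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return True
        return False
```

A caller would check `limiter.try_acquire()` before opening a connection and wait briefly when it returns `False`.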
Model application listing and filing
See Application compliance filing.