Real-Time Speech Recognition - Fun-ASR/Gummy/Paraformer

The real-time speech recognition service converts audio streams into punctuated text on the fly, so text appears as you speak. It transcribes microphone input, meeting recordings, and local audio files alike, and is widely used for live meeting notes, live-stream subtitles, voice chat, and intelligent customer service.

Core features

  • Multilingual real-time speech recognition covering Chinese, English, and many Chinese dialects

  • Hotword customization to improve recognition accuracy for domain-specific terms

  • Timestamp output for structured recognition results

  • Flexible sample rates and multiple audio formats to fit different recording environments

  • Optional VAD (Voice Activity Detection) that automatically filters silent segments and speeds up long-audio processing

  • SDK and WebSocket access with low latency and a stable service

Scope

  • Supported region: only the China (Beijing) region is supported; you must use an API key for the China (Beijing) region

  • Supported models

    • Fun-ASR: fun-asr-realtime, fun-asr-realtime-2025-11-07, fun-asr-realtime-2025-09-15

    • Gummy: gummy-realtime-v1, gummy-chat-v1

    • Paraformer: paraformer-realtime-v2, paraformer-realtime-v1, paraformer-realtime-8k-v2, paraformer-realtime-8k-v1

Model selection

| Scenario | Recommended models | Why |
| --- | --- | --- |
| Mandarin Chinese recognition (meetings/live streams) | fun-asr-realtime, fun-asr-realtime-2025-11-07, paraformer-realtime-v2 | Broad format compatibility, high-sample-rate support, stable latency |
| Multilingual recognition (international conferences) | gummy-realtime-v1, paraformer-realtime-v2 | Wide language coverage |
| Chinese dialect recognition (customer service/government) | fun-asr-realtime-2025-11-07, paraformer-realtime-v2 | Covers many regional dialects |
| Mixed Chinese/English/Japanese recognition (classrooms/lectures) | fun-asr-realtime, fun-asr-realtime-2025-11-07 | Optimized for Chinese, English, and Japanese |
| Fast short-audio interaction (intelligent customer service) | gummy-chat-v1 | Audio within 1 minute, low cost, multilingual |
| Low-bandwidth telephone transcription | paraformer-realtime-8k-v2 | 8 kHz support, emotion recognition on by default |
| Hotword customization (brand names/proprietary terms) | Latest versions of the Gummy, Paraformer, and Fun-ASR models | Hotwords can be toggled on and off and configurations iterated easily |

For more details, see the model feature comparison below.

Quick start

The following sample code shows how to call the API. For more code samples covering common scenarios, see GitHub.

You must have obtained an API key and configured the API key as an environment variable. If you call the service through an SDK, you must also install the DashScope SDK.

Fun-ASR

Recognize speech from a microphone

Real-time speech recognition transcribes speech captured from a microphone and outputs the results as you speak.

Java

import com.alibaba.dashscope.audio.asr.recognition.Recognition;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionParam;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.BackpressureStrategy;
import io.reactivex.Flowable;

import java.nio.ByteBuffer;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.TargetDataLine;

public class Main {

    public static void main(String[] args) throws NoApiKeyException {
        // Create a Flowable<ByteBuffer> as the audio source
        Flowable<ByteBuffer> audioSource = Flowable.create(emitter -> {
            new Thread(() -> {
                try {
                    // Create the audio format
                    AudioFormat audioFormat = new AudioFormat(16000, 16, 1, true, false);
                    // Match the default recording device for this format
                    TargetDataLine targetDataLine =
                            AudioSystem.getTargetDataLine(audioFormat);
                    targetDataLine.open(audioFormat);
                    // Start recording
                    targetDataLine.start();
                    ByteBuffer buffer = ByteBuffer.allocate(1024);
                    long start = System.currentTimeMillis();
                    // Record for 300 s and transcribe in real time
                    while (System.currentTimeMillis() - start < 300000) {
                        int read = targetDataLine.read(buffer.array(), 0, buffer.capacity());
                        if (read > 0) {
                            buffer.limit(read);
                            // Send the recorded audio to the streaming recognition service
                            emitter.onNext(buffer);
                            buffer = ByteBuffer.allocate(1024);
                            // Recording is rate-limited; sleep briefly to keep CPU usage low
                            Thread.sleep(20);
                        }
                    }
                    // Signal the end of transcription
                    emitter.onComplete();
                } catch (Exception e) {
                    emitter.onError(e);
                }
            }).start();
        },
        BackpressureStrategy.BUFFER);

        // Create the Recognizer
        Recognition recognizer = new Recognition();
        // Create the RecognitionParam; the Flowable<ByteBuffer> above is passed as the audio source
        RecognitionParam param = RecognitionParam.builder()
            .model("fun-asr-realtime")
            .format("pcm")
            .sampleRate(16000)
            // If the API key is not set as an environment variable, uncomment the next line and replace apikey with your own API key
            // .apiKey("apikey")
            .build();

        // Call the streaming interface
        recognizer.streamCall(param, audioSource)
            // Subscribe to the results emitted by the Flowable
            .blockingForEach(
                result -> {
                    // Print intermediate and final results
                    if (result.isSentenceEnd()) {
                        System.out.println("Fix:" + result.getSentence().getText());
                    } else {
                        System.out.println("Result:" + result.getSentence().getText());
                    }
                });
        System.exit(0);
    }
}

Python

Before running the Python example, install the third-party audio capture and playback package with pip install pyaudio.

import pyaudio
from dashscope.audio.asr import (Recognition, RecognitionCallback,
                                 RecognitionResult)

# If the API key is not set as an environment variable, uncomment the following lines and replace apiKey with your own API key
# import dashscope
# dashscope.api_key = "apiKey"

mic = None
stream = None


class Callback(RecognitionCallback):
    def on_open(self) -> None:
        global mic
        global stream
        print('RecognitionCallback open.')
        mic = pyaudio.PyAudio()
        stream = mic.open(format=pyaudio.paInt16,
                          channels=1,
                          rate=16000,
                          input=True)

    def on_close(self) -> None:
        global mic
        global stream
        print('RecognitionCallback close.')
        stream.stop_stream()
        stream.close()
        mic.terminate()
        stream = None
        mic = None

    def on_event(self, result: RecognitionResult) -> None:
        print('RecognitionCallback sentence: ', result.get_sentence())


callback = Callback()
recognition = Recognition(model='fun-asr-realtime',
                          format='pcm',
                          sample_rate=16000,
                          callback=callback)
recognition.start()

while True:
    if stream:
        data = stream.read(3200, exception_on_overflow=False)
        recognition.send_audio_frame(data)
    else:
        break

recognition.stop()

Recognize a local audio file

Real-time speech recognition can also transcribe local audio files. This interface is a good fit for short, near-real-time scenarios such as chat, voice commands, voice input methods, and voice search.

Java

import com.alibaba.dashscope.audio.asr.recognition.Recognition;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionParam;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class Main {
    public static void main(String[] args) {
        // You can skip the URL download step and call the API directly with a local file
        String exampleWavUrl =
                "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav";
        try {
            InputStream in = new URL(exampleWavUrl).openStream();
            Files.copy(in, Paths.get("asr_example.wav"), StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            System.out.println("error: " + e);
            System.exit(1);
        }

        // Create a Recognition instance
        Recognition recognizer = new Recognition();
        // Create the RecognitionParam
        RecognitionParam param =
                RecognitionParam.builder()
                        // If the API key is not set as an environment variable, uncomment the next line and replace apikey with your own API key
                        // .apiKey("apikey")
                        .model("fun-asr-realtime")
                        .format("wav")
                        .sampleRate(16000)
                        .parameter("language_hints", new String[]{"zh", "en"})
                        .build();

        try {
            System.out.println("识别结果:" + recognizer.call(param, new File("asr_example.wav")));
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.exit(0);
    }
}

Python

import requests
from http import HTTPStatus
from dashscope.audio.asr import Recognition

# If the API key is not set as an environment variable, uncomment the following lines and replace apiKey with your own API key
# import dashscope
# dashscope.api_key = "apiKey"

# You can skip the URL download step and use a local file directly
r = requests.get(
    'https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav'
)
with open('asr_example.wav', 'wb') as f:
    f.write(r.content)

recognition = Recognition(model='fun-asr-realtime',
                          format='wav',
                          sample_rate=16000,
                          language_hints=['zh', 'en'],
                          callback=None)
result = recognition.call('asr_example.wav')
if result.status_code == HTTPStatus.OK:
    print('Recognition result:')
    print(result.get_sentence())
else:
    print('Error: ', result.message)

Gummy

Real-time speech recognition: suited to long, uninterrupted recognition scenarios such as conference speeches and live video streams.

One-sentence recognition: more sensitive to pauses; accurately recognizes short utterances of up to one minute, suited to short voice interactions such as chat, voice commands, voice input methods, and voice search.

Real-time speech recognition

Real-time speech recognition handles long speech streams (whether captured from an external device such as a microphone or read from a local file) and streams back the results.

Recognize speech from a microphone

Java

import com.alibaba.dashscope.audio.asr.translation.TranslationRecognizerParam;
import com.alibaba.dashscope.audio.asr.translation.TranslationRecognizerRealtime;
import com.alibaba.dashscope.audio.asr.translation.results.TranslationRecognizerResult;
import com.alibaba.dashscope.common.ResultCallback;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.TargetDataLine;

import java.nio.ByteBuffer;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Main {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService executorService = Executors.newSingleThreadExecutor();
        executorService.submit(new RealtimeRecognitionTask());
        executorService.shutdown();
        executorService.awaitTermination(1, TimeUnit.MINUTES);
        System.exit(0);
    }
}

class RealtimeRecognitionTask implements Runnable {
    @Override
    public void run() {
        String targetLanguage = "en";
        // Initialize request parameters
        TranslationRecognizerParam param =
                TranslationRecognizerParam.builder()
                        // If the API key is not set as an environment variable, replace your-api-key with your own API key
                        // .apiKey("your-api-key")
                        .model("gummy-realtime-v1") // Model name
                        .format("pcm") // Audio format; supported: pcm, wav, mp3, opus, speex, aac, amr
                        .sampleRate(16000) // Sample rate in Hz; 16000 Hz and above are supported
                        .transcriptionEnabled(true) // Enable real-time recognition
                        .sourceLanguage("auto") // Source language code (the language to recognize/translate)
                        .translationEnabled(true) // Enable real-time translation
                        .translationLanguages(new String[] {targetLanguage}) // Translation target languages
                        .build();

        // Initialize the result callback
        ResultCallback<TranslationRecognizerResult> callback =
                new ResultCallback<TranslationRecognizerResult>() {
                    @Override
                    public void onEvent(TranslationRecognizerResult result) {
                        System.out.println("RequestId: " + result.getRequestId());
                        // Print intermediate and final results
                        if (result.getTranscriptionResult() != null) {
                            System.out.println("Transcription Result:"+result);
                            if (result.isSentenceEnd()) {
                                System.out.println("\tFix:" + result.getTranscriptionResult().getText());
                            } else {
                                System.out.println("\tTemp Result:" + result.getTranscriptionResult().getText());
                            }
                        }
                        if (result.getTranslationResult() != null) {
                            System.out.println("English Translation Result:");
                            if (result.isSentenceEnd()) {
                                System.out.println("\tFix:" + result.getTranslationResult().getTranslation(targetLanguage).getText());
                            } else {
                                System.out.println("\tTemp Result:" + result.getTranslationResult().getTranslation(targetLanguage).getText());
                            }
                        }
                    }

                    @Override
                    public void onComplete() {
                        System.out.println("Translation complete");
                    }

                    @Override
                    public void onError(Exception e) {
                        e.printStackTrace();
                        System.out.println("TranslationCallback error: " + e.getMessage());
                    }
                };

        // Initialize the streaming recognition service
        TranslationRecognizerRealtime translator = new TranslationRecognizerRealtime();

        try {
            // Start streaming recognition/translation with the request parameters and callback
            translator.call(param, callback);
            // Create the audio format
            AudioFormat audioFormat = new AudioFormat(16000, 16, 1, true, false);
            // Match the default recording device for this format
            TargetDataLine targetDataLine =
                    AudioSystem.getTargetDataLine(audioFormat);
            targetDataLine.open(audioFormat);
            // Start recording
            targetDataLine.start();
            System.out.println("Speak into the microphone to try real-time speech recognition and translation");
            ByteBuffer buffer = ByteBuffer.allocate(1024);
            long start = System.currentTimeMillis();
            // Record for 50 s and recognize in real time
            while (System.currentTimeMillis() - start < 50000) {
                int read = targetDataLine.read(buffer.array(), 0, buffer.capacity());
                if (read > 0) {
                    buffer.limit(read);
                    // Send the recorded audio to the streaming recognition service
                    translator.sendAudioFrame(buffer);
                    buffer = ByteBuffer.allocate(1024);
                    // Recording is rate-limited; sleep briefly to keep CPU usage low
                    Thread.sleep(20);
                }
            }
            // Signal the end
            translator.stop();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Close the websocket connection when the task finishes
            translator.getDuplexApi().close(1000, "bye");
        }

        System.out.println(
                "[Metric] requestId: "
                        + translator.getLastRequestId()
                        + ", first package delay ms: "
                        + translator.getFirstPackageDelay()
                        + ", last package delay ms: "
                        + translator.getLastPackageDelay());
    }
}

Python

Before running the Python example, install the third-party audio capture and playback package with pip install pyaudio.

import pyaudio
import dashscope
from dashscope.audio.asr import *


# If the API key is not set as an environment variable, uncomment the following line and replace your-api-key with your own API key
# dashscope.api_key = "your-api-key"

mic = None
stream = None

class Callback(TranslationRecognizerCallback):
    def on_open(self) -> None:
        global mic
        global stream
        print("TranslationRecognizerCallback open.")
        mic = pyaudio.PyAudio()
        stream = mic.open(
            format=pyaudio.paInt16, channels=1, rate=16000, input=True
        )

    def on_close(self) -> None:
        global mic
        global stream
        print("TranslationRecognizerCallback close.")
        stream.stop_stream()
        stream.close()
        mic.terminate()
        stream = None
        mic = None

    def on_event(
        self,
        request_id,
        transcription_result: TranscriptionResult,
        translation_result: TranslationResult,
        usage,
    ) -> None:
        print("request id: ", request_id)
        print("usage: ", usage)
        if translation_result is not None:
            print(
                "translation_languages: ",
                translation_result.get_language_list(),
            )
            english_translation = translation_result.get_translation("en")
            print("sentence id: ", english_translation.sentence_id)
            print("translate to english: ", english_translation.text)
        if transcription_result is not None:
            print("sentence id: ", transcription_result.sentence_id)
            print("transcription: ", transcription_result.text)


callback = Callback()


translator = TranslationRecognizerRealtime(
    model="gummy-realtime-v1",
    format="pcm",
    sample_rate=16000,
    transcription_enabled=True,
    translation_enabled=True,
    translation_target_languages=["en"],
    callback=callback,
)
translator.start()
print("请您通过麦克风讲话体验实时语音识别和翻译功能")
while True:
    if stream:
        data = stream.read(3200, exception_on_overflow=False)
        translator.send_audio_frame(data)
    else:
        break

translator.stop()

Recognize a local file

Java

The audio file used in this example: hello_world.wav

import com.alibaba.dashscope.audio.asr.translation.TranslationRecognizerParam;
import com.alibaba.dashscope.audio.asr.translation.TranslationRecognizerRealtime;
import com.alibaba.dashscope.audio.asr.translation.results.TranslationRecognizerResult;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.exception.NoApiKeyException;

import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class RealtimeTranslateTask implements Runnable {
    private Path filepath;

    public RealtimeTranslateTask(Path filepath) {
        this.filepath = filepath;
    }

    @Override
    public void run() {
        String targetLanguage = "en";
        // Create translation params
        // you can customize the translation parameters, like model, format,
        // sample_rate for more information, please refer to
        // https://help.aliyun.com/document_detail/2712536.html
        TranslationRecognizerParam param =
                TranslationRecognizerParam.builder()
                        // If the API key is not set as an environment variable, replace your-api-key with your own API key
                        // .apiKey("your-api-key")
                        .model("gummy-realtime-v1")
                        .format("wav") // 'pcm'、'wav'、'mp3'、'opus'、'speex'、'aac'、'amr', you
                        // can check the supported formats in the document
                        .sampleRate(16000)
                        .transcriptionEnabled(true)
                        .sourceLanguage("auto")
                        .translationEnabled(true)
                        .translationLanguages(new String[] {targetLanguage})
                        .build();
        TranslationRecognizerRealtime translator = new TranslationRecognizerRealtime();
        CountDownLatch latch = new CountDownLatch(1);

        String threadName = Thread.currentThread().getName();

        ResultCallback<TranslationRecognizerResult> callback =
                new ResultCallback<TranslationRecognizerResult>() {
                    @Override
                    public void onEvent(TranslationRecognizerResult result) {
                        System.out.println("RequestId: " + result.getRequestId());
                        // Print intermediate and final results
                        if (result.getTranscriptionResult() != null) {
                            System.out.println("Transcription Result:"+result);
                            if (result.isSentenceEnd()) {
                                System.out.println("\tFix:" + result.getTranscriptionResult().getText());
                            } else {
                                System.out.println("\tTemp Result:" + result.getTranscriptionResult().getText());
                            }
                        }
                        if (result.getTranslationResult() != null) {
                            System.out.println("English Translation Result:");
                            if (result.isSentenceEnd()) {
                                System.out.println("\tFix:" + result.getTranslationResult().getTranslation(targetLanguage).getText());
                            } else {
                                System.out.println("\tTemp Result:" + result.getTranslationResult().getTranslation(targetLanguage).getText());
                            }
                        }
                    }

                    @Override
                    public void onComplete() {
                        System.out.println("[" + threadName + "] Translation complete");
                        latch.countDown();
                    }

                    @Override
                    public void onError(Exception e) {
                        e.printStackTrace();
                        System.out.println("[" + threadName + "] TranslationCallback error: " + e.getMessage());
                    }
                };
        // set param & callback
        try {
            translator.call(param, callback);
            // Please replace the path with your audio file path
            System.out.println("[" + threadName + "] Input file_path is: " + this.filepath);
            // Read file and send audio by chunks
            FileInputStream fis = new FileInputStream(this.filepath.toFile());
            // 3200-byte chunks: roughly 100 ms of 16 kHz, 16-bit, mono audio
            byte[] buffer = new byte[3200];
            int bytesRead;
            // Loop to read chunks of the file
            while ((bytesRead = fis.read(buffer)) != -1) {
                ByteBuffer byteBuffer;
                // Handle the last chunk which might be smaller than the buffer size
                System.out.println("[" + threadName + "] bytesRead: " + bytesRead);
                if (bytesRead < buffer.length) {
                    byteBuffer = ByteBuffer.wrap(buffer, 0, bytesRead);
                } else {
                    byteBuffer = ByteBuffer.wrap(buffer);
                }
                // Send the ByteBuffer to the translation instance
                translator.sendAudioFrame(byteBuffer);
                buffer = new byte[3200];
                Thread.sleep(100);
            }
            System.out.println(LocalDateTime.now());
            translator.stop();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Close the websocket connection when the task finishes
            translator.getDuplexApi().close(1000, "bye");
        }

        // wait for the translation to complete
        try {
            latch.await();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

}

public class Main {
    public static void main(String[] args)
            throws NoApiKeyException, InterruptedException {

        String currentDir = System.getProperty("user.dir");
        // Please replace the path with your audio source
        Path[] filePaths = {
                Paths.get(currentDir, "hello_world.wav"),
//                Paths.get(currentDir, "hello_world_male_16k_16bit_mono.wav"),
        };
        // Use ThreadPool to run recognition tasks
        ExecutorService executorService = Executors.newFixedThreadPool(10);
        for (Path filepath:filePaths) {
            executorService.submit(new RealtimeTranslateTask(filepath));
        }
        executorService.shutdown();
        // wait for all tasks to complete
        executorService.awaitTermination(1, TimeUnit.MINUTES);
        System.exit(0);
    }
}

Python

import os
import requests
from http import HTTPStatus

import dashscope
from dashscope.audio.asr import *

# If the API key is not set as an environment variable, uncomment the following line and replace your-api-key with your own API key
# dashscope.api_key = "your-api-key"

r = requests.get(
    "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav"
)
with open("asr_example.wav", "wb") as f:
    f.write(r.content)


class Callback(TranslationRecognizerCallback):
    def on_open(self) -> None:
        print("TranslationRecognizerCallback open.")

    def on_close(self) -> None:
        print("TranslationRecognizerCallback close.")

    def on_event(
        self,
        request_id,
        transcription_result: TranscriptionResult,
        translation_result: TranslationResult,
        usage,
    ) -> None:
        print("request id: ", request_id)
        print("usage: ", usage)
        if translation_result is not None:
            print(
                "translation_languages: ",
                translation_result.get_language_list(),
            )
            english_translation = translation_result.get_translation("en")
            print("sentence id: ", english_translation.sentence_id)
            print("translate to english: ", english_translation.text)
        if transcription_result is not None:
            print("sentence id: ", transcription_result.sentence_id)
            print("transcription: ", transcription_result.text)
    
    def on_error(self, message) -> None:
        print('error: {}'.format(message))
    
    def on_complete(self) -> None:
        print('TranslationRecognizerCallback complete')


callback = Callback()


translator = TranslationRecognizerRealtime(
    model="gummy-realtime-v1",
    format="wav",
    sample_rate=16000,
    callback=callback,
)

translator.start()

with open("asr_example.wav", 'rb') as f:
    if os.path.getsize("asr_example.wav") == 0:
        raise Exception('The supplied file was empty (zero bytes long)')
    while True:
        audio_data = f.read(12800)
        if not audio_data:
            break
        translator.send_audio_frame(audio_data)

translator.stop()

One-sentence recognition

One-sentence recognition recognizes speech streams of up to one minute (whether captured from an external device such as a microphone or read from a local file) and streams back the results.

Recognize speech from a microphone

Java

import com.alibaba.dashscope.audio.asr.translation.TranslationRecognizerChat;
import com.alibaba.dashscope.audio.asr.translation.TranslationRecognizerParam;
import com.alibaba.dashscope.audio.asr.translation.results.TranscriptionResult;
import com.alibaba.dashscope.audio.asr.translation.results.TranslationRecognizerResult;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.exception.NoApiKeyException;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;
import java.nio.ByteBuffer;

public class Main {

    public static void main(String[] args) throws NoApiKeyException, InterruptedException, LineUnavailableException {

        // Create the recognizer
        TranslationRecognizerChat translator = new TranslationRecognizerChat();
        // Initialize request parameters
        TranslationRecognizerParam param =
                TranslationRecognizerParam.builder()
                        // If the API key is not set as an environment variable, replace your-api-key with your own API key
                        // .apiKey("your-api-key")
                        .model("gummy-chat-v1") // Model name
                        .format("pcm") // Audio format; supported: pcm, pcm-encoded wav, mp3, opus in ogg, speex in ogg, aac, amr
                        .sampleRate(16000) // Sample rate in Hz; only 16000 Hz is supported
                        .transcriptionEnabled(true) // Enable real-time recognition
                        .translationEnabled(true) // Enable real-time translation
                        .translationLanguages(new String[] {"en"}) // Translation target languages
                        .build();

        try {

            translator.call(param, new ResultCallback<TranslationRecognizerResult>() {
                @Override
                public void onEvent(TranslationRecognizerResult result) {
                    if (result.getTranscriptionResult() == null) {
                        return;
                    }
                    try {
                        System.out.println("RequestId: " + result.getRequestId());
                        // Print intermediate and final results
                        if (result.getTranscriptionResult() != null) {
                            System.out.println("Transcription Result:");
                            if (result.isSentenceEnd()) {
                                System.out.println("\tFix:" + result.getTranscriptionResult().getText());
                            } else {
                                TranscriptionResult transcriptionResult = result.getTranscriptionResult();
                                System.out.println("\tTemp Result:" + transcriptionResult.getText());
                                if (result.getTranscriptionResult().isVadPreEnd()) {
                                    System.out.printf("VadPreEnd: start:%d, end:%d, time:%d\n", transcriptionResult.getPreEndStartTime(), transcriptionResult.getPreEndEndTime(), transcriptionResult.getPreEndTimemillis());
                                }
                            }
                        }
                        if (result.getTranslationResult() != null) {
                            System.out.println("English Translation Result:");
                            if (result.isSentenceEnd()) {
                                System.out.println("\tFix:" + result.getTranslationResult().getTranslation("en").getText());
                            } else {
                                System.out.println("\tTemp Result:" + result.getTranslationResult().getTranslation("en").getText());
                            }
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }

                }

                @Override
                public void onComplete() {
                    System.out.println("Translation complete");
                }

                @Override
                public void onError(Exception e) {
                    e.printStackTrace();
                }
            });

            // Create the audio format
            AudioFormat audioFormat = new AudioFormat(16000, 16, 1, true, false);
            // Match the default recording device for this format
            TargetDataLine targetDataLine =
                    AudioSystem.getTargetDataLine(audioFormat);
            targetDataLine.open(audioFormat);
            // Start recording
            targetDataLine.start();
            System.out.println("Speak into the microphone to try one-sentence speech recognition and translation");
            ByteBuffer buffer = ByteBuffer.allocate(1024);
            long start = System.currentTimeMillis();
            // Record for up to 50 s and recognize in real time
            while (System.currentTimeMillis() - start < 50000) {
                int read = targetDataLine.read(buffer.array(), 0, buffer.capacity());
                if (read > 0) {
                    buffer.limit(read);
                    // Send the recorded audio to the streaming recognition service
                    if (!translator.sendAudioFrame(buffer)) {
                        System.out.println("sentence end, stop sending");
                        break;
                    }
                    buffer = ByteBuffer.allocate(1024);
                    // Recording is rate-limited; sleep briefly to keep CPU usage low
                    Thread.sleep(20);
                }
            }
            translator.stop();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Close the websocket connection when the task finishes
            translator.getDuplexApi().close(1000, "bye");
        }
        System.exit(0);
    }
}

Python

Before running the Python example, install the third-party audio capture and playback package with pip install pyaudio.

import pyaudio
import dashscope
from dashscope.audio.asr import *


# If the API key is not set as an environment variable, uncomment the following line and replace your-api-key with your own API key
# dashscope.api_key = "your-api-key"

mic = None
stream = None

class Callback(TranslationRecognizerCallback):
    def on_open(self) -> None:
        global mic
        global stream
        print("TranslationRecognizerCallback open.")
        mic = pyaudio.PyAudio()
        stream = mic.open(
            format=pyaudio.paInt16, channels=1, rate=16000, input=True
        )

    def on_close(self) -> None:
        global mic
        global stream
        print("TranslationRecognizerCallback close.")
        stream.stop_stream()
        stream.close()
        mic.terminate()
        stream = None
        mic = None

    def on_event(
        self,
        request_id,
        transcription_result: TranscriptionResult,
        translation_result: TranslationResult,
        usage,
    ) -> None:
        print("request id: ", request_id)
        print("usage: ", usage)
        if translation_result is not None:
            print(
                "translation_languages: ",
                translation_result.get_language_list(),
            )
            english_translation = translation_result.get_translation("en")
            print("sentence id: ", english_translation.sentence_id)
            print("translate to english: ", english_translation.text)
        if transcription_result is not None:
            print("sentence id: ", transcription_result.sentence_id)
            print("transcription: ", transcription_result.text)
            # The VAD pre-end flag and timings are reported on the transcription result
            if transcription_result.vad_pre_end:
                print("vad pre end {}, {}, {}".format(
                    transcription_result.pre_end_start_time,
                    transcription_result.pre_end_end_time,
                    transcription_result.pre_end_timemillis))


callback = Callback()


translator = TranslationRecognizerChat(
    model="gummy-chat-v1",
    format="pcm",
    sample_rate=16000,
    transcription_enabled=True,
    translation_enabled=True,
    translation_target_languages=["en"],
    callback=callback,
)
translator.start()
print("请您通过麦克风讲话体验一句话语音识别和翻译功能")
while True:
    if stream:
        data = stream.read(3200, exception_on_overflow=False)
        if not translator.send_audio_frame(data):
            print("sentence end, stop sending")
            break
    else:
        break

translator.stop()

Recognize a local file

Java

The audio file used in this example: hello_world.wav

import com.alibaba.dashscope.audio.asr.translation.TranslationRecognizerChat;
import com.alibaba.dashscope.audio.asr.translation.TranslationRecognizerParam;
import com.alibaba.dashscope.audio.asr.translation.results.TranslationRecognizerResult;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.exception.NoApiKeyException;

import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class RealtimeTranslateChatTask implements Runnable {
    private Path filepath;
    private TranslationRecognizerChat translator = null;

    public RealtimeTranslateChatTask(Path filepath) {
        this.filepath = filepath;
    }

    @Override
    public void run() {
        for (int i=0; i<1; i++) {
            // Initialize request parameters
            TranslationRecognizerParam param =
                    TranslationRecognizerParam.builder()
                            // If the API key is not set as an environment variable, replace your-api-key with your own API key
                            // .apiKey("your-api-key")
                            .model("gummy-chat-v1") // Model name
                            .format("wav") // Audio format; supported: pcm, pcm-encoded wav, mp3, opus in ogg, speex in ogg, aac, amr
                            .sampleRate(16000) // Sample rate in Hz; only 16000 Hz is supported
                            .transcriptionEnabled(true) // Enable real-time recognition
                            .translationEnabled(true) // Enable real-time translation
                            .translationLanguages(new String[] {"en"}) // Translation target languages
                            .build();
            if (translator == null) {
                // Initialize the streaming recognition service
                translator = new TranslationRecognizerChat();
            }

            String threadName = Thread.currentThread().getName();

            // Initialize the result callback
            ResultCallback<TranslationRecognizerResult> callback =
                    new ResultCallback<TranslationRecognizerResult>() {
                        @Override
                        public void onEvent(TranslationRecognizerResult result) {
                            System.out.println("RequestId: " + result.getRequestId());
                            // Print intermediate and final results
                            if (result.getTranscriptionResult() != null) {
                                System.out.println("Transcription Result:"+result);
                                if (result.isSentenceEnd()) {
                                    System.out.println("\tFix:" + result.getTranscriptionResult().getText());
                                } else {
                                    System.out.println("\tTemp Result:" + result.getTranscriptionResult().getText());
                                }
                            }
                            if (result.getTranslationResult() != null) {
                                System.out.println("English Translation Result:");
                                if (result.isSentenceEnd()) {
                                    System.out.println("\tFix:" + result.getTranslationResult().getTranslation("en").getText());
                                } else {
                                    System.out.println("\tTemp Result:" + result.getTranslationResult().getTranslation("en").getText());
                                }
                            }
                        }

                        @Override
                        public void onComplete() {
                            System.out.println("[" + threadName + "] Translation complete");
                        }

                        @Override
                        public void onError(Exception e) {
                            e.printStackTrace();
                            System.out.println("[" + threadName + "] TranslationCallback error: " + e.getMessage());
                        }
                    };
            try {
                // Start streaming recognition/translation with the request parameters and callback
                translator.call(param, callback);
                // Replace with your own file path
                System.out.println("[" + threadName + "] Input file_path is: " + this.filepath);
                // Read file and send audio by chunks
                try (FileInputStream fis = new FileInputStream(this.filepath.toFile())) {
                    // 3200-byte chunks: roughly 100 ms of 16 kHz, 16-bit, mono audio
                    byte[] buffer = new byte[3200];
                    int bytesRead;
                    // Loop to read chunks of the file
                    while ((bytesRead = fis.read(buffer)) != -1) {
                        ByteBuffer byteBuffer;
                        // Handle the last chunk which might be smaller than the buffer size
                        System.out.println("[" + threadName + "] bytesRead: " + bytesRead);
                        if (bytesRead < buffer.length) {
                            byteBuffer = ByteBuffer.wrap(buffer, 0, bytesRead);
                        } else {
                            byteBuffer = ByteBuffer.wrap(buffer);
                        }
                        // Send the ByteBuffer to the translation instance
                        if (!translator.sendAudioFrame(byteBuffer)) {
                            System.out.println("sentence end, stop sending");
                            break;
                        }
                        buffer = new byte[3200];
                        Thread.sleep(100);
                    }
                    System.out.println(LocalDateTime.now());
                } catch (Exception e) {
                    e.printStackTrace();
                }

                // Signal the end
                translator.stop();
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                // Close the websocket connection when the task finishes
                if (translator != null) {
                    translator.getDuplexApi().close(1000, "bye");
                }
            }
        }
    }

}


public class Main {
    public static void main(String[] args)
            throws NoApiKeyException, InterruptedException {
        String currentDir = System.getProperty("user.dir");
        // Please replace the path with your audio source
        Path[] filePaths = {
                Paths.get(currentDir, "hello_world.wav"),
//                Paths.get(currentDir, "hello_world_male_16k_16bit_mono.wav"),
        };
        // Use ThreadPool to run recognition tasks
        ExecutorService executorService = Executors.newFixedThreadPool(10);
        for (Path filepath:filePaths) {
            executorService.submit(new RealtimeTranslateChatTask(filepath));
        }
        executorService.shutdown();
        // wait for all tasks to complete
        executorService.awaitTermination(1, TimeUnit.MINUTES);
//        System.exit(0);
    }
}

Python

import os
import requests
from http import HTTPStatus

import dashscope
from dashscope.audio.asr import *

# If the API key is not set as an environment variable, uncomment the following line and replace your-api-key with your own API key
# dashscope.api_key = "your-api-key"

r = requests.get(
    "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav"
)
with open("asr_example.wav", "wb") as f:
    f.write(r.content)


class Callback(TranslationRecognizerCallback):
    def on_open(self) -> None:
        print("TranslationRecognizerCallback open.")

    def on_close(self) -> None:
        print("TranslationRecognizerCallback close.")

    def on_event(
            self,
            request_id,
            transcription_result: TranscriptionResult,
            translation_result: TranslationResult,
            usage,
    ) -> None:
        print("request id: ", request_id)
        print("usage: ", usage)
        if translation_result is not None:
            print(
                "translation_languages: ",
                translation_result.get_language_list(),
            )
            english_translation = translation_result.get_translation("en")
            print("sentence id: ", english_translation.sentence_id)
            print("translate to english: ", english_translation.text)
        if transcription_result is not None:
            print("sentence id: ", transcription_result.sentence_id)
            print("transcription: ", transcription_result.text)

    def on_error(self, message) -> None:
        print('error: {}'.format(message))

    def on_complete(self) -> None:
        print('TranslationRecognizerCallback complete')


callback = Callback()

translator = TranslationRecognizerChat(
    model="gummy-chat-v1",
    format="wav",
    sample_rate=16000,
    callback=callback,
)

translator.start()

with open("asr_example.wav", 'rb') as f:
    if os.path.getsize("asr_example.wav") == 0:
        raise Exception('The supplied file was empty (zero bytes long)')
    while True:
        audio_data = f.read(12800)
        if not audio_data:
            break
        if translator.send_audio_frame(audio_data):
            print("send audio frame success")
        else:
            print("sentence end, stop sending")
            break

translator.stop()

Paraformer

Recognize speech from a microphone

Real-time speech recognition transcribes speech captured from a microphone and outputs the results as you speak.

Java

import com.alibaba.dashscope.audio.asr.recognition.Recognition;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionParam;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.BackpressureStrategy;
import io.reactivex.Flowable;

import java.nio.ByteBuffer;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.TargetDataLine;

public class Main {

    public static void main(String[] args) throws NoApiKeyException {
        // Create a Flowable<ByteBuffer> as the audio source
        Flowable<ByteBuffer> audioSource = Flowable.create(emitter -> {
            new Thread(() -> {
                try {
                    // Create the audio format
                    AudioFormat audioFormat = new AudioFormat(16000, 16, 1, true, false);
                    // Match the default recording device for this format
                    TargetDataLine targetDataLine =
                            AudioSystem.getTargetDataLine(audioFormat);
                    targetDataLine.open(audioFormat);
                    // Start recording
                    targetDataLine.start();
                    ByteBuffer buffer = ByteBuffer.allocate(1024);
                    long start = System.currentTimeMillis();
                    // Record for 300 s and transcribe in real time
                    while (System.currentTimeMillis() - start < 300000) {
                        int read = targetDataLine.read(buffer.array(), 0, buffer.capacity());
                        if (read > 0) {
                            buffer.limit(read);
                            // Send the recorded audio to the streaming recognition service
                            emitter.onNext(buffer);
                            buffer = ByteBuffer.allocate(1024);
                            // Recording is rate-limited; sleep briefly to keep CPU usage low
                            Thread.sleep(20);
                        }
                    }
                    // Signal the end of transcription
                    emitter.onComplete();
                } catch (Exception e) {
                    emitter.onError(e);
                }
            }).start();
        },
        BackpressureStrategy.BUFFER);

        // Create the Recognizer
        Recognition recognizer = new Recognition();
        // Create the RecognitionParam; the Flowable<ByteBuffer> above is passed as the audio source
        RecognitionParam param = RecognitionParam.builder()
            .model("paraformer-realtime-v2")
            .format("pcm")
            .sampleRate(16000)
            // If the API key is not set as an environment variable, uncomment the next line and replace apikey with your own API key
            // .apiKey("apikey")
            .build();

        // Call the streaming interface
        recognizer.streamCall(param, audioSource)
            // Subscribe to the results emitted by the Flowable
            .blockingForEach(
                result -> {
                    // Print intermediate and final results
                    if (result.isSentenceEnd()) {
                        System.out.println("Fix:" + result.getSentence().getText());
                    } else {
                        System.out.println("Result:" + result.getSentence().getText());
                    }
                });
        System.exit(0);
    }
}

Python

Before running the Python example, install the third-party audio capture and playback package with pip install pyaudio.

import pyaudio
from dashscope.audio.asr import (Recognition, RecognitionCallback,
                                 RecognitionResult)

# If the API key is not set as an environment variable, uncomment the following lines and replace apiKey with your own API key
# import dashscope
# dashscope.api_key = "apiKey"

mic = None
stream = None


class Callback(RecognitionCallback):
    def on_open(self) -> None:
        global mic
        global stream
        print('RecognitionCallback open.')
        mic = pyaudio.PyAudio()
        stream = mic.open(format=pyaudio.paInt16,
                          channels=1,
                          rate=16000,
                          input=True)

    def on_close(self) -> None:
        global mic
        global stream
        print('RecognitionCallback close.')
        stream.stop_stream()
        stream.close()
        mic.terminate()
        stream = None
        mic = None

    def on_event(self, result: RecognitionResult) -> None:
        print('RecognitionCallback sentence: ', result.get_sentence())


callback = Callback()
recognition = Recognition(model='paraformer-realtime-v2',
                          format='pcm',
                          sample_rate=16000,
                          callback=callback)
recognition.start()

while True:
    if stream:
        data = stream.read(3200, exception_on_overflow=False)
        recognition.send_audio_frame(data)
    else:
        break

recognition.stop()

Recognize a local audio file

Real-time speech recognition can also transcribe local audio files. This interface is a good fit for short, near-real-time scenarios such as chat, voice commands, voice input methods, and voice search.

Java

import com.alibaba.dashscope.audio.asr.recognition.Recognition;
import com.alibaba.dashscope.audio.asr.recognition.RecognitionParam;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class Main {
    public static void main(String[] args) {
        // You can skip the URL download step and call the API directly with a local file
        String exampleWavUrl =
                "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav";
        try {
            InputStream in = new URL(exampleWavUrl).openStream();
            Files.copy(in, Paths.get("asr_example.wav"), StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            System.out.println("error: " + e);
            System.exit(1);
        }

        // Create a Recognition instance
        Recognition recognizer = new Recognition();
        // Create the RecognitionParam
        RecognitionParam param =
                RecognitionParam.builder()
                        // If the API key is not set as an environment variable, uncomment the next line and replace apikey with your own API key
                        // .apiKey("apikey")
                        .model("paraformer-realtime-v2")
                        .format("wav")
                        .sampleRate(16000)
                        // language_hints is only supported by the paraformer-v2 and paraformer-realtime-v2 models
                        .parameter("language_hints", new String[]{"zh", "en"})
                        .build();

        try {
            System.out.println("识别结果:" + recognizer.call(param, new File("asr_example.wav")));
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.exit(0);
    }
}

Python

import requests
from http import HTTPStatus
from dashscope.audio.asr import Recognition

# If the API key is not set as an environment variable, uncomment the following lines and replace apiKey with your own API key
# import dashscope
# dashscope.api_key = "apiKey"

# You can skip the URL download step and use a local file directly
r = requests.get(
    'https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav'
)
with open('asr_example.wav', 'wb') as f:
    f.write(r.content)

recognition = Recognition(model='paraformer-realtime-v2',
                          format='wav',
                          sample_rate=16000,
                          # language_hints is only supported by the paraformer-v2 and paraformer-realtime-v2 models
                          language_hints=['zh', 'en'],
                          callback=None)
result = recognition.call('asr_example.wav')
if result.status_code == HTTPStatus.OK:
    print('Recognition result:')
    print(result.get_sentence())
else:
    print('Error: ', result.message)

Moving to production

Improving recognition quality

  • Match the model to the sample rate: feed 8 kHz telephone audio to an 8 kHz model directly rather than upsampling it to 16 kHz; this avoids distortion and gives better results.

  • Use hotwords: configuring hotwords for domain-specific terms, person names, and brand names can significantly improve accuracy. See Custom hotwords - Paraformer/Fun-ASR and Custom hotwords - Gummy.

  • Improve input audio quality: use a good microphone and record in an environment with a high signal-to-noise ratio and no echo. At the application layer, you can preprocess the audio with noise suppression (such as RNNoise) and acoustic echo cancellation (AEC) to obtain a cleaner signal.

  • Specify the language explicitly: for multilingual models such as paraformer-realtime-v2, telling the model the audio's language up front (for example, setting the language_hints parameter to ['zh', 'en']) helps it converge, avoids confusion between similar-sounding languages, and improves accuracy.

  • Filter disfluencies: for Paraformer models, set the disfluency_removal_enabled parameter to filter out filler words and get cleaner, more readable text. A configuration sketch follows this list.
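The sketch below, in Python, combines several of these recommendations: the 8 kHz model for telephone audio, and language_hints plus disfluency removal for a 16 kHz bilingual recording. Passing disfluency_removal_enabled as a keyword argument here is an assumption based on the parameter name above; check the API reference for the exact field name and the models that accept it.

Python

from http import HTTPStatus

from dashscope.audio.asr import Recognition

# 8 kHz telephone audio: use the matching 8 kHz model instead of upsampling to 16 kHz
phone_recognition = Recognition(model='paraformer-realtime-8k-v2',
                                format='wav',
                                sample_rate=8000,
                                callback=None)

# 16 kHz bilingual audio: pin the expected languages and filter disfluencies.
# NOTE: disfluency_removal_enabled as a keyword argument is an assumption based on
# the parameter name in this document; verify it against the API reference.
meeting_recognition = Recognition(model='paraformer-realtime-v2',
                                  format='wav',
                                  sample_rate=16000,
                                  # language_hints is only supported by paraformer-v2 and paraformer-realtime-v2
                                  language_hints=['zh', 'en'],
                                  disfluency_removal_enabled=True,
                                  callback=None)

result = meeting_recognition.call('asr_example.wav')
if result.status_code == HTTPStatus.OK:
    print(result.get_sentence())
else:
    print('Error: ', result.message)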

Fault-tolerance strategies

  • Client-side reconnection: the client should reconnect automatically to ride out network jitter. Taking the Python SDK as an example (see the sketch after this list):

    1. Catch failures: implement the on_error method in your Callback class; the dashscope SDK invokes it on network or other errors.

    2. Signal state: when on_error fires, set a reconnect signal. In Python, threading.Event provides a thread-safe flag for this.

    3. Reconnect loop: wrap the main logic in a for loop (for example, three retries). When the reconnect signal is detected, abort the current recognition round, clean up resources, wait a few seconds, and re-enter the loop to open a brand-new connection.

  • Heartbeat to keep the connection alive: when you need a long-lived connection to the server, set the heartbeat parameter to true so the connection is not dropped even during long stretches of silence in the audio.

  • Model rate limits: observe each model's rate-limiting rules when calling the API.
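A minimal Python sketch of this reconnection pattern, assuming a hypothetical read_audio_frame() audio source and the paraformer-realtime-v2 model; adapt the cleanup and backoff to your application:

Python

import threading
import time

from dashscope.audio.asr import Recognition, RecognitionCallback, RecognitionResult

reconnect_event = threading.Event()  # thread-safe reconnect signal


class ReconnectingCallback(RecognitionCallback):
    def on_event(self, result: RecognitionResult) -> None:
        print('sentence: ', result.get_sentence())

    def on_error(self, message) -> None:
        print('recognition error: ', message)
        reconnect_event.set()  # tell the main loop to rebuild the connection


def read_audio_frame() -> bytes:
    # Hypothetical audio source: return the next chunk from your microphone
    # or file, or None when there is no more audio.
    raise NotImplementedError


MAX_RETRIES = 3
for attempt in range(MAX_RETRIES):
    reconnect_event.clear()
    recognition = Recognition(model='paraformer-realtime-v2',
                              format='pcm',
                              sample_rate=16000,
                              # heartbeat=True,  # optional: keep the connection alive through long silences
                              callback=ReconnectingCallback())
    recognition.start()
    while not reconnect_event.is_set():
        frame = read_audio_frame()
        if frame is None:
            break
        recognition.send_audio_frame(frame)
    try:
        recognition.stop()  # clean up the current connection
    except Exception:
        pass  # the connection may already be broken
    if not reconnect_event.is_set():
        break  # finished normally; no reconnect needed
    time.sleep(3)  # brief backoff before opening a fresh connection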

API reference

Model feature comparison

The table below compares the features that differ across models. All models share the following:

  • Supported audio formats: pcm, wav, mp3, opus, speex, aac, amr

  • Channels: mono

  • Input: binary audio stream

  • Timestamps: supported, always on

  • Hotwords: supported, configurable

  • ITN (inverse text normalization): supported, always on

  • VAD: supported, always on

  • Sensitive-word filtering: not supported

  • Speaker diarization: not supported

  • Access: Java/Python/Android/iOS SDK, WebSocket API

| Feature | fun-asr-realtime, fun-asr-realtime-2025-11-07 | fun-asr-realtime-2025-09-15 | gummy-realtime-v1 | gummy-chat-v1 | paraformer-realtime-v2 | paraformer-realtime-v1 | paraformer-realtime-8k-v2 | paraformer-realtime-8k-v1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Core scenarios | Live streams, meetings, trilingual teaching, etc. | Live streams, meetings, bilingual teaching, etc. | Streaming recognition of long audio (meetings, live streams) | Short-speech interaction (chat, commands) | Streaming recognition of long audio (meetings, live streams) | Streaming recognition of long audio (meetings, live streams) | Telephone customer service, etc. | Telephone customer service, etc. |
| Supported languages | Chinese (Mandarin; the Cantonese, Wu, Minnan, Hakka, Gan, Xiang, and Jin dialect groups; and Mandarin accents of the Zhongyuan, Southwestern, Ji-Lu, Jianghuai, Lan-Yin, Jiao-Liao, Northeastern, Beijing, and Hong Kong/Taiwan areas, covering Henan, Shaanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Hebei, Tianjin, Shandong, Anhui, Nanjing, Jiangsu, Hangzhou, Gansu, Ningxia, and other regions), English, Japanese | Chinese (Mandarin), English | Chinese, English, Japanese, Korean, French, German, Spanish, Italian, Russian, Cantonese, Portuguese, Indonesian, Arabic, Thai, Hindi, Danish, Urdu, Turkish, Dutch, Malay, Vietnamese | Same as gummy-realtime-v1 | Chinese (Mandarin, Cantonese, Wu, Minnan, and the Northeastern, Gansu, Guizhou, Henan, Hubei, Hunan, Ningxia, Shanxi, Shaanxi, Shandong, Sichuan, Tianjin, Jiangxi, Yunnan, and Shanghai dialects), English, Japanese, Korean, German, French, Russian | Chinese (Mandarin) | Chinese (Mandarin) | Chinese (Mandarin) |
| Sample rate | 16 kHz | 16 kHz | ≥ 16 kHz | 16 kHz | Any | 16 kHz | 8 kHz | 8 kHz |
| Audio length | Unlimited | Unlimited | Unlimited | Within 1 minute | Unlimited | Unlimited | Unlimited | Unlimited |
| Emotion recognition | Not supported | Not supported | Not supported | Not supported | Not supported | Not supported | Supported (on by default, can be disabled) | Not supported |
| Disfluency removal | Supported (off by default, can be enabled) | Supported (off by default, can be enabled) | Not supported | Not supported | Supported (off by default, can be enabled) | Supported (off by default, can be enabled) | Supported (off by default, can be enabled) | Supported (off by default, can be enabled) |
| Punctuation prediction | Supported (always on) | Supported (on by default, can be disabled) | Supported (always on) | Supported (always on) | Supported (on by default, can be disabled) | Supported (always on) | Supported (always on) | Supported (always on) |
| Rate limit (RPS) | 20 | 20 | 10 | 10 | 20 | 20 | 20 | 20 |
| Price | CNY 0.00033/second | CNY 0.00033/second | CNY 0.00015/second | CNY 0.00015/second | CNY 0.00024/second | CNY 0.00024/second | CNY 0.00024/second | CNY 0.00024/second |