Real-time Speech Translation

Updated: 2025-08-14 19:52:21

Real-time speech translation combines speech recognition with machine translation to convert speech in one language directly into text in another, enabling a "translate as you speak" experience.

Use cases

  • International conferences and business communication: in multilingual settings, real-time speech translation helps participants immediately understand speech in other languages, facilitating cross-border communication and collaboration.

  • Travel: when traveling or on business trips abroad, real-time speech translation helps users communicate with locals, removing language barriers when asking for directions, ordering food, or shopping.

Supported models

| Model name | Supported languages | Supported sample rates | Use cases | Supported audio formats | Unit price | Free quota |
| --- | --- | --- | --- | --- | --- | --- |
| gummy-realtime-v1 | Chinese, English, Japanese, Korean, Cantonese, German, French, Russian, Italian, Spanish. Translation pairs: Chinese → English/Japanese/Korean; English → Chinese/Japanese/Korean; Japanese/Korean/Cantonese/German/French/Russian/Italian/Spanish → Chinese/English | 16 kHz and above | Long, uninterrupted recognition scenarios such as conference speeches and live video streaming | pcm, wav, mp3, opus, speex, aac, amr | 0.00015 CNY/second | 36,000 seconds (10 hours). 百炼 activated before 00:00 on January 17, 2025: valid until July 15, 2025. 百炼 activated at or after 00:00 on January 17, 2025: valid for 180 days from activation |
| gummy-chat-v1 | Same as above | 16 kHz | Short voice-interaction scenarios such as chat, voice commands, voice input methods, and voice search | Same as above | Same as above | Same as above |

Online experience

Both Gummy real-time speech translation models support online trial:

  • gummy-realtime-v1: on the Speech Recognition page, click More models and select "Real-time Speech Recognition and Translation V1.0".

  • gummy-chat-v1: on the Speech Recognition page, select "One-sentence Recognition and Translation V1.0" directly.

Quick start

The following sample code shows how to call the API. For more code examples covering common scenarios, see GitHub.

You must first obtain an API Key and configure it as an environment variable. To call through the SDK, you also need to install the DashScope SDK.
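
For local testing, the key can also be assigned in code instead of via the environment. A minimal sketch, assuming the DASHSCOPE_API_KEY environment variable read by the DashScope Python SDK:

import os

import dashscope

# The SDK reads DASHSCOPE_API_KEY from the environment by default;
# assigning dashscope.api_key explicitly overrides it. Avoid hard-coding real keys.
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY", "your-api-key")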

Real-time translation: suited to long, uninterrupted scenarios such as conference speeches and live video streaming.

One-sentence translation: more sensitive to pauses; accurately translates utterances of up to one minute, suited to short voice-interaction scenarios such as chat, voice commands, voice input methods, and voice search.

Real-time speech translation

Real-time translation translates a long audio stream (whether captured from an external device such as a microphone or read from a local file) and returns the results in a streaming fashion.

Translating speech from a microphone

Java

import com.alibaba.dashscope.audio.asr.translation.TranslationRecognizerParam;
import com.alibaba.dashscope.audio.asr.translation.TranslationRecognizerRealtime;
import com.alibaba.dashscope.audio.asr.translation.results.TranslationRecognizerResult;
import com.alibaba.dashscope.common.ResultCallback;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.TargetDataLine;

import java.nio.ByteBuffer;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Main {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService executorService = Executors.newSingleThreadExecutor();
        executorService.submit(new RealtimeRecognitionTask());
        executorService.shutdown();
        executorService.awaitTermination(1, TimeUnit.MINUTES);
        System.exit(0);
    }
}

class RealtimeRecognitionTask implements Runnable {
    @Override
    public void run() {
        String targetLanguage = "en";
        // Initialize the request parameters
        TranslationRecognizerParam param =
                TranslationRecognizerParam.builder()
                        // If the API Key is not set as an environment variable, replace your-api-key with your own API Key
                        // .apiKey("your-api-key")
                        .model("gummy-realtime-v1") // model name
                        .format("pcm") // input audio format; supported: pcm, wav, mp3, opus, speex, aac, amr
                        .sampleRate(16000) // input sample rate in Hz; 16000 Hz and above are supported
                        .transcriptionEnabled(true) // enable real-time transcription
                        .sourceLanguage("auto") // source-language code (the language to recognize/translate)
                        .translationEnabled(true) // enable real-time translation
                        .translationLanguages(new String[] {targetLanguage}) // target translation language(s)
                        .build();

        // Initialize the result callback
        ResultCallback<TranslationRecognizerResult> callback =
                new ResultCallback<TranslationRecognizerResult>() {
                    @Override
                    public void onEvent(TranslationRecognizerResult result) {
                        System.out.println("RequestId: " + result.getRequestId());
                        // Print intermediate and final results
                        if (result.getTranscriptionResult() != null) {
                            System.out.println("Transcription Result:");
                            if (result.isSentenceEnd()) {
                                System.out.println("\tFix:" + result.getTranscriptionResult().getText());
                            } else {
                                System.out.println("\tTemp Result:" + result.getTranscriptionResult().getText());
                            }
                        }
                        if (result.getTranslationResult() != null) {
                            System.out.println("English Translation Result:");
                            if (result.isSentenceEnd()) {
                                System.out.println("\tFix:" + result.getTranslationResult().getTranslation(targetLanguage).getText());
                            } else {
                                System.out.println("\tTemp Result:" + result.getTranslationResult().getTranslation(targetLanguage).getText());
                            }
                        }
                    }

                    @Override
                    public void onComplete() {
                        System.out.println("Translation complete");
                    }

                    @Override
                    public void onError(Exception e) {
                        e.printStackTrace();
                        System.out.println("TranslationCallback error: " + e.getMessage());
                    }
                };

        // Initialize the streaming recognition service
        TranslationRecognizerRealtime translator = new TranslationRecognizerRealtime();
        // Start streaming recognition/translation, binding the request parameters and callback
        translator.call(param, callback);

        try {
            // Define the audio format
            AudioFormat audioFormat = new AudioFormat(16000, 16, 1, true, false);
            // Open the default recording device that matches the format
            TargetDataLine targetDataLine =
                    AudioSystem.getTargetDataLine(audioFormat);
            targetDataLine.open(audioFormat);
            // Start recording
            targetDataLine.start();
            System.out.println("Speak into your microphone to try real-time speech recognition and translation");
            ByteBuffer buffer = ByteBuffer.allocate(1024);
            long start = System.currentTimeMillis();
            // Record for 50 s while recognizing in real time
            while (System.currentTimeMillis() - start < 50000) {
                int read = targetDataLine.read(buffer.array(), 0, buffer.capacity());
                if (read > 0) {
                    buffer.limit(read);
                    // Send the recorded audio to the streaming recognition service
                    translator.sendAudioFrame(buffer);
                    buffer = ByteBuffer.allocate(1024);
                    // The recording rate is limited; sleep briefly to avoid high CPU usage
                    Thread.sleep(20);
                }
            }
            // Signal the end of the audio stream
            translator.stop();
        } catch (Exception e) {
            e.printStackTrace();
        }

        System.out.println(
                "[Metric] requestId: "
                        + translator.getLastRequestId()
                        + ", first package delay ms: "
                        + translator.getFirstPackageDelay()
                        + ", last package delay ms: "
                        + translator.getLastPackageDelay());
    }
}

Python

import pyaudio
import dashscope
from dashscope.audio.asr import *


# If the API Key is not set as an environment variable, replace your-api-key with your own API Key
# dashscope.api_key = "your-api-key"

mic = None
stream = None

class Callback(TranslationRecognizerCallback):
    def on_open(self) -> None:
        global mic
        global stream
        print("TranslationRecognizerCallback open.")
        mic = pyaudio.PyAudio()
        stream = mic.open(
            format=pyaudio.paInt16, channels=1, rate=16000, input=True
        )

    def on_close(self) -> None:
        global mic
        global stream
        print("TranslationRecognizerCallback close.")
        stream.stop_stream()
        stream.close()
        mic.terminate()
        stream = None
        mic = None

    def on_event(
        self,
        request_id,
        transcription_result: TranscriptionResult,
        translation_result: TranslationResult,
        usage,
    ) -> None:
        print("request id: ", request_id)
        print("usage: ", usage)
        if translation_result is not None:
            print(
                "translation_languages: ",
                translation_result.get_language_list(),
            )
            english_translation = translation_result.get_translation("en")
            print("sentence id: ", english_translation.sentence_id)
            print("translate to english: ", english_translation.text)
        if transcription_result is not None:
            print("sentence id: ", transcription_result.sentence_id)
            print("transcription: ", transcription_result.text)


callback = Callback()


translator = TranslationRecognizerRealtime(
    model="gummy-realtime-v1",
    format="pcm",
    sample_rate=16000,
    transcription_enabled=True,
    translation_enabled=True,
    translation_target_languages=["en"],
    callback=callback,
)
translator.start()
print("Speak into your microphone to try real-time speech recognition and translation")
# Send microphone audio until on_close clears the stream
while True:
    if stream:
        data = stream.read(3200, exception_on_overflow=False)
        translator.send_audio_frame(data)
    else:
        break

translator.stop()

Translating a local audio file

Java

The audio file used in this example: hello_world.wav

import com.alibaba.dashscope.audio.asr.translation.TranslationRecognizerParam;
import com.alibaba.dashscope.audio.asr.translation.TranslationRecognizerRealtime;
import com.alibaba.dashscope.audio.asr.translation.results.TranslationRecognizerResult;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.exception.NoApiKeyException;

import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class RealtimeTranslateTask implements Runnable {
    private Path filepath;

    public RealtimeTranslateTask(Path filepath) {
        this.filepath = filepath;
    }

    @Override
    public void run() {
        String targetLanguage = "en";
        // Create the translation parameters.
        // You can customize parameters such as model, format, and sample_rate.
        // For more information, see
        // https://help.aliyun.com/document_detail/2712536.html
        TranslationRecognizerParam param =
                TranslationRecognizerParam.builder()
                        // If the API Key is not set as an environment variable, replace your-api-key with your own API Key
                        // .apiKey("your-api-key")
                        .model("gummy-realtime-v1")
                        .format("pcm") // supported formats: pcm, wav, mp3, opus, speex, aac, amr
                        .sampleRate(16000)
                        .transcriptionEnabled(true)
                        .sourceLanguage("auto")
                        .translationEnabled(true)
                        .translationLanguages(new String[] {targetLanguage})
                        .build();
        TranslationRecognizerRealtime translator = new TranslationRecognizerRealtime();
        CountDownLatch latch = new CountDownLatch(1);

        String threadName = Thread.currentThread().getName();

        ResultCallback<TranslationRecognizerResult> callback =
                new ResultCallback<TranslationRecognizerResult>() {
                    @Override
                    public void onEvent(TranslationRecognizerResult result) {
                        System.out.println("RequestId: " + result.getRequestId());
                        // Print intermediate and final results
                        if (result.getTranscriptionResult() != null) {
                            System.out.println("Transcription Result:");
                            if (result.isSentenceEnd()) {
                                System.out.println("\tFix:" + result.getTranscriptionResult().getText());
                            } else {
                                System.out.println("\tTemp Result:" + result.getTranscriptionResult().getText());
                            }
                        }
                        if (result.getTranslationResult() != null) {
                            System.out.println("English Translation Result:");
                            if (result.isSentenceEnd()) {
                                System.out.println("\tFix:" + result.getTranslationResult().getTranslation(targetLanguage).getText());
                            } else {
                                System.out.println("\tTemp Result:" + result.getTranslationResult().getTranslation(targetLanguage).getText());
                            }
                        }
                    }

                    @Override
                    public void onComplete() {
                        System.out.println("[" + threadName + "] Translation complete");
                        latch.countDown();
                    }

                    @Override
                    public void onError(Exception e) {
                        e.printStackTrace();
                        System.out.println("[" + threadName + "] TranslationCallback error: " + e.getMessage());
                    }
                };
        // set param & callback
        translator.call(param, callback);
        // Please replace the path with your audio file path
        System.out.println("[" + threadName + "] Input file_path is: " + this.filepath);
        // Read file and send audio by chunks
        try (FileInputStream fis = new FileInputStream(this.filepath.toFile())) {
            // 3200-byte chunks ≈ 0.1 s of 16 kHz 16-bit mono audio
            byte[] buffer = new byte[3200];
            int bytesRead;
            // Loop to read chunks of the file
            while ((bytesRead = fis.read(buffer)) != -1) {
                ByteBuffer byteBuffer;
                // Handle the last chunk which might be smaller than the buffer size
                System.out.println("[" + threadName + "] bytesRead: " + bytesRead);
                if (bytesRead < buffer.length) {
                    byteBuffer = ByteBuffer.wrap(buffer, 0, bytesRead);
                } else {
                    byteBuffer = ByteBuffer.wrap(buffer);
                }
                // Send the ByteBuffer to the translation instance
                translator.sendAudioFrame(byteBuffer);
                buffer = new byte[3200];
                Thread.sleep(100);
            }
            System.out.println(LocalDateTime.now());
        } catch (Exception e) {
            e.printStackTrace();
        }

        translator.stop();
        // wait for the translation to complete
        try {
            latch.await();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

}

public class Main {
    public static void main(String[] args)
            throws NoApiKeyException, InterruptedException {

        String currentDir = System.getProperty("user.dir");
        // Please replace the path with your audio source
        Path[] filePaths = {
                Paths.get(currentDir, "hello_world.wav"),
//                Paths.get(currentDir, "hello_world_male_16k_16bit_mono.wav"),
        };
        // Use ThreadPool to run recognition tasks
        ExecutorService executorService = Executors.newFixedThreadPool(10);
        for (Path filepath:filePaths) {
            executorService.submit(new RealtimeTranslateTask(filepath));
        }
        executorService.shutdown();
        // wait for all tasks to complete
        executorService.awaitTermination(1, TimeUnit.MINUTES);
        System.exit(0);
    }
}

Python

import os
import requests

import dashscope
from dashscope.audio.asr import *

# If the API Key is not set as an environment variable, replace your-api-key with your own API Key
# dashscope.api_key = "your-api-key"

r = requests.get(
    "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav"
)
with open("asr_example.wav", "wb") as f:
    f.write(r.content)


class Callback(TranslationRecognizerCallback):
    def on_open(self) -> None:
        print("TranslationRecognizerCallback open.")

    def on_close(self) -> None:
        print("TranslationRecognizerCallback close.")

    def on_event(
        self,
        request_id,
        transcription_result: TranscriptionResult,
        translation_result: TranslationResult,
        usage,
    ) -> None:
        print("request id: ", request_id)
        print("usage: ", usage)
        if translation_result is not None:
            print(
                "translation_languages: ",
                translation_result.get_language_list(),
            )
            english_translation = translation_result.get_translation("en")
            print("sentence id: ", english_translation.sentence_id)
            print("translate to english: ", english_translation.text)
        if transcription_result is not None:
            print("sentence id: ", transcription_result.sentence_id)
            print("transcription: ", transcription_result.text)
    
    def on_error(self, message) -> None:
        print('error: {}'.format(message))
    
    def on_complete(self) -> None:
        print('TranslationRecognizerCallback complete')


callback = Callback()


translator = TranslationRecognizerRealtime(
    model="gummy-realtime-v1",
    format="wav",
    sample_rate=16000,
    callback=callback,
)

translator.start()

# Read the file in chunks and send each chunk to the translator
with open("asr_example.wav", "rb") as f:
    if os.path.getsize("asr_example.wav") == 0:
        raise Exception("The supplied file was empty (zero bytes long)")
    while True:
        audio_data = f.read(12800)
        if not audio_data:
            break
        translator.send_audio_frame(audio_data)

translator.stop()

One-sentence translation

One-sentence translation translates an audio stream of up to one minute (whether captured from an external device such as a microphone or read from a local file) and returns the results in a streaming fashion.

Translating speech from a microphone

Java

import com.alibaba.dashscope.audio.asr.translation.TranslationRecognizerChat;
import com.alibaba.dashscope.audio.asr.translation.TranslationRecognizerParam;
import com.alibaba.dashscope.audio.asr.translation.results.TranscriptionResult;
import com.alibaba.dashscope.audio.asr.translation.results.TranslationRecognizerResult;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.exception.NoApiKeyException;

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;
import java.nio.ByteBuffer;

public class Main {

    public static void main(String[] args) throws NoApiKeyException, InterruptedException {

        // Create the recognizer
        TranslationRecognizerChat translator = new TranslationRecognizerChat();
        // Initialize the request parameters
        TranslationRecognizerParam param =
                TranslationRecognizerParam.builder()
                        // If the API Key is not set as an environment variable, replace your-api-key with your own API Key
                        // .apiKey("your-api-key")
                        .model("gummy-chat-v1") // model name
                        .format("pcm") // input audio format; supported: pcm, pcm-encoded wav, mp3, ogg-encapsulated opus, ogg-encapsulated speex, aac, amr
                        .sampleRate(16000) // input sample rate in Hz; only 16000 Hz is supported
                        .transcriptionEnabled(true) // enable real-time transcription
                        .translationEnabled(true) // enable real-time translation
                        .translationLanguages(new String[] {"en"}) // target translation language(s)
                        .build();


        // Recording thread: capture microphone audio and stream it to the translator
        Thread thread = new Thread(
                () -> {
                    try {
                        // Define the audio format
                        AudioFormat audioFormat = new AudioFormat(16000, 16, 1, true, false);
                        // Open the default recording device that matches the format
                        TargetDataLine targetDataLine =
                                AudioSystem.getTargetDataLine(audioFormat);
                        targetDataLine.open(audioFormat);
                        // Start recording
                        targetDataLine.start();
                        System.out.println("Speak into your microphone to try one-sentence speech recognition and translation");
                        ByteBuffer buffer = ByteBuffer.allocate(1024);
                        long start = System.currentTimeMillis();
                        // Record for up to 50 s; sending stops automatically at the end of the sentence
                        while (System.currentTimeMillis() - start < 50000) {
                            int read = targetDataLine.read(buffer.array(), 0, buffer.capacity());
                            if (read > 0) {
                                buffer.limit(read);
                                // Send the recorded audio to the streaming recognition service
                                if (!translator.sendAudioFrame(buffer)) {
                                    System.out.println("sentence end, stop sending");
                                    break;
                                }
                                buffer = ByteBuffer.allocate(1024);
                                // The recording rate is limited; sleep briefly to avoid high CPU usage
                                Thread.sleep(20);
                            }
                        }
                    } catch (LineUnavailableException e) {
                        throw new RuntimeException(e);
                    } catch (InterruptedException e) {
                        throw new RuntimeException(e);
                    }
                });

        translator.call(param, new ResultCallback<TranslationRecognizerResult>() {
            @Override
            public void onEvent(TranslationRecognizerResult result) {
                if (result.getTranscriptionResult() == null) {
                    return;
                }
                try {
                    System.out.println("RequestId: " + result.getRequestId());
                    // Print intermediate and final results
                    if (result.getTranscriptionResult() != null) {
                        System.out.println("Transcription Result:");
                        if (result.isSentenceEnd()) {
                            System.out.println("\tFix:" + result.getTranscriptionResult().getText());
                        } else {
                            TranscriptionResult transcriptionResult = result.getTranscriptionResult();
                            System.out.println("\tTemp Result:" + transcriptionResult.getText());
                            if (result.getTranscriptionResult().isVadPreEnd()) {
                                System.out.printf("VadPreEnd: start:%d, end:%d, time:%d\n", transcriptionResult.getPreEndStartTime(), transcriptionResult.getPreEndEndTime(), transcriptionResult.getPreEndTimemillis());
                            }
                        }
                    }
                    if (result.getTranslationResult() != null) {
                        System.out.println("English Translation Result:");
                        if (result.isSentenceEnd()) {
                            System.out.println("\tFix:" + result.getTranslationResult().getTranslation("en").getText());
                        } else {
                            System.out.println("\tTemp Result:" + result.getTranslationResult().getTranslation("en").getText());
                        }
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }

            }

            @Override
            public void onComplete() {
                System.out.println("Translation complete");
            }

            @Override
            public void onError(Exception e) {
                e.printStackTrace();
            }
        });

        thread.start();
        thread.join();
        translator.stop();
    }
}

Python

import pyaudio
import dashscope
from dashscope.audio.asr import *


# If the API Key is not set as an environment variable, replace your-api-key with your own API Key
# dashscope.api_key = "your-api-key"

mic = None
stream = None

class Callback(TranslationRecognizerCallback):
    def on_open(self) -> None:
        global mic
        global stream
        print("TranslationRecognizerCallback open.")
        mic = pyaudio.PyAudio()
        stream = mic.open(
            format=pyaudio.paInt16, channels=1, rate=16000, input=True
        )

    def on_close(self) -> None:
        global mic
        global stream
        print("TranslationRecognizerCallback close.")
        stream.stop_stream()
        stream.close()
        mic.terminate()
        stream = None
        mic = None

    def on_event(
        self,
        request_id,
        transcription_result: TranscriptionResult,
        translation_result: TranslationResult,
        usage,
    ) -> None:
        print("request id: ", request_id)
        print("usage: ", usage)
        if translation_result is not None:
            print(
                "translation_languages: ",
                translation_result.get_language_list(),
            )
            english_translation = translation_result.get_translation("en")
            print("sentence id: ", english_translation.sentence_id)
            print("translate to english: ", english_translation.text)
            if english_translation.vad_pre_end:
                print("vad pre end {}, {}, {}".format(transcription_result.pre_end_start_time, transcription_result.pre_end_end_time, transcription_result.pre_end_timemillis))
        if transcription_result is not None:
            print("sentence id: ", transcription_result.sentence_id)
            print("transcription: ", transcription_result.text)


callback = Callback()


translator = TranslationRecognizerChat(
    model="gummy-chat-v1",
    format="pcm",
    sample_rate=16000,
    transcription_enabled=True,
    translation_enabled=True,
    translation_target_languages=["en"],
    callback=callback,
)
translator.start()
print("Speak into your microphone to try one-sentence speech recognition and translation")
# Send microphone audio until the sentence ends or on_close clears the stream
while True:
    if stream:
        data = stream.read(3200, exception_on_overflow=False)
        if not translator.send_audio_frame(data):
            print("sentence end, stop sending")
            break
    else:
        break

translator.stop()

Translating a local audio file

Java

The audio file used in this example: hello_world.wav

import com.alibaba.dashscope.audio.asr.translation.TranslationRecognizerChat;
import com.alibaba.dashscope.audio.asr.translation.TranslationRecognizerParam;
import com.alibaba.dashscope.audio.asr.translation.results.TranslationRecognizerResult;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.exception.NoApiKeyException;

import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class RealtimeTranslateChatTask implements Runnable {
    private Path filepath;
    private TranslationRecognizerChat translator = null;

    public RealtimeTranslateChatTask(Path filepath) {
        this.filepath = filepath;
    }

    @Override
    public void run() {
        for (int i=0; i<1; i++) {
            // Initialize the request parameters
            TranslationRecognizerParam param =
                    TranslationRecognizerParam.builder()
                            // If the API Key is not set as an environment variable, replace your-api-key with your own API Key
                            // .apiKey("your-api-key")
                            .model("gummy-chat-v1") // model name
                            .format("wav") // input audio format; supported: pcm, pcm-encoded wav, mp3, ogg-encapsulated opus, ogg-encapsulated speex, aac, amr
                            .sampleRate(16000) // input sample rate in Hz; only 16000 Hz is supported
                            .transcriptionEnabled(true) // enable real-time transcription
                            .translationEnabled(true) // enable real-time translation
                            .translationLanguages(new String[] {"en"}) // target translation language(s)
                            .build();
            if (translator == null) {
                // Initialize the streaming recognition service
                translator = new TranslationRecognizerChat();
            }
            CountDownLatch latch = new CountDownLatch(1);

            String threadName = Thread.currentThread().getName();

            // Initialize the result callback
            ResultCallback<TranslationRecognizerResult> callback =
                    new ResultCallback<TranslationRecognizerResult>() {
                        @Override
                        public void onEvent(TranslationRecognizerResult result) {
                            System.out.println("RequestId: " + result.getRequestId());
                            // Print intermediate and final results
                            if (result.getTranscriptionResult() != null) {
                                System.out.println("Transcription Result:");
                                if (result.isSentenceEnd()) {
                                    System.out.println("\tFix:" + result.getTranscriptionResult().getText());
                                } else {
                                    System.out.println("\tTemp Result:" + result.getTranscriptionResult().getText());
                                }
                            }
                            if (result.getTranslationResult() != null) {
                                System.out.println("English Translation Result:");
                                if (result.isSentenceEnd()) {
                                    System.out.println("\tFix:" + result.getTranslationResult().getTranslation("en").getText());
                                } else {
                                    System.out.println("\tTemp Result:" + result.getTranslationResult().getTranslation("en").getText());
                                }
                            }
                        }

                        @Override
                        public void onComplete() {
                            System.out.println("[" + threadName + "] Translation complete");
                            latch.countDown();
                        }

                        @Override
                        public void onError(Exception e) {
                            e.printStackTrace();
                            System.out.println("[" + threadName + "] TranslationCallback error: " + e.getMessage());
                        }
                    };
            // Start streaming recognition/translation, binding the request parameters and callback
            translator.call(param, callback);
            // Replace with your own file path
            System.out.println("[" + threadName + "] Input file_path is: " + this.filepath);
            // Read file and send audio by chunks
            try (FileInputStream fis = new FileInputStream(this.filepath.toFile())) {
                // 3200-byte chunks ≈ 0.1 s of 16 kHz 16-bit mono audio
                byte[] buffer = new byte[3200];
                int bytesRead;
                // Loop to read chunks of the file
                while ((bytesRead = fis.read(buffer)) != -1) {
                    ByteBuffer byteBuffer;
                    // Handle the last chunk which might be smaller than the buffer size
                    System.out.println("[" + threadName + "] bytesRead: " + bytesRead);
                    if (bytesRead < buffer.length) {
                        byteBuffer = ByteBuffer.wrap(buffer, 0, bytesRead);
                    } else {
                        byteBuffer = ByteBuffer.wrap(buffer);
                    }
                    // Send the ByteBuffer to the translation instance
                    if (!translator.sendAudioFrame(byteBuffer)) {
                        System.out.println("sentence end, stop sending");
                        break;
                    }
                    buffer = new byte[3200];
                    Thread.sleep(100);
                }
                System.out.println(LocalDateTime.now());
            } catch (Exception e) {
                e.printStackTrace();
            }

            // Signal the end of the audio stream
            translator.stop();
            // wait for the translation to complete
            try {
                latch.await();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
    }

}


public class Main {
    public static void main(String[] args)
            throws NoApiKeyException, InterruptedException {
        String currentDir = System.getProperty("user.dir");
        // Please replace the path with your audio source
        Path[] filePaths = {
                Paths.get(currentDir, "hello_world.wav"),
//                Paths.get(currentDir, "hello_world_male_16k_16bit_mono.wav"),
        };
        // Use ThreadPool to run recognition tasks
        ExecutorService executorService = Executors.newFixedThreadPool(10);
        for (Path filepath:filePaths) {
            executorService.submit(new RealtimeTranslateChatTask(filepath));
        }
        executorService.shutdown();
        // wait for all tasks to complete
        executorService.awaitTermination(1, TimeUnit.MINUTES);
    }
}

Python

import os
import requests

import dashscope
from dashscope.audio.asr import *

# If the API Key is not set as an environment variable, replace your-api-key with your own API Key
# dashscope.api_key = "your-api-key"

r = requests.get(
    "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav"
)
with open("asr_example.wav", "wb") as f:
    f.write(r.content)


class Callback(TranslationRecognizerCallback):
    def on_open(self) -> None:
        print("TranslationRecognizerCallback open.")

    def on_close(self) -> None:
        print("TranslationRecognizerCallback close.")

    def on_event(
            self,
            request_id,
            transcription_result: TranscriptionResult,
            translation_result: TranslationResult,
            usage,
    ) -> None:
        print("request id: ", request_id)
        print("usage: ", usage)
        if translation_result is not None:
            print(
                "translation_languages: ",
                translation_result.get_language_list(),
            )
            english_translation = translation_result.get_translation("en")
            print("sentence id: ", english_translation.sentence_id)
            print("translate to english: ", english_translation.text)
        if transcription_result is not None:
            print("sentence id: ", transcription_result.sentence_id)
            print("transcription: ", transcription_result.text)

    def on_error(self, message) -> None:
        print('error: {}'.format(message))

    def on_complete(self) -> None:
        print('TranslationRecognizerCallback complete')


callback = Callback()

translator = TranslationRecognizerChat(
    model="gummy-chat-v1",
    format="wav",
    sample_rate=16000,
    callback=callback,
)

translator.start()

# Read the file in chunks and send each chunk to the translator
with open("asr_example.wav", "rb") as f:
    if os.path.getsize("asr_example.wav") == 0:
        raise Exception("The supplied file was empty (zero bytes long)")
    while True:
        audio_data = f.read(12800)
        if not audio_data:
            break
        if translator.send_audio_frame(audio_data):
            print("send audio frame success")
        else:
            print("sentence end, stop sending")
            break

translator.stop()

Input file limits

When translating local audio files:

  • Input method: pass the local file path as a parameter.

  • Number of files: at most one file per call.

  • File size: unlimited.

  • Audio duration:

    • Paraformer: unlimited

    • Gummy real-time translation: unlimited

    • Gummy one-sentence translation: within one minute

  • File format: pcm, pcm-encoded wav, mp3, ogg-encapsulated opus, ogg-encapsulated speex, aac, and amr are supported. pcm and wav are recommended.

    Because audio formats and their variants are numerous, correct recognition cannot be guaranteed for every file. Test your files to verify that they produce normal recognition results.
  • Sample depth: 16-bit.

  • Sample rate: varies by model; see the header-check sketch after this list.

    The sample rate is the number of times per second the sound signal is sampled. A higher sample rate carries more information and helps recognition accuracy, but an excessively high rate may introduce irrelevant information and hurt recognition instead. Choose a model that matches the actual sample rate: for example, 8000 Hz audio should be used directly with a model that supports 8000 Hz rather than resampled to 16000 Hz.
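
Before submitting a local file, you can verify its header against these constraints. Below is a minimal sketch using Python's standard wave module; it assumes a pcm-encoded wav file and the 16 kHz mono input used by the examples above (other formats need an external probe tool):

import wave

def check_wav(path: str, expected_rate: int = 16000) -> None:
    """Verify that a pcm-encoded wav file is 16-bit mono at the expected sample rate."""
    with wave.open(path, "rb") as w:
        assert w.getsampwidth() == 2, "expected 16-bit samples"
        assert w.getnchannels() == 1, "expected mono audio"
        assert w.getframerate() == expected_rate, (
            "expected %d Hz, got %d Hz" % (expected_rate, w.getframerate())
        )

check_wav("asr_example.wav")  # raises AssertionError on a mismatch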

API reference

Model application listing and filing

See Tongyi large model application listing and compliance filing.

FAQ

1. What factors can affect recognition accuracy?

  1. Audio quality: the device, the environment, and other factors affect how clear the speech is, and therefore recognition accuracy. High-quality audio input is a prerequisite for accurate recognition.

  2. Speaker characteristics: voices differ widely in pitch, speaking rate, accent, and dialect. These individual differences challenge speech recognition systems, especially for accents or dialects the model was not sufficiently trained on.

  3. Language and vocabulary: speech recognition models are usually trained for specific languages. Mixed-language speech, technical terms, slang, and internet expressions are harder to recognize. If the model supports hot words, you can use them to adjust the recognition results.

  4. Context understanding: a lack of conversational context can lead to misinterpretation, especially when the meaning is ambiguous or context-dependent.

2. What are the rate-limiting rules for the models?

| Model name | RPS limit for job submission |
| --- | --- |
| gummy-realtime-v1 | 10 |
| gummy-chat-v1 | 10 |
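
If your application opens many sessions in bursts, a simple client-side throttle can keep it under these limits. A minimal sketch (SimpleRateLimiter is a hypothetical helper, not part of the DashScope SDK):

import threading
import time

class SimpleRateLimiter:
    """Allow at most `rps` acquisitions per second across threads."""

    def __init__(self, rps: int):
        self.interval = 1.0 / rps
        self.lock = threading.Lock()
        self.next_time = 0.0

    def acquire(self) -> None:
        # Reserve the next free start slot, then sleep until it arrives.
        with self.lock:
            now = time.monotonic()
            wait = self.next_time - now
            self.next_time = max(now, self.next_time) + self.interval
        if wait > 0:
            time.sleep(wait)

# Usage sketch: call acquire() before each translator.start(), so that new
# sessions are opened at no more than 10 per second.
# limiter = SimpleRateLimiter(rps=10)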
