Real-time speech translation combines speech recognition and machine translation to convert speech in one language directly into text in another, delivering a "translate as you speak" experience.
Application scenarios
International conferences and business communication: In multilingual settings, real-time speech translation helps attendees instantly understand speech in other languages, facilitating cross-border communication and collaboration.
Travel: When traveling or on business trips abroad, real-time speech translation lets users communicate with locals and overcome language barriers when asking for directions, ordering food, shopping, and more.
Supported models
Model | Supported languages | Supported sample rates | Typical scenarios | Supported audio formats | Unit price | Free quota
gummy-realtime-v1 | Chinese, English, Japanese, Korean, Cantonese, German, French, Russian, Italian, Spanish. Translation pairs: Chinese → English/Japanese/Korean; English → Chinese/Japanese/Korean; Japanese/Korean/Cantonese/German/French/Russian/Italian/Spanish → Chinese/English | 16 kHz and above | Long, uninterrupted recognition such as conference speeches and live streaming | pcm, wav, mp3, opus, speex, aac, amr | CNY 0.00015/second | 36,000 seconds (10 hours). Alibaba Cloud Model Studio (Bailian) activated before 00:00 on January 17, 2025: valid until July 15, 2025. Activated after 00:00 on January 17, 2025: valid for 180 days from activation
gummy-chat-v1 | Same as above | 16 kHz | Short voice interactions such as conversational chat, command control, voice input methods, and voice search | Same as above | Same as above | Same as above
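For example, translating one hour of audio (3,600 seconds) costs 3,600 × 0.00015 = 0.54 CNY, so the 36,000-second free quota corresponds to 10 hours of audio.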
Online demo
The Gummy real-time speech translation models all support online trials.
Quick start
Below is sample code for calling the API. For more code samples covering common scenarios, see GitHub.
You must obtain an API Key and configure it as an environment variable. To call the service through an SDK, you must also install the DashScope SDK.
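As a minimal sketch of this prerequisite check (assuming the DASHSCOPE_API_KEY environment variable, which the DashScope Python SDK reads by default), you can verify the configuration before running the samples, or fall back to setting the key in code:
import os

import dashscope

# The DashScope SDK reads the API Key from the DASHSCOPE_API_KEY
# environment variable automatically.
if not os.getenv("DASHSCOPE_API_KEY"):
    # Fallback for local experiments: set the key in code.
    # Replace your-api-key with your own API Key.
    dashscope.api_key = "your-api-key"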
Real-time translation: suited to long, uninterrupted scenarios such as conference speeches and live streaming.
Sentence-level translation: more sensitive to pauses; accurately translates short utterances of up to one minute, making it suitable for short voice interactions such as conversational chat, command control, voice input methods, and voice search.
Real-time speech translation
Real-time speech translation translates a long audio stream (captured from an external device such as a microphone, or read from a local file) and streams results back as they become available. When a sentence ends, the result carries the fixed (final) text of that sentence, while the stash field holds the in-progress text of the next sentence.
Translating speech from the microphone
Java
import com.alibaba.dashscope.audio.asr.translation.TranslationRecognizerParam;
import com.alibaba.dashscope.audio.asr.translation.TranslationRecognizerRealtime;
import com.alibaba.dashscope.audio.asr.translation.results.TranslationRecognizerResult;
import com.alibaba.dashscope.common.ResultCallback;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.TargetDataLine;
import java.nio.ByteBuffer;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class Main {
public static void main(String[] args) throws InterruptedException {
ExecutorService executorService = Executors.newSingleThreadExecutor();
executorService.submit(new RealtimeRecognitionTask());
executorService.shutdown();
executorService.awaitTermination(1, TimeUnit.MINUTES);
System.exit(0);
}
}
class RealtimeRecognitionTask implements Runnable {
@Override
public void run() {
String targetLanguage = "en";
// Initialize request parameters
TranslationRecognizerParam param =
TranslationRecognizerParam.builder()
// If you did not configure the API Key as an environment variable, replace your-api-key with your own API Key
// .apiKey("your-api-key")
.model("gummy-realtime-v1") // 设置模型名
.format("pcm") // 设置待识别音频格式,支持的音频格式:pcm、wav、mp3、opus、speex、aac、amr
.sampleRate(16000) // 设置待识别音频采样率(单位Hz)。支持16000Hz及以上采样率。
.transcriptionEnabled(true) // 设置是否开启实时识别
.sourceLanguage("auto") // 设置源语言(待识别/翻译语言)代码
.translationEnabled(true) // 设置是否开启实时翻译
.translationLanguages(new String[] {targetLanguage}) // 设置翻译目标语言
.build();
// Initialize the result callback
ResultCallback<TranslationRecognizerResult> callback =
new ResultCallback<TranslationRecognizerResult>() {
@Override
public void onEvent(TranslationRecognizerResult result) {
System.out.println("RequestId: " + result.getRequestId());
// Print recognition results
if (result.getTranscriptionResult() != null) {
System.out.println("Transcription Result:");
if (result.isSentenceEnd()) {
System.out.println("\tFix:" + result.getTranscriptionResult().getText());
System.out.println("\tStash:" + result.getTranscriptionResult().getStash());
} else {
System.out.println("\tTemp Result:" + result.getTranscriptionResult().getText());
}
}
if (result.getTranslationResult() != null) {
System.out.println("English Translation Result:");
if (result.isSentenceEnd()) {
System.out.println("\tFix:" + result.getTranslationResult().getTranslation(targetLanguage).getText());
System.out.println("\tStash:" + result.getTranslationResult().getTranslation(targetLanguage).getStash());
} else {
System.out.println("\tTemp Result:" + result.getTranslationResult().getTranslation(targetLanguage).getText());
}
}
}
@Override
public void onComplete() {
System.out.println("Translation complete");
}
@Override
public void onError(Exception e) {
e.printStackTrace();
System.out.println("TranslationCallback error: " + e.getMessage());
}
};
// Initialize the streaming recognition service
TranslationRecognizerRealtime translator = new TranslationRecognizerRealtime();
// Start streaming recognition/translation, binding the request parameters and callback
translator.call(param, callback);
try {
// Create the audio format: 16 kHz, 16-bit, mono, signed, little-endian
AudioFormat audioFormat = new AudioFormat(16000, 16, 1, true, false);
// Obtain the default recording device matching this format
TargetDataLine targetDataLine =
AudioSystem.getTargetDataLine(audioFormat);
targetDataLine.open(audioFormat);
// Start recording
targetDataLine.start();
System.out.println("Speak into the microphone to try real-time speech recognition and translation");
ByteBuffer buffer = ByteBuffer.allocate(1024);
long start = System.currentTimeMillis();
// Record for 50 seconds while recognizing in real time
while (System.currentTimeMillis() - start < 50000) {
int read = targetDataLine.read(buffer.array(), 0, buffer.capacity());
if (read > 0) {
buffer.limit(read);
// Send the recorded audio to the streaming recognition service
translator.sendAudioFrame(buffer);
buffer = ByteBuffer.allocate(1024);
// Recording produces data at a limited rate; sleep briefly to avoid high CPU usage
Thread.sleep(20);
}
}
// Signal the end of the audio stream
translator.stop();
} catch (Exception e) {
e.printStackTrace();
}
System.out.println(
"[Metric] requestId: "
+ translator.getLastRequestId()
+ ", first package delay ms: "
+ translator.getFirstPackageDelay()
+ ", last package delay ms: "
+ translator.getLastPackageDelay());
}
}
Python
# For prerequisites running the following sample, visit https://help.aliyun.com/document_detail/xxxxx.html
import pyaudio
import dashscope
from dashscope.audio.asr import *
# If you did not configure the API Key as an environment variable, replace your-api-key with your own API Key
# dashscope.api_key = "your-api-key"
mic = None
stream = None
class Callback(TranslationRecognizerCallback):
def on_open(self) -> None:
global mic
global stream
print("TranslationRecognizerCallback open.")
mic = pyaudio.PyAudio()
stream = mic.open(
format=pyaudio.paInt16, channels=1, rate=16000, input=True
)
def on_close(self) -> None:
global mic
global stream
print("TranslationRecognizerCallback close.")
stream.stop_stream()
stream.close()
mic.terminate()
stream = None
mic = None
def on_event(
self,
request_id,
transcription_result: TranscriptionResult,
translation_result: TranslationResult,
usage,
) -> None:
print("request id: ", request_id)
print("usage: ", usage)
if translation_result is not None:
print(
"translation_languages: ",
translation_result.get_language_list(),
)
english_translation = translation_result.get_translation("en")
print("sentence id: ", english_translation.sentence_id)
print("translate to english: ", english_translation.text)
if english_translation.stash is not None:
print(
"translate to english stash: ",
translation_result.get_translation("en").stash.text,
)
if transcription_result is not None:
print("sentence id: ", transcription_result.sentence_id)
print("transcription: ", transcription_result.text)
if transcription_result.stash is not None:
print("transcription stash: ", transcription_result.stash.text)
callback = Callback()
translator = TranslationRecognizerRealtime(
model="gummy-realtime-v1",
format="pcm",
sample_rate=16000,
transcription_enabled=True,
translation_enabled=True,
translation_target_languages=["en"],
callback=callback,
)
translator.start()
print("请您通过麦克风讲话体验实时语音识别和翻译功能")
while True:
if stream:
data = stream.read(3200, exception_on_overflow=False)
translator.send_audio_frame(data)
else:
break
translator.stop()
Translating a local audio file
Java
The audio file used in this example: hello_world.wav.
import com.alibaba.dashscope.audio.asr.translation.TranslationRecognizerParam;
import com.alibaba.dashscope.audio.asr.translation.TranslationRecognizerRealtime;
import com.alibaba.dashscope.audio.asr.translation.results.TranslationRecognizerResult;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.exception.NoApiKeyException;
import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
class RealtimeTranslateTask implements Runnable {
private Path filepath;
public RealtimeTranslateTask(Path filepath) {
this.filepath = filepath;
}
@Override
public void run() {
String targetLanguage = "en";
// Create translation params
// you can customize the translation parameters, like model, format,
// sample_rate for more information, please refer to
// https://help.aliyun.com/document_detail/2712536.html
TranslationRecognizerParam param =
TranslationRecognizerParam.builder()
// If you did not configure the API Key as an environment variable, replace your-api-key with your own API Key
// .apiKey("your-api-key")
.model("gummy-realtime-v1")
.format("pcm") // 'pcm'、'wav'、'mp3'、'opus'、'speex'、'aac'、'amr', you
// can check the supported formats in the document
.sampleRate(16000)
.transcriptionEnabled(true)
.sourceLanguage("auto")
.translationEnabled(true)
.translationLanguages(new String[] {targetLanguage})
.build();
TranslationRecognizerRealtime translator = new TranslationRecognizerRealtime();
CountDownLatch latch = new CountDownLatch(1);
String threadName = Thread.currentThread().getName();
ResultCallback<TranslationRecognizerResult> callback =
new ResultCallback<TranslationRecognizerResult>() {
@Override
public void onEvent(TranslationRecognizerResult result) {
System.out.println("RequestId: " + result.getRequestId());
// Print recognition results
if (result.getTranscriptionResult() != null) {
System.out.println("Transcription Result:");
if (result.isSentenceEnd()) {
System.out.println("\tFix:" + result.getTranscriptionResult().getText());
System.out.println("\tStash:" + result.getTranscriptionResult().getStash());
} else {
System.out.println("\tTemp Result:" + result.getTranscriptionResult().getText());
}
}
if (result.getTranslationResult() != null) {
System.out.println("English Translation Result:");
if (result.isSentenceEnd()) {
System.out.println("\tFix:" + result.getTranslationResult().getTranslation(targetLanguage).getText());
System.out.println("\tStash:" + result.getTranslationResult().getTranslation(targetLanguage).getStash());
} else {
System.out.println("\tTemp Result:" + result.getTranslationResult().getTranslation(targetLanguage).getText());
}
}
}
@Override
public void onComplete() {
System.out.println("[" + threadName + "] Translation complete");
latch.countDown();
}
@Override
public void onError(Exception e) {
e.printStackTrace();
System.out.println("[" + threadName + "] TranslationCallback error: " + e.getMessage());
}
};
// set param & callback
translator.call(param, callback);
// Please replace the path with your audio file path
System.out.println("[" + threadName + "] Input file_path is: " + this.filepath);
// Read file and send audio by chunks
try (FileInputStream fis = new FileInputStream(this.filepath.toFile())) {
// 3200 bytes is about 0.1 s of audio at 16 kHz, 16-bit, mono
byte[] buffer = new byte[3200];
int bytesRead;
// Loop to read chunks of the file
while ((bytesRead = fis.read(buffer)) != -1) {
ByteBuffer byteBuffer;
// Handle the last chunk which might be smaller than the buffer size
System.out.println("[" + threadName + "] bytesRead: " + bytesRead);
if (bytesRead < buffer.length) {
byteBuffer = ByteBuffer.wrap(buffer, 0, bytesRead);
} else {
byteBuffer = ByteBuffer.wrap(buffer);
}
// Send the ByteBuffer to the translation instance
translator.sendAudioFrame(byteBuffer);
buffer = new byte[3200];
Thread.sleep(100);
}
System.out.println(LocalDateTime.now());
} catch (Exception e) {
e.printStackTrace();
}
translator.stop();
// wait for the translation to complete
try {
latch.await();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
public class Main {
public static void main(String[] args)
throws NoApiKeyException, InterruptedException {
String currentDir = System.getProperty("user.dir");
// Please replace the path with your audio source
Path[] filePaths = {
Paths.get(currentDir, "hello_world.wav"),
// Paths.get(currentDir, "hello_world_male_16k_16bit_mono.wav"),
};
// Use ThreadPool to run recognition tasks
ExecutorService executorService = Executors.newFixedThreadPool(10);
for (Path filepath:filePaths) {
executorService.submit(new RealtimeTranslateTask(filepath));
}
executorService.shutdown();
// wait for all tasks to complete
executorService.awaitTermination(1, TimeUnit.MINUTES);
System.exit(0);
}
}
Python
# For prerequisites running the following sample, visit https://help.aliyun.com/document_detail/xxxxx.html
import os
import requests
import dashscope
from dashscope.audio.asr import *
# If you did not configure the API Key as an environment variable, replace your-api-key with your own API Key
# dashscope.api_key = "your-api-key"
r = requests.get(
"https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav"
)
with open("asr_example.wav", "wb") as f:
f.write(r.content)
class Callback(TranslationRecognizerCallback):
def on_open(self) -> None:
print("TranslationRecognizerCallback open.")
def on_close(self) -> None:
print("TranslationRecognizerCallback close.")
def on_event(
self,
request_id,
transcription_result: TranscriptionResult,
translation_result: TranslationResult,
usage,
) -> None:
print("request id: ", request_id)
print("usage: ", usage)
if translation_result is not None:
print(
"translation_languages: ",
translation_result.get_language_list(),
)
english_translation = translation_result.get_translation("en")
print("sentence id: ", english_translation.sentence_id)
print("translate to english: ", english_translation.text)
if english_translation.stash is not None:
print(
"translate to english stash: ",
translation_result.get_translation("en").stash.text,
)
if transcription_result is not None:
print("sentence id: ", transcription_result.sentence_id)
print("transcription: ", transcription_result.text)
if transcription_result.stash is not None:
print("transcription stash: ", transcription_result.stash.text)
def on_error(self, message) -> None:
print('error: {}'.format(message))
def on_complete(self) -> None:
print('TranslationRecognizerCallback complete')
callback = Callback()
translator = TranslationRecognizerRealtime(
model="gummy-realtime-v1",
format="wav",
sample_rate=16000,
callback=callback,
)
translator.start()
# Read the file in chunks and send each chunk to the translator
with open("asr_example.wav", "rb") as f:
    if os.path.getsize("asr_example.wav") == 0:
        raise Exception("The supplied file was empty (zero bytes long)")
    while True:
        audio_data = f.read(12800)
        if not audio_data:
            break
        translator.send_audio_frame(audio_data)
translator.stop()
Sentence-level translation
Sentence-level translation translates an audio stream of up to one minute (captured from an external device such as a microphone, or read from a local file) and streams results back as they become available.
Translating speech from the microphone
Java
import com.alibaba.dashscope.audio.asr.translation.TranslationRecognizerChat;
import com.alibaba.dashscope.audio.asr.translation.TranslationRecognizerParam;
import com.alibaba.dashscope.audio.asr.translation.results.TranscriptionResult;
import com.alibaba.dashscope.audio.asr.translation.results.TranslationRecognizerResult;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.exception.NoApiKeyException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;
import java.nio.ByteBuffer;
public class Main {
public static void main(String[] args) throws NoApiKeyException, InterruptedException {
// Create the recognizer
TranslationRecognizerChat translator = new TranslationRecognizerChat();
// Initialize request parameters
TranslationRecognizerParam param =
TranslationRecognizerParam.builder()
// If you did not configure the API Key as an environment variable, replace your-api-key with your own API Key
// .apiKey("your-api-key")
.model("gummy-chat-v1") // 设置模型名
.format("pcm") // 设置待识别音频格式,支持的音频格式:pcm、pcm编码的wav、mp3、ogg封装的opus、ogg封装的speex、aac、amr
.sampleRate(16000) // 设置待识别音频采样率(单位Hz)。仅支持16000Hz采样率。
.transcriptionEnabled(true) // 设置是否开启实时识别
.translationEnabled(true) // 设置是否开启实时翻译
.translationLanguages(new String[] {"en"}) // 设置翻译目标语言
.build();
// 创建一个Flowable<ByteBuffer>
Thread thread = new Thread(
() -> {
try {
// Create the audio format: 16 kHz, 16-bit, mono, signed, little-endian
AudioFormat audioFormat = new AudioFormat(16000, 16, 1, true, false);
// Obtain the default recording device matching this format
TargetDataLine targetDataLine =
AudioSystem.getTargetDataLine(audioFormat);
targetDataLine.open(audioFormat);
// Start recording
targetDataLine.start();
System.out.println("Speak into the microphone to try sentence-level speech recognition and translation");
ByteBuffer buffer = ByteBuffer.allocate(1024);
long start = System.currentTimeMillis();
// Record for up to 50 seconds while recognizing in real time
while (System.currentTimeMillis() - start < 50000) {
int read = targetDataLine.read(buffer.array(), 0, buffer.capacity());
if (read > 0) {
buffer.limit(read);
// Send the recorded audio to the streaming recognition service
if (!translator.sendAudioFrame(buffer)) {
System.out.println("sentence end, stop sending");
break;
}
buffer = ByteBuffer.allocate(1024);
// Recording produces data at a limited rate; sleep briefly to avoid high CPU usage
Thread.sleep(20);
}
}
} catch (LineUnavailableException e) {
throw new RuntimeException(e);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
});
translator.call(param, new ResultCallback<TranslationRecognizerResult>() {
@Override
public void onEvent(TranslationRecognizerResult result) {
if (result.getTranscriptionResult() == null) {
return;
}
try {
System.out.println("RequestId: " + result.getRequestId());
// Print recognition results
if (result.getTranscriptionResult() != null) {
System.out.println("Transcription Result:");
if (result.isSentenceEnd()) {
System.out.println("\tFix:" + result.getTranscriptionResult().getText());
} else {
TranscriptionResult transcriptionResult = result.getTranscriptionResult();
System.out.println("\tTemp Result:" + transcriptionResult.getText());
if (result.getTranscriptionResult().isVadPreEnd()) {
System.out.printf("VadPreEnd: start:%d, end:%d, time:%d\n", transcriptionResult.getPreEndStartTime(), transcriptionResult.getPreEndEndTime(), transcriptionResult.getPreEndTimemillis());
}
}
}
if (result.getTranslationResult() != null) {
System.out.println("English Translation Result:");
if (result.isSentenceEnd()) {
System.out.println("\tFix:" + result.getTranslationResult().getTranslation("en").getText());
} else {
System.out.println("\tTemp Result:" + result.getTranslationResult().getTranslation("en").getText());
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
@Override
public void onComplete() {
System.out.println("Translation complete");
}
@Override
public void onError(Exception e) {
// This sample ignores errors; log or handle them in production code
}
});
thread.start();
thread.join();
translator.stop();
// System.exit(0);
}
}
Python
# For prerequisites running the following sample, visit https://help.aliyun.com/document_detail/xxxxx.html
import pyaudio
import dashscope
from dashscope.audio.asr import *
# If you did not configure the API Key as an environment variable, replace your-api-key with your own API Key
# dashscope.api_key = "your-api-key"
mic = None
stream = None
class Callback(TranslationRecognizerCallback):
def on_open(self) -> None:
global mic
global stream
print("TranslationRecognizerCallback open.")
mic = pyaudio.PyAudio()
stream = mic.open(
format=pyaudio.paInt16, channels=1, rate=16000, input=True
)
def on_close(self) -> None:
global mic
global stream
print("TranslationRecognizerCallback close.")
stream.stop_stream()
stream.close()
mic.terminate()
stream = None
mic = None
def on_event(
self,
request_id,
transcription_result: TranscriptionResult,
translation_result: TranslationResult,
usage,
) -> None:
print("request id: ", request_id)
print("usage: ", usage)
if translation_result is not None:
print(
"translation_languages: ",
translation_result.get_language_list(),
)
english_translation = translation_result.get_translation("en")
print("sentence id: ", english_translation.sentence_id)
print("translate to english: ", english_translation.text)
if english_translation.vad_pre_end:
print("vad pre end {}, {}, {}".format(transcription_result.pre_end_start_time, transcription_result.pre_end_end_time, transcription_result.pre_end_timemillis))
if transcription_result is not None:
print("sentence id: ", transcription_result.sentence_id)
print("transcription: ", transcription_result.text)
callback = Callback()
translator = TranslationRecognizerChat(
model="gummy-chat-v1",
format="pcm",
sample_rate=16000,
transcription_enabled=True,
translation_enabled=True,
translation_target_languages=["en"],
callback=callback,
)
translator.start()
print("请您通过麦克风讲话体验一句话语音识别和翻译功能")
while True:
if stream:
data = stream.read(3200, exception_on_overflow=False)
if not translator.send_audio_frame(data):
print("sentence end, stop sending")
break
else:
break
translator.stop()
Translating a local audio file
Java
The audio file used in this example: hello_world.wav.
import com.alibaba.dashscope.audio.asr.translation.TranslationRecognizerChat;
import com.alibaba.dashscope.audio.asr.translation.TranslationRecognizerParam;
import com.alibaba.dashscope.audio.asr.translation.results.TranslationRecognizerResult;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.exception.NoApiKeyException;
import java.io.FileInputStream;
import java.nio.ByteBuffer;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
class RealtimeTranslateChatTask implements Runnable {
private Path filepath;
private TranslationRecognizerChat translator = null;
public RealtimeTranslateChatTask(Path filepath) {
this.filepath = filepath;
}
@Override
public void run() {
for (int i=0; i<1; i++) {
// 初始化请求参数
TranslationRecognizerParam param =
TranslationRecognizerParam.builder()
// If you did not configure the API Key as an environment variable, replace your-api-key with your own API Key
// .apiKey("your-api-key")
.model("gummy-chat-v1") // 设置模型名
.format("wav") // 设置待识别音频格式,支持的音频格式:pcm、pcm编码的wav、mp3、ogg封装的opus、ogg封装的speex、aac、amr
.sampleRate(16000) // 设置待识别音频采样率(单位Hz)。只支持16000Hz的采样率。
.transcriptionEnabled(true) // 设置是否开启实时识别
.translationEnabled(true) // 设置是否开启实时翻译
.translationLanguages(new String[] {"en"}) // 设置翻译目标语言
.build();
if (translator == null) {
// Initialize the streaming recognition service
translator = new TranslationRecognizerChat();
}
CountDownLatch latch = new CountDownLatch(1);
String threadName = Thread.currentThread().getName();
// Initialize the result callback
ResultCallback<TranslationRecognizerResult> callback =
new ResultCallback<TranslationRecognizerResult>() {
@Override
public void onEvent(TranslationRecognizerResult result) {
System.out.println("RequestId: " + result.getRequestId());
// Print recognition results
if (result.getTranscriptionResult() != null) {
System.out.println("Transcription Result:");
if (result.isSentenceEnd()) {
System.out.println("\tFix:" + result.getTranscriptionResult().getText());
} else {
System.out.println("\tTemp Result:" + result.getTranscriptionResult().getText());
}
}
if (result.getTranslationResult() != null) {
System.out.println("English Translation Result:");
if (result.isSentenceEnd()) {
System.out.println("\tFix:" + result.getTranslationResult().getTranslation("en").getText());
} else {
System.out.println("\tTemp Result:" + result.getTranslationResult().getTranslation("en").getText());
}
}
}
@Override
public void onComplete() {
System.out.println("[" + threadName + "] Translation complete");
latch.countDown();
}
@Override
public void onError(Exception e) {
e.printStackTrace();
System.out.println("[" + threadName + "] TranslationCallback error: " + e.getMessage());
}
};
// Start streaming recognition/translation, binding the request parameters and callback
translator.call(param, callback);
// Replace with your own file path
System.out.println("[" + threadName + "] Input file_path is: " + this.filepath);
// Read file and send audio by chunks
try (FileInputStream fis = new FileInputStream(this.filepath.toFile())) {
// 3200 bytes is about 0.1 s of audio at 16 kHz, 16-bit, mono
byte[] buffer = new byte[3200];
int bytesRead;
// Loop to read chunks of the file
while ((bytesRead = fis.read(buffer)) != -1) {
ByteBuffer byteBuffer;
// Handle the last chunk which might be smaller than the buffer size
System.out.println("[" + threadName + "] bytesRead: " + bytesRead);
if (bytesRead < buffer.length) {
byteBuffer = ByteBuffer.wrap(buffer, 0, bytesRead);
} else {
byteBuffer = ByteBuffer.wrap(buffer);
}
// Send the ByteBuffer to the translation instance
if (!translator.sendAudioFrame(byteBuffer)) {
System.out.println("sentence end, stop sending");
break;
}
buffer = new byte[3200];
Thread.sleep(100);
}
System.out.println(LocalDateTime.now());
} catch (Exception e) {
e.printStackTrace();
}
// Signal the end of the audio stream
translator.stop();
// wait for the translation to complete
try {
latch.await();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
public class Main {
public static void main(String[] args)
throws NoApiKeyException, InterruptedException {
String currentDir = System.getProperty("user.dir");
// Please replace the path with your audio source
Path[] filePaths = {
Paths.get(currentDir, "hello_world.wav"),
// Paths.get(currentDir, "hello_world_male_16k_16bit_mono.wav"),
};
// Use ThreadPool to run recognition tasks
ExecutorService executorService = Executors.newFixedThreadPool(10);
for (Path filepath:filePaths) {
executorService.submit(new RealtimeTranslateChatTask(filepath));
}
executorService.shutdown();
// wait for all tasks to complete
executorService.awaitTermination(1, TimeUnit.MINUTES);
// System.exit(0);
}
}
Python
# For prerequisites running the following sample, visit https://help.aliyun.com/document_detail/xxxxx.html
import os
import requests
import dashscope
from dashscope.audio.asr import *
# If you did not configure the API Key as an environment variable, replace your-api-key with your own API Key
# dashscope.api_key = "your-api-key"
r = requests.get(
"https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav"
)
with open("asr_example.wav", "wb") as f:
f.write(r.content)
class Callback(TranslationRecognizerCallback):
def on_open(self) -> None:
print("TranslationRecognizerCallback open.")
def on_close(self) -> None:
print("TranslationRecognizerCallback close.")
def on_event(
self,
request_id,
transcription_result: TranscriptionResult,
translation_result: TranslationResult,
usage,
) -> None:
print("request id: ", request_id)
print("usage: ", usage)
if translation_result is not None:
print(
"translation_languages: ",
translation_result.get_language_list(),
)
english_translation = translation_result.get_translation("en")
print("sentence id: ", english_translation.sentence_id)
print("translate to english: ", english_translation.text)
if transcription_result is not None:
print("sentence id: ", transcription_result.sentence_id)
print("transcription: ", transcription_result.text)
def on_error(self, message) -> None:
print('error: {}'.format(message))
def on_complete(self) -> None:
print('TranslationRecognizerCallback complete')
callback = Callback()
translator = TranslationRecognizerChat(
model="gummy-chat-v1",
format="wav",
sample_rate=16000,
callback=callback,
)
translator.start()
# Read the file in chunks and send each chunk to the translator;
# sending stops once the service detects the end of the sentence
with open("asr_example.wav", "rb") as f:
    if os.path.getsize("asr_example.wav") == 0:
        raise Exception("The supplied file was empty (zero bytes long)")
    while True:
        audio_data = f.read(12800)
        if not audio_data:
            break
        if translator.send_audio_frame(audio_data):
            print("send audio frame success")
        else:
            print("sentence end, stop sending")
            break
translator.stop()
Input file limits
When translating a local audio file:
Input method: pass the local file path as a parameter.
Number of files: at most 1 file per call.
File size: unlimited.
Audio duration:
Paraformer: unlimited
Gummy real-time speech translation: unlimited
Gummy sentence-level translation: up to one minute
File formats: pcm, pcm-encoded wav, mp3, ogg-wrapped opus, ogg-wrapped speex, aac, and amr. pcm and wav are recommended.
Because audio formats and their variants are so numerous, correct recognition cannot be guaranteed for every format. Test your files to verify that they produce proper speech recognition results.
Audio bit depth: 16-bit.
Sample rate: varies by model.
The sample rate is the number of times per second the audio signal is sampled. A higher sample rate carries more information and can improve recognition accuracy, but an excessively high rate may introduce irrelevant information that hurts recognition. Choose a model that matches the actual sample rate of your audio: for example, 8000 Hz audio should be used directly with a model that supports 8000 Hz rather than being upsampled to 16000 Hz. A conversion sketch follows below.
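If your audio does not match the required format, convert it first. The sketch below is one possible approach rather than part of the DashScope SDK: it invokes the ffmpeg command-line tool (assumed to be installed) from Python to produce 16 kHz, 16-bit, mono WAV; the file names are placeholders.
import subprocess

# Convert an arbitrary input file to 16 kHz, 16-bit, mono WAV.
# "input.mp3" and "output_16k.wav" are placeholder file names.
subprocess.run(
    [
        "ffmpeg",
        "-i", "input.mp3",       # source audio in any format ffmpeg supports
        "-ar", "16000",          # resample to 16 kHz
        "-ac", "1",              # downmix to mono
        "-acodec", "pcm_s16le",  # 16-bit signed little-endian PCM
        "output_16k.wav",
    ],
    check=True,  # raise CalledProcessError if ffmpeg fails
)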
API reference
Model application listing and filing
FAQ
1. What factors can affect recognition accuracy?
Audio quality: the recording device, environment, and similar factors affect how clear the speech is, and therefore how accurately it is recognized. High-quality audio input is a prerequisite for accurate recognition.
Speaker characteristics: voices vary widely in pitch, speaking rate, accent, and dialect. These individual differences challenge speech recognition systems, especially for accents or dialects the model was not sufficiently trained on.
Language and vocabulary: speech recognition models are usually trained for specific languages. Mixed-language speech, technical terms, slang, and internet jargon increase recognition difficulty. If the model supports hot words, you can use them to steer recognition results.
Context understanding: a lack of conversational context can lead to misinterpretation, especially where meaning is ambiguous or context-dependent.
2. What are the model rate limits?
Model | RPS limit for job-submission calls
gummy-realtime-v1 | 10
gummy-chat-v1 | 10
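When submitting jobs in bulk, it can help to throttle calls on the client side so you stay under the 10 RPS limit. A minimal single-process sketch, not part of the SDK; the helper name throttled is hypothetical:
import time

MAX_RPS = 10                  # per the rate-limit table above
MIN_INTERVAL = 1.0 / MAX_RPS  # minimum spacing between calls, in seconds

_last_call = 0.0

def throttled(submit, *args, **kwargs):
    """Invoke submit(), spacing successive calls at least MIN_INTERVAL apart."""
    global _last_call
    wait = _last_call + MIN_INTERVAL - time.monotonic()
    if wait > 0:
        time.sleep(wait)
    _last_call = time.monotonic()
    return submit(*args, **kwargs)

# Usage: wrap whatever function submits a job, e.g.
# result = throttled(submit_job, task_params)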