Quick Start - DashScope Model Service (模型服务灵积) - Alibaba Cloud Help Center

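SenseVoice Speech Recognition Large Model

Note

Supported domains/tasks: audio / asr (automatic speech recognition), SER (speech emotion recognition), AED (audio event detection)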

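Model introduction

The SenseVoice speech recognition large model focuses on high-accuracy multilingual speech recognition, emotion recognition, and audio event detection. It recognizes more than 50 languages, its overall performance exceeds that of the Whisper model, and its recognition accuracy for Chinese and Cantonese shows a relative improvement of more than 50%.

High-accuracy multilingual speech recognition: SenseVoice recognizes speech in more than 50 languages, including Chinese (zh), English (en), Cantonese (yue), Japanese (ja), Korean (ko), French (fr), German (de), Russian (ru), Italian (it), Spanish (es), Thai (th), and Indonesian (id).
Emotion and audio event detection: SenseVoice offers industry-leading emotion recognition (for example happy, sad, and angry) and can detect specific events in the audio, such as background music, singing, applause, and laughter.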

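Multilingual recognition

Speech recognition is supported for more than 50 languages in total, with Chinese, English, Japanese, Korean, and Cantonese as the primary supported languages. Use the language_hints parameter to select the language and obtain more accurate results; see the supported language list for details.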
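
As a rough sketch of how language_hints might be assembled before a request is made, the snippet below maps a few of the language names listed in the model introduction to their codes and builds the keyword arguments for a transcription call. The helper name build_transcription_kwargs, the LANGUAGE_CODES table, and the example.com URL are illustrative assumptions, not part of the SDK; the model name and parameter names are taken from the sample code on this page.

```python
# Language codes copied from the model introduction (a partial list only).
LANGUAGE_CODES = {
    "Chinese": "zh",
    "English": "en",
    "Cantonese": "yue",
    "Japanese": "ja",
    "Korean": "ko",
}


def build_transcription_kwargs(file_urls, language=None):
    """Hypothetical helper: assemble keyword arguments for a transcription call."""
    kwargs = {"model": "sensevoice-v1", "file_urls": list(file_urls)}
    if language is not None:
        # Accept either a name from the table above or a raw code such as "yue".
        code = LANGUAGE_CODES.get(language, language)
        kwargs["language_hints"] = [code]
    return kwargs


kwargs = build_transcription_kwargs(
    ["https://example.com/meeting.wav"],  # placeholder URL, not a real sample file
    language="Cantonese",
)
print(kwargs)
# With the SDK installed and an API-KEY configured, the request could then be
# submitted as: dashscope.audio.asr.Transcription.async_call(**kwargs)
```

Omitting the language argument leaves language_hints out of the request entirely, so the model's default language handling applies.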

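Emotion recognition

Four emotions are recognized: ANGRY, HAPPY, SAD, and NEUTRAL. If none of these appears in the recognition result, or if the result contains <|SPECIAL_TOKEN_1|>, no specific emotion was detected in the speech. The emotion tag usually appears at the very end of the recognized text, for example: 今天天气好棒啊！<|HAPPY|> ("The weather is so nice today!").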
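
The tag convention described above can be handled with a small amount of post-processing. The following is a minimal sketch, assuming tags always take the form <|NAME|>; the function names extract_emotions and strip_tags are made up for this illustration and are not provided by the SDK.

```python
import re

# The four documented emotion labels. <|SPECIAL_TOKEN_1|> is deliberately
# absent: it signals that no specific emotion was detected.
EMOTIONS = {"ANGRY", "HAPPY", "SAD", "NEUTRAL"}

# Matches any tag of the form <|NAME|> and captures NAME.
TAG_PATTERN = re.compile(r"<\|([^|<>]+)\|>")


def extract_emotions(text):
    """Return the emotion labels found in a recognition result, in order."""
    return [tag for tag in TAG_PATTERN.findall(text) if tag in EMOTIONS]


def strip_tags(text):
    """Remove every <|...|> tag and collapse the leftover whitespace."""
    return " ".join(TAG_PATTERN.sub("", text).split())


sample = "今天天气好棒啊！<|HAPPY|>"
print(extract_emotions(sample))  # ['HAPPY']
print(strip_tags(sample))        # 今天天气好棒啊！
```

An empty list from extract_emotions covers both documented "no emotion" cases: a result with no emotion tag and a result containing <|SPECIAL_TOKEN_1|>.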

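Audio event detection

Sample code

Prerequisites

The service is activated and you have obtained an API-KEY: see Obtaining and configuring an API-KEY.
The latest version of the SDK is installed: see Installing the DashScope SDK.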

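Asynchronous file transcription sample code

The following example calls the SenseVoice asynchronous file transcription API to batch-process speech recognition for audio files given by URL. By default the model supports speech recognition in Chinese and English; to recognize a specific language, set the language_hints input parameter as shown in the example. For more code samples covering common scenarios, see the GitHub repository.

Note

Replace your-dashscope-api-key in the example with your own API-KEY, otherwise the code will not run.

Each file specified by URL for transcription must not exceed 2 GB.
The file_urls parameter accepts a list of file URLs; add more entries to the list to transcribe several files in one task.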

Python

# For prerequisites running the following sample, visit https://help.aliyun.com/document_detail/611472.html

import json
from urllib import request
from http import HTTPStatus

import dashscope

dashscope.api_key='your-dashscope-api-key'

# Submit the asynchronous transcription task.
task_response = dashscope.audio.asr.Transcription.async_call(
    model='sensevoice-v1',
    file_urls=[
        'https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/rich_text_example_1.wav',
    ],
    language_hints=['en'],
)

# Block until the task has finished.
transcription_response = dashscope.audio.asr.Transcription.wait(
    task=task_response.output.task_id)

if transcription_response.status_code == HTTPStatus.OK:
    # Each result carries a URL pointing to a JSON file with the transcription.
    for transcription in transcription_response.output['results']:
        url = transcription['transcription_url']
        result = json.loads(request.urlopen(url).read().decode('utf8'))
        print(json.dumps(result, indent=4, ensure_ascii=False))
    print('transcription done!')
else:
    print('Error: ', transcription_response.output.message)

Java

package com.alibaba.dashscope.sample.transcription;

import com.alibaba.dashscope.audio.asr.transcription.*;
import com.google.gson.*;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.*;
import java.net.HttpURLConnection;
import java.util.Arrays;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        // Build the transcription request; replace your-dashscope-api-key with a real API key.
        TranscriptionParam param =
                TranscriptionParam.builder()
                        .apiKey("your-dashscope-api-key")
                        .model("sensevoice-v1")
                        .fileUrls(
                            Arrays.asList(
                                "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/rich_text_example_1.wav"))
                        .parameter("language_hints", new String[] {"en"})
                        .build();
        try {
            Transcription transcription = new Transcription();
            // Submit the transcription request.
            TranscriptionResult result = transcription.asyncCall(param);
            // Wait for the transcription to finish.
            result = transcription.wait(
                            TranscriptionQueryParam.FromTranscriptionParam(param, result.getTaskId()));
            // Fetch the transcription results.
            List<TranscriptionTaskResult> taskResultList = result.getResults();
            if (taskResultList != null && taskResultList.size() > 0) {
                for (TranscriptionTaskResult taskResult : taskResultList) {
                  String transcriptionUrl = taskResult.getTranscriptionUrl();
                  HttpURLConnection connection =
                          (HttpURLConnection) new URL(transcriptionUrl).openConnection();
                  connection.setRequestMethod("GET");
                  connection.connect();
                  BufferedReader reader =
                          new BufferedReader(new InputStreamReader(connection.getInputStream()));
                  Gson gson = new GsonBuilder().setPrettyPrinting().create();
                  System.out.println(gson.toJson(gson.fromJson(reader, JsonObject.class)));
                }
            }
        } catch (Exception e) {
            System.out.println("error: " + e);
        }
        System.exit(0);
    }
}

After a successful call, a file transcription result like the following example is returned.

{
    "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/rich_text_example_1.wav",
    "properties": {
        "audio_format": "pcm_s16le",
        "channels": [
            0
        ],
        "original_sampling_rate": 16000,
        "original_duration_in_milliseconds": 17645
    },
    "transcripts": [
        {
            "channel_id": 0,
            "content_duration_in_milliseconds": 12710,
            "text": "<|Speech|> Senior staff, Principal Doris Jackson, Wakefield faculty, and of course, my fellow classmates. <|/Speech|> <|ANGRY|><|Speech|> I am honored to have been chosen to speak before my classmates, as well as the students across America today. <|/Speech|>",
            "sentences": [
                {
                    "begin_time": 0,
                    "end_time": 7060,
                    "text": "<|Speech|> Senior staff, Principal Doris Jackson, Wakefield faculty, and of course, my fellow classmates. <|/Speech|> <|ANGRY|>"
                },
                {
                    "begin_time": 11980,
                    "end_time": 17630,
                    "text": "<|Speech|> I am honored to have been chosen to speak before my classmates, as well as the students across America today. <|/Speech|>"
                }
            ]
        }
    ]
}
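
To make the structure of this result more concrete, the sketch below walks the sentences of a trimmed copy of the sample above and prints each one with its time range, the detected emotion (if any), and the text with tags removed. The function name summarize_sentences and the output layout are illustrative choices, and the tag pattern assumes every tag has the form <|NAME|>, as in the sample.

```python
import json
import re

EMOTIONS = {"ANGRY", "HAPPY", "SAD", "NEUTRAL"}
TAG_PATTERN = re.compile(r"<\|([^|<>]+)\|>")

# Trimmed copy of the sample result shown above (only the fields used here).
RESULT_JSON = """
{
    "transcripts": [
        {
            "channel_id": 0,
            "sentences": [
                {
                    "begin_time": 0,
                    "end_time": 7060,
                    "text": "<|Speech|> Senior staff, Principal Doris Jackson, Wakefield faculty, and of course, my fellow classmates. <|/Speech|> <|ANGRY|>"
                },
                {
                    "begin_time": 11980,
                    "end_time": 17630,
                    "text": "<|Speech|> I am honored to have been chosen to speak before my classmates, as well as the students across America today. <|/Speech|>"
                }
            ]
        }
    ]
}
"""


def summarize_sentences(result):
    """Yield (begin_ms, end_ms, emotion_or_None, plain_text) for each sentence."""
    for transcript in result.get("transcripts", []):
        for sentence in transcript.get("sentences", []):
            tags = TAG_PATTERN.findall(sentence["text"])
            # First documented emotion label in the sentence, or None.
            emotion = next((t for t in tags if t in EMOTIONS), None)
            plain = " ".join(TAG_PATTERN.sub("", sentence["text"]).split())
            yield sentence["begin_time"], sentence["end_time"], emotion, plain


for begin, end, emotion, text in summarize_sentences(json.loads(RESULT_JSON)):
    print(f"[{begin:>6} - {end:>6} ms] {emotion or '-':<8} {text}")
```

In practice the dictionary passed to summarize_sentences would be the JSON downloaded from transcription_url rather than an inline string.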

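Learn more

For details on calling the recording file transcription service of the SenseVoice speech recognition model, see the Recording File Recognition API details page.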