Paraformer hotwords

Note

Supported realm/task: audio/ASR (speech recognition)

If the speech recognition service does not accurately recognize certain vocabulary specific to your business domain, you can use the hotword feature. This feature lets you add these words to a vocabulary list to improve their recognition accuracy.

Hotword overview

The SDK takes hotwords as a list. A hotword list is a dictionary in which the keys are the hotwords and the values are their corresponding weights. A hotword list can contain up to 500 hotwords.

The following rules apply to hotword text:

  • A hotword containing only Chinese characters cannot be longer than 10 characters.
  • A hotword containing only English words, or a mix of Chinese characters and English words, cannot exceed 5 words when separated by spaces.

The following rules apply to hotword weights:

  • The valid weight for a hotword is an integer in the range [1, 5] or [-6, -1].
  • To increase the recognition probability of a hotword, set a weight in the range [1, 5]. A larger value indicates a higher probability.
  • To decrease the recognition probability of a hotword, set a weight in the range [-6, -1]. A smaller value indicates a lower probability.
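The rules above can be checked locally before a hotword list is submitted. The following is a minimal sketch, not part of the SDK: the function name validate_phrases is hypothetical, "Chinese characters" is assumed to mean the CJK Unified Ideographs block, and words are assumed to be counted by splitting on whitespace.

```python
import re
from typing import Dict, List

MAX_HOTWORDS = 500
MAX_CHINESE_CHARS = 10
MAX_WORDS = 5

# Assumption: "Chinese characters" means the CJK Unified Ideographs block.
_CHINESE_ONLY = re.compile(r'^[\u4e00-\u9fff]+$')


def validate_phrases(phrases: Dict[str, int]) -> List[str]:
    """Return a list of problems found; an empty list means the list is valid."""
    problems = []
    if len(phrases) > MAX_HOTWORDS:
        problems.append('too many hotwords: %d > %d' % (len(phrases), MAX_HOTWORDS))
    for text, weight in phrases.items():
        if _CHINESE_ONLY.match(text):
            # Chinese-only hotwords are limited by character count.
            if len(text) > MAX_CHINESE_CHARS:
                problems.append('%r: more than %d Chinese characters'
                                % (text, MAX_CHINESE_CHARS))
        elif len(text.split()) > MAX_WORDS:
            # English or mixed hotwords are limited by space-separated words.
            problems.append('%r: more than %d space-separated words'
                            % (text, MAX_WORDS))
        # bool is a subclass of int in Python, so reject it explicitly.
        is_int = isinstance(weight, int) and not isinstance(weight, bool)
        if not is_int or not (1 <= weight <= 5 or -6 <= weight <= -1):
            problems.append('%r: weight %r is not an integer in [1, 5] or [-6, -1]'
                            % (text, weight))
    return problems
```

For example, validate_phrases({'Tongyi Qianwen': 5}) returns an empty list, while a weight of 0 or 6 produces one problem entry. The service remains the authority on what it accepts; this check only mirrors the documented limits.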

Supported models

Model name

Model description

paraformer-realtime-v1

The Paraformer real-time Chinese speech recognition model. This model supports real-time speech recognition in scenarios such as ApsaraVideo Live and conferences. It only supports audio with a sample rate of 16 kHz.

paraformer-realtime-8k-v1

The Paraformer real-time Chinese speech recognition model. This model supports real-time speech recognition in scenarios such as 8 kHz telephone customer service.

paraformer-v1

The Paraformer Chinese and English speech recognition model. This model supports speech recognition for audio or video with a sample rate of 16 kHz or higher.

paraformer-8k-v1

The Paraformer Chinese speech recognition model. This model supports 8 kHz telephone speech recognition.

paraformer-mtl-v1

The Paraformer multilingual speech recognition model. This model supports speech recognition for audio or video with a sample rate of 16 kHz or higher.

Supported languages/dialects include the following: Mandarin Chinese, Chinese dialects (Cantonese, Wu, Min Nan, Northeastern, Gansu, Guizhou, Henan, Hubei, Hunan, Ningxia, Shanxi, Shaanxi, Shandong, Sichuan, and Tianjin), English, Japanese, Korean, Spanish, Indonesian, French, German, Italian, and Malay.
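The model table above can be read as a small decision rule. The sketch below is one possible reading, not an official selection API: the function pick_model and its parameters are hypothetical, the real-time models are assumed to accept exactly 16 kHz and 8 kHz respectively, and paraformer-mtl-v1 is assumed to be a non-real-time model because the table describes it only for audio or video.

```python
def pick_model(sample_rate_hz: int, realtime: bool, multilingual: bool = False) -> str:
    """Pick a model name from the table for the given audio properties."""
    if multilingual:
        # paraformer-mtl-v1 is listed for audio or video at 16 kHz or higher.
        if realtime or sample_rate_hz < 16000:
            raise ValueError('no multilingual model listed for this input')
        return 'paraformer-mtl-v1'
    if realtime:
        if sample_rate_hz == 16000:
            return 'paraformer-realtime-v1'
        if sample_rate_hz == 8000:
            return 'paraformer-realtime-8k-v1'
        raise ValueError('no real-time model listed for %d Hz' % sample_rate_hz)
    if sample_rate_hz >= 16000:
        return 'paraformer-v1'
    if sample_rate_hz == 8000:
        return 'paraformer-8k-v1'
    raise ValueError('no model listed for %d Hz' % sample_rate_hz)
```

For example, pick_model(8000, realtime=True) returns 'paraformer-realtime-8k-v1'. The returned name is the value passed as model when a hotword list is created.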

Prerequisites

Hotword management

In Java and Python, you can use the AsrPhraseManager class to manage hotwords, such as creating, updating, deleting, and querying hotwords.

You can import the AsrPhraseManager class as follows:

Python:

from dashscope.audio.asr import AsrPhraseManager

Java:

import com.alibaba.dashscope.audio.asr.phrase.AsrPhraseManager;

Create hotwords

This API submits a request to create hotwords by making a synchronous call.

  • API

    def create_phrases(cls,
                       model: str,
                       phrases: Dict[str, Any],
                       training_type: str = 'compile_asr_phrase',
                       **kwargs)
    AsrPhraseStatusResult CreatePhrases(AsrPhraseParam param)
          throws ApiException, NoApiKeyException, InputRequiredException
  • Parameters

    For the Java SDK, an AsrPhraseParam object is used as the parameter. The object contains the following methods and parameters:

    Parameter

    Type

    Description

    param

    AsrPhraseParam

    The configuration parameters for creating hotwords. For more information, see the preceding description of the AsrPhraseParam type. When you call CreatePhrases, you do not need to specify the pageNo or pageSize field. However, you must specify the model and phraseList fields.

    The following table describes the parameters for the Python SDK.

    Parameter

    Type

    Description

    model

    str

    The name of the Paraformer model. For more information about how to select a model, see Supported models.

    phrases

    Dict[str, Any]

    The hotword list.

    training_type

    str

    The value is fixed to compile_asr_phrase.

  • Sample response

    The Java SDK returns an AsrPhraseStatusResult object, and the Python SDK returns a Dict. You can call the corresponding get method to retrieve the members of the AsrPhraseStatusResult object. The member names are the same as those for the Python SDK, but they follow the camel case naming convention for Java.

    {
    	"status_code": 200,
    	"request_id": "2b815cfe-793f-9f3c-b528-5ade0a2d498e",
    	"code": null,
    	"message": "",
    	"output": {
    		"job_id": "ft-202309261539-2af1",
    		"status": "SUCCEEDED",
    		"finetuned_output": "paraformer-realtime-v1-ft-202309261539-2af1",
    		"training_type": "compile_asr_phrase",
    		"create_time": "2023-09-26 15:39:07"
    	},
    	"usage": null,
    	"job_id": "ft-202309261539-2af1",
    	"finetuned_output": "paraformer-realtime-v1-ft-202309261539-2af1",
    	"finetuned_outputs": null,
    	"training_type": null,
    	"create_time": "2023-09-26 15:39:07",
    	"output_type": null,
    	"model": null
    }
  • Sample request

    # coding=utf-8
    
    import dashscope
    from dashscope.audio.asr import AsrPhraseManager
    
    dashscope.api_key='your-dashscope-api-key'
    
    phrases = {'Tongyi Qianwen': 5}
    
    result = AsrPhraseManager.create_phrases(model='paraformer-realtime-v1',
                                             phrases=phrases)
    if result.output is not None and result.output['finetuned_output'] is not None:
        print('job_id:%s, finetuned_output:%s' %
              (result.output['job_id'], result.output['finetuned_output']))
    else:
        print('Error: ', str(result))
    
    package com.alibaba.test;
    
    import com.alibaba.dashscope.audio.asr.phrase.AsrPhraseManager;
    import com.alibaba.dashscope.audio.asr.phrase.AsrPhraseParam;
    import com.alibaba.dashscope.audio.asr.phrase.AsrPhraseStatusResult;
    import java.util.Collections;
    
    class Test {
      
      public static void main(String[] args) {
        AsrPhraseParam param = AsrPhraseParam.builder()
                .model("your-model")
                .phraseList(Collections.singletonMap("Tongyi Qianwen", 5))
                .apiKey("your-dashscope-api-key")
                .build();
    
        AsrPhraseStatusResult createResult = null;
        try {
          createResult = AsrPhraseManager.CreatePhrases(param);
          if (createResult.getOutput() != null && createResult.getOutput().getFineTunedOutput() != null) {
            System.out.println("job_id: " + createResult.getOutput().getJobId() + ", finetuned_output: " + createResult.getOutput().getFineTunedOutput());
          } else {
            System.out.println("Error: " + createResult);
          }
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      }
    }
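The hotword ID needed by the query, update, and delete calls is the finetuned_output value in the create response. The sketch below shows one way to pull it out of the output dictionary; the helper name extract_phrase_id is hypothetical, and treating any status other than SUCCEEDED as "no usable ID" is an assumption based only on the sample response above.

```python
from typing import Any, Mapping, Optional


def extract_phrase_id(output: Optional[Mapping[str, Any]]) -> Optional[str]:
    """Return the hotword ID from a create response's output, or None."""
    if not output:
        return None
    # Assumption: only a SUCCEEDED job yields a usable hotword ID.
    if output.get('status') != 'SUCCEEDED':
        return None
    return output.get('finetuned_output')


# The "output" member of the sample response shown above.
sample_output = {
    'job_id': 'ft-202309261539-2af1',
    'status': 'SUCCEEDED',
    'finetuned_output': 'paraformer-realtime-v1-ft-202309261539-2af1',
    'training_type': 'compile_asr_phrase',
    'create_time': '2023-09-26 15:39:07',
}
print(extract_phrase_id(sample_output))
# prints: paraformer-realtime-v1-ft-202309261539-2af1
```

Note that the ID is the finetuned_output value, not the job_id.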

Query hotwords

This API submits a request to query hotwords by making a synchronous call.

  • API

    def query_phrases(cls, phrase_id: str, **kwargs)
    AsrPhraseStatusResult QueryPhrase(AsrPhraseParam param, String phraseId)
          throws ApiException, NoApiKeyException, InputRequiredException
  • Parameters

    For the Java SDK, an AsrPhraseParam object is used as the parameter. The object contains the following methods and parameters:

    Parameter

    Type

    Description

    param

    AsrPhraseParam

    The configuration parameters for querying hotwords. For more information, see the preceding description of the AsrPhraseParam type. When you call QueryPhrase, you do not need to specify the phraseList, pageNo, or pageSize field.

    phraseId

    String

    The hotword ID returned by the getFineTunedOutput method of the AsrPhraseStatusOutput object after you call APIs such as CreatePhrases, UpdatePhrases, and QueryPhrase. The ID is of the String type. When you call ListPhrases, the getFinetunedOutputs method of the AsrPhraseStatusOutput object returns a list of all hotword information. Then, you can use the getFineTunedOutput method of the AsrPhraseInfo object to obtain the ID of the corresponding hotword.

    The following table describes the parameters for the Python SDK.

    Parameter

    Type

    Description

    phrase_id

    str

    The hotword ID. This is the finetuned_output value in the Dict returned by APIs such as create_phrases, update_phrases, and query_phrases.

  • Sample response

    {
    	"status_code": 200,
    	"request_id": "19ee9c5f-173b-9fed-8e61-40bc53f1eea7",
    	"code": null,
    	"message": "",
    	"output": {
    		"create_time": "2023-09-26 15:39:08",
    		"finetuned_output": "paraformer-realtime-v1-ft-202309261539-2af1",
    		"job_id": "ft-202309261539-2af1",
    		"model": "paraformer-realtime-v1",
    		"output_type": "custom_resource"
    	},
    	"usage": null,
    	"job_id": "ft-202309261539-2af1",
    	"finetuned_output": "paraformer-realtime-v1-ft-202309261539-2af1",
    	"finetuned_outputs": null,
    	"training_type": null,
    	"create_time": "2023-09-26 15:39:08",
    	"output_type": "custom_resource",
    	"model": "paraformer-realtime-v1"
    }
  • Sample request

    # coding=utf-8
    
    import dashscope
    from dashscope.audio.asr import AsrPhraseManager
    
    dashscope.api_key='your-dashscope-api-key'
    
    result = AsrPhraseManager.query_phrases(phrase_id='phrase-id')
    if result.output is not None:
        print('query phrases: ', result.output)
    else:
        print('Error: ', str(result))
    
    package com.alibaba.test;
    
    import com.alibaba.dashscope.audio.asr.phrase.AsrPhraseManager;
    import com.alibaba.dashscope.audio.asr.phrase.AsrPhraseParam;
    import com.alibaba.dashscope.audio.asr.phrase.AsrPhraseStatusResult;
    
    class Test {
      
      public static void main(String[] args) {
        AsrPhraseParam param = AsrPhraseParam.builder()
                .model("your-model")
                .apiKey("your-dashscope-api-key")
                .build();
    
        AsrPhraseStatusResult result = null;
        try {
          result = AsrPhraseManager.QueryPhrase(param, "phrase-id");
          if (result.getOutput() != null) {
            System.out.println("query phrases: " + result.getOutput());
          } else {
            System.out.println("Error: " + result);
          }
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      }
    }

Update hotwords

This API submits a request to update hotwords by making a synchronous call.

  • API

    def update_phrases(cls,
                       model: str,
                       phrase_id: str,
                       phrases: Dict[str, Any],
                       training_type: str = 'compile_asr_phrase',
                       **kwargs)
    AsrPhraseStatusResult UpdatePhrases(AsrPhraseParam param, String phraseId)
          throws ApiException, NoApiKeyException, InputRequiredException
  • Parameters

    For the Java SDK, an AsrPhraseParam object is used as the parameter. The object contains the following methods and parameters:

    Parameter

    Type

    Description

    param

    AsrPhraseParam

    The configuration parameters for updating hotwords. For more information, see the preceding description of the AsrPhraseParam type. When you call UpdatePhrases, you do not need to specify the pageNo or pageSize field.

    phraseId

    String

    The hotword ID returned by the getFineTunedOutput method of the AsrPhraseStatusOutput object after you call APIs such as CreatePhrases, UpdatePhrases, and QueryPhrase. The ID is of the String type. When you call ListPhrases, the getFinetunedOutputs method of the AsrPhraseStatusOutput object returns a list of all hotword information. Then, you can use the getFineTunedOutput method of the AsrPhraseInfo object to obtain the ID of the corresponding hotword.

    The following table describes the parameters for the Python SDK.

    Parameter

    Type

    Default value

    Description

    model

    str

    -

    The name of the Paraformer model used for transcribing audio and video files. For more information about how to select a model, see Supported models.

    phrase_id

    str

    -

    The hotword ID. This is the finetuned_output value in the Dict returned by APIs such as create_phrases, update_phrases, and query_phrases.

    phrases

    Dict[str, Any]

    -

    The hotword list. It is a Dict object where the keys are the hotword texts and the values are the corresponding weights. For more information about the requirements for hotwords, see Hotword overview.

    training_type

    str

    compile_asr_phrase

    The value is fixed to compile_asr_phrase.

  • Sample response

    {
    	"status_code": 200,
    	"request_id": "8c8d64e3-5198-9624-99cd-e9dcf7eb22f6",
    	"code": null,
    	"message": "",
    	"output": {
    		"job_id": "ft-202309261543-b0ae",
    		"status": "SUCCEEDED",
    		"finetuned_output": "paraformer-realtime-v1-ft-202309261539-2af1",
    		"training_type": "compile_asr_phrase",
    		"create_time": "2023-09-26 15:43:09"
    	},
    	"usage": null,
    	"job_id": "ft-202309261543-b0ae",
    	"finetuned_output": "paraformer-realtime-v1-ft-202309261539-2af1",
    	"finetuned_outputs": null,
    	"training_type": null,
    	"create_time": "2023-09-26 15:43:09",
    	"output_type": null,
    	"model": null
    }
  • Sample request

    # coding=utf-8
    
    import dashscope
    from dashscope.audio.asr import AsrPhraseManager
    
    dashscope.api_key='your-dashscope-api-key'
    
    phrases = {'Tongyi Qianwen': 2}
    
    result = AsrPhraseManager.update_phrases(model='paraformer-realtime-v1',
                                             phrase_id='phrase-id',
                                             phrases=phrases)
    if result.output is not None:
        print('update phrases: ', result.output)
    else:
        print('Error: ', str(result))
    
    package com.alibaba.test;
    
    import com.alibaba.dashscope.audio.asr.phrase.AsrPhraseManager;
    import com.alibaba.dashscope.audio.asr.phrase.AsrPhraseParam;
    import com.alibaba.dashscope.audio.asr.phrase.AsrPhraseStatusResult;
    import java.util.Collections;
    
    class Test {
      
      public static void main(String[] args) {
        AsrPhraseParam param = AsrPhraseParam.builder()
          .model("your-model")
          .phraseList(Collections.singletonMap("Tongyi Qianwen", 2))
          .apiKey("your-dashscope-api-key")
          .build();
    
        AsrPhraseStatusResult result = null;
        try {
          result = AsrPhraseManager.UpdatePhrases(param, "phrase-id");
          if (result.getOutput() != null) {
            System.out.println("update phrases: " + result.getOutput());
          } else {
            System.out.println("Error: " + result);
          }
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      }
    }

Delete hotwords

This API submits a request to delete a hotword by making a synchronous call.

  • API

    def delete_phrases(cls, phrase_id: str,
                       **kwargs)
    AsrPhraseStatusResult DeletePhrase(AsrPhraseParam param, String phraseId)
          throws ApiException, NoApiKeyException, InputRequiredException
  • Parameters

    For the Java SDK, an AsrPhraseParam object is used as the parameter. The object contains the following methods and parameters:

    Parameter

    Type

    Description

    param

    AsrPhraseParam

    The configuration parameters for deleting hotwords. For more information, see the preceding description of the AsrPhraseParam type. When you call DeletePhrase, you do not need to specify the phraseList, pageNo, or pageSize field.

    phraseId

    String

    The hotword ID returned by the getFineTunedOutput method of the AsrPhraseStatusOutput object after you call APIs such as CreatePhrases, UpdatePhrases, and QueryPhrase. The ID is of the String type. When you call ListPhrases, the getFinetunedOutputs method of the AsrPhraseStatusOutput object returns a list of all hotword information. Then, you can use the getFineTunedOutput method of the AsrPhraseInfo object to obtain the ID of the corresponding hotword.

    The following table describes the parameters for the Python SDK.

    Parameter

    Type

    Default value

    Description

    phrase_id

    str

    -

    The hotword ID. This is the finetuned_output value in the Dict returned by APIs such as create_phrases, update_phrases, and query_phrases.

  • Sample response

    {
    	"status_code": 200,
    	"request_id": "00bb0287-2593-94a3-8e21-93c90f5e9dd8",
    	"code": null,
    	"message": "",
    	"output": {
    		"finetuned_output": "paraformer-realtime-v1-ft-202309261539-2af1"
    	},
    	"usage": null,
    	"job_id": null,
    	"finetuned_output": "paraformer-realtime-v1-ft-202309261539-2af1",
    	"finetuned_outputs": null,
    	"training_type": null,
    	"create_time": null,
    	"output_type": null,
    	"model": null
    }
  • Sample request

    # coding=utf-8
    
    import dashscope
    from dashscope.audio.asr import AsrPhraseManager
    
    dashscope.api_key='your-dashscope-api-key'
    
    result = AsrPhraseManager.delete_phrases(phrase_id='phrase-id')
    if result.output is not None:
        print('delete phrases: ', result.output)
    else:
        print('Error: ', str(result))
    
    package com.alibaba.test;
    
    import com.alibaba.dashscope.audio.asr.phrase.AsrPhraseManager;
    import com.alibaba.dashscope.audio.asr.phrase.AsrPhraseParam;
    import com.alibaba.dashscope.audio.asr.phrase.AsrPhraseStatusResult;
    
    class Test {
      
      public static void main(String[] args) {
        AsrPhraseParam param = AsrPhraseParam.builder()
                .model("your-model")
                .apiKey("your-dashscope-api-key")
                .build();
    
        AsrPhraseStatusResult result = null;
        try {
          result = AsrPhraseManager.DeletePhrase(param, "phrase-id");
          if (result.getOutput() != null) {
            System.out.println("delete phrases: " + result.getOutput());
          } else {
            System.out.println("Error: " + result);
          }
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      }
    }
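The four management calls described so far fit together as a create, query, update, delete sequence keyed by the hotword ID. The following sketch illustrates that flow under stated assumptions: the function hotword_lifecycle is hypothetical, and the manager argument is any object exposing the create_phrases, query_phrases, update_phrases, and delete_phrases methods with the keyword arguments documented above, so the flow can be exercised with a stand-in object before it touches the real service.

```python
from typing import Any, Dict


def hotword_lifecycle(manager: Any, model: str, phrases: Dict[str, int],
                      updated_phrases: Dict[str, int]) -> str:
    """Create, query, update, then delete a hotword list; return its ID."""
    created = manager.create_phrases(model=model, phrases=phrases)
    if created.output is None or created.output['finetuned_output'] is None:
        raise RuntimeError('create failed: %s' % created)
    # The hotword ID used by all later calls is the finetuned_output value.
    phrase_id = created.output['finetuned_output']
    try:
        manager.query_phrases(phrase_id=phrase_id)
        manager.update_phrases(model=model, phrase_id=phrase_id,
                               phrases=updated_phrases)
    finally:
        # Clean up even if the query or update step raises.
        manager.delete_phrases(phrase_id=phrase_id)
    return phrase_id
```

Passing the AsrPhraseManager class itself as manager would issue real requests, so an API key must be configured first. Deleting in a finally block is a design choice for throwaway test lists; a production list would normally be kept.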

List all hotwords

This API submits a request to list all hotwords by making a synchronous call.

  • API

    def list_phrases(cls,
                     page: int = 1,
                     page_size: int = 10,
                     **kwargs)
    AsrPhraseStatusResult ListPhrases(AsrPhraseParam param)
          throws ApiException, NoApiKeyException, InputRequiredException
  • Parameters

    For the Java SDK, an AsrPhraseParam object is used as the parameter. The object contains the following methods and parameters:

    Parameter

    Type

    Description

    param

    AsrPhraseParam

    The configuration parameters for listing hotwords. For more information, see the preceding description of the AsrPhraseParam type. When you call ListPhrases, you do not need to specify the phraseList field.

    Note: ListPhrases takes only the param argument and has no phraseId parameter. To obtain hotword IDs from its result, call the getFinetunedOutputs method of the AsrPhraseStatusOutput object, which returns a list of all hotword information, and then call the getFineTunedOutput method of each AsrPhraseInfo object.

    The following table describes the parameters for the Python SDK.

    Parameter

    Type

    Default value

    Description

    page

    int

    1

    The number of the page to query. Default value: 1.

    page_size

    int

    10

    The number of entries per page. Default value: 10.

  • Sample response

    {
    	"status_code": 200,
    	"request_id": "95f969ef-bcbc-9bb5-b05d-4caed9326409",
    	"code": null,
    	"message": "",
    	"output": {
    		"page_no": 1,
    		"page_size": 5,
    		"total": 61,
    		"finetuned_outputs": [{
    			"create_time": "2023-09-26 15:32:20",
    			"finetuned_output": "paraformer-realtime-v1-ft-202309261532-93bf",
    			"job_id": "ft-202309261532-93bf",
    			"model": "paraformer-realtime-v1",
    			"output_type": "custom_resource"
    		}, {
    			"create_time": "2023-09-26 15:32:18",
    			"finetuned_output": "paraformer-realtime-v1-ft-202309261532-7b51",
    			"job_id": "ft-202309261532-7b51",
    			"model": "paraformer-realtime-v1",
    			"output_type": "custom_resource"
    		}, {
    			"create_time": "2023-09-26 15:32:17",
    			"finetuned_output": "paraformer-realtime-v1-ft-202309261532-8bbc",
    			"job_id": "ft-202309261532-cef0",
    			"model": "paraformer-realtime-v1",
    			"output_type": "custom_resource"
    		}, {
    			"create_time": "2023-09-26 15:32:16",
    			"finetuned_output": "paraformer-realtime-v1-ft-202309261532-fc6a",
    			"job_id": "ft-202309261532-fc6a",
    			"model": "paraformer-realtime-v1",
    			"output_type": "custom_resource"
    		}, {
    			"create_time": "2023-09-26 15:31:56",
    			"finetuned_output": "paraformer-realtime-v1-ft-202309261531-e92d",
    			"job_id": "ft-202309261531-e92d",
    			"model": "paraformer-realtime-v1",
    			"output_type": "custom_resource"
    		}]
    	},
    	"usage": null,
    	"job_id": null,
    	"finetuned_output": null,
    	"finetuned_outputs": [{
    		"create_time": "2023-09-26 15:32:20",
    		"finetuned_output": "paraformer-realtime-v1-ft-202309261532-93bf",
    		"job_id": "ft-202309261532-93bf",
    		"model": "paraformer-realtime-v1",
    		"output_type": "custom_resource"
    	}, {
    		"create_time": "2023-09-26 15:32:18",
    		"finetuned_output": "paraformer-realtime-v1-ft-202309261532-7b51",
    		"job_id": "ft-202309261532-7b51",
    		"model": "paraformer-realtime-v1",
    		"output_type": "custom_resource"
    	}, {
    		"create_time": "2023-09-26 15:32:17",
    		"finetuned_output": "paraformer-realtime-v1-ft-202309261532-8bbc",
    		"job_id": "ft-202309261532-cef0",
    		"model": "paraformer-realtime-v1",
    		"output_type": "custom_resource"
    	}, {
    		"create_time": "2023-09-26 15:32:16",
    		"finetuned_output": "paraformer-realtime-v1-ft-202309261532-fc6a",
    		"job_id": "ft-202309261532-fc6a",
    		"model": "paraformer-realtime-v1",
    		"output_type": "custom_resource"
    	}, {
    		"create_time": "2023-09-26 15:31:56",
    		"finetuned_output": "paraformer-realtime-v1-ft-202309261531-e92d",
    		"job_id": "ft-202309261531-e92d",
    		"model": "paraformer-realtime-v1",
    		"output_type": "custom_resource"
    	}],
    	"training_type": null,
    	"create_time": null,
    	"output_type": null,
    	"model": null
    }
  • Sample request

    # coding=utf-8
    
    import dashscope
    from dashscope.audio.asr import AsrPhraseManager
    
    dashscope.api_key='your-dashscope-api-key'
    
    result = AsrPhraseManager.list_phrases()
    if result.output is not None:
        print('list phrases: ', result.output)
    else:
        print('Error: ', str(result))
    
    package com.alibaba.test;
    
    import com.alibaba.dashscope.audio.asr.phrase.AsrPhraseManager;
    import com.alibaba.dashscope.audio.asr.phrase.AsrPhraseParam;
    import com.alibaba.dashscope.audio.asr.phrase.AsrPhraseStatusResult;
    
    class Test {
      
      public static void main(String[] args) {
        AsrPhraseParam param = AsrPhraseParam.builder()
                .model("your-model")
                .apiKey("your-dashscope-api-key")
                .build();
    
        AsrPhraseStatusResult result = null;
        try {
          result = AsrPhraseManager.ListPhrases(param);
          if (result.getOutput() != null) {
            System.out.println("list phrases: " + result.getOutput());
          } else {
            System.out.println("Error: " + result);
          }
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      }
    }
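Because list results are paged, collecting every hotword list means walking the pages until the reported total is reached. The sample response above reports total 61 with page_size 5, which works out to 13 pages. The sketch below shows one way to do this; the helper names page_count and iter_hotword_lists are hypothetical, list_fn stands for any callable with the page and page_size keyword arguments of list_phrases, and stopping on an empty or null page is an assumption.

```python
import math
from typing import Any, Callable, Dict, Iterator


def page_count(total: int, page_size: int) -> int:
    """Number of pages needed to hold `total` entries."""
    return math.ceil(total / page_size)


def iter_hotword_lists(list_fn: Callable[..., Any],
                       page_size: int = 10) -> Iterator[Dict[str, Any]]:
    """Yield every entry of finetuned_outputs across all pages."""
    page = 1
    while True:
        result = list_fn(page=page, page_size=page_size)
        if result.output is None:
            raise RuntimeError('list failed: %s' % result)
        # Assumption: a null list is treated the same as an empty page.
        entries = result.output['finetuned_outputs'] or []
        for entry in entries:
            yield entry
        # Stop on an empty page or once the reported total is covered.
        if not entries or page * page_size >= result.output['total']:
            break
        page += 1


print(page_count(61, 5))
# prints: 13
```

With the real SDK, iter_hotword_lists(AsrPhraseManager.list_phrases, page_size=5) would issue one request per page, and each yielded entry carries the finetuned_output ID of one hotword list.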