Python SDK

Parameters and interface details for the Fun-ASR audio file recognition Python SDK.

Important

Model Studio has released a workspace-specific domain for the Singapore region: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com. The new dedicated domain delivers superior performance and higher stability for inference requests. We recommend migrating from https://dashscope-intl.aliyuncs.com to the new domain.

{WorkspaceId} is your workspace ID, which can be found on the Workspace Details page in the Model Studio console. The existing domain remains fully functional.
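
As a minimal sketch, the workspace-specific domain can be assembled from a workspace ID as shown below. The helper name `workspace_domain` and the sample ID are hypothetical. The notice above gives only the host pattern, so any API path that follows the host is not covered here.

```python
def workspace_domain(workspace_id: str) -> str:
    """Return the Singapore workspace-specific domain for a workspace ID."""
    workspace_id = workspace_id.strip()
    if not workspace_id:
        raise ValueError("workspace_id must not be empty")
    return f"https://{workspace_id}.ap-southeast-1.maas.aliyuncs.com"


if __name__ == "__main__":
    # "ws-example123" is a placeholder, not a real workspace ID.
    print(workspace_domain("ws-example123"))
```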

Prerequisites

User guide: Non-real-time speech recognition. For supported audio formats, file size limits, duration limits, and other input requirements, see Audio specifications.

Getting started

The Transcription core class provides interfaces to submit tasks, wait for completion, and query results. Two calling methods are available:

  • Asynchronous submission and synchronous waiting: Submit a task and block the current thread until the task completes.

  • Asynchronous submission and asynchronous query: Submit a task and query the result when needed.

Asynchronous submission and synchronous waiting

  1. Call the async_call method of the core class (Transcription) and set the request parameters.

    Note
    • Tasks enter the PENDING state after submission. Queuing time, typically a few minutes, depends on the queue length and file duration. Once processing starts, recognition itself completes quickly.

    • Recognition results and download URLs expire 24 hours after the task completes. Tasks become unqueryable after expiration.

  2. Call the wait method of the core class (Transcription) to synchronously wait for the task to complete.

    Task statuses are PENDING, RUNNING, SUCCEEDED, and FAILED. While the task is PENDING or RUNNING, wait blocks the current thread. Once the task reaches SUCCEEDED or FAILED, wait returns the task result as a TranscriptionResponse.

Complete example:

from http import HTTPStatus
from dashscope.audio.asr import Transcription
import dashscope
import os
import json

# The following URL is for the China (Beijing) region. The URLs vary by region.
dashscope.base_http_api_url = 'https://dashscope.aliyuncs.com/api/v1'

# The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://help.aliyun.com/en/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

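# Submit the task. async_call returns immediately with the task_id.
task_response = Transcription.async_call(
    model='fun-asr',
    file_urls=['https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav']
)

# Block until the task reaches the SUCCEEDED or FAILED state.
transcribe_response = Transcription.wait(task=task_response.output.task_id)
if transcribe_response.status_code == HTTPStatus.OK:
    print(json.dumps(transcribe_response.output, indent=4, ensure_ascii=False))
    print('transcription done!')
else:
    # The request itself failed. code and message describe the cause.
    print(f'request failed: {transcribe_response.code} {transcribe_response.message}')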

Asynchronous submission and asynchronous query

  1. Call the async_call method of the core class (Transcription) and set the request parameters.

    Note
    • Tasks enter the PENDING state after submission. Queuing time, typically a few minutes, depends on the queue length and file duration. Once processing starts, recognition itself completes quickly.

    • Recognition results and download URLs expire 24 hours after the task completes. Tasks become unqueryable after expiration.

  2. Repeatedly call the fetch method of the core class (Transcription) until you get the final task result.

    When the task status is SUCCEEDED or FAILED, stop polling and process the result.

    fetch returns a TranscriptionResponse.

Complete example:

import json
import os
import time
from http import HTTPStatus

import dashscope
from dashscope.audio.asr import Transcription

# The following URL is for the China (Beijing) region. The URLs vary by region.
dashscope.base_http_api_url = 'https://dashscope.aliyuncs.com/api/v1'

# The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://help.aliyun.com/en/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

# Submit the task. async_call returns immediately with the task_id.
transcribe_response = Transcription.async_call(
    model='fun-asr',
    file_urls=['https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav']
)

# Poll until the task reaches a final state.
while True:
    if transcribe_response.status_code != HTTPStatus.OK:
        break  # The request itself failed, so stop polling.
    if transcribe_response.output.task_status in ('SUCCEEDED', 'FAILED'):
        break
    time.sleep(2)  # Pause between queries to avoid rate limiting. Adjust the interval as needed.
    transcribe_response = Transcription.fetch(task=transcribe_response.output.task_id)

if transcribe_response.status_code == HTTPStatus.OK:
    print(json.dumps(transcribe_response.output, indent=4, ensure_ascii=False))
    print('transcription done!')
else:
    print(f'request failed: {transcribe_response.code} {transcribe_response.message}')
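
For long-running tasks, a polling loop with a growing delay and an overall timeout may be preferable to a fixed interval. The following is a sketch under stated assumptions, not part of the SDK: `poll_until_done` is a hypothetical helper, the delay values are arbitrary, and the status lookup is passed in as a callable so that the helper does not depend on the SDK.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], str],
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status until it returns SUCCEEDED or FAILED.

    The delay doubles after every unfinished poll, capped at max_delay.
    Raises TimeoutError when the next sleep would exceed the timeout budget.
    """
    delay = initial_delay
    waited = 0.0  # total time spent sleeping, in seconds
    while True:
        status = fetch_status()
        if status in ("SUCCEEDED", "FAILED"):
            return status
        if waited + delay > timeout:
            raise TimeoutError(f"task still {status} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)
```

With the SDK, the callable could be `lambda: Transcription.fetch(task=task_id).output.task_status`, where `task_id` comes from the async_call response.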

Request parameters

Set request parameters using the async_call method of the Transcription core class.

| Parameter | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| model | str | - | Yes | The model used to transcribe the audio or video file. Valid values: fun-asr, fun-asr-2025-11-07, fun-asr-2025-08-25, fun-asr-mtl, fun-asr-mtl-2025-08-25. |
| file_urls | list[str] | - | Yes | A list of URLs of the audio or video files to transcribe. HTTP and HTTPS are supported. A single request supports only 1 URL. If your audio files are stored in OSS, note that the SDK does not support temporary URLs that start with the oss:// prefix. |
| vocabulary_id | str | - | No | The ID of a custom vocabulary. Hotwords in this vocabulary are used for speech recognition. Disabled by default. See Customize hotwords. |
| channel_id | list[int] | [0] | No | Indexes of the sound channels to recognize in a multi-channel audio file. The index starts from 0. For example, [0] recognizes the first channel, and [0, 1] recognizes the first and second channels. If omitted, only the first channel is processed. Important: Each specified sound channel is billed separately. For example, a request for [0, 1] on a single file incurs two separate charges. |
| special_word_filter | str | - | No | Specifies the sensitive words to handle during speech recognition. You can configure different handling actions for individual sensitive words. For details, see Sensitive word filter. |
| diarization_enabled | bool | False | No | Specifies whether to enable automatic speaker diarization. Disabled by default. This feature applies to single-channel audio only and is not supported for multi-channel audio. When enabled, recognition results include the speaker_id field to distinguish speakers. For an example of speaker_id, see Recognition result description. Note: If speaker diarization is enabled, keep the audio duration under 2 hours. Audio exceeding 2 hours may cause recognition failures or timeouts. |
| speaker_count | int | - | No | A reference value for the number of speakers. The value must be an integer from 2 to 100, inclusive. Takes effect only when speaker diarization is enabled (diarization_enabled is set to True). By default, the number of speakers is determined automatically. This parameter serves as a hint to the algorithm and does not guarantee the exact number of speakers in the output. |
| language_hints | list[str] | - | No | The language code for recognition. If the source language is unknown, leave it unset and the model detects the language automatically. The system reads only the first value in the array. Any extra values are ignored. |

Supported language codes for language_hints:

  • fun-asr, fun-asr-2025-11-07, fun-asr-mtl, fun-asr-mtl-2025-08-25:

    • zh: Chinese

    • en: English

    • ja: Japanese

    • ko: Korean

    • vi: Vietnamese

    • th: Thai

    • id: Indonesian

    • ms: Malay

    • tl: Filipino

    • hi: Hindi

    • ar: Arabic

    • fr: French

    • de: German

    • es: Spanish

    • pt: Portuguese

    • ru: Russian

    • it: Italian

    • nl: Dutch

    • sv: Swedish

    • da: Danish

    • fi: Finnish

    • no: Norwegian

    • el: Greek

    • pl: Polish

    • cs: Czech

    • hu: Hungarian

    • ro: Romanian

    • bg: Bulgarian

    • hr: Croatian

    • sk: Slovak

  • fun-asr-2025-08-25:

    • zh: Chinese

    • en: English
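
The constraints described for the request parameters can be checked before a task is submitted. The following is a minimal sketch, assuming a hypothetical helper named `build_transcription_kwargs`. It covers only the rules stated in this section (one HTTP or HTTPS URL per request, speaker_count from 2 to 100 and only with diarization enabled, a single language hint) and is not part of the SDK.

```python
from typing import Any, Dict, List, Optional


def build_transcription_kwargs(
    model: str,
    file_url: str,
    channel_id: Optional[List[int]] = None,
    diarization_enabled: bool = False,
    speaker_count: Optional[int] = None,
    language_hint: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate the documented constraints and return async_call keyword arguments."""
    if not file_url.startswith(("http://", "https://")):
        raise ValueError("file_urls accepts HTTP and HTTPS URLs only")
    # A single request supports only one URL, so the list always has one item.
    kwargs: Dict[str, Any] = {"model": model, "file_urls": [file_url]}

    if channel_id is not None:
        kwargs["channel_id"] = channel_id  # each listed channel is billed separately
    if diarization_enabled:
        kwargs["diarization_enabled"] = True
    if speaker_count is not None:
        if not diarization_enabled:
            raise ValueError("speaker_count takes effect only with diarization_enabled=True")
        if not 2 <= speaker_count <= 100:
            raise ValueError("speaker_count must be an integer from 2 to 100")
        kwargs["speaker_count"] = speaker_count
    if language_hint is not None:
        # Only the first value in language_hints is read, so pass exactly one.
        kwargs["language_hints"] = [language_hint]
    return kwargs
```

The returned dictionary could then be passed on as `Transcription.async_call(**kwargs)`.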

Response results

TranscriptionResponse

TranscriptionResponse encapsulates the basic information of the task, such as task_id and task_status, and the execution result. The execution result is the content of the output property. For more information, see TranscriptionOutput.

Examples of the TranscriptionResponse structure:

PENDING state

{
    "status_code":200,
    "request_id":"251aceab-a6aa-9fc4-b7f7-0cc6d3e2a9f3",
    "code":null,
    "message":"",
    "output":{
        "task_id":"7d0a58a3-1dbe-4de9-8cff-5f48213128b0",
        "task_status":"PENDING",
        "submit_time":"2025-02-13 16:55:08.573",
        "scheduled_time":"2025-02-13 16:55:08.592",
        "task_metrics":{
            "TOTAL":1,
            "SUCCEEDED":0,
            "FAILED":0
        }
    },
    "usage":null
}

RUNNING state

{
    "status_code":200,
    "request_id":"d9d530f1-853c-9848-a5f1-f5de59086ff7",
    "code":null,
    "message":"",
    "output":{
        "task_id":"6351feef-9694-45d2-9d32-63454f2ffb8d",
        "task_status":"RUNNING",
        "submit_time":"2025-02-13 17:31:20.681",
        "scheduled_time":"2025-02-13 17:31:20.703",
        "task_metrics":{
            "TOTAL":1,
            "SUCCEEDED":0,
            "FAILED":0
        }
    },
    "usage":null
}

SUCCEEDED state

{
    "status_code":200,
    "request_id":"16668704-6702-9e03-8ab7-a32a5d7bb095",
    "code":null,
    "message":"",
    "output":{
        "task_id":"6351feef-9694-45d2-9d32-63454f2ffb8d",
        "task_status":"SUCCEEDED",
        "submit_time":"2025-02-13 17:31:20.681",
        "scheduled_time":"2025-02-13 17:31:20.703",
        "end_time":"2025-02-13 17:31:21.867",
        "results":[
            {
                "file_url":"https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
                "transcription_url":"https://dashscope-result-bj.oss-cn-beijing.aliyuncs.com/prod/paraformer-v2/20250213/17%3A31/20ee4e4f-0404-4806-b617-c7d4c62eed19-1.json?Expires=1739525481&OSSAccessKeyId=yourOSSAccessKeyId&Signature=3q%2B1uQmRwltd7FPn5HQM2mBKw74%3D",
                "subtask_status":"SUCCEEDED"
            }
        ],
        "task_metrics":{
            "TOTAL":1,
            "SUCCEEDED":1,
            "FAILED":0
        }
    },
    "usage":{
        "duration":9
    }
}

FAILED state

{
    "status_code":200,
    "request_id":"16668704-6702-9e03-8ab7-a32a5d7bb095",
    "code":null,
    "message":"",
    "output":{
        "task_id": "7bac899c-06ec-4a79-8875-xxxxxxxxxxxx",
        "task_status": "FAILED",
        "submit_time": "2024-12-16 16:30:59.170",
        "scheduled_time": "2024-12-16 16:30:59.204",
        "end_time": "2024-12-16 16:31:02.375",
        "results": [
            {
                "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/rich_text_exaple_1.wav",
                "code": "InvalidFile.DownloadFailed",
                "message": "The audio file cannot be downloaded.",
                "subtask_status": "FAILED"
            }
        ],
        "task_metrics": {
            "TOTAL": 1,
            "SUCCEEDED": 0,
            "FAILED": 1
        }
    },
    "usage":{
        "duration":9
    }
}

Parameters to note:

| Parameter | Description |
| --- | --- |
| status_code | The HTTP status code of the request. |
| code | You can ignore the outermost code. Each entry in output.results contains a code field with the error code. Use it with the message field and refer to Error codes to troubleshoot. |
| message | You can ignore the outermost message. Each entry in output.results contains a message field with the error message. Use it with the code field and refer to Error codes to troubleshoot. |
| task_id | The task ID. |
| task_status | The task status. The possible states are PENDING, RUNNING, SUCCEEDED, and FAILED. When a task includes multiple subtasks, the overall task status is marked as SUCCEEDED as long as at least one subtask succeeds. Check the subtask_status field for the result of each specific subtask. |
| results | The recognition results of the subtasks. |
| subtask_status | The subtask status. The possible states are PENDING, RUNNING, SUCCEEDED, and FAILED. |
| file_url | The URL of the audio file to be recognized. |
| transcription_url | The URL for the audio recognition result. The recognition result is stored as a JSON file. Download the file or read its content by sending an HTTP request to the transcription_url. For the JSON file content, see Recognition result description. |
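
Reading the result behind transcription_url takes a single HTTP GET request. The sketch below uses only the Python standard library. The helper name `load_transcription` is hypothetical, and decoding the file as UTF-8 is an assumption, since this page does not state the encoding.

```python
import json
import urllib.request
from typing import Any, Dict


def load_transcription(transcription_url: str, timeout: float = 30.0) -> Dict[str, Any]:
    """Fetch the JSON document behind transcription_url and parse it."""
    with urllib.request.urlopen(transcription_url, timeout=timeout) as response:
        # Assumption: the result file is UTF-8 encoded JSON.
        return json.loads(response.read().decode("utf-8"))
```

Because the URL expires 24 hours after the task completes, it may be worth saving the parsed result locally if it is needed later.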

TranscriptionOutput

TranscriptionOutput corresponds to the output property of TranscriptionResponse and represents the result of the current task execution.

Examples of the TranscriptionOutput structure:

PENDING state

{
    "task_id":"f2f7c2fa-0cd9-4bb2-a283-27b26ee4bb67",
    "task_status":"PENDING",
    "submit_time":"2025-02-13 17:59:27.754",
    "scheduled_time":"2025-02-13 17:59:27.789",
    "task_metrics":{
        "TOTAL":1,
        "SUCCEEDED":0,
        "FAILED":0
    }
}

RUNNING state

{
    "task_id":"f2f7c2fa-0cd9-4bb2-a283-27b26ee4bb67",
    "task_status":"RUNNING",
    "submit_time":"2025-02-13 17:59:27.754",
    "scheduled_time":"2025-02-13 17:59:27.789",
    "task_metrics":{
        "TOTAL":1,
        "SUCCEEDED":0,
        "FAILED":0
    }
}

SUCCEEDED state

{
    "task_id":"f2f7c2fa-0cd9-4bb2-a283-27b26ee4bb67",
    "task_status":"SUCCEEDED",
    "submit_time":"2025-02-13 17:59:27.754",
    "scheduled_time":"2025-02-13 17:59:27.789",
    "end_time":"2025-02-13 17:59:28.828",
    "results":[
        {
            "file_url":"https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
            "transcription_url":"https://dashscope-result-bj.oss-cn-beijing.aliyuncs.com/prod/paraformer-v2/20250213/17%3A59/70e737cc-bf8c-418b-b0c8-83fab192a0fa-1.json?Expires=1739527168&OSSAccessKeyId=yourOSSAccessKeyId&Signature=AtGjIKI%2BdgbzjJIu%2BHsr1R5nSAY%3D",
            "subtask_status":"SUCCEEDED"
        }
    ],
    "task_metrics":{
        "TOTAL":1,
        "SUCCEEDED":1,
        "FAILED":0
    }
}

FAILED state

code is the error code and message is the error message. These two fields are returned only when an error occurs. You can use these two fields to troubleshoot the issue. For more information, see Error codes.

{
    "task_id": "7bac899c-06ec-4a79-8875-xxxxxxxxxxxx",
    "task_status": "FAILED",
    "submit_time": "2024-12-16 16:30:59.170",
    "scheduled_time": "2024-12-16 16:30:59.204",
    "end_time": "2024-12-16 16:31:02.375",
    "results": [
        {
            "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/rich_text_exaple_1.wav",
            "code": "InvalidFile.DownloadFailed",
            "message": "The audio file cannot be downloaded.",
            "subtask_status": "FAILED"
        }
    ],
    "task_metrics": {
        "TOTAL": 1,
        "SUCCEEDED": 0,
        "FAILED": 1
    }
}

Parameters to note:

| Parameter | Description |
| --- | --- |
| code | The error code. Use this with the message field and refer to Error codes to troubleshoot. |
| message | The error message. Use this with the code field and refer to Error codes to troubleshoot. |
| task_id | The task ID. |
| task_status | The task status. The possible states are PENDING, RUNNING, SUCCEEDED, and FAILED. When a task includes multiple subtasks, the overall task status is marked as SUCCEEDED as long as at least one subtask succeeds. Check the subtask_status field for the result of each specific subtask. |
| results | The recognition results of the subtasks. |
| subtask_status | The subtask status. The possible states are PENDING, RUNNING, SUCCEEDED, and FAILED. |
| file_url | The URL of the audio file to be recognized. |
| transcription_url | The URL for the audio recognition result. The recognition result is stored as a JSON file. Use the transcription_url to download the file or read its content with an HTTP request. For the JSON file content, see Recognition result description. |
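
Since results and download URLs expire 24 hours after the task completes, the expiry time can be derived from the end_time field. The following is a sketch with a hypothetical helper, `result_expiry`. It assumes that end_time always uses the format seen in the examples above and returns a naive datetime in the same, unspecified time zone as the service timestamps.

```python
from datetime import datetime, timedelta
from typing import Any, Mapping, Optional

RESULT_TTL = timedelta(hours=24)  # results expire 24 hours after the task completes
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # matches values such as "2025-02-13 17:59:28.828"


def result_expiry(output: Mapping[str, Any]) -> Optional[datetime]:
    """Return when the task's results expire, or None if the task has not finished."""
    end_time = output.get("end_time")
    if end_time is None:
        return None  # PENDING and RUNNING outputs carry no end_time
    return datetime.strptime(end_time, TIME_FORMAT) + RESULT_TTL
```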

Recognition result description

The recognition result is saved as a JSON file.

Recognition result example:

{
    "file_url":"https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
    "properties":{
        "audio_format":"pcm_s16le",
        "channels":[
            0
        ],
        "original_sampling_rate":16000,
        "original_duration_in_milliseconds":3834
    },
    "transcripts":[
        {
            "channel_id":0,
            "content_duration_in_milliseconds":3720,
            "text":"Hello world, this is Alibaba Speech Lab.",
            "sentences":[
                {
                    "begin_time":100,
                    "end_time":3820,
                    "text":"Hello world, this is Alibaba Speech Lab.",
                    "sentence_id":1,
                    "speaker_id":0, // This field is displayed only when automatic speaker diarization is enabled.
                    "words":[
                        {
                            "begin_time":100,
                            "end_time":596,
                            "text":"Hello ",
                            "punctuation":""
                        },
                        {
                            "begin_time":596,
                            "end_time":844,
                            "text":"world",
                            "punctuation":", "
                        }
                        // Other content is omitted here.
                    ]
                }
            ]
        }
    ]
}

The key parameters are as follows:

| Parameter | Type | Description |
| --- | --- | --- |
| audio_format | string | The format of the audio in the source file. |
| channels | array[integer] | The audio track index information in the source file. Returns [0] for single-track audio, [0, 1] for dual-track audio, and so on. |
| original_sampling_rate | integer | The sample rate of the audio in the source file (Hz). |
| original_duration_in_milliseconds | integer | The original duration of the audio in the source file (ms). |
| channel_id | integer | The index of the transcribed audio track, starting from 0. |
| content_duration_in_milliseconds | integer | The duration of the content in the audio track that is identified as speech (ms). Important: Billing is based on speech content duration only (non-speech parts are not metered). Speech duration is typically shorter than total audio duration. The AI-based speech detection may have minor discrepancies. |
| transcript | string | The paragraph-level speech transcription result. |
| sentences | array | The sentence-level speech transcription result. |
| words | array | The word-level speech transcription result. |
| begin_time | integer | Start timestamp (ms). |
| end_time | integer | End timestamp (ms). |
| text | string | The speech transcription result. |
| speaker_id | integer | The index of the current speaker, starting from 0. This is used to distinguish different speakers. This field is displayed in the recognition result only when speaker diarization is enabled. |
| punctuation | string | The predicted punctuation mark after the word, if any. |
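
Once the result JSON is parsed into a dictionary, the sentence-level fields can be turned into a readable transcript. The following is a minimal sketch that relies only on the fields shown in the example above. The helper names and the output layout are illustrative choices, not part of the SDK.

```python
from typing import Any, Dict, List


def format_timestamp(ms: int) -> str:
    """Render a millisecond offset as MM:SS.mmm."""
    minutes, rest = divmod(ms, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


def sentence_lines(result: Dict[str, Any]) -> List[str]:
    """Return one line per sentence with channel, time range, and optional speaker."""
    lines = []
    for transcript in result.get("transcripts", []):
        for sentence in transcript.get("sentences", []):
            # speaker_id is present only when speaker diarization is enabled.
            speaker = sentence.get("speaker_id")
            who = f" speaker {speaker}" if speaker is not None else ""
            lines.append(
                f"[ch{transcript['channel_id']} "
                f"{format_timestamp(sentence['begin_time'])}-"
                f"{format_timestamp(sentence['end_time'])}]{who}: {sentence['text']}"
            )
    return lines


def total_speech_ms(result: Dict[str, Any]) -> int:
    """Sum content_duration_in_milliseconds over all transcribed channels."""
    return sum(t["content_duration_in_milliseconds"] for t in result.get("transcripts", []))
```

The value returned by `total_speech_ms` is the speech duration that the billing note refers to, summed over the channels that were requested.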

Key interfaces

Core class (Transcription)

Import Transcription using from dashscope.audio.asr import Transcription.

Member method

Method signature

Description

async_call

@classmethod
def async_call(cls,
               model: str,
               file_urls: List[str],
               phrase_id: str = None,
               api_key: str = None,
               workspace: str = None,
               **kwargs) -> TranscriptionResponse

Asynchronously submits a speech recognition task.

wait

@classmethod
def wait(cls,
         task: Union[str, TranscriptionResponse],
         api_key: str = None,
         workspace: str = None,
         **kwargs) -> TranscriptionResponse

Blocks the current thread until the asynchronous task completes, that is, until the task status is SUCCEEDED or FAILED.

This method returns TranscriptionResponse.

fetch

@classmethod
def fetch(cls,
          task: Union[str, TranscriptionResponse],
          api_key: str = None,
          workspace: str = None,
          **kwargs) -> TranscriptionResponse

Queries the current status and result of the task once and returns without waiting for the task to complete.

This method returns TranscriptionResponse.

Other interfaces: Batch query task status/Cancel task

For more information, see Manage asynchronous tasks. You can batch query audio file recognition tasks submitted within 24 hours and cancel tasks that are in the PENDING state.

Error codes

If an error occurs, see Error codes to troubleshoot the issue.

If a task contains multiple subtasks, the overall task status is marked as SUCCEEDED if at least one subtask succeeds. You must check the subtask_status field to determine the result of each subtask.

Example of an error response:

{
    "task_id": "7bac899c-06ec-4a79-8875-xxxxxxxxxxxx",
    "task_status": "SUCCEEDED",
    "submit_time": "2024-12-16 16:30:59.170",
    "scheduled_time": "2024-12-16 16:30:59.204",
    "end_time": "2024-12-16 16:31:02.375",
    "results": [
        {
            "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/rich_text_exaple_1.wav",
            "code": "InvalidFile.DownloadFailed",
            "message": "The audio file cannot be downloaded.",
            "subtask_status": "FAILED"
        }
    ],
    "task_metrics": {
        "TOTAL": 1,
        "SUCCEEDED": 0,
        "FAILED": 1
    }
}
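
Because the overall task_status alone does not reveal failed subtasks, each entry in results needs to be checked individually. The following sketch separates succeeded and failed subtasks. `split_results` is a hypothetical helper that operates on the output as a plain dictionary with the fields shown in the examples on this page.

```python
from typing import Any, Dict, List, Mapping, Tuple


def split_results(output: Mapping[str, Any]) -> Tuple[List[str], List[Dict[str, str]]]:
    """Separate succeeded subtasks (transcription URLs) from failed ones (error details)."""
    succeeded: List[str] = []
    failed: List[Dict[str, str]] = []
    for item in output.get("results", []):
        status = item.get("subtask_status")
        if status == "SUCCEEDED":
            succeeded.append(item["transcription_url"])
        elif status == "FAILED":
            # code and message are returned only for failed subtasks.
            failed.append({
                "file_url": item.get("file_url", ""),
                "code": item.get("code", ""),
                "message": item.get("message", ""),
            })
    return succeeded, failed
```

Applied to the error response above, the first list is empty and the second contains one entry with the code InvalidFile.DownloadFailed.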

FAQ

Features

Q: Is audio in Base64 encoding supported?

This service recognizes audio from publicly accessible URLs only. It does not support audio in Base64 encoding, binary streams, or local files.

Q: How do I provide an audio file as a publicly accessible URL?

You can typically follow these steps. This is a general guide, and the specific steps may vary for different storage products. We recommend that you upload the audio to Object Storage Service (OSS).

1. Choose a storage and hosting method

Examples include the following:

  • Object Storage Service (Recommended):

    • Use a cloud provider's object storage service, such as OSS. Upload the audio file to a bucket and set its access permissions to public.

    • Advantages: High availability, CDN acceleration support, and easy management.

  • Web server:

    • Place the audio file on a web server that supports HTTP/HTTPS access, such as Nginx or Apache.

    • Advantages: Suitable for small projects or local testing.

  • Content Delivery Network (CDN):

    • Host the audio file on a CDN and access it through the URL provided by the CDN.

    • Advantages: Accelerates file transfer, suitable for high-concurrency scenarios.

2. Upload the audio file

Upload the audio file based on your chosen storage/hosting method. For example:

  • Object Storage Service:

    • Log on to the cloud provider's console and create a bucket.

    • Upload the audio file and set its permissions to "public-read" or generate a temporary access link.

  • Web server:

    • Place the audio file in a specified directory on the server, such as /var/www/html/audio/.

    • Ensure the file is accessible via HTTP/HTTPS.

3. Generate a publicly accessible URL

For example:

  • Object Storage Service:

    • After uploading the file, the system automatically generates a public access URL, typically in the format https://<bucket-name>.<region>.aliyuncs.com/<file-name>.

    • For a more user-friendly domain name, you can attach a custom domain name and enable HTTPS.

  • Web server:

    • The access URL for the file is usually the server address plus the file path, such as https://your-domain.com/audio/file.mp3.

  • CDN:

    • After configuring CDN acceleration, use the URL provided by the CDN, such as https://cdn.your-domain.com/audio/file.mp3.

4. Verify the URL's availability

In a public network environment, ensure that the generated URL is accessible. For example:

  • Open the URL in a browser to check if the audio file can be played.

  • Use a tool, such as curl or Postman, to verify that the URL returns a correct HTTP response (status code 200).

When using the SDK to access a file stored in OSS, you cannot use a temporary URL with the oss:// prefix.

When using the RESTful API to access a file stored in OSS, you can use a temporary URL with the oss:// prefix:

Important
  • The temporary URL is valid for 48 hours and cannot be used after it expires. Do not use it in a production environment.

  • The API for obtaining an upload credential is limited to 100 QPS and does not support scaling out. Do not use it in production environments, high-concurrency scenarios, or stress testing scenarios.

  • For production environments, use a stable storage service such as OSS to ensure long-term file availability and avoid rate limiting issues.
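
The URL checks described in step 4, together with the oss:// restriction for the SDK, can be sketched in Python with the standard library. The function names below are hypothetical. The HEAD request is one possible way to test reachability, and some servers may reject HEAD even though a GET request would succeed.

```python
import urllib.request
from urllib.parse import urlparse


def is_sdk_compatible_url(url: str) -> bool:
    """Return True for http(s) URLs with a host, False for oss:// and other schemes."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request and report whether the server answers with status 200."""
    if not is_sdk_compatible_url(url):
        return False
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers HTTP errors, DNS and connection failures, and timeouts.
        return False
```

Run the reachability check from a machine outside your private network, because the service must be able to download the file over the public internet.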

Q: How long does it take to get the recognition result?

Tasks enter the PENDING state after submission. Queuing time (typically a few minutes) varies with the queue length and file duration. The longer the audio file, the longer the processing time.

Troubleshooting

If the response contains an error code, refer to Error codes to troubleshoot.

Q: Why can't I get a result after continuous polling?

Rate limiting is a possible cause. Polling too frequently may trigger it, so try adding a delay between fetch calls.

Q: Why is the audio not recognized (no recognition result)?

Check whether the audio format and sample rate are correct and meet the parameter constraints.

You can use the ffprobe tool to retrieve information about the audio container, codec, sample rate, and channels:

ffprobe -v error -show_entries format=format_name -show_entries stream=codec_name,sample_rate,channels -of default=noprint_wrappers=1 input.xxx
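
The same check can be scripted. The sketch below runs the ffprobe command shown above and parses its key=value output into a dictionary. It assumes that ffprobe is installed and available on PATH. The function names are hypothetical, and for files with several streams, later values overwrite earlier ones.

```python
import subprocess
from typing import Dict

# Same options as the command line above, without the input file.
FFPROBE_ARGS = [
    "ffprobe", "-v", "error",
    "-show_entries", "format=format_name",
    "-show_entries", "stream=codec_name,sample_rate,channels",
    "-of", "default=noprint_wrappers=1",
]


def parse_ffprobe_output(text: str) -> Dict[str, str]:
    """Turn key=value lines into a dict. Lines without '=' are skipped."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            info[key.strip()] = value.strip()
    return info


def probe_audio(path: str) -> Dict[str, str]:
    """Run ffprobe on path and return the parsed container and stream properties."""
    completed = subprocess.run(
        FFPROBE_ARGS + [path], capture_output=True, text=True, check=True
    )
    return parse_ffprobe_output(completed.stdout)
```

The returned values are strings, so convert sample_rate and channels with `int()` before comparing them against the limits in Audio specifications.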