Fun-ASR audio file transcription iOS SDK

This document provides a detailed guide for using the Fun-ASR audio file transcription iOS SDK to convert speech into text.

User guide: Non-real-time speech recognition. For supported audio formats, file size limits, duration limits, and other input requirements, see Audio specifications.

Getting started

  1. Get an API key. For instructions, see Get an API key and API host.

  2. Download the SDK and run the sample code:

    • Download the latest SDK bundle.

    • Unzip the ZIP package and add nuisdk.framework to your project.

    • In Build Phases → Link Binary With Libraries, add nuisdk.framework.

    • In General → Frameworks, Libraries, and Embedded Content, set nuisdk.framework to Embed & Sign.

    • Open the sample project in Xcode. The sample code is in DashFunAsrFileTranscriberViewController.m. Replace the API key and try the feature.

Calling steps

Synchronous mode

  1. Initialize the SDK.

  2. Configure the required parameters as needed.

  3. Call nui_file_trans_start to start a recognition task (set async_request to false).

  4. In the onFileTransEventCallback callback, listen for the EVENT_FILE_TRANS_RESULT event to obtain the final recognition result.

  5. Call nui_release to release the SDK resources.

Asynchronous mode

  1. Initialize the SDK.

  2. Configure the required parameters as needed.

  3. Call nui_file_trans_start to start the recognition task (set async_request to true).

  4. Call nui_file_trans_query to actively query the recognition progress or results.

  5. In the onFileTransEventCallback callback, listen for the EVENT_FILE_TRANS_QUERY_RESULT event to obtain the current query result.

  6. In the onFileTransEventCallback callback, listen for the EVENT_FILE_TRANS_RESULT event to obtain the final recognition result.

  7. Call nui_release to release the SDK resources.
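
The two calling sequences differ only in how the app reacts to callback events. The sketch below models that decision in plain C, which also compiles inside an Objective-C file. DemoEvent, DemoAction, and demo_next_action are hypothetical names that stand in for the SDK's own event constants. The sketch assumes that an app in asynchronous mode keeps polling until the final result arrives and that an error ends the task; the source does not specify a polling interval.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the SDK's NuiCallbackEvent values. */
typedef enum {
    DEMO_EVENT_QUERY_RESULT, /* mirrors EVENT_FILE_TRANS_QUERY_RESULT */
    DEMO_EVENT_RESULT,       /* mirrors EVENT_FILE_TRANS_RESULT */
    DEMO_EVENT_ERROR         /* mirrors EVENT_ASR_ERROR */
} DemoEvent;

/* What the app should do after it receives an event. */
typedef enum {
    DEMO_ACTION_WAIT,        /* keep waiting for the final result */
    DEMO_ACTION_QUERY_AGAIN, /* call nui_file_trans_query again later */
    DEMO_ACTION_FINISH       /* task ended; nui_release can be called when the SDK is no longer needed */
} DemoAction;

/* Picks the next step for a task started with the given async_request value. */
DemoAction demo_next_action(DemoEvent event, bool async_request)
{
    switch (event) {
    case DEMO_EVENT_RESULT:
    case DEMO_EVENT_ERROR:
        return DEMO_ACTION_FINISH;
    case DEMO_EVENT_QUERY_RESULT:
        /* Query results are only expected in asynchronous mode. */
        return async_request ? DEMO_ACTION_QUERY_AGAIN : DEMO_ACTION_WAIT;
    }
    return DEMO_ACTION_WAIT;
}
```

In a real app this decision would live inside the onFileTransEventCallback implementation, with the SDK's event values replacing the stand-ins.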

Request parameters

Connection and control parameters

Pass these parameters as a JSON string through the parameters argument of nui_initialize.

  • Parameter example: The following shows a sample JSON string. Not all parameters are listed. Add any missing parameters as needed:

    {
        "url": "wss://dashscope.aliyuncs.com/api-ws/v1/inference",
        "apikey": "st-****",
        "device_id": "my_device_id",
        "service_mode": "1"
    }
  • Parameter description

    Parameter

    Type

    Required

    Description

    url

    String

    Yes

    Service endpoint. Fixed as wss://dashscope.aliyuncs.com/api-ws/v1/inference.

    apikey

    String

    Yes

    API key.

    service_mode

    String

    Yes

    Operation mode. For audio file transcription, this must be "1".

    device_id

    String

    Yes

    A unique string that identifies the end user. Set this to the user ID within your app or a device-specific identifier generated by the client. This ID helps with log tracing and troubleshooting.

    debug_path

    String

    No

    Path where log files are stored.

    This parameter takes effect only if you set save_log to YES when calling nui_initialize. In that case, you must specify a log file path. Otherwise, an error occurs.

    The system keeps at most two local log files.

    max_log_file_size

    int

    No

    Maximum size (in bytes) for a log file.

    This parameter takes effect only if you set save_log to YES when calling nui_initialize.

    Default value: 104857600 (100 × 1024 × 1024 bytes, or 100 MiB).

    log_track_level

    int

    No

    Controls the filter level for logs sent externally through the log callback (onFileTransLogTrackCallback).

    Default value: 2.

    Valid values:

    • 0: LOG_LEVEL_VERBOSE

    • 1: LOG_LEVEL_DEBUG

    • 2: LOG_LEVEL_INFO

    • 3: LOG_LEVEL_WARNING

    • 4: LOG_LEVEL_ERROR

    • 5: LOG_LEVEL_NONE (disables this feature)

    Note: log_track_level and level (set via the nui_initialize interface) jointly determine which logs are finally sent in the callback. A log message appears in the callback only if its level is greater than or equal to both log_track_level and level. For example, if log_track_level is set to 2 (INFO) and level is set to 3 (WARNING), only WARNING-level logs (value ≥ 3) and higher appear in the callback.
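
The joint filtering rule in the note above can be written as a small predicate. This is a sketch in plain C, not SDK code: demo_log_reaches_callback and the DEMO_LOG_* constants are hypothetical names, and the sketch assumes that level uses the same numeric scale as log_track_level, as the example in the note suggests.

```c
#include <stdbool.h>

/* Numeric levels as listed for log_track_level. */
enum {
    DEMO_LOG_VERBOSE = 0,
    DEMO_LOG_DEBUG   = 1,
    DEMO_LOG_INFO    = 2,
    DEMO_LOG_WARNING = 3,
    DEMO_LOG_ERROR   = 4,
    DEMO_LOG_NONE    = 5
};

/* Returns true when a message at message_level would be passed to the
 * log callback, given both configured thresholds. */
bool demo_log_reaches_callback(int message_level, int log_track_level, int level)
{
    if (log_track_level >= DEMO_LOG_NONE) {
        return false; /* 5 disables the log callback entirely */
    }
    /* The message must clear both thresholds. */
    return message_level >= log_track_level && message_level >= level;
}
```

With log_track_level set to 2 and level set to 3, a WARNING message (3) passes and an INFO message (2) does not, which matches the example in the note.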

Speech recognition effect parameters

Use nui_set_params to configure the nls_config parameter on its own, or use nui_file_trans_start to configure all speech recognition effect parameters.

  • Parameter example: The following shows a sample JSON string. Not all parameters are listed. Add any missing parameters as needed:

    {
        "file_urls": [
            "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav"
        ],
        "async_request": false,
        "nls_config": {
            "model":"fun-asr",
            "diarization_enabled": false,
            "parameters": {
                "speech_noise_threshold": 0.0
            }
        }
    }
  • Parameter description

    Parameter

    Type

    Required

    Description

    file_urls

    array[string]

    Yes

    List of URLs for audio or video files to transcribe. Supports HTTP and HTTPS protocols. A single request supports only 1 URL.

    If your audio files are stored in OSS, the SDK does not support temporary URLs that start with the oss:// prefix.

    • Audio formats: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv

      Important

      Because audio and video formats and their variants are numerous, it is impossible to test them all. The API cannot guarantee correct recognition for every format. Test your files to confirm they produce normal speech recognition results.

    • Audio sampling rate: Any

    • Audio file size and duration: Files must not exceed 2 GB and must be under 12 hours long.

      If your files exceed these limits, preprocess them to reduce their size. For best practices on file preprocessing, see Preprocess video files to improve transcription efficiency (for audio file transcription scenarios).

    async_request

    boolean

    No

    Whether the speech recognition request is asynchronous.

    Default value: false.

    Valid values:

    • true: Asynchronous request

    • false: Synchronous request

    apikey

    string

    No

    If the apikey in Connection and control parameters uses a temporary API key, update it here to prevent expiration.

    nls_config

    object

    Yes

    Core configuration object for speech recognition. Contains key parameters such as model selection and recognition effect controls.

    nls_config.model

    string

    Yes

    Speech recognition model.

    nls_config.special_word_filter

    object

    No

    Specifies sensitive words to handle during speech recognition and supports different handling methods for each word.

    If you omit this parameter, the system uses built-in sensitive word filtering logic. Words matching the Alibaba Cloud Model Studio sensitive word list are replaced with asterisks (*) of equal length.

    If you provide this parameter, you can apply the following sensitive word handling strategies:

    • Replace with *: Replaces matched sensitive words with the same number of * characters.

    • Filter out completely: Matching sensitive words are removed entirely from the recognition result.

    The value must be a JSON object with the following structure:

    {
      "filter_with_signed": {
        "word_list": ["test"]
      },
      "filter_with_empty": {
        "word_list": ["start", "occur"]
      },
      "system_reserved_filter": true
    }

    JSON field descriptions:

    • filter_with_signed

      • Type: Object.

      • Required: No.

      • Description: The list of sensitive words to be replaced with *. Matched words in the recognition result are replaced by * characters of the same length.

      • Example: With the JSON above, the phrase “help me test this code” becomes “help me **** this code”.

      • Inner fields:

        • word_list: Array of strings listing sensitive words to replace.

    • filter_with_empty

      • Type: Object.

      • Required: No.

      • Description: Configures a list of sensitive words to remove (filter out) from the recognition result. Matching words are deleted entirely.

      • Example: With the JSON above, the phrase “is the match about to start?” becomes “is the match about to?”.

      • Inner fields:

        • word_list: Array of strings listing sensitive words to remove completely.

    • system_reserved_filter

      • Type: Boolean.

      • Required: No.

      • Default value: true.

      • Description: Whether to enable the system’s preset sensitive word rules. If set to true, the system also applies its built-in sensitive word filtering logic. Words matching the Alibaba Cloud Model Studio sensitive word list are replaced with asterisks (*) of equal length.

    nls_config.channel_id

    array[integer]

    No

    Indexes of sound channels to recognize in a multi-channel audio file. The index starts from 0. For example, [0] recognizes the first channel, and [0, 1] recognizes the first and second channels. If omitted, the first channel is processed by default.

    Important

    Each specified sound channel is billed separately. For example, a request for [0, 1] for a single file incurs two separate charges.

    Default value: [0]

    nls_config.diarization_enabled

    boolean

    No

    Whether to enable automatic speaker diarization. Default value: false. This feature applies to single-channel audio only (not supported for multi-channel audio).

    When enabled, recognition results include the speaker_id field to distinguish speakers.

    Note

    If speaker diarization is enabled, keep the audio duration under 2 hours. Audio exceeding 2 hours may cause recognition failures or timeouts.

    For an example of speaker_id, see Recognition result description.

    nls_config.speaker_count

    integer

    No

    Reference value for the number of speakers. To use this feature, set diarization_enabled to true.

    By default, the system automatically determines the number of speakers. If you set this parameter, it only guides the algorithm toward the specified number; it does not guarantee that exact number.

    Valid range: [2, 100]. Because this feature distinguishes multiple speakers, the minimum value is 2.

    nls_config.vocabulary_id

    string

    No

    ID of a hotword vocabulary list to improve recognition accuracy for specific terms. This parameter applies to v2 and later models. For instructions on using hotwords, see Customize hotwords.

    nls_config.language_hints

    array[string]

    No

    The language code for recognition. If the source language is unknown, leave it unset and the model detects the language automatically.

    The system reads only the first value in the array. Any extra values are ignored.

    Supported language codes:

    • fun-asr, fun-asr-2025-11-07, fun-asr-mtl, fun-asr-mtl-2025-08-25:

      • zh: Chinese

      • en: English

      • ja: Japanese

      • ko: Korean

      • vi: Vietnamese

      • th: Thai

      • id: Indonesian

      • ms: Malay

      • tl: Filipino

      • hi: Hindi

      • ar: Arabic

      • fr: French

      • de: German

      • es: Spanish

      • pt: Portuguese

      • ru: Russian

      • it: Italian

      • nl: Dutch

      • sv: Swedish

      • da: Danish

      • fi: Finnish

      • no: Norwegian

      • el: Greek

      • pl: Polish

      • cs: Czech

      • hu: Hungarian

      • ro: Romanian

      • bg: Bulgarian

      • hr: Croatian

      • sk: Slovak

    • fun-asr-2025-08-25:

      • zh: Chinese

      • en: English

    nls_config.parameters

    object

    No

    Configures additional parameters as a JSON object.
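
Several limits in the table above can be checked on the client before a request is sent. The following sketch, in plain C, encodes the documented limits: file size up to 2 GB, duration under 12 hours, speaker_count in [2, 100] and only valid with diarization_enabled, and the recommendation to keep audio under 2 hours when diarization is enabled. DemoRequestInfo, DemoCheck, and demo_check_request are hypothetical names, and the sketch assumes that "2 GB" means 2 × 1024³ bytes.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    DEMO_OK = 0,
    DEMO_ERR_FILE_TOO_LARGE,
    DEMO_ERR_TOO_LONG,
    DEMO_ERR_SPEAKER_COUNT,
    DEMO_WARN_DIARIZATION_LONG /* allowed, but may fail or time out */
} DemoCheck;

typedef struct {
    uint64_t file_size_bytes;
    double   duration_seconds;
    bool     diarization_enabled;
    int      speaker_count; /* 0 means "not set" */
} DemoRequestInfo;

DemoCheck demo_check_request(const DemoRequestInfo *r)
{
    /* Assumption: 2 GB is interpreted as 2 GiB. */
    const uint64_t max_bytes = 2ULL * 1024 * 1024 * 1024;

    if (r->file_size_bytes > max_bytes) {
        return DEMO_ERR_FILE_TOO_LARGE;
    }
    if (r->duration_seconds >= 12.0 * 3600.0) {
        return DEMO_ERR_TOO_LONG; /* must be under 12 hours */
    }
    if (r->speaker_count != 0) {
        /* speaker_count requires diarization and must be in [2, 100]. */
        if (!r->diarization_enabled ||
            r->speaker_count < 2 || r->speaker_count > 100) {
            return DEMO_ERR_SPEAKER_COUNT;
        }
    }
    if (r->diarization_enabled && r->duration_seconds >= 2.0 * 3600.0) {
        return DEMO_WARN_DIARIZATION_LONG;
    }
    return DEMO_OK;
}
```

The service remains the authority on what it accepts; a check like this only lets an app reject obviously invalid requests early.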

Key interfaces

NeoNui

nui_initialize

Initializes the speech recognition SDK instance. The SDK is implemented as a singleton: once initialized, it must not be initialized again until nui_release has been called.

  • Method signature

    -(NuiResultCode) nui_initialize:(const char *)parameters
                           logLevel:(NuiSdkLogLevel)level
                            saveLog:(BOOL)save_log;
  • Parameter description

    | Parameter | Type | Description |
    | --- | --- | --- |
    | parameters | char* | JSON string containing authentication, connection, and debug parameters. See Connection and control parameters. |
    | level | NuiSdkLogLevel | Controls the logging level for SDK internal logs. |
    | save_log | BOOL | Whether to save local logs. If set to YES, specify a path using debug_path in Connection and control parameters, and optionally set the file size using max_log_file_size. |

  • Return value description

    Returns an error code. For details, see Error code reference.
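
Because parameters is a plain C string, an app can assemble it with snprintf before calling nui_initialize. The helper below is a minimal sketch in plain C that fills in the four required fields from Connection and control parameters. demo_build_init_params is a hypothetical name, not part of the SDK, and the sketch assumes the inputs contain no characters that need JSON escaping.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes the initialization JSON into out.
 * Returns 0 on success, -1 if an argument is NULL or the buffer is too small.
 * Inputs are not JSON-escaped, so they must not contain '"' or '\'. */
int demo_build_init_params(char *out, size_t out_size,
                           const char *apikey, const char *device_id)
{
    if (out == NULL || out_size == 0 || apikey == NULL || device_id == NULL) {
        return -1;
    }
    int n = snprintf(out, out_size,
        "{\"url\":\"wss://dashscope.aliyuncs.com/api-ws/v1/inference\","
        "\"apikey\":\"%s\",\"device_id\":\"%s\",\"service_mode\":\"1\"}",
        apikey, device_id);
    /* snprintf reports the length it needed; a value >= out_size means truncation. */
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

The resulting buffer can then be passed as the parameters argument. Optional fields such as debug_path would be appended in the same way.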

nui_set_params

Sets or updates the nls_config parameter independently. If you pass all parameters at once in nui_file_trans_start, you do not need to call this method.

  • Method signature

    -(NuiResultCode) nui_set_params:(const char *)params;
  • Parameter description

    Parameter

    Type

    Description

    params

    char*

    The nls_config parameter from Speech recognition effect parameters. Parameters outside nls_config cannot be set using this method.

    Example:

    {
        "nls_config": {
            "model":"fun-asr",
            "diarization_enabled": false
        }
    }
  • Return value description

    Returns an error code. For details, see Error code reference.

nui_file_trans_start

Starts transcription.

  • Method signature

    -(NuiResultCode) nui_file_trans_start:(const char *)params
                                   taskId:(char *)task_id;
  • Parameter description

    Parameter

    Type

    Description

    params

    char*

    Speech recognition effect parameters.

    Example:

    {
        "file_urls": [
            "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav"
        ],
        "async_request": false,
        "nls_config": {
            "model":"fun-asr",
            "diarization_enabled": false
        }
    }

    task_id

    char*

    Task ID. The SDK generates a random string. You receive the task_id after this method succeeds.

  • Return value description

    Returns an error code. For details, see Error code reference.
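
The params string for nui_file_trans_start can be assembled in the same way as any other C string. The sketch below, in plain C, builds a request with one file URL (a single request supports only one), the async_request flag, and nls_config.model. demo_build_start_params is a hypothetical helper, not an SDK function, and it assumes the inputs need no JSON escaping.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Writes the request JSON into out.
 * Returns 0 on success, -1 if an argument is NULL or the buffer is too small.
 * Inputs are not JSON-escaped, so they must not contain '"' or '\'. */
int demo_build_start_params(char *out, size_t out_size, const char *file_url,
                            bool async_request, const char *model)
{
    if (out == NULL || out_size == 0 || file_url == NULL || model == NULL) {
        return -1;
    }
    int n = snprintf(out, out_size,
        "{\"file_urls\":[\"%s\"],\"async_request\":%s,"
        "\"nls_config\":{\"model\":\"%s\"}}",
        file_url, async_request ? "true" : "false", model);
    /* A return value >= out_size means the output was truncated. */
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Passing true for async_request selects the asynchronous calling sequence, in which the task_id returned by nui_file_trans_start is later used with nui_file_trans_query.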

nui_file_trans_query

Queries the current status and result of an asynchronous task. After a successful call, the result is delivered to the onFileTransEventCallback callback through the EVENT_FILE_TRANS_QUERY_RESULT event.

  • Method signature

    -(NuiResultCode) nui_file_trans_query:(const char *)task_id;
  • Parameter description

    Parameter

    Type

    Description

    task_id

    char*

    Task ID to query.

  • Return value description

    Returns an error code. For details, see Error code reference.

nui_file_trans_cancel

Immediately cancels the current task.

  • Method signature

    -(NuiResultCode) nui_file_trans_cancel:(const char *)task_id;
  • Parameter description

    Parameter

    Type

    Description

    task_id

    char*

    Task ID to cancel.

  • Return value description

    Returns an error code. For details, see Error code reference.

nui_release

Releases all internal SDK resources and forcibly terminates all ongoing tasks. After this method is called, the SDK instance becomes unavailable. To use the SDK again, call nui_initialize first.

  • Method signature

    -(NuiResultCode) nui_release;
  • Return value description

    Returns an error code. For details, see Error code reference.

nui_get_version

Gets the current SDK version information.

  • Method signature

    -(const char*) nui_get_version;
  • Return value description

    Current SDK version information.

NeoNuiSdkDelegate: Callback listeners

onFileTransEventCallback: Listen for events and speech recognition results

  • Method signature

    -(void) onFileTransEventCallback:(NuiCallbackEvent)nuiEvent
                           asrResult:(const char *)asr_result
                              taskId:(const char *)task_id
                            ifFinish:(BOOL)finish
                             retCode:(int)code;
  • Parameter description

    | Parameter | Type | Description |
    | --- | --- | --- |
    | nuiEvent | NuiCallbackEvent | Callback event. |
    | asr_result | char* | Speech recognition result. |
    | task_id | char* | Task ID. |
    | finish | BOOL | Flag indicating whether the current recognition round has finished. |
    | code | int | Error code. Valid only for the EVENT_ASR_ERROR event. For details, see Error code reference. |

onFileTransLogTrackCallback: Listen for trace logs

This callback receives detailed internal SDK logs to help with debugging and issue diagnosis.

-(void)onFileTransLogTrackCallback:(NuiSdkLogLevel)level
                        logMessage:(const char *)log;

NuiCallbackEvent: Event types

| Event | Description |
| --- | --- |
| EVENT_FILE_TRANS_CONNECTED | Successfully connected to the service. |
| EVENT_FILE_TRANS_UPLOADED | Successfully uploaded the audio file for recognition. |
| EVENT_FILE_TRANS_QUERY_RESULT | Query task result. |
| EVENT_FILE_TRANS_RESULT | Final recognition result. |
| EVENT_ASR_ERROR | An error occurred during speech recognition. |