Fun-ASR audio file transcription iOS SDK

This document provides a detailed guide for using the Fun-ASR audio file transcription iOS SDK to convert speech into text.

User guide: Non-real-time speech recognition. For supported audio formats, file size limits, duration limits, and other input requirements, see Audio specifications.

Getting started

Get an API key: see Get an API key and API host.
Download the SDK and run the sample code:
- Download the latest SDK bundle.
- Unzip the ZIP package and add nuisdk.framework to your project.
- In Build Phases → Link Binary With Libraries, add nuisdk.framework.
- In General → Frameworks, Libraries, and Embedded Content, set nuisdk.framework to Embed & Sign.
- Open the sample project in Xcode. The sample code is in DashFunAsrFileTranscriberViewController.m. Replace the API key and try the feature.

Calling steps

Synchronous mode

Initialize the SDK.
Configure the required parameters as needed.
Call nui_file_trans_start to start a recognition task (set async_request to false).
In the onFileTransEventCallback callback, listen for the EVENT_FILE_TRANS_RESULT event to obtain the final recognition result.
Call nui_release to release the SDK resources.

Asynchronous mode

Initialize the SDK.
Configure the required parameters as needed.
Call nui_file_trans_start to start a recognition task (set async_request to true).
Call nui_file_trans_query to actively query the recognition progress or results.
In the onFileTransEventCallback callback, listen for the EVENT_FILE_TRANS_QUERY_RESULT event to obtain the current query result.
In the onFileTransEventCallback callback, listen for the EVENT_FILE_TRANS_RESULT event to obtain the final recognition result.
Call nui_release to release the SDK resources.

Request parameters

Connection and control parameters

Configure these parameters by passing a JSON string as the parameters argument of nui_initialize.

Parameter example: The following shows a sample JSON string. Not all parameters are listed. Add any missing parameters as needed:

{
    "url": "wss://dashscope.aliyuncs.com/api-ws/v1/inference",
    "apikey": "st-****",
    "device_id": "my_device_id",
    "service_mode": "1"
}
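Because nui_initialize takes this configuration as a `const char *`, the string can be assembled in plain C, which also compiles unchanged inside an Objective-C `.m` file. The following is a minimal sketch, not part of the SDK: the helper name `build_init_params` is hypothetical, and it assumes the API key and device ID contain no characters that need JSON escaping (quotes, backslashes, control characters).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not an SDK function): writes the JSON string for the
 * `parameters` argument of nui_initialize into `out`.
 * Returns 0 on success, or -1 if `out` is too small to hold the result.
 * Assumption: api_key and device_id need no JSON escaping. */
int build_init_params(char *out, size_t out_size,
                      const char *api_key, const char *device_id)
{
    assert(out != NULL && api_key != NULL && device_id != NULL);

    int written = snprintf(
        out, out_size,
        "{\"url\":\"wss://dashscope.aliyuncs.com/api-ws/v1/inference\","
        "\"apikey\":\"%s\","
        "\"device_id\":\"%s\","
        "\"service_mode\":\"1\"}",
        api_key, device_id);

    /* snprintf returns the length it wanted to write; a value >= out_size
     * means the output was truncated. */
    if (written < 0 || (size_t)written >= out_size) {
        return -1;
    }
    return 0;
}
```

On success, the buffer holds a string that can be passed as the parameters argument of nui_initialize. For values that may contain special characters, a JSON library such as Foundation's NSJSONSerialization is the safer choice.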

Parameter description

Parameter	Type	Required	Description
`url`	`String`	Yes	Service endpoint. Fixed as `wss://dashscope.aliyuncs.com/api-ws/v1/inference`.
`apikey`	`String`	Yes	API key.
`service_mode`	`String`	Yes	Operation mode. For audio file transcription, this must be `"1"`.
`device_id`	`String`	Yes	A unique string that identifies the end user. Set this to the user ID within your app or a device-specific identifier generated by the client. This ID helps with log tracing and troubleshooting.
`debug_path`	`String`	No	Path where log files are stored. This parameter takes effect only if you set `save_log` to `YES` when calling nui_initialize. In that case, you must specify a log file path. Otherwise, an error occurs. The system keeps at most two local log files.
`max_log_file_size`	`int`	No	Maximum size (in bytes) for a log file. This parameter takes effect only if you set `save_log` to `YES` when calling nui_initialize. Default value: 104857600 (100 × 1024 × 1024 bytes, or 100 MiB).
`log_track_level`	`int`	No	Controls the filter level for logs sent externally through the log callback (onFileTransLogTrackCallback). Default value: 2. Valid values: 0: LOG_LEVEL_VERBOSE 1: LOG_LEVEL_DEBUG 2: LOG_LEVEL_INFO 3: LOG_LEVEL_WARNING 4: LOG_LEVEL_ERROR 5: LOG_LEVEL_NONE (disables this feature) Note: `log_track_level` and `level` (set via the nui_initialize interface) jointly determine which logs are finally sent in the callback. A log message appears in the callback only if its level is greater than or equal to both `log_track_level` and `level`. For example, if `log_track_level` is set to 2 (INFO) and `level` is set to 3 (WARNING), only WARNING-level logs (value ≥ 3) and higher appear in the callback.
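The joint effect of `log_track_level` and `level` can be written as a small predicate. This is only an illustration of the rule in the table above, not SDK code: the function and constant names are hypothetical, and it assumes that `level` uses the same 0 to 5 numeric scale as `log_track_level`, as the WARNING example suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Numeric levels as listed for log_track_level (hypothetical local names). */
enum {
    DEMO_LOG_VERBOSE = 0,
    DEMO_LOG_DEBUG   = 1,
    DEMO_LOG_INFO    = 2,
    DEMO_LOG_WARNING = 3,
    DEMO_LOG_ERROR   = 4,
    DEMO_LOG_NONE    = 5   /* disables the log callback */
};

/* Returns true if a message of `message_level` would appear in the log
 * callback, given both configured thresholds. */
bool demo_log_reaches_callback(int message_level,
                               int log_track_level,
                               int sdk_level)
{
    assert(message_level >= DEMO_LOG_VERBOSE && message_level <= DEMO_LOG_ERROR);

    if (log_track_level >= DEMO_LOG_NONE) {
        return false;  /* feature disabled */
    }
    /* The message must pass both filters. */
    return message_level >= log_track_level && message_level >= sdk_level;
}
```

With `log_track_level` set to 2 and `level` set to 3, an INFO message is filtered out while WARNING and ERROR messages pass, which matches the example in the table.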

Speech recognition effect parameters

You can use the nui_set_params interface to configure the nls_config parameter, or use the nui_file_trans_start interface to configure all speech recognition effect parameters.

Parameter example: The following shows a sample JSON string. Not all parameters are listed. Add any missing parameters as needed:

{
    "file_urls": [
        "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav"
    ],
    "async_request": false,
    "nls_config": {
        "model":"fun-asr",
        "diarization_enabled": false,
        "parameters": {
            "speech_noise_threshold": 0.0
        }
    }
}

Parameter description

Parameter	Type	Required	Description
`file_urls`	`array[string]`	Yes	List of URLs of the audio or video files to transcribe. HTTP and HTTPS are supported. A single request supports only one URL. If your audio files are stored in OSS, note that the SDK does not support temporary URLs that start with the oss:// prefix. Supported formats: `aac`, `amr`, `avi`, `flac`, `flv`, `m4a`, `mkv`, `mov`, `mp3`, `mp4`, `mpeg`, `ogg`, `opus`, `wav`, `webm`, `wma`, `wmv`. Important: Because audio and video formats have many variants, not all of them can be tested, and correct recognition cannot be guaranteed for every format. Test your files to confirm that they produce normal recognition results. Audio sampling rate: any. File size and duration: a file must not exceed 2 GB and must be under 12 hours long. If a file exceeds these limits, preprocess it to reduce its size. For best practices on file preprocessing, see Preprocess video files to improve transcription efficiency (for audio file transcription scenarios).
`async_request`	`boolean`	No	Whether the speech recognition request is asynchronous. Default value: `false`. Valid values: `true` (asynchronous request), `false` (synchronous request).
`apikey`	`string`	No	If the `apikey` in Connection and control parameters uses a temporary API key, update it here to prevent expiration.
`nls_config`	`object`	Yes	Core configuration object for speech recognition. Contains key parameters such as model selection and recognition effect controls.
`nls_config.model`	`string`	Yes	Speech recognition model.
`nls_config.special_word_filter`	`object`	No	Specifies sensitive words to handle during speech recognition and supports a different handling method for each word. If you omit this parameter, the system uses its built-in sensitive word filtering logic: words matching the Alibaba Cloud Model Studio sensitive word list are replaced with asterisks (`*`) of equal length. If you provide this parameter, you can apply the following handling strategies: Replace with `*`: matched sensitive words are replaced with the same number of `*` characters. Filter out completely: matched sensitive words are removed entirely from the recognition result. The value must be a JSON object with the following structure: `{ "filter_with_signed": { "word_list": ["test"] }, "filter_with_empty": { "word_list": ["start", "occur"] }, "system_reserved_filter": true }` JSON field descriptions: `filter_with_signed` Type: Object. Required: No. Description: The list of sensitive words to replace with `*`. Matched words in the recognition result are replaced by `*` characters of the same length. Example: With the JSON above, the phrase “help me test this code” becomes “help me **** this code”. Inner fields: `word_list`: Array of strings listing the sensitive words to replace. `filter_with_empty` Type: Object. Required: No. Description: The list of sensitive words to remove from the recognition result. Matched words are deleted entirely. Example: With the JSON above, the phrase “is the match about to start?” becomes “is the match about to?”. Inner fields: `word_list`: Array of strings listing the sensitive words to remove. `system_reserved_filter` Type: Boolean. Required: No. Default value: true. Description: Whether to enable the system’s preset sensitive word rules. If set to `true`, the system also applies its built-in sensitive word filtering logic: words matching the Alibaba Cloud Model Studio sensitive word list are replaced with asterisks (`*`) of equal length.
`nls_config.channel_id`	`array[integer]`	No	Indexes of the sound channels to recognize in a multi-channel audio file, starting from 0. For example, [0] recognizes the first channel, and [0, 1] recognizes the first and second channels. Default value: `[0]` (only the first channel is processed). Important: Each specified sound channel is billed separately. For example, a request for [0, 1] on a single file incurs two separate charges.
`nls_config.diarization_enabled`	`boolean`	No	Whether to enable automatic speaker diarization. Disabled by default. This feature applies to single-channel audio only; multi-channel audio is not supported. When enabled, recognition results include the `speaker_id` field to distinguish speakers. For an example of `speaker_id`, see Recognition result description. Note: If speaker diarization is enabled, keep the audio duration under 2 hours. Audio exceeding 2 hours may cause recognition failures or timeouts.
`nls_config.speaker_count`	`integer`	No	Reference value for the number of speakers. To use this feature, set `diarization_enabled` to `true`. By default, the system determines the number of speakers automatically. If you set this parameter, it only guides the algorithm toward the specified number and does not guarantee that exact count. Valid range: `[2, 100]`. Because this feature distinguishes multiple speakers, the minimum value is 2.
`nls_config.vocabulary_id`	`string`	No	ID of a hotword vocabulary list to improve recognition accuracy for specific terms. This parameter applies to v2 and later models. For instructions on using hotwords, see Customize hotwords.
`nls_config.language_hints`	`array[string]`	No	The language code for recognition. If the source language is unknown, leave this unset and the model detects the language automatically. The system reads only the first value in the array; any extra values are ignored. Supported language codes for fun-asr, fun-asr-2025-11-07, fun-asr-mtl, and fun-asr-mtl-2025-08-25: zh (Chinese), en (English), ja (Japanese), ko (Korean), vi (Vietnamese), th (Thai), id (Indonesian), ms (Malay), tl (Filipino), hi (Hindi), ar (Arabic), fr (French), de (German), es (Spanish), pt (Portuguese), ru (Russian), it (Italian), nl (Dutch), sv (Swedish), da (Danish), fi (Finnish), no (Norwegian), el (Greek), pl (Polish), cs (Czech), hu (Hungarian), ro (Romanian), bg (Bulgarian), hr (Croatian), sk (Slovak). Supported language codes for fun-asr-2025-08-25: zh (Chinese), en (English).
`nls_config.parameters`	`object`	No	Configures additional parameters as a JSON object.
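Some of the constraints in the table above can be checked on the client before a request is sent. The following sketch is illustrative only: the function names are hypothetical, the scheme comparison is case-sensitive for simplicity, and a `speaker_count` of 0 is used here to mean "not set", which is a convention of this example rather than of the SDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `url` starts with http:// or https://.
 * URLs with the oss:// prefix are rejected, since the table above
 * states that the SDK does not support them. */
bool demo_is_supported_file_url(const char *url)
{
    assert(url != NULL);
    return strncmp(url, "http://", 7) == 0 ||
           strncmp(url, "https://", 8) == 0;
}

/* Checks speaker_count against the documented rules:
 * it requires diarization_enabled == true and a value in [2, 100].
 * speaker_count == 0 means "not set" in this sketch. */
bool demo_is_valid_speaker_config(bool diarization_enabled, int speaker_count)
{
    if (speaker_count == 0) {
        return true;  /* the system determines the speaker count itself */
    }
    return diarization_enabled && speaker_count >= 2 && speaker_count <= 100;
}
```

Checks like these only catch obvious mistakes early; the service remains the authority on whether a request is accepted.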

Key interfaces

NeoNui

nui_initialize

Initializes the speech recognition SDK instance. The SDK is implemented as a singleton and must not be re-initialized before calling nui_release.

Method signature

-(NuiResultCode) nui_initialize:(const char *)parameters
                       logLevel:(NuiSdkLogLevel)level
                        saveLog:(BOOL)save_log;

Parameter description

Parameter	Type	Description
`parameters`	`char*`	JSON string containing authentication, connection, and debug parameters. See Connection and control parameters.
`level`	`NuiSdkLogLevel`	Controls the logging level for SDK internal logs.
`save_log`	`BOOL`	Whether to save local logs. If set to `YES`, specify a path using `debug_path` in Connection and control parameters, and optionally set the file size using `max_log_file_size`.

Return value description

Returns an error code. For details, see Error code reference.

nui_set_params

Sets or updates the nls_config parameter independently. If you provide all parameters at once in nui_file_trans_start, you do not need to call this method.

Method signature

-(NuiResultCode) nui_set_params:(const char *)params;

Parameter description

Parameter	Type	Description
`params`	`char*`	The nls_config parameter from Speech recognition effect parameters. Parameters outside nls_config cannot be set using this method.

Example:

{
    "nls_config": {
        "model":"fun-asr",
        "diarization_enabled": false
    }
}

Return value description

Returns an error code. For details, see Error code reference.

nui_file_trans_start

Starts transcription.

Method signature

-(NuiResultCode) nui_file_trans_start(const char *params, char *task_id);

Parameter description

Parameter	Type	Description
`params`	`char*`	Speech recognition effect parameters. See the example below.
`task_id`	`char*`	Task ID. The SDK generates a random string. You receive the task_id after this method succeeds.

Example value for params:

{
    "file_urls": [
        "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav"
    ],
    "async_request": false,
    "nls_config": {
        "model":"fun-asr",
        "diarization_enabled": false
    }
}

Return value description

Returns an error code. For details, see Error code reference.

nui_file_trans_query

Queries the current status and result of an asynchronous task. After a successful call, the result is returned in the onFileTransEventCallback callback through the EVENT_FILE_TRANS_QUERY_RESULT event.

Method signature

-(NuiResultCode) nui_file_trans_query:(const char *)task_id;

Parameter description

Parameter	Type	Description
`task_id`	`char*`	Task ID to query.

Return value description

Returns an error code. For details, see Error code reference.

nui_file_trans_cancel

Immediately cancels the current task.

Method signature

-(NuiResultCode) nui_file_trans_cancel:(const char *)task_id;

Parameter description

Parameter	Type	Description
`task_id`	`char*`	Task ID to cancel.

Return value description

Returns an error code. For details, see Error code reference.

nui_release

Releases all internal SDK resources and forcibly terminates all ongoing tasks. After this method is called, the SDK instance becomes unavailable. To use the SDK again, you must call nui_initialize to initialize it again.

Method signature
```
-(NuiResultCode) nui_release;
```
Return value description

Returns an error code. For details, see Error code reference.

nui_get_version

Gets the current SDK version information.

Method signature
```
-(const char*) nui_get_version;
```
Return value description

Current SDK version information.

NeoNuiSdkDelegate: Callback listeners

onFileTransEventCallback: Listen for events and speech recognition results

Method signature

-(void) onFileTransEventCallback:(NuiCallbackEvent)nuiEvent
                       asrResult:(const char *)asr_result
                          taskId:(const char *)task_id
                        ifFinish:(BOOL)finish
                         retCode:(int)code;

Parameter description

Parameter	Type	Description
`nuiEvent`	`NuiCallbackEvent`	Callback event.
`asr_result`	`char*`	Speech recognition result.
`task_id`	`char*`	Task ID.
`finish`	`BOOL`	Flag indicating whether the current recognition round has finished.
`code`	`int`	Error code. Valid only for the EVENT_ASR_ERROR event. For details, see Error code reference.

onFileTransLogTrackCallback: Listen for trace logs

This callback receives detailed internal SDK logs to help with debugging and issue diagnosis.

-(void)onFileTransLogTrackCallback:(NuiSdkLogLevel)level
                        logMessage:(const char *)log;

NuiCallbackEvent: Event types

Event	Description
EVENT_FILE_TRANS_CONNECTED	Successfully connected to the service.
EVENT_FILE_TRANS_UPLOADED	Successfully uploaded the audio file for recognition.
EVENT_FILE_TRANS_QUERY_RESULT	Query task result.
EVENT_FILE_TRANS_RESULT	Final recognition result.
EVENT_ASR_ERROR	An error occurred during speech recognition.
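One way to organize the body of onFileTransEventCallback is to map each event to the action the app should take. The sketch below is a standalone illustration in plain C: `DemoEvent` and `DemoAction` are hypothetical stand-ins defined only for this example, and a real app would switch on the SDK's own NuiCallbackEvent constants instead.

```c
#include <stdbool.h>

/* Hypothetical stand-ins that mirror the events in the table above. */
typedef enum {
    DEMO_EVENT_CONNECTED,     /* EVENT_FILE_TRANS_CONNECTED */
    DEMO_EVENT_UPLOADED,      /* EVENT_FILE_TRANS_UPLOADED */
    DEMO_EVENT_QUERY_RESULT,  /* EVENT_FILE_TRANS_QUERY_RESULT */
    DEMO_EVENT_RESULT,        /* EVENT_FILE_TRANS_RESULT */
    DEMO_EVENT_ERROR          /* EVENT_ASR_ERROR */
} DemoEvent;

typedef enum {
    DEMO_ACTION_WAIT,           /* nothing to read yet */
    DEMO_ACTION_READ_PROGRESS,  /* asr_result holds the current query result */
    DEMO_ACTION_READ_FINAL,     /* asr_result holds the final recognition result */
    DEMO_ACTION_REPORT_ERROR    /* the retCode argument is valid for this event */
} DemoAction;

/* Maps a callback event to the action this sketch suggests for it. */
DemoAction demo_action_for_event(DemoEvent event)
{
    switch (event) {
    case DEMO_EVENT_QUERY_RESULT:
        return DEMO_ACTION_READ_PROGRESS;
    case DEMO_EVENT_RESULT:
        return DEMO_ACTION_READ_FINAL;
    case DEMO_EVENT_ERROR:
        return DEMO_ACTION_REPORT_ERROR;
    case DEMO_EVENT_CONNECTED:
    case DEMO_EVENT_UPLOADED:
    default:
        return DEMO_ACTION_WAIT;
    }
}

/* True for the events that carry a recognition result in asr_result. */
bool demo_event_has_result(DemoEvent event)
{
    return event == DEMO_EVENT_QUERY_RESULT || event == DEMO_EVENT_RESULT;
}
```

Keeping the mapping in one function makes it easy to handle synchronous and asynchronous mode with the same callback, since asynchronous mode only adds the query-result event.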