This document provides a detailed guide for using the Fun-ASR audio file transcription iOS SDK to convert speech into text.
User guide: Non-real-time speech recognition. For supported audio formats, file size limits, duration limits, and other input requirements, see Audio specifications.
Getting started
- Get an API key: see Get an API key and API host.
- Download the SDK and run the sample code:
  - Unzip the ZIP package and add nuisdk.framework to your project.
  - In Build Phases → Link Binary With Libraries, add nuisdk.framework.
  - In General → Frameworks, Libraries, and Embedded Content, set nuisdk.framework to Embed & Sign.
  - Open the sample project in Xcode. The sample code is in DashFunAsrFileTranscriberViewController.m. Replace the API key with your own, then run the sample.
Calling steps
Synchronous mode
- Initialize the SDK.
- Configure the required parameters as needed.
- Call nui_file_trans_start to start a recognition task (set async_request to false).
- In the onFileTransEventCallback interface, listen for the EVENT_FILE_TRANS_RESULT event to obtain the final recognition result.
- Call nui_release to release the SDK resources.
Asynchronous mode
- Initialize the SDK.
- Configure the required parameters as needed.
- Call nui_file_trans_start to start the recognition task (set async_request to true).
- Call nui_file_trans_query to actively query the recognition progress or results.
- In the onFileTransEventCallback interface, listen for the EVENT_FILE_TRANS_QUERY_RESULT event to obtain the current query result.
- In the onFileTransEventCallback interface, listen for the EVENT_FILE_TRANS_RESULT event to obtain the final recognition result.
- Call nui_release to release the SDK resources.
Request parameters
Connection and control parameters
Configure these parameters by passing a JSON string as the parameters argument of the nui_initialize interface.
- Parameter example: The following is a sample JSON string. Not all parameters are shown; add others as needed:
{ "url": "wss://dashscope.aliyuncs.com/api-ws/v1/inference", "apikey": "st-****", "device_id": "my_device_id", "service_mode": "1" }
- Parameter description

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | String | Yes | Service endpoint. Fixed as wss://dashscope.aliyuncs.com/api-ws/v1/inference. |
| apikey | String | Yes | API key. |
| service_mode | String | Yes | Operation mode. For audio file transcription, this must be "1". |
| device_id | String | Yes | A unique string that identifies the end user. Set this to the user ID within your app or a device-specific identifier generated by the client. This ID helps with log tracing and troubleshooting. |
| debug_path | String | No | Path where log files are stored. This parameter takes effect only if you set save_log to YES when calling nui_initialize. In that case, you must specify a log file path; otherwise, an error occurs. The system keeps at most two local log files. |
| max_log_file_size | int | No | Maximum size of a log file, in bytes. This parameter takes effect only if you set save_log to YES when calling nui_initialize. Default value: 104857600 (100 × 1024 × 1024 bytes, or 100 MiB). |
| log_track_level | int | No | Filter level for logs sent externally through the log callback (onFileTransLogTrackCallback). Default value: 2. Valid values: 0 (LOG_LEVEL_VERBOSE), 1 (LOG_LEVEL_DEBUG), 2 (LOG_LEVEL_INFO), 3 (LOG_LEVEL_WARNING), 4 (LOG_LEVEL_ERROR), 5 (LOG_LEVEL_NONE, disables this feature). |

Note: log_track_level and level (set via the nui_initialize interface) jointly determine which logs are sent in the callback. A log message appears in the callback only if its level is greater than or equal to both log_track_level and level. For example, if log_track_level is set to 2 (INFO) and level is set to 3 (WARNING), only logs at WARNING level or higher (value ≥ 3) appear in the callback.
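The filtering rule in the note can be written as a small predicate. This is a minimal sketch for reasoning about which messages reach the callback, not SDK code: the function log_reaches_callback and the EX_-prefixed constants are hypothetical local names, and only the numeric values are taken from the table above.

```c
#include <stdbool.h>

/* Local copies of the level values listed for log_track_level.
 * The EX_ prefix marks them as illustration-only names. */
enum {
    EX_LOG_LEVEL_VERBOSE = 0,
    EX_LOG_LEVEL_DEBUG   = 1,
    EX_LOG_LEVEL_INFO    = 2,
    EX_LOG_LEVEL_WARNING = 3,
    EX_LOG_LEVEL_ERROR   = 4,
    EX_LOG_LEVEL_NONE    = 5
};

/* Hypothetical helper (not an SDK function): returns true when a message at
 * msg_level passes both the log_track_level filter and the level filter. */
static bool log_reaches_callback(int msg_level, int log_track_level, int sdk_level)
{
    /* LOG_LEVEL_NONE disables the log callback feature. */
    if (log_track_level == EX_LOG_LEVEL_NONE) {
        return false;
    }
    /* The message must be at or above both thresholds. */
    return msg_level >= log_track_level && msg_level >= sdk_level;
}
```

With log_track_level set to 2 and level set to 3, the predicate rejects an INFO message and accepts WARNING and ERROR messages, which matches the example in the note.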
Speech recognition effect parameters
You can use the nui_set_params interface to configure the nls_config parameter, or use the nui_file_trans_start interface to configure all speech recognition effect parameters.
- Parameter example: The following is a sample JSON string. Not all parameters are shown; add others as needed:
{ "file_urls": [ "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav" ], "async_request": false, "nls_config": { "model":"fun-asr", "diarization_enabled": false, "parameters": { "speech_noise_threshold": 0.0 } } }
- Parameter description

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| file_urls | array[string] | Yes | List of URLs of the audio or video files to transcribe. HTTP and HTTPS are supported. A single request supports only one URL. If your audio files are stored in OSS, the SDK does not support temporary URLs that start with the oss:// prefix. For format, sampling rate, size, and duration requirements, see "file_urls input requirements" after this table. |
| async_request | boolean | No | Whether the speech recognition request is asynchronous. Default value: false. Valid values: true (asynchronous request), false (synchronous request). |
| apikey | string | No | If the apikey in Connection and control parameters is a temporary API key, pass a current key here so that the request does not fail because the key has expired. |
| nls_config | object | Yes | Core configuration object for speech recognition. Contains key parameters such as model selection and recognition effect controls. |
| nls_config.model | string | Yes | Speech recognition model. |
| nls_config.special_word_filter | object | No | Sensitive words to handle during speech recognition, with a separate handling method for each word list. See "nls_config.special_word_filter details" after this table. |
| nls_config.channel_id | array[integer] | No | Indexes of the sound channels to recognize in a multi-channel audio file. Indexes start from 0. For example, [0] recognizes the first channel, and [0, 1] recognizes the first and second channels. Default value: [0] (the first channel). Important: each specified sound channel is billed separately. For example, a request for [0, 1] on a single file incurs two separate charges. |
| nls_config.diarization_enabled | boolean | No | Whether to enable automatic speaker diarization. Disabled by default. This feature applies to single-channel audio only; multi-channel audio is not supported. When enabled, recognition results include the speaker_id field to distinguish speakers. For an example of speaker_id, see Recognition result description. Note: if speaker diarization is enabled, keep the audio duration under 2 hours. Longer audio may cause recognition failures or timeouts. |
| nls_config.speaker_count | integer | No | Reference value for the number of speakers. To use this parameter, set diarization_enabled to true. By default, the system determines the number of speakers automatically. If you set this parameter, it only guides the algorithm toward the specified number; it does not guarantee that exact number. Valid range: [2, 100]. Because this feature distinguishes multiple speakers, the minimum value is 2. |
| nls_config.vocabulary_id | string | No | ID of a hotword vocabulary list that improves recognition accuracy for specific terms. This parameter applies to v2 and later models. For instructions on using hotwords, see Customize hotwords. |
| nls_config.language_hints | array[string] | No | The language code for recognition. If the source language is unknown, leave this unset and the model detects the language automatically. The system reads only the first value in the array; any extra values are ignored. |
| nls_config.parameters | object | No | Additional parameters, as a JSON object. |

file_urls input requirements
- Audio formats: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv
  Important: Because audio and video formats and their variants are numerous, they cannot all be tested, and the API cannot guarantee correct recognition for every format. Test your files to confirm that they produce normal speech recognition results.
- Audio sampling rate: Any
- Audio file size and duration: Files must not exceed 2 GB and must be under 12 hours long. If your files exceed these limits, preprocess them to reduce their size. For best practices on file preprocessing, see Preprocess video files to improve transcription efficiency (for audio file transcription scenarios).

nls_config.special_word_filter details
If you omit this parameter, the system uses its built-in sensitive word filtering logic: words that match the Alibaba Cloud Model Studio sensitive word list are replaced with asterisks (*) of equal length.
If you provide this parameter, you can apply the following sensitive word handling strategies:
- Replace with *: Matched sensitive words are replaced with the same number of * characters.
- Filter out completely: Matched sensitive words are removed entirely from the recognition result.
The value must be a JSON object with the following structure:
{ "filter_with_signed": { "word_list": ["test"] }, "filter_with_empty": { "word_list": ["start", "occur"] }, "system_reserved_filter": true }
JSON field descriptions:
- filter_with_signed
  - Type: Object.
  - Required: No.
  - Description: The list of sensitive words to replace with *. Matched words in the recognition result are replaced by * characters of the same length.
  - Example: With the JSON above, the phrase “help me test this code” becomes “help me **** this code”.
  - Inner fields:
    - word_list: Array of strings listing the sensitive words to replace.
- filter_with_empty
  - Type: Object.
  - Required: No.
  - Description: The list of sensitive words to remove from the recognition result. Matched words are deleted entirely.
  - Example: With the JSON above, the phrase “is the match about to start?” becomes “is the match about to?”.
  - Inner fields:
    - word_list: Array of strings listing the sensitive words to remove completely.
- system_reserved_filter
  - Type: Boolean.
  - Required: No.
  - Default value: true.
  - Description: Whether to enable the system's preset sensitive word rules. If set to true, the system also applies its built-in sensitive word filtering logic: words that match the Alibaba Cloud Model Studio sensitive word list are replaced with asterisks (*) of equal length.
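To make the two special_word_filter strategies concrete, the following sketch applies them to a local string. It only imitates the documented effect on the client side. The service performs the real filtering, and its matching rules (word boundaries, case, whitespace, multi-byte characters) are not described in this guide, so plain byte-wise substring matching is an assumption here. mask_word and remove_word are hypothetical helper names.

```c
#include <string.h>

/* Imitates filter_with_signed: overwrite every occurrence of word in text
 * with '*' characters of the same byte length, in place. */
static void mask_word(char *text, const char *word)
{
    size_t len = strlen(word);
    if (len == 0) {
        return;
    }
    char *p = text;
    while ((p = strstr(p, word)) != NULL) {
        memset(p, '*', len);
        p += len;               /* continue after the masked span */
    }
}

/* Imitates filter_with_empty: delete every occurrence of word from text,
 * in place. Surrounding whitespace is left untouched. */
static void remove_word(char *text, const char *word)
{
    size_t len = strlen(word);
    if (len == 0) {
        return;
    }
    char *p = text;
    while ((p = strstr(p, word)) != NULL) {
        /* Shift the tail (including the terminating NUL) over the match. */
        memmove(p, p + len, strlen(p + len) + 1);
    }
}
```

Calling mask_word on a buffer holding "help me test this code" with the word "test" leaves "help me **** this code" in the buffer. Because remove_word does not touch whitespace, removing "start" from "is the match about to start?" leaves a space before the question mark.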
Key interfaces
NeoNui
nui_initialize
Initializes the speech recognition SDK instance. The SDK is implemented as a singleton and must not be initialized again until nui_release has been called.
- Method signature
-(NuiResultCode) nui_initialize:(const char *)parameters logLevel:(NuiSdkLogLevel)level saveLog:(BOOL)save_log;
- Parameter description

| Parameter | Type | Description |
| --- | --- | --- |
| parameters | char* | JSON string containing authentication, connection, and debug parameters. See Connection and control parameters. |
| level | NuiSdkLogLevel | Controls the logging level for SDK internal logs. |
| save_log | BOOL | Whether to save local logs. If set to YES, specify a path using debug_path in Connection and control parameters, and optionally set the file size using max_log_file_size. |

- Return value description
Returns an error code. For details, see Error code reference.
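The parameters argument is a plain C string, so it can be assembled with standard C before the call. The sketch below builds the four required fields from Connection and control parameters. build_init_params is a hypothetical helper, not part of the SDK, and it assumes that the API key and device ID contain no characters that need JSON escaping.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not an SDK function): writes the initialization JSON
 * into out. Returns 0 on success, or -1 if formatting fails or the buffer is
 * too small. Assumes apikey and device_id need no JSON escaping (no quotes,
 * backslashes, or control characters). */
static int build_init_params(char *out, size_t out_size,
                             const char *apikey, const char *device_id)
{
    int n = snprintf(out, out_size,
                     "{\"url\":\"wss://dashscope.aliyuncs.com/api-ws/v1/inference\","
                     "\"apikey\":\"%s\","
                     "\"device_id\":\"%s\","
                     "\"service_mode\":\"1\"}",
                     apikey, device_id);
    /* snprintf returns the length it wanted to write; a value that does not
     * fit in out_size means the output was truncated. */
    if (n < 0 || (size_t)n >= out_size) {
        return -1;
    }
    return 0;
}
```

On success, the buffer holds a string in the same shape as the sample JSON in Connection and control parameters and can be passed as the parameters argument. Optional fields such as debug_path would be appended in the same way.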
nui_set_params
Sets or updates the nls_config parameter independently. If you pass all parameters at once in nui_file_trans_start, you do not need to call this method.
- Method signature
-(NuiResultCode) nui_set_params:(const char *)params;
- Parameter description

| Parameter | Type | Description |
| --- | --- | --- |
| params | char* | The nls_config parameter from Speech recognition effect parameters. Parameters outside nls_config cannot be set using this method. Example: { "nls_config": { "model":"fun-asr", "diarization_enabled": false } } |

- Return value description
Returns an error code. For details, see Error code reference.
nui_file_trans_start
Starts transcription.
- Method signature
-(NuiResultCode) nui_file_trans_start(const char *params, char *task_id);
- Parameter description

| Parameter | Type | Description |
| --- | --- | --- |
| params | char* | Speech recognition effect parameters. Example: { "file_urls": [ "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav" ], "async_request": false, "nls_config": { "model":"fun-asr", "diarization_enabled": false } } |
| task_id | char* | Task ID. The SDK generates it as a random string and returns it through this parameter after the method succeeds. |

- Return value description
Returns an error code. For details, see Error code reference.
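Before starting a task, a client may want to check the documented input limits locally instead of waiting for the service to reject the request. The sketch below encodes the limits from Speech recognition effect parameters: file size, the 12-hour duration limit, the 2-hour recommendation when diarization is enabled, and the speaker_count range. All type and function names are hypothetical, and interpreting 2 GB as 2 × 1024³ bytes is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_MAX_FILE_BYTES      (2ULL * 1024 * 1024 * 1024) /* assumes 2 GB = 2 GiB */
#define EX_MAX_DURATION_SEC    (12 * 60 * 60)              /* must be under 12 hours */
#define EX_MAX_DIARIZATION_SEC (2 * 60 * 60)               /* recommended: under 2 hours */

/* Hypothetical description of one transcription request. */
typedef struct {
    uint64_t file_bytes;
    uint32_t duration_sec;
    bool     diarization_enabled;
    int      speaker_count;        /* 0 means "not set" */
} ex_request;

typedef enum {
    EX_REQ_OK,
    EX_REQ_TOO_LARGE,
    EX_REQ_TOO_LONG,
    EX_REQ_TOO_LONG_FOR_DIARIZATION,
    EX_REQ_BAD_SPEAKER_COUNT
} ex_req_check;

/* Returns the first documented limit that the request violates. */
static ex_req_check ex_check_request(const ex_request *r)
{
    if (r->file_bytes > EX_MAX_FILE_BYTES) {
        return EX_REQ_TOO_LARGE;
    }
    if (r->duration_sec >= EX_MAX_DURATION_SEC) {
        return EX_REQ_TOO_LONG;
    }
    if (r->diarization_enabled && r->duration_sec >= EX_MAX_DIARIZATION_SEC) {
        return EX_REQ_TOO_LONG_FOR_DIARIZATION;
    }
    /* speaker_count requires diarization and must lie in [2, 100]. */
    if (r->speaker_count != 0 &&
        (!r->diarization_enabled || r->speaker_count < 2 || r->speaker_count > 100)) {
        return EX_REQ_BAD_SPEAKER_COUNT;
    }
    return EX_REQ_OK;
}
```

The 2-hour diarization limit is a recommendation in this guide rather than a hard limit, so an application could treat EX_REQ_TOO_LONG_FOR_DIARIZATION as a warning instead of refusing to send the request.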
nui_file_trans_query
Queries the current status and result of an asynchronous task. After a successful call, the result is delivered to the onFileTransEventCallback callback through the EVENT_FILE_TRANS_QUERY_RESULT event.
- Method signature
-(NuiResultCode) nui_file_trans_query(const char *task_id);
- Parameter description

| Parameter | Type | Description |
| --- | --- | --- |
| task_id | char* | Task ID to query. |

- Return value description
Returns an error code. For details, see Error code reference.
nui_file_trans_cancel
Immediately cancels the current task.
- Method signature
-(NuiResultCode) nui_file_trans_cancel(const char *task_id);
- Parameter description

| Parameter | Type | Description |
| --- | --- | --- |
| task_id | char* | Task ID to cancel. |

- Return value description
Returns an error code. For details, see Error code reference.
nui_release
Releases all internal SDK resources and forcibly terminates all ongoing tasks. After this method is called, the SDK instance is no longer available. To use the SDK again, call nui_initialize first.
- Method signature
-(NuiResultCode) nui_release;
- Return value description
Returns an error code. For details, see Error code reference.
nui_get_version
Gets the current SDK version information.
- Method signature
-(const char*) nui_get_version;
- Return value description
Current SDK version information.
NeoNuiSdkDelegate: Callback listeners
onFileTransEventCallback: Listen for events and speech recognition results
- Method signature
-(void) onFileTransEventCallback:(NuiCallbackEvent)nuiEvent asrResult:(const char *)asr_result taskId:(const char *)task_id ifFinish:(BOOL)finish retCode:(int)code;
- Parameter description

| Parameter | Type | Description |
| --- | --- | --- |
| nuiEvent | NuiCallbackEvent | Callback event. |
| asr_result | char* | Speech recognition result. |
| task_id | char* | Task ID. |
| finish | BOOL | Flag indicating whether the current recognition round has finished. |
| code | int | Error code. Valid only for the EVENT_ASR_ERROR event. For details, see Error code reference. |
onFileTransLogTrackCallback: Listen for trace logs
This callback receives detailed internal SDK logs to help with debugging and issue diagnosis.
-(void)onFileTransLogTrackCallback:(NuiSdkLogLevel)level
logMessage:(const char *)log;
NuiCallbackEvent: Event types
| Event | Description |
| --- | --- |
| EVENT_FILE_TRANS_CONNECTED | Successfully connected to the service. |
| EVENT_FILE_TRANS_UPLOADED | Successfully uploaded the audio file for recognition. |
| EVENT_FILE_TRANS_QUERY_RESULT | Query task result. |
| EVENT_FILE_TRANS_RESULT | Final recognition result. |
| EVENT_ASR_ERROR | An error occurred during speech recognition. |