This document explains how to use the iOS NUI SDK from Alibaba Cloud Intelligent Speech Interaction, covering SDK download, installation, key APIs, and code examples.
The SDK does not support CocoaPods integration.
Prerequisites
Before using the SDK, read the API Reference.
Obtain an app key for your project. For details, see Create a project.
Obtain an access token. For details, see Obtain a token.
Download and install
Select and download a mobile SDK.
Important: After downloading the SDK, you must replace the placeholder Alibaba Cloud account information, app key, and token in the sample initialization code to run the demo.
For easier integration, starting from version 2.5.14, the iOS APIs use a pure Objective-C interface and no longer use a C++ hybrid interface.
Unzip the ZIP package.
Add the nuisdk.framework file from the ZIP package to your project. In your project's Build Phases, add nuisdk.framework to the Link Binary With Libraries section.
Open the project in Xcode. The project provides reference code and reusable utility classes for tasks such as audio playback, recording, and file operations, which you can copy directly into your project. The sample code for real-time speech transcription is in SpeechTranscriberViewController. You can run the sample after replacing the placeholder app key and token.
Key APIs
nui_initialize: Initializes the SDK.
/**
 * Initializes the SDK. The SDK is a singleton. Release the current instance before re-initializing. Do not call this on the UI thread, as it may cause blocking.
 * @param parameters: Initialization parameters. For more information, see the API Reference: https://help.aliyun.com/en/isi/developer-reference/overview-4
 * @param level: The log level. A smaller value produces more detailed logs.
 * @param save_log: Specifies whether to save logs to a file. The log file is stored in the directory specified by the debug_path field in the parameters. Note: Log files have no size limit. Be aware that continuous logging can fill up disk space.
 * @return A NuiResultCode value.
 */
-(NuiResultCode) nui_initialize:(const char *)parameters logLevel:(NuiSdkLogLevel)level saveLog:(BOOL)save_log;
nui_set_params: Sets SDK parameters in JSON format.
/**
 * Sets parameters in JSON format.
 * @param params: For parameter details, see the API Reference: https://help.aliyun.com/en/isi/developer-reference/overview-4
 * @return A NuiResultCode value.
 */
-(NuiResultCode) nui_set_params:(const char *)params;
nui_dialog_start: Starts recognition.
/**
 * Starts recognition.
 * @param vad_mode: Specifies the VAD mode. For recognition scenarios, use P2T.
 * @param dialog_params: Sets recognition parameters. This is optional. For details, see the API Reference: https://help.aliyun.com/en/isi/developer-reference/overview-4
 * @return A NuiResultCode value.
 */
-(NuiResultCode) nui_dialog_start:(NuiVadMode)vad_mode dialogParam:(const char *)dialog_params;
nui_dialog_cancel: Stops recognition.
/**
 * Stops recognition. After this API is called, the server returns the final recognition result and ends the task.
 * @param force: If true, forces the task to end immediately and discards the final result. If false, stops the task but waits for the final result to be returned.
 * @return A NuiResultCode value.
 */
-(NuiResultCode) nui_dialog_cancel:(BOOL)force;
nui_release: Releases the SDK.
/**
 * Releases SDK resources.
 * @return A NuiResultCode value.
 */
-(NuiResultCode) nui_release;
nui_get_version: Gets the current SDK version.
/**
 * Gets the current SDK version.
 * @return The SDK version as a string.
 */
-(const char*) nui_get_version;
nui_get_all_response: Gets all information for the current event callback.
/**
 * Gets all information for the current event callback.
 * @return All event information as a JSON string.
 */
-(const char*) nui_get_all_response;
NeoNuiSdkDelegate
onNuiEventCallback: SDK event callback.
/**
 * The main event callback for the SDK.
 * @param nuiEvent: The callback event. See the event list below.
 * @param dialog: The session ID. Currently not in use.
 * @param wuw: Used for the wake-up word feature (currently not supported).
 * @param asr_result: The speech recognition result.
 * @param finish: A flag indicating whether the recognition task is complete.
 * @param code: See error codes. This is valid when an EVENT_ASR_ERROR event occurs.
 */
-(void) onNuiEventCallback:(NuiCallbackEvent)nuiEvent dialog:(long)dialog kwsResult:(const char *)wuw asrResult:(const char *)asr_result ifFinish:(BOOL)finish retCode:(int)code;
NuiCallbackEvent event list:
Name                        Description
EVENT_VAD_START             Start of speech detected.
EVENT_VAD_END               End of speech detected.
EVENT_ASR_PARTIAL_RESULT    Intermediate speech recognition result.
EVENT_ASR_RESULT            Final speech recognition result.
EVENT_ASR_ERROR             An error occurred. Use the error code to determine the cause.
EVENT_MIC_ERROR             Recording error. This indicates that the SDK has not received any audio for 2 consecutive seconds. Verify that the recording system is working correctly.
EVENT_SENTENCE_START        Real-time speech recognition event. The start of a sentence is detected.
EVENT_SENTENCE_END          Real-time speech recognition event. The end of a sentence is detected, and the complete result for the sentence is returned.
EVENT_SENTENCE_SEMANTICS    Currently not in use.
EVENT_TRANSCRIBER_COMPLETE  Reported after speech recognition is stopped.
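The event list above can be turned into a small dispatch table so that the callback body stays short. The following is a minimal sketch in plain C, which also compiles inside an Objective-C source file. The DemoEvent and DemoAction enums and the demo_action_for_event function are hypothetical names introduced only for illustration. The real NuiCallbackEvent constants and their numeric values come from the SDK headers, and the mapping of EVENT_ASR_RESULT to a commit action is an assumption based on its description.

```c
/* Hypothetical mirror of the event names in the list above.
 * The real NuiCallbackEvent values are defined by the SDK headers. */
typedef enum {
    DEMO_EVENT_VAD_START,
    DEMO_EVENT_VAD_END,
    DEMO_EVENT_ASR_PARTIAL_RESULT,
    DEMO_EVENT_ASR_RESULT,
    DEMO_EVENT_ASR_ERROR,
    DEMO_EVENT_MIC_ERROR,
    DEMO_EVENT_SENTENCE_START,
    DEMO_EVENT_SENTENCE_END,
    DEMO_EVENT_SENTENCE_SEMANTICS,
    DEMO_EVENT_TRANSCRIBER_COMPLETE
} DemoEvent;

/* Hypothetical actions an app might take for each event. */
typedef enum {
    DEMO_ACTION_IGNORE,
    DEMO_ACTION_SHOW_PARTIAL,      /* update the in-progress text */
    DEMO_ACTION_COMMIT_SENTENCE,   /* store the finished sentence */
    DEMO_ACTION_REPORT_ERROR,      /* log error code and message */
    DEMO_ACTION_RESTART_RECORDER,  /* recorder delivered no audio */
    DEMO_ACTION_TASK_DONE          /* recognition has stopped */
} DemoAction;

/* Map an event to the action the app should take. */
static DemoAction demo_action_for_event(DemoEvent event)
{
    switch (event) {
    case DEMO_EVENT_ASR_PARTIAL_RESULT:  return DEMO_ACTION_SHOW_PARTIAL;
    case DEMO_EVENT_ASR_RESULT:          /* assumed: treated like a finished sentence */
    case DEMO_EVENT_SENTENCE_END:        return DEMO_ACTION_COMMIT_SENTENCE;
    case DEMO_EVENT_ASR_ERROR:           return DEMO_ACTION_REPORT_ERROR;
    case DEMO_EVENT_MIC_ERROR:           return DEMO_ACTION_RESTART_RECORDER;
    case DEMO_EVENT_TRANSCRIBER_COMPLETE:return DEMO_ACTION_TASK_DONE;
    default:                             return DEMO_ACTION_IGNORE;
    }
}
```

Keeping the decision in one function makes it easy to hand the actual work to another queue, which fits the guidance that other SDK APIs should not be called from inside the event callback.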
onNuiNeedAudioData: Provides audio data to the SDK.
/**
 * When recognition starts, this callback is invoked continuously. Your app needs to provide audio data in this callback.
 * @param audioData: A buffer to fill with audio data.
 * @param len: The requested number of bytes of audio data.
 * @return The number of bytes actually written to the buffer.
 */
-(int) onNuiNeedAudioData:(char *)audioData length:(int)len;
onNuiAudioStateChanged: Enables or disables the recording function based on the audio state.
/**
 * When APIs such as start, stop, or cancel are called, the SDK uses this callback to notify the app to start or stop recording.
 * @param state: The required state for the recorder (open/closed).
 */
-(void) onNuiAudioStateChanged:(NuiAudioState)state;
onNuiRmsChanged: Audio energy event.
/**
 * SDK event callback for audio energy.
 * @param rms: The audio energy level, ranging from -160 to 0.
 */
-(void) onNuiRmsChanged:(float) rms;
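The value passed to onNuiRmsChanged ranges from -160 to 0, which is awkward to feed directly into a volume meter. The following minimal sketch in plain C (usable as-is from Objective-C) maps it linearly to a 0.0 to 1.0 level. The rms_to_level function and its floor_value parameter are hypothetical and not part of the SDK. The floor is a display choice: values at or below it are shown as silence.

```c
/* Map an audio energy value (documented range: -160 to 0) to a
 * 0.0-1.0 meter level. floor_value is a hypothetical display floor
 * and must be negative; values at or below it read as silence. */
static float rms_to_level(float rms, float floor_value)
{
    if (floor_value >= 0.0f) {
        return 0.0f;                 /* invalid floor: show silence */
    }
    if (rms != rms || rms <= floor_value) {
        return 0.0f;                 /* NaN or below the floor */
    }
    if (rms >= 0.0f) {
        return 1.0f;                 /* clamp at full scale */
    }
    return (rms - floor_value) / -floor_value;
}
```

For example, rms_to_level(rms, -160.0f) uses the full documented range, while a floor such as -60.0f makes a meter respond more visibly to ordinary speech. Which floor looks right depends on the recording setup.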
Procedure
Initialize the SDK and your audio recorder.
Configure parameters based on your business requirements.
Call nui_dialog_start to start recognition.
Start the audio recorder in response to the onNuiAudioStateChanged callback.
Provide audio data in the onNuiNeedAudioData callback.
Retrieve recognition results from the EVENT_ASR_PARTIAL_RESULT and EVENT_SENTENCE_END event callbacks.
Call nui_dialog_cancel to stop recognition.
When you are finished, call the nui_release API to release SDK resources.
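The call order in the procedure above can be guarded with a small state tracker in the app, so that an out-of-order call is caught before it reaches the SDK. The following is a minimal sketch in plain C (it also compiles in Objective-C sources). All names in it (SdkState, SdkCall, sdk_state_step) are hypothetical and not part of the SDK. It is deliberately conservative: it only allows parameters to be set between initialization and start, and it treats the task as stopped as soon as cancel is called, whereas a real app may prefer to wait for the callback with the finish flag set.

```c
/* Hypothetical app-side lifecycle states. */
typedef enum { SDK_IDLE, SDK_READY, SDK_RUNNING } SdkState;

/* Hypothetical labels for the SDK calls in the procedure. */
typedef enum {
    CALL_INITIALIZE,
    CALL_SET_PARAMS,
    CALL_DIALOG_START,
    CALL_DIALOG_CANCEL,
    CALL_RELEASE
} SdkCall;

/* Returns 1 and updates *state when the call is allowed in the
 * current state; returns 0 and leaves *state unchanged otherwise. */
static int sdk_state_step(SdkState *state, SdkCall call)
{
    switch (call) {
    case CALL_INITIALIZE:
        if (*state != SDK_IDLE) return 0;   /* release before re-initializing */
        *state = SDK_READY;
        return 1;
    case CALL_SET_PARAMS:
        return *state == SDK_READY;         /* after init, before start */
    case CALL_DIALOG_START:
        if (*state != SDK_READY) return 0;
        *state = SDK_RUNNING;
        return 1;
    case CALL_DIALOG_CANCEL:
        if (*state != SDK_RUNNING) return 0;
        *state = SDK_READY;
        return 1;
    case CALL_RELEASE:
        if (*state == SDK_IDLE) return 0;   /* nothing to release */
        *state = SDK_IDLE;
        return 1;
    }
    return 0;
}
```

A typical use is to call sdk_state_step immediately before the matching SDK method and skip the SDK call (and log a warning) when it returns 0.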
Code examples
By default, the API uses the get_instance method to obtain a singleton. If you require multiple instances, you can also create them using alloc.
NUI SDK initialization
BOOL save_log = NO;
NSString * initParam = [self genInitParams];
[_nui nui_initialize:[initParam UTF8String] logLevel:LOG_LEVEL_VERBOSE saveLog:save_log];
The genInitParams method generates a JSON string that contains the resource directory and user information. The user information includes the following fields.
-(NSString*) genInitParams {
    NSString *strResourcesBundle = [[NSBundle mainBundle] pathForResource:@"Resources" ofType:@"bundle"];
    NSString *bundlePath = [[NSBundle bundleWithPath:strResourcesBundle] resourcePath];
    NSString *debug_path = [_utils createDir];
    NSMutableDictionary *ticketJsonDict = [NSMutableDictionary dictionary];
    // Obtain an access credential.
    // The getTicket method in the sample project provides several possible ways to obtain a credential. Choose a secure method that fits your business needs.
    //
    // Note:
    // Before you use the Intelligent Speech Interaction service, you must create an account and activate the service. For detailed steps, see:
    // https://help.aliyun.com/en/isi/getting-started/start-here
    //
    // Permanent credentials:
    // Your permanent credentials include your AccessKey ID (ak_id) and AccessKey Secret (ak_secret).
    // These credentials must never be stored in your app's code or on the client side to prevent leaks and potential financial loss.
    //
    // STS temporary credentials:
    // To avoid the security risks of distributing permanent credentials to clients, Alibaba Cloud provides the Security Token Service (STS).
    // STS generates temporary credentials (sts_ak_id, sts_ak_secret, and sts_token) from your permanent AccessKey ID and AccessKey Secret.
    // (The 'sts_' prefix is used to distinguish temporary STS credentials from permanent ones).
    // What is STS: https://help.aliyun.com/en/ram/product-overview/what-is-sts
    // STS SDK overview: https://help.aliyun.com/en/ram/developer-reference/sts-sdk-overview
    // STS Python SDK example: https://help.aliyun.com/en/ram/developer-reference/use-the-sts-openapi-example
    //
    // Credential requirements:
    // For offline features (such as offline Text-to-Speech and wake-up word), you must provide either an app_key, ak_id, and ak_secret, or an app_key, sts_ak_id, sts_ak_secret, and sts_token.
    // For online features (such as Text-to-Speech, real-time speech transcription, short-form speech recognition, and audio file transcription), you only need to provide an app_key and a token.
    [_utils getTicket:ticketJsonDict Type:get_token_from_server_for_online_features];
    if ([ticketJsonDict objectForKey:@"token"] != nil) {
        NSString *tokenValue = [ticketJsonDict objectForKey:@"token"];
        if ([tokenValue length] == 0) {
            TLog(@"The 'token' key exists but the value is empty.");
        }
    } else {
        TLog(@"The 'token' key does not exist.");
    }
    [ticketJsonDict setObject:@"wss://nls-gateway.cn-shanghai.aliyuncs.com:443/ws/v1" forKey:@"url"]; // Default: The endpoint for the China (Shanghai) region.
    // The workspace directory path. The SDK reads configuration files from this path.
    [ticketJsonDict setObject:bundlePath forKey:@"workspace"]; // Required.
    // This parameter takes effect only when the save_log parameter in the SDK initialization is set to true. It specifies whether to save debug audio, which is stored in the directory specified by debug_path. Ensure that debug_path is valid and writable.
    [ticketJsonDict setObject:save_wav ? @"true" : @"false" forKey:@"save_wav"];
    // The debug directory. When the save_log parameter in the SDK initialization is true, this directory is used to save intermediate audio files.
    [ticketJsonDict setObject:debug_path forKey:@"debug_path"];
    // FullCloud = 1 // Use this for online real-time speech recognition.
    [ticketJsonDict setObject:@"1" forKey:@"service_mode"]; // Required.
    NSString *id_string = [[[ASIdentifierManager sharedManager] advertisingIdentifier] UUIDString];
    TLog(@"id: %s", [id_string UTF8String]);
    [ticketJsonDict setObject:id_string forKey:@"device_id"]; // Required. We recommend using a unique device ID to aid in troubleshooting.
    NSData *data = [NSJSONSerialization dataWithJSONObject:ticketJsonDict options:NSJSONWritingPrettyPrinted error:nil];
    NSString * jsonStr = [[NSString alloc]initWithData:data encoding:NSUTF8StringEncoding];
    return jsonStr;
}
Parameter configuration
Set parameters using a JSON string.
-(NSString*) genParams {
    NSMutableDictionary *nls_config = [NSMutableDictionary dictionary];
    [nls_config setValue:@YES forKey:@"enable_intermediate_result"]; // Required.
    // Configure parameters based on your business needs.
    // For API details, see: https://help.aliyun.com/document_detail/173528.html
    // See section "2. Start recognition".
    //[nls_config setValue:@"<Your updated token>" forKey:@"token"];
    //[nls_config setValue:@YES forKey:@"enable_punctuation_prediction"];
    //[nls_config setValue:@YES forKey:@"enable_inverse_text_normalization"];
    //[nls_config setValue:@YES forKey:@"enable_voice_detection"];
    //[nls_config setValue:@10000 forKey:@"max_start_silence"];
    //[nls_config setValue:@800 forKey:@"max_end_silence"];
    //[nls_config setValue:@800 forKey:@"max_sentence_silence"];
    //[nls_config setValue:@NO forKey:@"enable_words"];
    //[nls_config setValue:@16000 forKey:@"sample_rate"];
    //[nls_config setValue:@"opus" forKey:@"sr_format"];
    NSMutableDictionary *dictM = [NSMutableDictionary dictionary];
    [dictM setObject:nls_config forKey:@"nls_config"];
    [dictM setValue:@(SERVICE_TYPE_SPEECH_TRANSCRIBER) forKey:@"service_type"]; // Required.
    // If you are using HTTPDNS, you can configure it here.
    //[dictM setObject:[_utils getDirectIp] forKey:@"direct_ip"];
    /* If a parameter is not documented but is supported by the feature, you can use the following generic method to set it. */
    //NSMutableDictionary *extend_config = [NSMutableDictionary dictionary];
    //[extend_config setValue:@YES forKey:@"custom_test"];
    //[dictM setObject:extend_config forKey:@"extend_config"];
    NSData *data = [NSJSONSerialization dataWithJSONObject:dictM options:NSJSONWritingPrettyPrinted error:nil];
    NSString * jsonStr = [[NSString alloc]initWithData:data encoding:NSUTF8StringEncoding];
    return jsonStr;
}
NSString * parameters = [self genParams];
[_nui nui_set_params:[parameters UTF8String]];
Start recognition
Call the nui_dialog_start API to start listening.
-(NSString*) genDialogParams {
    NSMutableDictionary *dialog_params = [NSMutableDictionary dictionary];
    // During runtime, you can update temporary parameters, such as an expired token, when calling nui_dialog_start.
    // Note: If you do not set parameters for the next dialog, the parameters from the initialization will be used.
    //[dialog_params setValue:@"" forKey:@"app_key"];
    //[dialog_params setValue:@"" forKey:@"token"];
    NSData *data = [NSJSONSerialization dataWithJSONObject:dialog_params options:NSJSONWritingPrettyPrinted error:nil];
    NSString * jsonStr = [[NSString alloc]initWithData:data encoding:NSUTF8StringEncoding];
    return jsonStr;
}
// To modify the token or other parameters, see genDialogParams().
NSString * parameters = [self genDialogParams];
// To use VAD mode, you must set the nls_config parameter to enable online VAD mode (see genParams()).
[_nui nui_dialog_start:MODE_P2T dialogParam:[parameters UTF8String]];
Callback handling
onNuiAudioStateChanged: The audio state callback. The SDK maintains the recording state internally. Use this callback to start or stop the recorder.
-(void)onNuiAudioStateChanged:(NuiAudioState)state {
    TLog(@"onNuiAudioStateChanged state=%u", state);
    if (state == STATE_CLOSE || state == STATE_PAUSE) {
        // The recording module provided in the old sample project is for reference only. You can rewrite it to fit your business needs.
        // [_voiceRecorder stop:YES];
        // The recording module provided in the new sample project is for reference only. You can rewrite it to fit your business needs.
        [_audioController stopRecorder:NO];
    } else if (state == STATE_OPEN) {
        self.recordedVoiceData = [NSMutableData data];
        // The recording module provided in the old sample project is for reference only. You can rewrite it to fit your business needs.
        // [_voiceRecorder start];
        // The recording module provided in the new sample project is for reference only. You can rewrite it to fit your business needs.
        [_audioController startRecorder];
    }
}
onNuiNeedAudioData: The audio data callback. Provide the recorded audio data in this callback.
-(int)onNuiNeedAudioData:(char *)audioData length:(int)len {
    static int emptyCount = 0;
    @autoreleasepool {
        @synchronized(_recordedVoiceData) {
            if (_recordedVoiceData.length > 0) {
                int recorder_len = 0;
                if (_recordedVoiceData.length > len)
                    recorder_len = len;
                else
                    recorder_len = _recordedVoiceData.length;
                NSData *tempData = [_recordedVoiceData subdataWithRange:NSMakeRange(0, recorder_len)];
                [tempData getBytes:audioData length:recorder_len];
                tempData = nil;
                NSInteger remainLength = _recordedVoiceData.length - recorder_len;
                NSRange range = NSMakeRange(recorder_len, remainLength);
                [_recordedVoiceData setData:[_recordedVoiceData subdataWithRange:range]];
                emptyCount = 0;
                return recorder_len;
            } else {
                if (emptyCount++ >= 50) {
                    TLog(@"_recordedVoiceData length = %lu! empty 50times.", (unsigned long)_recordedVoiceData.length);
                    emptyCount = 0;
                }
                return 0;
            }
        }
    }
    return 0;
}
onNuiEventCallback: The NUI SDK event callback. To avoid deadlocks, do not call other SDK APIs from within this callback.
-(void)onNuiEventCallback:(NuiCallbackEvent)nuiEvent dialog:(long)dialog kwsResult:(const char *)wuw asrResult:(const char *)asr_result ifFinish:(BOOL)finish retCode:(int)code {
    TLog(@"onNuiEventCallback event %d finish %d", nuiEvent, finish);
    if (nuiEvent == EVENT_ASR_PARTIAL_RESULT || nuiEvent == EVENT_SENTENCE_END) {
        // asr_result contains the task_id, which is useful for troubleshooting. Record and save it.
        TLog(@"ASR RESULT %s finish %d", asr_result, finish);
        NSString *result = [NSString stringWithUTF8String:asr_result];
    } else if (nuiEvent == EVENT_ASR_ERROR) {
        // For an EVENT_ASR_ERROR, asr_result contains the error message. Use it along with the error code and the task_id to troubleshoot issues. Record and save this information.
        TLog(@"EVENT_ASR_ERROR error[%d], error mesg[%s]", code, asr_result);
        // You can call nui_get_all_response to get the complete response details.
        const char *response = [_nui nui_get_all_response];
        if (response != NULL) {
            TLog(@"GET ALL RESPONSE: %s", response);
        }
    } else if (nuiEvent == EVENT_MIC_ERROR) {
        TLog(@"MIC ERROR");
        // The recording module provided in the old sample project is for reference only. You can rewrite it to fit your business needs.
        // [_voiceRecorder stop:YES];
        // [_voiceRecorder start];
        // The recording module provided in the new sample project is for reference only. You can rewrite it to fit your business needs.
        [_audioController stopRecorder:NO];
        [_audioController startRecorder];
    }
    // A value of true for 'finish' indicates the end of a task lifecycle (either due to an error or successful completion). You can now start a new recognition task.
    if (finish) {
    }
    return;
}
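The sample onNuiNeedAudioData callback copies the remaining bytes into a new buffer on every read. If that copying becomes a concern, one alternative is a fixed-size ring buffer that the recorder writes into and the callback reads from. The following is a minimal sketch in plain C (usable from Objective-C), not the sample project's implementation. AudioFifo, its functions, and the 65536-byte capacity are hypothetical. The capacity is about 2 seconds of audio, assuming 16 kHz, 16-bit mono PCM. The sketch is not thread-safe on its own, so writes and reads still need the same lock that the sample uses.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical capacity: about 2 s of 16 kHz, 16-bit mono PCM. */
#define AUDIO_FIFO_CAPACITY 65536

typedef struct {
    unsigned char data[AUDIO_FIFO_CAPACITY];
    size_t head;   /* index of the oldest stored byte */
    size_t count;  /* number of bytes currently stored */
} AudioFifo;

static void audio_fifo_init(AudioFifo *f)
{
    f->head = 0;
    f->count = 0;
}

/* Append up to len bytes from src. Returns the number of bytes
 * stored, which is less than len when the buffer is full. */
static size_t audio_fifo_write(AudioFifo *f, const void *src, size_t len)
{
    size_t space = AUDIO_FIFO_CAPACITY - f->count;
    if (len > space) len = space;
    size_t tail = (f->head + f->count) % AUDIO_FIFO_CAPACITY;
    size_t first = AUDIO_FIFO_CAPACITY - tail;   /* room before wrap */
    if (first > len) first = len;
    memcpy(f->data + tail, src, first);
    memcpy(f->data, (const unsigned char *)src + first, len - first);
    f->count += len;
    return len;
}

/* Copy up to len bytes into dst, oldest first. Returns the number
 * of bytes copied, or 0 when the buffer is empty. The signature
 * mirrors the buffer and length passed to onNuiNeedAudioData. */
static int audio_fifo_read(AudioFifo *f, char *dst, int len)
{
    if (len <= 0) return 0;
    size_t n = (size_t)len;
    if (n > f->count) n = f->count;
    size_t first = AUDIO_FIFO_CAPACITY - f->head; /* bytes before wrap */
    if (first > n) first = n;
    memcpy(dst, f->data + f->head, first);
    memcpy(dst + first, f->data, n - first);
    f->head = (f->head + n) % AUDIO_FIFO_CAPACITY;
    f->count -= n;
    return (int)n;
}
```

With this in place, the body of onNuiNeedAudioData could reduce to returning audio_fifo_read(&fifo, audioData, len) inside the lock, while the recorder calls audio_fifo_write for each captured chunk. Bytes that do not fit are dropped by audio_fifo_write, so the capacity should be chosen with the recorder's chunk size in mind.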
Stop recognition
[_nui nui_dialog_cancel:NO];