This document describes how to use the iOS NUI SDK provided by the Alibaba Cloud intelligent speech service. It covers SDK download and installation, key interfaces, and code examples.
Prerequisites
- Before using the SDK, read the API Reference.
- Obtain a project Appkey. See Create a project.
- Obtain an access token. See Obtain an access token.
Download and installation
- Select and download the mobile SDK.
Important: After downloading, replace the placeholder Alibaba Cloud account information, Appkey, and token in the sample initialization code before you run the demo.
For easier integration, NUI SDK versions 2.5.14 and later use a pure Objective-C interface instead of a mixed C++/Objective-C interface.
- Unzip the package and add nuisdk.framework to your project. Then, in your project's Build Phases settings, add nuisdk.framework to the Link Binary With Libraries section.
- Open the sample project in Xcode. The project includes reference code and reusable utility classes for audio playback, recording, and file operations that you can copy directly into your project. The sample code for short sentence recognition is in the SpeechRecognizerViewController file. You can run the sample directly after you replace the placeholder Appkey and token.
Key SDK interfaces
- nui_initialize: Initializes the SDK.
/**
 * Initializes the SDK. The SDK is a singleton: release the existing instance before you initialize it again.
 * Do not call this method on the UI thread, because it may block the UI.
 * @param parameters: Initialization parameters. For details, see the API Reference: https://help.aliyun.com/document_detail/493202.html
 * @param level: The log level. A lower value produces more detailed logs.
 * @param save_log: Whether to save logs to a file. Logs are stored in the directory specified by the debug_path field in parameters.
 *                  Note: Log files have no size limit, so continuous logging can fill up the disk.
 * @return An error code. See error codes.
 */
-(NuiResultCode) nui_initialize:(const char *)parameters logLevel:(NuiSdkLogLevel)level saveLog:(BOOL)save_log;
- nui_set_params: Sets SDK parameters in JSON format.
/**
 * Sets parameters in JSON format.
 * @param params: For parameter details, see the API Reference: https://help.aliyun.com/document_detail/493202.html
 * @return An error code. See error codes.
 */
-(NuiResultCode) nui_set_params:(const char *)params;
- nui_dialog_start: Starts recognition.
/**
 * Starts recognition.
 * @param vad_mode: The VAD mode. Several modes are available; for recognition scenarios, use P2T (MODE_P2T).
 * @param dialog_params: Optional recognition parameters for this round, in JSON format.
 * @return An error code. See error codes.
 */
-(NuiResultCode) nui_dialog_start:(NuiVadMode)vad_mode dialogParam:(const char *)dialog_params;
- nui_dialog_cancel: Ends recognition.
/**
 * Ends recognition. By default, the server returns the final recognition result before it ends the task.
 * @param force: Whether to force an immediate stop. YES stops immediately and discards the final result;
 *               NO stops recognition but waits for the complete final result to be returned.
 * @return An error code. See error codes.
 */
-(NuiResultCode) nui_dialog_cancel:(BOOL)force;
- nui_release: Releases the SDK.
/**
 * Releases SDK resources.
 * @return An error code. See error codes.
 */
-(NuiResultCode) nui_release;
- nui_get_version: Gets the current SDK version.
/**
 * Gets the current SDK version information.
 * @return The SDK version as a string.
 */
-(const char*) nui_get_version;
- nui_get_all_response: Gets the complete information for the current event callback.
/**
 * Gets the complete information for the current event callback.
 * @return The complete event information as a JSON string.
 */
-(const char*) nui_get_all_response;
- NeoNuiSdkDelegate: The event delegate. Implement the following callbacks.
onNuiEventCallback: The SDK event callback.
/**
 * The main SDK event callback.
 * @param nuiEvent: The callback event. See the event list below.
 * @param dialog: The session ID (currently not supported).
 * @param wuw: Used for the wake-word feature (currently not supported).
 * @param asr_result: The speech recognition result. For EVENT_ASR_ERROR, this contains the error message.
 * @param finish: Whether the current recognition round is complete.
 * @param code: The error code. Valid only for EVENT_ASR_ERROR events.
 */
-(void) onNuiEventCallback:(NuiCallbackEvent)nuiEvent dialog:(long)dialog kwsResult:(const char *)wuw asrResult:(const char *)asr_result ifFinish:(BOOL)finish retCode:(int)code;
NuiCallbackEvent event list:
- EVENT_VAD_START: Start of speech detected.
- EVENT_VAD_END: End of speech detected.
- EVENT_ASR_PARTIAL_RESULT: A partial (intermediate) speech recognition result.
- EVENT_ASR_RESULT: The final speech recognition result.
- EVENT_ASR_ERROR: A speech recognition error occurred. Check the error code for details.
- EVENT_MIC_ERROR: A recording error occurred: the SDK has not received audio data for two consecutive seconds. Check whether recording is working correctly.
onNuiNeedAudioData: Supplies audio data to the SDK.
/**
 * After recognition starts, the SDK calls this callback repeatedly. Your app must provide audio data here.
 * @param audioData: The buffer to fill with audio data.
 * @param len: The maximum number of bytes of audio data to provide.
 * @return The number of bytes actually provided.
 */
-(int) onNuiNeedAudioData:(char *)audioData length:(int)len;
onNuiAudioStateChanged: Recording state change callback.
/**
 * When interfaces such as start, stop, or cancel are called, the SDK uses this callback to tell the app to start or stop recording.
 * @param state: The required recording state, for example STATE_OPEN, STATE_PAUSE, or STATE_CLOSE.
 */
-(void) onNuiAudioStateChanged:(NuiAudioState)state;
onNuiRmsChanged: Audio energy callback.
/**
 * Callback for audio energy events.
 * @param rms: The audio energy value, ranging from -160 to 0.
 */
-(void) onNuiRmsChanged:(float)rms;
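onNuiRmsChanged reports energy in the range -160 to 0. That range is awkward to feed directly into a level meter. The following minimal sketch, written in plain C (also valid Objective-C), maps the value to a level between 0 and 1. The function name rms_to_level and the floor_db cutoff are hypothetical illustration choices and are not part of the SDK.

```c
/* Hypothetical helper (not an SDK API): map an rms value from
 * onNuiRmsChanged (documented range -160 to 0) to a meter level in [0, 1].
 * Values at or below floor_db are shown as silence. */
float rms_to_level(float rms, float floor_db) {
    /* An invalid floor falls back to the full documented range. */
    if (floor_db >= 0.0f) {
        floor_db = -160.0f;
    }
    if (rms <= floor_db) return 0.0f; /* quieter than the floor: silent */
    if (rms >= 0.0f) return 1.0f;     /* 0 is the loudest documented value */
    /* Linear interpolation between floor_db and 0. */
    return (rms - floor_db) / -floor_db;
}
```

A linear mapping is the simplest choice. Raising floor_db hides quiet background noise, so pick a value that suits your UI.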
Procedure
- Initialize the SDK and the recording instance.
- Configure parameters based on your business requirements.
- Call nui_dialog_start to start recognition.
- Start or stop the recorder according to the state reported in the onNuiAudioStateChanged callback.
- Provide recorded audio data in the onNuiNeedAudioData callback.
- Process recognition results in the EVENT_ASR_PARTIAL_RESULT and EVENT_ASR_RESULT event callbacks.
- Call nui_dialog_cancel to end recognition.
- When you are finished, call nui_release to release SDK resources.
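The call order above can be modeled as a small state machine. This is useful for asserting that a wrapper never calls, for example, nui_dialog_start before nui_initialize, or initializes the singleton twice without releasing it. This is a minimal C sketch under stated assumptions:

- The enum and function names are hypothetical.
- Accepting nui_release only after recognition has ended is a conservative reading of the procedure, not documented SDK behavior.
- The sketch treats nui_dialog_cancel as ending the round immediately. In practice, with force set to NO, the final result still arrives afterward.

```c
#include <stdbool.h>

/* Hypothetical model of the documented call order; not part of the NUI SDK. */
typedef enum { SDK_UNINITIALIZED, SDK_READY, SDK_RECOGNIZING } SdkState;
typedef enum {
    CALL_INITIALIZE, CALL_SET_PARAMS, CALL_DIALOG_START,
    CALL_DIALOG_CANCEL, CALL_RELEASE
} SdkCall;

/* Returns true and advances *state if `call` is allowed in the current state. */
bool sdk_apply(SdkState *state, SdkCall call) {
    switch (call) {
    case CALL_INITIALIZE:
        /* The SDK is a singleton: release it before initializing again. */
        if (*state != SDK_UNINITIALIZED) return false;
        *state = SDK_READY;
        return true;
    case CALL_SET_PARAMS:
        /* Assumption: parameters are configured between recognition rounds. */
        return *state == SDK_READY;
    case CALL_DIALOG_START:
        if (*state != SDK_READY) return false;
        *state = SDK_RECOGNIZING;
        return true;
    case CALL_DIALOG_CANCEL:
        if (*state != SDK_RECOGNIZING) return false;
        *state = SDK_READY;
        return true;
    case CALL_RELEASE:
        /* Conservative assumption: release only after recognition has ended. */
        if (*state != SDK_READY) return false;
        *state = SDK_UNINITIALIZED;
        return true;
    }
    return false;
}
```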
Code examples
By default, the sample obtains the SDK object as a singleton through the get_instance method. If you need multiple instances, create them directly with alloc.
NUI SDK initialization
// nui_initialize may block, so call it from a background thread rather than the UI thread.
BOOL save_log = NO;
NSString * initParam = [self genInitParams];
[_nui nui_initialize:[initParam UTF8String] logLevel:LOG_LEVEL_VERBOSE saveLog:save_log];
The genInitParams method builds a JSON string that contains the resource directory and the user information set in the code below. For information about how to obtain these values, see the API Reference.
-(NSString*) genInitParams {
NSString *strResourcesBundle = [[NSBundle mainBundle] pathForResource:@"Resources" ofType:@"bundle"];
NSString *bundlePath = [[NSBundle bundleWithPath:strResourcesBundle] resourcePath];
NSString *debug_path = [_utils createDir];
NSMutableDictionary *ticketJsonDict = [NSMutableDictionary dictionary];
// Obtain account access credentials:
// getTicket in the sample project provides several possible methods. Choose the secure method that best suits your business needs.
//
// Important:
// You must have an account and activate the relevant services before using the speech interaction service. For detailed steps, see:
// https://www.alibabacloud.com/help/en/intelligent-speech-interaction/latest/activate-intelligent-speech-interaction
//
// Primary account:
// The account information mainly includes the AccessKey ID (ak_id) and AccessKey Secret (ak_secret).
// To prevent account information exposure and financial loss, never hardcode this information in your app or store it on the client side.
//
// STS temporary credentials:
// Because sending account information to the client poses a security risk, Alibaba Cloud provides a temporary access management service called Security Token Service (STS).
// STS generates temporary credentials (sts_ak_id, sts_ak_secret, sts_token) from your primary account's ak_id and ak_secret.
// (The "sts_" prefix is used to distinguish STS temporary credentials from primary account credentials).
// What is STS: https://www.alibabacloud.com/help/en/ram/product-overview/what-is-sts
// STS SDK overview: https://www.alibabacloud.com/help/en/ram/developer-reference/sts-sdk-overview
// STS Python SDK call example: https://www.alibabacloud.com/help/en/ram/developer-reference/use-the-sts-openapi-example
//
// Account requirements:
// If you use offline features (offline speech synthesis, wake-word), you must provide app_key, ak_id, and ak_secret, or app_key, sts_ak_id, sts_ak_secret, and sts_token.
// If you use online features (speech synthesis, real-time transcription, short sentence recognition, audio file transcription), you only need to provide app_key and token.
[_utils getTicket:ticketJsonDict Type:get_token_from_server_for_online_features];
if ([ticketJsonDict objectForKey:@"token"] != nil) {
NSString *tokenValue = [ticketJsonDict objectForKey:@"token"];
if ([tokenValue length] == 0) {
TLog(@"The 'token' key exists but the value is empty.");
}
} else {
TLog(@"The 'token' key does not exist.");
}
[ticketJsonDict setObject:@"wss://nls-gateway.cn-shanghai.aliyuncs.com:443/ws/v1" forKey:@"url"]; // Default
// The working directory path from which the SDK reads configuration files.
[ticketJsonDict setObject:bundlePath forKey:@"workspace"]; // Required
// save_wav takes effect only when save_log is set to YES during SDK initialization. It specifies whether to save audio for debugging in the debug directory. Make sure debug_path is a valid, writable path.
// save_wav is not declared in this excerpt; declare it (here or as a property) before use.
BOOL save_wav = NO;
[ticketJsonDict setObject:save_wav ? @"true" : @"false" forKey:@"save_wav"];
// The debug directory. When save_log is YES at initialization, log files and, if save_wav is enabled, intermediate audio files are saved here.
[ticketJsonDict setObject:debug_path forKey:@"debug_path"];
// AsrCloud = 4 // Use this for online short sentence recognition.
[ticketJsonDict setObject:@"4" forKey:@"service_mode"]; // Required
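// advertisingIdentifier requires #import <AdSupport/AdSupport.h>. On recent iOS versions it returns all zeros unless the user grants App Tracking Transparency permission.
// [[UIDevice currentDevice] identifierForVendor] is an alternative that does not require that permission.
NSString *id_string = [[[ASIdentifierManager sharedManager] advertisingIdentifier] UUIDString];
TLog(@"id: %s", [id_string UTF8String]);
[ticketJsonDict setObject:id_string forKey:@"device_id"]; // Required. We recommend using a unique ID to help with troubleshooting.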
NSData *data = [NSJSONSerialization dataWithJSONObject:ticketJsonDict options:NSJSONWritingPrettyPrinted error:nil];
NSString * jsonStr = [[NSString alloc]initWithData:data encoding:NSUTF8StringEncoding];
return jsonStr;
}
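For reference, the JSON that genInitParams produces for online short sentence recognition boils down to a handful of keys. The following hypothetical C helper writes such a string with snprintf; build_init_params is not an SDK function. It assumes app_key and token were already obtained (genInitParams gets them through getTicket). It rejects values that contain quotes or backslashes instead of escaping them. In Objective-C, NSJSONSerialization, as used above, handles escaping for you.

```c
#include <stdio.h>
#include <string.h>

/* True if s can be embedded in a JSON string literal without escaping. */
static int json_safe(const char *s) {
    return s != NULL && strchr(s, '"') == NULL && strchr(s, '\\') == NULL;
}

/* Hypothetical helper: writes minimal init-parameter JSON for online short
 * sentence recognition (service_mode "4") into buf. Returns the number of
 * characters written, or -1 if a value is unsafe or buf is too small. */
int build_init_params(char *buf, size_t size, const char *app_key,
                      const char *token, const char *workspace,
                      const char *debug_path, const char *device_id) {
    if (!json_safe(app_key) || !json_safe(token) || !json_safe(workspace) ||
        !json_safe(debug_path) || !json_safe(device_id)) {
        return -1;
    }
    int n = snprintf(buf, size,
        "{\"app_key\":\"%s\",\"token\":\"%s\","
        "\"url\":\"wss://nls-gateway.cn-shanghai.aliyuncs.com:443/ws/v1\","
        "\"workspace\":\"%s\",\"debug_path\":\"%s\",\"save_wav\":\"false\","
        "\"service_mode\":\"4\",\"device_id\":\"%s\"}",
        app_key, token, workspace, debug_path, device_id);
    if (n < 0 || (size_t)n >= size) return -1; /* encoding error or truncated */
    return n;
}
```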
Parameter settings
Pass parameters to nui_set_params as a JSON string, built here by genParams.
-(NSString*) genParams {
NSMutableDictionary *nls_config = [NSMutableDictionary dictionary];
[nls_config setValue:@YES forKey:@"enable_intermediate_result"]; // Required
// Parameters can be configured according to your business needs.
// For API details, see: https://help.aliyun.com/document_detail/173298.html
// See section "2. Start recognition".
// The public SDK (version 01B) does not include a local VAD module.
// (Only the SDK with wake-word functionality, version 029, includes one.)
// To use VAD mode, enable online VAD through nls_config by uncommenting the enable_voice_detection line below.
//
// Mode description:
// If you use P2T (Push-to-Talk) mode, where you press to start speaking and release to stop, do not enable enable_voice_detection.
// If you use VAD (Voice Activity Detection) mode, which automatically detects when the user stops speaking, enable enable_voice_detection.
//[nls_config setValue:@YES forKey:@"enable_voice_detection"];
//[nls_config setValue:@10000 forKey:@"max_start_silence"];
//[nls_config setValue:@800 forKey:@"max_end_silence"];
//[nls_config setValue:@"<update_token>" forKey:@"token"];
//[nls_config setValue:@YES forKey:@"enable_punctuation_prediction"];
//[nls_config setValue:@YES forKey:@"enable_inverse_text_normalization"];
//[nls_config setValue:@16000 forKey:@"sample_rate"];
//[nls_config setValue:@"opus" forKey:@"sr_format"];
NSMutableDictionary *dictM = [NSMutableDictionary dictionary];
[dictM setObject:nls_config forKey:@"nls_config"];
[dictM setValue:@(SERVICE_TYPE_ASR) forKey:@"service_type"]; // Required
// If you use HttpDns, you can set it here.
//[dictM setObject:[_utils getDirectIp] forKey:@"direct_ip"];
/* If a parameter is not included in the documentation but is supported by this feature, you can use the following generic interface to set it. */
//NSMutableDictionary *extend_config = [NSMutableDictionary dictionary];
//[extend_config setValue:@YES forKey:@"custom_test"];
//[dictM setObject:extend_config forKey:@"extend_config"];
NSData *data = [NSJSONSerialization dataWithJSONObject:dictM options:NSJSONWritingPrettyPrinted error:nil];
NSString * jsonStr = [[NSString alloc]initWithData:data encoding:NSUTF8StringEncoding];
return jsonStr;
}
NSString * parameters = [self genParams];
[_nui nui_set_params:[parameters UTF8String]];
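The two modes described in genParams() differ only in a few nls_config fields. This minimal C sketch emits either the P2T form or the VAD form, using the values from the commented-out lines in genParams(); build_set_params is hypothetical. The caller passes service_type because the numeric value of SERVICE_TYPE_ASR comes from the SDK headers and is not shown here.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper mirroring genParams(). In P2T mode,
 * enable_voice_detection stays off. In VAD mode it is turned on together
 * with the silence limits (in ms) from the commented-out sample lines.
 * Returns the number of characters written, or -1 on overflow. */
int build_set_params(char *buf, size_t size, bool vad_mode, int service_type) {
    int n;
    if (vad_mode) {
        n = snprintf(buf, size,
            "{\"nls_config\":{\"enable_intermediate_result\":true,"
            "\"enable_voice_detection\":true,"
            "\"max_start_silence\":10000,\"max_end_silence\":800},"
            "\"service_type\":%d}", service_type);
    } else {
        n = snprintf(buf, size,
            "{\"nls_config\":{\"enable_intermediate_result\":true},"
            "\"service_type\":%d}", service_type);
    }
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```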
Start recognition
Call nui_dialog_start to start recognition.
-(NSString*) genDialogParams {
NSMutableDictionary *dialog_params = [NSMutableDictionary dictionary];
// During runtime, you can update temporary parameters, especially an expired token, when calling nui_dialog_start.
// Note: If you do not set parameters for the next conversation round, the parameters passed during initialization will be used.
// [dialog_params setValue:@"" forKey:@"app_key"];
// [dialog_params setValue:@"" forKey:@"token"];
NSData *data = [NSJSONSerialization dataWithJSONObject:dialog_params options:NSJSONWritingPrettyPrinted error:nil];
NSString * jsonStr = [[NSString alloc]initWithData:data encoding:NSUTF8StringEncoding];
return jsonStr;
}
// To modify the token and other parameters, see genDialogParams().
NSString * parameters = [self genDialogParams];
// To use VAD mode, you need to set the nls_config parameter to start the online VAD mode (see genParams()).
[_nui nui_dialog_start:MODE_P2T dialogParam:[parameters UTF8String]];
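genDialogParams is the place to swap in a fresh token before a round starts. The following minimal C sketch assumes you saved the token's expiration time as a Unix timestamp in seconds when you obtained it. It provides two hypothetical functions:

- token_needs_refresh checks the expiration time against a safety margin.
- build_dialog_params produces either an empty object (so the initialization parameters apply) or an object with app_key and token.

```c
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: true if the token expires within margin_seconds. */
bool token_needs_refresh(time_t now, time_t expire_time, time_t margin_seconds) {
    return now + margin_seconds >= expire_time;
}

/* Hypothetical helper: builds dialog_params JSON for nui_dialog_start.
 * With no new token, "{}" is produced so the initialization parameters apply.
 * Values are inserted verbatim and must not need JSON escaping.
 * Returns the number of characters written, or -1 on error or overflow. */
int build_dialog_params(char *buf, size_t size, const char *app_key,
                        const char *new_token) {
    int n;
    if (new_token == NULL) {
        n = snprintf(buf, size, "{}");
    } else if (app_key == NULL) {
        return -1;
    } else {
        n = snprintf(buf, size, "{\"app_key\":\"%s\",\"token\":\"%s\"}",
                     app_key, new_token);
    }
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

For example, with a 60-second margin, a token that expires at t=1030 is refreshed when it is checked at t=1000.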
Callback handling
- onNuiAudioStateChanged: Recording state callback. Start or stop your recorder according to the state reported by this callback.
-(void)onNuiAudioStateChanged:(NuiAudioState)state {
    TLog(@"onNuiAudioStateChanged state=%u", state);
    if (state == STATE_CLOSE || state == STATE_PAUSE) {
        // The recording modules in the sample project (the older _voiceRecorder and the newer
        // _audioController) are for reference only. Rewrite them to suit your business needs.
        // [_voiceRecorder stop:YES];
        [_audioController stopRecorder:NO];
    } else if (state == STATE_OPEN) {
        // Start each round with an empty audio buffer.
        self.recordedVoiceData = [NSMutableData data];
        // [_voiceRecorder start];
        [_audioController startRecorder];
    }
}
- onNuiNeedAudioData: Recording data callback. Provide audio data in this callback.
-(int)onNuiNeedAudioData:(char *)audioData length:(int)len {
    static int emptyCount = 0;
    @autoreleasepool {
        @synchronized(_recordedVoiceData) {
            if (_recordedVoiceData.length > 0) {
                // Copy at most len bytes of buffered audio into the SDK's buffer.
                int recorder_len = (int)MIN((NSUInteger)len, _recordedVoiceData.length);
                [_recordedVoiceData getBytes:audioData length:recorder_len];
                // Remove the bytes that were just handed to the SDK.
                [_recordedVoiceData replaceBytesInRange:NSMakeRange(0, recorder_len) withBytes:NULL length:0];
                emptyCount = 0;
                return recorder_len;
            }
            // No audio buffered yet: return 0 and log once every 50 empty polls.
            if (emptyCount++ >= 50) {
                TLog(@"_recordedVoiceData length = %lu! empty 50times.", (unsigned long)_recordedVoiceData.length);
                emptyCount = 0;
            }
            return 0;
        }
    }
    return 0;
}
- onNuiEventCallback: NUI SDK event callback. Do not call SDK interfaces from within this callback, because doing so may cause a deadlock. The one exception shown in the sample is nui_get_all_response, which reads the complete information for the current event.
-(void)onNuiEventCallback:(NuiCallbackEvent)nuiEvent dialog:(long)dialog kwsResult:(const char *)wuw asrResult:(const char *)asr_result ifFinish:(BOOL)finish retCode:(int)code {
    TLog(@"onNuiEventCallback event %d finish %d", nuiEvent, finish);
    if (nuiEvent == EVENT_ASR_PARTIAL_RESULT || nuiEvent == EVENT_ASR_RESULT) {
        // asr_result contains the task_id, which helps with troubleshooting. Record and save it.
        TLog(@"ASR RESULT %s finish %d", asr_result, finish);
        if (asr_result != NULL) {
            NSString *result = [NSString stringWithUTF8String:asr_result];
            // Parse result here. UI updates must run on the main thread.
        }
    } else if (nuiEvent == EVENT_ASR_ERROR) {
        // asr_result holds the error message. Record it with the error code and task_id to simplify troubleshooting.
        TLog(@"EVENT_ASR_ERROR error[%d], error mesg[%s]", code, asr_result);
        // nui_get_all_response returns the complete information for this event.
        const char *response = [_nui nui_get_all_response];
        if (response != NULL) {
            TLog(@"GET ALL RESPONSE: %s", response);
        }
    } else if (nuiEvent == EVENT_MIC_ERROR) {
        // No audio for two consecutive seconds: restart the recorder (reference implementation only).
        TLog(@"MIC ERROR");
        // [_voiceRecorder stop:YES];
        // [_voiceRecorder start];
        [_audioController stopRecorder:NO];
        [_audioController startRecorder];
    }
    // When finish is true, the current task has ended (successfully or with an error) and a new
    // recognition can be started. Start it outside this callback to avoid the deadlock noted above.
    if (finish) {
    }
}
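The sample buffers recorded audio in an NSMutableData and trims the consumed prefix on every onNuiNeedAudioData call. A fixed-capacity ring buffer avoids that copying. The following C sketch is a hypothetical alternative and is not part of the sample project. Note these assumptions:

- The capacity of 32000 bytes assumes 16 kHz, 16-bit mono PCM (16000 samples/s × 2 bytes = 32000 bytes, about one second of audio).
- The functions are not thread-safe. Guard push and pop with the same lock, as the sample does with @synchronized.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: about 1 s of 16 kHz, 16-bit mono PCM. */
#define AUDIO_FIFO_CAPACITY ((size_t)32000)

/* Hypothetical ring buffer for recorded PCM bytes. Not thread-safe. */
typedef struct {
    char data[AUDIO_FIFO_CAPACITY];
    size_t head;  /* index of the oldest buffered byte */
    size_t count; /* number of buffered bytes */
} AudioFifo;

/* Recorder side: appends up to len bytes and returns how many were stored.
 * Bytes that do not fit are dropped. */
size_t audio_fifo_push(AudioFifo *f, const char *src, size_t len) {
    size_t stored = 0;
    while (stored < len && f->count < AUDIO_FIFO_CAPACITY) {
        size_t tail = (f->head + f->count) % AUDIO_FIFO_CAPACITY;
        f->data[tail] = src[stored++];
        f->count++;
    }
    return stored;
}

/* SDK side, as in onNuiNeedAudioData: copies at most len bytes into dst and
 * returns the number copied, or 0 when no audio is buffered yet. */
int audio_fifo_pop(AudioFifo *f, char *dst, int len) {
    if (len <= 0 || f->count == 0) return 0;
    size_t n = f->count < (size_t)len ? f->count : (size_t)len;
    /* First chunk runs from head to the end of the array (or n bytes). */
    size_t first = AUDIO_FIFO_CAPACITY - f->head;
    if (first > n) first = n;
    memcpy(dst, f->data + f->head, first);
    memcpy(dst + first, f->data, n - first); /* wrapped-around remainder */
    f->head = (f->head + n) % AUDIO_FIFO_CAPACITY;
    f->count -= n;
    return (int)n;
}
```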
End recognition
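// NO: stop recognition but wait for the final result. YES: stop immediately and discard the final result.
[_nui nui_dialog_cancel:NO];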
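Because nui_dialog_cancel:NO waits for the final result, the app should keep handling callbacks until one arrives with finish set. This C sketch models that bookkeeping. DemoEvent is a hypothetical stand-in for NuiCallbackEvent; the real values come from the SDK headers. The session simply stores the raw asr_result payload, which contains the task_id and which you would normally parse.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for NuiCallbackEvent values (illustration only). */
typedef enum { DEMO_ASR_PARTIAL, DEMO_ASR_RESULT, DEMO_ASR_ERROR, DEMO_OTHER } DemoEvent;

typedef struct {
    char text[256];  /* latest partial or final payload */
    bool finished;   /* set once a callback arrives with finish == true */
    int last_error;  /* code from the most recent error event, 0 if none */
} DemoSession;

/* Clear the session before each recognition round. */
void demo_session_reset(DemoSession *s) {
    memset(s, 0, sizeof *s);
}

void demo_on_event(DemoSession *s, DemoEvent ev, const char *asr_result,
                   bool finish, int code) {
    switch (ev) {
    case DEMO_ASR_PARTIAL:
    case DEMO_ASR_RESULT:
        /* Keep the newest payload; a final result replaces earlier partials. */
        if (asr_result != NULL) {
            snprintf(s->text, sizeof s->text, "%s", asr_result);
        }
        break;
    case DEMO_ASR_ERROR:
        s->last_error = code; /* the code is only meaningful for error events */
        break;
    default:
        break;
    }
    if (finish) {
        s->finished = true; /* task over; a new round may now be started */
    }
}
```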