iOS SDK

This document describes how to use the iOS NUI SDK provided by the Alibaba Cloud intelligent speech service. It covers SDK download and installation, key interfaces, and code examples.

Prerequisites

  • Before using the SDK, read the API Reference.

  • Obtain a project Appkey. See Create a project.

  • Obtain an access token. See Obtain an access token.

Download and installation

  1. Select and download the mobile SDK.

    Important

    After downloading, replace the placeholder Alibaba Cloud account information, Appkey, and token in the sample initialization code to run the demo.

    For easier integration, NUI SDK versions 2.5.14 and later use a pure Objective-C interface instead of a mixed C++/Objective-C interface.

  2. Unzip the package. Add the nuisdk.framework file to your project. Then, in your project's Build Phases settings, add nuisdk.framework to the Link Binary With Libraries section.

  3. Open the project in Xcode. The project includes reference code and reusable utility classes for audio playback, recording, and file operations that you can copy directly into your project. The sample code for short sentence recognition is located in the SpeechRecognizerViewController file. You can run the sample directly after you replace the placeholder Appkey and token.

Key SDK interfaces

  • nui_initialize: Initializes the SDK.

    /**
     * Initializes the SDK. The SDK is a singleton. You must release the existing instance before re-initializing. Do not call this method from the UI thread, as it may block the UI.
     * @param parameters: Initialization parameters. For more information, see the API Reference: https://help.aliyun.com/document_detail/493202.html
     * @param level: The log level. A lower value produces more detailed logs.
     * @param save_log: Specifies whether to save the log to a file. The log is stored in the directory specified by the debug_path field in the parameters. Note: Log files have no size limit. Continuous storage can fill up the disk.
     * @return See error codes.
     */
    -(NuiResultCode) nui_initialize:(const char *)parameters
                           logLevel:(NuiSdkLogLevel)level
                            saveLog:(BOOL)save_log;
  • nui_set_params: Sets SDK parameters in JSON format.

    /**
     * Sets parameters in JSON format.
     * @param params: For parameter details, see the API Reference: https://help.aliyun.com/document_detail/493202.html
     * @return See error codes.
     */
    -(NuiResultCode) nui_set_params:(const char *)params;
  • nui_dialog_start: Starts recognition.

    /**
     * Starts recognition.
     * @param vad_mode: Multiple modes are available. For recognition scenarios, use P2T.
     * @param dialog_params: Sets recognition parameters. This is optional.
     * @return See error codes.
     */
    -(NuiResultCode) nui_dialog_start:(NuiVadMode)vad_mode
                          dialogParam:(const char *)dialog_params;
  • nui_dialog_cancel: Ends recognition.

    /**
     * Ends recognition. By default, the server returns the final recognition result before ending the task.
     * @param force: Specifies whether to force the stop. YES stops immediately and discards the final result. NO stops the task but waits for the complete result to be returned.
     * @return See error codes.
     */
    -(NuiResultCode) nui_dialog_cancel:(BOOL)force;
  • nui_release: Releases the SDK.

    /**
     * Releases SDK resources.
     * @return See error codes.
     */
    -(NuiResultCode) nui_release;
  • nui_get_version: Gets the current SDK version.

    /**
     * Gets the current SDK version information.
     * @return The SDK version as a string.
     */
    -(const char*) nui_get_version;
  • nui_get_all_response: Gets the complete information for the current event callback.

    /**
     * Gets the complete information for the current event callback.
     * @return The complete event information as a JSON string.
     */
    -(const char*) nui_get_all_response;
  • NeoNuiSdkDelegate: Event delegate

    onNuiEventCallback: SDK event callback.

    /**
     * The main SDK event callback.
     * @param nuiEvent: The callback event. See the event list below.
     * @param dialog: The session ID (currently not supported).
     * @param wuw: Used for the wake-word feature (currently not supported).
     * @param asr_result: The speech recognition result.
     * @param finish: A flag that indicates whether the current recognition round is complete.
     * @param code: The error code. This parameter is valid only for EVENT_ASR_ERROR events.
     */
    -(void) onNuiEventCallback:(NuiCallbackEvent)nuiEvent
                        dialog:(long)dialog
                     kwsResult:(const char *)wuw
                     asrResult:(const char *)asr_result
                      ifFinish:(BOOL)finish
                       retCode:(int)code;

    NuiCallbackEvent event list:

    Name                      Description
    EVENT_VAD_START           Start of speech detected.
    EVENT_VAD_END             End of speech detected.
    EVENT_ASR_PARTIAL_RESULT  A partial result from speech recognition.
    EVENT_ASR_RESULT          The final result from speech recognition.
    EVENT_ASR_ERROR           A speech recognition error occurred. Check the error code for details.
    EVENT_MIC_ERROR           A recording error occurred: the SDK has not received audio for two consecutive seconds. Check whether the recording system is functioning correctly.

    onNuiNeedAudioData: Gets audio data.

    /**
     * When recognition starts, this callback is called continuously. Your app must provide audio data in this callback.
     * @param audioData: The buffer to fill with audio data.
     * @param len: The number of bytes of audio data to provide.
     * @return The actual number of bytes provided.
     */
    -(int) onNuiNeedAudioData:(char *)audioData length:(int)len;

    onNuiAudioStateChanged: Recording state change callback.

    /**
     * When interfaces like start, stop, or cancel are called, the SDK uses this callback to notify the app to enable or disable recording.
     * @param state: The required recording state (enabled/disabled).
     */
    -(void) onNuiAudioStateChanged:(NuiAudioState)state;

    onNuiRmsChanged: Audio energy event.

    /**
     * Callback for audio energy events.
     * @param rms: The audio energy value, ranging from -160 to 0.
     */
    -(void) onNuiRmsChanged:(float) rms;
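
The rms value delivered to onNuiRmsChanged ranges from -160 to 0, so a volume meter usually needs it rescaled first. The following is a minimal sketch in plain C, which compiles unchanged inside an Objective-C .m file. It assumes that rms behaves like a decibel value. The helper name nui_rms_to_level and the idea of a configurable silence floor are illustrative and are not part of the SDK.

```c
#include <math.h>

/* Hypothetical helper, not part of the NUI SDK: maps an rms value in
 * [-160, 0] to a meter level in [0, 1]. Assumes rms is a decibel-like
 * value; anything at or below floor_db is treated as silence. */
static float nui_rms_to_level(float rms, float floor_db) {
    if (isnan(rms) || floor_db >= 0.0f) {
        return 0.0f;              /* invalid input: show an empty meter */
    }
    if (rms <= floor_db) {
        return 0.0f;              /* quieter than the floor */
    }
    if (rms >= 0.0f) {
        return 1.0f;              /* clamp at full scale */
    }
    return 1.0f - rms / floor_db; /* linear in dB between the floor and 0 */
}
```

Inside onNuiRmsChanged you could call nui_rms_to_level(rms, -60.0f) and pass the result to your UI on the main queue. The -60 floor is only an example value to tune for your microphone.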

Procedure

  1. Initialize the SDK and the recording instance.

  2. Configure parameters based on your business requirements.

  3. Call nui_dialog_start to start recognition.

  4. Enable the recorder based on the onNuiAudioStateChanged audio state callback.

  5. Provide recording data in the onNuiNeedAudioData callback.

  6. Process recognition results from the EVENT_ASR_PARTIAL_RESULT and EVENT_ASR_RESULT event callbacks.

  7. Call nui_dialog_cancel to end recognition.

  8. After you finish, call the release interface to release SDK resources.
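
The call order above can be guarded with a small state machine. The following plain C sketch is illustrative only: the state names, action names, and the nui_flow_apply function are hypothetical and are not SDK types. The restrictions it enforces are this sketch's reading of the listed order, not documented SDK constraints, apart from the rule that the SDK must be released before it is initialized again.

```c
#include <stdbool.h>

/* Hypothetical lifecycle states; these are not SDK types. */
typedef enum {
    NUI_FLOW_RELEASED,     /* before nui_initialize or after nui_release */
    NUI_FLOW_READY,        /* initialized, no recognition running */
    NUI_FLOW_RECOGNIZING   /* after nui_dialog_start, until the task ends */
} NuiFlowState;

typedef enum {
    NUI_ACT_INITIALIZE,    /* nui_initialize */
    NUI_ACT_SET_PARAMS,    /* nui_set_params */
    NUI_ACT_START,         /* nui_dialog_start */
    NUI_ACT_FINISH,        /* the task ended after nui_dialog_cancel */
    NUI_ACT_RELEASE        /* nui_release */
} NuiFlowAction;

/* Applies an action. Returns false and leaves *state unchanged when the
 * action is out of order for the procedure described above. */
static bool nui_flow_apply(NuiFlowState *state, NuiFlowAction action) {
    switch (action) {
    case NUI_ACT_INITIALIZE:
        if (*state != NUI_FLOW_RELEASED) return false;
        *state = NUI_FLOW_READY;
        return true;
    case NUI_ACT_SET_PARAMS:
        return *state == NUI_FLOW_READY;
    case NUI_ACT_START:
        if (*state != NUI_FLOW_READY) return false;
        *state = NUI_FLOW_RECOGNIZING;
        return true;
    case NUI_ACT_FINISH:
        if (*state != NUI_FLOW_RECOGNIZING) return false;
        *state = NUI_FLOW_READY;
        return true;
    case NUI_ACT_RELEASE:
        if (*state != NUI_FLOW_READY) return false;
        *state = NUI_FLOW_RELEASED;
        return true;
    }
    return false;
}
```

A view controller could keep one NuiFlowState value and check nui_flow_apply before each SDK call, which makes mistakes such as starting recognition before initialization visible during development.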

Code examples

Note

By default, the interface uses the get_instance method to obtain a singleton. If you need multiple instances, you can alloc objects directly.

NUI SDK initialization

BOOL save_log = NO;
NSString * initParam = [self genInitParams];
[_nui nui_initialize:[initParam UTF8String] logLevel:LOG_LEVEL_VERBOSE saveLog:save_log];

The genInitParams method generates a JSON string that contains the resource directory and user information. The user information includes the fields listed below. For information about how to obtain these values, see API Reference.

-(NSString*) genInitParams {
    NSString *strResourcesBundle = [[NSBundle mainBundle] pathForResource:@"Resources" ofType:@"bundle"];
    NSString *bundlePath = [[NSBundle bundleWithPath:strResourcesBundle] resourcePath];
    NSString *debug_path = [_utils createDir];
    NSMutableDictionary *ticketJsonDict = [NSMutableDictionary dictionary];
    // Obtain account access credentials:
    //  getTicket in the sample project provides several possible methods. Choose the secure method that best suits your business needs.
    //
    // Important:
    //  You must have an account and activate the relevant services before using the speech interaction service. For detailed steps, see:
    //    https://www.alibabacloud.com/help/en/intelligent-speech-interaction/latest/activate-intelligent-speech-interaction
    //
    // Primary account:
    //  The account information mainly includes the AccessKey ID (ak_id) and AccessKey Secret (ak_secret).
    //  To prevent account information exposure and financial loss, never hardcode this information in your app or store it on the client side.
    //
    // STS temporary credentials:
    //  Because sending account information to the client poses a security risk, Alibaba Cloud provides a temporary access management service called Security Token Service (STS).
    //  STS generates temporary credentials (sts_ak_id, sts_ak_secret, sts_token) from your primary account's ak_id and ak_secret.
    //  (The "sts_" prefix is used to distinguish STS temporary credentials from primary account credentials).
    // What is STS: https://www.alibabacloud.com/help/en/ram/product-overview/what-is-sts
    // STS SDK overview: https://www.alibabacloud.com/help/en/ram/developer-reference/sts-sdk-overview
    // STS Python SDK call example: https://www.alibabacloud.com/help/en/ram/developer-reference/use-the-sts-openapi-example
    //
    // Account requirements:
    //  If you use offline features (offline speech synthesis, wake-word), you must provide app_key, ak_id, and ak_secret, or app_key, sts_ak_id, sts_ak_secret, and sts_token.
    //  If you use online features (speech synthesis, real-time transcription, short sentence recognition, audio file transcription), you only need to provide app_key and token.
    [_utils getTicket:ticketJsonDict Type:get_token_from_server_for_online_features];
    if ([ticketJsonDict objectForKey:@"token"] != nil) {
        NSString *tokenValue = [ticketJsonDict objectForKey:@"token"];
        if ([tokenValue length] == 0) {
            TLog(@"The 'token' key exists but the value is empty.");
        }
    } else {
        TLog(@"The 'token' key does not exist.");
    }
    [ticketJsonDict setObject:@"wss://nls-gateway.cn-shanghai.aliyuncs.com:443/ws/v1" forKey:@"url"]; // Default
    // The working directory path from which the SDK reads configuration files.
    [ticketJsonDict setObject:bundlePath forKey:@"workspace"]; // Required
    // Specifies whether to save audio for debugging. It takes effect only when save_log is YES in nui_initialize. The audio is saved in the debug directory, so debug_path must be a valid, writable path.
    // save_wav is a BOOL that your code must declare; it is not defined in this snippet.
    [ticketJsonDict setObject:save_wav ? @"true" : @"false" forKey:@"save_wav"];
    // The debug directory. When the save_log parameter is set to true during SDK initialization, this directory is used to save intermediate audio files.
    [ticketJsonDict setObject:debug_path forKey:@"debug_path"];
    // AsrCloud = 4 // Use this for online short sentence recognition.
    [ticketJsonDict setObject:@"4" forKey:@"service_mode"]; // Required
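    // ASIdentifierManager requires #import <AdSupport/AdSupport.h>.
    // Note: on iOS 14.5 and later, advertisingIdentifier is all zeros unless the user has authorized tracking.
    NSString *id_string = [[[ASIdentifierManager sharedManager] advertisingIdentifier] UUIDString];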
    TLog(@"id: %s", [id_string UTF8String]);
    [ticketJsonDict setObject:id_string forKey:@"device_id"]; // Required. We recommend using a unique ID to help with troubleshooting.
    NSData *data = [NSJSONSerialization dataWithJSONObject:ticketJsonDict options:NSJSONWritingPrettyPrinted error:nil];
    NSString * jsonStr = [[NSString alloc]initWithData:data encoding:NSUTF8StringEncoding];
    return jsonStr;
}

Parameter settings

Set the parameters as a JSON string.

-(NSString*) genParams {
    NSMutableDictionary *nls_config = [NSMutableDictionary dictionary];
    [nls_config setValue:@YES forKey:@"enable_intermediate_result"]; // Required
    // Parameters can be configured according to your business needs.
    // For API details, see: https://help.aliyun.com/document_detail/173298.html
    // See section "2. Start recognition".
    // The public SDK (version 01B) does not include a local VAD module.
    // (Only the SDK with wake-word functionality, version 029, has a VAD module).
    // To use VAD mode, start the online VAD through nls_config by setting enable_voice_detection (commented out below).
    //
    // Mode description:
    // If you use P2T (Push-to-Talk) mode, where you press to start speaking and release to stop, do not enable enable_voice_detection.
    // If you use VAD (Voice Activity Detection) mode, which automatically detects when the user stops speaking, enable enable_voice_detection.
    //[nls_config setValue:@YES forKey:@"enable_voice_detection"];
    //[nls_config setValue:@10000 forKey:@"max_start_silence"];
    //[nls_config setValue:@800 forKey:@"max_end_silence"];
    //[nls_config setValue:@"<update_token>" forKey:@"token"];
    //[nls_config setValue:@YES forKey:@"enable_punctuation_prediction"];
    //[nls_config setValue:@YES forKey:@"enable_inverse_text_normalization"];
    //[nls_config setValue:@16000 forKey:@"sample_rate"];
    //[nls_config setValue:@"opus" forKey:@"sr_format"];
    NSMutableDictionary *dictM = [NSMutableDictionary dictionary];
    [dictM setObject:nls_config forKey:@"nls_config"];
    [dictM setValue:@(SERVICE_TYPE_ASR) forKey:@"service_type"]; // Required
    // If you use HttpDns, you can set it here.
    //[dictM setObject:[_utils getDirectIp] forKey:@"direct_ip"];
    /* If a parameter is not included in the documentation but is supported by this feature, you can use the following generic interface to set it. */
    //NSMutableDictionary *extend_config = [NSMutableDictionary dictionary];
    //[extend_config setValue:@YES forKey:@"custom_test"];
    //[dictM setObject:extend_config forKey:@"extend_config"];
    NSData *data = [NSJSONSerialization dataWithJSONObject:dictM options:NSJSONWritingPrettyPrinted error:nil];
    NSString * jsonStr = [[NSString alloc]initWithData:data encoding:NSUTF8StringEncoding];
    return jsonStr;
}
NSString * parameters = [self genParams];
[_nui nui_set_params:[parameters UTF8String]];
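
To make the payload easier to picture, the following plain C sketch writes the same JSON shape that genParams produces into a caller-supplied buffer. It is an illustration of the string passed to nui_set_params, not a replacement for NSJSONSerialization. The function name build_asr_params is hypothetical. The service_type value is taken as an argument because the numeric value of SERVICE_TYPE_ASR comes from the SDK header and is not stated here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of the NUI SDK. Writes the JSON shape
 * built by genParams into out. Returns the string length on success,
 * or -1 if the buffer is too small. */
static int build_asr_params(char *out, size_t cap, int service_type,
                            bool enable_vad, int max_start_silence,
                            int max_end_silence) {
    int n;
    if (enable_vad) {
        /* VAD mode: the service detects the end of speech automatically. */
        n = snprintf(out, cap,
            "{\"nls_config\":{\"enable_intermediate_result\":true,"
            "\"enable_voice_detection\":true,"
            "\"max_start_silence\":%d,\"max_end_silence\":%d},"
            "\"service_type\":%d}",
            max_start_silence, max_end_silence, service_type);
    } else {
        /* P2T mode: leave enable_voice_detection unset. */
        n = snprintf(out, cap,
            "{\"nls_config\":{\"enable_intermediate_result\":true},"
            "\"service_type\":%d}",
            service_type);
    }
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

In an app, pass SERVICE_TYPE_ASR as service_type and hand the resulting buffer to nui_set_params. Building the dictionary with NSJSONSerialization, as genParams does, remains the simpler route when you need many optional keys.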

Start recognition

Use the nui_dialog_start interface to start listening.

-(NSString*) genDialogParams {
    NSMutableDictionary *dialog_params = [NSMutableDictionary dictionary];
    // During runtime, you can update temporary parameters, especially an expired token, when calling nui_dialog_start.
    // Note: If you do not set parameters for the next conversation round, the parameters passed during initialization will be used.
    //[dialog_params setValue:@"" forKey:@"app_key"];
    //[dialog_params setValue:@"" forKey:@"token"];
    NSData *data = [NSJSONSerialization dataWithJSONObject:dialog_params options:NSJSONWritingPrettyPrinted error:nil];
    NSString * jsonStr = [[NSString alloc]initWithData:data encoding:NSUTF8StringEncoding];
    return jsonStr;
}
// To modify the token and other parameters, see genDialogParams().
NSString * parameters = [self genDialogParams];
// To use VAD mode, you need to set the nls_config parameter to start the online VAD mode (see genParams()).
[_nui nui_dialog_start:MODE_P2T dialogParam:[parameters UTF8String]];

Callback handling

  • onNuiAudioStateChanged: Recording state callback. Use the state provided by this callback to enable or disable your recorder.

    -(void)onNuiAudioStateChanged:(NuiAudioState)state{
        TLog(@"onNuiAudioStateChanged state=%u", state);
        if (state == STATE_CLOSE || state == STATE_PAUSE) {
            // The recording module provided in older versions of the sample project is for reference only. You can rewrite the recording module based on your business needs.
            // [_voiceRecorder stop:YES];
            // The new version of the sample project provides a new recording module. It is for reference only. You can rewrite the recording module based on your business needs.
            [_audioController stopRecorder:NO];
        } else if (state == STATE_OPEN){
            self.recordedVoiceData = [NSMutableData data];
            // The recording module provided in older versions of the sample project is for reference only. You can rewrite the recording module based on your business needs.
            // [_voiceRecorder start];
            // The new version of the sample project provides a new recording module. It is for reference only. You can rewrite the recording module based on your business needs.
            [_audioController startRecorder];
        }
    }
  • onNuiNeedAudioData: Recording data callback. Provide audio data in this callback.

    -(int)onNuiNeedAudioData:(char *)audioData length:(int)len {
        static int emptyCount = 0;
        @autoreleasepool {
            @synchronized(_recordedVoiceData){
                if (_recordedVoiceData.length > 0) {
                    // Copy at most len bytes into the buffer supplied by the SDK.
                    int recorder_len = (int)MIN(_recordedVoiceData.length, (NSUInteger)len);
                    [_recordedVoiceData getBytes:audioData length:recorder_len];
                    // Remove the consumed bytes from the front of the recorded data.
                    [_recordedVoiceData replaceBytesInRange:NSMakeRange(0, recorder_len) withBytes:NULL length:0];
                    emptyCount = 0;
                    return recorder_len;
                } else {
                    // No audio is available yet. Log roughly once every 50 empty reads.
                    if (emptyCount++ >= 50) {
                        TLog(@"_recordedVoiceData length = %lu! Empty 50 times.", (unsigned long)_recordedVoiceData.length);
                        emptyCount = 0;
                    }
                    return 0;
                }
            }
        }
        return 0;
    }
  • onNuiEventCallback: NUI SDK event callback. Do not call SDK interfaces from within this event callback, as it may cause a deadlock.

    -(void)onNuiEventCallback:(NuiCallbackEvent)nuiEvent
                       dialog:(long)dialog
                    kwsResult:(const char *)wuw
                    asrResult:(const char *)asr_result
                     ifFinish:(BOOL)finish
                      retCode:(int)code {
        TLog(@"onNuiEventCallback event %d finish %d", nuiEvent, finish);
        if (nuiEvent == EVENT_ASR_PARTIAL_RESULT || nuiEvent == EVENT_ASR_RESULT) {
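            // asr_result contains the task_id, which is helpful for troubleshooting. Record and save it.
            TLog(@"ASR RESULT %s finish %d", asr_result, finish);
            // stringWithUTF8String: raises an exception for NULL, so guard first.
            NSString *result = asr_result ? [NSString stringWithUTF8String:asr_result] : nil;
            // Use result here, for example to update the UI.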
        } else if (nuiEvent == EVENT_ASR_ERROR) {
            // In EVENT_ASR_ERROR, asr_result is the error message. Combining it with the error code and task_id makes troubleshooting easier. Please record and save this information.
            TLog(@"EVENT_ASR_ERROR error[%d], error mesg[%s]", code, asr_result);
            // You can get complete information by calling nui_get_all_response.
            const char *response = [_nui nui_get_all_response];
            if (response != NULL) {
                TLog(@"GET ALL RESPONSE: %s", response);
            }
        } else if (nuiEvent == EVENT_MIC_ERROR) {
            TLog(@"MIC ERROR");
            // The recording module provided in older versions of the sample project is for reference only. You can rewrite the recording module based on your business needs.
            // [_voiceRecorder stop:YES];
            // [_voiceRecorder start];
            // The new version of the sample project provides a new recording module. It is for reference only. You can rewrite the recording module based on your business needs.
            [_audioController stopRecorder:NO];
            [_audioController startRecorder];
        }
        // If 'finish' is true, the current task has ended (either successfully or due to an error), and you can start a new recognition.
        if (finish) {
            // Update your app state here. Do not call SDK interfaces directly from this callback.
        }
        return;
    }
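
The onNuiNeedAudioData sample above trims an NSMutableData object on every callback. As an alternative, a fixed-capacity ring buffer avoids resizing the data on each read. The following plain C sketch is one possible shape for it. All names (AudioRing, audio_ring_write, audio_ring_read) and the capacity are hypothetical and are not part of the SDK. The buffer is not thread-safe by itself, so calls would still need the same kind of lock that the sample uses around _recordedVoiceData.

```c
#include <stddef.h>
#include <string.h>

#define AUDIO_RING_CAPACITY 65536  /* illustrative size in bytes */

/* Hypothetical fixed-size FIFO for recorded audio bytes; not an SDK type. */
typedef struct {
    char data[AUDIO_RING_CAPACITY];
    size_t head;   /* index of the oldest unread byte */
    size_t count;  /* number of unread bytes */
} AudioRing;

/* Appends up to len bytes and returns how many were stored. Bytes that
 * do not fit are dropped. */
static size_t audio_ring_write(AudioRing *r, const char *src, size_t len) {
    size_t space = AUDIO_RING_CAPACITY - r->count;
    if (len > space) len = space;
    size_t tail = (r->head + r->count) % AUDIO_RING_CAPACITY;
    size_t first = AUDIO_RING_CAPACITY - tail;   /* bytes until wrap-around */
    if (first > len) first = len;
    memcpy(r->data + tail, src, first);
    memcpy(r->data, src + first, len - first);
    r->count += len;
    return len;
}

/* Copies up to len bytes into dst, oldest first. Returns the number of
 * bytes copied, which is the value onNuiNeedAudioData would return. */
static int audio_ring_read(AudioRing *r, char *dst, int len) {
    if (len <= 0) return 0;
    size_t n = (size_t)len < r->count ? (size_t)len : r->count;
    size_t first = AUDIO_RING_CAPACITY - r->head;
    if (first > n) first = n;
    memcpy(dst, r->data + r->head, first);
    memcpy(dst + first, r->data, n - first);
    r->head = (r->head + n) % AUDIO_RING_CAPACITY;
    r->count -= n;
    return (int)n;
}
```

With this in place, the recorder callback would call audio_ring_write with each captured chunk, and onNuiNeedAudioData would reduce to returning audio_ring_read(&ring, audioData, len) inside the lock.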

End recognition

[_nui nui_dialog_cancel:NO];

FAQ

Header not found

Ensure that you import the header file correctly: #import <nuisdk/NeoNui.h>.

SDK for Apsara Stack

The mobile SDKs can be used with Apsara Stack. Although the Android and iOS SDKs are not included in the default Apsara Stack installation, you can download them from the service documentation in the Alibaba Cloud Help Center, for example the Android SDK and iOS SDK pages for Real-Time Speech Recognition. The same mobile SDKs call the public cloud speech recognition (ASR) and speech synthesis (TTS) services and also work in an Apsara Stack environment.

Background processing support

The SDK itself does not have foreground or background limitations. By default, the sample project for the iOS SDK only supports foreground processing. To enable background processing, perform the following steps:

  1. In your project's Info.plist, add the Required background modes key. Add an Item to this key with the value App plays audio or streams audio/video using AirPlay. This enables the background audio mode.

  2. In the _appResignActive method of the NLSVoiceRecorder.m file, comment out the AudioSessionSetActive(NO) line to prevent recording from stopping when the app enters the background:

    - (void)_unregisterForBackgroundNotifications {
        [[NSNotificationCenter defaultCenter] removeObserver:self];
    }
    - (void)_appResignActive {
        _inBackground = true;
    //    AudioSessionSetActive(NO);
    }
    - (void)_appEnterForeground {
        _inBackground = false;
    }
    @end

"No suitable image" error

We recommend that you delete the app from your phone, clean the build in Xcode (Product > Clean Build Folder), and then run the app again. Also check whether the code signature is correct. If it is not, revoke the original in-house certificate, create a new certificate and provisioning profile, re-sign the code, and then package the app again.

Microphone error

Check if the recording device is currently in use by another application.

Build error after integration

This issue is usually caused by how the SDK is imported. Check the import settings, and change the header file import to #import <nuisdk/NeoNui.h>. Then, on the General page of your Xcode project, find the Frameworks, Libraries, and Embedded Content section and set the Embed option for nuisdk.framework to Embed & Sign.

Framework built for simulator

This error can occur due to an Xcode version mismatch. To resolve it, change the Validate Workspace setting in your project configuration to Yes and then recompile.

Undefined symbols with Flutter

Open the Podfile in your iOS project, modify the post_install do |installer| section as follows, and then run the build again.

# Other code omitted
post_install do |installer|
  installer.pods_project.targets.each do |target|
    flutter_additional_ios_build_settings(target)
    target.build_configurations.each do |config|
      config.build_settings["EXCLUDED_ARCHS[sdk=iphonesimulator*]"] = "arm64"
    end
  end
end

Microphone conflict with TRTC

We recommend that you take the audio from the TRTC audio and video stream: call localStream.getAudioTrack to obtain a MediaStreamTrack object, convert its audio to a stream that meets the ASR format requirements, and then send the request by using the Speech Recognition SDK.

Unsupported architectures error

This error occurs because the framework contains simulator architectures, which are not allowed in App Store submissions. Follow these steps to check the framework's architectures and remove the ones for the simulator.

  1. Navigate to the framework's directory.

  2. Run the command lipo -info xxxFramework to list the architectures in the framework binary. If a simulator architecture (typically x86_64 or i386) is included, you must remove it, for example with lipo -remove.

Legacy build system issue

This issue can often be resolved by enabling workspace validation. In your project configuration, change the Validate Workspace setting to Yes and then recompile.