iOS SDK

This document explains how to use the iOS NUI SDK from Alibaba Cloud Intelligent Speech Interaction, covering SDK download, installation, key APIs, and code examples.

The SDK does not support CocoaPods integration.

Prerequisites

  • Before using the SDK, read the API Reference.

  • Obtain an app key for your project. For details, see Create a project.

  • Obtain an access token. For details, see Obtain a token.

Download and install

  1. Select and download a mobile SDK.

    Important

    After downloading the SDK, you must replace the placeholder Alibaba Cloud account information, app key, and token in the sample initialization code to run the demo.

    For easier integration, starting from version 2.5.14, the iOS APIs use a pure Objective-C interface and no longer use a C++ hybrid interface.

  2. Unzip the ZIP package.

    Add the nuisdk.framework file from the ZIP package to your project. In your project's Build Phases, add nuisdk.framework to the Link Binary With Libraries section.

  3. Open the project in Xcode. The project provides reference code and reusable utility classes for tasks such as audio playback, recording, and file operations, which you can copy directly into your project. The sample code for real-time speech transcription is in SpeechTranscriberViewController. You can run the sample after replacing the placeholder app key and token.

Key APIs

  • nui_initialize: Initializes the SDK.

    /**
     * Initializes the SDK. The SDK is a singleton. Release the current instance before re-initializing. Do not call this on the UI thread, as it may cause blocking.
     * @param parameters: Initialization parameters. For more information, see the API Reference: https://help.aliyun.com/en/isi/developer-reference/overview-4
     * @param level: The log level. A smaller value produces more detailed logs.
     * @param save_log: Specifies whether to save logs to a file. The log file is stored in the directory specified by the debug_path field in the parameters. Note: Log files have no size limit. Be aware that continuous logging can fill up disk space.
     * @return A NuiResultCode value.
     */
    -(NuiResultCode) nui_initialize:(const char *)parameters
                           logLevel:(NuiSdkLogLevel)level
                            saveLog:(BOOL)save_log;
  • nui_set_params: Sets SDK parameters in JSON format.

    /**
     * Sets parameters in JSON format.
     * @param params: For parameter details, see the API Reference: https://help.aliyun.com/en/isi/developer-reference/overview-4
     * @return A NuiResultCode value.
     */
    -(NuiResultCode) nui_set_params:(const char *)params;
  • nui_dialog_start: Starts recognition.

    /**
     * Starts recognition.
     * @param vad_mode: Specifies the VAD mode. For recognition scenarios, use MODE_P2T.
     * @param dialog_params: Sets recognition parameters. This is optional. For details, see the API Reference: https://help.aliyun.com/en/isi/developer-reference/overview-4
     * @return A NuiResultCode value.
     */
    -(NuiResultCode) nui_dialog_start:(NuiVadMode)vad_mode
                          dialogParam:(const char *)dialog_params;
  • nui_dialog_cancel: Stops recognition.

    /**
     * Stops recognition. Depending on the force parameter, the task either ends immediately or ends after the server returns the final recognition result.
     * @param force: If YES, forces the task to end immediately and discards the final result. If NO, stops the task but waits for the final result to be returned.
     * @return A NuiResultCode value.
     */
    -(NuiResultCode) nui_dialog_cancel:(BOOL)force;
  • nui_release: Releases the SDK.

    /**
     * Releases SDK resources.
     * @return A NuiResultCode value.
     */
    -(NuiResultCode) nui_release;
  • nui_get_version: Gets the current SDK version.

    /**
     * Gets the current SDK version.
     * @return The SDK version as a string.
     */
    -(const char*) nui_get_version;
  • nui_get_all_response: Gets all information for the current event callback.

    /**
     * Gets all information for the current event callback.
     * @return All event information as a JSON string.
     */
    -(const char*) nui_get_all_response;
  • NeoNuiSdkDelegate

    onNuiEventCallback: SDK event callback.

    /**
     * The main event callback for the SDK.
     * @param nuiEvent: The callback event. See the event list below.
     * @param dialog: The session ID. Currently not in use.
     * @param wuw: Used for the wake-up word feature (currently not supported).
     * @param asr_result: The speech recognition result.
     * @param finish: A flag indicating whether the recognition task is complete.
     * @param code: The error code. See error codes. This is valid when an EVENT_ASR_ERROR event occurs.
     */
    -(void) onNuiEventCallback:(NuiCallbackEvent)nuiEvent
                        dialog:(long)dialog
                     kwsResult:(const char *)wuw
                     asrResult:(const char *)asr_result
                      ifFinish:(BOOL)finish
                       retCode:(int)code;

    NuiCallbackEvent event list:

    Name                         Description
    EVENT_VAD_START              Start of speech detected.
    EVENT_VAD_END                End of speech detected.
    EVENT_ASR_PARTIAL_RESULT     Intermediate speech recognition result.
    EVENT_ASR_RESULT             Final speech recognition result.
    EVENT_ASR_ERROR              An error occurred. Use the error code to determine the cause.
    EVENT_MIC_ERROR              Recording error. This indicates that the SDK has not received any audio for 2 consecutive seconds. Verify that the recording system is working correctly.
    EVENT_SENTENCE_START         Real-time speech recognition event. The start of a sentence is detected.
    EVENT_SENTENCE_END           Real-time speech recognition event. The end of a sentence is detected, and the complete result for the sentence is returned.
    EVENT_SENTENCE_SEMANTICS     Currently not in use.
    EVENT_TRANSCRIBER_COMPLETE   Reported after speech recognition is stopped.

    onNuiNeedAudioData: Provides audio data to the SDK.

    /**
     * After recognition starts, the SDK invokes this callback repeatedly. Your app must provide audio data in this callback.
     * @param audioData: A buffer to fill with audio data.
     * @param len: The requested number of bytes of audio data.
     * @return The number of bytes actually written to the buffer.
     */
    -(int) onNuiNeedAudioData:(char *)audioData length:(int)len;

    onNuiAudioStateChanged: Enables or disables the recording function based on the audio state.

    /**
     * When APIs such as start, stop, or cancel are called, the SDK uses this callback to notify the app to start or stop recording.
     * @param state: The required state for the recorder (open/closed).
     */
    -(void) onNuiAudioStateChanged:(NuiAudioState)state;

    onNuiRmsChanged: Audio energy event.

    /**
     * SDK event callback for audio energy.
     * @param rms: The audio energy level, ranging from -160 to 0.
     */
    -(void) onNuiRmsChanged:(float) rms;
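
The rms value reported by onNuiRmsChanged can drive a volume indicator. The following C sketch (C compiles unchanged inside an Objective-C source file) clamps the value to the documented range of -160 to 0 and maps it linearly to the range 0 to 1. The rms_to_level function, the constant names, and the linear mapping are assumptions made for illustration. They are not part of the SDK.

```c
/* Bounds of the rms value documented for onNuiRmsChanged. */
#define RMS_MIN (-160.0f)
#define RMS_MAX (0.0f)

/*
 * Hypothetical helper (not an SDK API): clamps rms to [RMS_MIN, RMS_MAX]
 * and maps it linearly to a level in [0, 1] for a volume meter.
 */
float rms_to_level(float rms) {
    if (rms < RMS_MIN) rms = RMS_MIN;
    if (rms > RMS_MAX) rms = RMS_MAX;
    return (rms - RMS_MIN) / (RMS_MAX - RMS_MIN);
}
```

In onNuiRmsChanged, you could pass rms to this helper and update the UI on the main thread with the returned level. A nonlinear mapping might look more natural for quiet input, because most speech is likely to fall in the upper part of the range.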

Procedure

  1. Initialize the SDK and your audio recorder.

  2. Configure parameters based on your business requirements.

  3. Call nui_dialog_start to start recognition.

  4. Start the audio recorder in response to the onNuiAudioStateChanged callback.

  5. Provide audio data in the onNuiNeedAudioData callback.

  6. Retrieve recognition results from the EVENT_ASR_PARTIAL_RESULT and EVENT_SENTENCE_END event callbacks.

  7. Call nui_dialog_cancel to stop recognition.

  8. When you are finished, call the nui_release API to release SDK resources.
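
The call order in this procedure can be tracked on the app side with a small state machine. The following C sketch is one possible way to do this bookkeeping. All type and function names are hypothetical and are not SDK APIs. The sketch also assumes that every task, including a canceled one, ends with an event callback whose finish flag is set.

```c
#include <stdbool.h>

/* App-side view of the SDK lifecycle (hypothetical, not SDK types). */
typedef enum {
    APP_UNINITIALIZED,  /* before nui_initialize or after nui_release */
    APP_READY,          /* initialized, no recognition task running */
    APP_RECOGNIZING,    /* nui_dialog_start was called */
    APP_STOPPING        /* nui_dialog_cancel was called, task not finished */
} AppNuiState;

typedef enum {
    APP_STEP_INITIALIZE,     /* nui_initialize */
    APP_STEP_SET_PARAMS,     /* nui_set_params */
    APP_STEP_DIALOG_START,   /* nui_dialog_start */
    APP_STEP_DIALOG_CANCEL,  /* nui_dialog_cancel */
    APP_STEP_TASK_FINISHED,  /* onNuiEventCallback reported finish */
    APP_STEP_RELEASE         /* nui_release */
} AppNuiStep;

/* Returns true and updates *state if the step fits the procedure order. */
bool app_nui_advance(AppNuiState *state, AppNuiStep step) {
    switch (step) {
    case APP_STEP_INITIALIZE:
        /* The SDK must be released before it is initialized again. */
        if (*state != APP_UNINITIALIZED) return false;
        *state = APP_READY;
        return true;
    case APP_STEP_SET_PARAMS:
        return *state == APP_READY;
    case APP_STEP_DIALOG_START:
        if (*state != APP_READY) return false;
        *state = APP_RECOGNIZING;
        return true;
    case APP_STEP_DIALOG_CANCEL:
        if (*state != APP_RECOGNIZING) return false;
        *state = APP_STOPPING;
        return true;
    case APP_STEP_TASK_FINISHED:
        /* A task can finish after a cancel or on its own, for example on error. */
        if (*state != APP_RECOGNIZING && *state != APP_STOPPING) return false;
        *state = APP_READY;
        return true;
    case APP_STEP_RELEASE:
        if (*state == APP_UNINITIALIZED) return false;
        *state = APP_UNINITIALIZED;
        return true;
    }
    return false;
}
```

Before each SDK call, the app would call app_nui_advance with the matching step and skip the SDK call if the function returns false. The check only mirrors the order described in this topic. The SDK itself may accept other sequences.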

Code examples

Note

By default, the API uses the get_instance method to obtain a singleton. If you require multiple instances, you can also create them using alloc.

NUI SDK initialization

BOOL save_log = NO;
NSString * initParam = [self genInitParams];
[_nui nui_initialize:[initParam UTF8String] logLevel:LOG_LEVEL_VERBOSE saveLog:save_log];

The genInitParams method generates a JSON string that contains the resource directory and user information. The following sample shows the fields that it sets.

-(NSString*) genInitParams {
    NSString *strResourcesBundle = [[NSBundle mainBundle] pathForResource:@"Resources" ofType:@"bundle"];
    NSString *bundlePath = [[NSBundle bundleWithPath:strResourcesBundle] resourcePath];
    NSString *debug_path = [_utils createDir];

    NSMutableDictionary *ticketJsonDict = [NSMutableDictionary dictionary];

    // Obtain an access credential.
    //  The getTicket method in the sample project provides several possible ways to obtain a credential. Choose a secure method that fits your business needs.
    //
    // Note:
    //  Before you use the Intelligent Speech Interaction service, you must create an account and activate the service. For detailed steps, see:
    //    https://help.aliyun.com/en/isi/getting-started/start-here
    //
    // Permanent credentials:
    //  Your permanent credentials include your AccessKey ID (ak_id) and AccessKey Secret (ak_secret).
    //  These credentials must never be stored in your app's code or on the client side to prevent leaks and potential financial loss.
    //
    // STS temporary credentials:
    //  To avoid the security risks of distributing permanent credentials to clients, Alibaba Cloud provides the Security Token Service (STS).
    //  STS generates temporary credentials (sts_ak_id, sts_ak_secret, and sts_token) from your permanent AccessKey ID and AccessKey Secret.
    //  (The 'sts_' prefix is used to distinguish temporary STS credentials from permanent ones).
    // What is STS: https://help.aliyun.com/en/ram/product-overview/what-is-sts
    // STS SDK overview: https://help.aliyun.com/en/ram/developer-reference/sts-sdk-overview
    // STS Python SDK example: https://help.aliyun.com/en/ram/developer-reference/use-the-sts-openapi-example
    //
    // Credential requirements:
    //  For offline features (such as offline Text-to-Speech and wake-up word), you must provide either an app_key, ak_id, and ak_secret, or an app_key, sts_ak_id, sts_ak_secret, and sts_token.
    //  For online features (such as Text-to-Speech, real-time speech transcription, short-form speech recognition, and audio file transcription), you only need to provide an app_key and a token.
    [_utils getTicket:ticketJsonDict Type:get_token_from_server_for_online_features];
    if ([ticketJsonDict objectForKey:@"token"] != nil) {
        NSString *tokenValue = [ticketJsonDict objectForKey:@"token"];
        if ([tokenValue length] == 0) {
            TLog(@"The 'token' key exists but the value is empty.");
        }
    } else {
        TLog(@"The 'token' key does not exist.");
    }

    [ticketJsonDict setObject:@"wss://nls-gateway.cn-shanghai.aliyuncs.com:443/ws/v1" forKey:@"url"]; // Default: The endpoint for the China (Shanghai) region.

    // The workspace directory path. The SDK reads configuration files from this path.
    [ticketJsonDict setObject:bundlePath forKey:@"workspace"]; // Required.
    // Specifies whether to save debug audio to the directory specified by debug_path. This parameter takes effect only when the saveLog argument of nui_initialize is YES. Ensure that debug_path is valid and writable.
    // save_wav is a BOOL that is not declared in this snippet. Define it in your own code.
    [ticketJsonDict setObject:save_wav ? @"true" : @"false" forKey:@"save_wav"];
    // The debug directory. When the save_log parameter in the SDK initialization is true, this directory is used to save intermediate audio files.
    [ticketJsonDict setObject:debug_path forKey:@"debug_path"];

    // FullCloud = 1 // Use this for online real-time speech recognition.
    [ticketJsonDict setObject:@"1" forKey:@"service_mode"]; // Required.

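    // Requires #import <AdSupport/AdSupport.h>. On iOS 14.5 and later, advertisingIdentifier is all zeros unless the user has granted tracking authorization, so it might not be unique.
    NSString *id_string = [[[ASIdentifierManager sharedManager] advertisingIdentifier] UUIDString];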
    TLog(@"id: %s", [id_string UTF8String]);
    [ticketJsonDict setObject:id_string forKey:@"device_id"]; // Required. We recommend using a unique device ID to aid in troubleshooting.

    NSData *data = [NSJSONSerialization dataWithJSONObject:ticketJsonDict options:NSJSONWritingPrettyPrinted error:nil];
    NSString * jsonStr = [[NSString alloc]initWithData:data encoding:NSUTF8StringEncoding];
    return jsonStr;
}

Parameter configuration

Set parameters using a JSON string.

-(NSString*) genParams {
    NSMutableDictionary *nls_config = [NSMutableDictionary dictionary];
    [nls_config setValue:@YES forKey:@"enable_intermediate_result"]; // Required.
    // Configure parameters based on your business needs.
    // For API details, see: https://help.aliyun.com/document_detail/173528.html
    // See section "2. Start recognition".

    //[nls_config setValue:@"<Your updated token>" forKey:@"token"];
    //[nls_config setValue:@YES forKey:@"enable_punctuation_prediction"];
    //[nls_config setValue:@YES forKey:@"enable_inverse_text_normalization"];
    //[nls_config setValue:@YES forKey:@"enable_voice_detection"];
    //[nls_config setValue:@10000 forKey:@"max_start_silence"];
    //[nls_config setValue:@800 forKey:@"max_end_silence"];
    //[nls_config setValue:@800 forKey:@"max_sentence_silence"];
    //[nls_config setValue:@NO forKey:@"enable_words"];
    //[nls_config setValue:@16000 forKey:@"sample_rate"];
    //[nls_config setValue:@"opus" forKey:@"sr_format"];

    NSMutableDictionary *dictM = [NSMutableDictionary dictionary];
    [dictM setObject:nls_config forKey:@"nls_config"];
    [dictM setValue:@(SERVICE_TYPE_SPEECH_TRANSCRIBER) forKey:@"service_type"]; // Required.

    // If you are using HTTPDNS, you can configure it here.
    //[dictM setObject:[_utils getDirectIp] forKey:@"direct_ip"];
    
    /* If a parameter is not documented but is supported by the feature, you can use the following generic method to set it. */
    //NSMutableDictionary *extend_config = [NSMutableDictionary dictionary];
    //[extend_config setValue:@YES forKey:@"custom_test"];
    //[dictM setObject:extend_config forKey:@"extend_config"];
    
    NSData *data = [NSJSONSerialization dataWithJSONObject:dictM options:NSJSONWritingPrettyPrinted error:nil];
    NSString * jsonStr = [[NSString alloc]initWithData:data encoding:NSUTF8StringEncoding];
    return jsonStr;
}

NSString * parameters = [self genParams];
[_nui nui_set_params:[parameters UTF8String]];

Start recognition

Call the nui_dialog_start API to start listening.

-(NSString*) genDialogParams {
    NSMutableDictionary *dialog_params = [NSMutableDictionary dictionary];
    // During runtime, you can update temporary parameters, such as an expired token, when calling nui_dialog_start.
    // Note: If you do not set parameters for the next dialog, the parameters from the initialization will be used.
    //[dialog_params setValue:@"" forKey:@"app_key"];
    //[dialog_params setValue:@"" forKey:@"token"];
    
    NSData *data = [NSJSONSerialization dataWithJSONObject:dialog_params options:NSJSONWritingPrettyPrinted error:nil];
    NSString * jsonStr = [[NSString alloc]initWithData:data encoding:NSUTF8StringEncoding];
    return jsonStr;
}

// To modify the token or other parameters, see genDialogParams().
NSString * parameters = [self genDialogParams];
// To use VAD mode, you must set the nls_config parameter to enable online VAD mode (see genParams()).
[_nui nui_dialog_start:MODE_P2T dialogParam:[parameters UTF8String]];

Callback handling

  • onNuiAudioStateChanged: The audio state callback. The SDK maintains the recording state internally. Use this callback to start or stop the recorder.

    -(void)onNuiAudioStateChanged:(NuiAudioState)state{
        TLog(@"onNuiAudioStateChanged state=%u", state);
        if (state == STATE_CLOSE || state == STATE_PAUSE) {
            // The recording module provided in the old sample project is for reference only. You can rewrite it to fit your business needs.
            // [_voiceRecorder stop:YES];
    
            // The recording module provided in the new sample project is for reference only. You can rewrite it to fit your business needs.
            [_audioController stopRecorder:NO];
        } else if (state == STATE_OPEN){
            self.recordedVoiceData = [NSMutableData data];
            // The recording module provided in the old sample project is for reference only. You can rewrite it to fit your business needs.
            // [_voiceRecorder start];
    
            // The recording module provided in the new sample project is for reference only. You can rewrite it to fit your business needs.
            [_audioController startRecorder];
        }
    }
  • onNuiNeedAudioData: The audio data callback. Provide the recorded audio data in this callback.

    -(int)onNuiNeedAudioData:(char *)audioData length:(int)len {
        static int emptyCount = 0;
        @autoreleasepool {
            @synchronized(_recordedVoiceData){
                if (_recordedVoiceData.length > 0) {
                    int recorder_len = 0;
                    if (_recordedVoiceData.length > len)
                        recorder_len = len;
                    else
                        recorder_len = _recordedVoiceData.length;
                    NSData *tempData = [_recordedVoiceData subdataWithRange:NSMakeRange(0, recorder_len)];
                    [tempData getBytes:audioData length:recorder_len];
                    tempData = nil;
                    NSInteger remainLength = _recordedVoiceData.length - recorder_len;
                    NSRange range = NSMakeRange(recorder_len, remainLength);
                    [_recordedVoiceData setData:[_recordedVoiceData subdataWithRange:range]];
                    emptyCount = 0;
                    return recorder_len;
                } else {
                    if (emptyCount++ >= 50) {
                        TLog(@"_recordedVoiceData length = %lu! empty 50times.", (unsigned long)_recordedVoiceData.length);
                        emptyCount = 0;
                    }
                    return 0;
                }
    
            }
        }
        return 0;
    }
  • onNuiEventCallback: The NUI SDK event callback. To avoid deadlocks, do not call other SDK APIs from within this callback.

    -(void)onNuiEventCallback:(NuiCallbackEvent)nuiEvent
                       dialog:(long)dialog
                    kwsResult:(const char *)wuw
                    asrResult:(const char *)asr_result
                     ifFinish:(BOOL)finish
                      retCode:(int)code {
        TLog(@"onNuiEventCallback event %d finish %d", nuiEvent, finish);
        if (nuiEvent == EVENT_ASR_PARTIAL_RESULT || nuiEvent == EVENT_SENTENCE_END) {
            // asr_result contains the task_id, which is useful for troubleshooting. Record and save it.
            TLog(@"ASR RESULT %s finish %d", asr_result, finish);
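            // Guard against NULL: stringWithUTF8String: raises an exception if it is passed a NULL pointer.
            NSString *result = asr_result ? [NSString stringWithUTF8String:asr_result] : nil;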
        } else if (nuiEvent == EVENT_ASR_ERROR) {
            // For an EVENT_ASR_ERROR, asr_result contains the error message. Use it along with the error code and the task_id to troubleshoot issues. Record and save this information.
            TLog(@"EVENT_ASR_ERROR error[%d], error mesg[%s]", code, asr_result);
    
            // You can call nui_get_all_response to get the complete response details.
            const char *response = [_nui nui_get_all_response];
            if (response != NULL) {
                TLog(@"GET ALL RESPONSE: %s", response);
            }
        } else if (nuiEvent == EVENT_MIC_ERROR) {
            TLog(@"MIC ERROR");
            // The recording module provided in the old sample project is for reference only. You can rewrite it to fit your business needs.
            // [_voiceRecorder stop:YES];
            // [_voiceRecorder start];
    
            // The recording module provided in the new sample project is for reference only. You can rewrite it to fit your business needs.
            [_audioController stopRecorder:NO];
            [_audioController startRecorder];
        }
    
        // A value of true for 'finish' indicates the end of a task lifecycle (either due to an error or successful completion). You can now start a new recognition task.
        if (finish) {
        }
        
        return;
    }
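
The onNuiNeedAudioData sample above copies the remaining NSData on every call. As an alternative, the buffer between the recorder and the SDK can be a fixed-size byte FIFO. The following C sketch shows one possible implementation. The AudioFifo type, its functions, and the 32 KB capacity are assumptions made for illustration. They are not part of the SDK or of the sample project. The sketch is not thread-safe, so the caller must hold a lock around each call, as the sample does with @synchronized.

```c
#include <stddef.h>
#include <string.h>

#define AUDIO_FIFO_CAPACITY 32768  /* assumed size in bytes */

/* Hypothetical ring buffer shared by the recorder and the SDK callback. */
typedef struct {
    unsigned char data[AUDIO_FIFO_CAPACITY];
    size_t head;   /* index of the oldest unread byte */
    size_t count;  /* number of unread bytes */
} AudioFifo;

void audio_fifo_init(AudioFifo *f) {
    f->head = 0;
    f->count = 0;
}

/* Recorder side: stores up to len bytes and returns the number stored.
 * Bytes that do not fit are dropped. */
size_t audio_fifo_write(AudioFifo *f, const void *src, size_t len) {
    size_t free_space = AUDIO_FIFO_CAPACITY - f->count;
    if (len > free_space) len = free_space;
    size_t tail = (f->head + f->count) % AUDIO_FIFO_CAPACITY;
    size_t first = AUDIO_FIFO_CAPACITY - tail;  /* bytes before the wrap */
    if (first > len) first = len;
    memcpy(f->data + tail, src, first);
    memcpy(f->data, (const unsigned char *)src + first, len - first);
    f->count += len;
    return len;
}

/* SDK side: copies up to len bytes into dst and returns the number copied,
 * which is the value that onNuiNeedAudioData would return. */
int audio_fifo_read(AudioFifo *f, char *dst, int len) {
    if (len <= 0) return 0;
    size_t n = (size_t)len;
    if (n > f->count) n = f->count;
    size_t first = AUDIO_FIFO_CAPACITY - f->head;  /* bytes before the wrap */
    if (first > n) first = n;
    memcpy(dst, f->data + f->head, first);
    memcpy(dst + first, f->data, n - first);
    f->head = (f->head + n) % AUDIO_FIFO_CAPACITY;
    f->count -= n;
    return (int)n;
}
```

With this helper, the recorder callback would call audio_fifo_write for each captured buffer, and onNuiNeedAudioData would return the result of audio_fifo_read(&fifo, audioData, len). Choose the capacity based on your audio format and on how long the SDK might go without requesting data.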

Stop recognition

[_nui nui_dialog_cancel:NO];

FAQ

Mobile SDKs and private cloud

SDKs are available for both Android and iOS. They are not included in the default private cloud installation, but you can download them from the Alibaba Cloud Help Center. These mobile SDKs can call services in both public cloud and private cloud environments, including real-time speech recognition and Text-to-Speech (TTS).

iOS background processing

The SDK itself does not have foreground or background limitations. The sample project for the iOS SDK supports only foreground processing by default. To enable background processing, make the following changes:

  1. In your project's Info.plist file, add the Required background modes setting. Under this setting, add an Item and set its value to App plays audio or streams audio/video using AirPlay.

  2. To keep recording when the app enters the background, do not stop the recorder in the _appResignActive method of the NLSVoiceRecorder.m file. To do this, comment out the AudioSessionSetActive(NO) call:

    - (void)_unregisterForBackgroundNotifications {
        [[NSNotificationCenter defaultCenter] removeObserver:self];
    }
    
    - (void)_appResignActive {
        _inBackground = true;
    //    AudioSessionSetActive(NO);
    }
    
    - (void)_appEnterForeground {
        _inBackground = false;
    }
    
    @end

"No suitable image found" error

Delete the app from your phone, clean the build in Xcode (Product > Clean Build Folder), and then run the app again. Also check whether the code signature is correct. If the signature is incorrect, revoke the original in-house certificate, create a new certificate and provisioning profile, re-sign the code, and package the app again.

Microphone error on iOS

Verify that the recording device is not being used by another application.

Framework import error

This issue is typically caused by how the SDK is imported. First check the framework settings: in Xcode, select the project target, go to the General tab, find nuisdk.framework in the Frameworks, Libraries, and Embedded Content section, and confirm that its Embed option is set to Embed & Sign. If the setting is already correct, change the header import to #import <nuisdk/NeoNui.h>.

"Built for iOS + iOS Simulator" error

This may be caused by an incompatible version. In your project's build settings, set Validate Workspace to Yes and then recompile.

Undefined symbols error with Flutter

Open the Podfile in your iOS project, modify the post_install do |installer| block as shown below, and then run the build again.

# Other code is omitted.

post_install do |installer|
  installer.pods_project.targets.each do |target|
    flutter_additional_ios_build_settings(target)
    target.build_configurations.each do |config|
      config.build_settings["EXCLUDED_ARCHS[sdk=iphonesimulator*]"] = "arm64"
    end
  end
end

Microphone conflict with TRTC

Use the audio from the TRTC stream instead of opening the microphone a second time. Call localStream.getAudioTrack to obtain a MediaStreamTrack object, convert it into an audio stream that meets the ASR requirements, and then send the audio by using the speech recognition SDK.

Legacy build system requirement

In your project's build settings, set Validate Workspace to Yes and then recompile.

"Unsupported Architectures" error

This is likely caused by the simulator architectures included in the framework. Follow these steps to check the framework's architectures and remove the ones for the simulator.

  1. Navigate to the framework's directory.

  2. Run the command lipo -info xxxFramework to check the architectures of the framework. If the simulator architecture is included, you must remove it.