This topic describes how to use the HarmonyOS Next NUI software development kit (SDK) from Alibaba Cloud Voice Service. This topic covers SDK download and installation, key interfaces, and code examples.
Prerequisites
Before you use the SDK, read the API reference. For more information, see API reference.
You have obtained a project AppKey. For more information, see Create a project.
You have obtained an Access Token. For more information, see Overview of obtaining a token.
Download and installation
Download harmony_neonui_sdk.tar.gz.
Important: After you download the package, you must replace the sample Alibaba Cloud account information, AppKey, and Token in the initialization code to run the application.
Category | Compatibility
System | HarmonyOS Next 5.0, API level 12, DevEco Studio 5.0.3.403
Architecture | arm64-v8a
This SDK also includes the following features:
Feature | Support
Short sentence recognition | Yes
Real-time speech recognition | Yes
Speech synthesis | Yes
Real-time long-text speech synthesis | Yes
Streaming text-to-speech | Yes
Offline speech synthesis | No
Rapid transcription for audio files | Yes
Wake words and command words | No
Tingwu real-time stream ingest | Yes
Integrate the SDK as an ArkTS HAR package. After you decompress the package, the HAR file generated by the SDK is located at entry/libs/neonui.har. Import this file into your project and call it. To integrate through HarmonyOS Next C++ instead, use the dynamic library and header files in the native/libs and native/include directories of the compressed package.
Open the project using DevEco Studio. The sample code for real-time speech recognition is in the SpeechTranscriberPage.ets file. Replace the AppKey and Token of the UserKey class in the UserKey.ets file, and then run the project.
Key SDK interfaces
initialize: Initializes the SDK.
/**
 * Initialize the SDK. You can create multiple SDK instances. Release an instance before re-initializing it. Do not call this method in the UI thread to prevent blocking.
 * @param callback: The event listener callback. For more information, see the callbacks described below.
 * @param parameters: The initialization parameters in a JSON string. For more information, see the description below or the API reference at https://help.aliyun.com/document_detail/173298.html.
 * @param level: The log printing level. A smaller value indicates that more logs are printed.
 * @param save_log: Specifies whether to save logs to a file. The logs are stored in the directory specified by the debug_path field in the ticket. Note: Log files have no size limit. Continuous storage may fill up the disk.
 * @return: For more information about error codes, see https://help.aliyun.com/document_detail/459864.html.
 */
public initialize(callback:INativeNuiCallback,
                  parameters:string,
                  level:number,
                  save_log:boolean=false):number
The INativeNuiCallback interface type includes the following callbacks.
onNuiAudioStateChanged: Enables or disables the recording feature based on the audio state.
/**
 * When interfaces such as start, stop, or cancel are called, the SDK uses this callback to notify the application to enable or disable recording.
 * @param state: The required state for recording (open, stop, or close).
 */
onNuiAudioStateChanged:(state:Constants.AudioState)=>void
onNuiNeedAudioData: Provides audio data in the callback. Note: Because asynchronous interface calls are frequent in ArkTS, we recommend that you do not use this callback to provide audio data. Instead, you can actively call updateAudio() to sequentially pass audio data to the SDK.
/**
 * When recognition starts, this callback is called continuously. The application needs to fill in the audio data in the callback.
 * @param buffer: The storage area to fill with audio.
 * @return: The actual number of bytes filled.
 */
onNuiNeedAudioData:(buffer:ArrayBuffer)=>number;
onNuiEventCallback: The SDK event callback.
/**
 * The main SDK event callback.
 * @param event: The callback event. For more information, see the event list below.
 * @param resultCode: The error code. This is valid when an EVENT_ASR_ERROR event occurs.
 * @param arg2: A reserved parameter.
 * @param kwsResult: The wake word feature (not currently supported).
 * @param asrResult: The speech recognition result.
 */
onNuiEventCallback:(event:Constants.NuiEvent, resultCode:number, arg2:number, kwsResult:KwsResult, asrResult:AsrResult)=>void;
onNuiAudioRMSChanged: The audio energy level callback.
/**
 * The audio energy level callback.
 * @param val: The audio data energy level callback. The range is -160 to 0. It is generally used for displaying voice animation effects on the UI.
 */
onNuiAudioRMSChanged:(val:number)=>number;
Event list:
Name | Description
EVENT_VAD_START | The start of human speech is detected.
EVENT_VAD_END | The end of human speech is detected.
EVENT_ASR_PARTIAL_RESULT | An intermediate result of speech recognition.
EVENT_ASR_ERROR | Determine the cause of the error based on the error code information.
EVENT_MIC_ERROR | A recording error. This indicates that the SDK has not received any audio for 2 consecutive seconds. Check if the recording system is working correctly.
EVENT_SENTENCE_START | A real-time speech recognition event. This indicates that the start of a sentence is detected.
EVENT_SENTENCE_END | A real-time speech recognition event. This indicates that the end of a sentence is detected and a complete result is returned.
EVENT_SENTENCE_SEMANTICS | Not in use.
EVENT_TRANSCRIBER_COMPLETE | The final event after stopping speech recognition.
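The event list above can be reduced to a handful of application-level actions. The following is a minimal sketch of such a mapping, not SDK code: the NuiEventName type and the classifyEvent function are hypothetical names, and the events are modeled as string literals only so that the example runs without the SDK. In a real project you would switch on Constants.NuiEvent instead.

```typescript
// Hypothetical stand-in for Constants.NuiEvent: the event names from the
// table above, modeled as string literals so the sketch runs without the SDK.
type NuiEventName =
  | "EVENT_VAD_START"
  | "EVENT_VAD_END"
  | "EVENT_ASR_PARTIAL_RESULT"
  | "EVENT_ASR_ERROR"
  | "EVENT_MIC_ERROR"
  | "EVENT_SENTENCE_START"
  | "EVENT_SENTENCE_END"
  | "EVENT_SENTENCE_SEMANTICS"
  | "EVENT_TRANSCRIBER_COMPLETE";

// Application-level meaning of an event (hypothetical categories).
type EventKind = "partial" | "final" | "error" | "done" | "info";

function classifyEvent(event: NuiEventName): EventKind {
  switch (event) {
    case "EVENT_ASR_PARTIAL_RESULT":
      return "partial"; // intermediate text for the current sentence
    case "EVENT_SENTENCE_END":
      return "final"; // complete text for the current sentence
    case "EVENT_ASR_ERROR":
    case "EVENT_MIC_ERROR":
      return "error"; // inspect resultCode or the recording pipeline
    case "EVENT_TRANSCRIBER_COMPLETE":
      return "done"; // last event after recognition is stopped
    default:
      return "info"; // VAD and sentence-start notifications, unused events
  }
}
```

Keeping this mapping in one function means the UI code only needs to react to five categories rather than to every SDK event.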
setParams: Sets SDK parameters in JSON format.
/**
 * Sets parameters in JSON format.
 * @param params: For more information, see the API reference at https://help.aliyun.com/document_detail/173298.html.
 * @return: For more information about error codes, see https://help.aliyun.com/document_detail/459864.html.
 */
public setParams(params:string):number
startDialog: Starts recognition.
/**
 * Starts recognition.
 * @param vad_mode: Multiple modes are available. For recognition scenarios, use P2T.
 * @param dialog_params: The dialogue parameters in a JSON string. For more information, see the API reference at https://help.aliyun.com/document_detail/173298.html.
 * @return: For more information about error codes, see https://help.aliyun.com/document_detail/459864.html.
 */
public startDialog(vad_mode:Constants.VadMode, dialog_params:string):number
stopDialog: Ends recognition.
/**
 * Ends recognition. After this interface is called, the server returns the final recognition result and ends the task.
 * @return: For more information about error codes, see https://help.aliyun.com/document_detail/459864.html.
 */
public stopDialog():number
cancelDialog: Ends recognition immediately.
/**
 * Ends recognition immediately. After this interface is called, the task ends immediately without waiting for the server to return the final recognition result.
 * @return: For more information about error codes, see https://help.aliyun.com/document_detail/459864.html.
 */
public cancelDialog():number
release: Releases the SDK.
/**
 * Releases SDK resources.
 * @return: For more information about error codes, see https://help.aliyun.com/document_detail/459864.html.
 */
public release():number
GetVersion: Obtains the current SDK version information.
/**
 * Gets the current SDK version information.
 * @return: The SDK version information as a string.
 */
public GetVersion():string
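Among the callbacks described in this section, onNuiAudioRMSChanged reports an energy value in the range -160 to 0 that is typically used to drive a voice animation. The sketch below shows one possible way to turn that value into a 0 to 1 level for a UI meter. The rmsToLevel function is a hypothetical helper, and the linear mapping is an assumption made for illustration; the SDK does not prescribe how the value should be displayed.

```typescript
// Range documented for the val parameter of onNuiAudioRMSChanged.
const RMS_MIN = -160;
const RMS_MAX = 0;

// Hypothetical helper: clamp the reported value to the documented range and
// map it linearly to 0..1 (0 = silence, 1 = loudest) for a UI level meter.
function rmsToLevel(val: number): number {
  if (Number.isNaN(val)) {
    return 0; // treat invalid input as silence
  }
  const clamped = Math.min(RMS_MAX, Math.max(RMS_MIN, val));
  return (clamped - RMS_MIN) / (RMS_MAX - RMS_MIN);
}
```

For example, rmsToLevel(-80) yields 0.5. A real meter might apply a non-linear curve or smoothing on top of this so that quiet speech remains visible.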
Procedure
Create an instance of the SDK class object.
Initialize the SDK and the audio recording instance.
Set parameters as needed.
Call startDialog to start recognition.
Open the audio recorder based on the event in the onNuiAudioStateChanged audio state callback.
Call updateAudio() to provide audio data to the SDK.
The EVENT_SENTENCE_START event callback indicates that the recognition of a sentence has started. You can obtain the intermediate recognition result from the EVENT_ASR_PARTIAL_RESULT event callback. You can obtain the complete recognition result and related information for the sentence from the EVENT_SENTENCE_END event callback.
Call stopDialog to end the recognition. You can confirm that recognition has stopped from the EVENT_TRANSCRIBER_COMPLETE event callback.
After the call ends, call the release interface to release SDK resources.
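The call order in the procedure above can be sketched as follows. This is an illustration under stated assumptions, not SDK code: FakeNui is a hypothetical stub that only records which method was called, its signatures are simplified, and the check helper assumes that 0 means success (in real code, compare against Constants.NuiResultCode.SUCCESS). The real SDK is also asynchronous: recognition results arrive through onNuiEventCallback, and you would wait for EVENT_TRANSCRIBER_COMPLETE before releasing.

```typescript
// Hypothetical stub that only records the order of calls.
// It is NOT the real NativeNui class, and its signatures are simplified.
class FakeNui {
  readonly calls: string[] = [];
  initialize(_params: string): number { this.calls.push("initialize"); return 0; }
  setParams(_params: string): number { this.calls.push("setParams"); return 0; }
  startDialog(_params: string): number { this.calls.push("startDialog"); return 0; }
  updateAudio(_buffer: ArrayBuffer, _finish: boolean): number { this.calls.push("updateAudio"); return 0; }
  stopDialog(): number { this.calls.push("stopDialog"); return 0; }
  release(): number { this.calls.push("release"); return 0; }
}

// Assumption: a return value of 0 means success.
function check(step: string, ret: number): void {
  if (ret !== 0) {
    throw new Error(`${step} failed with code ${ret}`);
  }
}

// Runs one recognition session in the documented order and returns the call log.
function runSession(nui: FakeNui, chunks: ArrayBuffer[]): string[] {
  check("initialize", nui.initialize("{}"));
  try {
    check("setParams", nui.setParams("{}"));
    check("startDialog", nui.startDialog("{}"));
    for (const chunk of chunks) {
      check("updateAudio", nui.updateAudio(chunk, false));
    }
    check("stopDialog", nui.stopDialog());
  } finally {
    // Release resources even if an earlier step failed.
    nui.release();
  }
  return nui.calls;
}
```

Placing release in a finally block reflects the last step of the procedure: resources are freed whether or not the session completed normally.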
Code examples
If you need more than one instance, create each one directly with new. Alternatively, call GetInstance to obtain a singleton.
NUI SDK initialization
// Define the NativeNuiCallbackHandle class to implement the INativeNuiCallback interface.
class NativeNuiCallbackHandle implements INativeNuiCallback {
  // Implement the five interface functions in INativeNuiCallback.
  // Omitted here.
}
let context = getContext(this) as common.UIAbilityContext;
this.filesDir = context.filesDir;
this.resourceDir = context.resourceDir;
// Obtain the resource path. Because the resource files are stored in the resfiles directory of the project, use the resfiles directory in the sandbox path.
let asset_path:string = this.resourceDir + "/resources_cloud"
// Because you cannot directly operate the device directory, set the debug path to the public directory filesDir in the sandbox path where the application is located.
let debug_path:string = this.filesDir
// Initialize the SDK. Note that you must fill in the relevant ID information in genInitParams before you can use it.
// cbhandle and g_asrinstance are declared as members of the enclosing class, which is why they are accessed through this below.
cbhandle:NativeNuiCallbackHandle = new NativeNuiCallbackHandle()
g_asrinstance:NativeNui = new NativeNui(Constants.ModeType.MODE_DIALOG, "asr")
let ret:number = this.g_asrinstance.initialize(this.cbhandle, this.genInitParams(asset_path, debug_path), Constants.LogLevel.LOG_LEVEL_VERBOSE, false);
console.info("result = " + ret);
if (ret == Constants.NuiResultCode.SUCCESS) {
  console.info(`call g_asrinstance.initialize() return success`);
} else {
  // Report the error code.
  console.error(`call g_asrinstance.initialize() return error:${ret}`);
}
The genInitParams function generates a JSON string that contains the resource directory and user information. The user information includes the following fields.
genInitParams(workpath:string, debugpath:string):string {
  let str:string = "";
  // Method to obtain a token:
  // Use the Map type to store data in JSON format. You can also use your own JSON implementation.
  let object:Map<string, string|number|boolean|object> = new Map();
  // Account and project creation
  // To learn how to obtain an ak_id, ak_secret, and app_key, see https://help.aliyun.com/document_detail/72138.html
  object.set("app_key", "Your own app_key"); // Required
  // Method 1:
  // First, to learn how to obtain an ak_id, ak_secret, and app_key, see https://help.aliyun.com/document_detail/72138.html
  // Then, see https://help.aliyun.com/document_detail/466615.html and use Method 1 to obtain a temporary credential.
  // Description: The remote server generates a temporary credential with a validity period and sends it to the mobile client. This ensures that the ak_id and ak_secret are not disclosed.
  // Method to obtain a token (runs on the application server): https://help.aliyun.com/document_detail/450255.html?spm=a2c4g.72153.0.0.79176297EyBj4k
  object.set("token", "Your own token"); // Required
  // Method 2:
  // Obtaining temporary credentials using STS is not supported.
  // Method 3: (Strongly not recommended due to the risk of Alibaba Cloud account credential leakage)
  // Refer to the implementation of the Auth class to access the Alibaba Cloud Token service on the client to obtain a token. Do not store the AK/SK in the local or client-side environment.
  // Advantage: The client obtains the token without the need to build an application server.
  // Disadvantage: The client obtains the AK/SK information, which can be easily leaked.
  // JSONObject object = Auth.getAliYunTicket();
  object.set("device_id", "The unique ID of your device"); // Required. We recommend that you enter a unique ID to help locate problems.
  object.set("url", "wss://nls-gateway.cn-shanghai.aliyuncs.com/ws/v1"); // Default
  object.set("workspace", workpath); // Required, and read and write permissions are needed.
  // This parameter takes effect when the save_log parameter is set to true during SDK initialization. It specifies whether to save audio for debugging. The data is saved in the debug directory. Make sure that debug_path is valid and writable.
  // object.set("save_wav", "true");
  // The debug directory. When the save_log parameter is set to true during SDK initialization, this directory is used to save intermediate audio files.
  object.set("debug_path", debugpath);
  // FullMix = 0 // Select this mode to enable local features and register for authentication.
  // FullCloud = 1
  // FullLocal = 2 // Select this mode to enable local features and register for authentication.
  // AsrMix = 3 // Select this mode to enable local features and register for authentication.
  // AsrCloud = 4
  // AsrLocal = 5 // Select this mode to enable local features and register for authentication.
  // Real-time speech recognition
  console.log("init asr for real-time speech recognition")
  object.set("service_mode", Constants.ModeFullCloud); // Required. This is the first of three configuration differences between real-time speech recognition and short sentence recognition.
  str = MapToJson(object) // Convert the map to a JSON string.
  console.info("configinfo genInitParams:" + str);
  return str;
}
function MapToJson(map:Map<string, string|number|boolean|object>):string {
  let obj:object = Object({});
  map.forEach( (value, key) => {
    obj[key] = value;
  });
  return JSON.stringify(obj)
}
Parameter settings
Set the parameters in a JSON string.
// Set recognition parameters. For more information, see the API reference.
// Call this after initialize() and before startDialog().
nui_instance.setParams(genParams());

genParams():string {
  let params:string = "";
  let nls_config:Map<string, string|number|boolean|object> = new Map();
  nls_config.set("enable_intermediate_result", true);
  // Configure the parameters as needed.
  // For API reference, see https://help.aliyun.com/document_detail/173528.html
  // See 2. Start recognition.
  // This is the third of three configuration differences between real-time speech recognition and short sentence recognition. You do not need to set VAD-related parameters.
  nls_config.set("enable_punctuation_prediction", true);
  nls_config.set("enable_inverse_text_normalization", true);
  // nls_config.set("customization_id", "test_id");
  // nls_config.set("vocabulary_id", "test_id");
  // nls_config.set("enable_words", false);
  // nls_config.set("sample_rate", 16000);
  // nls_config.set("sr_format", "opus");
  let parameters:Map<string, string|number|boolean|object> = new Map();
  parameters.set("nls_config", Object( JSON.parse(MapToJson(nls_config)) ) );
  // Real-time speech recognition
  console.log("start asr for real-time speech recognition")
  parameters.set("service_type", Constants.kServiceTypeSpeechTranscriber); // Required. This is the second of three configuration differences between real-time speech recognition and short sentence recognition.
  params = MapToJson(parameters);
  console.log("configinfo genParams" + params)
  return params;
}
Start recognition
Use the startDialog interface to start listening.
// By default, Constants.VadMode.TYPE_P2T is used.
// Constants.VadMode.TYPE_VAD is supported only in SDKs with offline features. To start VAD, set the enable_voice_detection parameter.
nui_instance.startDialog(Constants.VadMode.TYPE_P2T, genDialogParams());

genDialogParams():string {
  let params:string = "";
  let dialog_param:Map<string, string|number|boolean|object> = new Map();
  // During runtime, you can update temporary parameters when calling startDialog, especially for expired tokens.
  // Note: If you do not set parameters for the next round of dialogue, the parameters passed during initialization are used.
  // dialog_param.set("app_key", "");
  // dialog_param.set("token", "");
  params = MapToJson(dialog_param);
  console.info("configinfo dialog params: " + params);
  return params;
}
Push audio data
updateAudio: In the callback function registered in on('readData') of AudioCapturer, you can directly call the updateAudio interface to send the audio data to the SDK.
//g_asrinstance.updateAudio(buffer,false)
/*
 * The 'readData' interface registered in AudioCapturer is AudioCapturer.readDataCallback.
 * AudioCapturer.audioCapturer.on('readData', AudioCapturer.readDataCallback);
 */
class AudioCapturer {
  static readDataCallback = (buffer: ArrayBuffer) => {
    console.log(`${TAG} read data bytelength is ${buffer.byteLength}. uid[${process.uid}] pid[${process.pid}] tid[${process.tid}]`);
    AudioCapturer.g_asrinstance.updateAudio(buffer, false)
  }
}
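Because the SDK tells the application when recording should be open or closed, it can be useful to guard the forwarding path so that buffers captured outside a recognition session are not passed on. The sketch below illustrates that idea only. AudioGate and AudioSink are hypothetical names, and the sink is merely shaped like the updateAudio(buffer, false) call shown above; its return type and the meaning of the second argument are assumptions, so check the SDK declaration before reusing this.

```typescript
// Hypothetical sink shaped like the updateAudio(buffer, false) call above.
type AudioSink = (buffer: ArrayBuffer, finish: boolean) => void;

// Forwards captured buffers to the sink, in arrival order, only while open.
class AudioGate {
  private open = false;
  private forwardedBytes = 0;
  private readonly sink: AudioSink;

  constructor(sink: AudioSink) {
    this.sink = sink;
  }

  // Call with true when recording should start and false when it should stop.
  setOpen(open: boolean): void {
    this.open = open;
  }

  // Returns true if the buffer was forwarded, false if it was dropped.
  push(buffer: ArrayBuffer): boolean {
    if (!this.open || buffer.byteLength === 0) {
      return false;
    }
    this.sink(buffer, false);
    this.forwardedBytes += buffer.byteLength;
    return true;
  }

  get totalBytes(): number {
    return this.forwardedBytes;
  }
}
```

In an application, the sink would wrap the SDK instance, setOpen would be driven from onNuiAudioStateChanged, and push would be called from the readData callback.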
Callback handling
onNuiAudioStateChanged: The recording state callback. The SDK maintains the recording state internally. When this callback is triggered, you can enable or disable the audio recorder based on the state in this callback.
/*
 * For IDE versions earlier than 5.0.3.403 in the HarmonyOS development environment, if you register a callback [on("readData",)] to read audio data using the AudioCapturer module, the callback may not be triggered if you call start() immediately after stop(). In this case, you must follow the (stop, release) and then (createAudioCapturer, start) procedure for it to work correctly.
 * This issue is resolved in IDE version 5.0.3.403. Therefore, the calls to the create/release related interfaces are commented out in the following sample code.
 */
onNuiAudioStateChanged(state:Constants.AudioState):void {
  console.info(`womx onUsrNuiAudioStateChanged(${state})`)
  if (state === Constants.AudioState.STATE_OPEN) {
    console.info(`womx onUsrNuiAudioStateChanged(${state}) audio recorder start`)
    //AudioCapturer.init(g_asrinstance)
    AudioCapturer.start()
    console.info(`womx onUsrNuiAudioStateChanged(${state}) audio recorder start done`)
  } else if (state === Constants.AudioState.STATE_CLOSE) {
    console.info(`womx onUsrNuiAudioStateChanged(${state}) audio recorder close`)
    AudioCapturer.stop()
    //AudioCapturer.release()
    console.info(`womx onUsrNuiAudioStateChanged(${state}) audio recorder close done`)
  } else if (state === Constants.AudioState.STATE_PAUSE) {
    console.info(`womx onUsrNuiAudioStateChanged(${state}) audio recorder pause`)
    AudioCapturer.stop()
    //AudioCapturer.release()
    console.info(`womx onUsrNuiAudioStateChanged(${state}) audio recorder pause done`)
  }
}
onNuiNeedAudioData: The audio data callback. On HarmonyOS Next, audio is pushed with updateAudio() instead, so the sample only logs a warning and returns 0.
onNuiNeedAudioData(buffer:ArrayBuffer):number {
  console.info(`warning,this callback should not be called in HarmonyOS Next`)
  return 0;
}
onNuiEventCallback: The NUI SDK event callback. Do not call SDK interfaces in the event callback to avoid deadlocks.
onNuiEventCallback(event:Constants.NuiEvent, resultCode:number, arg2:number, kwsResult:KwsResult, asrResult:AsrResult):void {
  console.log("onUsrNuiEventCallback event is " + event);
  // asrResult contains a task_id, which helps with troubleshooting. Record and save it.
  // The new version adds asrResult.allResponse. If it is not null and not empty, it provides the complete information in a JSON string.
  if (event === Constants.NuiEvent.EVENT_TRANSCRIBER_COMPLETE) {
    // For example, display the recognition result.
    showText(asrView, asrResult.asrResult);
  } else if (event === Constants.NuiEvent.EVENT_ASR_PARTIAL_RESULT || event === Constants.NuiEvent.EVENT_SENTENCE_END) {
    if (event === Constants.NuiEvent.EVENT_ASR_PARTIAL_RESULT) {
      // For example, display the intermediate recognition result of the current sentence.
      this.message = "EVENT_ASR_PARTIAL_RESULT"
    } else if (event === Constants.NuiEvent.EVENT_SENTENCE_END) {
      // For example, display the complete recognition result of the current sentence.
      this.message = "EVENT_SENTENCE_END"
    }
    showText(asrView, asrResult.asrResult);
  } else if (event === Constants.NuiEvent.EVENT_ASR_ERROR) {
    // In EVENT_ASR_ERROR, asrResult contains error information. Using it with the error code resultCode and the task_id makes troubleshooting easier. Record and save them.
  } else if (event === Constants.NuiEvent.EVENT_MIC_ERROR) {
    // EVENT_MIC_ERROR indicates that no audio data has been received for 2 seconds. Check the recording-related code, permissions, or whether the recording module is occupied by another application.
  } else if (event === Constants.NuiEvent.EVENT_DIALOG_EX) { /* unused */
    // You can ignore this event.
  }
  // Parse the ASR recognition result.
  if (asrResult) {
    let asrinfo:string = ""
    asrinfo = asrResult.asrResult
    if (asrinfo) {
      try {
        let asrresult_json:object|null = JSON.parse(asrResult.asrResult)
        if (asrresult_json) {
          let payload:object|null = asrresult_json["payload"];
          if (payload) {
            //console.log(JSON.stringify(payload))
            let asrmessage:string = payload["result"];
            // Parse the recognition result returned from the cloud.
          }
        }
      } catch (e) {
        console.error("got asrinfo not json, so do not refresh asrinfo." + JSON.stringify(e))
      }
    }
  }
}
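The parsing step at the end of the event callback can be isolated into a small helper that is easy to test on its own. The following is a minimal sketch: extractResultText is a hypothetical name, and it relies only on the payload.result path that the sample above reads. Any other fields of the server message are not assumed here.

```typescript
// Hypothetical helper: returns payload.result from the JSON string carried in
// asrResult.asrResult, or null if the string is not JSON or the field is missing.
function extractResultText(asrResultJson: string): string | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(asrResultJson);
  } catch {
    return null; // not valid JSON, keep the previously displayed text
  }
  if (typeof parsed !== "object" || parsed === null) {
    return null;
  }
  const payload = (parsed as Record<string, unknown>)["payload"];
  if (typeof payload !== "object" || payload === null) {
    return null;
  }
  const result = (payload as Record<string, unknown>)["result"];
  return typeof result === "string" ? result : null;
}
```

Returning null for every malformed case lets the caller decide whether to keep the last displayed text, which mirrors the behavior of the catch branch in the sample.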
End recognition
nui_instance.stopDialog();
Release the SDK
nui_instance.release();