This document explains how to use the Android NUI SDK for Alibaba Cloud Intelligent Speech Interaction. It covers SDK installation, key API operations, and code examples.
Prerequisites
Before using the SDK, read the API reference.
You have created a project and obtained an Appkey. For more information, see Create a project.
You have obtained an access token. For more information, see Overview of token acquisition.
Download and install
Select and download the mobile SDK.
Important: After downloading the SDK, replace the placeholders in the sample initialization code with your Alibaba Cloud account credentials, Appkey, and access token to run the demo.
Unzip the package. In the app/libs directory, find the SDK package in AAR format and add it as a dependency to your project. If you need to use the Android C++ API, the dynamic library and header files are in the android_libs and android_include directories of the ZIP package.
Open the project in Android Studio to view the sample code. The sample code for real-time speech recognition is in the SpeechTranscriberActivity.java file. Replace the Appkey and access token, then run the project.
Key APIs
initialize: Initializes the SDK.
/**
 * Initializes the SDK. The SDK is a singleton. Before re-initializing, you must release the previous instance.
 * Do not call this method on the UI thread, as it may cause blocking.
 * @param callback: The event listener callback. For more information, see the callback descriptions below.
 * @param parameters: The initialization parameters in a JSON string format. For more information, see the parameter description below or the API reference: https://help.aliyun.com/document_detail/173528.html.
 * @param level: The log level. A lower value generates more detailed logs.
 * @param save_log: Specifies whether to save logs to a file. The logs are stored in the directory specified by the debug_path field in the ticket. Note: Log files have no size limit. Monitor disk space to prevent it from filling up.
 * @return An integer that indicates the operation status. For a list of possible error codes, see https://help.aliyun.com/document_detail/459864.html.
 */
public synchronized int initialize(final INativeNuiCallback callback, String parameters, final Constants.LogLevel level, final boolean save_log);
The INativeNuiCallback type includes the following callbacks.
onNuiAudioStateChanged: Enables or disables the recording feature based on the audio state.
/**
 * When operations like start, stop, or cancel are called, the SDK uses this callback to notify the app to enable or disable recording.
 * @param state: The required state for the recorder (open/close).
 */
void onNuiAudioStateChanged(AudioState state);
onNuiNeedAudioData: Provides audio data in the callback.
/**
 * After recognition starts, the SDK invokes this callback continuously. Your app must provide audio data in this callback.
 * @param buffer: The buffer to store the audio data.
 * @param len: The number of bytes of audio data to provide.
 * @return: The actual number of bytes provided.
 */
int onNuiNeedAudioData(byte[] buffer, int len);
onNuiEventCallback: The SDK event callback.
/**
 * The main event callback for the SDK.
 * @param event: The callback event. For more information, see the event list below.
 * @param resultCode: The error code. This is valid when an EVENT_ASR_ERROR event occurs.
 * @param arg2: A reserved parameter.
 * @param kwsResult: The result of keyword spotting (not currently supported).
 * @param asrResult: The speech recognition result.
 */
void onNuiEventCallback(NuiEvent event, final int resultCode, final int arg2, KwsResult kwsResult, AsrResult asrResult);
onNuiAudioRMSChanged: The audio energy level callback.
/**
 * The audio energy level callback.
 * @param val: The energy value of the audio data. The range is from -160 to 0. This is typically used for UI animations.
 */
public void onNuiAudioRMSChanged(float val);
Event list:
Event: Description
EVENT_VAD_START: Indicates that the start of speech is detected.
EVENT_VAD_END: Indicates that the end of speech is detected.
EVENT_ASR_PARTIAL_RESULT: The intermediate result of speech recognition.
EVENT_ASR_ERROR: An error occurred. Determine the cause based on the error code.
EVENT_MIC_ERROR: Indicates a recording error. This event is triggered if the SDK receives no audio data for two consecutive seconds. Verify that the recording system is working correctly.
EVENT_SENTENCE_START: An event for real-time speech recognition that indicates the start of a sentence is detected.
EVENT_SENTENCE_END: An event for real-time speech recognition that indicates the end of a sentence is detected. A complete result for the sentence is returned.
EVENT_SENTENCE_SEMANTICS: Not in use.
EVENT_TRANSCRIBER_COMPLETE: The final event, which is triggered after speech recognition stops.
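As a rough illustration of how the sentence-level events in the list above can drive a transcript view, the following sketch accumulates text across EVENT_SENTENCE_START, EVENT_ASR_PARTIAL_RESULT, EVENT_SENTENCE_END, and EVENT_TRANSCRIBER_COMPLETE. SketchEvent and TranscriptBuffer are hypothetical names, not SDK types, and the sketch assumes that each intermediate result replaces the previous one for the current sentence.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the SDK event type; only the constant names come from the event list. */
enum SketchEvent {
    EVENT_SENTENCE_START,
    EVENT_ASR_PARTIAL_RESULT,
    EVENT_SENTENCE_END,
    EVENT_TRANSCRIBER_COMPLETE
}

/** Hypothetical helper that keeps finished sentences plus the in-progress partial text. */
final class TranscriptBuffer {
    private final List<String> sentences = new ArrayList<>();
    private String partial = "";
    private boolean complete;

    void onEvent(SketchEvent event, String text) {
        switch (event) {
            case EVENT_SENTENCE_START:
                partial = "";                          // a new sentence begins
                break;
            case EVENT_ASR_PARTIAL_RESULT:
                partial = (text == null) ? "" : text;  // assumption: replaces the earlier partial
                break;
            case EVENT_SENTENCE_END:
                sentences.add(text == null ? partial : text);
                partial = "";
                break;
            case EVENT_TRANSCRIBER_COMPLETE:
                complete = true;                       // recognition has stopped
                break;
        }
    }

    /** Text to show: finished sentences followed by the current partial, if any. */
    String display() {
        String done = String.join(" ", sentences);
        if (partial.isEmpty()) {
            return done;
        }
        return done.isEmpty() ? partial : done + " " + partial;
    }

    boolean isComplete() {
        return complete;
    }
}
```

In an app, onNuiEventCallback would translate the SDK event and asrResult text into calls on a buffer like this one, and the UI would render display().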
setParams: Sets SDK parameters in JSON format.
/**
 * Sets parameters in JSON format.
 * @param params: For more information, see API reference: https://help.aliyun.com/document_detail/173528.html.
 * @return: For more information, see Error codes: https://help.aliyun.com/document_detail/459864.html.
 */
public synchronized int setParams(String params);
startDialog: Starts recognition.
/**
 * Starts recognition.
 * @param vad_mode: The VAD mode. For recognition, use P2T.
 * @param dialog_params: The dialog parameters in a JSON string format. For more information, see API reference: https://help.aliyun.com/document_detail/173528.html.
 * @return: For more information, see Error codes: https://help.aliyun.com/document_detail/459864.html.
 */
public synchronized int startDialog(VadMode vad_mode, String dialog_params);
stopDialog: Stops recognition.
/**
 * Stops recognition. When this method is called, the server returns the final recognition result and ends the task.
 * @return: For more information, see Error codes: https://help.aliyun.com/document_detail/459864.html.
 */
public synchronized int stopDialog();
cancelDialog: Immediately stops recognition.
/**
 * Immediately stops recognition. When this method is called, the task ends immediately without waiting for the server to return a final recognition result.
 * @return: For more information, see Error codes: https://help.aliyun.com/document_detail/459864.html.
 */
public synchronized int cancelDialog();
release: Releases the SDK.
/**
 * Releases SDK resources.
 * @return: For more information, see Error codes: https://help.aliyun.com/document_detail/459864.html.
 */
public synchronized int release();
GetVersion: Gets the current SDK version information.
/**
 * Gets the current SDK version information.
 * @return: A string containing the SDK version information.
 */
public synchronized String GetVersion();
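The onNuiAudioRMSChanged callback described above reports an energy value between -160 and 0 that is typically used for UI animations. One possible way to turn that value into a level between 0 and 1 is a clamped linear mapping, sketched below. RmsLevel and the floorDb parameter are hypothetical, and the choice of floor is an assumption to tune for your own UI.

```java
/** Hypothetical helper that maps an RMS energy value to a UI level between 0 and 1. */
final class RmsLevel {
    private RmsLevel() {}

    /**
     * Clamps val into [floorDb, 0] and scales it linearly to [0, 1].
     * floorDb must be negative, for example -160 to use the full documented range.
     */
    static float normalize(float val, float floorDb) {
        if (!(floorDb < 0f)) {
            throw new IllegalArgumentException("floorDb must be negative");
        }
        if (Float.isNaN(val)) {
            return 0f;                                   // treat an invalid reading as silence
        }
        float clamped = Math.max(floorDb, Math.min(0f, val));
        return (clamped - floorDb) / (0f - floorDb);
    }
}
```

For example, RmsLevel.normalize(val, -160f) inside onNuiAudioRMSChanged would yield a value that can scale a waveform or microphone icon. A higher floor makes the animation more sensitive to ordinary speech levels.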
Procedure
Initialize the SDK and the audio recorder instance.
Set parameters based on your business requirements.
Call startDialog to start recognition.
Open the audio recorder based on the onNuiAudioStateChanged audio state callback.
Provide audio data in the onNuiNeedAudioData callback.
The event callbacks indicate sentence progress: EVENT_SENTENCE_START signals the start, EVENT_ASR_PARTIAL_RESULT provides intermediate results, and EVENT_SENTENCE_END provides the final result for the sentence.
Call stopDialog to stop recognition. Use the EVENT_TRANSCRIBER_COMPLETE event callback to confirm that recognition has stopped.
When you are finished, call release() to free SDK resources.
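The first step of the procedure calls initialize, which the API description says may block and should not run on the UI thread. The following minimal sketch shows one way to run such a blocking call on a worker thread and receive its status code asynchronously. BackgroundInit and the IntSupplier stand-in are hypothetical; in an app, the supplier would wrap the real initialize call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.IntSupplier;

/** Hypothetical helper that runs a blocking initialization call off the calling thread. */
final class BackgroundInit {
    // A single daemon worker thread so that a forgotten shutdown() cannot keep the process alive.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "sdk-init");
        t.setDaemon(true);
        return t;
    });

    /** Runs blockingInit on the worker thread; the future completes with its status code. */
    CompletableFuture<Integer> initAsync(IntSupplier blockingInit) {
        return CompletableFuture.supplyAsync(() -> blockingInit.getAsInt(), worker);
    }

    /** Stops the worker once no further initialization is needed. */
    void shutdown() {
        worker.shutdown();
    }
}
```

A caller could attach thenAccept to the returned future to react to the status code. Continuations that were not scheduled with an explicit executor may run on the worker thread, so UI updates still need to be posted back to the UI thread.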
ProGuard configuration
If your code uses obfuscation, add the following configuration to your proguard-rules.pro file:
-keep class com.alibaba.idst.nui.*{*;}
Code examples
If you need multiple instances, you can create them directly by using the new operator. You can also get a singleton instance by using GetInstance.
NUI SDK initialization
// Get the resource path, which is the workspace.
// A workspace is created internally by using context.getApplicationContext().getFilesDir().toString() + "/asr_my",
// for example, /data/user/0/mit.alibaba.nuidemo/files/asr_my.
String workspace = CommonUtils.getModelPath(this);
// Create the debug path.
String debug_path = getExternalCacheDir().getAbsolutePath() + "/debug_" + System.currentTimeMillis();
Utils.createDir(debug_path);
// Copy asset resources from nuisdk.aar to the workspace.
CommonUtils.copyAssetsData(this);
// Initialize the SDK. Note: You must fill in your ID information in genInitParams to use the SDK.
NativeNui nui_instance = new NativeNui();
int ret = nui_instance.initialize(this, genInitParams(workspace, debug_path), Constants.LogLevel.LOG_LEVEL_VERBOSE, true);
The genInitParams method generates a JSON string that contains the resource directory and user information. The user information includes the following fields.
private String genInitParams(String workpath, String debugpath) {
    String str = "";
    try {
        // Obtain account access credentials:
        // The getTicket method in the sample project provides several possible ways to get credentials. Choose a secure method that suits your business.
        //
        // Note:
        // Before you use the speech interaction service, you must prepare an account and activate the relevant services. For more information, see:
        // https://help.aliyun.com/zh/isi/getting-started/start-here
        //
        // Primary account:
        // The account (RAM user) information includes an AccessKey ID (hereafter ak_id) and an AccessKey Secret (hereafter ak_secret).
        // Never store this account information in app code or on the mobile client. Otherwise, credential leakage may cause unexpected charges.
        //
        // STS temporary credential:
        // Because sending primary account credentials to a client poses a security risk, Alibaba Cloud provides Security Token Service (STS) for temporary access management.
        // STS uses your primary AccessKey ID and AccessKey secret to generate a temporary sts_ak_id, sts_ak_secret, and sts_token.
        // (To distinguish between primary account credentials and STS temporary credentials, the prefix sts_ is used for credentials generated by STS.)
        // What is STS: https://help.aliyun.com/zh/ram/product-overview/what-is-sts
        // STS SDK overview: https://help.aliyun.com/zh/ram/developer-reference/sts-sdk-overview
        // STS Python SDK call example: https://help.aliyun.com/zh/ram/developer-reference/use-the-sts-openapi-example
        //
        // Account requirements:
        // To use offline features (offline text-to-speech, keyword spotting), you must provide an Appkey, ak_id, and ak_secret, or an Appkey, sts_ak_id, sts_ak_secret, and sts_token.
        // To use online features (text-to-speech, real-time speech recognition, Short Sentence Recognition, Audio File Transcription, etc.), you only need to provide an Appkey and an access token.
        JSONObject object = Auth.getTicket(Auth.GetTicketMethod.GET_TOKEN_FROM_SERVER_FOR_ONLINE_FEATURES);
        if (!object.containsKey("token")) {
            Log.e(TAG, "Cannot get token!!!");
        }
        object.put("device_id", Utils.getDeviceId()); // Required. We recommend that you provide a unique ID to help troubleshoot issues.
        object.put("url", "wss://nls-gateway.cn-shanghai.aliyuncs.com:443/ws/v1"); // Default URL.
        object.put("workspace", workpath); // Required. The path must have read and write permissions.
        // This parameter takes effect only when the save_log parameter is set to true during SDK initialization. It specifies whether to save audio for debugging. This data is saved in the debug directory. Make sure the debug_path is valid and writable.
        //object.put("save_wav", "true");
        // The debug directory. When the save_log parameter is set to true during SDK initialization, this directory is used to save intermediate audio files.
        object.put("debug_path", debugpath);
        object.put("service_mode", Constants.ModeFullCloud); // Required.
        str = object.toString();
    } catch (JSONException e) {
        e.printStackTrace();
    }
    Log.i(TAG, "InsideUserContext:" + str);
    return str;
}
Parameter settings
Set parameters in a JSON string format.
// Set recognition-related parameters. For more information, see the API documentation.
// Call this method after initialize() and before startDialog().
nui_instance.setParams(genParams());
private String genParams() {
    String params = "";
    try {
        JSONObject nls_config = new JSONObject();
        nls_config.put("enable_intermediate_result", true);
        // You can configure parameters based on your business requirements.
        // For API details, see https://help.aliyun.com/document_detail/173528.html
        // See section 2. Start recognition.
        // nls_config.put("enable_punctuation_prediction", true);
        // nls_config.put("enable_inverse_text_normalization", true);
        // nls_config.put("max_sentence_silence", 800);
        // nls_config.put("enable_words", false);
        // nls_config.put("sample_rate", 16000);
        // nls_config.put("sr_format", "opus");
        /* If a parameter is not in the documentation but is supported by the feature, you can use the following universal interface to set it. */
        // JSONObject extend_config = new JSONObject();
        // extend_config.put("custom_test", true);
        // nls_config.put("extend_config", extend_config);
        JSONObject parameters = new JSONObject();
        parameters.put("nls_config", nls_config);
        parameters.put("service_type", Constants.kServiceTypeSpeechTranscriber); // Required.
        // You can set HttpDns if it is available.
        //parameters.put("direct_ip", Utils.getDirectIp());
        params = parameters.toString();
    } catch (JSONException e) {
        e.printStackTrace();
    }
    return params;
}
Start recognition
Start listening by calling the startDialog operation.
nui_instance.startDialog(Constants.VadMode.TYPE_P2T, genDialogParams());
private String genDialogParams() {
    String params = "";
    try {
        JSONObject dialog_param = new JSONObject();
        // You can update parameters, especially an expired token, when calling startDialog.
        //dialog_param.put("token", "");
        params = dialog_param.toString();
    } catch (JSONException e) {
        e.printStackTrace();
    }
    return params;
}
Callback handling
onNuiAudioStateChanged: The audio state callback. The SDK maintains the recording state internally. Use this callback to enable or disable the audio recorder.
public void onNuiAudioStateChanged(Constants.AudioState state) {
    Log.i(TAG, "onNuiAudioStateChanged");
    if (state == Constants.AudioState.STATE_OPEN) {
        Log.i(TAG, "audio recorder start");
        mAudioRecorder.startRecording();
    } else if (state == Constants.AudioState.STATE_CLOSE) {
        Log.i(TAG, "audio recorder close");
        mAudioRecorder.release();
    } else if (state == Constants.AudioState.STATE_PAUSE) {
        Log.i(TAG, "audio recorder pause");
        mAudioRecorder.stop();
    }
}
onNuiNeedAudioData: The audio data callback. Provide audio data in this callback.
public int onNuiNeedAudioData(byte[] buffer, int len) {
    int ret = 0;
    if (mAudioRecorder.getState() != AudioRecord.STATE_INITIALIZED) {
        Log.e(TAG, "audio recorder not init");
        return -1;
    }
    ret = mAudioRecorder.read(buffer, 0, len);
    // The return value tells the SDK how much data was read.
    // A value less than 0 indicates an error.
    // A value of 0 indicates no audio data. If 0 is returned for 2 consecutive seconds, an EVENT_MIC_ERROR is triggered.
    return ret;
}
onNuiEventCallback: The NUI SDK event callback. Do not call other SDK operations from within this callback to avoid a potential deadlock.
public void onNuiEventCallback(Constants.NuiEvent event, final int resultCode, final int arg2, KwsResult kwsResult, AsrResult asrResult) {
    Log.i(TAG, "event=" + event + " resultCode=" + resultCode);
    // asrResult includes task_id. task_id helps troubleshoot issues. Record and save it.
    //
    // In newer versions, asrResult.allResponse is added. If it is not null or empty, it provides the complete information in a JSON-formatted string.
    if (event == Constants.NuiEvent.EVENT_TRANSCRIBER_COMPLETE) {
        // Real-time recognition is complete.
    } else if (event == Constants.NuiEvent.EVENT_ASR_PARTIAL_RESULT) {
        // Example: Display the intermediate recognition result for the current sentence.
        showText(asrView, asrResult.asrResult);
    } else if (event == Constants.NuiEvent.EVENT_SENTENCE_END) {
        // Example: Display the complete recognition result for the current sentence.
        showText(asrView, asrResult.asrResult);
    } else if (event == Constants.NuiEvent.EVENT_ASR_ERROR) {
        // In an EVENT_ASR_ERROR, asrResult contains error information. Use it with the error code from resultCode and the task_id to facilitate troubleshooting. We recommend that you record and save this information.
    } else if (event == Constants.NuiEvent.EVENT_MIC_ERROR) {
        // EVENT_MIC_ERROR indicates that no audio data has been received for 2 seconds. Check your recording-related code and permissions, or see if the recording module is being used by another application.
    } else if (event == Constants.NuiEvent.EVENT_DIALOG_EX) { /* unused */
        // You can ignore this event.
    }
}
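Because other SDK operations should not be called from inside onNuiEventCallback, any follow-up work such as stopping or restarting recognition needs to run elsewhere. The sketch below shows one possible hand-off using a single-thread executor. CallbackOffloader is a hypothetical name, and the Runnable stands in for whatever SDK call the app wants to make after the callback returns.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that moves follow-up work out of an SDK callback onto its own thread. */
final class CallbackOffloader {
    // One thread keeps follow-up actions in the order in which they were posted.
    private final ExecutorService handler = Executors.newSingleThreadExecutor();

    /** Safe to call inside a callback: it only enqueues the work and returns immediately. */
    void post(Runnable followUp) {
        handler.execute(followUp);
    }

    /** Stops accepting work and waits up to timeoutMs for queued work to finish. */
    boolean shutdownAndWait(long timeoutMs) {
        handler.shutdown();
        try {
            return handler.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt for the caller
            return false;
        }
    }
}
```

Inside the callback, a call such as offloader.post(() -> nui_instance.stopDialog()) would return immediately and let the SDK call run after the callback has finished. On Android, a Handler bound to a background Looper could play the same role.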
Stop recognition
nui_instance.stopDialog();
Release the SDK
nui_instance.release();
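The call order used in this guide is initialize, setParams, startDialog, stopDialog or cancelDialog, and finally release, with a release required before any re-initialization. As a minimal sketch, an app could track that order with a small state guard like the one below. NuiLifecycle and its method names are hypothetical, the guard does not call the SDK itself, and allowing release while recognition is still running is an assumption rather than documented behavior.

```java
/** Hypothetical guard that records which SDK call is valid next. */
final class NuiLifecycle {
    enum State { RELEASED, INITIALIZED, RECOGNIZING }

    private State state = State.RELEASED;

    synchronized State state() {
        return state;
    }

    /** Record a successful initialize(); a previous instance must have been released first. */
    synchronized void onInitialize() {
        require(State.RELEASED, "initialize");
        state = State.INITIALIZED;
    }

    /** setParams() belongs after initialize() and before startDialog(). */
    synchronized void onSetParams() {
        require(State.INITIALIZED, "setParams");
    }

    synchronized void onStartDialog() {
        require(State.INITIALIZED, "startDialog");
        state = State.RECOGNIZING;
    }

    /** Record that stopDialog() or cancelDialog() has ended the current recognition. */
    synchronized void onDialogEnded() {
        require(State.RECOGNIZING, "stopDialog/cancelDialog");
        state = State.INITIALIZED;
    }

    /** Assumption: release is accepted from any state except RELEASED. */
    synchronized void onRelease() {
        if (state == State.RELEASED) {
            throw new IllegalStateException("release is not valid in state " + state);
        }
        state = State.RELEASED;
    }

    private void require(State expected, String call) {
        if (state != expected) {
            throw new IllegalStateException(call + " is not valid in state " + state);
        }
    }
}
```

Calling the matching guard method before each SDK call turns an out-of-order call into an immediate IllegalStateException instead of an SDK error code that has to be looked up later.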