This guide explains how to use the Fun-ASR real-time speech recognition Android SDK to convert speech into text.
User guide: For model descriptions and selection recommendations, see Real-Time Speech Recognition - Fun-ASR/Gummy/Paraformer.
Getting Started
- Download the SDK and run the sample code:
  - Unzip the ZIP file. Obtain the AAR-formatted SDK from the app/libs folder and add it to your project dependencies. If you need Android C++ integration, use the android_libs and android_include folders in the ZIP package to obtain the dynamic libraries and header files.
  - Open the project in Android Studio. The sample code is located in DashFunAsrSpeechTranscriberActivity.java. Replace the API key to test the feature.
Call Steps
- Initialize the SDK.
- Set parameters based on your business needs. Use the parameters argument of the initialize method to set connection and control parameters. Use the setParams method to set speech recognition effect parameters.
- Call startDialog to start the recognition process.
- In the onNuiAudioStateChanged callback, enable the audio recording device based on the audio state.
- In the onNuiNeedAudioData callback, continuously provide recorded audio data.
- In the onNuiEventCallback callback, listen for events and retrieve speech recognition results.
- Call stopDialog to stop the recognition. You can confirm that the recognition has ended by listening for the EVENT_TRANSCRIBER_COMPLETE event.
- When you no longer need the recognition feature, call the release method to release SDK resources.
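The call steps above can be sketched as one session method. This is a minimal sketch under stated assumptions, not the SDK's API: TranscriberApi is a hypothetical stand-in whose method names mirror the steps (the real methods take more arguments, see Key Interfaces), the untilDone hook is a made-up way to decide when to stop, and treating 0 as the success code is an assumption to check against the Error Code Reference.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the SDK calls named in the steps; not the real NativeNui class. */
interface TranscriberApi {
    int initialize(String parametersJson);
    int setParams(String paramsJson);
    int startDialog();
    int stopDialog();
    int release();
}

final class TranscriptionSession {
    // Assumption: 0 means success. Check the Error Code Reference for the real values.
    static final int SUCCESS = 0;

    // Released by the app's onNuiEventCallback when EVENT_TRANSCRIBER_COMPLETE arrives.
    private final CountDownLatch complete = new CountDownLatch(1);

    void onTranscriberComplete() {
        complete.countDown();
    }

    /** Runs one session in the documented order; returns true if it completed in time. */
    boolean run(TranscriberApi api, String parametersJson, String paramsJson,
                Runnable untilDone, long timeoutMs) {
        if (api.initialize(parametersJson) != SUCCESS) {
            return false;
        }
        try {
            if (api.setParams(paramsJson) != SUCCESS) return false;
            if (api.startDialog() != SUCCESS) return false;
            // While the dialog runs, audio flows through onNuiAudioStateChanged and
            // onNuiNeedAudioData, and results arrive in onNuiEventCallback.
            untilDone.run(); // blocks until the app decides to stop
            if (api.stopDialog() != SUCCESS) return false;
            // Wait for the completion event before tearing down.
            return complete.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            api.release(); // released on every path once initialize has succeeded
        }
    }
}
```

Because initialize blocks, a method like run would itself be called from a non-UI thread.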
Request Parameters
Connection and Control Parameters
You can configure these parameters by passing a JSON string through the parameters argument of the initialize method.
- Parameter example: The following sample JSON string does not list every parameter. Add other parameters as needed during implementation.
  {
    "url": "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference",
    "apikey": "st-****",
    "device_id": "my_device_id",
    "service_mode": "1"
  }
- Parameter description
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | String | Yes | Service endpoint. Fixed as wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference. |
| apikey | String | Yes | API key. |
| service_mode | String | Yes | Operation mode. For real-time speech recognition, this must be "1". |
| device_id | String | Yes | A unique string identifying the end user. Set this to an in-app user ID or a client-generated device identifier. This ID is used primarily for log tracing and troubleshooting. |
| debug_path | String | No | Storage path for log files. This parameter takes effect only when save_log is set to true in the initialize call. You must specify a log file path in this case, or an error occurs. The system keeps at most two local log files. |
| save_wav | String | No | Whether to save audio files for debugging. Audio files are saved under the debug_path directory. Default value: "false". Valid values: "true" (save), "false" (do not save). This parameter takes effect only when save_log is set to true in the initialize call. You must also set debug_path. |
| max_log_file_size | int | No | Maximum size of each log file, in bytes. This parameter takes effect only when save_log is set to true in the initialize call. Default value: 104857600 (100 × 1024 × 1024 bytes, or 100 MiB). |
| log_track_level | int | No | Controls the filter level of log content sent to an external destination through the onNuiLogTrackCallback callback. Default value: 2. Valid values: 0 (LOG_LEVEL_VERBOSE), 1 (LOG_LEVEL_DEBUG), 2 (LOG_LEVEL_INFO), 3 (LOG_LEVEL_WARNING), 4 (LOG_LEVEL_ERROR), 5 (LOG_LEVEL_NONE, disables this feature). Note: log_track_level and level (set via the initialize interface) jointly determine which logs are sent through the callback. A log entry is sent only if its level number is greater than or equal to both log_track_level and level. For example, if log_track_level is set to 2 (INFO) and level is set to 3 (WARNING), only logs at WARNING level or higher (numeric value ≥ 3) are sent. |
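As an illustration of the parameters described above, the following sketch builds the JSON string for the parameters argument of initialize and expresses the log filtering rule from the log_track_level note as a predicate. The class and method names are hypothetical, the JSON escaping is deliberately minimal, and the sketch assumes that the initialize log level uses the same 0 to 5 numbering as log_track_level.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class InitParameters {
    /** Minimal JSON string escaping: quotes, backslashes, and control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Builds the JSON string passed as the parameters argument of initialize. */
    static String build(String apiKey, String deviceId, int logTrackLevel) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("url", quote("wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference"));
        fields.put("apikey", quote(apiKey));
        fields.put("device_id", quote(deviceId));
        fields.put("service_mode", quote("1")); // documented as a String, so quoted
        fields.put("log_track_level", Integer.toString(logTrackLevel)); // documented as int
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 1) sb.append(',');
            sb.append(quote(e.getKey())).append(':').append(e.getValue());
        }
        return sb.append('}').toString();
    }

    /** The note's rule: an entry is sent only if its level is >= both thresholds. */
    static boolean isForwarded(int entryLevel, int logTrackLevel, int initializeLevel) {
        return entryLevel >= Math.max(logTrackLevel, initializeLevel);
    }
}
```

On Android, org.json.JSONObject is likely a simpler way to build the string. The hand-rolled builder only keeps the sketch free of platform dependencies.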
Speech Recognition Effect Parameters
You can configure these parameters by passing a JSON string through the params argument of the setParams method.
- Parameter example: The following sample JSON string does not list every parameter. Add other parameters as needed during implementation.
  {
    "service_type": 4,
    "nls_config": {
      "model": "fun-asr-realtime",
      "sr_format": "pcm",
      "sample_rate": 16000,
      "parameters": {
        "speech_noise_threshold": 0.0
      }
    }
  }
- Parameter description
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| service_type | int | Yes | Voice service type. For real-time speech recognition, this must be 4. |
| nls_config | object | Yes | Core speech recognition configuration object containing key parameters such as model selection and recognition behavior control. |
| nls_config.model | string | Yes | Speech recognition model. |
| nls_config.sr_format | string | Yes | Audio format for recognition. Supported formats: pcm, wav, opus. Important: for all three formats, the audio you provide must be PCM-encoded. For opus, the SDK encodes the PCM audio into Opus internally. |
| nls_config.sample_rate | int | Yes | Audio sampling rate in Hz. Only 16000 Hz is supported. |
| nls_config.semantic_punctuation_enabled | boolean | No | Sets the sentence segmentation mode. Default value: false. Valid values: true (enables semantic punctuation and disables VAD (Voice Activity Detection) segmentation), false (enables VAD segmentation and disables semantic punctuation). Semantic punctuation offers higher accuracy and suits meeting transcription scenarios. VAD segmentation has lower latency and suits real-time interactive scenarios. |
| nls_config.max_sentence_silence | int | No | VAD silence threshold for sentence segmentation, in milliseconds. Default value: 800. Valid range: [200, 6000]. When silence after speech exceeds this threshold, the system considers the sentence complete. This parameter takes effect only when semantic_punctuation_enabled is false. |
| nls_config.multi_threshold_mode_enabled | boolean | No | Protects against overly long VAD segments. When enabled, it prevents VAD segmentation from producing sentences that are too long. Default value: false (disabled). Valid values: true (enabled), false (disabled). This parameter takes effect only when semantic_punctuation_enabled is false. |
| nls_config.heartbeat | boolean | No | Whether to maintain a persistent connection to the server. Default value: false. Valid values: true (keeps the connection alive even when continuous silent audio is sent), false (the connection times out and closes after 60 seconds of inactivity). The 60-second timeout is a server-side default and cannot be configured on the client. |
| nls_config.vocabulary_id | string | No | ID of a hotword vocabulary list that improves recognition accuracy for specific terms. For instructions on using hotwords, see Customize Hotwords. |
| nls_config.language_hints | array[string] | No | Sets the language codes for recognition. If the language is unknown in advance, leave this parameter unset and the model identifies it automatically. The system reads only the first value in the array and ignores all other values. Supported language codes by model: fun-asr-realtime and fun-asr-realtime-2025-11-07 support zh (Chinese), en (English), and ja (Japanese); fun-asr-realtime-2025-09-15 supports zh (Chinese) and en (English). |
| nls_config.parameters | object | No | Additional parameters, configured as a JSON object. |
| nls_config.parameters.speech_noise_threshold | float | No | Adjusts the speech-noise detection threshold to control VAD sensitivity. Range: [-1.0, 1.0]. Near -1, the noise threshold is lower and more noise may be transcribed as speech. Near +1, the noise threshold is higher and some speech may be filtered out as noise. Important: this is an advanced parameter, and adjustments can significantly affect recognition quality. Test thoroughly before adjusting, and make small adjustments (step size 0.1) based on your audio environment. |
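The documented ranges and dependencies above can be checked before the JSON string is handed to setParams. The following is a minimal sketch with hypothetical class and method names. It assumes the model name needs no JSON escaping, and it writes speech_noise_threshold with one decimal place to match the suggested 0.1 step size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class RecognitionParams {
    /** Returns a list of problems; an empty list means the values fit the documented limits. */
    static List<String> check(boolean semanticPunctuation, Integer maxSentenceSilenceMs,
                              double speechNoiseThreshold) {
        List<String> problems = new ArrayList<>();
        if (maxSentenceSilenceMs != null) {
            if (maxSentenceSilenceMs < 200 || maxSentenceSilenceMs > 6000) {
                problems.add("max_sentence_silence must be in [200, 6000]");
            }
            if (semanticPunctuation) {
                problems.add("max_sentence_silence has no effect when semantic_punctuation_enabled is true");
            }
        }
        // Written as a negated range test so that NaN is rejected as well.
        if (!(speechNoiseThreshold >= -1.0 && speechNoiseThreshold <= 1.0)) {
            problems.add("speech_noise_threshold must be in [-1.0, 1.0]");
        }
        return problems;
    }

    /** Builds the JSON string for setParams; pass null to omit max_sentence_silence. */
    static String build(String model, boolean semanticPunctuation, Integer maxSentenceSilenceMs,
                        double speechNoiseThreshold) {
        StringBuilder nls = new StringBuilder();
        nls.append("\"model\":\"").append(model).append("\",");
        nls.append("\"sr_format\":\"pcm\",\"sample_rate\":16000,");
        nls.append("\"semantic_punctuation_enabled\":").append(semanticPunctuation).append(',');
        if (maxSentenceSilenceMs != null) {
            nls.append("\"max_sentence_silence\":").append(maxSentenceSilenceMs).append(',');
        }
        // Locale.ROOT keeps the decimal separator a dot on every device locale.
        nls.append(String.format(Locale.ROOT,
                "\"parameters\":{\"speech_noise_threshold\":%.1f}", speechNoiseThreshold));
        return "{\"service_type\":4,\"nls_config\":{" + nls + "}}";
    }
}
```

A caller would typically run check first and only call build, then setParams, when the returned list is empty.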
Key Interfaces
NativeNui
initialize
Initializes the speech recognition SDK instance. The SDK uses the singleton pattern. Do not reinitialize the instance before you call release.
This method is a blocking method. You must call it from a non-UI thread.
- Method signature
  public synchronized int initialize(final INativeNuiCallback callback, String parameters, final Constants.LogLevel level, final boolean save_log)
- Parameter description
| Parameter | Type | Description |
| --- | --- | --- |
| callback | INativeNuiCallback | Implementation of the event and data callback interface. |
| parameters | String | JSON string containing authentication, connection, and debugging parameters. See Connection and Control Parameters. |
| level | Constants.LogLevel | Controls the SDK's internal log printing level. |
| save_log | boolean | Whether to save local logs. If set to true, you must specify a path using debug_path in Connection and Control Parameters, and you can optionally set the file size using max_log_file_size. |
- Return value description
  Returns an error code. For more information, see Error Code Reference.
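Because initialize blocks, one way to keep it off the UI thread is a dedicated worker thread. The following is a minimal sketch, not part of the SDK: BackgroundInitializer and the IntSupplier wrapper are hypothetical, and the caller is assumed to wrap the real initialize call in the supplier. CompletableFuture requires Android API level 24 or later.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.IntSupplier;

final class BackgroundInitializer {
    // One daemon worker thread, so the blocking call never runs on the caller's thread.
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "asr-init");
        t.setDaemon(true);
        return t;
    });

    /** Runs the blocking call on the worker and completes with its return code. */
    CompletableFuture<Integer> initializeAsync(IntSupplier blockingInitialize) {
        return CompletableFuture.<Integer>supplyAsync(
                () -> blockingInitialize.getAsInt(), worker);
    }

    /** Stops the worker once no further calls are needed. */
    void shutdown() {
        worker.shutdown();
    }
}
```

The UI thread would attach a continuation to the returned future (for example with thenAccept) rather than waiting on it, since waiting would block the UI thread again.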
setParams
Sets the speech recognition effect parameters in JSON format. You must call this method before you call startDialog.
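- Method signature
  public synchronized int setParams(String params)
- Parameter description
| Parameter | Type | Description |
| --- | --- | --- |
| params | String | JSON string containing the speech recognition effect parameters. See Speech Recognition Effect Parameters. |
- Return value description
  Returns an error code. For more information, see Error Code Reference.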
startDialog
Starts the recognition process.
- Method signature
  public synchronized int startDialog(VadMode vad_mode, String dialog_params)
- Parameter description
| Parameter | Type | Description |
| --- | --- | --- |
| vad_mode | VadMode | VAD mode. Fixed as VadMode.TYPE_P2T. |
| dialog_params | String | If the apikey parameter in Connection and Control Parameters uses a temporary API key, update it here when it expires. Format as JSON: { "apikey": "st-****" } |
- Return value description
  Returns an error code. For more information, see Error Code Reference.
stopDialog
Ends the recognition. After you call this method, the server returns the final recognition result and ends the task.
- Method signature
  public synchronized int stopDialog();
- Return value description
  Returns an error code. For more information, see Error Code Reference.
cancelDialog
Immediately ends the recognition. After you call this method, the task ends without waiting for the final recognition result from the server.
- Method signature
  public synchronized int cancelDialog();
- Return value description
  Returns an error code. For more information, see Error Code Reference.
release
Releases all internal SDK resources. After this method is called, the SDK instance becomes unusable. To use the instance again, you must reinitialize it by calling initialize.
- Method signature
  public synchronized int release();
- Return value description
  Returns an error code. For more information, see Error Code Reference.
GetVersion
Retrieves the current SDK version information.
- Method signature
  public synchronized String GetVersion();
- Return value description
  The current SDK version information.
INativeNuiCallback: Listener Callbacks
onNuiEventCallback: Listen for Events and Speech Recognition Results
- Method signature
  void onNuiEventCallback(NuiEvent event, final int resultCode, final int arg2, KwsResult kwsResult, AsrResult asrResult);
- Parameter description
| Parameter | Type | Description |
| --- | --- | --- |
| event | NuiEvent | Callback event. |
| resultCode | int | Error code. Valid only for the EVENT_ASR_ERROR event. |
| asrResult | AsrResult | Speech recognition result. |
| kwsResult | KwsResult | Wake-word detection result. Ignore this parameter. |
| arg2 | int | Reserved parameter. |
onNuiAudioStateChanged: Listen for Audio State
The SDK uses this callback to notify your application when to start or stop recording.
- Method signature
  void onNuiAudioStateChanged(AudioState state);
- AudioState description
| State | Description |
| --- | --- |
| STATE_OPEN | Interaction started. Open the audio recording device. |
| STATE_PAUSE | Interaction stopped. Stop recording. |
| STATE_CLOSE | SDK instance released. Fully close the audio recording device. |
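The state handling described above can be sketched as a small dispatcher. This is an illustration under stated assumptions: RecordingState is a hypothetical mirror of the SDK's AudioState values, and Recorder is a hypothetical abstraction that, on Android, would typically wrap android.media.AudioRecord (startRecording, stop, release).

```java
/** Hypothetical mirror of the SDK's AudioState values listed above. */
enum RecordingState { STATE_OPEN, STATE_PAUSE, STATE_CLOSE }

/** Hypothetical recorder abstraction; not an SDK type. */
interface Recorder {
    void start();
    void stop();
    void release();
}

final class AudioStateHandler {
    private final Recorder recorder;
    private boolean recording;

    AudioStateHandler(Recorder recorder) {
        this.recorder = recorder;
    }

    /** Intended to be called from onNuiAudioStateChanged. */
    void onStateChanged(RecordingState state) {
        switch (state) {
            case STATE_OPEN:
                // Guard against a repeated OPEN starting the device twice.
                if (!recording) {
                    recorder.start();
                    recording = true;
                }
                break;
            case STATE_PAUSE:
                if (recording) {
                    recorder.stop();
                    recording = false;
                }
                break;
            case STATE_CLOSE:
                // Stop first if still recording, then free the device.
                if (recording) {
                    recorder.stop();
                    recording = false;
                }
                recorder.release();
                break;
        }
    }

    boolean isRecording() {
        return recording;
    }
}
```

The recording flag is a design choice of this sketch, not documented SDK behavior: it makes repeated or out-of-order state notifications harmless.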
onNuiNeedAudioData: Provide Audio Data for Recognition
After the recognition starts, this callback is triggered continuously. You must provide audio data within this callback.
- Method signature
  int onNuiNeedAudioData(byte[] buffer, int len);
- Parameter description
| Parameter | Type | Description |
| --- | --- | --- |
| buffer | byte[] | Audio data to fill. |
| len | int | Number of bytes of audio data to fill. |
- Return value description
  The actual number of bytes that are filled.
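The contract above (fill up to len bytes and return the number actually filled) can be sketched with a plain InputStream as the audio source. AudioFeeder is a hypothetical helper, not an SDK class. On Android the source would more likely be AudioRecord.read(byte[], int, int), and returning 0 when no data is available is an assumption of this sketch.

```java
import java.io.IOException;
import java.io.InputStream;

final class AudioFeeder {
    private final InputStream source;

    AudioFeeder(InputStream source) {
        this.source = source;
    }

    /** Fills buffer with up to len bytes and returns how many bytes were written. */
    int fill(byte[] buffer, int len) {
        // Never write past the buffer, and treat a negative len as zero.
        int want = Math.max(0, Math.min(len, buffer.length));
        int filled = 0;
        try {
            while (filled < want) {
                int n = source.read(buffer, filled, want - filled);
                if (n < 0) {
                    break; // end of stream
                }
                filled += n;
            }
        } catch (IOException e) {
            // Report only what was read before the failure.
        }
        return filled;
    }
}
```

An onNuiNeedAudioData implementation would then simply return feeder.fill(buffer, len).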
onNuiLogTrackCallback: Listen for Trace Logs
This callback receives detailed internal SDK logs that can be used for troubleshooting and debugging.
default void onNuiLogTrackCallback(Constants.LogLevel level, String log)
NuiEvent: Event Types
| Event | Description |
| --- | --- |
| EVENT_TRANSCRIBER_STARTED | Task started successfully. |
| EVENT_VAD_START | Triggered immediately after task start. Does not indicate voice onset detection. |
| EVENT_VAD_END | Voice endpoint detected. |
| EVENT_ASR_PARTIAL_RESULT | Intermediate speech recognition result. |
| EVENT_ASR_ERROR | Error occurred during speech recognition. |
| EVENT_MIC_ERROR | Triggered after receiving no audio data for 2 consecutive seconds. |
| EVENT_SENTENCE_END | End of a sentence detected. Returns a complete recognition result for that sentence. |
| EVENT_TRANSCRIBER_COMPLETE | Speech recognition completed. |
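One way to consume these events is a small collector that keeps the latest partial result and commits text on each sentence end. This is a minimal sketch under stated assumptions: TranscriberEvent is a hypothetical mirror of NuiEvent, the text argument stands for result text that the app has already extracted from AsrResult (extraction not shown), each partial result is assumed to carry the whole sentence so far, and joining sentences with a space is a choice of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mirror of the NuiEvent values in the table above. */
enum TranscriberEvent {
    EVENT_TRANSCRIBER_STARTED, EVENT_VAD_START, EVENT_VAD_END, EVENT_ASR_PARTIAL_RESULT,
    EVENT_ASR_ERROR, EVENT_MIC_ERROR, EVENT_SENTENCE_END, EVENT_TRANSCRIBER_COMPLETE
}

final class TranscriptCollector {
    private final List<String> sentences = new ArrayList<>();
    private String partial = "";
    private Integer errorCode; // null until EVENT_ASR_ERROR is seen
    private boolean finished;

    /** Intended to be called from onNuiEventCallback with the extracted result text. */
    void onEvent(TranscriberEvent event, int resultCode, String text) {
        switch (event) {
            case EVENT_ASR_PARTIAL_RESULT:
                partial = text; // replaces the previous intermediate result
                break;
            case EVENT_SENTENCE_END:
                sentences.add(text); // commit the complete sentence
                partial = "";
                break;
            case EVENT_ASR_ERROR:
                errorCode = resultCode; // resultCode is valid only for this event
                break;
            case EVENT_TRANSCRIBER_COMPLETE:
                finished = true;
                break;
            default:
                break; // the remaining events carry no text in this sketch
        }
    }

    String transcript() { return String.join(" ", sentences); }
    String pendingText() { return partial; }
    Integer errorCode() { return errorCode; }
    boolean isFinished() { return finished; }
}
```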