Fun-ASR Real-Time Speech Recognition Android SDK

This document provides a detailed guide on how to use the Fun-ASR real-time speech recognition Android SDK to convert speech into text.

User guide: For model descriptions and selection recommendations, see Real-Time Speech Recognition - Fun-ASR/Gummy/Paraformer.

Getting Started

Obtain your API key and API host.
Download the SDK and run the sample code:
- Download the latest SDK package.
- Unzip the ZIP file. Obtain the AAR-formatted SDK from the app/libs folder and add it to your project dependencies.
  If you need Android C++ integration, use the android_libs and android_include folders in the ZIP package to obtain dynamic libraries and header files.
- Open the project in Android Studio. The sample code is located in DashFunAsrSpeechTranscriberActivity.java. Replace the API key to test the feature.

Call Steps

Initialize the SDK.
Set parameters based on your business needs. Pass connection and control parameters through the parameters argument of the initialize method, and pass speech recognition effect parameters through the setParams method.
Call startDialog to start the recognition process.
In the onNuiAudioStateChanged callback, enable the audio recording device based on the audio state.
In the onNuiNeedAudioData callback, continuously provide recorded audio data.
In the onNuiEventCallback callback, listen for events and retrieve speech recognition results.
Call stopDialog to stop the recognition. You can confirm that the recognition has ended by listening for the EVENT_TRANSCRIBER_COMPLETE event.
When you no longer need the recognition feature, call the release method to release SDK resources.
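
The call order above can be modeled as a small state machine. The following is a minimal sketch, not SDK code: `RecognitionLifecycle` and its states are hypothetical names, and it assumes that setParams may be repeated between sessions and that release is accepted in any initialized state. An application could consult such a guard before forwarding each call to the SDK.

```java
// Hypothetical guard that mirrors the documented call order.
// It does not call the SDK; it only tracks which call is valid next.
final class RecognitionLifecycle {
    enum State { UNINITIALIZED, INITIALIZED, CONFIGURED, RUNNING, STOPPING }

    private State state = State.UNINITIALIZED;

    State state() { return state; }

    void initialize() { move("initialize", State.INITIALIZED, State.UNINITIALIZED); }

    // setParams must come before startDialog; repeating it is assumed to be allowed.
    void setParams() { move("setParams", State.CONFIGURED, State.INITIALIZED, State.CONFIGURED); }

    void startDialog() { move("startDialog", State.RUNNING, State.CONFIGURED); }

    // After stopDialog, the session is over only once EVENT_TRANSCRIBER_COMPLETE arrives.
    void stopDialog() { move("stopDialog", State.STOPPING, State.RUNNING); }

    void onTranscriberComplete() {
        move("EVENT_TRANSCRIBER_COMPLETE", State.CONFIGURED, State.STOPPING);
    }

    // Assumption: release is accepted in any state after initialize.
    void release() {
        move("release", State.UNINITIALIZED,
                State.INITIALIZED, State.CONFIGURED, State.RUNNING, State.STOPPING);
    }

    private void move(String call, State next, State... allowedFrom) {
        for (State s : allowedFrom) {
            if (s == state) {
                state = next;
                return;
            }
        }
        throw new IllegalStateException(call + " is not valid in state " + state);
    }
}
```

With this guard, calling startDialog before setParams raises an IllegalStateException instead of reaching the SDK in the wrong order.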

Request Parameters

Connection and Control Parameters

You can configure these parameters by passing a JSON string through the parameters argument of the initialize method.

Parameter example: The following code shows a sample JSON string. Not all parameters are listed. You can add other parameters as needed during implementation.
```
{
    "url": "wss://dashscope.aliyuncs.com/api-ws/v1/inference",
    "apikey": "st-****",
    "device_id": "my_device_id",
    "service_mode": "1"
}
```
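
As an illustration, the JSON string above could be assembled in code as follows. This is a minimal sketch in plain Java: `InitParamsBuilder` is a hypothetical helper, not part of the SDK, and it handles only string-valued fields. On Android, `org.json.JSONObject` would normally be used instead of manual string building.

```java
// Hypothetical helper that builds the JSON string for the `parameters`
// argument of initialize. Only string-valued fields are supported.
final class InitParamsBuilder {
    private final StringBuilder body = new StringBuilder();

    /** Adds one string field, escaping backslashes and double quotes. */
    InitParamsBuilder put(String key, String value) {
        if (body.length() > 0) {
            body.append(',');
        }
        body.append('"').append(escape(key)).append("\":\"")
            .append(escape(value)).append('"');
        return this;
    }

    String build() {
        return "{" + body + "}";
    }

    // Minimal escaping for this sketch; control characters are not handled.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds the four fields shown in the sample above. */
    static String required(String apiKey, String deviceId) {
        return new InitParamsBuilder()
                .put("url", "wss://dashscope.aliyuncs.com/api-ws/v1/inference")
                .put("apikey", apiKey)
                .put("device_id", deviceId)
                .put("service_mode", "1")
                .build();
    }
}
```

Optional fields such as `debug_path` can be appended with further `put` calls before `build`.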

Parameter description

Parameter	Type	Required	Description
`url`	`String`	Yes	Service endpoint. Fixed as `wss://dashscope.aliyuncs.com/api-ws/v1/inference`.
`apikey`	`String`	Yes	API key.
`service_mode`	`String`	Yes	Operation mode. For real-time speech recognition, this must be `"1"`.
`device_id`	`String`	Yes	A unique string identifying the end user. Set this to an in-app user ID or a client-generated device identifier. This ID is used primarily for log tracing and troubleshooting.
`debug_path`	`String`	No	Storage path for log files. This parameter takes effect only when `save_log` is set to true in the initialize call. You must specify a log file path in this case, or an error occurs. The system keeps at most two local log files.
`save_wav`	`String`	No	Whether to save audio files for debugging. Audio files are saved under the `debug_path` directory. Default value: "false". Valid values: "true" (save audio files) and "false" (do not save). This parameter takes effect only when `save_log` is set to true in the initialize call. You must also set `debug_path`.
`max_log_file_size`	`int`	No	Maximum size (in bytes) for each log file. This parameter takes effect only when `save_log` is set to true in the initialize call. Default value: 104857600 (100 × 1024 × 1024 bytes, or 100 MiB).
`log_track_level`	`int`	No	Controls the filter level of log content sent to an external destination through the `onNuiLogTrackCallback` callback. Default value: 2. Valid values: 0 (LOG_LEVEL_VERBOSE), 1 (LOG_LEVEL_DEBUG), 2 (LOG_LEVEL_INFO), 3 (LOG_LEVEL_WARNING), 4 (LOG_LEVEL_ERROR), 5 (LOG_LEVEL_NONE, disables this feature). Note: `log_track_level` and `level` (set via the initialize interface) jointly determine which logs are finally sent through the callback. A log entry is sent only if its level number is greater than or equal to both `log_track_level` and `level`. For example, if `log_track_level` is set to 2 (INFO) and `level` is set to 3 (WARNING), only logs at WARNING level or higher (numeric value ≥ 3) are sent.
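
The joint filtering rule for `log_track_level` and `level` can be expressed as a single comparison. The sketch below is illustrative only: `LogTrackFilter` is a hypothetical name, and it assumes that the `level` passed to initialize maps onto the same 0 to 5 numeric scale.

```java
// Hypothetical model of the documented filtering rule; not SDK code.
final class LogTrackFilter {
    static final int LOG_LEVEL_NONE = 5;

    /**
     * Returns true when a log entry would be forwarded to the callback:
     * its level must be >= both thresholds, and forwarding must not be disabled.
     */
    static boolean isForwarded(int entryLevel, int logTrackLevel, int initLevel) {
        if (logTrackLevel >= LOG_LEVEL_NONE) {
            return false; // LOG_LEVEL_NONE disables the feature
        }
        return entryLevel >= Math.max(logTrackLevel, initLevel);
    }
}
```

With `log_track_level` 2 and `level` 3, `isForwarded(2, 2, 3)` is false for an INFO entry and `isForwarded(3, 2, 3)` is true for a WARNING entry, matching the example in the table.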

Speech Recognition Effect Parameters

You can configure these parameters by passing a JSON string through the params argument of the setParams method.

Parameter example: The following code shows a sample JSON string. Not all parameters are listed. You can add other parameters as needed during implementation.

```
{
    "service_type": 4,
    "nls_config": {
        "model": "fun-asr-realtime",
        "sr_format": "pcm",
        "sample_rate": "16000",
        "parameters": {
            "speech_noise_threshold": 0.0
        }
    }
}
```

Parameter description

Top-level parameter	Type	Required	Description
`service_type`	`int`	Yes	Voice service type. For real-time speech recognition, this must be `4`.
`nls_config`	`object`	Yes	Core speech recognition configuration object containing key parameters such as model selection and recognition behavior control.
`nls_config.model`	`string`	Yes	Speech recognition model to use, for example `fun-asr-realtime`.
`nls_config.sr_format`	`string`	Yes	Audio format for recognition. Supported formats: pcm, wav, opus. Important: For all three formats, the audio that you supply must be PCM-encoded. For opus, the SDK internally encodes the PCM input into OPUS format.
`nls_config.sample_rate`	`int`	Yes	Audio sampling rate in Hz. Only 16000 Hz is supported.
`nls_config.semantic_punctuation_enabled`	`boolean`	No	Sets the sentence segmentation mode. Default value: false. Valid values: true (enables semantic punctuation and disables VAD (Voice Activity Detection) segmentation); false (enables VAD segmentation and disables semantic punctuation). Semantic punctuation offers higher accuracy and suits meeting transcription scenarios. VAD segmentation has lower latency and suits real-time interactive scenarios.
`nls_config.max_sentence_silence`	`int`	No	VAD (Voice Activity Detection) silence threshold in milliseconds for sentence segmentation. Default value: 800. Valid range: [200, 6000]. When silence after speech exceeds this threshold, the system considers the sentence complete. This parameter takes effect only when `semantic_punctuation_enabled` is false.
`nls_config.multi_threshold_mode_enabled`	`boolean`	No	Enables protection against overly long VAD segmentation. When enabled, it prevents VAD from producing overly long sentences. Default value: false (disabled). Valid values: true (enabled), false (disabled). This parameter takes effect only when `semantic_punctuation_enabled` is false.
`nls_config.heartbeat`	`boolean`	No	Whether to maintain a persistent connection to the server. Default value: false. Valid values: true (keeps the connection alive even when continuous silent audio is sent); false (the connection is closed after 60 seconds of inactivity). This 60-second timeout is a server-side default and cannot be configured on the client.
`nls_config.vocabulary_id`	`string`	No	ID of a hotword vocabulary list to improve recognition accuracy for specific terms. For instructions on using hotwords, see Customize Hotwords.
`nls_config.language_hints`	`array[string]`	No	Sets the language codes for recognition. If the language is unknown in advance, leave this parameter unset and the model will identify it automatically. The system reads only the first value in the array and ignores all other values. Supported language codes by model: fun-asr-realtime and fun-asr-realtime-2025-11-07: zh (Chinese), en (English), ja (Japanese), ko (Korean), vi (Vietnamese), th (Thai), id (Indonesian), ms (Malay), tl (Filipino), hi (Hindi), ar (Arabic), fr (French), de (German), es (Spanish), pt (Portuguese), ru (Russian), it (Italian), nl (Dutch), sv (Swedish), da (Danish), fi (Finnish), no (Norwegian), el (Greek), pl (Polish), cs (Czech), hu (Hungarian), ro (Romanian), bg (Bulgarian), hr (Croatian), sk (Slovak). fun-asr-realtime-2026-02-28: zh (Chinese), en (English), ja (Japanese). fun-asr-realtime-2025-09-15: zh (Chinese), en (English). fun-asr-flash-8k-realtime and fun-asr-flash-8k-realtime-2026-01-28: zh (Chinese).
`nls_config.parameters`	`object`	No	Configures additional parameters as a JSON object.
`nls_config.parameters.speech_noise_threshold`	`float`	No	Adjusts the speech-noise detection threshold to control VAD sensitivity. Range: [-1.0, 1.0]. Guidelines: values near -1 lower the noise threshold, so more noise may be transcribed as speech; values near +1 raise the noise threshold, so some speech may be filtered out as noise. Important: This is an advanced parameter. Adjustments can significantly affect recognition quality. Test thoroughly before adjusting. Make small adjustments (step size 0.1) based on your audio environment.
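
Because several of these parameters have documented ranges, it can help to validate values before building the setParams string. The following is a minimal sketch under stated assumptions: `EffectParams` and its methods are hypothetical helpers, not SDK APIs, and the generated JSON mirrors the sample above with `max_sentence_silence` added.

```java
// Hypothetical helpers for range checks taken from the parameter table.
final class EffectParams {
    /** max_sentence_silence must lie in [200, 6000] milliseconds. */
    static int checkMaxSentenceSilence(int ms) {
        if (ms < 200 || ms > 6000) {
            throw new IllegalArgumentException("max_sentence_silence out of range: " + ms);
        }
        return ms;
    }

    /** Moves the threshold by a number of 0.1 steps and clamps it to [-1.0, 1.0]. */
    static double adjustNoiseThreshold(double current, int steps) {
        double next = current + steps * 0.1;
        return Math.max(-1.0, Math.min(1.0, next));
    }

    /** Builds a setParams JSON string; field values other than the two arguments mirror the sample. */
    static String build(int maxSentenceSilenceMs, double noiseThreshold) {
        checkMaxSentenceSilence(maxSentenceSilenceMs);
        if (noiseThreshold < -1.0 || noiseThreshold > 1.0) {
            throw new IllegalArgumentException("speech_noise_threshold out of range");
        }
        // Locale.ROOT keeps the decimal separator a dot on every device.
        return String.format(java.util.Locale.ROOT,
                "{\"service_type\":4,\"nls_config\":{\"model\":\"fun-asr-realtime\","
                + "\"sr_format\":\"pcm\",\"sample_rate\":\"16000\","
                + "\"max_sentence_silence\":%d,"
                + "\"parameters\":{\"speech_noise_threshold\":%.1f}}}",
                maxSentenceSilenceMs, noiseThreshold);
    }
}
```

Rejecting out-of-range values on the client keeps configuration mistakes visible at the call site instead of surfacing later as an error code.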

Key Interfaces

NativeNui

initialize

Initializes the speech recognition SDK instance. The SDK uses the singleton pattern. Do not reinitialize the instance before you call release.

This method is a blocking method. You must call it from a non-UI thread.

Method signature

```
public synchronized int initialize(final INativeNuiCallback callback,
                                   String parameters,
                                   final Constants.LogLevel level,
                                   final boolean save_log)
```

Parameter description

Parameter	Type	Description
`callback`	`INativeNuiCallback`	Implementation of the event and data callback interface.
`parameters`	`String`	JSON string containing authentication, connection, and debugging parameters. See Connection and Control Parameters.
`level`	`Constants.LogLevel`	Controls the SDK's internal log printing level.
`save_log`	`boolean`	Whether to save local logs. If set to `true`, you must specify a path using `debug_path` in Connection and Control Parameters and optionally set file size using `max_log_file_size`.

Return value description

Returns an error code. For more information, see Error Code Reference.
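
Because initialize blocks, one way to keep it off the UI thread is to submit it to a worker thread and read the error code later. The sketch below uses only `java.util.concurrent`; `BackgroundInit` is a hypothetical helper, and the `Callable` stands in for the actual call to initialize, which is not shown here.

```java
// Hypothetical helper: runs a blocking call on a dedicated worker thread.
final class BackgroundInit {
    /**
     * Submits the blocking call to a single-use worker thread and returns
     * a Future that eventually holds the call's integer result code.
     */
    static java.util.concurrent.Future<Integer> runOffCallerThread(
            java.util.concurrent.Callable<Integer> blockingCall) {
        java.util.concurrent.ExecutorService worker =
                java.util.concurrent.Executors.newSingleThreadExecutor();
        try {
            return worker.submit(blockingCall);
        } finally {
            // shutdown() lets the submitted task finish, then ends the thread.
            worker.shutdown();
        }
    }
}
```

In an application, the `Callable` would wrap the initialize call, and the returned error code would be checked against the Error Code Reference before calling setParams.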

setParams

Sets the speech recognition effect parameters in JSON format. You must call this method before you call startDialog.

Method signature

```
public synchronized int setParams(String params)
```

Parameter description

Parameter	Type	Description
`params`	`String`	Speech recognition effect parameters.

Return value description

Returns an error code. For more information, see Error Code Reference.

startDialog

Starts the recognition process.

Method signature

```
public synchronized int startDialog(VadMode vad_mode, String dialog_params)
```

Parameter description

Parameter	Type	Description
`vad_mode`	`VadMode`	VAD mode. Fixed as `VadMode.TYPE_P2T`.
`dialog_params`	`String`	If the `apikey` parameter in Connection and Control Parameters uses a temporary API key, update it here when it expires. Format as JSON: `{"apikey": "st-****"}`.

Return value description

Returns an error code. For more information, see Error Code Reference.

stopDialog

Ends the recognition. After you call this method, the server returns the final recognition result and ends the task.

Method signature
```
public synchronized int stopDialog();
```
Return value description

Returns an error code. For more information, see Error Code Reference.

cancelDialog

Immediately ends the recognition. After you call this method, the task ends without waiting for the final recognition result from the server.

Method signature
```
public synchronized int cancelDialog();
```
Return value description

Returns an error code. For more information, see Error Code Reference.

release

Releases all internal SDK resources. After this method is called, the SDK instance becomes unusable. To use the instance again, you must reinitialize it by calling initialize.

Method signature
```
public synchronized int release();
```
Return value description

Returns an error code. For more information, see Error Code Reference.

GetVersion

Retrieves the current SDK version information.

Method signature

```
public synchronized String GetVersion();
```

Return value description

The current SDK version information.

INativeNuiCallback: Listener Callbacks

onNuiEventCallback: Listen for Events and Speech Recognition Results

Method signature

```
void onNuiEventCallback(NuiEvent event, final int resultCode, final int arg2, KwsResult kwsResult, AsrResult asrResult);
```

Parameter description

Parameter	Type	Description
`event`	`NuiEvent`	Callback event.
`resultCode`	`int`	Error code. Valid only for the EVENT_ASR_ERROR event.
`asrResult`	`AsrResult`	Speech recognition result.
`kwsResult`	`KwsResult`	Wake-word detection result. Ignore this parameter.
`arg2`	`int`	Reserved parameter.

onNuiAudioStateChanged: Listen for Audio State

The SDK uses this callback to notify your application when to start or stop recording.

Method signature

```
void onNuiAudioStateChanged(AudioState state);
```

AudioState description

State	Description
`STATE_OPEN`	Interaction started. Open the audio recording device.
`STATE_PAUSE`	Interaction stopped. Stop recording.
`STATE_CLOSE`	SDK instance released. Fully close the audio recording device.
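
The state table above maps directly onto a switch statement. The sketch below is not SDK code: the `AudioState` enum is a local stand-in for the SDK type, and `Recorder` is a hypothetical interface. On Android, its three methods would typically wrap `AudioRecord.startRecording()`, `AudioRecord.stop()`, and `AudioRecord.release()`.

```java
// Stand-in types so the sketch compiles without the SDK or Android classes.
final class AudioStateHandler {
    enum AudioState { STATE_OPEN, STATE_PAUSE, STATE_CLOSE }

    /** Hypothetical wrapper around the platform recording device. */
    interface Recorder {
        void start();
        void stop();
        void close();
    }

    /** Applies the action documented for each audio state. */
    static void handle(AudioState state, Recorder recorder) {
        switch (state) {
            case STATE_OPEN:
                recorder.start(); // interaction started: open the device
                break;
            case STATE_PAUSE:
                recorder.stop(); // interaction stopped: stop recording
                break;
            case STATE_CLOSE:
                recorder.close(); // SDK released: fully close the device
                break;
        }
    }
}
```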

onNuiNeedAudioData: Provide Audio Data for Recognition

After the recognition starts, this callback is triggered continuously. You must provide audio data within this callback.

Method signature

```
int onNuiNeedAudioData(byte[] buffer, int len);
```

Parameter description

Parameter	Type	Description
`buffer`	`byte[]`	Audio data to fill.
`len`	`int`	Number of bytes of audio data to fill.

Return value description

The actual number of bytes that are filled.
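
The body of this callback usually reduces to copying recorded bytes into `buffer` and returning the count. The following is a minimal sketch, assuming the recorded PCM is exposed as a `java.io.InputStream`: `AudioFeeder` is a hypothetical helper, and on Android the read would normally come from `AudioRecord.read(buffer, 0, len)` instead. How the SDK treats a return value of 0 is not covered here.

```java
// Hypothetical helper that fills the callback buffer from a PCM source.
final class AudioFeeder {
    /**
     * Copies up to len bytes from pcmSource into buffer, starting at index 0.
     * Returns the number of bytes actually written, which may be less than
     * len when the source is exhausted or a read error occurs.
     */
    static int fill(java.io.InputStream pcmSource, byte[] buffer, int len) {
        int wanted = Math.min(len, buffer.length);
        int filled = 0;
        try {
            while (filled < wanted) {
                int n = pcmSource.read(buffer, filled, wanted - filled);
                if (n < 0) {
                    break; // end of stream
                }
                filled += n;
            }
        } catch (java.io.IOException e) {
            // Fall through and report the bytes read before the failure.
        }
        return filled;
    }
}
```

A callback implementation would then return `AudioFeeder.fill(source, buffer, len)` directly.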

onNuiLogTrackCallback: Listen for Trace Logs

This callback receives detailed internal SDK logs that can be used for troubleshooting and debugging.

Method signature

```
default void onNuiLogTrackCallback(Constants.LogLevel level, String log)
```

`NuiEvent`: Event Types

Event	Description
EVENT_TRANSCRIBER_STARTED	Task started successfully.
EVENT_VAD_START	Triggered immediately after task start. Does not indicate voice onset detection.
EVENT_VAD_END	Voice endpoint detected.
EVENT_ASR_PARTIAL_RESULT	Intermediate speech recognition result.
EVENT_ASR_ERROR	Error occurred during speech recognition.
EVENT_MIC_ERROR	Triggered after receiving no audio data for 2 consecutive seconds.
EVENT_SENTENCE_END	End of a sentence detected. Returns a complete recognition result for that sentence.
EVENT_TRANSCRIBER_COMPLETE	Speech recognition completed.
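
A typical handler treats these events as a small dispatch table: partial results overwrite the in-progress text, sentence-end results are appended to the transcript, and completion and errors set flags. The sketch below is illustrative only. The `NuiEvent` enum is a local stand-in for the SDK type, `TranscriptCollector` is a hypothetical class, and the recognized text is passed as a plain `String` because the structure of `AsrResult` is not described in this document.

```java
// Stand-in event handling; not SDK code.
final class TranscriptCollector {
    enum NuiEvent {
        EVENT_TRANSCRIBER_STARTED, EVENT_VAD_START, EVENT_VAD_END,
        EVENT_ASR_PARTIAL_RESULT, EVENT_ASR_ERROR, EVENT_MIC_ERROR,
        EVENT_SENTENCE_END, EVENT_TRANSCRIBER_COMPLETE
    }

    private final StringBuilder finalText = new StringBuilder();
    private String partial = "";
    private boolean complete;
    private int lastError;

    /** text is a stand-in for whatever the application extracts from AsrResult. */
    void onEvent(NuiEvent event, int resultCode, String text) {
        switch (event) {
            case EVENT_ASR_PARTIAL_RESULT:
                partial = text; // intermediate result replaces the previous one
                break;
            case EVENT_SENTENCE_END:
                finalText.append(text); // complete sentence joins the transcript
                partial = "";
                break;
            case EVENT_ASR_ERROR:
                lastError = resultCode; // resultCode is valid only for this event
                break;
            case EVENT_TRANSCRIBER_COMPLETE:
                complete = true;
                break;
            default:
                break; // other events need no transcript update
        }
    }

    /** Finished sentences followed by the sentence still in progress. */
    String display() { return finalText + partial; }

    boolean isComplete() { return complete; }

    int lastError() { return lastError; }
}
```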
