Frequently asked questions about the Intelligent Speech Interaction SDK

This topic answers common questions about using the SDK.

General questions

Set hotwords with the SDK

Hotwords trained in the console are bound to your project's AppKey through the custom hotword vocabulary that is configured there, so you do not need to set them in the SDK. However, for a custom hotword vocabulary obtained by using the POP API, you must set its vocabulary ID in the SDK. Hotword settings in the SDK have a higher priority and override the console settings. For more information about how to set hotwords for Short Sentence Recognition, Real-time Speech Recognition, and Audio File Transcription, see Use an SDK to set a custom hotword vocabulary.

"DNS resolved timeout" error

Check the nameserver setting in the /etc/resolv.conf file. We recommend adding the following configuration and giving it the highest priority: nameserver 114.114.114.114.

Set a custom language model with the SDK

If you create a custom language model in the console, you can select the model when you switch models for a project. After you publish the model, it is automatically bound to the AppKey and you do not need to configure it in your code. However, if you obtain a custom language model by using the POP API, you must set its model ID in the SDK to use the model. For more information about how to set a custom language model for Short Sentence Recognition, Real-time Speech Recognition, and Audio File Transcription, see Set a custom language model by using SDK 2.0.

Android and iOS SDKs for Apsara Stack

Android and iOS SDKs are available for Apsara Stack, but they are not included in the Apsara Stack installation package by default. You can download them from the service documentation on the Alibaba Cloud Help Center, such as the Android SDK and iOS SDK for Real-time Speech Recognition. These mobile SDKs can call public cloud ASR and Text-to-Speech services and can also be used in an Apsara Stack environment.

Use a token

On the public cloud, you can share a single token across projects, processes, and threads. You must obtain a new token before the current one expires. For security reasons, we recommend integrating the token SDK on your server and having your client applications fetch the token from your server. For more information, see Obtain a token.

In Apsara Stack, the token is currently set to default and does not require changes.

New token invalidation

Obtaining a new token does not invalidate an existing one. A token's validity is determined solely by its expiration time. You can get the expiration time from the ExpireTime parameter in the server's response message. For more information, see Token Protocol Description.

Effect of token expiration on active NlsClient instances

A token expiration does not affect any NlsClient instance that has already been created. Existing instances continue to function normally after a token expires.

For reliable SDK integration, follow these best practices:

Use a single global NlsClient instance for the lifetime of your application. Do not create a new NlsClient for each request.
Implement automatic token refresh: Proactively obtain a new token before the current one expires and update it in the NlsClient. Do not wait for a token to expire before refreshing.
Do not call SDK methods from within a callback function: Calling SDK interfaces directly inside a callback can cause a deadlock. Use a separate thread if you need to trigger SDK actions in response to a callback event.

For more information about token management, see Get a token.
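
The proactive refresh described above could be sketched as follows. This is a minimal illustration and not part of the SDK: TokenInfo, TokenCache, the fetcher supplier, and the refresh margin are all hypothetical names, and the sketch assumes that the expiration time is expressed in epoch seconds. In a real integration, the fetcher would wrap your call to the token service.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical holder for a token and its expiration time (assumed to be epoch seconds). */
final class TokenInfo {
    final String token;
    final long expireTimeSec;

    TokenInfo(String token, long expireTimeSec) {
        this.token = token;
        this.expireTimeSec = expireTimeSec;
    }
}

/** Caches one token and fetches a new one shortly before the cached one expires. */
final class TokenCache {
    private final Supplier<TokenInfo> fetcher; // hypothetical: wraps your token request
    private final LongSupplier nowSec;         // clock in epoch seconds, injectable for tests
    private final long refreshMarginSec;       // how early to refresh; pick a value that suits you
    private TokenInfo current;

    TokenCache(Supplier<TokenInfo> fetcher, LongSupplier nowSec, long refreshMarginSec) {
        this.fetcher = fetcher;
        this.nowSec = nowSec;
        this.refreshMarginSec = refreshMarginSec;
    }

    /** Returns the cached token, refreshing it first if it is inside the refresh margin. */
    synchronized String get() {
        if (current == null || nowSec.getAsLong() >= current.expireTimeSec - refreshMarginSec) {
            current = fetcher.get();
        }
        return current.token;
    }
}
```

Because get() is synchronized, a single TokenCache can be shared across threads, which matches the recommendation to share one token rather than request one per call.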

Service IP allowlist

We do not provide an IP allowlist because the server IPs change dynamically with scaling and machine replacements, making any static list quickly outdated. Instead, we recommend adding access rules for the following domains on ports 80 and 443: nls-meta.cn-shanghai.aliyuncs.com and nls-gateway-cn-shanghai.aliyuncs.com.

Control logging and save audio

The SDK provides multi-level log control, which you can configure in the initialization API.

It also allows you to save audio data to help diagnose issues. To do this, set the save_wav and debug_path initialization parameters. For more information, see API Reference.

Note

The save_wav and debug_path parameters for Real-time Speech Recognition have the same meaning as those for Short Sentence Recognition.

API call limitations

The SDK encapsulates access to the speech services. You only need to call the start method and handle the appropriate events in the callbacks, typically for errors and recognition results. Do not call SDK methods directly within a callback, as this may cause a deadlock.
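
One way to avoid calling SDK methods inside a callback is to hand the follow-up action to a separate worker thread. The following sketch uses only the Java standard library. CallbackHandOff and runOutsideCallback are hypothetical names, and the Runnable stands in for whatever SDK call you want to trigger.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Runs follow-up actions on a dedicated worker thread instead of the callback thread. */
final class CallbackHandOff {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Call this from inside a callback: it queues the action and returns immediately. */
    Future<?> runOutsideCallback(Runnable sdkAction) {
        return worker.submit(sdkAction);
    }

    /** Stops accepting new actions; already queued actions still run. */
    void shutdown() {
        worker.shutdown();
    }
}
```

A single-thread executor keeps the queued actions in order. Inside a callback, do not block on the returned Future, because waiting there would reintroduce the dependency that the hand-off is meant to remove.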

Why start() must be called before send()

You must always call start() before calling send(). The required sequence is:

Call start() to establish a connection to the speech service.
Wait for start() to return successfully, or wait for the on_start callback to confirm the connection is established.
Only then call send() to stream audio data to the server.

Key behaviors:

start() is a synchronous blocking call. It waits until the connection is fully established before returning. Do not call send() while start() is still in progress.
Do not close the connection before the task completes. Wait for the server to signal completion (for example, a recognition-complete event) before closing the connection.

Common error: Calling send() without first calling start() causes a sequencing error or connection failure, because no connection exists to receive the audio data.
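
The required call order can be illustrated with a small stand-in class. OrderedSession is hypothetical and performs no network I/O. It only models the rule that send() is valid between a successful start() and stop(), so treat it as a sketch of the sequencing and not of the SDK's implementation.

```java
/** Hypothetical stand-in for a recognizer; it only tracks the call order. */
final class OrderedSession {
    private enum State { IDLE, STARTED, STOPPED }

    private State state = State.IDLE;
    private long bytesSent = 0;

    /** Models start(): in the real SDK, the connection is established here. */
    void start() {
        if (state != State.IDLE) {
            throw new IllegalStateException("start() was already called");
        }
        state = State.STARTED;
    }

    /** Models send(): rejected unless start() has completed and stop() has not been called. */
    void send(byte[] audio) {
        if (state != State.STARTED) {
            throw new IllegalStateException("send() requires a prior successful start()");
        }
        bytesSent += audio.length;
    }

    /** Models stop(): ends the task, after which send() is rejected again. */
    void stop() {
        if (state != State.STARTED) {
            throw new IllegalStateException("stop() requires a prior successful start()");
        }
        state = State.STOPPED;
    }

    long bytesSent() {
        return bytesSent;
    }
}
```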

Framework linking issues

The framework's code is a mix of Objective-C and C++. Therefore, you must use a file with a .mm extension to call it. Also, ensure that the header and library search paths in your project are set correctly.

Diagnose SDK errors

When the SDK reports an error, check the error code first. It usually provides a general idea of the problem.

Supported programming languages and platforms

The Intelligent Speech Interaction SDK is currently available for the following languages and platforms:

Java
C++
Android
iOS
Go

The following languages and platforms are not supported by the SDK:

C#: No C# SDK is available. Use the RESTful API to integrate instead.
Douyin (TikTok) Mini Programs: No SDK support. Use the RESTful API instead.
uniapp: No SDK support. You can bridge to the native Android or iOS SDK through a native plugin, or use the RESTful API instead.
WeChat Mini Programs (streaming output): Streaming output is not supported. Use the RESTful API instead.

Voice speaker discrepancy between code and console

If the synthesized voice does not match what you configured in the console, the issue is caused by parameter priority rules:

SDK code parameters take priority over console settings. If you explicitly set a voice parameter in your code, it overrides the console configuration.
If no voice is specified in the code, the SDK uses its built-in default value (for example, aixia), not the voice configured in the console.
To use a console-configured voice (for example, Stanley): pass the same voice parameter explicitly in your code, or consult your SDK version's documentation to determine whether omitting the parameter inherits the console setting.

Custom wake word training for the offline wake word SDK

The offline wake word feature in Intelligent Speech Interaction does not support custom wake word fine-tuning through the platform. Specifically:

Wake word detection logic is part of an external encapsulation or multimodal development kit, not a directly configurable underlying model capability.
Custom wake word training is not available through the Intelligent Speech Interaction console or API.

The related algorithm has been open-sourced. If you require custom wake words, you must implement the detection logic at the code level or refer to open-source community solutions.

Resolve PHP SDK openapi-util version conflict

When installing the PHP SDK with Composer, you may encounter a dependency conflict error involving alibabacloud/openapi-util.

Cause: The composer.lock file has locked alibabacloud/openapi-util to a version (for example, 0.2.1) that does not satisfy the SDK's version requirement (for example, ^0.1.10).

Solution: Run the following command. The -W flag allows Composer to upgrade or downgrade packages that are already locked in composer.lock in order to resolve the conflict:

composer require alibabacloud/speechfiletranscriberlite-20211221 -W

Get the taskID in the Real-time Speech Recognition Go SDK

The taskID is found in the header.task_id field of the JSON data returned by the callback function. Example structure:

{
  "header": {
    "task_id": "fba2f3db6e7e42bc804c2d99295db109",
    "namespace": "SpeechTranscriber",
    "name": "TranscriptionStarted",
    "status": 20000000
  }
}

Parse the header.task_id field from the callback's JSON response to retrieve the task ID.

C/C++ SDK for gender recognition and ARM Docker image for nls-cloud-sdm MRCP

Gender recognition C/C++ SDK: A C or C++ SDK for gender recognition is not currently available.

ARM Docker image for nls-cloud-sdm MRCP module: The nls-cloud-sdm MRCP module does not currently provide an ARM architecture Docker image. If ARM support becomes available in the future, this documentation will be updated.

Java SDK questions

send() method parameters and usage

In the Java SDK, both Short Sentence Recognition and Real-time Speech Recognition provide three overloaded send() methods:

public void send(InputStream ins);
public void send(InputStream ins, int batchSize, int sleepInterval);
public void send(byte[] data);

When using these methods, you must send audio data to the server continuously and in real time.

The demo simulates a real-time audio stream by sending audio from a file. Typically, a chunk of audio data is sent every 100 ms or 200 ms (sleepInterval). The size of each chunk (batchSize) depends on the sampling rate. A long sending interval can cause high latency and connection drops, while a short interval can consume excessive server and network resources. You can experiment to find an optimal value.

For the second method, the ins parameter is the simulated audio stream, and you need to control the sending rate. For a 16 kHz sampling rate, send 3,200 bytes from ins every 100 ms. Example call:

recognizer.send(ins, 3200, 100); // For 16 kHz audio

For the third method, the data parameter is the chunk of data to be sent in one go. You control the interval between calls in a loop. Example call:

recognizer.send(data); // 100 ms of audio data
try {
 Thread.sleep(100);
} catch (InterruptedException e) {
 e.printStackTrace();
}
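
The 3,200-byte figure follows from the audio format. Assuming 16-bit mono PCM, each sample takes 2 bytes, so 100 ms of 16 kHz audio is 16000 × 2 × 0.1 = 3,200 bytes. The helper below is a hypothetical convenience for deriving batchSize from the sampling rate and the sending interval. It is not part of the SDK.

```java
/** Computes chunk sizes for 16-bit mono PCM audio. */
final class AudioChunking {
    private static final int BYTES_PER_SAMPLE = 2; // 16-bit samples, one channel

    /** Returns the number of bytes of audio produced in intervalMs at sampleRateHz. */
    static int bytesPerChunk(int sampleRateHz, int intervalMs) {
        return sampleRateHz * BYTES_PER_SAMPLE * intervalMs / 1000;
    }
}
```

For example, AudioChunking.bytesPerChunk(8000, 100) gives the chunk size for an 8 kHz stream sent every 100 ms.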

Analyze latency with SDK logs

The following examples use Java SDK logs.

For Short Sentence Recognition, latency is the time from when an utterance ends to when the final recognition result is received.

Search for the keywords StopRecognition and RecognitionCompleted in the logs to find the log entries for when audio sending finishes and when recognition completes. The time difference is the latency recorded by the SDK. In the following logs, the latency is 984 - 844 = 140 ms.

14:24:44.844 DEBUG [           main] [c.a.n.c.transport.netty4.NettyConnection] thread:1,send:{"header":{"namespace":"SpeechRecognizer","name":"StopRecognition","message_id":"bccac69b505f4e2897d12940e5b38953","appkey":"FWpPCaVYDRp6J1rO","task_id":"8c5c28d9a40c4a229a5345c09bc9c968"}}
14:24:44.984 DEBUG [ntLoopGroup-2-1] [  c.a.n.c.p.asr.SpeechRecognizerListener] on message:{"header":{"namespace":"SpeechRecognizer","name":"RecognitionCompleted","status":20000000,"message_id":"2869e93427b9429190206123b7a3d397","task_id":"8c5c28d9a40c4a229a5345c09bc9c968","status_text":"Gateway:SUCCESS:Success."},"payload":{"result":"北京的天气。","duration":2959}}

For Text-to-Speech, the key metric is first-packet latency, which is the time from sending the synthesis request to receiving the first audio packet.

Search the logs for the keyword send. The time difference between this entry and the subsequent entry for a received audio packet is the first-packet latency. In the following logs, the latency is 1035 - 813 = 222 ms.

14:32:13.813 DEBUG [           main] [c.a.n.c.transport.netty4.NettyConnection] thread:1,send:{"payload":{"volume":50,"voice":"Ruoxi","sample_rate":8000,"format":"wav","text":"国家是由领土、人民、文化和政府四个要素组成的，国家也是政治地理学名词。从广义的角度，国家是指拥有共同的语言、文化、血统、领土、政府或者历史等的社会群体。从狭义的角度，国家是一定范围内的人群所形成的共同体形式。"},"context":{"sdk":{"name":"nls-sdk-java","version":"2.1.0"},"network":{"upgrade_cost":160,"connect_cost":212}},"header":{"namespace":"SpeechSynthesizer","name":"StartSynthesis","message_id":"6bf2a84444434c0299974d8242380d6c","appkey":"FWpPCaVYDRp6J1rO","task_id":"affa5c90986e4378907fbf49eddd283a"}}
14:32:14.035  INFO [ntLoopGroup-2-1] [  c.a.n.c.protocol.tts.SpeechSynthesizer] write array:6896

The logs for the Real-time Speech Recognition SDK are similar to those for Short Sentence Recognition. You can calculate the end-of-speech latency from the logs using the keywords StopTranscription and TranscriptionCompleted.
For RESTful API calls, the client-side logs do not show latency. You need to write custom code to measure it or check the server-side logs.
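
The subtraction used in the log examples above can be automated. The following sketch assumes that both timestamps use the HH:mm:ss.SSS format shown in the logs and fall on the same day. LogLatency is a hypothetical helper name.

```java
import java.time.Duration;
import java.time.LocalTime;

/** Computes the time difference between two log timestamps. */
final class LogLatency {
    /** Returns the milliseconds from sentAt to receivedAt, both in HH:mm:ss.SSS format. */
    static long millisBetween(String sentAt, String receivedAt) {
        LocalTime start = LocalTime.parse(sentAt);
        LocalTime end = LocalTime.parse(receivedAt);
        return Duration.between(start, end).toMillis();
    }
}
```

Called with the timestamps from the Short Sentence Recognition logs, millisBetween("14:24:44.844", "14:24:44.984") returns 140. With the Text-to-Speech timestamps, millisBetween("14:32:13.813", "14:32:14.035") returns 222.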

Missing com.alibaba JAR package

See Original SDK User Guide to install Alibaba Cloud SDK for Java.

Connection timeout when calling SpeechRecognizer.stop()

The log entry state:STATE_STOP_SENT indicates that a stop signal has been sent. The timeout error occurs when a confirmation is not received from the server in time. This can be caused by network instability or jitter, so check your network connection.

Troubleshoot the Audio File Transcription demo

Verify that the region where you enabled Intelligent Speech Interaction matches the region specified in your code. In the Intelligent Speech Interaction console, confirm that the current region (for example, China (Shanghai)) is the same as the region configured in your code. Also, check that the status of the required service (for example, Audio File Transcription) is Enabled.
```
private static String appKey = "Your AppKey";
private static String accessKeyId = "Your AccessKey ID";
private static String accessKeySecret = "Your AccessKey Secret";
private static String REGION_ID = "cn-shanghai";
```
Check the dependency versions for the fastjson and aliyun-java-sdk-core libraries. The Alibaba Cloud Java SDK core library must be version 3.5.0 or later. If you use version 4.0.0 or later, you may need to add corresponding third-party dependencies as indicated by error messages. For more information, see Audio File Transcription Java SDK.
```
<dependency>
    <groupId>com.aliyun</groupId>
    <artifactId>aliyun-java-sdk-core</artifactId>
    <version>3.7.1</version>
</dependency>
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.83</version>
</dependency>
```

Android SDK for Audio File Transcription

The Intelligent Speech Interaction Audio File Transcription service does not have an Android SDK.

For Android use cases involving long audio (more than 2 hours) or multi-language requirements, consider the following alternatives:

Recommended: Use the Model Studio platform with the fun-asr-mtl model. This model supports an Android SDK and multiple languages. Note that the valid_times parameter is not supported by this model.
Segmented submission with ISI: If you must use the Intelligent Speech Interaction service, you need to implement audio splitting on the client side and submit each segment separately.

Call SpeechRecognizer.stop() in Real-time Speech Recognition

Under normal circumstances, the stop signal is sent automatically, so you do not need to call stop() manually. The service returns error 41010120 if it does not receive audio data for 10 seconds. If you continuously stream audio data to the server, recognition runs continuously on the server side. You can set the enable_intermediate_result=true parameter to get intermediate recognition results in real time.
If you determine that an utterance has ended, you can also call stop() manually to stop sending data and receive the final recognition result.

Trigger the onTranscriptionComplete callback

Calling stop() triggers the onTranscriptionComplete callback. The state changes to STATE_STOP_SENT, and after the callback is processed, the state becomes STATE_COMPLETE.

Resolve the "invalid token" error

This error indicates that the token is invalid. Generate a new token. For more information, see Obtain a token.

SpeechSynthesizer creation failure (ClosedChannelException)

We recommend that you test the demo in a new, empty project within a Java environment. If the demo runs without errors, the problem is likely in your client-side integration, not the SDK.

Find JAR packages for testing

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>com.alibaba.nls</groupId>
        <artifactId>nls-sdk-java-examples</artifactId>
        <version>2.0.0</version>
        <relativePath>../pom.xml</relativePath>
    </parent>

    <groupId>com.alibaba.nls</groupId>
    <artifactId>nls-example-tts</artifactId>

    <dependencies>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>1.0.13</version>
        </dependency>
        <dependency>
            <groupId>com.alibaba.nls</groupId>
            <artifactId>nls-sdk-tts</artifactId>
            <version>${sdk.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.0.0</version>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>com.alibaba.nls.client.SpeechSynthesizerMultiThreadDemo</mainClass>
                        </manifest>
                    </archive>

                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>

                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id> <!-- this is used for inheritance merges -->
                        <phase>package</phase> <!-- bind to the packaging phase -->
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

Resolve the "hostname can't be null" error

If you are not using the demo, you must specify the hostname, which is the endpoint of the Alibaba Cloud speech service.

An example endpoint is wss://nls-gateway-cn-shanghai.aliyuncs.com/ws/v1. For more information, see Real-time Speech Recognition Java SDK.

Resolve ClosedChannelException in speech synthesis

If a TaskId is not generated, the request did not reach the server. This typically points to a local environment issue. Check your local network and environment, and compare your implementation with the official demo.

Resolve the "JSONArray.iterator()" error

Ensure that all required dependency packages are included. Find and add the following two dependencies.

<dependency>
    <groupId>org.json</groupId>
    <artifactId>json</artifactId>
    <version>20170516</version>
</dependency>

<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.8.2</version>
</dependency>

Upload an audio file for recognition

The SDK example demonstrates how to upload an audio file for recognition by calling a RESTful API. For more information, see Short Sentence Recognition Java SDK.

// The SDK example demonstrates how to upload an audio file for recognition by calling a RESTful API.
String result = SpeechFlashRecognizer.submit(url, appKey, token);

Recognition failure with a custom audio file

Use the file command to check whether your audio format meets the product requirements. The standard format for 8K models is a mono WAV file with an 8 kHz sampling rate and 16-bit samples. The standard format for 16K models is a mono WAV file with a 16 kHz sampling rate and 16-bit samples. For testing, you can use a tool such as SoX or FFmpeg to convert your file to the standard format. For production use, refer to the relevant product documentation.

See the API Reference for Real-time Speech Recognition for an example.

The Real-time Speech Recognition API supports the following audio format parameters: the sampling rate (sample_rate) can be 8000 or 16000; the audio encoding format (format) can be pcm, wav, opus, or opu; samples must be 16-bit; and the audio must be mono.
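
As an alternative to the file command, you could check a WAV file programmatically. The following sketch uses the standard javax.sound.sampled package to test for 16-bit, mono, signed PCM at an expected sampling rate. WavFormatCheck is a hypothetical helper, and it covers WAV input only, not raw PCM or Opus data.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

/** Checks whether a WAV stream matches the standard recognition format. */
final class WavFormatCheck {
    /**
     * Returns true if the stream is 16-bit, mono, signed PCM at expectedSampleRate.
     * The stream must support mark/reset, so wrap a FileInputStream in a BufferedInputStream.
     */
    static boolean matches(InputStream wav, int expectedSampleRate)
            throws IOException, UnsupportedAudioFileException {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(wav)) {
            AudioFormat format = in.getFormat();
            return AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
                    && format.getSampleSizeInBits() == 16
                    && format.getChannels() == 1
                    && Math.round(format.getSampleRate()) == expectedSampleRate;
        }
    }
}
```

If the check fails, convert the file with a tool such as SoX or FFmpeg before you submit it.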

C++ SDK questions

UTF-8 encoding error in speech synthesis

If the input text contains Chinese characters and is not UTF-8 encoded, the start function call in the Text-to-Speech SDK will fail and return the error message Socket recv failed, errorCode: 0. An error code of 0 means the server closed the connection. In this case, verify that the text is UTF-8 encoded.

Obtain a token with the C++ SDK

For more information, see Obtain a token.

After downloading, you will get the NlsCommonSDK directory, which contains the lib/ and include/ subdirectories.

The downloaded C++ token SDK provides NlsCommonSDK, which includes functions for obtaining a token and for Audio File Transcription.

NlsCppSDK (versions 3.0.X and the older 2.X) does not include NlsCommonSDK. It supports Real-time Speech Recognition, Short Sentence Recognition, and Text-to-Speech. For these older NlsCppSDK versions, you must obtain a token separately as described above.

NlsCppSDK (versions 3.1.X and later) includes NlsCommonSDK. It supports obtaining a token, Audio File Transcription, Real-time Speech Recognition, Short Sentence Recognition, and Text-to-Speech, so you do not need to obtain a token separately.

Real-time Speech Recognition API call failure

This issue occurs rarely in C++ SDK versions 3.0 and earlier and can be ignored. In version 3.1 and later, this issue is likely caused by network problems in the runtime environment. Check your local network environment.
If a TaskId is not returned, it means the connection was dropped during the connection process. You do not need to repeatedly call the Real-time Speech Recognition API. Repeated calls are subject to concurrency limits and timeouts. If the concurrency limit is exceeded, an error is returned immediately. A WebSocket connection will automatically disconnect if no audio is received for 10 seconds, but it will return a TaskId.
This situation is often caused by network congestion. Use a packet capture tool to check for TCP retransmissions. You can run the traceroute command or use an MTR tool on the client to test the link to nls-gateway-cn-shanghai.aliyuncs.com and determine whether the network connection to the service endpoint is unstable.

Resolve the "appkey not set" error

A status of 40000003 indicates an invalid parameter. Double-check that the parameters you passed, such as appkey, voice, and url, are correct.

GCC version compatibility

On Linux, GCC 4.8.5 or later is supported. We have verified that the SDK compiles and runs successfully with GCC versions 4.8.5, 5.5.0, and 8.4.0.

Get g_akid and g_akSecret values

See Prepare an account.

Resolve the idle timeout error

This error occurs because the connection was idle for 10 seconds and was closed automatically. If this happens, retry the request, and verify that the URI is correct: wss://nls-gateway-cn-shanghai.aliyuncs.com/ws/v1.
This can also happen if the server receives too many concurrent requests, preventing it from processing your request in time. We recommend retrying the request.

Custom timestamp for token generation

A custom timestamp cannot be used. The token must be generated based on the system time.

Resolve DNS resolution failures

Older versions (3.0 and earlier): This issue is more likely to occur under high concurrency or when the system's DNS is busy. We recommend updating to version 3.1.X or retrying the request.
Newer versions (3.1 and later): A defense mechanism has been added for this issue. If it still occurs occasionally, it is due to a busy system DNS, and you should retry the request.

Error 10000002 when creating multiple channels

Error code 10000002 corresponds to the OpenSSL error Resource temporarily unavailable. This typically occurs when no data is sent to the server after a connection is established, causing a WebSocket timeout. The next expected directive is StartTranscription.

Compilation failure after changing _GLIBCXX_USE_CXX11_ABI

This parameter needs to be changed for the entire project, not just in CMakeLists.txt. For example, you also need to modify config/linux.thirdparty.debug.cmake and config/linux.thirdparty.release.cmake. Search the entire directory for _GLIBCXX_USE_CXX11_ABI and modify all instances.

Resolve the "missing nlsCommonSdk.dll" error

The nlsCommonSdk.dll is not part of SDK versions 3.1 and later.

Difference between NlsSdkCpp2.0 and NlsSdkCpp3.X

In NlsSdkCpp2.0, each request runs in a separate thread, and the API is synchronous.

In NlsSdkCpp3.X, a third-party library, libevent, handles all event messages, providing better concurrency performance. The API is asynchronous.

C++11 standard and linking issues

By default, the project is set to _GLIBCXX_USE_CXX11_ABI=0. This parameter must be changed for the entire project. Search the entire directory for _GLIBCXX_USE_CXX11_ABI and modify all instances.

Resolve "invalid token" and call order errors

The message Meta:ACCESS_DENIED:The token '' is invalid!, start finised. means the token is invalid and authentication failed. Check if the appkey is incorrect, if a token was provided, or if the token has expired. The message status code: 10000011 indicates an incorrect order of API calls. This error is reported if you call the send method after the SDK has already closed the connection upon receiving a Failed or Complete event.

Resolve the "certificate verify failed" error

This indicates an SSL certificate verification failure. Check that the system time on the calling machine is correct.

Resolve error 40000004 after calling stop()

This may happen if the server receives a large number of requests at once, preventing it from processing a specific request in time. We recommend retrying the request.

DNS resolution failure during project integration

The SDK attempts DNS resolution using all enabled protocol families (IPv4, IPv6) on the device. Because nls-gateway-cn-shanghai.aliyuncs.com does not support IPv6, the resolution fails, causing the SDK to exit. You can disable the IPv6 protocol family on your device. Future C++ SDK versions will make this configurable.
We recommend upgrading to version 3.1.12 or later.

Network connection failure during project integration

This indicates a network connection failure. The logs show that the IP address resolved by DNS is unreachable, which can be confirmed by a ping test. This issue is caused by local DNS interception, which causes the SDK's internal libevent function evdns_getaddrinfo to obtain an incorrect IP address.

Solutions:

For versions before 3.1.12, manually replace evdns_getaddrinfo() with the system's getaddrinfo().
For version 3.1.12, modify CMakeLists.txt by adding add_definitions(-DNLS_USE_NATIVE_GETADDRINFO).
Versions 3.1.12 and later include the setDirectHost() API. You can perform DNS resolution outside the SDK and pass the correct IP address using this API.
Versions 3.1.13 and later have resolved this issue. If the problem persists at runtime, call the setUseSysGetAddrInfo(true) API.

TTS demo WAV file generation failure

The demo included in NlsSdkCpp3.X-20210629 is not suitable for Windows. Use the demo from the alibabacloud-nls-cpp-sdk repository on GitHub, which is intended for Windows.

Android SDK questions

Opus audio support in Real-time Speech Recognition

Short Sentence Recognition and Flash ASR support Opus data. However, Real-time Speech Recognition only supports PCM-encoded, 16-bit, mono audio. For more information, see API Reference.

Automatic reconnection on network disconnection

No, the SDK will not reconnect automatically after a network disconnection. You must implement a retry mechanism to reconnect.
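
A retry mechanism could look like the following sketch, which retries a task with a doubling delay between attempts. Retry and withBackoff are hypothetical names, the Callable stands in for your own reconnect logic, and the attempt count and delays are arbitrary values that you should tune for your application.

```java
import java.util.concurrent.Callable;

/** Retries a task with exponential backoff. */
final class Retry {
    /** Runs task up to maxAttempts times and doubles the delay after each failure. */
    static <T> T withBackoff(Callable<T> task, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delayMs = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (InterruptedException e) {
                // Do not retry if the thread was asked to stop.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs);
                    delayMs *= 2;
                }
            }
        }
        throw last; // every attempt failed; rethrow the most recent error
    }
}
```

On Android, run the retry loop on a background thread, because Thread.sleep would otherwise block the main thread.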

Minimum supported Android version

The Intelligent Speech Interaction SDK supports all mainstream Android versions.

Query task status in Flash ASR

Flash ASR does not support querying the task status.

Callbacks not received in Flash ASR

You need to verify the following:

Whether the resource files were copied successfully.
Whether the CommonUtils.copyAssetsData function was called.

Recognition issues with 8K models

Recognition accuracy depends on your parameter settings. Check the following:

Ensure the nls_config.put("sr_format", "pcm") parameter value is in lowercase.
Ensure the static variable SAMPLE_RATE is set to 8000: public final static int SAMPLE_RATE = 8000.
Ensure that the 8K model is correctly selected.

Resolve initialization error 240021

This error code indicates a FILE_ACCESS_FAIL error. Check the following:

Whether your app has file read/write permissions.
Whether the SDK configuration files have been copied. The following code shows an example of how to check if the copy operation is complete.

if (CommonUtils.copyAssetsData(this)) {
    Log.i(TAG, "copy assets data done");
} else {
    Log.i(TAG, "copy assets failed");
    return;
}

Unnatural pauses in offline speech synthesis

You can try using SSML to control the pauses. For more information, see Introduction to SSML.

ANR when calling cancel in Text-to-Speech

The SDK methods are synchronous. Do not call them from the main thread.

Android version requirements for Real-time Speech Recognition

Real-time Speech Recognition does not have specific Android version requirements.

Resolve the "audio recoder not init" error

You can troubleshoot this issue by checking the following:

Verify that AudioRecord is initialized correctly.
Check if there is an issue with the audio player.
Write a separate piece of code that only uses AudioRecord to test if recording works normally.

Application crashes on simulators

Simulators can have unexpected issues. We recommend testing on a physical device.

Callback delay and error 50000000

This indicates an internal server error. Please retry the request.

Resolve TTS error 140002

We recommend you check if the input text is valid.

Troubleshoot the Text-to-Speech SDK

Check if the following conditions are met:

Ensure that you have met all the prerequisites for the Text-to-Speech Android SDK. For more information, see Prerequisites.
Check if you have enabled the commercial version of the service.

iOS SDK questions

Background processing

The SDK itself does not restrict foreground or background execution. However, the iOS SDK sample project only supports foreground processing by default. To enable background processing, make the following changes:

In your project's Info.plist file, add the Required background modes setting. Under this setting, add an item with the value App plays audio or streams audio/video using AirPlay. The type for this setting is an Array.
In your recording module, do not stop recording when the app enters the background. In other words, do not call the stop recording method in the _appResignActive interface of NLSVoiceRecorder.m.
```
- (void)_appResignActive {
    // Do not stop recording when entering the background.
    // [self stop];
}
```

App fails to run on a real device

We recommend deleting the corresponding app from your device, cleaning the project in Xcode, and then trying to run the app again. Additionally, check your code signing. If the signature is incorrect, revoke your original in-house certificate, create a new certificate and provisioning profile, re-sign your code, and package the app again.

Resolve MIC errors when running NuiSdk

Check if the recording device is currently being used by another application.

Header file import failure

This is usually caused by an incorrect SDK import. In your Xcode project's Frameworks, Libraries, and Embedded Content section, confirm that nuisdk.framework has been added and set to Embed & Sign. If it is already configured correctly, try changing the header import statement to #import <nuisdk/NeoNui.h>.

Resolve framework build errors

This can happen with newer versions of Xcode. Try setting your project's Validate Workspace build setting to Yes and then recompiling.

Using the Legacy Build System

We recommend changing your project's Validate Workspace setting to Yes and then recompiling.

Resolve "Embed & Sign" errors

Check if the header file was imported correctly as described in the documentation. The correct import format is #import <nuisdk/NeoNui.h>.

Resolve "Unsupported Architectures" error on submission

This is likely caused by the inclusion of simulator architectures. You can use the following method to check the framework's architectures and remove the ones for the simulator.

Navigate to the framework's directory.
Run the command lipo -info xxxFramework to view the framework's architectures. If it includes simulator architectures, you must remove them before submission.
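The check above can be scripted. The following is a minimal Python sketch, not part of the SDK, that parses the text printed by lipo -info and lists any simulator architectures. The helper names and the set of architectures treated as simulator-only (i386 and x86_64) are assumptions made for illustration.

```python
from typing import List

# Assumption: these slices are only needed by the simulator on Intel Macs.
SIMULATOR_ARCHS = {"i386", "x86_64"}


def parse_lipo_info(output: str) -> List[str]:
    """Return the architecture names from one line of `lipo -info` output."""
    # Both the fat and non-fat forms end with ": <arch> [<arch> ...]".
    _, sep, tail = output.strip().rpartition(": ")
    if not sep:
        raise ValueError("unrecognized lipo -info output")
    return tail.split()


def simulator_archs(output: str) -> List[str]:
    """Return the architectures in the output that belong to the simulator."""
    return [arch for arch in parse_lipo_info(output) if arch in SIMULATOR_ARCHS]


if __name__ == "__main__":
    sample = "Architectures in the fat file: xxxFramework are: armv7 arm64 x86_64"
    print(simulator_archs(sample))
```

Any architecture that the sketch reports can then be stripped with lipo's -remove option before you submit the app.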

Resolve microphone conflicts with TRTC

Use TRTC's audio/video stream as the audio source instead of opening the microphone a second time: get the MediaStreamTrack object by calling localStream.getAudioTrack, convert it to an ASR-compatible audio stream, and then send the request by using the speech recognition SDK.

Audio playback in the background

Audio playback is not possible after the app has been terminated.

Get timestamp information in TTS

By default, the SDK does not return timestamps. To get timestamp information, use the setparamTts method to set enable_subtitle. For more information, see API Reference.

Save synthesized audio to a file

You can save the synthesized data to a file in the onNuiTtsUserdataCallback callback. The format of the synthesized audio is determined by the encode_type parameter, for example, [nls_config setObject:@"mp3" forKey:@"encode_type"].

Supported formats are PCM, WAV, and MP3. Note that the audio player in the Text-to-Speech documentation examples does not support the MP3 format, and using it to play MP3 data directly may produce noise. However, you can listen to the saved MP3 file with a player that supports MP3.

If the synthesized audio seems to be missing words, the selected voice (such as xiaoyun) might be synthesized so quickly that some data is discarded before it is written to the file. Call fflush after fwrite to ensure that all data is written to the file.

```
- (void)onNuiTtsUserdataCallback:(NSString *)info infoLen:(int)infoLen buffer:(NSData *)data len:(int)len {
    // Write the synthesized audio and flush immediately so that no data is lost.
    fwrite(data.bytes, 1, data.length, tts_file);
    fflush(tts_file);
}
```

App termination on rapid playback

Online synthesis requires a network connection, and network conditions can directly affect API response times. If your business requires quickly stopping one task and starting the next, you can adjust the network timeout according to your business needs.

Resolve "Undefined symbols" error with Flutter

Open the Podfile in your iOS project, modify the post_install do |installer| section with the following code, and then rebuild the project.

```
post_install do |installer|
  installer.pods_project.targets.each do |target|
    target.build_configurations.each do |config|
      config.build_settings['EXCLUDED_ARCHS[sdk=iphonesimulator*]'] = 'arm64'
      config.build_settings['CLANG_CXX_LANGUAGE_STANDARD'] = 'gnu++17'
    end
  end
end
```

Audio plays as noise

First, confirm the synthesized audio format (PCM, WAV, or MP3). If the saved audio stream is in MP3 format but the player does not support it, you will hear noise. Try a different media player. If the noise occurs only at the end of the audio, use a tool such as Beyond Compare to inspect the saved audio stream and check whether log data was accidentally written into it.
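To confirm which format a saved file actually contains, you can inspect its first bytes. The following Python sketch is a heuristic written for illustration, not an SDK feature: it recognizes the RIFF/WAVE header of a WAV file and the ID3 tag or frame sync bits of an MP3 file, and treats everything else as raw PCM or unknown. The function name sniff_audio_format is hypothetical.

```python
def sniff_audio_format(data: bytes) -> str:
    """Guess the container format of audio data from its leading bytes."""
    # WAV: "RIFF" <4-byte size> "WAVE"
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # MP3 with an ID3v2 tag at the start
    if data[:3] == b"ID3":
        return "mp3"
    # MP3 without a tag: a frame starts with 11 sync bits that are all set
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    # Raw PCM has no header, so it cannot be positively identified.
    return "pcm-or-unknown"


if __name__ == "__main__":
    wav_header = b"RIFF" + (36).to_bytes(4, "little") + b"WAVE"
    print(sniff_audio_format(wav_header))
    print(sniff_audio_format(b"\xff\xfb\x90\x00"))
```

Because raw PCM samples can start with any byte values, a result of mp3 for data that you expect to be PCM is only a hint to look closer, not proof.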

WebSocket

WebSocket audio body instructions

The instruction consists of a header and a payload. The header has a uniform format, while the payload format varies by instruction.

The server accepts 3,200 bytes or 1,600 bytes of audio per send. When you send data, make sure that the audio is not corrupted during transmission.
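As an illustration of the chunk sizes above, the following Python sketch splits an audio buffer into slices of at most 3,200 bytes without dropping or reordering any bytes. The function name iter_audio_chunks is hypothetical, and the actual WebSocket send call is left out because it depends on your client library.

```python
from typing import Iterator

CHUNK_SIZE = 3200  # bytes per send; 1600 is the other size mentioned above


def iter_audio_chunks(audio: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield consecutive slices of `audio`, each at most `chunk_size` bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(audio), chunk_size):
        yield audio[offset:offset + chunk_size]


if __name__ == "__main__":
    fake_audio = bytes(8000)  # placeholder for real audio data
    for chunk in iter_audio_chunks(fake_audio):
        # Send `chunk` as one binary WebSocket frame here.
        print(len(chunk))
```

The last slice may be shorter than the chunk size. Joining all slices in order reproduces the original buffer, which is a simple way to check that nothing was lost before sending.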

Resolve the "Invalid message id" error

When you use WebSocket directly, you construct the request yourself, including the message_id. The message_id must be a unique 32-character hexadecimal string that you generate.

If you receive this error, change the message_id to a 32-character hexadecimal string and check whether the message that you send conforms to the required format.
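One way to produce and check such an ID is shown in the following Python sketch. It uses the standard uuid module. The helper names are hypothetical, and accepting both uppercase and lowercase hexadecimal digits in the check is an assumption.

```python
import re
import uuid

# Exactly 32 hexadecimal characters, with no hyphens or braces.
_HEX32 = re.compile(r"[0-9a-fA-F]{32}")


def new_message_id() -> str:
    """Return a random 32-character hexadecimal ID."""
    return uuid.uuid4().hex


def is_valid_message_id(value: str) -> bool:
    """Return True if `value` is a 32-character hexadecimal string."""
    return _HEX32.fullmatch(value) is not None


if __name__ == "__main__":
    message_id = new_message_id()
    print(message_id, is_valid_message_id(message_id))
    # A UUID in its usual hyphenated form is 36 characters long and fails the check.
    print(is_valid_message_id(str(uuid.uuid4())))
```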

Connection closed by server without error

If the Real-time Speech Recognition WebSocket connection is dropped, we recommend you do the following:

Check if the token was generated correctly.
Confirm that the client is sending the audio stream correctly.

If you do not receive an error message, try setting a status code. The default value is 20000000.

Get message_id and task_id in WebSocket

You generate the message_id and task_id on the client side.

Both message_id and task_id can be 32-character unique IDs. Within a single connection, the task_id should remain constant to uniquely identify the connection. We recommend generating a new random message_id for each message you send to uniquely identify that message.
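The lifetime of the two IDs can be sketched in Python as follows. The class name TaskIds is hypothetical, and only the two ID fields are shown. The other header fields and the payload that the protocol requires are omitted.

```python
import json
import uuid


class TaskIds:
    """Holds one task_id for a connection and issues a new message_id per message."""

    def __init__(self) -> None:
        # Generated once and reused for every message on this connection.
        self.task_id = uuid.uuid4().hex

    def header_ids(self) -> dict:
        """Return the ID fields for the header of the next message."""
        return {"task_id": self.task_id, "message_id": uuid.uuid4().hex}


if __name__ == "__main__":
    ids = TaskIds()  # create one instance per WebSocket connection
    first = ids.header_ids()
    second = ids.header_ids()
    print(json.dumps(first))
    print(json.dumps(second))
```

Creating one TaskIds instance per connection keeps the task_id stable for that connection, while each call to header_ids returns a different message_id.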

Service terminates automatically

This indicates that the system has received your audio. Both the WSS and HTTP protocols can be used for Real-time Speech Recognition, provided that the protocol and parameters are correct. With WSS, you must manage the connection yourself and send data continuously. The Intelligent Speech Interaction documentation provides a Java backend code example that simulates a real-time audio stream by reading from a local file. For more information, see Sample Code.