This article answers frequently asked questions about using Intelligent Speech Interaction.
This FAQ is divided into the following categories:
- Features
  - How do I use Alibaba Cloud RTC to call the speech recognition service?
  - What ports are used for speech recognition and speech synthesis in Intelligent Speech Interaction?
  - Is there a limit on the number of projects that I can create in Intelligent Speech Interaction?
  - Does a single project on the console support multiple base models?
  - Are there any technologies or plug-ins for H5 voice wake-up?
  - What are the domain names for Intelligent Speech Interaction and their corresponding IP addresses?
  - Are there tutorials available for Intelligent Speech Interaction?
- Performance
- SDK usage
  - Is the source code for Intelligent Speech Interaction available?
  - Why do I still receive data from the server after the audio stream is interrupted?
  - What are the possible reasons for an initialization failure?
  - What are the possible reasons for a recognition start failure?
  - Why are there no recognition results after starting recognition?
- Billing
Features
Use Alibaba Cloud RTC for speech recognition
You can integrate Intelligent Speech Interaction with Alibaba Cloud RTC. For more information, see Process audio data.
Ports for speech recognition and synthesis
The service uses port 80 for HTTP and port 443 for HTTPS and WebSocket.
Console update latency for activation and upgrades
Available concurrency is updated in real time. The previous day's concurrency and the metering data on the console are updated on a T+1 basis, which means that data for a given day appears on the following day.
Project creation limit
There is currently no limit on the number of projects that you can create.
Multiple base models per project
A single project does not support multiple base models. Each Appkey maps to a single model.
H5 voice wake-up technologies and plug-ins
H5 voice wake-up is not supported on-device but is supported in the cloud. For on-device implementations, a hybrid approach is common: a wake-up model runs on the device, and after a trigger, a cloud-based model performs a secondary confirmation to reduce false positives.
Check ASR service usage
You can monitor the usage of Intelligent Speech Interaction on the console, including metrics such as duration, number of calls, and concurrency. This data helps you determine if your current usage is reasonable and decide whether to adjust your service capacity. For more information, see Service usage.
Domain names and IP addresses
The domain names for Intelligent Speech Interaction are nls-meta.cn-shanghai.aliyuncs.com and nls-gateway-cn-shanghai.aliyuncs.com. The service uses ports 80 and 443, and supports the HTTPS and WebSocket protocols. You can find the corresponding IP addresses by running the dig nls-gateway-cn-shanghai.aliyuncs.com or dig nls-meta.cn-shanghai.aliyuncs.com command. These are dynamic IP addresses that may change. You must monitor these IP addresses for any updates.
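Because the addresses are dynamic, the lookup can be scripted rather than run by hand. The following is a minimal Python sketch of the same check that dig performs, using the standard library resolver. The hostnames come from the text above; the function names `resolve_ips` and `diff_ips` and the `resolver` parameter (which lets you substitute a stub) are illustrative assumptions, not part of any Alibaba Cloud SDK.

```python
import socket

# Hostnames taken from the text above.
NLS_HOSTS = (
    "nls-meta.cn-shanghai.aliyuncs.com",
    "nls-gateway-cn-shanghai.aliyuncs.com",
)


def resolve_ips(hostname, port=443, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses for hostname."""
    records = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    # Each record is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({record[4][0] for record in records})


def diff_ips(previous, current):
    """Return (added, removed) addresses between two lookups."""
    prev, curr = set(previous), set(current)
    return sorted(curr - prev), sorted(prev - curr)
```

Calling `resolve_ips` for each entry in `NLS_HOSTS` on a schedule and passing consecutive results to `diff_ips` is one way to notice address changes, for example before updating a firewall allowlist.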
Find AccessKey ID and AccessKey Secret
See Activate the service to obtain your AccessKey ID and AccessKey Secret.
Find your UID
- Log on to the Alibaba Cloud console.
- Hover over your profile avatar in the upper-right corner. The value labeled Account ID in the panel is your UID.
Tutorials
The following video tutorials are available:
Performance
Exceeding the concurrency limit
Exceeding the concurrency limit can cause the following issues:
- Your logs will show a large number of timeout errors with the status code 40000005, which indicates too many requests.
- High concurrency can cause connections to the ASR or TTS service to be dropped.
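One way to stay under the limit is to cap in-flight requests on the client side. The sketch below uses a semaphore from the Python standard library. The class name `ConcurrencyLimiter` and its `run` method are hypothetical, and the `task` you pass in stands for whatever call your application makes to the service. Note that the service counts concurrency per account, so a cap inside one process only helps if the slots are divided among all processes that share the account.

```python
import threading


class ConcurrencyLimiter:
    """Cap the number of in-flight requests issued by this process."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def run(self, task, *args, timeout=None):
        """Run task(*args) once a slot is free.

        Raises TimeoutError if no slot frees up within `timeout` seconds
        (waits indefinitely when timeout is None).
        """
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("no free concurrency slot")
        try:
            return task(*args)
        finally:
            # Always give the slot back, even if the task raised.
            self._slots.release()
```

For example, `ConcurrencyLimiter(2)` would match the two concurrent streams of the free tier; each worker thread then wraps its request in `limiter.run(...)`.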
Free-tier limitations
- For Short Speech Recognition and Real-Time Speech Recognition, you can send a maximum of 2 concurrent streams.
- For Audio File Transcription, new users can transcribe up to 2 hours of audio for free every 24 hours during a 3-month trial period. After the free quota is used up, it is renewed after 24 hours.
How is concurrency calculated?
Concurrency is the number of requests that are being processed simultaneously for a single account (Alibaba Cloud UID).
A typical voice request remains active for a period of time. For example, if you create a speech recognition request and continuously send audio data to the server, the concurrency is 1. If you create a second request while the first one is still being processed, the server is handling two requests from your account at the same time, and the concurrency becomes 2.
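The counting rule described above can be illustrated with a short calculation. This Python sketch takes the start and end time of each request and returns the highest number of requests that were active at once. The function name `peak_concurrency` is illustrative, and treating a request that ends at the exact moment another starts as non-overlapping is an assumption of this sketch, not a documented rule of the service.

```python
def peak_concurrency(requests):
    """Return the maximum number of overlapping (start, end) intervals."""
    events = []
    for start, end in requests:
        events.append((start, 1))   # a request becomes active
        events.append((end, -1))    # a request finishes
    # Tuples sort by time first; at equal times -1 sorts before +1,
    # so a finishing request is released before a new one is counted.
    events.sort()
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak
```

With the example from the text, a first request active from second 0 to 10 and a second one from second 5 to 15 gives `peak_concurrency([(0, 10), (5, 15)])`, which evaluates to 2.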
SDK usage
Source code availability
Yes, the SDK protocol and source code are open source and available on GitHub. Open-source versions are provided for C++, Java, and Python.
If you want to study the protocol architecture, you can refer to the source code on GitHub. However, we recommend using the official SDK for integration. For more information, see Get started. We provide limited support for issues related to custom API implementations.
Multi-process support for Android SDK
No. The Android SDK does not currently support multi-process use.
TTS SDK playback listeners and modules
Playback functionality is not part of the SDK. The SDK only provides events related to synthesis.
Continuous audio data transmission
Yes, audio data must be sent continuously.
If the server does not receive audio data for 10 seconds, the connection times out and closes, and error code 40000004 is returned. To send data again, the client must initiate a new request.
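A client can track how long it has been since it last sent audio so that it can react before the server closes the connection. The Python sketch below shows one way to do this. The 10-second value comes from the text above; the class name `IdleTracker`, its methods, and the injectable `clock` are hypothetical, and treating exactly 10 seconds of silence as expired is a conservative assumption.

```python
import time

IDLE_TIMEOUT_SECONDS = 10  # server-side idle limit stated in the text


class IdleTracker:
    """Track time since the last audio chunk was sent."""

    def __init__(self, timeout=IDLE_TIMEOUT_SECONDS, clock=time.monotonic):
        self._timeout = timeout
        self._clock = clock
        self._last_send = clock()

    def mark_sent(self):
        """Call right after each audio chunk is written to the connection."""
        self._last_send = self._clock()

    def seconds_left(self):
        """Seconds remaining before the idle limit is reached."""
        return self._timeout - (self._clock() - self._last_send)

    def expired(self):
        """True once the idle limit has been reached or passed."""
        return self.seconds_left() <= 0
```

If `expired()` returns True, the current request should be treated as closed and a new request started, as described above.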
Receiving data after stream interruption
If the connection is interrupted due to a timeout, the server will continue to process and return results for any data that was already in its buffer. However, the recognition result for the complete sentence will be incorrect.
Reasons for initialization failure
Verify that you are using the correct AccessKey ID and AccessKey Secret to generate an Access Token. Also, ensure that all required parameters, such as Appkey and Access Token, are correct.
Reasons for recognition start failure
The SDK uses a singleton pattern. Ensure the previous recognition process has finished before you start a new one.
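The rule above amounts to allowing only one active recognition at a time. The following Python sketch illustrates that guard in isolation. It is not SDK code: the class name `RecognitionGate` and its methods are hypothetical, and in a real application `finish` would be called from whatever callback signals that the previous recognition has completed or failed.

```python
import threading


class RecognitionGate:
    """Allow a new recognition only after the previous one has finished."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = False

    def try_start(self):
        """Return True and mark a session active, or False if one is running."""
        with self._lock:
            if self._active:
                return False
            self._active = True
            return True

    def finish(self):
        """Mark the current session as finished."""
        with self._lock:
            self._active = False
```

Checking `try_start()` before calling the start function of the SDK makes a premature second start visible in your own code instead of surfacing as a start failure.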
No recognition results
Confirm the following:
- Initialization was successful.
- The call to start recognition was successful, and the vad_mode parameter is used correctly.
- You are receiving audio status callbacks, and recording has started correctly.
If you still receive no results after checking these points, an EVENT_ASR_ERROR event has likely occurred. Use the error code included in the event to identify the problem.
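When handling error events, a small lookup table can turn numeric codes into actionable messages. The Python sketch below covers only the two codes mentioned in this FAQ; the table name `KNOWN_ERRORS`, the function `describe_error`, and the message wording are illustrative assumptions, and any other code should be looked up in the service's error code reference.

```python
# Only the two codes mentioned in this FAQ are listed here.
KNOWN_ERRORS = {
    40000004: "idle timeout: no audio received in time; start a new request",
    40000005: "too many requests: the concurrency limit was exceeded",
}


def describe_error(code):
    """Return a short hint for an error code, or a generic fallback."""
    fallback = f"unrecognized error code {code}; check the error code reference"
    return KNOWN_ERRORS.get(code, fallback)
```

An error handler could log `describe_error(code)` next to the raw event so that the cause is visible without a manual lookup.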
Billing
Maximum concurrency for real-time speech recognition
The trial version includes 2 free concurrent streams and is valid for 3 months. After you activate the commercial version, you get 200 concurrent streams by default. You can purchase additional concurrency packs if needed.
Requirements for English recognition
English recognition requires you to purchase both an extension pack and additional concurrency.