This feature provides single-language conversations for popular languages across Europe, the Americas, and Asia. Select a language when you create a multimodal application, and the system configures the most suitable speech models, prompts, and features automatically.
Supported languages
The following table summarizes language support for speech recognition, speech synthesis, and LLM features.
| Language | Application type | Speech recognition | Speech synthesis | LLM features |
| --- | --- | --- | --- | --- |
| Chinese (Mandarin) | Multimodal Application, Voice Application | Supported | Supported | All features available |
| English | Multimodal Application, Voice Application | Supported | Supported | Chit-chat and knowledge-based Q&A, web search, semantic rejection and conversation ending detection, conversational follow-ups, knowledge base, system prompts and custom prompts, custom plugins, custom MCPs, agent integration (Model Studio "My Apps", third-party applications, video calls, visual Q&A, and photo translation) |
| French, German, Spanish, Italian, Russian, Portuguese, Korean, Japanese, Thai, Indonesian, Malay | Multimodal Application | Supported (hotword not supported) | Supported | Chit-chat and knowledge-based Q&A (web search is not currently supported), system prompts and custom prompts, custom plugins, custom MCPs, agent integration (Model Studio "My Apps", third-party applications, video calls, visual Q&A, and photo translation) |
| Cantonese | Multimodal Application | Supported (hotword not supported) | Supported | Same as the group above |
| Vietnamese, Filipino | Multimodal Application | Supported (hotword not supported) | Requires integration with a third-party model | Same as the group above |
| Arabic, Hindi, Turkish, Ukrainian, Czech, Danish, Finnish, Icelandic, Norwegian, Polish, Dutch, Swedish | Multimodal Application | Supported (hotword not supported) | Requires integration with a third-party model | Same as the group above |
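If your client needs to decide at runtime whether a language can use the built-in speech synthesis or needs a third-party TTS model, you can keep a local copy of the table above. The sketch below is hypothetical: `LanguageSupport`, `SUPPORT`, and `needs_third_party_tts` are illustrative names, not part of any platform API. It also assumes that an unqualified "Supported" in the speech recognition column includes hotword support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageSupport:
    voice_app: bool    # True if "Voice Application" is listed besides "Multimodal Application"
    hotword: bool      # False where the table says "hotword not supported"
    native_tts: bool   # False where TTS "Requires integration with a third-party model"
    web_search: bool   # Only Chinese (Mandarin) and English list web search


# Hypothetical client-side copy of the support table; not an official API.
SUPPORT: dict[str, LanguageSupport] = {
    # Assumption: unqualified "Supported" ASR is read as including hotwords.
    "Chinese (Mandarin)": LanguageSupport(True, True, True, True),
    "English": LanguageSupport(True, True, True, True),
}
# Languages with native ASR (no hotwords) and native TTS, Multimodal Application only.
for name in ["French", "German", "Spanish", "Italian", "Russian", "Portuguese",
             "Korean", "Japanese", "Thai", "Indonesian", "Malay", "Cantonese"]:
    SUPPORT[name] = LanguageSupport(False, False, True, False)
# Languages whose speech synthesis requires a third-party model.
for name in ["Vietnamese", "Filipino", "Arabic", "Hindi", "Turkish", "Ukrainian",
             "Czech", "Danish", "Finnish", "Icelandic", "Norwegian", "Polish",
             "Dutch", "Swedish"]:
    SUPPORT[name] = LanguageSupport(False, False, False, False)


def needs_third_party_tts(language: str) -> bool:
    """Return True if the table says speech synthesis needs a third-party model."""
    return not SUPPORT[language].native_tts
```

For example, `needs_third_party_tts("Arabic")` returns `True`, while `needs_third_party_tts("Japanese")` returns `False`.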
Usage notes
Only single-language conversations are supported. Mixed-language conversations (code-switching) are not supported, with the exception of Chinese-English.
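The rule above can be expressed as a small client-side guard, for example to warn users before they start a session. This is a hypothetical sketch: the language labels and the function name are illustrative, and the platform does not necessarily expose an equivalent check.

```python
# Hypothetical guard mirroring the rule above: one language per conversation,
# with Chinese-English code-switching as the only allowed mix.
ALLOWED_MIX = frozenset({"Chinese", "English"})


def is_supported_language_mix(languages: set[str]) -> bool:
    """Return True if a conversation using these languages is supported."""
    if len(languages) == 1:
        return True
    return frozenset(languages) == ALLOWED_MIX
```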
To support multiple languages on a single device, create a separate application for each language. Switch between applications by changing the application ID to deliver the correct language to end users.
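A minimal sketch of the per-language routing described above, assuming the device exposes a locale string such as `ja-JP` or `zh_CN`. The application IDs, the language-code keys, and the English fallback are placeholders chosen for illustration; substitute the IDs of the applications you created in the console and pass the result wherever your SDK expects the application ID.

```python
# Placeholder IDs (hypothetical): substitute the application IDs shown in the
# console for the per-language applications you created.
APP_IDS: dict[str, str] = {
    "zh": "your-chinese-app-id",
    "en": "your-english-app-id",
    "ja": "your-japanese-app-id",
}
DEFAULT_LANGUAGE = "en"


def app_id_for(locale: str) -> str:
    """Map a device locale such as "ja-JP" or "zh_CN" to an application ID.

    Falls back to the default language's application when no application
    exists for the requested language.
    """
    language = locale.replace("_", "-").split("-")[0].lower()
    return APP_IDS.get(language, APP_IDS[DEFAULT_LANGUAGE])
```

Falling back to a default application keeps the device usable for unsupported locales, but the conversation then runs in the default language; if that is undesirable, raise an error instead of falling back.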
Billing for all supported languages follows the same logic as for Chinese.
ASR and TTS model recommendations
The following table lists the available and recommended speech models for each language.
A dash (—) indicates that no native speech model is available for that language. In this case, you must integrate a third-party speech model.
| Language | Recommended ASR model (Performance-optimized) | Available ASR models | Recommended TTS model and timbre (Performance-optimized) | Available TTS models |
| --- | --- | --- | --- | --- |
| Chinese | Fun-ASR Real-time Speech Recognition, Qwen3-ASR-Flash-Realtime | Fun-ASR Real-time Speech Recognition, Qwen3-ASR-Flash-Realtime, Paraformer Speech Recognition, Multimodal Interaction Lightweight Speech Recognition | CosyVoice-v3-Flash Large Model Longanhuan | CosyVoice-v3-Flash Large Model, Qwen3-TTS-Flash-Realtime, CosyVoice-v3-Plus Large Model, CosyVoice-v2 Large Model, Sambert Speech Synthesis Model, Multimodal Interaction Lightweight Speech Synthesis |
| English | Fun-ASR Real-time Speech Recognition, Qwen3-ASR-Flash-Realtime | Fun-ASR Real-time Speech Recognition, Qwen3-ASR-Flash-Realtime, Paraformer Speech Recognition (including lightweight version) | CosyVoice-v3-Flash Large Model Longanhuan | CosyVoice-v3-Flash Large Model, Qwen3-TTS-Flash-Realtime, CosyVoice-v3-Plus Large Model, CosyVoice-v2 Large Model, Sambert Speech Synthesis Model, Multimodal Interaction Lightweight Speech Synthesis |
| Japanese | Fun-ASR Real-time Speech Recognition, Qwen3-ASR-Flash-Realtime | Fun-ASR Real-time Speech Recognition, Qwen3-ASR-Flash-Realtime, Paraformer Speech Recognition, Multimodal Interaction Lightweight Speech Recognition | Qwen3-TTS-Flash-Realtime Qianyue | Qwen3-TTS-Flash-Realtime, Multimodal Interaction Lightweight Speech Synthesis |
| Korean | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime, Paraformer Speech Recognition, Multimodal Interaction Lightweight Speech Recognition | Qwen3-TTS-Flash-Realtime Qianyue | Qwen3-TTS-Flash-Realtime, Multimodal Interaction Lightweight Speech Synthesis |
| French | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime, Paraformer Speech Recognition, Multimodal Interaction Lightweight Speech Recognition | Qwen3-TTS-Flash-Realtime Qianyue | Qwen3-TTS-Flash-Realtime, Sambert Speech Synthesis Model |
| German | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime, Paraformer Speech Recognition, Multimodal Interaction Lightweight Speech Recognition | Qwen3-TTS-Flash-Realtime Qianyue | Qwen3-TTS-Flash-Realtime, Sambert Speech Synthesis Model |
| Italian | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | Qwen3-TTS-Flash-Realtime Qianyue | Qwen3-TTS-Flash-Realtime, Sambert Speech Synthesis Model |
| Spanish | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | Qwen3-TTS-Flash-Realtime Qianyue | Qwen3-TTS-Flash-Realtime, Sambert Speech Synthesis Model |
| Portuguese | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | Qwen3-TTS-Flash-Realtime Qianyue | Qwen3-TTS-Flash-Realtime |
| Russian | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime, Paraformer Speech Recognition, Multimodal Interaction Lightweight Speech Recognition | Qwen3-TTS-Flash-Realtime Qianyue | Qwen3-TTS-Flash-Realtime |
| Thai | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | Sambert Speech Synthesis Model Waan | Sambert Speech Synthesis Model |
| Indonesian | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | Sambert Speech Synthesis Model Indah | Sambert Speech Synthesis Model |
| Filipino | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | — | — |
| Cantonese | Fun-ASR Real-time Speech Recognition, Qwen3-ASR-Flash-Realtime | Fun-ASR Real-time Speech Recognition, Qwen3-ASR-Flash-Realtime | Qwen3-TTS-Flash-Realtime Qianyue | Qwen3-TTS-Flash-Realtime, CosyVoice-v3-Flash Large Model, Multimodal Interaction Lightweight Speech Synthesis |
| Arabic | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | — | Qwen3-TTS-Flash-Realtime |
| Hindi | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | — | — |
| Turkish | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | — | Qwen3-TTS-Flash-Realtime |
| Ukrainian | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | — | — |
| Czech | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | — | — |
| Danish | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | — | — |
| Finnish | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | — | — |
| Icelandic | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | — | — |
| Norwegian | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | — | — |
| Polish | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | — | — |
| Dutch | Requires integration with a third-party model | Requires integration with a third-party model | — | — |
| Swedish | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | — | — |
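To plan integration work per language, you can record the first recommended model from each cell and flag the gaps that need a third-party model. The sketch below covers only a hypothetical excerpt of the table (the recommended TTS cells are split into model and timbre), and `SpeechModels`, `RECOMMENDED`, and `missing_components` are illustrative names, not an official API.

```python
from typing import NamedTuple, Optional


class SpeechModels(NamedTuple):
    asr: Optional[str]     # None: integrate a third-party ASR model
    tts: Optional[str]     # None: integrate a third-party TTS model
    timbre: Optional[str]  # Recommended timbre for the TTS model, if any


# Excerpt of the recommendation table (first recommended model per cell).
RECOMMENDED: dict[str, SpeechModels] = {
    "Chinese": SpeechModels("Fun-ASR Real-time Speech Recognition",
                            "CosyVoice-v3-Flash Large Model", "Longanhuan"),
    "Japanese": SpeechModels("Fun-ASR Real-time Speech Recognition",
                             "Qwen3-TTS-Flash-Realtime", "Qianyue"),
    "Thai": SpeechModels("Qwen3-ASR-Flash-Realtime",
                         "Sambert Speech Synthesis Model", "Waan"),
    "Hindi": SpeechModels("Qwen3-ASR-Flash-Realtime", None, None),
    "Dutch": SpeechModels(None, None, None),
}


def missing_components(language: str) -> list[str]:
    """List the speech components that need a third-party model for this language."""
    models = RECOMMENDED[language]
    missing = []
    if models.asr is None:
        missing.append("ASR")
    if models.tts is None:
        missing.append("TTS")
    return missing
```

For example, `missing_components("Hindi")` returns `["TTS"]`, and `missing_components("Thai")` returns an empty list.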
Configure a multilingual conversation
1. In the console, click Create Multimodal Application.
2. Select a Language and Category to create the application. The system automatically selects the most suitable models, prompts, and features.
3. Adjust the prompt and other settings for your business scenario. Features that are unavailable for the selected language are hidden.
4. Click Run and select a timbre from the panel on the right.
5. Test the application in the right-side panel, then proceed with publishing, development integration, and purchase.