Multilingual conversation

This feature provides single-language conversations for popular languages across Europe, the Americas, and Asia. Select a language when you create a multimodal application, and the system configures the most suitable speech models, prompts, and features automatically.

Supported languages

The following table summarizes language support for speech recognition, speech synthesis, and LLM features.

| Language | Application type | Speech recognition | Speech synthesis | LLM features |
| --- | --- | --- | --- | --- |
| Chinese (Mandarin) | Multimodal Application, Voice Application | Supported | Supported | All features available |
| English | Multimodal Application, Voice Application | Supported | Supported | Chit-chat and knowledge-based Q&A, web search, semantic rejection and conversation ending detection, conversational follow-ups, knowledge base, system prompts and custom prompts, custom plugins, custom MCPs, agent integration (Model Studio "My Apps", third-party applications, video calls, visual Q&A, and photo translation) |
| French, German, Spanish, Italian, Russian, Portuguese, Korean, Japanese, Thai, Indonesian, Malay | Multimodal Application | Supported (hotword not supported) | Supported | Chit-chat and knowledge-based Q&A (web search is not currently supported), system prompts and custom prompts, custom plugins, custom MCPs, agent integration (Model Studio "My Apps", third-party applications, video calls, visual Q&A, and photo translation) |
| Cantonese | Multimodal Application | Supported (hotword not supported) | Supported | Same as the group above |
| Vietnamese, Filipino | Multimodal Application | Supported (hotword not supported) | Requires integration with a third-party model | Same as the group above |
| Arabic, Hindi, Turkish, Ukrainian, Czech, Danish, Finnish, Icelandic, Norwegian, Polish, Dutch, Swedish | Multimodal Application | Supported (hotword not supported) | Requires integration with a third-party model | Same as the group above |
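
For client code that needs to check these limits before enabling a feature, the table above could be encoded as a small lookup. The following is a minimal sketch, not part of any SDK: `LanguageSupport`, `GROUPS`, and `get_support` are hypothetical names, and reading an unqualified "Supported" as "hotwords available" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageSupport:
    hotword: bool      # hotword support in speech recognition
    native_tts: bool   # False: speech synthesis needs a third-party model
    web_search: bool   # False: web search is not currently supported


# Assumption: "Supported" with no caveat is read as hotwords being available.
FULL = LanguageSupport(hotword=True, native_tts=True, web_search=True)
NO_HOTWORD = LanguageSupport(hotword=False, native_tts=True, web_search=False)
THIRD_PARTY_TTS = LanguageSupport(hotword=False, native_tts=False, web_search=False)

# Language groups copied from the support table.
GROUPS = {
    FULL: ["Chinese (Mandarin)", "English"],
    NO_HOTWORD: [
        "French", "German", "Spanish", "Italian", "Russian", "Portuguese",
        "Korean", "Japanese", "Thai", "Indonesian", "Malay", "Cantonese",
    ],
    THIRD_PARTY_TTS: [
        "Vietnamese", "Filipino", "Arabic", "Hindi", "Turkish", "Ukrainian",
        "Czech", "Danish", "Finnish", "Icelandic", "Norwegian", "Polish",
        "Dutch", "Swedish",
    ],
}

# Flatten the groups into a per-language lookup.
SUPPORT = {lang: support for support, langs in GROUPS.items() for lang in langs}


def get_support(language: str) -> LanguageSupport:
    """Return the support flags for a language name used in the table."""
    try:
        return SUPPORT[language]
    except KeyError:
        raise ValueError(f"Language not listed in the support table: {language}")
```

A caller might use `get_support("Filipino").native_tts` to decide whether a third-party speech synthesis integration has to be configured.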

Usage notes

  • Only single-language conversations are supported. Mixed-language conversations (code-switching) are not supported, with the exception of Chinese-English.

  • To support multiple languages on a single device, create a separate application for each language. Switch between applications by changing the application ID to deliver the correct language to end users.

  • Billing for all supported languages follows the same logic as for Chinese.
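
The one-application-per-language approach described above could be sketched on the client side as a lookup from language to application ID. This is an illustrative sketch only: the application IDs are placeholders, and `APP_IDS`, `DEFAULT_LANGUAGE`, and `resolve_app_id` are hypothetical names rather than part of any SDK.

```python
# Placeholder IDs: replace with the IDs of the applications created in the console,
# one application per language.
APP_IDS = {
    "zh": "app-id-chinese",
    "en": "app-id-english",
    "ja": "app-id-japanese",
}
DEFAULT_LANGUAGE = "en"


def resolve_app_id(language_tag: str) -> str:
    """Pick the application ID for a language tag such as "ja" or "en-US".

    The full tag is tried first, then its primary subtag, then the default.
    """
    tag = language_tag.strip().lower().replace("_", "-")
    if tag in APP_IDS:
        return APP_IDS[tag]
    primary = tag.split("-")[0]
    return APP_IDS.get(primary, APP_IDS[DEFAULT_LANGUAGE])
```

Because each application handles a single language, the device would call `resolve_app_id` whenever the user changes the language setting and start a new session with the returned ID.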

ASR and TTS model recommendations

The following table lists the available and recommended speech models for each language.

Note

An empty cell indicates that no native speech model is available for that language. In this case, you must call a third-party speech model.

| Language | Recommended ASR model (performance-optimized) | Available ASR | Recommended TTS and timbre (performance-optimized) | Available TTS |
| --- | --- | --- | --- | --- |
| Chinese | Fun-ASR Real-time Speech Recognition, Qwen3-ASR-Flash-Realtime | Fun-ASR Real-time Speech Recognition, Qwen3-ASR-Flash-Realtime, Paraformer Speech Recognition, Multimodal Interaction Lightweight Speech Recognition | CosyVoice-v3-Flash Large Model, Longanhuan | CosyVoice-v3-Flash Large Model, Qwen3-TTS-Flash-Realtime, CosyVoice-v3-Plus Large Model, CosyVoice-v2 Large Model, Sambert Speech Synthesis Model, Multimodal Interaction Lightweight Speech Synthesis |
| English | Fun-ASR Real-time Speech Recognition, Qwen3-ASR-Flash-Realtime | Fun-ASR Real-time Speech Recognition, Qwen3-ASR-Flash-Realtime, Paraformer Speech Recognition (including lightweight version) | CosyVoice-v3-Flash Large Model, Longanhuan | CosyVoice-v3-Flash Large Model, Qwen3-TTS-Flash-Realtime, CosyVoice-v3-Plus Large Model, CosyVoice-v2 Large Model, Sambert Speech Synthesis Model, Multimodal Interaction Lightweight Speech Synthesis |
| Japanese | Fun-ASR Real-time Speech Recognition, Qwen3-ASR-Flash-Realtime | Fun-ASR Real-time Speech Recognition, Qwen3-ASR-Flash-Realtime, Paraformer Speech Recognition, Multimodal Interaction Lightweight Speech Recognition | Qwen3-TTS-Flash-Realtime, Qianyue | Qwen3-TTS-Flash-Realtime, Multimodal Interaction Lightweight Speech Synthesis |
| Korean | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime, Paraformer Speech Recognition, Multimodal Interaction Lightweight Speech Recognition | Qwen3-TTS-Flash-Realtime, Qianyue | Qwen3-TTS-Flash-Realtime, Multimodal Interaction Lightweight Speech Synthesis |
| French | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime, Paraformer Speech Recognition, Multimodal Interaction Lightweight Speech Recognition | Qwen3-TTS-Flash-Realtime, Qianyue | Qwen3-TTS-Flash-Realtime, Sambert Speech Synthesis Model |
| German | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime, Paraformer Speech Recognition, Multimodal Interaction Lightweight Speech Recognition | Qwen3-TTS-Flash-Realtime, Qianyue | Qwen3-TTS-Flash-Realtime, Sambert Speech Synthesis Model |
| Italian | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | Qwen3-TTS-Flash-Realtime, Qianyue | Qwen3-TTS-Flash-Realtime, Sambert Speech Synthesis Model |
| Spanish | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | Qwen3-TTS-Flash-Realtime, Qianyue | Qwen3-TTS-Flash-Realtime, Sambert Speech Synthesis Model |
| Portuguese | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | Qwen3-TTS-Flash-Realtime, Qianyue | Qwen3-TTS-Flash-Realtime |
| Russian | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime, Paraformer Speech Recognition, Multimodal Interaction Lightweight Speech Recognition | Qwen3-TTS-Flash-Realtime, Qianyue | Qwen3-TTS-Flash-Realtime |
| Thai | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | Sambert Speech Synthesis Model, Waan | Sambert Speech Synthesis Model |
| Indonesian | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | Sambert Speech Synthesis Model, Indah | Sambert Speech Synthesis Model |
| Filipino | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | | |
| Cantonese | Fun-ASR Real-time Speech Recognition, Qwen3-ASR-Flash-Realtime | Fun-ASR Real-time Speech Recognition, Qwen3-ASR-Flash-Realtime | Qwen3-TTS-Flash-Realtime, Qianyue | Qwen3-TTS-Flash-Realtime, CosyVoice-v3-Flash Large Model, Multimodal Interaction Lightweight Speech Synthesis |
| Arabic | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | | Qwen3-TTS-Flash-Realtime |
| Hindi | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | | |
| Turkish | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | | Qwen3-TTS-Flash-Realtime |
| Ukrainian | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | | |
| Czech | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | | |
| Danish | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | | |
| Finnish | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | | |
| Icelandic | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | | |
| Norwegian | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | | |
| Polish | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | | |
| Dutch | Requires integration with a third-party model | Requires integration with a third-party model | | |
| Swedish | Qwen3-ASR-Flash-Realtime | Qwen3-ASR-Flash-Realtime | | |
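
The empty-cell rule from the note above can be illustrated with a small lookup that falls back to a third-party model when no native model is recommended. This is a sketch under stated assumptions: only a few rows are included, the strings are the display names from the table rather than API model identifiers, the first listed model is used where the table recommends two, and `SpeechModels`, `RECOMMENDED`, and `plan_speech_stack` are hypothetical names.

```python
from typing import NamedTuple, Optional

THIRD_PARTY = "third-party model"


class SpeechModels(NamedTuple):
    asr: Optional[str]     # recommended speech recognition model, None if the cell is empty
    tts: Optional[str]     # recommended speech synthesis model, None if the cell is empty
    timbre: Optional[str]  # recommended timbre, None if the cell is empty


# A few sample rows; display names are copied from the table, not API identifiers.
RECOMMENDED = {
    "Chinese": SpeechModels(
        "Fun-ASR Real-time Speech Recognition",
        "CosyVoice-v3-Flash Large Model",
        "Longanhuan",
    ),
    "Japanese": SpeechModels(
        "Fun-ASR Real-time Speech Recognition",
        "Qwen3-TTS-Flash-Realtime",
        "Qianyue",
    ),
    "Thai": SpeechModels(
        "Qwen3-ASR-Flash-Realtime",
        "Sambert Speech Synthesis Model",
        "Waan",
    ),
    "Filipino": SpeechModels("Qwen3-ASR-Flash-Realtime", None, None),
}


def plan_speech_stack(language: str) -> dict:
    """Return the models to use, substituting a third-party model for empty cells."""
    models = RECOMMENDED[language]
    return {
        "asr": models.asr or THIRD_PARTY,
        "tts": models.tts or THIRD_PARTY,
        "timbre": models.timbre,  # stays None when speech synthesis is third-party
    }
```

For Filipino, for example, the plan keeps the native speech recognition model and marks speech synthesis as requiring a third-party integration.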

Configure a multilingual conversation

  1. In the console, click Create Multimodal Application.

     [Image: Create Multimodal Application]

  2. Select a Language and Category to create the application. The system automatically selects the most suitable models, prompts, and features.

     [Image: Select language and category]

  3. Adjust the prompt and other settings for your business scenario. Features that are unavailable for the selected language are hidden.

     [Image: Adjust prompt settings]

  4. Click Run and select a timbre from the panel on the right.

     [Image: Select timbre]

  5. Test the application in the right-side panel, then proceed with publishing, development integration, and purchase.