Omni-modal

更新时间:
复制 MD 格式

Select the right model for multimodal understanding, audio and video analysis, voice conversation, content moderation, voice translation, and other omni-modal use cases.

Use cases

Omni-modal models can simultaneously understand text, audio, images, and video, and produce both text and speech output. Three model families are available: Qwen3.5-Omni (flagship, fullest capabilities), Qwen3-Omni-Flash (lightweight, lower cost, supports deep thinking), and Qwen3.5-Livetranslate (purpose-built translation, out-of-the-box). Select a model based on your use case:

Use case

Recommended model

User guide

Real-time voice/video conversation: Interact with AI through a microphone and camera (voice assistants, customer service bots, visual Q&A, live-stream analysis)

Qwen3.5-Omni Realtime (WebSocket)

Qwen-Omni-Realtime

Audio and video analysis: Upload audio or video files for AI-generated text or speech responses (video content moderation, meeting transcription, caption generation)

Qwen3.5-Omni (HTTP)

Non-real-time (Qwen-Omni)

Lightweight audio and video analysis: Analyze uploaded audio or video files at lower cost (single input capped at 150 seconds). Supports thinking mode (deep thinking) with text-only output

Qwen3-Omni-Flash (HTTP)

Non-real-time (Qwen-Omni)

Real-time voice translation: Simultaneous interpretation with approximately 3-second latency, supporting 60 languages (live interpretation, multilingual meetings)

Qwen3.5-Livetranslate (WebSocket)

Real-time audio and video translation - Qwen

Audio and video file translation: Upload audio or video files and translate them into a target language (video dubbing, podcast translation)

Qwen3-Livetranslate (HTTP)

Audio and video file translation (Qwen)

Voice cloning: Provide a reference audio clip and the AI generates speech responses in that voice

Qwen3.5-Omni Plus / Flash (HTTP / WebSocket)

Non-real-time (Qwen-Omni)

  • For content analysis, Qwen3.5-Omni supports audio up to 3 hours and video up to 1 hour per request.

  • Function calling is supported by Qwen3.5-Omni (WebSocket + HTTP) and Qwen3-Omni-Flash (HTTP only).

  • Web search is supported by Qwen3.5-Omni only (HTTP / WebSocket). Web search and function calling cannot be enabled at the same time.

Translation

Omni-modal models support voice translation, with different models suited to different translation scenarios.

Note

For quick setup, use Qwen3.5-Livetranslate (60 languages, approximately 3-second latency, out-of-the-box). For the highest quality and broadest language coverage, use Qwen3.5-Omni (29 output languages, with web search and term injection). For cost-sensitive workloads, use Qwen3-Omni-Flash (11 output languages, lower cost).

Supported languages

Language

Qwen3.5-Livetranslate

Qwen3-Livetranslate

Qwen3.5-Omni

Qwen3-Omni-Flash

English

Supported

Supported

Supported

Supported

Chinese (Mandarin)

Supported

Supported

Supported

Supported

Cantonese

Supported

Supported

Supported Text only

Supported

Sichuanese

Supported

Supported

Supported

Supported

Shanghainese

Supported

Supported

Supported

Supported

Beijingese

Supported

Supported

Supported

Supported

Tianjinese

Supported

Supported

Supported

Supported

Nanjingese

Unsupported Text only

Unsupported

Supported

Supported

Shaanxi dialect

Unsupported Text only

Unsupported

Supported

Supported

Hokkien

Unsupported Text only

Unsupported

Supported

Supported

French

Supported

Supported

Supported

Supported

German

Supported

Supported

Supported

Supported

Russian

Supported

Supported

Supported

Supported

Italian

Supported

Supported

Supported

Supported

Spanish

Supported

Supported

Supported

Supported

Portuguese

Supported

Supported

Supported

Supported

Japanese

Supported

Supported

Supported

Supported

Korean

Supported

Supported

Supported

Supported

Thai

Supported

Supported Text only

Supported

Supported

Indonesian

Supported

Supported Text only

Supported

Unsupported

Vietnamese

Supported

Supported Text only

Supported

Unsupported

Arabic

Supported

Supported Text only

Supported

Unsupported

Hindi

Supported

Supported Text only

Supported

Unsupported

Turkish

Supported

Supported Text only

Supported

Unsupported

Finnish

Unsupported Text only

Unsupported

Supported

Unsupported

Polish

Unsupported Text only

Unsupported

Supported

Unsupported

Dutch

Unsupported Text only

Unsupported

Supported

Unsupported

Czech

Unsupported Text only

Unsupported

Supported

Unsupported

Urdu

Unsupported Text only

Unsupported

Supported

Unsupported

Tagalog

Unsupported Text only

Unsupported

Supported

Unsupported

Swedish

Unsupported Text only

Unsupported

Supported

Unsupported

Danish

Unsupported Text only

Unsupported

Supported

Unsupported

Hebrew

Unsupported Text only

Unsupported

Supported

Unsupported

Icelandic

Unsupported Text only

Unsupported

Supported

Unsupported

Malay

Unsupported Text only

Unsupported

Supported

Unsupported

Norwegian

Unsupported Text only

Unsupported

Supported

Unsupported

Persian

Unsupported Text only

Unsupported

Supported

Unsupported

Greek

Supported Text only

Supported Text only

Unsupported

Unsupported

"Supported" indicates both speech and text output. "Text only" indicates text output without speech for that language.

Qwen3.5-Livetranslate supports 60 languages in total: 29 with both audio and text output, and 31 with text-only output.

Qwen3.5-Omni supports 113 input languages and dialects.

The legacy qwen-omni-turbo supports only Chinese and English.

Recommended models

Model ID

API

Input

Function calling

Web search

Thinking mode

qwen3.5-omni-plus-realtime

WebSocket

Text, audio, images

Supported

Supported

Unsupported

qwen3.5-omni-plus

HTTP

Text, audio, images, video

Supported

Supported

Unsupported

qwen3.5-omni-flash-realtime

WebSocket

Text, audio, images

Supported

Supported

Unsupported

qwen3.5-omni-flash

HTTP

Text, audio, images, video

Supported

Supported

Unsupported

qwen3-omni-flash-realtime

WebSocket

Text, audio, images, video

Unsupported

Unsupported

Unsupported

qwen3-omni-flash

HTTP

Text, audio, images, video

Supported

Unsupported

Supported

qwen3-livetranslate-flash-realtime

WebSocket

Audio, images

Unsupported

Unsupported

Unsupported

qwen3-livetranslate-flash

HTTP

Audio, video

Unsupported

Unsupported

Unsupported

All models

Qwen3.5-Omni

Model ID

API

Input

Function calling

Web search

Thinking mode

qwen3.5-omni-plus-realtime

WebSocket

Text, audio, images, video

Supported

Supported

Unsupported

qwen3.5-omni-plus-realtime-2026-03-15

WebSocket

Text, audio, images, video

Supported

Supported

Unsupported

qwen3.5-omni-plus

HTTP

Text, audio, images, video

Supported

Supported

Unsupported

qwen3.5-omni-plus-2026-03-15

HTTP

Text, audio, images, video

Supported

Supported

Unsupported

qwen3.5-omni-flash-realtime

WebSocket

Text, audio, images, video

Supported

Supported

Unsupported

qwen3.5-omni-flash-realtime-2026-03-15

WebSocket

Text, audio, images, video

Supported

Supported

Unsupported

qwen3.5-omni-flash

HTTP

Text, audio, images, video

Supported

Supported

Unsupported

qwen3.5-omni-flash-2026-03-15

HTTP

Text, audio, images, video

Supported

Supported

Unsupported

Qwen3-Omni

Model ID

API

Input

Function calling

Web search

Thinking mode

qwen3-omni-flash-realtime

WebSocket

Text, audio, images, video

Unsupported

Unsupported

Unsupported

qwen3-omni-flash-realtime-2025-12-01

WebSocket

Text, audio, images, video

Unsupported

Unsupported

Unsupported

qwen3-omni-flash-realtime-2025-09-15

WebSocket

Text, audio, images, video

Unsupported

Unsupported

Unsupported

qwen3-omni-flash

HTTP

Text, audio, images, video

Supported

Unsupported

Supported

qwen3-omni-flash-2025-12-01

HTTP

Text, audio, images, video

Supported

Unsupported

Supported

qwen3-omni-flash-2025-09-15

HTTP

Text, audio, images, video

Supported

Unsupported

Supported

Qwen3.5-Livetranslate

Model ID

API

Input

Languages

qwen3.5-livetranslate-flash-realtime

WebSocket

Audio

60

qwen3.5-livetranslate-flash-realtime-2026-05-19

WebSocket

Audio

60

Qwen3-Livetranslate

Model ID

API

Input

Languages

qwen3-livetranslate-flash-realtime

WebSocket

Audio

18

qwen3-livetranslate-flash-realtime-2025-09-22

WebSocket

Audio

18

qwen3-livetranslate-flash

HTTP

Audio, video

18

qwen3-livetranslate-flash-2025-12-01

HTTP

Audio, video

18

Legacy models

The following models are no longer updated. Use Qwen3.5-Omni for new projects.

Model ID

Input

API

qwen2.5-omni-7b

Text, audio, images, video

HTTP

qwen-omni-turbo

Text, audio, images, video

HTTP

qwen-omni-turbo-latest

Text, audio, images, video

HTTP

qwen-omni-turbo-2025-03-26

Text, audio, images, video

HTTP

qwen-omni-turbo-realtime

Text, audio

WebSocket

qwen-omni-turbo-realtime-latest

Text, audio

WebSocket

qwen-omni-turbo-realtime-2025-05-08

Text, audio

WebSocket