This topic describes the latest updates and related documentation for Tongyi Tingwu.
October 30, 2025
| Feature Name | Description |
| --- | --- |
| Translation upgraded with large language model | Both offline and online translation now use the latest version of qwen-mt to improve translation quality. |
September 30, 2025
| Feature Name | Description |
| --- | --- |
| Support for fun-asr | Set the ASR domain model parameter Transcription.Model to fun-asr and set Input.SourceLanguage to multilingual to invoke Bailing’s latest ASR large language model. |
| Disable sensitive words filter | Offline and online ASR now support disabling the sensitive words filter. |
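
The fun-asr entry above names two settings, which can be sketched as a request fragment. This is only an illustration: the parameter paths `Transcription.Model` and `Input.SourceLanguage` and their values come from the release note, while the helper `set_path`, the function `build_fun_asr_fragment`, and the way the two paths sit side by side in one dictionary are assumptions. The real request body may nest these fields differently and requires other fields that are not shown here.

```python
import json


def set_path(body: dict, dotted: str, value) -> None:
    """Set a nested key from a dotted path such as 'Input.SourceLanguage'."""
    *parents, leaf = dotted.split(".")
    node = body
    for key in parents:
        # Create intermediate dictionaries on demand.
        node = node.setdefault(key, {})
    node[leaf] = value


def build_fun_asr_fragment() -> dict:
    """Hypothetical helper: collect the two settings named in the release note."""
    body: dict = {}
    set_path(body, "Transcription.Model", "fun-asr")
    set_path(body, "Input.SourceLanguage", "multilingual")
    return body


fragment = build_fun_asr_fragment()
print(json.dumps(fragment, indent=2))
```

Check the task creation API reference for the exact location of each field before merging a fragment like this into a real request.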
July 30, 2025
| Feature Name | Description |
| --- | --- |
| Selectable large language models | When creating a task, choose from qwen-plus, qwq, or ccai-pro models in addition to the default large language model combination. |
| Optimized to-do and Q&A extraction | Combines two LLMs and applies multi-round large model processing to improve extraction of to-do items and Q&A pairs from complex or long conversations. |
June 30, 2025
| Feature Name | Description |
| --- | --- |
| Adjustable chapter overview | Adjust title length and chapter granularity in the chapter overview for a more personalized experience. |
| Output language setting for mind maps and chapter overview | Large model outputs are displayed uniformly in either Chinese or English, regardless of the original audio or video language. This simplifies understanding of content in less common languages. |
May 30, 2025
| Feature Name | Description |
| --- | --- |
| Real-time source language switching | Multilingual mode now supports switching the source language during real-time transcription. |
| Automotive model optimization | The offline automotive domain model improves recognition accuracy for 437 vehicle models. A new real-time 16K automotive domain model is now available. |
| Expanded offline file format support | Offline processing now supports M3U8 files. |
| Custom Prompt supports sentence ID | Custom Prompt now supports processing content by sentence ID, making it easier to locate original text. |
| Bug fixes | Fixed speaker identification errors. Fixed issues where sensitive word filtering was not applied. |
April 30, 2025
| Feature Name | Description |
| --- | --- |
| Timestamps added to real-time translation interim results | Timestamp data is now included in interim results of real-time translation to enhance on-screen subtitle rendering. |
| Switch translation language during real-time meetings | Change the target translation language if it was set incorrectly at meeting creation or needs to be updated during the meeting. |
| Bug fixes | Improved stability of service quality inspection and dialogue content extraction. Fixed incorrect sorting of transcription results by SentenceID. Fixed failure to create offline tasks when parameters were empty. Fixed compilation errors when rare words were included in hotword lists. |
March 20, 2025
| Feature Name | Description |
| --- | --- |
| Identity recognition | Define and recognize speaker identities in conversations. Use it together with speaker separation. Available at no additional charge. |
| Identity-aware dialogue content extraction | Adds identity information to dialogue content extraction to strengthen role-specific instructions. Greatly improves targeting in business scenarios such as sales and customer service. |
February 24, 2025
| Feature Name | Description |
| --- | --- |
| Supports event bus | Upgraded MQ push to integrate with Alibaba Cloud EventBridge. Supports future MQ updates and multiple notification methods such as DingTalk and email without requiring RAM user authorization. Improves integration efficiency, account security, and developer experience. |
January 22, 2025
| Feature Name | Description |
| --- | --- |
| Content extraction | Launched dialogue content extraction to extract topics, feedback, and other elements based on required analysis dimensions. Performs exceptionally well in sales scenarios for capturing customer needs, product sentiment, and competitor evaluations. |
| Create English hotwords | Hotwords can now be created from English words. |
November 30, 2024
| Feature Name | Description |
| --- | --- |
| Multi-track audio support | Transcription now supports dual-track and multi-track 16K audio. |
| PPT extraction optimization | PPT images can still be extracted even if no PPT speech summary is returned. |
| Performance improvement | Reduced file processing time and improved end-to-end service stability. |
October 31, 2024
| Feature Name | Description |
| --- | --- |
| Thai added to automatic language detection | Automatic language detection for audio and video files now supports Thai recognition and Thai ASR. |
| Automotive domain model supports phone calls | Phone call recordings can now use the automotive domain model for transcription, supporting automotive sales and service calls. |
| Full-text summary optimization | Full-text summary no longer depends on chapter overview results, which reduces processing time. |
September 30, 2024
| Feature Name | Description |
| --- | --- |
| Mixed-language recognition for offline audio/video transcription | Offline 16K transcription now supports mixed speech in Chinese, English, Japanese, Korean, Cantonese, German, French, and Russian. Control parameters can restrict detected languages to avoid interference from unintended languages. |
| Thai added to offline transcription | Automatic language detection now includes Thai. Audio/video files in Chinese, English, Japanese, Korean, Cantonese, or Thai are automatically recognized and transcribed accordingly (one language per file). Thai is also supported as a standalone language option. |
| Offline transcription efficiency optimization | Reduced offline transcription processing time. |
August 31, 2024
| Feature Name | Description |
| --- | --- |
| Real-time 8K efficiency optimization | Reduced transcription latency for real-time 8K ASR. |
| Service quality inspection returns speaker information | Results returned by service quality inspection can now include speaker IDs alongside sentence IDs. |
| ITN output optimization | Improved formatting of symbols such as percent signs. |
| Python SDK for real-time stream ingest | Released Python SDK for real-time stream ingest. |
| Real-time multilingual free-speaking upgrade | Added German, French, and Russian. Real-time mixed speech now supports Chinese, English, Japanese, Korean, Cantonese, German, French, and Russian. Control parameters can restrict detected languages to avoid interference. |
| Real-time speaker separation | Supports speaker separation in real-time meetings. |
| Savings plan launched | Supports upfront savings plans to further reduce usage costs. |
| Appkey-level billing | Billing is now broken down by Appkey to help track costs across projects. |
June 30, 2024
| Feature Name | Description |
| --- | --- |
| Usage statistics added to console | Enhanced usage tracking and querying capabilities. |
| Real-time multilingual free-speaking | Launched free-speaking support for Chinese, English, Japanese, Korean, and Cantonese. |
| Service quality inspection | Added large language model capabilities to service quality inspection. |
May 29, 2024
| Feature Name | Description |
| --- | --- |
| Significant price reduction | Prices significantly reduced to align with Qwen. |
May 21, 2024
| Feature Name | Description |
| --- | --- |
| New transcription languages for real-time recording | 16K now supports free-speaking recognition for Chinese, English, Japanese, Korean, and Cantonese. |
April 24, 2024
| Feature Name | Description |
| --- | --- |
| Price adjustment | Speech-to-text pricing unified at CNY 0.6/hour. Each large language model capability costs CNY 0.22/hour. Fees stack when multiple capabilities are used. Video PPT extraction reduced to CNY 0.8/hour. Translation prices also reduced. |
| Custom Prompt | Apply custom Prompts to transcription results to leverage large model capabilities based on your business needs. When creating large model tasks, choose from Tingwu-Turbo, Tingwu-Plus, or Qwen-Max model specifications. |
| Automatic language detection for offline audio/video transcription | Offline transcription now supports automatic language detection for Chinese, English, Japanese, Korean, and Cantonese (one language per file). Users no longer need to select a language when uploading files, simplifying operations and integration. |
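
The stacked fees in the April 24, 2024 price adjustment can be illustrated with a small estimate. The two rates are taken from that entry. Everything else is an assumption made for the sketch: the function name `estimate_fee`, linear billing by audio hour, and the absence of rounding rules, minimum charges, or discounts. Video PPT extraction and translation are left out because the entry does not give enough detail to model them.

```python
from decimal import Decimal

# Rates in CNY per audio hour, as stated in the April 24, 2024 entry.
SPEECH_TO_TEXT_PER_HOUR = Decimal("0.6")
LLM_CAPABILITY_PER_HOUR = Decimal("0.22")


def estimate_fee(audio_hours, llm_capabilities: int) -> Decimal:
    """Estimate the fee in CNY, assuming cost scales linearly with audio hours.

    llm_capabilities is the number of large language model capabilities
    enabled for the task; each one adds its own hourly fee.
    """
    hours = Decimal(str(audio_hours))
    hourly = SPEECH_TO_TEXT_PER_HOUR + LLM_CAPABILITY_PER_HOUR * llm_capabilities
    return hours * hourly


# Two hours of audio with three large model capabilities enabled.
fee = estimate_fee(2, 3)
print(fee)  # 2.52
```

`Decimal` is used instead of `float` so that sums such as 0.6 + 0.66 stay exact. Later entries on this page reduce prices again, so treat the rates as historical.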
March 26, 2024
| Feature Name | Description |
| --- | --- |
| AI model capability upgrade | Added support for generating mind maps with up to four levels of depth. Supports inputs up to 20,000 characters (approximately 80 minutes of audio). |
March 22, 2024
| Feature Name | Description |
| --- | --- |
| Text translation upgrade | Supports bidirectional translation between Chinese, English, Japanese, and Korean. |
March 13, 2024
Pricing adjusted. Instead of a single fee for all AI capabilities, you now pay only for the capabilities you actually use. When multiple large model capabilities are invoked, fees are additive (for example, using both full-text summary and chapter overview costs CNY 0.4 + 0.4 = CNY 0.8/hour).
Overall costs have also decreased further.
February 22, 2024
Large language model capabilities fully upgraded. English and Chinese-English mixed capabilities now match Chinese performance. Improved real-time multilingual and offline compatibility. Service stability enhanced.
| Feature Name | Description |
| --- | --- |
| Large language model version upgrade | Character limits increased from 250 to 1,000 for three large model features: full-text summary, chapter overview, and speaker summary. Reduces omissions and provides more detailed descriptions. |
| Enhanced English large model capabilities | English now supports Q&A extraction, scene recognition, and spoken-to-written conversion. Chinese-English free-speaking supports full-text summary, chapter overview, speaker summary, to-do extraction, and Q&A extraction. |
| Korean added to real-time | Supports Korean language recognition and Chinese-Korean translation. |
| Audio/video transcription upgrade | New formats supported: aiff (audio); avi, mpeg, 3gp, and ogg (video). Fixed inaccurate audio/video duration reporting. |
| Core performance | Improved service stability. |
January 8, 2024
Tongyi Tingwu API service upgraded to offer spoken-to-written conversion powered by the Qwen large language model.
| Feature Name | Description |
| --- | --- |
| Spoken-to-written conversion | Rewrites and polishes speech-to-text results to produce formal written transcripts. |
November 8, 2023
Tongyi Tingwu API service upgraded to offer summary generation, key point extraction, and PPT extraction and summarization, all powered by the Qwen large language model. API invocation simplified to reduce integration costs.
| Feature Name | Description |
| --- | --- |
| Chapter overview | Segments audio/video content by topic and summarizes each chapter’s title and abstract. |
| Full-text summary | Summarizes the entire content. |
| Speaker summary | Meetings often involve multiple participants. Tongyi Tingwu distinguishes speakers and summarizes each person’s viewpoints. Speaker summary clearly organizes and presents who said what during the meeting. |
| Q&A pair extraction | Questions and answers in meetings often contain critical information. Tongyi Tingwu’s Q&A review feature locates and extracts questions raised during the meeting, listing all discussed topics. |
| Video PPT extraction and summarization | Identifies and extracts PPT slides shown in videos and summarizes the content of each slide. Helps retrieve PPT materials and quickly understand their content. |
| Model capability upgrade | Keyword and to-do extraction upgraded to use large language models for more focused results. |
June 1, 2023
Tongyi Tingwu enters public preview. During this period, users can try all AI features, including advanced capabilities such as full-text summary, chapter overview, and speaker summary. Log on with an Alibaba Cloud account to enjoy the following benefits:
- Daily logins to Tongyi Tingwu automatically grant transcription minutes. Storage and remaining minute limits are also increased.
- Invite one friend to register and log in to Tongyi Tingwu to receive extra transcription minutes.
- Enter a security token to receive additional transcription minutes.
- Link your Tongyi Tingwu account to Alibaba Cloud Drive to share its large storage space.
| Feature Name | Description |
| --- | --- |
| Real-time recording | Use “real-time recording” to capture dialogue in meetings, training sessions, interviews, and other scenarios. Speech recognition accurately converts speech to text. |
| Multilingual translation | Foreign participants in meetings are no longer a barrier. Tongyi Tingwu provides real-time multilingual translation so everyone can understand and follow the discussion. |
| Q&A review | Questions and answers in meetings often contain critical information. Tongyi Tingwu’s Q&A review feature locates and extracts questions raised during the meeting, listing all discussed topics. |
| Speaker summary | Meetings often involve multiple participants. Tongyi Tingwu distinguishes speakers and summarizes each person’s viewpoints. Speaker summary clearly organizes and presents who said what during the meeting. |
| Local upload & cloud disk import | Use “upload audio/video” to analyze pre-recorded files. Tongyi Tingwu integrates with Alibaba Cloud Drive. Audio and video files stored in Alibaba Cloud Drive can be imported into Tongyi Tingwu for AI-powered analysis. |
| Full-text summary | Leveraging the powerful comprehension of large language models, full-text summary distills the most important information into a concise 200–300-word overview faithful to the original content. |
| Chapter overview | To explore content in greater depth, Tongyi Tingwu segments the audio/video into chapters along the timeline, presenting the central idea and key points of each. This new experience makes “reading” audio/video content at a glance a reality. |
March 14, 2023
| Category | Feature Name | Description | Type | Document Link |
| --- | --- | --- | --- | --- |
| Real-time recording API | Real-time API | | New | |
| Tongyi Tingwu interface service | Tongyi Tingwu website and WeChat mini program | | New | |
February 8, 2023
| Category | Feature Name | Description | Type | Document Link |
| --- | --- | --- | --- | --- |
| Audio/video file recording | Real-time API | | New | |
December 12, 2022
| Category | Feature Name | Description | Type | Document Link |
| --- | --- | --- | --- | --- |
| Audio/video file recording | Real-time API | | New | |
October 25, 2022
| Category | Feature Name | Description | Type | Document Link |
| --- | --- | --- | --- | --- |
| Real-time recording | Real-time API | | New | |
| Audio/video file recording | Real-time API | | New | |
| Console configuration | Console interface | Activate service, set access policies, create projects, and test performance. | New | |