Release Notes


This topic describes the latest updates and related documentation for Tongyi Tingwu.

October 30, 2025

Feature Name

Description

Translation upgraded to a large language model

Both offline and online translation now use the latest version of qwen-mt to improve translation quality.

September 30, 2025

Feature Name

Description

Support for fun-asr

To invoke Bailing’s latest ASR large language model, set the ASR domain model parameter Transcription.Model to fun-asr and set Input.SourceLanguage to multilingual.
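As a rough illustration, the Python sketch below builds a task request body that selects fun-asr. Only Transcription.Model and Input.SourceLanguage come from this note; the AppKey and Input.FileUrl fields, the Parameters wrapper around Transcription, and the function name `build_fun_asr_request` are assumptions, so check the exact structure against the Tingwu API reference before use.

```python
import json


def build_fun_asr_request(file_url: str, app_key: str) -> dict:
    """Build a hypothetical offline task body that selects the fun-asr model.

    Only Input.SourceLanguage and Transcription.Model come from the release
    note; AppKey, Input.FileUrl, and the Parameters wrapper are assumptions.
    """
    return {
        "AppKey": app_key,  # assumption: project Appkey from the console
        "Input": {
            "SourceLanguage": "multilingual",  # required for fun-asr per the note
            "FileUrl": file_url,  # assumption: location of the audio file
        },
        "Parameters": {
            "Transcription": {
                "Model": "fun-asr",  # ASR domain model parameter
            },
        },
    }


if __name__ == "__main__":
    body = build_fun_asr_request("https://example.com/meeting.mp3", "your-appkey")
    print(json.dumps(body, indent=2))
```

The two settings go together: per this note, fun-asr is invoked only when the source language is set to multilingual.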

Option to disable the sensitive-word filter

Offline and online ASR now support disabling the sensitive-word filter.

July 30, 2025

Feature Name

Description

Selectable large language models

When creating a task, choose from qwen-plus, qwq, or ccai-pro models in addition to the default large language model combination.

Optimized to-do and Q&A extraction

Combines two LLMs with multi-round large model processing to improve the extraction of to-do items and Q&A pairs from complex or long conversations.

June 30, 2025

Feature Name

Description

Adjustable chapter overview

Adjust title length and chapter granularity in the chapter overview for a more personalized experience.

Output language setting for mind maps and chapter overview

Large model outputs can be displayed uniformly in either Chinese or English, regardless of the language of the original audio or video. This makes content in less common languages easier to understand.

May 30, 2025

Feature Name

Description

Real-time source language switching

Multilingual mode now supports switching the source language during real-time transcription.

Automotive model optimization

The offline automotive domain model improves recognition accuracy for 437 vehicle models.

A new real-time 16K automotive domain model is now available.

Expanded offline file format support

Offline processing now supports M3U8 files.

Custom Prompt supports sentence ID

Custom Prompt now supports processing content by sentence ID, making it easier to locate original text.

Bug fixes

Fixed speaker identification errors.

Fixed issues where sensitive word filtering was not applied.

April 30, 2025

Feature Name

Description

Timestamps added to real-time translation interim results

Timestamp data is now included in interim results of real-time translation to enhance on-screen subtitle rendering.
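As a rough illustration of how interim timestamps can drive on-screen subtitles, the Python sketch below renders one interim result as an SRT cue. It assumes the begin and end timestamps arrive as integer milliseconds; the function names `format_srt_time` and `interim_to_srt_cue` are hypothetical, and the real field names and units in real-time translation results should be checked against the API reference.

```python
def format_srt_time(ms: int) -> str:
    """Convert milliseconds to the SRT timestamp format HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def interim_to_srt_cue(index: int, begin_ms: int, end_ms: int, text: str) -> str:
    """Render one interim translation result as an SRT cue.

    begin_ms and end_ms stand in for the timestamp fields of an interim
    result; their real names and units are assumptions.
    """
    start = format_srt_time(begin_ms)
    end = format_srt_time(end_ms)
    return f"{index}\n{start} --> {end}\n{text}\n"


if __name__ == "__main__":
    print(interim_to_srt_cue(1, 1500, 4250, "Hello, everyone."))
```

Because interim results are revised as recognition proceeds, a player would typically replace the current cue with each new interim result rather than append it.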

Switch translation language during real-time meetings

Change the target translation language if it was set incorrectly at meeting creation or needs to be updated during the meeting.

Bug fixes

Improved stability of service quality inspection and dialogue content extraction.

Fixed incorrect sorting of transcription results by SentenceID.

Fixed failure to create offline tasks when parameters were empty.

Fixed compilation errors when rare words were included in hotword lists.

March 20, 2025

Feature Name

Description

Identity recognition

Define and recognize speaker identities in conversations. This feature is used together with speaker separation and incurs no additional charge.

Identity-aware dialogue content extraction

Identity information can now be added to dialogue content extraction to strengthen role-specific instructions, greatly improving targeting in business scenarios such as sales and customer service.

February 24, 2025

Feature Name

Description

Event bus support

MQ push has been upgraded to integrate with Alibaba Cloud EventBridge, which supports future MQ updates and multiple notification methods such as DingTalk and email without requiring RAM user authorization. This improves integration efficiency, account security, and the developer experience.

January 22, 2025

Feature Name

Description

Content extraction

Launched dialogue content extraction, which extracts topics, feedback, and other elements along the analysis dimensions you specify. It performs especially well in sales scenarios, capturing customer needs, product sentiment, and competitor evaluations.

Create English hotwords

Hotwords can now be created from English words.

November 30, 2024

Feature Name

Description

Multi-track audio support

Transcription now supports dual-track and multi-track 16K audio.

PPT extraction optimization

PPT images can still be extracted even if no PPT speech summary is returned.

Performance improvement

Reduced file processing time and improved end-to-end service stability.

October 31, 2024

Feature Name

Description

Thai added to automatic language detection

Automatic language detection for audio and video files now supports Thai recognition and Thai ASR.

Automotive domain model supports phone calls

Phone call recordings can now use the automotive domain model for transcription, supporting automotive sales and service calls.

Full-text summary optimization

The full-text summary no longer depends on chapter overview results, which reduces processing time.

September 30, 2024

Feature Name

Description

Mixed-language recognition for offline audio/video transcription

Offline 16K transcription now supports mixed speech in Chinese, English, Japanese, Korean, Cantonese, German, French, and Russian. Control parameters can restrict detected languages to avoid interference from unintended languages.

Thai added to offline transcription

Automatic language detection now includes Thai. Audio/video files in Chinese, English, Japanese, Korean, Cantonese, or Thai are automatically recognized and transcribed accordingly (one language per file). Thai is also supported as a standalone language option.

Offline transcription efficiency optimization

Reduced offline transcription processing time.

August 31, 2024

Feature Name

Description

Real-time 8K efficiency optimization

Reduced transcription latency for real-time 8K ASR.

Service quality inspection returns speaker information

Sentence IDs returned by service quality inspection can now include speaker IDs.

ITN output optimization

Improved inverse text normalization (ITN) formatting of symbols such as percent signs.

Python SDK for real-time stream ingest

Released Python SDK for real-time stream ingest.

Real-time multilingual free-speaking upgrade

Added German, French, and Russian. Real-time mixed speech now supports Chinese, English, Japanese, Korean, Cantonese, German, French, and Russian. Control parameters can restrict detected languages to avoid interference.

Real-time speaker separation

Supports speaker separation in real-time meetings.

Savings plan launched

Supports upfront savings plans to further reduce usage costs.

Appkey-level billing

Billing is now broken down by Appkey to help track costs across projects.

June 30, 2024

Feature Name

Description

Usage statistics added to console

Enhanced usage tracking and querying capabilities.

Real-time multilingual free-speaking

Launched free-speaking support for Chinese, English, Japanese, Korean, and Cantonese.

Service quality inspection

Added large language model capabilities to service quality inspection.

May 29, 2024

Feature Name

Description

Significant price reduction

Prices significantly reduced to align with Qwen.

May 21, 2024

Feature Name

Description

New transcription languages for real-time recording

Real-time recording at 16K now supports free-speaking recognition for Chinese, English, Japanese, Korean, and Cantonese.

April 24, 2024

Feature Name

Description

Price adjustment

Speech-to-text pricing unified at CNY 0.6/hour.

Each large language model capability costs CNY 0.22/hour. Fees stack when multiple capabilities are used.

Video PPT extraction reduced to CNY 0.8/hour.

Translation prices also reduced.
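The stacking rule above can be made concrete with a small cost estimator. This Python sketch uses the April 24, 2024 per-hour rates listed here; it assumes speech-to-text is always billed, that video PPT extraction is added on top of it, and it leaves out translation because its new price is not stated. The function name `estimate_cost` is hypothetical, and the result is an estimate, not an official bill.

```python
from decimal import Decimal

# Per-hour rates (CNY) from the April 24, 2024 price adjustment.
SPEECH_TO_TEXT_RATE = Decimal("0.6")
LLM_CAPABILITY_RATE = Decimal("0.22")  # charged per capability; fees stack
VIDEO_PPT_RATE = Decimal("0.8")


def estimate_cost(hours: Decimal, llm_capabilities: int, ppt_extraction: bool = False) -> Decimal:
    """Estimate the fee for processing one file.

    Assumes speech-to-text is always billed and PPT extraction is added on
    top; translation is excluded because its new price is not listed.
    """
    per_hour = SPEECH_TO_TEXT_RATE + LLM_CAPABILITY_RATE * llm_capabilities
    if ppt_extraction:
        per_hour += VIDEO_PPT_RATE
    return per_hour * hours


if __name__ == "__main__":
    # 2 hours with full-text summary and chapter overview (two capabilities):
    # (0.6 + 2 * 0.22) * 2 = 2.08
    print(estimate_cost(Decimal("2"), 2))
```

`Decimal` is used instead of `float` so that per-hour rates such as 0.22 add up exactly.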

Custom Prompt

Apply custom Prompts to transcription results to leverage large model capabilities based on your business needs.

When creating large model tasks, choose from Tingwu-Turbo, Tingwu-Plus, or Qwen-Max model specifications.

Automatic language detection for offline audio/video transcription

Offline transcription now supports automatic language detection for Chinese, English, Japanese, Korean, and Cantonese (one language per file).

Users no longer need to select a language when uploading files, simplifying operations and integration.

March 26, 2024

Feature Name

Description

AI model capability upgrade

Added support for generating mind maps with up to four levels of depth. Supports inputs up to 20,000 characters (approximately 80 minutes of audio).

March 22, 2024

Feature Name

Description

Text translation upgrade

Supports bidirectional translation between Chinese, English, Japanese, and Korean.

March 13, 2024

Pricing adjusted. Instead of a single fee for all AI capabilities, you now pay only for the capabilities you actually use. When multiple large model capabilities are invoked, fees are additive (for example, using both full-text summary and chapter overview costs CNY 0.4 + 0.4 = CNY 0.8/hour).

Overall costs have also decreased further.

February 22, 2024

Large language model capabilities fully upgraded. English and Chinese-English mixed capabilities now match Chinese performance. Improved real-time multilingual and offline compatibility. Service stability enhanced.

Feature Name

Description

Large language model version upgrade

Character limits increased from 250 to 1,000 for three large model features: full-text summary, chapter overview, and speaker summary. Reduces omissions and provides more detailed descriptions.

Enhanced English large model capabilities

English now supports Q&A extraction, scene recognition, and spoken-to-written conversion.

Chinese-English free-speaking supports full-text summary, chapter overview, speaker summary, to-do extraction, and Q&A extraction.

Korean added to real-time

Supports Korean language recognition and Chinese-Korean translation.

Audio/video transcription upgrade

New formats supported:

  • Audio: aiff

  • Video: avi, mpeg, 3gp, ogg

Fixed inaccurate audio/video duration reporting.

Core performance

Improved service stability.

January 8, 2024

Tongyi Tingwu API service upgraded to offer spoken-to-written conversion powered by the Qwen large language model.

Feature Name

Description

Spoken-to-written conversion

Rewrites and polishes speech-to-text results to produce formal written transcripts.

November 8, 2023

Tongyi Tingwu API service upgraded to offer summary generation, key point extraction, PPT extraction, and summarization powered by the Qwen large language model. API invocation simplified to reduce integration costs.

Feature Name

Description

Chapter overview

Segments audio/video content by topic and summarizes each chapter’s title and abstract.

Full-text summary

Summarizes the entire content.

Speaker summary

Meetings often involve multiple participants. Tongyi Tingwu distinguishes speakers and summarizes each person’s viewpoints. Speaker summary clearly organizes and presents who said what during the meeting.

Q&A pair extraction

Questions and answers in meetings often contain critical information. Tongyi Tingwu’s Q&A review feature locates and extracts questions raised during the meeting, listing all discussed topics.

Video PPT extraction and summarization

Identifies and extracts PPT slides shown in videos and summarizes the content of each slide. Helps retrieve PPT materials and quickly understand their content.

Model capability upgrade

Keyword and to-do extraction upgraded to use large language models for more focused results.

June 1, 2023

Tongyi Tingwu enters public preview. During this period, users can try all AI features, including advanced capabilities such as full-text summary, chapter overview, and speaker summary. Log on with an Alibaba Cloud account to enjoy the following benefits:

  • Daily logins to Tongyi Tingwu automatically grant transcription minutes. Storage and remaining minute limits are also increased.

  • Invite one friend to register and log in to Tongyi Tingwu to receive extra transcription minutes.

  • Enter a security token to receive additional transcription minutes.

  • Link your Tongyi Tingwu account to Alibaba Cloud Drive to share its large storage space.

Feature Name

Description

Real-time recording

Use “real-time recording” to capture dialogue in meetings, training sessions, interviews, and other scenarios. Speech recognition accurately converts speech to text.

Multilingual translation

Foreign participants in meetings are no longer a barrier. Tongyi Tingwu provides real-time multilingual translation so everyone can understand and follow the discussion.

Q&A review

Questions and answers in meetings often contain critical information. Tongyi Tingwu’s Q&A review feature locates and extracts questions raised during the meeting, listing all discussed topics.

Speaker summary

Meetings often involve multiple participants. Tongyi Tingwu distinguishes speakers and summarizes each person’s viewpoints. Speaker summary clearly organizes and presents who said what during the meeting.

Local upload & cloud disk import

Use “upload audio/video” to analyze pre-recorded files.

Tongyi Tingwu integrates with Alibaba Cloud Drive. Audio and video files stored in Alibaba Cloud Drive can be imported into Tongyi Tingwu for AI-powered analysis.

Full-text summary

Leveraging the powerful comprehension of large language models, full-text summary distills the most important information into a concise 200–300-word overview faithful to the original content.

Chapter overview

To explore content in greater depth, Tongyi Tingwu segments the audio/video into chapters along the timeline, presenting the central idea and key points of each. This new experience makes “reading” audio/video content at a glance a reality.

March 14, 2023

Category

Feature Name

Description

Type

Document Link

Real-time recording API

Real-time API

  • Supports 8K single-channel audio streams.

New

Real-time recording

Tongyi Tingwu interface service

Tongyi Tingwu website and WeChat mini program

  • Website and WeChat mini program launched.

    (Search “Tongyi Tingwu” in WeChat.)

  • Register personal accounts with a mobile number.

  • Earn transcription minutes through registration, sign-in, and inviting friends.

  • Supports real-time recording, audio/video file recording, speaker separation, and intelligent key information extraction.

  • Share recordings and invite friends to register.

  • Bookmark and folder management.

New

Tongyi Tingwu interface service

February 8, 2023

Category

Feature Name

Description

Type

Document Link

Audio/video file recording

Real-time API

  • Supports real-time meeting translation into Chinese, English, or both.

  • Toggle translation on or off during the meeting.

New

Audio/video file recording

December 12, 2022

Category

Feature Name

Description

Type

Document Link

Audio/video file recording

Real-time API

  • Audio sampling rate: Added support for 8K.

New

Audio/video file recording

October 25, 2022

Category

Feature Name

Description

Type

Document Link

Real-time recording

Real-time API

  • Supported input formats: PCM (uncompressed PCM or WAV files), 16-bit audio bit depth, mono.

  • Supported audio sampling rate: 16,000 Hz.

  • Configure result output: choose whether to return interim recognition results.

  • Set multilingual recognition: Chinese, English, Cantonese, or Chinese-English free-speaking.

  • Enable synchronized audio transcoding: transcode to 128 kb/s MP3 and write the output in near real time to a specified OSS bucket.

  • Enable post-meeting smart extraction: toggle effective audio segment detection, save in-meeting recognition results, and extract keywords, key sentences, subtopics, and to-do items.
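The real-time input format above works out to 16,000 samples/s × 2 bytes × 1 channel = 32,000 bytes of PCM per second. The Python sketch below uses the standard `wave` module to check that a WAV file matches that format and then yields the raw PCM in fixed-size chunks. The 100 ms chunk duration (3,200 bytes) and the function names are assumptions for illustration; the notes do not specify a frame size, and sending the chunks over the real-time connection is not shown.

```python
import io
import wave
from typing import Iterator

SAMPLE_RATE = 16_000  # Hz, the supported real-time sampling rate
SAMPLE_WIDTH = 2      # bytes, i.e. 16-bit samples
CHANNELS = 1          # mono
CHUNK_MS = 100        # assumption: duration of each chunk sent to the service


def pcm_chunks(wav_bytes: bytes) -> Iterator[bytes]:
    """Check a WAV file against the real-time input spec and yield raw PCM chunks."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        actual = (wav.getframerate(), wav.getsampwidth(), wav.getnchannels())
        if actual != (SAMPLE_RATE, SAMPLE_WIDTH, CHANNELS):
            raise ValueError(f"expected 16,000 Hz, 16-bit, mono PCM; got {actual}")
        frames_per_chunk = SAMPLE_RATE * CHUNK_MS // 1000  # 1,600 samples
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            yield data  # 3,200 bytes for each full chunk


def make_silent_wav(seconds: float) -> bytes:
    """Create an in-memory WAV file of silence in the expected format (for testing)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(b"\x00\x00" * int(SAMPLE_RATE * seconds))
    return buf.getvalue()


if __name__ == "__main__":
    chunks = list(pcm_chunks(make_silent_wav(1.0)))
    print(len(chunks), len(chunks[0]))
```

One second of audio yields ten chunks of 3,200 bytes each. A WAV file in any other format is rejected before anything is sent.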

New

Real-time recording

Audio/video file recording

Real-time API

  • Supported audio formats: mp3, wav, m4a, wma, aac, ogg, amr, flac, mp4.

  • File size must not exceed 4 GB.

  • Audio duration must not exceed 4 hours.

  • Audio sampling rate must be at least 16K.

  • Audio files must be stored in an OSS bucket managed by Tingwu.

  • Supported invocation methods: polling and callback.

  • Set multilingual recognition: Chinese, English, Cantonese, or Chinese-English free-speaking.
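A pre-submission check against these limits can catch rejected files early. The Python sketch below checks the format, size, duration, and sampling-rate limits from this release only; later releases added 8K audio and more formats. It assumes the file size, duration, and sampling rate have already been read elsewhere (for example, with a media probing tool), and that "4 GB" means 4 × 1024³ bytes. The OSS storage requirement cannot be checked locally. The function name `check_offline_file` is hypothetical.

```python
from pathlib import PurePosixPath

# Offline file limits from the October 25, 2022 release.
SUPPORTED_FORMATS = {"mp3", "wav", "m4a", "wma", "aac", "ogg", "amr", "flac", "mp4"}
MAX_SIZE_BYTES = 4 * 1024**3   # 4 GB, assuming binary gigabytes
MAX_DURATION_S = 4 * 60 * 60   # 4 hours
MIN_SAMPLE_RATE = 16_000       # "at least 16K"


def check_offline_file(name: str, size_bytes: int, duration_s: float, sample_rate: int) -> list[str]:
    """Return a list of problems; an empty list means the file fits the limits."""
    problems = []
    ext = PurePosixPath(name).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file larger than 4 GB")
    if duration_s > MAX_DURATION_S:
        problems.append("audio longer than 4 hours")
    if sample_rate < MIN_SAMPLE_RATE:
        problems.append("sampling rate below 16 kHz")
    return problems


if __name__ == "__main__":
    print(check_offline_file("meeting.MP3", 50_000_000, 3_600, 44_100))
```

Returning every problem at once, instead of stopping at the first one, lets a client report all the reasons a file would be rejected.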

New

Audio/video file recording

Console configuration

Console interface

Activate service, set access policies, create projects, and test performance.

New

Quick Start