If you operate a live streaming, voice chat, or media sharing platform, manually monitoring audio for compliance violations is impractical. This topic shows how to use Voice Moderation 2.0 to automatically detect audio content risks such as profanity, pornography, political content, and AI-generated voices. You will learn how to activate the service, configure moderation rules, integrate the API, view results, and monitor usage.
Limitations
You can query moderation results from the last 30 days only.
The Result Query page stores up to 50,000 records. For longer-term storage, save API results to your own system.
Each lexicon supports up to 100,000 keywords.
You can create up to 20 lexicons per account.
A single keyword cannot exceed 20 characters. Special characters are not supported.
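The keyword limits above can be checked locally before you upload a lexicon. The following is a minimal sketch, not the service's own validation: the helper names are hypothetical, and because this topic does not define "special characters", the sketch assumes it means anything other than letters, digits, and spaces.

```python
from typing import Dict, List, Optional

MAX_KEYWORD_LENGTH = 20             # a single keyword cannot exceed 20 characters
MAX_KEYWORDS_PER_LEXICON = 100_000  # each lexicon supports up to 100,000 keywords


def keyword_problem(keyword: str) -> Optional[str]:
    """Return the reason a keyword would be rejected, or None if it passes."""
    if not keyword:
        return "empty keyword"
    if len(keyword) > MAX_KEYWORD_LENGTH:
        return "longer than 20 characters"
    # Assumption: a "special character" is anything that is not a letter,
    # a digit, or a space. The service's actual definition may differ.
    if not all(ch.isalnum() or ch == " " for ch in keyword):
        return "contains a special character"
    return None


def check_lexicon(keywords: List[str]) -> Dict[str, str]:
    """Map each rejected keyword to its reason; raise if the lexicon is too large."""
    if len(keywords) > MAX_KEYWORDS_PER_LEXICON:
        raise ValueError("a lexicon supports up to 100,000 keywords")
    problems = {}
    for kw in keywords:
        reason = keyword_problem(kw)
        if reason is not None:
            problems[kw] = reason
    return problems
```

Running `check_lexicon` on a keyword file before upload gives you the rejected entries in one pass instead of retrying in the console.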
Compared to Voice Moderation 1.0, the Voice Moderation 2.0 service offers more capabilities for improved content security.
Feature | Voice Moderation 2.0 | Voice Moderation 1.0 |
Business scenario support | Pre-configured for multiple business scenarios, including social live streaming, comprehensive audio and video, and voice chat rooms. This simplifies integration and expands compliance risk coverage. | The default configuration is for general scenarios and lacks business-specific features. You must define your standards and adapt the policy before use. |
Voice slicing | Uses an adjustable slicing method to create fixed-duration voice slices, improving manual review efficiency. | Uses semantic segmentation, resulting in voice slices ranging from a few to dozens of seconds. |
Moderation capabilities | | |
API features | | |
Features
Use cases
Voice Moderation 2.0 provides two service tiers. Choose based on your analysis needs:
LLM Edition: For deep semantic analysis, including sentiment detection, metaphor identification, and ideological content recognition. Powered by a text moderation large language model.
Rule-based services: For keyword matching and cost-efficient moderation of known risk types such as profanity, pornography, and terrorism.
LLM Edition services are newly available. See the first four rows below for details.
Use LLM Edition when you need deep semantic analysis of nuanced content. Use rule-based services for cost-efficient moderation of well-defined risk types.
Service name | service | Use cases |
Audio and video media detection (LLM Edition) | audio_detection_byllm | |
Audio and video media detection (LLM Edition, Global) | audio_detection_byllm_cb | |
Social live stream detection (LLM Edition) | stream_detection_byllm | |
Social live stream detection (LLM Edition, Global) | stream_detection_byllm_cb | |
Rule-based services (existing) | | |
Social live stream detection | live_stream_detection | |
Social live stream detection (Professional Edition) | live_stream_detection_pro | |
Social live stream multilingual detection | stream_multilingual_cb | |
Audio and video media detection | audio_media_detection | |
Audio and video media detection (Professional Edition) | audio_media_detection_pro | |
Audio and video media multilingual detection | audio_multilingual_cb | |
AI-generated voice detection | voice_aigc_detector | |
Moderation labels
This service supports a wide range of moderation labels and can return multiple labels for a single audio file. The following table lists the available labels.
Label type | Category |
Voice Moderation labels (labels) | |
AI-generated voice moderation labels (labels) | |
LLM Edition labels (labels) | |
Moderation performance
The Voice Moderation 2.0 service uses a high-performance core engine that schedules dozens of models and policies with high concurrency, delivering low-latency protection for scenarios like live audio streaming and voice chat.
Performance metric | Description |
File size | The maximum audio file size is increased from 200 MB to 500 MB. |
QPS (Queries Per Second) | The task submission QPS is increased from 50 to 100. |
Concurrency | The default concurrency limit is increased from 20 to 50. |
QPS for Voice Moderation is the number of API requests processed per second. Concurrency is the number of audio files or streams being detected simultaneously.
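Because concurrency counts the files or streams being detected at the same time, a client can cap its own in-flight tasks to stay under the limit. The following is a minimal sketch of one way to do that with a semaphore. The `ConcurrencyGate` class is hypothetical and is not part of any SDK, and it limits only concurrency, not QPS.

```python
import threading

MAX_CONCURRENCY = 50  # default concurrency limit from the table above


class ConcurrencyGate:
    """Allow at most `limit` tasks to run at the same time (client-side only)."""

    def __init__(self, limit: int = MAX_CONCURRENCY):
        self._sem = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.active = 0  # tasks currently running
        self.peak = 0    # highest number of simultaneous tasks observed

    def run(self, task, *args):
        # Block until a slot is free, then run the task in the calling thread.
        with self._sem:
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                return task(*args)
            finally:
                with self._lock:
                    self.active -= 1
```

Wrap each moderation submission in `gate.run(...)` from your worker threads; calls beyond the limit wait until an earlier task finishes.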
Billing
The Voice Moderation 2.0 service supports two billing methods: pay-as-you-go and resource plan deduction.
Pay-as-you-go
After you activate the service, the default billing method is pay-as-you-go, where you are billed daily based on your usage. You are not charged if you do not use the service. For more information, see Activate Content Moderation-Enhanced Edition.
Moderation type | Applicable services | Unit price |
Standard Voice Moderation (audio_standard) | | CNY 225 per 10,000 minutes, equivalent to CNY 1.35 per hour. |
Advanced Voice Moderation (audio_advanced) | | CNY 375 per 10,000 minutes, equivalent to CNY 2.25 per hour. |
LLM Edition Standard (audio_llm_standard) | | CNY 300 per 10,000 minutes, equivalent to CNY 1.80 per hour. |
LLM Edition Advanced (audio_llm_advanced) | | CNY 450 per 10,000 minutes, equivalent to CNY 2.70 per hour. |
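The unit prices above can be turned into a quick cost estimate. The following is a rough sketch that uses only the listed prices. The function name is hypothetical, and the sketch makes no assumption about how the service rounds partial minutes or aggregates daily bills.

```python
from decimal import Decimal

# CNY per 10,000 minutes, copied from the pay-as-you-go table above.
PRICE_PER_10K_MINUTES = {
    "audio_standard": Decimal("225"),
    "audio_advanced": Decimal("375"),
    "audio_llm_standard": Decimal("300"),
    "audio_llm_advanced": Decimal("450"),
}


def estimate_cost_cny(moderation_type: str, minutes: int) -> Decimal:
    """Estimate the pay-as-you-go cost in CNY for a number of audio minutes."""
    price = PRICE_PER_10K_MINUTES[moderation_type]
    # Decimal arithmetic avoids floating-point drift in currency values.
    return price * Decimal(minutes) / Decimal("10000")
```

For example, `estimate_cost_cny("audio_standard", 60)` gives CNY 1.35, which matches the hourly figure in the table.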
Resource plan deduction
If you have a large volume of content to moderate or a consistent moderation workload, purchase a resource plan in advance. Larger plans offer greater discounts. You can purchase and use multiple plans simultaneously. For more information, see Purchase a resource plan for Content Moderation-Enhanced Edition.
This resource plan deducts charges for the Voice Moderation 2.0 service and is not compatible with Content Security traffic packages. The deduction factors are as follows:
Moderation type | Deduction factor |
Standard Voice Moderation (audio_standard) | The deduction factor is 30. This means that for every minute of audio successfully processed, 30 units are deducted from your resource plan. For example, if you purchase a resource plan with a capacity of 100 units and you moderate one minute of audio, 30 units are consumed, leaving 70 units. |
Advanced Voice Moderation (audio_advanced) | The deduction factor is 50. This means that for every minute of audio successfully processed, 50 units are deducted from your resource plan. For example, if you purchase a resource plan with a capacity of 100 units and you moderate one minute of audio, 50 units are consumed, leaving 50 units. |
LLM Edition Standard (audio_llm_standard) | The deduction factor is 40. This means that for every minute of audio successfully processed, 40 units are deducted from your resource plan. For example, if you purchase a resource plan with a capacity of 100 units and you moderate one minute of audio, 40 units are consumed, leaving 60 units. |
LLM Edition Advanced (audio_llm_advanced) | The deduction factor is 60. This means that for every minute of audio successfully processed, 60 units are deducted from your resource plan. For example, if you purchase a resource plan with a capacity of 100 units and you moderate one minute of audio, 60 units are consumed, leaving 40 units. |
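The deduction arithmetic in the table can be sketched as follows. This is an illustration only: the function name is hypothetical, and the sketch does not model what the service does when a plan runs out, so a negative result simply means the plan does not cover the usage.

```python
# Units deducted per minute of successfully processed audio (from the table above).
DEDUCTION_FACTOR = {
    "audio_standard": 30,
    "audio_advanced": 50,
    "audio_llm_standard": 40,
    "audio_llm_advanced": 60,
}


def remaining_units(plan_units: int, moderation_type: str, minutes: int) -> int:
    """Return the plan capacity left after moderating `minutes` of audio."""
    used = DEDUCTION_FACTOR[moderation_type] * minutes
    # Assumption: no clamping at zero; a negative value signals a shortfall.
    return plan_units - used
```

With a 100-unit plan and one minute of audio, the function returns 70, 50, 60, and 40 units for the four moderation types, the same figures as the examples in the table.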
Step 1: Activate service
Before using the Voice Moderation 2.0 feature, you must activate the service.
Go to the Content Moderation-Enhanced Edition page, and then read and agree to the service agreement.
Click Activate Now.
Step 2: Configure moderation rules
If the built-in labels are insufficient for your business needs, create custom lexicons and configure rules to flag or ignore specific terms.
Log on to the Content Security console.
In the left-side navigation pane, choose .
Configure a lexicon.
On the Keyword Library Management tab, click Create Library.
In the Create Library panel, provide a Library Name and add words by using Add words by this page or Upload File.
To create a library without adding keywords immediately, select the Create a library first and add words later option, and then add keywords later based on your business needs. Each library supports up to 100,000 keywords, and each account supports up to 20 libraries. A single keyword cannot exceed 20 characters, and special characters are not supported.
Click Create Library.
If the lexicon fails to be created, a message appears. Follow the instructions in the message to try again.
In the Operation column, click Modify or Clear All to modify keywords or remove all keywords at once.
Configure rules.
On the Rules Management tab, select the target service, and then click Set Thesaurus in the Operation column.
In the Settings panel, select the ignore lexicon. Then, click Next.
This feature is used to whitelist keywords, preventing them from being flagged.
For example, suppose the ignore lexicon contains the keywords "moderator" and "fan", and the text transcribed from the audio is "Welcome to the livestream, double-tap to like, fan badge plus moderator gets streamer friend spot". The keywords "moderator" and "fan" are ignored first, and risk detection then runs only on the remaining text: "Welcome to the livestream, double-tap to like, badge plus gets streamer friend spot".
Select the match lexicon, and then click OK.
If any keyword in a match lexicon appears in the text transcribed from the audio, the labels parameter returns C_customized when you call the Voice Moderation 2.0 API. The value C_customized indicates a match in a custom lexicon that you created. This scenario is primarily used to detect violation risks in the text transcribed from the audio.
For example, suppose the keywords in a match lexicon are "room manager" and "fan". If the text transcribed from the audio is "Welcome to the livestream. Double-tap to like. Get a fan badge and become a room manager to be added as the streamer's friend", the detection process matches the keywords "room manager" and "fan". When you call the Voice Moderation 2.0 API, the value of the returned labels parameter includes C_customized in addition to any matched built-in labels.
The rule configuration takes about three minutes to take effect.
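The way the ignore lexicon and the match lexicon interact in this step can be illustrated with a small sketch. It is not the service's implementation: the function names are hypothetical, plain substring matching is assumed, and it is also an assumption that match keywords are checked against the text that remains after ignored keywords are removed.

```python
from typing import List


def strip_ignored(text: str, ignore_keywords: List[str]) -> str:
    """Remove ignore-lexicon keywords, then collapse leftover whitespace."""
    # Longest keywords first, so a short keyword cannot split a longer one.
    for kw in sorted(ignore_keywords, key=len, reverse=True):
        text = text.replace(kw, "")
    return " ".join(text.split())


def custom_labels(text: str, ignore_keywords: List[str],
                  match_keywords: List[str]) -> List[str]:
    """Return ["C_customized"] if any match keyword survives the ignore pass."""
    remaining = strip_ignored(text, ignore_keywords)
    hit = any(kw in remaining for kw in match_keywords)
    return ["C_customized"] if hit else []
```

Applied to the ignore-lexicon example in this step, `strip_ignored` leaves "Welcome to the livestream, double-tap to like, badge plus gets streamer friend spot", so a match keyword that is also whitelisted no longer produces C_customized.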
Step 3: Integrate the service
The Voice Moderation 2.0 service supports the following integration methods.
Submit a Voice Moderation task by calling the DetectAudit API. Specify the service parameter to select a moderation service. For example, to use the LLM Edition service for audio and video media detection:
curl -X POST 'https://green-cip.ap-southeast-1.aliyuncs.com/?Action=DetectAudit&Version=2022-03-02' \
-H 'Content-Type: application/json' \
-d '{
"Service": "audio_detection_byllm",
"AudioUrl": "https://example.com/sample.mp3"
}'
Replace the endpoint with the one that matches your region. For valid Service parameter values, see the service column of the table in the Use cases section.
Call the API to integrate the service. For more information, see Voice Moderation 2.0 API.
Use an SDK to integrate the service. For more information, see SDK Integration Guide.
Use HTTP to integrate the service. For more information, see HTTP Integration Guide.
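If you assemble requests in application code, it can help to validate the service code before sending anything. The following is a minimal sketch that only builds the JSON body: the field names Service and AudioUrl simply mirror the curl example in this step, the function name is hypothetical, and authentication and the HTTP call itself are out of scope. See the API reference for the authoritative request format.

```python
import json

# Service codes listed in the table in the Use cases section.
SERVICES = {
    "audio_detection_byllm", "audio_detection_byllm_cb",
    "stream_detection_byllm", "stream_detection_byllm_cb",
    "live_stream_detection", "live_stream_detection_pro",
    "stream_multilingual_cb", "audio_media_detection",
    "audio_media_detection_pro", "audio_multilingual_cb",
    "voice_aigc_detector",
}


def build_request_body(service: str, audio_url: str) -> str:
    """Return the JSON body for a moderation request, or raise ValueError."""
    if service not in SERVICES:
        raise ValueError(f"unknown service code: {service}")
    if not audio_url.startswith(("http://", "https://")):
        raise ValueError("audio_url must be an HTTP or HTTPS URL")
    # Assumption: the body uses the same two fields as the curl example.
    return json.dumps({"Service": service, "AudioUrl": audio_url})
```

Rejecting a mistyped service code locally avoids a round trip that would fail anyway.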
Step 4: View moderation results
View moderation results to analyze common violation types in your audio content.
Log on to the Content Security console.
In the left-side navigation pane, choose .
On the Result Query page, view the moderated audio, matched labels, and request times.
Search for specific records by setting a Time Range or by filtering by Request ID, Text (from transcribed audio), or Label. You can query data from the last 30 days only. The Result Query page stores up to 50,000 records. For longer-term storage, save the API results to your own system.
When searching by label, use the following filter options:
Contains: Returns results where the label field includes the specified label value.
Does not contain: Returns results where the label field does not include the specified label value.
Empty: Returns results that did not match any label.
Not empty: Returns results that matched any label (no label value input is needed).
Locate a specific transcribed text entry and click View in the Operation column.
In the View details panel, review the detailed moderation information for that transcribed text.
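If you also store API results in your own system, you can reproduce the four label filter options from this step on your own data. The following is a minimal sketch under the assumption that each record's labels are held as a list of strings. The function and mode names are hypothetical and do not correspond to any console or API identifier.

```python
from typing import List, Optional


def matches_filter(labels: List[str], mode: str, value: Optional[str] = None) -> bool:
    """Apply one of the four label filter options to a record's labels."""
    if mode == "contains":
        return value in labels
    if mode == "does_not_contain":
        return value not in labels
    if mode == "empty":
        return len(labels) == 0      # the record matched no label
    if mode == "not_empty":
        return len(labels) > 0       # the record matched any label; no value needed
    raise ValueError(f"unknown filter mode: {mode}")
```

Note that a record with no labels satisfies both "empty" and "does_not_contain" for any value, which follows from the definitions above.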
Step 5: Track usage statistics
Track call volumes to monitor service usage for your Alibaba Cloud account and its RAM users.
In the left-side navigation pane, choose .
On the Statistics page, view risk statistics for Voice Moderation, including high, medium, low, and no-risk counts, as well as risk trends and distributions.
On the Statistics page, view the Voice Moderation call volume.
Specify a custom time range to query call volume data from the last 365 days, in either the Daily or Monthly view. By default, the Daily view displays the daily call volume for the last 30 days, and the Monthly view displays the monthly call volume for the last 12 months. You can also view call volume data by Alibaba Cloud account and RAM user.
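The time-range limits above can be expressed as a small check, for example in a reporting script that mirrors the console views. This is a sketch only: the function names are hypothetical, and the exact boundary handling (whether today counts as one of the 30 or 365 days) is an assumption.

```python
from datetime import date, timedelta

MAX_LOOKBACK_DAYS = 365   # custom ranges reach back at most 365 days
DEFAULT_DAILY_DAYS = 30   # the default Daily view covers the last 30 days


def default_daily_range(today: date):
    """Start and end dates of the default Daily view (assumed to end today)."""
    return today - timedelta(days=DEFAULT_DAILY_DAYS), today


def is_valid_custom_range(start: date, end: date, today: date) -> bool:
    """Check that a custom range is ordered and within the 365-day lookback."""
    if start > end or end > today:
        return False
    return (today - start).days <= MAX_LOOKBACK_DAYS
```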