Content Moderation 2.0 uses large language models (LLMs) to detect inappropriate text. Compared to rule-based approaches, LLMs identify complex and subtle violations with greater accuracy.
To share feedback or feature requests, contact your account manager.
Services
The following LLM-based text moderation services are available:
|
Service |
Description |
Use cases |
|
Service name: UGC Text Moderation Large Model Service_Professional Edition Service: |
Professional edition of the LLM-based UGC text moderation service. Provides more granular risk labels for fine-grained content analysis. For a detailed list of detectable items, see the Content Moderation console. |
Use when you need fine-grained risk categorization for UGC content. |
|
Service name: LLM-based Text Moderation Service in UGC Scenarios Service: |
LLM-based text moderation service for UGC scenarios. For a detailed list of detectable items, see the Content Moderation console. |
General-purpose UGC text moderation. |
|
Service name: Cross-border UGC text moderation (LLM) Service: |
Cross-border UGC text moderation service supporting 119 languages, including Chinese, English, Spanish, French, Portuguese, Italian, Arabic, Japanese, Korean, Indonesian, Russian, Vietnamese, German, and Thai. For a detailed list of detectable items, see the Content Moderation console. |
Use for multilingual UGC content moderation across regions. |
|
Service name: LLM-based Text Moderation Service in AIGC Scenarios Service: |
Text moderation service designed for AI-generated content (AIGC) scenarios. For a detailed list of detectable items, see the Content Moderation console. |
Use for moderating LLM-generated or AI-created content. |
Billing
The LLM-based text moderation service supports two billing methods: pay-as-you-go and resource plans.
Pay-as-you-go
When you activate the Content Moderation Enhanced Edition service, pay-as-you-go is the default billing method. You are billed daily based on your actual usage. You are not charged if you do not use the service.
|
Moderation type |
Services |
Unit price |
|
LLM-based text moderation (Standard) (text_llm_standard) |
|
CNY 20.00 per 10,000 calls Note
You are charged for each call to any of the services on the left. For example, if you make 100 calls to the UGC Text Moderation Large Model Service_Professional Edition, you are charged CNY 0.20. |
|
LLM-based text moderation (Advanced) (text_llm_advanced) |
|
CNY 40.00 per 10,000 calls per 1,000 characters Note
|
For the pay-as-you-go billing method of Content Moderation Enhanced Edition, the system generates a bill every 24 hours. In your billing details, the moderationType field corresponds to the moderation type. You can view your billing details.
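The per-call arithmetic in the pricing table can be checked with a short sketch. The function name and the two-decimal rounding are illustrative assumptions, not part of the billing system; only the CNY 20.00 per 10,000 calls unit price comes from the table.

```python
from decimal import Decimal, ROUND_HALF_UP

# Unit price from the table above: CNY 20.00 per 10,000 calls (Standard).
STANDARD_PRICE_PER_10K = Decimal("20.00")


def pay_as_you_go_cost(calls: int, price_per_10k: Decimal = STANDARD_PRICE_PER_10K) -> Decimal:
    """Estimate the charge in CNY for a number of billable calls."""
    if calls < 0:
        raise ValueError("calls must be non-negative")
    cost = Decimal(calls) * price_per_10k / Decimal(10_000)
    # Rounding to two decimals is an assumption made for display only.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(pay_as_you_go_cost(100))  # the 100-call example from the table
```

Decimal is used instead of float so that small per-call amounts do not accumulate binary rounding noise.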
Resource plans
For high or consistent moderation volumes, resource plans offer significant discounts. You can purchase and stack multiple plans. For more information, see Purchase a resource plan for Content Moderation Enhanced Edition.
This resource plan offsets the usage of Content Moderation Enhanced Edition and cannot be shared with resource plans for Content Moderation V1.0. The following table lists the offset factors.
|
Moderation type |
Offset factor |
|
LLM-based text moderation (Standard) (text_llm_standard) |
Each successful API call consumes 2.67 calls from your resource plan. Note
For example, if your resource plan has a quota of 10 calls, one successful API call consumes 2.67 calls, leaving 7.33 calls in your plan. |
|
LLM-based text moderation (Advanced) (text_llm_advanced) |
Each successful API call consumes 5.34 calls from your resource plan. Note
For example, if your resource plan has a quota of 10 calls, one successful API call consumes 5.34 calls, leaving 4.66 calls in your plan. |
After you purchase a resource plan, your API usage for Content Moderation Enhanced Edition is first deducted from your resource plan. When your resource plan is depleted, subsequent usage is billed on a pay-as-you-go basis. Monitor your remaining plan balance and pay-as-you-go charges. You can set low-balance alerts in the Resource Plan system.
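The deduction order described above can be sketched as follows. The offset factors come from the table; the function name is hypothetical, and the rule that a call is covered only when the plan can absorb its full offset is an assumption, since the handling of a partially covered call is not stated here.

```python
from decimal import Decimal

# Offset factors from the table above.
OFFSET_FACTORS = {
    "text_llm_standard": Decimal("2.67"),
    "text_llm_advanced": Decimal("5.34"),
}


def deduct_from_plan(quota: Decimal, moderation_type: str, calls: int) -> tuple[Decimal, int]:
    """Return (remaining plan quota, calls that fall back to pay-as-you-go).

    Assumption: a call is covered by the plan only if the remaining quota
    can absorb its full offset factor.
    """
    factor = OFFSET_FACTORS[moderation_type]
    covered = min(calls, int(quota // factor))
    return quota - covered * factor, calls - covered


remaining, overflow = deduct_from_plan(Decimal("10"), "text_llm_standard", 1)
print(remaining, overflow)
```

Under the same assumption, five Standard calls against a quota of 10 would leave three calls covered by the plan and two billed on a pay-as-you-go basis.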
Risk labels
Label definitions
Text Moderation 2.0 supports over 60 granular labels across 10 risk categories, and returns a confidence score (0–100, where a higher score indicates greater confidence) for each. If content contains multiple risk types, the service returns multiple granular labels. The following tables list the risk label values, their corresponding confidence score ranges, and their meanings.
-
Risk labels for services in the Chinese mainland:
|
Label |
Confidence score |
Description |
|
pornographic_adult |
0–100 |
Suspected pornographic content |
|
sexual_terms |
0–100 |
Suspected sexual health content |
|
sexual_suggestive |
0–100 |
Suspected vulgar content |
|
political_figure |
0–100 |
Suspected political figure |
|
political_entity |
0–100 |
Suspected political entity |
|
political_n |
0–100 |
Suspected sensitive political content |
|
political_p |
0–100 |
Suspected politically prohibited figure |
|
political_a |
0–100 |
Enhanced detection of high-priority political content |
|
violent_extremist |
0–100 |
Suspected extremist organization |
|
violent_incidents |
0–100 |
Suspected extremist content |
|
violent_weapons |
0–100 |
Suspected weapons and ammunition |
|
contraband_drug |
0–100 |
Suspected drug-related content |
|
contraband_gambling |
0–100 |
Suspected gambling-related content |
|
contraband_act |
0–100 |
Suspected prohibited behavior |
|
contraband_entity |
0–100 |
Suspected prohibited tools |
|
inappropriate_discrimination |
0–100 |
Suspected biased or discriminatory content |
|
inappropriate_ethics |
0–100 |
Suspected unethical content |
|
inappropriate_profanity |
0–100 |
Suspected offensive or abusive content |
|
inappropriate_oral |
0–100 |
Suspected vulgar language |
|
inappropriate_superstition |
0–100 |
Suspected superstitious content |
|
inappropriate_nonsense |
0–100 |
Suspected spam or meaningless content |
|
pt_to_sites |
0–100 |
Suspected redirection to external sites |
|
pt_by_recruitment |
0–100 |
Suspected ads for part-time jobs or online money-making schemes |
|
pt_to_contact |
0–100 |
Suspected contact information for advertising |
|
religion_b |
0–100 |
Suspected content related to Buddhism |
|
religion_t |
0–100 |
Suspected content related to Taoism |
|
religion_c |
0–100 |
Suspected content related to Christianity |
|
religion_i |
0–100 |
Suspected content related to Islam |
|
religion_h |
0–100 |
Suspected content related to Hinduism |
|
customized |
0–100 |
Hit a custom keyword list |
-
Risk labels for services for international markets:
|
Label |
Confidence score |
Description |
|
pornographic_adult |
0–100 |
Suspected pornographic content |
|
sexual_terms |
0–100 |
Suspected sexual health content |
|
sexual_suggestive |
0–100 |
Suspected vulgar content |
|
sexual_orientation |
0–100 |
Suspected content related to sexual orientation |
|
regional_cn |
0–100 |
Suspected politically sensitive content related to the Chinese mainland |
|
regional_illegal |
0–100 |
Suspected illegal political content |
|
regional_controversial |
0–100 |
Suspected political controversy |
|
regional_racism |
0–100 |
Suspected racism |
|
violent_extremist |
0–100 |
Suspected extremist organization |
|
violent_incidents |
0–100 |
Suspected extremist content |
|
violent_weapons |
0–100 |
Suspected weapons and ammunition |
|
violence_unscList |
0–100 |
United Nations sanctions list |
|
contraband_drug |
0–100 |
Suspected drug-related content |
|
contraband_gambling |
0–100 |
Suspected gambling-related content |
|
inappropriate_ethics |
0–100 |
Suspected unethical content |
|
inappropriate_profanity |
0–100 |
Suspected offensive or abusive content |
|
inappropriate_oral |
0–100 |
Suspected vulgar language |
|
inappropriate_religion |
0–100 |
Suspected religious blasphemy |
|
pt_to_contact |
0–100 |
Suspected contact information for advertising |
|
pt_to_sites |
0–100 |
Suspected redirection to external sites |
|
customized |
0–100 |
Hit a custom keyword list |
Configure risk labels
Enable or disable risk labels in the console. You can also adjust the detection scope for specific labels. See the Content Moderation console for details.
-
In the left navigation pane, choose Machine Moderation V2.0>Text Moderation>Rules.
-
On the Rules Management tab, find a large model moderation solution, for example, aigc_moderation_byllm, and click Set Thesaurus in the Operation column.
-
Select a detection type to configure, such as inappropriate content detection.
-
Click Edit and modify the detection settings.
-
Click Save. The new configuration takes effect in the production environment in 2 to 5 minutes.
Integration
Step 1: Activate the service
To activate the Text Moderation Plus service, go to the Activate Service page.
Step 2: Grant permissions to a RAM user
Before using the SDK or calling an API, grant the required permissions to a RAM user. Create an AccessKey pair for your Alibaba Cloud account or a RAM user to authenticate API calls. For instructions, see Obtain an access key.
-
Log on to the RAM console using your Alibaba Cloud account.
-
Create a RAM user. For details, see Create a RAM user.
-
Grant the AliyunYundunGreenWebFullAccess system policy to the RAM user. This policy grants full access to Content Moderation. For details, see Manage RAM user permissions.
The RAM user can now call the Content Moderation API.
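Once the AccessKey pair exists, it is usually safer to load it from the environment than to hard-code it. The sketch below assumes the variable names ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET, a common convention that this guide does not itself specify; adjust them to match your deployment.

```python
import os


def load_access_key() -> tuple[str, str]:
    """Read the AccessKey pair of the RAM user from environment variables.

    The variable names are an assumed convention, not taken from this guide.
    """
    try:
        return (
            os.environ["ALIBABA_CLOUD_ACCESS_KEY_ID"],
            os.environ["ALIBABA_CLOUD_ACCESS_KEY_SECRET"],
        )
    except KeyError as missing:
        # Fail early with the name of the variable that is not set.
        raise RuntimeError(f"environment variable {missing} is not set") from None
```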
Step 3: Install and integrate the SDK
For the SDK integration guide, see TextModerationPlus 2.0 PLUS Service SDK and Integration Guide.
API reference
Overview
Use the TextModerationPlus operation to create a text content moderation task. For HTTP request construction, see Request Structure. You can also use a pre-constructed request as described in the Getting Started guide.
You can test this operation in OpenAPI Explorer without manual signature calculation. After you test a call, OpenAPI Explorer generates SDK code examples automatically.
-
Service interface: TextModerationPlus
-
Supported regions and endpoints:
|
Region |
Public endpoint |
VPC endpoint |
Supported services |
|
China (Shanghai) |
green-cip.cn-shanghai.aliyuncs.com |
green-cip-vpc.cn-shanghai.aliyuncs.com |
ugc_moderation_byllm_pro, ugc_moderation_byllm, aigc_moderation_byllm |
|
China (Beijing) |
green-cip.cn-beijing.aliyuncs.com |
green-cip-vpc.cn-beijing.aliyuncs.com |
|
|
China (Hangzhou) |
green-cip.cn-hangzhou.aliyuncs.com |
green-cip-vpc.cn-hangzhou.aliyuncs.com |
|
|
China (Shenzhen) |
green-cip.cn-shenzhen.aliyuncs.com |
green-cip-vpc.cn-shenzhen.aliyuncs.com |
|
|
China (Chengdu) |
green-cip.cn-chengdu.aliyuncs.com |
Not available |
|
|
China (Hong Kong) |
green-cip.cn-hongkong.aliyuncs.com |
green-cip-vpc.cn-hongkong.aliyuncs.com |
ugc_moderation_byllm_cb |
|
Singapore |
green-cip.ap-southeast-1.aliyuncs.com |
green-cip-vpc.ap-southeast-1.aliyuncs.com |
|
|
US (Virginia) |
green-cip.us-east-1.aliyuncs.com |
green-cip-vpc.us-east-1.aliyuncs.com |
|
|
Germany (Frankfurt) |
green-cip.eu-central-1.aliyuncs.com |
green-cip-vpc.eu-central-1.aliyuncs.com |
For the Germany (Frankfurt) and China (Hong Kong) regions, nodes in the Singapore region perform text moderation inference. The service processes inference results, data, and logs locally in the Germany (Frankfurt) and China (Hong Kong) regions.
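The endpoints in the table follow a regular host-name pattern, so a client can derive them from a region ID. The helper below is a hypothetical sketch: the region IDs are read off the host names in the table, and it does not check which services a region supports.

```python
# Region IDs taken from the endpoint host names in the table above.
KNOWN_REGIONS = {
    "cn-shanghai", "cn-beijing", "cn-hangzhou", "cn-shenzhen", "cn-chengdu",
    "cn-hongkong", "ap-southeast-1", "us-east-1", "eu-central-1",
}
# The table lists no VPC endpoint for China (Chengdu).
REGIONS_WITHOUT_VPC = {"cn-chengdu"}


def endpoint_for(region_id: str, use_vpc: bool = False) -> str:
    """Return the public or VPC endpoint host name for a region."""
    if region_id not in KNOWN_REGIONS:
        raise ValueError(f"unknown region: {region_id}")
    if use_vpc:
        if region_id in REGIONS_WITHOUT_VPC:
            raise ValueError(f"no VPC endpoint is listed for {region_id}")
        return f"green-cip-vpc.{region_id}.aliyuncs.com"
    return f"green-cip.{region_id}.aliyuncs.com"


print(endpoint_for("ap-southeast-1"))
print(endpoint_for("cn-shanghai", use_vpc=True))
```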
-
Billing: This operation is billed. You are charged only for requests that return an HTTP status code of 200. Requests that return any other status code are not billed. For more information about billing, see Pricing.
QPS limit
The default rate limit is 50 requests per second per account. Exceeding this limit triggers throttling, which may disrupt your application. To request a higher rate limit, contact your account manager.
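One way to stay under the limit is to throttle on the client side before sending. The class below is a minimal single-threaded sketch of a sliding-window limiter, not part of any SDK; it assumes all traffic for the account passes through one process, which may not hold in a distributed deployment.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any rolling `window` seconds."""

    def __init__(self, limit: int = 50, window: float = 1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.window = window
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of recently admitted calls

    def acquire(self) -> None:
        """Block until one more request fits into the window, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have left the rolling window.
            while self._sent and now - self._sent[0] >= self.window:
                self._sent.popleft()
            if len(self._sent) < self.limit:
                self._sent.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self._sleep(self.window - (now - self._sent[0]))
```

Call `acquire()` immediately before each API request. The limiter is not thread-safe; a multi-threaded client would need a lock around `acquire()`.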
Request parameters
|
Parameter |
Type |
Required |
Example |
Description |
|
Service |
String |
Yes |
ugc_moderation_byllm |
|
|
ServiceParameters |
JSONString |
Yes |
The moderation service parameters, specified as a JSON string. For details, see ServiceParameters. |
Table 1. ServiceParameters
|
Parameter |
Type |
Required |
Example |
Description |
|
content |
String |
Yes |
testing content |
The text content to moderate. The content can be up to 2,000 characters in length. |
|
dataId |
String |
No |
text0424**** |
A unique identifier for your business data. Maximum 64 characters. Allowed characters: letters, digits, underscores (_), hyphens (-), and periods (.). |
|
accountId |
String |
No |
ID0728**** |
The account ID of the end user on your platform. Use this parameter to link results to a specific user. For example, if user A chats with user B, pass A's ID for A's messages and B's ID for B's messages. Note
Enables context-aware moderation. To activate this feature, contact your account manager or submit a ticket. |
|
infoType |
String |
No |
llmContent |
The type of supplementary information to retrieve. Valid values:
|
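The constraints in the two tables above can be checked locally before a request is sent. The helper below is a hypothetical sketch: it serializes ServiceParameters to a JSON string as the parameter type suggests, and it assumes that "letters" means ASCII letters and that the 2,000-character limit matches Python's `len()`, neither of which is confirmed here.

```python
import json
import re
from typing import Optional

MAX_CONTENT_CHARS = 2000
# Assumption: "letters" in the dataId rule means ASCII letters.
DATA_ID_PATTERN = re.compile(r"[A-Za-z0-9_.-]{1,64}")


def build_request(service: str, content: str, data_id: Optional[str] = None,
                  account_id: Optional[str] = None) -> dict:
    """Validate the inputs and return the two top-level request parameters."""
    if not content:
        raise ValueError("content is required")
    if len(content) > MAX_CONTENT_CHARS:
        raise ValueError(f"content exceeds {MAX_CONTENT_CHARS} characters")
    params = {"content": content}
    if data_id is not None:
        if not DATA_ID_PATTERN.fullmatch(data_id):
            raise ValueError("dataId has invalid characters or is too long")
        params["dataId"] = data_id
    if account_id is not None:
        params["accountId"] = account_id
    # ServiceParameters is typed as a JSON string, so serialize the object.
    return {
        "Service": service,
        "ServiceParameters": json.dumps(params, ensure_ascii=False),
    }


print(build_request("ugc_moderation_byllm", "testing content", data_id="text0424"))
```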
Response parameters
|
Parameter |
Type |
Example |
Description |
|
Code |
Integer |
200 |
The HTTP status code. For more information, see Status codes. |
|
Data |
JSONObject |
{"Result":[...]} |
The moderation result data. For more information, see Data. |
|
Message |
String |
OK |
The result message for the request. |
|
RequestId |
String |
AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE**** |
The request ID. |
Table 2. Data
|
Parameter |
Type |
Example |
Description |
|
Result |
JSONArray |
The detection results, including risk labels and confidence scores. For more information, see Result. |
|
|
RiskLevel |
String |
high |
The risk level, which is determined based on the configured high and low risk score thresholds. Valid values:
Note
Take immediate action on |
|
DataId |
String |
text0424**** |
The data ID of the moderated content. Note
If you specified the |
|
AccountId |
String |
10123**** |
The account ID. Note
If you specified the |
|
Ext |
Object |
Supplementary information for the text. For more information, see Ext. |
|
|
TranslatedContent |
String |
The translated text content. Returned only when the text translation feature is enabled. Note
The text translation feature is currently available only in the Singapore region. You can configure it by managing detection rules in the console. Additional charges apply. |
Table 3. Result
|
Parameter |
Type |
Example |
Description |
|
Label |
String |
political_xxx |
The risk label for the moderated content. Multiple labels and scores can be returned. For a list of supported labels, see Risk labels. |
|
Description |
String |
Suspected pornographic content |
A description of the Important
This field is for reference only and may change. For your handling logic, use the |
|
Confidence |
Float |
81.22 |
The confidence score, which ranges from 0 to 100. The value is accurate to two decimal places. Some labels do not return a confidence score. |
|
RiskWords |
String |
AA,BB,CC |
The detected risk words, separated by commas. This field is not returned for some labels. |
|
CustomizedHit |
JSONArray |
[{"LibName":"...","Keywords":"..."}] |
If the content matches an entry in a custom library, the |
|
RiskPositions |
JSONArray |
Information about the position of the detected risk words. For more information, see RiskPositions. |
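Because some labels arrive without a confidence score, handling code should not assume the field is present. The sketch below filters Result items against a threshold; the function name and the default of 80.0 are illustrative assumptions, and keeping items that lack a score is a deliberately conservative choice rather than documented behavior.

```python
def labels_over_threshold(results: list, threshold: float = 80.0) -> list:
    """Return the labels whose confidence meets the threshold.

    Items without a Confidence value are kept, on the assumption that an
    unscored hit should still be reviewed.
    """
    flagged = []
    for item in results:
        confidence = item.get("Confidence")
        if confidence is None or confidence >= threshold:
            flagged.append(item["Label"])
    return flagged


sample = [
    {"Label": "inappropriate_profanity", "Confidence": 81.22},
    {"Label": "pt_to_contact", "Confidence": 40.0},
    {"Label": "customized"},  # no confidence score returned
]
print(labels_over_threshold(sample))
```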
Table 4. CustomizedHit
|
Parameter |
Type |
Example |
Description |
|
LibName |
String |
Custom Library 1 |
The name of the custom library. |
|
Keywords |
String |
Custom Keyword 1,Custom Keyword 2 |
The matched custom keywords, separated by commas. |
Table 5. RiskPositions
|
Parameter |
Type |
Example |
Description |
|
RiskWord |
String |
AA |
The detected risk word. |
|
StartPos |
Integer |
10 |
The start position of the risk word in the text. |
|
EndPos |
Integer |
12 |
The end position of the risk word in the text. |
Table 6. Ext
|
Parameter |
Type |
Example |
Description |
|
LlmContent |
Object |
The raw detection result from the LLM. For more information, see LlmContent. |
Table 7. LlmContent
|
Parameter |
Type |
Example |
Description |
|
OutputText |
String |
Suspected vulgar language |
The raw detection result from the LLM-based text moderation model. |
Examples
Request example:
{
"Service": "ugc_moderation_byllm_pro",
"ServiceParameters": {
"content": "testing content",
"dataId": "text0424****"
}
}
Response examples:
-
System policy match:
{
"Code": 200,
"Data": {
"Result": [
{
"Label": "political_entity",
"Description": "Suspected political entity",
"Confidence": 100.0,
"RiskWords": "WordA,WordB",
"RiskPositions": [
{
"EndPos": 16,
"RiskWord": "WordA",
"StartPos": 14
}
]
},
{
"Label": "political_figure",
"Description": "Suspected political figure",
"Confidence": 100.0,
"RiskWords": "WordB,WordC",
"RiskPositions": [
{
"EndPos": 26,
"RiskWord": "WordC",
"StartPos": 24
}
]
}
],
"RiskLevel": "high",
"DataId": "text0424****"
},
"Message": "OK",
"RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
-
Custom library match:
{
"Code": 200,
"Data": {
"Result": [
{
"Description": "Hit a custom library",
"CustomizedHit": [
{
"LibName": "Custom Library Name 1",
"Keywords": "custom keyword"
}
],
"Confidence": 100,
"Label": "customized"
}
],
"RiskLevel": "high",
"DataId": "text0424****"
},
"Message": "OK",
"RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
-
Raw LLM result:
{
"RequestId": "ZZZZZ-2024-0307-FORYOU-EVER",
"Message": "OK",
"Data": {
"Ext": {
"LlmContent": {
"OutputText": "Suspected offensive or abusive content"
}
},
"Result": [
{
"RiskWords": "risk word",
"Description": "Suspected offensive or abusive content",
"Confidence": 100.0,
"Label": "inappropriate_profanity",
"RiskPositions": [
{
"RiskWord": "risk word",
"EndPos": 5,
"StartPos": 2
}
]
}
],
"RiskLevel": "high"
},
"Code": 200
}
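The three response shapes above can be reduced to one summary structure. The following is a minimal sketch with a hypothetical function name; it reads only fields that appear in the examples and treats every optional field as possibly absent.

```python
def summarize(response: dict) -> dict:
    """Collect the risk level, labels, custom hits, and raw LLM output."""
    if response.get("Code") != 200:
        raise RuntimeError(
            f"moderation failed: {response.get('Code')} {response.get('Message')}"
        )
    data = response.get("Data") or {}
    results = data.get("Result") or []
    ext = data.get("Ext") or {}
    return {
        "risk_level": data.get("RiskLevel"),
        # One (label, confidence) pair per Result item.
        "labels": [(r.get("Label"), r.get("Confidence")) for r in results],
        # Keywords from every custom library match, if any.
        "custom_hits": [
            hit["Keywords"] for r in results for hit in r.get("CustomizedHit", [])
        ],
        "llm_output": (ext.get("LlmContent") or {}).get("OutputText"),
    }


custom_match = {
    "Code": 200,
    "Data": {
        "Result": [{
            "Description": "Hit a custom library",
            "CustomizedHit": [{"LibName": "Custom Library Name 1",
                               "Keywords": "custom keyword"}],
            "Confidence": 100,
            "Label": "customized",
        }],
        "RiskLevel": "high",
    },
    "Message": "OK",
}
print(summarize(custom_match))
```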
Status codes
|
Code |
Status text |
Description |
|
200 |
OK |
The request was successful. |
|
400 |
BAD_REQUEST |
Invalid request. Check your request parameters. |
|
408 |
PERMISSION_DENY |
Your account may be unauthorized, have an overdue payment, or the service is not activated. |
|
500 |
GENERAL_ERROR |
Internal server error. Retry the request. If the error persists, contact Online Support. |
|
581 |
TIMEOUT |
Request timed out. Retry the request. If the error persists, contact Online Support. |
|
588 |
EXCEED_QUOTA |
Rate limit exceeded. Reduce your request frequency. |
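The guidance in the table suggests retrying 500 and 581, slowing down on 588, and failing fast on 400 and 408. The sketch below applies exponential backoff to the first three; the function name, attempt count, and delays are illustrative assumptions, and `send` stands for any callable that performs the request and returns the parsed response.

```python
import time

# Codes that the table above suggests retrying or slowing down for.
RETRYABLE_CODES = {500, 581, 588}


def call_with_retry(send, max_attempts: int = 3, base_delay: float = 0.5,
                    sleep=time.sleep) -> dict:
    """Call `send()` until it returns Code 200 or the attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        code = response.get("Code")
        if code == 200:
            return response
        last_attempt = attempt == max_attempts - 1
        if code not in RETRYABLE_CODES or last_attempt:
            raise RuntimeError(
                f"request failed with code {code}: {response.get('Message')}"
            )
        # Exponential backoff: base_delay, then 2x, 4x, and so on.
        sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Codes 400 and 408 are raised immediately because a retry cannot fix an invalid request or a permission problem.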