Text moderation with LLMs (AI Guardrails)

Text Moderation 2.0 uses large language models (LLMs) to detect inappropriate text. Compared to rule-based approaches, LLMs identify complex and subtle violations with greater accuracy.

Important

To share feedback or feature requests, contact your account manager.

Services

The following LLM-based text moderation services are available:

Service	Description	Use cases
Service name: UGC Text Moderation Large Model Service_Professional Edition Service: `ugc_moderation_byllm_pro`	Professional edition of the LLM-based UGC text moderation service. Provides more granular risk labels for fine-grained content analysis. For a detailed list of detectable items, see the Content Moderation console.	Use when you need fine-grained risk categorization for UGC content.
Service name: LLM-based Text Moderation Service in UGC Scenarios Service: `ugc_moderation_byllm`	LLM-based text moderation service for UGC scenarios. For a detailed list of detectable items, see the Content Moderation console.	General-purpose UGC text moderation.
Service name: Cross-border UGC text moderation (LLM) Service: `ugc_moderation_byllm_cb`	Cross-border UGC text moderation service supporting 119 languages, including Chinese, English, Spanish, French, Portuguese, Italian, Arabic, Japanese, Korean, Indonesian, Russian, Vietnamese, German, and Thai. For a detailed list of detectable items, see the Content Moderation console.	Use for multilingual UGC content moderation across regions.
Service name: LLM-based Text Moderation Service in AIGC Scenarios Service: `aigc_moderation_byllm`	Text moderation service designed for AI-generated content (AIGC) scenarios. For a detailed list of detectable items, see the Content Moderation console.	Use for moderating LLM-generated or AI-created content.

Billing

The LLM-based text moderation service supports two billing methods: pay-as-you-go and resource plans.

Pay-as-you-go

When you activate the Content Moderation Enhanced Edition service, pay-as-you-go is the default billing method. You are billed daily based on your actual usage. You are not charged if you do not use the service.

Moderation type	Services	Unit price
LLM-based text moderation (Standard) (text_llm_standard)	UGC Text Moderation Large Model Service_Professional Edition: ugc_moderation_byllm_pro LLM-based Text Moderation Service in UGC Scenarios: ugc_moderation_byllm Cross-border UGC text moderation (LLM): ugc_moderation_byllm_cb LLM-based Text Moderation Service in AIGC Scenarios: aigc_moderation_byllm	CNY 20.00 per 10,000 calls Note You are charged for each call to any of the services on the left. For example, if you make 100 calls to the UGC Text Moderation Large Model Service_Professional Edition, you are charged CNY 0.20.
LLM-based text moderation (Advanced) (text_llm_advanced)	Text translation feature	CNY 40.00 per 10,000 calls per 500 characters Note Text translation feature: After the text translation feature is enabled, each request is billed once per 500 characters.

Note

For the pay-as-you-go billing method of Content Moderation Enhanced Edition, the system generates bills every 24 hours. In your billing details, the moderationType field corresponds to the moderation type. You can view your billing details.
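The Standard unit price can be turned into a quick estimate. The following is a minimal sketch, not an official billing calculator: the function name `standard_cost_cny` is hypothetical, and rounding to two decimal places is an assumption, since the page does not say how charges are rounded.

```python
from decimal import Decimal

# Unit price from the table above: CNY 20.00 per 10,000 calls (text_llm_standard).
STANDARD_PRICE_PER_10K = Decimal("20.00")


def standard_cost_cny(calls: int) -> Decimal:
    """Estimate the pay-as-you-go charge in CNY for a number of billable calls."""
    if calls < 0:
        raise ValueError("calls must be non-negative")
    cost = STANDARD_PRICE_PER_10K * calls / Decimal(10000)
    # Assumption: round to two decimal places; the real bill may round differently.
    return cost.quantize(Decimal("0.01"))


print(standard_cost_cny(100))  # 0.20, matching the example in the table
```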

Resource plans

For high or consistent moderation volumes, resource plans offer significant discounts. You can purchase and stack multiple plans. For more information, see Purchase a resource plan for Content Moderation Enhanced Edition.

This resource plan offsets the usage of Content Moderation Enhanced Edition and cannot be shared with resource plans for Content Moderation V1.0. The following table lists the offset factors.

Moderation type	Offset factor
LLM-based text moderation (Standard) (text_llm_standard)	Each successful API call consumes 2.67 calls from your resource plan. Note For example, if your resource plan has a quota of 10 calls, one successful API call consumes 2.67 calls, leaving 7.33 calls in your plan.
LLM-based text moderation (Advanced) (text_llm_advanced)	Each successful API call consumes 5.34 calls from your resource plan. Note For example, if your resource plan has a quota of 10 calls, one successful API call consumes 5.34 calls, leaving 4.66 calls in your plan.

After you purchase a resource plan, your API usage for Content Moderation Enhanced Edition is first deducted from your resource plan. When your resource plan is depleted, subsequent usage is billed on a pay-as-you-go basis. Monitor your remaining plan balance and pay-as-you-go charges. You can set low-balance alerts in the Resource Plan system.
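The offset arithmetic can be checked with a short calculation. This is an illustrative sketch only: `remaining_quota` is a hypothetical helper, and it simply clamps the balance at zero rather than modeling the switch to pay-as-you-go.

```python
from decimal import Decimal

# Offset factors from the table above.
OFFSET_FACTORS = {
    "text_llm_standard": Decimal("2.67"),
    "text_llm_advanced": Decimal("5.34"),
}


def remaining_quota(quota: int, moderation_type: str, successful_calls: int) -> Decimal:
    """Return the plan balance left after a number of successful API calls."""
    used = OFFSET_FACTORS[moderation_type] * successful_calls
    # A depleted plan is reported as 0; further usage would be billed pay-as-you-go.
    return max(Decimal(quota) - used, Decimal(0))


print(remaining_quota(10, "text_llm_standard", 1))  # 7.33
print(remaining_quota(10, "text_llm_advanced", 1))  # 4.66
```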

Risk labels

Label definitions

Text Moderation 2.0 supports over 60 granular labels across 10 risk categories, and returns a confidence score (0–100, where a higher score indicates greater confidence) for each. If content contains multiple risk types, the service returns multiple granular labels. The following tables list the risk label values, their corresponding confidence score ranges, and their meanings.

Risk labels for services in the Chinese mainland:

Label	Confidence score	Description
violent_extremism	0–100	Suspected extremist or violent/gory behavior
violent_weapons	0–100	Suspected weapons and ammunition
pornographic_special_taste	0–100	Suspected pornographic fetish
sexual_terms_activity	0–100	Suspected description of sexual acts
pornographic_adult_works	0–100	Suspected erotic works
sexual_terms_suggestive	0–100	Suspected sexual suggestion
pornographic_adult_activity	0–100	Suspected obscene or pornographic content
sexual_terms_sex	0–100	Suspected sexual education content
pornographic_adult_goods	0–100	Suspected adult products
pornographic_adult_trade	0–100	Suspected erotic trade
sexual_suggestive_rude	0–100	Suspected soft-core or vulgar content
sexual_suggestive_hint	0–100	Suspected soft pornographic content
sexual_terms_offend	0–100	Suspected sexual harassment content
sexual_terms_physical	0–100	Suspected sexual physiology education content
sexual_terms_kiss	0–100	Suspected inappropriate kissing depiction
sexual_terms_animal	0–100	Suspected animal reproduction education
pornographic_lgbtq_group	0–100	Suspected LGBTQ+ content
contraband_drug	0–100	Suspected drug-related content
contraband_gambling	0–100	Suspected gambling-related content
contraband_entity	0–100	Suspected prohibited tools
contraband_act_threat	0–100	Suspected threatening behavior
contraband_act_law	0–100	Suspected criminal activity
contraband_fraud	0–100	Suspected pyramid selling or fraud
customized_p	0–100	Customized content by client
special_language	0–100	Special language
inappropriate_minor_sex	0–100	Suspected minor sexual content
inappropriate_minor_behavior	0–100	Suspected inappropriate minor behavior
inappropriate_minor_abuse	0–100	Suspected abuse or exploitation of minors
inappropriate_suicide	0–100	Suspected self-harm or suicide content
inappropriate_minor_safty	0–100	Suspected content endangering minor safety
inappropriate_minor_relationship	0–100	Suspected inappropriate minor socializing
inappropriate_minor_phychology	0–100	Suspected content affecting minor mental health
inappropriate_discrimination	0–100	Suspected prejudiced or discriminatory content
inappropriate_superstition	0–100	Suspected superstitious content
inappropriate_minor_addiction	0–100	Suspected minor anti-addiction circumvention
inappropriate_oral	0–100	Suspected vulgar catchphrase content
inappropriate_profanity	0–100	Suspected abusive or insulting content
inappropriate_nonsense	0–100	Suspected meaningless spam content
inappropriate_minor_mention	0–100	Suspected mention of minors
inappropriate_ethics	0–100	Suspected unethical content
political_country_humans	0–100	Suspected country personification
political_past_coreleader	0–100	Suspected content about past core national leaders
political_sensitive_event	0–100	Suspected other politically prohibited events
political_rights_conflict	0–100	Suspected rights defense conflicts
political_foreign_leader	0–100	Suspected current or past foreign leaders
political_cn_ideology	0–100	Suspected ideological violations
political_event_internationality	0–100	Suspected modern political events or international relations
political_cn_otherleader	0–100	Suspected other major Chinese leaders
political_private_family	0–100	Suspected undisclosed family members of core leaders
political_negative_group	0–100	Suspected negative figures or groups
political_known_family	0–100	Suspected disclosed family members of core leaders
political_limited_event	0–100	Suspected major politically prohibited events
political_unproper_coreleader	0–100	Suspected inappropriate depiction of core leaders
political_current_coreleader	0–100	Suspected content about the current national chairman
political_cn_separatism	0–100	Suspected China territorial separatism
political_cn_entity	0–100	Suspected political entity
political_a	0–100	Enhanced detection of high-priority political content
privacy_b	0–100	Suspected commercial sensitive data
privacy_p	0–100	Suspected personal privacy information
pt_by_spam	0–100	Suspected spam advertising
pt_to_sites	0–100	Suspected redirection to external sites
pt_to_phone	0–100	Suspected phone number included
pt_to_contact	0–100	Suspected advertising contact number
pt_by_tradeingame	0–100	Suspected in-game trade advertising
religion_b	0–100	Suspected content related to Buddhism
religion_c	0–100	Suspected content related to Christianity
religion_t	0–100	Suspected content related to Taoism
religion_h	0–100	Suspected content related to Hinduism
religion_i	0–100	Suspected content related to Islam
violent_extremist	0–100	Suspected extremist organization
violent_incidents	0–100	Suspected extremist content
sexual_suggestive	0–100	Suspected vulgar content
pornographic_adult	0–100	Suspected pornographic content
sexual_terms	0–100	Suspected sexual health content
contraband_act	0–100	Suspected prohibited behavior
inappropriate_minor	0–100	Suspected minor-inappropriate content
inappropriate_oral	0–100	Suspected vulgar catchphrase content
political_entity	0–100	Suspected political entity
political_limited_event	0–100	Suspected major politically prohibited events
political_current_coreleader	0–100	Suspected content about current core leader
political_n	0–100	Suspected sensitive political content
political_other_negative	0–100	Suspected other illegal groups
political_p	0–100	Suspected politically prohibited figure
political_main_negative	0–100	Suspected major illegal groups
political_figure	0–100	Suspected political figure
pt_by_recruitment	0–100	Suspected ads for part-time jobs or online money-making schemes

Risk labels for services for international markets:

Label	Confidence score	Description
pornographic_adult	0–100	Suspected pornographic content
sexual_terms	0–100	Suspected sexual health content
sexual_suggestive	0–100	Suspected vulgar content
sexual_orientation	0–100	Suspected content related to sexual orientation
regional_cn	0–100	Suspected politically sensitive content related to the Chinese mainland
regional_illegal	0–100	Suspected illegal political content
regional_controversial	0–100	Suspected political controversy
regional_racism	0–100	Suspected racism
violent_extremist	0–100	Suspected extremist organization
violent_incidents	0–100	Suspected extremist content
violent_weapons	0–100	Suspected weapons and ammunition
violence_unscList	0–100	United Nations sanctions list
contraband_drug	0–100	Suspected drug-related content
contraband_gambling	0–100	Suspected gambling-related content
inappropriate_ethics	0–100	Suspected unethical content
inappropriate_profanity	0–100	Suspected offensive or abusive content
inappropriate_oral	0–100	Suspected vulgar language
inappropriate_religion	0–100	Suspected religious blasphemy
pt_to_contact	0–100	Suspected contact information for advertising
pt_to_sites	0–100	Suspected redirection to external sites
customized	0–100	Hit a custom keyword list

Configure risk labels

Enable or disable risk labels in the console. You can also adjust the detection scope for specific labels. See the Content Moderation console for details.

In the left navigation pane, choose Machine Moderation V2.0 > Text Moderation > Rules.
On the Rules Management tab, find a large model moderation solution, for example, aigc_moderation_byllm, and click Set Thesaurus in the Operation column.
1. Select a detection type to configure, such as inappropriate content detection.
2. Click Edit and modify the detection settings.
3. Click Save. The new configuration takes effect in the production environment in 2 to 5 minutes.

Integration

Step 1: Activate the service

To activate the Text Moderation Plus service, go to the Activate Service page.

Step 2: Grant permissions to a RAM user

Before using the SDK or calling an API, grant the required permissions to a RAM user. Create an AccessKey pair for your Alibaba Cloud account or a RAM user to authenticate API calls. For instructions, see Obtain an access key.

Log on to the RAM console using your Alibaba Cloud account.
Create a RAM user. For details, see Create a RAM user.
Grant the AliyunYundunGreenWebFullAccess system policy to the RAM user. This policy grants full access to Content Moderation. For details, see Manage RAM user permissions.

The RAM user can now call the Content Moderation API.

Step 3: Install and integrate the SDK

For the SDK integration guide, see TextModerationPlus 2.0 PLUS Service SDK and Integration Guide.

API reference

Overview

Use the TextModerationPlus operation to create a text content moderation task. For HTTP request construction, see Request Structure. You can also use a pre-constructed request as described in the Getting Started guide.

You can test this operation in OpenAPI Explorer without manual signature calculation. After you test a call, OpenAPI Explorer generates SDK code examples automatically.

Service interface: TextModerationPlus
Supported regions and endpoints:

Region	Public endpoint	VPC endpoint	Supported services
China (Shanghai)	green-cip.cn-shanghai.aliyuncs.com	green-cip-vpc.cn-shanghai.aliyuncs.com	ugc_moderation_byllm_pro, ugc_moderation_byllm, aigc_moderation_byllm
China (Beijing)	green-cip.cn-beijing.aliyuncs.com	green-cip-vpc.cn-beijing.aliyuncs.com	ugc_moderation_byllm_pro, ugc_moderation_byllm, aigc_moderation_byllm
China (Hangzhou)	green-cip.cn-hangzhou.aliyuncs.com	green-cip-vpc.cn-hangzhou.aliyuncs.com	ugc_moderation_byllm_pro, ugc_moderation_byllm, aigc_moderation_byllm
China (Shenzhen)	green-cip.cn-shenzhen.aliyuncs.com	green-cip-vpc.cn-shenzhen.aliyuncs.com	ugc_moderation_byllm_pro, ugc_moderation_byllm, aigc_moderation_byllm
China (Chengdu)	green-cip.cn-chengdu.aliyuncs.com	Not available	ugc_moderation_byllm_pro, ugc_moderation_byllm, aigc_moderation_byllm
China (Hong Kong)	green-cip.cn-hongkong.aliyuncs.com	green-cip-vpc.cn-hongkong.aliyuncs.com	ugc_moderation_byllm_cb
Singapore	green-cip.ap-southeast-1.aliyuncs.com	green-cip-vpc.ap-southeast-1.aliyuncs.com	ugc_moderation_byllm_cb
US (Virginia)	green-cip.us-east-1.aliyuncs.com	green-cip-vpc.us-east-1.aliyuncs.com	ugc_moderation_byllm_cb
Germany (Frankfurt)	green-cip.eu-central-1.aliyuncs.com	green-cip-vpc.eu-central-1.aliyuncs.com	ugc_moderation_byllm_cb

Important

For the Germany (Frankfurt) and China (Hong Kong) regions, nodes in the Singapore region perform text moderation inference. The service processes inference results, data, and logs locally in the Germany (Frankfurt) and China (Hong Kong) regions.
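The endpoints in the table follow a regular naming pattern, so a client can derive the host name from the region ID. The sketch below assumes that pattern holds only for the regions listed; `endpoint_for` is a hypothetical helper, not part of any SDK.

```python
# Region IDs taken from the host names in the endpoint table above.
REGIONS = {
    "cn-shanghai", "cn-beijing", "cn-hangzhou", "cn-shenzhen", "cn-chengdu",
    "cn-hongkong", "ap-southeast-1", "us-east-1", "eu-central-1",
}
# The table lists no VPC endpoint for China (Chengdu).
NO_VPC_REGIONS = {"cn-chengdu"}


def endpoint_for(region: str, vpc: bool = False) -> str:
    """Return the public or VPC endpoint host name for a listed region."""
    if region not in REGIONS:
        raise ValueError(f"region not listed in the endpoint table: {region}")
    if vpc:
        if region in NO_VPC_REGIONS:
            raise ValueError(f"no VPC endpoint is listed for {region}")
        return f"green-cip-vpc.{region}.aliyuncs.com"
    return f"green-cip.{region}.aliyuncs.com"


print(endpoint_for("cn-shanghai"))               # green-cip.cn-shanghai.aliyuncs.com
print(endpoint_for("ap-southeast-1", vpc=True))  # green-cip-vpc.ap-southeast-1.aliyuncs.com
```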

Billing: This operation is billed. You are charged only for requests that return an HTTP status code of 200. Requests that return any other status code are not billed. For more information about billing, see Pricing.

QPS limit

The default rate limit is 50 requests per second per account. Exceeding this limit triggers throttling, which may disrupt your application. To request a higher rate limit, contact your account manager.
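One way to stay under the limit is to space requests on the client side. The following is a rough, single-threaded sketch: the `Throttle` class is hypothetical, it is not thread-safe, and it only enforces a minimum gap between calls in one process, so several clients sharing an account would still need to coordinate.

```python
import time


class Throttle:
    """Keep consecutive calls at least 1 / max_per_second seconds apart."""

    def __init__(self, max_per_second: float = 50, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / max_per_second
        self._clock = clock          # injectable for testing
        self._sleep = sleep          # injectable for testing
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next call is allowed and return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following slot relative to when this call actually runs.
        self._next_allowed = max(now, self._next_allowed) + self._interval
        return delay


throttle = Throttle(max_per_second=50)
for _ in range(3):
    throttle.wait()  # call the moderation API after each wait()
```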

Request parameters

Parameter	Type	Required	Example	Description
Service	String	Yes	ugc_moderation_byllm	ugc_moderation_byllm_pro: UGC Text Moderation Large Model Service_Professional Edition ugc_moderation_byllm: LLM-based Text Moderation Service in UGC Scenarios ugc_moderation_byllm_cb: Cross-border UGC text moderation (LLM) aigc_moderation_byllm: LLM-based Text Moderation Service in AIGC Scenarios
ServiceParameters	JSONString	Yes		The moderation service parameters, specified as a JSON string. For details, see ServiceParameters.

Table 1. ServiceParameters

Parameter	Type	Required	Example	Description
content	String	Yes	testing content	The text content to moderate. The content can be up to 2,000 characters in length.
dataId	String	No	text0424****	A unique identifier for your business data. Maximum 64 characters. Allowed characters: letters, digits, underscores (_), hyphens (-), and periods (.).
accountId	String	No	ID0728****	The account ID of the end user on your platform. Use this parameter to link results to a specific user. For example, if user A chats with user B, pass A's ID for A's messages and B's ID for B's messages. Note This parameter enables context-aware moderation. To activate this feature, contact your account manager or submit a ticket.
infoType	String	No	llmContent	The type of supplementary information to retrieve. Valid values: `llmContent`: Returns the raw detection result from the LLM.
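Because ServiceParameters is passed as a JSON string, it helps to validate the fields and serialize them in one place. This is a sketch under stated assumptions: `build_request` is a hypothetical helper, "letters" is taken to mean ASCII letters, and the 2,000-character limit is checked with Python's `len`, which may not match how the service counts characters.

```python
import json
import re

MAX_CONTENT_CHARS = 2000
# dataId: up to 64 letters, digits, underscores, hyphens, and periods.
DATA_ID_PATTERN = re.compile(r"[A-Za-z0-9_.\-]{1,64}")


def build_request(service, content, data_id=None, account_id=None, info_type=None):
    """Return the two top-level request parameters as a dict."""
    if not content:
        raise ValueError("content is required")
    if len(content) > MAX_CONTENT_CHARS:
        raise ValueError("content exceeds 2,000 characters")
    if data_id is not None and not DATA_ID_PATTERN.fullmatch(data_id):
        raise ValueError("dataId has invalid characters or is too long")

    params = {"content": content}
    # Optional fields are included only when supplied.
    for key, value in (("dataId", data_id), ("accountId", account_id), ("infoType", info_type)):
        if value is not None:
            params[key] = value

    # ServiceParameters is a JSON string, not a nested object.
    return {"Service": service, "ServiceParameters": json.dumps(params, ensure_ascii=False)}


print(build_request("ugc_moderation_byllm", "testing content", data_id="text0424"))
```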

Response parameters

Parameter	Type	Example	Description
Code	Integer	200	The HTTP status code. For more information, see Status codes.
Data	JSONObject	{"Result":[...]}	The moderation result data. For more information, see Data.
Message	String	OK	The result message for the request.
RequestId	String	AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****	The request ID.

Table 2. Data

Parameter	Type	Example	Description
Result	JSONArray		The detection results, including risk labels and confidence scores. For more information, see Result.
RiskLevel	String	high	The risk level, which is determined based on the configured high and low risk score thresholds. Valid values: `high`: High risk (If the content matches a custom library, the risk level is `high`.) `medium`: Medium risk `low`: Low risk `none`: No risk detected Note Take immediate action on `high`-risk content and manually review `medium`-risk content. Handle `low`-risk content only if you require a high recall rate; otherwise, treat it the same as content with a `none` risk level. You can configure risk score thresholds in the Content Moderation console.
DataId	String	text0424****	The data ID of the moderated content. Note If you specified the `dataId` parameter in the request, the same value is returned in this field.
AccountId	String	10123****	The account ID. Note If you specified the `accountId` parameter in the request, the same value is returned in this field.
Ext	Object		Supplementary information for the text. For more information, see Ext.
TranslatedContent	String		The translated text content. Returned only when the text translation feature is enabled. Note The text translation feature is currently available only in the Singapore region. You can configure it by managing detection rules in the console. Additional charges apply.
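The handling guidance in the RiskLevel note can be expressed as a small decision function. This is only an illustration: the action names `block`, `review`, and `pass` and the `high_recall` flag are hypothetical, and treating `low` as a review case under high recall is one possible reading of "handle".

```python
def action_for(risk_level: str, high_recall: bool = False) -> str:
    """Map a RiskLevel value to a handling action, following the note above."""
    if risk_level == "high":
        return "block"       # take immediate action
    if risk_level == "medium":
        return "review"      # send to manual review
    if risk_level == "low":
        # Handle low-risk content only when a high recall rate is required.
        return "review" if high_recall else "pass"
    if risk_level == "none":
        return "pass"
    raise ValueError(f"unexpected RiskLevel: {risk_level}")


print(action_for("medium"))                 # review
print(action_for("low"))                    # pass
print(action_for("low", high_recall=True))  # review
```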

Table 3. Result

Parameter	Type	Example	Description
Label	String	political_xxx	The risk label for the moderated content. Multiple labels and scores can be returned. For a list of supported labels, see Risk labels.
Description	String	Suspected pornographic content	A description of the `Label` field. Important This field is for reference only and may change. For your handling logic, use the `Label` field instead of this one.
Confidence	Float	81.22	The confidence score, which ranges from 0 to 100. The value is accurate to two decimal places. Some labels do not return a confidence score.
RiskWords	String	AA,BB,CC	The detected risk words, separated by commas. This field is not returned for some labels.
CustomizedHit	JSONArray	[{"LibName":"...","Keywords":"..."}]	If the content matches an entry in a custom library, the `Label` is `customized`, and this field returns the library name and the matched keywords. For more information, see CustomizedHit.
RiskPositions	JSONArray		Information about the position of the detected risk words. For more information, see RiskPositions.
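Since several Result fields are optional, client code should read them defensively. The sketch below is a hypothetical helper, `summarize_results`, that uses the `RiskWords` key as spelled in the response examples and tolerates a missing `Confidence`, `RiskWords`, or `CustomizedHit`.

```python
def summarize_results(results, min_confidence=0.0):
    """Flatten Result items into simple dicts, skipping low-confidence labels."""
    summary = []
    for item in results:
        confidence = item.get("Confidence")  # some labels return no score
        if confidence is not None and confidence < min_confidence:
            continue
        raw_words = item.get("RiskWords") or ""
        summary.append({
            "label": item["Label"],  # use Label, not Description, in handling logic
            "confidence": confidence,
            "risk_words": [w for w in raw_words.split(",") if w],
            "custom_libs": [hit["LibName"] for hit in item.get("CustomizedHit", [])],
        })
    return summary


sample = [
    {"Label": "political_entity", "Confidence": 100.0, "RiskWords": "WordA,WordB"},
    {"Label": "customized", "Confidence": 100,
     "CustomizedHit": [{"LibName": "Custom Library Name 1", "Keywords": "custom keyword"}]},
]
for row in summarize_results(sample, min_confidence=80):
    print(row)
```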

Table 4. CustomizedHit

Parameter	Type	Example	Description
LibName	String	Custom Library 1	The name of the custom library.
Keywords	String	Custom Keyword 1,Custom Keyword 2	The matched custom keywords, separated by commas.

Table 5. RiskPositions

Parameter	Type	Example	Description
RiskWord	String	AA	The detected risk word.
StartPos	Integer	10	The start position of the risk word in the text.
EndPos	Integer	12	The end position of the risk word in the text.

Table 6. Ext

Parameter	Type	Example	Description
LlmContent	Object		The raw detection result from the LLM. For more information, see LlmContent.

Table 7. LlmContent

Parameter	Type	Example	Description
OutputText	String	Suspected vulgar language	The raw detection result from the LLM-based text moderation model.

Examples

Request example:

{
    "Service": "ugc_moderation_byllm_pro",
    "ServiceParameters": {
        "content": "testing content",
        "dataId": "text0424****"
    }
}

Response examples:

System policy match:

{
    "Code": 200,
    "Data": {
        "Result": [
            {
                "Label": "political_entity",
                "Description": "Suspected political entity",
                "Confidence": 100.0,
                "RiskWords": "WordA,WordB",
                "RiskPositions": [
                    {
                        "EndPos": 16,
                        "RiskWord": "WordA",
                        "StartPos": 14
                    }
                ]
            },
            {
                "Label": "political_figure",
                "Description": "Suspected political figure",
                "Confidence": 100.0,
                "RiskWords": "WordB,WordC",
                "RiskPositions": [
                    {
                        "EndPos": 26,
                        "RiskWord": "WordC",
                        "StartPos": 24
                    }
                ]
            }
        ],
        "RiskLevel": "high",
        "DataId": "text0424****"
    },
    "Message": "OK",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}

Custom library match:

{
    "Code": 200,
    "Data": {
        "Result": [
            {
                "Description": "Hit a custom library",
                "CustomizedHit": [
                     {
                        "LibName": "Custom Library Name 1",
                        "Keywords": "custom keyword"
                     }
                ],
                "Confidence": 100,
                "Label": "customized"
             }
        ],
        "RiskLevel": "high",
        "DataId": "text0424****"
    },
    "Message": "OK",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}

Raw LLM result:

{
  "RequestId": "ZZZZZ-2024-0307-FORYOU-EVER",
  "Message": "OK",
  "Data": {
    "Ext": {
      "LlmContent": {
        "OutputText": "Suspected offensive or abusive content"
      }
    },
    "Result": [
      {
        "RiskWords": "risk word",
        "Description": "Suspected offensive or abusive content",
        "Confidence": 100.0,
        "Label": "inappropriate_profanity",
        "RiskPositions": [
          {
            "RiskWord": "risk word",
            "EndPos": 5,
            "StartPos": 2
          }
        ]
      }
    ],
    "RiskLevel": "high"
  },
  "Code": 200
}

Status codes

Code	Status text	Description
200	OK	The request was successful.
400	BAD_REQUEST	Invalid request. Check your request parameters.
408	PERMISSION_DENY	Your account may be unauthorized, have an overdue payment, or the service is not activated.
500	GENERAL_ERROR	Internal server error. Retry the request. If the error persists, contact Online Support.
581	TIMEOUT	Request timed out. Retry the request. If the error persists, contact Online Support.
588	EXCEED_QUOTA	Rate limit exceeded. Reduce your request frequency.
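The status code table suggests a simple client-side policy: retry on 500 and 581, slow down on 588, and stop on 400 and 408. The sketch below encodes that reading; the step names and the `next_step` helper are hypothetical, and treating every unlisted code as a failure is an assumption.

```python
RETRYABLE_CODES = {500, 581}   # GENERAL_ERROR, TIMEOUT: the table says to retry
THROTTLED_CODES = {588}        # EXCEED_QUOTA: reduce request frequency


def next_step(code: int) -> str:
    """Decide what to do with a response based on its Code field."""
    if code == 200:
        return "ok"
    if code in RETRYABLE_CODES:
        return "retry"
    if code in THROTTLED_CODES:
        return "slow_down"
    # 400 (BAD_REQUEST), 408 (PERMISSION_DENY), and anything unlisted need
    # a fix on the caller's side, so retrying unchanged is not useful.
    return "fail"


for code in (200, 400, 408, 500, 581, 588):
    print(code, next_step(code))
```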