Text Moderation with LLMs

Content Moderation 2.0 uses large language models (LLMs) to detect inappropriate text. Compared to rule-based approaches, LLMs identify complex and subtle violations with greater accuracy.

Important

To share feedback or feature requests, contact your account manager.

Services

The following LLM-based text moderation services are available:

| Service name | Service | Description | Use cases |
| --- | --- | --- | --- |
| UGC Text Moderation Large Model Service_Professional Edition | ugc_moderation_byllm_pro | Professional edition of the LLM-based UGC text moderation service. Provides more granular risk labels for fine-grained content analysis. For a detailed list of detectable items, see the Content Moderation console. | Use when you need fine-grained risk categorization for UGC content. |
| LLM-based Text Moderation Service in UGC Scenarios | ugc_moderation_byllm | LLM-based text moderation service for UGC scenarios. For a detailed list of detectable items, see the Content Moderation console. | General-purpose UGC text moderation. |
| Cross-border UGC text moderation (LLM) | ugc_moderation_byllm_cb | Cross-border UGC text moderation service supporting 119 languages, including Chinese, English, Spanish, French, Portuguese, Italian, Arabic, Japanese, Korean, Indonesian, Russian, Vietnamese, German, and Thai. For a detailed list of detectable items, see the Content Moderation console. | Use for multilingual UGC content moderation across regions. |
| LLM-based Text Moderation Service in AIGC Scenarios | aigc_moderation_byllm | Text moderation service designed for AI-generated content (AIGC) scenarios. For a detailed list of detectable items, see the Content Moderation console. | Use for moderating LLM-generated or AI-created content. |

Billing

The LLM-based text moderation service supports two billing methods: pay-as-you-go and resource plans.

Pay-as-you-go

When you activate the Content Moderation Enhanced Edition service, pay-as-you-go is the default billing method. You are billed daily based on your actual usage. You are not charged if you do not use the service.

Moderation type

Services

Unit price

LLM-based text moderation (Standard) (text_llm_standard)

  • UGC Text Moderation Large Model Service_Professional Edition: ugc_moderation_byllm_pro

  • LLM-based Text Moderation Service in UGC Scenarios: ugc_moderation_byllm

  • Cross-border UGC text moderation (LLM): ugc_moderation_byllm_cb

  • LLM-based Text Moderation Service in AIGC Scenarios: aigc_moderation_byllm

CNY 20.00 per 10,000 calls

Note

You are charged for each call to any of the services listed above. For example, if you make 100 calls to the UGC Text Moderation Large Model Service_Professional Edition, you are charged CNY 0.20.

LLM-based text moderation (Advanced) (text_llm_advanced)

  • Text translation feature

CNY 40.00 per 10,000 calls per 1,000 characters

Note
  • Text translation feature: After the text translation feature is enabled, each request is billed once per 500 characters.

Note

For the pay-as-you-go billing method of Content Moderation Enhanced Edition, the system generates bills every 24 hours. In your billing details, the moderationType field corresponds to the moderation type. You can view your billing details.
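
The standard-tier arithmetic above can be sketched in a few lines. This is a minimal illustration, not an official billing formula: the function name `standard_cost` is hypothetical, and rounding to two decimal places is an assumption, since the page does not say how fractional amounts are rounded on an actual bill.

```python
from decimal import Decimal

# Unit price from the pricing table: CNY 20.00 per 10,000 calls (text_llm_standard).
STANDARD_PRICE_PER_10K = Decimal("20.00")


def standard_cost(calls: int) -> Decimal:
    """Return the estimated pay-as-you-go charge in CNY for `calls` billable calls."""
    if calls < 0:
        raise ValueError("calls must be non-negative")
    cost = STANDARD_PRICE_PER_10K * calls / Decimal(10000)
    # Assumption: round to cents for display; real bills may round differently.
    return cost.quantize(Decimal("0.01"))


print(standard_cost(100))  # 0.20, matching the example in the note above
```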

Resource plans

For high or consistent moderation volumes, resource plans offer significant discounts. You can purchase and stack multiple plans. For more information, see Purchase a resource plan for Content Moderation Enhanced Edition.

This resource plan offsets the usage of Content Moderation Enhanced Edition and cannot be shared with resource plans for Content Moderation V1.0. The following table lists the offset factors.

Moderation type

Offset factor

LLM-based text moderation (Standard) (text_llm_standard)

Each successful API call consumes 2.67 calls from your resource plan.

Note

For example, if your resource plan has a quota of 10 calls, one successful API call consumes 2.67 calls, leaving 7.33 calls in your plan.

LLM-based text moderation (Advanced) (text_llm_advanced)

Each successful API call consumes 5.34 calls from your resource plan.

Note

For example, if your resource plan has a quota of 10 calls, one successful API call consumes 5.34 calls, leaving 4.66 calls in your plan.

After you purchase a resource plan, your API usage for Content Moderation Enhanced Edition is first deducted from your resource plan. When your resource plan is depleted, subsequent usage is billed on a pay-as-you-go basis. Monitor your remaining plan balance and pay-as-you-go charges. You can set low-balance alerts in the Resource Plan system.
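
To make the deduction order concrete, the following sketch applies the offset factors to a plan quota and reports how many calls spill over to pay-as-you-go. The function name `apply_calls` is hypothetical, and the sketch assumes a call is covered by the plan only when the remaining quota can absorb the full offset factor; the page does not describe how a partially covered call is handled.

```python
from decimal import Decimal

# Offset factors from the table above.
OFFSET_FACTOR = {
    "text_llm_standard": Decimal("2.67"),
    "text_llm_advanced": Decimal("5.34"),
}


def apply_calls(quota: Decimal, moderation_type: str, calls: int):
    """Return (remaining_quota, calls_billed_pay_as_you_go)."""
    factor = OFFSET_FACTOR[moderation_type]
    # Assumption: only whole calls can be covered by the plan.
    covered = min(calls, int(quota // factor))
    remaining = quota - factor * covered
    return remaining, calls - covered


print(apply_calls(Decimal("10"), "text_llm_standard", 1))  # remaining 7.33, 0 overflow calls
print(apply_calls(Decimal("10"), "text_llm_advanced", 1))  # remaining 4.66, 0 overflow calls
```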

Risk labels

Label definitions

Text Moderation 2.0 supports over 60 granular labels across 10 risk categories, and returns a confidence score (0–100, where a higher score indicates greater confidence) for each. If content contains multiple risk types, the service returns multiple granular labels. The following tables list the risk label values, their corresponding confidence score ranges, and their meanings.

  • Risk labels for services in the Chinese mainland:

| Label | Confidence score | Description |
| --- | --- | --- |
| pornographic_adult | 0–100 | Suspected pornographic content |
| sexual_terms | 0–100 | Suspected sexual health content |
| sexual_suggestive | 0–100 | Suspected vulgar content |
| political_figure | 0–100 | Suspected political figure |
| political_entity | 0–100 | Suspected political entity |
| political_n | 0–100 | Suspected sensitive political content |
| political_p | 0–100 | Suspected politically prohibited figure |
| political_a | 0–100 | Enhanced detection of high-priority political content |
| violent_extremist | 0–100 | Suspected extremist organization |
| violent_incidents | 0–100 | Suspected extremist content |
| violent_weapons | 0–100 | Suspected weapons and ammunition |
| contraband_drug | 0–100 | Suspected drug-related content |
| contraband_gambling | 0–100 | Suspected gambling-related content |
| contraband_act | 0–100 | Suspected prohibited behavior |
| contraband_entity | 0–100 | Suspected prohibited tools |
| inappropriate_discrimination | 0–100 | Suspected biased or discriminatory content |
| inappropriate_ethics | 0–100 | Suspected unethical content |
| inappropriate_profanity | 0–100 | Suspected offensive or abusive content |
| inappropriate_oral | 0–100 | Suspected vulgar language |
| inappropriate_superstition | 0–100 | Suspected superstitious content |
| inappropriate_nonsense | 0–100 | Suspected spam or meaningless content |
| pt_to_sites | 0–100 | Suspected redirection to external sites |
| pt_by_recruitment | 0–100 | Suspected ads for part-time jobs or online money-making schemes |
| pt_to_contact | 0–100 | Suspected contact information for advertising |
| religion_b | 0–100 | Suspected content related to Buddhism |
| religion_t | 0–100 | Suspected content related to Taoism |
| religion_c | 0–100 | Suspected content related to Christianity |
| religion_i | 0–100 | Suspected content related to Islam |
| religion_h | 0–100 | Suspected content related to Hinduism |
| customized | 0–100 | Hit a custom keyword list |

  • Risk labels for services for international markets:

| Label | Confidence score | Description |
| --- | --- | --- |
| pornographic_adult | 0–100 | Suspected pornographic content |
| sexual_terms | 0–100 | Suspected sexual health content |
| sexual_suggestive | 0–100 | Suspected vulgar content |
| sexual_orientation | 0–100 | Suspected content related to sexual orientation |
| regional_cn | 0–100 | Suspected politically sensitive content related to the Chinese mainland |
| regional_illegal | 0–100 | Suspected illegal political content |
| regional_controversial | 0–100 | Suspected political controversy |
| regional_racism | 0–100 | Suspected racism |
| violent_extremist | 0–100 | Suspected extremist organization |
| violent_incidents | 0–100 | Suspected extremist content |
| violent_weapons | 0–100 | Suspected weapons and ammunition |
| violence_unscList | 0–100 | United Nations sanctions list |
| contraband_drug | 0–100 | Suspected drug-related content |
| contraband_gambling | 0–100 | Suspected gambling-related content |
| inappropriate_ethics | 0–100 | Suspected unethical content |
| inappropriate_profanity | 0–100 | Suspected offensive or abusive content |
| inappropriate_oral | 0–100 | Suspected vulgar language |
| inappropriate_religion | 0–100 | Suspected religious blasphemy |
| pt_to_contact | 0–100 | Suspected contact information for advertising |
| pt_to_sites | 0–100 | Suspected redirection to external sites |
| customized | 0–100 | Hit a custom keyword list |

Configure risk labels

Enable or disable risk labels in the console. You can also adjust the detection scope for specific labels. See the Content Moderation console for details.

  1. In the left navigation pane, choose Machine Moderation V2.0 > Text Moderation > Rules.

  2. On the Rules Management tab, find a large model moderation solution, for example, aigc_moderation_byllm, and click Set Thesaurus in the Operation column.

    1. Select a detection type to configure, such as inappropriate content detection.

    2. Click Edit and modify the detection settings.

    3. Click Save. The new configuration takes effect in the production environment in 2 to 5 minutes.

Integration

Step 1: Activate the service

Activate the Text Moderation Plus service on the service activation page.

Step 2: Grant permissions to a RAM user

Before using the SDK or calling an API, grant the required permissions to a RAM user. Create an AccessKey pair for your Alibaba Cloud account or a RAM user to authenticate API calls. For instructions, see Obtain an access key.

  1. Log on to the RAM console using your Alibaba Cloud account.

  2. Create a RAM user. For details, see Create a RAM user.

  3. Grant the AliyunYundunGreenWebFullAccess system policy to the RAM user. This policy grants full access to Content Moderation. For details, see Manage RAM user permissions.

The RAM user can now call the Content Moderation API.

Step 3: Install and integrate the SDK

API reference

Overview

Use the TextModerationPlus operation to create a text content moderation task. For HTTP request construction, see Request Structure. You can also use a pre-constructed request as described in the Getting Started guide.

You can test this operation in OpenAPI Explorer without manual signature calculation. After you test a call, OpenAPI Explorer generates SDK code examples automatically.

  • Service interface: TextModerationPlus

  • Supported regions and endpoints:

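| Region | Public endpoint | VPC endpoint | Supported services |
| --- | --- | --- | --- |
| China (Shanghai) | green-cip.cn-shanghai.aliyuncs.com | green-cip-vpc.cn-shanghai.aliyuncs.com | ugc_moderation_byllm_pro, ugc_moderation_byllm, aigc_moderation_byllm |
| China (Beijing) | green-cip.cn-beijing.aliyuncs.com | green-cip-vpc.cn-beijing.aliyuncs.com | ugc_moderation_byllm_pro, ugc_moderation_byllm, aigc_moderation_byllm |
| China (Hangzhou) | green-cip.cn-hangzhou.aliyuncs.com | green-cip-vpc.cn-hangzhou.aliyuncs.com | ugc_moderation_byllm_pro, ugc_moderation_byllm, aigc_moderation_byllm |
| China (Shenzhen) | green-cip.cn-shenzhen.aliyuncs.com | green-cip-vpc.cn-shenzhen.aliyuncs.com | ugc_moderation_byllm_pro, ugc_moderation_byllm, aigc_moderation_byllm |
| China (Chengdu) | green-cip.cn-chengdu.aliyuncs.com | Not available | ugc_moderation_byllm_pro, ugc_moderation_byllm, aigc_moderation_byllm |
| China (Hong Kong) | green-cip.cn-hongkong.aliyuncs.com | green-cip-vpc.cn-hongkong.aliyuncs.com | ugc_moderation_byllm_cb |
| Singapore | green-cip.ap-southeast-1.aliyuncs.com | green-cip-vpc.ap-southeast-1.aliyuncs.com | ugc_moderation_byllm_cb |
| US (Virginia) | green-cip.us-east-1.aliyuncs.com | green-cip-vpc.us-east-1.aliyuncs.com | ugc_moderation_byllm_cb |
| Germany (Frankfurt) | green-cip.eu-central-1.aliyuncs.com | green-cip-vpc.eu-central-1.aliyuncs.com | ugc_moderation_byllm_cb |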

Important

For the Germany (Frankfurt) and China (Hong Kong) regions, nodes in the Singapore region perform text moderation inference. The service processes inference results, data, and logs locally in the Germany (Frankfurt) and China (Hong Kong) regions.

  • Billing: This operation is billed. You are charged only for requests that return an HTTP status code of 200. No fees are incurred for requests that return other error codes. For more information about billing, see Pricing.
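
The endpoint table follows a regular naming pattern, so endpoint selection can be sketched as a small lookup. The helper name `endpoint_for` is hypothetical. The service groups are an assumption read from the table, where the "Supported services" cell appears to span the mainland regions and the remaining regions respectively.

```python
# Service groups and VPC availability, transcribed from the endpoint table.
MAINLAND_SERVICES = frozenset(
    {"ugc_moderation_byllm_pro", "ugc_moderation_byllm", "aigc_moderation_byllm"}
)
CROSS_BORDER_SERVICES = frozenset({"ugc_moderation_byllm_cb"})

# Region ID -> (has a VPC endpoint, supported services)
REGIONS = {
    "cn-shanghai": (True, MAINLAND_SERVICES),
    "cn-beijing": (True, MAINLAND_SERVICES),
    "cn-hangzhou": (True, MAINLAND_SERVICES),
    "cn-shenzhen": (True, MAINLAND_SERVICES),
    "cn-chengdu": (False, MAINLAND_SERVICES),
    "cn-hongkong": (True, CROSS_BORDER_SERVICES),
    "ap-southeast-1": (True, CROSS_BORDER_SERVICES),
    "us-east-1": (True, CROSS_BORDER_SERVICES),
    "eu-central-1": (True, CROSS_BORDER_SERVICES),
}


def endpoint_for(region_id: str, service: str, use_vpc: bool = False) -> str:
    """Return the endpoint host for a region, checking service and VPC support."""
    if region_id not in REGIONS:
        raise ValueError(f"unknown region: {region_id}")
    has_vpc, services = REGIONS[region_id]
    if service not in services:
        raise ValueError(f"{service} is not listed for {region_id}")
    if use_vpc and not has_vpc:
        raise ValueError(f"no VPC endpoint is listed for {region_id}")
    prefix = "green-cip-vpc" if use_vpc else "green-cip"
    return f"{prefix}.{region_id}.aliyuncs.com"


print(endpoint_for("cn-shanghai", "ugc_moderation_byllm"))
print(endpoint_for("ap-southeast-1", "ugc_moderation_byllm_cb", use_vpc=True))
```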

QPS limit

The default rate limit is 50 requests per second per account. Exceeding this limit triggers throttling, which may disrupt your application. To request a higher rate limit, contact your account manager.
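
One way to stay under the limit is to throttle on the client side before each request. The sketch below is a simple sliding-window limiter; the class name `SlidingWindowLimiter` is hypothetical, and how the service itself measures the limit (for example, fixed or sliding windows) is not stated on this page, so treat the window model as an assumption. The limiter is not thread-safe as written.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` acquisitions in any `period`-second window."""

    def __init__(self, max_calls=50, period=1.0, clock=time.monotonic, sleep=time.sleep):
        self._max_calls = max_calls
        self._period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps = deque()  # timestamps of recent acquisitions

    def acquire(self):
        """Block until one more request may be sent, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have left the window.
            while self._stamps and now - self._stamps[0] >= self._period:
                self._stamps.popleft()
            if len(self._stamps) < self._max_calls:
                self._stamps.append(now)
                return
            # Window is full: wait until the oldest timestamp expires.
            self._sleep(self._period - (now - self._stamps[0]))


limiter = SlidingWindowLimiter()  # 50 requests per second, the default limit
for _ in range(3):
    limiter.acquire()  # call this before each TextModerationPlus request
```

The `clock` and `sleep` parameters exist only so the limiter can be exercised with a fake clock in tests.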

Request parameters

Parameter

Type

Required

Example

Description

Service

String

Yes

ugc_moderation_byllm

  • ugc_moderation_byllm_pro: UGC Text Moderation Large Model Service_Professional Edition

  • ugc_moderation_byllm: LLM-based Text Moderation Service in UGC Scenarios

  • ugc_moderation_byllm_cb: Cross-border UGC text moderation (LLM)

  • aigc_moderation_byllm: LLM-based Text Moderation Service in AIGC Scenarios

ServiceParameters

JSONString

Yes

The moderation service parameters, specified as a JSON string. For details, see ServiceParameters.

Table 1. ServiceParameters

Parameter

Type

Required

Example

Description

content

String

Yes

testing content

The text content to moderate. The content can be up to 2,000 characters in length.

dataId

String

No

text0424****

A unique identifier for your business data.

Maximum 64 characters. Allowed characters: letters, digits, underscores (_), hyphens (-), and periods (.).

accountId

String

No

ID0728****

The account ID of the end user on your platform. Use this parameter to link results to a specific user. For example, if user A chats with user B, pass A's ID for A’s messages and B's ID for B’s messages.

Note

This parameter enables context-aware moderation. To activate this feature, contact your account manager or submit a ticket.

infoType

String

No

llmContent

The type of supplementary information to retrieve. Valid values:

  • llmContent: Returns the raw detection result from the LLM.
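
The constraints in Table 1 can be checked on the client before a request is sent. The following is a minimal sketch with a hypothetical helper name `build_request`. It assumes that "letters" means ASCII letters and that the 2,000-character limit counts Unicode code points, which the page does not specify. ServiceParameters is serialized with `json.dumps` because the parameter type is JSONString.

```python
import json
import re

MAX_CONTENT_CHARS = 2000
# Up to 64 characters: letters, digits, underscores, hyphens, and periods.
DATA_ID_PATTERN = re.compile(r"[A-Za-z0-9_.-]{1,64}")
SERVICES = {
    "ugc_moderation_byllm_pro",
    "ugc_moderation_byllm",
    "ugc_moderation_byllm_cb",
    "aigc_moderation_byllm",
}
INFO_TYPES = {"llmContent"}


def build_request(service, content, data_id=None, account_id=None, info_type=None):
    """Validate the inputs and return the two top-level request parameters."""
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service}")
    if not content or len(content) > MAX_CONTENT_CHARS:
        raise ValueError("content must contain 1 to 2,000 characters")
    params = {"content": content}
    if data_id is not None:
        if not DATA_ID_PATTERN.fullmatch(data_id):
            raise ValueError("dataId has invalid characters or is too long")
        params["dataId"] = data_id
    if account_id is not None:
        params["accountId"] = account_id
    if info_type is not None:
        if info_type not in INFO_TYPES:
            raise ValueError(f"unsupported infoType: {info_type}")
        params["infoType"] = info_type
    # ServiceParameters is passed as a JSON string, not a nested object.
    return {
        "Service": service,
        "ServiceParameters": json.dumps(params, ensure_ascii=False),
    }


print(build_request("ugc_moderation_byllm", "testing content", data_id="text0424-demo"))
```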

Response parameters

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| Code | Integer | 200 | The HTTP status code. For more information, see Status codes. |
| Data | JSONObject | {"Result":[...]} | The moderation result data. For more information, see Data. |
| Message | String | OK | The result message for the request. |
| RequestId | String | AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE**** | The request ID. |

Table 2. Data

Parameter

Type

Example

Description

Result

JSONArray

The detection results, including risk labels and confidence scores. For more information, see Result.

RiskLevel

String

high

The risk level, which is determined based on the configured high and low risk score thresholds. Valid values:

  • high: High risk (If the content matches a custom library, the risk level is high.)

  • medium: Medium risk

  • low: Low risk

  • none: No risk detected

Note

Take immediate action on high-risk content and manually review medium-risk content. Handle low-risk content only if you require a high recall rate; otherwise, treat it the same as content with a none risk level. You can configure risk score thresholds in the Content Moderation console.

DataId

String

text0424****

The data ID of the moderated content.

Note

If you specified the dataId parameter in the request, the same value is returned in this field.

AccountId

String

10123****

The account ID.

Note

If you specified the accountId parameter in the request, the same value is returned in this field.

Ext

Object

Supplementary information for the text. For more information, see Ext.

TranslatedContent

String

The translated text content. Returned only when the text translation feature is enabled.

Note

The text translation feature is currently available only in the Singapore region. You can configure it by managing detection rules in the console. Additional charges apply.
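
The handling guidance in the RiskLevel note can be expressed as a small routing function. This is an illustrative sketch only: the action names ("block", "manual_review", "pass") and the `high_recall` flag are hypothetical, and sending low-risk content to manual review when high recall is required is one possible reading of the note.

```python
def action_for(risk_level: str, high_recall: bool = False) -> str:
    """Map a RiskLevel value to a handling action (action names are illustrative)."""
    if risk_level == "high":
        return "block"  # take immediate action
    if risk_level == "medium":
        return "manual_review"
    if risk_level == "low":
        # Handle low-risk content only when a high recall rate is required.
        return "manual_review" if high_recall else "pass"
    if risk_level == "none":
        return "pass"
    raise ValueError(f"unexpected RiskLevel: {risk_level}")


for level in ("high", "medium", "low", "none"):
    print(level, "->", action_for(level))
```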

Table 3. Result

Parameter

Type

Example

Description

Label

String

political_xxx

The risk label for the moderated content. Multiple labels and scores can be returned. For a list of supported labels, see Risk labels.

Description

String

Suspected pornographic content

A description of the Label field.

Important

This field is for reference only and may change. For your handling logic, use the Label field instead of this one.

Confidence

Float

81.22

The confidence score, which ranges from 0 to 100. The value is accurate to two decimal places. Some labels do not return a confidence score.

RiskWords

String

AA,BB,CC

The detected risk words, separated by commas. This field is not returned for some labels.

CustomizedHit

JSONArray

[{"LibName":"...","Keywords":"..."}]

If the content matches an entry in a custom library, the Label is customized, and this field returns the library name and the matched keywords. For more information, see CustomizedHit.

RiskPositions

JSONArray

Information about the position of the detected risk words. For more information, see RiskPositions.

Table 4. CustomizedHit

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| LibName | String | Custom Library 1 | The name of the custom library. |
| Keywords | String | Custom Keyword 1,Custom Keyword 2 | The matched custom keywords, separated by commas. |

Table 5. RiskPositions

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| RiskWord | String | AA | The detected risk word. |
| StartPos | Integer | 10 | The start position of the risk word in the text. |
| EndPos | Integer | 12 | The end position of the risk word in the text. |

Table 6. Ext

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| LlmContent | Object | | The raw detection result from the LLM. For more information, see LlmContent. |

Table 7. LlmContent

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| OutputText | String | Suspected vulgar language | The raw detection result from the LLM-based text moderation model. |

Examples

Request example:

{
    "Service": "ugc_moderation_byllm_pro",
    "ServiceParameters": {
        "content": "testing content",
        "dataId": "text0424****"
    }
}

Response examples:

  • System policy match:

{
    "Code": 200,
    "Data": {
        "Result": [
            {
                "Label": "political_entity",
                "Description": "Suspected political entity",
                "Confidence": 100.0,
                "RiskWords": "WordA,WordB",
                "RiskPositions": [
                    {
                        "EndPos": 16,
                        "RiskWord": "WordA",
                        "StartPos": 14
                    }
                ]
            },
            {
                "Label": "political_figure",
                "Description": "Suspected political figure",
                "Confidence": 100.0,
                "RiskWords": "WordB,WordC",
                "RiskPositions": [
                    {
                        "EndPos": 26,
                        "RiskWord": "WordC",
                        "StartPos": 24
                    }
                ]
            }
        ],
        "RiskLevel": "high",
        "DataId": "text0424****"
    },
    "Message": "OK",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
  • Custom library match:

{
    "Code": 200,
    "Data": {
        "Result": [
            {
                "Description": "Hit a custom library",
                "CustomizedHit": [
                     {
                        "LibName": "Custom Library Name 1",
                        "Keywords": "custom keyword"
                     }
                ],
                "Confidence": 100,
                "Label": "customized"
             }
        ],
        "RiskLevel": "high",
        "DataId": "text0424****"
    },
    "Message": "OK",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
  • Raw LLM result:

{
  "RequestId": "ZZZZZ-2024-0307-FORYOU-EVER",
  "Message": "OK",
  "Data": {
    "Ext": {
      "LlmContent": {
        "OutputText": "Suspected offensive or abusive content"
      }
    },
    "Result": [
      {
        "RiskWords": "risk word",
        "Description": "Suspected offensive or abusive content",
        "Confidence": 100.0,
        "Label": "inappropriate_profanity",
        "RiskPositions": [
          {
            "RiskWord": "risk word",
            "EndPos": 5,
            "StartPos": 2
          }
        ]
      }
    ],
    "RiskLevel": "high"
  },
  "Code": 200
}
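
A response like the ones above can be reduced to the fields most handling logic needs. The sketch below uses a hypothetical helper `summarize` and a trimmed copy of the custom library response as sample data. It treats Confidence, RiskWords, and CustomizedHit as optional, since the tables note that some labels omit them, and it keys on Label rather than Description, as the Result table recommends.

```python
def summarize(response: dict, min_confidence: float = 0.0) -> list:
    """Return one dict per Result entry with normalized, optional-safe fields."""
    hits = []
    for item in response.get("Data", {}).get("Result", []):
        confidence = item.get("Confidence")  # absent for some labels
        if confidence is not None and confidence < min_confidence:
            continue
        risk_words = item.get("RiskWords")  # comma-separated, may be absent
        hits.append({
            "label": item["Label"],
            "confidence": confidence,
            "risk_words": risk_words.split(",") if risk_words else [],
            "custom_libraries": [h["LibName"] for h in item.get("CustomizedHit", [])],
        })
    return hits


# Sample data trimmed from the custom library response example.
sample = {
    "Code": 200,
    "Data": {
        "Result": [
            {
                "Description": "Hit a custom library",
                "CustomizedHit": [
                    {"LibName": "Custom Library Name 1", "Keywords": "custom keyword"}
                ],
                "Confidence": 100,
                "Label": "customized",
            }
        ],
        "RiskLevel": "high",
    },
    "Message": "OK",
}

print(summarize(sample))
```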

Status codes

| Code | Status text | Description |
| --- | --- | --- |
| 200 | OK | The request was successful. |
| 400 | BAD_REQUEST | Invalid request. Check your request parameters. |
| 408 | PERMISSION_DENY | Your account may be unauthorized, have an overdue payment, or the service is not activated. |
| 500 | GENERAL_ERROR | Internal server error. Retry the request. If the error persists, contact Online Support. |
| 581 | TIMEOUT | Request timed out. Retry the request. If the error persists, contact Online Support. |
| 588 | EXCEED_QUOTA | Rate limit exceeded. Reduce your request frequency. |
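
The retry guidance in the status codes can be sketched as a small wrapper. Here `send` is a hypothetical callable that performs one TextModerationPlus request and returns the parsed response as a dict. The attempt count and exponential backoff delays are assumptions, not values from this page. Codes 500 and 581 are retried as the table suggests, 588 is retried after a pause to reduce request frequency, and 400 and 408 are raised immediately because retrying does not help.

```python
import time

# 500 and 581: the table says to retry. 588: slow down, then try again.
RETRY_CODES = {500, 581, 588}


def call_with_retry(send, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call `send()` until it returns Code 200 or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        response = send()
        code = response.get("Code")
        if code == 200:
            return response
        if code not in RETRY_CODES or attempt == max_attempts:
            raise RuntimeError(f"request failed with code {code}: {response.get('Message')}")
        # Exponential backoff: 0.5 s, 1 s, 2 s, ... (illustrative values)
        sleep(base_delay * 2 ** (attempt - 1))


# Usage with a stand-in sender that times out once and then succeeds.
replies = iter([{"Code": 581, "Message": "TIMEOUT"}, {"Code": 200, "Message": "OK"}])
print(call_with_retry(lambda: next(replies), sleep=lambda s: None))
```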