Text moderation enhanced plus

Text Moderation Enhanced Edition has been upgraded to the PLUS service, allowing you to enable or disable moderation labels. This topic explains how to use the service.

Risk labels

Label description

Text Moderation Pro can return over 60 granular risk labels across 10 categories, each with a confidence score. If content contains multiple risks, the service can return multiple labels for a single request. The following table describes these risk labels, their confidence score ranges, and their meanings.

Label type: Text moderation risk labels

| Label value | Confidence score | Description |
| --- | --- | --- |
| pornographic_adult | 0–100 | Indicates potentially pornographic content. |
| sexual_terms | 0–100 | Indicates content related to sexual health. |
| sexual_suggestive | 0–100 | Indicates potentially vulgar content. |
| political_figure | 0–100 | Indicates content is potentially related to political figures. |
| political_entity | 0–100 | Indicates content is potentially related to political entities. |
| political_n | 0–100 | Indicates potentially sensitive political content. |
| political_p | 0–100 | Indicates content is potentially related to prohibited political figures. |
| political_a | 0–100 | Indicates enhanced protection is applied for political content. |
| violent_extremist | 0–100 | Indicates content is potentially related to extremist organizations. |
| violent_incidents | 0–100 | Indicates potentially extremist content. |
| violent_weapons | 0–100 | Indicates content is potentially related to weapons and ammunition. |
| contraband_drug | 0–100 | Indicates content is potentially related to drugs. |
| contraband_gambling | 0–100 | Indicates content is potentially related to gambling. |
| contraband_act | 0–100 | Indicates content is potentially related to illegal activities. |
| contraband_entity | 0–100 | Indicates content is potentially related to illegal items. |
| inappropriate_discrimination | 0–100 | Indicates potentially biased or discriminatory content. |
| inappropriate_ethics | 0–100 | Indicates content with potentially harmful values. |
| inappropriate_profanity | 0–100 | Indicates potentially insulting or abusive content. |
| inappropriate_oral | 0–100 | Indicates potentially vulgar spoken content. |
| inappropriate_superstition | 0–100 | Indicates potentially superstitious content. |
| inappropriate_nonsense | 0–100 | Indicates potentially meaningless spam content. |
| pt_to_sites | 0–100 | Indicates content that potentially diverts traffic to external sites. |
| pt_by_recruitment | 0–100 | Indicates advertisements for online money-making schemes or part-time jobs. |
| pt_to_contact | 0–100 | Indicates advertisements that divert traffic. |
| religion_b | 0–100 | Indicates content is potentially related to Buddhism. |
| religion_t | 0–100 | Indicates content is potentially related to Taoism. |
| religion_c | 0–100 | Indicates content is potentially related to Christianity. |
| religion_i | 0–100 | Indicates content is potentially related to Islam. |
| religion_h | 0–100 | Indicates content is potentially related to Hinduism. |
| ad_compliance | 0–100 | Indicates content that violates advertising laws. |
| customized | 0–100 | Indicates a match with a keyword in a custom dictionary. |
| nonLabel | Not applicable | Confirms that no risks were detected. |

Label type: AIGC detection labels

| Label value | Confidence score | Description |
| --- | --- | --- |
| aigc | 0–100 | Indicates text is potentially AI-generated. |
| ugc | 0–100 | Indicates text is not AI-generated. |
| nonLabel | Not applicable | Confirms that no risks were detected. |
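
Because a single request can return several labels, each with its own confidence score, callers often filter the labels before acting on them. The following is a minimal sketch, assuming the results have already been parsed into dictionaries with Label and Confidence keys as in the response examples later in this topic. The function name labels_above and the default threshold of 80 are illustrative and not part of the service.

```python
from typing import Dict, List, Optional


def labels_above(results: List[Dict], min_confidence: float = 80.0) -> List[str]:
    """Return the label values whose confidence meets the threshold.

    nonLabel is skipped because it means that no risk was detected. Labels
    without a confidence score are kept so that they are not silently lost.
    """
    selected = []
    for item in results:
        label = item.get("Label")
        if not label or label == "nonLabel":
            continue
        confidence: Optional[float] = item.get("Confidence")
        if confidence is None or confidence >= min_confidence:
            selected.append(label)
    return selected
```

Whether labels without a score should be kept or dropped depends on your review policy; this sketch keeps them.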

Manage labels

You can enable or disable each risk label in the console. Some risk labels provide switches for more granular detection scopes. For more information, see the Content Moderation Console.

  1. In the left-side navigation pane, choose Content Moderation Pro > Text Moderation > Rule Configuration.

  2. On the Rule Management tab, for the llm_query_moderation service, click Modify Rules in the Actions column.

    1. Select the detection type you want to adjust, for example, Unwanted Content Detection.

    2. Click Edit, then modify the status of the detection item.

    3. Click Save. The new configuration takes about 2 to 5 minutes to take effect.

Enable a Service

On the Rule Configuration page, each Service is listed as Unused by default. You do not need to activate a Service; after configuring its detection rules, you can use it by making an API call.

  1. Log on to the Content Moderation Console.

  2. In the left-side navigation pane, choose Content Moderation Pro > Text Moderation > Rules.

  3. In the Service list, find the target Service, such as comment_detection_pro, and click Modify Rules in the Actions column.

  4. On the Detection Scope tab, enable or disable the required detection items, then click Save. The changes take effect in about 2 to 5 minutes.

  5. (Optional) To use a custom dictionary for keyword detection, return to the Service list and click Set Thesaurus in the Actions column.

  6. When making an API call, set the Service parameter to the corresponding Service name, such as comment_detection_pro.

Integration

Step 1: Activate the service

Visit Activate Service to activate the Content Moderation Enhanced Edition.

Step 2: Grant permissions to a RAM user

Before you use the SDK or call the API, you must grant permissions to a RAM user. To authenticate API calls, use an access key from either your Alibaba Cloud account or a RAM user. For more information, see Obtain an access key.

  1. Log on to the RAM console using your Alibaba Cloud account.

  2. Create a RAM user. For details, see Create a RAM user.

  3. Grant the AliyunYundunGreenWebFullAccess system policy to the RAM user. This policy grants full access to Content Moderation. For details, see Manage RAM user permissions.

    The RAM user can now call the Content Moderation API.
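
Once the RAM user has an access key, keep the key pair out of source code. The following is a minimal sketch that reads it from environment variables. The variable names ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET are an assumed convention here, and load_access_key is a hypothetical helper, not an SDK function.

```python
import os
from typing import Mapping, Tuple


def load_access_key(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read the access key pair from environment variables.

    The variable names are an assumed convention; adjust them to match
    your own deployment.
    """
    key_id = env.get("ALIBABA_CLOUD_ACCESS_KEY_ID", "")
    key_secret = env.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET", "")
    if not key_id or not key_secret:
        raise RuntimeError("access key environment variables are not set")
    return key_id, key_secret
```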

Step 3: Install and integrate the SDK

For the SDKs for the Content Moderation Enhanced Edition PLUS service, see SDKs and integration guide for the Content Moderation 2.0 PLUS service.

API

Usage

Call this operation to create a text moderation task. To learn how to build an HTTP request, see native HTTPS call. Alternatively, use a prebuilt HTTP request. For details, see getting started.

Use OpenAPI Explorer to run this API directly without having to calculate the signature. Upon a successful request, OpenAPI Explorer automatically generates SDK code examples.

  • API: TextModerationPlus

  • Regions and endpoints:

Region

Public endpoint

Private endpoint

Supported services

China (Shanghai)

green-cip.cn-shanghai.aliyuncs.com

green-cip-vpc.cn-shanghai.aliyuncs.com

ugc_moderation_byllm_pro, ugc_moderation_byllm, nickname_detection_pro, chat_detection_pro, comment_detection_pro, ad_compliance_detection_pro, text_aigc_detector

China (Beijing)

green-cip.cn-beijing.aliyuncs.com

green-cip-vpc.cn-beijing.aliyuncs.com

China (Hangzhou)

green-cip.cn-hangzhou.aliyuncs.com

green-cip-vpc.cn-hangzhou.aliyuncs.com

China (Shenzhen)

green-cip.cn-shenzhen.aliyuncs.com

green-cip-vpc.cn-shenzhen.aliyuncs.com

China (Chengdu)

green-cip.cn-chengdu.aliyuncs.com

N/A

Singapore

green-cip.ap-southeast-1.aliyuncs.com

green-cip-vpc.ap-southeast-1.aliyuncs.com

comment_multilingual_pro_cb, ugc_moderation_byllm_cb

UK (London)

green-cip.eu-west-1.aliyuncs.com

N/A

comment_multilingual_pro_cb

US (Virginia)

green-cip.us-east-1.aliyuncs.com

green-cip-vpc.us-east-1.aliyuncs.com

US (Silicon Valley)

green-cip.us-west-1.aliyuncs.com

N/A

Germany (Frankfurt)

green-cip.eu-central-1.aliyuncs.com

green-cip-vpc.eu-central-1.aliyuncs.com

Note

The UK (London) region reuses the console configuration of the Singapore region, and the US (Silicon Valley) region reuses that of the US (Virginia) region.
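
The endpoint host names above follow a regular pattern, so a client can derive them from the region ID. The following is a minimal sketch, assuming the region IDs are the ones embedded in the listed host names. REGIONS, endpoint_for, and console_region_for are hypothetical names introduced for illustration.

```python
# Region IDs are read off the endpoint host names listed above.
# True means that a private (VPC) endpoint is listed for the region.
REGIONS = {
    "cn-shanghai": True,
    "cn-beijing": True,
    "cn-hangzhou": True,
    "cn-shenzhen": True,
    "cn-chengdu": False,
    "ap-southeast-1": True,
    "eu-west-1": False,
    "us-east-1": True,
    "us-west-1": False,
    "eu-central-1": True,
}

# Regions that reuse the console configuration of another region.
CONSOLE_CONFIG_REGION = {"eu-west-1": "ap-southeast-1", "us-west-1": "us-east-1"}


def endpoint_for(region: str, private: bool = False) -> str:
    """Return the public or private endpoint host name for a region."""
    if region not in REGIONS:
        raise ValueError(f"unsupported region: {region}")
    if private and not REGIONS[region]:
        raise ValueError(f"no private endpoint is listed for {region}")
    prefix = "green-cip-vpc" if private else "green-cip"
    return f"{prefix}.{region}.aliyuncs.com"


def console_region_for(region: str) -> str:
    """Return the region whose console configuration applies to this region."""
    return CONSOLE_CONFIG_REGION.get(region, region)
```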

  • Billing: This is a paid API. You are only charged for requests that return a 200 HTTP status code; requests that result in an error are not charged. For more information about our billing method, see the billing description.

QPS limit

This API is subject to a single-user QPS limit. Exceeding this limit triggers API throttling and can disrupt your service.

  • AI-generated text detection (text_aigc_detector): 50 requests per second.

  • LLM-based UGC text moderation service (ugc_moderation_byllm_pro, ugc_moderation_byllm, and ugc_moderation_byllm_cb): 50 requests per second.

  • Other services: 100 requests per second.

Note

The LLM-based UGC text moderation service has a lower QPS limit than most other services. If your request volume is high, implement client-side traffic control to avoid exceeding this limit.
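
One way to implement traffic control on the client is to pace requests so that they never exceed the limit of the service being called. The following is a minimal sketch of a pacing limiter, not an official client feature. qps_limit_for and Throttle are hypothetical names, and the clock and sleep arguments exist only to make the class testable.

```python
import time

LLM_SERVICES = {
    "ugc_moderation_byllm_pro",
    "ugc_moderation_byllm",
    "ugc_moderation_byllm_cb",
}


def qps_limit_for(service: str) -> int:
    """Return the documented per-user QPS limit for a service."""
    if service == "text_aigc_detector" or service in LLM_SERVICES:
        return 50
    return 100


class Throttle:
    """Spaces calls at least 1/qps seconds apart."""

    def __init__(self, qps: int, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / qps
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next call is allowed and return the delay applied."""
        now = self.clock()
        delay = max(0.0, self.next_allowed - now)
        if delay > 0:
            self.sleep(delay)
        self.next_allowed = max(now, self.next_allowed) + self.interval
        return delay
```

Call wait() before each request, for example with throttle = Throttle(qps_limit_for("ugc_moderation_byllm")). A single Throttle instance only paces one process; several workers sharing one account would need a shared limiter.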

Request parameters

| Parameter | Type | Required | Example value | Description |
| --- | --- | --- | --- | --- |
| Service | String | Yes | comment_detection_pro | The moderation service to call. Valid values are listed after this table. |
| ServiceParameters | JSONString | Yes | | The required parameter set for the moderation service, formatted as a JSON string. See the ServiceParameters table for parameter descriptions. |

Valid values of Service:

  • ugc_moderation_byllm_pro: LLM-based text moderation for UGC scenarios (Pro).

  • ugc_moderation_byllm: LLM-based text moderation for UGC scenarios.

  • ugc_moderation_byllm_cb: LLM-based text moderation for UGC scenarios (Cross-border).

  • nickname_detection_pro: Nickname detection (Pro).

  • chat_detection_pro: Content moderation for private chats (Pro).

  • comment_detection_pro: Content moderation for public comments (Pro).

  • ad_compliance_detection_pro: Ad compliance detection (Pro).

  • comment_multilingual_pro_cb: Multilingual comment detection for cross-border scenarios.

  • text_aigc_detector: AI-generated text detection.

Note

For details on the multilingual detection service for international business, see Content Moderation Enhanced V2.0 Multilingual PLUS Service.

Table 1. ServiceParameters

| Parameter | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| content | String | Yes | Text to moderate | The text to moderate. The character limit varies by service: 2,000 characters for ugc_moderation_byllm_pro, ugc_moderation_byllm, and ugc_moderation_byllm_cb; 5,000 characters for text_aigc_detector; 600 characters for all other services. |
| dataId | String | No | text0424**** | A unique data ID for the content to be moderated. This ID can contain uppercase and lowercase letters, digits, underscores (_), hyphens (-), and periods (.), and must not exceed 64 characters. |
| accountId | String | No | ID0728**** | A unique account ID that identifies an end user. The platform uses this ID for record-keeping. For example, in a chat between User A and User B, pass User A's ID when moderating User A's text, and pass User B's ID when moderating User B's text. |

Note

The account ID can be used for context-aware moderation. To enable this feature, contact your business representative or submit a ticket.
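
The limits in Table 1 can be checked on the client before a request is sent, which avoids spending a call on input that cannot be accepted. The following is a minimal sketch. make_service_parameters and content_limit are hypothetical helpers, and the sketch assumes that counting characters with Python's len() matches how the service counts them.

```python
import re
from typing import Dict, Optional

LLM_SERVICES = {
    "ugc_moderation_byllm_pro",
    "ugc_moderation_byllm",
    "ugc_moderation_byllm_cb",
}
# Letters, digits, underscores, hyphens, and periods; at most 64 characters.
DATA_ID_PATTERN = re.compile(r"[A-Za-z0-9_.-]{1,64}")


def content_limit(service: str) -> int:
    """Return the documented character limit for a service."""
    if service in LLM_SERVICES:
        return 2000
    if service == "text_aigc_detector":
        return 5000
    return 600


def make_service_parameters(
    service: str,
    content: str,
    data_id: Optional[str] = None,
    account_id: Optional[str] = None,
) -> Dict[str, str]:
    """Validate the inputs and return the ServiceParameters dictionary."""
    if not content:
        raise ValueError("content is required")
    limit = content_limit(service)
    if len(content) > limit:
        raise ValueError(f"content exceeds {limit} characters for {service}")
    params = {"content": content}
    if data_id is not None:
        if not DATA_ID_PATTERN.fullmatch(data_id):
            raise ValueError("dataId contains invalid characters or is too long")
        params["dataId"] = data_id
    if account_id is not None:
        params["accountId"] = account_id
    return params
```

Text that exceeds the limit has to be split or truncated by the caller; how to split it is a product decision that this sketch leaves open.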

Return parameters

| Parameter | Type | Example value | Description |
| --- | --- | --- | --- |
| Code | Integer | 200 | The status code. See Code Description. |
| Data | JSONObject | {"Result":[...]} | The moderation result data. For details, see Data. |
| Message | String | OK | The response message. |
| RequestId | String | AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE**** | The request ID. |

Table 2. Data

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| Result | JSONArray | | The detection results, including risk labels and confidence scores. For more information, see Result. |
| dataId | String | text0424**** | The data ID of the detection object. If you passed dataId in the request, this field returns the same value. |
| accountId | String | ID0728**** | The account ID. If you passed accountId in the request, this field returns the same value. |
| riskLevel | String | high | The risk level, determined by the risk score thresholds you configure. Valid values: high (high risk; content that hits a custom dictionary defaults to high), medium (medium risk), low (low risk), and none (no risk detected). |
| manualTaskId | String | m_tx_042407280307*** | The manual review task ID. Use it to query the manual review result. This field is returned only if you enable human-machine review and the content meets the criteria for manual review. For configuration details, see Human-Machine Review Service Configuration. |
| Ext | Object | | Supplemental information for text moderation. For more information, see Ext. |

Note

Handle high-risk content immediately. Send medium-risk content for manual review. Handle low-risk content only when you need high recall; otherwise, treat it as content with no risk. You can configure score thresholds in the Content Moderation console.
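
The handling guidance for each risk level can be expressed as a small dispatch function. The following is a minimal sketch. The action names block, manual_review, and pass are hypothetical, and routing low-risk content to manual review in high-recall mode is an assumption about what handling it means in your workflow.

```python
def action_for(risk_level: str, high_recall: bool = False) -> str:
    """Map a risk level to a handling action.

    The action names are illustrative; replace them with whatever your
    own moderation pipeline expects.
    """
    if risk_level == "high":
        return "block"
    if risk_level == "medium":
        return "manual_review"
    if risk_level == "low":
        # Low risk is only acted on when high recall is required.
        return "manual_review" if high_recall else "pass"
    if risk_level == "none":
        return "pass"
    raise ValueError(f"unknown risk level: {risk_level}")
```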

Table 3. Result

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| Label | String | political_xxx | The label returned by the text moderation service. The service may return multiple labels and confidence scores. For a list of supported labels, see the Risk Labels section. |
| Description | String | Suspected pornographic content | A human-readable description of the Label. Important: This field is for informational purposes only and is subject to change. For automated processing, use the Label field instead of the Description field. |
| Confidence | Float | 81.22 | The confidence score for the detected label. The value ranges from 0 to 100 and is accurate to two decimal places. Some labels may not have a confidence score. |
| RiskWords | String | AA,BB,CC | The detected risk words. Multiple words are separated by commas. This field is not returned for all labels. |
| CustomizedHit | JSONArray | [{"LibName":"...","Keywords":"..."}] | If content matches a term in a custom library, the Label is customized. This field returns the name of the custom library and the matched keywords. For more details, see CustomizedHit. |
| RiskPositions | JSONArray | | The positions of the detected risk words in the text. For more information, see RiskPositions. |
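
Because the risk words are returned as one comma-separated string, they usually need to be split before use. The following is a minimal sketch that splits the string and masks each word in the original text. It works by plain string replacement and does not rely on RiskPositions, because this topic does not state how those offsets are indexed. Both function names are hypothetical.

```python
from typing import List, Optional


def split_risk_words(risk_words: Optional[str]) -> List[str]:
    """Split the comma-separated risk words; empty or missing input gives []."""
    if not risk_words:
        return []
    return [word.strip() for word in risk_words.split(",") if word.strip()]


def mask_risk_words(text: str, risk_words: Optional[str], mask: str = "*") -> str:
    """Replace every occurrence of each risk word with mask characters."""
    # Longer words are replaced first so that a short word cannot break a longer one.
    for word in sorted(split_risk_words(risk_words), key=len, reverse=True):
        text = text.replace(word, mask * len(word))
    return text
```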

Table 4. CustomizedHit

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| LibName | String | Custom Library 1 | The name of the custom library. |
| Keywords | String | Custom Keyword 1,Custom Keyword 2 | Custom keywords, separated by commas. |

Table 5. Ext

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| LlmContent | Object | | The detection results from the large language model. For more information, see LlmContent. |

Table 6. LlmContent

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| OutputText | String | Suspected abusive or insulting content | Raw output from the text moderation large language model. |

Table 7. RiskPositions

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| RiskWord | String | AA | The detected risk word. |
| StartPos | Integer | 10 | The start position of the RiskWord. |
| EndPos | Integer | 12 | The end position of the RiskWord. |

Example

Example request

{
    "Service": "comment_detection_pro",
    "ServiceParameters": {
        "content": "testing content",
        "dataId": "text0424****"
    }
}
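
The example above shows ServiceParameters as a nested object, while the request parameter table gives its type as a JSON string. The following is a minimal sketch of serializing it before the request is sent. build_request_params is a hypothetical helper, and signing and transport are left to the SDK or to OpenAPI Explorer.

```python
import json
from typing import Dict


def build_request_params(service: str, service_parameters: Dict[str, str]) -> Dict[str, str]:
    """Return the request parameters with ServiceParameters as a JSON string."""
    return {
        "Service": service,
        # ensure_ascii=False keeps non-ASCII text readable in the JSON string.
        "ServiceParameters": json.dumps(service_parameters, ensure_ascii=False),
    }
```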

Response examples:

  • System policy match:

{
    "Code": 200,
    "Data": {
        "Result": [
            {
                "Label": "political_entity",
                "Description": "Suspected political entity",
                "Confidence": 100.0,
                "RiskWords": "wordA,wordB",
                "RiskPositions": [
                    {
                        "EndPos": 14,
                        "RiskWord": "wordA",
                        "StartPos": 12
                    }
                ]
            },
            {
                "Label": "political_figure",
                "Description": "Suspected political figure",
                "Confidence": 100.0,
                "RiskWords": "wordB,wordC",
                "RiskPositions": [
                    {
                        "EndPos": 20,
                        "RiskWord": "wordB",
                        "StartPos": 18
                    }
                ]
            }
        ],
        "RiskLevel": "high",
        "DataId": "text0424****"
    },
    "Message": "OK",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}

  • Custom library match:

{
    "Code": 200,
    "Data": {
        "Result": [
            {
                "Description": "Custom library match",
                "CustomizedHit": [
                    {
                        "LibName": "Custom library name 1",
                        "KeyWords": "custom keyword"
                    }
                ],
                "Confidence": 100,
                "Label": "customized"
            }
        ],
        "RiskLevel": "high",
        "DataId": "text0424****"
    },
    "Message": "OK",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}

  • Response from large model:

{
    "Code": 200,
    "Data": {
        "Ext": {
            "LlmContent": {
                "OutputText": "Suspected abusive or insulting content"
            }
        },
        "Result": [
            {
                "Confidence": 100.0,
                "CustomizedHit": null,
                "Description": "Suspected abusive or insulting content",
                "Label": "inappropriate_profanity",
                "RiskWords": "violatingWord1,violatingWord2"
            }
        ],
        "RiskLevel": "high"
    },
    "Message": "OK",
    "RequestId": "12345-ABCDE-XXXXX-66666"
}
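
The three response shapes above can be reduced to one summary for downstream code. The following is a minimal sketch, assuming the response body arrives as JSON text. The response examples spell the field RiskLevel while the Data table lists riskLevel, so the sketch accepts either. summarize is a hypothetical helper.

```python
import json
from typing import Any, Dict


def summarize(response_text: str) -> Dict[str, Any]:
    """Extract the risk level, labels, custom library hits, and LLM output."""
    body = json.loads(response_text)
    if body.get("Code") != 200:
        raise RuntimeError(f"moderation failed: {body.get('Code')} {body.get('Message')}")
    data = body.get("Data") or {}
    results = data.get("Result") or []
    llm = (data.get("Ext") or {}).get("LlmContent") or {}
    return {
        "risk_level": data.get("RiskLevel", data.get("riskLevel")),
        "labels": [item.get("Label") for item in results],
        # CustomizedHit can be null, so fall back to an empty list.
        "custom_libraries": [
            hit.get("LibName")
            for item in results
            for hit in (item.get("CustomizedHit") or [])
        ],
        "llm_output": llm.get("OutputText"),
    }
```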

Code

| Code | Status code | Description |
| --- | --- | --- |
| 200 | OK | The request succeeded. |
| 400 | BAD_REQUEST | The request is invalid. This may be caused by incorrect request parameters. Please check them and try again. |
| 403 | PERMISSION_DENY | This error occurs if your account lacks the necessary permissions, has an overdue payment, is not enabled for the service, or is suspended. |
| 500 | GENERAL_ERROR | This may be a temporary server error. Retry the request. If the error persists, contact Online Support. |
| 581 | TIMEOUT | The request timed out. Retry the request. If the error persists, contact Online Support. |
| 588 | EXCEED_QUOTA | The request rate exceeds the quota. |
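
The codes above separate errors that are worth retrying from errors that need a fix on the caller's side. The following is a minimal sketch of a retry wrapper with exponential backoff. It retries 500 and 581 as the descriptions suggest; retrying 588 after a delay is an assumption of this sketch. call_with_retry and its parameters are hypothetical, and send stands for any function that performs one request and returns the parsed response.

```python
import time
from typing import Any, Callable, Dict

# 500 and 581 are described as retryable; treating 588 the same way is an assumption.
RETRYABLE_CODES = {500, 581, 588}


def call_with_retry(
    send: Callable[[], Dict[str, Any]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call send() until it returns a non-retryable code or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response: Dict[str, Any] = {}
    for attempt in range(1, max_attempts + 1):
        response = send()
        if response.get("Code") not in RETRYABLE_CODES or attempt == max_attempts:
            return response
        # Wait 0.5 s, 1 s, 2 s, ... before the next attempt.
        sleep(base_delay * 2 ** (attempt - 1))
    return response
```

Codes 400 and 403 are returned to the caller immediately, because repeating the same request cannot fix invalid parameters or missing permissions.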