Text moderation for streaming output from a language model

Content Moderation Enhanced Edition extends its AIGC text detection capability to automatically concatenate and moderate text from the streaming output of models. This topic describes the streaming output text moderation feature of the Content Moderation service.

Use cases

Large language model (LLM) applications use streaming output to display intermediate results on the user interface immediately. For example, with Qianwen, developers can control the streaming output mode by using parameters. For more information, see Use Tongyi Qianwen through an API.

With a model's streaming output, developers typically assemble text fragments before submitting the full text to Content Moderation. However, this method has significant limitations:

  • Assembling the full output can result in text that exceeds the input size limit for Content Moderation.

  • Streaming output displays text fragments as they are generated, which can expose users to harmful content long before the full response is reviewed.

To address these issues, Alibaba Cloud Content Moderation Enhanced Edition automatically assembles and moderates streaming text from models, avoiding the input size limits for long text and significantly reducing user exposure to potentially harmful content.

Features

Scenario 1

Scenario: Generating text from a large language model using stream output with incremental output enabled. In this mode, new content fragments do not include previously sent text.

Supported service: Large Language Model-Generated Text Moderation (Service: llm_response_moderation)

Description:

  • Specify the sessionId input parameter to group request content into a single data stream. The text moderation engine then automatically assembles these fragments for moderation.

  • The Large Language Model-Generated Text Moderation service analyzes the current text fragment and the assembled text of previous fragments, up to a 5,000-character limit, and returns any detected labels. For more information, see Text Moderation PLUS for large language models.

  • This scenario requires higher QPS for the text moderation service and may incur additional costs. Evaluate the potential cost based on your business needs. For more information, see the Billing section.

Scenario 2

Scenario: Generating AIGC text using stream output with incremental output enabled. In this mode, new content fragments do not include previously sent text.

Supported service: AIGC-like Text Moderation (Service: ai_art_detection)

Description:

  • Specify the sessionId input parameter to group request content into a single data stream. The text moderation engine then automatically assembles these fragments for moderation.

  • The AIGC-like Text Moderation service analyzes the current text fragment and the assembled text of previous fragments, up to a 2,000-character limit, and returns any detected labels. For more information, see Use Enhanced Content Moderation Edition to identify text violations.

  • This scenario requires higher QPS for the text moderation service and may incur additional costs. Evaluate the potential cost based on your business needs. For more information, see the Billing section.
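The incremental flow described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the official SDK: the helper names batch_fragments and build_requests are hypothetical, the 600-character chunk size is borrowed from the content limit listed in the request parameters, and actually sending the request (signing, endpoint, SDK client) is left out.

```python
import json
import uuid
from typing import Iterable, Iterator


def batch_fragments(fragments: Iterable[str], max_chars: int = 600) -> Iterator[str]:
    """Group incremental fragments into chunks of at most max_chars characters."""
    buffer = ""
    for fragment in fragments:
        # Flush whenever adding the fragment would exceed the per-request limit.
        while len(buffer) + len(fragment) > max_chars:
            room = max_chars - len(buffer)
            buffer += fragment[:room]
            fragment = fragment[room:]
            yield buffer
            buffer = ""
        buffer += fragment
    if buffer:
        yield buffer


def build_requests(fragments, service="llm_response_moderation",
                   session_id=None, max_chars=600):
    """Yield one request body per chunk, all sharing the same sessionId."""
    # Assumption: any unique string is acceptable as a session ID.
    session_id = session_id or uuid.uuid4().hex
    for chunk in batch_fragments(fragments, max_chars):
        yield {
            "Service": service,
            # ServiceParameters is documented as a JSON-formatted string.
            "ServiceParameters": json.dumps(
                {"content": chunk, "sessionId": session_id}, ensure_ascii=False
            ),
        }
```

A smaller max_chars shortens the window in which unreviewed text is visible but raises the number of moderation calls, which is the trade-off the Billing section discusses.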

Billing

The default billing method for Alibaba Cloud text moderation Enhanced Edition is pay-as-you-go, which is based on the number of text moderation calls. When using streaming output, the number of moderation calls increases significantly.

Therefore, we recommend the prepaid QPS billing method, which starts at a minimum of 500 QPS. Contact sales to enable and purchase this plan.

Comparison of billing methods

This example uses Qwen to compare model service and text moderation costs for streaming output with different billing methods.

| Model | Usage description | Estimated monthly cost |
| --- | --- | --- |
| qwen-turbo | 10 QPS for model service calls, generating an average of 1K tokens per call. This is equivalent to 864,000K tokens per day. Official model pricing is CNY 0.008 per 1K tokens. | Approximately CNY 207,000 |

| Moderation billing method | Usage description | Estimated monthly cost |
| --- | --- | --- |
| Pay-as-you-go | Moderating every 200 tokens results in 4.32 million moderation calls per day. Official text moderation pricing is CNY 7.5 per 10,000 calls. | Approximately CNY 97,200 |
| Pay-as-you-go | Moderating every 500 tokens results in 1.728 million moderation calls per day. Official text moderation pricing is CNY 7.5 per 10,000 calls. | Approximately CNY 38,900 |
| Prepaid by QPS | Moderating every 3 to 5 tokens requires 1,000 QPS. Official QPS pricing for text moderation is CNY 3,333 per 100 QPS per month. | Approximately CNY 33,300 |
| Prepaid by QPS | Moderating every 5 to 10 tokens requires 500 QPS. Official QPS pricing for text moderation is CNY 3,333 per 100 QPS per month. | Approximately CNY 16,700 |
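The estimates in this comparison can be reproduced with a few lines of arithmetic. The sketch below assumes a 30-day month and continuous traffic at the stated rate; the function names are illustrative only, and the prices are the ones quoted in this example.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30  # assumption: a 30-day month


def daily_tokens(model_qps, tokens_per_call):
    """Tokens generated per day at a constant model call rate."""
    return model_qps * tokens_per_call * SECONDS_PER_DAY


def model_monthly_cost(tokens_per_day, price_per_1k_tokens):
    """Monthly model cost when billed per 1K tokens."""
    return tokens_per_day / 1000 * price_per_1k_tokens * DAYS_PER_MONTH


def payg_monthly_cost(tokens_per_day, tokens_per_check, price_per_10k_calls):
    """Monthly pay-as-you-go moderation cost when one call covers tokens_per_check tokens."""
    calls_per_day = tokens_per_day / tokens_per_check
    return calls_per_day / 10_000 * price_per_10k_calls * DAYS_PER_MONTH


def prepaid_monthly_cost(qps, price_per_100_qps):
    """Monthly prepaid cost when capacity is sold in blocks of 100 QPS."""
    return qps / 100 * price_per_100_qps


tokens = daily_tokens(10, 1000)                       # 864,000,000 tokens per day
print(round(model_monthly_cost(tokens, 0.008)))       # 207360
print(round(payg_monthly_cost(tokens, 200, 7.5)))     # 97200
print(round(payg_monthly_cost(tokens, 500, 7.5)))     # 38880
print(round(prepaid_monthly_cost(1000, 3333)))        # 33330
print(round(prepaid_monthly_cost(500, 3333)))         # 16665
```

The prepaid cost depends only on the purchased QPS, so it stays flat however finely the stream is split, whereas the pay-as-you-go cost grows in proportion to the number of calls.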

Pay-as-you-go

After you enable text moderation Enhanced Edition, the default billing method is pay-as-you-go. You are billed daily based on your actual usage. No fees are incurred if you do not use the service.

| Moderation type | Service | Unit price |
| --- | --- | --- |
| Text Moderation - Advanced (text_advanced) | Large language model-generated text moderation: llm_response_moderation | CNY 15 per 10,000 calls |
| Text Moderation - Standard (text_standard) | AIGC text detection: ai_art_detection | CNY 7.5 per 10,000 calls |

Resource plan deduction

If you have a large or consistent volume of content to moderate, we recommend purchasing a resource plan. The larger the resource plan, the greater the discount. You can purchase and stack multiple resource plans. For more information, see Purchase a resource plan for text moderation Enhanced Edition.

This resource plan covers usage of text moderation Enhanced Edition and cannot be shared with Content Moderation traffic plans. The specific deduction factors are as follows:

| Moderation type | Service | Deduction factor |
| --- | --- | --- |
| Text Moderation - Advanced (text_advanced) | Large language model-generated text moderation: llm_response_moderation | The deduction factor is 2. This means 2 calls are deducted from your resource plan for each successful API call. For example, if you purchase a resource plan with a quota of 10 calls and make one successful API call, 2 calls are deducted, leaving a remaining quota of 8 calls. |
| Text Moderation - Standard (text_standard) | AIGC text detection: ai_art_detection | The deduction factor is 1. This means 1 call is deducted from your resource plan for each successful API call. For example, if you purchase a resource plan with a quota of 10 calls and make one successful API call, 1 call is deducted, leaving a remaining quota of 9 calls. |
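The deduction rule can be expressed as a short calculation. This is an illustrative sketch only; the function name remaining_quota is hypothetical, and what happens once a plan is exhausted is not covered here.

```python
# Deduction factors as listed for each service.
DEDUCTION_FACTORS = {
    "llm_response_moderation": 2,
    "ai_art_detection": 1,
}


def remaining_quota(quota, successful_calls, service):
    """Return the resource plan quota left after a number of successful calls."""
    return quota - successful_calls * DEDUCTION_FACTORS[service]


print(remaining_quota(10, 1, "llm_response_moderation"))  # 8
print(remaining_quota(10, 1, "ai_art_detection"))         # 9
```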

Access

Step 1: Activate the service

Visit Activate Service to activate Text Moderation Enhanced Edition.

After you activate Text Moderation Enhanced Edition, the default billing method is pay-as-you-go. You are billed daily based on your usage, and you are not charged if you do not use the service. After you start making API calls, the system automatically generates bills based on your usage. For more information, see Billing Details.

You can also purchase a resource plan. Resource plans offer tiered discounts compared to the pay-as-you-go method and are ideal for users with high, predictable usage volumes.

Step 2: Grant permissions to a RAM user

Before you use an SDK or call an API, you must grant the required permissions to a RAM user. You can create an AccessKey for your Alibaba Cloud account or a RAM user. The AccessKey verifies your identity when you call Alibaba Cloud APIs. For more information, see Obtain an AccessKey.

Procedure

  1. Log on to the RAM console with your Alibaba Cloud account.

  2. Create a RAM user.

    For details, see Create a RAM user.

  3. Grant the AliyunYundunGreenWebFullAccess system policy to the RAM user.

    After completing these steps, you can call the Content Moderation API as the RAM user.

Step 3: Install and integrate an SDK

This service is available in the following regions. For more information, see Text Moderation Enhanced Edition 2.0 PLUS service SDKs and access guide.

| Region | Public endpoint | Internal endpoint |
| --- | --- | --- |
| China (Shanghai) | green-cip.cn-shanghai.aliyuncs.com | green-cip-vpc.cn-shanghai.aliyuncs.com |
| China (Beijing) | green-cip.cn-beijing.aliyuncs.com | green-cip-vpc.cn-beijing.aliyuncs.com |
| China (Hangzhou) | green-cip.cn-hangzhou.aliyuncs.com | green-cip-vpc.cn-hangzhou.aliyuncs.com |
| China (Shenzhen) | green-cip.cn-shenzhen.aliyuncs.com | green-cip-vpc.cn-shenzhen.aliyuncs.com |
| China (Chengdu) | green-cip.cn-chengdu.aliyuncs.com | Not available |

Two moderation services are available: moderation of text generated by large language models (see the Text Moderation-Plus Service API) and AIGC text detection (see the Text Moderation-Standard Service API).

Text moderation-plus API

Usage notes

Service endpoint: https://green-cip.{region}.aliyuncs.com.

You can call this endpoint to create text content moderation tasks. For information about how to construct an HTTP request, see Request structure. You can also use pre-built HTTP requests. For more information, see Text Moderation-Plus service 2.0 SDK and Access Guide.

  • Billing:

    This is a billable endpoint. You are billed only for requests that return an HTTP status code of 200. You are not billed for requests that return other error codes. For more information about billing, see Billing.

QPS limit

By default, this endpoint is limited to 100 queries per second (QPS) per user. API calls that exceed this limit are throttled, which may affect your business. Plan your request rate accordingly. If you have a subscription-based QPS plan, you can increase your capacity to handle peak traffic. Contact Sales to enable this option.
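One way to stay under the default limit is to pace requests on the client. The following is a minimal, single-threaded sketch; the RateLimiter class is hypothetical and is not part of any Alibaba Cloud SDK, and the clock and sleep arguments exist only so the pacing can be tested without real waiting.

```python
import time


class RateLimiter:
    """Space calls evenly so that at most `qps` calls start per second."""

    def __init__(self, qps, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / qps
        self.clock = clock
        self.sleep = sleep
        self.next_time = clock()

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if now < self.next_time:
            self.sleep(self.next_time - now)
            now = self.next_time
        self.next_time = now + self.interval


# Usage: call limiter.wait() immediately before each moderation request.
limiter = RateLimiter(qps=100)
limiter.wait()
```

This class is not thread-safe. An application that sends requests from several threads or processes would need a shared limiter or a lower per-worker rate.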

Request parameters

| Parameter | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| Service | String | Yes | llm_response_moderation | The type of moderation service. Set the value to llm_response_moderation (moderation for text generated by a large language model). |
| ServiceParameters | JSONString | Yes | | The required service parameters, provided as a JSON-formatted string. For a description of each parameter, see the ServiceParameters table. |

Table 1. ServiceParameters

| Parameter | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| content | String | Yes | Text to be moderated. | The text content to moderate. The content cannot exceed 600 characters in length. |
| sessionId | String | No | 10123**** | The session ID. This ID indicates that the content of the current request belongs to the same stream. The text moderation engine automatically concatenates content fragments for moderation. The total length of the concatenated content cannot exceed 2,000 characters. Note: Do not specify the accountId parameter if you specify sessionId. |
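As a rough sketch, the limits in Table 1 can be checked on the client before a request is sent. The function name build_service_parameters is hypothetical, and serializing ServiceParameters with json.dumps is an assumption based on its JSONString type; the official SDKs may handle this differently.

```python
import json

MAX_CONTENT_CHARS = 600  # per-request limit on the content parameter


def build_service_parameters(content, session_id=None, account_id=None):
    """Validate the documented constraints and return the JSON string to send."""
    if not content:
        raise ValueError("content is required")
    if len(content) > MAX_CONTENT_CHARS:
        raise ValueError(f"content exceeds {MAX_CONTENT_CHARS} characters")
    if session_id is not None and account_id is not None:
        raise ValueError("sessionId and accountId cannot be used together")

    params = {"content": content}
    if session_id is not None:
        params["sessionId"] = session_id
    if account_id is not None:
        params["accountId"] = account_id
    # ensure_ascii=False keeps non-ASCII text readable in the serialized string.
    return json.dumps(params, ensure_ascii=False)
```

Rejecting an oversized fragment locally avoids spending a call on a request that would be refused for invalid parameters.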

Response parameters

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| Code | Integer | 200 | The status code. For more information, see Status codes. |
| Data | JSONObject | {"Result":[...]} | The moderation results. For more information, see Data. |
| Message | String | OK | The message returned for the request. |
| RequestId | String | AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE**** | The unique ID of the request. |

Table 2. Data

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| Result | JSONArray | | The moderation results, including detected risk labels and confidence scores. For more information, see Result. |
| Advice | JSONArray | [{"Answer":"This is a standard answer"}] | This parameter is returned when you call the llm_query_moderation service. If an input prompt matches an entry in a specified knowledge base, a standard answer is returned. For more information, see Advice. |

Table 3. Result

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| Label | String | political_xxx | The risk label detected in the text. For a list of supported labels, see Risk labels. |
| Confidence | Float | 81.22 | The confidence score. Valid values: 0 to 100. The value is accurate to two decimal places. This parameter may not be returned for all risk labels. |
| RiskWords | String | AA,BB,CC | The detected sensitive words, separated by commas. This parameter may not be returned for some labels. |
| CustomizedHit | JSONArray | [{"LibName":"...","Keywords":"..."}] | Details about a match in a custom library. This parameter is returned when the value of Label is customized. For more information, see CustomizedHit. |

Table 4. CustomizedHit

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| LibName | String | Custom library 1 | The name of the custom library. |
| Keywords | String | Custom word 1,Custom word 2 | The matched custom words. Multiple words are separated by commas. |

Table 5. Advice

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| Answer | String | This is a standard answer. | When you call the moderation service, an alternative answer can be returned in two scenarios. Match with a custom reject-and-reply library: if a risk label is triggered and a match is found in your custom library, the system returns a randomly selected answer from that library. Match with the system reject-and-reply library: if a risk label is triggered and a match is found in the system library, the system returns a randomly selected default answer. |
| HitLabel | String | political_xxx | The label with the highest risk level among the labels returned after analysis. For a list of supported labels, see Risk labels. |
| HitLibName | String | Custom reject-and-reply library 001 | The name of the matched custom reject-and-reply library. |

Examples

Request example

{
    "Service": "llm_response_moderation",
    "ServiceParameters": {
        "content": "Streaming output content",
        "sessionId": "10123****"
    }
}

Response example

{
    "Code": 200,
    "Data": {
        "Advice": [
            {
                "HitLabel": "political_entity",
                "Answer": "This is an example of a standard returned answer.",
                "HitLibName": "political_entity-001"
            }
        ],
        "Result": [
            {
                "Label": "political_entity",
                "Confidence": 100.0,
                "RiskWords": "Word A, Word B, Word C"
            },
            {
                "Label": "political_figure",
                "Confidence": 100.0,
                "RiskWords": "Word A, Word B, Word C"
            }
        ]
    },
    "Message": "OK",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
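A response of this shape might be reduced to a list of actionable labels as sketched below. The function name risky_labels, the 80.0 threshold, and the ignore parameter are illustrative assumptions; choose a threshold and an ignore list that match the labels your service actually returns.

```python
def risky_labels(response, min_confidence=80.0, ignore=frozenset()):
    """Return labels from Data.Result that meet the confidence threshold."""
    if response.get("Code") != 200:
        raise RuntimeError(
            f"moderation failed: {response.get('Code')} {response.get('Message')}"
        )
    results = (response.get("Data") or {}).get("Result") or []
    hits = []
    for item in results:
        label = item.get("Label")
        if label is None or label in ignore:
            continue
        confidence = item.get("Confidence")
        # Confidence may be absent for some labels; a missing score is kept as a hit.
        if confidence is None or confidence >= min_confidence:
            hits.append(label)
    return hits


sample = {
    "Code": 200,
    "Data": {
        "Result": [
            {"Label": "political_entity", "Confidence": 100.0},
            {"Label": "political_figure", "Confidence": 100.0},
        ]
    },
    "Message": "OK",
}
print(risky_labels(sample))  # ['political_entity', 'political_figure']
```

In a streaming application, a non-empty result would typically be the signal to stop rendering further fragments for that session.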

Status codes

| Code | Message | Description |
| --- | --- | --- |
| 200 | OK | The request was successful. |
| 400 | BAD_REQUEST | The request is invalid. This error often indicates that the request parameters are incorrect. |
| 408 | PERMISSION_DENY | This error may occur because your account is not authorized, has an overdue payment, has not been activated for the service, or has been suspended. |
| 500 | GENERAL_ERROR | A temporary server-side error occurred. We recommend that you retry the request. If the error persists, contact us through online support. |
| 581 | TIMEOUT | The request timed out. We recommend that you retry the request. If the error persists, contact us through online support. |
| 588 | EXCEED_QUOTA | The request rate exceeds your quota. |
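The retry guidance for codes 500 and 581 could be implemented along these lines. This is a sketch, not SDK behavior: call_with_retry and its send argument are hypothetical, the exponential backoff schedule is an arbitrary choice, and retrying on 588 after a pause is an assumption rather than documented advice.

```python
import time

RETRYABLE_CODES = {500, 581}  # documented as temporary; a retry is recommended
THROTTLED_CODES = {588}       # assumption: back off and retry when over quota


def call_with_retry(send, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns Code 200 or the attempts run out.

    send is any zero-argument callable that returns the parsed response dict.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        response = send()
        code = response.get("Code")
        if code == 200:
            return response
        retryable = code in RETRYABLE_CODES or code in THROTTLED_CODES
        if not retryable or attempt == max_attempts:
            raise RuntimeError(
                f"moderation failed with code {code}: {response.get('Message')}"
            )
        # Exponential backoff: base_delay, then 2x, then 4x, and so on.
        sleep(base_delay * 2 ** (attempt - 1))
```

Codes such as 400 and 408 are raised immediately because repeating the same request would not change the outcome.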

Text Moderation-Enhanced Edition API

Usage

Service endpoint: https://green-cip.{region}.aliyuncs.com.

Use this operation to create a text content moderation task. For information on how to construct an HTTP request, see request structure. You can also use pre-built HTTP requests. For more information, see Text Moderation-Enhanced Edition 2.0 PLUS SDK and Access Guide.

  • Billing

    This is a billable operation. You are charged only for requests that return a status code of 200. Requests that result in other status codes are not charged. For more information, see billing details.

QPS limit

By default, this operation is limited to 100 queries per second (QPS) per user. If you exceed this limit, your API calls are throttled, which may affect your business. Plan your request rate accordingly. To handle business peaks, you can increase your QPS limit with a prepaid machine moderation plan. Contact Sales to enable this option.

Request parameters

| Parameter | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| Service | String | Yes | ai_art_detection | The type of the moderation service. Valid value: ai_art_detection (AIGC-generated text detection). |
| ServiceParameters | JSONString | Yes | | Parameters for the moderation service, provided as a JSON string. See the following table for field details. |

Table 1. ServiceParameters

| Parameter | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| content | String | Yes | The content to detect. | The text content to moderate. The maximum length is 600 characters. |
| sessionId | String | No | 10123**** | The session ID. For streaming content, use this parameter to link multiple requests in the same session. The moderation engine concatenates content fragments from the same session for moderation. The total concatenated content cannot exceed 2,000 characters. Note: You cannot specify the sessionId and accountId parameters in the same request. |

Response parameters

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| Code | Integer | 200 | The status code. For more information, see Status codes. |
| Data | JSONObject | {"labels": "sexual_content","reason": "{\"riskTips\":\"色情_低俗词\",\"riskWords\":\"色情服务\"}"} | The moderation results. For details, see the Data object. |
| Message | String | OK | The response message. |
| RequestId | String | AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE**** | The request ID. |

Table 2. Data

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| labels | String | sexual_content | The risk labels for the detected content. If multiple risks are detected, the labels are returned as a comma-separated string. The valid labels are listed after this table. |
| reason | String | {\"riskTips\":\"色情_低俗词\",\"riskWords\":\"色情服务\"} | A JSON-formatted string that provides details about why a label was assigned. Its fields are listed after this table. |

Valid values of labels:

  • ad: Ad

  • political_content: Political content

  • profanity: Profanity

  • contraband: Contraband

  • sexual_content: Sexual content

  • violence: Violence

  • nonsense: Nonsense

  • negative_content: Negative content

  • religion: Religious content

  • cyberbullying: Cyberbullying

  • ad_compliance: Ad compliance

  • C_customized: User-defined library hit

Note

New labels are added periodically. We recommend that you design your business logic to ignore unknown labels.

Fields of reason:

  • riskTips: The sub-label.

  • riskWords: The matched risk fragment.

  • customizedWords: The matched user-defined keyword.

  • customizedLibs: The matched user-defined library name.

Examples

Request example

{
    "Service": "ai_art_detection",
    "ServiceParameters": {
        "content": "Streaming output content",
        "sessionId": "10123****"
    }
}

Successful response example

{
    "Code": 200,
    "Data": {
        "labels": "sexual_content",
        "reason": "{\"riskTips\":\"色情_低俗词\",\"riskWords\":\"色情服务\"}"
    },
    "Message": "OK",
    "RequestId": "AAAAAA-BBBB-CCCCC-DDDD-EEEEEEEE****"
}
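Because labels is a comma-separated string and reason is itself a JSON-encoded string, both need a second parsing step after the response is decoded. The sketch below shows one way to do this while ignoring unknown labels, as the note on new labels recommends. The function name parse_data is hypothetical.

```python
import json

# Labels listed in the Data table; anything else is treated as unknown.
KNOWN_LABELS = {
    "ad", "political_content", "profanity", "contraband", "sexual_content",
    "violence", "nonsense", "negative_content", "religion", "cyberbullying",
    "ad_compliance", "C_customized",
}


def parse_data(data):
    """Split labels into a list of known labels and decode the reason string."""
    raw_labels = data.get("labels") or ""
    labels = [part.strip() for part in raw_labels.split(",") if part.strip()]
    known = [label for label in labels if label in KNOWN_LABELS]

    raw_reason = data.get("reason")
    # reason is a JSON document encoded as a string, so it is decoded separately.
    reason = json.loads(raw_reason) if raw_reason else {}
    return known, reason


data = {
    "labels": "sexual_content",
    "reason": "{\"riskTips\":\"色情_低俗词\",\"riskWords\":\"色情服务\"}",
}
known, reason = parse_data(data)
print(known)                # ['sexual_content']
print(reason["riskWords"])  # 色情服务
```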

Status codes

| Code | Status | Description |
| --- | --- | --- |
| 200 | OK | The request was successful. |
| 400 | BAD_REQUEST | Invalid request. Check the request parameters for errors. |
| 407 | NOT_SUPPORT | The language is not recognized or supported. |
| 408 | PERMISSION_DENY | This error may occur if your account is not authorized, has an overdue payment, has not been activated for the service, or has been suspended. |
| 500 | GENERAL_ERROR | A temporary server-side error occurred. Retry the request. If the error persists, contact Online Support. |
| 581 | TIMEOUT | The request timed out. Retry the request. If the error persists, contact Online Support. |
| 588 | EXCEED_QUOTA | The request rate exceeded the QPS limit. |