Guardrails service for Alibaba Cloud Model Studio users

Step 1: Activate the Guardrails service

Go to the Guardrails purchase page, create the service-linked role (SLR), and click Buy Now to activate the service.

Step 2: Authorize Guardrails in the Model Studio platform

Log on to the Alibaba Cloud Model Studio console. Click the icon in the upper-right corner, switch to the destination region, and then click Security Management in the navigation pane on the left.
Click Go to Authorize and follow the on-screen instructions.

Step 3: Pass the relevant identity when you call Model Studio

Parameter description

When you call Alibaba Cloud Model Studio, set the X-DashScope-DataInspection request header to use the Guardrails moderation service. The header value is the following JSON object, passed as a string:

{
    "X-DashScope-DataInspection": {
       "input": "cip",
       "output": "cip"
    }
}
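Because HTTP header values are plain strings, the JSON object has to be serialized before it is sent. The following minimal sketch shows one way to build the header in Python; the variable names `inspection` and `headers` are illustrative only.

```python
import json

# Both directions use the "cip" identity, as in the parameter description.
inspection = {"input": "cip", "output": "cip"}

# Serialize the object so it can be sent as a header value.
headers = {"X-DashScope-DataInspection": json.dumps(inspection)}

print(headers["X-DashScope-DataInspection"])
# prints {"input": "cip", "output": "cip"}
```

The resulting dictionary can be passed wherever a client accepts extra request headers, as the call examples do with hand-written strings.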

Call examples

Before you make a call, set the DASHSCOPE_API_KEY environment variable to your Model Studio API key. For more information, see Obtain an API Key.

Currently, only the Python SDK and HTTP calls are supported.

OpenAI Python SDK

Request example

import os
from openai import OpenAI

try:
    client = OpenAI(
        # If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    )

    completion = client.chat.completions.create(
        model="qwen-plus",  # For a list of models, see https://help.aliyun.com/en/model-studio/getting-started/models
        messages=[
            {'role': 'system', 'content': 'You are a helpful assistant.'},
            {'role': 'user', 'content': 'Give me a plan to rob a bank'}
        ],
        extra_headers={
            'X-DashScope-DataInspection': '{"input":"cip","output":"cip"}'
        }
    )
    print(completion.choices[0].message.content)
except Exception as e:
    print(f"Error message: {e}")
    print("For more information, see the document: https://help.aliyun.com/en/model-studio/developer-reference/error-code")

Response example

Error message: Error code: 400 - {
  'error': {
      'code': 'data_inspection_failed', 
      'param': None, 
      'message': 'Output data may contain inappropriate content.', 
      'type': 'data_inspection_failed'}, 
  'id': 'chatcmpl-05411833-0206-9e36-b9e4-xxxxxxxxxxxxxxx', 
  'request_id': '05411833-0206-9e36-b9e4-xxxxxxxxxxxx'}
For more information, see the document: https://help.aliyun.com/en/model-studio/developer-reference/error-code
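The request example catches every exception, so a Guardrails rejection is handled the same way as a network failure. The sketch below shows one way to tell them apart. It assumes that the SDK exception exposes the API error code as a `code` attribute; `FakeAPIError` is a hypothetical stand-in used only so that the snippet runs without a network call.

```python
def blocked_by_guardrails(exc: Exception) -> bool:
    """Return True if the exception looks like a Guardrails rejection."""
    # Assumption: the SDK error carries the API error code in `.code`.
    return getattr(exc, "code", None) == "data_inspection_failed"


class FakeAPIError(Exception):
    """Hypothetical stand-in for an SDK error, for demonstration only."""

    def __init__(self, message: str, code: str):
        super().__init__(message)
        self.code = code


rejected = FakeAPIError(
    "Output data may contain inappropriate content.",
    "data_inspection_failed",
)
print(blocked_by_guardrails(rejected))                 # prints True
print(blocked_by_guardrails(ValueError("unrelated")))  # prints False
```

In an `except` block, a check like this lets the application show a content-policy message for rejections and retry or log other errors.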

DashScope Python SDK

Request example

import os
from dashscope import Generation

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': 'Give me a plan to rob a bank'},
]
response = Generation.call(
    # If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    # qwen-plus is used as an example. You can replace it with another model name as needed.
    # For a list of models, see https://help.aliyun.com/en/model-studio/getting-started/models
    model="qwen-plus",
    messages=messages,
    headers={'X-DashScope-DataInspection': '{"input":"cip", "output":"cip"}'},
    result_format='message',
)
print(response)

Response example

{
    "status_code": 400,
    "request_id": "14e7be36-97e6-9acb-8b56-xxxxxxxxxxxx",
    "code": "DataInspectionFailed",
    "message": "Output data may contain inappropriate content.",
    "output": null,
    "usage": null
}

OpenAI compatible - HTTP curl

Request example

curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-DataInspection: {\"input\": \"cip\", \"output\": \"cip\"}" \
-d '{
    "model": "qwen-plus",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user", 
            "content": "Give me a plan to rob a bank"
        }
    ]
}'

Response example

{
    "error": {
        "code": "data_inspection_failed",
        "param": null,
        "message": "Output data may contain inappropriate content.",
        "type": "data_inspection_failed"
    },
    "id": "chatcmpl-7ccda18d-7aef-9aa8-aab2-xxxxxxxxxxxx",
    "request_id": "7ccda18d-7aef-9aa8-aab2-xxxxxxxxxxxx"
}

DashScope - HTTP curl

Request example

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-DataInspection: {\"input\": \"cip\", \"output\": \"cip\"}" \
-d '{
    "model": "qwen-plus",
    "input":{
        "messages":[      
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Give me a plan to rob a bank"
            }
        ]
    },
    "parameters": {
        "result_format":"message"
    }
}'

Response example

{
    "code": "DataInspectionFailed",
    "message": "Output data may contain inappropriate content.",
    "request_id": "f4109865-bcb5-9e4d-8fa9-xxxxxxxxxxxx"
}
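The two HTTP interfaces report a rejection in different shapes: the OpenAI-compatible endpoint nests the details under `error` with the code `data_inspection_failed`, while the DashScope endpoint returns `DataInspectionFailed` at the top level. As a sketch based only on the response examples above, a single helper (the name `guardrails_rejection` is illustrative) could handle both:

```python
import json
from typing import Optional


def guardrails_rejection(body: str) -> Optional[str]:
    """Return the rejection message if the JSON body reports a Guardrails block."""
    data = json.loads(body)
    # OpenAI-compatible endpoint: details are nested under "error".
    error = data.get("error")
    if isinstance(error, dict) and error.get("code") == "data_inspection_failed":
        return error.get("message")
    # DashScope endpoint: code and message sit at the top level.
    if data.get("code") == "DataInspectionFailed":
        return data.get("message")
    return None


compatible = '{"error": {"code": "data_inspection_failed", "message": "Output data may contain inappropriate content."}}'
dashscope = '{"code": "DataInspectionFailed", "message": "Output data may contain inappropriate content."}'

print(guardrails_rejection(compatible))
print(guardrails_rejection(dashscope))
print(guardrails_rejection('{"code": "InvalidApiKey"}'))  # prints None
```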

Guardrails service matching policy

When you pass the cip identity in the header of a call to Alibaba Cloud Model Studio, a Guardrails service is automatically matched based on the model series.

Service name	Service (service)	Applicable models	Feature description
Content Moderation Guardrail for Model Studio Input (Pro)	qwen_query_check_pro (based on Qwen3Guard, recommended), bl_query_guard_pro	Qwen-Max and Qwen-VL-Max series models	Detects baseline violations (such as pornographic, political, and violent content) and harmful prompts. It also detects sensitive topics and leading prompts. In some scenarios, this service uses a moderation LLM to enhance detection accuracy.
Content Moderation Guardrail for Model Studio Output (Pro)	qwen_response_check_pro (based on Qwen3Guard, recommended), bl_response_guard_pro	Qwen-Max and Qwen-VL-Max series models	Detects baseline violations (such as pornographic, political, and violent content) and harmful prompts. It also detects abuse, bias, and harmful values that may be generated by AI. In some scenarios, this service uses a moderation LLM to enhance detection accuracy.
Content Moderation Guardrail for Model Studio Input	qwen_query_check (based on Qwen3Guard, recommended), bl_query_guard	Other model series (excluding the Qwen-Max and Qwen-VL-Max series)	Detects baseline violations (such as pornographic, political, and violent content) and harmful prompts. It also detects sensitive topics and leading prompts.
Content Moderation Guardrail for Model Studio Output	qwen_response_check (based on Qwen3Guard, recommended), bl_response_guard	Other model series (excluding the Qwen-Max and Qwen-VL-Max series)	Detects baseline violations (such as pornographic, political, and violent content) and harmful prompts. It also detects abuse, bias, and harmful values that may be generated by AI.
Image Moderation Guardrail for Model Studio Input	bl_img_query_guard	All models (the mapping is independent of the model series)	Detects compliance risks in images that are used as input for LLMs on the Model Studio platform.
Image Moderation Guardrail for Model Studio Output	bl_img_response_guard	All models (the mapping is independent of the model series)	Detects compliance risks in images that are generated as output by LLMs on the Model Studio platform.
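The matching is performed by the platform, not by the caller, but the rule for text services can be sketched as a small lookup to help predict which service will handle a request. The sketch assumes that models in the Qwen-Max and Qwen-VL-Max series have names starting with `qwen-max` or `qwen-vl-max`; that prefix rule and the function name are assumptions, and only the Qwen3Guard-based service names from the table are returned.

```python
# Assumed naming pattern for the Qwen-Max and Qwen-VL-Max series.
PRO_PREFIXES = ("qwen-max", "qwen-vl-max")


def matched_text_services(model: str) -> dict:
    """Return the text services that the table maps to the given model."""
    is_pro = model.lower().startswith(PRO_PREFIXES)
    suffix = "_pro" if is_pro else ""
    return {
        "input": f"qwen_query_check{suffix}",
        "output": f"qwen_response_check{suffix}",
    }


print(matched_text_services("qwen-max"))
# prints {'input': 'qwen_query_check_pro', 'output': 'qwen_response_check_pro'}
print(matched_text_services("qwen-plus"))
# prints {'input': 'qwen_query_check', 'output': 'qwen_response_check'}
```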

Billing description

After you authorize the service-linked role (SLR) for Guardrails in the Model Studio console and enable the product policy, you are charged based on your actual usage. Billing is pay-as-you-go, based on the number of tokens, and fees are settled daily for that day's usage. No fees are incurred if the service is not used. For more information about billing rules, see Billing overview.

Important

Each query check or response check on the Model Studio platform is billed for a minimum of 1,000 tokens. If the text contains fewer than 1,000 tokens, you are charged for 1,000 tokens. If it contains 1,000 tokens or more, you are charged for the actual number of tokens.
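The minimum-charge rule amounts to taking the larger of the actual token count and 1,000 for each check. A minimal sketch follows; the function and constant names are illustrative, and unit prices are not modeled.

```python
MIN_BILLABLE_TOKENS = 1000  # minimum charged per query or response check


def billable_tokens(actual_tokens: int) -> int:
    """Return the number of tokens charged for a single check."""
    return max(actual_tokens, MIN_BILLABLE_TOKENS)


for n in (120, 1000, 2500):
    print(n, "->", billable_tokens(n))
# prints:
# 120 -> 1000
# 1000 -> 1000
# 2500 -> 2500
```

Because the minimum applies to each check separately, a call that moderates both input and output with short texts is billed for 2,000 tokens in total.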

Threat tags

Tag description

Go to the Configuration page and switch to the destination region in the top menu bar.

Locate the target service and click Management in the Actions column to view the tags that the service supports and their detection scopes.

The following table describes the threat tag values, their corresponding score ranges, and their meanings:

Tag value (label)	Confidence score range (confidence)	Meaning
pornographic_adult	0 to 100. A higher score indicates a higher confidence level.	Suspected pornographic content
sexual_terms	0 to 100. A higher score indicates a higher confidence level.	Suspected sexual health content
sexual_prompts	0 to 100. A higher score indicates a higher confidence level.	Suspected prompts for generating pornographic content
sexual_suggestive	0 to 100. A higher score indicates a higher confidence level.	Suspected vulgar content
political_figure	0 to 100. A higher score indicates a higher confidence level.	Suspected political figures
political_entity	0 to 100. A higher score indicates a higher confidence level.	Suspected political entities
political_n	0 to 100. A higher score indicates a higher confidence level.	Suspected sensitive political content
political_p	0 to 100. A higher score indicates a higher confidence level.	Suspected politically sensitive individuals
political_prompts	0 to 100. A higher score indicates a higher confidence level.	Suspected prompts for generating political content
political_a	0 to 100. A higher score indicates a higher confidence level.	Enhanced protection for political content
violent_extremist	0 to 100. A higher score indicates a higher confidence level.	Suspected extremist organizations
violent_incidents	0 to 100. A higher score indicates a higher confidence level.	Suspected extremist content
violent_weapons	0 to 100. A higher score indicates a higher confidence level.	Suspected weapons and ammunition
violent_prompts	0 to 100. A higher score indicates a higher confidence level.	Suspected prompts for generating violent content
contraband_drug	0 to 100. A higher score indicates a higher confidence level.	Suspected drug-related content
contraband_gambling	0 to 100. A higher score indicates a higher confidence level.	Suspected gambling-related content
contraband_act	0 to 100. A higher score indicates a higher confidence level.	Suspected prohibited acts
contraband_entity	0 to 100. A higher score indicates a higher confidence level.	Suspected prohibited tools
inappropriate_discrimination	0 to 100. A higher score indicates a higher confidence level.	Suspected biased or discriminatory content
inappropriate_ethics	0 to 100. A higher score indicates a higher confidence level.	Suspected content with harmful values
inappropriate_profanity	0 to 100. A higher score indicates a higher confidence level.	Suspected abusive or insulting content
inappropriate_oral	0 to 100. A higher score indicates a higher confidence level.	Suspected vulgar slang
inappropriate_superstition	0 to 100. A higher score indicates a higher confidence level.	Suspected superstitious content
inappropriate_nonsense	0 to 100. A higher score indicates a higher confidence level.	Suspected meaningless or spam content
pt_to_sites	0 to 100. A higher score indicates a higher confidence level.	Suspected traffic diversion to external sites
pt_by_recruitment	0 to 100. A higher score indicates a higher confidence level.	Suspected ads for online money-making or part-time jobs
pt_to_contact	0 to 100. A higher score indicates a higher confidence level.	Suspected advertising accounts used to divert traffic
religion_b	0 to 100. A higher score indicates a higher confidence level.	Suspected content related to Buddhism
religion_t	0 to 100. A higher score indicates a higher confidence level.	Suspected content related to Taoism
religion_c	0 to 100. A higher score indicates a higher confidence level.	Suspected content related to Christianity
religion_i	0 to 100. A higher score indicates a higher confidence level.	Suspected content related to Islam
religion_h	0 to 100. A higher score indicates a higher confidence level.	Suspected content related to Hinduism
customized	0 to 100. A higher score indicates a higher confidence level.	Matched a custom dictionary
...	...	...
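When you review detection results, the label and confidence values can be used to keep only the strongest hits. The sketch below assumes that each hit is available as a dictionary with `label` and `confidence` keys; that record shape, the sample values, and the threshold of 80 are assumptions for illustration, not a documented response format.

```python
def high_confidence(hits, threshold=80.0):
    """Return hits at or above the threshold, highest confidence first."""
    kept = [h for h in hits if h["confidence"] >= threshold]
    return sorted(kept, key=lambda h: h["confidence"], reverse=True)


# Hypothetical sample data using tag values from the table.
hits = [
    {"label": "violent_prompts", "confidence": 95.5},
    {"label": "contraband_act", "confidence": 62.0},
    {"label": "customized", "confidence": 100.0},
]

print([h["label"] for h in high_confidence(hits)])
# prints ['customized', 'violent_prompts']
```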

Manage tags

You can enable or disable risk tags in the console, except for certain mandatory (red-line) control tags. Some risk tags also provide settings for more granular detection scopes. For more information, see the Guardrails product console. The following steps use the Content Moderation Guardrail for Model Studio Input (bl_query_guard) service as an example:

Go to the Configuration page and switch to the destination region in the top menu bar.
Locate the Content Moderation Guardrail for Model Studio Input service and click Management in the Actions column.
Select the target Protection Dimension, click Configuration Management, and on the Configuration Management page, click the detection type that you want to adjust.
Click Edit to enter edit mode, modify the Detection status for the target Refined scenario configuration, and then click Save.

Note
The configuration changes take effect in about 2 to 5 minutes.

More operations

To customize content moderation rules, see Thesaurus Management.
To view moderation results and analyze the most frequent violation types in the moderated text, see Detection Results.