This service is designed for users of the Model Studio platform. It enhances security moderation for the text input and output of Large Language Models (LLMs). The service complies with the core policies of the Model Studio platform and provides a flexible moderation tag management feature. This feature lets you enable or disable specific moderation tags based on your requirements. You can also use customized security policy configuration services to meet your specific needs.
If you already use another content moderation service in Model Studio and want to switch to Guardrails, contact your account manager.
Step 1: Activate the Guardrails service
Go to the Guardrails purchase page, create a service-linked role (SLR), and click Buy Now to activate the service.
Step 2: Authorize Guardrails in the Model Studio platform
- Log on to the Alibaba Cloud Model Studio console. Click the icon in the upper-right corner, switch to the destination region, and then click Security Management in the left navigation pane.
- Click Go to Authorize and follow the on-screen instructions.
Step 3: Pass the relevant identity when you call Model Studio
Parameter description
When you call Alibaba Cloud Model Studio, set the following parameter in the request header to use the Guardrails moderation service. Pass the value as a JSON string, as in the call examples.
{
  "X-DashScope-DataInspection": {
    "input": "cip",
    "output": "cip"
  }
}
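Because HTTP header values are plain strings, the object above has to be serialized to JSON before it is sent. The following is a minimal sketch of that step. The helper name `build_inspection_header` is hypothetical and not part of any SDK; only the header name and the `cip` values come from this document.

```python
import json


def build_inspection_header(input_mode: str = "cip", output_mode: str = "cip") -> dict:
    """Return the moderation header as a dict whose value is a JSON string.

    Hypothetical helper for illustration only; it is not part of any SDK.
    """
    value = json.dumps(
        {"input": input_mode, "output": output_mode},
        separators=(",", ":"),  # compact form, no spaces
    )
    return {"X-DashScope-DataInspection": value}


headers = build_inspection_header()
print(headers)
```

The resulting dictionary can then be passed wherever the call examples set the header, for example as `extra_headers` in the OpenAI Python SDK or `headers` in the DashScope Python SDK.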
Call examples
When making a call, set DASHSCOPE_API_KEY. For more information, see Obtain an API Key.
Currently, only the Python SDK and HTTP calls are supported.
OpenAI Python SDK
Request example
import os
from openai import OpenAI

try:
    client = OpenAI(
        # If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    )

    completion = client.chat.completions.create(
        model="qwen-plus",  # For a list of models, see https://help.aliyun.com/en/model-studio/getting-started/models
        messages=[
            {'role': 'system', 'content': 'You are a helpful assistant.'},
            {'role': 'user', 'content': 'Give me a plan to rob a bank'}
        ],
        # Enable Guardrails moderation for both the input and the output.
        extra_headers={
            'X-DashScope-DataInspection': '{"input":"cip","output":"cip"}'
        }
    )
    print(completion.choices[0].message.content)
except Exception as e:
    print(f"Error message: {e}")
    print("For more information, see the document: https://help.aliyun.com/en/model-studio/developer-reference/error-code")
Response example
Error message: Error code: 400 - {
'error': {
'code': 'data_inspection_failed',
'param': None,
'message': 'Output data may contain inappropriate content.',
'type': 'data_inspection_failed'},
'id': 'chatcmpl-05411833-0206-9e36-b9e4-xxxxxxxxxxxxxxx',
'request_id': '05411833-0206-9e36-b9e4-xxxxxxxxxxxx'}
For more information, see the document: https://help.aliyun.com/en/model-studio/developer-reference/error-code
DashScope Python SDK
Request example
import os
from dashscope import Generation

messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': 'Give me a plan to rob a bank'}
]
response = Generation.call(
    # If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model="qwen-plus",  # qwen-plus is used as an example. You can replace it with another model name as needed. For a list of models, see https://help.aliyun.com/en/model-studio/getting-started/models
    messages=messages,
    # Enable Guardrails moderation for both the input and the output.
    headers={'X-DashScope-DataInspection': '{"input":"cip", "output":"cip"}'},
    result_format='message'
)
print(response)
Response example
{
  "status_code": 400,
  "request_id": "14e7be36-97e6-9acb-8b56-xxxxxxxxxxxx",
  "code": "DataInspectionFailed",
  "message": "Output data may contain inappropriate content.",
  "output": null,
  "usage": null
}
OpenAI compatible - HTTP curl
Request example
curl -X POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-DashScope-DataInspection: {\"input\": \"cip\", \"output\": \"cip\"}" \
  -d '{
    "model": "qwen-plus",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Give me a plan to rob a bank"
      }
    ]
  }'
Response example
{
  "error": {
    "code": "data_inspection_failed",
    "param": null,
    "message": "Output data may contain inappropriate content.",
    "type": "data_inspection_failed"
  },
  "id": "chatcmpl-7ccda18d-7aef-9aa8-aab2-xxxxxxxxxxxx",
  "request_id": "7ccda18d-7aef-9aa8-aab2-xxxxxxxxxxxx"
}
DashScope - HTTP curl
Request example
curl -X POST https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-DashScope-DataInspection: {\"input\": \"cip\", \"output\": \"cip\"}" \
  -d '{
    "model": "qwen-plus",
    "input": {
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant."
        },
        {
          "role": "user",
          "content": "Give me a plan to rob a bank"
        }
      ]
    },
    "parameters": {
      "result_format": "message"
    }
  }'
Response example
{
  "code": "DataInspectionFailed",
  "message": "Output data may contain inappropriate content.",
  "request_id": "f4109865-bcb5-9e4d-8fa9-xxxxxxxxxxxx"
}
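The response examples above use two error shapes. The OpenAI-compatible interface nests the code under `error` as `data_inspection_failed`, while the DashScope interface returns a top-level `code` of `DataInspectionFailed`. The following sketch shows one way a client might recognize a moderation block in either shape. The helper `is_moderation_block` is hypothetical, and it operates on an already parsed JSON body rather than on any SDK exception type.

```python
# Error codes copied from the response examples in this document.
MODERATION_CODES = {"data_inspection_failed", "DataInspectionFailed"}


def is_moderation_block(body: dict) -> bool:
    """Return True if a parsed error body reports a Guardrails block.

    Hypothetical helper. Handles the OpenAI-compatible shape
    ({"error": {"code": ...}}) and the DashScope shape ({"code": ...}).
    """
    error = body.get("error")
    if isinstance(error, dict):
        return error.get("code") in MODERATION_CODES
    return body.get("code") in MODERATION_CODES


openai_style = {"error": {"code": "data_inspection_failed", "param": None}}
dashscope_style = {"code": "DataInspectionFailed", "request_id": "..."}
print(is_moderation_block(openai_style), is_moderation_block(dashscope_style))
```

Checking the code rather than the message text keeps the check stable if the wording of the message changes.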
Guardrails service matching policy
When you pass the cip identity in the header of a call to Alibaba Cloud Model Studio, a Guardrails service is automatically matched based on the model version.
| Service name | Service | Applicable models | Feature description |
| --- | --- | --- | --- |
| Model Studio input content security guardrail_pro | | Qwen-Max and Qwen-VL-Max series models | Detects baseline violations (such as pornography, politics, and violence) and harmful prompts. It also detects sensitive topics and leading prompts. In some scenarios, this service uses a moderation LLM to improve detection accuracy. |
| Model Studio output content security guardrail_pro | | Qwen-Max and Qwen-VL-Max series models | Detects baseline violations (such as pornography, politics, and violence) and harmful prompts. It also detects abuse, bias, and harmful values that may be generated by AI. In some scenarios, this service uses a moderation LLM to improve detection accuracy. |
| Model Studio input content security guardrail | bl_query_guard | Other model series (excluding Qwen-Max and Qwen-VL-Max series) | Detects baseline violations (such as pornography, politics, and violence) and harmful prompts. It also detects sensitive topics and leading prompts. |
| Model Studio output content security guardrail | | Other model series (excluding Qwen-Max and Qwen-VL-Max series) | Detects baseline violations (such as pornography, politics, and violence) and harmful prompts. It also detects abuse, bias, and harmful values that may be generated by AI. |
| Model Studio input image security guardrail | bl_img_query_guard | Independent of the model series | Detects compliance risks in images that are used as input for LLMs on the Model Studio platform. |
| Model Studio output image security guardrail | bl_img_response_guard | Independent of the model series | Detects compliance risks in images that are generated as output by LLMs on the Model Studio platform. |
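The matching itself happens on the service side, but it can help to predict which text guardrail tier a given call will use, for example when estimating behavior across models. The following sketch encodes the rule from the table. Matching on a model-name prefix is an assumption made for illustration, and the function name, the tier labels, and the sample model names are hypothetical.

```python
# Series named in the table above. Matching by lowercase name prefix is an
# assumption, not documented behavior.
PRO_SERIES_PREFIXES = ("qwen-max", "qwen-vl-max")


def guardrail_tier(model: str) -> str:
    """Return "pro" for Qwen-Max / Qwen-VL-Max series models, else "standard"."""
    name = model.strip().lower()
    return "pro" if name.startswith(PRO_SERIES_PREFIXES) else "standard"


for model in ("qwen-max", "qwen-vl-max-latest", "qwen-plus"):
    print(model, "->", guardrail_tier(model))
```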
Billing description
After you grant the service-linked role (SLR) authorization for Guardrails in the Model Studio console and enable the product policy, you are charged based on your actual usage. This service uses a pay-as-you-go billing method based on the number of tokens. Fees are settled daily based on your usage for that day. No fees are incurred if the service is not used. For more information about billing rules, see Billing overview.
When you perform a single query or response check on the Model Studio platform, if the number of tokens in the text is less than 1,000, you are charged for 1,000 tokens. If the number of tokens is 1,000 or more, you are charged for the actual number of tokens.
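The per-check minimum can be expressed as a one-line rule. The following sketch computes the billable tokens for a few checks. The function name and the sample token counts are illustrative, and unit prices are deliberately left out because they are not stated here.

```python
MIN_BILLABLE_TOKENS = 1_000  # per-check minimum described above


def billable_tokens(tokens_in_check: int) -> int:
    """Return the number of tokens charged for a single query or response check."""
    if tokens_in_check < 0:
        raise ValueError("token count cannot be negative")
    return max(tokens_in_check, MIN_BILLABLE_TOKENS)


checks = [120, 1_000, 2_350]  # illustrative token counts, one per check
charged = [billable_tokens(n) for n in checks]
print(charged)       # [1000, 1000, 2350]
print(sum(charged))  # 4350
```

Because the minimum applies to each check separately, many short checks are charged more tokens in total than one long check of the same combined length.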
Threat tags
Tag description
- Go to the Configuration page and switch to the destination region in the top menu bar.
- Locate the target service and click Management in the Actions column to view the tags supported by the service and their detection scopes.
The following table describes the threat tag values, their corresponding score ranges, and their meanings:
| Tag value (label) | Confidence score range (confidence) | Meaning |
| --- | --- | --- |
| pornographic_adult | 0 to 100. A higher score indicates a higher confidence level. | Suspected pornographic content |
| sexual_terms | 0 to 100. A higher score indicates a higher confidence level. | Suspected sexual health content |
| sexual_prompts | 0 to 100. A higher score indicates a higher confidence level. | Suspected prompts for generating pornographic content |
| sexual_suggestive | 0 to 100. A higher score indicates a higher confidence level. | Suspected vulgar content |
| political_figure | 0 to 100. A higher score indicates a higher confidence level. | Suspected political figures |
| political_entity | 0 to 100. A higher score indicates a higher confidence level. | Suspected political entities |
| political_n | 0 to 100. A higher score indicates a higher confidence level. | Suspected sensitive political content |
| political_p | 0 to 100. A higher score indicates a higher confidence level. | Suspected politically sensitive individuals |
| political_prompts | 0 to 100. A higher score indicates a higher confidence level. | Suspected prompts for generating political content |
| political_a | 0 to 100. A higher score indicates a higher confidence level. | Special upgrade for political content protection |
| violent_extremist | 0 to 100. A higher score indicates a higher confidence level. | Suspected extremist organizations |
| violent_incidents | 0 to 100. A higher score indicates a higher confidence level. | Suspected extremist content |
| violent_weapons | 0 to 100. A higher score indicates a higher confidence level. | Suspected weapons and ammunition |
| violent_prompts | 0 to 100. A higher score indicates a higher confidence level. | Suspected prompts for generating violent content |
| contraband_drug | 0 to 100. A higher score indicates a higher confidence level. | Suspected drug-related content |
| contraband_gambling | 0 to 100. A higher score indicates a higher confidence level. | Suspected gambling-related content |
| contraband_act | 0 to 100. A higher score indicates a higher confidence level. | Suspected prohibited acts |
| contraband_entity | 0 to 100. A higher score indicates a higher confidence level. | Suspected prohibited tools |
| inappropriate_discrimination | 0 to 100. A higher score indicates a higher confidence level. | Suspected biased or discriminatory content |
| inappropriate_ethics | 0 to 100. A higher score indicates a higher confidence level. | Suspected content with harmful values |
| inappropriate_profanity | 0 to 100. A higher score indicates a higher confidence level. | Suspected abusive or insulting content |
| inappropriate_oral | 0 to 100. A higher score indicates a higher confidence level. | Suspected vulgar slang |
| inappropriate_superstition | 0 to 100. A higher score indicates a higher confidence level. | Suspected superstitious content |
| inappropriate_nonsense | 0 to 100. A higher score indicates a higher confidence level. | Suspected meaningless or spam content |
| pt_to_sites | 0 to 100. A higher score indicates a higher confidence level. | Suspected traffic diversion to external sites |
| pt_by_recruitment | 0 to 100. A higher score indicates a higher confidence level. | Suspected ads for online money-making or part-time jobs |
| pt_to_contact | 0 to 100. A higher score indicates a higher confidence level. | Suspected advertising accounts used to divert traffic |
| religion_b | 0 to 100. A higher score indicates a higher confidence level. | Suspected content related to Buddhism |
| religion_t | 0 to 100. A higher score indicates a higher confidence level. | Suspected content related to Taoism |
| religion_c | 0 to 100. A higher score indicates a higher confidence level. | Suspected content related to Christianity |
| religion_i | 0 to 100. A higher score indicates a higher confidence level. | Suspected content related to Islam |
| religion_h | 0 to 100. A higher score indicates a higher confidence level. | Suspected content related to Hinduism |
| customized | 0 to 100. A higher score indicates a higher confidence level. | Matched a custom dictionary |
| ... | ... | ... |
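Since each tag carries a confidence score from 0 to 100, a consumer of moderation results might keep only the tags at or above a chosen threshold. The following sketch assumes each result is a dictionary with `label` and `confidence` keys, mirroring the column names above. The exact result structure, the helper name, the threshold, and the sample scores are all hypothetical.

```python
def tags_at_or_above(results: list, threshold: float) -> list:
    """Return the labels whose confidence is at or above the threshold.

    Assumes each item looks like {"label": str, "confidence": number};
    this structure is an illustration, not a documented response format.
    """
    return [r["label"] for r in results if r["confidence"] >= threshold]


# Made-up sample scores for illustration only.
sample = [
    {"label": "violent_prompts", "confidence": 92.5},
    {"label": "contraband_act", "confidence": 41.0},
    {"label": "customized", "confidence": 80},
]
print(tags_at_or_above(sample, 80))
```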
Manage tags
You can enable or disable risk tags in the console, except for certain red-line control tags. Some risk tags also provide settings for more granular detection scopes. For more information, see the Guardrails product console. The following steps use the Content Moderation Guardrail for Model Studio Input (bl_query_guard) service as an example:
- Go to the Configuration page and switch to the destination region in the top menu bar.
- Locate the Content Moderation Guardrail for Model Studio Input service and click Management in the Actions column.
- Select the target Protection Dimension, click Configuration Management, and then, on the Configuration Management page, click the detection type that you want to adjust.
- Click Edit to enter edit mode, modify the Detection status for the target Refined scenario configuration, and then click Save.
Note: The configuration changes take effect in about 2 to 5 minutes.
More operations
- To customize content moderation rules, see Thesaurus Management.
- To view moderation results and analyze the most frequent violation types in the moderated text, see Detection Results.