AI security protection scans gateway requests and responses in real time, detecting harmful content in both user inputs and AI-generated output. Configure protection levels to control interception strictness.
Use cases
Recommended scenarios:
-
Social platform content moderation: Filter user-generated content containing inappropriate, illegal, or sensitive information.
-
Enterprise customer service chat filtering: Detect inappropriate language in user inputs or bot replies in real time. Block responses containing phishing links, fraudulent websites, or malicious URLs.
-
Generative AI application protection: Prevent models from generating false, discriminatory, or harmful content in AIGC applications such as AI writing and image generation.
-
High-accuracy professional advisory scenarios: Identify model hallucinations such as fabricated facts, incorrect data, or unsupported conclusions, to prevent misleading users.
-
Government and financial information exchange: Prevent sensitive information leakage and inaccurate policy interpretations in highly regulated scenarios.
Procedure
Log on to the AI Gateway console and choose Instance. In the top menu bar, select a region, then click the target instance ID.
In the navigation pane on the left, choose Model API, then click the target API name to go to the API Details page.
-
Click Policies and Plug-ins, enable AI security, and configure its parameters.
Note: Currently, only text and image generation scenarios are supported.
If AI Security Guardrail is not yet activated, activate it first.
-
Select a protection service and complete the configuration. AI Security Guardrail is recommended. It supports separate interception policies per protection dimension and custom inspection policies for text and images.
AI Security Guardrail
Configure an inspection policy and an interception policy per consumer. Consumer matching rules:
-
Any consumer: Applies to all consumers.
-
Exact match: Applies to a specified consumer.
-
Regex match: Applies to consumers that match a regular expression.
-
Prefix match: Applies to consumers whose names start with a specific prefix.
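The four consumer matching rules above can be sketched in a few lines of Python. This is only an illustration of the matching semantics as described here, not the gateway's implementation: the function name `consumer_matches` and the lowercase type labels are hypothetical, and treating Regex match as an unanchored search is an assumption.

```python
import re


def consumer_matches(match_type: str, value: str, consumer: str) -> bool:
    """Return True if `consumer` satisfies a rule of the given match type."""
    if match_type == "any":
        # Any consumer: applies to all consumers.
        return True
    if match_type == "exact":
        # Exact match: applies to one specified consumer.
        return consumer == value
    if match_type == "prefix":
        # Prefix match: consumer name starts with the given prefix.
        return consumer.startswith(value)
    if match_type == "regex":
        # Assumption: unanchored search; the gateway's real regex
        # semantics are not specified in this document.
        return re.search(value, consumer) is not None
    raise ValueError(f"unknown match type: {match_type}")
```

For example, `consumer_matches("prefix", "AI", "AI-TEST")` is true, while `consumer_matches("exact", "AI", "AI-TEST")` is false.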
Parameter
Description
Endpoint
Auto-populated with the nearest Content Moderation VPC address. For details, see SDK and Access Guide.
Interception policy
Blocks requests or responses that match inspection rules.
If no interception policy is configured, the system applies a default Low protection level for all protection dimensions and consumers.
Define separate interception policies per consumer and protection dimension.
Protection dimension:
-
Any protection dimension: Applies to all protection dimensions.
-
Content compliance detection: Identifies illegal or non-compliant content from the model.
-
Prompt attack detection: Defends against malicious inputs that trick the model into bypassing safety measures or leaking information.
-
Sensitive content detection: Blocks sensitive or high-risk information in the model's output.
-
Model hallucination: Detects false or fabricated content.
-
Malicious URL detection: Blocks responses containing malicious links.
-
Custom agent: Protects user-defined agent services.
Protection level:
-
Low: Blocks only high-risk requests.
-
Medium: Blocks medium- and high-risk requests.
-
High: Blocks all requests identified as risky.
-
Observation mode: Inspects request and response content but takes no interception actions.
Note: The Custom Agent (CustomLabel) protection dimension only supports the High and Observation mode protection levels. The Low and Medium protection levels are not available.
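As a rough illustration of how the protection levels map to blocking decisions, the following Python sketch encodes the list above as thresholds. The risk labels `low`, `medium`, and `high` and both function names are assumptions made for this example; the source only states which risk classes each level blocks.

```python
# Assumed risk labels, ordered by severity.
RISK_RANK = {"low": 1, "medium": 2, "high": 3}

# Lowest risk rank that each protection level blocks; None means never block.
BLOCK_FROM = {
    "Low": 3,                  # blocks only high-risk content
    "Medium": 2,               # blocks medium- and high-risk content
    "High": 1,                 # blocks everything identified as risky
    "Observation mode": None,  # inspects and logs, never intercepts
}


def should_block(protection_level: str, risk: str) -> bool:
    """Return True if content with the given risk is blocked at this level."""
    threshold = BLOCK_FROM[protection_level]
    if threshold is None:
        return False
    return RISK_RANK[risk] >= threshold


def check_level_allowed(dimension: str, protection_level: str) -> None:
    """Reject Low and Medium for the Custom agent dimension, per the note above."""
    if dimension == "Custom agent" and protection_level in ("Low", "Medium"):
        raise ValueError("Custom agent supports only High and Observation mode")
```

With this mapping, `should_block("Medium", "medium")` is true and `should_block("Observation mode", "high")` is false.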
Inspection policy
Inspects user requests and model responses, saving results to logs for post-event auditing and optimization.
The inspection policy allows you to configure Check Requests and Check Responses settings separately for Text and Picture.
Inspect request: Checks whether the content of a user's request to the Model API is non-compliant. Supports consumer matching rules and the following protection policies (Service):
Note: Protection policies (Service) reference configurations from AI Security Guardrail. To customize them, use the AI Security Guardrail console. For details, see Configure AI Security Guardrail detection items.
-
Text only
-
query_security_check: Detects content compliance, sensitive content, and prompt attack risks in model inputs.
-
response_security_check: Detects risks in model output, including content compliance issues, sensitive content, and abnormal outputs from prompt attacks.
-
response_security_check_hp: Designed for detecting content in streaming responses, offering lower detection latency and faster response times. It detects risks in model-generated content, including content compliance issues, sensitive content, and abnormal outputs caused by prompt attacks.
-
Image only
-
img_query_security_check: Detects content compliance risks in input images.
-
img_response_security_check: Detects content compliance risks in output images.
-
Inspect response: Inspects model responses for compliance. Enabling this converts streaming responses to non-streaming. Supports consumer matching rules and the following protection policies (Service):
-
Text only
-
query_security_check: Detects content compliance, sensitive content, and prompt attack risks in model inputs.
-
response_security_check: Detects risks in model output, including content compliance issues, sensitive content, and abnormal outputs from prompt attacks.
-
response_security_check_hp: Designed for detecting content in streaming responses, offering lower detection latency and faster response times. It detects risks in model-generated content, including content compliance issues, sensitive content, and abnormal outputs caused by prompt attacks.
-
Image only
-
img_query_security_check: Detects content compliance risks in input images.
-
img_response_security_check: Detects content compliance risks in output images.
-
AI Security Guardrail Effectiveness Rules
-
Enabling AI Security Guardrail without configuring any policies leaves protection inactive.
-
Configuring an interception policy without enabling an inspection policy leaves protection inactive.
-
Enabling an inspection policy without an interception policy triggers the default configuration: Low protection level for all dimensions and consumers.
-
Enabling both an inspection policy and an interception policy causes the system to inspect requests and responses per your configuration, log results, and block content matching the interception policy.
Note: An interception policy takes effect only if its protection dimension is also included in the inspection policy.
Example: If the inspection policy covers content compliance and sensitive content, but the interception policy targets prompt attack detection, the interception policy does not take effect.
-
Without advanced settings in the inspection policy, the following defaults apply:
-
Text only
-
Inspect request: Consumer is set to Any consumer, and Service is set to query_security_check.
-
Inspect response: Consumer is set to Any consumer, and Service is set to response_security_check.
-
Image only
-
Inspect request: Consumer is set to Any consumer, and Service is set to img_query_security_check.
-
Inspect response: Consumer is set to Any consumer, and Service is set to img_response_security_check.
-
When rules conflict, precedence is resolved as follows:
-
Specific consumer rules take priority over "Any consumer" rules. Example: Rule 1 sets Consumer to Any consumer, Protection dimension to Any protection dimension, and Protection level to Low. Rule 2 sets Consumer to Prefix match with value AI, Protection dimension to Any protection dimension, and Protection level to High. Rule 2 takes priority for consumers with the "AI" prefix.
-
When a consumer matches multiple rules, the first rule in the list takes precedence. Example: Rule 1: Any consumer, Any protection dimension, protection level Low. Rule 2: Consumer Regex match with value AI, Any protection dimension, protection level Medium. Rule 3: Consumer Exact match with value AI-TEST, protection dimension Model hallucination, protection level High. Consumer AI-TEST matches both Rule 2 and Rule 3. Rule 2 is listed higher, so it is applied. Its "Any protection dimension" includes model hallucination, so Rule 3 is overridden.
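The effectiveness and precedence rules above can be sketched in Python as follows. This is one reading of the text, not the gateway's actual logic: the `Rule` class, `protection_state`, `select_rule`, and the state labels are hypothetical names, Regex match is assumed to be an unanchored search, and what happens when rules exist but none match is not specified here, so the sketch returns `None` in that case.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    match_type: str  # "any", "exact", "regex", or "prefix"
    value: str       # consumer name, pattern, or prefix (unused for "any")
    dimension: str   # a protection dimension name, or "any"
    level: str       # "Low", "Medium", "High", or "Observation mode"


def protection_state(inspection_enabled: bool, has_interception_rules: bool) -> str:
    """Summarize which configuration is in effect."""
    if not inspection_enabled:
        return "inactive"  # no inspection policy: nothing is protected
    return "custom" if has_interception_rules else "default-low"


def _matches(rule: Rule, consumer: str) -> bool:
    if rule.match_type == "any":
        return True
    if rule.match_type == "exact":
        return consumer == rule.value
    if rule.match_type == "prefix":
        return consumer.startswith(rule.value)
    # Assumption: regex rules use an unanchored search.
    return re.search(rule.value, consumer) is not None


def select_rule(rules: List[Rule], consumer: str, dimension: str) -> Optional[Rule]:
    """Pick the rule applied to this consumer and protection dimension."""
    candidates = [
        r for r in rules
        if r.dimension in ("any", dimension) and _matches(r, consumer)
    ]
    # Specific consumer rules beat "Any consumer" rules ...
    specific = [r for r in candidates if r.match_type != "any"]
    pool = specific or candidates
    # ... and within the remaining rules, the first one listed wins.
    return pool[0] if pool else None


# The three-rule example from the text above.
rules = [
    Rule("any", "", "any", "Low"),
    Rule("regex", "AI", "any", "Medium"),
    Rule("exact", "AI-TEST", "Model hallucination", "High"),
]
applied = select_rule(rules, "AI-TEST", "Model hallucination")
print(applied.level)  # Medium: Rule 2 is listed before Rule 3
```

A consumer that matches no specific rule, such as a hypothetical `guest`, falls back to Rule 1 and gets the Low level.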
-
Scenario-based configuration
Configuration examples for common scenarios:
Content Moderation
Note: Before enabling Content Moderation, authorize the service-linked role.
Parameter
Description
Access endpoint
Auto-populated with the nearest Content Moderation VPC address. For details, see SDK and Access Guide.
Check Requests
Enables or disables request inspection.
Check Responses
Enables or disables response inspection. Checks model response content for compliance. When enabled, converts streaming responses to non-streaming.
Protection level
-
Low: Blocks only high-risk requests.
-
Medium: Blocks medium- and high-risk requests.
-
High: Blocks all requests identified as risky.
-
Observation mode: Inspects request and response content but takes no interception actions.
-
Verify your settings and click Save.