AI security protection

AI security protection scans gateway requests and responses in real time to detect harmful content in user inputs and AI-generated output. Configure protection levels to control interception strictness.

Use cases

Recommended scenarios:

Social platform content moderation: Filter user-generated content containing inappropriate, illegal, or sensitive information.
Enterprise customer service chat filtering: Detect inappropriate language in user inputs or bot replies in real time. Block responses containing phishing links, fraudulent websites, or malicious URLs.
Generative AI application protection: Prevent models from generating false, discriminatory, or harmful content in AIGC applications such as AI writing and image generation.
High-accuracy professional advisory scenarios: Identify model hallucinations such as fabricated facts, incorrect data, or unsupported conclusions to prevent misleading users.
Government and financial information exchange: Prevent sensitive information leakage and inaccurate policy interpretations in highly regulated scenarios.

Procedure

Log on to the AI Gateway console and choose Instance. In the top menu bar, select a region, then click the target instance ID.
In the navigation pane on the left, choose Model API, then click the target API name to go to the API Details page.
Click Policies and Plug-ins, enable AI security, and configure its parameters.

Note
AI security protection currently supports text and image generation scenarios only.

If AI Security Guardrail is not yet activated, activate it first.

Select a protection service and complete the configuration. AI Security Guardrail is recommended. It supports separate interception policies per protection dimension and custom inspection policies for text and images.

AI Security Guardrail

Configure an inspection policy and an interception policy per consumer. Consumer matching rules:

Any consumer: Applies to all consumers.
Exact match: Applies to a specified consumer.
Regex match: Applies to consumers that match a regular expression.
Prefix match: Applies to consumers whose names start with a specific prefix.
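
The four matching rules above can be sketched as a small predicate. This is an illustration only, not the gateway's implementation: the `ConsumerRule` class and `consumer_matches` function are hypothetical names, and the regex case assumes the pattern may match anywhere in the consumer name.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsumerRule:
    """Hypothetical representation of one consumer matching rule."""
    mode: str        # "any", "exact", "regex", or "prefix"
    value: str = ""  # ignored when mode is "any"


def consumer_matches(rule: ConsumerRule, consumer: str) -> bool:
    """Return True if the consumer name satisfies the rule."""
    if rule.mode == "any":
        return True
    if rule.mode == "exact":
        return consumer == rule.value
    if rule.mode == "regex":
        # Assumption: the pattern may match anywhere in the name.
        return re.search(rule.value, consumer) is not None
    if rule.mode == "prefix":
        return consumer.startswith(rule.value)
    raise ValueError(f"unknown match mode: {rule.mode}")
```

For example, `consumer_matches(ConsumerRule("prefix", "AI"), "AIBusiness")` is true, while the same rule does not match a consumer named `Common`.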

Parameter	Description
Endpoint	Auto-populated with the nearest Content Moderation VPC address. For details, see SDK and Access Guide.
Interception policy	Blocks requests or responses that match inspection rules. If no interception policy is configured, the system applies a default Low protection level for all protection dimensions and consumers. Define separate interception policies per consumer and protection dimension.
  Protection dimension:
  - Any protection dimension: Applies to all protection dimensions.
  - Content compliance detection: Identifies illegal or non-compliant content from the model.
  - Prompt attack detection: Defends against malicious inputs that trick the model into bypassing safety measures or leaking information.
  - Sensitive content detection: Blocks sensitive or high-risk information in the model's output.
  - Model hallucination: Detects false or fabricated content.
  - Malicious URL detection: Blocks responses containing malicious links.
  - Custom agent: Protects user-defined agent services.
  Protection level:
  - Low: Blocks only high-risk requests.
  - Medium: Blocks medium- and high-risk requests.
  - High: Blocks all requests identified as risky.
  - Observation mode: Inspects request and response content but takes no interception actions.
  Note: The Custom agent (CustomLabel) protection dimension supports only the High and Observation mode protection levels. The Low and Medium protection levels are not available.
Inspection policy	Inspects user requests and model responses, saving results to logs for post-event auditing and optimization. Configure Check Requests and Check Responses separately for Text only and Image only.
  Note: Protection policies (Service) reference configurations from AI Security Guardrail. Customize them in the AI Security Guardrail console. Configure AI Security Guardrail detection items.
  Inspect request: Checks user requests to the model API for non-compliant content. Supports consumer matching rules and the following protection policies (Service):
  - Text only
    - query_security_check: Detects content compliance, sensitive content, and prompt attack risks in model inputs.
    - response_security_check: Detects risks in model output, including content compliance issues, sensitive content, and abnormal outputs from prompt attacks.
    - response_security_check_hp: Optimized for streaming responses with lower detection latency and faster response times. Detects content compliance issues, sensitive content, and abnormal outputs caused by prompt attacks in model-generated content.
  - Image only
    - img_query_security_check: Detects content compliance risks in input images.
    - img_response_security_check: Detects content compliance risks in output images.
  Inspect response: Inspects model responses for compliance. Enabling this converts streaming responses to non-streaming. Supports consumer matching rules and the same protection policies (Service) listed under Inspect request.
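
The protection levels in the table above amount to a threshold on the detected risk. The following sketch shows that logic under stated assumptions: the risk labels `none`, `low`, `medium`, and `high` and the function names are hypothetical and are not taken from the product.

```python
# Hypothetical ordering of detected risk, from no risk to highest risk.
RISK_ORDER = {"none": 0, "low": 1, "medium": 2, "high": 3}

# Lowest detected risk that each protection level blocks.
BLOCK_THRESHOLD = {"Low": "high", "Medium": "medium", "High": "low"}

ALL_LEVELS = frozenset({"Low", "Medium", "High", "Observation mode"})


def should_block(protection_level: str, detected_risk: str) -> bool:
    """Decide whether content with the detected risk is intercepted."""
    if protection_level == "Observation mode":
        # Content is inspected and logged, but never intercepted.
        return False
    threshold = BLOCK_THRESHOLD[protection_level]
    return RISK_ORDER[detected_risk] >= RISK_ORDER[threshold]


def allowed_levels(dimension: str) -> frozenset:
    """Protection levels that can be selected for a dimension."""
    if dimension == "Custom agent":
        # Low and Medium are not available for this dimension.
        return frozenset({"High", "Observation mode"})
    return ALL_LEVELS
```

With this model, `should_block("Low", "medium")` is false and `should_block("Medium", "medium")` is true, which mirrors the Low and Medium descriptions.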

AI Security Guardrail Effectiveness Rules

Enabling AI Security Guardrail without configuring any policies leaves protection inactive.
Configuring an interception policy without enabling an inspection policy leaves protection inactive.
Enabling an inspection policy without an interception policy triggers the default configuration: Low protection level for all dimensions and consumers.
With both policies enabled, the system inspects requests and responses per your configuration, logs results, and blocks content matching the interception policy.
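
As a quick reference, the four rules above can be read as a decision on two flags. The function below is a hypothetical sketch of that reading, not product code, and the returned strings are descriptive labels only.

```python
def protection_state(inspection_enabled: bool, interception_configured: bool) -> str:
    """Summarize the effective behavior for a policy combination."""
    if not inspection_enabled:
        # Covers "no policies at all" and "interception policy only".
        return "inactive"
    if not interception_configured:
        # Inspection without interception falls back to the default.
        return "default: Low for all dimensions and consumers"
    return "inspect, log, and block per configured policies"
```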

Note
An interception policy takes effect only if its protection dimension is also included in the inspection policy.

Example: If the inspection policy covers content compliance and sensitive content, but the interception policy targets prompt attack detection, the interception policy does not take effect.
Without advanced settings in the inspection policy, the following defaults apply:
- Text only
  - Inspect request: Consumer is set to Any consumer, and Service is set to query_security_check.
  - Inspect response: Consumer is set to Any consumer, and Service is set to response_security_check.
- Image only
  - Inspect request: Consumer is set to Any consumer, and Service is set to img_query_security_check.
  - Inspect response: Consumer is set to Any consumer, and Service is set to img_response_security_check.
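
The defaults above can be written as a lookup table. This sketch uses hypothetical keys (`"text"`, `"image"`, `"request"`, `"response"`) and a hypothetical `inspection_service` helper. Only the consumer and service values come from the list above.

```python
# (modality, direction) -> (consumer, service), as listed above.
DEFAULT_INSPECTION = {
    ("text", "request"): ("Any consumer", "query_security_check"),
    ("text", "response"): ("Any consumer", "response_security_check"),
    ("image", "request"): ("Any consumer", "img_query_security_check"),
    ("image", "response"): ("Any consumer", "img_response_security_check"),
}


def inspection_service(modality, direction, advanced=None):
    """Return (consumer, service), preferring an advanced setting if given."""
    key = (modality, direction)
    if advanced and key in advanced:
        return advanced[key]
    return DEFAULT_INSPECTION[key]
```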
When rules conflict, precedence is resolved as follows:
- Specific consumer rules take priority over "Any consumer" rules. Example: Rule 1 sets Consumer to Any consumer, Protection dimension to Any protection dimension, and Protection level to Low. Rule 2 sets Consumer to Prefix match AI, Protection dimension to Any protection dimension, and Protection level to High. Rule 2 takes priority for consumers with the "AI" prefix.
- When a consumer matches multiple specific rules, the first matching rule in the list takes precedence. Example: Rule 1: Any consumer, Any protection dimension, protection level Low. Rule 2: Consumer Regex match with value AI, Any protection dimension, protection level Medium. Rule 3: Consumer Exact match with value AI-TEST, protection dimension Model hallucination, protection level High. Consumer AI-TEST matches Rule 1, but the more specific Rule 2 and Rule 3 take priority over it. Of those two, Rule 2 is listed first, so it is applied. Because its "Any protection dimension" includes model hallucination, Rule 3 is overridden.
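
The two precedence rules can be sketched as follows. This is a minimal illustration under assumptions, not the gateway's implementation: the `InterceptionRule` class and `resolve_level` function are hypothetical, the regex is assumed to match anywhere in the consumer name, and returning `None` when no rule matches is an assumption meaning "no interception".

```python
import re
from dataclasses import dataclass
from typing import List, Optional

ANY_DIMENSION = "Any protection dimension"


@dataclass(frozen=True)
class InterceptionRule:
    """Hypothetical representation of one interception policy row."""
    match_mode: str   # "any", "exact", "regex", or "prefix"
    match_value: str  # ignored when match_mode is "any"
    dimension: str
    level: str


def _consumer_matches(rule: InterceptionRule, consumer: str) -> bool:
    if rule.match_mode == "any":
        return True
    if rule.match_mode == "exact":
        return consumer == rule.match_value
    if rule.match_mode == "prefix":
        return consumer.startswith(rule.match_value)
    if rule.match_mode == "regex":
        # Assumption: the pattern may match anywhere in the name.
        return re.search(rule.match_value, consumer) is not None
    raise ValueError(f"unknown match mode: {rule.match_mode}")


def resolve_level(rules: List[InterceptionRule], consumer: str,
                  dimension: str) -> Optional[str]:
    """Pick the protection level for a consumer and dimension."""
    candidates = [
        r for r in rules
        if _consumer_matches(r, consumer)
        and r.dimension in (ANY_DIMENSION, dimension)
    ]
    # Specific consumer rules take priority over "Any consumer" rules.
    specific = [r for r in candidates if r.match_mode != "any"]
    chosen = specific or candidates
    # Among the remaining rules, the first one in list order wins.
    return chosen[0].level if chosen else None


# The three rules from the second example above.
EXAMPLE_RULES = [
    InterceptionRule("any", "", ANY_DIMENSION, "Low"),
    InterceptionRule("regex", "AI", ANY_DIMENSION, "Medium"),
    InterceptionRule("exact", "AI-TEST", "Model hallucination", "High"),
]
```

With `EXAMPLE_RULES`, `resolve_level(EXAMPLE_RULES, "AI-TEST", "Model hallucination")` returns `"Medium"`, matching the example in which Rule 2 overrides Rule 3.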

Scenario-based configuration

Configuration examples for common scenarios:

Apply security protection to all requests but intercept only high-risk ones

Inspection policy: Enable Check Requests for both Text only and Image only.
Interception policy: Add an interception policy. Set Consumer to Any consumer, Protection dimension to Any protection dimension, and Protection level to Low.

Intercept only compliance or sensitive content risks

Inspection policy: Enable Check Requests for both Text only and Image only.
Interception policy: Add two interception policies. Select the consumer as needed. For the protection dimension, create one policy for Content compliance detection and another for Sensitive content detection. Set the protection level as required.

Apply security protection to all requests, intercept risks for a key business (consumer name AIBusiness), and inspect without intercepting requests for other businesses

Inspection policy: Enable Check Requests for both Text only and Image only.
Interception policy: Add an interception policy. Set Consumer to Exact match and select AIBusiness. Set the protection dimension and protection level as required.

Inspect both text and images for compliance risks in OpenAI-compatible scenarios such as completions requests

Inspection policy: Enable Check Requests and Check Responses for both Text only and Image only.
Use the default interception policy.

Detect content compliance, sensitive content, and prompt attack risks in text requests for key businesses (consumer names AIBusiness and AIFinancial), but only content compliance risks for general businesses (consumer name Common)

In the AI Security Guardrail console, duplicate query_security_check to create query_security_check_major and query_security_check_common. For query_security_check_major, enable content compliance, sensitive content, and prompt attack dimensions. For query_security_check_common, enable only content compliance.

Inspection policy: Enable Check Requests for Text only.
In the inspection policy, open Advanced Configuration and add two configurations: (1) Consumer prefix match AI, Service set to query_security_check_major. (2) Consumer exact match Common, Service set to query_security_check_common.
Use the default interception policy.
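
The two advanced configurations in this scenario map consumers to services as sketched below. The `select_service` helper is hypothetical, and returning `None` for a consumer that matches neither entry is an assumption. The source does not state what happens in that case.

```python
# (match mode, value, service), in the order configured above.
ADVANCED_CONFIG = [
    ("prefix", "AI", "query_security_check_major"),
    ("exact", "Common", "query_security_check_common"),
]


def select_service(consumer: str):
    """Return the inspection service configured for a consumer, if any."""
    for mode, value, service in ADVANCED_CONFIG:
        if mode == "prefix" and consumer.startswith(value):
            return service
        if mode == "exact" and consumer == value:
            return service
    # Assumption: no advanced configuration applies to this consumer.
    return None
```

Both `AIBusiness` and `AIFinancial` start with `AI`, so a single prefix rule covers the two key businesses.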

Content Moderation

Note

Before enabling Content Moderation, authorize the service-linked role.

Parameter	Description
Access endpoint	Auto-populated with the nearest Content Moderation VPC address. For details, see SDK and Access Guide.
Check Requests	Enables or disables request inspection.
Check Responses	Enables or disables response inspection. Checks model responses for compliance. When enabled, streaming responses are converted to non-streaming.
Protection level	Low: Blocks only high-risk requests. Medium: Blocks medium- and high-risk requests. High: Blocks all requests identified as risky. Observation mode: Inspects request and response content but takes no interception actions.
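
To illustrate what converting a streaming response to non-streaming means for a client, the sketch below assembles all chunks before running a check and returning a single result. It is a conceptual illustration only: the function, the `is_compliant` callback, and the `[blocked]` placeholder are hypothetical and do not describe how the gateway is implemented.

```python
from typing import Callable, Iterable


def inspect_streamed_response(chunks: Iterable[str],
                              is_compliant: Callable[[str], bool],
                              blocked_message: str = "[blocked]") -> str:
    """Buffer the whole response, check it once, then return it in one piece."""
    # The client receives nothing until every chunk has arrived.
    full_text = "".join(chunks)
    return full_text if is_compliant(full_text) else blocked_message
```

In this model the caller waits for the complete response, so time to first token increases when Check Responses is enabled.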

Verify your settings and click Save.