Guardrails inspects large language model (LLM) inputs and outputs to detect risks related to content compliance, sensitive content, and prompt injection attacks. The console also provides online testing, risk reports, and result queries.
Guardrails feature set
Detection item configuration
- What it does: Lets you configure detection items for different scenarios and toggle fine-grained labels.
Feature: Content compliance detection
Description: Detects policy-violating content in LLM inputs and outputs. This includes baseline risks such as politically sensitive content, pornography, and violence, as well as abuse, bias, and other content that conveys harmful values.
Service names:
- AI input content security check (query_security_check)
- AI generated content security check (response_security_check)
- AI input content security check for international use (query_security_check_cb)
- AI generated content security check for international use (response_security_check_cb)
- AIGC input image security check (img_query_security_check)
- AIGC output image security check (img_response_security_check)
- Multimodal (text and image) content security check (text_img_security_check)
- Real-time file detection (file_security_sync_check)
- Multimodal (text and file) real-time detection (text_file_sec_sync_check)

Feature: Sensitive content detection
Description: Automatically identifies, classifies, and grades personal and enterprise sensitive information in LLM-generated content.
Service names:
- AI input content security check (query_security_check)
- AI generated content security check (response_security_check)
- AI input content security check for international use (query_security_check_cb)
- AI generated content security check for international use (response_security_check_cb)

Feature: Prompt injection attack detection
Description: Detects attempts to bypass safety policies and make the model generate violating content. Methods include prompt manipulation (for example, adversarial prompts) and technical evasion (for example, code obfuscation or multi-turn conversation masking).
Service names:
- AI input content security check (query_security_check)
- AI generated content security check (response_security_check)
- AI input content security check for international use (query_security_check_cb)
- AI generated content security check for international use (response_security_check_cb)

Feature: Malicious file detection
Description: Detects malicious files submitted to LLMs so that they cannot cause the model to generate harmful content or threaten system security.
Service names:
- Real-time file detection (file_security_sync_check)

Feature: Digital watermark identification
Description: Applies a digital watermark to LLM-generated content to help prevent copyright disputes and improve accountability for the spread of misinformation.
Service names:
- AIGC output image security check (img_response_security_check)
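As a quick reference, the feature table above can be expressed as a lookup from detection feature to the service names listed for it. This is only an illustrative sketch: the service name strings come from the table, while the dictionary layout and the helpers `services_for` and `features_for` are hypothetical names, not part of any Guardrails SDK.

```python
# Illustrative lookup built from the feature table; not an official SDK structure.
TEXT_SERVICES = [
    "query_security_check",
    "response_security_check",
    "query_security_check_cb",
    "response_security_check_cb",
]

FEATURE_SERVICES = {
    "content compliance detection": TEXT_SERVICES + [
        "img_query_security_check",
        "img_response_security_check",
        "text_img_security_check",
        "file_security_sync_check",
        "text_file_sec_sync_check",
    ],
    "sensitive content detection": list(TEXT_SERVICES),
    "prompt injection attack detection": list(TEXT_SERVICES),
    "malicious file detection": ["file_security_sync_check"],
    "digital watermark identification": ["img_response_security_check"],
}


def services_for(feature: str) -> list[str]:
    """Return the service names listed for a feature (hypothetical helper)."""
    return FEATURE_SERVICES.get(feature.strip().lower(), [])


def features_for(service: str) -> list[str]:
    """Return every feature that the table lists for a given service name."""
    return [name for name, services in FEATURE_SERVICES.items() if service in services]
```

A lookup in this direction makes it easy to see, for example, that `file_security_sync_check` is listed for both content compliance detection and malicious file detection.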
Word library management and matching
- What it does: Lets you customize moderation rules for content compliance detection by creating lists of risky keywords or keywords to exclude, and then configuring matching rules.
- For more information, see Operation guide.
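The idea of combining a list of risky keywords with a list of keywords to exclude can be sketched as follows. This is a minimal illustration under stated assumptions, not the matching logic Guardrails actually uses: `WordLibrary` and `match_keywords` are hypothetical names, matching is assumed to be case-insensitive substring matching, and excluded phrases are assumed to be masked before risky keywords are checked.

```python
from dataclasses import dataclass, field


@dataclass
class WordLibrary:
    """Hypothetical word library: risky keywords plus keywords to exclude."""
    risky: set[str] = field(default_factory=set)
    excluded: set[str] = field(default_factory=set)


def match_keywords(text: str, library: WordLibrary) -> list[str]:
    """Return the risky keywords found in text, ignoring excluded phrases.

    Assumption: an excluded phrase is masked out of the text first, so a
    risky keyword that only appears inside it does not produce a hit.
    """
    normalized = text.lower()
    # Mask longer excluded phrases first so overlapping phrases are handled.
    for phrase in sorted(library.excluded, key=len, reverse=True):
        if phrase:
            normalized = normalized.replace(phrase.lower(), " ")
    return sorted(k for k in library.risky if k.lower() in normalized)


library = WordLibrary(risky={"gamble", "casino"}, excluded={"casino royale"})
print(match_keywords("Is Casino Royale about a gamble?", library))  # ['gamble']
print(match_keywords("Best casino to gamble", library))  # ['casino', 'gamble']
```

In the first call, "casino" appears only inside the excluded phrase, so only "gamble" is reported.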
Answer library management
- What it does: Lets you replace content blocked during content compliance detection with predefined answers from a library.
- For more information, see Operation guide.
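Replacing blocked content with a predefined answer could look like the sketch below. Everything here is an assumption for illustration: the `blocked` flag, the `risk_label` values, the library contents, and the function `apply_answer_library` are hypothetical and do not reflect the actual Guardrails response format.

```python
# Hypothetical answer library: maps a risk label to a predefined answer.
ANSWER_LIBRARY = {
    "violence": "I cannot help with that request.",
    "default": "Sorry, I cannot respond to this topic.",
}


def apply_answer_library(response_text, blocked, risk_label=None, library=ANSWER_LIBRARY):
    """Return the original text, or a predefined answer if it was blocked.

    Assumption: the moderation step yields a boolean `blocked` and an
    optional `risk_label`; unknown labels fall back to the default answer.
    """
    if not blocked:
        return response_text
    return library.get(risk_label, library["default"])
```

Content that passes moderation is returned unchanged, and a blocked response with an unrecognized label falls back to the default answer rather than exposing the original text.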
Online testing
- What it does: Lets you test the content compliance detection, sensitive content detection, and prompt injection attack detection features online. This helps you quickly verify the effectiveness of your moderation policies.
- For more information, see Quick guide to using the online testing feature.
Result query
- What it does: Lets you view moderation results and returned parameters for scanned content, which helps you analyze high-frequency risk types.
- For more information, see Operation guide.
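Analyzing high-frequency risk types from queried results amounts to counting how often each risk label appears. The sketch below assumes the results have been exported as a list of dictionaries with a `label` field. That record shape and the function `top_risk_types` are hypothetical, not the console's actual export format.

```python
from collections import Counter


def top_risk_types(results, n=3):
    """Return the n most frequent risk labels as (label, count) pairs.

    Assumption: each record is a dict whose "label" key holds the risk
    type, and records without a label (no risk found) are skipped.
    """
    counts = Counter(r["label"] for r in results if r.get("label"))
    return counts.most_common(n)


sample = [
    {"label": "violence"},
    {"label": "abuse"},
    {"label": "violence"},
    {"label": None},
]
print(top_risk_types(sample))  # [('violence', 2), ('abuse', 1)]
```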
Risk reports
- What it does: Lets you view risk reports to understand call trends and risk distribution for Guardrails, Sensitive Data Detection, prompt injection attack detection, and custom detection agents.
- For more information, see Operation guide.