What is AI Guardrails

Guardrails detects and mitigates security risks in AI systems, helping your applications deliver safe, compliant, and reliable responses to user prompts.

Product features

AI applications and AI Agents face security threats such as content compliance violations, data breaches, prompt injection attacks, hallucinations, and jailbreaks. These risks can disrupt operations and create significant compliance exposure.

Guardrails provides end-to-end protection for pre-trained large models, AI services, and AI Agents. It performs precise risk detection and proactive defense on both generative AI input and output.
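The idea of checking both the input and the output of a generative model can be sketched as a thin wrapper around the model call. This is only an illustration, not the Guardrails API: `Verdict`, `guarded_call`, the two toy checkers, and `echo_model` are all hypothetical names, and the keyword checks stand in for a real detection service.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    """Result of one risk check (hypothetical shape, not a Guardrails type)."""
    blocked: bool
    reason: str = ""


Checker = Callable[[str], Verdict]

REFUSAL = "Sorry, this request cannot be processed."


def guarded_call(prompt: str, model: Callable[[str], str],
                 check_input: Checker, check_output: Checker) -> str:
    """Run the input check, call the model, then run the output check."""
    if check_input(prompt).blocked:
        return REFUSAL          # risky prompt never reaches the model
    answer = model(prompt)
    if check_output(answer).blocked:
        return REFUSAL          # risky answer never reaches the user
    return answer


# Toy stand-ins for a real detection service (assumptions for illustration).
def toy_input_check(text: str) -> Verdict:
    hit = "ignore previous instructions" in text.lower()
    return Verdict(hit, "prompt_injection" if hit else "")


def toy_output_check(text: str) -> Verdict:
    hit = "password" in text.lower()
    return Verdict(hit, "sensitive_content" if hit else "")


def echo_model(prompt: str) -> str:
    return f"Echo: {prompt}"
```

In a real integration, the two checkers would call the detection service, and the application would decide per risk type whether to block, rewrite, or only log the content.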

Risk detection capabilities

Guardrails provides comprehensive risk detection, covering content compliance, sensitive content, prompt injection attacks, malicious files, and malicious URLs, along with digital watermarking for AI-generated images.
- Content compliance detection: Reviews text inputs and outputs across multiple compliance dimensions, covering politically sensitive content, pornography and vulgarity, bias and discrimination, and harmful values. Use cases: chatbots, AI in education, intelligent customer service, and AIGC creation platforms.
- Sensitive content detection: Identifies personal privacy data and confidential corporate data in AI interactions, preventing leakage of both training and conversational data. Use cases: AI in healthcare, AI-powered financial services, and enterprise knowledge base Q&A.
- Prompt injection attack detection: Identifies adversarial behaviors such as jailbreak commands, role-playing inducements, and system prompt tampering. Use cases: Securing command interactions for an AI Agent, defending against adversarial attacks in open-domain dialogue systems, and managing permissions for third-party plugin calls.
- Malicious file detection: Analyzes uploaded documents such as PDF, PPT, and DOC files for hidden malicious content, including executable scripts, macro viruses, and nested attack code. Use cases: AI applications that support document uploads, such as intelligent resume parsing, contract Q&A, and enterprise knowledge base construction.
- Malicious URL detection: Analyzes links in AI interactions in real time, identifying phishing websites, malicious redirects, and links with hidden attack payloads. Use cases: AI-powered search, web page summarization, RAG-based knowledge retrieval, and automated external operations.
- Digital watermarking: Embeds visible or invisible watermarks into AI-generated images, ensuring AIGC content is traceable and accountable. Use cases: AIGC creation platforms, news media, government communications, and educational content generation in compliance-sensitive scenarios.

Custom protection configuration

Guardrails lets you configure granular risk detection settings. You can log on to the Guardrails console to manage detection rules and create risk detection templates.
- Custom detection items: Configure the granular tags used for content compliance detection.
- Custom risk thresholds: Set the hit threshold for each granular tag. Thresholds are based on the model's confidence score (0 to 100) and can be adjusted in increments of 1.
- Custom filter words: Configure a list of sensitive words to detect and block, such as competitor names. You can add, delete, or modify words in the list.
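The three settings above can be modeled as a small data structure. The sketch below is illustrative only and is not the console's actual template format: `DetectionTemplate`, its field names, and the tag names are hypothetical. It also assumes that a tag is hit when its score is greater than or equal to the threshold, and that filter words match case-insensitively as substrings; the source does not specify either behavior.

```python
from dataclasses import dataclass, field


@dataclass
class DetectionTemplate:
    """Hypothetical model of a risk detection template."""
    # Tag name -> hit threshold: an integer confidence score from 0 to 100.
    thresholds: dict[str, int]
    # Custom filter words, for example competitor names.
    filter_words: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        for tag, value in self.thresholds.items():
            if not isinstance(value, int) or not 0 <= value <= 100:
                raise ValueError(f"threshold for {tag!r} must be an integer in 0..100")

    def hit_tags(self, scores: dict[str, int]) -> list[str]:
        """Return configured tags whose score reaches the threshold.

        Assumption: score >= threshold counts as a hit.
        """
        return sorted(
            tag for tag, score in scores.items()
            if tag in self.thresholds and score >= self.thresholds[tag]
        )

    def matched_words(self, text: str) -> list[str]:
        """Return filter words found in the text (case-insensitive substring match)."""
        lowered = text.lower()
        return sorted(w for w in self.filter_words if w.lower() in lowered)


template = DetectionTemplate(
    thresholds={"pornography": 80, "bias": 60},
    filter_words={"AcmeCorp"},
)
```

With this template, a lower threshold makes a tag more sensitive: `template.hit_tags({"pornography": 75, "bias": 60})` reports only `bias`, because 75 is below the threshold of 80 while 60 meets the threshold of 60.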

For more information, see the Features documentation.

Use cases

Use Guardrails for risk detection in these business scenarios:

- Processing user prompts submitted to a generative AI model.
- Analyzing multimodal content, including text, images, and videos, generated by a generative AI model.
- Scanning and detoxifying the training corpus for a generative AI model.
- Detecting risks in the inputs and outputs of an AI Agent.
