AI Guardrails and Content Moderation product overview

Product offerings

AI Guardrails provides two offerings for distinct content safety scenarios:

Guardrails protects AI applications against adversarial attacks and unsafe model outputs through configurable risk detection and multiple integration methods.
- Threat Detection
  
  Detects regulatory violations (pornography, political sensitivity, violence, gore), PII and credentials, jailbreak attempts, injection attacks, file and link risks, and prompt extraction attempts. Verifies LLM outputs against ground truth sources. Supports invisible digital watermarks in AI-generated images and text for provenance tracking.
- Customization
  
  Configure detection thresholds, risk levels, and filter words for industry compliance (finance, education). Train models on proprietary datasets for vertical use cases.
- Integration Methods
  
  Supports RESTful API for custom applications, AI Gateway for centralized policy enforcement, WAF for edge-level prompt blocking, Model Studio for one-click Tongyi integration, and third-party platforms (Dify agents, OpenClaw plug-ins).
Content Moderation provides multimodal content safety for platforms hosting user-generated content across social media, gaming, e-commerce, and media scenarios. Core features include the moderation API, OSS violation detection, and the console.
- API-Based Moderation
  
  Detects spam, hate speech, violence, and ad violations in text. Scans images for adult content, violence, gore, and political sensitivity. Performs frame-by-frame video analysis with audio detection. Identifies inappropriate speech, violence, and political content in audio. Scans PDF, Word, Excel, and PPT for policy violations. Suitable for platforms with public content: video, live streaming, social media, e-commerce, forums, and CDN.
- Integrated Moderation (OSS violation detection)
  
  Auto-scans OSS buckets for pornography, political sensitivity, violence, and terrorism in images, videos, and audio. Supports configurable auto-delete or freeze actions. Enable from the console without API integration.
- Management Console
  
  View moderation trends, category distributions, and latency metrics. Define custom violation rules and blocklists. Use pre-configured detection profiles for gaming, education, and social scenarios.

Use cases

Guardrails targets AI-specific threats in generative AI applications. Content Moderation handles UGC patterns in user-facing platforms.

Guardrails (AI Application Protection) - Common use cases include:
- AI chatbots — Prevent prompt injection that manipulates chatbots into disclosing policies or generating harmful advice.
- AI content creation — Filter inappropriate AI-generated text, images, and videos before publication.
- Code assistants — Scan AI-generated code for vulnerabilities, credential leaks, and malicious patterns.
- Enterprise AI assistants — Detect manipulation attempts and sensitive data extraction across multi-turn conversations.
- Model serving — Add safety guardrails to third-party models (OpenAI, Anthropic, open-source) without modifying weights.
Content Moderation (UGC Safety) - Common use cases include:
- Social networks — Real-time detection of pornographic, violent, and politically sensitive content in public feeds.
- User profiles — Scan usernames, avatars, and bios at registration to block offensive or impersonating accounts.
- Gaming — Monitor chat channels for toxicity, cheating ads, and bot spam. Detect harassment and underage grooming in direct messages.
- Video platforms — Automated age-appropriate content rating. Real-time frame analysis and audio transcription for live broadcasts.
- E-commerce — Detect counterfeit goods, prohibited items (weapons, drugs), and misleading descriptions. Validate product photos against quality guidelines, detect NSFW images in fashion listings. Monitor buyer-seller communications for fraud.

Pricing

AI Guardrails offers three billing methods: pay-as-you-go for variable workloads, resource plans for discounted pre-purchased capacity, and QPS expansion packages for higher rate limits.

For Guardrails pricing, see Activation and billing overview.
For Content Moderation pricing, see Activation and billing.

Billing method	Guardrails	Content Moderation
Pay-as-you-go	Metered billing based on API calls, suitable for variable workloads. Activate here.	Metered billing by content type and business scenario, ideal for variable workloads. Activate here.
Resource plan	Coming soon.	Pre-purchase discounted capacity, ideal for predictable workloads. Purchase a Content Moderation 2.0 resource plan.
QPS package	If the default QPS is insufficient or you need to scan large volumes in a short period, purchase a Content Moderation QPS package. Important: QPS packages require configuration verification (UID, features, region) and cannot be purchased directly. Contact your business manager or submit a ticket to purchase and configure.

Available QPS packages:
- 2.0 QPS expansion package: Increases QPS for Content Moderation 2.0 API (standard editions) and OSS violation detection (general-purpose edition). After expansion, QPS limit = default + purchased QPS. Pay-as-you-go billing still applies.
- 2.0 QPS expansion package (large model edition): Increases QPS for Content Moderation 2.0 API (large model editions) and OSS violation detection (general-purpose edition). After expansion, QPS limit = default + purchased QPS. Pay-as-you-go billing still applies.
- 2.0 committed QPS package: Provides committed QPS for Content Moderation 2.0 API (standard editions). QPS limit = purchased QPS. Usage within this limit incurs no additional pay-as-you-go charges.
- AI Guardrails expansion package: Increases QPS for AI Guardrails. After expansion, QPS limit = default + purchased QPS. Pay-as-you-go billing still applies.
- 1.0 QPS expansion package: Increases QPS for Content Moderation 1.0 API and OSS violation detection 1.0. After expansion, QPS limit = default + purchased QPS. Pay-as-you-go billing still applies.
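
The package descriptions above use two different rules: an expansion package adds to the default limit, while the committed package replaces it. A minimal Python sketch of that arithmetic follows. The function name, the `kind` labels, and the sample numbers are illustrative assumptions and are not part of any Alibaba Cloud API.

```python
def effective_qps_limit(default_qps: int, purchased_qps: int, kind: str) -> int:
    """Return the QPS limit after a package is applied.

    `kind` is a hypothetical label used only in this sketch:
      "expansion" -> limit = default + purchased (pay-as-you-go still applies)
      "committed" -> limit = purchased (usage within the limit is not billed again)
    """
    if default_qps < 0 or purchased_qps < 0:
        raise ValueError("QPS values must be non-negative")
    if kind == "expansion":
        return default_qps + purchased_qps
    if kind == "committed":
        return purchased_qps
    raise ValueError(f"unknown package kind: {kind!r}")


# Sample numbers are illustrative only.
print(effective_qps_limit(100, 200, "expansion"))  # 300
print(effective_qps_limit(100, 200, "committed"))  # 200
```

With the same purchase, the committed package yields the lower ceiling, but usage within that ceiling carries no additional pay-as-you-go charge.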

Guardrails vs. Content Moderation

Choose the service that matches your primary use case.

When to Use Guardrails - Pure AI scenarios where content originates from or is processed by generative AI models:
- Text-to-text generation (chatbots, writing assistants), text-to-image generation, code generation, AI agent actions (tool use, function calling), and training data sanitization.
- Key differentiators: AI-specific threat detection (prompt injection, hallucinations, prompt crawlers), context-aware multi-turn analysis, and sub-second latency.
When to Use Content Moderation - UGC scenarios where content is uploaded or created by end users:
- Social media posts (text, images, videos), e-commerce product listings, gaming chat and user-created content, document sharing platforms, and live streaming.
- Key differentiators: UGC-optimized detection (spam, fake reviews, impersonation), pre-configured scenarios (avatars, nicknames, public chat), and OSS integration for storage compliance.
For hybrid scenarios, choose the service based on the primary use case:
- Social app with AI-generated image uploads — primary activity is user-published content, so use Content Moderation.
- AI companion chat (user inputs, AI responds) — this is an AI application, so use Guardrails.

Content Moderation 2.0 vs. 1.0

Content Moderation 2.0 adds pre-configured business scenarios and improves performance, label richness, configuration flexibility, and pricing over version 1.0.

Comparison of capabilities

Item	Content Moderation 2.0	Content Moderation 1.0
Billing method and pricing	Image: Fees = Number of images × Number of business scenarios × Unit price per business scenario. Pay-as-you-go starts at CNY 15 per 10,000 images, approximately 45% of the price of version 1.0. Text: Fees = Number of text entries × Number of business scenarios × Unit price per business scenario. Pay-as-you-go starts at CNY 7.5 per 10,000 entries, approximately 41% of the price of version 1.0. Audio: Fees = Audio duration in minutes × Number of business scenarios × Unit price per business scenario. Pay-as-you-go starts at CNY 225 per 10,000 minutes, approximately 22% of the price of version 1.0. Video: Fees = (Number of captured frames × Number of business scenarios × Unit price per business scenario) + (Video duration in minutes × Number of audio scenarios × Unit price per audio scenario). Pay-as-you-go starts at CNY 15 per 10,000 frames for captured video frames, and CNY 202.5 per 10,000 minutes (approximately 34% of the price of version 1.0) for audio from video. Document: Fees are calculated based on the number of pages, where every 50 pages are counted as one unit. Pay-as-you-go costs CNY 0.225 per unit (50 pages), approximately 48% of the price of version 1.0. Note: Content Moderation 2.0 also supports purchasing pay-as-you-go resource plans. For details about the pricing of Content Moderation 2.0, see Content Moderation Pricing.	Image: Fees = Number of images × Number of risk scenarios × Unit price per scenario. Text: Fees = Number of text entries × Number of risk scenarios × Unit price per scenario. Audio: Fees = Audio duration in minutes × Number of risk scenarios × Unit price per scenario. Video: Fees = (Number of captured frames × Number of risk scenarios × Unit price per risk scenario) + (Video duration in minutes × Number of audio risk scenarios × Unit price per audio scenario). Document: Fees = File conversion fee + (Number of pages × Number of document image scenarios × Unit price per scenario) + (Number of text entries in the document × Unit price per scenario). Note: Content Moderation 1.0 also supports purchasing pay-as-you-go resource plans. For details about the pricing of Content Moderation 1.0, see Content Moderation Pricing.
Moderation configurations	Moderation scope (10+ major categories, 100+ subcategories); custom libraries	Moderation scope (5+ major categories, 50+ subcategories); custom libraries
Default capacity	Image: 50 QPS for large model edition, 100 QPS for standard edition. Text: 50 QPS for large model edition, 100 QPS for standard edition. Audio: 50 concurrent tasks. Video: 50 concurrent tasks. Document: 20 concurrent tasks. Note: If the default QPS is insufficient, purchase a QPS expansion package. Contact your business manager or submit a ticket.	Image: 50 QPS. Text: 100 QPS. Audio: 20 concurrent tasks. Video: 20 concurrent tasks. Document: 10 concurrent tasks.
Content to be moderated and business scenarios	Modalities: image, text, audio, video, document, and OSS files. Preset business scenarios: avatars, nicknames, public chat, private chat, ad compliance, AI content detection, educational materials, content governance, common baseline moderation, social entertainment and live streaming moderation, and audio-visual media moderation.	Modalities: image, text, video, audio, document, and OSS files. Preset business scenarios: common baseline moderation.
Moderation results	Interpretable labels (100+; multiple violation labels can be returned simultaneously); confidence score	Interpretable labels (40+; only one violation label can be returned at a time); handling suggestion
Pay-as-you-go	Content Moderation 2.0 pay-as-you-go bills by content type (images, text, voice) and detection volume. Detecting multiple risk scenarios for the same content costs 50% to 70% less than version 1.0.	Content Moderation 1.0 pay-as-you-go billing varies by content type (image, text, video, OSS content), moderation scenario (pornography, spam), daily scan volume tier, handling suggestion (review, block, pass), and service region.
Resource plan	Resource plan deduction formula: Deducted usage = Daily call volume × Deduction factor for the scenario. The formula is the same in both versions, but the deduction system for Content Moderation 2.0 differs from that of version 1.0. For deduction factors, see Pricing Details. Note: Content Moderation 2.0 resource plans apply only to version 2.0 services and are not interchangeable with version 1.0 plans.
Activation and billing	For details about how to activate and pay for Content Moderation 2.0, see Activation and billing.	For details about how to activate and pay for Content Moderation 1.0, see Activation and billing.
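
The Content Moderation 2.0 billing formulas in the comparison table can be worked through with a short Python sketch. It uses the "starts at" pay-as-you-go prices quoted in the table as placeholder unit prices; real unit prices depend on the business scenario. The function names are hypothetical, rounding a partial 50-page block up to a full unit is an assumption, and the deduction factor in the test values is invented for illustration.

```python
import math

# "Starts at" pay-as-you-go prices from the table, in CNY per 10,000 units.
# Actual unit prices vary by business scenario; treat these as placeholders.
IMAGE_PER_10K = 15.0
TEXT_PER_10K = 7.5
AUDIO_PER_10K_MIN = 225.0
VIDEO_FRAME_PER_10K = 15.0
VIDEO_AUDIO_PER_10K_MIN = 202.5
DOCUMENT_UNIT_PRICE = 0.225  # CNY per unit of 50 pages
PAGES_PER_UNIT = 50


def per_item_fee(quantity: float, scenarios: int, price_per_10k: float) -> float:
    """Fees = quantity x number of business scenarios x unit price."""
    return quantity * scenarios * price_per_10k / 10_000


def video_fee(frames: int, frame_scenarios: int,
              minutes: float, audio_scenarios: int) -> float:
    """Captured-frame fee plus the fee for the audio track of the video."""
    frame_part = per_item_fee(frames, frame_scenarios, VIDEO_FRAME_PER_10K)
    audio_part = per_item_fee(minutes, audio_scenarios, VIDEO_AUDIO_PER_10K_MIN)
    return frame_part + audio_part


def document_fee(pages: int) -> float:
    """Bill documents in units of 50 pages."""
    # Assumption: a partial block of 50 pages is billed as a full unit.
    units = math.ceil(pages / PAGES_PER_UNIT)
    return units * DOCUMENT_UNIT_PRICE


def resource_plan_deduction(daily_call_volume: float, deduction_factor: float) -> float:
    """Deducted usage = daily call volume x deduction factor for the scenario."""
    return daily_call_volume * deduction_factor


# 10,000 images checked against two business scenarios.
print(per_item_fee(10_000, 2, IMAGE_PER_10K))  # 30.0
# 10,000 captured frames and 10,000 minutes of audio, one scenario each.
print(video_fee(10_000, 1, 10_000, 1))         # 217.5
```

Because the scenario count is a multiplier in every formula, moderating the same content under two business scenarios doubles the fee for that content.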