AI Guardrails provides two content safety offerings: Guardrails for AI application protection and Content Moderation for user-generated content (UGC) safety. Built on Tongyi foundation models, it detects threats across text, images, video, and audio while reducing manual review.
Product offerings
AI Guardrails provides two offerings for distinct content safety scenarios:
-
Guardrails protects AI applications against adversarial attacks and unsafe model outputs through configurable risk detection and multiple integration methods.
-
Threat Detection
Detects regulatory violations (pornography, political sensitivity, violence, gore), PII and credentials, jailbreak attempts, injection attacks, file and link risks, and prompt extraction attempts. Verifies LLM outputs against ground truth sources. Supports invisible digital watermarks in AI-generated images and text for provenance tracking.
-
Customization
Configure detection thresholds, risk levels, and filter words for industry compliance (finance, education). Train models on proprietary datasets for vertical use cases.
-
Integration Methods
Supports RESTful API for custom applications, AI Gateway for centralized policy enforcement, WAF for edge-level prompt blocking, Model Studio for one-click Tongyi integration, and third-party platforms (Dify agents, OpenClaw plug-ins).
-
-
Content Moderation provides multimodal content safety for platforms hosting user-generated content across social media, gaming, e-commerce, and media scenarios. Core features include the moderation API, OSS violation detection, and the console.
-
API-Based Moderation
Detects spam, hate speech, violence, and ad violations in text. Scans images for adult content, violence, gore, and political sensitivity. Performs frame-by-frame video analysis with audio detection. Identifies inappropriate speech, violence, and political content in audio. Scans PDF, Word, Excel, and PPT documents for policy violations. Suitable for platforms that host public content: video, live streaming, social media, e-commerce, forums, and CDN.
-
Integrated Moderation (OSS violation detection)
Auto-scans OSS buckets for pornography, political sensitivity, violence, and terrorism in images, videos, and audio. Supports configurable auto-delete or freeze actions. Enable from the console without API integration.
-
Management Console
View moderation trends, category distributions, and latency metrics. Define custom violation rules and blocklists. Use pre-configured detection profiles for gaming, education, and social scenarios.
-
Use cases
Guardrails targets AI-specific threats in generative AI applications. Content Moderation handles UGC patterns in user-facing platforms.
-
Guardrails (AI Application Protection) - Common use cases include:
-
AI chatbots — Prevent prompt injection that manipulates chatbots into disclosing policies or generating harmful advice.
-
AI content creation — Filter inappropriate AI-generated text, images, and videos before publication.
-
Code assistants — Scan AI-generated code for vulnerabilities, credential leaks, and malicious patterns.
-
Enterprise AI assistants — Detect manipulation attempts and sensitive data extraction across multi-turn conversations.
-
Model serving — Add safety guardrails to third-party models (OpenAI, Anthropic, open-source) without modifying weights.
-
-
Content Moderation (UGC Safety) - Common use cases include:
-
Social networks — Real-time detection of pornographic, violent, and politically sensitive content in public feeds.
-
User profiles — Scan usernames, avatars, and bios at registration to block offensive or impersonating accounts.
-
Gaming — Monitor chat channels for toxicity, cheating ads, and bot spam. Detect harassment and underage grooming in direct messages.
-
Video platforms — Automated age-appropriate content rating. Real-time frame analysis and audio transcription for live broadcasts.
-
E-commerce — Detect counterfeit goods, prohibited items (weapons, drugs), and misleading descriptions. Validate product photos against quality guidelines, detect NSFW images in fashion listings. Monitor buyer-seller communications for fraud.
-
Pricing
AI Guardrails offers three billing methods: pay-as-you-go for variable workloads, resource plans for discounted pre-purchased capacity, and QPS expansion packages for higher rate limits.
-
For information about Guardrails pricing, see Activation and billing overview.
-
For information about Content Moderation pricing, see Activation and billing.
|
Billing method |
Guardrails |
Content Moderation |
|
Pay-as-you-go |
Metered billing based on API calls, suitable for variable workloads. Activate here. |
Metered billing by content type and business scenario, ideal for variable workloads. Activate here. |
|
Resource plan |
Coming soon. |
Pre-purchase discounted capacity, ideal for predictable workloads. Purchase a Content Moderation 2.0 resource plan. |
|
QPS package |
If the default QPS is insufficient or you need to scan large volumes in a short period, purchase a Content Moderation QPS package. QPS packages require specific details (UID, features, region). Contact your business manager or submit a ticket to purchase and configure. Available packages:
Important
QPS packages require configuration verification and cannot be purchased directly. Contact your business manager or submit a ticket. |
|
Guardrails vs. Content Moderation
Choose the service that matches your primary use case.
-
When to Use Guardrails - Pure AI scenarios where content originates from or is processed by generative AI models:
-
Text-to-text generation (chatbots, writing assistants), text-to-image generation, code generation, AI agent actions (tool use, function calling), and training data sanitization.
-
Key differentiators: AI-specific threat detection (prompt injection, hallucinations, prompt crawlers), context-aware multi-turn analysis, and sub-second latency.
-
-
When to Use Content Moderation - UGC scenarios where content is uploaded or created by end users:
-
Social media posts (text, images, videos), e-commerce product listings, gaming chat and user-created content, document sharing platforms, and live streaming.
-
Key differentiators: UGC-optimized detection (spam, fake reviews, impersonation), pre-configured scenarios (avatars, nicknames, public chat), and OSS integration for storage compliance.
-
-
For hybrid scenarios, choose the service based on the primary use case:
-
AI companion chat (user inputs, AI responds) — this is an AI application, so use Guardrails.
-
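The selection rule in this section can be summarized as a small decision helper. The following is a minimal sketch only: the `ContentOrigin` enum and the `choose_service` function are hypothetical names introduced for illustration and are not part of any AI Guardrails SDK or API.

```python
from enum import Enum


class ContentOrigin(Enum):
    """Hypothetical labels for where the content primarily comes from."""
    AI_GENERATED = "ai_generated"      # produced or processed by a generative AI model
    USER_GENERATED = "user_generated"  # uploaded or created by end users


def choose_service(primary_origin: ContentOrigin) -> str:
    """Return the offering that matches the primary use case.

    Mirrors the guidance above: AI scenarios map to Guardrails,
    UGC scenarios map to Content Moderation.
    """
    if primary_origin is ContentOrigin.AI_GENERATED:
        return "Guardrails"
    return "Content Moderation"


# Hybrid example from the text: in AI companion chat the user types the input,
# but the primary use case is an AI application.
ai_companion_choice = choose_service(ContentOrigin.AI_GENERATED)
print(ai_companion_choice)
```

For a hybrid product, classify it by its primary use case first and pass that single label to the helper, rather than evaluating each message separately.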
Content Moderation 2.0 vs. 1.0
Content Moderation 2.0 adds pre-configured business scenarios and improves performance, label richness, configuration flexibility, and pricing over version 1.0.
Comparison of capabilities
|
Item |
Content Moderation 2.0 |
Content Moderation 1.0 |
|
Billing method and pricing |
Note
Content Moderation 2.0 also supports purchasing pay-as-you-go resource plans. For pricing details about Content Moderation 2.0, see Content Moderation Pricing. |
Note
Content Moderation 1.0 also supports purchasing pay-as-you-go resource plans. For pricing details about Content Moderation 1.0, see Content Moderation Pricing. |
|
Moderation configurations |
Moderation scope (10+ major categories, 100+ subcategories). Custom libraries. |
Moderation scope (5+ major categories, 50+ subcategories). Custom libraries. |
|
Default capacity |
Note
If the default QPS is insufficient, purchase a QPS expansion package. Contact your business manager or submit a ticket. |
|
|
Content to be moderated and business scenarios |
Modalities: image, text, audio, video, document, and OSS files. Preset business scenarios: avatars, nicknames, public chat, private chat, ad compliance, AI content detection, educational materials, content governance, common baseline moderation, social entertainment and live streaming moderation, and audio-visual media moderation. |
Modalities: image, text, video, audio, document, and OSS files. Preset business scenarios: common baseline moderation. |
|
Moderation results |
|
|
|
Pay-as-you-go |
Content Moderation 2.0 pay-as-you-go bills by content type (images, text, voice) and detection volume. Detecting multiple risk scenarios for the same content costs 50% to 70% less than version 1.0. |
Content Moderation 1.0 pay-as-you-go billing varies by content type (image, text, video, OSS content), moderation scenario (pornography, spam), daily scan volume tier, handling suggestion (review, block, pass), and service region. |
|
Resource plan |
Resource plan deduction formula: Deducted usage = Daily call volume × Deduction factor for the scenario. The deduction system for Content Moderation 2.0 differs from that of version 1.0. For deduction factors, see Pricing Details.
Note
Content Moderation 2.0 resource plans apply only to version 2.0 services and are not interchangeable with version 1.0 plans. |
|
|
Activation and billing |
For details about how to activate and pay for Content Moderation 2.0, see Activation and billing. |
For details about how to activate and pay for Content Moderation 1.0, see Activation and billing. |
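The resource plan deduction formula in the comparison table (Deducted usage = Daily call volume × Deduction factor for the scenario) can be worked through as follows. This is an illustrative sketch: the function name `deducted_usage` is hypothetical, and the deduction factor of 1.5 is a made-up value. The actual factors are listed in Pricing Details.

```python
def deducted_usage(daily_call_volume: int, deduction_factor: float) -> float:
    """Apply the formula: deducted usage = daily call volume x deduction factor."""
    if daily_call_volume < 0 or deduction_factor < 0:
        raise ValueError("call volume and deduction factor must be non-negative")
    return daily_call_volume * deduction_factor


# Hypothetical inputs: 10,000 calls in one day for a scenario whose
# deduction factor is assumed to be 1.5 (not a real published value).
example_deduction = deducted_usage(10_000, 1.5)
print(example_deduction)  # 15000.0
```

Under these assumed inputs, 15,000 units would be deducted from the resource plan for that day.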