AI Guardrails product overview


AI Guardrails provides two content safety offerings: Guardrails for AI application protection and Content Moderation for user-generated content (UGC) safety. Built on Tongyi foundation models, it detects threats across text, images, video, and audio while reducing manual review.

Product offerings

AI Guardrails provides two offerings for distinct content safety scenarios:

  • Guardrails protects AI applications against adversarial attacks and unsafe model outputs through configurable risk detection and multiple integration methods.

    • Threat Detection

      Detects regulatory violations (pornography, political sensitivity, violence, gore), PII and credentials, jailbreak attempts, injection attacks, file and link risks, and prompt extraction attempts. Verifies LLM outputs against ground truth sources. Supports invisible digital watermarks in AI-generated images and text for provenance tracking.

    • Customization

      Configure detection thresholds, risk levels, and filter words for industry compliance (finance, education). Train models on proprietary datasets for vertical use cases.

    • Integration Methods

      Supports RESTful API for custom applications, AI Gateway for centralized policy enforcement, WAF for edge-level prompt blocking, Model Studio for one-click Tongyi integration, and third-party platforms (Dify agents, OpenClaw plug-ins).

  • Content Moderation provides multimodal content safety for platforms hosting user-generated content across social media, gaming, e-commerce, and media scenarios. Core features include the moderation API, OSS violation detection, and the console.

    • API-Based Moderation

      Detects spam, hate speech, violence, and ad violations in text. Scans images for adult content, violence, gore, and political sensitivity. Performs frame-by-frame video analysis with audio detection. Identifies inappropriate speech, violence, and political content in audio. Scans PDF, Word, Excel, and PPT for policy violations. Suitable for platforms with public content: video, live streaming, social media, e-commerce, forums, and CDN.

    • Integrated Moderation (OSS violation detection)

      Auto-scans OSS buckets for pornography, political sensitivity, violence, and terrorism in images, videos, and audio. Supports configurable auto-delete or freeze actions. Enable from the console without API integration.

    • Management Console

      View moderation trends, category distributions, and latency metrics. Define custom violation rules and blocklists. Use pre-configured detection profiles for gaming, education, and social scenarios.

Use cases

Guardrails targets AI-specific threats in generative AI applications. Content Moderation handles UGC patterns in user-facing platforms.

  • Guardrails (AI Application Protection) - Common use cases include:

    • AI chatbots — Prevent prompt injection that manipulates chatbots into disclosing policies or generating harmful advice.

    • AI content creation — Filter inappropriate AI-generated text, images, and videos before publication.

    • Code assistants — Scan AI-generated code for vulnerabilities, credential leaks, and malicious patterns.

    • Enterprise AI assistants — Detect manipulation attempts and sensitive data extraction across multi-turn conversations.

    • Model serving — Add safety guardrails to third-party models (OpenAI, Anthropic, open-source) without modifying weights.

  • Content Moderation (UGC Safety) - Common use cases include:

    • Social networks — Real-time detection of pornographic, violent, and politically sensitive content in public feeds.

    • User profiles — Scan usernames, avatars, and bios at registration to block offensive or impersonating accounts.

    • Gaming — Monitor chat channels for toxicity, cheating ads, and bot spam. Detect harassment and underage grooming in direct messages.

    • Video platforms — Automated age-appropriate content rating. Real-time frame analysis and audio transcription for live broadcasts.

    • E-commerce — Detect counterfeit goods, prohibited items (weapons, drugs), and misleading descriptions. Validate product photos against quality guidelines and detect NSFW images in fashion listings. Monitor buyer-seller communications for fraud.

Pricing

AI Guardrails offers three billing methods: pay-as-you-go for variable workloads, resource plans for discounted pre-purchased capacity, and QPS expansion packages for higher rate limits.

  • Pay-as-you-go

    • Guardrails: Metered billing based on API calls, suitable for variable workloads. Activate here.

    • Content Moderation: Metered billing by content type and business scenario, ideal for variable workloads. Activate here.

  • Resource plan

    • Guardrails: Coming soon.

    • Content Moderation: Pre-purchase discounted capacity, ideal for predictable workloads. Purchase a Content Moderation 2.0 resource plan.

  • QPS package

    If the default QPS is insufficient or you need to scan large volumes in a short period, purchase a Content Moderation QPS package.

    QPS packages require specific details (UID, features, region). Contact your business manager or submit a ticket to purchase and configure.

Available packages:

  • 2.0 QPS expansion package: Increases QPS for Content Moderation 2.0 API (standard editions) and OSS violation detection (general-purpose edition). After expansion, QPS limit = default + purchased QPS. Pay-as-you-go billing still applies.

  • 2.0 QPS expansion package (large model edition): Increases QPS for Content Moderation 2.0 API (large model editions) and OSS violation detection (general-purpose edition). After expansion, QPS limit = default + purchased QPS. Pay-as-you-go billing still applies.

  • 2.0 committed QPS package: Provides committed QPS for Content Moderation 2.0 API (standard editions). QPS limit = purchased QPS. Usage within this limit incurs no additional pay-as-you-go charges.

  • AI Guardrails expansion package: Increases QPS for AI Guardrails. After expansion, QPS limit = default + purchased QPS. Pay-as-you-go billing still applies.

  • 1.0 QPS expansion package: Increases QPS for Content Moderation 1.0 API and OSS violation detection 1.0. After expansion, QPS limit = default + purchased QPS. Pay-as-you-go billing still applies.

Important

QPS packages require configuration verification and cannot be purchased directly. Contact your business manager or submit a ticket.
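The QPS rules in the package list above can be sketched as a small calculation. This is an illustrative sketch only: the names `PackageType`, `effective_qps_limit`, and `pay_as_you_go_applies` are hypothetical and not part of any AI Guardrails SDK, and the numbers in the usage lines are example values.

```python
from enum import Enum


class PackageType(Enum):
    EXPANSION = "expansion"  # purchased QPS is added on top of the default
    COMMITTED = "committed"  # purchased QPS becomes the limit


def effective_qps_limit(default_qps: int, purchased_qps: int, package: PackageType) -> int:
    """Return the QPS limit after a package purchase, following the rules listed above."""
    if default_qps < 0 or purchased_qps < 0:
        raise ValueError("QPS values must be non-negative")
    if package is PackageType.EXPANSION:
        # Expansion packages: QPS limit = default + purchased QPS.
        return default_qps + purchased_qps
    # Committed package: QPS limit = purchased QPS.
    return purchased_qps


def pay_as_you_go_applies(package: PackageType) -> bool:
    """Expansion packages keep pay-as-you-go billing; committed usage within the limit does not."""
    return package is PackageType.EXPANSION


# Example values only: a default of 100 QPS and a purchase of 50 QPS.
expansion_limit = effective_qps_limit(100, 50, PackageType.EXPANSION)  # 100 + 50
committed_limit = effective_qps_limit(100, 50, PackageType.COMMITTED)  # purchased QPS only
print(expansion_limit, committed_limit)
```

Under these rules a committed package smaller than the default would lower the limit, so it is worth confirming the intended capacity with your business manager before purchase.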

Guardrails vs. Content Moderation

Choose the service that matches your primary use case.

  • When to Use Guardrails - Pure AI scenarios where content originates from or is processed by generative AI models:

    • Text-to-text generation (chatbots, writing assistants), text-to-image generation, code generation, AI agent actions (tool use, function calling), and training data sanitization.

    • Key differentiators: AI-specific threat detection (prompt injection, hallucinations, prompt crawlers), context-aware multi-turn analysis, and sub-second latency.

  • When to Use Content Moderation - UGC scenarios where content is uploaded or created by end users:

    • Social media posts (text, images, videos), e-commerce product listings, gaming chat and user-created content, document sharing platforms, and live streaming.

    • Key differentiators: UGC-optimized detection (spam, fake reviews, impersonation), pre-configured scenarios (avatars, nicknames, public chat), and OSS integration for storage compliance.

  • For hybrid scenarios, choose the service based on the primary use case:

    • Social app with AI-generated image uploads — primary activity is user-published content, so use Content Moderation.

    • AI companion chat (user inputs, AI responds) — this is an AI application, so use Guardrails.

Content Moderation 2.0 vs. 1.0

Content Moderation 2.0 adds pre-configured business scenarios and improves performance, label richness, configuration flexibility, and pricing over version 1.0.

Comparison of capabilities

For each item, the Content Moderation 2.0 details are listed first, followed by the Content Moderation 1.0 details.

Billing method and pricing

  • Content Moderation 2.0

    • Image

      Billing formula: Fees = Number of images × Number of business scenarios × Unit price per business scenario

      Pay-as-you-go: Starts at CNY 15 per 10,000 images, approximately 45% of the price of version 1.0

    • Text

      Billing formula: Fees = Number of text entries × Number of business scenarios × Unit price per business scenario

      Pay-as-you-go: Starts at CNY 7.5 per 10,000 entries, approximately 41% of the price of version 1.0

    • Audio

      Billing formula: Fees = Audio duration in minutes × Number of business scenarios × Unit price per business scenario

      Pay-as-you-go: Starts at CNY 225 per 10,000 minutes, approximately 22% of the price of version 1.0

    • Video

      Billing formula: Fees = (Number of captured frames × Number of business scenarios × Unit price per business scenario) + (Video duration in minutes × Number of audio scenarios × Unit price per audio scenario)

      Pay-as-you-go: Starts at CNY 15 per 10,000 captured video frames, and CNY 202.5 per 10,000 minutes of audio from video (approximately 34% of the price of version 1.0)

    • Document

      Billing formula: Fees are calculated based on the number of pages, with every 50 pages counted as one unit.

      Pay-as-you-go: CNY 0.225 per unit (50 pages), approximately 48% of the price of version 1.0

    Note

    Content Moderation 2.0 also supports purchasing pay-as-you-go resource plans. For details about the pricing of Content Moderation 2.0, see Content Moderation Pricing.

  • Content Moderation 1.0

    • Image

      Billing formula: Fees = Number of images × Number of risk scenarios × Unit price per scenario

    • Text

      Billing formula: Fees = Number of text entries × Number of risk scenarios × Unit price per scenario

    • Audio

      Billing formula: Fees = Audio duration in minutes × Number of risk scenarios × Unit price per scenario

    • Video

      Billing formula: Fees = (Number of captured frames × Number of risk scenarios × Unit price per risk scenario) + (Video duration in minutes × Number of audio risk scenarios × Unit price per audio scenario)

    • Document

      Billing formula: Fees = File conversion fee + (Number of pages × Number of document image scenarios × Unit price per scenario) + (Number of text entries in the document × Unit price per scenario)

    Note

    Content Moderation 1.0 also supports purchasing pay-as-you-go resource plans. For details about the pricing of Content Moderation 1.0, see Content Moderation Pricing.

Moderation configurations

  • Content Moderation 2.0

    • Moderation scope (10+ major categories, 100+ subcategories)

    • Custom libraries

  • Content Moderation 1.0

    • Moderation scope (5+ major categories, 50+ subcategories)

    • Custom libraries

Default capacity

  • Content Moderation 2.0

    • Image: 50 QPS for large model edition, 100 QPS for standard edition

    • Text: 50 QPS for large model edition, 100 QPS for standard edition

    • Audio: 50 concurrent tasks

    • Video: 50 concurrent tasks

    • Document: 20 concurrent tasks

    Note

    If the default QPS is insufficient, purchase a QPS expansion package. Contact your business manager or submit a ticket.

  • Content Moderation 1.0

    • Image: 50 QPS

    • Text: 100 QPS

    • Audio: 20 concurrent tasks

    • Video: 20 concurrent tasks

    • Document: 10 concurrent tasks

Content to be moderated and business scenarios

  • Content Moderation 2.0

    • Modalities: Image, text, audio, video, document, and OSS files

    • Preset business scenarios: Avatars, nicknames, public chat, private chat, ad compliance, AI content detection, educational materials, content governance, common baseline moderation, social entertainment and live streaming moderation, and audio-visual media moderation

  • Content Moderation 1.0

    • Modalities: Image, text, video, audio, document, and OSS files

    • Preset business scenarios: Common baseline moderation

Moderation results

  • Content Moderation 2.0

    • Interpretable labels (100+, multiple violation labels can be returned simultaneously)

    • Confidence score

  • Content Moderation 1.0

    • Interpretable labels (40+, only one violation label can be returned at a time)

    • Handling suggestion

Pay-as-you-go

  • Content Moderation 2.0: Pay-as-you-go bills by content type (images, text, voice) and detection volume. Detecting multiple risk scenarios for the same content costs 50% to 70% less than version 1.0.

  • Content Moderation 1.0: Pay-as-you-go billing varies by content type (image, text, video, OSS content), moderation scenario (pornography, spam), daily scan volume tier, handling suggestion (review, block, pass), and service region.

Resource plan

Resource plan deduction formula (both versions): Deducted usage = Daily call volume × Deduction factor for the scenario.

The deduction system for Content Moderation 2.0 differs from that of version 1.0. For the deduction factors, see Pricing Details.

Note

Content Moderation 2.0 resource plans apply only to version 2.0 services and are not interchangeable with version 1.0 plans.
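The Content Moderation 2.0 billing formulas and the resource plan deduction formula from the comparison above can be worked through in a few lines. This is a rough estimate sketch, not a billing tool. The unit prices are the "starts at" figures quoted above, so real per-scenario prices may be higher. Rounding a partial block of 50 document pages up to a full unit is an assumption. The function names and the deduction factor in the usage lines are hypothetical.

```python
import math
from decimal import Decimal
from typing import Union

Number = Union[int, Decimal]

# "Starts at" prices in CNY per 10,000 items, taken from the comparison above.
# Actual unit prices depend on the business scenario.
PRICE_PER_10K = {
    "image": Decimal("15"),
    "text": Decimal("7.5"),
    "audio_minute": Decimal("225"),
    "video_frame": Decimal("15"),
    "video_audio_minute": Decimal("202.5"),
}
DOCUMENT_UNIT_PRICE = Decimal("0.225")  # CNY per unit of 50 pages
PAGES_PER_UNIT = 50


def metered_fee(quantity: Number, scenarios: int, price_key: str) -> Decimal:
    """Fees = quantity x number of business scenarios x unit price per scenario."""
    return Decimal(quantity) * scenarios * PRICE_PER_10K[price_key] / 10000


def video_fee(frames: int, frame_scenarios: int, minutes: Number, audio_scenarios: int) -> Decimal:
    """Video fees = captured-frame fees + fees for the audio track of the video."""
    return (metered_fee(frames, frame_scenarios, "video_frame")
            + metered_fee(minutes, audio_scenarios, "video_audio_minute"))


def document_fee(pages: int) -> Decimal:
    """Every 50 pages count as one unit (assumption: partial units round up)."""
    units = math.ceil(pages / PAGES_PER_UNIT)
    return units * DOCUMENT_UNIT_PRICE


def resource_plan_deduction(daily_calls: int, deduction_factor: Decimal) -> Decimal:
    """Deducted usage = daily call volume x deduction factor for the scenario."""
    return Decimal(daily_calls) * deduction_factor


# 20,000 images checked against 2 business scenarios at the starting price.
image_cost = metered_fee(20000, 2, "image")
# 3,000 captured frames and 100 minutes of audio, 1 scenario each.
video_cost = video_fee(3000, 1, 100, 1)
# A 120-page document is billed as 3 units under the round-up assumption.
doc_cost = document_fee(120)
# The factor 1.5 is a made-up placeholder; real factors are in Pricing Details.
deducted = resource_plan_deduction(10000, Decimal("1.5"))
print(image_cost, video_cost, doc_cost, deducted)
```

`Decimal` is used instead of floats so that prices such as 0.225 and 202.5 are not subject to binary rounding error.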