Image moderation for Model Studio (AI Guardrails)

This service helps Model Studio users strengthen security review of the images that large models receive as input and produce as output. Model Studio's red-line control policy always applies. Within that policy, you can enable or disable specific review labels and configure a custom security policy.

Usage

Follow these steps to enable and configure the Content Moderation service for Model Studio:

Step 1: Enable Content Moderation

Go to the Content Moderation - Enhanced Edition service activation page, read and accept the service agreement, and click the Activate Now button to activate the service.

Step 2: Authorize in Model Studio

Grant the required authorization in Model Studio and pass the required identifier in the header of your API calls. For detailed instructions, see the Model Studio Content Moderation documentation.

Services

After you grant the service-linked role (SLR) authorization for Content Moderation in the Model Studio console and pass the cip identifier in your API call headers, Model Studio automatically routes each request to the image moderation service that matches your use case. The following table describes the services.

Service	Scenario	Detectable content
Service name: Model Studio Image Input Moderation. Service: bailianQueryImageCheck	Detects risks in input images for use cases such as image-to-image and image understanding.	Detects content including pornography, suggestive material, political content, terrorism and violence, contraband, promotional spam, undesirable content, religion, special symbols, specific objects, and abusive text. For a detailed list of detectable categories, see the Content Moderation console.
Service name: Model Studio Image Generation Moderation. Service: bailianResponseImageCheck	Detects risks in generated images for scenarios such as text-to-image and image-to-image.	Optimized for AIGC-generated images, this service detects content including pornography, suggestive material, political content, terrorism and violence, contraband, promotional spam, and undesirable content. For a detailed list of detectable categories, see the Content Moderation console.

Billing

The image moderation service for Model Studio users offers two billing methods: pay-as-you-go and resource plans.

Pay-as-you-go

When you enable the Image Moderation Pro service, pay-as-you-go is the default billing method. You are billed daily based on your actual usage. If you do not use the service, you are not charged.

Moderation type	Supported services	Unit price
General image moderation (image_standard)	Model Studio Image Input Moderation: bailianQueryImageCheck; Model Studio Image Generation Moderation: bailianResponseImageCheck	15 CNY per 10,000 calls. Note: You are charged for each call to a supported service. For example, 100 calls to the bailianQueryImageCheck service cost 0.15 CNY.

Note

Pay-as-you-go usage of Content Moderation Pro is metered and billed once per hour. In the bill details, the moderationType field identifies the moderation type, for example image_standard. You can view the bill details to check your charges.
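The unit price above can be turned into a quick estimate. The following is a minimal sketch, not an official billing formula: the function name pay_as_you_go_cost is hypothetical, and the actual bill may apply rounding or metering rules that this sketch does not model.

```python
from decimal import Decimal

# Unit price from the pricing table: 15 CNY per 10,000 calls.
PRICE_PER_10K_CALLS_CNY = Decimal("15")


def pay_as_you_go_cost(calls: int) -> Decimal:
    """Estimate the pay-as-you-go charge in CNY for a number of billable calls.

    Hypothetical helper for illustration only.
    """
    if calls < 0:
        raise ValueError("calls must be non-negative")
    return PRICE_PER_10K_CALLS_CNY * calls / Decimal(10_000)


# 100 calls to bailianQueryImageCheck, as in the example in the table.
print(pay_as_you_go_cost(100))  # 0.15
```

Decimal is used instead of float so that small per-call amounts do not pick up binary rounding error.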

Resource plans

If you expect a high volume of traffic or have consistent moderation needs, you can purchase a resource plan in advance. Larger resource plans offer greater discounts. You can purchase and use multiple resource plans simultaneously. For more information, see Purchase a resource plan for Image Moderation Pro.

Moderation type	Deduction factor
General image moderation (image_standard)	2. Each successful API call consumes 2 units from your resource plan. For example, if you purchase a resource plan with 10 units, one successful API call consumes 2 units, leaving a balance of 8 units.

After you purchase a resource plan, the system first deducts your Image Moderation Pro API usage from the plan's quota. If your usage exceeds your plan's quota, the system automatically bills the overage on a pay-as-you-go basis. Monitor your plan's remaining balance and your pay-as-you-go bills. You can set up low-balance alerts in the Resource Plan management console of Alibaba Cloud Billing Management.
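To see how plan deduction and overage interact, the following sketch splits a number of successful calls between a resource plan and pay-as-you-go billing. It is an illustration under stated assumptions: settle_usage is a hypothetical helper, the overage rate reuses the pay-as-you-go unit price of 15 CNY per 10,000 calls, and a leftover unit smaller than the deduction factor is assumed not to cover a call. The real settlement logic may differ.

```python
from decimal import Decimal

DEDUCTION_FACTOR = 2  # units consumed per successful call (image_standard)
PRICE_PER_CALL_CNY = Decimal("15") / Decimal(10_000)  # pay-as-you-go rate


def settle_usage(plan_units: int, successful_calls: int):
    """Return (remaining_units, overage_calls, overage_cost_cny).

    Assumption: calls are deducted from the plan first, and any call the
    plan cannot fully cover is billed on a pay-as-you-go basis.
    """
    if plan_units < 0 or successful_calls < 0:
        raise ValueError("inputs must be non-negative")
    covered_calls = min(successful_calls, plan_units // DEDUCTION_FACTOR)
    remaining_units = plan_units - covered_calls * DEDUCTION_FACTOR
    overage_calls = successful_calls - covered_calls
    return remaining_units, overage_calls, overage_calls * PRICE_PER_CALL_CNY


# The example from the text: a 10-unit plan and one successful call.
remaining, overage, cost = settle_usage(10, 1)
print(remaining, overage)  # 8 0
```

With a 10-unit plan and 7 successful calls, the plan covers 5 calls and the remaining 2 calls fall through to pay-as-you-go under these assumptions.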

Risk labels

Label definitions

You can view the labels supported by each service and their detection scopes in the Content Moderation console. The following table describes the risk labels, their confidence score ranges, and their descriptions.

Label	Confidence score range	Description
pornographic_adultContent	0–100. A higher score indicates greater confidence.	Detects adult pornographic content.
pornographic_adultToys	0–100. A higher score indicates greater confidence.	Detects adult toys.
pornographic_artwork	0–100. A higher score indicates greater confidence.	Detects pornographic artwork.
pornographic_adultContent_tii	0–100. A higher score indicates greater confidence.	Detects pornographic text within an image.
sexual_suggestiveContent	0–100. A higher score indicates greater confidence.	Detects sexually suggestive or racy content.
sexual_breastBump	0–100. A higher score indicates greater confidence.	Detects nipple protrusion.
sexual_cleavage	0–100. A higher score indicates greater confidence.	Detects female cleavage.
sexual_femaleUnderwear	0–100. A higher score indicates greater confidence.	Detects female underwear or swimwear.
sexual_maleTopless	0–100. A higher score indicates greater confidence.	Detects topless males.
sexual_cartoon	0–100. A higher score indicates greater confidence.	Detects sexually suggestive cartoons or anime.
sexual_femaleShoulder	0–100. A higher score indicates greater confidence.	Detects suggestive images focusing on female shoulders.
sexual_femaleLeg	0–100. A higher score indicates greater confidence.	Detects suggestive images focusing on female legs.
sexual_pregnancy	0–100. A higher score indicates greater confidence.	Detects images related to pregnancy or breastfeeding.
sexual_underage	0–100. A higher score indicates greater confidence.	Detects sexually suggestive content involving minors.
political_historicalNihility	0–100. A higher score indicates greater confidence.	Detects content related to historical revisionism or sensitive historical events.
political_historicalNihility_tii	0–100. A higher score indicates greater confidence.	Detects text related to historical revisionism.
political_politicalFigure_1	0–100. A higher score indicates greater confidence.	Detects current or former national leaders.
political_politicalFigure_2	0–100. A higher score indicates greater confidence.	Detects family members of political figures.
political_politicalFigure_3	0–100. A higher score indicates greater confidence.	Detects local or municipal government officials.
political_politicalFigure_4	0–100. A higher score indicates greater confidence.	Detects foreign leaders and their family members.
political_politicalFigure_name_tii	0–100. A higher score indicates greater confidence.	Detects names of political leaders in text.
political_politicalFigure_metaphor_tii	0–100. A higher score indicates greater confidence.	Detects nicknames or metaphors for major political leaders.
political_prohibitedPerson_tii	0–100. A higher score indicates greater confidence.	Detects names of disgraced officials.
political_prohibitedPerson_1	0–100. A higher score indicates greater confidence.	Detects disgraced national-level officials.
political_prohibitedPerson_2	0–100. A higher score indicates greater confidence.	Detects disgraced provincial- or municipal-level officials.
political_taintedCelebrity	0–100. A higher score indicates greater confidence.	Detects controversial public figures or those involved in major scandals.
political_taintedCelebrity_tii	0–100. A higher score indicates greater confidence.	Detects names of controversial public figures.
political_Chinaflag	0–100. A higher score indicates greater confidence.	Detects the national flag of China.
political_otherflag	0–100. A higher score indicates greater confidence.	Detects other national flags.
political_Chinamap	0–100. A higher score indicates greater confidence.	Detects maps of China.
political_logo	0–100. A higher score indicates greater confidence.	Detects logos of prohibited media outlets.
political_outfit	0–100. A higher score indicates greater confidence.	Detects military or police uniforms.
political_medicalOutfit	0–100. A higher score indicates greater confidence.	Detects medical or healthcare uniforms.
political_badge	0–100. A higher score indicates greater confidence.	Detects national or party emblems.
political_racism_tii	0–100. A higher score indicates greater confidence.	Detects text containing specific sensitive expressions. For more information, see the Content Moderation console.
violent_explosion	0–100. A higher score indicates greater confidence.	Detects explosions or fireworks.
violent_burning	0–100. A higher score indicates greater confidence.	Detects fire or burning objects.
violent_armedForces	0–100. A higher score indicates greater confidence.	Detects imagery associated with terrorist organizations.
violent_crowding	0–100. A higher score indicates greater confidence.	Detects crowd gatherings or protests.
violent_gun	0–100. A higher score indicates greater confidence.	Detects guns.
violent_knives	0–100. A higher score indicates greater confidence.	Detects knives.
violent_gunKnives_tii	0–100. A higher score indicates greater confidence.	Detects text describing guns or knives.
violent_blood	0–100. A higher score indicates greater confidence.	Detects bloody or gory content.
violent_horrific	0–100. A higher score indicates greater confidence.	Detects horrific or frightening content.
violent_horrific_tii	0–100. A higher score indicates greater confidence.	Detects text describing violent or horrific content.
contraband_drug	0–100. A higher score indicates greater confidence.	Detects illegal drugs or drug paraphernalia.
contraband_drug_tii	0–100. A higher score indicates greater confidence.	Detects text about illegal drugs.
contraband_gamble	0–100. A higher score indicates greater confidence.	Detects gambling paraphernalia.
contraband_gamble_tii	0–100. A higher score indicates greater confidence.	Detects text about gambling activities.
contraband_certificate_tii	0–100. A higher score indicates greater confidence.	Detects text related to fake documents or cash-out services.
religion_funeral	0–100. A higher score indicates greater confidence.	Detects funerals or memorial halls.
religion_buddhism	0–100. A higher score indicates greater confidence.	Detects content related to specific religious attire or symbols. For more information, see the Content Moderation console.
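religion_christianity	0–100. A higher score indicates greater confidence.	Detects content related to specific religious attire or symbols. For more information, see the Content Moderation console.
religion_muslim	0–100. A higher score indicates greater confidence.	Detects content related to specific religious attire or symbols. For more information, see the Content Moderation console.
religion_tii	0–100. A higher score indicates greater confidence.	Detects content related to specific religious attire or symbols. For more information, see the Content Moderation console.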
racism_tii	0–100. A higher score indicates greater confidence.
PDA_kiss	0–100. A higher score indicates greater confidence.	Detects people kissing.
PDA_physicalContact	0–100. A higher score indicates greater confidence.	Detects intimate physical contact.
object_landmark	0–100. A higher score indicates greater confidence.	Detects domestic landmarks.
object_rmb	0–100. A higher score indicates greater confidence.	Detects RMB banknotes or coins.
object_foreignCurrency	0–100. A higher score indicates greater confidence.	Detects foreign currency.
object_wn	0–100. A higher score indicates greater confidence.	Detects a specific cartoon character.
object_carcrash	0–100. A higher score indicates greater confidence.	Detects car crashes.
object_candle	0–100. A higher score indicates greater confidence.	Detects candles.
object_flood	0–100. A higher score indicates greater confidence.	Detects natural disasters such as floods.
pt_logotoSocialNetwork	0–100. A higher score indicates greater confidence.	Detects watermarks from common social media platforms.
pt_qrCode	0–100. A higher score indicates greater confidence.	Detects QR codes.
pt_programCode	0–100. A higher score indicates greater confidence.	Detects mini program codes.
pt_toDirectContact_tii	0–100. A higher score indicates greater confidence.	Detects text containing specific promotional or spam information. For more information, see the Content Moderation console.
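pt_toSocialNetwork_tii	0–100. A higher score indicates greater confidence.	Detects text containing specific promotional or spam information. For more information, see the Content Moderation console.
pt_toShortVideos_tii	0–100. A higher score indicates greater confidence.	Detects text containing specific promotional or spam information. For more information, see the Content Moderation console.
pt_investment_tii	0–100. A higher score indicates greater confidence.	Detects text containing specific promotional or spam information. For more information, see the Content Moderation console.
pt_recruitment_tii	0–100. A higher score indicates greater confidence.	Detects text containing specific promotional or spam information. For more information, see the Content Moderation console.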
inappropriate_smoking	0–100. A higher score indicates greater confidence.	Detects smoking-related content.
inappropriate_drinking	0–100. A higher score indicates greater confidence.	Detects alcohol-related content.
inappropriate_tattoo	0–100. A higher score indicates greater confidence.	Detects tattoos.
inappropriate_middleFinger	0–100. A higher score indicates greater confidence.	Detects the middle finger gesture.
inappropriate_foodWasting	0–100. A higher score indicates greater confidence.	Detects content depicting food waste.
logo_brand	0–100. A higher score indicates greater confidence.	Detects brand logos.
logo_tv	0–100. A higher score indicates greater confidence.	Detects logos of television stations.
logo_streaming	0–100. A higher score indicates greater confidence.	Detects logos of streaming and entertainment services.
profanity_oral_tii	0–100. A higher score indicates greater confidence.	Detects common profanity or vulgar language.
profanity_offensive_tii	0–100. A higher score indicates greater confidence.	Detects severe insults or abusive language.
meme_vulgar	0–100. A higher score indicates greater confidence.	Detects vulgar internet memes.
meme_metaphor	0–100. A higher score indicates greater confidence.	Detects metaphorical internet memes.
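
The label names in the table follow a visible pattern: a category prefix (such as pornographic, political, or violent) followed by a detail, with a _tii suffix on labels whose descriptions refer to text in an image. The sketch below relies on that observed pattern, which is inferred from the table and not from a published naming specification. The helpers parse_label and max_score_by_category are hypothetical, and the input is a plain mapping of label to confidence score, not the actual API response format.

```python
from typing import Dict


def parse_label(label: str) -> Dict[str, object]:
    """Split a risk label into category, detail, and a text-in-image flag.

    Assumption: a trailing "_tii" marks a text-in-image label, as the
    descriptions in the table suggest.
    """
    is_text = label.endswith("_tii")
    base = label[: -len("_tii")] if is_text else label
    category, _, detail = base.partition("_")
    return {"category": category, "detail": detail, "text_in_image": is_text}


def max_score_by_category(scores: Dict[str, float]) -> Dict[str, float]:
    """Keep the highest confidence score (0-100) seen for each category."""
    result: Dict[str, float] = {}
    for label, confidence in scores.items():
        category = str(parse_label(label)["category"])
        result[category] = max(result.get(category, 0.0), confidence)
    return result


print(parse_label("violent_gunKnives_tii"))
print(max_score_by_category({"violent_gun": 12.5, "violent_knives": 80.0}))
```

Grouping by category in this way can make it easier to apply one policy per category while still logging the individual labels.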

Manage labels

You can enable or disable most risk labels in the console. Mandatory red-line labels are the exception and cannot be disabled. For some risk labels, you can also configure more granular detection settings. For details, see the Content Moderation console.

1. In the left-side navigation pane, choose API-based Detection (Enhanced) > Image Moderation > Rule configuration.
2. On the Rule management tab, find the rule for Model Studio Input Image Detection (bailianQueryImageCheck) and click Configure rule in the Actions column.
3. Select the detection type you want to adjust.
4. Click Edit to modify the detection status and the score thresholds for medium and high risk.
5. Click Save. The new configuration is applied to the production environment within 2 to 5 minutes.
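
The medium- and high-risk score thresholds configured in the steps above can be pictured as cut-off points on the 0 to 100 confidence scale. The sketch below shows one plausible reading. The function risk_level is hypothetical, the example thresholds of 60 and 90 are made-up values, and both the inclusive comparison and the "low" result for scores under the medium threshold are assumptions. Check the console for the actual behavior.

```python
def risk_level(score: float, medium_threshold: float, high_threshold: float) -> str:
    """Map a confidence score (0-100) to a risk level.

    Assumptions: thresholds are inclusive, and a score below the medium
    threshold is treated as "low".
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if not 0 <= medium_threshold <= high_threshold <= 100:
        raise ValueError("thresholds must satisfy 0 <= medium <= high <= 100")
    if score >= high_threshold:
        return "high"
    if score >= medium_threshold:
        return "medium"
    return "low"


# Hypothetical thresholds: medium at 60, high at 90.
for s in (42.0, 75.0, 95.5):
    print(s, risk_level(s, medium_threshold=60, high_threshold=90))
```

Raising the medium threshold for a label makes it fire less often, which trades recall for fewer false positives.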

Additional operations

Besides label management, the Content Moderation console lets you configure a custom image library and a custom text library, view invocation results, and query usage.

For detailed instructions, see the Console Operation Guide.

Note

When you adjust the risk detection scope, select Model Studio Scenario as the service scenario to quickly locate Model Studio-specific services. In Query Results, use the Model Studio request ID to locate a specific invocation record.