Schedule scans for incremental OSS data


OSS Content Violation Detection - Inclusive Edition is for customers who need to periodically scan data in Object Storage Service (OSS) in batches. It provides cost-effective performance options, integrates the detection capabilities of Content Moderation - Enhanced Edition, and supports a wider range of risk types, more detailed risk labels, and native integration with cloud products like OSS and Log Service. This topic describes how to use the Inclusive Edition to schedule scans for images, audio, video, and documents in OSS.

Enable and authorize

OSS Content Violation Detection - Inclusive Edition uses the detection services of Content Moderation - Enhanced Edition. Before using OSS Content Violation Detection - Inclusive Edition, you must enable Content Moderation - Enhanced Edition. For more information, see Activation and Billing.

Before using OSS Content Violation Detection - Inclusive Edition, you must authorize Content Moderation to access your OSS buckets and Log Service. Once authorized, OSS Content Violation Detection - Inclusive Edition pushes detection results to Log Service. Log Service provides features for querying, analyzing, and processing data to help you monitor risk trends.

Pushing logs and performing query analysis do not incur additional fees. You must enable Log Service and grant the required permissions. For more information about billing, see Billing of OSS Content Violation Detection - Inclusive Edition.

Configure a scheduled scan job

  1. Log on to the Content Moderation console. In the left-side navigation pane, choose Inclusive Edition of OSS Content Moderation > Scan Tasks.

  2. On the Inclusive Edition of OSS Content Moderation page, click Scheduled Scan Task.

    Complete the following settings in the wizard.

    1. Select the type of detection job and click Next.

      Parameter

      Description

      Task Name

      A unique name for the scheduled detection job.

      Select Bucket (multiple selections allowed)

      • Buckets in all public cloud OSS regions in the Chinese mainland are supported.

        For more information about regions supported by OSS, see OSS endpoints and data centers.

      • Buckets in the Region-agnostic (Chinese mainland) region of public cloud OSS are also supported.

      Select a task type

      Supports scheduled jobs for images, audio, video, and documents.

      • Image tasks

        Supported image formats: PNG, JPG, JPEG, BMP, WEBP, TIFF, SVG, ICO, and HEIF.

        Image size must not exceed 20 MB. Larger image files are not detected.

        By default, Extensionless File Check is disabled. If you enable this feature, files without an extension are identified as images based on their Content-Type.

      • Audio and video tasks

        Supported video formats: AVI, FLV, MP4, MPG, ASF, WMV, MOV, WMA, RMVB, RM, FLASH, and TS.

        Supported audio formats: MP3, WAV, AAC, WMA, OGG, M4A, AMR, FLAC, 3GP, and APE.

        The audio or video file size must not exceed 1 GB. Larger files are not detected.

        By default, both Video Files and Audio Files are detected.

      • Document tasks

        Supported document formats: DOC, DOCX, PPT, PPTX, PPS, PPSX, PDF, XLS, XLSX, XLTX, XLTM, HTML, and TXT.

        Document size must not exceed 200 MB. Larger document files are not detected.

      Select Scan Service

      Click Adjust configuration to adjust the detection types for the current job. You can select multiple detection types. For more information about configuring Content Moderation - Enhanced Edition services, see Console User Guide.

      Important

      Detection jobs in OSS Content Violation Detection - Inclusive Edition and the Content Moderation service API share the same detection configuration. Changes to this configuration affect both.

      • Image detection services

        • Large model services for detection:

          • Image Moderation for Large and Small Model Integration (Recommended): Combines the capabilities of the large image detection model and expert models to comprehensively identify various types of non-compliant content, including pornography, sexually suggestive content, political content, terrorism, prohibited items, religious content, advertisement redirection, and undesirable content.

          • Image Moderation Service Based on LLMs: A large model trained for image detection scenarios that can identify risks such as pornographic, political, terrorism-related, prohibited, undesirable, abusive, and advertising content.

          • Large Model-Powered Ad Traffic Detection: Based on a large model, this service can effectively identify various evasive advertisement redirections and AI-generated advertisement content.

        • General scenarios:

          • OSS baseline check: Suitable for detecting red-line violations such as pornographic, political, and terrorism-related content in images stored in OSS.

          • BaselineCheck: Detects red-line violations or content that is unsuitable for dissemination in images.

            We recommend that you select this option if your files include publicly accessible images.

          • baselineCheck_pro: Provides more fine-grained labels in addition to the features of General baseline detection.

            We recommend that you select this option if you have more granular processing needs and some custom requirements for images.

          • TonalityImprove: Detects content in images that may disrupt platform order, affect content tone, or degrade user experience.

            We recommend that you use this service in addition to General baseline detection based on your governance needs.

        • AIGC scenarios:

          • AIGC Image Risk Check: Designed for AIGC scenarios, this service detects whether AIGC-generated images contain non-compliant or inappropriate content.

            We recommend that you select this option if your files include AIGC-generated images.

          • AIGC image detection: Determines whether an image was generated by AIGC across various scenarios.

          • AIGC Detection_Professional Edition: For various scenarios, determines if an image is likely AI-generated or synthetically altered.

          • AI Image Detection (Video Screenshots): For video screenshot scenarios, determines whether an image was generated by AIGC.

          • AIGC Violation Detection: For AIGC scenarios, this service detects elements such as trademarks, special logos, and people in an image to identify potential infringement risks.

        • Business scenarios:

          • Profile Photo Check: For profile photo scenarios, this service detects non-compliant, inappropriate, or platform-disrupting content.

          • Post Comment Image Moderation: For images in posts and comments, this service detects non-compliant, inappropriate, or platform-disrupting content.

          • Advertising Check: For marketing materials, this service detects content that violates advertising laws, is non-compliant, inappropriate, or disrupts platform order.

          • Live Stream Check: For video and live stream screenshots, this service detects non-compliant, inappropriate, or platform-disrupting content.

        • Special scenarios:

          • Moderate Images for Malicious Content: Detects malicious use of images to hide video clips or video players, preventing attackers from exploiting your OSS and CDN traffic.

      • Audio and video detection services

        • Video File Moderation_LLM-based Version (Recommended): Uses the large model service for image detection to detect non-compliant visual or audio information in video files. We recommend using this for all publicly accessible video files.

        • Video Detection: Detects non-compliant or inappropriate content in video files. We recommend using this for all publicly accessible video files.

      • Document detection services

        • General Document Moderation (Large Model Edition) (Recommended): Uses the large model service for image detection on the visual parts of documents to detect non-compliant image or text information, including baseline violations like pornography, sexually suggestive content, political content, terrorism, and prohibited items.

        • General Document Moderation: Detects non-compliant image or text information in documents, including baseline violations like pornography, sexually suggestive content, political content, terrorism, and prohibited items.

    2. Specify the scope of the detection job based on your business requirements, and then click Next.

      Parameter

      Description

      Task Cycle

      The frequency of the scheduled scan. Options are daily, weekly, or monthly.

      Scheduled Time

      The time window when the job is expected to start. Actual completion time depends on the number of files to be scanned. This value is automatically set by Content Moderation and cannot be changed.

      For weekly or monthly jobs, you can set the schedule date. For example, you can scan incremental files in an OSS bucket for the current month on the last day of each month.

      Filter

      Include or exclude files by object name prefix. For example, if you add img/test_ as a prefix to include, only files in the OSS bucket whose names start with img/test_ are scanned.

      Note

      If the files to be scanned are in a specific directory, you can prepend the directory path to the filename to form the complete prefix.

      Skip Scanned Files

      If you enable Skip Scanned Files, the job does not scan OSS files that are already marked as scanned.

      Files scanned by a job are marked using an OSS object tag. You can click How do I check whether content moderation has been applied to an OSS file? to learn how to identify scanned files. For more information, see Log storage for OSS content violation detection results.

    3. Select a specification.

      Parameter

      Description

      Specify the upper limit

      • unlimited quantity: Scans all files without a limit.

      • Set detection limit: Set a custom scan limit based on your needs. The service itself does not impose a limit.

      Important

      The total file count shown for the OSS bucket is an estimate. The actual number of scannable image, audio, video, or document files cannot be determined in advance.

      Select Configuration

      Two performance tiers are available at different price points, offering a more cost-effective solution than API-based detection. For pricing details, see Billing of OSS Content Violation Detection - Inclusive Edition.

      • scan in 24 hours: The job is scheduled with priority and is typically completed within 24 hours.

      • Queue Up for Moderation: The job is queued and scheduled based on its creation time. The completion time depends on the number of files to be scanned but is typically within 3 days.

    4. Configure callback notifications and result processing actions.

      Parameter

      Description

      Callback Notification

      Select an existing callback plan or create a new one. The service sends detection results according to the plan's settings.

      Note

      You can manage callback notifications on the Notification page. For detailed instructions, see Configure message notifications.

      Result Processing

      By default, Automatic Result Freezing is disabled. You can enable it to process results based on the freezing scope and freezing method you select.

      Disposal Scope:

      • Image tasks

        You can choose to freeze high-risk content and medium-risk content.

        By default, Freeze High-risk Content is selected. You can choose whether to also freeze medium-risk content based on your business needs. You can manage the risk level thresholds in the image detection rule settings.

      • Audio and video tasks

        For video frames and audio, you can choose to freeze high-risk content and medium-risk content respectively.

        By default, Freeze High-risk Content is selected for both video frames and audio. You can choose whether to also freeze medium-risk content based on your business needs. The risk level is calculated based on all captured video frames and audio clips from the video file.

      • Document tasks

        For document images and text, you can choose to freeze high-risk content and medium-risk content respectively.

        By default, Freeze High-risk Content is selected for both document images and text. You can choose whether to also freeze medium-risk content based on your business needs. The risk level is calculated based on all captured screenshots and all text from the document file.

      Disposal Method:

      • Modify permissions: Sets the access permission of the OSS file that meets the freezing criteria to private.

      • Move files: Moves the flagged OSS file to a backup directory in the bucket (location: ${bucket}/alicip_riskfile_backup/) or a custom dump directory, and then deletes the original file.

      Important

      Enabling Automatic Result Freezing requires OSS authorization. Once enabled, this feature directly processes any OSS files that meet the specified criteria. Ensure the detection scope and conditions are correctly configured. If an OSS file is frozen by mistake, you can restore it from the results page or by following the instructions in Use the OSS API to restore a frozen file.

      For real-time scanning, you can click instant increment scan to configure an incremental scan job with OSS Content Violation Detection 1.0. For more information, see Configure an incremental scan job. You can also use Function Compute to call the Content Moderation - Enhanced Edition service. For more information, see Use Function Compute to implement incremental image scanning in OSS.

  3. Click Submit.

    Note
    • The job list displays the total number of files included in the job (filtered files) and the actual number of scanned files (detected files). Because detection jobs run asynchronously, there may be a delay of about 1 minute for the job information to update in the list.

    • You can filter the job list by time, view results, and inspect job configurations. Job and result data is available for the past 180 days.
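Before you create a job, it can help to estimate which objects it will pick up. The sketch below approximates the format, size, and Filter rules from the wizard on the client side. It is a hypothetical helper, not part of any Alibaba Cloud SDK: `LIMITS`, `PrefixFilter`, and `is_scannable` are illustrative names, it assumes the MB/GB limits are binary units, it judges format only by file extension (the Extensionless File Check option is not modeled), and it treats the Filter setting as plain prefix matching on the full object name. The service's own evaluation may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MB = 1024 * 1024
GB = 1024 * MB

# Extensions and size limits taken from the task-type descriptions above.
# Assumption: MB/GB are binary units; the service may count sizes differently.
LIMITS = {
    "image": (
        {"png", "jpg", "jpeg", "bmp", "webp", "tiff", "svg", "ico", "heif"},
        20 * MB,
    ),
    "audio_video": (
        {"avi", "flv", "mp4", "mpg", "asf", "wmv", "mov", "wma", "rmvb", "rm",
         "flash", "ts", "mp3", "wav", "aac", "ogg", "m4a", "amr", "flac", "3gp", "ape"},
        1 * GB,
    ),
    "document": (
        {"doc", "docx", "ppt", "pptx", "pps", "ppsx", "pdf", "xls", "xlsx",
         "xltx", "xltm", "html", "txt"},
        200 * MB,
    ),
}


@dataclass
class PrefixFilter:
    """Hypothetical model of the Filter setting: include and exclude prefixes."""
    include: List[str] = field(default_factory=list)
    exclude: List[str] = field(default_factory=list)

    def matches(self, key: str) -> bool:
        # With include prefixes set, the key must start with at least one of them.
        if self.include and not any(key.startswith(p) for p in self.include):
            return False
        # The key must not start with any exclude prefix.
        return not any(key.startswith(p) for p in self.exclude)


def is_scannable(key: str, size_bytes: int, task_type: str,
                 prefix_filter: Optional[PrefixFilter] = None) -> bool:
    """Estimate whether an object would be picked up by a job of task_type."""
    extensions, max_size = LIMITS[task_type]
    if prefix_filter is not None and not prefix_filter.matches(key):
        return False
    name = key.rsplit("/", 1)[-1]
    if "." not in name:
        return False  # extensionless files are not modeled in this sketch
    extension = name.rsplit(".", 1)[-1].lower()
    return extension in extensions and size_bytes <= max_size


if __name__ == "__main__":
    f = PrefixFilter(include=["img/test_"])
    print(is_scannable("img/test_cat.JPG", 3 * MB, "image", f))   # True
    print(is_scannable("img/other.jpg", 3 * MB, "image", f))      # False: prefix mismatch
    print(is_scannable("img/test_big.png", 25 * MB, "image", f))  # False: over 20 MB
```

Running such a check over an object listing only gives a rough count; the console estimate and the job's filtered-file count remain authoritative.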

Configure message notifications

  1. On the OSS compliance detection V2.0 page, click Notification in the navigation bar.

  2. On this page, you can create, edit, and delete message notification plans.

    1. Create New Notification: Click Create New Notification to open the creation dialog box. Enter the callback plan information and click OK.

      • Title: Up to 12 characters. Chinese characters, English letters, underscores (_), and digits are supported.

      • Callback URL: The public HTTP or HTTPS endpoint that receives callback messages. It must support the POST method, the form parameters checksum and content, and the application/x-www-form-urlencoded data format. Ensure the URL returns a valid response.

      • Encryption algorithm: Select an appropriate encryption algorithm.

      • Audit Result: Select Only risky results (returns only results with risky labels) or All results (returns all detection results).

      • Seed value: This value is automatically generated after you configure message notifications in the console. You can view it in the message notification management section.

    2. Edit notification: You can edit a notification plan. Editing a plan that is in use affects all tasks that use that configuration. Proceed with caution.

    3. Delete notification: You can delete unused notification plans. Plans that are in use cannot be deleted.

  3. Message notification content:

After you enable callback notifications, Content Security sends callback notifications for OSS compliance detection based on your callback configurations. The checksum value is generated by concatenating <user UID> + <Seed> + <content> into a string and applying the encryption algorithm that you configured in the console. After you receive the result, you can recalculate the checksum by using the same algorithm and compare it with the checksum returned by the system to prevent content tampering. The following table describes the content field structure.
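The checksum rule above can be implemented in a callback receiver as follows. This is a minimal sketch that assumes SHA-256 is the algorithm selected in the console and that the checksum is the lowercase hex digest of the UTF-8 encoded string; the function names and the UID and seed values are hypothetical. Pass the raw POST body (application/x-www-form-urlencoded) from your HTTP handler to `verify_callback`.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode


def compute_checksum(uid: str, seed: str, content: str) -> str:
    """Hash <user UID> + <Seed> + <content>.

    Assumption: SHA-256 is the algorithm configured in the console and the
    checksum is the lowercase hex digest of the UTF-8 encoded string.
    """
    return hashlib.sha256((uid + seed + content).encode("utf-8")).hexdigest()


def verify_callback(body: str, uid: str, seed: str) -> str:
    """Parse a form-urlencoded callback body and return content if the checksum matches."""
    form = parse_qs(body, keep_blank_values=True)
    if "checksum" not in form or "content" not in form:
        raise ValueError("callback body must contain checksum and content")
    checksum = form["checksum"][0]
    content = form["content"][0]
    expected = compute_checksum(uid, seed, content)
    # Constant-time comparison of the recalculated and received checksums.
    if not hmac.compare_digest(expected.encode(), checksum.lower().encode()):
        raise ValueError("checksum mismatch: content may have been tampered with")
    return content


if __name__ == "__main__":
    UID, SEED = "1234567890", "example-seed"  # hypothetical values
    content = '{"Code": 200, "Data": {}, "RequestId": "test"}'
    body = urlencode({"checksum": compute_checksum(UID, SEED, content), "content": content})
    print(verify_callback(body, UID, SEED) == content)  # True
```

If you selected a different algorithm in the console, replace `hashlib.sha256` with the matching implementation.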

Table 1. content

| Parameter | Type | Example | Description |
|---|---|---|---|
| Code | String | 200 | The status code. |
| RequestId | String | ABCD1234-1234-1234-1234-123**** | The unique identifier generated by Alibaba Cloud for the request. You can use it to troubleshoot issues. |
| Data | Object | - | The content moderation result. For more information, see Table 2. Data. |

Table 2. Data

| Parameter | Type | Example | Description |
|---|---|---|---|
| OssBucketName | String | AAAAA-BBBBB-2024-0307 | The name of the bucket where the OSS file is located. |
| OssObjectName | String | videoId**** | The name of the OSS file. |
| OssRegionId | String | cn-shanghai | The region where the bucket is located. |
| Results | JSONObject | - | The result of the image detection task. For more information about the fields, see Image moderation API response. |
| FrameResult | JSONObject | - | The result for the video frames in a video detection task. For more information about the fields, see Video moderation API response. |
| AudioResult | JSONObject | - | The result for the audio in a video detection task. For more information about the fields, see Video moderation API response. |
| PageResult | JSONObject | - | The result of the document detection task. For more information about the fields, see Document moderation API response. |
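Because the same callback URL can receive results from image, audio/video, and document jobs, a receiver typically needs to tell them apart. The sketch below infers the task type from which result field is present in Data; that inference rule is an assumption based on the field descriptions in the tables above, and `describe_callback` and the oss:// display string are illustrative only.

```python
import json


def describe_callback(content: str) -> dict:
    """Summarize a callback content string using the fields in Table 1 and Table 2.

    Assumption: the task type is inferred from which result field Data contains.
    """
    message = json.loads(content)
    data = message.get("Data") or {}
    if "PageResult" in data:
        task_type = "document"
    elif "FrameResult" in data or "AudioResult" in data:
        task_type = "audio_video"
    elif "Results" in data:
        task_type = "image"
    else:
        task_type = "unknown"
    return {
        "code": message.get("Code"),
        "request_id": message.get("RequestId"),
        # Display-only location string; not an API value.
        "object": f'oss://{data.get("OssBucketName")}/{data.get("OssObjectName")}',
        "region": data.get("OssRegionId"),
        "task_type": task_type,
    }


if __name__ == "__main__":
    sample = ('{"Code": 200, "Data": {"OssBucketName": "tmpsample", '
              '"OssObjectName": "test/img.webp", "OssRegionId": "cn-shanghai", '
              '"Results": []}, "RequestId": "r-1"}')
    print(describe_callback(sample)["task_type"])  # image
```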

Sample callback messages:

Image detection

The following code provides a sample callback for an image detection task. For more information about the fields, see Response parameters.

{
    "Code": 200,
    "Data": {
        "OssObjectName": "test/img.webp",
        "OssBucketName": "tmpsample",
        "OssRegionId": "cn-shanghai",
        "Results": [
            {
                "Service": "oss_baselineCheck",
                "RiskLevel": "high",
                "Result": [
                    {
                        "Confidence": 95.89,
                        "Label": "sexual_partialNudity"
                    }
                ]
            }
        ]
    },
    "RequestId": "AAAAA-BBBBB-CCCC-DDDDD"
}
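To act on an image callback such as the one above, you can flatten Results into one row per detected label. This is a minimal sketch: `image_risks` is a hypothetical helper, skipping "nonLabel" follows the samples in this topic (where it appears with no risk), and `min_confidence` is a client-side threshold of your own, not a service setting.

```python
import json


def image_risks(content: str, min_confidence: float = 0.0) -> list:
    """Return (service, risk_level, label, confidence) tuples from an image callback."""
    data = json.loads(content)["Data"]
    risks = []
    for service_result in data.get("Results", []):
        service = service_result.get("Service")
        level = service_result.get("RiskLevel")
        for item in service_result.get("Result", []):
            label = item.get("Label")
            confidence = item.get("Confidence", 0.0)
            # "nonLabel" means no risk was found for this service.
            if not label or label == "nonLabel" or confidence < min_confidence:
                continue
            risks.append((service, level, label, confidence))
    return risks


# Condensed copy of the sample callback above.
SAMPLE = json.dumps({
    "Code": 200,
    "Data": {
        "OssObjectName": "test/img.webp",
        "OssBucketName": "tmpsample",
        "OssRegionId": "cn-shanghai",
        "Results": [{
            "Service": "oss_baselineCheck",
            "RiskLevel": "high",
            "Result": [{"Confidence": 95.89, "Label": "sexual_partialNudity"}],
        }],
    },
    "RequestId": "AAAAA-BBBBB-CCCC-DDDDD",
})

if __name__ == "__main__":
    print(image_risks(SAMPLE))
    # [('oss_baselineCheck', 'high', 'sexual_partialNudity', 95.89)]
```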

Audio and video detection

The following code provides a sample callback for an audio and video detection task. For more information about the fields, see Response parameters.

{
    "Code": 200,
    "Data": {
        "TaskId": "ABCDEF_vi_0502zsx1314520yhxforever-12345",
        "OssObjectName": "test/test_video.mp4",
        "OssRegionId": "cn-shanghai",
        "OssBucketName": "tmpsample",
        "RiskLevel": "high",
        "FrameResult": {
            "FrameNum": 2,
            "RiskLevel": "medium",
            "FrameSummarys": [
                {
                    "Label": "violent_explosion",
                    "LabelSum": 8
                },
                {
                    "Label": "sexual_cleavage",
                    "LabelSum": 5
                }
            ],
            "Frames": [
                {
                    "Offset": 1,
                    "RiskLevel": "none",
                    "Results": [
                        {
                            "Result": [
                                {
                                    "Label": "nonLabel"
                                }
                            ],
                            "Service": "baselineCheck_global"
                        }
                    ],
                    "TempUrl": "http://abc.oss-ap-southeast-1.aliyuncs.com/test1.jpg"
                },
                {
                    "Offset": 2,
                    "RiskLevel": "medium",
                    "Results": [
                        {
                            "Result": [
                                {
                                    "Confidence": 1,
                                    "Label": "sexual_cleavage"
                                },
                                {
                                    "Confidence": 74.1,
                                    "Label": "violent_explosion"
                                }
                            ],
                            "Service": "baselineCheck_global"
                        }
                    ],
                    "TempUrl": "http://abc.oss-ap-southeast-1.aliyuncs.com/test2.jpg"
                }
            ]
        },
        "AudioResult": {
            "AudioSummarys": [
                {
                    "Label": "sexual_sounds",
                    "LabelSum": 3
                }
            ],
            "RiskLevel": "high",
            "SliceDetails": [
                {
                    "EndTime": 60,
                    "EndTimestamp": 1698912813192,
                    "Labels": "",
                    "RiskLevel": "none",
                    "StartTime": 30,
                    "StartTimestamp": 1698912783192,
                    "Text": "Content Security",
                    "Url": "http://abc.oss-cn-shanghai.aliyuncs.com/test.wav"
                },
                {
                    "EndTime": 30,
                    "EndTimestamp": 1698912813192,
                    "Extend": "{\"customizedWords\":\"service\",\"customizedLibs\":\"test\"}",
                    "Labels": "C_customized",
                    "RiskLevel": "high",
                    "StartTime": 0,
                    "StartTimestamp": 1698912783192,
                    "Text": "Welcome to Alibaba Cloud Content Security",
                    "Url": "http://abc.oss-cn-shanghai.aliyuncs.com/test.wav"
                }
            ]
        }
    },
    "RequestId": "9d93d864-ebb9-469f-b7f9-b66ee3a9c41c"
}
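For audio and video callbacks, the per-frame and per-slice details are usually more useful than the summaries when you review a file. The sketch below collects only the risky frames and audio slices from a callback shaped like the sample above. It assumes that any RiskLevel other than "none" counts as risky and that Labels in SliceDetails is a comma-separated string (as in the sample); `video_risks` is a hypothetical name.

```python
import json


def video_risks(content: str) -> dict:
    """Collect risky video frames and audio slices from an audio/video callback.

    Assumption: any RiskLevel other than "none" counts as risky.
    """
    data = json.loads(content)["Data"]

    frames = []
    for frame in data.get("FrameResult", {}).get("Frames", []):
        if frame.get("RiskLevel", "none") == "none":
            continue
        labels = []
        for service_result in frame.get("Results", []):
            for item in service_result.get("Result", []):
                label = item.get("Label")
                if label and label != "nonLabel":
                    labels.append(label)
        frames.append({"offset": frame.get("Offset"),
                       "risk": frame["RiskLevel"],
                       "labels": labels,
                       "snapshot": frame.get("TempUrl")})

    slices = []
    for piece in data.get("AudioResult", {}).get("SliceDetails", []):
        if piece.get("RiskLevel", "none") == "none":
            continue
        # Labels is a comma-separated string in the sample and may be empty.
        labels = [label for label in piece.get("Labels", "").split(",") if label]
        slices.append({"start": piece.get("StartTime"),
                       "end": piece.get("EndTime"),
                       "risk": piece["RiskLevel"],
                       "labels": labels,
                       "text": piece.get("Text")})

    return {"overall": data.get("RiskLevel"), "frames": frames, "audio": slices}
```

Applied to the sample above, this keeps only the frame at Offset 2 and the audio slice labeled C_customized.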

Document detection

The following code provides a sample callback for a document detection task. For more information about the fields, see Response parameters.

{
    "Code": 200,
    "Data": {
        "OssObjectName": "test/Test_Document.docx",
        "OssBucketName": "tmpsample",
        "OssRegionId": "cn-shanghai",
        "PageSummary": {
            "PageSum": 2,
            "ImageSummary": {
                "RiskLevel": "high",
                "ImageLabels": [
                    {
                        "LabelSum": 2,
                        "Label": "nonLabel"
                    },
                    {
                        "LabelSum": 1,
                        "Label": "pornographic_adultContent_tii"
                    }
                ]
            },
            "TextSummary": {
                "TextLabels": [
                    {
                        "LabelSum": 2,
                        "Label": "contraband"
                    }
                ],
                "RiskLevel": "high"
            }
        },
        "PageResult": [
            {
                "ImageResult": [
                    {
                        "Description": "Moderation of image content on the document page",
                        "LabelResult": [
                            {
                                "Label": "nonLabel"
                            }
                        ],
                        "RiskLevel": "none",
                        "Service": "baselineCheck"
                    }
                ],
                "ImageUrl": "http://oss.aliyundoc.com/a.png",
                "PageNum": 1,
                "TextResult": [
                    {
                        "Description": "Moderation of text content on the document page",
                        "Labels": "",
                        "RiskLevel": "none",
                        "RiskTips": "",
                        "RiskWords": "",
                        "Service": "pgc_detection",
                        "Text": "Content Security product test case A"
                    }
                ]
            },
            {
                "ImageResult": [
                    {
                        "Description": "Moderation of image content on the document page",
                        "LabelResult": [
                            {
                                "Confidence": 89.01,
                                "Label": "pornographic_adultContent_tii"
                            }
                        ],
                        "RiskLevel": "high",
                        "Service": "baselineCheck"
                    }
                ],
                "ImageUrl": "http://oss.aliyundoc.com/b.png",
                "PageNum": 10,
                "TextResult": [
                    {
                        "Description": "Moderation of text content on the document page",
                        "Labels": "contraband,sexual_content",
                        "RiskLevel": "high",
                        "RiskTips": "Contraband_Prohibited Goods,Pornography_Video Resources,Pornography_Vulgar Content",
                        "RiskWords": "Risky Word A,Risky Word B",
                        "Service": "ad_compliance_detection",
                        "Text": "Content Security product test case B"
                    }
                ]
            }
        ]
    },
    "RequestId": "1d122669-f580-4e17-aafd-87b6803dd830"
}
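Document callbacks report image and text results separately for each page. The following sketch lists the pages that have at least one risky image or text result, using the field names from the sample above. It assumes that any RiskLevel other than "none" counts as risky and that text Labels is a comma-separated string; `document_risks` is a hypothetical name.

```python
import json


def document_risks(content: str) -> list:
    """List document pages whose image or text moderation result is risky.

    Assumption: any RiskLevel other than "none" counts as risky.
    """
    data = json.loads(content)["Data"]
    pages = []
    for page in data.get("PageResult", []):
        image_labels = []
        for image in page.get("ImageResult", []):
            if image.get("RiskLevel", "none") == "none":
                continue
            for item in image.get("LabelResult", []):
                label = item.get("Label")
                if label and label != "nonLabel":
                    image_labels.append(label)
        text_labels = []
        for text in page.get("TextResult", []):
            if text.get("RiskLevel", "none") == "none":
                continue
            # Labels is a comma-separated string in the sample.
            text_labels.extend(l for l in text.get("Labels", "").split(",") if l)
        if image_labels or text_labels:
            pages.append({"page": page.get("PageNum"),
                          "image_labels": image_labels,
                          "text_labels": text_labels})
    return pages
```

Applied to the sample above, this returns a single entry for the page with PageNum 10.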

View job results

  1. On the Inclusive Edition of OSS Content Moderation page, find the desired job in the job list and click View Results in the Actions column.

  2. Filter job detection results by schedule date, detection time range, object name, text information, risk level, label, or automatic processing status.

    You can query detection results for the last 180 days. The console can display and export a maximum of 50,000 entries at a time. All detection results are also sent to Log Service, which you can use for in-depth analysis and real-time monitoring. For more information, see Log storage for OSS content violation detection results.

    For details about the labels and their meanings, see the API documentation for the respective services: Image Moderation - Enhanced Edition 2.0 Synchronous Detection API for images, video frames, and document snapshots; Audio Moderation - Enhanced Edition API for audio; and Text Moderation - Enhanced Edition API for document text.

    Scans may fail if a file is too large, in an unsupported format, or inaccessible. You are not charged for failed scans, and their results do not appear in the list. If you need to obtain these detection results, contact technical support.

  3. For audio/video jobs, click Sound Picture Results to view detailed moderation results for video frames and audio.

  4. For document jobs, click Document Page Results to view detailed moderation results for document snapshots and text.

  5. Find the desired file and click View in the Actions column to see a file preview and detailed results.

    To export detection results, click the export icon in the upper-right corner of the query results list to export an XLSX file.

Disable and cancel jobs

  • To disable a scheduled job, on the Inclusive Edition of OSS Content Moderation page, click Disable Scheduled Task. After the job is disabled, you can still view and export results from previous scans.

  • To cancel a running job, click Cancel Job. You cannot cancel completed or stopped jobs. Because detection jobs run asynchronously, cancellation may take up to a minute to take effect. During this time, any files already being processed or in the queue will be scanned to completion.

  • If you disable a job due to a configuration error and create a replacement, we recommend enabling the skip scanned files option in the new job. This prevents you from being charged again for scanning the same files.