Scan existing data in OSS

The OSS violation detection affordable edition is for customers who periodically scan existing data in Object Storage Service (OSS). It offers more cost-effective turnaround times and integrates the detection capabilities of Content Moderation (Enhanced Edition). This service supports a wider range of risk types and richer risk labels, and natively integrates with cloud products such as OSS buckets and Log Service (SLS). This topic describes how to use the affordable edition to perform a full scan on images, audio, video, and documents stored in OSS.

Activate and authorize

You must activate Content Moderation (Enhanced Edition) before using this service. For more information, see Activation and billing.

Before using the OSS violation detection affordable edition, you must authorize Content Moderation to access your OSS buckets and Log Service. After authorization, the service pushes scan results to Log Service. Log Service provides features for querying, analysis, and data processing that help you understand content risk trends and monitor content in real time.

On the page, click Authorize Access to OSS and Log Service to complete the authorization.

Pushing logs and performing query analysis do not incur additional charges, but you must activate Log Service and grant the necessary permissions. For detailed billing information, see Billing of OSS violation detection affordable edition.

Configure a full scan task

  1. Log on to the Content Moderation console. In the left-side navigation pane, choose OSS violation detection affordable edition > Scan Tasks.

  2. On the OSS violation detection affordable edition page, click Existing Data Scan Task.

    Follow the wizard to complete the configuration.

    1. Select a task type and click Next.

      Parameter

      Description

      Task Name

      The name of the full scan task. This value must be unique.

      Select bucket (multiple choice)

      • Available in all public cloud OSS regions except for China (Hong Kong) and regions outside the Chinese mainland.

        For more information about OSS-supported regions, see OSS endpoints and data centers.

      • Available for OSS buckets in the Chinese mainland that are not tied to a specific region.

      Select a task type

      Supports image, audio, video, and document scan tasks.

      • Image tasks

        Supported image formats: PNG, JPG, JPEG, BMP, WEBP, TIFF, SVG, ICO, and HEIF.

        Image size must not exceed 20 MB. Larger image files are not detected.

        By default, Extensionless File Check is disabled. If you enable this feature, files without an extension are identified as images based on their Content-Type.

      • Audio and video tasks

        Supported video formats: AVI, FLV, MP4, MPG, ASF, WMV, MOV, WMA, RMVB, RM, FLASH, and TS.

        Supported audio formats: MP3, WAV, AAC, WMA, OGG, M4A, AMR, FLAC, 3GP, and APE.

        The audio or video file size must not exceed 1 GB. Larger files are not detected.

        By default, both Video Files and Audio Files are detected.

      • Document tasks

        Supported document formats: DOC, DOCX, PPT, PPTX, PPS, PPSX, PDF, XLS, XLSX, XLTX, XLTM, HTML, and TXT.

        Document size must not exceed 200 MB. Larger document files are not detected.

      Select scan service

      You can click Manage Moderation Services to adjust and select multiple detection types for the current task. For information on how to configure services in Content Moderation (Enhanced Edition), see Console guide.

      Important

      The OSS violation detection affordable edition task and the Content Moderation service API share the same detection configuration. Any changes to the configuration affect both.

      • Image detection services

        • Large model services for detection:

          • Image Moderation for Large and Small Model Integration (Recommended): Combines the capabilities of the large image detection model and expert models to comprehensively identify various types of non-compliant content, including pornography, sexually suggestive content, political content, terrorism, prohibited items, religious content, advertisement redirection, and undesirable content.

          • Image Moderation Service Based on LLMs: A large model trained for image detection scenarios that can identify risks such as pornographic, political, terrorism-related, prohibited, undesirable, abusive, and advertising content.

          • Large Model-Powered Ad Traffic Detection: Based on a large model, this service can effectively identify various evasive advertisement redirections and AI-generated advertisement content.

        • General scenarios:

          • OSS baseline check: Suitable for detecting red-line violations such as pornographic, political, and terrorism-related content in images stored in OSS.

          • BaselineCheck: Detects red-line violations or content that is unsuitable for dissemination in images.

            We recommend that you select this option if your files include publicly accessible images.

          • baselineCheck_pro: Provides more fine-grained labels in addition to the features of BaselineCheck.

            We recommend that you select this option if you have more granular processing needs and some custom requirements for images.

          • TonalityImprove: Detects content in images that may disrupt platform order, affect content tone, or degrade user experience.

            We recommend that you use this service in addition to BaselineCheck based on your governance needs.

        • AIGC scenarios:

          • AIGC Image Risk Check: Designed for AIGC scenarios, this service detects whether AIGC-generated images contain non-compliant or inappropriate content.

            We recommend that you select this option if your files include AIGC-generated images.

          • AIGC image detection: Determines whether an image was generated by AIGC across various scenarios.

          • AIGC Detection_Professional Edition: For various scenarios, determines if an image is likely AI-generated or synthetically altered.

          • AI Image Detection (Video Screenshots): For video screenshot scenarios, determines whether an image was generated by AIGC.

          • AIGC Violation Detection: For AIGC scenarios, this service detects elements such as trademarks, special logos, and people in an image to identify potential infringement risks.

        • Business scenarios:

          • Profile Photo Check: For profile photo scenarios, this service detects non-compliant, inappropriate, or platform-disrupting content.

          • Post Comment Image Moderation: For images in posts and comments, this service detects non-compliant, inappropriate, or platform-disrupting content.

          • Advertising Check: For marketing materials, this service detects content that violates advertising laws, is non-compliant, inappropriate, or disrupts platform order.

          • Live Stream Check: For video and live stream screenshots, this service detects non-compliant, inappropriate, or platform-disrupting content.

        • Special scenarios:

          • Moderate Images for Malicious Content: Detects malicious use of images to hide video clips or video players, preventing attackers from exploiting your OSS and CDN traffic.

      • Audio and video detection services

        • Video File Moderation_LLM-based Version (Recommended): Uses the large model service for image detection to detect non-compliant visual or audio information in video files. We recommend using this for all publicly accessible video files.

        • Video Detection: Detects non-compliant or inappropriate content in video files. We recommend using this for all publicly accessible video files.

      • Document detection services

        • General Document Moderation (Large Model Edition) (Recommended): Uses the large model service for image detection on the visual parts of documents to detect non-compliant image or text information, including baseline violations like pornography, sexually suggestive content, political content, terrorism, and prohibited items.

        • General Document Moderation: Detects non-compliant image or text information in documents, including baseline violations like pornography, sexually suggestive content, political content, terrorism, and prohibited items.

    2. Specify the scope of the scan task, and then click Next.

      Parameter

      Description

      Filter files uploaded or updated within specific time range

      Scans objects that were uploaded to or updated in an OSS bucket within the specified time range.

      Filter

      Set a prefix to include or exclude objects from the scan. For example, if you add img/test_, the scan includes only objects in the OSS bucket that have the prefix img/test_.

      Note

      If the objects you want to scan are in a specific directory, include the directory path in the object name and use the full path as the prefix.

      Skip scanned files

      If you enable Skip scanned files, the scan task skips any OSS objects that have already been scanned and marked.

      Objects scanned by a task are marked using OSS object tagging. You can click How do I check whether content moderation has been applied to an OSS file? to see which objects are marked. For more information, see Log storage for OSS violation detection results.

    3. Select a specification.

      Parameter

      Description

      Specify the upper limit

      • Unlimited quantity: Content Moderation scans all objects without restriction.

      • Set Detection Limit: Set a custom limit based on your business needs. Content Moderation does not restrict the value that you set.

      • Important

        The total number of objects in the OSS bucket is for reference only. The number of image objects cannot be estimated in advance.

        If the total number of objects in all in-progress image tasks under your account exceeds 5,000,000, or the total number of objects in all in-progress audio, video, or document tasks exceeds 500,000, you cannot select the 24-hour scan specification.

        In that case, set a detection limit for the task or select the queued scan specification.

      Select config

      Two turnaround options are available, each with a different unit price. Both are more affordable than API-based scanning. For pricing details, see Billing of OSS violation detection affordable edition.

      • Scan in 24 hours: The task is scheduled with higher priority and completes within 24 hours after creation.

      • Queue Up for Moderation: Tasks are scheduled based on their creation time. The turnaround time depends on the number of objects. The task typically completes within 3 days after creation.

    4. Configure callback and handling settings.

      Parameter

      Description

      Callback notification

      You can select an existing callback notification plan or create a new one. The service returns scan results according to the notification settings.

      Note

      You can manage callback notifications on the Notification page. For more information, see Configure message notifications.

      Result handling

      By default, Automatic Result Freezing is disabled. You can enable it to process results based on the freezing scope and freezing method you select.

      Disposal Scope:

      • Image tasks

        You can choose to freeze high-risk content and medium-risk content.

        By default, Freeze High-risk Content is selected. You can choose whether to also freeze medium-risk content based on your business needs. You can manage the risk level thresholds in the image detection rule settings.

      • Audio and video tasks

        For video frames and audio, you can choose to freeze high-risk content and medium-risk content respectively.

        By default, Freeze High-risk Content is selected for both video frames and audio. You can choose whether to also freeze medium-risk content based on your business needs. The risk level is calculated based on all captured video frames and audio clips from the video file.

      • Document tasks

        For document images and text, you can choose to freeze high-risk content and medium-risk content respectively.

        By default, Freeze High-risk Content is selected for both document images and text. You can choose whether to also freeze medium-risk content based on your business needs. The risk level is calculated based on all captured screenshots and all text from the document file.

      Disposal Method:

      • Modify permissions: Sets the access permission of the OSS file that meets the freezing criteria to private.

      • Move files: Moves the flagged OSS file to a backup directory in the bucket (location: ${bucket}/alicip_riskfile_backup/) or a custom dump directory, and then deletes the original file.

      Important

      Enabling Automatic Result Freezing requires OSS authorization. Once enabled, this feature directly processes any OSS files that meet the specified criteria. Ensure the detection scope and conditions are correctly configured. If an OSS file is frozen by mistake, you can restore it from the results page or by following the instructions in Use the OSS API to restore a frozen file.

      If you need to scan newly added objects in your OSS bucket in near real-time, click Incremental Scan Task to configure an incremental scan task. For more information, see Scan incremental images, audio, video, and documents in OSS.

  3. Click Submit.

    Note

    After the task is created, its status appears as Scanning in the task list. When the scan is finished, the status changes to Completed.

    • The task list shows the total number of objects included in the task (filtered objects) and the actual number of scanned objects. Because the scan is asynchronous, there may be a delay of about one minute before the task information updates.

    • In the task list, you can filter tasks by time, view results, and check configurations. You can query tasks and results from the last 180 days.
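Before you create a task, it can help to estimate which objects the task would actually pick up. The following Python sketch applies the format lists, size limits, and prefix filter described in the wizard steps above to an object key and size. It is only an illustration: the names TASK_RULES, task_type_for, and is_scannable are hypothetical, the sketch assumes that 1 MB means 1024 × 1024 bytes, and it does not call any Content Moderation or OSS API.

```python
from typing import Optional

MB = 1024 * 1024  # assumption: binary megabytes; the source does not say which unit is used
GB = 1024 * MB

# Extension lists and size limits copied from the task type descriptions above.
TASK_RULES = {
    "image": (
        {"png", "jpg", "jpeg", "bmp", "webp", "tiff", "svg", "ico", "heif"},
        20 * MB,
    ),
    "audio_video": (
        {"avi", "flv", "mp4", "mpg", "asf", "wmv", "mov", "wma", "rmvb", "rm",
         "flash", "ts", "mp3", "wav", "aac", "ogg", "m4a", "amr", "flac",
         "3gp", "ape"},
        1 * GB,
    ),
    "document": (
        {"doc", "docx", "ppt", "pptx", "pps", "ppsx", "pdf", "xls", "xlsx",
         "xltx", "xltm", "html", "txt"},
        200 * MB,
    ),
}


def task_type_for(key: str) -> Optional[str]:
    """Return the task type whose format list contains the key's extension."""
    name = key.rsplit("/", 1)[-1]
    if "." not in name:
        # Extensionless objects depend on the Extensionless File Check setting.
        return None
    ext = name.rsplit(".", 1)[1].lower()
    for task, (extensions, _limit) in TASK_RULES.items():
        if ext in extensions:
            return task
    return None


def is_scannable(key: str, size_bytes: int, prefix: str = "") -> bool:
    """Check the prefix filter, the supported formats, and the size limit."""
    if not key.startswith(prefix):
        return False
    task = task_type_for(key)
    if task is None:
        return False
    return size_bytes <= TASK_RULES[task][1]
```

For example, is_scannable("img/test_a.png", 5 * MB, prefix="img/test_") passes all three checks, while the same object at 21 MB would be skipped because it exceeds the image size limit.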

Configure message notifications

  1. On the OSS compliance detection V2.0 page, click Notification in the navigation bar.

  2. On this page, you can create, edit, and delete message notification plans.

    1. Create New Notification: Click Create New Notification to open the creation dialog box. Enter the callback plan information and click OK.

      • Title: Up to 12 characters. Chinese characters, English letters, underscores (_), and digits are supported.

      • Callback URL: The public HTTP or HTTPS endpoint that receives callback messages. It must support the POST method, the form parameters checksum and content, and the application/x-www-form-urlencoded data format. Ensure the URL returns a valid response.

      • Encryption algorithm: Select an appropriate encryption algorithm.

      • Audit Result: Select Only risky results (returns only results with risky labels) or All results (returns all detection results).

      • Seed value: This value is automatically generated after you configure message notifications in the console. You can view it in the message notification management section.

    2. Edit notification: You can edit a notification plan. Editing a plan that is in use affects all tasks that use it. Proceed with caution.

    3. Delete notification: You can delete unused notification plans. Plans that are in use cannot be deleted.

  3. Message notification content:

After you enable callback notifications, Content Security sends callback notifications for OSS compliance detection based on your callback configuration. The checksum is generated by concatenating <user UID> + <Seed> + <content> into a single string and applying the encryption algorithm that you configured in the console. After you receive a callback, recalculate the checksum with the same algorithm and compare it with the checksum in the callback to confirm that the content has not been tampered with. The following table describes the structure of the content field.

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| Code | String | 200 | The status code. |
| RequestId | String | ABCD1234-1234-1234-1234-123**** | The unique identifier generated by Alibaba Cloud for the request. You can use it to troubleshoot issues. |
| Data | Object | | The content moderation result. For more information, see Data. |

Table 2. Data

| Parameter | Type | Example | Description |
| --- | --- | --- | --- |
| OssBucketName | String | AAAAA-BBBBB-2024-0307 | The name of the bucket where the OSS file is located. |
| OssObjectName | String | videoId**** | The name of the OSS file. |
| OssRegionId | String | cn-shanghai | The region where the bucket is located. |
| Results | JSONObject | | The result of the image detection task. For more information about the fields, see Image moderation API response. |
| FrameResult | JSONObject | | The result for the video frames in a video detection task. For more information about the fields, see Video moderation API response. |
| AudioResult | JSONObject | | The result for the audio in a video detection task. For more information about the fields, see Video moderation API response. |
| PageResult | JSONObject | | The result of the document detection task. For more information about the fields, see Document moderation API response. |
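The checksum verification described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the service's reference implementation: it assumes that the configured algorithm is SHA-256, that the checksum is a hex digest of the UTF-8 encoded string, and that the callback body is the raw application/x-www-form-urlencoded payload. The function names compute_checksum and verify_callback are hypothetical.

```python
import hashlib
import hmac
from typing import Optional
from urllib.parse import parse_qs, urlencode


def compute_checksum(uid: str, seed: str, content: str,
                     algorithm: str = "sha256") -> str:
    """Hash <user UID> + <Seed> + <content> (assumed: hex digest, UTF-8)."""
    joined = f"{uid}{seed}{content}"
    return hashlib.new(algorithm, joined.encode("utf-8")).hexdigest()


def verify_callback(body: str, uid: str, seed: str,
                    algorithm: str = "sha256") -> Optional[str]:
    """Return the content string if the checksum matches, otherwise None."""
    form = parse_qs(body, keep_blank_values=True)
    checksum = form.get("checksum", [""])[0]
    content = form.get("content", [""])[0]
    expected = compute_checksum(uid, seed, content, algorithm)
    # Compare as bytes in constant time to avoid leaking timing information.
    if hmac.compare_digest(expected.encode("utf-8"),
                           checksum.lower().encode("utf-8")):
        return content
    return None


# Usage with a locally built body; the UID and seed here are placeholders.
sample_content = '{"Code": 200, "RequestId": "example"}'
sample_body = urlencode({
    "checksum": compute_checksum("1234567890", "example-seed", sample_content),
    "content": sample_content,
})
verified = verify_callback(sample_body, "1234567890", "example-seed")
```

If the console is configured with a different algorithm, pass its name through the algorithm parameter only if your Python build's hashlib supports it; otherwise the hashing step needs a library that does.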

Sample responses:

Image detection

The following code provides a sample callback for an image detection task. For more information about the fields, see Response parameters.

{
    "Code": 200,
    "Data": {
        "OssObjectName": "test/img.webp",
        "OssBucketName": "tmpsample",
        "OssRegionId": "cn-shanghai",
        "Results": [
            {
                "Service": "oss_baselineCheck",
                "RiskLevel": "high",
                "Result": [
                    {
                        "Confidence": 95.89,
                        "Label": "sexual_partialNudity"
                    }
                ]
            }
        ]
    },
    "RequestId": "AAAAA-BBBBB-CCCC-DDDDD"
}

Audio and video detection

The following code provides a sample callback for an audio and video detection task. For more information about the fields, see Response parameters.

{
    "Code": 200,
    "Data": {
        "TaskId": "ABCDEF_vi_0502zsx1314520yhxforever-12345",
        "OssObjectName": "test/test_video.mp4",
        "OssRegionId": "cn-shanghai",
        "OssBucketName": "tmpsample",
        "RiskLevel": "high",
        "FrameResult": {
            "FrameNum": 2,
            "RiskLevel": "medium",
            "FrameSummarys": [
                {
                    "Label": "violent_explosion",
                    "LabelSum": 8
                },
                {
                    "Label": "sexual_cleavage",
                    "LabelSum": 5
                }
            ],
            "Frames": [
                {
                    "Offset": 1,
                    "RiskLevel": "none",
                    "Results": [
                        {
                            "Result": [
                                {
                                    "Label": "nonLabel"
                                }
                            ],
                            "Service": "baselineCheck_global"
                        }
                    ],
                    "TempUrl": "http://abc.oss-ap-southeast-1.aliyuncs.com/test1.jpg"
                },
                {
                    "Offset": 2,
                    "RiskLevel": "medium",
                    "Results": [
                        {
                            "Result": [
                                {
                                    "Confidence": 1,
                                    "Label": "sexual_cleavage"
                                },
                                {
                                    "Confidence": 74.1,
                                    "Label": "violent_explosion"
                                }
                            ],
                            "Service": "baselineCheck_global"
                        }
                    ],
                    "TempUrl": "http://abc.oss-ap-southeast-1.aliyuncs.com/test2.jpg"
                }
            ]
        },
        "AudioResult": {
            "AudioSummarys": [
                {
                    "Label": "sexual_sounds",
                    "LabelSum": 3
                }
            ],
            "RiskLevel": "high",
            "SliceDetails": [
                {
                    "EndTime": 60,
                    "EndTimestamp": 1698912813192,
                    "Labels": "",
                    "RiskLevel": "none",
                    "StartTime": 30,
                    "StartTimestamp": 1698912783192,
                    "Text": "Content Security",
                    "Url": "http://abc.oss-cn-shanghai.aliyuncs.com/test.wav"
                },
                {
                    "EndTime": 30,
                    "EndTimestamp": 1698912813192,
                    "Extend": "{\"customizedWords\":\"service\",\"customizedLibs\":\"test\"}",
                    "Labels": "C_customized",
                    "RiskLevel": "high",
                    "StartTime": 0,
                    "StartTimestamp": 1698912783192,
                    "Text": "Welcome to Alibaba Cloud Content Security",
                    "Url": "http://abc.oss-cn-shanghai.aliyuncs.com/test.wav"
                }
            ]
        }
    },
    "RequestId": "9d93d864-ebb9-469f-b7f9-b66ee3a9c41c"
}

Document detection

The following code provides a sample callback for a document detection task. For more information about the fields, see Response parameters.

{
    "Code": 200,
    "Data": {
        "OssObjectName": "test/Test_Document.docx",
        "OssBucketName": "tmpsample",
        "OssRegionId": "cn-shanghai",
        "PageSummary": {
            "PageSum": 2,
            "ImageSummary": {
                "RiskLevel": "high",
                "ImageLabels": [
                    {
                        "LabelSum": 2,
                        "Label": "nonLabel"
                    },
                    {
                        "LabelSum": 1,
                        "Label": "pornographic_adultContent_tii"
                    }
                ]
            },
            "TextSummary": {
                "TextLabels": [
                    {
                        "LabelSum": 2,
                        "Label": "contraband"
                    }
                ],
                "RiskLevel": "high"
            }
        },
        "PageResult": [
            {
                "ImageResult": [
                    {
                        "Description": "Moderation of image content on the document page",
                        "LabelResult": [
                            {
                                "Label": "nonLabel"
                            }
                        ],
                        "RiskLevel": "none",
                        "Service": "baselineCheck"
                    }
                ],
                "ImageUrl": "http://oss.aliyundoc.com/a.png",
                "PageNum": 1,
                "TextResult": [
                    {
                        "Description": "Moderation of text content on the document page",
                        "Labels": "",
                        "RiskLevel": "none",
                        "RiskTips": "",
                        "RiskWords": "",
                        "Service": "pgc_detection",
                        "Text": "Content Security product test case A"
                    }
                ]
            },
            {
                "ImageResult": [
                    {
                        "Description": "Moderation of image content on the document page",
                        "LabelResult": [
                            {
                                "Confidence": 89.01,
                                "Label": "pornographic_adultContent_tii"
                            }
                        ],
                        "RiskLevel": "high",
                        "Service": "baselineCheck"
                    }
                ],
                "ImageUrl": "http://oss.aliyundoc.com/b.png",
                "PageNum": 10,
                "TextResult": [
                    {
                        "Description": "Moderation of text content on the document page",
                        "Labels": "contraband,sexual_content",
                        "RiskLevel": "high",
                        "RiskTips": "Contraband_Prohibited Goods,Pornography_Video Resources,Pornography_Vulgar Content",
                        "RiskWords": "Risky Word A,Risky Word B",
                        "Service": "ad_compliance_detection",
                        "Text": "Content Security product test case B"
                    }
                ]
            }
        ]
    },
    "RequestId": "1d122669-f580-4e17-aafd-87b6803dd830"
}
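A callback receiver usually needs to tell the three result shapes apart before it acts on them. The following Python sketch classifies a content string by the keys present in Data and collects the risk labels from the structures shown in the samples above. It relies only on the fields that appear in those samples; the function name summarize_callback and the returned dictionary layout are hypothetical.

```python
import json
from typing import Any, Dict, List


def summarize_callback(content: str) -> Dict[str, Any]:
    """Classify a callback by task type and collect its risk labels."""
    data = json.loads(content).get("Data", {})
    labels: List[str] = []

    if "Results" in data:
        kind = "image"
        for service_result in data["Results"]:
            labels += [item["Label"] for item in service_result.get("Result", [])]
    elif "FrameResult" in data or "AudioResult" in data:
        kind = "audio_video"
        frame_summary = data.get("FrameResult", {}).get("FrameSummarys", [])
        audio_summary = data.get("AudioResult", {}).get("AudioSummarys", [])
        labels += [entry["Label"] for entry in frame_summary + audio_summary]
    elif "PageResult" in data:
        kind = "document"
        summary = data.get("PageSummary", {})
        image_labels = summary.get("ImageSummary", {}).get("ImageLabels", [])
        text_labels = summary.get("TextSummary", {}).get("TextLabels", [])
        labels += [entry["Label"] for entry in image_labels + text_labels]
    else:
        kind = "unknown"

    return {
        "kind": kind,
        "bucket": data.get("OssBucketName"),
        "object": data.get("OssObjectName"),
        # nonLabel marks content without a detected risk, so it is dropped here.
        "labels": sorted({label for label in labels if label != "nonLabel"}),
    }
```

Applied to the image sample above, the function would report the kind as image and the single label sexual_partialNudity for test/img.webp in the tmpsample bucket.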

View scan results

  1. In the task list on the OSS violation detection affordable edition page, find the target task and click View Results in the Actions column.

  2. On the View Results page, you can query scan results by time range, object name, text content, risk level, search label, and automatic handling status.

    You can query results from the last 180 days and display or export up to 50,000 records. All scan results are pushed to Log Service, which provides features such as query, analysis, and data processing to help you understand content risk trends and perform real-time monitoring. For more information, see Log storage for OSS violation detection results.

    The OSS violation detection affordable edition assigns labels from Content Moderation (Enhanced Edition) to the results. For label definitions, see the API reference for synchronous image scans (v2.0) for images, video frames, or document snapshots, the audio moderation API for audio, and the text moderation API for document text.

    Scans may fail for reasons such as the object being too large, an unsupported format, or an object access failure. These failed scans do not incur charges and are not displayed in the results list. If you need to access the results for failed scans, join the DingTalk group (ID: 35573806) for assistance.

  3. For an audio/video scan task, click Sound Picture Results in the Actions column to view detailed moderation results for video frames and audio.

    The page contains the Frame Moderation Results and Audio Moderation Results tabs, where you can filter results by label. Frame moderation results are displayed as video frame thumbnails, each marked with a timestamp and corresponding moderation labels.

  4. For a document scan task, click Document Page Results in the Actions column to view detailed moderation results for document snapshots and text.

    The page contains the Image Moderation Results and Text Moderation Results tabs, where you can filter results by label. Image moderation results are displayed as document page thumbnails, each marked with corresponding moderation labels.

  5. In the Actions column for a specific object, click View to see an object preview and detailed results.

    To export scan results, click the export icon in the upper-right corner of the results list to export an XLSX file.

Cancel a scan task

  • To cancel a full scan task, go to the OSS violation detection affordable edition page and click Cancel Task for a running task. Completed or stopped tasks cannot be canceled. For a canceled task, the results of objects already scanned can still be viewed and exported.

  • Because scan tasks are asynchronous, there may be a delay of about one minute for the cancellation to take effect. Objects that are being scanned or are already queued during this delay will continue to be processed and completed.

  • A canceled task has the status Stopped and cannot be resumed. If you canceled a task due to a configuration error, you must create a new scan task. To avoid duplicate charges for the same objects, we recommend that you enable the skip scanned files option in the new task.