Asynchronous file moderation

The file moderation service extracts images and text from various file types for content moderation. In addition to parsing file content, the service can convert files into images during the moderation process. It then uses image moderation and Optical Character Recognition (OCR) to ensure comprehensive content compliance. This topic describes how to call the file moderation API to moderate file content.

Feature overview

Feature

File detection

Supported file formats:

DOC, DOCX, PPT, PPTX, PDF, XLS, XLSX, and TXT.

Text detection in images

Supported

File size limit

200 MB

Number of concurrent file detection tasks

10

Offline moderation mode (for batch file moderation)

Supported

Other

  • Enable or disable image moderation.

  • Specify the maximum number of pages to moderate in a file.

Usage notes

The API operation for asynchronous file moderation is /green/file/asyncscanv2.

You can call this operation to create a file content moderation task. For information about how to construct an HTTP request, see Request structure. Alternatively, use an SDK, which constructs the HTTP request for you. For more information, see SDKs.

Billing information

Billing is based on the actual number of images and the amount of text moderated in the file.

  • Images

    You are charged based on the number of moderated pages multiplied by the number of image moderation scenarios. Each image on each page is moderated for pornography, terrorist content, and text within images. Additionally, if you enable image moderation for files, you are charged for file conversion by Intelligent Media Management (IMM). For more information about file conversion fees, see Product Billing.

  • Text

    You are charged based on the number of text entries multiplied by the number of text anti-spam scenarios. Every 5,000 characters of text are considered one entry for moderation and billing. For more information about fees for the text anti-spam moderation scenario, see Billing of Text Moderation.

Billing example: If you moderate a 100-page PDF file that contains 1 million characters of text, the total file moderation fee is the sum of the image and text moderation fees.

  • Images: File conversion fee (CNY 0.08 per file) + Image moderation fee (100 pages × CNY 6.85/1,000 pages) = CNY 0.08 + CNY 0.685 = CNY 0.765

  • Text: 200 text entries × text anti-spam moderation fee (CNY 1.8/1,000 entries) = CNY 0.36

Total: CNY 0.08 + CNY 0.685 + CNY 0.36 = CNY 1.125
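
The arithmetic in the billing example can be reproduced with a short Python sketch. The function name estimate_fee and its parameters are hypothetical, the prices are the ones quoted in the example rather than a current price list, and rounding a partial entry up to a full entry is an assumption that the rules above do not state.

```python
import math

CHARS_PER_ENTRY = 5000  # every 5,000 characters count as one text entry


def estimate_fee(pages, chars, conversion_fee=0.08,
                 image_price_per_1000=6.85, text_price_per_1000=1.8):
    """Estimate the fee in CNY using the prices from the billing example."""
    # Assumption: a trailing partial entry is billed as one full entry.
    entries = math.ceil(chars / CHARS_PER_ENTRY)
    image_fee = conversion_fee + pages * image_price_per_1000 / 1000
    text_fee = entries * text_price_per_1000 / 1000
    total = image_fee + text_fee
    return entries, round(image_fee, 3), round(text_fee, 3), round(total, 3)


entries, image_fee, text_fee, total = estimate_fee(pages=100, chars=1_000_000)
print(entries, image_fee, text_fee, total)
```

For the 100-page, 1-million-character PDF, this yields 200 text entries and the same CNY 1.125 total as the example.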

Concurrency limits

You can call this operation up to five times per second per account. Calls that exceed this limit are throttled, which may affect your business, so plan your call rate accordingly.

You can moderate a maximum of 10 files at the same time. If you exceed this limit, the error code 588 is returned. If you require a higher concurrency limit, you can join the DingTalk group (ID: 35573806) to contact a product technical expert.
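
One way to stay under the per-second call limit is to throttle on the client side before each request. The following Python sketch shows a simple sliding-window limiter. The class name CallThrottle is hypothetical and is not part of any SDK. The clock and sleep functions are injectable so that the logic can be tested without waiting.

```python
import time
from collections import deque


class CallThrottle:
    """Allow at most max_calls calls in any period-second window."""

    def __init__(self, max_calls=5, period=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.sent = deque()  # timestamps of the most recent calls

    def wait(self):
        """Block until one more call fits in the window, then record it."""
        now = self.clock()
        # Forget calls that have already left the window.
        while self.sent and now - self.sent[0] >= self.period:
            self.sent.popleft()
        if len(self.sent) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self.sent[0]))
            now = self.clock()
            self.sent.popleft()
        self.sent.append(now)


throttle = CallThrottle()  # defaults match the five-calls-per-second limit
for _ in range(3):
    throttle.wait()  # call this right before sending each request
```

This sketch limits only the call rate. It does not track the number of files being moderated at the same time, so error code 588 still needs to be handled separately.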

Request parameters

Name

Type

Required

Example

Description

bizType

String

No

default

This field specifies your business scenario. You can create business scenarios in the AI Guardrails console. For more information, see Customize moderation rules.

offline

Boolean

No

false

Specifies whether to use the offline moderation mode.

  • false (default): real-time moderation mode. Moderation requests that exceed the concurrency limit are rejected.

  • true: offline moderation mode. Submitted tasks are not guaranteed to be processed in real time. They are queued for processing and moderation begins within 24 hours.

modType

String

No

All

Specifies the moderation type:

  • Text: Moderates only the text in the file.

  • Image: Moderates only the images in the file.

  • All (default): Moderates both the images and text in the file.

Note

We recommend that you moderate both images and text for DOC, DOCX, PPT, PPTX, and PDF files. We recommend that you moderate only text for XLS, XLSX, and TXT files.

maxPages

Integer

No

200

The maximum number of pages to moderate in a file. The default value is 200. The maximum value is 1,000.

The text content in a file is split into entries of 5,000 characters each for moderation. A maximum of 1,000 entries can be moderated.

callback

String

No

http://www.aliyundoc.com/xx.json

If you send asynchronous moderation requests, the moderation results are not returned in real time. To obtain moderation results, you can poll the moderation results periodically or enable callback notification. The moderation results are retained for up to 1 hour.

If you set the callback parameter in the moderation request, make sure that the specified HTTP or HTTPS URL meets the following requirements: supports the POST method, uses UTF-8 to encode the transmitted data, and supports the checksum and content parameters. To send moderation results to the specified callback URL, Content Moderation returns the checksum and content parameters in callback notifications based on the following rules and format:

  • checksum: the SHA-256 hash of the string formed by concatenating UID + Seed + Content. UID indicates the ID of your Alibaba Cloud account. You can query the ID in the Alibaba Cloud Management Console. To detect data tampering, compute the same SHA-256 hash on your server when it receives a callback notification and compare the result with the received checksum parameter.

    Note

    UID must be the ID of an Alibaba Cloud account, not the ID of a RAM user.

  • content: a JSON-formatted string. Parse it into a JSON object to obtain the callback data. For more information about the format of the content parameter, see the sample success responses of each operation that you can call to query asynchronous moderation results.

Note

If your server successfully receives a callback notification, it must return the HTTP 200 status code to Content Moderation. Any other HTTP status code indicates that your server failed to receive the notification. In that case, Content Moderation pushes the callback notification again until your server receives it, up to 16 times. After 16 times, Content Moderation stops pushing the callback notification. In this case, we recommend that you check the status of the callback URL.

seed

String

No

test

A random string that is used to generate a signature for the callback notification request.

The string can be up to 64 characters in length and can contain letters, digits, and underscores (_). You can customize this string. It is used to verify the callback notification request when Content Moderation pushes callback notifications to your server.

Note

This parameter is required if you set the callback parameter.

cryptType

String

No

SHA256

When you use a callback notification (callback), this parameter specifies the algorithm used to generate the checksum of the callback notification content. AI Guardrails applies the specified algorithm to the returned result (a string concatenated from the account UID + seed + content) before sending it to your callback URL. Valid values:

  • SHA256 (default): Uses the SHA-256 hash algorithm.

  • SM3: Uses the Chinese cryptographic HMAC-SM3 algorithm. A hexadecimal string that consists of lowercase letters and digits is returned.

    For example, encrypting abc using the SM3 algorithm returns 66c7f0f462eeedd9d1f2d46bdc10e4e24167c4875cf2f7a2297da02b8f4ba8e0.

tasks

JSONArray

Yes

Specifies the moderation objects. Each element in the JSON array is a moderation task object. You can specify up to 100 elements, which means you can submit up to 100 content entries for moderation at a time. To specify 100 elements, you must increase the concurrent task limit to 100 or more. For a description of the structure of each element, see Table 1. Task.
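
The checksum verification described for the callback, seed, and cryptType parameters can be sketched in Python as follows, for the default SHA256 setting only. The function names are hypothetical. The sketch assumes that the checksum arrives as a hexadecimal string and that the concatenated string is encoded as UTF-8. Verify both assumptions against a real callback notification before relying on it.

```python
import hashlib
import hmac


def expected_checksum(uid: str, seed: str, content: str) -> str:
    """Return the SHA-256 hex digest of uid + seed + content."""
    # Assumption: the concatenated string is hashed as UTF-8 bytes.
    return hashlib.sha256((uid + seed + content).encode("utf-8")).hexdigest()


def verify_callback(uid: str, seed: str, content: str, checksum: str) -> bool:
    """Check a received checksum against the locally computed one."""
    local = expected_checksum(uid, seed, content)
    # compare_digest compares in constant time to avoid timing leaks.
    return hmac.compare_digest(local.encode("utf-8"),
                               checksum.lower().encode("utf-8"))
```

Here uid is the ID of the Alibaba Cloud account, seed is the value sent in the moderation request, and content is the raw content string from the callback, before it is parsed as JSON.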

Table 1. Task

Name

Type

Required

Example

Description

clientInfo

JSONObject

No

{"userId":"28645****","userNick":"Mike","userType":"others"}

The information about the client. For more information, see the "Common request parameters" section of Common parameters.

The server determines whether to use the global clientInfo parameter or the clientInfo parameter that is described in this table.

Note

The clientInfo parameter in this table takes priority over the global one.

dataId

String

No

test2NInmO$tAON6qYUrtCRgLo-1mwxdi

The data ID of the moderation object. Ensure that all IDs are unique within a single request.

url

String

Yes

https://www.aliyundoc.com/tfs/TB1urBOQFXXXXbMXFXXXXXXXXXX-1442-257.pdf

The URL of the object to be moderated.

  • A public HTTP or HTTPS URL. The URL cannot exceed 2,048 characters in length.

  • The path of a file in Alibaba Cloud OSS. You must first authorize Content Moderation to access the OSS bucket. Only OSS buckets in the same region are supported. For more information, see Authorize Content Moderation to access OSS buckets.

    File path format: oss://<bucket-name>.<endpoint>/<object-name>
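
The constraints in the request parameter tables can be checked on the client side before a request is sent. The following Python sketch builds a request body and validates the documented limits. The function name build_scan_body and the file-N dataId scheme are hypothetical, and the lower bound of 1 for maxPages is an assumption. Signing and sending the request are out of scope.

```python
import json
import re

VALID_MOD_TYPES = {"Text", "Image", "All"}
SEED_PATTERN = re.compile(r"[A-Za-z0-9_]{1,64}")


def build_scan_body(urls, mod_type="All", max_pages=200, offline=False,
                    biz_type=None, callback=None, seed=None):
    """Build the JSON body for /green/file/asyncscanv2 from file URLs."""
    if not 1 <= len(urls) <= 100:
        raise ValueError("tasks must contain 1 to 100 elements")
    if mod_type not in VALID_MOD_TYPES:
        raise ValueError("modType must be Text, Image, or All")
    if not 1 <= max_pages <= 1000:  # lower bound of 1 is an assumption
        raise ValueError("maxPages must be between 1 and 1000")
    if callback is not None and seed is None:
        raise ValueError("seed is required when callback is set")
    if seed is not None and not SEED_PATTERN.fullmatch(seed):
        raise ValueError("seed allows up to 64 letters, digits, underscores")
    for url in urls:
        if url.startswith(("http://", "https://")) and len(url) > 2048:
            raise ValueError("public URLs cannot exceed 2048 characters")
    body = {
        "offline": offline,
        "modType": mod_type,
        "maxPages": max_pages,
        # Hypothetical dataId scheme: unique within a single request.
        "tasks": [{"dataId": f"file-{i}", "url": u}
                  for i, u in enumerate(urls)],
    }
    for key, value in (("bizType", biz_type), ("callback", callback),
                       ("seed", seed)):
        if value is not None:
            body[key] = value
    return body


body = build_scan_body(
    ["https://www.aliyundoc.com/tfs/TB1urBOQFXXXXbMXFXXXXXXXXXX-1442-257.pdf"],
    callback="http://www.aliyundoc.com/xx.json", seed="test")
print(json.dumps(body, indent=4))
```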

Returned data

Name

Type

Example

Description

code

Integer

200

The error code. This is consistent with the HTTP status code. For more information, see Common error codes.

taskId

String

file_t_7Efx6ndTriK5Xx$rD2RFkr-1oB8zu

The ID of the moderation task.

dataId

String

testCvlKbUe4U@6uT6XJxh3G5-1oB8zu

The dataId from the request.

msg

String

OK

The response message for the request.

Examples

Request example

http(s)://[Endpoint]/green/file/asyncscanv2
&<Common request parameters>
{
    "bizType":"aligreen-test",
    "offline":true,
    "modType":"All",
    "maxPages":200,
    "tasks":[
        {
            "dataId":"test2NInmO$xxxxxxxxxxxxxx-1mwxdi",
            "url":"https://www.aliyundoc.com/tfs/TB1urBOQFXXXXbMXFXXXXXXXXXX-1442-257.pdf"
        },
        {
            "dataId":"test2NInmO$xxxxxxxxxxxxxx-aksdjak",
            "url":"https://www.aliyundoc.com/tfs/TB1urBOQFXXXXbMXFXXXXXXXXXX-1442-257.pdf"
        }
    ]
}

Response example

{
    "code":200,
    "data":[
        {
            "code":280,
            "dataId":"testCvlxxxxxxxxxxxxxxx-1oB8zu",
            "msg":"PROCESSING - queue",
            "taskId":"file_t_xxxxxxxxxxxxxxx-1oB8zu"
        }
    ],
    "msg":"OK",
    "requestId":"B15C5A4F-xxxxxxxx-xxxxxxx-446E72C9"
}
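
The response can be processed with a short Python sketch that collects the taskId of each queued task, for later polling or for matching against callback notifications. The function name split_submission is hypothetical. Treating the per-task code 280 as "accepted and queued" is an assumption based only on the response example above. Any other per-task code is set aside for inspection.

```python
import json

QUEUED = 280  # per-task code shown in the response example


def split_submission(response_text):
    """Return ({dataId: taskId} for queued tasks, [other task items])."""
    resp = json.loads(response_text)
    if resp.get("code") != 200:
        raise RuntimeError(f"request failed: {resp.get('code')} {resp.get('msg')}")
    queued, other = {}, []
    for item in resp.get("data", []):
        if item.get("code") == QUEUED:
            queued[item.get("dataId")] = item["taskId"]
        else:
            other.append(item)  # for example, a task rejected with code 588
    return queued, other


sample = """
{
    "code":200,
    "data":[
        {
            "code":280,
            "dataId":"testCvlxxxxxxxxxxxxxxx-1oB8zu",
            "msg":"PROCESSING - queue",
            "taskId":"file_t_xxxxxxxxxxxxxxx-1oB8zu"
        }
    ],
    "msg":"OK",
    "requestId":"B15C5A4F-xxxxxxxx-xxxxxxx-446E72C9"
}
"""
queued, other = split_submission(sample)
print(queued)
```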