The file moderation service extracts images and text from various file types for content moderation. In addition to parsing file content, the service can convert files into images during the moderation process. It then uses image moderation and Optical Character Recognition (OCR) to ensure comprehensive content compliance. This topic describes how to call the file moderation API to moderate file content.
Function introduction
| Feature | File detection |
| --- | --- |
| Supported file formats | DOC, DOCX, PPT, PPTX, PDF, XLS, XLSX, and TXT |
| Text detection in images | Supported |
| File size limit | 200 MB |
| Number of concurrent file detection tasks | 10 |
| Offline moderation mode (for batch file moderation) | Supported |
| Other | |

Usage notes
The API operation for asynchronous file moderation is /green/file/asyncscanv2.
Call this operation to create a file content moderation task. For information about how to construct the HTTP request, see Request structure. Alternatively, use an SDK, which provides pre-built HTTP requests. For more information, see SDKs.
Billing information
Billing is based on the actual number of images and the amount of text moderated in the file.
- Images: You are charged based on the number of moderated pages multiplied by the number of image moderation scenarios. Each image on each page is moderated for pornography, terrorist content, and text within images. Additionally, if you enable image moderation for files, you are charged for file conversion by Intelligent Media Management (IMM). For more information about file conversion fees, see Product Billing.
- Text: You are charged based on the number of text entries multiplied by the number of text anti-spam scenarios. Every 5,000 characters of text are considered one entry for moderation and billing. For more information about fees for the text anti-spam moderation scenario, see Billing of Text Moderation.
Billing example: If you moderate a 100-page PDF file that contains 1 million characters of text, the total file moderation fee is the sum of the image and text moderation fees.
- Images: File conversion fee (CNY 0.08 per file) + image moderation fee (100 pages × CNY 6.85 per 1,000 pages) = CNY 0.08 + CNY 0.685 = CNY 0.765
- Text: 200 text entries (1,000,000 characters ÷ 5,000 characters per entry) × text anti-spam moderation fee (CNY 1.8 per 1,000 entries) = CNY 0.36
Total: CNY 0.08 + CNY 0.685 + CNY 0.36 = CNY 1.125
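The arithmetic in the billing example can be reproduced with a short script. This is a minimal sketch, not an official pricing calculator: the unit prices are the ones quoted in the example and may differ from current pricing, the function name `estimate_file_fee` is hypothetical, and rounding a partial block of characters up to a full entry is an assumption.

```python
from decimal import Decimal

# Unit prices copied from the billing example in this topic (CNY).
# Treat them as illustrative values, not as current pricing.
CONVERSION_FEE_PER_FILE = Decimal("0.08")
IMAGE_FEE_PER_1000_PAGES = Decimal("6.85")
TEXT_FEE_PER_1000_ENTRIES = Decimal("1.8")
CHARS_PER_ENTRY = 5000  # every 5,000 characters count as one text entry


def estimate_file_fee(pages: int, characters: int) -> dict:
    """Estimate the fee for moderating both images and text in one file."""
    # Assumption: a partial block of characters is billed as a full entry.
    entries = -(-characters // CHARS_PER_ENTRY)  # integer ceiling division
    image_fee = CONVERSION_FEE_PER_FILE + pages * IMAGE_FEE_PER_1000_PAGES / 1000
    text_fee = entries * TEXT_FEE_PER_1000_ENTRIES / 1000
    return {
        "entries": entries,
        "image_fee": image_fee,
        "text_fee": text_fee,
        "total": image_fee + text_fee,
    }


# The 100-page PDF with 1 million characters from the billing example.
fee = estimate_file_fee(pages=100, characters=1_000_000)
print(fee["entries"], fee["image_fee"], fee["text_fee"], fee["total"])
```

For the inputs above, the computed total equals the CNY 1.125 stated in the example. `Decimal` is used instead of floating-point numbers so that currency amounts are not affected by binary rounding.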
Concurrency limits
You can call this operation up to five times per second per account. Calls that exceed this limit are throttled, which may affect your business, so plan your call rate accordingly.
You can moderate a maximum of 10 files at the same time. If you exceed this limit, error code 588 is returned. To request a higher concurrency limit, join the DingTalk group (ID: 35573806) and contact a product technical expert.
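One way to stay under the limit of five calls per second is to throttle requests on the client side. The following is a minimal sketch of a sliding-window limiter, not part of any SDK: the class name `SlidingWindowLimiter` and its methods are hypothetical, and the clock and sleep functions are injectable only so that the behavior can be checked without real waiting. It does not address the separate limit of 10 concurrent files.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any window of `period` seconds."""

    def __init__(self, max_calls: int = 5, period: float = 1.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self._calls = deque()  # timestamps of the most recent calls

    def wait_time(self) -> float:
        """Return how many seconds to wait before the next call is allowed."""
        now = self.clock()
        # Discard timestamps that have left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.period - (now - self._calls[0])

    def acquire(self, sleep=time.sleep) -> None:
        """Block until a call is allowed, then record it."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            sleep(delay)
        self._calls.append(self.clock())


limiter = SlidingWindowLimiter(max_calls=5, period=1.0)
# Call limiter.acquire() immediately before each request to the operation.
```

A single limiter instance only helps if all callers in the account share it. Processes that call the operation independently need their own coordination.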
Request parameters
| Name | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| bizType | String | No | default | Specifies your business scenario. You can create business scenarios in the AI Guardrails console. For more information, see Customize moderation rules. |
| offline | Boolean | No | false | Specifies whether to use the offline moderation mode. |
| modType | String | No | All | Specifies the moderation type. Note: We recommend that you moderate both images and text for DOC, DOCX, PPT, PPTX, and PDF files, and only text for XLS, XLSX, and TXT files. |
| maxPages | Integer | No | 200 | The maximum number of pages to moderate in a file. The default value is 200. The maximum value is 1,000. The text content in a file is split into entries of 5,000 characters each for moderation. A maximum of 1,000 entries can be moderated. |
| callback | String | No | http://www.aliyundoc.com/xx.json | The URL that receives callback notifications of moderation results. Asynchronous moderation results are not returned in real time. To obtain them, poll the moderation results periodically or enable callback notifications. The moderation results are retained for up to 1 hour. If you set this parameter, make sure that the specified HTTP or HTTPS URL supports the POST method, uses UTF-8 to encode the transmitted data, and supports the checksum and content parameters, which Content Moderation uses to deliver the moderation results. Note: Your server must return HTTP status code 200 to confirm that it received a callback notification. Any other status code is treated as a failed delivery, and Content Moderation pushes the notification again, up to 16 times in total. After 16 attempts, Content Moderation stops pushing the notification. In this case, we recommend that you check the status of the callback URL. |
| seed | String | No | test | A random string that is used to generate a signature for the callback notification request. The string can be up to 64 characters in length and can contain letters, digits, and underscores (_). You can customize this string. It is used to verify the callback notification request when Content Moderation pushes callback notifications to your server. Note: This parameter is required if you set the callback parameter. |
| cryptType | String | No | SHA256 | When you use callback notifications (callback), this parameter specifies the algorithm used to encrypt the callback notification content. AI Guardrails encrypts the returned result (a string concatenated from …) |
| tasks | JSONArray | Yes | | Specifies the moderation objects. Each element in the JSON array is a moderation task object. You can specify up to 100 elements, which means that you can submit up to 100 content entries for moderation at a time. To specify 100 elements, you must increase the concurrent task limit to 100 or more. For a description of the structure of each element, see task. |

Each element of tasks is a task object with the following structure:

| Name | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| clientInfo | JSONObject | No | {"userId":"28645****","userNick":"Mike","userType":"others"} | The information about the client. For more information, see the "Common request parameters" section of Common parameters. The server determines whether to use the global clientInfo parameter or the clientInfo parameter that is described in this table. Note: The clientInfo parameter in this table takes priority over the global one. |
| dataId | String | No | test2NInmO$tAON6qYUrtCRgLo-1mwxdi | The data ID of the moderation object. Make sure that all IDs are unique within a single request. |
| url | String | Yes | https://www.aliyundoc.com/tfs/TB1urBOQFXXXXbMXFXXXXXXXXXX-1442-257.pdf | The URL of the object to moderate. |

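Several of the limits in the request parameter descriptions can be checked on the client before a request is sent. The following is a minimal sketch, assuming a hypothetical helper named `build_scan_request`. It only assembles and validates the JSON body. Signing, the common request parameters, and the HTTP call itself are out of scope. The lower bounds (at least one task, at least one page) are assumptions, because the descriptions state only the upper bounds.

```python
import json
import re

# Letters, digits, and underscores, up to 64 characters (from the seed description).
SEED_PATTERN = re.compile(r"[A-Za-z0-9_]{1,64}")


def build_scan_request(tasks, biz_type=None, offline=False, mod_type=None,
                       max_pages=200, callback=None, seed=None) -> str:
    """Build the JSON body for /green/file/asyncscanv2 and check documented limits."""
    if not 1 <= len(tasks) <= 100:  # lower bound is an assumption
        raise ValueError("tasks must contain between 1 and 100 elements")
    if not 1 <= max_pages <= 1000:  # lower bound is an assumption
        raise ValueError("maxPages must be between 1 and 1000")
    if any("url" not in task for task in tasks):
        raise ValueError("every task requires a url")
    data_ids = [task["dataId"] for task in tasks if "dataId" in task]
    if len(data_ids) != len(set(data_ids)):
        raise ValueError("dataId values must be unique within a request")
    if callback is not None and (seed is None or not SEED_PATTERN.fullmatch(seed)):
        raise ValueError("callback requires a seed of 1-64 letters, digits, or underscores")

    body = {"offline": offline, "maxPages": max_pages, "tasks": tasks}
    # Optional parameters are included only when the caller sets them.
    for key, value in (("bizType", biz_type), ("modType", mod_type),
                       ("callback", callback), ("seed", seed)):
        if value is not None:
            body[key] = value
    return json.dumps(body)


payload = build_scan_request(
    tasks=[{
        "dataId": "doc-1",  # hypothetical ID chosen for this example
        "url": "https://www.aliyundoc.com/tfs/TB1urBOQFXXXXbMXFXXXXXXXXXX-1442-257.pdf",
    }],
    biz_type="default",
    mod_type="All",
)
print(payload)
```

Validating locally does not replace the checks performed by the service. It only surfaces obvious mistakes, such as duplicate dataId values, before a call is spent on them.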
Returned data
| Name | Type | Example | Description |
| --- | --- | --- | --- |
| code | Integer | 200 | The error code. This is consistent with the HTTP status code. For more information, see Common error codes. |
| taskId | String | file_t_7Efx6ndTriK5Xx$rD2RFkr-1oB8zu | The ID of the moderation task. |
| dataId | String | testCvlKbUe4U@6uT6XJxh3G5-1oB8zu | The dataId from the request. |
| msg | String | OK | The response message for the request. |

Examples
Request example
http(s)://[Endpoint]/green/file/asyncscanv2
&<Common request parameters>
{
    "bizType": "aligreen-test",
    "offline": true,
    "modType": "All",
    "maxPages": 200,
    "tasks": [
        {
            "dataId": "test2NInmO$xxxxxxxxxxxxxx-1mwxdi",
            "url": "https://www.aliyundoc.com/tfs/TB1urBOQFXXXXbMXFXXXXXXXXXX-1442-257.pdf"
        },
        {
            "dataId": "test2NInmO$xxxxxxxxxxxxxx-aksdjak",
            "url": "https://www.aliyundoc.com/tfs/TB1urBOQFXXXXbMXFXXXXXXXXXX-1442-257.pdf"
        }
    ]
}
Response example
{
    "code": 200,
    "data": [
        {
            "code": 280,
            "dataId": "testCvlxxxxxxxxxxxxxxx-1oB8zu",
            "msg": "PROCESSING - queue",
            "taskId": "file_t_xxxxxxxxxxxxxxx-1oB8zu"
        }
    ],
    "msg": "OK",
    "requestId": "B15C5A4F-xxxxxxxx-xxxxxxx-446E72C9"
}
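After a task is submitted, the caller typically needs the taskId of each accepted task in order to retrieve the results later. The following is a minimal sketch of reading a response shaped like the example, using a hypothetical helper named `collect_task_ids`. Treating per-task codes 200 and 280 as "accepted" is an assumption: 280 appears in the example with the message "PROCESSING - queue", but the full list of per-task codes is not given in this topic.

```python
import json

# Per-task codes treated as accepted in this sketch. 280 comes from the response
# example ("PROCESSING - queue"); including 200 is an assumption.
ACCEPTED_CODES = {200, 280}


def collect_task_ids(response_text: str):
    """Return ({dataId: taskId} for accepted tasks, [task results that were not accepted])."""
    response = json.loads(response_text)
    if response.get("code") != 200:
        raise RuntimeError(f"request failed: {response.get('code')} {response.get('msg')}")
    accepted, rejected = {}, []
    for item in response.get("data", []):
        if item.get("code") in ACCEPTED_CODES:
            accepted[item.get("dataId")] = item["taskId"]
        else:
            rejected.append(item)
    return accepted, rejected


# Sample input copied from the response example.
sample = """
{
    "code": 200,
    "data": [
        {
            "code": 280,
            "dataId": "testCvlxxxxxxxxxxxxxxx-1oB8zu",
            "msg": "PROCESSING - queue",
            "taskId": "file_t_xxxxxxxxxxxxxxx-1oB8zu"
        }
    ],
    "msg": "OK",
    "requestId": "B15C5A4F-xxxxxxxx-xxxxxxx-446E72C9"
}
"""
accepted, rejected = collect_task_ids(sample)
print(accepted)
```

Keeping the entries that were not accepted separate makes it straightforward to log or resubmit them, for example when the concurrency limit is exceeded and error code 588 is returned.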