Document Moderation 2.0 API


Detects risks and violations in common documents using asynchronous moderation. This topic describes the operations available in Document Moderation 2.0.

Access guidelines

  1. Register an Alibaba Cloud account: Register now.

  2. Activate the pay-as-you-go billing method for Content Moderation: Make sure that the Content Moderation 2.0 service is activated. For more information, see Activate service. Activation is free. After you call API operations, the billing system charges you based on usage.

  3. Create an AccessKey pair: Make sure that you have created an AccessKey pair as a Resource Access Management (RAM) user. For more information, see Create AccessKey. To use an AccessKey pair belonging to a RAM user, use your Alibaba Cloud account to grant the AliyunYundunGreenWebFullAccess permission to the RAM user. For more information, see RAM authorization.

  4. Use SDKs: For more information, see Document Moderation 2.0 SDK and integration guide.

Submit a moderation task

Usage notes

  • Business operation: FileModeration. Only asynchronous moderation is supported.

  • Supported regions and access addresses:

    | Region | Public network access address | Internal network access address | Supported services |
    | --- | --- | --- | --- |
    | Singapore | green-cip.ap-southeast-1.aliyuncs.com | green-cip-vpc.ap-southeast-1.aliyuncs.com | document_detection_global |

  • Billing: This operation is chargeable and billed by the number of pages processed in the document.

  • Moderation object: Common documents are supported.

  • Result delivery: Moderation results are not returned in real time. Retrieve them by polling or by enabling callback notification. Results are retained for up to 24 hours.

    • Callback notification: Specify a callback URL in the callback parameter when submitting the moderation task.

    • Polling: Leave the callback parameter blank and call the result query operation after submission.

  • Document requirements:

    • Supported protocols: HTTP and HTTPS.

    • Supported formats: DOC, DOCX, PPT, PPTX, PPS, PPSX, PDF, XLS, XLSX, XLTX, XLTM, HTML, and TXT (UTF-8 encoding).

    • Size limit: 200 MB per document. Compress or split documents that exceed this limit.

    • Moderation time depends on document download time. Use a stable and reliable storage service such as Alibaba Cloud OSS.

  • Rule configuration: Configure Document Moderation rules in the Content Moderation console before making your first call. Without this configuration, Document Moderation 2.0 defaults to standard settings.
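
The document requirements above can be checked on the client before a task is submitted. The following is a minimal sketch, not part of the service or its SDK: the function name `check_document` is hypothetical, and treating 200 MB as 200 × 1024 × 1024 bytes is an assumption.

```python
from typing import List, Optional
from urllib.parse import urlparse

# Formats listed under "Supported formats" above.
SUPPORTED_FORMATS = {
    "doc", "docx", "ppt", "pptx", "pps", "ppsx", "pdf",
    "xls", "xlsx", "xltx", "xltm", "html", "txt",
}
# Assumption: 200 MB is interpreted as 200 * 1024 * 1024 bytes.
MAX_SIZE_BYTES = 200 * 1024 * 1024


def check_document(url: str, size_bytes: Optional[int] = None) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        problems.append("protocol must be HTTP or HTTPS")
    filename = parsed.path.rsplit("/", 1)[-1]
    extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if not extension:
        problems.append("no filename extension; set docType in the request")
    elif extension not in SUPPORTED_FORMATS:
        problems.append("unsupported format: " + extension)
    if size_bytes is not None and size_bytes > MAX_SIZE_BYTES:
        problems.append("document exceeds 200 MB; compress or split it")
    return problems
```

The size check runs only when the caller already knows the document size; the sketch does not download the file.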

QPS limit

You can call this operation up to 100 times per second per account. The system supports a maximum of 20 concurrent moderation tasks. Requests that exceed this limit are dropped, which may interrupt your service. Note this limit when calling this operation.
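
Because requests over the limit are dropped, a caller may prefer to hold submissions back locally. The sketch below is one possible client-side gate, assuming a sliding one-second window for the QPS limit. The class name `SubmitGate` and its methods are hypothetical, and the service may count requests differently.

```python
import threading
import time
from collections import deque


class SubmitGate:
    """Client-side guard for the 100 QPS and 20 concurrent task limits."""

    def __init__(self, max_qps=100, max_concurrent=20, clock=time.monotonic):
        self._max_qps = max_qps
        self._max_concurrent = max_concurrent
        self._clock = clock
        self._calls = deque()  # timestamps of calls made in the last second
        self._active = 0       # tasks submitted but not yet finished
        self._lock = threading.Lock()

    def try_submit(self) -> bool:
        """Return True if a submission may be sent now, and record it."""
        with self._lock:
            now = self._clock()
            while self._calls and now - self._calls[0] >= 1.0:
                self._calls.popleft()
            if len(self._calls) >= self._max_qps:
                return False
            if self._active >= self._max_concurrent:
                return False
            self._calls.append(now)
            self._active += 1
            return True

    def task_finished(self) -> None:
        """Call once a task's result has been received."""
        with self._lock:
            self._active = max(0, self._active - 1)
```

Call `try_submit()` before each FileModeration request and `task_finished()` when the result arrives by callback or polling.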

Debugging

Use Alibaba Cloud OpenAPI (Document Moderation enhanced edition) to debug the operation online, view sample code and SDK dependencies, and explore operation parameters.

Important

Before calling the Content Moderation API, log on to the Content Moderation console using your Alibaba Cloud account. Fees incurred by calling the operations are billed to that account.

Request parameters

| Name | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| Service | String | Yes | document_detection_global | The moderation service type. Valid values: document_detection_global (General Document Moderation); document_detection (general document moderation); document_detection_cb (general document moderation for regions outside the Chinese mainland); document_detection_byvl (general document moderation powered by large language models). |
| ServiceParameters | JSONString | Yes | | The parameters required by the moderation service, as a JSON string. For descriptions of each field, see ServiceParameters. |

Table 1. ServiceParameters

| Name | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| url | String | Yes* | http://www.aliyundoc.com/a.pdf | The URL of the document to moderate. The URL must be accessible over the public network. Maximum length: 2,048 characters. The URL cannot contain Chinese characters, and only one URL is allowed per request. |
| ossBucketName | String | No* | bucket_0307 | The name of the authorized OSS bucket. Before using OSS intranet addresses, use your Alibaba Cloud account to complete authorization on the Cloud Resource Access Authorization page. |
| ossObjectName | String | No* | 20240307/07/28/test.pdf | The name of the object in the authorized OSS bucket. |
| ossRegionId | String | No* | cn-shanghai | The region of the OSS bucket. |
| docType | String | No | pdf | The document format, required when the URL points to a file without a filename extension. Valid values: doc, docx, ppt, pptx, pps, ppsx, xls, xlsx, xltx, xltm, xlsb, xlsm, csv, pdf, html, txt. Note: For txt files, only text content is moderated; image content is not moderated by screenshot. Extract text from txt files and call the Text Moderation 2.0 service instead. |
| callback | String | No | http://www.aliyundoc.com | The callback URL for moderation result notification. Supports HTTP and HTTPS. If left blank, poll for results. The callback endpoint must support POST requests with UTF-8 encoding, and accept the checksum and content form parameters. Content Moderation populates these parameters as follows: checksum is a string in UID + seed + content format, signed using the SHA256 algorithm, where UID is the Alibaba Cloud account ID (query it in the Alibaba Cloud Management Console). Verify the checksum on your server to detect data tampering. Note: The UID must be the Alibaba Cloud account ID, not a RAM user ID. content is a JSON-encoded string; parse it to retrieve the moderation result. For the content format, see the sample success responses of the result query operation. Note: If your server successfully receives a callback notification, return HTTP 200. Otherwise, Content Moderation retries the notification up to 16 times, then stops. Check the callback URL status if notifications are not received. |
| seed | String | No | abc**** | A random string used to generate the callback notification signature. Maximum length: 64 characters. Allowed characters: letters, digits, and underscores (_). Required if callback is set. |
| cryptType | String | No | SHA256 | The signing algorithm for callback notification content. Valid values: SHA256 (default): signs using the SHA256 algorithm. SM3: signs using the HMAC-SM3 algorithm and returns a hexadecimal lowercase string (for example, 66c7f0f462eeedd9d1f2d46bdc10e4e24167c4875cf2f7a2297da02b8f4ba8e0). |
| dataId | String | No | fileId**** | The ID of the object to moderate. Maximum length: 128 characters. Allowed characters: letters, digits, underscores (_), hyphens (-), and periods (.). |
| referer | String | No | www.aliyun.com | The Referer request header, used for hotlink protection. Maximum length: 256 characters. |

*The url, ossBucketName/ossObjectName/ossRegionId (OSS authorization), and local document upload (via SDK) are three mutually exclusive input methods. Choose one. For local document upload code examples, see Document Moderation 2.0 SDK and access guide.
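
The callback parameter description states that checksum is built from UID + seed + content and signed with SHA256. The sketch below shows how a callback receiver might verify it. It assumes the checksum is the hex-encoded SHA256 digest of the UTF-8 concatenation, which the parameter description implies but does not spell out. The function names are hypothetical, and the SM3 option is not covered.

```python
import hashlib
import hmac
import json


def verify_callback(uid: str, seed: str, content: str, checksum: str) -> bool:
    """Recompute the SHA256 checksum and compare it in constant time."""
    # Assumption: hex digest over the UTF-8 bytes of UID + seed + content.
    expected = hashlib.sha256((uid + seed + content).encode("utf-8")).hexdigest()
    return hmac.compare_digest(expected.encode("ascii"), checksum.lower().encode("utf-8"))


def parse_callback(uid: str, seed: str, form: dict) -> dict:
    """Validate the posted form fields and return the decoded moderation result."""
    content = form["content"]
    if not verify_callback(uid, seed, content, form["checksum"]):
        raise ValueError("checksum mismatch; the notification may have been altered")
    return json.loads(content)
```

`uid` must be the Alibaba Cloud account ID rather than a RAM user ID, and `seed` is the value sent when the task was submitted. After a successful check, the receiver returns HTTP 200 so that the notification is not retried.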

Response parameters

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| Code | Integer | 200 | The status code, consistent with HTTP status codes. For details, see Code description. |
| Data | JSONObject | | The moderation result data. |
| Data.TaskId | String | AAAAA-BBBBB | The task ID. |
| Message | String | OK | The response message. |
| RequestId | String | ABCD1234-1234-1234-1234-123**** | The request ID. |

Examples

Sample requests

{
  "Service": "document_detection_global",
  "ServiceParameters":
  {
    "url": "http://www.aliyundoc.com/a.pdf",
    "dataId": "fileId-2024-0307-0728***"
  }
}
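
Because ServiceParameters is passed as a JSON string and the input methods are mutually exclusive, the request can be assembled with a small helper. This is a sketch under assumptions: `build_submit_request` is a hypothetical name, the three OSS fields are assumed to be required together, and local upload through the SDK is out of scope.

```python
import json
from typing import Optional


def build_submit_request(
    service: str,
    *,
    url: Optional[str] = None,
    oss_bucket_name: Optional[str] = None,
    oss_object_name: Optional[str] = None,
    oss_region_id: Optional[str] = None,
    data_id: Optional[str] = None,
) -> dict:
    """Build the Service / ServiceParameters pair for FileModeration."""
    oss = {
        "ossBucketName": oss_bucket_name,
        "ossObjectName": oss_object_name,
        "ossRegionId": oss_region_id,
    }
    uses_oss = any(value is not None for value in oss.values())
    # Exactly one input method: a public URL or an authorized OSS object.
    if (url is not None) == uses_oss:
        raise ValueError("specify exactly one input: url or the OSS fields")
    # Assumption: the three OSS fields must be supplied together.
    if uses_oss and not all(oss.values()):
        raise ValueError("ossBucketName, ossObjectName and ossRegionId go together")
    params = {"url": url} if url is not None else dict(oss)
    if data_id is not None:
        params["dataId"] = data_id
    # ServiceParameters is a JSON string, not a nested object.
    return {"Service": service, "ServiceParameters": json.dumps(params)}
```

For example, `build_submit_request("document_detection_global", url="http://www.aliyundoc.com/a.pdf")` yields a dictionary whose `ServiceParameters` value is the serialized form of the parameters in the sample request.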

Sample success responses

{
    "Message": "OK",
    "Code": 200,
    "Data":
    {
        "TaskId": "AAAAA-BBBBB-CCCCCCCC"
    },
    "RequestId": "ABCD1234-1234-1234-1234-123****"
}

Obtain Document Moderation task results

Usage notes

  • Business operation: DescribeFileModerationResult. Retrieves Document Moderation task results.

  • Billing: This operation is free of charge.

  • Query timing: Query moderation results at least 30 seconds after submitting an asynchronous moderation request. Results are retained for up to 24 hours and are automatically deleted after that period.
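
The query timing above suggests a polling loop that waits 30 seconds and then repeats the query while the service reports code 280. The sketch below assumes a caller-supplied `query` function that wraps DescribeFileModerationResult and returns the parsed response as a dictionary. That function, the 10-second interval, and the attempt limit are all hypothetical choices, not service requirements.

```python
import time
from typing import Callable


def wait_for_result(
    query: Callable[[str], dict],
    task_id: str,
    *,
    initial_delay: float = 30.0,
    interval: float = 10.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until moderation completes and return the Data object."""
    sleep(initial_delay)  # query no sooner than 30 seconds after submission
    for attempt in range(max_attempts):
        response = query(task_id)
        code = int(response["Code"])
        if code == 200:
            return response["Data"]  # moderation is complete
        if code != 280:
            raise RuntimeError("query failed with code %d" % code)
        if attempt < max_attempts - 1:
            sleep(interval)  # 280: still in progress, try again later
    raise TimeoutError("moderation did not finish within the polling budget")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.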

QPS limit

You can call this operation up to 100 times per second per account. Exceeding this limit triggers throttling, which may affect your service. Note this limit when calling this operation.

Debugging

Use Alibaba Cloud OpenAPI to debug the operation online, view sample code and SDK dependencies, and explore operation parameters.

Request parameters

| Name | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| Service | String | Yes | document_detection | The moderation service type. Must match the service type used when submitting the task. |
| ServiceParameters | JSONString | Yes | | The parameters required by the moderation service, as a JSON string. For descriptions of each field, see ServiceParameters. |

Table 1. ServiceParameters

| Name | Type | Required | Example | Description |
| --- | --- | --- | --- | --- |
| taskId | String | Yes | abcd**** | The task ID to query. One task ID per request. Obtain the task ID from the response to the submit operation. |

Response parameters

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| RequestId | String | ABCD1234-1234-1234-1234-123**** | The request ID, used to locate and troubleshoot issues. |
| Data | Object | | The document moderation results. For details, see Data. |
| Code | Integer | 200 | The status code, consistent with HTTP status codes. For details, see Code description. |
| Message | String | OK | The response message. |

Table 2. Data

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| DataId | String | fileId**** | The ID of the moderated object. Returned only if dataId was specified in the request. |
| Url | String | http://www.aliyundoc.com/a.docx | The URL of the moderated object. |
| DocType | String | pdf | The document format specified for files without a filename extension. Valid values: doc, docx, ppt, pptx, pps, ppsx, xls, xlsx, xltx, xltm, xlsb, xlsm, csv, pdf, html, txt. |
| PageSummary | Object | | A summary of the moderation results. For details, see PageSummary. |
| RiskLevel | String | high | The overall risk level, calculated from both image and text moderation results. Valid values: high (handle directly), medium (manual review recommended), low (handle when more risky content is detected), none (no risk detected; handle based on business requirements). Configure risk score thresholds in the Content Moderation console. |
| PageResult | JSONArray | | Per-page moderation results. HTTP status code 280 indicates moderation is in progress (partial results returned); 200 indicates moderation is complete. For details, see PageResult. |
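
The handling guidance for each RiskLevel value can be expressed as a lookup. The sketch below is illustrative only: the action names (`block`, `manual_review`, `monitor`, `pass`) are hypothetical, and sending a missing or unknown level to manual review is an assumption rather than documented behavior.

```python
# Hypothetical action names mapped from the documented RiskLevel values.
RISK_ACTIONS = {
    "high": "block",            # handle directly
    "medium": "manual_review",  # manual review recommended
    "low": "monitor",           # handle when more risky content is detected
    "none": "pass",             # no risk detected
}


def action_for(data: dict) -> str:
    """Pick an action from the RiskLevel field of the Data object."""
    level = data.get("RiskLevel", "")
    # Assumption: a missing or unrecognized level is sent to manual review.
    return RISK_ACTIONS.get(level, "manual_review")
```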

Table 3. PageSummary

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| PageSum | Integer | 10 | The total number of pages moderated. |
| ImageSummary | Object | | A summary of image moderation results. Not present for txt files. For details, see ImageSummary. |
| TextSummary | Object | | A summary of text moderation results. For details, see TextSummary. |

Table 4. ImageSummary

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| RiskLevel | String | high | The image risk level, based on configured risk score thresholds. Valid values: high, medium, low, none. |
| ImageLabels | JSONArray | | A summary of image labels. For details, see ImageLabels. |

Table 5. ImageLabels

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| Label | String | violent_explosion | The image risk label. For details, see Risk label interpretation table. |
| LabelSum | Integer | | The number of occurrences of the label. |
| Description | String | Fireworks content | A description of the label. This field is informational and may change. Base result processing on the Label field, not this field. |

Table 6. TextSummary

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| RiskLevel | String | high | The text risk level. Valid values: high, medium, low, none. |
| TextLabels | JSONArray | | A summary of text labels. For details, see TextLabels. |

Table 7. TextLabels

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| Label | String | violent_explosion | The text risk label. |
| LabelSum | Integer | | The number of times the label was matched. |

Table 8. PageResult

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| PageNum | Integer | 50 | The page number of the document. |
| ImageUrl | String | http://oss.aliyundoc.com/a.png | The URL of the screenshot for the current page. |
| ImageResult | JSONArray | | Image moderation results for the current page. Not present for txt files. For details, see ImageResult. |
| TextResult | JSONArray | | Text moderation results for the current page. For details, see TextResult. |

Table 9. ImageResult

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| Description | String | Moderation of the image content of the document page | A description of the image moderation scope. |
| Service | String | baselineCheck | The service called for image moderation. |
| RiskLevel | String | high | The image risk level, based on configured risk score thresholds. Valid values: high, medium, low, none. |
| Location | JSONObject | {"x":0,"y":0,"w":100,"h":100} | (Reserved) The coordinates of the image area. |
| LabelResult | JSONArray | | The labels returned for the image. For details, see LabelResult. |

Table 10. LabelResult

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| Label | String | violent_explosion | The label returned for the image. Multiple labels may be returned for the same screenshot. For details, see Risk label interpretation table. |
| Confidence | Float | 81.22 | The confidence score. Valid values: 0 to 100, accurate to two decimal places. |
| Description | String | Fireworks content | A description of the label. This field is informational and may change. Base result processing on the Label field, not this field. |

Table 11. TextResult

| Name | Type | Example | Description |
| --- | --- | --- | --- |
| Description | String | Moderation of the text content of the document page. | A description of the text moderation scope. |
| Service | String | pgc_detection | The service called for text moderation. |
| Text | String | This is the text part | The text content of the moderated section. |
| Labels | String | ad_compliance,C_customized | The labels returned for the text. For details, see . |
| RiskWords | String | Risk word A, Risk word B | The risk words detected in the text. |
| RiskTips | String | Advertising Law_General Prohibition of Extreme Words | The sub-labels returned for the text. |
| RiskLevel | String | high | The text risk level, based on the calculated text risk. Valid values: high, medium, low, none. |

Examples

Sample requests

{
    "Service": "document_detection_global",
    "ServiceParameters": {
        "taskId": "abcd****"
    }
}

Sample success responses

{
    "Code": 200,
    "Data": {
        "DataId": "fileId-2024-0307-0728***",
        "PageResult": [
            {
                "ImageResult": [
                    {
                        "Description": "Moderation of the image content of the document page",
                        "LabelResult": [
                            {
                                "Label": "nonLabel"
                            }
                        ],
                        "Service": "baselineCheck_global"
                    }
                ],
                "ImageUrl": "http://oss.aliyundoc.com/a.png",
                "PageNum": 1,
                "TextResult": [
                    {
                        "Description": "Moderation of the text content of the document page",
                        "Labels": "",
                        "RiskTips": "",
                        "RiskWords": "",
                        "Service": "comment_multilingual_global",
                        "Text": "Content Moderation product test case a"
                    }
                ]
            },
            ...
            {
                "ImageResult": [
                    {
                        "Description": "Moderation of the image content of the document page",
                        "LabelResult": [
                            {
                                "Confidence": 89.01,
                                "Label": "pornographic_adultContent_tii"
                            }
                        ],
                        "Service": "baselineCheck_global"
                    }
                ],
                "ImageUrl": "http://oss.aliyundoc.com/b.png",
                "PageNum": 10,
                "TextResult": [
                    {
                        "Description": "Moderation of the text content of the document page",
                        "Labels": "contraband,sexual_content",
                        "RiskTips": "Prohibited_Prohibited goods, Pornographic_Film resources, Pornographic_Vulgar",
                        "RiskWords": "Risk word A, Risk word B",
                        "Service": "comment_multilingual_global",
                        "Text": "Content Moderation product test case b"
                    }
                ]
            }
        ],
        "Url": "http://www.aliyundoc.com/a.docx"
    },
    "Message": "SUCCESS",
    "RequestId": "1D0854A7-AAAAA-BBBBBBB-CC8292AE5"
}
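
A response shaped like the sample above can be reduced to the pages that carry risk labels. The following sketch is not part of the SDK: `flagged_pages` is a hypothetical helper, and treating the image label nonLabel as "no risk" is an assumption drawn from the sample, where it appears on a page with empty text labels.

```python
from typing import List


def flagged_pages(data: dict) -> List[dict]:
    """Collect page numbers together with their image and text labels."""
    flagged = []
    for page in data.get("PageResult", []):
        image_labels = [
            label_result["Label"]
            for image_result in page.get("ImageResult", [])
            for label_result in image_result.get("LabelResult", [])
            # Assumption: "nonLabel" marks an image with no detected risk.
            if label_result.get("Label") and label_result["Label"] != "nonLabel"
        ]
        text_labels = [
            label.strip()
            for text_result in page.get("TextResult", [])
            # Labels is a comma-separated string and may be empty.
            for label in text_result.get("Labels", "").split(",")
            if label.strip()
        ]
        if image_labels or text_labels:
            flagged.append({
                "PageNum": page.get("PageNum"),
                "ImageLabels": image_labels,
                "TextLabels": text_labels,
            })
    return flagged
```

Applied to the `Data` object of the sample response, this returns a single entry for page 10 and skips page 1.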

Code description

Only requests with code 200 or 280 are measured and billed. Other codes are not billed.

| Code | Description |
| --- | --- |
| 200 | The request succeeded or the moderation is complete. |
| 280 | Moderation is in progress. |
| 400 | Not all required request parameters are configured. |
| 401 | The request parameters are invalid. |
| 402 | Invalid request parameters. Check and modify them, then try again. |
| 403 | The QPS of requests exceeds the upper limit. Reduce the number of requests sent at a time. |
| 404 | The file failed to download. Check the file URL and try again. |
| 405 | File download or conversion timed out. The URL may be inaccessible. Check and adjust the file, then try again. |
| 406 | The file is too large. Check and adjust the file size, then try again. |
| 407 | The file format is not supported. Check and change the file format, then try again. |
| 408 | Insufficient permissions. The account may not be activated, may have overdue payments, or may not be authorized to call this operation. |
| 409 | The specified RequestId does not exist. The moderation results may have exceeded the 24-hour validity period. |
| 480 | The number of concurrent moderation tasks exceeds the upper limit. Reduce the number of concurrent tasks. |
| 500 | A system error occurred. |
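
The codes above fall into a few handling groups, which client code can make explicit. The grouping below is a sketch: the category names are hypothetical, and treating 403, 480, and 500 as retryable is an assumption based on their descriptions rather than a documented retry policy.

```python
# Assumption: throttling (403), too many concurrent tasks (480) and system
# errors (500) are worth retrying after a pause.
RETRYABLE_CODES = {403, 480, 500}
# Only these codes are measured and billed, per the note above.
BILLED_CODES = {200, 280}


def classify(code: int) -> str:
    """Map a response code to a coarse handling category."""
    if code == 200:
        return "done"
    if code == 280:
        return "in_progress"
    if code in RETRYABLE_CODES:
        return "retry_later"
    return "fix_request"  # parameter, file, or permission problem


def is_billed(code: int) -> bool:
    """Return True for codes that are measured and billed."""
    return code in BILLED_CODES
```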