Asynchronous detection


This topic describes how to call the asynchronous image detection API to perform general optical character recognition (OCR). General OCR recognizes and returns text content from images.

General OCR is available in two versions: standard and advanced.

  • The standard version is suitable for scenarios that involve a small amount of text, such as movie scenes and Internet images.

  • The advanced version is suitable for complex document image recognition and scenarios with high text density. It can return information about individual characters.

Note

By default, general OCR detects Chinese and English. To detect content in other languages, you can contact your account manager. The supported languages include Mongolian, Uyghur, Tibetan, Arabic, Russian, French, Spanish, Portuguese, Japanese, Korean, Thai, Vietnamese, Persian, Bengali, German, Dutch, Malay, Italian, Hindi, and Indonesian.

Usage notes for asynchronous image detection

API operation: /green/image/asyncscan. This operation is used to perform asynchronous image detection.

You can call this API to create asynchronous image detection tasks. For more information about how to construct an HTTP request, see Request structure. You can also use a pre-built HTTP request. For more information, see SDK overview.

  • Billing information:

    You are charged for calling this operation. For more information about the billing methods, see

  • Detection timeout:

    The maximum response time that is allowed for a synchronous moderation request is 6 seconds. If the moderation is not completed within 6 seconds, a timeout error is returned. If you do not require moderation results in real time, you can send asynchronous moderation requests. In most cases, we recommend that you send synchronous moderation requests because synchronous moderation operations are easier to call. We recommend that you set the timeout period to 6 seconds for calling synchronous moderation operations.

  • Return value:

    If you send asynchronous moderation requests, the moderation results are not returned in real time. To obtain moderation results, you can poll the moderation results periodically or enable callback notification. The moderation results are retained for up to 1 hour.

    • Retrieve detection results using a callback: When you submit an asynchronous detection task, include the callback parameter in the request to automatically receive the detection results. For more information, see Request parameters for asynchronous detection.

    • Retrieve detection results by polling: When you submit an asynchronous detection task, you do not need to include the callback parameter. After you submit the task, call the result query API to retrieve the detection results. For more information, see Usage notes for image asynchronous detection result query.

  • Image requirements:

    • The URLs of images must be HTTP or HTTPS URLs.

    • The images must be in PNG, JPG, JPEG, BMP, GIF, or WEBP format.

    • An image can be up to 20 MB in size. The limit for the image size is applicable to both synchronous and asynchronous moderation operations.

      The height or width of an image cannot exceed 30,000 pixels (px), and the total number of pixels in an image cannot exceed 250 million.

      Note

      For GIF images, the total number of pixels cannot exceed 4,194,304, and the height or width cannot exceed 30,000 pixels (px).

    • The duration for downloading an image is limited to 3 seconds. If an image fails to be downloaded within 3 seconds, a timeout error is returned.

    • We recommend that you submit images of at least 256 × 256 pixels to ensure the moderation effect.

    • The response time of an operation for moderating images varies based on the duration for downloading these images. Make sure that you use a stable and reliable storage service to store the images to be moderated. We recommend that you use Object Storage Service (OSS) or Content Delivery Network (CDN).
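
The limits in the list above can be checked on the client before a task is submitted. The following Python sketch is illustrative only: the function name check_image is hypothetical, the caller is assumed to already know the image format, file size, and dimensions, and 1 MB is assumed to mean 1024 × 1024 bytes.

```python
from typing import List

SUPPORTED_FORMATS = {"PNG", "JPG", "JPEG", "BMP", "GIF", "WEBP"}
MAX_SIZE_BYTES = 20 * 1024 * 1024   # 20 MB (assumes 1 MB = 1024 * 1024 bytes)
MAX_SIDE_PX = 30_000                # limit on height and on width
MAX_TOTAL_PX = 250_000_000          # total pixels for non-GIF images
MAX_TOTAL_PX_GIF = 4_194_304        # total pixels for GIF images


def check_image(fmt: str, size_bytes: int, width: int, height: int) -> List[str]:
    """Return the list of violated limits; an empty list means none were found."""
    problems = []
    fmt = fmt.upper()
    if fmt not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file larger than 20 MB")
    if width > MAX_SIDE_PX or height > MAX_SIDE_PX:
        problems.append("height or width exceeds 30,000 px")
    limit = MAX_TOTAL_PX_GIF if fmt == "GIF" else MAX_TOTAL_PX
    if width * height > limit:
        problems.append(f"total pixels exceed {limit:,}")
    return problems
```

For example, check_image("png", 1024, 800, 600) returns an empty list, whereas a 4096 × 4096 GIF is reported as exceeding the GIF pixel limit.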

QPS limits

You can call this operation up to 10 times per second per account. If the number of calls per second exceeds the limit, throttling is triggered. As a result, your business may be affected. We recommend that you take note of the limit when you call this operation.
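
One way to stay under the limit is to space out calls on the client. The sketch below is a minimal, single-threaded example; the class name Throttle and its parameters are hypothetical, and the clock and sleep functions are injectable only so that the logic can be tested without waiting.

```python
import time
from typing import Callable


class Throttle:
    """Keeps consecutive calls at least 1 / max_per_second seconds apart."""

    def __init__(self, max_per_second: int = 10,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._min_gap = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next call is allowed and return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # The next call may start one gap after this one starts.
        self._next_allowed = max(now, self._next_allowed) + self._min_gap
        return delay
```

Call wait() immediately before each API request. This sketch does not coordinate across processes or hosts, so the per-account limit can still be exceeded if several clients share one account.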

Request parameters for asynchronous detection

Name

Type

Required

Example

Description

bizType

String

No

default

The business scenario. You can create a business scenario in the Content Moderation console. For more information, see Customize moderation policies.

scenes

StringArray

Yes

["ocr"]

The detection scenario. Set the value to ocr.

callback

String

No

http://www.aliyundoc.com/xx.json

The callback URL for notifying you of asynchronous moderation results. HTTP and HTTPS URLs are supported. If you do not set this parameter, you must poll moderation results periodically.

If you set the callback parameter in the moderation request, make sure that the specified HTTP or HTTPS URL meets the following requirements: supports the POST method, uses UTF-8 to encode the transmitted data, and supports the checksum and content parameters. To send moderation results to the specified callback URL, Content Moderation returns the checksum and content parameters in callback notifications based on the following rules and format:

  • checksum: the string in the UID + Seed + Content format that is generated by the Secure Hash Algorithm 256 (SHA-256) algorithm. UID indicates the ID of your Alibaba Cloud account. You can query the ID in the Alibaba Cloud Management Console. To prevent data tampering, you can use the SHA-256 algorithm to generate a string when your server receives a callback notification and verify the string against the received checksum parameter.

    Note

    UID must be the ID of an Alibaba Cloud account, but not the ID of a RAM user.

  • content: a JSON-formatted string. Parse this string to obtain the callback data as a JSON object. For more information about the format of the content parameter, see the sample success responses of each operation that you can call to query asynchronous moderation results.

Note

After your server receives a callback notification, it must return the HTTP 200 status code to Content Moderation. Any other HTTP status code indicates that the notification was not received. In that case, Content Moderation pushes the callback notification again until your server receives it, up to 16 times. After 16 times, Content Moderation stops pushing the callback notification. In this case, we recommend that you check the status of the callback URL.

seed

String

No

aabbcc123

A random string that is used to generate a signature for the callback notification request.

The string can be up to 64 characters in length and can contain letters, digits, and underscores (_). You can customize this string. It is used to verify the callback notification request when Content Moderation pushes callback notifications to your server.

Note

This parameter is required if you set the callback parameter.

cryptType

String

No

SHA256

The encryption algorithm used to encrypt the callback notification content when you enable callback notification. AI Guardrails encrypts the returned string by using the encryption algorithm that you specify and sends the encrypted string to the callback URL. The returned string is in the UID + Seed + Content format. Valid values:

  • SHA256: The HMAC-SHA256 encryption algorithm is used. This is the default value.
  • SM3: The HMAC-SM3 encryption algorithm is used, and a hexadecimal string is returned. The string consists of lowercase letters and digits.

    For example, 66c7f0f462eeedd9d1f2d46bdc10e4e24167c4875cf2f7a2297da02b8f4ba8e0 is returned after you encrypt abc by using the HMAC-SM3 encryption algorithm.

tasks

JSONArray

Yes

The detection objects. Each element in the JSON array is a task struct. You can submit up to 100 elements at a time. To submit 100 elements, you must increase the concurrent task limit to more than 100. For more information about the structure of each element, see task.

extras

JSONObject

No

xxx

If you do not pass the extras parameter, the standard OCR version is used.

To use the advanced OCR version, you must pass this parameter in the {"type":"${ocrType}"} format. Set ocrType to advanced.
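
The checksum rule described for the callback parameter can be sketched in Python as follows. This is a minimal example under stated assumptions, not a confirmed implementation: it assumes that the checksum is the lowercase hexadecimal SHA-256 digest of the UTF-8 string UID + Seed + Content, as the checksum description states. The cryptType description mentions HMAC-SHA256 instead, so verify the result against a real callback before relying on it. The function names are hypothetical.

```python
import hashlib
import hmac


def expected_checksum(uid: str, seed: str, content: str) -> str:
    """SHA-256 over the concatenation UID + Seed + Content, as lowercase hex.

    Assumption: hex encoding of a plain SHA-256 digest over UTF-8 bytes.
    """
    return hashlib.sha256((uid + seed + content).encode("utf-8")).hexdigest()


def is_callback_authentic(uid: str, seed: str, content: str, checksum: str) -> bool:
    """Compare the received checksum with the locally computed one."""
    expected = expected_checksum(uid, seed, content).encode("ascii")
    received = checksum.lower().encode("utf-8")
    # compare_digest is used only for a constant-time comparison here,
    # not to compute an HMAC.
    return hmac.compare_digest(expected, received)
```

Pass the content parameter exactly as received, before any JSON parsing or re-serialization, because any change to the string changes the digest.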

Table 1. task

Name

Type

Required

Example

Description

dataId

String

No

test_data_xxxx

The data ID. Ensure that all IDs in a request are unique.

url

String

Yes

https://www.aliyundoc.com/test_image_xxxx.png

The URL of the object to be detected. The following URL types are supported:

  • A public HTTP or HTTPS URL. The URL cannot exceed 2,048 characters in length.

  • The path of a file in Alibaba Cloud OSS. You must first authorize Content Moderation to access the OSS bucket. Only OSS buckets in the same region are supported. For more information, see Authorize Content Moderation to access OSS buckets.

    File path format: oss://<bucket-name>.<endpoint>/<object-name>

interval

Integer

No

2

The interval between two frames that are consecutively captured. This parameter applies only to the moderation of GIF images and long images.

  • A GIF image can be regarded as an array of frames. One frame is captured for moderation from every n frames, where n is specified by the interval parameter. The system captures frames from GIF images only when this parameter is specified.

  • Long images can be in portrait or horizontal mode.

    • To moderate a long portrait image, you can calculate the total number of frames in the following way: divide the height by the width and round the result to the nearest integer. In a long portrait image, the height is greater than 400 pixels, and the ratio of height to width is greater than 2.5:1.

    • To moderate a long horizontal image, you can calculate the total number of frames in the following way: divide the width by the height and round the result to the nearest integer. In a long horizontal image, the width is greater than 400 pixels, and the ratio of width to height is greater than 2.5:1.

By default, only the first frame of a GIF image or a long image is moderated. You can use the interval parameter to specify the interval between two frames that the system consecutively captures. This helps reduce moderation costs.

Note

The interval and maxFrames parameters must be used in pairs. For example, the interval parameter is set to 2, and the maxFrames parameter is set to 100 for moderating a GIF image or a long image. In this example, one out of every two frames is moderated, and a maximum of 100 frames are moderated. The fee is calculated based on the actual number of moderated frames.

maxFrames

Integer

No

100

The maximum number of frames to be captured. This parameter applies only to the moderation of GIF images and long images. Default value: 1.

If the value of the interval parameter multiplied by that of the maxFrames parameter is smaller than the total number of frames in a GIF image or a long image, the interval for capturing frames is automatically changed to the integer rounded up from the result of dividing the total number of frames in the image by the value of the maxFrames parameter. This helps improve the overall moderation effects.
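
The frame rules for the interval and maxFrames parameters can be worked through with a short Python sketch. The function names are hypothetical, and one detail is an assumption: the text says to round to the nearest integer without saying how halves are handled, so this sketch rounds halves up.

```python
import math


def long_image_frames(width: int, height: int) -> int:
    """Frame count for a long image, or 1 if the image is not a long image."""
    long_side, short_side = max(width, height), min(width, height)
    # Long image: the longer side is over 400 px and the ratio is over 2.5:1.
    if long_side > 400 and long_side > 2.5 * short_side:
        # Assumption: halves round up. Python's round() would round to even.
        return math.floor(long_side / short_side + 0.5)
    return 1


def effective_interval(total_frames: int, interval: int, max_frames: int) -> int:
    """Interval actually applied after the automatic adjustment."""
    if interval * max_frames < total_frames:
        return math.ceil(total_frames / max_frames)
    return interval
```

For example, with interval set to 2 and maxFrames set to 100, an image of 300 frames is sampled at an interval of 3 because 2 × 100 is smaller than 300, whereas an image of 150 frames keeps the interval of 2.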

Response data for asynchronous detection

Name

Type

Example

Description

code

Integer

200

The returned HTTP status code.

For more information, see Common error codes.

msg

String

OK

The response message for the request.

dataId

String

test_data_xxxx

The ID of the moderation object.

Note

If you set the dataId parameter in the moderation request, the value of the dataId request parameter is returned here.

taskId

String

aaa25f95-4892-4d6b-aca9-7939bc6e9baa-148619876****

The ID of the detection task.

url

String

https://www.aliyundoc.com/test_image_xxxx.png

The URL of the object to be detected. The following URL types are supported:

  • A public HTTP or HTTPS URL. The URL cannot exceed 2,048 characters in length.

  • The path of a file in Alibaba Cloud OSS. You must first authorize Content Moderation to access the OSS bucket. Only OSS buckets in the same region are supported. For more information, see Authorize Content Moderation to access OSS buckets.

    File path format: oss://<bucket-name>.<endpoint>/<object-name>

extras

JSONObject

xxx

If you set the extras parameter in the moderation request, the value of the extras request parameter is returned here.

Note

This parameter may be subject to changes. Use the latest value of this parameter.

Examples of asynchronous detection

Sample request

http(s)://[Endpoint]/green/image/asyncscan
&<Common request parameters>
{
    "scenes": [
        "ocr"
    ],
    "tasks": [
        {
            "dataId": "test_data_xxxx",
            "url": "https://www.aliyundoc.com/test_image_xxxx.png"
        }
    ]
}
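
The request body above can also be assembled programmatically. The following Python sketch builds only the JSON body; signing and the common request parameters are out of scope here. The function name build_asyncscan_body and the generated dataId values are hypothetical.

```python
import json
from typing import List, Optional


def build_asyncscan_body(urls: List[str], advanced: bool = False,
                         callback: Optional[str] = None,
                         seed: Optional[str] = None) -> str:
    """Return the JSON body for /green/image/asyncscan in the OCR scenario."""
    if not 1 <= len(urls) <= 100:
        raise ValueError("tasks must contain 1 to 100 elements")
    if callback and not seed:
        raise ValueError("seed is required when callback is set")
    body = {
        "scenes": ["ocr"],
        # dataId values are illustrative; they only need to be unique per request.
        "tasks": [{"dataId": f"data_{i}", "url": u} for i, u in enumerate(urls)],
    }
    if advanced:
        body["extras"] = {"type": "advanced"}  # omit extras to use standard OCR
    if callback:
        body["callback"] = callback
        body["seed"] = seed
    return json.dumps(body)
```

Calling build_asyncscan_body with advanced=True adds the extras parameter that selects the advanced OCR version.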

Sample success response

{
    "code": 200,
    "msg": "OK",
    "requestId": "92AD868A-F5D2-4AEA-96D4-E1273B8E074C",
    "data": [
        {
            "code": 200,
            "msg": "OK",
            "dataId": "test_data_xxxx",
            "taskId": "aaa25f95-4892-4d6b-aca9-7939bc6e9baa-148619876****",
            "url": "https://www.aliyundoc.com/test_image_xxxx.png"
        }
    ]
}
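
Each element of data carries its own code, so a request can succeed while individual tasks fail. The Python sketch below separates accepted tasks from failed ones so that the task IDs can be used for polling later. The function name split_submission is hypothetical.

```python
import json
from typing import Dict, List, Tuple


def split_submission(response_text: str) -> Tuple[Dict[str, str], List[dict]]:
    """Return ({dataId: taskId} for accepted tasks, [entries that failed])."""
    resp = json.loads(response_text)
    if resp.get("code") != 200:
        raise RuntimeError(f"request failed: {resp.get('code')} {resp.get('msg')}")
    accepted: Dict[str, str] = {}
    failed: List[dict] = []
    for item in resp.get("data", []):
        if item.get("code") == 200:
            # Fall back to the taskId as key when no dataId was supplied.
            accepted[item.get("dataId", item["taskId"])] = item["taskId"]
        else:
            failed.append(item)
    return accepted, failed
```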

Usage notes for image asynchronous detection result query

API operation: /green/image/results. This operation is used to query the results of an asynchronous image detection task.

You can call this API to query the results of asynchronous image detection tasks. For more information about how to construct an HTTP request, see Request structure. You can also use a pre-built HTTP request. For more information, see SDK overview.

  • Billing information:

    This operation is free of charge.

  • Query timeout:

    We recommend that you query moderation results at least 30 seconds after you send an asynchronous moderation request. Content Moderation retains moderation results for up to 4 hours. After 4 hours, the results are deleted.
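
A polling loop that follows the 30-second recommendation might look like the Python sketch below. It is a sketch under assumptions: the query argument is a caller-supplied function that sends the task IDs to /green/image/results and returns the data array, an element with code 200 is treated as finished, and any other code is treated as not yet ready. The retry interval and attempt count are arbitrary illustrative values.

```python
import time
from typing import Callable, Dict, List


def poll_results(task_ids: List[str],
                 query: Callable[[List[str]], List[dict]],
                 initial_delay: float = 30.0,
                 interval: float = 10.0,
                 max_attempts: int = 6,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, dict]:
    """Return {taskId: result element} for every task that finished in time."""
    pending = list(task_ids)
    done: Dict[str, dict] = {}
    sleep(initial_delay)  # wait before the first query, as recommended
    for attempt in range(max_attempts):
        for item in query(pending):
            if item.get("code") == 200:
                done[item["taskId"]] = item
        pending = [t for t in pending if t not in done]
        if not pending:
            break
        if attempt < max_attempts - 1:
            sleep(interval)
    return done
```

Task IDs that are missing from the returned dictionary did not finish within the allowed attempts and can be retried or reported by the caller.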

QPS limits

You can call this operation up to 10 times per second per account. If the number of calls per second exceeds the limit, throttling is triggered. As a result, your business may be affected. We recommend that you take note of the limit when you call this operation.

Request parameters for result query

Name

Type

Required

Example value

Description

body

JSONArray

Yes

["aaa25f95-4892-4d6b-aca9-7939bc6e9baa-1486198766695"]

The list of IDs of asynchronous moderation tasks that you want to query. The array can contain up to 100 elements.

After you submit a moderation task, you can obtain the ID of the task from the response.
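
Because the array accepts at most 100 task IDs, a longer list has to be split across several requests. A minimal Python sketch, with the hypothetical function name result_query_bodies:

```python
import json
from typing import Iterator, List


def result_query_bodies(task_ids: List[str], batch_size: int = 100) -> Iterator[str]:
    """Yield one JSON array body per batch of at most batch_size task IDs."""
    for start in range(0, len(task_ids), batch_size):
        yield json.dumps(task_ids[start:start + batch_size])
```

For 250 task IDs, this yields three bodies that contain 100, 100, and 50 IDs.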

Response data for result query

Name

Type

Example value

Description

code

Integer

200

The returned HTTP status code.

For more information, see Common error codes.

msg

String

OK

The response message for the request.

dataId

String

test_data_xxxx

The ID of the moderation object.

Note

If you set the dataId parameter in the moderation request, the value of the dataId request parameter is returned here.

taskId

String

aaa25f95-4892-4d6b-aca9-7939bc6e9baa-148619876****

The ID of the detection task.

url

String

https://www.aliyundoc.com/test_image_xxxx.png

The URL of the object to be detected. The following URL types are supported:

  • A public HTTP or HTTPS URL. The URL cannot exceed 2,048 characters in length.

  • The path of a file in Alibaba Cloud OSS. You must first authorize Content Moderation to access the OSS bucket. Only OSS buckets in the same region are supported. For more information, see Authorize Content Moderation to access OSS buckets.

    File path format: oss://<bucket-name>.<endpoint>/<object-name>

extras

JSONObject

xxx

If you set the extras parameter in the moderation request, the value of the extras request parameter is returned here.

Note

This parameter may be subject to changes. Use the latest value of this parameter.

results

Array

The returned results. If the call is successful (code=200), the results contain one or more elements. Each element is a struct. For more information about the structure, see result.

Table 2. result

Name

Type

Example

Description

scene

String

ocr

The detection scenario. The value is ocr.

label

String

ocr

The classification of the detection result. Valid values:

  • normal: No text is recognized in the image.

  • ocr: The image contains text.

suggestion

String

review

The recommended user action. Valid values:

  • pass: No action is needed for the returned result.

  • review: Review the recognized text.

rate

Float

99.91

You can ignore this return value in the OCR scenario.

ocrLocations

Array

If the static image (non-GIF) contains text, this parameter returns information about each recognized text entry. For more information about the structure, see ocrLocation.

ocrData

Array

This topic describes the details of calling an asynchronous image detection task,

If the static image (non-GIF) contains text, this parameter returns a combination of all recognized text. The combined text is usually stored in the first element of the array.

frames

Array

xxx

If the dynamic image (GIF) contains text, this parameter returns each recognized frame and its corresponding text.

ocrDetailInfo

Object

Detailed text information from the high-precision full-text recognition result. For more information about the structure, see ocrDetailInfo.

Note

This result is returned only if you pass {"type":"advanced"} in the extras request parameter of the asynchronous detection task.

Table 1. ocrLocation

Parameter

Type

Example

Description

text

String

hello

The single text entry that is detected in the moderated image.

x

Float

41

The distance between the upper-left corner of the text area and the y-axis, with the upper-left corner of the image being the coordinate origin. Unit: pixels.

y

Float

84

The distance between the upper-left corner of the text area and the x-axis, with the upper-left corner of the image being the coordinate origin. Unit: pixels.

w

Float

83

The width of the text area. Unit: pixels.

h

Float

26

The height of the text area. Unit: pixels.

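
Because x and y give the upper-left corner and w and h give the size, an ocrLocation entry converts directly into the corner coordinates that most drawing and cropping tools expect. A minimal Python sketch, with the hypothetical function name location_to_box:

```python
from typing import Any, Mapping, Tuple


def location_to_box(loc: Mapping[str, Any]) -> Tuple[float, float, float, float]:
    """Return (left, top, right, bottom) in pixels, origin at the image's upper-left."""
    left, top = loc["x"], loc["y"]
    return (left, top, left + loc["w"], top + loc["h"])
```

With the example values in the table (x = 41, y = 84, w = 83, h = 26), the box is (41, 84, 124, 110).
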
Table 3. ocrDetailInfo

Name

Type

Example

Description

wordNum

Integer

2

The number of word blocks.

wordsInfo

Array

The word block information. For more information about the structure, see wordsInfo.

Table 4. wordsInfo

Name

Type

Sample value

Description

charInfo

Array

The single character information. For more information about the structure, see charInfo.

direction

Integer

0

The text direction. Valid values:

  • 0: Horizontal

  • 1: Vertical

pos

Array

The coordinate information. For more information about the structure, see pos.

prob

Integer

99

The confidence level.

word

String

Light filters through the forest

The text content of the word block.

Table 2. charInfo

Parameter

Type

Example

Description

h

Integer

20

The height of the word. Unit: pixels.

prob

Integer

99

The confidence level.

w

Integer

20

The width of the word. Unit: pixels.

word

String

Forest

The content of the word.

x

Integer

39

The x-coordinate of the word. Unit: pixels.

y

Integer

86

The y-coordinate of the word. Unit: pixels.

Table 3. pos

Parameter

Type

Example

Description

x

Integer

73

The x-coordinate of the phrase. Unit: pixels.

y

Integer

113

The y-coordinate of the phrase. Unit: pixels.
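
To show how the ocrDetailInfo, wordsInfo, and pos structures fit together, the following Python sketch walks the word blocks and reduces each pos list to an axis-aligned bounding rectangle. It assumes that wordsInfo is a list of word blocks and that pos lists the corner points of a block, as in the sample responses in this topic. The function name word_blocks is hypothetical.

```python
from typing import Any, Iterator, Mapping, Tuple

Box = Tuple[int, int, int, int]


def word_blocks(ocr_detail_info: Mapping[str, Any]) -> Iterator[Tuple[str, int, Box]]:
    """Yield (word, prob, (min_x, min_y, max_x, max_y)) for each word block."""
    for block in ocr_detail_info.get("wordsInfo", []):
        xs = [p["x"] for p in block["pos"]]
        ys = [p["y"] for p in block["pos"]]
        yield block["word"], block["prob"], (min(xs), min(ys), max(xs), max(ys))
```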

Examples of result query

Sample request

http(s)://[Endpoint]/green/image/results
&<Common request parameters>
[
    "aaa25f95-4892-4d6b-aca9-7939bc6e9baa-148619876****"
]

Sample success responses

  • Standard OCR

    {
        "code": 200,
        "data": [
            {
                "code": 200,
                "dataId": "test_data_xxxx",
                "extras": {
    
                },
                "msg": "OK",
                "results": [
                    {
                        "label": "ocr",
                        "ocrData": [
                            "This topic describes the details of calling an asynchronous image detection task,"
                        ],
                        "ocrLocations": [
                            {
                                "h": 19,
                                "text": "This topic describes the details of calling an asynchronous image detection task,",
                                "w": 362,
                                "x": 31,
                                "y": 11
                            }
                        ],
                        "rate": 99.91,
                        "scene": "ocr",
                        "suggestion": "review"
                    }
                ],
                "taskId": "aaa25f95-4892-4d6b-aca9-7939bc6e9baa-148619876****",
                "url": "https://www.aliyundoc.com/test_image_xxxx.png"
            }
        ],
        "msg": "OK",
        "requestId": "992C7849-AA45-4055-8F82-8D44D64C15E3"
    }

  • Advanced OCR

    {
        "msg": "OK",
        "code": 200,
        "data": [
            {
                "msg": "OK",
                "code": 200,
                "dataId": "test_data_xxxx",
                "extras": {
    
                },
                "results": [
                    {
                        "ocrData": [
                            "Light filters through the forest, scattered like remaining snow "
                        ],
                        "ocrDetailInfo": {
                            "wordsInfo": [
                                {
                                    "prob": 99,
                                    "pos": [
                                        {
                                            "x": 37,
                                            "y": 86
                                        },
                                        {
                                            "x": 123,
                                            "y": 86
                                        },
                                        {
                                            "x": 123,
                                            "y": 109
                                        },
                                        {
                                            "x": 37,
                                            "y": 109
                                        }
                                    ],
                                    "word": "Light filters through the forest",
                                    "charInfo": [
                                        {
                                            "prob": 99,
                                            "w": 20,
                                            "h": 20,
                                            "x": 39,
                                            "y": 86,
                                            "word": "For"
                                        },
                                        {
                                            "prob": 99,
                                            "w": 5,
                                            "h": 20,
                                            "x": 63,
                                            "y": 86,
                                            "word": "est"
                                        },
                                        {
                                            "prob": 99,
                                            "w": 17,
                                            "h": 20,
                                            "x": 72,
                                            "y": 86,
                                            "word": "leak"
                                        },
                                        {
                                            "prob": 99,
                                            "w": 17,
                                            "h": 20,
                                            "x": 103,
                                            "y": 86,
                                            "word": "ing"
                                        }
                                    ],
                                    "direction": 0
                                },
                                {
                                    "prob": 99,
                                    "pos": [
                                        {
                                            "x": 73,
                                            "y": 113
                                        },
                                        {
                                            "x": 174,
                                            "y": 113
                                        },
                                        {
                                            "x": 174,
                                            "y": 136
                                        },
                                        {
                                            "x": 73,
                                            "y": 136
                                        }
                                    ],
                                    "word": "Scattered like remaining snow",
                                    "charInfo": [
                                        {
                                            "prob": 99,
                                            "w": 19,
                                            "h": 20,
                                            "x": 74,
                                            "y": 113,
                                            "word": "Scat"
                                        },
                                        {
                                            "prob": 99,
                                            "w": 16,
                                            "h": 20,
                                            "x": 97,
                                            "y": 113,
                                            "word": "ter"
                                        },
                                        {
                                            "prob": 99,
                                            "w": 13,
                                            "h": 20,
                                            "x": 117,
                                            "y": 113,
                                            "word": "ed"
                                        },
                                        {
                                            "prob": 99,
                                            "w": 16,
                                            "h": 20,
                                            "x": 134,
                                            "y": 113,
                                            "word": "like"
                                        },
                                        {
                                            "prob": 99,
                                            "w": 16,
                                            "h": 20,
                                            "x": 154,
                                            "y": 113,
                                            "word": "snow"
                                        }
                                    ],
                                    "direction": 0
                                }
                            ],
                            "wordNum": 2
                        },
                        "scene": "ocr",
                        "label": "ocr",
                        "suggestion": "review"
                    }
                ],
                "taskId": "aaa25f95-4892-4d6b-aca9-7939bc6e9baa-148619876****",
                "url": "https://www.aliyundoc.com/test_image_xxxx.png"
            }
        ],
        "requestId": "03E6B458-8DDD-4D44-8856-3216E660201E"
    }
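
Both response shapes carry the combined text in ocrData, so the recognized text can be collected in the same way for the standard and advanced versions. The Python sketch below covers static images only, because the structure of the frames parameter for GIF images is not described in this topic. The function name recognized_text is hypothetical.

```python
from typing import Any, Dict, List, Mapping


def recognized_text(response: Mapping[str, Any]) -> Dict[str, List[str]]:
    """Map each successful taskId to the strings found in its ocrData arrays."""
    out: Dict[str, List[str]] = {}
    for item in response.get("data", []):
        if item.get("code") != 200:
            continue  # skip tasks that failed or returned no result
        texts: List[str] = []
        for result in item.get("results", []):
            # label "ocr" means text was found; "normal" means no text.
            if result.get("scene") == "ocr" and result.get("label") == "ocr":
                texts.extend(result.get("ocrData", []))
        out[item["taskId"]] = texts
    return out
```

A task whose label is normal maps to an empty list, which distinguishes "no text found" from a task that is absent because it did not succeed.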