Image OCR - AI Guardrails - Alibaba Cloud Help Center

Use the Content Moderation SDK for Python to submit synchronous optical character recognition (OCR) tasks and extract text from images in real time.

Prerequisites

Before you begin, ensure that you have:

Python dependencies installed. See Installation for the required Python version. Using an unsupported version causes subsequent operation calls to fail.
The Extension.Uploader utility class downloaded and imported into your project.

Submit synchronous OCR tasks

Use ImageSyncScanRequest with the scenes parameter set to ocr. The following regions are supported:

Region ID	Location
`cn-shanghai`	China (Shanghai)
`cn-beijing`	China (Beijing)
`cn-shenzhen`	China (Shenzhen)
`ap-southeast-1`	Singapore

# coding=utf-8
import json
import os
import uuid

from aliyunsdkcore import client
from aliyunsdkgreen.request.v20180509 import ImageSyncScanRequest
from aliyunsdkgreenextension.request.extension import HttpContentHelper

# Reuse the client across requests to improve performance
# and avoid repeated connection overhead.
# Load credentials from environment variables; do not hard-code them.
clt = client.AcsClient(
    os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID'],
    os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET'],
    "cn-shanghai"
)


def scan_image_for_text(image_url):
    # Create a new request object for each call; request objects cannot be reused.
    request = ImageSyncScanRequest.ImageSyncScanRequest()
    request.set_accept_format('JSON')

    task = {
        "dataId": str(uuid.uuid1()),
        "url": image_url
    }

    request.set_content(HttpContentHelper.toValue({
        "tasks": [task],
        "scenes": ["ocr"]
    }))

    response = clt.do_action_with_exception(request)
    result = json.loads(response)

    if result["code"] == 200:
        for task_result in result["data"]:
            if task_result["code"] == 200:
                print(task_result["results"])
    return result


scan_image_for_text("https://example.com/test.jpg")

Detect an image using its URL

#coding=utf-8
# The following code calls the image OCR API and returns the detection result in real time.
from aliyunsdkcore import client
from aliyunsdkgreen.request.v20180509 import ImageSyncScanRequest
from aliyunsdkgreenextension.request.extension import HttpContentHelper
import json
import os
import uuid

# Reuse the instantiated client across requests to improve detection performance
# and avoid repeated connection establishment.
# Load the RAM user's AccessKey pair from environment variables; do not hard-code it.
clt = client.AcsClient(
    os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID'],
    os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET'],
    "cn-shanghai"
)
# Create a new request for each call. Do not reuse request objects.
request = ImageSyncScanRequest.ImageSyncScanRequest()
request.set_accept_format('JSON')
task = {"dataId": str(uuid.uuid1()),
         "url":"https://example.com/test.jpg"
        }

# Set the detection type for card and certificate recognition using the extras parameter. For more information, see the API documentation.
extras = {"card" : "id-card-front"}
print(task)
# Specify the image to detect. One image corresponds to one detection task.
# Note: Batch detection is slower than single-task detection.
# This example detects a single image; create multiple tasks for batch detection.
request.set_content(HttpContentHelper.toValue({"tasks": [task],
                                               "scenes": ["ocr"],
                                               "extras": extras
                                               }))
response = clt.do_action_with_exception(request)
print(response)
result = json.loads(response)
if 200 == result["code"]:
    taskResults = result["data"]
    for taskResult in taskResults:
        if (200 == taskResult["code"]):
            sceneResults = taskResult["results"]
            print(sceneResults)

Detect a local image file

#coding=utf-8
# The following code calls the image OCR API and returns the detection result in real time.
from aliyunsdkcore import client
from aliyunsdkgreen.request.v20180509 import ImageSyncScanRequest
from aliyunsdkgreenextension.request.extension import HttpContentHelper
from aliyunsdkgreenextension.request.extension import ClientUploader
import json
import os
import sys
import uuid

# Python 2 only: set the default encoding so that local paths containing
# Chinese characters are handled correctly. Python 3 does not need this.
if sys.version_info[0] == 2:
    reload(sys)
    sys.setdefaultencoding('utf-8')

# Reuse the instantiated client across requests to improve detection performance
# and avoid repeated connection establishment.
# Load the RAM user's AccessKey pair from environment variables; do not hard-code it.
clt = client.AcsClient(
    os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID'],
    os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET'],
    "cn-shanghai"
)
# Create a new request for each call. Do not reuse request objects.
request = ImageSyncScanRequest.ImageSyncScanRequest()
request.set_accept_format('JSON')

# Upload the local file to the server. Change the path to your local file path.
uploader = ClientUploader.getImageClientUploader(clt)
url = uploader.uploadFile('d:/test/test.jpg')

task = {"dataId": str(uuid.uuid1()),
         "url":url
        }
# Set the detection type for card and certificate recognition using the extras parameter. For more information, see the API documentation.
extras = {"card" : "id-card-front"}
print(task)
# Specify the image to detect. One image corresponds to one detection task.
# Note: Batch detection is slower than single-task detection.
# This example detects a single image; create multiple tasks for batch detection.
request.set_content(HttpContentHelper.toValue({"tasks": [task],
                                               "scenes": ["ocr"],
                                               "extras": extras
                                               }))
response = clt.do_action_with_exception(request)
print(response)
result = json.loads(response)
if 200 == result["code"]:
    taskResults = result["data"]
    for taskResult in taskResults:
        if (200 == taskResult["code"]):
            sceneResults = taskResult["results"]
            print(sceneResults)

Detect an image using its binary byte array

#coding=utf-8
# The following code calls the image OCR API and returns the detection result in real time.
from aliyunsdkcore import client
from aliyunsdkgreen.request.v20180509 import ImageSyncScanRequest
from aliyunsdkgreenextension.request.extension import HttpContentHelper
from aliyunsdkgreenextension.request.extension import ClientUploader
import json
import os
import sys
import uuid

# Python 2 only: set the default encoding so that local paths containing
# Chinese characters are handled correctly. Python 3 does not need this.
if sys.version_info[0] == 2:
    reload(sys)
    sys.setdefaultencoding('utf-8')

# Reuse the instantiated client across requests to improve detection performance.
# Load the RAM user's AccessKey pair from environment variables; do not hard-code it.
clt = client.AcsClient(
    os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID'],
    os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET'],
    "cn-shanghai"
)
# Create a new request for each call. Do not reuse request objects.
request = ImageSyncScanRequest.ImageSyncScanRequest()
request.set_accept_format('JSON')

# Read a local file as binary data to simulate binary data detection.
# Change the path to your local file path.
# Open the file read-only in binary mode; the with block closes it automatically.
with open('d:/test/test.jpg', "rb") as f:
    imageBytes = f.read()

# Upload the binary file to the server.
uploader = ClientUploader.getImageClientUploader(clt)
url = uploader.uploadBytes(imageBytes)
task = {"dataId": str(uuid.uuid1()),
         "url":url
        }

# Set the detection type for card and certificate recognition using the extras parameter. For more information, see the API documentation.
extras = {"card" : "id-card-front"}
print(task)
# Specify the image to detect. One image corresponds to one detection task.
# Note: Batch detection is slower than single-task detection.
# This example detects a single image; create multiple tasks for batch detection.
request.set_content(HttpContentHelper.toValue({"tasks": [task],
                                               "scenes": ["ocr"],
                                               "extras": extras
                                               }))
response = clt.do_action_with_exception(request)
print(response)
result = json.loads(response)
if 200 == result["code"]:
    taskResults = result["data"]
    for taskResult in taskResults:
        if (200 == taskResult["code"]):
            sceneResults = taskResult["results"]
            print(sceneResults)

Before running the examples, set the environment variables and replace the sample values listed below:

Placeholder	Description
`ALIBABA_CLOUD_ACCESS_KEY_ID`	Environment variable holding your RAM user's AccessKey ID
`ALIBABA_CLOUD_ACCESS_KEY_SECRET`	Environment variable holding your RAM user's AccessKey secret
`cn-shanghai`	Region ID. See the supported regions table above.
`https://example.com/test.jpg`	Publicly accessible URL of the image to scan
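
The values in the table can be checked before the client is created. The following is a minimal sketch, not part of the SDK: `load_settings` and `SUPPORTED_REGIONS` are hypothetical names introduced here, and the region list is copied from the supported regions table in this topic.

```python
import os

# Region IDs copied from the supported regions table in this topic.
SUPPORTED_REGIONS = {"cn-shanghai", "cn-beijing", "cn-shenzhen", "ap-southeast-1"}

# Environment variable names used by the examples in this topic.
REQUIRED_VARS = ("ALIBABA_CLOUD_ACCESS_KEY_ID", "ALIBABA_CLOUD_ACCESS_KEY_SECRET")


def load_settings(region_id="cn-shanghai", environ=None):
    """Hypothetical helper: return (access_key_id, access_key_secret, region_id).

    Raises ValueError if the region is not in the table or a variable is unset.
    """
    env = os.environ if environ is None else environ
    if region_id not in SUPPORTED_REGIONS:
        raise ValueError("Unsupported region: %s" % region_id)
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    return env[REQUIRED_VARS[0]], env[REQUIRED_VARS[1]], region_id
```

Under these assumptions, the returned tuple could be unpacked into the client constructor, for example `client.AcsClient(*load_settings("cn-shanghai"))`.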

Response structure

A successful response returns code: 200 at both the top level and per-task level. The OCR text results are in data[n].results.

Field	Type	Description
`code`	Integer	Top-level status code. `200` indicates success.
`data`	Array	Task results, one entry per submitted image.
`data[n].code`	Integer	Per-task status code. `200` indicates success.
`data[n].results`	Array	OCR output for the image, including the recognized text.
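
The two-level status check described above can be sketched as a small helper that separates successful tasks from failed ones. `collect_ocr_results` is a hypothetical name, and the sample payload only illustrates the fields listed in the table; the contents of each `results` entry are treated as opaque because this topic does not describe them.

```python
def collect_ocr_results(result):
    """Hypothetical helper: split a parsed response into results and failures.

    Returns (succeeded, failed), where succeeded holds the `results` array of
    every task whose code is 200 and failed holds the codes of the other tasks.
    """
    if result.get("code") != 200:
        raise RuntimeError("Request failed with code %s" % result.get("code"))
    succeeded, failed = [], []
    for task_result in result.get("data", []):
        if task_result.get("code") == 200:
            succeeded.append(task_result.get("results", []))
        else:
            failed.append(task_result.get("code"))
    return succeeded, failed


# Illustrative payload only; real responses carry more fields.
sample = {
    "code": 200,
    "data": [
        {"code": 200, "results": ["<ocr output>"]},
        {"code": 400},
    ],
}
succeeded, failed = collect_ocr_results(sample)
print(succeeded, failed)
```

Keeping failed task codes separate makes it possible to retry or log individual images without discarding the tasks that succeeded.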

Performance considerations

Create one task per image. The total response time spans from the first request to completion of the last image in the batch.
Batching multiple images increases the average response time per image. For latency-sensitive workloads, submit images one at a time.
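
As a rough illustration of this trade-off, the sketch below builds one task per image and groups the tasks into request payloads. `build_tasks`, `build_payloads`, and the `batch_size` parameter are hypothetical names; the default of 1 reflects the recommendation for latency-sensitive workloads, and no upper limit on batch size is assumed here.

```python
import uuid


def build_tasks(image_urls):
    """Create one task per image, each with its own dataId."""
    return [{"dataId": str(uuid.uuid1()), "url": url} for url in image_urls]


def build_payloads(image_urls, batch_size=1):
    """Yield payload dicts containing at most batch_size tasks each.

    batch_size=1 submits images one at a time; larger values trade
    per-image latency for fewer requests.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(image_urls), batch_size):
        chunk = image_urls[start:start + batch_size]
        yield {"tasks": build_tasks(chunk), "scenes": ["ocr"]}
```

Each yielded dictionary has the same shape as the argument passed to `HttpContentHelper.toValue` in the examples above, so it could be sent with a new `ImageSyncScanRequest` per payload.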

Billing

OCR cost = number of images moderated × unit price.
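
The formula can be worked through in a few lines. The unit price below is a made-up figure used only to show the arithmetic, not an actual price; `ocr_cost` is a hypothetical helper name.

```python
from decimal import Decimal


def ocr_cost(image_count, unit_price):
    """Return image_count multiplied by unit_price as a Decimal.

    unit_price is passed as a string or Decimal to avoid float rounding.
    """
    if image_count < 0:
        raise ValueError("image_count must not be negative")
    return Decimal(image_count) * Decimal(str(unit_price))


# Hypothetical example: 1,000 images at an assumed unit price of 0.0015.
print(ocr_cost(1000, "0.0015"))
```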