Use the Content Moderation SDK for Python to submit synchronous optical character recognition (OCR) tasks and extract text from images in real time.
Prerequisites
Before you begin, ensure that you have:
Python dependencies installed. See Installation for the required Python version. Using an unsupported version causes subsequent operation calls to fail.
The Extension.Uploader utility class downloaded and imported into your project.
Submit synchronous OCR tasks
Use ImageSyncScanRequest with the scenes parameter set to ocr. The following regions are supported:
Region ID | Location |
cn-shanghai | China (Shanghai) |
cn-beijing | China (Beijing) |
cn-shenzhen | China (Shenzhen) |
ap-southeast-1 | Singapore |
# coding=utf-8
import json
import os
import uuid

from aliyunsdkcore import client
from aliyunsdkgreen.request.v20180509 import ImageSyncScanRequest
from aliyunsdkgreenextension.request.extension import HttpContentHelper

# Reuse the client across requests to improve performance
# and avoid repeated connection overhead.
# Load credentials from environment variables; do not hard-code them.
clt = client.AcsClient(
    os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID'],
    os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET'],
    "cn-shanghai"
)


def scan_image_for_text(image_url):
    # Create a new request object for each call; request objects cannot be reused.
    request = ImageSyncScanRequest.ImageSyncScanRequest()
    request.set_accept_format('JSON')
    task = {
        "dataId": str(uuid.uuid1()),
        "url": image_url
    }
    request.set_content(HttpContentHelper.toValue({
        "tasks": [task],
        "scenes": ["ocr"]
    }))
    response = clt.do_action_with_exception(request)
    result = json.loads(response)
    # Print the OCR results of every task that succeeded.
    if result["code"] == 200:
        for task_result in result["data"]:
            if task_result["code"] == 200:
                print(task_result["results"])
    return result


scan_image_for_text("https://example.com/test.jpg")
Detect an image using its URL
#coding=utf-8
# The following code calls the image OCR API and returns the detection result in real time.
import json
import os
import uuid

from aliyunsdkcore import client
from aliyunsdkgreen.request.v20180509 import ImageSyncScanRequest
from aliyunsdkgreenextension.request.extension import HttpContentHelper

# Note: Reuse the instantiated client to improve detection performance. This avoids repeated connection establishment.
# Read the RAM user's AccessKey pair from environment variables instead of hard-coding it.
clt = client.AcsClient(
    os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID'],
    os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET'],
    "cn-shanghai"
)
# Create a new request for each call. Do not reuse request objects.
request = ImageSyncScanRequest.ImageSyncScanRequest()
request.set_accept_format('JSON')
task = {"dataId": str(uuid.uuid1()), "url": "https://example.com/test.jpg"}
# Set the detection type for card and certificate recognition using the extras parameter. For more information, see the API documentation.
extras = {"card": "id-card-front"}
print(task)
# Specify the image to detect. One image corresponds to one detection task.
# Note: Batch detection is slower than single-task detection.
# This example detects a single image; create multiple tasks for batch detection.
request.set_content(HttpContentHelper.toValue({"tasks": [task], "scenes": ["ocr"], "extras": extras}))
response = clt.do_action_with_exception(request)
print(response)
result = json.loads(response)
if 200 == result["code"]:
    taskResults = result["data"]
    for taskResult in taskResults:
        if 200 == taskResult["code"]:
            sceneResults = taskResult["results"]
            print(sceneResults)
Detect a local image file
#coding=utf-8
# The following code calls the image OCR API and returns the detection result in real time.
import json
import os
import sys  # required by the Python 2 encoding block below
import uuid

from aliyunsdkcore import client
from aliyunsdkgreen.request.v20180509 import ImageSyncScanRequest
from aliyunsdkgreenextension.request.extension import HttpContentHelper
from aliyunsdkgreenextension.request.extension import ClientUploader

# Set the encoding rule to support local paths that contain Chinese characters.
# Add the following content in Python 2. You do not need to add it in Python 3.
if sys.version_info[0] == 2:
    reload(sys)
    sys.setdefaultencoding('utf-8')

# Note: Reuse the instantiated client to improve detection performance. This avoids repeated connection establishment.
# Read the RAM user's AccessKey pair from environment variables instead of hard-coding it.
clt = client.AcsClient(
    os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID'],
    os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET'],
    "cn-shanghai"
)
# Create a new request for each call. Do not reuse request objects.
request = ImageSyncScanRequest.ImageSyncScanRequest()
request.set_accept_format('JSON')
# Upload the local file to the server. Change the path to your local file path.
uploader = ClientUploader.getImageClientUploader(clt)
url = uploader.uploadFile('d:/test/test.jpg')
task = {"dataId": str(uuid.uuid1()), "url": url}
# Set the detection type for card and certificate recognition using the extras parameter. For more information, see the API documentation.
extras = {"card": "id-card-front"}
print(task)
# Specify the image to detect. One image corresponds to one detection task.
# Note: Batch detection is slower than single-task detection.
# This example detects a single image; create multiple tasks for batch detection.
request.set_content(HttpContentHelper.toValue({"tasks": [task], "scenes": ["ocr"], "extras": extras}))
response = clt.do_action_with_exception(request)
print(response)
result = json.loads(response)
if 200 == result["code"]:
    taskResults = result["data"]
    for taskResult in taskResults:
        if 200 == taskResult["code"]:
            sceneResults = taskResult["results"]
            print(sceneResults)
Detect an image using its binary byte array
#coding=utf-8
# The following code calls the image OCR API and returns the detection result in real time.
import json
import os
import sys  # required by the Python 2 encoding block below
import uuid

from aliyunsdkcore import client
from aliyunsdkgreen.request.v20180509 import ImageSyncScanRequest
from aliyunsdkgreenextension.request.extension import HttpContentHelper
from aliyunsdkgreenextension.request.extension import ClientUploader

# Set the encoding rule to support local paths that contain Chinese characters.
# Add the following content in Python 2. You do not need to add it in Python 3.
if sys.version_info[0] == 2:
    reload(sys)
    sys.setdefaultencoding('utf-8')

# Read the RAM user's AccessKey pair from environment variables instead of hard-coding it.
clt = client.AcsClient(
    os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID'],
    os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET'],
    "cn-shanghai"
)
# Create a new request for each call. Do not reuse request objects.
request = ImageSyncScanRequest.ImageSyncScanRequest()
request.set_accept_format('JSON')
# Read a local file as binary data to simulate binary data detection.
# Change the path to your local file path.
with open('d:/test/test.jpg', "rb") as f:
    imageBytes = f.read()
# Upload the binary data to the server.
uploader = ClientUploader.getImageClientUploader(clt)
url = uploader.uploadBytes(imageBytes)
task = {"dataId": str(uuid.uuid1()), "url": url}
# Set the detection type for card and certificate recognition using the extras parameter. For more information, see the API documentation.
extras = {"card": "id-card-front"}
print(task)
# Specify the image to detect. One image corresponds to one detection task.
# Note: Batch detection is slower than single-task detection.
# This example detects a single image; create multiple tasks for batch detection.
request.set_content(HttpContentHelper.toValue({"tasks": [task], "scenes": ["ocr"], "extras": extras}))
response = clt.do_action_with_exception(request)
print(response)
result = json.loads(response)
if 200 == result["code"]:
    taskResults = result["data"]
    for taskResult in taskResults:
        if 200 == taskResult["code"]:
            sceneResults = taskResult["results"]
            print(sceneResults)
Replace the placeholder values before running:
Placeholder | Description |
ALIBABA_CLOUD_ACCESS_KEY_ID | Environment variable holding your RAM user's AccessKey ID |
ALIBABA_CLOUD_ACCESS_KEY_SECRET | Environment variable holding your RAM user's AccessKey secret |
cn-shanghai | Region ID. See the supported regions table above. |
https://example.com/test.jpg | Publicly accessible URL of the image to scan |
Response structure
A successful response returns code: 200 at both the top level and per-task level. The OCR text results are in data[n].results.
Field | Type | Description |
code | Integer | Top-level status code. |
data | Array | Task results, one entry per submitted image. |
data[n].code | Integer | Per-task status code. |
data[n].results | Array | OCR output for the image, including the recognized text. |
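As a rough illustration of walking this structure, the sketch below separates successful tasks from failed ones in a response body. The helper name extract_ocr_results and the sample payload are hypothetical. The content of each results entry is a placeholder here, not the real OCR output format, so check the API documentation for the fields it actually carries.

```python
import json


def extract_ocr_results(response_body):
    """Split a response body into successful results and failed tasks.

    Returns (succeeded, failed), where succeeded holds (index, results)
    pairs and failed holds (index, code) pairs.
    """
    result = json.loads(response_body)
    if result.get("code") != 200:
        # Top-level failure: treat the whole request as failed.
        raise RuntimeError("request failed with code %s" % result.get("code"))

    succeeded, failed = [], []
    for index, task_result in enumerate(result.get("data", [])):
        if task_result.get("code") == 200:
            succeeded.append((index, task_result.get("results", [])))
        else:
            failed.append((index, task_result.get("code")))
    return succeeded, failed


# Hypothetical response body for two submitted images. The "results"
# content is a stand-in, not the documented OCR output.
sample = json.dumps({
    "code": 200,
    "data": [
        {"code": 200, "results": [{"scene": "ocr"}]},
        {"code": 400},
    ],
})

succeeded, failed = extract_ocr_results(sample)
print(succeeded)
print(failed)
```

Checking the per-task code separately matters because a request can succeed at the top level while individual images fail.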
Performance considerations
Create one task per image. When a request contains several tasks, its response time runs from submission until the last image in the batch has been processed.
Batching multiple images increases the average response time per image. For latency-sensitive workloads, submit images one at a time.
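The trade-off above can be sketched as a small helper that groups image URLs into request bodies. This is a minimal sketch, not part of the SDK: build_payloads, its batch_size parameter, and the example URLs are all hypothetical. Only the tasks, dataId, url, and scenes keys come from the samples in this topic.

```python
import uuid


def build_payloads(image_urls, batch_size=1):
    """Group image URLs into request bodies of at most batch_size tasks."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    payloads = []
    for start in range(0, len(image_urls), batch_size):
        chunk = image_urls[start:start + batch_size]
        payloads.append({
            # One task per image, each with its own dataId.
            "tasks": [{"dataId": str(uuid.uuid1()), "url": url} for url in chunk],
            "scenes": ["ocr"],
        })
    return payloads


# Hypothetical image URLs for illustration.
urls = [
    "https://example.com/a.jpg",
    "https://example.com/b.jpg",
    "https://example.com/c.jpg",
]

# Latency-sensitive: one image per request.
single = build_payloads(urls)
# Fewer requests, at the cost of a longer wait per response.
batched = build_payloads(urls, batch_size=2)
print(len(single), len(batched))
```

Each payload would then be passed to HttpContentHelper.toValue and set on a newly created ImageSyncScanRequest, as in the samples above.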
Billing
OCR cost = number of images moderated × unit price.
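A quick way to apply this formula is sketched below. The function name estimate_ocr_cost and the unit price used in the call are hypothetical values for illustration only; take the actual unit price from the pricing documentation.

```python
from decimal import Decimal


def estimate_ocr_cost(image_count, unit_price):
    """Return image_count x unit_price as an exact decimal amount."""
    if image_count < 0:
        raise ValueError("image_count must not be negative")
    # Convert through str so float inputs do not introduce binary rounding noise.
    return Decimal(image_count) * Decimal(str(unit_price))


# Hypothetical unit price; not a real price.
cost = estimate_ocr_cost(1200, "0.0025")
print(cost)
```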