Alibaba Cloud Model Studio provides an OpenAI-compatible Batch File API. You submit requests in bulk as files; the system processes them asynchronously and returns the results when all requests complete or the maximum wait time is reached. Batch calls cost 50% of the equivalent real-time calls. The API is suited to data analytics, model evaluation, and other large-scale workloads where latency is not critical.
To use the console, see console guide.
Workflow
Prerequisites
You can call the Batch File API through the OpenAI SDK (Python, Node.js) or HTTP API.
-
Get an API Key: Get and configure your Model Studio API Key as an environment variable
-
Install SDK (optional): Install the OpenAI SDK if you plan to use it.
-
Service endpoints
-
Chinese mainland: https://dashscope.aliyuncs.com/compatible-mode/v1
-
International: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1
Model Studio has released a workspace-specific domain for the Singapore region: https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com. The new dedicated domain delivers superior performance and higher stability for inference requests. We recommend migrating from https://dashscope-intl.aliyuncs.com to the new domain.
{WorkspaceId} is your workspace ID, which can be found on the Workspace Details page in the Model Studio console. The existing domain remains fully functional.
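As a minimal sketch of choosing between the two endpoints listed above, the helper below builds the base URL for a region. The function name `batch_base_url`, the region labels "cn" and "intl", and the `DASHSCOPE_API_KEY` environment variable name in the usage comment are assumptions made for illustration.

```python
from typing import Optional


def batch_base_url(region: str, workspace_id: Optional[str] = None) -> str:
    """Return the OpenAI-compatible base URL for a region.

    "cn" stands for Chinese mainland and "intl" for International; both
    labels are hypothetical and exist only in this sketch.
    """
    if region == "cn":
        return "https://dashscope.aliyuncs.com/compatible-mode/v1"
    if region == "intl":
        if not workspace_id:
            raise ValueError("workspace_id is required for the international endpoint")
        return f"https://{workspace_id}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1"
    raise ValueError(f"unknown region: {region!r}")


# Possible usage with the OpenAI SDK (environment variable name is assumed):
# from openai import OpenAI
# client = OpenAI(api_key=os.environ["DASHSCOPE_API_KEY"], base_url=batch_base_url("cn"))
```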
Scope
China (Beijing)
Supported models
Text generation models: The stable versions and some latest versions of Qwen-Max, Plus, Flash, and Long. Some third-party models (deepseek-r1, deepseek-v3.2, deepseek-v3) are also supported.
Multimodal models: The stable versions and some latest versions of Qwen-VL-Plus, Flash, and OCR.
Text embedding models: All versions of text embedding models.
In the batch processing scenario, the maximum context tokens per request is 256 K for qwen3.7-max, qwen3.7-plus, qwen3.6-plus, qwen3.5-plus, and qwen3.5-flash.
Some models support thinking mode. Enabling this mode generates thinking tokens and increases costs.
The qwen3.6-plus and qwen3.5 series models, such as qwen3.5-plus and qwen3.5-flash, have thinking mode enabled by default. If you use a hybrid thinking model, you must explicitly set the enable_thinking parameter. Set this parameter to true to enable the mode or false to disable it.
In the JSONL request body, enable_thinking is a top-level parameter of body and must be placed at the same level as model. Do not place it inside extra_body.
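The placement rule above can be illustrated with a short sketch that builds one JSONL request line with enable_thinking next to model inside body. The helper name `thinking_request` is hypothetical; the field names come from the request examples in this document.

```python
import json


def thinking_request(custom_id, model, prompt, enable_thinking):
    """Build one JSONL line with enable_thinking at the top level of body."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Same level as "model"; not nested under "extra_body".
        "enable_thinking": enable_thinking,
    }
    request = {
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": body,
    }
    return json.dumps(request, ensure_ascii=False)


print(thinking_request("1", "qwen3.5-plus", "What is 2+2?", False))
```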
Singapore
Supported models: qwen-max, qwen-plus, qwen-turbo.
Getting started
Before processing formal tasks, test with the batch-test-model. This test model skips inference and returns a fixed success response, allowing you to verify your API call chain and data format.
Limitations of batch-test-model:
-
Your test file must meet Input file requirements. Maximum size: 1 MB. Maximum lines: 100.
-
Concurrency limit: Up to 2 parallel tasks.
-
Cost: The test model does not incur model inference fees.
Step 1: Prepare the input file
Use OSS for production: This step uploads files through the complimentary storage space, which is intended for quick validation only. The complimentary space has storage limits. For production environments, we recommend using files stored in your own Alibaba Cloud OSS to create Batch tasks without uploading. For details, see Use OSS files to create a Batch task in the Advanced features section.
Prepare a file named test_model.jsonl with the following content:
{"custom_id":"1","method":"POST","url":"/v1/chat/ds-test","body":{"model":"batch-test-model","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Hello! How can I help you?"}]}}
{"custom_id":"2","method":"POST","url":"/v1/chat/ds-test","body":{"model":"batch-test-model","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is 2+2?"}]}}
Multimodal models (e.g., qwen-vl-plus) support file URLs and Base64-encoded inputs:
{"custom_id":"image-url","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-vl-plus","messages":[{"role":"user","content":[{"type":"image_url","image_url":{"url":"https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"}},{"type":"text","text":"Describe this image."}]}]}}
{"custom_id":"image-base64","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-vl-plus","messages":[{"role":"user","content":[{"type":"image_url","image_url":{"url":"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEA8ADwAAD..."}},{"type":"text","text":"Describe this image."}]}]}}
Step 2: Run the code
Select a code snippet for your programming language. Save it in the same directory as your input file and run it. The code handles the full workflow: upload, create task, poll status, and download results.
To customize the file path or other parameters, modify the code as needed.
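A minimal Python sketch of that workflow follows. It assumes the OpenAI SDK is installed, that the API key is stored in an environment variable named `DASHSCOPE_API_KEY`, and that the task status values match the OpenAI Batch API (`completed`, `failed`, `expired`, `cancelled`); the helper names `make_client` and `run_batch` are hypothetical.

```python
import os
import time

# Final states, assumed to match the OpenAI Batch API status names.
FINAL_STATES = {"completed", "failed", "expired", "cancelled"}


def make_client():
    """Create an OpenAI SDK client for the Chinese mainland endpoint."""
    from openai import OpenAI  # imported here so run_batch works with any client

    return OpenAI(
        api_key=os.environ["DASHSCOPE_API_KEY"],  # assumed variable name
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    )


def run_batch(client, input_path, endpoint, poll_seconds=60):
    """Upload the file, create the task, poll its status, download results."""
    # 1. Upload the input file.
    with open(input_path, "rb") as f:
        file_obj = client.files.create(file=f, purpose="batch")

    # 2. Create the batch task.
    batch = client.batches.create(
        input_file_id=file_obj.id,
        endpoint=endpoint,
        completion_window="24h",
    )

    # 3. Poll until the task reaches a final state.
    while batch.status not in FINAL_STATES:
        time.sleep(poll_seconds)
        batch = client.batches.retrieve(batch.id)

    # 4. Download whichever result files exist.
    if batch.output_file_id:
        client.files.content(batch.output_file_id).write_to_file("result.jsonl")
    if batch.error_file_id:
        client.files.content(batch.error_file_id).write_to_file("error.jsonl")
    return batch
```

For the test file prepared in Step 1, a call could look like `run_batch(make_client(), "test_model.jsonl", "/v1/chat/ds-test")`, with the endpoint matching the url field in the input file.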
Reuse an existing file ID: The ID returned after uploading a file (e.g., file-batch-xxx) can be reused. If the input content remains the same, skip re-uploading and directly create a task with the existing ID:
batch = client.batches.create(
    input_file_id="file-batch-xxx",  # Reuse the existing file ID; no need to re-upload
    endpoint="/v1/chat/completions",
    completion_window="24h"
)
To look up the IDs of previously uploaded Batch files, call client.files.list(purpose="batch").
Step 3: Verify test results
After the task succeeds, the result file result.jsonl contains the fixed response {"content":"This is a test result."}:
{"id":"a2b1ae25-21f4-4d9a-8634-99a29926486c","custom_id":"1","response":{"status_code":200,"request_id":"a2b1ae25-21f4-4d9a-8634-99a29926486c","body":{"created":1743562621,"usage":{"completion_tokens":6,"prompt_tokens":20,"total_tokens":26},"model":"batch-test-model","id":"chatcmpl-bca7295b-67c3-4b1f-8239-d78323bb669f","choices":[{"finish_reason":"stop","index":0,"message":{"content":"This is a test result."}}],"object":"chat.completion"}},"error":null}
{"id":"39b74f09-a902-434f-b9ea-2aaaeebc59e0","custom_id":"2","response":{"status_code":200,"request_id":"39b74f09-a902-434f-b9ea-2aaaeebc59e0","body":{"created":1743562621,"usage":{"completion_tokens":6,"prompt_tokens":20,"total_tokens":26},"model":"batch-test-model","id":"chatcmpl-1e32a8ba-2b69-4dc4-be42-e2897eac9e84","choices":[{"finish_reason":"stop","index":0,"message":{"content":"This is a test result."}}],"object":"chat.completion"}},"error":null}
Run a formal task
Input file requirements
-
Format: UTF-8 encoded JSONL (one independent JSON object per line).
-
Size limits: Maximum 50,000 requests per file, maximum 500 MB.
-
Line limit: Each JSON object must not exceed 6 MB and must fit within the model's context window.
-
Consistency: All requests in the same file must use the same model and the same thinking mode (if applicable).
-
Unique identifier: Each request must include a custom_id field whose value is unique within the file. This field is used to match requests with results.
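The requirements above can be checked locally before uploading. The sketch below is one possible pre-flight check, not an official validator: the function name `validate_batch_file` is hypothetical, 1 MB is assumed to mean 1024 * 1024 bytes, custom_id is assumed to be a string, and the context-window limit is not checked because it depends on the model's tokenizer.

```python
import json

MAX_REQUESTS = 50_000
MAX_FILE_BYTES = 500 * 1024 * 1024  # assumes 1 MB = 1024 * 1024 bytes
MAX_LINE_BYTES = 6 * 1024 * 1024


def validate_batch_file(path):
    """Return a list of problems; an empty list means every check passed."""
    problems, seen_ids, models, thinking = [], set(), set(), set()
    count = total_bytes = 0
    with open(path, "rb") as f:
        for line_no, raw in enumerate(f, start=1):
            total_bytes += len(raw)
            if not raw.strip():
                continue  # blank lines are ignored in this sketch
            count += 1
            if len(raw) > MAX_LINE_BYTES:
                problems.append(f"line {line_no}: larger than 6 MB")
            try:
                req = json.loads(raw.decode("utf-8"))
            except (UnicodeDecodeError, json.JSONDecodeError):
                req = None
            if not isinstance(req, dict):
                problems.append(f"line {line_no}: not a UTF-8 JSON object")
                continue
            custom_id = req.get("custom_id")
            if not isinstance(custom_id, str):  # string type is an assumption
                problems.append(f"line {line_no}: custom_id missing or not a string")
            elif custom_id in seen_ids:
                problems.append(f"line {line_no}: duplicate custom_id {custom_id!r}")
            else:
                seen_ids.add(custom_id)
            body = req.get("body")
            if not isinstance(body, dict):
                body = {}
            models.add(body.get("model"))
            thinking.add(body.get("enable_thinking"))
    if count > MAX_REQUESTS:
        problems.append(f"{count} requests exceed the limit of {MAX_REQUESTS}")
    if total_bytes > MAX_FILE_BYTES:
        problems.append("file is larger than 500 MB")
    if len(models) > 1:
        problems.append("requests use more than one model")
    if len(thinking) > 1:
        problems.append("requests mix different thinking mode settings")
    return problems
```

Calling `validate_batch_file("test_model.jsonl")` on a well-formed file returns an empty list.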
Scenario 1: Text chat
Example file content:
{"custom_id":"1","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-plus","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"Hello! How can I help you?"}]}}
{"custom_id":"2","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-plus","messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is 2+2?"}]}}
Scenario 2: Image and video understanding
Multimodal models (e.g., qwen-vl-plus) support file URLs and Base64-encoded inputs.
{"custom_id":"image-url","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-vl-plus","messages":[{"role":"user","content":[{"type":"image_url","image_url":{"url":"https://dashscope.oss-cn-beijing.aliyuncs.com/images/dog_and_girl.jpeg"}},{"type":"text","text":"Describe this image."}]}]}}
{"custom_id":"image-base64","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-vl-plus","messages":[{"role":"user","content":[{"type":"image_url","image_url":{"url":"data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEA8ADwAAD..."}},{"type":"text","text":"Describe this image."}]}]}}
{"custom_id":"video-url","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-vl-plus","messages":[{"role":"user","content":[{"type":"video","video":"https://example.com/video.mp4"},{"type":"text","text":"Describe this video."}]}]}}
{"custom_id":"video-base64","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-vl-plus","messages":[{"role":"user","content":[{"type":"video","video":["data:image/jpeg;base64,{frame1}","data:image/jpeg;base64,{frame2}","data:image/jpeg;base64,{frame3}"]},{"type":"text","text":"Describe this video."}]}]}}
{"custom_id":"multi-image-url","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-vl-plus","messages":[{"role":"user","content":[{"type":"image_url","image_url":{"url":"https://example.com/image1.jpg"}},{"type":"image_url","image_url":{"url":"https://example.com/image2.jpg"}},{"type":"text","text":"Compare these two images."}]}]}}
{"custom_id":"multi-image-base64","method":"POST","url":"/v1/chat/completions","body":{"model":"qwen-vl-plus","messages":[{"role":"user","content":[{"type":"image_url","image_url":{"url":"data:image/jpeg;base64,{image1_base64}"}},{"type":"image_url","image_url":{"url":"data:image/jpeg;base64,{image2_base64}"}},{"type":"text","text":"Compare these two images."}]}]}}
The Base64 strings in the examples above are truncated. Generate full encodings using the Python code below.
For full details (including file limits, MIME types, and encoding methods), see Pass local files (Base64 encoding or file paths).
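A minimal sketch of such an encoder, assuming the MIME type can be inferred from the file extension; the function name `to_data_url` is hypothetical. The returned string has the data:image/jpeg;base64,... form used in the url field of the examples above.

```python
import base64
import mimetypes


def to_data_url(path):
    """Encode a local file as a Base64 data URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        raise ValueError(f"cannot infer a MIME type for {path!r}")
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The result can be placed directly in an image_url entry, for example `{"type": "image_url", "image_url": {"url": to_data_url("dog.jpeg")}}`, where the file name is only a placeholder.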
JSONL Batch Generation Tool
Use this tool to quickly generate JSON Lines (JSONL) files.
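If the tool is not at hand, the same kind of file can be produced with a few lines of Python. The sketch below is an illustration rather than the tool itself: the function name `build_jsonl`, the sequential custom_id scheme, and the default system prompt are assumptions, while the field layout follows the text chat example above.

```python
import json


def build_jsonl(prompts, model, url="/v1/chat/completions",
                system_prompt="You are a helpful assistant."):
    """Return JSONL text with one chat request per prompt."""
    lines = []
    for i, prompt in enumerate(prompts, start=1):
        request = {
            "custom_id": str(i),  # unique within the file
            "method": "POST",
            "url": url,
            "body": {
                "model": model,
                "messages": [
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": prompt},
                ],
            },
        }
        lines.append(json.dumps(request, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)
```

To save the output, write the returned text to a file opened with `encoding="utf-8"`, for example `open("batch_input.jsonl", "w", encoding="utf-8").write(build_jsonl(["What is 2+2?"], "qwen-plus"))`.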
1. Modify the input file
-
In the test_model.jsonl file, set the model parameter to the target model and set the url field:
| Model type | url |
| --- | --- |
| Text generation/multimodal models | /v1/chat/completions |
| Text embedding models | /v1/embeddings |
-
Or use the "JSONL batch generation tool" above to generate a new file for formal tasks. Ensure the model and url fields are correct.
2. Modify the Getting Started code
-
Change the input file path to your file name.
-
Set the endpoint parameter to match the url field in your input file.
3. Run the code and wait for results
When the task completes, successful request results are saved to the local result.jsonl file. If any requests failed, the error details are saved to the error.jsonl file.
-
Successful results (output_file_id): Each line corresponds to one successful request and includes the custom_id and response.
{"id":"3a5c39d5-3981-4e4c-97f2-e0e821893f03","custom_id":"req-001","response":{"status_code":200,"request_id":"3a5c39d5-3981-4e4c-97f2-e0e821893f03","body":{"created":1768306034,"usage":{"completion_tokens":654,"prompt_tokens":14,"total_tokens":668},"model":"qwen-plus","id":"chatcmpl-3a5c39d5-3981-4e4c-97f2-e0e821893f03","choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"Hello! Hangzhou West Lake is a famous scenic spot in China, located in the western part of Hangzhou City, Zhejiang Province, hence the name \"West Lake\". It is one of China's top ten scenic spots and a World Cultural Heritage site (listed by UNESCO in 2011). It is renowned worldwide for its beautiful natural scenery and profound cultural heritage.\n\n### I. Natural Landscape\nWest Lake is surrounded by mountains on three sides and borders the city on one side, covering an area of approximately 6.39 square kilometers, shaped like a ruyi scepter with rippling blue waters. The lake is naturally or artificially divided into multiple water areas by Solitary Hill, Bai Causeway, Su Causeway, and Yanggong Causeway, forming a layout of \"one mountain, two pagodas, three islands, and three causeways\".\n\nMain attractions include the following:\n- **Spring Dawn at Su Causeway**: During the Northern Song Dynasty, the great literary figure Su Dongpo, while serving as the prefect of Hangzhou, led the dredging of West Lake and used the excavated silt to build a causeway, later named \"Su Causeway\". In spring, peach blossoms and willows create a picturesque scene.\n- **Lingering Snow on Broken Bridge**: Located at the eastern end of Bai Causeway, this is where the reunion scene from the Legend of the White Snake took place. After snowfall in winter, it is particularly famous for its silver-white appearance.\n- **Leifeng Pagoda at Sunset**: Leifeng Pagoda glows golden under the setting sun and was once one of the \"Ten Scenes of West Lake\".\n- **Three Pools Mirroring the Moon**: On Xiaoyingzhou Island in the lake, there are three stone pagodas. During the Mid-Autumn Festival, lanterns can be lit inside the pagodas, creating a harmonious interplay of moonlight, lamplight, and lake reflections.\n- **Autumn Moon over Calm Lake**: Located at the western end of Bai Causeway, it is an excellent spot for viewing the moon over the lake.\n- **Viewing Fish at Flower Harbor**: Known for viewing flowers and fish, with peonies and koi complementing each other beautifully in the garden.\n\n### II. Cultural History\nWest Lake not only boasts beautiful scenery but also carries rich historical and cultural significance:\n- Since the Tang and Song dynasties, numerous literati such as Bai Juyi, Su Dongpo, Lin Bu, and Yang Wanli have left poems here.\n- Bai Juyi oversaw the construction of \"Bai Causeway\" and dredged West Lake, benefiting the local people.\n- Around West Lake are many historical sites, including Yuewang Temple (commemorating national hero Yue Fei), Lingyin Temple (a millennium-old Buddhist temple), Liuhe Pagoda, and Longjing Village (the origin of Longjing tea, one of China's top ten famous teas).\n\n### III. Cultural Symbolism\nWest Lake is regarded as a representative of \"paradise on earth\" and a model of traditional Chinese landscape aesthetics. It embodies the philosophical concept of \"harmony between heaven and humanity\" by integrating natural beauty with cultural depth. Many poems, paintings, and operas feature West Lake, making it an important symbol of Chinese culture.\n\n### IV. Travel Recommendations\n- Best visiting seasons: Spring (March-May) for peach blossoms and willows, Autumn (September-November) for clear skies and cool weather.\n- Recommended ways: Walking, cycling (along the lakeside greenway), or boating on the lake.\n- Local cuisine: West Lake vinegar fish, Longjing shrimp, Dongpo pork, pian'erchuan noodles.\n\nIn summary, Hangzhou West Lake is not just a natural wonder but also a living cultural museum worth exploring in detail. If you ever visit Hangzhou, don't miss this earthly paradise that is \"equally charming in light or heavy makeup\"."}}],"object":"chat.completion"}},"error":null}
{"id":"628312ba-172c-457d-ba7f-3e5462cc6899","custom_id":"req-002","response":{"status_code":200,"request_id":"628312ba-172c-457d-ba7f-3e5462cc6899","body":{"created":1768306035,"usage":{"completion_tokens":25,"prompt_tokens":18,"total_tokens":43},"model":"qwen-plus","id":"chatcmpl-628312ba-172c-457d-ba7f-3e5462cc6899","choices":[{"finish_reason":"stop","index":0,"message":{"role":"assistant","content":"The spring breeze brushes green willows,\nNight rain nourishes red flowers.\nBird songs fill the forest,\nMountains and rivers share the same beauty."}}],"object":"chat.completion"}},"error":null}
-
Failure details (error_file_id): Contains information about failed requests with line numbers and error reasons. See Error codes for troubleshooting.
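Because each result line carries its custom_id, results can be matched back to the original requests regardless of the order in which they were written. The sketch below reads a chat result file into a dictionary; the function name `load_answers` is hypothetical, and the field paths follow the sample result lines in this document. The layout of the error file is not reproduced here, so it is left unparsed.

```python
import json


def load_answers(result_path):
    """Map custom_id to the assistant message content of each successful request."""
    answers = {}
    with open(result_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            row = json.loads(line)
            if row.get("error") is not None:
                continue  # skip rows that report an error
            body = row["response"]["body"]
            answers[row["custom_id"]] = body["choices"][0]["message"]["content"]
    return answers
```

For the test run in Getting started, `load_answers("result.jsonl")` would map both custom_id values to the fixed test response.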
Detailed procedure
The Batch API workflow consists of four steps: upload a file, create a task, query task status, and download results.
If your data is already stored in Alibaba Cloud Object Storage Service (OSS), skip the file upload step and use the OSS file path when creating the batch task. See Create a batch task using an OSS file.
1. Upload file
2. Create a batch task
3. Query and manage batch tasks
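As a hedged sketch of this step with the OpenAI Python SDK, the helpers below list tasks that are still running and cancel one by ID. The helper names are hypothetical, `client` is an already configured OpenAI client, and the status values are assumed to match the OpenAI Batch API.

```python
# Final states, assumed to match the OpenAI Batch API status names.
FINAL_STATES = {"completed", "failed", "expired", "cancelled"}


def list_unfinished(client, limit=20):
    """Return (id, status) pairs for tasks that have not reached a final state."""
    page = client.batches.list(limit=limit)
    return [(b.id, b.status) for b in page.data if b.status not in FINAL_STATES]


def cancel_task(client, batch_id):
    """Request cancellation of a task and return the status it reports."""
    return client.batches.cancel(batch_id).status
```

A single task can be inspected with `client.batches.retrieve(batch_id)`, which returns the same kind of object with `status`, `output_file_id`, and `error_file_id` attributes.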
4. Download Batch result file
Advanced features
Create a batch task using an OSS file
For large files, store them in Alibaba Cloud OSS and reference them via input_file_id to avoid upload size limits.
Method 1: Use file URL
Use an OSS file URL with public-read permission or a pre-signed URL as input_file_id:
batch_job = client.batches.create(
    input_file_id="https://your-bucket.oss-cn-beijing.aliyuncs.com/file.jsonl?Expires=...",
    endpoint="/v1/chat/completions",
    completion_window="24h"
)
Method 2: Use resource identifier (recommended)
-
Complete OSS authorization
Refer to the authorization and tag addition steps in Configuration instructions for importing files from OSS.
-
Parameter configuration
Use an OSS resource identifier in the format oss:{region}:{bucket}/{file_path}:
batch_job = client.batches.create(
    input_file_id="oss:cn-beijing:your-bucket/path/to/file.jsonl",
    endpoint="/v1/chat/completions",
    completion_window="24h"
)
Recommendations:
-
Use a same-region bucket (cn-beijing): this leverages the internal network, reduces latency, improves stability, and avoids inter-region fees.
-
Method 2 (resource identifier) is more secure because it uses RAM authorization instead of public URLs.
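To avoid typos in the identifier, it can be assembled from its parts. This is a small illustrative helper; the name `oss_resource_id` is hypothetical, and it only formats the string in the oss:{region}:{bucket}/{file_path} pattern described above without checking that the object exists.

```python
def oss_resource_id(region, bucket, file_path):
    """Format an OSS resource identifier as oss:{region}:{bucket}/{file_path}."""
    # A leading slash in file_path would produce a double slash, so drop it.
    return f"oss:{region}:{bucket}/{file_path.lstrip('/')}"


print(oss_resource_id("cn-beijing", "your-bucket", "path/to/file.jsonl"))
# prints oss:cn-beijing:your-bucket/path/to/file.jsonl
```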
Set completion notifications
For long-running tasks, use asynchronous notifications instead of polling to reduce resource consumption.
Completion notification is only supported in the Beijing region.
-
Callback: Specify a publicly accessible URL when creating the task.
-
EventBridge message queue: Deeply integrated with the Alibaba Cloud ecosystem. No public IP required.
Method 1: Callback
Method 2: EventBridge message queue
Going live
-
File management:
-
Periodically delete unnecessary files using the OpenAI File delete API to avoid reaching storage limits (10,000 files or 100 GB).
-
Store large files in OSS instead of uploading directly.
-
Task monitoring:
-
Use Callback or EventBridge asynchronous notifications.
-
If polling is required, set the interval to more than 1 minute and use an exponential backoff strategy.
-
Error handling:
-
Implement handling for network errors, API errors, and other exceptions.
-
Download and analyze error details from error_file_id.
-
For common error codes, see Error codes.
-
Cost optimization:
-
Migrate tasks with relaxed latency requirements to the Batch API.
-
Consolidate small tasks into a single batch.
-
Set completion_window appropriately to allow greater scheduling flexibility.
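The polling recommendation can be sketched as follows. The starting delay of 90 seconds, the doubling factor, and the 15-minute cap are illustrative choices rather than documented values, the helper names are hypothetical, and the final status names are assumed to match the OpenAI Batch API.

```python
import itertools
import time


def backoff_intervals(first=90.0, factor=2.0, cap=900.0):
    """Yield polling delays in seconds that grow by `factor` up to `cap`."""
    delay = first
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def wait_for_batch(client, batch_id, sleep=time.sleep):
    """Poll a task with growing delays until it reaches a final state."""
    final_states = {"completed", "failed", "expired", "cancelled"}
    for delay in backoff_intervals():
        batch = client.batches.retrieve(batch_id)
        if batch.status in final_states:
            return batch
        sleep(delay)


print(list(itertools.islice(backoff_intervals(), 5)))
# prints [90.0, 180.0, 360.0, 720.0, 900.0]
```

Passing `sleep` as a parameter keeps the loop easy to test; in production the default `time.sleep` is used.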
Utility tools
CSV to JSONL
JSONL results to CSV
Rate limits
| API | Rate limit (per Alibaba Cloud account) |
| --- | --- |
| Create task | 1,000 calls/minute; up to 1,000 concurrent tasks |
| Query task | 1,000 calls/minute |
| Query task list | 100 calls/minute |
| Cancel task | 1,000 calls/minute |
Billing
-
Unit price: The input and output tokens for all successful requests are charged at 50% of the real-time inference price for the corresponding model. For more information, see Model list.
-
Billing scope:
-
Only requests successfully executed within a task are billed.
-
Requests that fail because of file parsing errors, task execution failures, or row-level errors do not incur charges.
-
For canceled tasks, requests successfully completed before the cancellation are still billed as normal.
-
Batch inference is a separate billing item. It supports the AI general-purpose savings plan, but not other discounts such as subscriptions (other savings plans) or free quotas for new users. Features such as context cache are also not supported.
-
Some models, such as qwen3.5-plus and qwen3.5-flash, have thinking mode enabled by default. This mode generates additional thinking tokens, which are billed at the output token price and increase costs. To control costs, set the `enable_thinking` parameter based on task complexity. For more information, see Deep thinking.
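The pricing rule can be expressed as a short calculation. In the sketch below, the function name and the per-million-token price unit are assumptions, and the prices in the example call are placeholders rather than real rates; thinking tokens are counted as output tokens because they are billed at the output token price.

```python
def estimate_batch_cost(input_tokens, output_tokens, input_price, output_price):
    """Estimate the batch charge for successful requests.

    input_price and output_price are the real-time prices per 1,000,000
    tokens (hypothetical unit); output_tokens includes thinking tokens.
    """
    realtime_cost = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return 0.5 * realtime_cost  # batch calls are charged at 50% of real-time


# Placeholder prices: 2.0 per million input tokens, 6.0 per million output tokens.
print(estimate_batch_cost(1_000_000, 500_000, 2.0, 6.0))
# prints 2.5
```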
Error codes
If a request fails and returns an error message, see Error codes for a solution.
FAQ
-
How do I choose between Batch Chat and Batch File?
Use Batch File when you need to process a large file containing many requests asynchronously. Use Batch Chat when your business logic requires submitting many independent conversation requests synchronously with high concurrency.
-
How is the Batch File API billed? Do I need to purchase a separate package?
Batch uses pay-as-you-go billing based on tokens consumed by successful requests. No separate resource package is required.
-
Are submitted batch files executed in order?
No. The system uses dynamic scheduling based on compute load and does not guarantee execution order. Tasks may be delayed when resources are constrained.
-
How long does it take to complete a submitted batch file?
Execution time depends on system resources and task scale. If a task does not complete within the completion_window, it expires. Unprocessed requests in expired tasks are not executed and do not incur charges.
Scenario recommendations: Use real-time calls for scenarios requiring strict real-time model inference. Use batch calls for large-scale data processing scenarios that can tolerate delay.