Split Size Hint-MaxCompute(MaxCompute)-阿里云帮助中心

When a MaxCompute SQL job runs slower than expected, the cause is often a mismatch between task count and available resources. The split_size hint controls how MaxCompute divides table data into chunks for parallel processing. Each chunk becomes one map task, so the split size directly determines job concurrency. Adjust this value to balance task count with available resources: smaller splits create more map tasks for higher parallelism, while larger splits reduce task count to ease scheduling pressure.

SELECT ...
FROM <table_name> /*+split_size(<value_in_mb>)*/
...

The default split size is 256 MB.

How it works

MaxCompute reads a table's data and divides it into splits of the specified size. The number of map tasks is approximately:

map tasks = table data size / split_size

For example, a 10 GB table with split_size(256) creates about 40 map tasks. Dropping to split_size(64) creates about 160 map tasks — four times the parallelism.

MaxCompute honors the split_size hint for most tables. One exception: if the query optimizer uses a clustered table's bucketing properties for optimization, it ignores the hint for that table.

Tune the split size

A mismatch between task count and available resources is one of the most common causes of slow SQL jobs:

Too many tasks for available resources -- Tasks queue for execution slots, and scheduling overhead increases. The job spends more time waiting than computing.
Too few tasks for available resources -- Resources sit idle while a small number of tasks process large data chunks sequentially. The job takes longer than necessary.

Scenario	Action	Why
Tasks are queuing for resources	Increase `split_size` (e.g., 512 or 1024)	Fewer map tasks reduce contention and scheduling overhead
Resources are idle while the job runs	Decrease `split_size` (e.g., 64 or 128)	More map tasks run in parallel, improving resource utilization

Set split_size in multiples of 256 MB (e.g., 256, 512, 1024) for optimal performance.

Examples

All examples place the hint immediately after the table name in the FROM clause.

Set the split size to 1 MB when reading src. The src table data is divided into 1 MB chunks, each processed by a separate map task. For a 10 GB table, this creates about 10,240 map tasks instead of the default 40:

SELECT a.key
FROM src a /*+split_size(1)*/
JOIN src2 b ON a.key = b.key;

Increase the split size to 512 MB to reduce task count on a large table. For a 10 GB table, this cuts the task count from 40 to about 20:

SELECT *
FROM large_table /*+split_size(512)*/
WHERE dt = '2024-01-01';

Behavior with clustered tables and multiple hints

Clustered tables: The query optimizer ignores the split_size hint for a clustered table if it uses the table's bucketing properties for optimization.
Multiple hints on the same table: If a query reads the same table multiple times with different split_size hints, MaxCompute uses the smallest value. For example:
- split_size(1) and split_size(10) on the same table -- MaxCompute uses 1.
- split_size(1) on one reference and no hint on another -- MaxCompute uses 1 (not the 256 MB default).

Reference

Property	Value	Description
Hint name	`split_size`	The name used in the hint comment
Syntax	`/+split_size(<value_in_mb>)/`	Inline comment syntax placed after the table name
Placement	After the table name in a query	Applies to the specific table it follows
Unit	MB	The split size value is interpreted in megabytes
Default	256	Used when no hint is specified
Recommended values	Multiples of 256 (e.g., 256, 512, 1024)	Aligns with internal block boundaries for optimal performance
Conflict resolution	Smallest value across all hints on the same table	Applies when the same table appears multiple times with different hints