When a MaxCompute SQL job runs slower than expected, the cause is often a mismatch between task count and available resources. The split_size hint controls how MaxCompute divides table data into chunks for parallel processing. Each chunk becomes one map task, so the split size directly determines job concurrency. Adjust this value to balance task count with available resources: smaller splits create more map tasks for higher parallelism, while larger splits reduce task count to ease scheduling pressure.
SELECT ...
FROM <table_name> /*+split_size(<value_in_mb>)*/
...
The default split size is 256 MB.
How it works
MaxCompute reads a table's data and divides it into splits of the specified size. The number of map tasks is approximately:
map tasks = table data size / split_size
For example, a 10 GB table with split_size(256) creates about 40 map tasks. Dropping to split_size(64) creates about 160 map tasks — four times the parallelism.
MaxCompute honors the split_size hint for most tables. One exception: if the query optimizer uses a clustered table's bucketing properties for optimization, it ignores the hint for that table.
Tune the split size
A mismatch between task count and available resources is one of the most common causes of slow SQL jobs:
Too many tasks for available resources -- Tasks queue for execution slots, and scheduling overhead increases. The job spends more time waiting than computing.
Too few tasks for available resources -- Resources sit idle while a small number of tasks process large data chunks sequentially. The job takes longer than necessary.
|
Scenario |
Action |
Why |
|
Tasks are queuing for resources |
Increase |
Fewer map tasks reduce contention and scheduling overhead |
|
Resources are idle while the job runs |
Decrease |
More map tasks run in parallel, improving resource utilization |
Set split_size in multiples of 256 MB (e.g., 256, 512, 1024) for optimal performance.
Examples
All examples place the hint immediately after the table name in the FROM clause.
Set the split size to 1 MB when reading src. The src table data is divided into 1 MB chunks, each processed by a separate map task. For a 10 GB table, this creates about 10,240 map tasks instead of the default 40:
SELECT a.key
FROM src a /*+split_size(1)*/
JOIN src2 b ON a.key = b.key;
Increase the split size to 512 MB to reduce task count on a large table. For a 10 GB table, this cuts the task count from 40 to about 20:
SELECT *
FROM large_table /*+split_size(512)*/
WHERE dt = '2024-01-01';
Behavior with clustered tables and multiple hints
Clustered tables: The query optimizer ignores the
split_sizehint for a clustered table if it uses the table's bucketing properties for optimization.-
Multiple hints on the same table: If a query reads the same table multiple times with different
split_sizehints, MaxCompute uses the smallest value. For example:split_size(1)andsplit_size(10)on the same table -- MaxCompute uses1.split_size(1)on one reference and no hint on another -- MaxCompute uses1(not the 256 MB default).
Reference
|
Property |
Value |
Description |
|
Hint name |
|
The name used in the hint comment |
|
Syntax |
|
Inline comment syntax placed after the table name |
|
Placement |
After the table name in a query |
Applies to the specific table it follows |
|
Unit |
MB |
The split size value is interpreted in megabytes |
|
Default |
256 |
Used when no hint is specified |
|
Recommended values |
Multiples of 256 (e.g., 256, 512, 1024) |
Aligns with internal block boundaries for optimal performance |
|
Conflict resolution |
Smallest value across all hints on the same table |
Applies when the same table appears multiple times with different hints |