Description of the advanced parameters for job management - Microservices Engine (MSE)

Jobs support advanced parameters that control retry behavior, concurrency, history retention, and subtask distribution. Use these parameters to tune job execution for reliability, throughput, and resource protection.

General parameters

These parameters apply to all execution modes.

Parameter	Description	Default value	When to change
Task failure retry count	Maximum automatic retries after a job fails. For example, if the worker running the job restarts, the job fails. A retry count greater than 0 lets the system automatically rerun the job. A value of 0 means no retries.	0	Set to a value greater than 0 when failures are transient, such as worker restarts or temporary network issues. Do not retry jobs that fail due to logic errors, because retries will not resolve them.
Task failure retry interval	Wait time in seconds between consecutive retries.	30	Increase if the retry target needs time to recover, such as a downstream service restarting. Decrease for fast-recovering dependencies.
Task concurrency	Maximum number of instances of the same job that can run at the same time. A value of 1 means that concurrent execution is not allowed.	1	Increase only if the job is idempotent and safe to run in parallel. Keep at 1 for jobs that modify shared state.
Cleaning strategy	Cleanup policy for job execution history.	Keep last N entries	Change if you need a time-based cleanup policy instead of count-based.
Retained Number	Number of execution history records to keep. Older records are deleted when this limit is exceeded.	300	Increase for a longer audit trail. Decrease to save storage.
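
As a rough illustration of how Task failure retry count and Task failure retry interval interact, the following Python sketch models the behavior described in the table: a job runs once and is rerun up to the retry count, with a wait between attempts. The function `run_with_retries` and its arguments are hypothetical and are not part of any MSE API. The scheduler applies these settings itself, so this is only a model for reasoning about them.

```python
import time
from typing import Callable


def run_with_retries(job: Callable[[], None], retry_count: int = 0,
                     retry_interval_s: float = 30.0,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Run job once, then retry up to retry_count times on failure.

    Returns the number of attempts made and re-raises the last error
    if every attempt fails. Hypothetical model, not the MSE code.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            job()
            return attempts
        except Exception:
            if attempts > retry_count:
                raise  # retries exhausted, the job is reported as failed
            sleep(retry_interval_s)  # wait before the next attempt
```

With the defaults (retry count 0), the job runs exactly once. A retry count of 3 allows up to four attempts in total, separated by the retry interval.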

Distributed execution parameters

These parameters apply only to Visual MapReduce, MapReduce, and Shard run execution modes unless noted otherwise.

Subtask concurrency and retries

Parameter	Description	Default value	When to change
Number of single-machine concurrent subtasks	Maximum subtasks that run concurrently on a single worker.	5	Increase to speed up execution when the worker has available CPU and memory. Decrease if downstream services or databases cannot handle the load.
Number of failed retries of subtasks	Maximum automatic retries for a failed subtask.	0	Set to a value greater than 0 when subtask failures are transient, such as temporary network timeouts.
Sub-task failure retry interval	Wait time in seconds between consecutive subtask retries.	0	Increase if the subtask depends on a recovering external resource.
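
To make the Number of single-machine concurrent subtasks setting concrete, the sketch below bounds in-flight subtasks on one worker with a thread pool and records the peak concurrency it observes. It is a simplified model under the assumption that each subtask is a plain Python callable. The name `run_subtasks` is hypothetical, and the sketch says nothing about how the agent actually schedules threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_subtasks(subtasks, per_worker_concurrency=5):
    """Run callables with at most per_worker_concurrency in flight.

    Returns (results, peak), where peak is the highest number of
    subtasks observed running at the same moment.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def tracked(task):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            return task()
        finally:
            with lock:
                running -= 1

    # The pool size plays the role of the per-worker concurrency limit.
    with ThreadPoolExecutor(max_workers=per_worker_concurrency) as pool:
        results = list(pool.map(tracked, subtasks))
    return results, peak
```

Raising the limit shortens the run only while the worker and the downstream systems have headroom. The peak never exceeds the configured value.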

Failover and master node behavior

Parameter	Applicable modes	Description	When to change
Subtask Failover Strategy	Visual MapReduce, MapReduce, Shard run	Controls whether a subtask is reassigned to a different worker when the worker that was running it fails and is stopped. When enabled, the system may run a subtask more than once during failover. Make sure your subtask logic is idempotent before enabling this option. Requires agent V1.8.13 or later.	Enable when you need subtask-level fault tolerance. Keep disabled if subtask logic is not idempotent.
The master node participates in the execution	Visual MapReduce, MapReduce, Shard run	Controls whether the master node also runs subtasks. At least two workers must be available. Requires agent V1.8.13 or later.	Turn off when the job generates an extremely large number of subtasks. Dedicating the master node to coordination prevents it from becoming a bottleneck.

Subtask distribution

The Subtask distribution method parameter determines how the master node assigns subtasks to workers:

Model	Behavior	Best for	Limitation
Push model (default)	The master node evenly distributes subtasks to workers.	Most workloads with uniform subtask processing times.	The slowest worker can limit overall throughput.
Pull model	Each worker pulls subtasks from the master node on demand. Faster workers automatically pull more subtasks, which avoids the bottleneck effect.	Workloads where subtask processing times vary across workers.	All pending subtasks are cached on the master node, which increases memory usage. Do not use if you distribute more than 10,000 subtasks at a time.
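
The trade-off between the two models can be sketched with a small simulation. It assumes that every subtask costs a fixed number of seconds on a given worker, that pulling happens one subtask at a time, and that distribution itself is free. All of these are simplifications, and the function names are hypothetical.

```python
import heapq


def push_makespan(num_subtasks, secs_per_subtask):
    """Even split up front; any remainder goes to the first workers."""
    n = len(secs_per_subtask)
    base, extra = divmod(num_subtasks, n)
    return max((base + (1 if i < extra else 0)) * secs
               for i, secs in enumerate(secs_per_subtask))


def pull_makespan(num_subtasks, secs_per_subtask):
    """On demand: the worker that becomes free first takes the next subtask."""
    free_at = [(0.0, i) for i in range(len(secs_per_subtask))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(num_subtasks):
        t, i = heapq.heappop(free_at)       # earliest available worker
        done = t + secs_per_subtask[i]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))  # worker is busy until `done`
    return finish
```

For 90 subtasks on three workers that need 1, 1, and 4 seconds per subtask, this model gives 120 seconds for the push model, because the slow worker holds a third of the work, and 40 seconds for the pull model.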

Push model parameters

The following parameters apply to Visual MapReduce and MapReduce modes only.

Distribution policy

Strategy for distributing subtasks to workers. Requires agent V1.10.14 or later.

Policy	Behavior	Default	When to use
Polling Scheme	Distributes an equal number of subtasks to each worker.	Yes	Each subtask takes roughly the same amount of time.
WorkerLoad optimal strategy	The master node monitors worker loads and assigns more subtasks to less-loaded workers.	No	Subtask processing times vary significantly across workers.
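
The difference between the two policies can be sketched as follows. The source does not say how the master node measures worker load, so the sketch assumes a single numeric load score per worker that grows by one for each assigned subtask. Both function names are hypothetical.

```python
def polling_assign(subtasks, workers):
    """Round-robin: worker k receives subtasks k, k+n, k+2n, ..."""
    plan = {w: [] for w in workers}
    for idx, task in enumerate(subtasks):
        plan[workers[idx % len(workers)]].append(task)
    return plan


def load_aware_assign(subtasks, load):
    """Send each subtask to the worker with the lowest current load.

    `load` maps worker -> assumed load score; it is copied, not mutated.
    """
    load = dict(load)
    plan = {w: [] for w in load}
    for task in subtasks:
        target = min(load, key=load.get)  # ties go to the first worker listed
        plan[target].append(task)
        load[target] += 1                 # assumed cost of one subtask
    return plan
```

With three workers and six subtasks, `polling_assign` gives each worker two subtasks. If one worker already carries a load of 4 while the others are idle, `load_aware_assign` sends all six subtasks to the two idle workers.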

Distribution rate

Rate limit for subtask distribution, specified as the number of subtasks distributed per second or per minute. Use this to prevent overwhelming downstream services during large batch operations.
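
When choosing a rate, it helps to estimate how long distribution alone will take. The helper below is a back-of-the-envelope calculation, not an MSE API, and it assumes the master node sustains exactly the configured rate.

```python
def distribution_seconds(num_subtasks, rate, per="second"):
    """Estimated time to hand out num_subtasks at `rate` per second or per minute."""
    seconds_per_unit = {"second": 1, "minute": 60}[per]
    return num_subtasks * seconds_per_unit / rate
```

For example, 60,000 subtasks at 100 per second take about 600 seconds to distribute, while the same batch at 600 per minute takes about 6,000 seconds.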

Pull model parameters

The following parameters appear only when Subtask distribution method is set to Pull model.

Parameter	Description	Default value	When to change
Number of subtasks pulled per time	Number of subtasks a worker pulls from the master node in a single request.	5	Increase to reduce pull frequency. Decrease for finer-grained load balancing.
Subtask queue capacity	Size of the local queue that caches subtasks on each worker. A larger queue reduces pull frequency but increases worker memory usage.	10	Increase if subtasks are short-lived and the pull overhead is significant. Decrease to limit worker memory consumption.
Global concurrency of subtasks	Maximum total concurrent subtasks across all workers. The system does not exceed this limit even if individual workers have spare capacity.	1000	Decrease to protect downstream resources. Increase if downstream systems can handle higher parallelism.
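
The pull model parameters interact with the per-worker concurrency limit. As a hedged sketch, the helpers below assume that the effective parallelism is the smaller of the global limit and the sum of the per-worker limits, and that every pull request returns a full batch. The function names are hypothetical, and the defaults mirror the default values listed in this document.

```python
import math


def effective_concurrency(workers, per_worker_concurrency=5,
                          global_concurrency=1000):
    """Upper bound on subtasks running at once across the cluster."""
    return min(workers * per_worker_concurrency, global_concurrency)


def pull_requests_needed(num_subtasks, pulled_per_request=5):
    """Minimum number of pull requests needed to drain all subtasks."""
    return math.ceil(num_subtasks / pulled_per_request)
```

Under these assumptions, 10 workers run at most 50 subtasks at once, 300 workers are capped at 1,000 by the global limit, and draining 10,000 subtasks takes at least 2,000 pull requests at the default batch size of 5.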

Version requirements

Parameter	Minimum agent version
Subtask Failover Strategy	V1.8.13
The master node participates in the execution	V1.8.13
Distribution policy	V1.10.14
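
When checking an agent against these requirements, compare versions numerically rather than as strings, because "V1.9.2" sorts after "V1.10.14" in a plain string comparison. The sketch below assumes agent versions always follow the `V<major>.<minor>.<patch>` form used in this table. The helper names are hypothetical.

```python
MIN_AGENT_VERSION = {
    "Subtask Failover Strategy": "V1.8.13",
    "The master node participates in the execution": "V1.8.13",
    "Distribution policy": "V1.10.14",
}


def parse_version(text):
    """Turn 'V1.10.14' into (1, 10, 14) so that tuples compare numerically."""
    return tuple(int(part) for part in text.lstrip("Vv").split("."))


def supports(parameter, agent_version):
    """True if agent_version meets the minimum listed for the parameter."""
    return parse_version(agent_version) >= parse_version(MIN_AGENT_VERSION[parameter])
```

For instance, an agent at V1.9.2 meets the V1.8.13 requirement for Subtask Failover Strategy but not the V1.10.14 requirement for the distribution policy.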