Advanced parameters for job management

Jobs support advanced parameters that control retry behavior, concurrency, history retention, and subtask distribution. Use these parameters to tune job execution for reliability, throughput, and resource protection.

General parameters

These parameters apply to all execution modes.

| Parameter | Description | Default value | When to change |
| --- | --- | --- | --- |
| Task failure retry count | Maximum number of automatic retries after a job fails. For example, a job fails if the worker running it restarts; a retry count greater than 0 lets the system rerun the job automatically. A value of 0 means no retries. | 0 | Set to a value greater than 0 when failures are transient, such as worker restarts or temporary network issues. Do not retry jobs that fail because of logic errors; retries will not resolve them. |
| Task failure retry interval | Wait time in seconds between consecutive retries. | 30 | Increase if the retry target needs time to recover, such as a downstream service restarting. Decrease for fast-recovering dependencies. |
| Task concurrency | Maximum number of instances of the same job that can run at the same time. A value of 1 disallows concurrent execution. | 1 | Increase only if the job is idempotent and safe to run in parallel. Keep at 1 for jobs that modify shared state. |
| Cleaning strategy | Cleanup policy for job execution history. | Keep last N entries | Change if you need a time-based cleanup policy instead of a count-based one. |
| Retained Number | Number of execution history records to keep. Older records are deleted when this limit is exceeded. | 300 | Increase for a longer audit trail. Decrease to save storage. |
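
The interaction between Task failure retry count and Task failure retry interval can be pictured with a minimal Python sketch. This is an illustration only, not the scheduler's implementation: `run_with_retries` and its arguments are hypothetical names, and the `sleep` argument exists only so the wait can be replaced in tests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    job: Callable[[], T],
    retry_count: int = 0,             # mirrors "Task failure retry count" (default 0)
    retry_interval_s: float = 30.0,   # mirrors "Task failure retry interval" (default 30)
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run job once, then rerun it up to retry_count times if it raises."""
    retries_used = 0
    while True:
        try:
            return job()
        except Exception:
            # No retries left: surface the failure to the caller.
            if retries_used >= retry_count:
                raise
            retries_used += 1
            # Wait before the next attempt so the dependency can recover.
            sleep(retry_interval_s)
```

With a retry count of 2, a job that fails twice and then succeeds runs three times in total and waits for the interval twice. With a retry count of 0, the first failure is final.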

Distributed execution parameters

These parameters apply only to Visual MapReduce, MapReduce, and Shard run execution modes unless noted otherwise.

Subtask concurrency and retries

| Parameter | Description | Default value | When to change |
| --- | --- | --- | --- |
| Number of single-machine concurrent subtasks | Maximum number of subtasks that run concurrently on a single worker. | 5 | Increase to speed up execution when the worker has available CPU and memory. Decrease if downstream services or databases cannot handle the load. |
| Number of failed retries of subtasks | Maximum number of automatic retries for a failed subtask. | 0 | Set to a value greater than 0 when subtask failures are transient, such as temporary network timeouts. |
| Sub-task failure retry interval | Wait time in seconds between consecutive subtask retries. | 0 | Increase if the subtask depends on a recovering external resource. |
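
As a rough picture of what a per-worker concurrency cap does, the following Python sketch bounds the number of in-flight subtasks with a thread pool and records the peak that was actually reached. The names `run_subtasks` and `slow_square` are hypothetical, and the agent's real execution engine is not described in this document.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_subtasks(subtasks, handler, max_concurrent=5):
    """Run handler over subtasks with at most max_concurrent in flight.

    Returns (results in input order, peak number of concurrent subtasks).
    max_concurrent mirrors "Number of single-machine concurrent subtasks".
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def tracked(subtask):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            return handler(subtask)
        finally:
            with lock:
                running -= 1

    # The pool size is the concurrency cap: extra subtasks wait for a free thread.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        results = list(pool.map(tracked, subtasks))
    return results, peak


def slow_square(x):
    """Stand-in subtask that spends a little time 'working'."""
    time.sleep(0.01)
    return x * x
```

Calling `run_subtasks(range(20), slow_square, max_concurrent=5)` returns the squares in order, and the reported peak never exceeds 5. Raising the cap shortens the run only while the worker and its downstream dependencies have spare capacity.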

Failover and master node behavior

| Parameter | Applicable modes | Description | When to change |
| --- | --- | --- | --- |
| Subtask Failover Strategy | Visual MapReduce, MapReduce, Shard run | Controls whether a subtask is reassigned to a different worker when the worker that was running it fails and stops. When enabled, the system may run a subtask more than once during failover, so make sure your subtask logic is idempotent before enabling this option. Requires agent V1.8.13 or later. | Enable when you need subtask-level fault tolerance. Keep disabled if subtask logic is not idempotent. |
| The master node participates in the execution | Visual MapReduce, MapReduce, Shard run | Controls whether the master node also runs subtasks. At least two workers must be available. Requires agent V1.8.13 or later. | Turn off when the job generates an extremely large number of subtasks. Dedicating the master node to coordination prevents it from becoming a bottleneck. |
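
Because failover can run the same subtask more than once, the subtask logic needs to tolerate repeats. One common way to get there is to key each side effect on a subtask identifier and skip work that has already been recorded. The sketch below assumes a hypothetical `IdempotentSubtaskHandler` class and uses an in-memory dictionary purely for illustration; since a failover rerun lands on a different worker, a real job would need a durable store shared by all workers, such as a database table with a unique key.

```python
class IdempotentSubtaskHandler:
    """Apply a side effect at most once per subtask ID (illustrative only)."""

    def __init__(self, side_effect):
        self._side_effect = side_effect
        # In-memory record of completed subtasks. A real job needs shared,
        # durable storage, because the rerun happens on another worker.
        self._completed = {}

    def handle(self, subtask_id, payload):
        # A repeated delivery returns the recorded result without redoing the work.
        if subtask_id in self._completed:
            return self._completed[subtask_id]
        result = self._side_effect(payload)
        self._completed[subtask_id] = result
        return result
```

Handling the same subtask ID twice applies the side effect once and returns the same result both times, which is the property the failover option relies on.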

Subtask distribution

The Subtask distribution method parameter determines how the master node assigns subtasks to workers:

| Model | Behavior | Best for | Limitation |
| --- | --- | --- | --- |
| Push model (default) | The master node evenly distributes subtasks to workers. | Most workloads with uniform subtask processing times. | The slowest worker can limit overall throughput. |
| Pull model | Each worker pulls subtasks from the master node on demand. Faster workers automatically pull more subtasks, which avoids the bottleneck effect. | Workloads where subtask processing times vary across workers. | All pending subtasks are cached on the master node, which increases memory usage. Do not use if you distribute more than 10,000 subtasks at a time. |
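
The difference between the two models can be illustrated with a small Python simulation that compares total completion time for an even split against on-demand pulling. It is a simplified model under stated assumptions: each worker processes one subtask at a time at a fixed cost per subtask, and the function names are hypothetical.

```python
import heapq


def push_makespan(num_subtasks, seconds_per_subtask):
    """Even split: every worker receives the same share of subtasks.

    seconds_per_subtask[i] is the time worker i needs for one subtask.
    Any remainder goes to the first workers.
    """
    workers = len(seconds_per_subtask)
    base, extra = divmod(num_subtasks, workers)
    return max(
        (base + (1 if i < extra else 0)) * cost
        for i, cost in enumerate(seconds_per_subtask)
    )


def pull_makespan(num_subtasks, seconds_per_subtask):
    """On demand: whichever worker becomes free first takes the next subtask."""
    free_at = [(0.0, i) for i in range(len(seconds_per_subtask))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(num_subtasks):
        start, worker = heapq.heappop(free_at)
        done = start + seconds_per_subtask[worker]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, worker))
    return finish


# Two fast workers (1 s per subtask) and one slow worker (4 s per subtask).
print(push_makespan(12, [1, 1, 4]))  # 16: the slow worker holds 4 subtasks
print(pull_makespan(12, [1, 1, 4]))  # 8.0: fast workers take most of the work
```

In this example the even split finishes in 16 seconds because the slow worker gates the job, while on-demand pulling finishes in 8 seconds. When all workers run at the same speed, the two models finish at the same time, and the push model avoids caching pending subtasks on the master node.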

Push model parameters

The following parameters apply to Visual MapReduce and MapReduce modes only.

Distribution policy

Strategy for distributing subtasks to workers. Requires agent V1.10.14 or later.

| Policy | Behavior | Default | When to use |
| --- | --- | --- | --- |
| Polling Scheme | Distributes an equal number of subtasks to each worker. | Yes | Each subtask takes roughly the same amount of time. |
| WorkerLoad optimal strategy | The master node monitors worker loads and assigns more subtasks to less-loaded workers. | No | Subtask processing times vary significantly across workers. |
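
The two policies can be sketched as follows. This assumes that Polling Scheme behaves like round-robin assignment and that worker load can be reduced to a single number per worker; both are simplifications for illustration, and the function names and the load metric are hypothetical.

```python
def assign_round_robin(subtasks, workers):
    """Give each worker an equal share by cycling through the worker list."""
    assignment = {w: [] for w in workers}
    for i, subtask in enumerate(subtasks):
        assignment[workers[i % len(workers)]].append(subtask)
    return assignment


def assign_least_loaded(subtasks, current_load):
    """Send each subtask to the worker with the lowest load at that moment.

    current_load maps worker name to a load figure (hypothetical metric).
    Each assignment is counted as one extra unit of load.
    """
    load = dict(current_load)  # copy so the caller's data is not modified
    assignment = {w: [] for w in load}
    for subtask in subtasks:
        target = min(load, key=load.get)  # ties go to the first worker listed
        assignment[target].append(subtask)
        load[target] += 1
    return assignment
```

For example, `assign_round_robin(range(6), ["a", "b", "c"])` gives every worker two subtasks, while `assign_least_loaded(range(5), {"a": 3, "b": 0, "c": 1})` sends nothing to the already busy worker `a` and splits the subtasks between `b` and `c`.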

Distribution rate

Rate limit for subtask distribution, specified as the number of subtasks distributed per second or per minute. Use this to prevent overwhelming downstream services during large batch operations.
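
A distribution rate limit amounts to pacing: subtasks are handed out no faster than one per fixed interval. The Python sketch below shows the idea with a hypothetical `distribute_at_rate` function. The `clock` and `sleep` arguments are injectable only so the pacing can be tested without real waiting, and a per-minute limit can be expressed by dividing it by 60.

```python
import time


def distribute_at_rate(subtasks, send, rate_per_second,
                       clock=time.monotonic, sleep=time.sleep):
    """Call send(subtask) for each subtask, at most rate_per_second times per second."""
    interval = 1.0 / rate_per_second
    next_slot = clock()
    for subtask in subtasks:
        # Wait until the next send slot if we are ahead of schedule.
        delay = next_slot - clock()
        if delay > 0:
            sleep(delay)
        send(subtask)
        next_slot += interval
```

At a rate of 2 per second, five subtasks go out at roughly 0, 0.5, 1.0, 1.5, and 2.0 seconds after the start, which spreads the load on downstream services instead of delivering it as one burst.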

Pull model parameters

The following parameters appear only when Subtask distribution method is set to Pull model.

| Parameter | Description | Default value | When to change |
| --- | --- | --- | --- |
| Number of subtasks pulled per time | Number of subtasks a worker pulls from the master node in a single request. | 5 | Increase to reduce pull frequency. Decrease for finer-grained load balancing. |
| Subtask queue capacity | Size of the local queue that caches subtasks on each worker. A larger queue reduces pull frequency but increases worker memory usage. | 10 | Increase if subtasks are short-lived and the pull overhead is significant. Decrease to limit worker memory consumption. |
| Global concurrency of subtasks | Maximum total number of concurrent subtasks across all workers. The system does not exceed this limit even if individual workers have spare capacity. | 1000 | Decrease to protect downstream resources. Increase if downstream systems can handle higher parallelism. |
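
How the three pull model parameters relate can be pictured with a single-threaded Python sketch. The `Master` and `Worker` classes and their methods are hypothetical, and where the real system enforces the global limit is an assumption here; the sketch only shows that the pull size bounds each request, the queue capacity bounds what a worker caches locally, and the global limit bounds running subtasks across all workers.

```python
from collections import deque


class Master:
    """Holds all pending subtasks and the job-wide concurrency budget."""

    def __init__(self, subtasks, global_concurrency=1000):
        self.pending = deque(subtasks)
        self.global_concurrency = global_concurrency
        self.running = 0

    def pull(self, max_items):
        """Hand out up to max_items pending subtasks."""
        batch = []
        while self.pending and len(batch) < max_items:
            batch.append(self.pending.popleft())
        return batch

    def try_start(self):
        """Reserve one execution slot; refuse if the global limit is reached."""
        if self.running >= self.global_concurrency:
            return False
        self.running += 1
        return True

    def finish(self):
        """Release one execution slot."""
        self.running -= 1


class Worker:
    """Caches pulled subtasks in a bounded local queue."""

    def __init__(self, master, pull_size=5, queue_capacity=10):
        self.master = master
        self.pull_size = pull_size
        self.queue_capacity = queue_capacity
        self.queue = deque()

    def refill(self):
        """Pull one batch, never exceeding the free space in the local queue."""
        free = self.queue_capacity - len(self.queue)
        batch = self.master.pull(min(self.pull_size, free))
        self.queue.extend(batch)
        return len(batch)
```

With the default pull size of 5 and queue capacity of 10, a worker fills its queue in two requests and the third request returns nothing until it has processed some subtasks. A larger queue means fewer requests to the master node but more subtasks held in worker memory.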

Version requirements

| Parameter | Minimum agent version |
| --- | --- |
| Subtask Failover Strategy | V1.8.13 |
| The master node participates in the execution | V1.8.13 |
| Distribution policy | V1.10.14 |