Jobs support advanced parameters that control retry behavior, concurrency, history retention, and subtask distribution. Use these parameters to tune job execution for reliability, throughput, and resource protection.
General parameters
These parameters apply to all execution modes.
| Parameter | Description | Default value | When to change |
|---|---|---|---|
| Task failure retry count | Maximum number of automatic retries after a job fails. For example, if the worker running the job restarts, the job fails, and a retry count greater than 0 lets the system rerun it automatically. A value of 0 means no retries. | 0 | Set to a value greater than 0 when failures are transient, such as worker restarts or temporary network issues. Do not rely on retries for jobs that fail because of logic errors; retrying will not resolve them. |
| Task failure retry interval | Wait time in seconds between consecutive retries. | 30 | Increase if the retry target needs time to recover, such as a downstream service restarting. Decrease for fast-recovering dependencies. |
| Task concurrency | Maximum number of instances of the same job that can run at the same time. A value of 1 means that concurrent execution is not allowed. | 1 | Increase only if the job is idempotent and safe to run in parallel. Keep at 1 for jobs that modify shared state. |
| Cleaning strategy | Cleanup policy for job execution history. | Keep last N entries | Change if you need a time-based cleanup policy instead of count-based. |
| Retained Number | Number of execution history records to keep. Older records are deleted when this limit is exceeded. | 300 | Increase for a longer audit trail. Decrease to save storage. |
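
The interaction between Task failure retry count and Task failure retry interval can be pictured with a small retry loop. This is only an illustrative sketch, not the scheduler's implementation: the function name `run_with_retries` and its parameters are hypothetical, and the injectable `sleep` argument exists only so the waiting behavior can be observed without real delays.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    job: Callable[[], T],
    retry_count: int = 0,            # mirrors the documented default of 0
    retry_interval_s: float = 30.0,  # mirrors the documented default of 30
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `job` once, then rerun it up to `retry_count` times on failure."""
    retries_used = 0
    while True:
        try:
            return job()
        except Exception:
            if retries_used >= retry_count:
                raise  # retries exhausted: surface the last failure
            retries_used += 1
            sleep(retry_interval_s)  # wait before the next attempt
```

Under this model, a job that fails twice and then succeeds needs a retry count of at least 2 and runs three times in total. A job that fails because of a logic error raises on every attempt, so retries only delay the final failure.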
Distributed execution parameters
These parameters apply only to Visual MapReduce, MapReduce, and Shard run execution modes unless noted otherwise.
Subtask concurrency and retries
| Parameter | Description | Default value | When to change |
|---|---|---|---|
| Number of single-machine concurrent subtasks | Maximum subtasks that run concurrently on a single worker. | 5 | Increase to speed up execution when the worker has available CPU and memory. Decrease if downstream services or databases cannot handle the load. |
| Number of failed retries of subtasks | Maximum automatic retries for a failed subtask. | 0 | Set to a value greater than 0 when subtask failures are transient, such as temporary network timeouts. |
| Sub-task failure retry interval | Wait time in seconds between consecutive subtask retries. | 0 | Increase if the subtask depends on a recovering external resource. |
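
The single-machine concurrency limit behaves much like a bounded worker pool. The sketch below is an analogy under that assumption, not the agent's actual code; `run_subtasks` and `handler` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

S = TypeVar("S")
R = TypeVar("R")


def run_subtasks(
    subtasks: Iterable[S],
    handler: Callable[[S], R],
    max_concurrent: int = 5,  # mirrors the documented default of 5
) -> List[R]:
    """Process subtasks with at most `max_concurrent` running at once."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # map() preserves input order; the pool size caps parallelism.
        return list(pool.map(handler, subtasks))
```

Raising `max_concurrent` shortens the total run time only while the worker has spare CPU and memory and the downstream systems can absorb the additional parallel requests.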
Failover and master node behavior
| Parameter | Applicable modes | Description | When to change |
|---|---|---|---|
| Subtask Failover Strategy | Visual MapReduce, MapReduce, Shard run | Controls whether a subtask is reassigned to a different worker when the worker that was running it fails and stops. When enabled, the system may run a subtask more than once during failover, so make sure your subtask logic is idempotent before enabling this option. Requires agent V1.8.13 or later. | Enable when you need subtask-level fault tolerance. Keep disabled if subtask logic is not idempotent. |
| The master node participates in the execution | Visual MapReduce, MapReduce, Shard run | Controls whether the master node also runs subtasks. At least two workers must be available. Requires agent V1.8.13 or later. | Turn off when the job generates an extremely large number of subtasks. Dedicating the master node to coordination prevents it from becoming a bottleneck. |
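
Because failover may run the same subtask more than once, the subtask logic has to tolerate duplicates. One common way to achieve this is to record a result per subtask ID and skip the side effect on a repeat. The sketch below assumes every subtask carries a stable, unique ID; the class `IdempotentProcessor` is hypothetical, and the in-memory dictionary stands in for a store shared by all workers (for example, a database table with a unique key), since a reassigned subtask runs on a different machine.

```python
from typing import Any, Callable, Dict


class IdempotentProcessor:
    """Runs the side effect at most once per subtask ID."""

    def __init__(self) -> None:
        # Assumption: in production this would be a durable store
        # shared across workers, not a per-process dictionary.
        self._results: Dict[str, Any] = {}

    def process(self, subtask_id: str, payload: Any,
                side_effect: Callable[[Any], Any]) -> Any:
        if subtask_id in self._results:
            # Duplicate delivery (for example, after failover):
            # return the recorded result without repeating the work.
            return self._results[subtask_id]
        result = side_effect(payload)
        self._results[subtask_id] = result
        return result
```

This sketch does not cover a worker that crashes between performing the side effect and recording the result. Handling that case requires the side effect and the record to be committed atomically.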
Subtask distribution
The Subtask distribution method parameter determines how the master node assigns subtasks to workers:
| Model | Behavior | Best for | Limitation |
|---|---|---|---|
| Push model (default) | The master node evenly distributes subtasks to workers. | Most workloads with uniform subtask processing times. | The slowest worker can limit overall throughput. |
| Pull model | Each worker pulls subtasks from the master node on demand. Faster workers automatically pull more subtasks, so a slow worker does not hold up the whole job. | Workloads where subtask processing times vary across workers. | All pending subtasks are cached on the master node, which increases its memory usage. Do not use if you distribute more than 10,000 subtasks at a time. |
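
The trade-off between the two models can be illustrated with a simplified simulation. It assumes that each worker processes one subtask at a time at a fixed speed and that distribution itself costs nothing; both function names are hypothetical and neither reflects the scheduler's internals.

```python
import heapq
from typing import Sequence


def push_makespan(num_subtasks: int, secs_per_subtask: Sequence[float]) -> float:
    """Even split up front: the job ends when the slowest worker finishes."""
    base, extra = divmod(num_subtasks, len(secs_per_subtask))
    return max(
        (base + (1 if i < extra else 0)) * secs
        for i, secs in enumerate(secs_per_subtask)
    )


def pull_makespan(num_subtasks: int, secs_per_subtask: Sequence[float]) -> float:
    """On-demand pull: whichever worker is free first takes the next subtask."""
    # Heap of (time at which the worker becomes free, worker index).
    free_at = [(0.0, i) for i in range(len(secs_per_subtask))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(num_subtasks):
        now, worker = heapq.heappop(free_at)
        done = now + secs_per_subtask[worker]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, worker))
    return finish
```

For 12 subtasks and three workers that need 1, 1, and 4 seconds per subtask, this model gives a total of 16 seconds for push and 8 seconds for pull. When all workers run at the same speed, the two results are identical, which is why the push model suits uniform workloads.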
Push model parameters
The following parameters apply to Visual MapReduce and MapReduce modes only.
Distribution policy
Strategy for distributing subtasks to workers. Requires agent V1.10.14 or later.
| Policy | Behavior | Default | When to use |
|---|---|---|---|
| Polling Scheme | Distributes an equal number of subtasks to each worker. | Yes | Each subtask takes roughly the same amount of time. |
| WorkerLoad optimal strategy | The master node monitors worker loads and assigns more subtasks to less-loaded workers. | No | Subtask processing times vary significantly across workers. |
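
The difference between the two policies can be sketched as follows. The source does not state which load metric the master node monitors, so this example assumes load means the number of subtasks currently assigned to a worker; both function names are hypothetical.

```python
from itertools import cycle
from typing import Dict, Hashable, List, Sequence


def polling_assign(subtasks: Sequence, workers: Sequence[Hashable]) -> Dict[Hashable, List]:
    """Round-robin: hand subtasks to workers in turn, ignoring load."""
    assignment: Dict[Hashable, List] = {w: [] for w in workers}
    for subtask, worker in zip(subtasks, cycle(workers)):
        assignment[worker].append(subtask)
    return assignment


def least_loaded_assign(subtasks: Sequence, load: Dict[Hashable, int]) -> Dict[Hashable, List]:
    """Load-aware: each subtask goes to the currently least-loaded worker."""
    load = dict(load)  # copy so the caller's view is not modified
    assignment: Dict[Hashable, List] = {w: [] for w in load}
    for subtask in subtasks:
        worker = min(load, key=load.get)  # ties go to the first worker listed
        assignment[worker].append(subtask)
        load[worker] += 1  # assumed metric: one unit of load per subtask
    return assignment
```

With an idle worker and a worker that already holds four subtasks, the load-aware policy sends the next four subtasks to the idle worker, whereas polling splits them two and two.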
Distribution rate
Rate limit for subtask distribution, specified as the number of subtasks distributed per second or per minute. Use this to prevent overwhelming downstream services during large batch operations.
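
A distribution rate amounts to pacing: each subtask is sent no earlier than its time slot. The sketch below shows one simple way to pace a sender and is not the scheduler's algorithm. The function name and parameters are hypothetical, and `clock` and `sleep` are injectable only to make the timing observable.

```python
import time
from typing import Callable, Iterable


def dispatch_with_rate(
    subtasks: Iterable,
    send: Callable[[object], None],
    rate: float,                 # subtasks allowed per period
    period_s: float = 1.0,       # 1.0 for per second, 60.0 for per minute
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Send subtasks so that at most `rate` are dispatched per period."""
    interval = period_s / rate   # minimum spacing between two sends
    next_slot = clock()
    for subtask in subtasks:
        delay = next_slot - clock()
        if delay > 0:
            sleep(delay)         # wait until this subtask's slot opens
        send(subtask)
        next_slot += interval
```

For example, `rate=2` with `period_s=1.0` spaces sends 0.5 seconds apart, and `rate=30` with `period_s=60.0` spaces them 2 seconds apart.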
Pull model parameters
The following parameters appear only when Subtask distribution method is set to Pull model.
| Parameter | Description | Default value | When to change |
|---|---|---|---|
| Number of subtasks pulled per time | Number of subtasks a worker pulls from the master node in a single request. | 5 | Increase to reduce pull frequency. Decrease for finer-grained load balancing. |
| Subtask queue capacity | Size of the local queue that caches subtasks on each worker. A larger queue reduces pull frequency but increases worker memory usage. | 10 | Increase if subtasks are short-lived and the pull overhead is significant. Decrease to limit worker memory consumption. |
| Global concurrency of subtasks | Maximum total concurrent subtasks across all workers. The system does not exceed this limit even if individual workers have spare capacity. | 1000 | Decrease to protect downstream resources. Increase if downstream systems can handle higher parallelism. |
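
The three pull model parameters can be pictured together in a small threaded sketch. It is a simplified model under stated assumptions, not the real protocol: `Master` and `worker_loop` are hypothetical names, each worker is assumed to top up its local queue whenever there is room, and a semaphore stands in for the global concurrency limit.

```python
import threading
from collections import deque
from typing import Callable, Iterable, List


class Master:
    """Holds all pending subtasks and enforces the global concurrency cap."""

    def __init__(self, subtasks: Iterable, global_concurrency: int = 1000) -> None:
        self._pending = deque(subtasks)  # all pending subtasks live on the master
        self._lock = threading.Lock()
        self.slots = threading.BoundedSemaphore(global_concurrency)

    def pull(self, max_items: int) -> List:
        """Hand out up to `max_items` pending subtasks in one request."""
        with self._lock:
            count = min(max_items, len(self._pending))
            return [self._pending.popleft() for _ in range(count)]


def worker_loop(master: Master, handler: Callable[[object], None],
                pull_size: int = 5, queue_capacity: int = 10) -> None:
    """Pull in batches, buffer locally, and process one subtask at a time."""
    local: deque = deque()
    while True:
        free = queue_capacity - len(local)
        if free > 0:
            # Never request more than a batch or more than the queue can hold.
            local.extend(master.pull(min(pull_size, free)))
        if not local:
            return  # nothing buffered and nothing left on the master
        subtask = local.popleft()
        with master.slots:  # blocks while the global limit is reached
            handler(subtask)
```

A larger `pull_size` or `queue_capacity` means fewer requests to the master node but more subtasks held on a worker, where faster workers can no longer take them. A smaller `global_concurrency` throttles all workers together regardless of their individual capacity.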
Version requirements
| Parameter | Minimum agent version |
|---|---|
| Subtask Failover Strategy | V1.8.13 |
| The master node participates in the execution | V1.8.13 |
| Distribution policy | V1.10.14 |
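
When checking an agent against these minimums in a script, compare versions numerically rather than as strings, because `"1.10.14"` sorts before `"1.8.13"` in a plain string comparison. The helper below is a hypothetical sketch that assumes versions always take the form `V<major>.<minor>.<patch>`.

```python
from typing import Dict, List, Tuple

# Minimum agent versions, taken from the table above.
MIN_AGENT_VERSION: Dict[str, Tuple[int, int, int]] = {
    "Subtask Failover Strategy": (1, 8, 13),
    "The master node participates in the execution": (1, 8, 13),
    "Distribution policy": (1, 10, 14),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn 'V1.10.14' or '1.10.14' into (1, 10, 14)."""
    return tuple(int(part) for part in text.lstrip("Vv").split("."))


def supported_parameters(agent_version: str) -> List[str]:
    """List the version-gated parameters that this agent version supports."""
    version = parse_version(agent_version)
    return [
        name
        for name, minimum in MIN_AGENT_VERSION.items()
        if version >= minimum  # tuple comparison is numeric, field by field
    ]
```

For example, an agent at V1.9.0 supports the first two parameters but not the distribution policy, which needs V1.10.14 or later.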