Visual MapReduce model
The visual MapReduce model enhances the standard MapReduce model by adding visualization and operational capabilities. Without changing your backend code, you can switch the execution mode to the visual MapReduce model in the SchedulerX console. This adds a subtask list page where you can view the details, results, and logs for each subtask. You can also rerun individual subtasks.
Notes
Older SDK versions have security vulnerabilities. Upgrade your SDK to v1.12.2 or later.
This feature is available only in the Professional Edition.
The number of subtasks cannot exceed 1,000.
A task cannot exceed 64 KB in size.
To display custom labels for subtasks, the subtask objects must implement a specific interface.
The return value of the result field of ProcessResult cannot exceed 1,000 bytes in size.
If you use the reduce method, the results of all tasks are cached on the master node. In this case, high memory pressure is caused on the master node. We recommend that you specify a small number of tasks and a small return value for the result field. If the reduce method is not required, you can directly call the MapJobProcessor interface.
If a failover is triggered, SchedulerX may run a task more than once. In this case, you must implement the idempotence of tasks on your own.
Interfaces
The visual MapReduce model inherits all interfaces from the standard MapReduce model, and the development process for task-handling code is identical. For more information, see MapReduce model.
(Optional) In addition to the standard MapReduce model interfaces, you can set a label for each subtask. To do this, the subtask object must implement the
com.alibaba.schedulerx.worker.processor.BizSubTaskinterface.Interface
Description
Required
public Map<String, String> labelMap()Implement this method to return subtask label information. These labels are used to display business-specific attributes for each subtask object, such as an account name, product code, or city and region.
No
Comparison with the MapReduce model
Item | MapReduce | Visual MapReduce |
Number of subtasks | Supports millions | Up to 1,000 |
Task development model | Identical | |
Subtask list | Not supported | Supported. |
Subtask execution details | Not supported | Supported. Provides execution records, status, logs, distributed traces, and thread dumps for each subtask. |
Subtask labels | Not supported | Supported. You can view business labels by implementing the |
Subtask operations | Not supported | Supported. You can stop or rerun individual subtasks. |
Example: Task development
Batch processing of accounts
This example shows how to batch-process a list of bank accounts. Each account is handled as an independent subtask that runs in parallel across the cluster. Each subtask in the execution list displays its corresponding account information. This lets you easily track the processing status and execution details of each account. The following sections provide sample code.
Define a custom subtask object for account information. To display a label for each subtask object, implement the
com.alibaba.schedulerx.worker.processor.BizSubTaskinterface and itslabelMapmethod.Create a processor for the account business logic. This processor handles the logic for a single account and inherits from
com.alibaba.schedulerx.worker.processor.MapReduceJobProcessor.
Procedure
Configure the task
Log on to the SchedulerX console. In the left-side navigation pane, click task management.
On the task management page, click Create Task.
In the Create Task panel, select Visual MapReduce from the Execution Mode drop-down list.

In the advanced configuration section, set the following parameters. For information about other parameters, see Advanced parameters for task management.
Parameter
Description
Distribution policy
Polling Scheme (Default): Distributes an equal number of subtasks to each worker. This strategy is suitable for scenarios where subtasks have similar processing times.
Worker load optimal strategy: The master node automatically detects the load on each worker and distributes subtasks accordingly. This strategy is suitable for scenarios where subtasks and workers have significantly different processing times.
NoteRequires client version 1.10.14 or later.
Subtask concurrency per worker
The number of execution threads per worker. The default is 5. You can increase this value to speed up execution. If downstream systems or databases cannot handle the load, you can decrease this value.
Subtask retry attempts
The number of times a subtask is automatically retried after a failure. The default is 0.
Subtask retry interval
The interval between retry attempts for a failed subtask, in seconds. The default is 0.
Subtask failover strategy
Specifies whether to redistribute subtasks to other machines if an execution node crashes or goes offline. If you enable this option, subtasks may be executed more than once during a failover. You must ensure that your business logic is idempotent.
NoteRequires client version 1.8.13 or later.
Master participation
Specifies whether the master node participates in subtask execution. This requires at least two online workers. Disable this option when the number of subtasks is very large.
NoteRequires client version 1.8.13 or later.
Monitor and debug subtasks
After a task is executed, go to the Execution List page and click Details to view detailed information about its subtasks.
On the subtask list tab, you can view the processing status of each subtask.

On the subtask list tab, find the target subtask and click Log in the Operation column to view its business logs and analyze the execution results.

While a task is running, go to the Current execution details tab and click thread dump to inspect the running threads on the corresponding machine. This helps analyze exceptions in the current task.

On the subtask list tab, if you have integrated Tracing Analysis, you can click a TraceId to query the distributed trace for that subtask. For more information, see Integrate Tracing Analysis.


