Create a MapReduce offline computing task in Dataphin to process large-scale data on MaxCompute.
Prerequisites
The JAR resource package that the task references has been uploaded. For more information, see Upload and reference resources.
Background information
JAR resource packages referenced in MapReduce tasks must be uploaded in advance. Upload the JAR package in Resource Management, and then reference it in your MapReduce code task.
Procedure
-
In the top menu bar of the Dataphin home page, select Develop > Data Development.
-
On the Develop page, select a project from the top menu bar. If you use the Dev-Prod mode, also select an environment.
-
In the navigation pane on the left, choose Data Processing > Script Task. In the Script Task list, click the icon and select MapReduce on MaxCompute.
-
In the Create MapReduce on MaxCompute Task dialog box, configure the following parameters.
Parameter
Description
Task Name
Enter a name for the offline computing task.
The name can be up to 256 characters in length and cannot contain vertical bars (|), forward slashes (/), backslashes (\), colons (:), question marks (?), angle brackets (<>), asterisks (*), or double quotation marks (").
Schedule Type
Select the scheduling type for the task:
-
Recurring task node: The task runs on the system's recurring schedule.
-
Manual task node: The task must be triggered manually.
Select Directory
Select the directory to store the task.
To create a new folder, follow these steps:
-
Click the icon above the task list to open the New Folder dialog box.
-
In the New Folder dialog box, enter a Name for the folder and select a directory location as needed.
-
Click OK.
Use Template
Turn on the Use Template switch to build the task from a code template. If you turn it on, you must also select a template and a template version.
Code templates speed up development. The template code is read-only, so you only configure the template parameters to complete the task code. For more information, see Create an offline computing template.
Description
Enter a brief description of the task. The description can be up to 1,000 characters in length.
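The naming and length limits in the table above can be checked before you open the dialog box. The following is a minimal sketch of such a check, not part of Dataphin. The function names and constants are hypothetical, and only the limits themselves (256 characters, the forbidden characters, and 1,000 characters for the description) come from the table.

```python
# Hypothetical pre-check helpers; only the limits are taken from the parameter table.
FORBIDDEN_NAME_CHARS = set('|/\\:?<>*"')
MAX_NAME_LENGTH = 256
MAX_DESCRIPTION_LENGTH = 1000


def validate_task_name(name: str) -> list[str]:
    """Return a list of problems; an empty list means the name passes these checks."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name has {len(name)} characters, limit is {MAX_NAME_LENGTH}")
    forbidden_found = sorted(set(name) & FORBIDDEN_NAME_CHARS)
    if forbidden_found:
        problems.append("name contains forbidden characters: " + " ".join(forbidden_found))
    return problems


def validate_description(description: str) -> list[str]:
    """Return a list of problems; an empty list means the description passes."""
    if len(description) > MAX_DESCRIPTION_LENGTH:
        return [f"description has {len(description)} characters, "
                f"limit is {MAX_DESCRIPTION_LENGTH}"]
    return []
```

For example, `validate_task_name("daily_wordcount_mr")` returns an empty list, while a name such as `"in/out"` is reported because it contains a forward slash.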
-
Click OK.
-
In the code editing area on the MapReduce on MaxCompute task tab, write the code for the MapReduce offline computing task. After you write the code, click Run above the code editing area. For more information about MapReduce development, see MapReduce.
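For readers new to the programming model, the following sketch illustrates the map, shuffle, and reduce phases with a word count. It is a language-neutral illustration written in plain Python, not MaxCompute or Dataphin code: the actual task runs the classes packaged in the uploaded JAR, and every function name here is hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.split():
        yield word.lower(), 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> dict[str, int]:
    """Run the three phases in sequence over an in-memory list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In a real job the framework distributes the map and reduce calls across workers and performs the shuffle itself; the sketch keeps everything in one process only to make the data flow visible.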
-
Click Property in the right sidebar. In the Property panel, configure the task's Basic Information, Runtime Resources, Runtime Parameter, Schedule Property (for recurring tasks), Schedule Dependency (for recurring tasks), Runtime Configuration, and Resource Configuration.
-
Basic Information
Specify basic information such as the task name, owner, and description. For more information, see Configure basic information for a task.
-
Runtime Resources
The CPU and memory resources allocated to run the task. The defaults are 0.2 CPU cores and 512 MB of memory. For more information, see Configure runtime resources for an offline task.
-
Runtime Parameter
If your code uses parameters, assign values to them in the properties. The parameters are automatically replaced with the specified values when the node is scheduled. For more information, see Configure runtime parameters for an offline task.
-
Schedule Property (recurring tasks)
If the task is a recurring task, you must also configure its scheduling properties. For more information, see Configure scheduling properties for an offline task.
-
Schedule Dependency (recurring tasks)
If the task is a recurring task, you must also configure its scheduling dependencies. For more information, see Configure scheduling dependencies for an offline task.
-
Runtime Configuration
Configure a task-level timeout period and a rerun policy as needed. If you do not configure these parameters, the task inherits the default tenant-level settings. For more information, see Configure runtime settings for a computing task.
-
Resource Configuration
Configure a scheduling resource group for the task. When scheduled, the task uses the resource quota of this resource group. For more information, see Configure resources for a computing task.
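The Runtime Parameter item above says that parameters in the code are replaced with the assigned values when the node is scheduled. The following sketch shows the general idea of that substitution. It assumes a `${name}` placeholder syntax and uses a hypothetical `render_task_code` function; it is not Dataphin's implementation, and the table names in the usage note are made up.

```python
import re

# Assumption: parameters appear in the task code as ${name}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def render_task_code(code: str, params: dict[str, str]) -> str:
    """Replace every ${name} placeholder with its assigned value.

    Raises KeyError if the code uses a parameter that has no assigned value,
    so a missing assignment is caught before the task runs.
    """
    missing: set[str] = set()

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            missing.add(name)
            return match.group(0)  # leave the placeholder untouched
        return params[name]

    rendered = PLACEHOLDER.sub(replace, code)
    if missing:
        raise KeyError("no value assigned for: " + ", ".join(sorted(missing)))
    return rendered
```

For example, rendering `"wc_in_${bizdate} wc_out_${bizdate}"` with `{"bizdate": "20240101"}` gives `"wc_in_20240101 wc_out_20240101"`.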
-
On the MapReduce on MaxCompute task tab, save and submit the task.
-
Click the icon above the code editing area to save the code.
-
Click the icon above the code editing area to submit the code.
-
On the Submitting Log page, confirm the Submission Content and the Pre-check results, and enter remarks. For more information, see Submit an offline computing task.
-
After you confirm the information, click Confirm and Submit.
What to do next
-
If your development mode is Dev-Prod mode, go to the release list to publish the task to the production environment after submitting it. For more information, see Manage release tasks.
-
If your development mode is Basic mode, the submitted MapReduce task is ready for scheduling in the production environment. Go to Operation Center to view your published tasks. For more information, see Manage integration and computing tasks and Manage manual tasks.