Create a MapReduce on MaxCompute task


This topic describes how to create a MapReduce offline computing task in Dataphin.

Prerequisites

The JAR resource package that the task will reference has been uploaded. For more information, see Upload and reference resources.

Background information

A MapReduce task can reference only JAR resource packages that already exist. Upload the JAR package in Resource Management first, and then reference it in the code of the MapReduce task.

Procedure

  1. In the top menu bar of the Dataphin home page, select Develop > Data Development.

  2. On the Develop page, select a project from the top menu bar. If you use the Dev-Prod mode, also select an environment.

  3. In the navigation pane on the left, choose Data Processing > Script Task. In the Script Task list, click the create icon and select MapReduce on MaxCompute.

  4. In the Create MapReduce on MaxCompute Task dialog box, configure the following parameters.

    Parameter

    Description

    Task Name

    Enter a name for the offline computing task.

    The name can be up to 256 characters in length and cannot contain vertical bars (|), forward slashes (/), backslashes (\), colons (:), question marks (?), angle brackets (<>), asterisks (*), or double quotation marks (").

    Schedule Type

    Select the scheduling type for the task. Valid values:

    • Recurring task node: The task runs automatically as part of the system's recurring schedule.

    • Manual task node: The task runs only when it is triggered manually.

    Select Directory

    Select the directory in which to store the task.

    To create a folder, perform the following steps:

    1. Click the new folder icon above the task list to open the New Folder dialog box.

    2. In the New Folder dialog box, enter a Name for the folder and select the directory in which to create it.

    3. Click OK.

    Use Template

    Turn on the Use Template switch to base the task on a code template. If you turn on the switch, you must also select a template and a template version.

    A code template speeds up development. The template code is read-only, so you only need to configure the template parameters. For more information, see Create an offline computing template.

    Description

    Enter a brief description of the task. The description can be up to 1,000 characters in length.
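
The Task Name rule above can be checked before you open the dialog box, for example when task names are generated by a script. The following is a minimal sketch that only mirrors the length limit and the forbidden characters listed in the table. The function name `validate_task_name` is hypothetical and is not part of any Dataphin API.

```python
from typing import List

# Characters that the Task Name rule forbids: | / \ : ? < > * "
FORBIDDEN_CHARS = set('|/\\:?<>*"')
# Maximum task name length stated in the parameter table.
MAX_NAME_LENGTH = 256


def validate_task_name(name: str) -> List[str]:
    """Return the rule violations for a task name; an empty list means it passes."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(
            f"name has {len(name)} characters; the limit is {MAX_NAME_LENGTH}"
        )
    # Collect each forbidden character that appears at least once.
    found = sorted(set(name) & FORBIDDEN_CHARS)
    if found:
        problems.append("name contains forbidden characters: " + " ".join(found))
    return problems


if __name__ == "__main__":
    print(validate_task_name("mr_wordcount_daily"))
    print(validate_task_name("orders/2024?"))
```

A name such as `mr_wordcount_daily` yields an empty list, while `orders/2024?` is reported because it contains a forward slash and a question mark.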

  5. Click OK.

  6. In the code editing area on the MapReduce on MaxCompute task tab, write the code for the MapReduce offline computing task. After you write the code, click Run above the code editing area. For more information about MapReduce development, see MapReduce.

  7. Click Property in the right sidebar. In the Property panel, configure the task's Basic Information, Runtime Resources, Runtime Parameter, Schedule Property (for recurring tasks), Schedule Dependency (for recurring tasks), Runtime Configuration, and Resource Configuration.

    • Basic Information

      Specify the basic information for the task, such as the task name, owner, and description. For more information, see Configure basic information for a task.

    • Runtime Resources

      Specify the CPU and memory resources that are allocated to the task at run time. The default is 0.2 CPU cores and 512 MB of memory. For more information, see Configure runtime resources for an offline task.

    • Runtime Parameter

      If your code references parameters, assign values to them in this section. The parameters are automatically replaced with the specified values when the node is scheduled. For more information, see Configure runtime parameters for an offline task.

    • Schedule Property (recurring tasks)

      For a recurring task, you must also configure the scheduling properties. For more information, see Configure scheduling properties for an offline task.

    • Schedule Dependency (recurring tasks)

      For a recurring task, you must also configure the scheduling dependencies. For more information, see Configure scheduling dependencies for an offline task.

    • Runtime Configuration

      You can configure a task-level timeout period and a rerun policy for the offline computing task as needed. If you do not configure these parameters, the task inherits the default tenant-level settings. For more information, see Configure runtime settings for a computing task.

    • Resource Configuration

      You can configure a scheduling resource group for the current computing task. When the task is scheduled, it uses the resource quota of this resource group. For more information, see Configure resources for a computing task.
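
The Runtime Parameter behavior described above, where parameters in the code are replaced with assigned values at scheduling time, can be pictured with a small sketch. This is an illustration only, not the way Dataphin implements substitution. It assumes a `${name}` placeholder style, and the function `render_task_code` and the parameter names `src_table` and `bizdate` are hypothetical.

```python
import re

# Assumption: placeholders look like ${name}; the syntax in your environment may differ.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def render_task_code(code: str, params: dict) -> str:
    """Replace every ${name} placeholder in code with its assigned value."""

    def replace(match):
        name = match.group(1)
        if name not in params:
            # Fail loudly instead of leaving an unresolved placeholder in the code.
            raise KeyError(f"no value configured for parameter '{name}'")
        return str(params[name])

    return PLACEHOLDER.sub(replace, code)


if __name__ == "__main__":
    template = "input_table=${src_table} partition=ds=${bizdate}"
    values = {"src_table": "orders", "bizdate": "20240101"}
    print(render_task_code(template, values))
    # prints: input_table=orders partition=ds=20240101
```

Raising an error for a placeholder without a value is a design choice of this sketch. It makes a missing assignment visible before the task runs rather than at run time.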

  8. On the MapReduce on MaxCompute task tab, save and submit the task.

    1. Click the save icon above the code editing area to save the code.

    2. Click the submit icon above the code editing area to submit the code.

  9. On the Submitting Log page, confirm the Submission Content and the Pre-check results, and enter remarks. For more information, see Submit an offline computing task.

  10. After you confirm the information, click Confirm and Submit.

What to do next

  • In Dev-Prod mode, after you submit the task, go to the release list and publish the task to the production environment. For more information, see Manage release tasks.

  • In Basic mode, the submitted MapReduce task can be scheduled in the production environment without a separate release step. Go to the Operation Center to view your published tasks. For more information, see Manage integration and computing tasks and Manage manual tasks.