Create a Shell task


This topic describes how to create a Shell task in Dataphin.

Limitations

  • You can add datasets only after you enable the unstructured data feature.

  • Shell tasks in Basic mode projects support referencing datasets.

Permissions

The following project roles can use all datasets in the current project under Properties > Dataset:

  • Dev-Prod and Basic mode projects: Project Administrator, Developer, and Analyst.

  • Production projects in Dev-Prod mode: Project Administrator and O&M.

  • Custom project roles with the Dataset - Use permission.

Procedure

  1. In the top navigation bar of the Dataphin homepage, choose Develop > Data Development.

  2. On the Develop page, select a project from the top navigation bar. If you are using Dev-Prod mode, you also need to select an environment.

  3. In the navigation pane on the left, choose Data Processing > Compute Task. In the Compute Task list, click the create icon and select Shell.

  4. In the Create Shell Task dialog box, configure the following parameters.

    Parameter

    Description

    Task Name

    Enter a name for the task.

    The name can be up to 256 characters long and cannot contain the following characters: vertical bar (|), forward slash (/), backslash (\), colon (:), question mark (?), angle brackets (<>), asterisk (*), or double quotation mark (").

    Scheduling Type

    Select the Scheduling Type for the task. Scheduling Types include:

    • Scheduled Task: The task runs automatically on a recurring schedule.

    • Manual Task: The task runs only when you trigger it manually.

    Select Directory

    Select a directory for the task.

    If a directory does not exist, you can create a new folder:

    1. Above the compute task list on the left, click the new folder icon to open the New Folder dialog box.

    2. In the New Folder dialog box, enter a folder Name and select a directory.

    3. Click OK.

    Use Template

    Enable Use Template to use a code template. If enabled, you must also select a template and template version.

    Using a code template improves development efficiency. The code in a template is read-only; you only need to configure its parameters. For more information, see Create an offline computing template.

    Third-party Python Package

    Select the third-party Python packages to include. For more information, see Install a Python Module.

    Note

    Before you can import a third-party module, you must declare it as a reference in the task. Configure these references in the Third-party Python Package section of the task properties.

    Description

    Enter a description for the task (up to 1,000 characters).

  5. Click OK.

  6. In the code editor on the Shell task tab, write the code for the task. When you are finished, click Run above the code editor.

  7. In the right-side pane, click Properties. In the Properties panel, configure the task's Basic information, Running resources, Third-party Python package, Dataset, Runtime parameters, Scheduling properties (for scheduled tasks), Scheduling dependencies (for scheduled tasks), Run configuration, and Resource configuration.

    • Basic information

      Configure the task's basic information, such as its name, owner, and description. For configuration details, see Configure basic information for a task.

    • Running resources

      Configure the CPU and memory resources allocated to the compute task at run time. The defaults are 0.1 CPU cores and 256 MB of memory. For configuration details, see Configure running resources for an offline task.

    • Third-party Python package

      Select the third-party Python packages that you want to include. For more information, see Install a Python Module.

    • Dataset

      Select the datasets to reference. You can reference up to 5 datasets. For configuration details, see Configure datasets referenced by Python/Shell tasks.

    • Runtime parameters

      If your task uses parameter variables, you can assign values to them in the properties. When the task is scheduled, the system automatically replaces the variables with their assigned values. For configuration details, see Configure runtime parameters for an offline task.

    • Scheduling properties (for scheduled tasks)

      If the scheduling type of the task is Scheduled Task, you must configure its scheduling properties in addition to the basic information. For more information, see Configure scheduling properties for offline tasks.

    • Scheduling dependencies (for scheduled tasks)

      If the scheduling type of the task is Scheduled Task, you must configure its scheduling dependencies in addition to the basic information. For more information, see Configure scheduling dependencies for offline tasks.

    • Run configuration

      Configure a task-level timeout period and a rerun policy. If not configured, the task inherits the default settings at the tenant level. For configuration details, see Configure run configurations for a compute task.

    • Resource configuration

      You can configure a scheduling resource group for the compute task. The task consumes the resource quota of this group when it runs. For configuration details, see Configure resources for a compute task.

  8. On the Shell task tab, save and submit the task.

    1. Click the save icon above the code editor to save the code.

    2. Click the submit icon above the code editor to submit the code.

  9. On the Submission Details page, confirm the Submission Content, review the Pre-check results, and add notes if necessary. For more information, see Submission guide for offline computing tasks.

  10. Click OK and Submit.
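
As a rough illustration of the kind of code step 6 refers to, the sketch below shows a small Shell script that fails fast and logs its progress. It assumes the task runs under bash and that a non-zero exit status marks the run as failed; this topic states neither. The names `log`, `validate_date`, `process_partition`, and `RUN_DATE` are hypothetical. `RUN_DATE` only stands in for a value that you would normally supply through Runtime parameters in the Properties panel.

```shell
#!/usr/bin/env bash
# Hypothetical Shell task body; every name below is illustrative only.
set -euo pipefail

# Stand-in for a runtime parameter (assumption): falls back to a fixed date.
run_date="${RUN_DATE:-20240101}"

# Print a timestamped message.
log() {
  printf '[%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"
}

# Succeed only if the argument has the form YYYYMMDD (eight digits).
validate_date() {
  [[ "$1" =~ ^[0-9]{8}$ ]]
}

# Placeholder for the real work; returns non-zero on bad input.
process_partition() {
  local day="$1"
  if ! validate_date "$day"; then
    log "invalid run date: $day" >&2
    return 1
  fi
  log "processing partition ds=$day"
}

process_partition "$run_date"
log "done"
```

Because of `set -euo pipefail`, a failure in `process_partition` stops the script with a non-zero status instead of continuing silently.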

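The naming rule in step 4 can be checked in advance, for example when task names are produced by a script. The following is a minimal sketch and not part of Dataphin. `is_valid_task_name` is a hypothetical helper, and it assumes that the 256-character limit counts characters rather than bytes and that an empty name is not accepted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks a candidate task name against the documented
# rule (at most 256 characters, none of | / \ : ? < > * ").
is_valid_task_name() {
  local name="$1"
  # Bracket expression listing the forbidden characters; the backslash is
  # doubled so that it is matched under either escaping convention.
  local forbidden='[|/\\:?<>*"]'

  # Assumption: an empty name is rejected.
  [[ -n "$name" ]] || return 1

  # ${#name} is the length in characters for the current locale.
  (( ${#name} <= 256 )) || return 1

  # Reject the name if any forbidden character occurs in it.
  if [[ "$name" =~ $forbidden ]]; then
    return 1
  fi
  return 0
}
```

A caller might use it as `is_valid_task_name "$candidate" || echo "rename the task" >&2`.
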
Next steps

  • If you use Dev-Prod mode, after you submit the task, go to the release list and publish it to the production environment. For more information, see Manage release tasks.

  • If you use Basic mode, a successfully submitted Shell task is scheduled in the production environment. You can go to the O&M Center to view the published task. For more information, see Manage integration and compute tasks and Manage manual tasks.