Overview of Offline Task Property Configuration


To run an offline task on a recurring schedule, define its scheduling properties, including the scheduling cycle, dependencies, and parameters.

Important Notes

  • The system supports scheduling configuration only for offline computing tasks with the scheduling type set to auto triggered task.

  • A dependency defines the execution order between two nodes. The status of an upstream node affects the execution status of downstream nodes.

  • When you configure dependencies, the system schedules downstream nodes as follows. First, it waits until the upstream node completes successfully. Then, it checks whether the scheduled time for the downstream node has been reached.

  • If you submit the scheduling configuration before the scheduled time, the configuration takes effect once that time is reached. If you configure dependencies after the scheduled time, the system does not create instances until the following day.

  • Scheduling configuration defines only the properties used when the task runs on schedule. To apply this configuration, publish the task to the production environment.

  • The scheduled time defines the expected execution time. The actual execution time depends on upstream node status. For details about task execution conditions, see Instance Run Diagnostics.
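The two checks described in the notes above can be sketched as a small function. This is only an illustration of the ordering logic, not Dataphin's scheduler: the function name `can_start` and the status labels `"SUCCESS"` and `"RUNNING"` are hypothetical.

```python
from datetime import datetime


def can_start(upstream_statuses, scheduled_time, now):
    """Return True when a downstream instance may start.

    Hypothetical sketch: every upstream instance must have succeeded,
    and the scheduled time must have been reached.
    """
    # Check 1: all upstream nodes completed successfully.
    if not all(status == "SUCCESS" for status in upstream_statuses):
        return False
    # Check 2: the scheduled time has been reached.
    return now >= scheduled_time


scheduled = datetime(2024, 1, 2, 2, 0)

# Upstream finished late, so the downstream starts after its scheduled time.
print(can_start(["SUCCESS", "SUCCESS"], scheduled, datetime(2024, 1, 2, 3, 15)))  # True
# An upstream node is still running: the downstream waits although 02:00 has passed.
print(can_start(["SUCCESS", "RUNNING"], scheduled, datetime(2024, 1, 2, 3, 15)))  # False
# Upstream finished early: the downstream still waits for 02:00.
print(can_start(["SUCCESS"], scheduled, datetime(2024, 1, 2, 1, 0)))  # False
```

The first call shows why the actual execution time can be later than the scheduled time: the instance becomes eligible only when both conditions hold.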

Access Offline Task Properties

  1. On the Dataphin homepage, in the top menu bar, click Develop > Data Development.

  2. On the Develop page, in the top menu bar, click Project.

  3. In the navigation pane on the left, select Data Processing > Compute Job. In the Compute Job list, click the target job name.

  4. On the task tab, click Property on the right to open the Property panel.

Configure Offline Task Properties

The following table describes the basic information and scheduling properties available for offline tasks.

Configuration Item

Description

Basic Information

Includes the task name, node ID, node type, development owner, O&M owner, and description.

  • Task Name: The name entered when you created the task.

  • Node ID: A unique identifier for the node. The system generates it after you submit the node.

  • Development Owner: Defaults to the current user. You can select any member of the current project.

    Note

    In the production environment, you cannot configure the development owner. The value from the development environment applies.

  • O&M Owner: Defaults to the node creator. You can select any member of the current project as the O&M owner.

Runtime Resources

CPU and memory resources assigned to run the task.

Note

This setting applies only to Python, Shell, Spark on MaxCompute, Spark on Yarn, MapReduce on MaxCompute, and MapReduce on Yarn tasks.

Python Third-Party Packages

Select the Python third-party packages to import.

Note

  • This setting applies only to Python and Shell tasks.

  • After you add a third-party module to Python Third-Party Packages, you must declare a reference to it in the task before importing it in your code. You can configure the referenced module in Compute Task Properties > Python Third-Party Packages.
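As a defensive habit, a Python task can check that the modules it relies on are available before importing them. The following is a minimal sketch that uses only the standard library; it is not a Dataphin API, and the helper name `missing_modules` and the second module name are made up for illustration.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# "json" ships with Python; the second name is deliberately made up.
print(missing_modules(["json", "hypothetical_pkg_not_installed"]))
# ['hypothetical_pkg_not_installed']
```

If the returned list is not empty, the task can fail early with a clear message instead of stopping midway on an import error.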

Runtime Parameters

Define parameters used during node scheduling. Dataphin provides built-in and custom parameters for dynamic value assignment at runtime.

Note

If you define variables in your node code, assign values to them here. If no variables are defined, skip this step.
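To illustrate what assigning values to code variables means, the sketch below substitutes placeholders in a code string at run time. It uses Python's standard `string.Template` only as a stand-in: the `${name}` placeholder syntax, the variable names `ds` and `region`, and the `render` helper are assumptions for this example, not a description of how Dataphin resolves parameters.

```python
from string import Template


def render(code, params):
    """Substitute ${name} placeholders; raise KeyError if a variable has no value."""
    return Template(code).substitute(params)


sql = "SELECT * FROM orders WHERE ds = '${ds}' AND region = '${region}'"
print(render(sql, {"ds": "20240101", "region": "cn"}))
# SELECT * FROM orders WHERE ds = '20240101' AND region = 'cn'
```

Using `substitute` rather than `safe_substitute` makes a variable without an assigned value fail loudly, which mirrors the requirement that every variable defined in the code receives a value.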

Scheduling Properties

Define how the task runs on a recurring schedule in the production environment.

  • Scheduling Type: Defines the execution status of task instances in the production environment.

  • Priority: Sets the task priority. When you create a task, the system uses the default priority from Management Hub > Development Platform > Node Task Settings > Default Priority.

    Note

    After you publish the task to the production environment or submit it in Basic mode, you cannot change the priority when editing the task. Update it in O&M operations in the production environment. The priority value reflects the latest setting in the production environment.

  • Effective Date: Defines the date range during which the task runs on schedule. After the range ends, the system stops generating instances.

  • Scheduling Cycle: Defines how often the task runs.

  • Conditional Scheduling: Defines conditions under which the task runs. You can set multiple condition groups. The system evaluates them in order from top to bottom. When a condition matches, the system runs the corresponding schedule and stops evaluating further conditions. If no condition matches, the system uses the default schedule.
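The evaluation order described for Conditional Scheduling is a first-match rule. A minimal sketch follows, assuming each condition group is a (condition, schedule) pair; the function name, the context keys, and the schedule labels are all hypothetical.

```python
def pick_schedule(condition_groups, default_schedule, context):
    """Return the schedule of the first matching condition group, else the default."""
    for condition, schedule in condition_groups:  # evaluated from top to bottom
        if condition(context):
            return schedule  # first match wins; later groups are not evaluated
    return default_schedule


groups = [
    (lambda ctx: ctx["day_of_month"] == 1, "run at 01:00"),
    (lambda ctx: ctx["weekday"] in ("Sat", "Sun"), "skip"),
]

# Both groups match here, but only the first one is applied.
print(pick_schedule(groups, "run at 03:00", {"day_of_month": 1, "weekday": "Sat"}))  # run at 01:00
# No group matches, so the default schedule is used.
print(pick_schedule(groups, "run at 03:00", {"day_of_month": 5, "weekday": "Tue"}))  # run at 03:00
```

Because evaluation stops at the first match, the order of the condition groups matters: place the most specific condition first.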

Scheduling Dependencies

Define upstream and downstream dependencies for the task. Dependencies ensure that downstream nodes start only after upstream nodes succeed, which helps deliver valid business data on time. Use automatic parsing to quickly set dependencies, or add them manually.
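The effect of dependencies on execution order can be pictured as a topological sort of the node graph. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the node names and the `execution_order` helper are invented for illustration and do not reflect Dataphin internals.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies):
    """Return node names ordered so that every node comes after its upstream nodes."""
    return list(TopologicalSorter(dependencies).static_order())


# Map each node to the set of upstream nodes it depends on (hypothetical names).
dependencies = {
    "ods_orders": set(),
    "dwd_orders": {"ods_orders"},
    "ads_report": {"dwd_orders"},
}

print(execution_order(dependencies))
# ['ods_orders', 'dwd_orders', 'ads_report']
```

`graphlib` raises `CycleError` when the dependencies form a cycle, which is a useful reminder that a node cannot depend on itself through a chain of other nodes.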

Runtime Configuration

Define the timeout period for task runs and the retry policy for failed runs. A timeout keeps long-running tasks from wasting resources, and retries improve reliability.
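The combination of a timeout and a retry policy can be sketched with the standard `subprocess` module. This is a generic illustration under assumed semantics (one initial run plus a fixed number of retries, each limited by the same timeout); the function name and its parameters are hypothetical and do not correspond to Dataphin settings.

```python
import subprocess
import sys


def run_with_retry(cmd, timeout_seconds, max_retries):
    """Run cmd, retrying on failure or timeout.

    Returns the attempt number that succeeded, or None if every attempt failed.
    """
    for attempt in range(1, max_retries + 2):  # first run plus max_retries retries
        try:
            result = subprocess.run(cmd, timeout=timeout_seconds)
            if result.returncode == 0:
                return attempt
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child on timeout; count it as a failed attempt.
            pass
    return None


# A command that succeeds on the first attempt.
print(run_with_retry([sys.executable, "-c", "pass"], timeout_seconds=10, max_retries=2))  # 1
```

A command that always exits with a nonzero code, or always exceeds the timeout, makes the function return `None` once the retries are exhausted.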

Resource Configuration

Select the resource group for the compute task. The system uses this group's resources when scheduling the task.

What to Do Next

After configuring task properties, submit and publish the task to the production environment. Then perform O&M operations as needed. For details, see Operation Center.