Configure scheduling dependencies for offline tasks

Last updated: 2026-06-23 15:26:07

Scheduling dependencies between nodes define the execution order of your data workflows. Properly configured dependencies ensure that each node runs in the correct order and delivers timely, accurate data.

Background

A scheduling dependency defines the upstream-downstream relationship between nodes. A downstream task runs only after its upstream node completes successfully, ensuring that the downstream task retrieves data only after the upstream tables are generated. This prevents errors caused by accessing unavailable data.
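As an illustration only, the ordering guarantee can be sketched in Python with the standard-library graphlib module. The node names below are hypothetical, and this is not a description of how Dataphin schedules tasks internally; it only shows that a valid run order places every node after all of its upstream nodes.

```python
from graphlib import TopologicalSorter

# Hypothetical nodes: each key maps a node to the set of upstream nodes it depends on.
upstream_of = {
    "ods_orders": set(),
    "dwd_orders": {"ods_orders"},
    "ads_daily_sales": {"dwd_orders"},
}


def execution_order(upstream_of):
    """Return one valid run order in which every node follows all of its upstream nodes."""
    return list(TopologicalSorter(upstream_of).static_order())


order = execution_order(upstream_of)
print(order)
```

Because the three hypothetical nodes form a single chain, only one order is possible: the source table node first, the final report node last.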

Procedure

  1. In the top menu bar of the Dataphin homepage, choose Plan > Data Development.

  2. On the Data Development page, select a project from the top menu bar. If you are using the Dev-Prod mode, you must also select an environment.

  3. In the navigation pane on the left, choose Data Processing > Compute Task.

  4. In the compute task list, click the target compute task to open its configuration tab.

  5. In the right-side pane, click Properties to open the Properties panel. In the Scheduling dependencies section, configure the following parameters.

    1. Upstream dependencies

      • Auto parse

        • For SQL tasks, click Auto parse to let Dataphin analyze the task code, identify upstream tasks and output tables, and add the parsed dependencies to the upstream dependency list. You can view, edit, or delete the parsed dependencies.

        Note
        • If an input table is generated by multiple tasks, Dataphin automatically adds all of those tasks as upstream dependencies.

        • For all parsed dependencies, the dependency period is set to Current cycle by default.

        • If the code references a project variable or does not specify a project, the system resolves the dependency to the production project name by default to ensure scheduling stability. For example, if the development project is named onedata_dev:

          • If the code contains select * from s_order, the scheduling dependency is parsed as onedata.s_order.

          • If the code contains select * from ${onedata}.s_order, the scheduling dependency is parsed as onedata.s_order.

          • If the code contains select * from onedata.s_order, the scheduling dependency is parsed as onedata.s_order.

          • If the code contains select * from onedata_dev.s_order, the scheduling dependency is parsed as onedata_dev.s_order.

      • Add root node

        If a task has no upstream dependencies, click Add root node to set a root node as its upstream dependency.

        Note

        Each tenant or enterprise is initialized with a virtual root node whose name starts with virtual_root_node.

      • Add previous cycle of this node

        This option creates a self-dependency, where the task's execution depends on the success of its instance from the previous cycle (for example, the previous day or N hours ago).

      • Add dependency

        If Auto parse cannot resolve the scheduling dependencies, or the parsed result does not match your actual requirements, click Add dependency to add the node's upstream dependencies manually.

        • Add dependency - Physical node

          Select one or more physical nodes from the node list. You can search or filter by This project, Project, Node type, Node name, or Output table name.

        • Add dependency - Logical table node

          Select one or more logical table nodes from the node list. You can search and filter by Logical table type, Business segment, and Logical table name.

          To depend on specific fields in a logical table instead of the entire table, click the icon in the Dependent fields column to view and select the required fields.

        When you add a dependency, the Dependency period and Dependency policy are automatically set to recommended values. To modify these settings, click the edit icon in the dependency list.

        • Dependency period: The time range for the scheduled execution of the upstream task instance. Typically, this is the current day, covering the range [00:00~24:00).

        • Dependency policy: If multiple instances can exist within a dependency period, you must specify a dependency policy. You also need to select whether the dependency is met when the upstream task has Succeeded, Failed, or Completed (Succeeded or Failed). The default selection is Succeeded. For default policies for cross-cycle dependencies, see Appendix 2: Default policies for cross-cycle dependencies.

          Important

          If there is only one instance, you can select any policy. To ensure compatibility with potential changes to the upstream task's schedule, only relative path policies are supported.

          • Succeeded: The downstream task runs only if the upstream task succeeds. If the upstream task fails, the downstream task does not run.

          • Failed: The downstream task runs only if the upstream task fails after all automatic retries. If the upstream task succeeds, the downstream task does not run.

          • Completed (Succeeded or Failed): The downstream task runs whether the upstream task succeeds or fails after all automatic retries.

    2. Output of this node

      The system automatically generates an output name for the node. To add multiple output names, click Auto-generate output name.

      Important

      The system uses output names to build the scheduling dependency graph. The output name is generated automatically. Manual modification is not recommended.

  6. Click OK to save the scheduling dependency configuration.
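Two of the rules from step 5 can be sketched in Python: how Auto parse resolves a table reference to a project, and how the Succeeded, Failed, and Completed conditions are evaluated. This is a minimal sketch under stated assumptions, not Dataphin's implementation. The function names, the regular expression for project variables, and the status strings are hypothetical and chosen only to mirror the examples in this topic.

```python
import re

# Matches a project variable such as ${onedata}. The exact variable syntax that
# Dataphin accepts is an assumption based on the example in this topic.
_PROJECT_VAR = re.compile(r"^\$\{\w+\}$")


def resolve_dependency(table_ref: str, prod_project: str) -> str:
    """Resolve a table reference from task code to a dependency name (hypothetical helper)."""
    project, sep, table = table_ref.rpartition(".")
    if not sep or _PROJECT_VAR.match(project):
        # No project, or a project variable: fall back to the production project.
        return f"{prod_project}.{table}"
    # An explicit project name is kept as written.
    return f"{project}.{table}"


def dependency_met(upstream_status: str, condition: str = "Succeeded") -> bool:
    """Check a final upstream status ("Succeeded" or "Failed", after retries) against the condition."""
    if condition == "Completed":
        return upstream_status in ("Succeeded", "Failed")
    return upstream_status == condition


for ref in ("s_order", "${onedata}.s_order", "onedata.s_order", "onedata_dev.s_order"):
    print(ref, "->", resolve_dependency(ref, "onedata"))
```

With onedata as the production project, the first three references resolve to onedata.s_order and the last one stays onedata_dev.s_order, matching the examples in the Auto parse note.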

Preview dependency period and policy

  1. Click Properties for the target offline compute task. In the Properties panel, navigate to the Scheduling dependencies section.

  2. In the Upstream dependencies list of the Scheduling dependencies section, click the edit icon in the Actions column for the target dependency.

  3. In the Edit dependency dialog box, you can view information such as the dependent node name, dependency period, dependency policy, and a preview of the node dependency period.

    • Dependency period: The time range for the scheduled execution of the upstream task instance. Typically, this is the current day, covering the range [00:00~24:00).

    • Dependency policy: If multiple instances can exist within a dependency period, you must specify a dependency policy. If there is only one instance, you can select any policy. To ensure compatibility with potential changes to the upstream task's schedule, only relative path policies are supported.

    • Node dependency period preview: View the instance list for the current node and the selected upstream node for a specific business date.

      ① Instance list of the selected upstream node

      • Business date: This date is determined by the Dependency period and the selected business date for the current node.

        • If the dependency period is Current cycle, the business date is the same as the current node's business date.

        • If the dependency period is Previous cycle, the business date is one day before the current node's business date.

        • If the dependency period is N days ago, the business date is N days before the current node's business date.

        • If the dependency period is Last 24 hours and the instances span two business dates, the business date is displayed as {yyyy-MM-dd ~ yyyy-MM-dd}.

      • Instance list: Shows the total number of instances for the selected upstream node on a given business date.

        • If the total number of instances on the business date is less than or equal to 5, the instance list displays all instances.

        • If the total number of instances on the business date is greater than 5, the list is collapsed by default. Click Expand all to view all instances. The collapsed list shows the following:

          • If the dependent upstream instance is the first or last instance in its list, the UI displays the first instance and the last instance.

          • If the dependent upstream instance is neither the first nor the last instance in its list, the UI displays the first instance, the dependent instance, and the last instance.

        • Instances are displayed sequentially in the format Instance n ({Instance scheduled time}), where n starts from 1.

      ② Instance list of the current node

      The total number of instances for the current node on the selected business date.

      If the total number of instances on the business date is less than or equal to 5, the list displays all instances. If the total number of instances is greater than 5, the list displays only the first instance and the last instance. You can click Expand all to view all instances. The first instance (Instance 1) is selected by default. You can click an instance to select a different one.

      Instances are displayed sequentially in the format Instance n ({Instance scheduled time}), where n starts from 1.

      ③ Connector line from the selected instance on the right to its dependent instance on the left

      • If the Dependency policy is First instance, Last instance, Nearest instance backward, or Nearest instance forward, a single line connects the selected instance on the right (current node) to a single instance on the left (upstream node).

      • If the Dependency policy is All instances, all instances on the left (upstream node) are selected. Connecting lines show that the selected instance on the right depends on all instances on the left.
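The preview rules above can be sketched in Python as two small helpers: one that derives the upstream business date from the dependency period, and one that builds the collapsed instance list. Both function names, the parameter names, and the HH:MM time format are hypothetical. The Last 24 hours period is left out because it produces a date range rather than a single date.

```python
from datetime import date, timedelta


def upstream_business_date(current: date, period: str, n_days: int = 0) -> date:
    """Map the current node's business date to the upstream business date."""
    if period == "Current cycle":
        return current
    if period == "Previous cycle":
        return current - timedelta(days=1)
    if period == "N days ago":
        return current - timedelta(days=n_days)
    raise ValueError(f"period not covered by this sketch: {period}")


def collapsed_instance_list(scheduled_times, dependent_index=None, limit=5):
    """Return the labels shown before Expand all is clicked.

    dependent_index is the 0-based position of the dependent upstream instance,
    or None when no single instance is highlighted.
    """
    labels = [f"Instance {n} ({t})" for n, t in enumerate(scheduled_times, start=1)]
    if len(labels) <= limit:
        return labels  # short lists are shown in full
    shown = {0, len(labels) - 1}  # always keep the first and last instance
    if dependent_index is not None:
        shown.add(dependent_index)
    return [labels[i] for i in sorted(shown)]


hourly = [f"{hour:02d}:00" for hour in range(24)]
print(upstream_business_date(date(2026, 6, 23), "Previous cycle"))
print(collapsed_instance_list(hourly, dependent_index=9))
```

For an upstream node with 24 hourly instances where the tenth instance is the dependent one, the collapsed list contains Instance 1, Instance 10, and Instance 24.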

Appendix 1: Default dependency periods and policies

| Current node cycle | Upstream node cycle | Upstream self-dependency | Default dependency period | Default dependency policy |
| --- | --- | --- | --- | --- |
| Daily/Weekly/Monthly | Daily | Yes/No | Current cycle | Last instance |
| Daily/Weekly/Monthly | Hourly/Minutely | No | Current cycle | All instances |
| Daily/Weekly/Monthly | Hourly/Minutely | Yes | Current cycle | Last instance |
| Monthly/Weekly/Daily/Hourly/Minutely | Monthly/Weekly | Yes | Current cycle | Last instance |
| Monthly/Weekly/Daily/Hourly/Minutely | Monthly/Weekly | No | Current cycle | Last instance |
| Hourly/Minutely | Daily | Yes/No | Current cycle | Last instance |
| Hourly/Minutely | Hourly/Minutely | Yes/No | Current cycle | Last instance |
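The defaults in Appendix 1 reduce to a single exception, which the following Python sketch encodes. The function and constant names are hypothetical, and the logic is read directly off the table rather than taken from Dataphin itself.

```python
COARSE_CYCLES = {"Daily", "Weekly", "Monthly"}
FINE_CYCLES = {"Hourly", "Minutely"}


def default_period_and_policy(current_cycle, upstream_cycle, upstream_self_dependent):
    """Return (default dependency period, default dependency policy) per Appendix 1."""
    known = COARSE_CYCLES | FINE_CYCLES
    if current_cycle not in known or upstream_cycle not in known:
        raise ValueError("unknown scheduling cycle")
    # Only exception in the table: a daily, weekly, or monthly node that depends on
    # an hourly or minutely upstream node without self-dependency waits for all instances.
    if (current_cycle in COARSE_CYCLES
            and upstream_cycle in FINE_CYCLES
            and not upstream_self_dependent):
        return ("Current cycle", "All instances")
    return ("Current cycle", "Last instance")


print(default_period_and_policy("Daily", "Hourly", upstream_self_dependent=False))
```

Every other combination in the table defaults to Current cycle with Last instance.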

Appendix 2: Default policies for cross-cycle dependencies

In the following table, - indicates that the parameter is not applicable.

| Current node cycle | Upstream node | Upstream node cycle | Upstream self-dependency | Default dependency period |
| --- | --- | --- | --- | --- |
| Monthly | Current node (self-dependency) | - | - | Previous cycle |
| Weekly | Current node (self-dependency) | - | - | Previous cycle |
| Daily | Current node (self-dependency) | - | - | Previous cycle |
| Hourly | Current node (self-dependency) | - | - | Last 24 hours |
| Minutely | Current node (self-dependency) | - | - | Last 24 hours |
| Daily/Weekly/Monthly | Other nodes | Daily | - | Current cycle |
| Daily/Weekly/Monthly | Other nodes | Hourly/Minutely | No | Current cycle |
| Daily/Weekly/Monthly | Other nodes | Hourly/Minutely | Yes | Current cycle |
| Monthly/Weekly/Daily/Hourly/Minutely | Other nodes | Monthly/Weekly | Yes | Current cycle |
| Monthly/Weekly/Daily/Hourly/Minutely | Other nodes | Monthly | No | Current cycle |
| Monthly/Weekly/Daily/Hourly/Minutely | Other nodes | Weekly | No | Current cycle |
| Hourly/Minutely | Other nodes | Daily | - | Current cycle |
| Hourly/Minutely | Other nodes | Hourly/Minutely | - | Current cycle |
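Read as a lookup, Appendix 2 depends only on whether the upstream node is the current node itself and, if so, on the current node's cycle. The Python sketch below encodes that reading; the function name and parameters are hypothetical.

```python
CYCLES = {"Monthly", "Weekly", "Daily", "Hourly", "Minutely"}


def default_cross_cycle_period(current_cycle: str, self_dependency: bool) -> str:
    """Return the default dependency period per Appendix 2."""
    if current_cycle not in CYCLES:
        raise ValueError("unknown scheduling cycle")
    if not self_dependency:
        # Every "Other nodes" row in the table defaults to the current cycle.
        return "Current cycle"
    if current_cycle in {"Hourly", "Minutely"}:
        return "Last 24 hours"
    return "Previous cycle"


print(default_cross_cycle_period("Hourly", self_dependency=True))
```

A self-dependent hourly node therefore defaults to Last 24 hours, while a self-dependent daily, weekly, or monthly node defaults to Previous cycle.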
