Configure scheduling dependencies for an offline pipeline

Last updated: 2026-05-22 06:07:17

Dataphin uses scheduling dependencies to run nodes sequentially in a business process, ensuring timely and effective data generation. This topic describes how to configure scheduling dependencies for a periodic offline pipeline.

Procedure

  1. On the Dataphin homepage, choose Develop > Data Integration from the top navigation bar.

  2. From the top menu bar on the integration page, select a Project. (In Dev-Prod mode, you also need to select an environment.)

  3. In the navigation pane on the left, click Offline Integration, and then click the offline pipeline that you want to configure.

  4. In the right navigation bar, click Properties to open the Properties panel.

  5. In the Scheduling dependency area, configure the scheduling dependency parameters for the integration task.

    1. Upstream Dependencies

      • Automatic Parsing

        Click Automatic Parsing to have Dataphin parse the integration task and retrieve its upstream tasks and output tables. When parsing completes, all retrieved dependency tables are added to the upstream dependency list, where you can view their details, edit them, or delete them.

        Note
        • If an input table has multiple source tasks, all source tasks are added as upstream dependencies by default.

        • The dependency period for all parsed dependencies defaults to the current cycle.

      • Add Root Node

        If a task lacks upstream dependencies, click Add Root Node to set a root node as its dependency.

        Note

        Each tenant or enterprise is initialized with a virtual root node whose name starts with virtual_root_node.

      • Add Previous Cycle of This Node

        Click Add Previous Cycle of This Node to create a self-dependency: the current task instance depends on the successful completion of its previous-cycle instance (for example, the instance from the previous day or hour).

      • Add Dependency

        If Automatic Parsing cannot parse the scheduling dependencies, or the upstream dependency configuration it generates does not match your actual requirements, click +Add Dependency to add the node's upstream dependencies manually.

        • Add Dependency-Physical Node

          Select one or more physical nodes from the node list. You can filter this list by Current Project, Project, Node Type, Node Name, or Output Table Name.

        • Add Dependency-Logical Table Node

          Select one or more logical table nodes from the node list. You can filter this list by Logical Table Type, Business Category, and Logical Table Name.

          To depend on specific fields instead of the entire logical table, click the icon in the Dependent Fields column and select the required fields.

        When you add a dependency, the system automatically applies the recommended settings for Dependency Period and Dependency Policy. To modify these settings for a specific dependency, click the edit icon in the dependency list.

        • Dependency Period: The time range for the scheduled run of the upstream task instance. Typically, this is the current day, which corresponds to the time range [00:00, 24:00).

        • Dependency Policy: You must specify a dependency policy because multiple instances might exist within a dependency period. Select the upstream task status that satisfies the dependency: Succeeded, Failed, or Finished (Succeeded or Failed). The default is Succeeded. For information about default cross-cycle dependency policies, see Appendix: Default cross-cycle dependency policies.

          Important

          If there is only one instance, the dependency policy can be set to any option. To remain compatible with possible changes to the scheduling settings of upstream tasks, only relative-path policies are supported.

          • Succeeded: The upstream task must succeed before the downstream task can run. If the upstream task fails, the downstream task does not run.

          • Failed: The upstream task must ultimately fail (that is, fail after its final automatic retry) before the downstream task can run. If the upstream task succeeds, the downstream task does not run.

          • Finished (Succeeded or Failed): The downstream task can run after the upstream task either succeeds or ultimately fails (that is, fails after its final automatic retry).

    2. Output of This Node

      The system automatically generates an output name for the node. To add more output names, click Auto-generate Output Name.

      Important

      Do not manually change the automatically generated output names.

  6. Click OK to complete the scheduling dependency configuration.
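
The Dependency Period and Dependency Policy semantics described in step 5 can be illustrated with a minimal Python sketch. This is not Dataphin code or a Dataphin API: the names Status, Policy, dependency_satisfied, and in_current_day are hypothetical and exist only for illustration. The sketch assumes an upstream instance is either still running or in one of two terminal states, where "failed" means failed after the final automatic retry.

```python
from datetime import date, datetime, time, timedelta
from enum import Enum


class Status(Enum):
    """Hypothetical upstream instance states (illustration only)."""
    RUNNING = "running"      # not yet in a terminal state
    SUCCEEDED = "succeeded"
    FAILED = "failed"        # failed after the final automatic retry


class Policy(Enum):
    """The three dependency policies named in this topic."""
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    FINISHED = "Finished (Succeeded or Failed)"


def dependency_satisfied(policy: Policy, upstream: Status) -> bool:
    """Return True if the downstream task may run under the given policy."""
    if policy is Policy.SUCCEEDED:
        return upstream is Status.SUCCEEDED
    if policy is Policy.FAILED:
        return upstream is Status.FAILED
    # FINISHED: either terminal state satisfies the dependency.
    return upstream in (Status.SUCCEEDED, Status.FAILED)


def in_current_day(scheduled: datetime, day: date) -> bool:
    """Check the half-open range [00:00, 24:00) of the given day."""
    start = datetime.combine(day, time.min)
    return start <= scheduled < start + timedelta(days=1)
```

For example, dependency_satisfied(Policy.FINISHED, Status.FAILED) evaluates to True, while dependency_satisfied(Policy.SUCCEEDED, Status.FAILED) evaluates to False. An instance scheduled at exactly 00:00 of the next day falls outside the current-day range because the upper bound is exclusive.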

Appendix: Default cross-cycle dependency policies

| Current node cycle         | Upstream node                  | Upstream node cycle | Upstream self-dependency | Default dependency period    |
|----------------------------|--------------------------------|---------------------|--------------------------|------------------------------|
| Month                      | Current node (self-dependency) | -                   | -                        | Previous cycle (1 day ago)   |
| Week                       | Current node (self-dependency) | -                   | -                        | Previous cycle (1 day ago)   |
| Day                        | Current node (self-dependency) | -                   | -                        | Previous cycle (1 day ago)   |
| Hour                       | Current node (self-dependency) | -                   | -                        | Last 24 hours                |
| Minute                     | Current node (self-dependency) | -                   | -                        | Last 24 hours                |
| Day/Week/Month             | Not the current node           | Day                 |                          | Current period (current day) |
| Day/Week/Month             | Not the current node           | Hour/Minute         | No                       | Current period (current day) |
| Day/Week/Month             | Not the current node           | Hour/Minute         | Yes                      | Current period (current day) |
| Month/Week/Day/Hour/Minute | Not the current node           | Month/Week          | Yes                      | Current period (current day) |
| Month/Week/Day/Hour/Minute | Not the current node           | Month               | No                       | Current period (current day) |
| Month/Week/Day/Hour/Minute | Not the current node           | Week                | No                       | Current period (current day) |
| Hour/Minute                | Not the current node           | Day                 |                          | Current period (current day) |
| Hour/Minute                | Not the current node           | Hour/Minute         |                          | Current period (current day) |
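
The defaults in this appendix reduce to a small lookup, sketched below in Python. The function name default_dependency_period and its parameters are hypothetical and not part of Dataphin. The sketch only restates the Default dependency period column: it assumes that the default depends solely on the current node's cycle and on whether the dependency is a self-dependency, which is what the rows above show.

```python
VALID_CYCLES = {"month", "week", "day", "hour", "minute"}


def default_dependency_period(current_cycle: str, self_dependency: bool) -> str:
    """Look up the default dependency period listed in the appendix.

    current_cycle: scheduling cycle of the current node (case-insensitive).
    self_dependency: True if the upstream node is the current node itself.
    """
    cycle = current_cycle.strip().lower()
    if cycle not in VALID_CYCLES:
        raise ValueError(f"unknown scheduling cycle: {current_cycle!r}")
    if not self_dependency:
        # Every "Not the current node" row lists the same default.
        return "Current period (current day)"
    if cycle in ("hour", "minute"):
        return "Last 24 hours"
    return "Previous cycle (1 day ago)"
```

For example, a day-cycle node that depends on itself resolves to "Previous cycle (1 day ago)", and an hour-cycle node that depends on itself resolves to "Last 24 hours".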
