Configure scheduling dependencies for an offline pipeline
Dataphin uses scheduling dependencies to run nodes sequentially in a business process, ensuring timely and effective data generation. This topic describes how to configure scheduling dependencies for a periodic offline pipeline.
Procedure
On the Dataphin homepage, choose Develop > Data Integration from the top navigation bar.
From the top menu bar on the integration page, select a Project. (In Dev-Prod mode, you also need to select an environment.)
In the navigation pane on the left, click Offline Integration, and then click the offline pipeline that you want to configure.
In the right navigation bar, click Properties to open the Properties panel.
In the Scheduling dependency area, configure the scheduling dependency parameters for the integration task.
Upstream Dependencies
Automatic Parsing
Click Automatic Parsing to have Dataphin parse the integration task and retrieve its upstream tasks and output tables. After parsing is complete, all retrieved dependency tables are added to the upstream dependency list, where you can view their details, edit them, or delete them.
Note: If an input table has multiple source tasks, all source tasks are added as upstream dependencies by default.
The dependency period for all parsed dependencies defaults to the current cycle.
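The default behavior described in the note can be pictured with a minimal sketch. It is not Dataphin's parser: the function name `parse_upstream`, the `producers` mapping, and the table and task names are all hypothetical, and only the two documented defaults are modeled (every source task of an input table becomes a dependency, and each one starts with the current cycle as its dependency period).

```python
def parse_upstream(input_tables, producers):
    """Map each input table to dependency entries for all of its source tasks.

    `producers` is a hypothetical lookup {table_name: [task_name, ...]}.
    Tables with no known source task are skipped.
    """
    upstream = {}
    for table in input_tables:
        tasks = producers.get(table, [])
        if tasks:
            # Documented default: every source task is added, and the
            # dependency period starts as the current cycle.
            upstream[table] = [
                {"task": task, "dependency_period": "current cycle"}
                for task in tasks
            ]
    return upstream


# Hypothetical example data: one table with two source tasks.
producers = {
    "ods_orders": ["load_orders_a", "load_orders_b"],
    "ods_users": ["load_users"],
}
parsed = parse_upstream(["ods_orders", "ods_users", "ods_unknown"], producers)
```

In this sketch `ods_orders` ends up with two upstream entries, which is the case where you may want to edit or delete one of them in the dependency list.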
Add Root Node
If a task lacks upstream dependencies, click Add Root Node to set a root node as its dependency.
Note: Each tenant or enterprise is initialized with a virtual root node whose name starts with virtual_root_node.
Add Previous Cycle of This Node
This creates a self-dependency, where the current task instance depends on the successful completion of the previous instance (for example, from the previous day or hour).
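As a rough illustration of a self-dependency, the sketch below checks whether the previous instance of the same task succeeded. The names `previous_instance_time` and `can_start`, the status strings, and the `status_by_time` mapping are assumptions made for this example, and only daily and hourly cycles are modeled.

```python
from datetime import datetime, timedelta

# Assumed cycle lengths for this sketch; only the two cycles named above.
CYCLE_LENGTH = {"day": timedelta(days=1), "hour": timedelta(hours=1)}


def previous_instance_time(scheduled, cycle):
    """Return the scheduled time of the instance one cycle earlier."""
    return scheduled - CYCLE_LENGTH[cycle]


def can_start(scheduled, cycle, status_by_time):
    """Return True only if the previous instance is recorded as succeeded.

    `status_by_time` is a hypothetical {scheduled_time: status} mapping.
    A missing previous instance is treated as not satisfied, which is an
    assumption of this sketch rather than documented behavior.
    """
    previous = previous_instance_time(scheduled, cycle)
    return status_by_time.get(previous) == "succeeded"


history = {datetime(2024, 3, 1): "succeeded"}
ok_daily = can_start(datetime(2024, 3, 2), "day", history)
ok_hourly = can_start(datetime(2024, 3, 2, 1), "hour", history)
```

Here the daily instance for March 2 can start because the March 1 instance succeeded, while the hourly check finds no record for the preceding hour.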
Add Dependency
If Automatic Parsing cannot parse the scheduling dependencies, or the upstream dependency configuration that it generates does not match your actual requirements, click +Add Dependency to add the node's upstream dependencies manually.
Add Dependency-Physical Node
Select one or more physical nodes from the node list. You can filter this list by Current Project, Project, Node Type, Node Name, or Output Table Name.
Add Dependency-Logical Table Node
Select one or more logical table nodes from the node list. You can filter this list by Logical Table Type, Business Category, and Logical Table Name.
To depend on specific fields instead of the entire logical table, click the icon in the Dependent Fields column to select the required fields.
When you add a dependency, the system automatically applies the recommended settings for Dependency Period and Dependency Policy. To modify these settings for a specific dependency, click the edit icon in the dependency list.
Dependency Period: The time range for the scheduled run of the upstream task instance. Typically, this is the current day, which corresponds to the time range [00:00, 24:00).
Dependency Policy: You must specify a dependency policy because multiple instances might exist within a dependency period. Select the upstream task status that satisfies the dependency: Succeeded, Failed, or Finished (Succeeded or Failed). The default is Succeeded. For information about default cross-cycle dependency policies, see Appendix: Default cross-cycle dependency policies.
Important: If there is only one instance, the dependency policy can be set to any option. To remain compatible with possible changes to the scheduling settings of upstream tasks, only relative-path policies are supported.
Succeeded: The upstream task must succeed before the downstream task can run. If the upstream task fails, the downstream task does not run.
Failed: The upstream task must ultimately fail (fails after the final automatic retry) before the downstream task can run. If the upstream task succeeds, the downstream task does not run.
Finished (Succeeded or Failed): The downstream task can run after the upstream task either succeeds or ultimately fails (fails after the final automatic retry).
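The dependency period and the three policy options can be summarized in a small sketch. This is an illustration, not Dataphin code: the function names, the lowercase status strings, and the idea of passing a single final status are assumptions. How several instances within one period are combined is not modeled here.

```python
from datetime import date, datetime, time, timedelta

# Final upstream statuses that satisfy each policy option.
# "failed" means failed after the final automatic retry.
SATISFYING_STATUSES = {
    "Succeeded": {"succeeded"},
    "Failed": {"failed"},
    "Finished": {"succeeded", "failed"},
}


def in_current_day(scheduled, business_day):
    """Check the half-open window [00:00, 24:00) of the given day."""
    start = datetime.combine(business_day, time.min)
    end = start + timedelta(days=1)
    return start <= scheduled < end


def dependency_satisfied(final_status, policy="Succeeded"):
    """Return True if the upstream status lets the downstream task run."""
    return final_status in SATISFYING_STATUSES[policy]


day = date(2024, 3, 1)
late_evening = in_current_day(datetime(2024, 3, 1, 23, 59), day)
next_midnight = in_current_day(datetime(2024, 3, 2, 0, 0), day)
```

Because the window is half-open, an upstream instance scheduled at 00:00 on the next day belongs to the next period, not the current one. A status such as "running" satisfies none of the policies in this sketch, so the downstream task keeps waiting.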
Output of This Node
The system automatically generates an output name for the node. To add more output names, click Auto-generate Output Name.
Important: Do not manually change the automatically generated output names.
Click OK to complete the scheduling dependency configuration.
Appendix: Default cross-cycle dependency policies
Current node cycle | Upstream node | Upstream node cycle | Upstream self-dependency | Default dependency period |
Month | Current node (self-dependency) | - | - | Previous cycle (1 day ago) |
Week | Current node (self-dependency) | - | - | Previous cycle (1 day ago) |
Day | Current node (self-dependency) | - | - | Previous cycle (1 day ago) |
Hour | Current node (self-dependency) | - | - | Last 24 hours |
Minute | Current node (self-dependency) | - | - | Last 24 hours |
Day/Week/Month | Not the current node | Day | - | Current period (current day) |
Day/Week/Month | Not the current node | Hour/Minute | No | Current period (current day) |
Day/Week/Month | Not the current node | Hour/Minute | Yes | Current period (current day) |
Month/Week/Day/Hour/Minute | Not the current node | Month/Week | Yes | Current period (current day) |
Month/Week/Day/Hour/Minute | Not the current node | Month | No | Current period (current day) |
Month/Week/Day/Hour/Minute | Not the current node | Week | No | Current period (current day) |
Hour/Minute | Not the current node | Day | - | Current period (current day) |
Hour/Minute | Not the current node | Hour/Minute | - | Current period (current day) |
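The defaults in this appendix can be condensed into a short lookup. The sketch assumes that the last non-empty cell of each row is the default dependency period. The function name and the lowercase cycle names are hypothetical.

```python
VALID_CYCLES = {"month", "week", "day", "hour", "minute"}


def default_dependency_period(current_cycle, is_self_dependency):
    """Return the default dependency period listed in the appendix table."""
    if current_cycle not in VALID_CYCLES:
        raise ValueError(f"unknown cycle: {current_cycle!r}")
    if is_self_dependency:
        # Self-dependency rows: the default depends on the node's own cycle.
        if current_cycle in {"month", "week", "day"}:
            return "Previous cycle (1 day ago)"
        return "Last 24 hours"
    # Every row for an upstream node other than the current node
    # lists the same default, whatever the upstream cycle is.
    return "Current period (current day)"


hourly_self = default_dependency_period("hour", True)
daily_other = default_dependency_period("day", False)
```

Under this reading, the upstream node's cycle and its own self-dependency setting do not change the default period when the upstream node is a different node. You can still modify the period for an individual dependency in the dependency list.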