How to configure pipeline scheduling dependencies

Dataphin uses scheduling dependencies to run the nodes in a business process in the correct order, so that data is generated on time and is valid. This topic describes how to configure scheduling dependencies for a periodic offline pipeline.

Procedure

On the Dataphin homepage, choose Develop > Data Integration from the top navigation bar.
From the top menu bar on the integration page, select a Project. (In Dev-Prod mode, you also need to select an environment.)
In the navigation pane on the left, click Offline Integration, and then click the offline pipeline that you want to configure.
In the right navigation bar, click Properties to open the Properties panel.
In the Scheduling dependency area, configure the scheduling dependency parameters for the integration task.
1. Upstream Dependencies
  - Automatic Parsing
    Click Automatic Parsing to have Dataphin parse the integration task and retrieve its upstream tasks and output tables. After parsing is complete, all retrieved dependency tables are added to the upstream dependency list, where you can view their details, edit them, or delete them.
    Note
    If an input table has multiple source tasks, all source tasks are added as upstream dependencies by default.
    The dependency period for all parsed dependencies defaults to the current cycle.
  - Add Root Node
    If a task lacks upstream dependencies, click Add Root Node to set a root node as its dependency.
    Note
    Each tenant or enterprise is initialized with a virtual root node whose name starts with virtual_root_node.
  - Add Previous Cycle of This Node
    Creates a self-dependency: the current task instance depends on the successful completion of the previous instance of the same node (for example, the instance from the previous day or hour).
  - Add Dependency
    If Automatic Parsing cannot parse the scheduling dependencies, or if the upstream dependencies it generates do not match your actual scenario, click +Add Dependency to add the node's upstream dependencies manually.
    - Add Dependency-Physical Node
      Select one or more physical nodes from the node list. You can filter this list by Current Project, Project, Node Type, Node Name, or Output Table Name.
    - Add Dependency-Logical Table Node
      Select one or more logical table nodes from the node list. You can filter this list by Logical Table Type, Business Category, and Logical Table Name.
      To depend on specific fields instead of the entire logical table, click the icon in the Dependent Fields column to select the required fields.
    When you add a dependency, the system automatically applies the recommended settings for Dependency Period and Dependency Policy. To modify these settings for a specific dependency, click the edit icon in the dependency list.
    - Dependency Period: The time range for the scheduled run of the upstream task instance. Typically, this is the current day, which corresponds to the time range [00:00, 24:00).
    - Dependency Policy: You must specify a dependency policy because multiple instances might exist within a dependency period. Select the upstream task status that satisfies the dependency: Succeeded, Failed, or Finished (Succeeded or Failed). The default is Succeeded. For information about default cross-cycle dependency policies, see Appendix: Default cross-cycle dependency policies.
      Important
      If there is only one instance, the dependency policy can be set to any option. To remain compatible with possible changes to the scheduling settings of upstream tasks, only relative-path policies are supported.
      - Succeeded: The upstream task must succeed before the downstream task can run. If the upstream task fails, the downstream task does not run.
      - Failed: The upstream task must ultimately fail (that is, fail after its final automatic retry) before the downstream task can run. If the upstream task succeeds, the downstream task does not run.
      - Finished (Succeeded or Failed): The downstream task can run after the upstream task either succeeds or ultimately fails (that is, fails after its final automatic retry).
2. Output of This Node
  The system automatically generates an output name for the node. To add more output names, click Auto-generate Output Name.
  Important
  Do not manually change the automatically generated output names.
Click OK to complete the scheduling dependency configuration.
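
The Dependency Period and Dependency Policy semantics described in the procedure can be sketched in Python. This is an illustrative model only, not a Dataphin API: the names `DependencyPolicy`, `FinalStatus`, `current_day_window`, `in_window`, and `is_satisfied` are hypothetical. The sketch evaluates a single upstream instance and does not model how Dataphin combines multiple instances within one dependency period.

```python
from datetime import datetime, time, timedelta
from enum import Enum
from typing import Tuple


class DependencyPolicy(Enum):
    """Policy options listed in the dependency settings (hypothetical model)."""
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    FINISHED = "Finished (Succeeded or Failed)"


class FinalStatus(Enum):
    """Assumed states of one upstream instance."""
    SUCCEEDED = "succeeded"
    FAILED = "failed"    # failed after the final automatic retry
    RUNNING = "running"  # not yet in a final state


def current_day_window(run_time: datetime) -> Tuple[datetime, datetime]:
    """Return the half-open window [00:00, 24:00) for the day of run_time."""
    start = datetime.combine(run_time.date(), time.min)
    return start, start + timedelta(days=1)


def in_window(scheduled: datetime, window: Tuple[datetime, datetime]) -> bool:
    """True if an upstream scheduled time falls inside the half-open window."""
    start, end = window
    return start <= scheduled < end


def is_satisfied(policy: DependencyPolicy, status: FinalStatus) -> bool:
    """Check whether one upstream instance satisfies the chosen policy."""
    if policy is DependencyPolicy.SUCCEEDED:
        return status is FinalStatus.SUCCEEDED
    if policy is DependencyPolicy.FAILED:
        return status is FinalStatus.FAILED
    # FINISHED: either final state is acceptable, but not a running instance.
    return status in (FinalStatus.SUCCEEDED, FinalStatus.FAILED)
```

For example, with the default Succeeded policy, `is_satisfied(DependencyPolicy.SUCCEEDED, FinalStatus.FAILED)` is false, so the downstream task would not run.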

Appendix: Default cross-cycle dependency policies

| Current node cycle | Upstream node | Upstream node cycle | Upstream self-dependency | Default dependency period |
| --- | --- | --- | --- | --- |
| Month | Current node (self-dependency) | - | | Previous cycle (1 day ago) |
| Week | Current node (self-dependency) | - | | Previous cycle (1 day ago) |
| Day | Current node (self-dependency) | - | | Previous cycle (1 day ago) |
| Hour | Current node (self-dependency) | - | | Last 24 hours |
| Minute | Current node (self-dependency) | - | | Last 24 hours |
| Day/Week/Month | Not the current node | Day | | Current period (current day) |
| Day/Week/Month | Not the current node | Hour/Minute | No | Current period (current day) |
| Day/Week/Month | Not the current node | Hour/Minute | Yes | Current period (current day) |
| Month/Week/Day/Hour/Minute | Not the current node | Month/Week | Yes | Current period (current day) |
| Month/Week/Day/Hour/Minute | Not the current node | Month | No | Current period (current day) |
| Month/Week/Day/Hour/Minute | Not the current node | Week | No | Current period (current day) |
| Hour/Minute | Not the current node | Day | | Current period (current day) |
| Hour/Minute | Not the current node | Hour/Minute | | Current period (current day) |
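
The defaults in the table above can be condensed into a small lookup. The following Python sketch only restates the rows as listed; the function name `default_dependency_period` and its parameters are hypothetical and are not part of Dataphin. As listed, every row where the upstream node is not the current node has the same default, so the upstream cycle and upstream self-dependency columns do not affect the result here.

```python
VALID_CYCLES = ("Month", "Week", "Day", "Hour", "Minute")


def default_dependency_period(current_cycle: str, is_self_dependency: bool) -> str:
    """Return the default dependency period for a node, per the appendix table.

    current_cycle: scheduling cycle of the current node.
    is_self_dependency: True when the upstream node is the current node itself.
    """
    if current_cycle not in VALID_CYCLES:
        raise ValueError(f"unknown cycle: {current_cycle!r}")
    if not is_self_dependency:
        # All rows with an upstream node other than the current node
        # list the same default.
        return "Current period (current day)"
    if current_cycle in ("Hour", "Minute"):
        return "Last 24 hours"
    # Month, Week, and Day self-dependencies.
    return "Previous cycle (1 day ago)"
```

For instance, a daily node that depends on itself resolves to `"Previous cycle (1 day ago)"`, while an hourly node that depends on itself resolves to `"Last 24 hours"`.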
