Offline mode dependency configuration

Dataphin runs nodes in a business process in the correct order based on their scheduling dependencies, ensuring that business data is delivered on time. This topic describes how to configure offline mode dependencies for stream-batch integrated tasks.

Background information

Scheduling dependencies define the upstream and downstream relationships between nodes. In Dataphin, a downstream task node starts only after its upstream task node has completed successfully. This ensures that each task receives the correct data at runtime. Dataphin checks the running status of the upstream node to determine whether the latest upstream table data is available, preventing the downstream node from reading data before it is ready.
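
The rule that a downstream node starts only after its upstream nodes succeed can be illustrated with a minimal Python sketch. This is not Dataphin's scheduler; the function `run_in_dependency_order`, the `run_task` callback, and the node names are hypothetical and exist only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_in_dependency_order(upstreams, run_task):
    """Run each node only after all of its upstream nodes have succeeded.

    upstreams: dict mapping a node name to the set of its upstream node names.
    run_task:  callable that runs one node and returns True on success.
    Returns (executed, blocked): nodes that were run, and nodes that were
    held back because an upstream node did not succeed.
    """
    succeeded = {}
    executed, blocked = [], []
    # static_order() yields every node after all of its upstream nodes.
    for node in TopologicalSorter(upstreams).static_order():
        if all(succeeded[up] for up in upstreams.get(node, ())):
            succeeded[node] = bool(run_task(node))
            executed.append(node)
        else:
            succeeded[node] = False
            blocked.append(node)
    return executed, blocked


# Hypothetical three-node chain: s_order -> dws_order -> ads_report.
chain = {"dws_order": {"s_order"}, "ads_report": {"dws_order"}}
executed, blocked = run_in_dependency_order(chain, lambda node: node != "dws_order")
print(executed)  # ['s_order', 'dws_order']
print(blocked)   # ['ads_report']
```

In this sketch, `ads_report` never runs because `dws_order` failed, which mirrors how a dependency prevents a downstream node from reading data that is not ready.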

Procedure

Open the offline mode configuration panel. For more information, see Offline Mode Configuration Entry.

In the Dependency section of the offline mode configuration panel, set the Dependency parameters.

Parameter	Description
Start Parsing	If the node's task type is SQL, you can click Start Parsing. The system parses the table names in the code and matches them against node output names. A node whose output name matches becomes an upstream dependency of the current node. If the code references a project variable or does not specify a project, the system uses the production project name to keep scheduling stable. For example, if the development project is named `onedata_dev` and the production project is named `onedata`: `select * from s_order` results in the dependency `onedata.s_order`; `select * from ${onedata}.s_order` also results in the dependency `onedata.s_order`; `select * from onedata.s_order` results in the dependency `onedata.s_order`; `select * from onedata_dev.s_order` results in the dependency `onedata_dev.s_order`.
Upstream Dependency	To add an upstream node that the current node depends on for scheduling: Click Manually Add Upstream. In the New Upstream Dependency dialog box, search for the dependency node by entering a keyword from its output name, or enter virtual to find virtual nodes (each tenant or enterprise has a root node upon initialization). Note: The output name of a node is globally unique and case-insensitive. Click Confirm Addition. To delete an added dependency node, click the icon in the Actions column.
Current Node	To set the output name of the current node so that other nodes can depend on it: Click Manually Add Output. In the Add Current Node Output dialog box, enter the output name. Use a consistent naming convention, typically `project name.table name` (case-insensitive). This identifies the table produced by the node and makes it easier for other nodes to select it as a scheduling dependency. For example, for a development project named `onedata_dev`, the recommended output name is `onedata.s_order`. If you set the output name to `onedata_dev.s_order`, only code that specifies `select * from onedata_dev.s_order` is parsed to this node as an upstream dependency. Click Confirm Addition. For an output name that already exists on the current node, you can do the following: To delete the output name, click the icon in the Actions column. If the node has been submitted or published and has downstream dependencies (with submitted tasks), click the icon in the Actions column to view the dependent downstream nodes.
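
The name resolution described for Start Parsing and the output name matching described for Current Node can be sketched in Python as follows. This is an illustrative approximation, not Dataphin's parser: the functions `resolve_dependency_name` and `find_upstream`, the `outputs` mapping, and the node name `node_a` are hypothetical, and the sketch assumes that any `${...}` project variable resolves to the production project name.

```python
import re

# Assumption: a project variable looks like ${name}.
_PROJECT_VARIABLE = re.compile(r"^\$\{\w+\}$")


def resolve_dependency_name(table_ref, production_project):
    """Turn a table reference from SQL code into a dependency name."""
    project, sep, table = table_ref.rpartition(".")
    # No project, or a project variable: fall back to the production project.
    if not sep or _PROJECT_VARIABLE.match(project):
        project = production_project
    # Output names are case-insensitive, so compare in lowercase.
    return f"{project}.{table}".lower()


def find_upstream(table_ref, production_project, outputs):
    """Return the node whose output name matches the reference, or None.

    outputs: dict mapping an output name to the node that declares it.
    """
    by_name = {name.lower(): node for name, node in outputs.items()}
    return by_name.get(resolve_dependency_name(table_ref, production_project))


outputs = {"onedata.s_order": "node_a"}  # hypothetical node and output name
for ref in ("s_order", "${onedata}.s_order", "onedata.s_order", "onedata_dev.s_order"):
    print(ref, "->", resolve_dependency_name(ref, "onedata"),
          "->", find_upstream(ref, "onedata", outputs))
```

Under these assumptions, the first three references resolve to `onedata.s_order` and match `node_a`, while `onedata_dev.s_order` matches nothing because no node declares that output name. This is why the production-style output name is the recommended one.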

Click Confirm to complete the offline mode dependency configuration.