Offline mode dependency configuration
Dataphin runs nodes in a business process in the correct order based on their scheduling dependencies, ensuring that business data is delivered on time. This topic describes how to configure offline mode dependencies for stream-batch integrated tasks.
Background information
Scheduling dependencies define the upstream and downstream relationships between nodes. In Dataphin, a downstream task node starts only after its upstream task node has completed successfully. This ensures that each task receives the correct data at runtime. Dataphin checks the running status of the upstream node to determine whether the latest upstream table data is available, preventing the downstream node from reading data before it is ready.
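The ordering rule described above can be illustrated with a minimal Python sketch. This is not Dataphin's scheduler: the node names and the run_node callback are hypothetical, and the standard-library graphlib module stands in for whatever Dataphin uses internally. A node runs only after all of its upstream nodes have succeeded, and a failed node keeps its downstream nodes from starting.

```python
from graphlib import TopologicalSorter


def run_in_dependency_order(upstreams, run_node):
    """Run each node only after all of its upstream nodes have succeeded.

    upstreams: dict mapping node name -> set of upstream node names (hypothetical data).
    run_node: callable taking a node name and returning True on success (hypothetical).
    Returns three lists: succeeded, failed, and blocked node names.
    """
    sorter = TopologicalSorter(upstreams)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular

    succeeded, failed = [], []
    while True:
        ready = sorter.get_ready()  # nodes whose upstream nodes are all marked done
        if not ready:
            break
        for node in ready:
            if run_node(node):
                succeeded.append(node)
                sorter.done(node)  # releases the downstream nodes of this node
            else:
                # Never marked done, so downstream nodes are never returned as ready.
                failed.append(node)

    all_nodes = set(upstreams) | {u for ups in upstreams.values() for u in ups}
    blocked = sorted(all_nodes - set(succeeded) - set(failed))
    return succeeded, failed, blocked


# Hypothetical three-node chain: ods_order -> dwd_order -> dws_report.
chain = {"dws_report": {"dwd_order"}, "dwd_order": {"ods_order"}, "ods_order": set()}
print(run_in_dependency_order(chain, lambda node: node != "dwd_order"))
```

In the usage line, dwd_order is made to fail, so dws_report is reported as blocked rather than run against data that is not ready.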
Procedure
-
Open the offline mode configuration panel. For more information, see Offline Mode Configuration Entry.
-
In the Dependency section of the offline mode configuration panel, set the Dependency parameters.
Parameter
Description
Start Parsing
If the node's task type is SQL, you can click Start Parsing. The system parses the tables referenced in the code and matches the table names against output names. The node associated with a matching output name becomes an upstream dependency of the current node.
If the code references project variables or does not specify a project, the system defaults to the production project name to ensure scheduling stability. For example, if the development project name is onedata_dev:
-
Code specifying select * from s_order results in a dependency of onedata.s_order.
-
Code specifying select * from ${onedata}.s_order also results in a dependency of onedata.s_order.
-
Code specifying select * from onedata.s_order results in a dependency of onedata.s_order.
-
Code specifying select * from onedata_dev.s_order results in a dependency of onedata_dev.s_order.
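The four cases above can be summarized in a short Python sketch. It is only an illustration of the rule as stated here, not Dataphin's parser: the function name resolve_dependency is hypothetical, and it assumes that any project variable of the form ${...} resolves to the production project.

```python
import re

# Production project name from the example above (development project: onedata_dev).
PROD_PROJECT = "onedata"

# Matches a project variable such as ${onedata}.
_PROJECT_VARIABLE = re.compile(r"^\$\{\w+\}$")


def resolve_dependency(table_ref, prod_project=PROD_PROJECT):
    """Map a table reference found in SQL code to the output name it depends on."""
    project, sep, table = table_ref.rpartition(".")
    if not sep or _PROJECT_VARIABLE.match(project):
        # No project, or a project variable: default to the production project.
        return f"{prod_project}.{table}"
    # An explicitly written project name is kept as is.
    return f"{project}.{table}"


for ref in ("s_order", "${onedata}.s_order", "onedata.s_order", "onedata_dev.s_order"):
    print(ref, "->", resolve_dependency(ref))
```

Only the last reference, which names the development project explicitly, resolves to onedata_dev.s_order. The other three resolve to onedata.s_order.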
Upstream Dependency
To add an upstream node that the current node depends on for scheduling:
-
Click Manually Add Upstream.
-
In the New Upstream Dependency dialog box, search for dependency nodes by:
-
Entering a keyword from the output name of the upstream node.
-
Entering virtual to find virtual nodes (each tenant or enterprise has a root node upon initialization).
Note: The output name of the node is globally unique and case-insensitive.
-
Click Confirm Addition.
You can also click the icon in the Actions column to delete the added dependency node.
Current Node
To set the output name of the current node so that other nodes can depend on it:
-
Click Manually Add Output.
-
In the Add Current Node Output dialog box, enter the output name. Use a consistent naming convention, typically project name.table name (case-insensitive). This helps identify the table produced by this node and makes it easier for other nodes to select it as a scheduling dependency.
For example, for a development project named onedata_dev, the recommended output name is onedata.s_order. If you set the output name to onedata_dev.s_order, only code that specifies select * from onedata_dev.s_order is parsed into an upstream dependency on this node.
-
Click Confirm Addition.
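To see why the recommended output name matters, the following Python sketch models output names as a small registry. It is a toy model under stated assumptions, not Dataphin's implementation: the class OutputNameRegistry, its methods, and the node name node_a are hypothetical. It reflects two properties stated in this topic: output names are globally unique and case-insensitive, and a parsed dependency matches a node only if the names are equal.

```python
class OutputNameRegistry:
    """Toy registry of node output names: globally unique and case-insensitive."""

    def __init__(self):
        self._owner_by_name = {}

    def add_output(self, output_name, node):
        key = output_name.casefold()  # compare names without regard to case
        owner = self._owner_by_name.get(key)
        if owner is not None and owner != node:
            raise ValueError(
                f"output name {output_name!r} already belongs to {owner!r}"
            )
        self._owner_by_name[key] = node

    def find_upstream(self, dependency_name):
        """Return the node whose output name matches, or None if nothing matches."""
        return self._owner_by_name.get(dependency_name.casefold())


registry = OutputNameRegistry()
registry.add_output("onedata.s_order", "node_a")  # recommended production-style name

print(registry.find_upstream("ONEDATA.S_ORDER"))      # matched regardless of case
print(registry.find_upstream("onedata_dev.s_order"))  # different name, no match
```

With the recommended name, code that omits the project or uses a project variable resolves to onedata.s_order and finds node_a. Had the output been registered as onedata_dev.s_order, only an explicit reference to onedata_dev.s_order would match.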
For existing output names on the current node, you can:
-
Delete an added output name by clicking the icon in the Actions column.
-
View the dependent downstream nodes by clicking the icon in the Actions column. This option applies if the node has been submitted or published and has downstream dependencies (with submitted tasks).
-
Complete the offline mode dependency configuration by clicking Confirm.