Integration task submission instructions
Dataphin automatically parses data lineage for tables and fields when you submit an integration task. It also verifies the change type and content of the task object and runs a pre-check to ensure the task meets submission requirements.
Lineage parsing overview
-
The system parses data lineage of tables and fields in the development environment on submission and in the production environment on publication. A single task submission or publication can parse up to 100,000 lineage relationships. Relationships beyond this limit are not recorded and do not appear in the asset folder.
-
Offline integration tasks support automatic parsing of field lineage for tables from data sources supported by the Metadata Center. For more information, see Overview of metadata acquisition.
-
If an input component uses a schema to select a table and that schema is later updated, you must resubmit the integration task to update the lineage automatically.
-
If you use the MySQL input component and select multiple tables based on a rule, the system automatically parses the lineage for only the first 50 tables. If you use the select * expression to select tables, the system parses the lineage for only the first table in the query results.
-
If a task configuration includes a table from a custom data source that does not have a specified database or schema, the system automatically adds the default_schema prefix to the table name when parsing the lineage.
-
If you use a where clause in a filter component or a built-in function in a field calculation component to define field logic, the field lineage graph displays a direct lineage relationship.
-
The system collects table lineage in the development environment on submission and in the production environment on publication. You can view the lineage in the table details on the Asset Checklist page. If a table is not listed in the Asset Checklist, use metadata acquisition to add it to Dataphin first.
-
Unpublishing a real-time integration task deletes its lineage. This applies only to tasks unpublished from the real-time integration task page. Instances taken offline from the O&M page are not affected.
-
Once a real-time integration task starts running, the lineage information for added or deleted tables cannot be collected or updated.
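The limits in the list above can be illustrated with a short Python sketch. This is not Dataphin's implementation: the function names are hypothetical, the "no dot in the name" test is an assumed stand-in for "no database or schema specified", and taking the first items in list order is an assumption about which entries are kept.

```python
from typing import Iterable, List

MAX_RULE_MATCHED_TABLES = 50         # from the text: first 50 rule-matched MySQL tables
MAX_LINEAGE_RELATIONSHIPS = 100_000  # from the text: cap per submission or publication
DEFAULT_SCHEMA = "default_schema"    # from the text: prefix for unqualified custom-source tables


def qualify_table_name(table: str) -> str:
    """Add the default_schema prefix when the name carries no database or schema.

    Assumption: a name without a dot has no database or schema part.
    """
    return table if "." in table else f"{DEFAULT_SCHEMA}.{table}"


def tables_parsed_for_lineage(rule_matched_tables: Iterable[str]) -> List[str]:
    """Return the rule-matched tables whose lineage would be parsed (first 50 only)."""
    return list(rule_matched_tables)[:MAX_RULE_MATCHED_TABLES]


def recorded_relationships(relationships: list) -> list:
    """Keep the relationships that fit under the cap; the rest are not recorded.

    Assumption: the cap is applied in list order.
    """
    return relationships[:MAX_LINEAGE_RELATIONSHIPS]
```

For example, `qualify_table_name("orders")` yields `default_schema.orders`, while a name such as `sales.orders` is returned unchanged.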
Offline integration tasks
Submission details
In the Submit dialog box, review the submission content and pre-check results, and enter submission remarks.
-
Submission Content
Displays the name, type, and change type of the submitted task object.
-
Pre-check
When you submit an offline integration task, the system performs the following pre-checks. If any check item is configured incorrectly, the task cannot be submitted.
| Check item | Description |
| --- | --- |
| Schedule dependency | Dataphin uses the schedule dependency configuration of each node to run nodes in a business flow in sequence, ensuring business data is generated on time. For more information, see Configure offline pipeline schedule configuration. |
| Run parameters | Assigns values to variables in the integration task for node scheduling. Parameter variables are automatically replaced with their assigned values. For more information, see Configure offline pipeline run parameters. |
| Cross-node parameters | Variables that are passed to the direct descendant nodes of the current node. For more information, see Cross-node variables. |
-
Submission Remarks
Enter remarks for the task submission. The remarks can be a maximum of 128 characters.
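As a rough illustration of two rules in this section, the Python sketch below shows parameter variables being replaced with assigned values and the 128-character limit on remarks. The `${name}` placeholder syntax, the variable name `bizdate`, and both function names are assumptions made for the example; they are not taken from Dataphin.

```python
from string import Template

MAX_REMARK_LENGTH = 128  # from the text: remarks can be at most 128 characters


def render_parameters(text: str, values: dict) -> str:
    """Replace ${name} placeholders with their assigned values.

    The ${name} syntax is assumed for illustration. Template.substitute
    raises KeyError when a placeholder has no assigned value.
    """
    return Template(text).substitute(values)


def remarks_are_valid(remarks: str) -> bool:
    """Check the submission remarks against the 128-character limit."""
    return len(remarks) <= MAX_REMARK_LENGTH
```

Calling `render_parameters("dt=${bizdate}", {"bizdate": "20240101"})` returns `dt=20240101`. A placeholder without an assigned value raises an error, which loosely mirrors a run parameter that is configured incorrectly.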
Check item details
After you submit the integration task, click Confirm And Submit in the Submission Content dialog box to view the check items and results in the Submit dialog box.
| Check item | Description |
| --- | --- |
| Configuration check | Verifies that the integration pipeline configurations are correct, including the ID, name, FileId, object type, schedule, and general settings. |
| Parameter configuration | Checks whether the current values of the integration pipeline parameters are correct. |
| Permission check | Checks whether you have permissions on the objects in the integration pipeline. If the permission check for an object fails, click the icon displayed for the check. The system lists all objects in the integration pipeline with their operational permissions, including the object name, type, permission status, row-level permission status, and the permission request operation. |
| Table schema check | Checks whether the production table exists. If it does not, the status is Alert, and the error details show: <Table name> does not exist in the production environment. Then checks whether the table schemas in the development and production environments are consistent. If they are not, the status is Alert, and the error details show: <Table name> table schema is inconsistent between the development and production environments. Note: A failed check only triggers an alert and does not block the submission. |
| Table duplication check | Checks whether any tables in the integration pipeline are duplicates. |
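The two-step logic of the table schema check can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Dataphin's implementation: the `Schema` representation, the function name, the `blocking` field, and the `Pass` status label are all hypothetical. Only the `Alert` status and the two messages come from the table above.

```python
from typing import Dict, Optional

# Hypothetical representation: column name -> column type.
Schema = Dict[str, str]


def table_schema_check(table: str, dev: Schema, prod: Optional[Schema]) -> dict:
    """Run the table schema check for one table.

    prod is None when the table does not exist in production.
    The result never blocks submission; a failure only raises an alert.
    """
    if prod is None:
        detail = f"{table} does not exist in the production environment."
        return {"status": "Alert", "blocking": False, "detail": detail}
    if dev != prod:
        detail = (
            f"{table} table schema is inconsistent between the "
            "development and production environments."
        )
        return {"status": "Alert", "blocking": False, "detail": detail}
    # "Pass" is an assumed label for a successful check.
    return {"status": "Pass", "blocking": False, "detail": ""}
```

The existence test runs first, so a table that is missing from production is reported as missing and its schema is not compared.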