Integration and computing task instances are generated each time a scheduled integration or computing task runs. You can perform O&M operations on these instances, such as viewing operational logs, rerunning instances, forcing reruns of current or downstream instances, and viewing node code.
Prerequisites
To view Gantt charts, you must purchase the artificial intelligence for IT operations value-added service and enable the artificial intelligence for IT operations module for the current tenant.
Accessing the integration and computing task instances page
-
In the top menu bar of the Dataphin home page, choose Develop > O&M.
-
In the navigation pane on the left, choose Instance O&M > Recurring Instance.
-
In the top menu bar, select the production or development environment.
-
On the Recurring Instance page, click the Integration and Computing Task tab.
Operations supported in the integration and computing task instance list
Instances generated by auto-triggered integration and computing tasks are listed on the Integration and Computing Task tab. The list shows the instance object, instance ID, status, schedule cycle, data timestamp, scheduled run time, start time, end time, duration, retries/auto-retries, priority, owner, project, related baseline instances, schedule resource group, tags, and supported operations.
-
Instance Object: A recurring instance object is generated when an auto-triggered task runs. This column displays the name and ID of the instance object and identifies the schedule type of the task. Click the icon next to the column name to sort by object name in ascending or descending order. For more information, see Description of recurring instance statuses.
-
Status: The current status of the instance. Possible values are Succeeded, Failed, Running, Waiting for Schedule Time, Throttled, Waiting for Schedule Resources, and Not Run. For more information about the status icons and their details, see Description of recurring instance statuses.
-
Start Running Time: The time when the instance starts running. Click the icon next to the column name to sort by start time in ascending or descending order.
Note: The start time of a logical table node is the time when the earliest internal materialization node of the instance object starts running.
-
End Running Time: The time when the instance stops running. Click the icon next to the column name to sort by end time in ascending or descending order.
Note: The end time of a logical table node is the time when the latest internal materialization node of the instance object stops running.
-
Retries/Auto-retries: The number of manual retries and automatic retries.
Retries = Runs - 1.
-
Running Duration: The total time that the instance runs. Click the icon next to the column name to sort by duration in ascending or descending order.
Note: The duration of a logical table node is the time difference between the start time of the earliest internal materialization node and the end time of the latest internal materialization node.
-
Priority: The priority level of the instance.
Note: If the baseline feature is enabled, the priority of a baseline task is the highest priority among all its baselines. This overrides the original priority configured for the task.
-
Project: The project to which the task belongs. The project is displayed in the `Project English Name (Project Chinese Name)` format.
-
Related Baseline Instances: The baseline that the node guarantees, and any related baselines that have this node as an ancestor node.
Note: If the baseline feature is disabled, this field is not displayed.
-
Resource Group: The name of the schedule resource group that the instance uses at runtime.
If the custom resource group specified for the task is not active, the project's default resource group is used. If the project's default resource group is also not active, the tenant's default resource group is used. The priority order is: Custom resource group > Project default resource group > Tenant default resource group.
Note: When you change the project's default resource group, the change may not be immediately reflected in the UI. However, the modified resource group is used for the next run.
Tenant default resource group: This resource group does not belong to any project. Each Dataphin instance has only one default resource group. It is used to schedule a task if the task does not have a specified custom resource group or if the project does not have a specified project default resource group.
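The fallback order described above (Custom resource group > Project default resource group > Tenant default resource group) can be sketched as follows. This is a minimal illustration only: `ResourceGroup` and `resolve_resource_group` are hypothetical names, not part of any Dataphin API, and the sketch assumes the tenant default resource group is always available.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceGroup:
    """Hypothetical stand-in for a schedule resource group."""
    name: str
    active: bool


def resolve_resource_group(
    custom: Optional[ResourceGroup],
    project_default: Optional[ResourceGroup],
    tenant_default: ResourceGroup,
) -> ResourceGroup:
    """Return the first usable group in priority order."""
    # Custom group first, then the project default; skip groups that are
    # not specified or not active.
    for group in (custom, project_default):
        if group is not None and group.active:
            return group
    # Assumption: the tenant default group is the last resort.
    return tenant_default
```

For example, a task whose custom resource group is inactive falls through to the project default, and a task with no custom or project default group is scheduled with the tenant default.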
The following table describes the supported operations.
| Operation | Description |
| --- | --- |
| DAG | Click the corresponding icon. |
| View Operational Log | Click the corresponding icon. |
| Rerun | Click the corresponding icon. If your business scenario requires a rerun, you can perform a forced rerun. |
| View Gantt Chart | Click the corresponding icon. For more information about the Gantt chart, see View the Gantt chart of a critical path. |
| Download Ancestor and Descendant Nodes | Downloads a list of the upstream and downstream nodes of the current node. The list includes all columns, including columns that are not displayed. Click Download Ancestor And Descendant Nodes. In the Download Ancestor And Descendant Nodes dialog box, select the number of layers for the ancestor and descendant nodes. You can select from Layer 1 to Layer 10 or Unlimited Layers. Layer 1 is the default for both. After you select the layers, click OK to download the Excel file. |
| View Node Code | Click the corresponding icon. Logical Code: The task code that you write. Physical Code: The compiled code that can run on the Flink engine. |
| Recurring Task | Click the corresponding icon. |
| Edit Development Node | Click the corresponding icon. Note: You can edit development nodes only for integration and computing task instances in Dev-Prod mode projects. |
| View Production Node | Click the corresponding icon. Note: You can view production nodes only for integration and computing task instances in Dev-Prod mode projects. |
| Edit Node | Click the corresponding icon. Note: You can edit nodes only for integration and computing task instances in Basic mode projects. |
| Rerun Downstream | Click the corresponding icon. To rerun the entire dependency chain, we recommend that you force a rerun of the downstream instances. For more information, see Force a rerun of downstream instances. |
| Set To Success & Resume | Click the corresponding icon. |
| Stop | Click the corresponding icon. Note: You cannot stop instances that are in the Succeeded, Failed, or Not Run state. You can stop instances in any other state. |
| Forced Rerun | Click the corresponding icon. Important: A forced rerun does not check whether all upstream instances have run successfully or whether the scheduled run time of the current instance has been reached. This can lead to run failures or data quality issues. Before you proceed, make sure that the operation does not affect downstream data. |
| Remove Upstream Dependencies | Click the corresponding icon. Important: You must keep at least one upstream instance. |
| Pause | Click the corresponding icon. |
| Resume | Click the corresponding icon. |
| Modify Schedule Resource Group | Click the corresponding icon. |
| Modify Priority | Click the corresponding icon. |
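The rule for the Stop operation (instances in the Succeeded, Failed, or Not Run state cannot be stopped; instances in any other state can) can be expressed as a simple status check. The sketch below is illustrative: the `InstanceStatus` enum and `can_stop` function are hypothetical names, and the status values are taken from the Status column description in this topic.

```python
from enum import Enum


class InstanceStatus(Enum):
    """Statuses listed for the Status column (hypothetical enum)."""
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    RUNNING = "Running"
    WAITING_FOR_SCHEDULE_TIME = "Waiting for Schedule Time"
    THROTTLED = "Throttled"
    WAITING_FOR_SCHEDULE_RESOURCES = "Waiting for Schedule Resources"
    NOT_RUN = "Not Run"


# States in which the Stop operation is not available.
NOT_STOPPABLE = frozenset(
    {InstanceStatus.SUCCEEDED, InstanceStatus.FAILED, InstanceStatus.NOT_RUN}
)


def can_stop(status: InstanceStatus) -> bool:
    """Return True if an instance in this state can be stopped."""
    return status not in NOT_STOPPABLE
```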
Operations supported for DAG nodes of integration and computing task instances
The Directed Acyclic Graph (DAG) shows the upstream and downstream dependencies of instance nodes and lets you perform O&M on them. By default, the DAG displays the selected node and its immediate ancestor and descendant nodes.
Dataphin supports cross-project instance O&M. To operate on an instance in another project, you must have the required view and operation permissions for that project.
-
Operations supported in the DAG
| Operation | Description |
| --- | --- |
| Expand Parent Nodes | Expand the ancestor (upstream) dependency nodes at different levels of the main node in the DAG. |
| Expand Child Nodes | Expand the descendant (downstream) dependency nodes at different levels of the main node in the DAG. |
| View Task | Go to the DAG of the task node that generates the current instance node. You can view the task node details and information about its upstream and downstream nodes, and perform O&M on the task node. For more information, see auto triggered tasks. |
| View Operation Logs | View the logs of operations performed on the instance. |
-
Operations supported for DAG nodes
Hover over a DAG node to view its name, type, schedule cycle, owner, and description. The operations supported for DAG nodes are the same as those supported in the instance list. For more information, see Operations supported in the integration and computing task instance list.
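Expanding ancestor or descendant nodes level by level amounts to a breadth-first traversal of the dependency graph with a layer limit. The sketch below illustrates that idea under stated assumptions: the graph is given as a plain adjacency dictionary, and `collect_layers` is a hypothetical helper, not a Dataphin function. A limit of 1 corresponds to the default view (immediate neighbors only), and `None` stands for an unlimited number of layers.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def collect_layers(
    graph: Dict[str, List[str]],
    start: str,
    max_layers: Optional[int] = 1,
) -> Set[str]:
    """Collect nodes reachable from `start` within `max_layers` layers.

    Pass a downstream adjacency map to get descendants, or an upstream
    adjacency map to get ancestors. `max_layers=None` means unlimited.
    """
    seen = {start}
    result: Set[str] = set()
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        # Do not expand beyond the requested number of layers.
        if max_layers is not None and depth >= max_layers:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                result.add(neighbor)
                queue.append((neighbor, depth + 1))
    return result
```

Because the traversal is breadth-first, each node is assigned to the shallowest layer at which it is reachable, so a node with several paths from the start node is counted only once.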
Batch operations for integration and computing task instances
The following table describes the supported batch operations.
| Operation | Description |
| --- | --- |
| Rerun | |
| Stop | |
| Set To Success & Resume | Select multiple instances to manually set the status of instances in the Failed or Not Run state to Succeeded in a batch. This allows them to participate in scheduling. |
| Pause | |
| Resume | Resume paused recurring instances in a batch. |
| Modify Schedule Resource Group | Modify the schedule resource group that instances use at runtime. |
| Modify Priority | Modify the priority of the selected instances in a batch. You can select Highest, High, Medium, Low, or Lowest. |
| Download All | Download the data of all recurring instances, including integration, computing, and modeling task instances, to your computer. The downloaded file is in the .xlsx format. The table contains the following information: instance object, instance ID, status, schedule cycle, data timestamp, priority, owner, project (if a logical aggregate table belongs to multiple projects, the project names are separated by commas (`, `)), scheduled run time, start time, end time, duration, retries/auto-retries, related baseline instances (if an instance is associated with multiple baselines, the baseline names are separated by commas (`, `)), and schedule resource group (this parameter is empty for modeling task instances). |
Rerun downstream
-
In the Rerun Downstream dialog box, configure the parameters.
Note: You cannot rerun descendant nodes that have a Waiting or Running status. To rerun the entire dependency chain, we recommend that you force a rerun of the downstream instances. For more information, see Force a rerun of downstream instances.
Parameter
Description
Start Node Run Mode
Define the run mode of the start node. You can select Dry-run or Normal Run.
-
Dry-run: The status of a dry-run instance is Succeeded (Normal). The operational log is empty, no duration is recorded, and no data is processed.
-
Normal Run: The instance is scheduled as normal.
Downstream Rerun Scope
Select the scope of descendant nodes to rerun.
-
All Failed Instances: The list of descendant nodes is not displayed. The system automatically selects all descendant instances that have failed and reruns them.
-
Custom: If you want to specify the descendant instances to rerun, select this option. You can search for nodes by name or ID, or filter them by status, owner, or project.
-
Click OK.
-
After you rerun the downstream nodes, the data of the descendant instances is updated.
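The two Downstream Rerun Scope options can be read as two different filters over the descendant instances. The following sketch is an illustration, not Dataphin's implementation: `Descendant`, `is_blocked`, and `select_for_rerun` are hypothetical names, and treating every status that starts with "Waiting" as a waiting status is an assumption based on the note above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Descendant:
    """Hypothetical record for a descendant instance."""
    instance_id: str
    status: str


def is_blocked(status: str) -> bool:
    # Assumption: "Waiting ..." and "Running" instances cannot be rerun.
    return status == "Running" or status.startswith("Waiting")


def select_for_rerun(
    descendants: Iterable[Descendant],
    scope: str,
    custom_ids: Optional[Set[str]] = None,
) -> List[str]:
    """Return the IDs of the descendant instances to rerun."""
    if scope == "all_failed":
        # All Failed Instances: every failed descendant is selected.
        return [d.instance_id for d in descendants if d.status == "Failed"]
    if scope == "custom":
        # Custom: only the chosen instances, minus those that are blocked.
        chosen = custom_ids or set()
        return [
            d.instance_id
            for d in descendants
            if d.instance_id in chosen and not is_blocked(d.status)
        ]
    raise ValueError(f"unknown scope: {scope}")
```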
Force a rerun of downstream instances
-
In the Force Rerun Downstream dialog box, configure the rerun parameters.
Parameter
Description
Start Node Run Mode
Define the run mode of the start node. You can select Dry-run or Normal Run.
-
Dry-run: The status of a dry-run instance is Succeeded (Normal). The operational log is empty, no duration is recorded, and no data is processed.
-
Normal Run: The instance is scheduled as normal.
Downstream Forced Rerun Scope
Select the scope of descendant nodes to force a rerun.
-
All Instances: Select all descendant instance nodes of the start node.
-
Custom: If you want to specify the descendant instances to rerun, select this option. You can search for nodes by name or ID, or filter them by status, owner, or project.
-
Click OK.
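As described for the Forced Rerun operation, a forced rerun does not check whether all upstream instances have succeeded or whether the scheduled run time has been reached. The sketch below contrasts the two cases. It is a simplified model under assumptions: `may_start` is a hypothetical function, and the checks applied to a normal run are inferred from the statement of what a forced rerun skips.

```python
from datetime import datetime
from typing import Iterable


def may_start(
    upstream_statuses: Iterable[str],
    scheduled_time: datetime,
    now: datetime,
    forced: bool = False,
) -> bool:
    """Return True if the instance may start running."""
    if forced:
        # A forced rerun skips both the upstream and schedule time checks.
        return True
    # Assumed normal behavior: all upstream instances succeeded and the
    # scheduled run time has been reached.
    upstream_ok = all(status == "Succeeded" for status in upstream_statuses)
    return upstream_ok and now >= scheduled_time
```

This is why a forced rerun can produce run failures or data quality issues: the instance may start before the data it depends on is ready.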