Operation Center

After you submit or publish nodes from the Data Integration and Data Studio modules, you can manage them in the Operation Center for the development or production environment. The Operation Center provides five functional modules: O&M Overview, Node Operations, Instance Operations, Monitoring Management, and System Configuration. These modules allow you to perform comprehensive operations management on submitted nodes and their generated instances.

Scenarios

Global monitoring and control: The Dataphin Operation Center provides statistics on both offline and real-time instances. You can view details about anomalies, such as run trends, failure rankings, and alert rankings. This provides a global view that lets you control instance execution, receive timely anomaly alerts, and improve O&M efficiency.
Resource cost savings: The resource dashboard in the Dataphin Operation Center lets you compare the allocated and actual CPU and memory consumption for your entire project or for individual nodes. This comparison provides an analytical basis for optimizing global resource configuration and individual node resource allocation. You can flexibly adjust resource configurations to save costs and improve resource utilization while ensuring that nodes have sufficient resources to run stably.
Node operations management: You can manage code tasks generated from the Data Integration, Modeling, Code Development, and Data Distilling modules. You can also view a single node's status and its upstream and downstream dependencies, and perform management operations on it.
Runtime resource control: When a compute engine experiences performance bottlenecks, has insufficient resource allocation, or requires control over the time and order of node submissions, you can configure throttling rules. This ensures system stability and prioritizes resource allocation to guarantee data output.
Anomaly alerts: Baseline O&M lets you configure alert rules for physical nodes and logical table fields. If a monitored node or field is abnormal, the system notifies you by phone, text message, DingTalk, or email.
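
The comparison described in the resource cost savings scenario comes down to a utilization ratio per node. The following is a minimal sketch of that arithmetic, not Dataphin's implementation: the `ResourceUsage` record, the `over_provisioned` helper, and the 50% threshold are all hypothetical and only illustrate how allocated and actual figures might be compared.

```python
from dataclasses import dataclass


@dataclass
class ResourceUsage:
    """Allocated vs. actually consumed resources for one node (hypothetical shape)."""
    node: str
    allocated_cpu: float     # cores
    actual_cpu: float        # cores
    allocated_mem_gb: float
    actual_mem_gb: float


def utilization(used: float, allocated: float) -> float:
    """Return used/allocated as a fraction; 0.0 when nothing is allocated."""
    return used / allocated if allocated > 0 else 0.0


def over_provisioned(usages, threshold=0.5):
    """Names of nodes whose CPU and memory utilization are both below the threshold.

    The threshold is an illustrative assumption, not a product default.
    """
    return [
        u.node
        for u in usages
        if utilization(u.actual_cpu, u.allocated_cpu) < threshold
        and utilization(u.actual_mem_gb, u.allocated_mem_gb) < threshold
    ]
```

Nodes returned by `over_provisioned` would be candidates for a smaller allocation. Nodes close to full utilization should keep their allocation so that they continue to run stably.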

Function overview

After you develop nodes in Dataphin and submit or publish them to the production environment, you can perform O&M operations in the Operation Center. These operations include data backfill for auto triggered tasks, running one-time tasks, viewing task execution details, monitoring node run status, configuring alerts, viewing instance and resource usage statistics, and configuring O&M policies for timed-out or failed nodes. The functional modules of the Operation Center are described below:

[Figure: functional modules of the Operation Center]

The following table describes the functions of each module in the Operation Center.

Functional module	Description
O&M Overview	Instance statistics: Provides statistics on the execution of offline and real-time instances, including run details, run trends, and rankings of failed instances. This helps you monitor instance execution from a project or global perspective. Anomaly statistics: Provides statistics on nodes with abnormal runs within the global scope or a selected project. It offers insights from two perspectives: run errors and excessive run duration. This helps you promptly understand the status of node execution to assess resource consumption and impact, which informs decisions on budget preparation, resource scale-out, or quota increases. Schedule Resource Dashboard: Provides information on global node resource allocation, global resource consumption, and recommended optimizations for nodes. This helps you stay informed about resource scheduling, understand trends, and assess resource consumption and its impact. This information supports decisions on budget preparation, resource scale-out, or quota increases.
Node Operations	The Node Operations module categorizes nodes by schedule timeliness into auto triggered tasks, real-time tasks, and one-time tasks. Auto triggered tasks include script nodes, detail and aggregate table nodes, and data distilling nodes. The module lets you manage these tasks, including viewing DAGs, viewing instances, performing data backfills, and modifying the owners of nodes in batches.
Instance Operations	The Instance Operations module categorizes instances by their generation method into baseline instances, recurring instances, data backfill instances, one-time instances, and real-time instances. This module lets you manage these instances, including viewing DAGs, viewing nodes, viewing operational logs, and rerunning instances in batches.
Monitoring Management	Monitoring Management supports baseline monitoring and monitoring of offline and real-time nodes. Baseline monitoring: Baseline O&M lets you manage and maintain baseline monitoring, baseline alerts, baseline instances, and high-priority node assurance. Operations include viewing DAGs, enabling or disabling baseline monitoring for nodes in batches, and transferring baseline ownership in batches. Nodes within a baseline's scope can be set to a higher priority for preferential resource allocation. Offline and real-time node monitoring: Configure various monitoring and alert rules for nodes. For offline logical table tasks, you can configure monitoring and alerts at the field level. Monitoring configurations help you stay informed about node execution and track abnormal nodes. You can also configure baseline monitoring for nodes that require special assurance. Note: Monitoring and alerts can be configured only for Basic and Prod projects. Throttling configuration and Baseline O&M must be purchased and enabled separately before use.
System Configuration	System Configuration provides throttling configuration and runtime configuration features. Throttling configuration: When a compute engine experiences performance bottlenecks, has insufficient resource allocation, or requires control over the time and order of node submissions, you can configure throttling rules in the development or production environment. This ensures system stability and allows high-priority nodes to receive resources and run first, guaranteeing timely and orderly data output. Runtime configuration: Dataphin supports tenant-level runtime configurations. You can configure timeout periods for running instances and retry policies for failed nodes based on the tenant type and business scenario. This prevents resource waste caused by long-running instances and improves the reliability of instance execution.
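
The runtime configuration described above combines a timeout for running instances with a retry policy for failed nodes. The following is a minimal sketch of how such a policy could be evaluated. It is not Dataphin's implementation: the `RuntimePolicy` fields, the status strings, and the returned action names are hypothetical, and terminating an instance on timeout is an assumption, since the text does not state what happens when the timeout is reached.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimePolicy:
    """Hypothetical tenant-level runtime settings."""
    timeout_minutes: int   # maximum allowed run time for an instance
    max_retries: int       # automatic retries allowed after a failure


def next_action(policy, status, elapsed_minutes, retries_done):
    """Decide what to do with an instance under the given policy.

    status is one of "running", "failed", or anything else (treated as final).
    """
    if status == "running":
        # Assumption: an instance that exceeds the timeout is terminated.
        return "terminate" if elapsed_minutes > policy.timeout_minutes else "wait"
    if status == "failed":
        # Retry until the configured number of retries is used up.
        return "retry" if retries_done < policy.max_retries else "give_up"
    return "none"
```

For example, with `RuntimePolicy(timeout_minutes=60, max_retries=2)`, an instance that has been running for 61 minutes yields `"terminate"`, and a failed instance that has already been retried twice yields `"give_up"`.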

Task instance generation logic

Node types in the Operation Center include auto triggered tasks, one-time tasks, and real-time tasks. Nodes can be triggered to run by recurring schedules or manually. Recurring schedules can be set to run by the minute, hour, day, week, month, or year. Manual triggers include performing a data backfill for an auto triggered task, running a one-time task, and starting a real-time task.

Important

By default, jobs in the development environment do not run. You must trigger them manually.
After an auto triggered task is published to the production environment, it begins to run based on its configured schedule.

[Figure: task instance generation logic]

Recurring instance generation logic

When a node with the schedule type set to auto triggered task is submitted or published to the Operation Center, you can find it in the auto triggered task list. Auto triggered tasks can generate the following two types of instances:

Instance type	Instance generation time	Instance execution logic	Instance run conditions
Recurring instance	Every night at 23:00, auto triggered tasks automatically generate the recurring instances that need to run the next day. Instance generation follows a T+1 pattern: auto triggered tasks submitted or published before 23:00 generate recurring instances the next day; auto triggered tasks submitted or published after 23:00 generate instances on the third day. Note: Modifying the schedule resource group for a node affects only newly generated instances, not existing ones. To change the schedule resource group for an instance, modify the node's resource configuration and submit or publish it before 23:00. Alternatively, you can individually modify the schedule resource for an instance that has been generated but has not yet started running.	After a recurring instance is generated from a snapshot of the auto triggered task, it runs automatically based on the task's schedule properties.	Before a recurring instance can start running, it must meet all of the following conditions: (1) all dependent parent node instances have run successfully; (2) the instance's scheduled run time has been reached; (3) schedule resources are available to run the instance; (4) the instance and its associated auto triggered task are not paused. For more information about run states, see Instance run diagnostics.
Data backfill instance	You must manually perform a data backfill operation on the auto triggered task to generate a data backfill instance.	After a data backfill instance is manually generated, it backfills data based on the configured data timestamp. Note: In the production environment, you can use data backfill for an auto triggered task to verify whether the corresponding task in the development environment can run normally and produce data correctly.
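
The T+1 generation rule and the run conditions for recurring instances can be sketched as two small functions. This is an illustration under stated assumptions, not Dataphin's scheduler: the function names and parameters are hypothetical, "the third day" is read as two days after the publish date, and a publish at exactly 23:00 is treated as falling after the cutoff, which the text does not specify.

```python
from datetime import date, datetime, time, timedelta

GENERATION_CUTOFF = time(23, 0)  # nightly instance generation time from the text


def first_recurring_instance_date(published_at: datetime) -> date:
    """Date of the first recurring instances for a task published at published_at."""
    # Before 23:00: the nightly run picks the task up, instances run the next day.
    # At or after 23:00 (assumption for "exactly 23:00"): the task misses tonight's
    # run, so its first instances fall on the third day.
    days = 1 if published_at.time() < GENERATION_CUTOFF else 2
    return published_at.date() + timedelta(days=days)


def can_start(parents_succeeded, now, scheduled_at,
              resources_available, instance_paused, task_paused):
    """Return True only when all four run conditions hold."""
    return (
        all(parents_succeeded)        # every parent instance ran successfully
        and now >= scheduled_at       # scheduled run time has been reached
        and resources_available       # schedule resources are available
        and not instance_paused       # the instance itself is not paused
        and not task_paused           # its auto triggered task is not paused
    )
```

Under these assumptions, a task published on January 10 at 22:59 gets its first recurring instances on January 11, while one published at 23:30 the same evening gets them on January 12.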

One-time instance generation logic

When a node with the schedule type set to manual task is submitted or published to the Operation Center, you can find it in the one-time task list. To run the node, click Run in the one-time task list. This action generates a one-time instance. You can view the execution details on the One-time Instance page.

Real-time instance generation logic

When a real-time task is submitted or published to the Operation Center, you can perform operations such as starting it or modifying its resource configuration in the real-time task list. In Basic mode and the Prod environment of Dev-Prod mode, a real-time instance is automatically generated after the real-time task is submitted. The automatically generated instance is in the Stopped state. O&M for real-time tasks covers both real-time computing tasks and real-time integration tasks.

Operation Center entry points

Shortcut (Recommended)

On the Dataphin home page, click O&M Schedule in the Dataphin product path to go directly to the Operation Center.

Standard entry point

On the Dataphin home page, click Develop in the top menu bar.
On the Data Development page, click O&M in the top menu bar to navigate to the Operation Center page.
