Operation Center - Dataphin - Alibaba Cloud Help Center

After you submit or publish nodes from the Data Integration and Data Studio modules, they are moved to the development or production environment in the Operation Center. The Operation Center provides five functional modules: Overview, Node O&M, Instance O&M, Monitoring Management, and System Configuration. You can use these modules to perform comprehensive operations and maintenance (O&M) on submitted nodes and their generated instances.

Scenarios

Global monitoring and control: The Dataphin Operation Center provides comprehensive monitoring capabilities. It offers statistics for offline and real-time instances, tracks anomalies, and provides details such as operational logs and running trends. The center also provides various rankings, including failed instances, nodes with failed instances, instances with failure alerts, and instances with latency alerts. You can view statistics on runtime status, duration, failure counts, latency, and alerts. This lets you monitor instance operations from a global perspective, receive timely notifications about anomalies, and improve O&M efficiency.
Node O&M management: You can manage code tasks generated from the Data Integration, Data Modeling, Code Development, and Data Distilling modules. You can also view and manage the status of single nodes and their upstream and downstream dependencies, including both nodes and instances.
Runtime resource control: In scenarios where the compute engine experiences performance bottlenecks, resource allocation is insufficient, or you need to control the time and order of node submission, you can configure throttling rules. This ensures system stability and prioritizes resource allocation to guarantee data output.
Anomaly alerts: Baseline O&M lets you configure alert rules for physical nodes and logical table fields. If a monitored node or field is abnormal, the system notifies you by phone, text message, DingTalk, or email.

Function Overview

After you develop a node in Dataphin and submit or publish it to the production environment, you can perform O&M operations on the node in the Operation Center. These operations include backfilling data for auto triggered tasks, running one-time tasks, viewing node execution details, monitoring node running status, configuring alerts, viewing statistics on instance and resource usage, and configuring O&M policies for node timeouts or failures. The functional modules of the Operation Center are described as follows:

[Figure: functional modules of the Operation Center]

The following table describes the functions of each module in the Operation Center.

Functional module	Description
Overview	Instance statistics: Shows run details, running trends, and rankings (such as failed instances and instances with failure alerts) for offline and real-time instances in the system, so that you can monitor instance operations at the project or global level. Anomaly statistics: Shows abnormally running nodes, either globally or in selected projects, from two perspectives: running errors and excessive total runtime. This helps you assess resource consumption and impact, which supports budget preparation and decisions on resource scale-out or quota increases.
Node O&M	The Node O&M module categorizes nodes by schedule timeliness into auto triggered tasks, real-time tasks, and one-time tasks. Auto triggered tasks include script nodes, dimension and aggregate table nodes, and data distilling nodes. This module lets you perform O&M and management for these task types. Operations include viewing DAGs, viewing instances, backfilling data, and modifying the owners of multiple nodes in batches.
Instance O&M	The Instance O&M module categorizes instances by generation method into baseline instances, recurring instances, data backfill instances, one-time instances, and real-time instances. This module lets you perform O&M and management for these instance types. Operations include viewing DAGs, viewing nodes, viewing operational logs, and rerunning multiple instances in batches.
Monitoring Management	Monitoring Management supports baseline monitoring and offline and real-time node monitoring. Baseline monitoring: Baseline O&M provides O&M and management for baseline monitoring, baseline alerting, baseline instances, and high-priority node assurance. Operations include viewing DAGs, enabling or disabling baseline monitoring for multiple nodes in batches, and transferring baseline owners in batches. Nodes within a baseline can be given a higher priority so that they receive resources first. Offline and real-time node monitoring: You can configure monitoring and alert rules for nodes, including field-level monitoring and alerts for offline logical table tasks. These rules keep you informed of node running status and help you detect abnormal nodes. You can also configure baseline monitoring for nodes that require special assurance. Note: Only Basic and Prod projects support monitoring and alert configuration. Throttling configuration and Baseline O&M must be purchased and enabled separately.
System Configuration	System Configuration provides features such as throttling configuration and runtime configuration. Throttling configuration: When the compute engine experiences performance bottlenecks, resource allocation is insufficient, or you need to control the time and order of node submission, you can configure throttling rules in the development or production environment. This ensures system stability and allows high-priority nodes to be prioritized for resource allocation and execution, guaranteeing timely and orderly data output. Runtime configuration: Dataphin supports tenant-level runtime configurations. You can configure timeout periods for running instances and retry policies for failed nodes based on the tenant type and business scenario. This prevents resource waste caused by long-running instances and improves the reliability of running instances.
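
The runtime configuration described in the table (a timeout for running instances plus a retry policy for failed nodes) can be pictured with a small sketch. This is not Dataphin's API: the class `RuntimePolicy`, the function `next_action`, the field names, and the status strings are all hypothetical and exist only to illustrate how such a policy might be evaluated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimePolicy:
    """Hypothetical tenant-level policy; field names are illustrative only."""
    timeout_minutes: int  # stop an instance that has been running longer than this
    max_retries: int      # automatic reruns allowed after a failure


def next_action(policy: RuntimePolicy, status: str,
                elapsed_minutes: float, retries_so_far: int) -> str:
    """Decide what to do with an instance under the policy.

    In this sketch, status is either "running" or "failed".
    """
    if status == "running":
        # A long-running instance is stopped so that it does not waste resources.
        return "terminate" if elapsed_minutes > policy.timeout_minutes else "wait"
    if status == "failed":
        # A failed instance is retried until the retry budget is used up.
        return "retry" if retries_so_far < policy.max_retries else "give_up"
    raise ValueError(f"unexpected status: {status!r}")


policy = RuntimePolicy(timeout_minutes=120, max_retries=2)
print(next_action(policy, "running", 45, 0))   # wait
print(next_action(policy, "running", 130, 0))  # terminate
print(next_action(policy, "failed", 10, 1))    # retry
print(next_action(policy, "failed", 10, 2))    # give_up
```

The actual timeout and retry settings are configured in the System Configuration module; the numbers above are made-up values.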

Task Instance Generation Logic

The Operation Center supports three types of nodes: auto triggered tasks, one-time tasks, and real-time tasks. A node runs either on a recurring schedule or when you trigger it manually. A recurring schedule can be configured to run every minute, hour, day, week, month, or year. A manual trigger can be backfilling data for an auto triggered task, running a one-time task, or starting a real-time task.

Important

By default, tasks in the development environment do not run. You must trigger them manually.
After an auto triggered task is published to the production environment, it automatically runs based on its recurring schedule.
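
The trigger rules above can be summarized as a small decision function. This is only a sketch of the rules as stated in this section: the function name and the string labels for task types and environments are hypothetical, not Dataphin identifiers.

```python
# Hypothetical labels for the three node types and two environments.
TASK_TYPES = {"auto_triggered", "one_time", "real_time"}
ENVIRONMENTS = {"development", "production"}


def runs_without_manual_trigger(task_type: str, environment: str) -> bool:
    """Return True if the task runs on its own, without a manual trigger.

    Per the rules above: only an auto triggered task published to the
    production environment runs automatically (on its recurring schedule).
    One-time and real-time tasks, and every task in the development
    environment, need a manual trigger.
    """
    if task_type not in TASK_TYPES:
        raise ValueError(f"unknown task type: {task_type!r}")
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    return task_type == "auto_triggered" and environment == "production"


for env in ("development", "production"):
    for task in ("auto_triggered", "one_time", "real_time"):
        print(f"{env:12} {task:15} {runs_without_manual_trigger(task, env)}")
```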

Recurring instance generation logic

When a node with the schedule type set to auto triggered task is submitted or published to the Operation Center, the node appears in the auto triggered task list. An auto triggered task can generate the following two types of instances:

Recurring instance

Instance generation time: At 23:00 every night, auto triggered tasks automatically generate the recurring instances for the next day. Instance generation follows a T+1 pattern:

For auto triggered tasks submitted or published before 23:00, the first recurring instances are for the next day.
For auto triggered tasks submitted or published after 23:00, the first recurring instances are for the third day (the day after next), because the task misses that night's generation run.

Note
Changes to a node's schedule resource group only affect newly generated instances, not existing ones. To change the schedule resource group used by upcoming instances, modify the node's resource configuration and submit or publish it before 23:00. You can also separately modify the schedule resource for an instance that has been generated but has not yet started running.

Instance running logic: After an auto triggered task generates a recurring instance from a snapshot, the instance is automatically scheduled to run based on the node's scheduling properties.

Instance running conditions: A recurring instance must meet all of the following conditions before it can run:

All parent node instances have run successfully.
The instance has reached its scheduled runtime.
The schedule resource is sufficient for the instance to run.
The instance and its associated auto triggered task are not paused.

[Figure: running states of a recurring instance]

For more information about running states, see Instance running diagnostics.

Data backfill instance

Instance generation time: A data backfill instance is generated only when you manually perform a data backfill operation on the auto triggered task.

Instance running logic: After a data backfill instance is manually generated, it backfills data based on the configured data timestamp.

Note
In the production environment, you can use data backfill for an auto triggered task to verify whether the auto triggered task in the development environment can run normally and produce data correctly.
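
The 23:00 cutoff and the four running conditions for recurring instances can be sketched as follows. This is an illustration under stated assumptions, not Dataphin code: the function names and parameters are hypothetical, "generated the next day" is read as "the first instances are for the next day", and a submission at exactly 23:00:00 is treated as "after 23:00" because the source does not specify the boundary case.

```python
from datetime import date, datetime, time, timedelta

# Nightly generation time stated in this section.
GENERATION_CUTOFF = time(23, 0)


def first_recurring_instance_date(submitted_at: datetime) -> date:
    """Return the first date that gets recurring instances (T+1 pattern).

    Assumption: a submission at exactly 23:00:00 counts as "after 23:00".
    """
    days_ahead = 1 if submitted_at.time() < GENERATION_CUTOFF else 2
    return submitted_at.date() + timedelta(days=days_ahead)


def can_run(parents_succeeded: bool, now: datetime, scheduled_at: datetime,
            resources_available: bool, paused: bool) -> bool:
    """Check the four running conditions for a recurring instance.

    `paused` stands for "the instance or its auto triggered task is paused".
    """
    return (parents_succeeded
            and now >= scheduled_at
            and resources_available
            and not paused)


# Submitted in the afternoon: first instances are for the next day.
print(first_recurring_instance_date(datetime(2024, 5, 10, 14, 30)))  # 2024-05-11
# Submitted after 23:00: first instances are for the third day.
print(first_recurring_instance_date(datetime(2024, 5, 10, 23, 30)))  # 2024-05-12

scheduled = datetime(2024, 5, 11, 2, 0)
# Scheduled time not yet reached, so the instance keeps waiting.
print(can_run(True, datetime(2024, 5, 11, 1, 0), scheduled, True, False))  # False
# All four conditions hold.
print(can_run(True, datetime(2024, 5, 11, 2, 5), scheduled, True, False))  # True
```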

One-time instance generation logic

When a node with the schedule type set to one-time task is submitted or published to the Operation Center, the node appears in the one-time task list. To run the node, click Run in the one-time task list. This manual trigger generates a one-time instance, whose execution details you can view on the One-time Instance page.

Real-time instance generation logic

After a real-time task is submitted or published to the Operation Center, you can operate on it in the real-time task list, for example by starting the task or modifying its resource configuration. In the production environment of Basic and Dev-Prod modes, a real-time instance is automatically generated after a real-time task is submitted. By default, the automatically generated instance is in the Stopped state. Real-time task O&M covers both real-time computing tasks and real-time integration tasks.

Access the Operation Center

Quick access (recommended)

On the Dataphin home page, you can click O&M Schedule in the product usage path to quickly access the Operation Center.

Standard access

On the Dataphin home page, click Develop in the top menu bar.
On the Data Development page, click O&M in the top menu bar to open the Operation Center page.
