Supported node types

DataWorks Data Development (DataStudio) provides various nodes for your data processing needs, including data integration nodes, computing resource nodes such as ODPS SQL, Hologres SQL, and EMR Hive, and general-purpose nodes such as virtual nodes and Check nodes.

Important

If you cannot create a computing resource node, such as an ODPS SQL, Hologres SQL, or EMR Hive node, in Data Development, click Computing Resource in the left-side navigation pane to check whether the corresponding computing resource is attached. If the resource is attached but you still cannot create the node, refresh the page to update the cached data. You can also try opening the page in your browser's incognito mode.

Data synchronization nodes

| Data integration node | Usage | Node code | TaskType |
| --- | --- | --- | --- |
| Offline synchronization node | Used for periodic offline (batch) data synchronization. It supports data synchronization between various disparate data sources in complex scenarios. For more information about the data sources that offline synchronization supports, see Supported data sources and synchronization solutions. | 23 | DI |
| Real-time synchronization node | Used for real-time synchronization of incremental data. Real-time synchronization includes three basic plug-ins: real-time read, transform, and write. These plug-ins interact through a defined intermediate data format. For more information about the data sources that real-time synchronization supports, see Supported data sources and synchronization solutions. | 900 | RI |

Note

In addition to the nodes that you create in Data Development (DataStudio), the Data Integration primary site supports other types of synchronization solutions, such as real-time synchronization of full and incremental data and offline synchronization of entire databases. For more information, see Synchronization task capabilities in Data Integration. The node code for tasks created on the Data Integration primary site is typically 24.

Engine compute nodes

In a business workflow, you can create a node that corresponds to a specific engine. You can then use this node for data development and send the code to the corresponding compute engine for execution.

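| Engine integrated with DataWorks | How DataWorks encapsulates engine capabilities | Node code | TaskType |
| --- | --- | --- | --- |
| MaxCompute | ODPS SQL node | 10 | ODPS_SQL |
| MaxCompute | ODPS Spark node | 225 | ODPS_SPARK |
| MaxCompute | PyODPS 2 node | 221 | PY_ODPS |
| MaxCompute | PyODPS 3 node | 1221 | PYODPS3 |
| MaxCompute | ODPS Script node | 24 | ODPS_SQL_SCRIPT |
| MaxCompute | ODPS MR node | 11 | ODPS_MR |
| MaxCompute | SQL script template node | 1010 | COMPONENT_SQL |
| E-MapReduce | EMR Hive node | 227 | EMR_HIVE |
| E-MapReduce | EMR MR node | 230 | EMR_MR |
| E-MapReduce | EMR Spark SQL node | 229 | EMR_SPARK_SQL |
| E-MapReduce | EMR Spark node | 228 | EMR_SPARK |
| E-MapReduce | EMR Shell node | 257 | EMR_SHELL |
| E-MapReduce | EMR Presto node | 259 | EMR_PRESTO |
| E-MapReduce | EMR Impala node | 260 | EMR_IMPALA |
| E-MapReduce | EMR Spark Streaming node | 264 | EMR_SPARK_STREAMING |
| E-MapReduce | EMR Kyuubi node | 268 | EMR_KYUUBI |
| E-MapReduce | EMR Trino node | 267 | EMR_TRINO |
| E-MapReduce | EMR JAR | 231 | EMR_JAR |
| E-MapReduce | EMR File | 232 | EMR_FILE |
| CDH | CDH Hive node | 270 | CDH_HIVE |
| CDH | CDH Spark node | 271 | CDH_SPARK |
| CDH | CDH MR node | 273 | CDH_MR |
| CDH | CDH Presto node | 278 | CDH_PRESTO |
| CDH | CDH Impala node | 279 | CDH_IMPALA |
| CDH | CDH Spark SQL node | 272 | CDH_SPARK_SQL |
| AnalyticDB for PostgreSQL | AnalyticDB for PostgreSQL node | - | - |
| AnalyticDB for MySQL | AnalyticDB for MySQL node | 1000126 | - |
| Hologres | Hologres SQL node | 1093 | HOLOGRES_SQL |
| Hologres | One-click MaxCompute table schema synchronization node | 1094 | HOLOGRES_SYNC_DDL |
| Hologres | One-click MaxCompute data synchronization node | 1095 | HOLOGRES_SYNC_DATA |
| ClickHouse | ClickHouse SQL | 1301 | CLICK_SQL |
| Algorithm (machine learning) | PAI Designer node | 1117 | PAI_STUDIO |
| Algorithm (machine learning) | PAI DLC node | 1119 | PAI_DLC |
| Database | MySQL node | 1000125 | - |
| Database | SQL Server node | 10001 | - |
| Database | Oracle node | 10002 | - |
| Database | PostgreSQL node | 10003 | - |
| Database | StarRocks | 10004 | - |
| Database | DRDS node | 10005 | - |
| Database | PolarDB MySQL node | 10006 | - |
| Database | PolarDB PostgreSQL node | 10007 | - |
| Database | Doris node | 10008 | - |
| Database | MariaDB node | 10009 | - |
| Database | SelectDB node | 10010 | - |
| Database | Redshift node | 10011 | - |
| Database | SAP HANA node | 10012 | - |
| Database | Vertica node | 10013 | - |
| Database | DM (Dameng) node | 10014 | - |
| Database | KingbaseES (Renmin Jincang) node | 10015 | - |
| Database | OceanBase node | 10016 | - |
| Database | DB2 node | 10017 | - |
| Database | GBase 8a node | 10018 | - |
| Other | Data Lake Analytics node | 1000023 | - |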
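
When scripts or reports need to translate between the numeric node codes and the TaskType names listed above, a small lookup table can help. The following is a minimal sketch in plain Python that copies a few MaxCompute and Hologres pairs from the table. The names `NODE_CODE_TO_TASK_TYPE`, `task_type_for`, and `node_code_for` are hypothetical helpers for illustration and are not part of any DataWorks SDK or API.

```python
from typing import Optional

# Node code -> TaskType pairs copied from the MaxCompute and Hologres rows above.
# This dictionary is an illustrative subset, not a complete or official mapping.
NODE_CODE_TO_TASK_TYPE = {
    10: "ODPS_SQL",
    225: "ODPS_SPARK",
    221: "PY_ODPS",
    1221: "PYODPS3",
    24: "ODPS_SQL_SCRIPT",
    11: "ODPS_MR",
    1010: "COMPONENT_SQL",
    1093: "HOLOGRES_SQL",
    1094: "HOLOGRES_SYNC_DDL",
    1095: "HOLOGRES_SYNC_DATA",
}


def task_type_for(node_code: int) -> Optional[str]:
    """Return the TaskType for a node code, or None if the code is not in the map."""
    return NODE_CODE_TO_TASK_TYPE.get(node_code)


def node_code_for(task_type: str) -> Optional[int]:
    """Return the node code for a TaskType, or None if the name is not in the map."""
    for code, name in NODE_CODE_TO_TASK_TYPE.items():
        if name == task_type:
            return code
    return None


if __name__ == "__main__":
    print(task_type_for(10))         # ODPS_SQL
    print(node_code_for("PYODPS3"))  # 1221
    print(task_type_for(10001))      # None: not included in this subset
```

Node types whose TaskType is shown as `-` in the table are left out of the map, so a lookup for them returns `None` rather than a guessed name.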

General-purpose nodes

In a business workflow, you can create nodes from the general-purpose node group and combine them with engine nodes to implement complex logic.

| Business scenario | Node type | Usage notes | Node code | TaskType |
| --- | --- | --- | --- | --- |
| Business management | Virtual node | A virtual node is a control-type node. It performs a dry run and does not generate any data. It is often used as the root node of a business workflow to help you manage nodes and workflows. | 99 | VIRTUAL |
| Event trigger | HTTP trigger node | You can use this node to trigger a task in DataWorks after a task in another scheduling system completes. Note: DataWorks no longer supports the creation of cross-tenant collaboration nodes. If you use cross-tenant collaboration nodes, replace them with HTTP trigger nodes, which provide the same features. | 1114 | SCHEDULER_TRIGGER |
| Event trigger | OSS Object Check node | Triggers the execution of descendant nodes by monitoring the creation of OSS objects. | 239 | OSS_INSPECT |
| Event trigger | FTP Check node | Triggers the execution of descendant nodes by monitoring the creation of FTP files. Note: DataWorks recommends that you use a Check node instead of an FTP Check node. | 1320 | FTP_CHECK |
| Event trigger | Check node | Used to check whether a target object is active. When the check policy is met, the Check node returns a successful running state. If there are downstream dependencies, descendant tasks are triggered. The following target objects can be checked: MaxCompute partitioned table, FTP file, OSS file, HDFS, OSS-HDFS. | 241 | CHECK_NODE |
| Data Quality | Data Quality monitoring node | You can configure Data Quality monitoring rules to monitor the data quality of tables in related data sources, for example, to check for dirty data. You can also customize a scheduling policy to periodically run monitoring jobs for data validation. | 1333 | DATA_QUALITY_MONITOR |
| Data Quality | Data Comparison node | Lets you compare data from different tables in a workflow in multiple ways. | 1331 | DATA_SYNCHRONIZATION_QUALITY_CHECK |
| Parameter assignment and passing | Assignment node | Used for passing parameters. The built-in output of the assignment node passes the result of the last query or output to descendant nodes through the node context feature. This implements cross-node parameter passing. | 1100 | CONTROLLER_ASSIGNMENT |
| Parameter assignment and passing | Parameter node | Used to aggregate parameters from ancestor nodes and distribute them to descendant nodes. | 1115 | PARAM_HUB |
| Control | for-each node | Used to traverse the result set passed by an assignment node. | 1106 | CONTROLLER_TRAVERSE |
| Control | do-while node | Used to run a piece of node logic in a loop. You can also use it with an assignment node to loop over the result that the assignment node passes. | 1103 | CONTROLLER_CYCLE |
| Control | Branch node | Used to evaluate the result of an ancestor node and determine which branch logic to follow for different results. You can use it with an assignment node. | 1101 | CONTROLLER_BRANCH |
| Control | Join node | Used to merge the running states of ancestor nodes. This resolves issues with dependency attachment and execution triggering for descendant nodes of a branch node. | 1102 | CONTROLLER_JOIN |
| Other | Shell node | Supports standard Shell syntax but does not support interactive syntax. | 6 | DIDE_SHELL |
| Other | Function Compute node | Used to periodically schedule and process event functions, and to schedule them jointly with other types of nodes. | 1330 | FUNCTION_COMPUTE |
| Other | Data Push node | Used to push query data from a business workflow to DingTalk groups, Lark groups, WeCom groups, and Teams. This allows team members to promptly receive and follow the latest data. | 1332 | DATA_PUSH |
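
The way assignment, for-each, and branch nodes work together may be easier to follow as a small simulation. The following is a minimal plain-Python sketch, not DataWorks code. It assumes that an assignment node outputs a list of values, that a for-each node runs its inner logic once per value, and that a branch node selects branches by evaluating conditions against an ancestor's result. All function names and the sample values are hypothetical.

```python
from typing import Callable, Dict, List


def assignment_node() -> List[str]:
    """Stand-in for an assignment node: its output is passed to descendant nodes."""
    return ["20240101", "20240102", "20240103"]  # made-up sample values


def for_each_node(items: List[str], body: Callable[[str], str]) -> List[str]:
    """Stand-in for a for-each node: run the inner logic once per item in the result set."""
    return [body(item) for item in items]


def branch_node(result: int, conditions: Dict[str, Callable[[int], bool]]) -> List[str]:
    """Stand-in for a branch node: return the names of branches whose condition holds."""
    return [name for name, condition in conditions.items() if condition(result)]


if __name__ == "__main__":
    # The assignment node output flows into the for-each node.
    processed = for_each_node(assignment_node(), lambda p: f"processed {p}")
    print(processed)

    # The branch node evaluates an ancestor result (here, the item count).
    selected = branch_node(
        len(processed),
        {"small_batch": lambda n: n < 10, "large_batch": lambda n: n >= 10},
    )
    print(selected)
```

In DataWorks itself, these relationships are configured through node dependencies and the node context feature rather than function calls. The sketch only mirrors the direction in which results flow.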