DataWorks Data Studio provides various nodes for different data processing tasks: data integration nodes for synchronization; engine compute nodes such as MaxCompute SQL, Hologres SQL, and EMR Hive for data cleaning; and general-purpose nodes such as zero-load nodes and do-while loop nodes for complex logic processing.
Supported node types
The following table lists the node types supported for recurring scheduling. The node types supported for one-time tasks and manually triggered workflows may differ. For the most up-to-date list, see the UI.
- Node availability varies by edition and region. For the most accurate information, see the UI.
- Some nodes cannot be run in a workflow. See the node details for specifics.
| Node type | Description | Node code | Task type |
|---|---|---|---|
| Data Integration | Synchronizes data in recurring batches between various data sources. For more information about the data sources supported by batch synchronization, see Supported data sources and synchronization solutions. | 23 | DI |
| Data Integration | Synchronizes data changes from a source to a destination database in real time. You can synchronize a single table or an entire database to maintain data consistency. For more information about the data sources supported by real-time synchronization, see Supported data sources and synchronization solutions. | 900 | RI |
| Notebook | Provides a flexible, interactive, and modular environment that streamlines data processing, exploration, visualization, and model building. | 1323 | NOTEBOOK |
| MaxCompute | Schedules recurring MaxCompute SQL tasks. These tasks use an SQL-like syntax for distributed processing of massive (terabyte-scale) datasets where real-time performance is not critical. | 10 | ODPS_SQL |
| MaxCompute | An SQL code template with multiple input and output parameters. It filters, joins, and aggregates data from source tables to generate a result table. Use these predefined components to build data processing flows quickly and improve development efficiency. | 1010 | COMPONENT_SQL |
| MaxCompute | Combines multiple SQL statements into a single script for compilation and execution. This is ideal for complex queries, such as nested subqueries or multi-step operations. Because the entire script is submitted at once, it produces a single execution plan, so the job is queued and run only once, which improves resource utilization. | 24 | ODPS_SQL_SCRIPT |
| MaxCompute | Integrates the Python SDK for MaxCompute. Use this node to write and edit Python code for data processing and analysis tasks in MaxCompute. | 221 | PY_ODPS |
| MaxCompute | Use a PyODPS 3 node to write MaxCompute jobs directly in Python and configure them for recurring scheduling. | 1221 | PYODPS3 |
| MaxCompute | Runs offline Spark jobs in cluster mode on MaxCompute. | 225 | ODPS_SPARK |
| MaxCompute | Lets you write and schedule MapReduce programs that use the Java API to process large-scale datasets in MaxCompute. | 11 | ODPS_MR |
| MaxCompute | Maps MaxCompute table metadata to Hologres to accelerate queries on MaxCompute data. You can then query the MaxCompute data directly through Hologres foreign tables. | - | - |
| MaxCompute | Synchronizes data from a single MaxCompute table to Hologres, enabling efficient big data analysis and real-time queries. | - | - |
| Hologres | Queries data in Hologres instances. Because Hologres and MaxCompute are seamlessly connected, you can use this node to query and analyze large-scale MaxCompute data with standard PostgreSQL statements and get fast results without migrating the data. | 1093 | HOLOGRES_SQL |
| Hologres | Migrates data from a single Hologres table to MaxCompute. | 1070 | HOLOGRES_SYNC_DATA_TO_MC |
| Hologres | Creates Hologres foreign tables in batches by importing the schemas of source MaxCompute tables. | 1094 | HOLOGRES_SYNC_DDL |
| Hologres | Quickly synchronizes data from MaxCompute to a Hologres database. | 1095 | HOLOGRES_SYNC_DATA |
| Serverless Spark | A Serverless Spark node for large-scale data processing. | 2100 | SERVERLESS_SPARK_BATCH |
| Serverless Spark | An SQL query node based on Serverless Spark. It supports standard SQL syntax and provides high-performance data analysis. | 2101 | SERVERLESS_SPARK_SQL |
| Serverless Spark | Connects to Serverless Spark through the Kyuubi JDBC/ODBC interface to provide a multi-tenant Spark SQL service. | 2103 | SERVERLESS_KYUUBI |
| Serverless StarRocks | An SQL node based on E-MapReduce Serverless StarRocks. It is compatible with the SQL syntax of open source StarRocks and provides high-speed online analytical processing (OLAP) and data lakehouse query analysis. | 2104 | SERVERLESS_STARROCKS |
| Large language model (LLM) | Uses a built-in engine to perform data cleaning, processing, analysis, and mining based on your natural language instructions. | 2200 | LLM_NODE |
| Flink | Defines real-time task processing logic with standard SQL. The node offers rich SQL support, powerful state management, and fault tolerance. It supports both event time and processing time, scales flexibly, integrates with systems such as Kafka and HDFS, and provides detailed logs and performance monitoring tools. | 2012 | FLINK_SQL_STREAM |
| Flink | Lets you define and run batch data processing tasks with standard SQL statements. It is suited to analyzing and transforming large datasets, including data cleaning and aggregation, and supports visual configuration for efficient, flexible large-scale batch processing. | 2011 | FLINK_SQL_BATCH |
| Flink | Runs real-time Flink tasks from a JAR file. Select an uploaded Flink JAR resource as the job entry point and configure the entry class and runtime parameters. | 2016 | FLINK_JAR_STREAM |
| Flink | Runs Flink batch processing tasks from a JAR file. Select an uploaded Flink JAR resource as the job entry point and configure the entry class and scheduling parameters. | 2015 | FLINK_JAR_BATCH |
| Flink | Runs real-time Flink tasks from a Python file. Select an uploaded Flink Python resource as the file path and configure the entry module and runtime parameters. | 2018 | FLINK_PYTHON_STREAM |
| Flink | Runs Flink batch processing tasks from a Python file. Select an uploaded Flink Python resource as the file path and configure the entry module and scheduling parameters. | 2017 | FLINK_PYTHON_BATCH |
| E-MapReduce | Uses SQL-like statements to read, write, and manage large datasets, enabling efficient analysis and development of massive log data. | 227 | EMR_HIVE |
| E-MapReduce | An interactive SQL query engine for fast, real-time queries on petabyte-scale data. | 260 | EMR_IMPALA |
| E-MapReduce | Splits large datasets into multiple parallel Map tasks, which significantly improves data processing efficiency. | 230 | EMR_MR |
| E-MapReduce | A flexible and scalable distributed SQL query engine that supports interactive big data analysis with standard SQL. | 259 | EMR_PRESTO |
| E-MapReduce | Lets you edit custom Shell scripts to use advanced features such as data processing, calling Hadoop components, and file operations. | 257 | EMR_SHELL |
| E-MapReduce | A general-purpose big data analysis engine known for high performance, ease of use, and wide applicability. It supports complex in-memory computing and is ideal for building large-scale, low-latency data analysis applications. | 228 | EMR_SPARK |
| E-MapReduce | Uses a distributed SQL query engine to process structured data and improve job execution efficiency. | 229 | EMR_SPARK_SQL |
| E-MapReduce | Processes high-throughput, real-time streaming data. Its fault tolerance mechanism quickly recovers failed data streams. | 264 | EMR_SPARK_STREAMING |
| E-MapReduce | A distributed SQL query engine suitable for interactive analysis across multiple data sources. | 267 | EMR_TRINO |
| E-MapReduce | A distributed, multi-tenant gateway that provides SQL and other query services for data lake query engines such as Spark, Flink, and Trino. | 268 | EMR_KYUUBI |
| ADB | Lets you develop and schedule recurring AnalyticDB for PostgreSQL tasks. | 1000090 | - |
| ADB | Lets you develop and schedule recurring AnalyticDB for MySQL tasks. | 1000126 | - |
| ADB | Lets you develop and schedule recurring AnalyticDB Spark tasks. | 1990 | ADB_SPARK |
| ADB | Lets you develop and schedule recurring AnalyticDB Spark SQL tasks. | 1991 | ADB_SPARK_SQL |
| CDH | Lets you use DataWorks to run Hive tasks on a CDH cluster that you have deployed. | 270 | CDH_HIVE |
| CDH | A general-purpose big data analysis engine known for high performance, ease of use, and wide applicability. It supports complex in-memory analysis and is ideal for building large-scale, low-latency data analysis applications. | 271 | CDH_SPARK |
| CDH | Uses a distributed SQL query engine to process structured data and improve job execution efficiency. | 272 | CDH_SPARK_SQL |
| CDH | Processes massive datasets with MapReduce programs. | 273 | CDH_MR |
| CDH | Provides a distributed SQL query engine that enhances the data analysis capabilities of the CDH environment. | 278 | CDH_PRESTO |
| CDH | Lets you write and run Impala SQL scripts for faster query performance. | 279 | CDH_IMPALA |
| Lindorm | Lets you develop and schedule recurring Lindorm Spark tasks. | 1800 | LINDORM_SPARK |
| Lindorm | Lets you develop and schedule recurring Lindorm Spark SQL tasks. | 1801 | LINDORM_SPARK_SQL |
| ClickHouse | Performs distributed SQL queries and processes structured data to improve job execution efficiency. | 1301 | CLICK_SQL |
| Data Quality | Lets you configure data quality monitoring rules for tables in a data source, for example to check for bad data, and define custom scheduling policies that periodically run monitoring jobs for data validation. | 1333 | DATA_QUALITY_MONITOR |
| Data Quality | Compares data from different tables in various ways. | 1331 | DATA_SYNCHRONIZATION_QUALITY_CHECK |
| General | A control-type, dry-run node that produces no data. It is typically used as the root node of a workflow to organize nodes and business processes. | 99 | VIRTUAL |
| General | Passes parameters between nodes. Its built-in output parameter passes the result of the last query or its own output to downstream nodes through the node context. | 1100 | CONTROLLER_ASSIGNMENT |
| General | Supports standard Shell syntax but not interactive syntax. | 6 | DIDE_SHELL |
| General | Aggregates parameters from ancestor nodes and passes them to descendant nodes. | 1115 | PARAM_HUB |
| General | Triggers descendant nodes by monitoring an OSS object. | 239 | OSS_INSPECT |
| General | Supports Python 3. Lets you obtain upstream parameters and configure custom parameters through the scheduling parameters in the scheduling configuration, and pass its own output as parameters to downstream nodes. | 1322 | PYTHON |
| General | Merges the running statuses of ancestor nodes to resolve dependency and execution-trigger issues for the descendant nodes of a branch node. | 1102 | CONTROLLER_JOIN |
| General | Evaluates the result of an ancestor node to decide which branch of logic to follow. You can use this node together with an assignment node. | 1101 | CONTROLLER_BRANCH |
| General | Traverses the result set passed by an assignment node. | 1106 | CONTROLLER_TRAVERSE |
| General | Loops through part of the node logic. You can also use it with an assignment node to loop through the results that the assignment node passes. | 1103 | CONTROLLER_CYCLE |
| General | Checks whether a target object is available. If the check policy is met, the node runs successfully and triggers downstream tasks. For the supported target objects, see the node details. | 241 | CHECK_NODE |
| General | Schedules event-driven functions on a recurring basis. | 1330 | FUNCTION_COMPUTE |
| General | Triggers a DataWorks task when a task in an external scheduling system completes. Note: DataWorks no longer supports creating cross-tenant collaboration nodes. If you use a cross-tenant collaboration node, replace it with an HTTP trigger node, which provides the same capabilities. | 1114 | SCHEDULER_TRIGGER |
| General | Lets you specify an SSH data source so that DataWorks can remotely access the host connected to that data source and trigger a script on that host. | 1321 | SSH |
| General | Pushes query results from other nodes in a Data Studio workflow to a configured destination. Supported destinations include DingTalk groups, Lark groups, WeCom groups, Teams, and email. | 1332 | DATA_PUSH |
| MySQL node | Lets you develop and schedule recurring MySQL tasks. | 1000125 | - |
| SQL Server | Lets you develop and schedule recurring SQL Server tasks. | 10001 | - |
| Oracle node | Lets you develop and schedule recurring Oracle tasks. | 10002 | - |
| PostgreSQL node | Lets you develop and schedule recurring PostgreSQL tasks. | 10003 | - |
| StarRocks node | Lets you develop and schedule recurring StarRocks tasks. | 10004 | - |
| DRDS node | Lets you develop and schedule recurring DRDS tasks. | 10005 | - |
| PolarDB for MySQL node | Lets you develop and schedule recurring PolarDB for MySQL tasks. | 10006 | - |
| PolarDB for PostgreSQL node | Lets you develop and schedule recurring PolarDB for PostgreSQL tasks. | 10007 | - |
| Doris node | Lets you develop and schedule recurring Doris tasks. | 10008 | - |
| MariaDB node | Lets you develop and schedule recurring MariaDB tasks. | 10009 | - |
| SelectDB node | Lets you develop and schedule recurring SelectDB tasks. | 10010 | - |
| Redshift node | Lets you develop and schedule recurring Redshift tasks. | 10011 | - |
| SAP HANA node | Lets you develop and schedule recurring SAP HANA tasks. | 10012 | - |
| Vertica node | Lets you develop and schedule recurring Vertica tasks. | 10013 | - |
| DM (Dameng) node | Lets you develop and schedule recurring DM tasks. | 10014 | - |
| KingbaseES node | Lets you develop and schedule recurring KingbaseES tasks. | 10015 | - |
| OceanBase node | Lets you develop and schedule recurring OceanBase tasks. | 10016 | - |
| DB2 node | Lets you develop and schedule recurring DB2 tasks. | 10017 | - |
| GBase 8a node | Lets you develop and schedule recurring GBase 8a tasks. | 10018 | - |
| Algorithm | PAI Designer is a visual modeling tool for building end-to-end machine learning development workflows. | 1117 | PAI_STUDIO |
| Algorithm | PAI DLC is a container-based training service for running distributed training tasks. | 1119 | PAI_DLC |
| Algorithm | Generates a PAI Flow node in DataWorks for a PAI knowledge base indexing workflow. | 1250 | PAI_FLOW |
| Logic node | The SUB_PROCESS node integrates multiple workflows into a unified whole for management and scheduling. | 1122 | SUB_PROCESS |
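The control nodes in the General category (assignment, branch, for-each, do-while, and merge) are easiest to understand as data flow between nodes. The sketch below models that flow in plain Python under stated assumptions; it is not DataWorks node code. Every function name is hypothetical, the merge rule (descendants run once each ancestor branch has either succeeded or been skipped) is an assumption, and the do-while loop cap is an arbitrary value chosen for the sketch.

```python
from typing import Callable, Dict, List

Row = List[str]  # one row of a query result, as plain strings (assumption)


def assignment_node(last_query_result: List[Row]) -> List[Row]:
    # Hypothetical stand-in for CONTROLLER_ASSIGNMENT: the result of the last
    # query becomes the output parameter that downstream nodes read.
    return last_query_result


def branch_node(result: List[Row],
                conditions: Dict[str, Callable[[List[Row]], bool]]) -> List[str]:
    # Hypothetical stand-in for CONTROLLER_BRANCH: evaluates the ancestor's
    # result and returns the names of the branches whose condition holds.
    return [name for name, check in conditions.items() if check(result)]


def for_each_node(result_set: List[Row], body: Callable[[Row], str]) -> List[str]:
    # Hypothetical stand-in for CONTROLLER_TRAVERSE: runs the inner logic once per row.
    return [body(row) for row in result_set]


def do_while_node(body: Callable[[int], None],
                  keep_going: Callable[[int], bool],
                  max_loops: int = 100) -> int:
    # Hypothetical stand-in for CONTROLLER_CYCLE: the body always runs at least
    # once, then repeats while keep_going(count) is true. max_loops is an
    # arbitrary safety cap for this sketch, not a documented DataWorks limit.
    count = 0
    while True:
        body(count)
        count += 1
        if count >= max_loops or not keep_going(count):
            return count


def merge_node(ancestor_statuses: Dict[str, str]) -> bool:
    # Hypothetical stand-in for CONTROLLER_JOIN under an assumed rule:
    # descendants may run once every ancestor branch has finished,
    # whether it actually ran or was skipped.
    return all(s in ("SUCCESS", "SKIPPED") for s in ancestor_statuses.values())


if __name__ == "__main__":
    rows = assignment_node([["20240101", "cn"], ["20240101", "us"]])
    chosen = branch_node(rows, {"has_rows": lambda r: len(r) > 0,
                                "no_rows": lambda r: len(r) == 0})
    jobs = for_each_node(rows, lambda row: f"process region={row[1]} date={row[0]}")
    loops = do_while_node(lambda i: None, lambda count: count < 3)
    statuses = {name: "SUCCESS" if name in chosen else "SKIPPED"
                for name in ("has_rows", "no_rows")}
    print(chosen, jobs, loops, merge_node(statuses))
```

Running the script prints the selected branch (`['has_rows']`), one job string per row, a loop count of 3, and `True` from the merge check. Keeping each node as a pure function makes the hand-off explicit: the assignment output is the only thing the branch and for-each stand-ins consume.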
Create a node
Create a node for a scheduled workflow
To run a task automatically on a recurring schedule, such as hourly, daily, or weekly, create a node for a scheduled workflow. You can do this by creating a new scheduled task node, adding an inner node to a scheduled workflow, or cloning an existing node.
- Go to the Workspaces page in the DataWorks console. In the top navigation bar, select the desired region. Find the desired workspace and choose in the Actions column.
- In the left navigation pane, click the icon to go to the Data Studio page.
Create a scheduled task node
- Click the icon to the right of Workspace Directories, select New Node, and then select a node type.
  Important: You can choose between Common Nodes and All Nodes. To see all available node types, select All Nodes at the bottom of the list. You can then use the search box or filter by category (such as MaxCompute, Data Integration, and General) to find the node you need.
  You can create folders in advance to organize and manage your nodes.
- Enter a name for the node and save it to open the node editor page.
Create an inner node in a workflow
- Create a scheduled workflow.
- On the workflow canvas, click Create Node in the top toolbar. Select a node type for your task and drag it onto the canvas.
- Enter a name for the node and save it.
Create a node by cloning
You can use the clone feature to quickly create a new node from an existing one. Cloning copies the node's Scheduling Settings, such as Scheduling Parameters, Scheduling time, and Scheduling Dependency.
- In the Project Directory pane on the left, right-click the node that you want to clone and select Cloning from the context menu.
- In the dialog box, change the node's Name and Path or accept the defaults, and then click Confirm.
- After the node is cloned, it appears in the Project Directory pane.
Create a node for a manual workflow
If a task does not require a recurring schedule but must be published to the production environment for on-demand execution, create an inner node in a manually triggered workflow.
- Go to the Workspaces page in the DataWorks console. In the top navigation bar, select the desired region. Find the desired workspace and choose in the Actions column.
- In the left navigation pane, click the icon to go to the manually triggered workflow page.
- Create a manually triggered workflow.
- On the manually triggered workflow editor page, click New Internal Node in the top toolbar and select a node type.
- Enter a name for the node and save it.
Create a manual task node
- Go to the Workspaces page in the DataWorks console. In the top navigation bar, select the desired region. Find the desired workspace and choose in the Actions column.
- In the left navigation pane, click the icon to go to the manual task page.
- In the Manual Tasks section at the bottom of the page, click the icon next to Manually Triggered Task, select New Node, and then choose a node type.
  Note: Manual tasks support only the following node types: Offline synchronization, Notebook, MaxCompute SQL, MaxCompute Script, PyODPS 2, MaxCompute MR, Hologres SQL, Python node, and Shell node.
- Enter a name for the node and save it to open the node editor page.
Batch edit nodes
Editing nodes individually in a large workflow is inefficient. The DataWorks Inner Node List feature allows you to quickly preview, search, and batch edit all nodes from a list on the right side of the canvas.
How to use
- Click Show Inner Node List in the toolbar at the top of the workflow canvas to open the right-side panel. The panel lists all nodes in the current workflow.
- Code preview and sorting:
  - For nodes that support code editing, such as MaxCompute SQL, the code editor expands by default.
  - Nodes that do not support code editing, such as zero-load nodes, appear as cards and are automatically sorted to the bottom of the list.
- Quick search and locating:
  - Search: Enter a keyword in the search box at the top to fuzzy search by node name.
  - Synchronized focus: Focus is synchronized between the canvas and the sidebar. Selecting a node on the canvas highlights it in the sidebar, and clicking a node in the sidebar focuses the canvas on that node.
- Online editing:
  - Operations: The upper-right corner of each node card contains shortcuts, including Load Latest Code, Open Node, and Edit.
  - Auto-save: In edit mode, your changes are saved automatically when you move the cursor outside the code editor.
  - Conflict detection: If another user updates the code while you are editing it, a failure notification appears when you save. This prevents you from accidentally overwriting their changes.
- Focus mode:
  - Select a node and click the icon in the upper-right corner of the floating window to enable Focus Mode. In this mode, the sidebar displays only the selected node, leaving more space for code editing.
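Two behaviors of the Inner Node List, sorting card-only nodes to the bottom and rejecting a save that would overwrite another user's change, can be modeled in a few lines. This is a hedged sketch, not the DataWorks implementation: the `InnerNode` class, its `revision` counter, and the case-insensitive substring rule used for search are assumptions made for illustration. The conflict check is written as optimistic concurrency control, one common way to get the behavior described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InnerNode:
    name: str
    supports_code: bool  # e.g. True for MaxCompute SQL, False for a zero-load node
    code: str = ""
    revision: int = 0  # hypothetical counter bumped on every successful save


class SaveConflict(Exception):
    """Raised when the stored code changed after this editor loaded it."""


def sort_for_panel(nodes: List[InnerNode]) -> List[InnerNode]:
    # Code-editable nodes first, card-only nodes last. sorted() is stable,
    # so the original order is preserved inside each group.
    return sorted(nodes, key=lambda n: not n.supports_code)


def search(nodes: List[InnerNode], keyword: str) -> List[InnerNode]:
    # Assumed matching rule: case-insensitive substring match on the node name.
    needle = keyword.lower()
    return [n for n in nodes if needle in n.name.lower()]


def save(stored: InnerNode, loaded_revision: int, new_code: str) -> None:
    # Optimistic concurrency check: reject the save if someone else saved after
    # this editor loaded the code, instead of silently overwriting their change.
    if stored.revision != loaded_revision:
        raise SaveConflict(
            f"{stored.name} changed since revision {loaded_revision}; load the latest code")
    stored.code = new_code
    stored.revision += 1


if __name__ == "__main__":
    nodes = [InnerNode("start", False), InnerNode("ods_orders", True, "SELECT 1;"),
             InnerNode("dwd_orders", True)]
    print([n.name for n in sort_for_panel(nodes)])  # ['ods_orders', 'dwd_orders', 'start']
    print([n.name for n in search(nodes, "ORDERS")])  # ['ods_orders', 'dwd_orders']

    editor_a = editor_b = nodes[1].revision  # two users load revision 0
    save(nodes[1], editor_a, "SELECT 2;")  # first save succeeds, revision becomes 1
    try:
        save(nodes[1], editor_b, "SELECT 3;")  # stale revision is rejected
    except SaveConflict as err:
        print("save failed:", err)
```

In this model the second user sees a failure instead of losing the first user's edit, which mirrors the intent of the conflict detection described in the list; the user would then reload the latest code before editing again.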
Version management
Version management lets you restore a node to a previous version. It also provides tools for viewing and comparing versions to help you analyze differences and make adjustments.
- In the Project Directory pane on the left, double-click the target node name to open its editor.
- On the right side of the node editor, click Version. On the Version page, you can view and manage the node's Developer Record and Publish Record.
- View a version:
  - On the Developer Record or Publish Record tab, find the node version you want to view.
  - Click View in the Operation column to open the details page. This page shows the node code and Scheduling Settings information.
    Note: You can view the Scheduling Settings information in code editor or visualization mode. To switch the view mode, use the control in the upper-right corner of the Scheduling Settings tab.
- Compare versions:
  You can compare different versions of a node on the Developer Record or Publish Record tab. The following steps use the Developer Record tab as an example.
  - Compare versions in the same environment: On the Developer Record tab, select two versions and click Select Comparison at the top to compare the node code and scheduling information of the selected versions.
  - Compare versions between different environments:
    - On the Developer Record tab, locate a version of the node.
    - Click Compare in the Operation column. On the details page, select a version from the Publish Record or Build Records to compare with.
- Restore a version:
  You can restore a node to a previous version only from the Developer Record tab. On this tab, find the target version and click Restore in the Operation column to restore the node's code and Scheduling Settings information to that version.
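The compare and restore rules for node versions can be modeled with Python's standard `difflib` module. This sketch is illustrative only: `NodeVersion`, its `tab` and `scheduling` fields, and the `scheduled_time` key are hypothetical simplifications of what the Version page shows, and the DataWorks comparison view may present differences in its own way.

```python
import difflib
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class NodeVersion:
    number: int
    code: str
    tab: str  # hypothetical label: "developer" or "publish"
    scheduling: Dict[str, str] = field(default_factory=dict)  # simplified Scheduling Settings


def compare_code(old: NodeVersion, new: NodeVersion) -> List[str]:
    # Line-level unified diff of the node code between two versions.
    return list(difflib.unified_diff(
        old.code.splitlines(), new.code.splitlines(),
        fromfile=f"v{old.number}", tofile=f"v{new.number}", lineterm=""))


def compare_scheduling(old: NodeVersion,
                       new: NodeVersion) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    # Settings whose values differ, as (old, new) pairs; None means "not set".
    keys = sorted(set(old.scheduling) | set(new.scheduling))
    return {k: (old.scheduling.get(k), new.scheduling.get(k))
            for k in keys if old.scheduling.get(k) != new.scheduling.get(k)}


def restore(target: NodeVersion) -> Tuple[str, Dict[str, str]]:
    # Restore is only offered on the Developer Record tab; it brings back both
    # the code and the scheduling settings of the chosen version.
    if target.tab != "developer":
        raise ValueError("versions can only be restored from the Developer Record tab")
    return target.code, dict(target.scheduling)


if __name__ == "__main__":
    v1 = NodeVersion(1, "SELECT 1;\nSELECT 2;", "developer", {"scheduled_time": "00:00"})
    v2 = NodeVersion(2, "SELECT 1;\nSELECT 3;", "developer", {"scheduled_time": "00:30"})
    print("\n".join(compare_code(v1, v2)))
    print(compare_scheduling(v1, v2))  # {'scheduled_time': ('00:00', '00:30')}
```

Keeping code and scheduling comparisons separate reflects that the Version page reports both, while `restore` returns them together because a restore brings back the code and the Scheduling Settings at once.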
References
- For details about developing nodes in scheduled workflows and manually triggered workflows, see scheduled workflow orchestration and manually triggered workflows.
- After you create and develop a node, you can publish it to the production environment. For details, see node scheduling configuration and Publish nodes and workflows.
FAQ
Can I download node code, such as SQL or Python, to a local machine?
Answer: Direct downloads are not supported. As a workaround, copy the code to your local machine during development. Alternatively, the new Data Studio lets you add a local file to your personal folder for development; when development is complete, submit the code to the workspace directories.