Data development workflow

In DataWorks, you create data development tasks as nodes, each encapsulating a task for a specific compute engine. DataStudio also lets you develop complex tasks by using resources, functions, and various logical processing nodes. This topic describes the common data development workflow.

Prerequisites

Go to DataStudio

Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and O&M > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.

The following procedure describes how to create tasks in DataStudio.

Development workflow

The following figure shows the common workflow for data development.

[Figure: script development workflow]

Step

Description

Related documentation

Step 1: Create a workflow

In DataWorks, you organize all development tasks and code into workflows.

Create a workflow

Step 2: Create a table

DataWorks provides a visual interface to create and manage tables, which are displayed in a directory structure.

Before you start data development, you must create tables in the engine to store raw data and the results of data cleansing. The type of table you create depends on your specific use case.
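As a rough illustration of the two kinds of tables mentioned above, the following Python sketch assembles CREATE TABLE statements for a raw-data table and a cleansed-results table. The table names, the columns, and the `build_create_table` helper are hypothetical, and the MaxCompute-style DDL is an assumption. Adapt the statement to the engine you actually use.

```python
from typing import List, Tuple

Column = Tuple[str, str]  # (column name, column type)


def build_create_table(name: str, columns: List[Column], partition: str = "ds") -> str:
    """Return a CREATE TABLE statement (hypothetical helper, MaxCompute-style DDL assumed)."""
    cols = ",\n".join(f"    {col} {typ}" for col, typ in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {name} (\n"
        f"{cols}\n"
        f")\n"
        f"PARTITIONED BY ({partition} STRING);"
    )


# Hypothetical table that stores raw data.
raw_ddl = build_create_table("ods_raw_log", [("line", "STRING")])

# Hypothetical table that stores the results of data cleansing.
clean_ddl = build_create_table(
    "dwd_clean_log",
    [("ip", "STRING"), ("request_time", "DATETIME"), ("status", "BIGINT")],
)

print(raw_ddl)
print(clean_ddl)
```

The generated statements could be pasted into a SQL node or used as a reference when you fill in the visual table editor.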

Step 3: Create and upload a resource (Optional)

If your development process requires resources such as text files or JAR packages, you can upload and manage them in DataWorks for use with a specific compute engine.

Note

The supported compute engines and resource types are displayed in the DataWorks UI.

Step 4: Create a scheduling node

In DataWorks, you develop tasks by using nodes. Tasks for different compute engines are encapsulated into different types of nodes. You can select the appropriate node type for your engine task based on your business needs.

DataWorks also provides convenient node management operations. For example, you can use node groups to clone nodes in batches or use the recycle bin to quickly restore deleted nodes.

DataWorks supports multiple engines, including:

Different node types are available for different tasks in each engine. For a detailed list of supported node types, see Supported node types.

For more information about node management, see the following topics:

Step 5: Reference a resource in a node (Optional)

Before a node can use a resource, you must load the resource into the node's runtime environment.

Step 6: Register a function (Optional)

If your development process requires a function, you can register it by using the visual interface of DataWorks. Before you register a function, you must upload the function's required resources to DataWorks.

Note

The compute engines that support function registration are displayed in the DataWorks UI.

Step 7: Edit node code

On the node editor page, write business code by using the syntax of the corresponding engine and database. The syntax may vary for different node types.

Note

After you finish editing the code, click Save promptly to avoid losing your changes.

For a detailed list of supported node types, see Supported node types.

Usage notes for common engines:

Next steps

After you develop a task, perform the following operations as needed:

  • Debug the code: Run a single task or an entire workflow to debug it. For more information, see Task debugging process.

  • Configure scheduling: Configure scheduling properties for the node. The node then runs periodically based on these settings. For more information, see Configure scheduling for a node.

  • Commit and deploy the node: After development is complete, commit the node to the target environment for scheduled execution. If you use a workspace in standard mode, after a successful commit, you must also deploy the node by clicking Deploy in the upper-right corner. For more information, see Deploy a node.

  • Perform O&M operations: After a node is deployed, it appears in Operation Center for the production environment by default. Go to Operation Center to view the run status of nodes and perform related O&M operations. For more information, see Operation Center overview.
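The follow-up operations above can be summarized as a small decision sketch. The `release_steps` function and the mode labels "standard" and "basic" are hypothetical names used only for illustration. The one rule taken from this topic is that a workspace in standard mode requires a deploy after a successful commit. The fixed ordering is a simplification, because the operations are performed as needed.

```python
from typing import List


def release_steps(workspace_mode: str) -> List[str]:
    """Return the ordered actions taken after a task is developed.

    `workspace_mode` is a hypothetical label: "standard" or "basic".
    """
    if workspace_mode not in ("standard", "basic"):
        raise ValueError(f"unknown workspace mode: {workspace_mode}")
    steps = ["debug", "configure scheduling", "commit"]
    if workspace_mode == "standard":
        # Standard mode: a successful commit must be followed by a deploy.
        steps.append("deploy")
    steps.append("monitor in Operation Center")
    return steps


print(release_steps("standard"))
print(release_steps("basic"))
```

Keeping the mode check in one place makes the extra deploy step explicit instead of leaving it as an implicit convention.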