DataStudio feature guide


This topic describes the layout of the DataWorks DataStudio interface and its main components: workflows and nodes.

Go to DataStudio

Log on to the DataWorks console. In the target region, click Data Development and O&M > Data Development in the left-side navigation pane. Select a workspace from the drop-down list and click Go to Data Development.

In DataStudio, you can create workflows and different types of nodes. For more information, see Create a workflow and Create a node.

The UI features vary for different development tasks. See the following topics for more information about specific interfaces.

DataStudio interface overview

The following figure provides an overview of the DataStudio interface.

Area

Description

1

2

Click the module switch icon in this area to display the name of each feature.

  • Scheduled Workflow: Use this module to develop scheduled tasks. You can create different types of nodes based on various compute engines for data development. Tasks developed in this module can be deployed to the production environment for O&M.

    Note

    To use a compute engine for data development, you must first bind a corresponding computing resource.

  • Manually Triggered Workflow: Use this module to develop manually triggered tasks. Tasks developed in this module can be deployed to the production environment for O&M.

  • Run History: View the run history of tests performed in DataStudio. Records from the last three days are retained.

  • Ad Hoc Query: Run simple, one-time queries for testing. Ad hoc queries cannot be deployed to the production environment for O&M.

  • Tenant Tables: Used to view all production tables under the current Alibaba Cloud account.

  • Workspace Tables: Visually perform operations on a target table. The available operations depend on the table's underlying engine.

  • Built-in Functions: Provides information about the built-in functions of MaxCompute.

  • Recycle Bin: Manage nodes, resources, and functions that are deleted from DataStudio and Manually Triggered Workflows.

  • Snippets: A snippet is a reusable SQL code template that has multiple input and output parameters. It typically processes one or more source tables through operations such as filtering, joining, and aggregation to generate the target table that your business requires.

  • Operation History: View and filter historical operation records in the current workspace by operation type, operator, and operation time.

  • Perform Operation Check: Used to filter and view the details of operations by operation type and check status.

  • MaxCompute: Click to display the following submodules.

    • MaxCompute Resource Management: Used to manage existing MaxCompute resources. You can view the operation records of resources. You can also use this feature to load resources that were not uploaded in DataWorks into DataStudio for management.

    • MaxCompute Function Management: Used to manage existing MaxCompute functions. You can view the operation records of functions. You can also use this feature to load functions that were not registered in DataWorks into DataStudio for management.

Note

If some modules are not displayed in the left-side navigation pane, click the Settings icon in Area 4 and add the modules on the Personal Settings page. For more information, see Personal settings.
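To make the Snippets description in this area concrete, the following minimal Python sketch treats a snippet as an SQL template with named input and output parameters. The template text, the table and column names, and the `render_snippet` helper are hypothetical. The `${name}` placeholder syntax comes from Python's `string.Template` and is not claimed to be the syntax that DataWorks uses.

```python
from string import Template

# Hypothetical snippet body: filter and aggregate one source table
# (input parameter) into a target table (output parameter).
SNIPPET_SQL = Template(
    "INSERT OVERWRITE TABLE ${target_table}\n"
    "SELECT customer_id, SUM(amount) AS total_amount\n"
    "FROM ${source_table}\n"
    "WHERE amount >= ${min_amount}\n"
    "GROUP BY customer_id;"
)


def render_snippet(source_table: str, min_amount: int, target_table: str) -> str:
    """Fill the template parameters and return the resulting SQL text."""
    return SNIPPET_SQL.substitute(
        source_table=source_table,
        min_amount=min_amount,
        target_table=target_table,
    )


if __name__ == "__main__":
    print(render_snippet("orders_raw", 100, "orders_summary"))
```

Keeping the SQL in one template and passing only the table names and thresholds is what lets the same logic be reused across workflows.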

3

Shortcuts to other modules in DataStudio:

  • Cross-project Cloning: Clone and migrate tasks, such as computing and synchronization tasks, between workspaces.

  • Operation Center: Go to Operation Center to perform maintenance on tasks. Operation Center is available for both development and production environments. It is used to manage and control all maintenance tasks for scheduled jobs in the production environment.

Common features of DataWorks modules:

Note

This topic uses the DataStudio interface as an example to describe the following common features. These features are the same in other modules.

  • Notification Center: Displays notifications about product feature changes.

  • Helps: Provides instructions for product features.

  • Workspace Management: Go to the workspace configuration page. On this page, you can view the basic information, scheduling information, IP address whitelist details, data sources, and open source clusters of the workspace. For more information, see Configure a workspace.

  • Language: Click the currently displayed language to switch between supported languages (Chinese and English).

  • Account information: Click your account to view personal information and an overview of workbench tasks.

4

System configuration includes the following settings:

  • Personal settings: Configure management modules, editor features, and interface styles.

  • Configure Code Templates: Used to manage templates for code statements to render them in the desired style.

  • Scheduling Settings: Used to enable the scheduling cycle feature on the scheduling settings page. After this feature is enabled, periodic tasks can be automatically scheduled and run.

  • Security Settings and Others:

    • Data Security: Used to control whether to mask sensitive information in query results within the workspace.

    • Forcible Code Review: Used to enable forcible code review, configure code reviewers, and control the code quality of development tasks.

5

This area displays common shortcuts for the DataStudio editor. For a complete list, see Editor shortcuts.

Workflow interface features

When you enter DataStudio, the data development module opens by default. You must first create a workflow to organize your data development tasks. For more information, see Create a workflow. The following figure shows the user interface (UI) for a workflow.

Area

Description

1

  • Solution: Groups multiple workflows. A workflow can be added to one or more solutions. Solutions can be displayed as a list or a graph.

  • Workflow: Organizes your data development from a business perspective.

Click the All icon to display all solutions or workflows in the current workspace.

2

  • Refresh: Click this icon to update the directory tree after you modify a workflow or solution.

  • Locate: Quickly locates the currently open file in the directory tree.

  • Code Search: Searches for code snippets by keyword. This lets you quickly locate all nodes that contain a specific code snippet in data development, manually triggered workflows, ad hoc queries, and the recycle bin, and view the snippet's details. You can use this feature to find the source task that caused data changes in a target table.

  • Batch Operation: Modifies multiple nodes, resources, or functions in batches. You can change properties such as the owner, engine instance, resource group for scheduling, rerun properties, scheduling type, scheduling period, and scheduling timeout.

  • Import Data: Uploads local data to a target table. Currently, you can only upload data to MaxCompute tables.

  • Create: Quickly creates workflows, nodes of various types, tables, resources, and functions.

  • Solution and workflow directory trees:

    • All: The directory tree displays all created objects, such as nodes, resources, and functions, in the current workspace, organized by solution and workflow.

    • Owned by Me: The directory tree displays objects, such as nodes, resources, and functions, that you own, organized by solution and workflow.

    • My Favorites: The directory tree displays objects, such as nodes, resources, and functions, that you have added to favorites, organized by solution and workflow.

  • Node search:

    • Exact search: Enter a File name or Created By value and click the search icon to find a specific node.

    • Search by node type: Click the filter icon and select a node type. The directory tree then displays only nodes of that type.

      Note

      You can also select Hide Engine Instances and Hide Node Folders. If selected, these items are hidden from the directory tree.

      • The Hide Engine Instances and Hide Node Folders options apply only to the new workflow version.

      • If a target engine contains only one engine instance, we recommend that you hide the engine instance.

      • If you do not need to use folders for node types such as data development, table, resource, and function, you can hide them.

Note

If you are in a new workspace, you must first create a workflow and then create nodes within it to start data development. For details, see Create a workflow.

3

Use the directory tree to manage nodes, tables, resources, and functions in each workflow:

  • Workflow: The unit for data development, where you perform specific development work.

  • Node: The basic unit of development. A node can contain code for various engines, algorithms, data integration tasks, and general purposes.

  • Table: Manage tables visually.

  • Resource: Upload resources visually.

    Note

    Only the MaxCompute, E-MapReduce, and CDH engines support visual resource uploads.

  • Function: Register functions visually.

    Note

    Only the MaxCompute, E-MapReduce, and CDH engines support visual function registration.

The icon next to a node name indicates the node status:

  • Unsubmitted icon: Indicates that the current node version has not been submitted. Click the icon to submit the node.

  • Undeployed icon: Indicates that the node has not been deployed. Click the icon to go to the deployment center and deploy the node.

The time of the last edit is displayed after the node name.

Double-click a workflow name to open the workflow editing page (Areas 5 to 8), where you can perform data development.

4

Resource Group Orchestration: During data development, use this feature to perform batch modifications on the resource groups for scheduling that are used by nodes in a workflow. This helps you reassign resource groups to improve resource utilization. After making changes, you must deploy them to Operation Center for the new settings to take effect for nodes in the production environment.

5

  • Common Nodes: Displays commonly used node types in the current workspace for quick access and creation.

  • Node Group: References a set of nodes across workflows. You can group frequently reused nodes into a node group and quickly reuse it in other workflows by cloning the nodes in bulk.

  • Quick node creation: Drag nodes from directories such as Data Integration, MaxCompute, and E-MapReduce onto the workflow canvas on the right to create nodes of those types.

6

Tools on the workflow canvas (1):

  • Switch Layout: Switches the canvas layout to Vertical, Horizontal, or Grid.

  • Box-select: Groups selected nodes into a node group to perform batch operations.

  • Refresh: Refreshes the current workflow. When you make changes to the workflow, you can manually refresh to get the latest view.

  • Format: Aligns nodes in the workflow horizontally.

  • Fit to Window: Automatically fits the workflow layout to the current window size.

  • Center: Centers the nodes in the current workflow.

  • 1:1: Sets the workflow view scale to 1:1.

  • Zoom In: Zooms in on the current workflow.

  • Zoom Out: Zooms out of the current workflow.

  • Search: Enter a keyword to search for nodes that contain the keyword.

    Note

    The search uses fuzzy matching. After you enter a keyword, DataWorks displays all nodes in the current workflow that contain the keyword.

  • Toggle Full Screen: Displays the current workflow in full-screen mode.

  • Hide Engine Information: Shows or hides the engine information for each node.
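The Search tool in this list returns every node that contains the keyword. The following Python sketch illustrates that behavior as a plain substring match. The `search_nodes` function and the node names are hypothetical, and the case-insensitive comparison and the handling of an empty keyword are assumptions rather than documented behavior.

```python
def search_nodes(node_names: list[str], keyword: str) -> list[str]:
    """Return every node name that contains the keyword as a substring."""
    needle = keyword.strip().lower()  # assumption: matching ignores case
    if not needle:
        return list(node_names)  # assumption: an empty keyword matches everything
    return [name for name in node_names if needle in name.lower()]


if __name__ == "__main__":
    nodes = ["ods_orders_load", "dwd_orders_clean", "ads_sales_report"]
    print(search_nodes(nodes, "orders"))
```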

7

Tabs in the right-side pane (2):

  • Workflow Parameters: Replaces common variables in code in batches. Currently, only ODPS SQL nodes support workflow parameters.

  • Change History: Displays the operation history for nodes on the workflow canvas.

  • Versions: Submitting a workflow generates a new version. You can view the version history and details of each version here.

8

Tools in the toolbar (3):

  • Submit: Submits updated nodes in the workflow to the deployment page in batches.

  • Run: Runs all nodes in the current workflow.

  • Stop: Stops nodes that are running in the workflow in batches.

  • Deploy: Opens the deployment page, filtering it to show nodes from this workflow that are pending deployment.

  • Operation Center: Navigates to Operation Center to view operational details for the nodes.

  • Search Tabs: If many tabs are open, click the search icon to view them all in a drop-down list.

  • Close Tab: Click the close icon to close a specific tab.

Workflow shortcut menu

Right-click a workflow to display its shortcut menu, as shown in the following figure.

Feature

Description

Create Node

Creates new nodes of various types.

When you create a node, the system displays recently used node types for quick access. Selecting a type automatically populates the Compute Engine Instance and Node Type fields from the last used configuration, letting you quickly recreate nodes.

Create Table

Creates new tables of various types.

Create Resource

Creates new engine resources.

Note

Currently, this feature supports creating resources only for MaxCompute, CDH, and EMR engines.

Create Function

Creates new engine functions.

Note

Currently, this feature supports creating functions only for MaxCompute, CDH, and EMR engines.

Board

Opens the editing canvas for the workflow.

Change

Modifies the name, owner, and description of the workflow.

Delete Workflow

Deletes the current workflow.

Note

This action deletes all objects within the workflow. Proceed with caution.

If an object cannot be deleted, you can choose one of the following strategies:

  • Terminate the Delete Operation: This is the default option. If the system fails to delete an object, the operation is aborted. Objects deleted before the failure are not restored.

  • Skip Current Object and Continue to Delete Other Objects: If the system fails to delete an object, it skips that object and continues deleting the rest.
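The difference between the two strategies can be sketched in Python as follows. This is an illustrative model only: `delete_objects`, `fake_delete`, and the object names are hypothetical and do not correspond to any DataWorks API.

```python
from typing import Callable


def delete_objects(
    names: list[str],
    delete_one: Callable[[str], None],
    skip_failures: bool = False,
) -> tuple[list[str], list[str]]:
    """Delete objects in order and return (deleted, failed).

    skip_failures=False models "Terminate the Delete Operation" (the default).
    skip_failures=True models "Skip Current Object and Continue to Delete
    Other Objects".
    """
    deleted: list[str] = []
    failed: list[str] = []
    for name in names:
        try:
            delete_one(name)
        except Exception:
            failed.append(name)
            if not skip_failures:
                break  # abort; objects already deleted are not restored
            continue
        deleted.append(name)
    return deleted, failed


def fake_delete(name: str) -> None:
    """Hypothetical delete call that fails for one object."""
    if name == "locked_node":
        raise RuntimeError(f"cannot delete {name}")
```

With the default strategy, anything listed after the failing object is left in place, so you may need to resolve the failure and run the deletion again.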


Batch operation

Batch-modify properties, such as owner, engine instance, and scheduling attributes, for multiple nodes, resources, and functions. You can also commit and deploy these changes to the production environment.

DataStudio node editor features

After you create a workflow, you can create different types of DataStudio nodes based on your development requirements. For more information, see Create a development node. Different types of nodes provide similar features. This topic uses an ODPS SQL node as an example to describe the features of the node editor.

Area

Description

1

The toolbar for node development:

  • Save: Saves the code and configurations of the current node.

  • Save as Ad-hoc Query: Saves the current code as an ad hoc query file. You can go to the Ad Hoc Query page to view the file. For more information, see Create an ad hoc query.

  • Commit: Commits the current node.

  • Commit and Unlock: Commits the current node and unlocks it for other users to edit.

  • Steal lock: Lets a user other than the node owner edit the node.

  • Run: Runs the code of the current node. When you run SQL code, you only need to assign values to SQL variables once. These initial assignments are retained even if the node's code changes.

    Note

    If you do not select a resource group for scheduling for the node, the system prompts you to select one when you run the task.

  • Run with Parameters: Runs the code of the current node with the parameters that you configure. When you use Run with Parameters, you must manually assign values to the variables in the SQL statement for each run. The initial values assigned by using Run are passed to Run with Parameters. After you assign custom parameter values by using Run with Parameters, the custom parameters for the current run are updated.

    Note

    If you do not select a resource group for scheduling for the node, the system prompts you to select one when you run the task.

  • Stop: Stops the running node.

  • Reload: Refreshes the node page and restores the last saved version of the page.

  • Perform smoke testing in the development environment: Tests the current node's code in the development environment. This test simulates how scheduling parameters are replaced during scheduling in the production environment. After you select a business date, the system replaces the parameters with the values that correspond to that date.

    Note

    After you change the scheduling properties for smoke testing, you must save and commit the changes before running another smoke test. Otherwise, the test uses the old property values.

  • View the smoke testing log in the development environment: Views the run log of the node in the development environment.

  • Go to the scheduling system in the development environment: Navigates to Operation Center for the development environment to perform O&M tasks. For more information, see View auto triggered instances.

  • Format: Formats the code of the current node. This feature is useful for long lines of code.

  • Share: Shares the current node with other users.
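The parameter replacement that smoke testing simulates can be pictured with the following Python sketch. It is a simplified illustration, not the DataWorks implementation: the `${name}` placeholder form, the parameter names `bizdate` and `bizmonth`, their date formats, and the `render_for_business_date` helper are all assumptions made for this example.

```python
import re
from datetime import date


def render_for_business_date(code: str, business_date: date) -> str:
    """Replace each ${name} placeholder with a value derived from the business date."""
    values = {
        "bizdate": business_date.strftime("%Y%m%d"),  # hypothetical parameter
        "bizmonth": business_date.strftime("%Y%m"),   # hypothetical parameter
    }

    def replace(match: re.Match) -> str:
        name = match.group(1)
        # Unknown placeholders are left untouched so that they stay visible.
        return values.get(name, match.group(0))

    return re.sub(r"\$\{(\w+)\}", replace, code)


if __name__ == "__main__":
    sql = "SELECT * FROM orders WHERE ds = '${bizdate}';"
    print(render_for_business_date(sql, date(2024, 3, 15)))
```

Selecting a different business date changes only the substituted values, which is why the same node code can be tested against several dates without being edited.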

2

Scheduling Settings:

  • General properties: View the node's name, ID, and type, and configure its basic information, such as the owner and description.

  • Parameters: Configure scheduling parameters to assign values dynamically.

  • Time properties: Define the time-related properties for the node in the scheduling environment after the node is deployed to the production environment. You can specify how auto triggered instances are generated, their scheduling cycle and execution time, whether to allow reruns, and a timeout period.

  • Resource Group: Configure the resource group for scheduling the node.

  • Scheduling Dependency: Configure dependencies between upstream and downstream nodes. For more information, see Configure same-cycle scheduling dependencies and Configure cross-cycle scheduling dependencies.

  • Input and output parameters: Pass parameters between upstream and downstream nodes, for example, to pass query results from an upstream node to a downstream one.
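As a rough illustration of what scheduling dependencies express, the following Python sketch derives an order in which every node comes after its upstream nodes. It uses the standard-library `graphlib` module (Python 3.9 or later). The node names and the `run_order` helper are hypothetical, and the sketch is a simplification that does not describe how the DataWorks scheduler is implemented.

```python
from graphlib import TopologicalSorter


def run_order(upstream: dict[str, set[str]]) -> list[str]:
    """Return an order in which every node comes after all of its upstream nodes.

    `upstream` maps each node name to the set of nodes that it depends on.
    A circular dependency raises graphlib.CycleError.
    """
    return list(TopologicalSorter(upstream).static_order())


if __name__ == "__main__":
    # Hypothetical chain: load -> clean -> report
    deps = {"report": {"clean"}, "clean": {"load"}, "load": set()}
    print(run_order(deps))
```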

Lineage: Displays the dependencies and internal lineage between the current node and other nodes.

Versions: A new version is created each time you commit or deploy a node. This panel displays the node's version history, the committer, commit time, change type, status, and remarks. The version statuses are as follows:

  • Submitted: The node has been committed to the development environment and is awaiting deployment.

  • Published: The node has been deployed to the production environment. You can view the auto triggered task in Operation Center. For more information, see Manage auto triggered tasks.

  • Intermediate Version: If a node is committed but not deployed, and then committed again, the previously committed version becomes an intermediate version.

  • The deployment is cancelled: If you commit a node and then cancel its deployment, the version status changes to this value.
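The status changes described in this list can be modeled with a small Python sketch. It assumes a simplified model in which deploying or canceling updates the status of the latest committed version instead of adding a new entry. The `apply_event` function, the event names, and the exact status strings are hypothetical.

```python
def apply_event(versions: list[str], event: str) -> list[str]:
    """Return the version statuses after a 'commit', 'deploy', or 'cancel' event."""
    result = list(versions)
    latest_pending = bool(result) and result[-1] == "Submitted"
    if event == "commit":
        if latest_pending:
            # Committed again before deployment: the older commit becomes intermediate.
            result[-1] = "Intermediate Version"
        result.append("Submitted")
    elif event == "deploy" and latest_pending:
        result[-1] = "Published"
    elif event == "cancel" and latest_pending:
        result[-1] = "Deployment Canceled"
    return result


if __name__ == "__main__":
    history: list[str] = []
    for event in ["commit", "commit", "deploy"]:
        history = apply_event(history, event)
    print(history)
```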

Structure: Visualizes the code's structure based on its SQL operators.

3

SQL editor: Write SQL statements for your task in this editor.

  • Click the jump-to-first-line icon to jump to the first line of the SQL statement.

  • Click the full-screen icon to display the SQL editor in full-screen mode.

  • Click the quick run icon to quickly run a selected code snippet and verify that it is correct. For more information, see Debug a code snippet: Quick run.

    Note

    This icon appears only when you click within a line of code.

4

Deployment and O&M operations:

  • Deploy: Go to the task deployment page to view deployment details. After deployment, you can also perform O&M in the production environment from this page.

  • O&M: Go to Operation Center to perform O&M for the node.

Data development node shortcut menu

Right-click the target development node to display its shortcut menu, as shown in the following figure.

Feature

Description

Rename

Changes the name of the target node.

Add to Favorites

Adds the target node to your favorites. You can view your favorite nodes by clicking My Favorites in the upper-right corner of the directory tree. To remove a node from your favorites, click Remove from Favorites in its shortcut menu.

Move

Moves the target node to another workflow directory.

Clone

Creates a copy of the target node with the same node type, owner, and resource properties.

Note

In the same workflow directory, the original and cloned nodes cannot have the same name.

View version history

Opens the Versions panel, which displays the node's version history. The history includes details such as the committer, commit time, change type, status, and remarks.

View in Operation Center

Opens Operation Center to display the node's runtime information. If a node in a standard workspace is committed to both the development and production environments, you can view its runtime status for either environment in Operation Center.

Initiate code review

Submits the node's code for a code review. The node must pass this review before it can be deployed.

Delete

Deletes the target node and unlinks it from any upstream or downstream nodes. If a deleted node has been deployed to the production environment, you must use the Task Deployment page to take the node offline. For details, see Undeploy a task.