Serverless Kyuubi node

Use the Serverless Kyuubi node in DataWorks to develop and periodically schedule Kyuubi tasks on an EMR Serverless Spark compute resource and integrate them with other jobs.

Prerequisites

  • Compute resource limitation: You can attach only EMR Serverless Spark compute resources. Ensure that the resource group has network connectivity to the compute resource.

    Resource group limitation: Only a Serverless resource group can run this type of task.

  • (Optional) If you use a RAM user to develop tasks, add the RAM user to the corresponding workspace and grant it the Development or Workspace Administrator role. The Workspace Administrator role has extensive permissions, so grant it with caution. For more information, see Add members to a workspace.

    If you are using an Alibaba Cloud account, you can skip this step.

Create a node

See Create a node for instructions.

Develop the node

Develop your task code in the SQL editor. To pass parameters dynamically, define variables in the code in the format ${variable_name}, then assign each variable a value in the Scheduling Parameters section of the Scheduling Settings panel. The variable is replaced with its assigned value when the scheduler runs the node. For more information about scheduling parameters, see Sources and expressions of scheduling parameters. The following code is an example.

SHOW TABLES;
SELECT * FROM kyuubi040702 WHERE age >= '${a}'; -- You can use scheduling parameters.
Note

The maximum size of an SQL statement is 130 KB.
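DataWorks performs the variable replacement itself when the node runs, so no extra code is needed in the node. To preview locally what a statement looks like after replacement and whether it stays under the size limit, a minimal Python sketch such as the following may help. The helper names render_sql and check_sql_size are hypothetical, and the sketch assumes that the 130 KB limit is counted in UTF-8 bytes with 1 KB = 1024 bytes, which the source does not state.

```python
import re

# Assumption: the limit is measured in UTF-8 bytes, with 1 KB = 1024 bytes.
MAX_SQL_BYTES = 130 * 1024

# Matches placeholders of the form ${variable_name}.
_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def render_sql(template: str, params: dict[str, str]) -> str:
    """Replace each ${name} placeholder with its value from params."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value assigned to parameter {name!r}")
        return params[name]

    return _PLACEHOLDER.sub(substitute, template)


def check_sql_size(sql: str) -> int:
    """Return the UTF-8 size of sql in bytes; raise if it exceeds the limit."""
    size = len(sql.encode("utf-8"))
    if size > MAX_SQL_BYTES:
        raise ValueError(f"SQL is {size} bytes; the limit is {MAX_SQL_BYTES}")
    return size


# Local preview of the example statement with a = 18 (illustrative value).
template = "SELECT * FROM kyuubi040702 WHERE age >= '${a}';"
rendered = render_sql(template, {"a": "18"})
check_sql_size(rendered)
print(rendered)
```

The sketch raises an error for a placeholder that has no assigned value, which makes a missing scheduling parameter visible before the node is run.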

Debug the node

  1. In the Run Configuration panel, configure parameters such as Compute Resource and Resource Group.

    | Parameter | Description |
    | --- | --- |
    | Compute Resource | Select an EMR Serverless Spark compute resource that is bound to your workspace. You must bind a compute resource before you proceed. If no compute resource is available, select Create Compute Resource from the drop-down list. |
    | Resource Group | Select a resource group that is bound to the workspace. |
    | Script Parameters | If the node code defines variables in the format ${ParameterName}, configure the Parameter Name and Parameter Value for each variable in the Script Parameters section. Each variable is replaced with its actual value at runtime. For more information, see Sources and expressions of scheduling parameters. |
    | ServerlessSpark Node Parameters | Specifies native Spark properties. For more information, see Open-source Spark properties and List of custom Spark Conf parameters. Use the following format: spark.eventLog.enabled : false |

    Note

    DataWorks allows you to set global Spark parameters at the workspace level for DataWorks modules. You can specify whether these global Spark parameters take precedence over module-specific Spark parameters. For more information, see Set global Spark parameters.

  2. On the toolbar at the top of the node editor, click Run to run the task.

    Important

    Before you publish the node, synchronize the ServerlessSpark Node Parameters from the Run Configuration panel to the ServerlessSpark Node Parameters in the Scheduling Settings panel.
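The synchronization between the two panels is done by hand, so it can be useful to compare the two sets of Spark properties before publishing. The following Python sketch parses properties written in the key : value format and reports the keys whose values differ. It assumes one property per line, which the source does not state, and the function names parse_spark_conf and unsynced_parameters are hypothetical helpers, not part of DataWorks.

```python
def parse_spark_conf(text: str) -> dict[str, str]:
    """Parse lines of the form 'key : value' into a dict.

    Assumption: one property per line; blank lines are ignored.
    """
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            raise ValueError(f"expected 'key : value', got {line!r}")
        conf[key.strip()] = value.strip()
    return conf


def unsynced_parameters(run_conf: dict[str, str],
                        schedule_conf: dict[str, str]) -> dict[str, tuple]:
    """Map each differing key to (run value, scheduling value).

    None marks a key that is missing on that side.
    """
    diff = {}
    for key in sorted(run_conf.keys() | schedule_conf.keys()):
        run_value = run_conf.get(key)
        schedule_value = schedule_conf.get(key)
        if run_value != schedule_value:
            diff[key] = (run_value, schedule_value)
    return diff


# Illustrative values copied by hand from the two panels.
run_conf = parse_spark_conf(
    "spark.eventLog.enabled : false\nspark.executor.memory : 4g"
)
schedule_conf = parse_spark_conf("spark.eventLog.enabled : false")
print(unsynced_parameters(run_conf, schedule_conf))
```

An empty result means the two panels hold the same properties. Any reported key needs to be copied to the Scheduling Settings panel before the node is published.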

Next steps

  • Schedule a node: Set Scheduling Policies in the Scheduling section to run the node on a recurring schedule.

  • Publish a node: Click the publish icon to publish the node. A node runs on schedule only after it is published to the production environment.

  • Node O&M: After publishing, monitor scheduled task status in Operation Center. See Get started with Operation Center.