CDH resources and functions

This topic explains how to use Resource Management to create different types of CDH resources and functions.

Prerequisites

  • You have registered a CDH cluster with DataWorks. All resource and function operations use CDH compute resources.

  • Your resource files have been developed and are ready to be uploaded from your local machine.

Access resource management

  1. Go to the Workspaces page in the DataWorks console. In the top navigation bar, select the desired region. Find the desired workspace and choose Shortcuts > Data Studio in the Actions column.

  2. In the left-side navigation pane, click the Resource Management icon to go to the Resource Management page.

  3. On the Resource Management page, click the create icon to create a resource or function. Alternatively, click Create Directory to create a directory for organizing your resources. Then right-click the target directory, select Create, and choose the type of resource or function that you want to create.

Create and use resources

Resources

In Data Studio, you can upload local resources to a CDH cluster through DataWorks. The following table lists the supported resource types for developing CDH jobs or creating custom functions.

Resource type | Description | Supported upload methods: Local | Supported upload methods: OSS
--- | --- | --- | ---
CDH Jar | A compiled Java JAR package used to run Java programs. The file extension is .jar. | [image] | [image]
CDH File | You can upload any file type as a CDH File resource. Its use depends on the compute engine. | |

Limitations

The following limitations apply when you upload resources:

  • Resource size:

  • Resource publishing: If you use a standard mode workspace, you must publish the resource to the production environment before you can use it.

    Note

    Data source configurations may differ between the development and production environments. Before you query tables or use resources, confirm the data source configuration for the current environment.

  • Resource management: In DataWorks, you can view and manage only the resources that are uploaded through the DataWorks UI.

Create a resource

You can upload CDH resources from your local machine. After you create a resource, you can reference it directly in data development or register it as a custom function.

  1. On the Resource Management page, choose to create a resource. In the Create Resource and Function dialog box that opens, configure the resource Type, storage Path, and resource Name.

  2. After you create the resource entry, you must upload a local file. The following table describes the key upload parameters:

    Parameter | Description
    --- | ---
    Storage Path | The default path is /user/admin/lib. Note: If Kerberos authentication is enabled, you must first grant the current user write permissions to this directory.
    Data Sources | Select an existing CDH data source.
    Resource Group | Select a Serverless resource group that can connect to the CDH cluster.

  3. In the top toolbar, click Save and then click Publish. Only published resources can be used in data development.
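
The settings collected in the steps above can be summarized in a small data structure, which may be useful as a pre-upload checklist. The following Python sketch is illustrative only and does not call any DataWorks API. The names `ResourceEntry` and `suggest_resource_type`, as well as the example data source and resource group values, are hypothetical. The only facts taken from this topic are the two resource types, the .jar extension rule, and the default storage path /user/admin/lib.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Default storage path stated in this topic.
DEFAULT_STORAGE_PATH = "/user/admin/lib"


def suggest_resource_type(file_name: str) -> str:
    """Suggest a resource type from the file extension.

    Files ending in .jar map to CDH Jar; any other file maps to CDH File.
    This is a hypothetical helper, not part of DataWorks.
    """
    if PurePosixPath(file_name).suffix.lower() == ".jar":
        return "CDH Jar"
    return "CDH File"


@dataclass(frozen=True)
class ResourceEntry:
    """Hypothetical record of the values entered when creating a resource."""

    file_name: str
    data_source: str      # an existing CDH data source
    resource_group: str   # a Serverless resource group that can reach the cluster
    storage_path: str = DEFAULT_STORAGE_PATH

    @property
    def resource_type(self) -> str:
        return suggest_resource_type(self.file_name)


entry = ResourceEntry(
    file_name="example.jar",
    data_source="my_cdh_source",        # hypothetical name
    resource_group="my_serverless_rg",  # hypothetical name
)
print(entry.resource_type, entry.storage_path)
```

Keeping the storage path as a field with a default mirrors the dialog box: the default applies unless you override it, for example when a Kerberos-enabled cluster requires a directory that the current user can write to.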

Use a resource

After you create a resource, you can reference it during data development. In the left-side navigation pane, click Resource Management, find the target resource or function, right-click it, and select Insert Resource Path. This action inserts a code snippet in the format ##@resource_reference{"Resource Name"} into your editor.

Note

For example, in a CDH Hive node, the reference might look like ##@resource_reference{"example"}. The format may vary between different node types. Refer to the actual UI for the correct format.
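
As a rough illustration of the reference format described above, the following Python sketch builds the marker text for a given resource name. It only reproduces the text pattern shown in this topic and does not interact with DataWorks. The function name `resource_reference` is hypothetical, and because the format may differ between node types, treat the result as an example of the CDH Hive node form only.

```python
def resource_reference(resource_name: str) -> str:
    """Return the marker text in the form ##@resource_reference{"<name>"}.

    Hypothetical helper: it mirrors the snippet that Insert Resource Path
    places in the editor, based solely on the format shown in this topic.
    """
    if not resource_name or '"' in resource_name:
        raise ValueError("resource name must be non-empty and contain no double quotes")
    return '##@resource_reference{"' + resource_name + '"}'


print(resource_reference("example"))
```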

In addition to using resources directly, you can also create a function from a resource and then use the function in your development nodes.

Create and use functions

Functions

Data Studio allows you to register resources as CDH functions. In data development or SQL queries, you can use both the built-in functions provided by Hive and the custom functions that you create.

Create a function

  1. On the Resource Management page, choose to create a function. In the Create Resource and Function dialog box that opens, configure the function Type, storage Path, and function Name.

  2. Click Confirm to create the function. Then, configure the function details based on its type.

    Before you configure a CDH function, ensure that you have registered the CDH cluster as a compute resource in DataWorks and have uploaded the required CDH resource. The following table describes the key parameters for a CDH function.

    Parameter | Description
    --- | ---
    Function type | Select a function type: MATH (mathematical), AGGREGATE (aggregation), STRING (string manipulation), DATE (date), ANALYTIC (analytic), or OTHER (other).
    Data Sources | Select an existing CDH data source from the drop-down list.
    Class Name | The class name for the user-defined function (UDF), in the format ResourceName.ClassName. The resource name can be a Java package name or a file resource name. When you create a custom function in DataWorks, you can use either JAR or File type CDH resources. If the resource type is JAR, the Class Name format is PackageName.ActualClassName. You can obtain this value from IntelliJ IDEA by using the Copy Reference command. For example, if the package name is com.aliyun.cdh.examples.udf and the actual class name is UDAFExample, set the Class Name parameter to com.aliyun.cdh.examples.udf.UDAFExample. Note: Do not include the .jar suffix when you enter the resource name. You must publish the resource before you can use it.
    Resource List | For a CDH function, only visual mode is supported. This requires selecting a CDH Jar or CDH File resource.
    Command Format | A usage example for the UDF.

  3. In the top toolbar, click Save and then click Publish. Only published functions can be used in data development.
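
The Class Name rule for JAR resources can be illustrated with a short Python sketch. It joins a package name and a class name into the PackageName.ActualClassName form and strips a trailing .jar from a resource name, as the note in the parameter table requires. The helper names `qualified_class_name` and `resource_name_without_suffix` are hypothetical and exist only for this illustration. The identifier check is deliberately loose and is not a full validation of Java naming rules.

```python
def qualified_class_name(package_name: str, class_name: str) -> str:
    """Join a Java package name and a class name into PackageName.ActualClassName."""
    full_name = f"{package_name}.{class_name}"
    # Loose sanity check: every dot-separated part should look like an identifier.
    if not all(part.isidentifier() for part in full_name.split(".")):
        raise ValueError(f"not a valid qualified class name: {full_name!r}")
    return full_name


def resource_name_without_suffix(resource_name: str) -> str:
    """Drop a trailing .jar, because the suffix must not be entered."""
    suffix = ".jar"
    if resource_name.lower().endswith(suffix):
        return resource_name[: -len(suffix)]
    return resource_name


# Values taken from the example in this topic.
print(qualified_class_name("com.aliyun.cdh.examples.udf", "UDAFExample"))
print(resource_name_without_suffix("example.jar"))
```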

Use a function

After you create and publish a function, you can reference it directly in data development or SQL queries.

  • When you edit a data development node, click Resource Management in the left-side navigation pane. Find the target function, right-click it, and select Insert Function.

    The function name, such as example_function(), is automatically inserted into the editor.

  • When you edit a SQL query, you can use the created function directly in your SQL statement.

    SELECT example_function(column_name) FROM table_name;

Manage resources and functions

After you create resources and functions, you can manage them from the Resource Management page. Click a resource or function to open it in the editor.

  • View historical versions: In the right-side pane of the editor, click the versions icon. You can view and compare saved or submitted versions to track changes.

    Note

    You must select at least two versions to run a comparison.

  • Delete a resource or function: In the Resource Management pane, right-click the target item and select Delete.

    To delete a resource or function from the production environment, you must publish this change. After the publishing task is complete, the item is deleted from the production environment.