Create and use a user-defined function

If MaxCompute's built-in functions do not meet your business requirements, you can create a user-defined function (UDF) to extend its functionality. This topic describes how to create a UDF in DataWorks using the visual interface.

Background

A user-defined function (UDF) extends the existing function library, so that you can define custom logic for queries and data processing. For more information, see UDF overview. In addition to the DataWorks visual interface, you can create UDFs in MaxCompute Studio or from the command line. For more information, see Create a UDF in MaxCompute Studio and Create a UDF in MaxCompute using commands.

Prerequisites

Before you create a function, you must create a MaxCompute resource and upload it to DataWorks. For more information, see Create a MaxCompute resource.

Note

To prepare your resource file, see Develop a UDF (Java) or Develop a UDF (Python 3).
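For orientation, the following is a minimal sketch of a Python 3 UDF resource file of the kind those topics describe. It assumes the `odps.udf.annotate` decorator that the MaxCompute Python UDF runtime provides; the file name `getregion.py`, the class `GetRegion`, and the prefix-to-region table are hypothetical illustrations, not part of DataWorks. A local stand-in for `annotate` is defined so the file can also be tried outside MaxCompute.

```python
# getregion.py: hypothetical Python 3 UDF resource, for illustration only.
try:
    # Provided by the MaxCompute Python UDF runtime.
    from odps.udf import annotate
except ImportError:
    # Local stand-in so the class can be tried outside MaxCompute.
    # It returns the class unchanged and does not validate the signature.
    def annotate(signature):
        def decorator(cls):
            return cls
        return decorator


# Hypothetical lookup table from IPv4 prefix to region label.
# The prefixes are documentation-only address ranges (RFC 5737).
_PREFIX_TO_REGION = {
    "192.0.2.": "region-a",
    "198.51.100.": "region-b",
    "203.0.113.": "region-c",
}


@annotate("string->string")
class GetRegion(object):
    """Map an IPv4 address string to a region label (hypothetical logic)."""

    def evaluate(self, ip):
        # A SQL NULL argument arrives as None; return NULL for it.
        if ip is None:
            return None
        # Check longer prefixes first so the most specific match wins.
        for prefix in sorted(_PREFIX_TO_REGION, key=len, reverse=True):
            if ip.startswith(prefix):
                return _PREFIX_TO_REGION[prefix]
        return "unknown"
```

If this file were uploaded as the Python resource getregion.py, the Class Name described in Register a function would be getregion.GetRegion, following the python_resource_name.actual_class_name rule.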

Limitations

In DataWorks, you can view and manage only UDFs that were uploaded through its visual interface. If a UDF was added to the MaxCompute compute engine with another tool, such as MaxCompute Studio, you must manually load it into DataWorks by using the MaxCompute Functions feature before you can view and manage it in DataWorks. For more information, see Manage MaxCompute functions.

Register a function

  1. Log on to the DataWorks console. After you switch to the target region, choose Data Development and O&M > Data Development in the left-side navigation pane. Select a workspace from the drop-down list and click Go to Data Development.

  2. Create a workflow. For more information, see Create a recurring workflow.

  3. Create a function.

    1. Open the workflow, right-click MaxCompute, and select Create Function.

    2. In the Create Function dialog box, enter a Name and select a Path.

    3. Click Create.

    4. In the Register Function dialog box, configure the following parameters.

      • Function Type: Select the function type. Valid values: Mathematical Operation Functions, Aggregate Functions, String Processing Functions, Date Functions, Window Functions, and Other Functions. For more information, see Use built-in functions.

      • MaxCompute Engine Instance: Read-only.

      • Function Name: The name of the UDF, which is used to reference the function in SQL statements. The name must be globally unique and cannot be modified after the function is registered.

      • Owner: The currently logged-in account is displayed by default. You can select a different account.

      • Class Name: The class name of the UDF, in the format resource_name.class_name, where the resource name is a Java package name or a Python resource name. When you create a UDF in DataWorks, you can use MaxCompute resources of the JAR or Python type, and the Class Name format depends on the resource type:

        • JAR resource: Use the format package_name.actual_class_name. You can obtain this value in IntelliJ IDEA with the Copy Reference command. For example, if the Java package name is com.aliyun.odps.examples.udf and the actual class name is UDAFExample, set Class Name to com.aliyun.odps.examples.udf.UDAFExample.

        • Python resource: Use the format python_resource_name.actual_class_name. For example, if the Python resource name is LcLognormDist_sh and the actual class name is LcLognormDist_sh, set Class Name to LcLognormDist_sh.LcLognormDist_sh.

        Note
        • Do not add the .jar or .py suffix to the resource name.

        • The resource must be committed and published before it can be used. For more information, see Create and use MaxCompute resources.

      • Resources: Select the resources required to register the function.

        • Visual mode: You can select only resources that have been uploaded or added to DataWorks.

        • Code editor: You can select any resource in the corresponding MaxCompute compute engine.

        Note
        • You do not need to enter the paths of the added resources.

        • If the UDF calls multiple resources, separate the resource names with commas (,).

      • Description: A brief description of the UDF's purpose.

      • Expression Syntax: An example of the syntax for calling the UDF, such as test.

      • Parameter Description: The supported input parameter types and the return value type.

      • Return Value: Optional. An example of the return value, such as 1.

      • Example: Optional. A usage example of the function.

  4. Click the Save icon in the toolbar to save the function.

  5. Commit the function.

    1. Click the Submit icon in the toolbar.

    2. In the Submission dialog box, enter a Change Description.

    3. Click OK.
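The Class Name and Resources rules from the Register Function dialog box can be checked before you fill in the fields. The following is a minimal Python sketch; the helper names `class_name_for` and `resources_field` are hypothetical and not part of DataWorks or MaxCompute. It encodes only the rules stated in this topic: a JAR class name is the Java package plus the class, a Python class name is the resource name without its .py suffix plus the class, and multiple resource names are separated with commas.

```python
def class_name_for(resource_type, qualifier, class_name):
    """Build the Class Name value (hypothetical helper).

    resource_type: "jar" or "python".
    qualifier: the Java package name (JAR) or the Python resource name (Python).
    class_name: the actual class name inside the package or resource.
    """
    kind = resource_type.strip().lower()
    if not qualifier or not class_name:
        raise ValueError("qualifier and class_name must be non-empty")
    if kind == "jar":
        # The prefix is the Java package, never the JAR file name.
        if qualifier.endswith(".jar"):
            raise ValueError("use the Java package name, not the JAR file name")
        prefix = qualifier
    elif kind == "python":
        # The Python resource name is used without its .py suffix.
        prefix = qualifier[: -len(".py")] if qualifier.endswith(".py") else qualifier
    else:
        raise ValueError(f"unsupported resource type: {resource_type!r}")
    if not prefix:
        raise ValueError("resource name is empty after removing the suffix")
    return f"{prefix}.{class_name}"


def resources_field(resource_names):
    """Join resource names with commas, as the Resources field expects."""
    names = [name.strip() for name in resource_names if name.strip()]
    if not names:
        raise ValueError("at least one resource is required")
    return ",".join(names)


if __name__ == "__main__":
    # The two examples given in the Class Name description.
    print(class_name_for("jar", "com.aliyun.odps.examples.udf", "UDAFExample"))
    print(class_name_for("python", "LcLognormDist_sh.py", "LcLognormDist_sh"))
```

Resource names in the Resources field are passed through unchanged; only the Class Name drops the suffix, because the note about .jar and .py suffixes applies to Class Name.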

For more information about how to view existing functions in the MaxCompute compute engine and their change history, see Manage MaxCompute functions.

View and roll back function versions

Right-click a function name and click View Earlier Versions to view its version history or roll it back.

The version history panel lists the commit time, change description, and submitter for each version. To restore a previous version, select it and click Roll Back.

Use a function in a node

You can call a UDF in a node directly by its name. To insert the function name into the current node's editor, find the target function, such as getregion, under the Function node in the project file tree on the left, right-click it, and select Reference Function. The function name is inserted into the SQL editor on the right. For example, use getregion(ip) AS region in a SELECT statement to call the UDF.
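Outside the DataWorks editor, the same UDF call can be submitted from Python with PyODPS. The sketch below rests on several assumptions: the table `ods_access_log`, its `ip` column, and the environment variable names are hypothetical, and the getregion UDF is assumed to be registered in the project the client connects to. Only `build_region_query` runs without a MaxCompute connection; `run_region_query` needs the `pyodps` package and valid credentials.

```python
import os


def build_region_query(table, ip_column="ip", limit=10):
    """Return a SELECT statement that calls the getregion UDF."""
    # Identifiers are interpolated directly; pass only trusted names.
    return (
        f"SELECT {ip_column}, getregion({ip_column}) AS region "
        f"FROM {table} LIMIT {int(limit)}"
    )


def run_region_query(table, ip_column="ip"):
    """Submit the query with PyODPS and print each (ip, region) pair."""
    from odps import ODPS  # requires the pyodps package

    # Hypothetical environment variable names for credentials and target.
    o = ODPS(
        os.environ["ALIBABA_CLOUD_ACCESS_KEY_ID"],
        os.environ["ALIBABA_CLOUD_ACCESS_KEY_SECRET"],
        project=os.environ["ODPS_PROJECT"],
        endpoint=os.environ["ODPS_ENDPOINT"],
    )
    instance = o.execute_sql(build_region_query(table, ip_column))
    with instance.open_reader() as reader:
        for record in reader:
            print(record[ip_column], record["region"])
```

For example, `run_region_query("ods_access_log")` would print up to ten IP addresses with the region label that the UDF returns for each.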

Appendix 1: View user-defined functions

  • Run the SHOW FUNCTIONS command to list all registered UDFs in the MaxCompute project bound to the DataStudio component of your DataWorks workspace.

  • MaxCompute provides many built-in functions. For more information, see Overview of built-in functions.

-- View the functions in the current project.
SHOW FUNCTIONS;

Appendix 2: View function details

  • Run the DESCRIBE or DESC command with a function name to view the details of a UDF.

    -- Use the short form to view the details of a UDF.
    DESC FUNCTION <function_name>;
  • If the existing functions in a workflow do not meet your requirements, write a MaxCompute UDF, then upload its resources, such as JAR or Python files, and associate them with the function to extend your data processing capabilities. For more information, see Manage MaxCompute resources.
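SHOW FUNCTIONS and DESC FUNCTION also have programmatic counterparts in PyODPS. The following sketch assumes the `pyodps` package, its `ODPS.list_functions` and `ODPS.get_function` methods, and the `name`, `class_type`, and `resources` attributes of the returned function objects; verify these against the PyODPS version you install. The environment variable names are hypothetical, and the helper `format_function_details` only formats the values it receives, so it can be tried without a connection.

```python
import os


def format_function_details(name, class_type, resource_names):
    """Format the name, class, and resources of a UDF as plain text lines."""
    return "\n".join([
        f"Name: {name}",
        f"Class: {class_type}",
        f"Resources: {','.join(resource_names)}",
    ])


def show_and_describe(function_name):
    """List UDFs in the project, then print details of one of them."""
    from odps import ODPS  # requires the pyodps package

    # Hypothetical environment variable names for credentials and target.
    o = ODPS(
        os.environ["ALIBABA_CLOUD_ACCESS_KEY_ID"],
        os.environ["ALIBABA_CLOUD_ACCESS_KEY_SECRET"],
        project=os.environ["ODPS_PROJECT"],
        endpoint=os.environ["ODPS_ENDPOINT"],
    )
    # Programmatic counterpart of SHOW FUNCTIONS.
    for func in o.list_functions():
        print(func.name)
    # Fetch one function's metadata, similar in purpose to DESC FUNCTION.
    func = o.get_function(function_name)
    resource_names = [res.name for res in func.resources]
    print(format_function_details(func.name, func.class_type, resource_names))
```

Keeping the formatting separate from the client calls lets the output layout be checked without credentials.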

Best practices

After you create a UDF, see Grant access to a specific UDF to a specified user to implement access control for the UDF.

FAQ

Q: After I upload a resource and define it as a UDF in DataWorks, can I use it in SQL queries in DataAnalysis in addition to ODPS SQL nodes in DataStudio?

A: Yes. UDFs registered in DataWorks are stored in the underlying MaxCompute project, making them available to both ODPS SQL nodes and SQL queries in DataAnalysis.