If MaxCompute's built-in functions do not meet your business requirements, you can create a user-defined function (UDF) to extend its functionality. This topic describes how to create a UDF in DataWorks using the visual interface.
Background
A user-defined function (UDF) extends the existing function library. It enables you to define custom logic for queries and enhance data processing. For more information, see UDF overview. In addition to creating UDFs using the visual interface in DataWorks, you can also use MaxCompute Studio or the command line. For more information, see Create a UDF in MaxCompute Studio and Create a UDF in MaxCompute using commands.
Prerequisites
Before creating a function, you must create and upload a MaxCompute resource to DataWorks. For more information, see Create a MaxCompute resource.
To prepare your resource file, see Develop a UDF (Java) or Develop a UDF (Python 3).
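As a rough illustration of what such a resource file can look like, the following is a minimal sketch of a Python 3 UDF. The file name normalize_code.py, the class name NormalizeCode, and the trimming logic are hypothetical and chosen only for this example. The fallback definition of annotate is an assumption added so the file can also be imported outside the MaxCompute runtime. Follow Develop a UDF (Python 3) for the authoritative requirements.

```python
# Minimal sketch of a Python 3 UDF resource file (hypothetical name: normalize_code.py).
try:
    # Provided by the MaxCompute Python UDF runtime.
    from odps.udf import annotate
except ImportError:
    # Assumption: a local stand-in so the file can be imported and tested
    # on a machine where the odps package is not installed.
    def annotate(signature):
        def decorator(cls):
            return cls
        return decorator


@annotate("string->string")  # one STRING argument, STRING return value
class NormalizeCode(object):
    """Hypothetical UDF: trims whitespace and upper-cases a code value."""

    def evaluate(self, value):
        # A SQL NULL arrives as None; pass it through unchanged.
        if value is None:
            return None
        return value.strip().upper()


if __name__ == "__main__":
    # Quick local check before uploading the file as a resource.
    print(NormalizeCode().evaluate("  cn-hangzhou "))
```

If this file were uploaded as a Python resource named normalize_code.py, the Class Name entered at registration would be normalize_code.NormalizeCode.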
Limitations
In DataWorks, you can only view and manage UDFs uploaded through its visual interface. If a UDF is added to the MaxCompute compute engine using another tool, such as MaxCompute Studio, you must manually load the UDF to DataWorks using the MaxCompute Functions feature. Then, you can view and manage the UDF in DataWorks. For more information, see Manage MaxCompute functions.
Register a function
- Log on to the DataWorks console. In the target region, click in the left-side navigation pane. Select a workspace from the drop-down list and click Go to Data Development.
- Create a workflow. For more information, see Create a recurring workflow.
- Create a function.
  - Open the workflow, right-click MaxCompute, and select Create Function.
  - In the Create Function dialog box, enter a Name and select a Path.
  - Click Create.
  - In the Register Function dialog box, configure the parameters.
    - Function Type: Select the function type. Valid values: Mathematical Operation Functions, Aggregate Functions, String Processing Functions, Date Functions, Window Functions, and Other Functions. For more information, see Use built-in functions.
    - MaxCompute Engine Instance: Read-only.
    - Function Name: The name of the UDF, which is used to reference the function in SQL statements. The name must be globally unique and cannot be modified after the function is registered.
    - Owner: By default, the currently logged-in account is displayed. You can also select a different account.
    - Class Name: The class name of the UDF, in the format resource_name.class_name. The resource name can be a Java package name or a Python resource name. When you create a UDF in DataWorks, you can use MaxCompute resources of the JAR or Python type. The class name configuration varies by resource type:
      - If the resource type is JAR, the Class Name is in the format package_name.actual_class_name. You can obtain this value in IntelliJ IDEA using the copy reference command. For example, if the Java package name is com.aliyun.odps.examples.udf and the actual class name is UDAFExample, set the Class Name parameter to com.aliyun.odps.examples.udf.UDAFExample.
      - If the resource type is Python, the Class Name is in the format python_resource_name.actual_class_name. For example, if the Python resource name is LcLognormDist_sh and the actual class name is LcLognormDist_sh, set the Class Name parameter to LcLognormDist_sh.LcLognormDist_sh.
      Note:
      - Do not add the .jar or .py suffix to the resource name.
      - The resource must be committed and published before it can be used. For more information, see Create and use MaxCompute resources.
    - Resources: Select the resources required to register the function.
      - Visual mode: You can select only resources that have been uploaded or added to DataWorks.
      - Code editor: You can select all resources in the corresponding MaxCompute compute engine.
      Note:
      - You do not need to enter the paths of the added resources.
      - If the UDF calls multiple resources, separate the resource names with a comma (,).
    - Description: A brief description of the UDF's purpose.
    - Expression Syntax: An example of how to use the UDF. For example, test.
    - Parameter Description: A description of the supported input parameter types and the return value type.
    - Return Value: An example of the return value, such as 1. This parameter is optional.
    - Example: A usage example for the function. This parameter is optional.
  - Click the Save icon in the toolbar to save the function.
- Commit the function.
  - Click the Submit icon in the toolbar.
  - In the Submission dialog box, enter a Change Description.
  - Click OK.
For more information about how to view existing functions in the MaxCompute compute engine and their change history, see Manage MaxCompute functions.
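The Class Name rules from the Register Function parameters can be sketched in a few lines of Python. This is only an illustration of the naming format described in this topic. The helper names python_class_name and jar_class_name are hypothetical and are not part of DataWorks or MaxCompute.

```python
def python_class_name(resource_name, class_name):
    """Build the Class Name for a Python resource.

    Format: python_resource_name.actual_class_name, where the resource
    name must not carry the .py suffix.
    """
    if resource_name.endswith(".py"):
        resource_name = resource_name[: -len(".py")]
    return "{}.{}".format(resource_name, class_name)


def jar_class_name(package_name, class_name):
    """Build the Class Name for a JAR resource.

    Format: package_name.actual_class_name. The JAR file name itself is
    not part of the value.
    """
    return "{}.{}".format(package_name, class_name)


if __name__ == "__main__":
    # Values taken from the examples in the parameter descriptions.
    print(python_class_name("LcLognormDist_sh.py", "LcLognormDist_sh"))
    print(jar_class_name("com.aliyun.odps.examples.udf", "UDAFExample"))
```

The two calls reproduce the example values given for the Class Name parameter: LcLognormDist_sh.LcLognormDist_sh for the Python resource and com.aliyun.odps.examples.udf.UDAFExample for the JAR resource.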
View and roll back function versions
Right-click a function name and click View Earlier Versions to view its version history or roll it back.
Use a function in a node
You can call a UDF directly by name in the code of a node. To insert the function name into the editor of the current node instead of typing it, you can select a resource and right-click Insert Function. Alternatively, in the project file tree on the left, expand the Function node, right-click the target function, such as getregion, and select Reference Function from the context menu. The function is inserted into the SQL editor on the right. For example, use getregion(ip) AS region in a SELECT statement to call the UDF.
Appendix 1: View user-defined functions
- Run the SHOW FUNCTIONS command to list all registered UDFs in the MaxCompute project bound to the DataStudio component of your DataWorks workspace.
- MaxCompute provides many built-in functions. For more information, see Overview of built-in functions.
-- View the functions in the current project.
SHOW FUNCTIONS;
Appendix 2: View function details
- Run the DESCRIBE or DESC command with a function name to view the details of a UDF.
-- Use the short form to view the details of a UDF.
DESC FUNCTION <function_name>;
- If existing functions in a workflow do not meet your requirements, write a MaxCompute UDF. Then, upload and associate its resources, such as JAR or Python files, to extend your data processing capabilities. For more information, see Manage MaxCompute resources.
Best practices
After you create a UDF, you can restrict which users are allowed to use it. For more information, see Grant access to a specific UDF to a specified user.
Related documents
- MaxCompute supports one-click packaging of JAR files, uploading of JAR resources, and registration of MaxCompute UDFs. For more information, see Package, upload, and register.
- For information about issues that might occur when you write MaxCompute UDFs in Java, see FAQ about MaxCompute UDFs written in Java.
- For information about issues that might occur when you write MaxCompute UDFs in Python, see FAQ about MaxCompute UDFs written in Python.
FAQ
Q: After I upload a resource and define it as a UDF in DataWorks, can I use it in SQL queries in DataAnalysis in addition to ODPS SQL nodes in DataStudio?
A: Yes. UDFs registered in DataWorks are stored in the underlying MaxCompute project, making them available to both ODPS SQL nodes and SQL queries in DataAnalysis.