DataStudio lets you manage resources for your MaxCompute projects. You can upload a file from your local machine or Object Storage Service (OSS) to create a resource, and then register it as a function for use in data development nodes. This topic describes how to create and manage different types of MaxCompute resources and functions.
Prerequisites
- You have bound a MaxCompute compute resource.
- You have prepared the required resource files. You can upload these files from your local machine or retrieve them from OSS. If you create resources from files stored in OSS, the following conditions must be met:
  - You have activated OSS, created a bucket, and stored the required resource files in the OSS bucket. For more information, see Create a bucket and Simple upload.
    Note: For information about supported resource files, see Resource description.
  - The Alibaba Cloud account used to upload the files must have permissions to access the target bucket. To prevent permission errors, grant the necessary permissions to the account before you upload the files.
Access resource management
- Go to the Workspaces page in the DataWorks console. In the top navigation bar, select the desired region. Find the desired workspace and choose in the Actions column.
- In the left-side navigation pane, click the Resource Management icon to go to the Resource Management page.
- On the Resource Management page, click the icon to create a resource or function. Alternatively, you can first create a directory to organize your resources by clicking Create Directory. Then, right-click the target directory, select Create, and choose the type of resource or function that you want to create.
Create and manage resources
Resource overview
Resources are the foundation for implementing user-defined function (UDF) or MapReduce features in MaxCompute. In DataStudio, you can use the visual interface to upload resources that are stored on your local machine or in OSS. MaxCompute reads and uses these resources when it runs UDFs and MapReduce jobs. The following MaxCompute resource types are supported.
Uploading resources to MaxCompute by using DataWorks incurs MaxCompute storage fees.
| Type | Description |
| --- | --- |
| Python | A Python script used to register a Python UDF. The file extension must be |
| JAR | A compiled Java Archive (JAR) package used to run a Java program. The file extension must be |
| Archive | A compressed file. Supported formats include |
| File | When you create a resource of the |
Limitations
The following limitations apply when you upload resources:
- Resource size:
  - Online editing: The maximum size is 10 MB for a Python resource and 500 KB for a File resource.
  - Upload from a local file: The maximum size for a single resource is 500 MB.
  - Upload from an OSS file: The maximum size for a single resource is 500 MB.
- Resource publishing: If you use a standard mode workspace, you must publish the resource to the production environment before you can use it.
  Note: The data source configuration may differ between the development environment and the production environment. Before you query tables or use resources, verify the data source configuration for the current environment.
- Resource management: In DataWorks, you can view and manage only the resources that are uploaded through the DataWorks UI.
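The size limits listed above can be checked before you start an upload. The following is a minimal sketch and not part of DataWorks: the function names, the method labels (`online`, `local`, `oss`), and the assumption that 1 MB equals 1024 × 1024 bytes are hypothetical choices made for illustration.

```python
# Hypothetical pre-upload check. The limits are the ones listed in this topic.
KB = 1024        # assumption: binary units
MB = 1024 * KB

# Online editing limits per resource type, in bytes.
ONLINE_EDIT_LIMITS = {"Python": 10 * MB, "File": 500 * KB}
# Limit for a single resource uploaded from a local file or from OSS.
UPLOAD_LIMIT = 500 * MB


def max_size_bytes(method: str, resource_type: str) -> int:
    """Return the size limit for a creation method: 'online', 'local', or 'oss'."""
    if method == "online":
        if resource_type not in ONLINE_EDIT_LIMITS:
            raise ValueError(f"no online-editing limit listed for {resource_type}")
        return ONLINE_EDIT_LIMITS[resource_type]
    if method in ("local", "oss"):
        return UPLOAD_LIMIT
    raise ValueError(f"unknown method: {method}")


def check_resource_size(method: str, resource_type: str, size_bytes: int) -> bool:
    """Return True if a resource of size_bytes is within the listed limit."""
    return size_bytes <= max_size_bytes(method, resource_type)
```

For a local file, the size could be obtained with `os.path.getsize(path)` and passed to `check_resource_size("local", "JAR", size)`.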
Create a resource
You can upload MaxCompute resources from your local machine or from OSS. After creating a resource, you can reference it directly in data development or create a function from it.
- On the Resource Management page, create a resource, which opens the Create Resource and Function dialog box. Configure the resource Type, storage Path, and resource Name.
- Upload a local or OSS file as the source. The following table describes the key parameters.

  | Parameter | Description |
  | --- | --- |
  | File Source | The source of the target file. Options include Local and OSS. |
  | File Content | If you select Local, click Click Upload in the File Content section to upload a local file. If you select OSS, select the target OSS file from the File Content drop-down list. |
  | Data Source | Select the data source to which the MaxCompute resource belongs. |

- In the top toolbar, click Save and then click Publish. Only published resources can be used in data development.
Use resources
After you create a resource, you can reference it when you edit a data development node. In the left-side navigation pane, click Resource Management. Find the target resource or function, right-click it, and select Insert Resource Path. This adds a reference statement in the format ##@resource_reference{"ResourceName"} to the node.
For example, a PyODPS 3 node displays the reference as ##@resource_reference{"example.py"}. The format of the reference varies by node type, so rely on the statement that is actually inserted in the editor.
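The reference format shown for a PyODPS 3 node can be generated and recognized with a few lines of Python. This is a minimal sketch for illustration only: the helper names `resource_reference` and `find_resource_references` are hypothetical, and the sketch covers only the ##@resource_reference{"ResourceName"} format, which may differ for other node types.

```python
import re

# Matches the ##@resource_reference{"ResourceName"} format described above.
REFERENCE_PATTERN = re.compile(r'##@resource_reference\{"([^"]+)"\}')


def resource_reference(resource_name: str) -> str:
    """Build a reference statement for the given resource name."""
    return f'##@resource_reference{{"{resource_name}"}}'


def find_resource_references(node_code: str) -> list:
    """Return the resource names referenced in the text of a node."""
    return REFERENCE_PATTERN.findall(node_code)


node_code = resource_reference("example.py") + "\nprint('node body')\n"
print(find_resource_references(node_code))
```

A check of this kind can help confirm that every resource a node references has been published before the node runs.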
In addition to using a resource directly, you can also create a function from the resource and then use the function in your development nodes.
Manage resources
In DataWorks, you can view and manage only the resources that are uploaded by using the DataWorks UI. On the Resource Management page, click a resource to perform management operations.
- View historical versions: You can view and compare published versions of a resource to track changes between versions.
  Note: To compare versions, you must select at least two versions.
- Delete a resource: This operation deletes the resource only from the development environment. To delete the resource from the production environment, you must publish the deletion task. After the task is published, the resource is deleted from the production environment. For more information, see Publish a task.
- View other resources: MaxCompute may contain resources uploaded by methods other than DataWorks. You can view these resources in the following ways:
  - Use the data catalog to view all resources in a MaxCompute project. After you add a MaxCompute project to the data catalog, you can browse to the project's folder in the catalog and view all of its resources in the resource directory.
  - Use a MaxCompute SQL node to view other resources in the MaxCompute project.
    - To view all resources in the current project, run the following command in a new MaxCompute SQL script. By default, the command accesses the MaxCompute compute resource that is bound to the development environment.
      list resources;
    - To view all resources in a specific project, run the following commands:
      use <maxcompute_project_name>;
      list resources;
    For more information about resource commands, see Resource operations.
Create and manage functions
Before creating a function, ensure you have created a resource.
When you prepare a MaxCompute resource file, you can refer to UDF Development (Java) and UDF Development (Python 3).
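As an illustration of what a Python resource file might contain, the following is a minimal sketch of a Python 3 UDF. The class name `StrLength` and its behavior are hypothetical. The `annotate` decorator comes from the `odps.udf` module that MaxCompute provides to Python UDFs; the fallback definition is an assumption added only so that the sketch also runs on a machine where that module is not installed.

```python
try:
    # Provided by MaxCompute at run time (and by a local PyODPS installation).
    from odps.udf import annotate
except ImportError:
    # Fallback for local experiments only: a decorator that does nothing.
    def annotate(signature):
        def decorator(cls):
            return cls
        return decorator


# The signature declares one STRING argument and a BIGINT return value.
@annotate("string->bigint")
class StrLength(object):
    """Hypothetical UDF that returns the length of a string."""

    def evaluate(self, value):
        # Pass NULL input through as NULL.
        if value is None:
            return None
        return len(value)
```

If this file were uploaded as a Python resource named str_length.py (a hypothetical name), the Class Name for the function would follow the PythonResourceName.ActualClassName format described later in this topic, that is, str_length.StrLength.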
Function overview
In DataStudio, you can use Resource and Function Management to register resources as functions. By using Create a function, you can convert uploaded and published resources into functions or create embedded functions in JAVA, PYTHON2, or PYTHON3. In data development nodes and SQL queries, you can then use these functions or directly use MaxCompute built-in functions.
Create a function
- On the Resource Management page, create a function, which opens the Create Resource and Function dialog box. Configure the function Type, storage Path, and function Name.
- Configure the settings for the function.
  Before you configure a MaxCompute function, make sure that the MaxCompute project is registered as a compute resource in DataWorks and that the required MaxCompute resources have been uploaded. The key parameters for a MaxCompute function are described below.
  - Function type: The type of the function. Valid values include MATH (mathematical), AGGREGATE (aggregate), STRING (string), DATE (date), ANALYTIC (window), and OTHER.
  - Class Name: The class name of the UDF, in resource_name.class_name format. The resource name can be a Java package name or a Python resource name. DataWorks supports creating UDFs from JAR and Python resources. The format of the class name varies based on the resource type:
    - If the resource is a JAR file, the Class Name format is JavaPackageName.ActualClassName. You can get this value from IntelliJ IDEA by using the Copy Reference action. For example, if the Java package name is com.aliyun.odps.examples.udf and the actual class name is UDAFExample, set the Class Name parameter to com.aliyun.odps.examples.udf.UDAFExample.
    - If the resource is a Python file, the Class Name format is PythonResourceName.ActualClassName. For example, if the Python resource name is LcLognormDist_sh and the actual class name is LcLognormDist_sh, set the Class Name parameter to LcLognormDist_sh.LcLognormDist_sh.
    Note:
    - Do not include the .jar or .py file extension in the resource name.
    - You must submit and publish the resource before you can use it.
  - Type: You can select Resource function or Embedded function:
    - If you select Resource function, you only need to configure the Resource List.
    - If you select Embedded function, you must also specify the Language (JAVA, PYTHON2, and PYTHON3 are supported) and provide the Code in addition to the Resource List.
  - Resource List: Select the resources required to register the function.
    - Visual mode: You can select only from resources that are already uploaded to or created in DataWorks.
    - Script mode: You can enter any resource that exists in the data source. If the UDF uses multiple resources, separate the resource names with a comma (,).
    Note:
    - You do not need to enter the full path for the added resources.
    - You can use script mode to manually specify resources that cannot be uploaded through the visual interface, such as table resources or those managed outside of DataWorks.
  - Command Format: A usage example for the UDF.
- In the top toolbar, click Save and then click Publish. Only a published function can be used in data development.
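The Class Name and Resource List rules above can be expressed as small helpers. This is a minimal sketch under stated assumptions: the helper names are hypothetical, nothing here calls DataWorks or MaxCompute, and the code only assembles the strings that you would enter in the dialog box.

```python
import os


def class_name_for_python(resource_file: str, class_name: str) -> str:
    """Build PythonResourceName.ActualClassName, dropping the .py extension."""
    base, ext = os.path.splitext(resource_file)
    resource_name = base if ext == ".py" else resource_file
    return f"{resource_name}.{class_name}"


def class_name_for_jar(java_package: str, class_name: str) -> str:
    """Build JavaPackageName.ActualClassName for a JAR resource."""
    return f"{java_package}.{class_name}"


def resource_list(resources) -> str:
    """Join resource names with commas, as script mode expects."""
    return ",".join(name.strip() for name in resources)


print(class_name_for_python("LcLognormDist_sh.py", "LcLognormDist_sh"))
print(class_name_for_jar("com.aliyun.odps.examples.udf", "UDAFExample"))
```

The two printed values match the examples given for the Class Name parameter: LcLognormDist_sh.LcLognormDist_sh and com.aliyun.odps.examples.udf.UDAFExample.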
Use functions
Use user-defined functions
After creating and publishing a function, you can reference it directly in data development or in a SQL query.
- When you edit a data development node, click Resource Management in the left-side navigation pane. Find the target function, right-click it, and select Insert Function. The function name, such as example_function(), is automatically inserted into the editor.
- When you edit a SQL query, you can use the created function directly in your SQL statement.
  SELECT example_function(column_name) FROM table_name;
Use built-in functions
DataWorks supports two types of functions: user-defined functions and MaxCompute built-in functions. You can view built-in functions by category or in alphabetical order.
- Notes: For information about using built-in functions, see Notes.
- Limitations: For information about the limits on built-in functions, see JSON function limits and Built-in function overview.
- You can view built-in functions in one of the following three ways:
  - Run the following command in a MaxCompute SQL node:
    show builtin functions [<function_name>]; --<function_name> specifies the name of a built-in function.
    Note:
    - <function_name> is a placeholder. Replace it with the name of a specific built-in function.
    - If you run the show builtin functions; command by using the MaxCompute client (odpscmd), the client version must be 0.43.0 or later.
For typical use cases of built-in functions, see the following topics:
-
To quickly troubleshoot issues with built-in functions, see the following topics:
Manage functions
On the Resource Management page, click a function to perform management operations.
- View historical versions: On the right side of the function editor, click the versions icon. You can view and compare saved or submitted versions of a function to track changes.
  Note: To compare versions, you must select at least two versions.
- Delete a function: Right-click the target function and select Delete. To delete the function from the production environment, you must publish the deletion. The function is deleted from the production environment after the publication is complete. For more information, see Publish a task.
View user-defined functions
-- View the functions in the MaxCompute project that is bound to the DataWorks workspace.
SHOW FUNCTIONS;
View user-defined function details
- Use the DESCRIBE command or its abbreviation DESC, followed by the function name, to view the details of a UDF.
  -- Use the abbreviated form to view the details of a user-defined function.
  DESC FUNCTION <function_name>;
- If existing functions in DataWorks do not meet your business requirements, you can create a MaxCompute UDF. Then, you can upload and associate resources, such as JAR packages or Python files, to extend your data processing capabilities.
FAQ
Q: I uploaded a resource by using DataWorks and defined it as a UDF. Can I use this UDF in DataAnalysis SQL queries in addition to using it in MaxCompute SQL nodes in data development?
A: Yes. UDFs registered in DataWorks are stored in the underlying MaxCompute project. Therefore, you can use them in both MaxCompute SQL nodes and the SQL Query (legacy version) feature in DataAnalysis.