MaxCompute resources and functions

DataStudio lets you manage resources for your MaxCompute projects. You can upload a file from your local machine or Object Storage Service (OSS) to create a resource, and then register it as a function for use in data development nodes. This topic describes how to create and manage different types of MaxCompute resources and functions.

Prerequisites

  • You have bound a MaxCompute compute resource to your DataWorks workspace.

  • You have prepared the required resource files. You can upload these files from your local machine or retrieve them from OSS. If you upload files from OSS to create resources, the following conditions must be met:

Access resource management

  1. Go to the Workspaces page in the DataWorks console. In the top navigation bar, select the desired region. Find the desired workspace and choose Shortcuts > Data Studio in the Actions column.

  2. In the left-side navigation pane, click the Resource Management icon to go to the Resource Management page.

  3. On the Resource Management page, click the create icon to create a resource or function. Alternatively, to organize your resources, first click Create Directory to create a directory. Then, right-click the directory, select Create, and choose the type of resource or function that you want to create.

Create and manage resources

Resource overview

Resources are the foundation for user-defined functions (UDFs) and MapReduce jobs in MaxCompute. In DataStudio, you can use the visual interface to upload resources that are stored on your local machine or in OSS. MaxCompute reads these resources when it runs UDFs and MapReduce jobs. The following MaxCompute resource types are supported.

Important

Uploading resources to MaxCompute by using DataWorks incurs MaxCompute storage fees.

  • Python: A Python script used to register a Python UDF. The file extension must be .py.

  • JAR: A compiled Java Archive (JAR) package used to run a Java program. The file extension must be .jar.

  • Archive: A compressed file. Supported formats include .zip, .tgz, .tar.gz, .tar, and .jar. The compression type is identified by the file extension of the resource name.

  • File: Any file can be uploaded as a File resource. The file types that are actually supported depend on the engine.
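The extension rules in the resource type descriptions above can be captured in a small pre-upload check. This is a hypothetical helper, not a DataWorks API: the names ALLOWED_EXTENSIONS, extension_ok, and archive_format are illustrative, and the check is a simple case-sensitive suffix match, which may be stricter or looser than the validation DataWorks actually performs.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup table transcribed from the resource type descriptions.
# None means "any file"; actual support for File resources depends on the engine.
ALLOWED_EXTENSIONS: Dict[str, Optional[Tuple[str, ...]]] = {
    "Python": (".py",),
    "JAR": (".jar",),
    "Archive": (".zip", ".tgz", ".tar.gz", ".tar", ".jar"),
    "File": None,
}


def extension_ok(resource_type: str, file_name: str) -> bool:
    """Return True if file_name ends with an extension allowed for resource_type."""
    if resource_type not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unknown resource type: {resource_type!r}")
    allowed = ALLOWED_EXTENSIONS[resource_type]
    if allowed is None:
        return True
    # str.endswith accepts a tuple and matches any of the listed suffixes.
    return file_name.endswith(allowed)


def archive_format(file_name: str) -> Optional[str]:
    """Return the archive extension that identifies the compression type, if any."""
    for suffix in ALLOWED_EXTENSIONS["Archive"] or ():
        if file_name.endswith(suffix):
            return suffix
    return None


if __name__ == "__main__":
    print(extension_ok("Python", "example.py"))  # True
    print(extension_ok("JAR", "example.py"))     # False
    print(archive_format("dictionary.tar.gz"))   # .tar.gz
```

Because .jar appears under both JAR and Archive, the helper checks a file against the type you choose rather than guessing the type from the extension.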

Limitations

The following limitations apply when you upload resources:

  • Resource size:

    • Online editing: The maximum size is 10 MB for a Python resource and 500 KB for a File resource.

    • Upload from a local file: The maximum size for a single resource is 500 MB.

    • Upload from an OSS file: The maximum size for a single resource is 500 MB.

  • Resource publishing: If you use a standard mode workspace, you must publish the resource to the production environment before you can use it.

    Note

    The data source configuration may differ between the development environment and the production environment. Before you query tables or use resources, verify the data source configuration for the current environment.

  • Resource management: In DataWorks, you can view and manage only the resources that are uploaded through the DataWorks UI.
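The size limits above can be checked before you upload a file. The sketch below is hypothetical: the function name and the source labels "online", "local", and "oss" are illustrative, and it assumes binary units (1 KB = 1,024 bytes, 1 MB = 1,024 KB), which the limits above do not specify.

```python
KB = 1024
MB = 1024 * KB

# Limits from the Limitations list; binary units are an assumption.
ONLINE_EDIT_LIMITS = {"Python": 10 * MB, "File": 500 * KB}
UPLOAD_LIMIT = 500 * MB  # applies to both local and OSS uploads


def within_size_limit(size_bytes: int, source: str, resource_type: str = "File") -> bool:
    """Return True if a resource of size_bytes fits the limit for its source."""
    if source == "online":
        if resource_type not in ONLINE_EDIT_LIMITS:
            raise ValueError(f"no online editing limit listed for {resource_type!r}")
        limit = ONLINE_EDIT_LIMITS[resource_type]
    elif source in ("local", "oss"):
        limit = UPLOAD_LIMIT
    else:
        raise ValueError(f"unknown source: {source!r}")
    return size_bytes <= limit


if __name__ == "__main__":
    print(within_size_limit(400 * KB, "online", "File"))  # True
    print(within_size_limit(600 * KB, "online", "File"))  # False
    print(within_size_limit(600 * MB, "oss"))             # False
```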

Create a resource

You can upload MaxCompute resources from your local machine or from OSS. After creating a resource, you can reference it directly in data development or create a function from it.

  1. On the Resource Management page, start creating a resource to open the Create Resource and Function dialog box. Configure the resource Type, storage Path, and resource Name.

  2. Upload a local or OSS file as the source. The key parameters are described as follows:

    • File Source: The source of the file. Valid values: Local and OSS.

    • File Content:

      • If you select Local, click Click Upload in the File Content section to upload a local file.

      • If you select OSS, select the OSS file from the File Content drop-down list.

    • Data Source: The data source to which the MaxCompute resource belongs.

  3. In the top toolbar, click Save and then Publish. Only published resources can be used in data development.

Use resources

After you create a resource, you can reference it when you edit a data development node. In the left-side navigation pane, click Resource Management. Find the target resource or function, right-click it, and select Insert Resource Path. This adds a reference statement in the format ##@resource_reference{"ResourceName"} to the node.

Note

For example, in a PyODPS 3 node, the reference appears as ##@resource_reference{"example.py"}. The format of the reference varies by node type; the statement that the editor actually inserts takes precedence.
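If you generate or audit node code outside the editor, the reference line can be built and detected as plain text. The helpers below are hypothetical and handle only the ##@resource_reference{"..."} form shown above; because the reference format varies by node type, other node types may need a different pattern.

```python
import re
from typing import List

# Matches a whole line of the form ##@resource_reference{"name"}.
_REFERENCE = re.compile(r'##@resource_reference\{"([^"]+)"\}')


def resource_reference(resource_name: str) -> str:
    """Build a reference line in the format shown for a PyODPS 3 node."""
    # Doubled braces in the f-string produce literal { and } characters.
    return f'##@resource_reference{{"{resource_name}"}}'


def referenced_resources(node_code: str) -> List[str]:
    """Return the resource names referenced in node_code, in order of appearance."""
    names = []
    for line in node_code.splitlines():
        match = _REFERENCE.fullmatch(line.strip())
        if match:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    code = resource_reference("example.py") + "\nprint('hello')\n"
    print(code.splitlines()[0])        # ##@resource_reference{"example.py"}
    print(referenced_resources(code))  # ['example.py']
```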

In addition to using a resource directly, you can also create a function from the resource and then use the function in your development nodes.

Manage resources

In DataWorks, you can view and manage only the resources that are uploaded by using the DataWorks UI. On the Resource Management page, click a resource to perform management operations.

  • View historical versions: You can view and compare published versions of a resource to track changes between versions.

    Note

    To compare versions, you must select at least two versions.

  • Delete a resource: This operation deletes the resource only from the development environment. To delete the resource from the production environment, you must publish the deletion task. After the task is published, the resource is deleted from the production environment. For more information, see Publish a task.

  • View other resources.

    A MaxCompute project may also contain resources that were uploaded without using DataWorks. You can view these resources in the following ways:

    • Use the data catalog to view all resources in a MaxCompute project.

      After you add a MaxCompute project to the data catalog, you can browse to the project's folder in the catalog and view all of its resources in the resource directory.

    • Use a MaxCompute SQL node to view other resources in the MaxCompute project.

      • To view all resources in the current project, run the following command in a new MaxCompute SQL script. By default, the command accesses the MaxCompute compute resource that is bound to the development environment.

        list resources;
      • To view all resources in a specific project, run the following commands:

        use <maxcompute_project_name>;
        list resources;

      For more information about resource commands, see Resource operations.

Create and manage functions

Before you create a function, make sure that the resource it uses has been created and published.

Note

When you prepare a MaxCompute resource file, you can refer to UDF Development (Java) and UDF Development (Python 3).
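For orientation, a Python 3 UDF resource is a .py file that defines a class with an evaluate method. The sketch below follows the pattern in the MaxCompute Python UDF documentation, which uses odps.udf.annotate to declare the input and output types; the file name my_plus.py and the class name MyPlus are hypothetical. If the file were uploaded as the Python resource my_plus.py, its Class Name would be my_plus.MyPlus under the Class Name rules described in Create a function. The ImportError fallback is only a local stand-in so that the class can be exercised outside MaxCompute.

```python
try:
    from odps.udf import annotate  # available in the MaxCompute Python UDF runtime
except ImportError:
    # Local stand-in only: records the signature so the class can be
    # exercised outside MaxCompute. It is not part of any SDK.
    def annotate(signature):
        def wrap(cls):
            cls._signature = signature
            return cls
        return wrap


@annotate("bigint,bigint->bigint")
class MyPlus(object):
    def evaluate(self, arg0, arg1):
        # SQL NULL arrives as None; return NULL if either argument is NULL.
        if None in (arg0, arg1):
            return None
        return arg0 + arg1


if __name__ == "__main__":
    print(MyPlus().evaluate(1, 2))     # 3
    print(MyPlus().evaluate(None, 2))  # None
```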

Function overview

In DataStudio, you can register resources as functions in Resource Management. For use in data development or SQL queries, you can create a function from a resource that has been uploaded and published, create an embedded function written in JAVA, PYTHON2, or PYTHON3, or use MaxCompute built-in functions directly.
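Functions registered with the AGGREGATE function type are user-defined aggregate functions (UDAFs). In Python 3, the pattern documented for MaxCompute is a class derived from odps.udf.BaseUDAF that implements new_buffer, iterate, merge, and terminate. The sketch below computes an average; the class name Average is hypothetical, returning NULL for empty input is a design choice made here, and the ImportError fallback is a local stand-in only.

```python
try:
    from odps.udf import annotate, BaseUDAF  # MaxCompute Python UDF runtime
except ImportError:
    # Local stand-ins only, so the class can be exercised outside MaxCompute.
    def annotate(signature):
        def wrap(cls):
            cls._signature = signature
            return cls
        return wrap

    class BaseUDAF(object):
        pass


@annotate("double->double")
class Average(BaseUDAF):
    def new_buffer(self):
        # [running sum, count of non-NULL values]
        return [0.0, 0]

    def iterate(self, buffer, number):
        # Called once per input row; NULL inputs (None) are skipped.
        if number is not None:
            buffer[0] += number
            buffer[1] += 1

    def merge(self, buffer, pbuffer):
        # Combine a partial buffer produced elsewhere into this buffer.
        buffer[0] += pbuffer[0]
        buffer[1] += pbuffer[1]

    def terminate(self, buffer):
        # Return NULL when there were no non-NULL inputs.
        if buffer[1] == 0:
            return None
        return buffer[0] / buffer[1]


if __name__ == "__main__":
    udaf = Average()
    left, right = udaf.new_buffer(), udaf.new_buffer()
    for value in (1.0, 2.0, None):
        udaf.iterate(left, value)
    udaf.iterate(right, 6.0)
    udaf.merge(left, right)
    print(udaf.terminate(left))  # 3.0
```

The main block only simulates two partial buffers being merged; how MaxCompute actually splits and merges the work is decided by the engine.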

Create a function

  1. On the Resource Management page, start creating a function to open the Create Resource and Function dialog box. Configure the function Type, storage Path, and function Name.

  2. Configure the settings for the function.

    Before you configure a MaxCompute function, make sure that the MaxCompute project is registered as a compute resource in DataWorks and that the required MaxCompute resources have been uploaded. The key parameters for a MaxCompute function are described as follows:

    • Function type: The type of the function. Valid values: MATH (mathematical), AGGREGATE (aggregate), STRING (string), DATE (date), ANALYTIC (window), and OTHER.

    • Class Name: The class name of the UDF. DataWorks supports creating UDFs from JAR and Python resources, and the format of the class name depends on the resource type:

      • JAR resource: The format is JavaPackageName.ActualClassName. You can get this value in IntelliJ IDEA by using the Copy Reference action.

        For example, if the Java package name is com.aliyun.odps.examples.udf and the actual class name is UDAFExample, set Class Name to com.aliyun.odps.examples.udf.UDAFExample.

      • Python resource: The format is PythonResourceName.ActualClassName.

        For example, if the Python resource name is LcLognormDist_sh and the actual class name is LcLognormDist_sh, set Class Name to LcLognormDist_sh.LcLognormDist_sh.

      Note
      • Do not include the .jar or .py file extension in the resource name.

      • You must submit and publish the resource before you can use it.

    • Type: Select Resource function or Embedded function.

      • If you select Resource function, you need to configure only the Resource List.

      • If you select Embedded function, you must also specify the Language (JAVA, PYTHON2, or PYTHON3) and provide the Code, in addition to the Resource List.

    • Resource List: The resources required to register the function.

      • Visual mode: You can select only from resources that are already uploaded to or created in DataWorks.

      • Script mode: You can enter any resource that exists in the data source. If the UDF uses multiple resources, separate the resource names with commas (,).

      Note
      • You do not need to enter the full path of the added resources.

      • You can use script mode to manually specify resources that cannot be uploaded through the visual interface, such as table resources or resources managed outside of DataWorks.

    • Command Format: A usage example for the UDF.

  3. In the top toolbar, click Save and then Publish. Only published functions can be used in data development.
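The Class Name and Resource List rules described in step 2 can be expressed as two small string helpers, for example in a script that prepares function definitions. Both functions are hypothetical and only restate those rules: for a JAR resource the class name is the Java package name plus the class name, for a Python resource it is the resource name without .py plus the class name, and in script mode multiple resources are separated by commas. The JAR file name udf_examples.jar and the resource names in the example are illustrative.

```python
from typing import Iterable, Optional


def udf_class_name(resource_name: str, class_name: str,
                   java_package: Optional[str] = None) -> str:
    """Build the Class Name value for a UDF registered from a JAR or Python resource."""
    if resource_name.endswith(".jar"):
        # JAR: JavaPackageName.ActualClassName (the JAR file name is not part of it).
        if not java_package:
            raise ValueError("a JAR-based UDF needs its Java package name")
        return f"{java_package}.{class_name}"
    if resource_name.endswith(".py"):
        # Python: PythonResourceName.ActualClassName, without the .py extension.
        return f"{resource_name[:-len('.py')]}.{class_name}"
    raise ValueError("UDFs can be registered only from JAR or Python resources")


def resource_list(resource_names: Iterable[str]) -> str:
    """Join resource names for the Resource List field in script mode."""
    names = [name.strip() for name in resource_names]
    if any("," in name or not name for name in names):
        raise ValueError("resource names must be non-empty and must not contain commas")
    return ",".join(names)


if __name__ == "__main__":
    print(udf_class_name("LcLognormDist_sh.py", "LcLognormDist_sh"))
    # LcLognormDist_sh.LcLognormDist_sh
    print(udf_class_name("udf_examples.jar", "UDAFExample",
                         java_package="com.aliyun.odps.examples.udf"))
    # com.aliyun.odps.examples.udf.UDAFExample
    print(resource_list(["my_plus.py", "lookup_table.txt"]))
    # my_plus.py,lookup_table.txt
```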

Use functions

Use user-defined functions

After creating and publishing a function, you can reference it directly in data development or in a SQL query.

  • When you edit a data development node, click Resource Management in the left-side navigation pane. Find the target function, right-click it, and select Insert Function.

    The function name, such as example_function(), is automatically inserted into the editor.

  • When you edit a SQL query, you can use the created function directly in your SQL statement.

SELECT example_function(column_name) FROM table_name;

Use built-in functions

DataWorks supports two types of functions: user-defined functions and MaxCompute built-in functions. You can view built-in functions by category or in alphabetical order.

Manage functions

On the Resource Management page, click a function to perform management operations.

  • View historical versions: On the right side of the function editor, click the versions icon. You can view and compare saved or submitted versions of a function to track changes.

    Note

    To compare versions, you must select at least two versions.

  • Delete a function: Right-click the target function and select Delete.

    To delete the function from the production environment, you must publish the deletion. The function is deleted from the production environment after the publication is complete. For more information, see Publish a task.

View user-defined functions

-- View the functions in the MaxCompute project that is bound to the DataWorks workspace.
SHOW FUNCTIONS;

View user-defined function details

  • Use the DESCRIBE command or its abbreviation DESC, followed by the function name, to view the details of a UDF.

    -- Use the abbreviated form to view the details of a user-defined function.
    DESC FUNCTION <function_name>;
  • If the existing functions in DataWorks do not meet your business requirements, you can upload resources such as JAR packages or Python files and register them as MaxCompute UDFs to extend your data processing capabilities.

FAQ

Q: I uploaded a resource by using DataWorks and defined it as a UDF. Can I use this UDF in DataAnalysis SQL queries in addition to using it in MaxCompute SQL nodes in data development?

A: Yes. UDFs registered in DataWorks are stored in the underlying MaxCompute project. Therefore, you can use them in both MaxCompute SQL nodes and the SQL Query (legacy version) feature in DataAnalysis.