Create and use MaxCompute resources


If your code or function needs to use a MaxCompute resource, you must first create the resource in the target workspace or upload it to that workspace. You can then use the resource in tasks within the workspace. You can upload a resource by using MaxCompute SQL commands or the visual tools in DataWorks. This topic describes how to use the visual tools in DataWorks to create a resource, use the resource in a node, and register a function based on the resource.

Overview

You can use MaxCompute resources to implement user-defined functions (UDFs) or MapReduce jobs. DataWorks provides a visual interface for you to upload resource packages that you developed locally or stored in OSS. You can also create resources directly in DataWorks. These resources can then be accessed when UDFs and MapReduce jobs run. The following list describes the resource types that you can create in DataWorks, together with the creation methods that each type supports.

  • Python: Stores Python code that is used to register Python UDFs. The file extension must be .py. Creation method: create the resource in the online editor.

  • JAR: A compiled Java Archive (JAR) package that is used to run Java programs. The file extension must be .jar. Creation methods: upload a local resource or upload an OSS resource.

  • Archive: A compressed file, such as a .zip, .tgz, .tar.gz, or .tar file. The file extension identifies the compression type. Creation methods: upload a local resource or upload an OSS resource.

  • File: A .zip, .so, or .jar file. Creation methods: upload a local resource, upload an OSS resource, or create the resource in the online editor.
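The extension rules listed above can be expressed as a small lookup. The following Python sketch uses hypothetical names (`RESOURCE_EXTENSIONS`, `matching_resource_types`) that are not part of DataWorks or PyODPS. It reports which resource types accept a given file name for upload and covers only the extensions listed above, not file resources created in the online editor.

```python
# Hypothetical helper: extension rules transcribed from the resource type list
# in this topic. This is not a DataWorks or PyODPS API.
RESOURCE_EXTENSIONS = {
    "Python": (".py",),
    "JAR": (".jar",),
    "Archive": (".zip", ".tgz", ".tar.gz", ".tar"),
    "File": (".zip", ".so", ".jar"),
}


def matching_resource_types(file_name):
    """Return the resource types whose listed extensions match file_name."""
    name = file_name.lower()
    # str.endswith accepts a tuple and returns True if any suffix matches.
    return [
        resource_type
        for resource_type, extensions in RESOURCE_EXTENSIONS.items()
        if name.endswith(extensions)
    ]


print(matching_resource_types("udf.jar"))  # ['JAR', 'File']
print(matching_resource_types("libs.tar.gz"))  # ['Archive']
```

A .jar or .zip file matches two types, so the choice depends on how the resource is used, for example a JAR resource for running Java programs.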

For more information about how to manage resources, see Manage resources, Manage resources by using commands, and Add external resources to DataWorks for management.

Limits

  • Resource size

    • Online editing: The maximum size is 10 MB for a Python resource and 500 KB for a file resource.

    • Upload a local file: The maximum size of a resource that you can upload is 500 MB.

    • Upload an OSS file: The maximum size of a resource that you can upload is 500 MB.

  • Resource deployment

    If your workspace is in standard mode, you must deploy the resource to the production environment for it to take effect.

    Note

    The development environment and production environment use different compute engines. Before you query tables or resources, confirm which compute engine each environment uses. To learn how to view the MaxCompute compute engine for each environment, see Data Studio (legacy version): Associate a MaxCompute compute engine.

  • Resource management

    In DataWorks, you can view and manage only the resources that you upload by using the visual interface. If you add a resource to MaxCompute by using other tools, such as MaxCompute Studio, you must manually load the resource into DataWorks by using the MaxCompute Resources feature. You can then view and manage the resource in DataWorks. For more information, see Manage MaxCompute resources.
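The resource size limits listed above can be checked locally before you start an upload. The following Python sketch is a hypothetical helper; the method labels `"online"`, `"local"`, and `"oss"` are illustrative names, and treating KB and MB as binary units (1 KB = 1,024 bytes) is an assumption because this topic does not specify the convention.

```python
# Hypothetical helper: limits transcribed from the Limits section of this topic.
# Assumption: KB and MB are binary units; the topic does not state the convention.
KB = 1024
MB = 1024 * KB


def max_size_bytes(method, resource_type):
    """Return the documented size limit for a creation method and resource type."""
    if method == "online":
        if resource_type == "Python":
            return 10 * MB
        if resource_type == "File":
            return 500 * KB
        raise ValueError("Only Python and File resources can be created in the online editor")
    if method in ("local", "oss"):
        return 500 * MB
    raise ValueError(f"Unknown creation method: {method!r}")


def fits_limit(method, resource_type, size_bytes):
    """Return True if a resource of size_bytes is within the documented limit."""
    return size_bytes <= max_size_bytes(method, resource_type)


print(fits_limit("online", "File", 600 * KB))  # False: upload a local or OSS file instead
print(fits_limit("local", "JAR", 200 * MB))  # True
```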

Billing

DataWorks does not charge you for creating or uploading resources. However, MaxCompute charges for resource storage. For more information, see Storage pricing.

Prerequisites

  • You have associated a MaxCompute compute engine for development tasks.

  • (Optional) To create a resource by uploading a file from OSS, complete the following preparations:

    • You have activated OSS, created a bucket, and uploaded the file to the bucket. When you upload from OSS, you must select a file from a specific bucket. For more information, see Create a bucket and Upload objects.

    • The Alibaba Cloud account that you use to upload the file has the required permissions to access the target bucket. To avoid permission errors, grant the permissions to the account in advance. For more information, see Access control overview.

Go to the resource creation page

  1. Go to the Data Studio page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and O&M > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.

  2. On the Data Studio page, right-click the target workflow, select Create Resource, and then select a resource type from the MaxCompute directory.

    Note

    If no workflow is available, see Create a workflow to create one.

Step 1: Create or upload a resource

DataWorks lets you upload resource packages that you developed locally or stored in OSS. For example, if you developed a UDF locally, you must package it and upload it to DataWorks before you can register the function. You can also create Python resources of up to 10 MB and file resources of up to 500 KB directly in DataWorks.

Note

When you create or upload a resource by using the visual interface in DataWorks, take note of the following:

  • If the resource has not been uploaded to MaxCompute, select Upload to MaxCompute. If the resource already exists in MaxCompute, deselect Upload to MaxCompute. Otherwise, the upload will fail.

  • If you select Upload to MaxCompute during the upload, the resource is stored in both DataWorks and MaxCompute. If you later delete the resource from MaxCompute by using a command, the resource in DataWorks remains and is displayed as normal.

  • The resource name does not need to be the same as the name of the uploaded file.
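The Upload to MaxCompute rule above amounts to "create the resource only if MaxCompute does not already have one with that name." The following sketch expresses that rule with PyODPS, assuming an `ODPS` entry object such as the built-in `o` in a DataWorks PyODPS node. `exist_resource` and `create_resource` are PyODPS methods; `upload_if_absent` and `FakeODPS` are hypothetical names, and `FakeODPS` only lets the logic run without credentials.

```python
import io


def upload_if_absent(odps_entry, name, resource_type, file_obj):
    """Create a MaxCompute resource only if no resource with this name exists.

    odps_entry: a PyODPS ODPS object, or any object with the same two methods.
    resource_type: a PyODPS type string such as "py", "jar", "archive", or "file".
    Returns True if the resource was created, False if it already existed.
    """
    if odps_entry.exist_resource(name):
        # Same situation as deselecting Upload to MaxCompute in DataWorks.
        return False
    odps_entry.create_resource(name, resource_type, file_obj=file_obj)
    return True


class FakeODPS:
    """In-memory stand-in for a PyODPS ODPS object (demonstration only)."""

    def __init__(self, existing=()):
        self.resources = {name: None for name in existing}

    def exist_resource(self, name):
        return name in self.resources

    def create_resource(self, name, resource_type, file_obj=None):
        self.resources[name] = (resource_type, file_obj)


fake = FakeODPS(existing=["udf.jar"])
print(upload_if_absent(fake, "udf.jar", "jar", io.BytesIO(b"...")))  # False
print(upload_if_absent(fake, "lower.py", "py", io.BytesIO(b"x = 1")))  # True
```

A resource created this way exists only in MaxCompute. To view and manage it in DataWorks, load it by using the MaxCompute Resources feature described under Limits.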

Online editor

The following figure shows the configuration for creating a resource by using the online editor in DataWorks. You must configure the resource based on your business requirements.

Note
  • For Python resources larger than 10 MB or file resources larger than 500 KB, use the Local resource or OSS resource method instead.

  • For a tutorial on how to create a Python resource and register a function in DataWorks, see Analyze IP address origins by using a MaxCompute UDF.

Figure: Create a resource in the online editor
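For reference, the following is a minimal example of what a Python resource might contain. It assumes a hypothetical resource named `lower.py` that defines a Python UDF class `Lower`. `annotate` from `odps.udf` is provided by the MaxCompute Python UDF runtime; the `ImportError` fallback is an addition so that the class can be tried locally without PyODPS, and the fallback does not check the signature.

```python
# lower.py: hypothetical content of a Python resource created in the online editor.
try:
    from odps.udf import annotate  # provided by the MaxCompute Python UDF runtime
except ImportError:
    # Local fallback only: returns the class unchanged and checks nothing.
    def annotate(signature):
        def wrap(cls):
            return cls
        return wrap


@annotate("string->string")
class Lower(object):
    """Return the lowercase form of a string. NULL input stays NULL."""

    def evaluate(self, arg0):
        return arg0.lower() if arg0 is not None else None
```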

Local resource

The following figure shows the configuration for uploading a local resource by using the visual interface in DataWorks. You must configure the resource based on your business requirements.

Figure: Upload a local resource

OSS resource

The following figure shows the configuration for uploading an OSS resource by using the visual interface in DataWorks. You must configure the resource based on your business requirements.

Note
  • You can use this method to upload a resource of up to 500 MB.

  • The Alibaba Cloud account that you use for the upload must be granted the AliyunDataWorksAccessingOSSRole policy. Follow the on-screen instructions to grant the permission with a single click.

Figure: Upload an OSS resource

Step 2: Submit and deploy

After you create the resource, click the Submit icon in the toolbar of the resource editor to submit the resource to the scheduling server in the development environment.

Note

If a task in the production environment needs to use this resource, you must also deploy the resource to the production environment. For more information, see Deploy tasks.

Step 3: Use the resource

Use in a node

To use a resource in a node, you must reference the resource in that node. After the node references the resource, a line of code in the format @resource_reference{"Resource Name"} is added to the node code. The format varies based on the node type. For example, a PyODPS 2 node displays the reference as ##@resource_reference{"Resource Name"}.

Note
  • If you have not created a node, see Create a compute node to create one.

  • If your PyODPS code depends on third-party packages, you must use a custom image to install the required packages in the runtime environment. Then, you can run the PyODPS code in that environment. For more information about custom images, see Custom images.

The following figure shows how to reference a resource.

Figure: Load a resource
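Because each reference is written into the node code as a single line, the resources that a node depends on can be listed by scanning for that pattern, for example to confirm that every referenced resource has been submitted. The following Python sketch is a hypothetical helper, not a DataWorks feature. It recognizes the `##` prefix that PyODPS nodes use; treating `--` as the prefix for SQL-style nodes is an assumption.

```python
import re

# Matches reference lines such as ##@resource_reference{"lower.py"} (PyODPS nodes).
# The "--" prefix for SQL-style nodes is an assumption made for this sketch.
_REFERENCE_LINE = re.compile(
    r'^\s*(?:##|--)@resource_reference\{"([^"]+)"\}', re.MULTILINE
)


def referenced_resources(node_code):
    """Return resource names referenced in node_code, in order of appearance."""
    return _REFERENCE_LINE.findall(node_code)


pyodps_code = '##@resource_reference{"lower.py"}\nprint("hello")\n'
print(referenced_resources(pyodps_code))  # ['lower.py']
```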

Register a function

To register a function by using a resource, you must first create the function. For more information, see Create and use a custom function. On the function configuration page, enter the name of the resource that you created, as shown in the following figure.

Important

Before you use a resource to register a function, make sure that the resource has been submitted. For information about how to submit a resource, see Step 2: Submit and deploy.

Figure: Register a function by using a resource
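Registering a function binds a function name to a class in one or more resources. For comparison, MaxCompute expresses the same binding with a `CREATE FUNCTION <name> AS '<class>' USING '<resources>';` statement. The following Python sketch builds that statement string. The helper names are hypothetical, and the convention that a Python UDF class is referenced as `<resource name without .py>.<class name>` is an assumption to verify in Create and use a custom function.

```python
def python_udf_class(resource_name, class_name):
    """Return the class reference for a Python UDF: <module>.<Class>.

    Assumption: <module> is the Python resource name without the .py extension.
    """
    if resource_name.endswith(".py"):
        module = resource_name[: -len(".py")]
    else:
        module = resource_name
    return f"{module}.{class_name}"


def create_function_sql(function_name, class_ref, resources):
    """Build a MaxCompute CREATE FUNCTION statement (illustrative only)."""
    resource_list = ",".join(resources)
    return f"CREATE FUNCTION {function_name} AS '{class_ref}' USING '{resource_list}';"


print(create_function_sql("my_lower", python_udf_class("lower.py", "Lower"), ["lower.py"]))
# CREATE FUNCTION my_lower AS 'lower.Lower' USING 'lower.py';
```

In DataWorks you enter the same pieces (function name, class name, and resources) on the function configuration page, so you do not need to run this statement yourself.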

To view the functions that are provided by MaxCompute, see Use built-in functions.

To view the functions that exist in a MaxCompute compute engine, view the change history of functions, or perform other management operations, see Manage MaxCompute functions.

Manage resources

In the resource directory of a workflow, right-click the target resource to perform management operations:

  • View History: You can view and compare saved or submitted versions of a resource to track changes.

    Note

    To compare versions, you must select at least two versions.

  • Delete: This operation deletes the resource only from the project in the development environment. To delete the resource from the production environment, you must deploy this change. The resource is removed from production after a successful deployment. For more information, see Deploy tasks.

Appendix 1: Manage resources with commands

The following list describes common commands for resource operations and the role that is required to run each one.

  • Add a resource: Adds a resource to a MaxCompute project. Required role: users with the Write permission on the resource.

  • View resource information: Views the details of a resource. Required role: users with the Read permission on the resource.

  • List resources: Lists all resources in the current project. Required role: users with the List permission on the project.

  • Create an alias for a resource: Creates an alias for a resource. Required role: users with the Write permission on the resource.

  • Download a resource: Downloads a resource from a MaxCompute project to your local machine. Required role: users with the Write permission on the resource.

  • Delete a resource: Deletes an existing resource from a MaxCompute project. Required role: users with the Delete permission on the resource.

You can run these commands on the following platforms:

If you do not specify a project when you view resources in DataWorks, the command returns the resources in the current project. Examples:

  • To view all resources in the current project, run the following command. When you run it in Data Studio, it queries the MaxCompute compute engine that is associated with the development environment by default.

    list resources;
  • To view all resources in a specific project:

    use <MaxCompute_project_name>;
    list resources;
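In a PyODPS node, the same listing can be done in Python. The following sketch assumes a PyODPS `ODPS` entry object (the built-in `o` in a DataWorks PyODPS node) and uses its `list_resources` method, whose `project` argument selects the project; when it is omitted, PyODPS uses the entry object's default project. `list_resource_names` and `FakeODPS` are hypothetical names, and `FakeODPS` only stands in for a real connection so the function can run locally.

```python
from types import SimpleNamespace


def list_resource_names(odps_entry, project=None):
    """Return sorted resource names. project=None means the entry's default project."""
    return sorted(res.name for res in odps_entry.list_resources(project=project))


class FakeODPS:
    """In-memory stand-in for a PyODPS ODPS object (demonstration only)."""

    def __init__(self, resources_by_project, default_project):
        self.resources_by_project = resources_by_project
        self.default_project = default_project

    def list_resources(self, project=None):
        return iter(self.resources_by_project[project or self.default_project])


fake = FakeODPS(
    {
        "dev_project": [SimpleNamespace(name="udf.jar"), SimpleNamespace(name="lower.py")],
        "prod_project": [SimpleNamespace(name="udf.jar")],
    },
    default_project="dev_project",
)
print(list_resource_names(fake))  # ['lower.py', 'udf.jar']
print(list_resource_names(fake, project="prod_project"))  # ['udf.jar']
```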

For more information about resource commands, see Resource operations.
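The resource commands described in this appendix follow a fixed syntax, for example add <type> <local_file> [as <alias>] [-f]; and desc resource [<project_name>:]<resource_name>;. The following Python sketch assembles a few of these command strings. The helper names are hypothetical, only file-based resource types and the options shown are covered, and Resource operations remains the authoritative syntax reference.

```python
ADDABLE_TYPES = {"file", "archive", "py", "jar"}


def add_resource_cmd(resource_type, local_file, alias=None, force=False):
    """Build: add <type> <local_file> [as <alias>] [-f];  (-f overwrites an existing resource)."""
    if resource_type not in ADDABLE_TYPES:
        raise ValueError(f"Unsupported resource type: {resource_type!r}")
    parts = ["add", resource_type, local_file]
    if alias:
        parts += ["as", alias]
    if force:
        parts.append("-f")
    return " ".join(parts) + ";"


def _qualified(resource_name, project):
    """Prefix the resource name with <project_name>: when a project is given."""
    return f"{project}:{resource_name}" if project else resource_name


def desc_resource_cmd(resource_name, project=None):
    """Build: desc resource [<project_name>:]<resource_name>;"""
    return f"desc resource {_qualified(resource_name, project)};"


def get_resource_cmd(resource_name, local_path, project=None):
    """Build: get resource [<project_name>:]<resource_name> <path>;"""
    return f"get resource {_qualified(resource_name, project)} {local_path};"


def drop_resource_cmd(resource_name):
    """Build: drop resource <resource_name>;"""
    return f"drop resource {resource_name};"


print(add_resource_cmd("jar", "udf.jar", force=True))  # add jar udf.jar -f;
print(desc_resource_cmd("udf.jar", project="my_project"))  # desc resource my_project:udf.jar;
```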

Appendix 2: Add external resources to DataWorks

You can use the MaxCompute Resources feature to load a MaxCompute resource that is no larger than 500 MB into DataWorks for visual management. For more information, see Manage MaxCompute resources.