Access DLF from MaxCompute

更新时间:
复制 MD 格式

MaxCompute introduces external projects that map to a Data Lake Formation (DLF) catalog, enabling real-time access to metadata and read/write access to DLF-managed data in Object Storage Service (OSS). DLF manages permissions, and this catalog-level data lake mapping enables Paimon-based multi-engine collaboration.

Scope

Note

This feature is in private preview and available in MaxCompute by invitation.

  • MaxCompute supports only Paimon tables stored on DLF-managed OSS.

  • MaxCompute does not support writing data to a dynamic bucket table or a cross-partition table using INSERT INTO or INSERT OVERWRITE.

  • You cannot use the SHOW TABLES operation on cross-partition tables that use fixed buckets and whose primary key does not include a partition key.

  • Schema evolution and partition operations are not supported for tables in external projects.

  • Resource and function operations are not supported.

  • You cannot use Paimon tables as a data source for data integration in DataWorks.

Data type mapping

For more information about MaxCompute data types, see Data Types (Version 1.0) and Data Types (Version 2.0).

Paimon data type

MaxCompute 2.0 data type

Read/write support

Description

TINYINT

TINYINT

Supported

8-bit signed integer.

SMALLINT

SMALLINT

Supported

16-bit signed integer.

INT

INT

Supported

32-bit signed integer.

BIGINT

BIGINT

Supported

64-bit signed integer.

BINARY(MAX_LENGTH)

BINARY

Supported

Binary data type. The current maximum length is 8 MB.

FLOAT

FLOAT

Supported

32-bit binary floating-point number.

DOUBLE

DOUBLE

Supported

64-bit binary floating-point number.

DECIMAL(precision,scale)

DECIMAL(precision,scale)

Supported

Exact decimal numeric type. The default is decimal(38,18). You can customize the precision and scale values.

  • precision: The maximum number of digits. The valid values are 1 to 38.

  • scale: The number of digits in the fractional part. The valid values are 0 to 18.

VARCHAR(n)

VARCHAR(n)

Supported

Variable-length character type. n specifies the length and ranges from 1 to 65,535.

CHAR(n)

CHAR(n)

Supported

Fixed-length character type. n specifies the length and ranges from 1 to 255.

VARCHAR(MAX_LENGTH)

STRING

Supported

String type. The current maximum length is 8 MB.

DATE

DATE

Supported

Date type. The format is yyyy-mm-dd.

TIME, TIME(p)

Not supported

Not supported

The Paimon TIME data type represents a time without a time zone, consisting of hours, minutes, and seconds, with nanosecond precision.

TIME(p) specifies the fractional-second precision from 0 to 9. The default value is 0.

No corresponding type exists in MaxCompute.

TIMESTAMP, TIMESTAMP(p)

TIMESTAMP_NTZ

Supported

A timestamp type without a time zone that is accurate to the nanosecond.

To read this data type, you must disable the native JNI bridge by running the following command: SET odps.sql.common.table.jni.disable.native=true;

TIMESTAMP WITH LOCAL TIME_ZONE(9)

TIMESTAMP

Supported

  • A timestamp type that is accurate to the nanosecond. The format is yyyy-mm-dd hh:mm:ss.xxxxxxxxx.

  • For low-precision TIMESTAMP types in the Paimon source table, MaxCompute truncates values during writes. For precision 0-3, data is truncated to 3 digits. For precision 4-6, data is truncated to 6 digits. For precision 7-9, data is truncated to 9 digits.

TIMESTAMP WITH LOCAL TIME_ZONE(9)

DATETIME

Not supported

A timestamp type that is accurate to the nanosecond.

The format is yyyy-mm-dd hh:mm:ss.xxxxxxxxx.

BOOLEAN

BOOLEAN

Supported

A BOOLEAN type.

ARRAY

ARRAY

Supported

A complex type.

MAP

MAP

Supported

A complex type.

ROW

STRUCT

Supported

A complex type.

MULTISET<t>

Not supported

Not supported

No corresponding type exists in MaxCompute.

VARBINARY, VARBINARY(n), BYTES

BINARY

Supported

A data type of variable-length binary strings.

Procedure

Step 1: Grant permissions

Grant permissions to a RAM user

If you are using a RAM user, make sure that the following permission policies are attached. For more information about how to grant permissions, see Manage permissions for a RAM user.

  • AliyunRAMFullAccess: If the RAM user does not have this permission, the Alibaba Cloud account must grant it.

  • AliyunMaxComputeFullAccess: This permission is required to create an external data source and an external project.

  • AliyunDLFReadOnlyAccess: This permission is required to create an external project. You need this permission policy because creating a Paimon_DLF external project requires List permissions on the DLF catalog.

Authorize and activate DLF

  1. Authorize cloud resource access

    When you first use Data Lake Formation (DLF), you must authorize it to access related cloud resources.

    Log on to the Data Lake Formation (DLF) console, and in the upper-left corner, select a region.

    To the right of Permissions to access cloud resources are granted., click Authorize.

  2. Activate Data Lake Formation

    Select a region. MaxCompute and DLF must be deployed in the same region. Before you use DLF, activate the service in the target region.

    To the right of DLF is activated., click Activate.

Grant DLF data permissions to MaxCompute

To allow MaxCompute to access Paimon_DLF data using an external data source and an external project, you must first grant the required DLF data permissions to the MaxCompute access account.

To use the task executor's identity, you must grant the service-linked role for MaxCompute permission to access DLF. When MaxCompute accesses DLF, it passes the task executor's identity to DLF. Both services then enforce the permissions granted within their respective systems. Therefore, you must first create this service-linked role and grant it the necessary permissions. To do so, follow these steps:

  1. 登录RAM控制台

  2. 在左侧导航栏选择Identities > Roles

  3. Roles页面,单击Create Role

  4. Create Role页面的右上角,单击Create Service Linked Role

  5. Create Service Linked Role页面,选择Select ServiceAliyunServiceRoleForMaxComputeLakehouse,单击Create Service Linked Role

    If a message indicates that the role already exists, you can ignore it.

Permissions

  • Control plane permissions: These permissions are checked when you perform operations in the MaxCompute console, primarily when you create a MaxCompute external project that is bound to a DLF catalog.

    These permissions are managed by Resource Access Management (RAM). Use your Alibaba Cloud account to log on to the RAM console to configure these permissions. For more information, see Manage permissions for a RAM user.

  • Data plane permissions: These permissions are checked when you read from or write to Paimon tables after you log on to the created external project. This typically occurs when you create, maintain, or use schemas, tables, or other resources in the bound DLF catalog by running SQL statements in MaxCompute.

    These permissions are managed in the DLF console. Use your Alibaba Cloud account to log on to the DLF console to configure these permissions. For more information, see Grant permissions to a RAM user.

Step 2: Create a Paimon_DLF external data source

  1. Log in to the MaxCompute console and select a region in the upper-left corner.

  2. In the left-side navigation pane, choose Manage Configurations > External Data Source .

  3. On the External Data Source page, click Create External Data Source.

  4. In the Create External Data Source dialog box, configure the parameters. The following tables describe the parameters.

    Parameter

    Required

    Description

    External Data Source Type

    Required

    Select Paimon_DLF.

    External Data Source Name

    Required

    Specify a custom name that meets the following requirements:

    • Must start with a letter and contain only lowercase letters, digits, and underscores (_).

    • Can be up to 128 characters in length.

    Example: paimon_dlf.

    Description

    Optional

    Enter a description as needed.

    Region

    Required

    Defaults to the current region.

    Authentication and Authorization

    Required

    Defaults to an Alibaba Cloud RAM role.

    Service-linked Role

    Required

    This parameter is generated automatically.

    Endpoint

    Required

    This parameter is generated automatically. For the China (Hangzhou) region, the endpoint is cn-hangzhou-intranet.dlf.aliyuncs.com.

    Foreign Server Supplemental Properties

    Optional

    Additional properties for the external data source. After you specify these properties, tasks that use this external data source can access the source system as defined by these properties.

    Note

    The list of supported parameters is updated as product capabilities evolve. For the latest information, refer to the official documentation.

  5. Click OK to create the external data source.

  6. On the External Data Source page, find the data source and click Details in the Actions column to view its details.

Step 3: Create an external project

  1. Log in to the MaxCompute console and select a region in the upper-left corner.

  2. In the left-side navigation pane, choose Manage Configurations > Projects.

  3. On the External Project tab, click Create Project.

  4. In the Create Project dialog box, configure the settings and click OK.

    Parameter

    Required

    Description

    Project Type

    Required

    Defaults to external project.

    Region

    Required

    Defaults to the current region and cannot be changed.

    Project Name (Globally Unique)

    Required

    The name must start with a letter, contain 3 to 28 characters, and can include only letters, digits, and underscores (_).

    MaxCompute Foreign Server Type

    Optional

    Defaults to Paimon_DLF.

    MaxCompute Foreign Server

    Optional

    • Use Existing: A list of existing external data sources appears.

    • Create Foreign Server: You can create and use a new external data source.

    MaxCompute Foreign Server Name

    Required

    • If you chose to use an existing data source, select its name from the drop-down list.

    • If you chose to create a new external data source, its name is automatically used.

    Data Catalog

    Required

    The DLF data catalog.

    Billing Method

    Required

    Select Subscription or Pay-as-you-go.

    Default Quota

    Required

    Select an existing quota.

    Description

    Optional

    Enter a custom description for the project.

Step 4: Access the data source using SQL

  1. Log on to the external project using a connection tool.

    The following example shows how to run SQL in a DataWorks SQL node within a workspace.

    1. 登录DataWorks控制台,在左上角选择地域。

    2. 在左侧导航栏选择Workspace

    3. On the Workspaces page, click Create Workspace or click the name of an existing target workspace.

    4. Workspace Details页面,单击左侧导航栏Computing Resource

    5. Computing Resource页面,单击Associate Computing Resources,选择MaxCompute

    6. 填写Associate MaxCompute Computing ResourceBasic Information

      For the MaxCompute project parameter, select the external project.

  2. List the schemas in the external project.

    -- Enable schema syntax at the session level.
    SET odps.namespace.schema=true;
    SHOW schemas;
    -- Sample result:
    ID = 20250919****am4qb
    default
    system
    OK
  3. List the tables in a schema within the external project.

    If you need to access tables from an external production project in the current environment, you need to add use external_project_pro; before all your SQL statements.

    -- <schema_name> is the name of a schema displayed by the SHOW schemas command.
    SET odps.namespace.schema=true;
    USE schema <schema_name>;
    SHOW tables;
    -- Sample result:
    ID = 20250919****am4qb
    acs:ram::<uid>:root  emp
    OK
  4. Create a schema in the external project.

    -- In this example, the schema_name is schema_demo_test.
    CREATE schema <schema_name>;
  5. Use the newly created schema.

    USE schema <schema_name>;
  6. Create a table in the schema and insert data.

    If you are logged on as a RAM user, you must have permissions on the DLF catalog for this operation. For information about how to grant permissions, see Data authorization management.

    • Command format:

      -- Create a table.
      CREATE TABLE [IF NOT EXISTS] <table_name> 
      (
        <col_name> <data_type>,
        ...
      )
      [COMMENT <table_comment>]
      [PARTITIONED BY (<col_name> <data_type>, ...)] 
      ;
      -- Insert data.
      INSERT {INTO|OVERWRITE} TABLE <table_name> [PARTITION (<pt_spec>)] [(<col_name> [,<col_name> ...)]]
      <select_statement>
      FROM <from_statement>
    • Example:

      Note

      When you write data to a Paimon source table, lower-precision TIMESTAMP values are truncated: precision 0-3 is truncated to milliseconds (3 decimal places), 4-6 to microseconds (6 decimal places), and 7-9 to nanoseconds (9 decimal places).

      CREATE TABLE schema_table(id int,name string);
      INSERT INTO schema_table VALUES (101,'Zhang San'),(102,'Li Si');
      -- Query the schema_table table.
      SELECT * FROM schema_table;
      -- Sample result:
      +------------+------------+
      | id         | name       | 
      +------------+------------+
      | 101        | Zhang San  | 
      | 102        | Li Si      | 
      +------------+------------+
  7. Switch to the existing default schema.

    use schema default;
    SHOW tables;
    -- Sample result:
    ID = 20250919*******yg5
    acs:ram::<uid>:root	emp
    acs:ram::<uid>:root	emp_detail
    acs:ram::<uid>:root	test_table
    OK
    -- Read data from the table.
    SELECT * FROM test_table;
    -- Sample result:
    +------------+------------+
    | id         | name       | 
    +------------+------------+
    | 101        | Zhang San  | 
    | 102        | Li Si      | 
    +------------+------------+
    -- Write data to the table and check whether the data is successfully written.
    INSERT INTO test_table VALUES (103,'Wang Wu');
    SELECT * FROM test_table;
    -- Sample result:
    +------------+------------+
    | id         | name       | 
    +------------+------------+
    | 101        | Zhang San  | 
    | 102        | Li Si      | 
    | 103        | Wang Wu    | 
    +------------+------------+

Common errors

Problem 1: Error creating an external project

Symptom: When you create a new external project, the error "You are not authorized to perform this action." is returned.

The error details include the error code NoMCPermission.

Solution:

  1. If you are a RAM user, verify that the AliyunMaxComputeFullAccess permission is granted to the RAM user. For more information, see the relevant procedure in Step 1.

  2. This operation requires selecting a catalog from Data Lake Formation (DLF). Ensure the required DLF permissions are granted. For more information, see the relevant procedure in Step 1.

Problem 2: show tables error in default schema

Symptom: When you run show tables in the default schema of an external project, the error "Forbidden:User acs: ram: :<uid>:user/** doesn't have privilege LIST on DATABASE default" is returned.

Solution:

  1. Log on to the Data Lake Formation (DLF) console, and in the upper-left corner, select a region.

  2. In the navigation pane on the left, choose System & Security

  3. On the Access Control > Users tab, refresh the page to check if the current RAM user exists.

  4. If the user exists, go to the Access Control > Role tab to grant permissions to the RAM user.

Problem 3: "invalid database operations on two-tier" error

Symptom: When you run show schemas in an external project, the error "invalid database operations on two-tier" is returned.

Solution:

  1. Verify that the current project is an external project.

    1. If you are using a DataWorks SQL node, click Debugging Configuration in the right-side pane and verify that the associated compute resource is an external project.

    2. If you are using SQL analysis in DataWorks, click Data Source in the upper-right corner and verify that the associated data source is an external project.

    3. If you are using SQL analysis in MaxCompute, click Execution Configuration in the right-side pane and verify that the project selected for the compute resource is an external project.

    4. If you are using odpscmd, check the odps_config.ini file and verify that the value of the project_name parameter is the name of an external project.

  2. After confirming you are using an external project, select and run the following SQL statements together.

    -- Enable schema syntax at the session level.
    SET odps.namespace.schema=true;
    SHOW schemas;

Problem 4: show schemas returns only default schema

Symptom: When you run the show schemas command in an external project, only the default schema is returned.

Solution:

  1. Verify that the current project is an external project.

    1. If you are using a DataWorks SQL node, click Debugging Configuration in the right-side pane and verify that the associated compute resource is an external project.

    2. If you are using SQL analysis in DataWorks, click Data Source in the upper-right corner and verify that the associated data source is an external project.

    3. If you are using SQL analysis in MaxCompute, click Execution Configuration in the right-side pane and verify that the project selected for the compute resource is an external project.

    4. If you are using odpscmd, check the odps_config.ini file and verify that the value of the project_name parameter is the name of an external project.

  2. After you confirm that you are using an external project, log in to the Data Lake Formation (DLF) console and check whether other databases exist in the catalog that is bound to the external project.

  3. If the show schemas command still does not return the expected database after you perform the checks in steps 1 and 2, submit a ticket for assistance.

Problem 5: "You should use dynamic bucket..." error

Symptom: When you run show tables in an external project, the following error is returned: ODPS-0110005: com.aliyun.odps.meta.exception.MetaException: com.aliyun.odps.common.table.na.NativeException: common/table/jni/utils/jni_helper.cpp(79): UnretryableException: Common table connector exception - ExceptionType: java.lang.IllegalArgumentException - Message: You should use dynamic bucket (bucket = -1) mode in cross partition update case (Primary key constraint [sending_time, symbol, sequence_number] not include all partition fields [pt]).[pt])

Solution:

This error occurs because MaxCompute does not currently support cross-partition primary key tables with fixed buckets. As a result, the show tables command fails.

As a workaround, delete the unsupported cross-partition primary key table in the Data Lake Formation (DLF) console.

Problem 6: "Can't set default schema..." error

Symptom: When you run the show tables or show schemas command, the error "FAILED: Can't set default schema if odps.namespace.schema is false" is returned.

Solution:

  1. Run the SET odps.namespace.schema=true; command before your other SQL statements to enable schema syntax at the session level.

  2. If the error persists, verify the command order. To read a table from another external project, use the following command sequence:

    USE external_project;
    SET odps.namespace.schema=true;
    USE schema schema_name;
    SHOW tables;