MaxCompute introduces external projects that map to a Data Lake Formation (DLF) catalog, enabling real-time access to metadata and read/write access to DLF-managed data in Object Storage Service (OSS). DLF manages permissions, and this catalog-level data lake mapping enables Paimon-based multi-engine collaboration.
Scope
This feature is in private preview and available in MaxCompute by invitation.
-
MaxCompute supports only Paimon tables stored on DLF-managed OSS.
-
MaxCompute does not support writing data to a dynamic bucket table or a cross-partition table using
INSERT INTOorINSERT OVERWRITE. -
You cannot use the
SHOW TABLESoperation on cross-partition tables that use fixed buckets and whose primary key does not include a partition key. -
Schema evolution and partition operations are not supported for tables in external projects.
-
Resource and function operations are not supported.
-
You cannot use Paimon tables as a data source for data integration in DataWorks.
Data type mapping
For more information about MaxCompute data types, see Data Types (Version 1.0) and Data Types (Version 2.0).
|
Paimon data type |
MaxCompute 2.0 data type |
Read/write support |
Description |
|
TINYINT |
TINYINT |
|
8-bit signed integer. |
|
SMALLINT |
SMALLINT |
|
16-bit signed integer. |
|
INT |
INT |
|
32-bit signed integer. |
|
BIGINT |
BIGINT |
|
64-bit signed integer. |
|
BINARY(MAX_LENGTH) |
BINARY |
|
Binary data type. The current maximum length is 8 MB. |
|
FLOAT |
FLOAT |
|
32-bit binary floating-point number. |
|
DOUBLE |
DOUBLE |
|
64-bit binary floating-point number. |
|
DECIMAL(precision,scale) |
DECIMAL(precision,scale) |
|
Exact decimal numeric type. The default is
|
|
VARCHAR(n) |
VARCHAR(n) |
|
Variable-length character type. n specifies the length and ranges from 1 to 65,535. |
|
CHAR(n) |
CHAR(n) |
|
Fixed-length character type. n specifies the length and ranges from 1 to 255. |
|
VARCHAR(MAX_LENGTH) |
STRING |
|
String type. The current maximum length is 8 MB. |
|
DATE |
DATE |
|
Date type. The format is |
|
TIME, TIME(p) |
Not supported |
|
The Paimon TIME data type represents a time without a time zone, consisting of hours, minutes, and seconds, with nanosecond precision. TIME(p) specifies the fractional-second precision from 0 to 9. The default value is 0. No corresponding type exists in MaxCompute. |
|
TIMESTAMP, TIMESTAMP(p) |
TIMESTAMP_NTZ |
|
A timestamp type without a time zone that is accurate to the nanosecond. To read this data type, you must disable the native JNI bridge by running the following command: |
|
TIMESTAMP WITH LOCAL TIME_ZONE(9) |
TIMESTAMP |
|
|
|
TIMESTAMP WITH LOCAL TIME_ZONE(9) |
DATETIME |
|
A timestamp type that is accurate to the nanosecond. The format is |
|
BOOLEAN |
BOOLEAN |
|
A BOOLEAN type. |
|
ARRAY |
ARRAY |
|
A complex type. |
|
MAP |
MAP |
|
A complex type. |
|
ROW |
STRUCT |
|
A complex type. |
|
MULTISET<t> |
Not supported |
|
No corresponding type exists in MaxCompute. |
|
VARBINARY, VARBINARY(n), BYTES |
BINARY |
|
A data type of variable-length binary strings. |
Procedure
Step 1: Grant permissions
Grant permissions to a RAM user
If you are using a RAM user, make sure that the following permission policies are attached. For more information about how to grant permissions, see Manage permissions for a RAM user.
-
AliyunRAMFullAccess: If the RAM user does not have this permission, the Alibaba Cloud account must grant it.
-
AliyunMaxComputeFullAccess: This permission is required to create an external data source and an external project.
-
AliyunDLFReadOnlyAccess: This permission is required to create an external project. You need this permission policy because creating a Paimon_DLF external project requires List permissions on the DLF catalog.
Authorize and activate DLF
-
Authorize cloud resource access
When you first use Data Lake Formation (DLF), you must authorize it to access related cloud resources.
Log on to the Data Lake Formation (DLF) console, and in the upper-left corner, select a region.
To the right of Permissions to access cloud resources are granted., click Authorize.
-
Activate Data Lake Formation
Select a region. MaxCompute and DLF must be deployed in the same region. Before you use DLF, activate the service in the target region.
To the right of DLF is activated., click Activate.
Grant DLF data permissions to MaxCompute
To allow MaxCompute to access Paimon_DLF data using an external data source and an external project, you must first grant the required DLF data permissions to the MaxCompute access account.
To use the task executor's identity, you must grant the service-linked role for MaxCompute permission to access DLF. When MaxCompute accesses DLF, it passes the task executor's identity to DLF. Both services then enforce the permissions granted within their respective systems. Therefore, you must first create this service-linked role and grant it the necessary permissions. To do so, follow these steps:
-
登录RAM控制台。
-
在左侧导航栏选择。
-
在Roles页面,单击Create Role。
-
在Create Role页面的右上角,单击Create Service Linked Role。
-
在Create Service Linked Role页面,选择Select Service
AliyunServiceRoleForMaxComputeLakehouse,单击Create Service Linked Role。If a message indicates that the role already exists, you can ignore it.
Permissions
-
Control plane permissions: These permissions are checked when you perform operations in the MaxCompute console, primarily when you create a MaxCompute external project that is bound to a DLF catalog.
These permissions are managed by Resource Access Management (RAM). Use your Alibaba Cloud account to log on to the RAM console to configure these permissions. For more information, see Manage permissions for a RAM user.
-
Data plane permissions: These permissions are checked when you read from or write to Paimon tables after you log on to the created external project. This typically occurs when you create, maintain, or use schemas, tables, or other resources in the bound DLF catalog by running SQL statements in MaxCompute.
These permissions are managed in the DLF console. Use your Alibaba Cloud account to log on to the DLF console to configure these permissions. For more information, see Grant permissions to a RAM user.
Step 2: Create a Paimon_DLF external data source
-
Log in to the MaxCompute console and select a region in the upper-left corner.
-
In the left-side navigation pane, choose .
-
On the External Data Source page, click Create External Data Source.
-
In the Create External Data Source dialog box, configure the parameters. The following tables describe the parameters.
Parameter
Required
Description
External Data Source Type
Required
Select Paimon_DLF.
External Data Source Name
Required
Specify a custom name that meets the following requirements:
-
Must start with a letter and contain only lowercase letters, digits, and underscores (_).
-
Can be up to 128 characters in length.
Example:
paimon_dlf.Description
Optional
Enter a description as needed.
Region
Required
Defaults to the current region.
Authentication and Authorization
Required
Defaults to an Alibaba Cloud RAM role.
Service-linked Role
Required
This parameter is generated automatically.
Endpoint
Required
This parameter is generated automatically. For the China (Hangzhou) region, the endpoint is
cn-hangzhou-intranet.dlf.aliyuncs.com.Foreign Server Supplemental Properties
Optional
Additional properties for the external data source. After you specify these properties, tasks that use this external data source can access the source system as defined by these properties.
NoteThe list of supported parameters is updated as product capabilities evolve. For the latest information, refer to the official documentation.
-
-
Click OK to create the external data source.
-
On the External Data Source page, find the data source and click Details in the Actions column to view its details.
Step 3: Create an external project
-
Log in to the MaxCompute console and select a region in the upper-left corner.
-
In the left-side navigation pane, choose .
-
On the External Project tab, click Create Project.
-
In the Create Project dialog box, configure the settings and click OK.
Parameter
Required
Description
Project Type
Required
Defaults to external project.
Region
Required
Defaults to the current region and cannot be changed.
Project Name (Globally Unique)
Required
The name must start with a letter, contain 3 to 28 characters, and can include only letters, digits, and underscores (_).
MaxCompute Foreign Server Type
Optional
Defaults to Paimon_DLF.
MaxCompute Foreign Server
Optional
-
Use Existing: A list of existing external data sources appears.
-
Create Foreign Server: You can create and use a new external data source.
MaxCompute Foreign Server Name
Required
-
If you chose to use an existing data source, select its name from the drop-down list.
-
If you chose to create a new external data source, its name is automatically used.
Data Catalog
Required
The DLF data catalog.
Billing Method
Required
Select Subscription or Pay-as-you-go.
Default Quota
Required
Select an existing quota.
Description
Optional
Enter a custom description for the project.
-
Step 4: Access the data source using SQL
-
Log on to the external project using a connection tool.
The following example shows how to run SQL in a DataWorks SQL node within a workspace.
-
登录DataWorks控制台,在左上角选择地域。
-
在左侧导航栏选择Workspace。
-
On the Workspaces page, click Create Workspace or click the name of an existing target workspace.
-
在Workspace Details页面,单击左侧导航栏Computing Resource。
-
在Computing Resource页面,单击Associate Computing Resources,选择MaxCompute。
-
填写Associate MaxCompute Computing Resource的Basic Information。
For the MaxCompute project parameter, select the external project.
-
-
List the schemas in the external project.
-- Enable schema syntax at the session level. SET odps.namespace.schema=true; SHOW schemas; -- Sample result: ID = 20250919****am4qb default system OK -
List the tables in a schema within the external project.
If you need to access tables from an external production project in the current environment, you need to add
use external_project_pro;before all your SQL statements.-- <schema_name> is the name of a schema displayed by the SHOW schemas command. SET odps.namespace.schema=true; USE schema <schema_name>; SHOW tables; -- Sample result: ID = 20250919****am4qb acs:ram::<uid>:root emp OK -
Create a schema in the external project.
-- In this example, the schema_name is schema_demo_test. CREATE schema <schema_name>; -
Use the newly created schema.
USE schema <schema_name>; -
Create a table in the schema and insert data.
If you are logged on as a RAM user, you must have permissions on the DLF catalog for this operation. For information about how to grant permissions, see Data authorization management.
-
Command format:
-- Create a table. CREATE TABLE [IF NOT EXISTS] <table_name> ( <col_name> <data_type>, ... ) [COMMENT <table_comment>] [PARTITIONED BY (<col_name> <data_type>, ...)] ; -- Insert data. INSERT {INTO|OVERWRITE} TABLE <table_name> [PARTITION (<pt_spec>)] [(<col_name> [,<col_name> ...)]] <select_statement> FROM <from_statement> -
Example:
NoteWhen you write data to a Paimon source table, lower-precision TIMESTAMP values are truncated: precision 0-3 is truncated to milliseconds (3 decimal places), 4-6 to microseconds (6 decimal places), and 7-9 to nanoseconds (9 decimal places).
CREATE TABLE schema_table(id int,name string); INSERT INTO schema_table VALUES (101,'Zhang San'),(102,'Li Si'); -- Query the schema_table table. SELECT * FROM schema_table; -- Sample result: +------------+------------+ | id | name | +------------+------------+ | 101 | Zhang San | | 102 | Li Si | +------------+------------+
-
-
Switch to the existing default schema.
use schema default; SHOW tables; -- Sample result: ID = 20250919*******yg5 acs:ram::<uid>:root emp acs:ram::<uid>:root emp_detail acs:ram::<uid>:root test_table OK -- Read data from the table. SELECT * FROM test_table; -- Sample result: +------------+------------+ | id | name | +------------+------------+ | 101 | Zhang San | | 102 | Li Si | +------------+------------+ -- Write data to the table and check whether the data is successfully written. INSERT INTO test_table VALUES (103,'Wang Wu'); SELECT * FROM test_table; -- Sample result: +------------+------------+ | id | name | +------------+------------+ | 101 | Zhang San | | 102 | Li Si | | 103 | Wang Wu | +------------+------------+
Common errors
Problem 1: Error creating an external project
Symptom: When you create a new external project, the error "You are not authorized to perform this action." is returned.
The error details include the error code NoMCPermission.
Solution:
-
If you are a RAM user, verify that the
AliyunMaxComputeFullAccesspermission is granted to the RAM user. For more information, see the relevant procedure in Step 1. -
This operation requires selecting a catalog from Data Lake Formation (DLF). Ensure the required DLF permissions are granted. For more information, see the relevant procedure in Step 1.
Problem 2: show tables error in default schema
Symptom: When you run show tables in the default schema of an external project, the error "Forbidden:User acs: ram: :<uid>:user/** doesn't have privilege LIST on DATABASE default" is returned.
Solution:
-
Log on to the Data Lake Formation (DLF) console, and in the upper-left corner, select a region.
-
In the navigation pane on the left, choose System & Security
-
On the tab, refresh the page to check if the current RAM user exists.
-
If the user exists, go to the tab to grant permissions to the RAM user.
Problem 3: "invalid database operations on two-tier" error
Symptom: When you run show schemas in an external project, the error "invalid database operations on two-tier" is returned.
Solution:
-
Verify that the current project is an external project.
-
If you are using a DataWorks SQL node, click Debugging Configuration in the right-side pane and verify that the associated compute resource is an external project.
-
If you are using SQL analysis in DataWorks, click Data Source in the upper-right corner and verify that the associated data source is an external project.
-
If you are using SQL analysis in MaxCompute, click Execution Configuration in the right-side pane and verify that the project selected for the compute resource is an external project.
-
If you are using odpscmd, check the odps_config.ini file and verify that the value of the
project_nameparameter is the name of an external project.
-
-
After confirming you are using an external project, select and run the following SQL statements together.
-- Enable schema syntax at the session level. SET odps.namespace.schema=true; SHOW schemas;
Problem 4: show schemas returns only default schema
Symptom: When you run the show schemas command in an external project, only the default schema is returned.
Solution:
-
Verify that the current project is an external project.
-
If you are using a DataWorks SQL node, click Debugging Configuration in the right-side pane and verify that the associated compute resource is an external project.
-
If you are using SQL analysis in DataWorks, click Data Source in the upper-right corner and verify that the associated data source is an external project.
-
If you are using SQL analysis in MaxCompute, click Execution Configuration in the right-side pane and verify that the project selected for the compute resource is an external project.
-
If you are using odpscmd, check the odps_config.ini file and verify that the value of the
project_nameparameter is the name of an external project.
-
-
After you confirm that you are using an external project, log in to the Data Lake Formation (DLF) console and check whether other databases exist in the catalog that is bound to the external project.
-
If the
show schemascommand still does not return the expected database after you perform the checks in steps 1 and 2, submit a ticket for assistance.
Problem 5: "You should use dynamic bucket..." error
Symptom: When you run show tables in an external project, the following error is returned: ODPS-0110005: com.aliyun.odps.meta.exception.MetaException: com.aliyun.odps.common.table.na.NativeException: common/table/jni/utils/jni_helper.cpp(79): UnretryableException: Common table connector exception - ExceptionType: java.lang.IllegalArgumentException - Message: You should use dynamic bucket (bucket = -1) mode in cross partition update case (Primary key constraint [sending_time, symbol, sequence_number] not include all partition fields [pt]).[pt])
Solution:
This error occurs because MaxCompute does not currently support cross-partition primary key tables with fixed buckets. As a result, the show tables command fails.
As a workaround, delete the unsupported cross-partition primary key table in the Data Lake Formation (DLF) console.
Problem 6: "Can't set default schema..." error
Symptom: When you run the show tables or show schemas command, the error "FAILED: Can't set default schema if odps.namespace.schema is false" is returned.
Solution:
-
Run the
SET odps.namespace.schema=true;command before your other SQL statements to enable schema syntax at the session level. -
If the error persists, verify the command order. To read a table from another external project, use the following command sequence:
USE external_project; SET odps.namespace.schema=true; USE schema schema_name; SHOW tables;