MaxCompute provides external projects that map to a Data Lake Formation (DLF) catalog for real-time access to metadata and data. These projects delegate permission management to DLF and support metadata access and read/write operations on data in DLF-managed Object Storage Service (OSS). This approach enables you to map an entire data lake at the catalog level, facilitating Paimon-based cross-engine collaboration. The Paimon_DLF external project is currently in invitational preview.
Limitations
-
Table format: Only Paimon tables stored in OSS that are fully managed by DLF are supported.
-
Write limitations: Dynamic bucket tables and cross-partition tables do not support
INSERT INTOorINSERT OVERWRITE. -
Query limitations: You cannot use
SHOW TABLESon cross-partition tables that have fixed buckets and whose primary key does not include the partition key. -
External projects: You cannot perform schema evolution or partition operations.
-
Other limitations:
-
You cannot perform resource or function operations.
-
You cannot use these tables as a data source for DataWorks Data Integration.
-
Procedure
Step 1: Grant permissions
Grant permissions to a RAM user
If you are a RAM user, attach the following permission policies. For information about how to add permissions, see Manage permissions for a RAM user.
-
AliyunRAMFullAccess: If a RAM user does not have this permission, an Alibaba Cloud account must grant it.
-
AliyunMaxComputeFullAccess: Required to create external data sources and external projects.
-
AliyunDLFReadOnlyAccess: Required to create external projects. This permission policy is necessary because creating a Paimon_DLF external project requires the List permission for the DLF catalog.
Authorize and activate DLF
-
Authorize cloud resource access
The first time you use Data Lake Formation (DLF), you must authorize it to access the required cloud resources.
Log on to the Data Lake Formation (DLF) console, and in the upper-left corner, select a region.
To the right of Permissions to access cloud resources are granted., click Authorize.
-
Activate Data Lake Formation
Select a region. MaxCompute and DLF must be deployed in the same region. Before you use DLF, you must activate the service to ensure its availability in the target region.
To the right of DLF is activated., click Activate.
Grant DLF permissions to MaxCompute
MaxCompute accesses Paimon_DLF data through an external data source and an external project. Before you proceed, you must grant the necessary DLF data permissions to the MaxCompute access account.
The "Use task executor identity" option passes the task executor's identity to DLF. Both MaxCompute and DLF then enforce permissions based on this identity. Therefore, you must first create a service-linked role for MaxCompute and grant it the necessary permissions. Follow these steps:
-
登录RAM控制台。
-
在左侧导航栏选择。
-
在Roles页面,单击Create Role。
-
在Create Role页面的右上角,单击Create Service Linked Role。
-
在Create Service Linked Role页面,选择Select Service
AliyunServiceRoleForMaxComputeLakehouse,单击Create Service Linked Role。If you are prompted that the role already exists, it means the role has already been authorized and you can ignore the prompt.
Permission model
-
Control plane: These permissions are checked when you perform operations in the MaxCompute console, primarily during the creation of a MaxCompute external project and its binding to a DLF catalog.
These permissions are managed by RAM. Your Alibaba Cloud account must log on to the RAM console to configure the required permissions. For more information, see Manage permissions for a RAM user.
-
Data plane: These permissions are checked when you read from or write to Paimon tables after you log on to the external project. This typically occurs when you create, maintain, or use schemas, tables, and other resources within the bound DLF catalog by executing SQL statements in MaxCompute.
These permissions are managed by the DLF console. Your Alibaba Cloud account must log on to the DLF console to configure the required permissions. For more information, see Grant permissions to a RAM user.
Step 2: Create a Paimon_DLF external data source
-
Log in to the MaxCompute console and select a region in the upper-left corner.
-
In the left-side navigation pane, choose .
-
On the External Data Source page, click Create External Data Source.
-
In the Create External Data Source dialog box, configure the parameters. The following tables describe the parameters.
Parameter
Required
Description
External Data Source Type
Yes
Select Paimon_DLF.
External Data Source Name
Yes
Enter a custom name. The name must meet the following requirements:
-
It must start with a letter and can contain only lowercase letters, digits, and underscores (_).
-
It cannot exceed 128 characters in length.
For example, you can enter
paimon_dlf.Description
No
Enter a description as needed.
Region
Yes
The current region is selected by default.
Authentication and Authorization
Yes
The default value is an Alibaba Cloud RAM role.
Service-linked Role
Yes
The role is generated by default.
Endpoint
Yes
The endpoint is automatically generated. For the China (Hangzhou) region, the endpoint is
cn-hangzhou-intranet.dlf.aliyuncs.com.Foreign Server Supplemental Properties
No
These properties define how tasks that use this data source access the source system.
NoteFor information about the supported parameters, see future updates to the official documentation. The available parameters will be expanded as the product evolves.
-
-
Click OK to create the external data source.
-
On the External Data Source page, find the data source and click Details in the Actions column to view its details.
Step 3: Create an external project
-
Log in to the MaxCompute console and select a region in the upper-left corner.
-
In the left-side navigation pane, choose .
-
On the External Project tab, click Create Project.
-
In the Create Project dialog box, configure the settings and click OK.
Parameter
Required
Description
Project Type
Yes
The default value is External Project.
Region
Yes
The current region is selected by default and cannot be changed.
Project Name (Globally Unique)
Yes
The name must be 3 to 28 characters in length, start with a letter, and contain only letters, digits, and underscores (_).
MaxCompute Foreign Server Type
No
The default value is Paimon_DLF.
MaxCompute Foreign Server
No
-
Use Existing: Displays a list of existing external data sources.
-
Create Foreign Server: Allows you to create and use a new external data source.
MaxCompute Foreign Server Name
Yes
-
If you chose to use an existing data source, select its name from the drop-down list.
-
If you chose to create a new data source, its name is automatically used.
Data Catalog
Yes
The DLF data catalog.
Billing Method
Yes
Select Subscription or Pay-as-you-go.
Default Quota
Yes
Select an existing quota.
Description
No
Enter a custom project description.
-
Step 4: Use SQL to access the data source
Deleting an external project does not delete the underlying data because the project is only a mapping to the data source.
However, unlike with standard external tables, running a DROP TABLE or DROP SCHEMA command in an external project sends the request to the peer service. This permanently deletes the corresponding table or database. Use DROP operations with caution.
-
Choose a connection tool to log on to the external project.
The following example shows how to use an SQL node for data development in a DataWorks workspace (new version).
-
Log on to the DataWorks console, and select a region in the upper-left corner.
-
In the left-side navigation pane, select Workspace.
-
On the Workspaces page, click Create Workspace or click the name of an existing workspace.
-
On the Workspace Details page, click Computing Resource in the left-side navigation pane.
-
On the Computing Resource page, click Associate Computing Resources, and select MaxCompute.
-
In the Associate MaxCompute Computing Resource dialog box, fill in the Basic Information.
For the MaxCompute project, select the external project.
-
-
List the schemas in the external project.
-- Enable schema syntax at the session level. SET odps.namespace.schema=true; SHOW schemas; -- Sample result. ID = 20250919****am4qb default system OK -
List the tables in a schema within the external project.
If you need to access tables from a production external project in the current environment, add
use external_project_pro;before all SQL statements.-- schema_name is the schema name returned by SHOW SCHEMAS in the external project. SET odps.namespace.schema=true; USE schema <schema_name>; SHOW tables; -- Result ID = 20250919****am4qb acs:ram::<uid>:root emp OK -
Create a new schema in the external project.
-- In this example, schema_name is schema_demo_test. CREATE schema <schema_name>; -
Use the new schema.
USE schema <schema_name>; -
Create a table and insert data into the schema.
If you are logged on as a RAM user, this operation requires permissions on the DLF catalog. For information about how to grant permissions, see Data authorization management.
-
Command format:
-- Create a table. CREATE TABLE [IF NOT EXISTS] <table_name> ( <col_name> <data_type>, ... ) [COMMENT <table_comment>] [PARTITIONED BY (<col_name> <data_type>, ...)] ; -- Insert data. INSERT {INTO|OVERWRITE} TABLE <table_name> [PARTITION (<pt_spec>)] [(<col_name> [,<col_name> ...)]] <select_statement> FROM <from_statement> -
Example:
NoteFor low-precision TIMESTAMP types from the source Paimon table, data is truncated during write operations as follows: precision 0–3 is truncated to 3 decimal places, 4–6 to 6 decimal places, and 7–9 to 9 decimal places.
CREATE TABLE schema_table(id int,name string); INSERT INTO schema_table VALUES (101,'Zhang San'),(102,'Li Si'); -- Query the schema_table table. SELECT * FROM schema_table; -- Result +------------+------------+ | id | name | +------------+------------+ | 101 | Zhang San | | 102 | Li Si | +------------+------------+
-
-
Switch to the existing
defaultschema.use schema default; SHOW tables; -- Sample result ID = 20250919*******yg5 acs:ram::<uid>:root emp acs:ram::<uid>:root emp_detail acs:ram::<uid>:root test_table OK -- Read data from the table. SELECT * FROM test_table; -- Result +------------+------------+ | id | name | +------------+------------+ | 101 | Zhang San | | 102 | Li Si | +------------+------------+ -- Write data to the table and query to verify the write operation. INSERT INTO test_table VALUES (103,'Wang Wu'); SELECT * FROM test_table; -- Result +------------+------------+ | id | name | +------------+------------+ | 101 | Zhang San | | 102 | Li Si | | 103 | Wang Wu | +------------+------------+
Passing Paimon table properties
Apache Paimon includes core configuration options. When creating a Paimon table in an external project, specify these options in the TBLPROPERTIES clause of your CREATE TABLE statement.
Configuration: Add parameters prefixed with mcfed. to the TBLPROPERTIES list. The parameter names following the prefix must match the native Paimon parameter names.
Example
Create a Paimon external table with buckets, a primary key, and partitions
-
Create the table and configure its external parameters
-- Switch to your external project. You can skip this step if you are already in it. use <your external project>; -- Enable schema syntax at the session level. SET odps.namespace.schema=true; -- Select the schema to use. use schema <your schema>; CREATE TABLE oss_extable_bucket_pk_pt_bucket ( id BIGINT, name STRING, dt STRING )tblproperties ( 'mcfed.bucket'='3', -- Number of buckets 'mcfed.bucket-key'='id', -- Bucket key. Optional if a primary key is specified. "mcfed.primary-key"="dt,id", -- Primary key "mcfed.partition"="dt" -- Partition field ); -
Insert data into the external table
INSERT INTO oss_extable_bucket_pk_pt_bucket PARTITION (dt='2025-06-18') VALUES (1, 'Alice'),(2, 'Bob'); INSERT INTO oss_extable_bucket_pk_pt_bucket PARTITION (dt='2025-06-19') VALUES (3, 'Charlie'),(4, 'David'),(5, 'Eva'); -
Query the external table
SELECT * FROM oss_extable_bucket_pk_pt_bucket; -- Result: +------------+---------+------------+ | id | name | dt | +------------+---------+------------+ | 1 | Alice | 2025-06-18 | | 2 | Bob | 2025-06-18 | | 4 | David | 2025-06-19 | | 3 | Charlie | 2025-06-19 | | 5 | Eva | 2025-06-19 | +------------+---------+------------+ -
Log on to the Data Lake Formation (DLF) console, and in the upper-left corner, select a region.
View the table details in the catalog:
The table name is
oss_extable_bucket_pk_pt_bucket, the table format is Paimon primary key table, the table type is Managed, the file format is parquet, and it is in the default database. Key settings in Table Properties includebucket: 3andmanifest.merge-min-count: 10. The field list includesid(BIGINT, not null, primary key, not a partition field),name(STRING, nullable, not a primary key, not a partition field), anddt(STRING, not null, primary key, partition field).
FAQ
Issue 1: Error creating an external project
Problem: When you attempt to create an external project, you receive the error message: "You are not authorized to perform this action."
In the error details, the error code is NoMCPermission.
Solution:
-
If you are using a RAM user, ensure that the
AliyunMaxComputeFullAccesspolicy is attached to the RAM user. For more information, see Step 1. -
This operation requires you to select a catalog from Data Lake Formation (DLF). Ensure you have the required permissions for DLF operations. For more information, see Step 1.
Issue 2: Error running SHOW TABLES in default schema
Problem: When you run the SHOW TABLES command in the default schema of an external project, you receive the following error message: "Forbidden:User acs: ram: :<uid>:user/** doesn't have privilege LIST on DATABASE default".
Solution:
-
Log on to the Data Lake Formation (DLF) console, and in the upper-left corner, select a region.
-
In the navigation pane on the left, choose System & Security
-
On the tab, refresh the page to check if the current RAM user exists.
-
If the user exists, go to the tab to grant permissions to the RAM user.
Issue 3: "invalid database operations on two-tier" error
Problem: When you run the SHOW SCHEMAS command in an external project, you receive the error message: "invalid database operations on two-tier".
Solution:
-
First, ensure that you are using an external project.
-
If you are using a DataWorks SQL node, click Debug Configuration in the right-side pane and verify that the bound compute resource is an external project.
-
If you are using DataWorks SQL analysis, click Data Source in the upper-right corner and verify that the bound data source is an external project.
-
If you are using MaxCompute SQL analysis, click Run Configuration in the right-side pane and verify that the project selected in the compute resource section is an external project.
-
If you are using odpscmd, check the value of the
project_nameparameter in the odps_config.ini file to ensure that it is set to the name of the external project.
-
-
After you confirm that you are using an external project, select and run the following SQL statements together.
-- Enable schema syntax at the session level. SET odps.namespace.schema=true; SHOW schemas;
Issue 4: SHOW SCHEMAS returns only default schema
Problem: When you run the SHOW SCHEMAS command in an external project, the command returns only 'default'.
Solution:
-
First, ensure that you are using an external project.
-
If you are using a DataWorks SQL node, click Debug Configuration in the right-side pane and verify that the bound compute resource is an external project.
-
If you are using DataWorks SQL analysis, click Data Source in the upper-right corner and verify that the bound data source is an external project.
-
If you are using MaxCompute SQL analysis, click Run Configuration in the right-side pane and verify that the project selected in the compute resource section is an external project.
-
If you are using odpscmd, check the value of the
project_nameparameter in the odps_config.ini file to ensure that it is set to the name of the external project.
-
-
After you confirm that you are using an external project, log on to the Data Lake Formation (DLF) console and check whether other databases exist in the catalog that is bound to the external project.
-
If you have completed the verifications in Steps 1 and 2, but the SHOW SCHEMAS command still does not return the expected databases, you can submit a ticket to report the issue.
Issue 5: "Dynamic bucket" error
Problem: When you run the SHOW TABLES command in an external project, you receive the following error message: ODPS-0110005: com.aliyun.odps.meta.exception.MetaException: com.aliyun.odps.common.table.na.NativeException: common/table/jni/utils/jni_helper.cpp(79): UnretryableException: Common table connector exception - ExceptionType: java.lang.IllegalArgumentException - Message: You should use dynamic bucket (bucket = -1) mode in cross partition update case (Primary key constraint [sending_time, symbol, sequence_number] not include all partition fields [pt]).[pt])
Solution:
This error occurs because MaxCompute does not support cross-partition primary key tables with fixed buckets.
Temporary workaround: Delete the unsupported cross-partition PK tables in the Data Lake Formation (DLF) console.
Issue 6: "Can't set default schema..." error
Problem: When you run SHOW TABLES or SHOW SCHEMAS, you receive the following error message: "FAILED: Can't set default schema if odps.namespace.schema is false."
Solution:
-
First, ensure that you have added the session flag
SET odps.namespace.schema=true;before your SQL statements to enable schema syntax. -
If the error persists after you enable the flag, verify the order of your statements. To read a table from another external project, use the following sequence of commands:
USE external_project; SET odps.namespace.schema=true; USE schema schema_name; SHOW tables;
Issue 7: "Failed to connect to...dlf.aliyuncs.com" error
Problem: When you try to read data from a table in a Paimon_DLF external project, you receive an error message similar to the following: "Caused by: java.net.ConnectException: Failed to connect to cn-hangzhou-intranet.dlf.aliyuncs.com/xx.xx.xx.xx:80"
This error indicates that you need to add the address cn-hangzhou-intranet.dlf.aliyuncs.com to the allowlist for the internal project where the job is running.
Solution:
-
Join the MaxCompute developer community DingTalk group by using the application link or by searching for the group ID
11782920in DingTalk. Contact the MaxCompute technical support team and request that they add the address from the error message to the allowlist. -
Once the address is added to the allowlist, include the following parameters before your read operations.
set odps.security.outbound.intranetlist=cn-hangzhou-intranet.dlf.aliyuncs.com:80; set odps.internet.access.list=cn-hangzhou-intranet.dlf.aliyuncs.com:80;
Data type mapping
For details on MaxCompute data types, see Data types (V1.0) and Data types (V2.0).
|
Paimon data type |
MaxCompute 2.0 data type |
Read/write support |
Description |
|
TINYINT |
TINYINT |
|
8-bit signed integer. |
|
SMALLINT |
SMALLINT |
|
16-bit signed integer. |
|
INT |
INT |
|
32-bit signed integer. |
|
BIGINT |
BIGINT |
|
64-bit signed integer. |
|
BINARY(MAX_LENGTH) |
BINARY |
|
Binary data type. The current maximum length is 8 MB. |
|
FLOAT |
FLOAT |
|
32-bit binary floating-point number. |
|
DOUBLE |
DOUBLE |
|
64-bit binary floating-point number. |
|
DECIMAL(precision,scale) |
DECIMAL(precision,scale) |
|
Exact decimal numeric type. The default is
|
|
VARCHAR(n) |
VARCHAR(n) |
|
Variable-length character type. n specifies the length and ranges from 1 to 65,535. |
|
CHAR(n) |
CHAR(n) |
|
Fixed-length character type. n specifies the length and ranges from 1 to 255. |
|
VARCHAR(MAX_LENGTH) |
STRING |
|
String type. The current maximum length is 8 MB. |
|
DATE |
DATE |
|
Date type. The format is |
|
TIME, TIME(p) |
Not supported |
|
The Paimon TIME data type represents a time without a time zone, consisting of hours, minutes, and seconds, with nanosecond precision. TIME(p) specifies the fractional-second precision from 0 to 9. The default value is 0. No corresponding type exists in MaxCompute. |
|
TIMESTAMP, TIMESTAMP(p) |
TIMESTAMP_NTZ |
|
A timestamp type without a time zone that is accurate to the nanosecond. To read this data type, you must disable the native JNI bridge by running the following command: |
|
TIMESTAMP WITH LOCAL TIME_ZONE(9) |
TIMESTAMP |
|
|
|
TIMESTAMP WITH LOCAL TIME_ZONE(9) |
DATETIME |
|
A timestamp type that is accurate to the nanosecond. The format is |
|
BOOLEAN |
BOOLEAN |
|
A BOOLEAN type. |
|
ARRAY |
ARRAY |
|
A complex type. |
|
MAP |
MAP |
|
A complex type. |
|
ROW |
STRUCT |
|
A complex type. |
|
MULTISET<t> |
Not supported |
|
No corresponding type exists in MaxCompute. |
|
VARBINARY, VARBINARY(n), BYTES |
BINARY |
|
A data type of variable-length binary strings. |