Paimon_DLF external projects-MaxCompute(MaxCompute)-阿里云帮助中心

A Paimon_DLF external project maps a Data Lake Formation (DLF) catalog to MaxCompute for real-time metadata and data access. It delegates permission management to DLF and supports read/write operations on data in DLF-managed Object Storage Service (OSS), enabling catalog-level data lake mapping for Paimon-based cross-engine collaboration. The Paimon_DLF external project is currently in invitational preview.

Limitations

Table format: Only Paimon tables stored in OSS that are fully managed by DLF are supported.
Write limitations: Dynamic bucket tables and cross-partition tables do not support INSERT INTO or INSERT OVERWRITE.
Query limitations: You cannot use SHOW TABLES on cross-partition tables that have fixed buckets and whose primary key does not include the partition key.
External projects: You cannot perform schema evolution or partition operations.
Other limitations:
- You cannot perform resource or function operations.
- You cannot use these tables as a data source for DataWorks Data Integration.
Data type mappings

Procedure

Step 1: Grant permissions

Grant permissions to a RAM user

If you are a RAM user, attach the following permission policies. For information about how to add permissions, see Manage permissions for a RAM user.

AliyunRAMFullAccess: If a RAM user does not have this permission, an Alibaba Cloud account must grant it.
AliyunMaxComputeFullAccess: Required to create external data sources and external projects.
AliyunDLFReadOnlyAccess: Required to create external projects. This permission policy is necessary because creating a Paimon_DLF external project requires the List permission for the DLF catalog.

Authorize and activate DLF

Authorize cloud resource access

The first time you use Data Lake Formation (DLF), you must authorize it to access the required cloud resources.

Log on to the Data Lake Formation (DLF) console, and in the upper-left corner, select a region.

To the right of Permissions to access cloud resources are granted., click Authorize.
Activate Data Lake Formation

Select a region. MaxCompute and DLF must be deployed in the same region. Activate DLF before you use it to ensure its availability in the target region.

To the right of DLF is activated., click Activate.

Grant DLF permissions to MaxCompute

MaxCompute accesses Paimon_DLF data through an external data source and an external project. Before you proceed, you must grant the necessary DLF data permissions to the MaxCompute access account.

The "Use task executor identity" option passes the task executor's identity to DLF. Both MaxCompute and DLF then enforce permissions based on this identity. You must first create a service-linked role for MaxCompute and grant it the necessary permissions:

Log in to the RAM console.
In the left navigation bar, select Identities > Roles.
On the Roles page, click Create Role.
In the upper-right corner of the Create Role page, click Create Service Linked Role.
On the Create Service Linked Role page, select AliyunServiceRoleForMaxComputeLakehouse for Select Service and click Create Service Linked Role.

If you are prompted that the role already exists, it has already been authorized and you can ignore the prompt.

Permission model

Control plane: These permissions are checked when you perform operations in the MaxCompute console, primarily during the creation of a MaxCompute external project and its binding to a DLF catalog.

These permissions are managed by RAM. Your Alibaba Cloud account must log on to the RAM console to configure the required permissions. For more information, see Manage permissions for a RAM user.
Data plane: These permissions are checked when you read from or write to Paimon tables after you log on to the external project. This typically occurs when you create, maintain, or use schemas, tables, and other resources within the bound DLF catalog by running SQL statements in MaxCompute.

These permissions are managed by the DLF console. Your Alibaba Cloud account must log on to the DLF console to configure the required permissions. For more information, see Grant permissions to a RAM user.

Step 2: Create a Paimon_DLF external data source

Log in to the MaxCompute console and select a region in the upper-left corner.
In the left-side navigation pane, choose Manage Configurations > External Data Source .
On the External Data Source page, click Create External Data Source.

In the Create External Data Source dialog box, configure the parameters. The following tables describe the parameters.

Parameter	Required	Description
External Data Source Type	Yes	Select Paimon_DLF.
External Data Source Name	Yes	Enter a custom name. The name must meet the following requirements: It must start with a letter and can contain only lowercase letters, digits, and underscores (_). It cannot exceed 128 characters in length. For example, you can enter `paimon_dlf`.
Description	No	Enter a description as needed.
Region	Yes	The current region is selected by default.
Authentication and Authorization	Yes	The default value is an Alibaba Cloud RAM role.
Service-linked Role	Yes	The role is generated by default.
Endpoint	Yes	The endpoint is automatically generated. For the China (Hangzhou) region, the endpoint is `cn-hangzhou-intranet.dlf.aliyuncs.com`.
Foreign Server Supplemental Properties	No	These properties define how tasks using this data source access the source system. Note For information about supported parameters, see future updates to the official documentation. Available parameters will be expanded as the product evolves.

Click OK to create the external data source.
On the External Data Source page, find the data source and click Details in the Actions column to view its details.

Step 3: Create an external project

Log in to the MaxCompute console and select a region in the upper-left corner.
In the left-side navigation pane, choose Manage Configurations > Projects.
On the External Project tab, click Create Project.

In the Create Project dialog box, configure the settings and click OK.

Parameter	Required	Description
Project Type	Yes	The default value is External Project.
Region	Yes	The current region is selected by default and cannot be changed.
Project Name (Globally Unique)	Yes	The name must be 3 to 28 characters in length, start with a letter, and contain only letters, digits, and underscores (_).
MaxCompute Foreign Server Type	No	The default value is Paimon_DLF.
MaxCompute Foreign Server	No	Use Existing: Displays a list of existing external data sources. Create Foreign Server: Allows you to create and use a new external data source.
MaxCompute Foreign Server Name	Yes	If you chose to use an existing data source, select its name from the drop-down list. If you chose to create a new data source, its name is automatically used.
Data Catalog	Yes	The DLF data catalog.
Billing Method	Yes	Select Subscription or Pay-as-you-go.
Default Quota	Yes	Select an existing quota.
Description	No	Enter a custom project description.

Step 4: Use SQL to access the data source

Important

Deleting an external project does not delete the underlying data because the project is only a mapping to the data source.

However, unlike with standard external tables, running a DROP TABLE or DROP SCHEMA command in an external project sends the request to the peer service. This permanently deletes the corresponding table or database. Use DROP operations with caution.

Choose a connection tool to log on to the external project.

The following example uses an SQL node for data development in a DataWorks workspace (new version).
1. Log in to the DataWorks console and select a region in the upper-left corner.
2. In the navigation pane on the left, choose Workspace.
3. On the Workspaces page, click Create Workspace or click the name of an existing workspace.
4. On the Workspace Details page, click Computing Resource in the navigation pane on the left.
5. On the Computing Resource page, click Associate Computing Resources, and select MaxCompute.
6. Configure the Basic Information for Associate MaxCompute Computing Resource.
  
  For the MaxCompute project, select the external project.

List the schemas in the external project.

-- Enable schema syntax at the session level.
SET odps.namespace.schema=true;
SHOW schemas;
-- Sample result. 
ID = 20250919****am4qb
default
system
OK

List the tables in a schema within the external project.

If you need to access tables from a production external project in the current environment, add use external_project_pro; before all SQL statements.

-- schema_name is the schema name returned by SHOW SCHEMAS in the external project.
SET odps.namespace.schema=true;
USE schema <schema_name>;
SHOW tables;
-- Result
ID = 20250919****am4qb
acs:ram::<uid>:root  emp
OK

Create a new schema in the external project.

-- In this example, schema_name is schema_demo_test.
CREATE schema <schema_name>;

Use the new schema.
```
USE schema <schema_name>;
```

Create a table and insert data into the schema.

If you are logged on as a RAM user, this operation requires permissions on the DLF catalog. For information about how to grant permissions, see Data authorization management.

Command format:

-- Create a table.
CREATE TABLE [IF NOT EXISTS] <table_name> 
(
  <col_name> <data_type>,
  ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)] 
;
-- Insert data.
INSERT {INTO|OVERWRITE} TABLE <table_name> [PARTITION (<pt_spec>)] [(<col_name> [,<col_name> ...)]]
<select_statement>
FROM <from_statement>

Example:

Note

For low-precision TIMESTAMP types from the source Paimon table, data is truncated during write operations as follows: precision 0–3 is truncated to 3 decimal places, 4–6 to 6 decimal places, and 7–9 to 9 decimal places.

CREATE TABLE schema_table(id int,name string);
INSERT INTO schema_table VALUES (101,'Zhang San'),(102,'Li Si');
-- Query the schema_table table.
SELECT * FROM schema_table;
-- Result
+------------+------------+
| id         | name       | 
+------------+------------+
| 101        | Zhang San  | 
| 102        | Li Si      | 
+------------+------------+

Switch to the existing default schema.

use schema default;
SHOW tables;
-- Sample result
ID = 20250919*******yg5
acs:ram::<uid>:root	emp
acs:ram::<uid>:root	emp_detail
acs:ram::<uid>:root	test_table
OK
-- Read data from the table.
SELECT * FROM test_table;
-- Result
+------------+------------+
| id         | name       | 
+------------+------------+
| 101        | Zhang San  | 
| 102        | Li Si      | 
+------------+------------+
-- Write data to the table and query to verify the write operation.
INSERT INTO test_table VALUES (103,'Wang Wu');
SELECT * FROM test_table;
-- Result
+------------+------------+
| id         | name       | 
+------------+------------+
| 101        | Zhang San  | 
| 102        | Li Si      | 
| 103        | Wang Wu    | 
+------------+------------+

Passing Paimon table properties

Apache Paimon provides core configuration options. When creating a Paimon table in an external project, specify these options in the TBLPROPERTIES clause of your CREATE TABLE statement.

Configuration: Add parameters prefixed with mcfed. to the TBLPROPERTIES list. The parameter names after the prefix must match the native Paimon parameter names.

Example

Create a Paimon external table with buckets, a primary key, and partitions

Create the table and configure its external parameters

-- Switch to your external project. You can skip this step if you are already in it.
use <your external project>;
-- Enable schema syntax at the session level.
SET odps.namespace.schema=true;
-- Select the schema to use.
use schema <your schema>;
CREATE TABLE oss_extable_bucket_pk_pt_bucket
(
    id BIGINT,
    name STRING,
    dt STRING
)tblproperties (
    'mcfed.bucket'='3', -- Number of buckets
    'mcfed.bucket-key'='id', -- Bucket key. Optional if a primary key is specified.
    "mcfed.primary-key"="dt,id", -- Primary key
    "mcfed.partition"="dt" -- Partition field
    );

Insert data into the external table

INSERT INTO oss_extable_bucket_pk_pt_bucket PARTITION (dt='2025-06-18') 
  VALUES (1, 'Alice'),(2, 'Bob');
INSERT INTO oss_extable_bucket_pk_pt_bucket PARTITION (dt='2025-06-19') 
  VALUES (3, 'Charlie'),(4, 'David'),(5, 'Eva');

Query the external table

SELECT * FROM oss_extable_bucket_pk_pt_bucket;
-- Result:
+------------+---------+------------+
| id         | name    | dt         |
+------------+---------+------------+
| 1          | Alice   | 2025-06-18 |
| 2          | Bob     | 2025-06-18 |
| 4          | David   | 2025-06-19 |
| 3          | Charlie | 2025-06-19 |
| 5          | Eva     | 2025-06-19 |
+------------+---------+------------+

Log on to the Data Lake Formation (DLF) console, and in the upper-left corner, select a region.

View the table details in the catalog.

FAQ

Issue 1: Error creating an external project

Problem: When you attempt to create an external project, you receive the error message: "You are not authorized to perform this action."

In the error details, the error code is NoMCPermission.

Solution:

If you are using a RAM user, ensure that the AliyunMaxComputeFullAccess policy is attached to the RAM user. For more information, see Step 1.
This operation requires you to select a catalog from Data Lake Formation (DLF). Ensure you have the required permissions for DLF operations. For more information, see Step 1.

Issue 2: Error running SHOW TABLES in default schema

Problem: When you run the SHOW TABLES command in the default schema of an external project, you receive the following error message: "Forbidden:User acs: ram: :<uid>:user/** doesn't have privilege LIST on DATABASE default".

Solution:

Log on to the Data Lake Formation (DLF) console, and in the upper-left corner, select a region.
In the navigation pane on the left, choose System & Security
On the Access Control > Users tab, refresh the page to check if the current RAM user exists.
If the user exists, go to the Access Control > Role tab to grant permissions to the RAM user.

Issue 3: "invalid database operations on two-tier" error

Problem: When you run the SHOW SCHEMAS command in an external project, you receive the error message: "invalid database operations on two-tier".

Solution:

First, ensure that you are using an external project.
1. If you are using a DataWorks SQL node, click Debug Configuration in the right-side pane and verify that the bound compute resource is an external project.
2. If you are using DataWorks SQL analysis, click Data Source in the upper-right corner and verify that the bound data source is an external project.
3. If you are using MaxCompute SQL analysis, click Run Configuration in the right-side pane and verify that the project selected in the compute resource section is an external project.
4. If you are using odpscmd, check the value of the project_name parameter in the odps_config.ini file to ensure that it is set to the name of the external project.
After you confirm that you are using an external project, select and run the following SQL statements together.
```
-- Enable schema syntax at the session level.
SET odps.namespace.schema=true;
SHOW schemas;
```

Issue 4: SHOW SCHEMAS returns only default schema

Problem: When you run the SHOW SCHEMAS command in an external project, the command returns only 'default'.

Solution:

First, ensure that you are using an external project.
1. If you are using a DataWorks SQL node, click Debug Configuration in the right-side pane and verify that the bound compute resource is an external project.
2. If you are using DataWorks SQL analysis, click Data Source in the upper-right corner and verify that the bound data source is an external project.
3. If you are using MaxCompute SQL analysis, click Run Configuration in the right-side pane and verify that the project selected in the compute resource section is an external project.
4. If you are using odpscmd, check the value of the project_name parameter in the odps_config.ini file to ensure that it is set to the name of the external project.
After you confirm that you are using an external project, log on to the Data Lake Formation (DLF) console and check whether other databases exist in the catalog that is bound to the external project.
If you have completed the verifications in Steps 1 and 2, but the SHOW SCHEMAS command still does not return the expected databases, submit a ticket to report the issue.

Issue 5: "Dynamic bucket" error

Problem: When you run the SHOW TABLES command in an external project, you receive the following error message:

ODPS-0110005: 
com.aliyun.odps.meta.exception.MetaException: 
com.aliyun.odps.common.table.na.NativeException: 
common/table/jni/utils/jni_helper.cpp(79): UnretryableException: Common table connector exception 
  - ExceptionType: java.lang.IllegalArgumentException 
  - Message: You should use dynamic bucket (bucket = -1) mode in cross partition update case (Primary key constraint [sending_time, symbol, sequence_number] not include all partition fields [pt]).[pt])

Solution:

This error occurs because MaxCompute does not support cross-partition primary key tables with fixed buckets.

Temporary workaround: Delete the unsupported cross-partition PK tables in the Data Lake Formation (DLF) console.

Issue 6: "Can't set default schema..." error

Problem: When you run SHOW TABLES or SHOW SCHEMAS, you receive the following error message: "FAILED: Can't set default schema if odps.namespace.schema is false."

Solution:

First, ensure that you have added the session flag SET odps.namespace.schema=true; before your SQL statements to enable schema syntax.
If the error persists after you enable the flag, verify the order of your statements. To read a table from another external project, use the following sequence of commands:
```
USE external_project;
SET odps.namespace.schema=true;
USE schema schema_name;
SHOW tables;
```

Issue 7: "Failed to connect to...dlf.aliyuncs.com" error

Problem: When you try to read data from a table in a Paimon_DLF external project, you receive an error message similar to the following: "Caused by: java.net.ConnectException: Failed to connect to cn-***-intranet.dlf.aliyuncs.com/xx.xx.xx.xx:80"

This error indicates that you need to add the address cn-***-intranet.dlf.aliyuncs.com to the allowlist of the internal project where the job runs.

Solution:

Join the MaxCompute developer community DingTalk group by using the application link or by searching for the group ID 11782920 in DingTalk. Contact the MaxCompute technical support team and request that they add the address from the error message to the allowlist.

Once the address is added to the allowlist, include the following parameters before your read operations.

set odps.security.outbound.intranetlist=cn-***-intranet.dlf.aliyuncs.com:80;
set odps.internet.access.list=cn-***-intranet.dlf.aliyuncs.com:80;

Data type mapping

For details on MaxCompute data types, see Data types (V1.0) and Data types (V2.0).

Paimon data type	MaxCompute 2.0 data type	Read/write support	Description
TINYINT	TINYINT	Supported	8-bit signed integer.
SMALLINT	SMALLINT	Supported	16-bit signed integer.
INT	INT	Supported	32-bit signed integer.
BIGINT	BIGINT	Supported	64-bit signed integer.
BINARY(MAX_LENGTH)	BINARY	Supported	Binary data type. The current maximum length is 8 MB.
FLOAT	FLOAT	Supported	32-bit binary floating-point number.
DOUBLE	DOUBLE	Supported	64-bit binary floating-point number.
DECIMAL(precision,scale)	DECIMAL(precision,scale)	Supported	Exact decimal numeric type. The default is `decimal(38,18)`. You can customize the precision and scale values. precision: The maximum number of digits. The valid values are `1` to `38`. scale: The number of digits in the fractional part. The valid values are `0` to `18`.
VARCHAR(n)	VARCHAR(n)	Supported	Variable-length character type. n specifies the length and ranges from 1 to 65,535.
CHAR(n)	CHAR(n)	Supported	Fixed-length character type. n specifies the length and ranges from 1 to 255.
VARCHAR(MAX_LENGTH)	STRING	Supported	String type. The current maximum length is 8 MB.
DATE	DATE	Supported	Date type. The format is `yyyy-mm-dd`.
TIME, TIME(p)	Not supported	Not supported	The Paimon TIME data type represents a time without a time zone, consisting of hours, minutes, and seconds, with nanosecond precision. TIME(p) specifies the fractional-second precision from 0 to 9. The default value is 0. No corresponding type exists in MaxCompute.
TIMESTAMP, TIMESTAMP(p)	TIMESTAMP_NTZ	Supported	A timestamp type without a time zone that is accurate to the nanosecond. To read this data type, you must disable the native JNI bridge by running the following command: `SET odps.sql.common.table.jni.disable.native=true;`
TIMESTAMP WITH LOCAL TIME_ZONE(9)	TIMESTAMP	Supported	A timestamp type that is accurate to the nanosecond. The format is `yyyy-mm-dd hh:mm:ss.xxxxxxxxx`. For low-precision TIMESTAMP types in the Paimon source table, MaxCompute truncates values during writes. For precision 0-3, data is truncated to 3 digits. For precision 4-6, data is truncated to 6 digits. For precision 7-9, data is truncated to 9 digits.
TIMESTAMP WITH LOCAL TIME_ZONE(9)	DATETIME	Not supported	A timestamp type that is accurate to the nanosecond. The format is `yyyy-mm-dd hh:mm:ss.xxxxxxxxx`.
BOOLEAN	BOOLEAN	Supported	A BOOLEAN type.
ARRAY	ARRAY	Supported	A complex type.
MAP	MAP	Supported	A complex type.
ROW	STRUCT	Supported	A complex type.
MULTISET<t>	Not supported	Not supported	No corresponding type exists in MaxCompute.
VARBINARY, VARBINARY(n), BYTES	BINARY	Supported	A data type of variable-length binary strings.