Filesystem catalog external projects

更新时间:
复制 MD 格式

You can use MaxCompute to create an external project by mapping a Paimon catalog directory in OSS. This creates a project that mirrors the standard Paimon file system hierarchy. Access permissions are based on the authorization granted to a user's RAM role for the OSS bucket, supporting read and write operations on both metadata and data in the Paimon lake format. With this catalog-level mapping, you are responsible for ensuring file system compliance and managing your own permissions and storage for files in the lake format. This approach is ideal for stream-batch integration scenarios where permissions are controllable and you can handle maintenance independently.

Limitations

  • Only tables in Paimon format are supported.

  • Writing to Dynamic Bucket tables is not supported.

  • Writing to Cross Partition tables is not supported.

  • For more information, see Data type mapping.

Procedure

Step 1: Grant permissions to a RAM user

If you are a RAM user, ensure the following policy is attached to your account. For instructions, see Manage RAM user permissions.

  • AliyunMaxComputeFullAccess: Grants the permissions required to create external data sources and external projects.

Step 2: Create a FileSystem Catalog external data source

  1. Activate OSS and create a bucket to store your Paimon data. For more information, see Quick start.

  2. Log on to the MaxCompute console, and select a region in the upper-left corner.

  3. In the navigation pane on the left, choose Manage Configurations > External Data Source.

  4. On the External Data Source page, click Create External Data Source.

  5. In the Create External Data Source dialog box, configure the parameters. The following tables describe the parameters.

    Parameter

    Required

    Description

    External Data Source Type

    Yes

    Select FileSystem Catalog.

    External Data Source Name

    Yes

    The name of the external data source. The name must meet the following requirements:

    • It must start with a letter and can contain only lowercase letters, digits, and underscores (_).

    • It must be 128 characters or fewer.

    Example: external_fs.

    Description

    No

    Optional. A description of the external data source.

    Region

    Yes

    Defaults to the current region.

    Authentication and Authorization

    Yes

    Defaults to Alibaba Cloud RAM role.

    Role ARN

    Yes

    The Alibaba Cloud Resource Name (ARN) of the RAM role. This role must have permissions to access OSS.

    1. Log on to the Resource Access Management (RAM) console.

    2. In the navigation pane on the left, choose Identities > Roles.

    3. In the Basic Information section, you can find the ARN.

    Example: acs:ram::124****:role/aliyunodpsdefaultrole.

    Storage Type

    Yes

    • OSS

    • OSS-HDFS

    Endpoint

    Yes

    The endpoint is automatically generated. For the China (Hangzhou) region, the endpoint is oss-cn-hangzhou-internal.aliyuncs.com.

    Foreign Server Supplemental Properties

    No

    Additional properties that define how tasks that use this data source access the source system.

    Note

    For information about the supported parameters, see the latest updates in the official documentation. More parameters will be gradually made available as the product evolves.

  6. Click OK to create the external data source.

  7. On the External Data Source page, find the target data source and click Details in the Actions column.

Step 3: Create an external project

  1. Log on to the MaxCompute console, and select a region in the upper-left corner.

  2. In the navigation pane on the left, choose Manage Configurations > Projects.

  3. On the External Project tab, click Create Project.

  4. In the Create Project dialog box, configure the project information as prompted and click OK.

    Parameter

    Required

    Description

    Project Type

    Yes

    Defaults to external project.

    Region

    Yes

    Defaults to the current region. This parameter cannot be modified.

    Project Name (Globally Unique)

    Yes

    Must be 3 to 28 characters long, start with a letter, and contain only letters, digits, and underscores (_).

    MaxCompute Foreign Server Type

    No

    Select FileSystem Catalog.

    MaxCompute Foreign Server

    No

    • Use Existing: Select an existing external data source.

    • Create Foreign Server: Create and use a new external data source.

    MaxCompute Foreign Server Name

    Yes

    • Use Existing: Select the name of an existing external data source from the drop-down list.

    • Create External Data Source: The name of the new external data source is used.

    Bucket Catalog

    Yes

    Select the full path from the OSS bucket to the catalog-level file system directory. Example: oss://paimon-fs/paimon-test/.

    Table Format

    Yes

    Defaults to Paimon.

    Billing Method

    Yes

    Subscription or Pay-as-you-go.

    Default Quota

    Yes

    Select an existing quota.

    Description

    No

    A custom description for the project.

Step 4: Use SQL to access data

Important

Deleting an external project does not delete the underlying data because the project is only a mapping to the data source.

However, unlike with standard external tables, running a DROP TABLE or DROP SCHEMA command in an external project sends the request to the peer service. This permanently deletes the corresponding table or database. Use DROP operations with caution.

  1. Select a connection tool to log on to the external project.

  2. List the schemas in the external project. By default, only the database paths that store Paimon tables are listed.

    -- Enable schema syntax at the session level.
    SET odps.namespace.schema=true;
    SHOW schemas;
    
    -- The following result is returned.
    ID = 20250922********wbh2u7
    default
    
    
    OK
  3. List the tables in a schema within the external project.

    --  is the name of the schema displayed in the external project.
    USE SCHEMA <schema_name>; 
    SHOW tables;
  4. Create a new schema in the external project.

    CREATE schema <schema_name>;
    
    -- Example:
    CREATE schema schema_test;
  5. Use the new schema.

    use schema <schema_name>;
    
    -- Example:
    use schema schema_test;
  6. Create a table in the schema and insert data.

    • Syntax:

      -- Create a table.
      CREATE TABLE [IF NOT EXISTS] <table_name> 
      (
        <col_name> <data_type>,
        ...
      )
      [COMMENT <table_comment>]
      [PARTITIONED BY (<col_name> <data_type>, ...)] 
      ;
      
      -- Insert data.
      INSERT {INTO|OVERWRITE} TABLE <table_name> [PARTITION (<pt_spec>)] [(<col_name> [,<col_name> ...)]]
      <select_statement>
      FROM <from_statement>
    • Example:

      CREATE TABLE new_table(id INT,name STRING);
      
      INSERT INTO new_table VALUES (101,'Alice'),(102,'Bob');
      
      -- Query the new_table table.
      SELECT * FROM new_table;
      
      -- The following result is returned.
      +------------+------------+
      | id         | name       | 
      +------------+------------+
      | 101        | Alice      | 
      | 102        | Bob        | 
      +------------+------------+

Configure Paimon table properties

Apache Paimon includes specific core configuration options. When you create a Paimon table in an external project, you can configure these options by adding them to the TBLPROPERTIES clause of the Paimon external table.

Configuration method: Add parameters with the prefix mcfed. to the TBLPROPERTIES list. The parameter names must be the same as the native parameters of open-source Paimon.

Example

Configuring buckets, a primary key, and a partition key

  1. Create the table and configure its properties

    -- Switch to the external project. This is not needed if you are already in it.
    use <your external project>;
    
    -- Enable schema syntax for the current session.
    SET odps.namespace.schema=true;
    
    -- Select the schema to use.
    use schema <your schema>;
    
    CREATE TABLE oss_extable_bucket_pk_pt_bucket
    (
        id BIGINT,
        name STRING,
        dt STRING
    )tblproperties (
        'mcfed.bucket'='3', -- Number of buckets
        'mcfed.bucket-key'='id', -- Bucket key. Optional if a primary key is specified.
        "mcfed.primary-key"="dt,id", -- Primary key
        "mcfed.partition"="dt" -- Partition field
        );
  2. Insert data into the external table

    INSERT INTO oss_extable_bucket_pk_pt_bucket PARTITION (dt='2025-06-18') 
      VALUES (1, 'Alice'),(2, 'Bob');
    INSERT INTO oss_extable_bucket_pk_pt_bucket PARTITION (dt='2025-06-19') 
      VALUES (3, 'Charlie'),(4, 'David'),(5, 'Eva');
  3. Query the external table

    SELECT * FROM oss_extable_bucket_pk_pt_bucket;
    
    -- Sample output:
    +------------+---------+------------+
    | id         | name    | dt         |
    +------------+---------+------------+
    | 1          | Alice   | 2025-06-18 |
    | 2          | Bob     | 2025-06-18 |
    | 4          | David   | 2025-06-19 |
    | 3          | Charlie | 2025-06-19 |
    | 5          | Eva     | 2025-06-19 |
    +------------+---------+------------+
    
  4. Log on to the Data Lake Formation (DLF) console and select a region in the upper-left corner.

    View the details of the table created in the catalog:

    image

Data type mapping

For more information about MaxCompute data types, see Data types (Version 1.0) and Data types (Version 2.0).

Paimon data type

MaxCompute 2.0 data type

Read/write support

Description

TINYINT

TINYINT

Supported

8-bit signed integer.

SMALLINT

SMALLINT

Supported

16-bit signed integer.

INT

INT

Supported

32-bit signed integer.

BIGINT

BIGINT

Supported

64-bit signed integer.

BINARY(MAX_LENGTH)

BINARY

Supported

Binary data type. The current length limit is 8 MB.

FLOAT

FLOAT

Supported

32-bit binary floating-point type.

DOUBLE

DOUBLE

Supported

64-bit binary floating-point type.

DECIMAL(precision,scale)

DECIMAL(precision,scale)

Supported

A decimal exact numeric type. The default is decimal(38,18). You can customize the precision and scale values.

  • precision: Specifies the maximum number of digits. Value range: 1 <= precision <= 38.

  • scale: Specifies the number of digits in the decimal part. The default value range is 0 <= scale <= 18.

VARCHAR(n)

VARCHAR(n)

Supported

Variable-length character type. n is the length, which must be in the range of [1, 65535].

CHAR(n)

CHAR(n)

Supported

Fixed-length character type. n is the length, which must be in the range of [1, 255].

VARCHAR(MAX_LENGTH)

STRING

Supported

The current length limit for the string type is 8 MB.

DATE

DATE

Supported

The date type format is yyyy-mm-dd.

TIME, TIME(p)

Not supported

Not supported

Represents a time of day without a time zone, with nanosecond precision.

TIME(p) indicates the precision of the fractional part, which must be in the range of 0 to 9. The default is 0.

MaxCompute has no corresponding type.

TIMESTAMP, TIMESTAMP(p)

TIMESTAMP_NTZ

Supported

A timestamp type without a time zone, precise to nanoseconds.

Reading a table requires disabling the native featureSET odps.sql.common.table.jni.disable.native=true;

TIMESTAMP WITH LOCAL TIME_ZONE(9)

TIMESTAMP

Supported

  • A timestamp data type with nanosecond precision in the format yyyy-mm-dd hh:mm:ss.xxxxxxxxx.

  • When writing data, Paimon TIMESTAMP values are normalized based on their precision. Values with a precision of 0-3 are written with 3 fractional digits, 4-6 with 6 fractional digits, and 7-9 with 9 fractional digits.

TIMESTAMP WITH LOCAL TIME_ZONE(9)

DATETIME

Not supported

A timestamp type precise to nanoseconds.

The format is yyyy-mm-dd hh:mm:ss.xxxxxxxxx.

BOOLEAN

BOOLEAN

Supported

BOOLEAN type.

ARRAY

ARRAY

Supported

Complex type.

MAP

MAP

Supported

Complex type.

ROW

STRUCT

Supported

Complex type.

MULTISET<t>

Not supported

Not supported

MaxCompute has no corresponding type.

VARBINARY, VARBINARY(n), BYTES

BINARY

Supported

Variable-length binary string data type.