MaxCompute-managed Iceberg tables (beta)

更新时间:
复制 MD 格式
Important

This feature is in private preview. To apply, complete the Trial Application Form. The MaxCompute team reviews applications within 3 business days and sends results by SMS. For questions about the result, contact us through the Application Link.

Overview

MaxCompute stores Apache Iceberg tables in Object Storage Service (OSS) with unified metadata, permission, and lifecycle management. Iceberg tables are compatible with open-source engines such as Spark, Flink, Trino, and Presto, enabling multi-engine data sharing for a Lakehouse architecture.

The open Iceberg format eliminates data silos and enables seamless data sharing with Spark, Flink, and Hive. You can choose the right compute engine for each workload without vendor lock-in. The open standard also ensures long-term data portability and supports cross-organizational collaboration in multi-cloud scenarios.

MaxCompute-managed Iceberg tables offer the following features:

  • Iceberg lake table data management:

    Create, read, and write Iceberg tables on OSS using MaxCompute SQL. Supports DDL, DML, time travel, schema evolution, and partition evolution.

  • Automated background maintenance:

    Built-in optimization handles compaction, snapshot expiration, and metadata archiving automatically, eliminating manual maintenance.

  • Metadata consistency:

    MaxCompute Metastore (MaxMeta) manages table metadata with ACID-compliant writes. External engines must use the MaxCompute Storage Write API or a MaxCompute-native engine, preventing metadata conflicts from multiple write sources.

  • Open ecosystem integration:

    Trino, Presto, Spark, Flink, and Doris can access managed Iceberg tables natively, enabling low-latency, high-concurrency multi-engine queries.

  • Unified permission management: Define permissions for your data lake and data warehouse in a single system. Integrates RAM permissions with data warehouse access control for enterprise-grade Lakehouse security.

Usage notes

The following operations on a MaxCompute-managed Iceberg table can cause data loss or make data unreadable:

Forbidden operation

Consequence

Correct practice

Modifying Iceberg table data through any interface other than MaxCompute write APIs.

The table may fail consistency checks and become unreadable.

Modify data only through MaxCompute SQL.

Uploading files to the managed OSS path.

The background service treats uploaded files as orphan files and deletes them.

Do not upload data to the managed path.

Sharing the same OSS path across multiple managed Iceberg tables.

Each table's garbage collection process deletes the other table's files.

Use a unique OSS path for each table.

Using OSS paths with a parent-child directory relationship for different managed Iceberg tables.

Each table's garbage collection treats the other table's files as orphan data and deletes them, causing data loss.

Use a unique OSS path for each managed Iceberg table. Avoid paths with parent-child relationships.

Create a table

Syntax

CREATE ICEBERG TABLE [IF NOT EXISTS] <table name> (
  <col_name> <data_type>,
  ...
)
PARTITIONED BY (<partitionExpression>)
WITH CONNECTION <connection name>
OPTIONS(
  location='<oss_location>'
)
;

Parameters

Click to expand parameter details

Parameter

Required

Description

table name

Yes

The table name.

Case-insensitive. Case conversion is not supported.

col_name

Yes

The column name.

The schema must match the OSS data files. Mismatched schemas prevent data reads.

data_type

Yes

The column data type.

Each column's data type must match the corresponding type in the OSS data files. Mismatched types prevent data reads.

table_comment

No

A table comment. Must be a valid string of no more than 1,024 bytes.

partitionExpression

No

A partition expression with the same semantics as Iceberg partition expressions. Supported functions:

  • <col_name>: The partition value is the raw value of the column.

  • BUCKET ( <num_buckets> , <col_name> ): Hashes the column value modulo N. Result range: 0 to N-1.

  • TRUNCATE ( <width> , <col_name> ): Truncates the field value to the specified width for partitioning.

  • YEAR ( <col_name> ): Extracts the year from a date or timestamp.

  • MONTH ( <col_name> ): Extracts the month from a date or timestamp.

  • DAY ( <col_name> ): Extracts the day from a date or timestamp.

  • HOUR ( <col_name> ): Extracts the hour from a timestamp.

connection name

Yes

The CONNECTION name. Create CONNECTION.

oss_location

Yes

The OSS path for data files. This is the parent directory of the data and metadata directories.

The format is location='oss://<Bucket name>/<OSS directory name>/'.

  • Bucket name: The OSS bucket name. For more information, see List buckets.

  • Directory name: The OSS directory path. Must not include a file name. Example: oss://oss-mc-test/Demo1/.

Example

CREATE ICEBERG TABLE mc_iceberg_table (
  id bigint COMMENT 'Unique user ID',
  name string COMMENT 'User name',
  age bigint COMMENT 'User age',
  gender string COMMENT 'User gender',
  height float COMMENT 'User height',
  birthday date COMMENT 'User date of birth',
  phone_number string COMMENT 'User phone number',
  email string COMMENT 'User email address',
  address string COMMENT 'User address',
  salary decimal(18, 2) COMMENT 'User salary',
  create_time timestamp COMMENT 'Time when user information was created',
  update_time timestamp COMMENT 'Time when user information was last updated',
  is_deleted boolean COMMENT 'Flag indicating if the user information has been deleted',
  dt string COMMENT 'Partition field'
)
PARTITIONED BY (dt)
WITH CONNECTION <connection name>
OPTIONS(
  location='oss://<oss bucket>/Demo-iceberg/'
);

Writing data

  • MaxCompute write syntax: Syntax description.

  • Example:

    SET odps.sql.type.system.odps2=true;
    SET odps.sql.decimal.odps2=true;
    
    INSERT INTO mc_iceberg_table VALUES
        (1, 'Zhang San', 18, 'Male', cast (178.56 as float), DATE '1990-01-01', '13800000000', 'zhangsan@example.com', 'Haidian District, Beijing', 5000.00, TIMESTAMP '2023-04-19 11:32:00', TIMESTAMP '2023-04-19 11:32:00', false,'20260402'),
        (2, 'Li Si', 20, 'Female', cast (162.70 as float), DATE '1992-02-02', '13900000000', 'lisi@example.com', 'Pudong New Area, Shanghai', 6000.00, TIMESTAMP '2023-04-19 11:32:00', TIMESTAMP '2023-04-19 11:32:00',false,'20260401'),
        (3, 'Wang Wu', 22, 'Male', cast (185.21 as float), DATE '1994-03-03', '14000000000', 'wangwu@example.com', 'Nanshan District, Shenzhen', 7000.00, TIMESTAMP '2023-04-19 11:32:00', TIMESTAMP '2023-04-19 11:32:00', false,'20260403')
    ;

Querying and analyzing data

  • SELECT syntax: Syntax description.

  • Example:

    SELECT * FROM mc_iceberg_table;
    
    -- The following result is returned:
    +------------+------+------------+--------+--------+----------+--------------+-------+---------+--------+-------------+-------------+------------+----+
    | id         | name | age        | gender | height | birthday | phone_number | email | address | salary | create_time | update_time | is_deleted | dt |
    +------------+------+------------+--------+--------+----------+--------------+-------+---------+--------+-------------+-------------+------------+----+
    | 1          | Zhang San  | 18         | Male   | 178.56 | 1990-01-01 | 13800000000  | zhangsan@example.com | Haidian District, Beijing | 5000   | 2023-04-19 03:32:00 | 2023-04-19 03:32:00 | false      | 20260402 |
    | 2          | Li Si      | 20         | Female | 162.7  | 1992-02-02 | 13900000000  | lisi@example.com     | Pudong New Area, Shanghai | 6000   | 2023-04-19 03:32:00 | 2023-04-19 03:32:00 | false      | 20260401 |
    | 3          | Wang Wu    | 22         | Male   | 185.21 | 1994-03-03 | 14000000000  | wangwu@example.com   | Nanshan District, Shenzhen| 7000   | 2023-04-19 03:32:00 | 2023-04-19 03:32:00 | false      | 20260403 |
    +------------+------+------------+--------+--------+----------+--------------+-------+---------+--------+-------------+-------------+------------+----+

Iceberg external vs. managed tables

Dimension

Iceberg external table

MaxCompute-managed Iceberg table

CREATE TABLE statement

CREATE EXTERNAL TABLE [if NOT EXISTS] <mc_oss_extable_name>
(
 <col_name> <data_type>,
 ...
)
[COMMENT <table_comment>]
[PARTITIONED BY (<col_name> <data_type>, ...)]
STORED AS ICEBERG
WITH SERDEPROPERTIES (
 'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole'
)
LOCATION '<oss_location>' 
TBLPROPERTIES ('<tbproperty_name>'='<tbproperty_value>',...);
CREATE [OR REPLACE] ICEBERG TABLE [IF NOT EXISTS] <table name> 
(
 <col_name> <data_type>,
 ...
)
PARTITIONED BY (<partitionExpression>)
WITH CONNECTION <connection name>
OPTIONS(
 <identifier> = expr,
 ...
)
[LIFECYCLE <days>];

Table lifecycle

MaxCompute only maps the table. DROP TABLE removes metadata only, not OSS data.

MaxCompute manages the full lifecycle of metadata and OSS data. The specified OSS directory must be empty at creation. DROP TABLE removes metadata only, not OSS data.

Metadata and data governance

Same as standard external tables.

High-performance metadata caching.

Data openness (read/write by open-source engines)

Consistency guaranteed by the Iceberg format.

MaxCompute guarantees read/write consistency. Open-source engines read from OSS directly or use the MaxCompute Storage API.

Lake table maintenance

Maintained by the user.

Automatic background maintenance:

  • Automatic compaction

  • Automatic cleanup of expired snapshots

  • Automatic garbage collection

Supported data types

Iceberg data type

MaxCompute data type

Read/write support

Types.BooleanType

BOOLEAN

Supported

Types.IntegerType

INT

Supported

Types.LongType

BIGINT

Supported

Types.FloatType

FLOAT

Supported

Types.DoubleType

DOUBLE

Supported

Types.DecimalType

DECIMAL(precision, scale)

  • 0 <= precision <= 38

  • 0 <= scale <= 18

Supported

Types.DateType

DATE

Supported

Types.TimeType

BIGINT

  • MaxCompute writes data to the Iceberg TIME type with microsecond precision.

  • To write to an Iceberg external table, you must specify the time fields in the iceberg_time_fields table property.

    • col1;col2

    • For a list type: col1.entry

    • For a map type: col1.key;col1.value

    • For a struct type: col1.field1;col2.field2

Supported

Types.TimestampType

TIMESTAMP_NTZ

Supported

Types.TimestampType_z

TIMESTAMP

Supported

Types.StringType

STRING

Supported

Types.UUIDType

BINARY

  • The UUID must be 16 bytes long.

  • For details on uuid conversion, refer to this conversion link.

  • To write to an Iceberg external table, you must specify the UUID fields in the iceberg_uuid_fields table property.

    • col1;col2

    • For a list type: col1.entry

    • For a map type: col1.key;col1.value

    • For a struct type: col1.field1;col2.field2

Not supported

Types.FixedType

BINARY

Supported

Types.BinaryType

BINARY

Supported

TypeID.STRUCT

STRUCT

Supported

TypeID.LIST

ARRAY

Supported

TypeID.MAP

MAP

Supported

N/A

TINYINT, SMALLINT, VARCHAR(n), CHAR(n), DATETIME, JSON

N/A

Limitations

  • Version limitations

    • Only features compatible with Iceberg SDK 1.6.1 are supported.

    • Reading and writing are supported only for Iceberg table format v2.

    • Features of the Iceberg v3 format are not supported.

  • Supported operations

    • MaxCompute data types: Data types (Version 1.0) and Data types (Version 2.0).

    • Schema evolution currently supports only adding and dropping columns.

    • Partition pruning is supported. If not enabled, submit feedback through the Application Link or join the MaxCompute Developer Community DingTalk group (group ID: 11782920).

  • Unsupported operations