MaxCompute-managed Iceberg tables (beta)

Overview

MaxCompute stores Apache Iceberg tables in Object Storage Service (OSS) with unified metadata, permission, and lifecycle management. Iceberg tables are compatible with open-source engines such as Spark, Flink, Trino, and Presto, enabling multi-engine data sharing for a Lakehouse architecture.

The open Iceberg format eliminates data silos and enables seamless data sharing with Spark, Flink, and Hive. You can choose the right compute engine for each workload without vendor lock-in. The open standard also ensures long-term data portability and supports cross-organizational collaboration in multi-cloud scenarios.

MaxCompute-managed Iceberg tables offer the following features:

Iceberg lake table data management:

Create, read, and write Iceberg tables on OSS using MaxCompute SQL. Supports DDL, DML, time travel, schema evolution, and partition evolution.
Automated background maintenance:

Built-in optimization handles compaction, snapshot expiration, and metadata archiving automatically, eliminating manual maintenance.
Metadata consistency:

MaxCompute Metastore (MaxMeta) manages table metadata with ACID-compliant writes. External engines must use the MaxCompute Storage Write API or a MaxCompute-native engine, preventing metadata conflicts from multiple write sources.
Open ecosystem integration:

Trino, Presto, Spark, Flink, and Doris can access managed Iceberg tables natively, enabling low-latency, high-concurrency multi-engine queries.
Unified permission management: Define permissions for your data lake and data warehouse in a single system. Integrates RAM permissions with data warehouse access control for enterprise-grade Lakehouse security.

Usage notes

The following operations on a MaxCompute-managed Iceberg table can cause data loss or make data unreadable:

Forbidden operation	Consequence	Correct practice
Modifying Iceberg table data through any interface other than MaxCompute write APIs.	The table may fail consistency checks and become unreadable.	Modify data only through MaxCompute SQL.
Uploading files to the managed OSS path.	The background service treats uploaded files as orphan files and deletes them.	Do not upload data to the managed path.
Sharing the same OSS path across multiple managed Iceberg tables.	Each table's garbage collection process deletes the other table's files.	Use a unique OSS path for each table.
Using OSS paths with a parent-child directory relationship for different managed Iceberg tables.	Each table's garbage collection treats the other table's files as orphan data and deletes them, causing data loss.	Use a unique OSS path for each managed Iceberg table. Avoid paths with parent-child relationships.

Create a table

Syntax

CREATE ICEBERG TABLE [IF NOT EXISTS] <table name> (
  <col_name> <data_type>,
  ...
)
PARTITIONED BY (<partitionExpression>)
WITH CONNECTION <connection name>
OPTIONS(
  location='<oss_location>'
)
;

Parameters

Click to expand parameter details

Parameter	Required	Description
table name	Yes	The table name. Case-insensitive. Case conversion is not supported.
col_name	Yes	The column name. The schema must match the OSS data files. Mismatched schemas prevent data reads.
data_type	Yes	The column data type. Each column's data type must match the corresponding type in the OSS data files. Mismatched types prevent data reads.
table_comment	No	A table comment. Must be a valid string of no more than 1,024 bytes.
partitionExpression	No	A partition expression with the same semantics as Iceberg partition expressions. Supported functions: `<col_name>`: The partition value is the raw value of the column. `BUCKET ( <num_buckets> , <col_name> )`: Hashes the column value modulo N. Result range: 0 to N-1. `TRUNCATE ( <width> , <col_name> )`: Truncates the field value to the specified width for partitioning. `YEAR ( <col_name> )`: Extracts the year from a date or timestamp. `MONTH ( <col_name> )`: Extracts the month from a date or timestamp. `DAY ( <col_name> )`: Extracts the day from a date or timestamp. `HOUR ( <col_name> )`: Extracts the hour from a timestamp.
connection name	Yes	The CONNECTION name. Create CONNECTION.
oss_location	Yes	The OSS path for data files. This is the parent directory of the `data` and `metadata` directories. The format is `location='oss://<Bucket name>/<OSS directory name>/'`. Bucket name: The OSS bucket name. For more information, see List buckets. Directory name: The OSS directory path. Must not include a file name. Example: `oss://oss-mc-test/Demo1/`.

Example

CREATE ICEBERG TABLE mc_iceberg_table (
  id bigint COMMENT 'Unique user ID',
  name string COMMENT 'User name',
  age bigint COMMENT 'User age',
  gender string COMMENT 'User gender',
  height float COMMENT 'User height',
  birthday date COMMENT 'User date of birth',
  phone_number string COMMENT 'User phone number',
  email string COMMENT 'User email address',
  address string COMMENT 'User address',
  salary decimal(18, 2) COMMENT 'User salary',
  create_time timestamp COMMENT 'Time when user information was created',
  update_time timestamp COMMENT 'Time when user information was last updated',
  is_deleted boolean COMMENT 'Flag indicating if the user information has been deleted',
  dt string COMMENT 'Partition field'
)
PARTITIONED BY (dt)
WITH CONNECTION <connection name>
OPTIONS(
  location='oss://<oss bucket>/Demo-iceberg/'
);

Writing data

MaxCompute write syntax: Syntax description.

Example:

SET odps.sql.type.system.odps2=true;
SET odps.sql.decimal.odps2=true;

INSERT INTO mc_iceberg_table VALUES
    (1, 'Zhang San', 18, 'Male', cast (178.56 as float), DATE '1990-01-01', '13800000000', 'zhangsan@example.com', 'Haidian District, Beijing', 5000.00, TIMESTAMP '2023-04-19 11:32:00', TIMESTAMP '2023-04-19 11:32:00', false,'20260402'),
    (2, 'Li Si', 20, 'Female', cast (162.70 as float), DATE '1992-02-02', '13900000000', 'lisi@example.com', 'Pudong New Area, Shanghai', 6000.00, TIMESTAMP '2023-04-19 11:32:00', TIMESTAMP '2023-04-19 11:32:00',false,'20260401'),
    (3, 'Wang Wu', 22, 'Male', cast (185.21 as float), DATE '1994-03-03', '14000000000', 'wangwu@example.com', 'Nanshan District, Shenzhen', 7000.00, TIMESTAMP '2023-04-19 11:32:00', TIMESTAMP '2023-04-19 11:32:00', false,'20260403')
;

Querying and analyzing data

SELECT syntax: Syntax description.

Example:

SELECT * FROM mc_iceberg_table;

-- The following result is returned:
+------------+------+------------+--------+--------+----------+--------------+-------+---------+--------+-------------+-------------+------------+----+
| id         | name | age        | gender | height | birthday | phone_number | email | address | salary | create_time | update_time | is_deleted | dt |
+------------+------+------------+--------+--------+----------+--------------+-------+---------+--------+-------------+-------------+------------+----+
| 1          | Zhang San  | 18         | Male   | 178.56 | 1990-01-01 | 13800000000  | zhangsan@example.com | Haidian District, Beijing | 5000   | 2023-04-19 03:32:00 | 2023-04-19 03:32:00 | false      | 20260402 |
| 2          | Li Si      | 20         | Female | 162.7  | 1992-02-02 | 13900000000  | lisi@example.com     | Pudong New Area, Shanghai | 6000   | 2023-04-19 03:32:00 | 2023-04-19 03:32:00 | false      | 20260401 |
| 3          | Wang Wu    | 22         | Male   | 185.21 | 1994-03-03 | 14000000000  | wangwu@example.com   | Nanshan District, Shenzhen| 7000   | 2023-04-19 03:32:00 | 2023-04-19 03:32:00 | false      | 20260403 |
+------------+------+------------+--------+--------+----------+--------------+-------+---------+--------+-------------+-------------+------------+----+

Iceberg external vs. managed tables

Dimension	Iceberg external table	MaxCompute-managed Iceberg table
CREATE TABLE statement	`CREATE EXTERNAL TABLE [if NOT EXISTS] <mc_oss_extable_name> ( <col_name> <data_type>, ... ) [COMMENT <table_comment>] [PARTITIONED BY (<col_name> <data_type>, ...)] STORED AS ICEBERG WITH SERDEPROPERTIES ( 'odps.properties.rolearn'='acs:ram::<uid>:role/aliyunodpsdefaultrole' ) LOCATION '<oss_location>' TBLPROPERTIES ('<tbproperty_name>'='<tbproperty_value>',...);`	`CREATE [OR REPLACE] ICEBERG TABLE [IF NOT EXISTS] <table name> ( <col_name> <data_type>, ... ) PARTITIONED BY (<partitionExpression>) WITH CONNECTION <connection name> OPTIONS( <identifier> = expr, ... ) [LIFECYCLE <days>];`
Table lifecycle	MaxCompute only maps the table. `DROP TABLE` removes metadata only, not OSS data.	MaxCompute manages the full lifecycle of metadata and OSS data. The specified OSS directory must be empty at creation. `DROP TABLE` removes metadata only, not OSS data.
Metadata and data governance	Same as standard external tables.	High-performance metadata caching.
Data openness (read/write by open-source engines)	Consistency guaranteed by the Iceberg format.	MaxCompute guarantees read/write consistency. Open-source engines read from OSS directly or use the MaxCompute Storage API.
Lake table maintenance	Maintained by the user.	Automatic background maintenance: Automatic compaction Automatic cleanup of expired snapshots Automatic garbage collection

Supported data types

Iceberg data type	MaxCompute data type	Read/write support
Types.BooleanType	BOOLEAN	Supported
Types.IntegerType	INT	Supported
Types.LongType	BIGINT	Supported
Types.FloatType	FLOAT	Supported
Types.DoubleType	DOUBLE	Supported
Types.DecimalType	DECIMAL(precision, scale) 0 <= precision <= 38 0 <= scale <= 18	Supported
Types.DateType	DATE	Supported
Types.TimeType	BIGINT MaxCompute writes data to the Iceberg TIME type with microsecond precision. To write to an Iceberg external table, you must specify the time fields in the `iceberg_time_fields` table property. col1;col2 For a list type: col1.entry For a map type: col1.key;col1.value For a struct type: col1.field1;col2.field2	Supported
Types.TimestampType	TIMESTAMP_NTZ	Supported
Types.TimestampType_z	TIMESTAMP	Supported
Types.StringType	STRING	Supported
Types.UUIDType	BINARY The UUID must be 16 bytes long. For details on uuid conversion, refer to this conversion link. To write to an Iceberg external table, you must specify the UUID fields in the `iceberg_uuid_fields` table property. col1;col2 For a list type: col1.entry For a map type: col1.key;col1.value For a struct type: col1.field1;col2.field2	Not supported
Types.FixedType	BINARY	Supported
Types.BinaryType	BINARY	Supported
TypeID.STRUCT	STRUCT	Supported
TypeID.LIST	ARRAY	Supported
TypeID.MAP	MAP	Supported
N/A	TINYINT, SMALLINT, VARCHAR(n), CHAR(n), DATETIME, JSON	N/A

Limitations

Version limitations
- Only features compatible with Iceberg SDK 1.6.1 are supported.
- Reading and writing are supported only for Iceberg table format v2.
- Features of the Iceberg v3 format are not supported.
Supported operations
- MaxCompute data types: Data types (Version 1.0) and Data types (Version 2.0).
- Schema evolution currently supports only adding and dropping columns.
- Partition pruning is supported. If not enabled, submit feedback through the Application Link or join the MaxCompute Developer Community DingTalk group (group ID: 11782920).
Unsupported operations
- For MaxCompute-managed Iceberg tables:
  - The Rename Table statement is not supported.
  - The XCOPY Cross-Region Replication statement is not supported.
  - The CLONE TABLE and RENAME TABLE statements are not supported.
  - Statements to rename table snapshots are not supported.
  - Setting default values for columns is not supported.
  - Modifying the data type of a column is not supported.
  - Materialized view operations are not supported.
  - The UPDATE statement is not supported.
  - Row-level access control is not supported.
  - The TRUNCATE statement is not supported.
  - CDC (private preview) is not supported.
  - Cross-region disaster recovery is not supported.
  - Local backup is not supported.
  - The CREATE OR REPLACE statement is not supported.
  - The LOAD and UNLOAD commands are not supported.
- Console queries and API responses display the storage size of managed Iceberg tables as 0 bytes.