Terms

A Delta Table is a high-performance table format developed by MaxCompute for large-scale analytical datasets. It comes in two types, Append Delta Tables and PK Delta Tables, and supports ACID (atomicity, consistency, isolation, and durability) transactions, incremental queries, time travel, dynamic cluster bucketing, real-time data updates, and schema evolution.

What is a Delta Table

Delta Table is MaxCompute's open table format for large-scale analytics. Unlike traditional Hive-style tables, Delta Tables maintain a transaction log that records every change to the data, enabling database-grade reliability on top of object storage.

Delta Tables are the foundation for lakehouse workloads in MaxCompute: they let you run batch and streaming writes on the same table, query historical snapshots without maintaining separate backups, and evolve your schema safely as requirements change.

Table types

Delta Tables come in two types. The right choice depends on whether your data has a natural primary key and how you need to write to the table.

Append Delta Table

An Append Delta Table stores records in append-only fashion — each write adds new rows without modifying or deleting existing rows. There is no primary key constraint.

Use an Append Delta Table when:

  • Data arrives as an immutable event stream (logs, clickstream, sensor readings)

  • Write throughput is the top priority

  • Updates and deletes are not required

PK Delta Table

A PK Delta Table defines one or more columns as the primary key. When an incoming row has the same primary key as an existing row, the table merges the two rather than appending a duplicate, so each primary key value maps to exactly one row. This makes the table the authoritative record of each entity's current state.

Use a PK Delta Table when:

  • Data represents entities that change over time (orders, user profiles, inventory)

  • You need upsert (insert-or-update) semantics

  • Downstream consumers need the latest state of each record without deduplication logic
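The difference between the two write modes can be illustrated with a small in-memory model. This is a conceptual sketch only, not MaxCompute code: the function names `append_write` and `upsert_write`, the `order_id` key, and the column-wise merge rule (new values overwrite old ones, untouched columns are kept) are all assumptions made for illustration.

```python
from typing import Any

Row = dict[str, Any]


def append_write(table: list[Row], batch: list[Row]) -> list[Row]:
    """Append semantics: every incoming row is kept, duplicates included."""
    return table + batch


def upsert_write(table: dict[tuple, Row], batch: list[Row],
                 pk: tuple[str, ...]) -> dict[tuple, Row]:
    """Upsert semantics: a row with an existing key is merged into that row."""
    merged = dict(table)
    for row in batch:
        key = tuple(row[col] for col in pk)
        # Assumed merge rule: incoming columns overwrite, other columns survive.
        merged[key] = {**merged.get(key, {}), **row}
    return merged


batch1 = [{"order_id": 1, "status": "created"},
          {"order_id": 2, "status": "created"}]
batch2 = [{"order_id": 1, "status": "paid"}]

append_table = append_write(append_write([], batch1), batch2)
pk_table = upsert_write(upsert_write({}, batch1, ("order_id",)),
                        batch2, ("order_id",))

print(len(append_table))         # 3: both versions of order 1 are stored
print(len(pk_table))             # 2: one row per order_id
print(pk_table[(1,)]["status"])  # paid
```

In the append model a consumer would have to deduplicate order 1 itself; in the keyed model the latest state is already the only state.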

Choosing between the two types

| Dimension | Append Delta Table | PK Delta Table |
| --- | --- | --- |
| Primary key | Not required | Required |
| Write mode | Append only | Upsert (insert or update) |
| Duplicate handling | Allowed | Automatically merged |
| Typical use case | Event streams, logs | Slowly changing dimensions, CDC |
| Write performance | Higher | Slightly lower (merge overhead) |

Supported features

ACID transactions

Delta Tables guarantee atomicity, consistency, isolation, and durability (ACID) for every write operation:

  • Atomicity: Every write either completes fully or has no effect. Partial writes never occur, so readers never see incomplete data.

  • Consistency: The table is always in a valid state. Constraints such as primary key uniqueness in PK Delta Tables are enforced on every commit.

  • Isolation: Concurrent reads and writes do not interfere with each other. Readers see a consistent snapshot even while a write is in progress.

  • Durability: Once a write is committed, it is permanent, even if a node fails immediately afterward.

ACID guarantees eliminate data quality problems that are common in non-transactional table formats, such as partially applied writes, reads of uncommitted data, and lost updates.
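Atomicity and snapshot isolation can be pictured with a toy table that publishes each commit as a new immutable snapshot. This is a minimal sketch under stated assumptions: the `ToyTable` class and its `validate` hook are hypothetical, and the source does not describe how MaxCompute implements its commits internally.

```python
class ToyTable:
    """Toy model: each successful commit publishes one new immutable snapshot."""

    def __init__(self):
        self._snapshots = [()]  # snapshot 0 is the empty table

    def snapshot(self):
        """Return the latest committed snapshot (an immutable tuple)."""
        return self._snapshots[-1]

    def commit(self, new_rows, validate=lambda row: True):
        # Stage and validate everything before anything becomes visible.
        staged = []
        for row in new_rows:
            if not validate(row):
                raise ValueError(f"rejected row: {row!r}")
            staged.append(row)
        # Publish in a single step: readers see all staged rows or none.
        self._snapshots.append(self._snapshots[-1] + tuple(staged))


table = ToyTable()
table.commit([1, 2])
reader_view = table.snapshot()  # a reader pins this snapshot

try:
    # The second row fails validation, so the whole write is discarded.
    table.commit([3, -1, 4], validate=lambda r: r >= 0)
except ValueError:
    pass

print(table.snapshot())  # (1, 2): no partial write is visible
table.commit([3, 4])
print(reader_view)       # (1, 2): the pinned snapshot is unchanged
print(table.snapshot())  # (1, 2, 3, 4)
```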

Time travel

Delta Tables record a history of snapshots. You can query the table as it existed at any past point in time for which a snapshot is still retained, or roll back to a previous snapshot, without maintaining separate backup copies. This is useful for:

  • Auditing: verify what data looked like before a specific change

  • Debugging: reproduce a query result using the exact data that existed at a given timestamp

  • Recovery: quickly restore a table to a known good state after an erroneous write
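The idea behind time travel can be sketched as a lookup over a timestamp-ordered snapshot history. The `SnapshotHistory` class below is a hypothetical in-memory model, not a MaxCompute API; in particular, modelling rollback as a new commit that copies an older snapshot is an assumption chosen so that history is never rewritten.

```python
import bisect


class SnapshotHistory:
    """Toy model: an ordered list of (commit timestamp, table state) pairs."""

    def __init__(self):
        self._times = []
        self._states = []

    def commit(self, ts, rows):
        if self._times and ts <= self._times[-1]:
            raise ValueError("commit timestamps must increase")
        self._times.append(ts)
        self._states.append(tuple(rows))

    def as_of(self, ts):
        """Return the state of the latest snapshot committed at or before ts."""
        i = bisect.bisect_right(self._times, ts)
        if i == 0:
            raise LookupError("no snapshot at or before that time")
        return self._states[i - 1]

    def rollback_to(self, ts, now):
        # Assumed design: rollback appends a new snapshot instead of erasing history.
        self.commit(now, self.as_of(ts))


history = SnapshotHistory()
history.commit(100, ["a"])
history.commit(200, ["a", "bad"])   # an erroneous write

print(history.as_of(150))           # ('a',): the state before the bad write
history.rollback_to(150, now=300)
print(history.as_of(300))           # ('a',): restored
print(history.as_of(250))           # ('a', 'bad'): still available for auditing
```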

Incremental query

An incremental query reads only the rows that changed since a given snapshot or timestamp. This lets downstream pipelines consume new data without re-scanning the entire table, which is critical for large tables where a full scan would be too slow or too expensive.
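As a rough illustration, an incremental consumer can be modelled as a reader that remembers the last version it processed and asks only for later commits. The `changes_since` function and the `(version, rows)` commit layout are hypothetical; they stand in for whatever snapshot or timestamp bounds the real query takes.

```python
def changes_since(commits, after_version):
    """Return rows committed after `after_version`, plus the new checkpoint.

    `commits` is a list of (version, rows) pairs in ascending version order.
    """
    rows = [row
            for version, batch in commits
            if version > after_version
            for row in batch]
    latest = max((version for version, _ in commits), default=after_version)
    return rows, max(latest, after_version)


commits = [(1, ["a", "b"]), (2, ["c"])]

rows, checkpoint = changes_since(commits, 0)
print(rows, checkpoint)       # ['a', 'b', 'c'] 2

commits.append((3, ["d", "e"]))
new_rows, checkpoint = changes_since(commits, checkpoint)
print(new_rows, checkpoint)   # ['d', 'e'] 3
```

The second call touches only the rows of version 3, which is the property that keeps downstream pipelines cheap as the table grows.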

Schema evolution

Schema changes in traditional table formats often require rewriting the entire dataset or break existing queries. Delta Tables support schema evolution — add, drop, or rename columns without rewriting historical data and without breaking consumers that rely on the old schema.
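One common way to evolve a schema without rewriting data is to store values under stable column IDs and keep column names in metadata only. The sketch below assumes that approach purely for illustration; the source does not say how MaxCompute implements schema evolution, and the `EvolvingSchema` class is hypothetical.

```python
class EvolvingSchema:
    """Toy model: stored rows are keyed by column ID, names live in metadata."""

    def __init__(self, columns):
        self._next_id = 0
        self.names = {}  # column ID -> current column name
        for name in columns:
            self.add(name)

    def add(self, name):
        self.names[self._next_id] = name
        self._next_id += 1

    def drop(self, name):
        del self.names[self._id_of(name)]

    def rename(self, old, new):
        self.names[self._id_of(old)] = new

    def _id_of(self, name):
        for col_id, current in self.names.items():
            if current == name:
                return col_id
        raise KeyError(name)

    def read(self, stored_row):
        """Project a stored row (column ID -> value) onto the current schema."""
        return {name: stored_row.get(col_id)
                for col_id, name in self.names.items()}


schema = EvolvingSchema(["id", "name"])
old_row = {0: 7, 1: "Ada"}           # written before any schema change

schema.add("email")                  # old rows read NULL (None) for it
schema.rename("name", "full_name")   # metadata-only change
schema.drop("id")                    # stored values are hidden, not rewritten

print(schema.read(old_row))          # {'full_name': 'Ada', 'email': None}
```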

Dynamic cluster bucketing

Query performance on large tables degrades when the scan must read irrelevant data. Dynamic cluster bucketing automatically co-locates related rows in the same storage files based on column values, so queries that filter on those columns skip large portions of the dataset. Unlike static partitioning, dynamic cluster bucketing rebalances itself as data volume and query patterns change — no manual re-partitioning is needed.
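The effect of bucketing on scan size can be shown with a toy table that hashes rows into buckets and doubles the bucket count as data grows. Everything here is an assumption for illustration: the `BucketedTable` class, the hash-based placement, and the doubling rule are not taken from MaxCompute, whose actual clustering and rebalancing strategy the source does not specify.

```python
class BucketedTable:
    """Toy model: rows are hashed into buckets; bucket count grows with data."""

    def __init__(self, max_rows_per_bucket=4):
        self.max_rows = max_rows_per_bucket
        self.buckets = [[]]

    def _bucket_of(self, key):
        # Integer keys are used below so that hash() is deterministic.
        return hash(key) % len(self.buckets)

    def insert(self, key, value):
        self.buckets[self._bucket_of(key)].append((key, value))
        total = sum(len(b) for b in self.buckets)
        if total > self.max_rows * len(self.buckets):
            self._rebucket(len(self.buckets) * 2)

    def _rebucket(self, count):
        rows = [row for bucket in self.buckets for row in bucket]
        self.buckets = [[] for _ in range(count)]
        for key, value in rows:
            self.buckets[self._bucket_of(key)].append((key, value))

    def lookup(self, key):
        """Return matching values and how many rows had to be scanned."""
        bucket = self.buckets[self._bucket_of(key)]
        return [v for k, v in bucket if k == key], len(bucket)


table = BucketedTable(max_rows_per_bucket=4)
for i in range(16):
    table.insert(i, f"v{i}")

print(len(table.buckets))  # 4: grew from 1 bucket without manual intervention
print(table.lookup(5))     # (['v5'], 4): scanned 4 of 16 rows
```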

Real-time data updates

PK Delta Tables accept streaming writes from change data capture (CDC) pipelines. Combined with ACID guarantees, this lets you maintain a real-time, consistent view of mutable data alongside batch analytics on the same table — without managing separate hot and cold stores.
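Applying a CDC stream to a keyed table amounts to replaying insert, update, and delete events in order against the current state. The sketch below is a conceptual model only: the `apply_cdc` function and the event layout (`op` and `row` fields, an `id` key) are hypothetical and do not represent a MaxCompute or CDC connector format.

```python
def apply_cdc(state, events, pk="id"):
    """Replay CDC events in order and return the resulting keyed state."""
    current = dict(state)
    for event in events:
        key = event["row"][pk]
        if event["op"] == "delete":
            current.pop(key, None)
        elif event["op"] in ("insert", "update"):
            current[key] = event["row"]
        else:
            raise ValueError(f"unknown op: {event['op']!r}")
    return current


events = [
    {"op": "insert", "row": {"id": 1, "stock": 10}},
    {"op": "insert", "row": {"id": 2, "stock": 5}},
    {"op": "update", "row": {"id": 1, "stock": 8}},
    {"op": "delete", "row": {"id": 2}},
]

inventory = apply_cdc({}, events)
print(inventory)  # {1: {'id': 1, 'stock': 8}}
```

Because each event batch would land as one transactional commit, a batch query running at the same time sees the table either before or after the batch, never in between.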

Next steps

  • Create a Delta Table

  • Write data to a Delta Table

  • Query Delta Table history with time travel