A Delta Table is a high-performance table format developed by MaxCompute for large-scale analytical datasets. It comes in two types — Append Delta Tables and PK Delta Tables — and supports atomicity, consistency, isolation, and durability (ACID) transactions, incremental queries, time travel, dynamic cluster bucketing, real-time data updates, and schema evolution.
What is a Delta Table
Delta Table is MaxCompute's open table format for large-scale analytics. Unlike traditional Hive-style tables, Delta Tables maintain a transaction log that records every change to the data, enabling database-grade reliability on top of object storage.
Delta Tables are the foundation for lakehouse workloads in MaxCompute: they let you run batch and streaming writes on the same table, query historical snapshots without maintaining separate backups, and evolve your schema safely as requirements change.
Table types
Delta Tables come in two types. The right choice depends on whether your data has a natural primary key and how you need to write to the table.
Append Delta Table
An Append Delta Table stores records in append-only fashion — each write adds new rows without modifying or deleting existing rows. There is no primary key constraint.
Use an Append Delta Table when:
Data arrives as an immutable event stream (logs, clickstream, sensor readings)
Write throughput is the top priority
Updates and deletes are not required
PK Delta Table
A PK Delta Table defines one or more columns as the primary key. When a row with the same primary key arrives, the table merges it with the existing row rather than appending a duplicate. This makes the table the authoritative record of each entity's current state.
Use a PK Delta Table when:
Data represents entities that change over time (orders, user profiles, inventory)
You need upsert (insert-or-update) semantics
Downstream consumers need the latest state of each record without deduplication logic
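The difference between the two write modes can be illustrated with a small in-memory model. This is only a conceptual sketch in Python, not MaxCompute syntax: the function names `append_write` and `upsert_write` are hypothetical, and it assumes the simplest merge rule, where the incoming row replaces the stored row with the same key.

```python
from typing import Any

Row = dict[str, Any]


def append_write(table: list[Row], batch: list[Row]) -> None:
    """Append semantics: every incoming row is added, duplicates included."""
    table.extend(batch)


def upsert_write(table: dict[Any, Row], batch: list[Row], key: str) -> None:
    """Upsert semantics: a row with an existing key replaces the stored row.

    Assumption: "merge" is modeled as whole-row replacement.
    """
    for row in batch:
        table[row[key]] = dict(row)


# Append-style table: both versions of order 1 are kept.
events: list[Row] = []
append_write(events, [{"order_id": 1, "status": "created"}])
append_write(events, [{"order_id": 1, "status": "paid"}])

# PK-style table: only the latest state of order 1 remains.
orders: dict[Any, Row] = {}
upsert_write(orders, [{"order_id": 1, "status": "created"}], key="order_id")
upsert_write(orders, [{"order_id": 1, "status": "paid"}], key="order_id")

print(len(events))  # two rows for the same order
print(orders)       # one row holding the current state
```

A consumer of `events` would have to deduplicate by `order_id` to find the current status, while a consumer of `orders` reads it directly.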
Choosing between the two types
| Dimension | Append Delta Table | PK Delta Table |
| --- | --- | --- |
| Primary key | Not required | Required |
| Write mode | Append only | Upsert (insert or update) |
| Duplicate handling | Allowed | Automatically merged |
| Typical use case | Event streams, logs | Slowly changing dimensions, CDC |
| Write performance | Higher | Slightly lower (merge overhead) |
Supported features
ACID transactions
Delta Tables guarantee atomicity, consistency, isolation, and durability (ACID) for every write operation:
Atomicity: Every write either completes fully or has no effect. Partial writes never occur, so readers never see incomplete data.
Consistency: The table is always in a valid state. Constraints such as primary key uniqueness in PK Delta Tables are enforced on every commit.
Isolation: Concurrent reads and writes do not interfere with each other. Readers see a consistent snapshot even while a write is in progress.
Durability: Once a write is committed, it is permanent — even if a node fails immediately after.
ACID guarantees prevent data quality problems that are common in non-transactional table formats, such as partial deletes, violations of read-your-writes consistency, and lost updates.
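Atomicity and isolation can be pictured with a toy table that publishes each commit as a new immutable snapshot. The sketch below is an illustration under stated assumptions, not how MaxCompute implements transactions: the `ToyTable` class and its validation rule (every row needs an `id`) are hypothetical.

```python
class ToyTable:
    """Toy in-memory table; each commit publishes a new immutable snapshot."""

    def __init__(self) -> None:
        self._snapshot: tuple[dict, ...] = ()

    def snapshot(self) -> tuple[dict, ...]:
        # Readers hold an immutable view that later commits cannot change.
        return self._snapshot

    def commit(self, rows: list[dict]) -> None:
        # Stage and validate every row first; publish with one assignment.
        staged = []
        for row in rows:
            if "id" not in row:
                raise ValueError("row is missing 'id'; nothing is written")
            staged.append(dict(row))
        self._snapshot = self._snapshot + tuple(staged)


table = ToyTable()
table.commit([{"id": 1}, {"id": 2}])
reader_view = table.snapshot()  # a reader starts here

try:
    # The second row is invalid, so the whole batch is rejected.
    table.commit([{"id": 3}, {"name": "bad row"}])
except ValueError:
    pass

table.commit([{"id": 4}])

print([r["id"] for r in reader_view])       # the reader's snapshot is unchanged
print([r["id"] for r in table.snapshot()])  # id 3 never became visible
```

The failed batch leaves no trace (atomicity), and the reader that took its snapshot before the later commits keeps seeing the same two rows (isolation).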
Time travel
Delta Tables record a full history of snapshots. Query the table as it existed at any past point in time, or roll back to a previous snapshot, without maintaining separate backup copies. This is useful for:
Auditing: verify what data looked like before a specific change
Debugging: reproduce a query result using the exact data that existed at a given timestamp
Recovery: quickly restore a table to a known good state after an erroneous write
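One way to think about time travel is as a lookup into an ordered log of snapshots. The following Python sketch is a simplified model, not the MaxCompute implementation: `SnapshotLog`, integer timestamps, and the choice to model a rollback as a new commit that restores an older state are all assumptions made for illustration.

```python
from bisect import bisect_right


class SnapshotLog:
    """Toy snapshot history keyed by strictly increasing commit timestamps."""

    def __init__(self) -> None:
        self._timestamps: list[int] = []
        self._snapshots: list[tuple] = []

    def _publish(self, ts: int, snapshot: tuple) -> None:
        if self._timestamps and ts <= self._timestamps[-1]:
            raise ValueError("commit timestamps must increase")
        self._timestamps.append(ts)
        self._snapshots.append(snapshot)

    def commit(self, ts: int, rows: list) -> None:
        previous = self._snapshots[-1] if self._snapshots else ()
        self._publish(ts, previous + tuple(rows))

    def as_of(self, ts: int) -> tuple:
        # Find the latest snapshot committed at or before ts.
        idx = bisect_right(self._timestamps, ts)
        return self._snapshots[idx - 1] if idx else ()

    def rollback_to(self, ts: int, now: int) -> None:
        # Restore an old state as a new snapshot, so history is preserved.
        self._publish(now, self.as_of(ts))


log = SnapshotLog()
log.commit(100, ["a"])
log.commit(200, ["b"])
log.commit(300, ["bad"])          # an erroneous write

print(log.as_of(250))             # state before the erroneous write
log.rollback_to(200, now=400)     # recover the known good state
print(log.as_of(400))             # current state after the rollback
print(log.as_of(300))             # the bad snapshot is still auditable
```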
Incremental query
An incremental query reads only the rows that changed since a given snapshot or timestamp. This lets downstream pipelines consume new data without re-scanning the entire table, which is critical for large tables where a full scan would be too slow or too expensive.
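The consumer side of an incremental query usually follows a checkpoint pattern: remember the last version processed, then ask only for what came after it. Here is a minimal Python sketch of that pattern; the `Commit` record and the `rows_since` function are hypothetical names, and integer version numbers stand in for whatever snapshot identifiers the table actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    version: int
    rows: tuple


def rows_since(commits: list[Commit], last_seen_version: int) -> list:
    """Return only the rows committed after last_seen_version."""
    return [
        row
        for commit in commits
        if commit.version > last_seen_version
        for row in commit.rows
    ]


commits = [
    Commit(1, ("a", "b")),
    Commit(2, ("c",)),
    Commit(3, ("d", "e")),
]

checkpoint = 1                           # the pipeline already processed version 1
new_rows = rows_since(commits, checkpoint)
checkpoint = commits[-1].version         # advance the checkpoint after processing

print(new_rows)                          # rows from versions 2 and 3 only
print(rows_since(commits, checkpoint))   # nothing new until the next commit
```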
Schema evolution
Schema changes in traditional table formats often require rewriting the entire dataset or break existing queries. Delta Tables support schema evolution — add, drop, or rename columns without rewriting historical data and without breaking consumers that rely on the old schema.
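A common technique for changing a schema without rewriting data is to store values under stable field IDs and keep the column names in metadata only. The sketch below shows that general technique in Python. Whether Delta Tables use field IDs internally is an assumption not confirmed by this page, and the `read` function and sample columns are hypothetical.

```python
# Stored data never changes: each row maps a stable field ID to a value.
stored_rows = [
    {1: 101, 2: "alice"},
    {1: 102, 2: "bob"},
]


def read(rows: list[dict], schema: dict[str, int]) -> list[dict]:
    """Project stored rows through a schema (column name -> field ID).

    Fields missing from an old row read as None.
    """
    return [{col: row.get(fid) for col, fid in schema.items()} for row in rows]


schema_v1 = {"id": 1, "name": 2}
# Rename "name" to "full_name" and add "email": only metadata changes.
schema_v2 = {"id": 1, "full_name": 2, "email": 3}
# Drop "full_name": the stored values are simply no longer projected.
schema_v3 = {"id": 1, "email": 3}

print(read(stored_rows, schema_v1))
print(read(stored_rows, schema_v2))
print(read(stored_rows, schema_v3))
```

In this model a consumer that still reads with `schema_v1` keeps working after the rename, because the stored rows were never touched.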
Dynamic cluster bucketing
Query performance on large tables degrades when the scan must read irrelevant data. Dynamic cluster bucketing automatically co-locates related rows in the same storage files based on column values, so queries that filter on those columns skip large portions of the dataset. Unlike static partitioning, dynamic cluster bucketing rebalances itself as data volume and query patterns change — no manual re-partitioning is needed.
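The idea of co-locating rows by column value and skipping irrelevant files can be sketched with hash buckets. The Python example below is a rough illustration only: the CRC32 hash, the doubling policy in `rebucket_if_needed`, and all function names are assumptions, and the actual bucketing and rebalancing strategy used by MaxCompute may differ.

```python
import zlib


def bucket_of(value, num_buckets: int) -> int:
    # CRC32 gives a hash that is stable across processes.
    return zlib.crc32(str(value).encode()) % num_buckets


def build_buckets(rows: list[dict], key: str, num_buckets: int) -> list[list[dict]]:
    buckets: list[list[dict]] = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_of(row[key], num_buckets)].append(row)
    return buckets


def lookup(buckets: list[list[dict]], key: str, value) -> list[dict]:
    # Only one bucket can contain the value, so the others are skipped.
    candidate = buckets[bucket_of(value, len(buckets))]
    return [row for row in candidate if row[key] == value]


def rebucket_if_needed(buckets, key: str, max_rows_per_bucket: int):
    """Double the bucket count until the average bucket is small enough."""
    total = sum(len(b) for b in buckets)
    num_buckets = len(buckets)
    while total > num_buckets * max_rows_per_bucket:
        num_buckets *= 2
    if num_buckets == len(buckets):
        return buckets
    rows = [row for bucket in buckets for row in bucket]
    return build_buckets(rows, key, num_buckets)


rows = [{"user_id": i, "clicks": i % 7} for i in range(1000)]
buckets = build_buckets(rows, "user_id", 4)
buckets = rebucket_if_needed(buckets, "user_id", max_rows_per_bucket=100)

print(len(buckets))                    # bucket count grew with the data
print(lookup(buckets, "user_id", 42))  # found by scanning a single bucket
```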
Real-time data updates
PK Delta Tables accept streaming writes from change data capture (CDC) pipelines. Combined with ACID guarantees, this lets you maintain a real-time, consistent view of mutable data alongside batch analytics on the same table — without managing separate hot and cold stores.
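Applying a CDC stream to a keyed table amounts to replaying change events in order. The sketch below models this in Python. The event shape (`op` plus `row`), the `apply_cdc` function, and the handling of delete events are assumptions for illustration; this page does not describe the CDC event format that PK Delta Tables accept.

```python
def apply_cdc(state: dict, events: list[dict], key: str) -> None:
    """Replay change events, in order, against a table keyed by primary key."""
    for event in events:
        pk = event["row"][key]
        if event["op"] == "delete":
            state.pop(pk, None)
        else:
            # "insert" and "update" are both treated as upserts.
            state[pk] = event["row"]


inventory: dict = {}
apply_cdc(
    inventory,
    [
        {"op": "insert", "row": {"sku": "A1", "qty": 10}},
        {"op": "insert", "row": {"sku": "B2", "qty": 5}},
        {"op": "update", "row": {"sku": "A1", "qty": 7}},
        {"op": "delete", "row": {"sku": "B2"}},
    ],
    key="sku",
)

print(inventory)  # only the current state of each remaining key
```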
Next steps
Create a Delta Table
Write data to a Delta Table
Query Delta Table history with time travel