Time Travel

更新时间:
复制 MD 格式

The computing engine uses Delta tables to support time travel queries, which allow you to query historical versions of your data. You can use this feature to review data from a specific point in time or restore a table to a previous state to correct errors. You can also use the restore operation to revert a table to a specific historical version.

How it works

The following diagram illustrates how the computing engine processes a time travel query on a Delta table.

image.png

When you submit a SQL statement, the computing engine determines the target data version. It then locates the most recent BaseFile created before or at that version's timestamp and finds all DeltaFiles written after the BaseFile up to the target version. Finally, the engine merges these files to return the query result. Using a BaseFile helps accelerate query read performance.

The figure above uses a transactional table named src as an example:

  • The schema contains a pk column and a val column. The diagram on the left illustrates the data modification process. t1 through t5 represent the transaction versions from five separate write transactions, generating five DeltaFiles. At t2 and t4, compaction operations run, creating two BaseFiles: b1 and b2. Notice that b1 has eliminated the intermediate record (2, a) and retains only the latest record, (2, b).

  • To query the data at t1, the engine only needs to read DeltaFile d1. To query at t2, it reads the three records from BaseFile b1. A query at t3 requires merging BaseFile b1 and DeltaFile d3. This pattern continues for other points in time. While BaseFiles speed up queries, their creation requires a resource-intensive compaction operation. You should select a trigger strategy that aligns with your business scenario.

  • For a time travel query, you can specify the transaction version by either timestamp or version ID. In addition to constants and standard functions, the SQL syntax also supports two custom functions: get_latest_timestamp and get_latest_version. The second parameter of these functions specifies the Nth-to-last commit, allowing you to precisely query internal MaxCompute data versions.

Data retention period

To perform time travel queries, you must retain historical data. You can configure the retention period by using the table property acid.data.retain.hours. The maximum retention period is 7 days. Set a retention period that meets your business requirements, as a longer period increases storage fees. If you do not require time travel queries, set this property to 0 to disable the feature. This can significantly reduce your storage costs.