Backup mechanism

A data backup of a PolarDB cluster captures the full data of the cluster at a specific point in time as a backup set (snapshot). PolarDB creates backup sets with a Redirect-on-Write (ROW) snapshot mechanism. Traditional full or incremental backups copy data when the backup runs. Creating a PolarDB backup set instead references the existing data blocks without duplicating them, so a backup consumes no additional storage when it is created. Backup sets are stored directly on the distributed storage system of the PolarDB cluster.

This topic explains how data blocks are managed in three scenarios: when backup sets are created, when backup sets are deleted, and when data is deleted from the cluster.

How the ROW snapshot mechanism works

When a backup set is created, it references the current live data blocks in the cluster without copying them. No additional storage is consumed at creation time.

When a data block is modified after a backup set is created, the system:

  1. Allocates a new data block to store the modified data.

  2. Transfers the original (pre-modification) data block to the backup set.

  3. Updates the cluster's reference to point to the new data block.

The backup set continues to reference the original data block. Storage consumption therefore increases only when data is modified after a backup is taken, not when the backup itself is taken.
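The three steps above can be sketched as a toy Python model. This is a minimal illustration under stated assumptions, not PolarDB's implementation: the class `RowVolume` and its methods are hypothetical, data blocks are identified by name, and an original block that no backup set references is assumed to be freed immediately (the steps above do not cover that case).

```python
class RowVolume:
    """Toy ROW snapshot model; class and method names are hypothetical."""

    def __init__(self, blocks):
        self.live = set(blocks)       # blocks the cluster currently points at
        self.allocated = set(blocks)  # physical blocks that consume storage
        self.backups = {}             # backup set name -> frozenset of referenced blocks

    def create_backup(self, name):
        # Record references only: nothing is copied and nothing new is allocated.
        self.backups[name] = frozenset(self.live)

    def modify(self, old, new):
        # 1. Allocate a new data block to hold the modified data.
        self.allocated.add(new)
        # 2. The original block stays allocated for any backup set that references it.
        #    Assumption: with no referencing backup set it is freed immediately.
        if not any(old in refs for refs in self.backups.values()):
            self.allocated.discard(old)
        # 3. Point the cluster at the new block instead of the original.
        self.live.discard(old)
        self.live.add(new)


vol = RowVolume({"A", "B", "C"})
vol.create_backup("backup set 1")
print(len(vol.allocated))     # 3: the backup set added no storage
vol.modify("A", "A1")
print(sorted(vol.live))       # ['A1', 'B', 'C']
print(sorted(vol.allocated))  # ['A', 'A1', 'B', 'C']
```

In the sketch, the fourth allocated block appears only at the `modify` call, which mirrors the point that storage grows with modifications rather than with backups.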

Create level-1 backup sets

When a level-1 backup set is created, it only references data blocks in the cluster and does not occupy storage space. After the backup is created, any modification to a data block causes the original version to be transferred to the backup set.

Example

A PolarDB cluster creates backup sets at 09:00, 10:00, and 11:00. Between backups, data blocks are added and modified. The following timeline shows how data blocks move between the cluster and the backup sets.

  • 09:00: Create backup set 1.

    • The cluster has three data blocks: A, B, and C. Backup set 1 references A, B, and C. No additional storage is consumed.

  • 09:00 to 10:00: Change A to A1, change B to B1, and add D.

    • The cluster allocates new data blocks for A1, B1, and D. The original data blocks A and B are transferred to backup set 1.

  • 10:00: Create backup set 2.

    • The cluster has four data blocks: A1, B1, C, and D. Backup set 2 references A1, B1, C, and D. No additional storage is consumed.

  • 10:00 to 11:00: Change C to C1 and add E.

    • The cluster allocates new data blocks for C1 and E. The original data block C is transferred to backup set 2.

  • 11:00: Create backup set 3.

    • The cluster has five data blocks: A1, B1, C1, D, and E. Backup set 3 references A1, B1, C1, D, and E. No additional storage is consumed.

  • 11:00 until the next backup: Change A1 to A2 and change E to E1.

    • The cluster allocates new data blocks for A2 and E1. The original data blocks A1 and E are transferred to backup set 3.

The same mechanism applies to subsequent backups.
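The following hypothetical sketch replays the example timeline and reports which blocks each backup set stores. It assumes, consistent with the example, that a modified block goes to the most recent backup set that references it (block C, referenced by backup sets 1 and 2, goes to backup set 2) and that a block no backup set references is freed. `RowCluster` and its methods are illustrative names, not a PolarDB API; the sketch relies on Python dictionaries preserving insertion order, so the last matching backup set is the newest.

```python
class RowCluster:
    """Toy model of level-1 backup sets under ROW; all names are hypothetical."""

    def __init__(self, blocks):
        self.primary = set(blocks)                   # current version of every block
        self.owner = {b: "primary" for b in blocks}  # who stores each physical block
        self.backups = {}  # backup set -> referenced blocks, kept in creation order

    def create_backup(self, name):
        # Reference every live block; ownership does not change, so no storage is used.
        self.backups[name] = set(self.primary)

    def add(self, block):
        self.primary.add(block)
        self.owner[block] = "primary"

    def modify(self, old, new):
        self.add(new)  # the modified data lands in a newly allocated block
        self.primary.discard(old)
        # Assumption: the original goes to the newest backup set that references it,
        # and is freed if no backup set references it.
        holders = [name for name, refs in self.backups.items() if old in refs]
        if holders:
            self.owner[old] = holders[-1]
        else:
            del self.owner[old]

    def stored_by(self, name):
        return sorted(b for b, o in self.owner.items() if o == name)


c = RowCluster({"A", "B", "C"})
c.create_backup(1)  # 09:00
c.modify("A", "A1")
c.modify("B", "B1")
c.add("D")
c.create_backup(2)  # 10:00
c.modify("C", "C1")
c.add("E")
c.create_backup(3)  # 11:00
c.modify("A1", "A2")
c.modify("E", "E1")

for name in c.backups:
    print(name, c.stored_by(name))
# 1 ['A', 'B']
# 2 ['C']
# 3 ['A1', 'E']
```

The printed ownership matches the block distribution described in the example under "Delete level-1 backup sets".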

Delete level-1 backup sets

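When a backup set is deleted, each data block that it stores is handled in one of two ways. If no other backup set references the block, the block is released. If another backup set still references the block, the block is transferred to that backup set. Data blocks that the deleted backup set only references, but that are stored elsewhere, are not affected.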

Example

This example continues from the final state of the example in "Create level-1 backup sets". The PolarDB cluster has three backup sets with the following data block distribution:

  • Backup set 1: Contains data blocks A, B, and C.

    • Stores data blocks A and B.

    • References data block C, which is stored in backup set 2.

  • Backup set 2: Contains data blocks A1, B1, C, and D.

    • Stores data block C.

    • References data block A1, which is stored in backup set 3.

    • References data blocks B1 and D from the primary version.

  • Backup set 3: Contains data blocks A1, B1, C1, D, and E.

    • Stores data blocks A1 and E.

    • References data blocks B1, C1, and D from the primary version.

The primary version is the current data version of the PolarDB cluster and contains the current version of all data blocks.

The following describes how data blocks are processed when each backup set is deleted:

  1. Delete backup set 1: Data blocks A and B are directly released.

  2. Delete backup set 3: Data block A1 is transferred to backup set 2 (because backup set 2 still references it). Data block E is directly released.

  3. Delete backup set 2: Data blocks A1 and C are directly released.
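The deletion rule and the three-step example can be sketched as a single hypothetical function, `delete_backup`, over two dictionaries: one recording who stores each block, one recording which blocks each backup set references. When more than one remaining backup set references a block, the sketch assumes the newest one takes it over; the example above never exercises that case. None of these names come from PolarDB.

```python
def delete_backup(owner, backups, name):
    """Delete one backup set in a toy ROW model (hypothetical names, not a PolarDB API).

    owner:   block -> "primary" or the backup set that stores it
    backups: backup set -> blocks it references, in creation order
    Returns (released, transferred), where transferred maps block -> new owner.
    """
    refs = backups.pop(name)
    released, transferred = [], {}
    for block in sorted(refs):
        if owner.get(block) != name:
            continue  # only referenced here; stored elsewhere, so nothing to do
        holders = [n for n, r in backups.items() if block in r]
        if holders:
            # Assumption: the newest remaining backup set that references it takes it over.
            owner[block] = holders[-1]
            transferred[block] = holders[-1]
        else:
            del owner[block]  # no remaining backup set needs it
            released.append(block)
    return released, transferred


# State from the example: who stores each block, and what each backup set references.
owner = {"A": 1, "B": 1, "C": 2, "A1": 3, "E": 3,
         "A2": "primary", "B1": "primary", "C1": "primary", "D": "primary", "E1": "primary"}
backups = {1: {"A", "B", "C"}, 2: {"A1", "B1", "C", "D"}, 3: {"A1", "B1", "C1", "D", "E"}}

for name in (1, 3, 2):
    print(name, delete_backup(owner, backups, name))
# 1 (['A', 'B'], {})
# 3 (['E'], {'A1': 2})
# 2 (['A1', 'C'], {})
```

After the three deletions, only blocks owned by the primary version remain in the sketch.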

Delete data in a cluster

When you delete a data table from a PolarDB cluster, the primary version releases the corresponding data blocks. Data blocks that an existing backup set references are transferred to that backup set.

This transfer has a direct storage impact: the released data blocks are now stored in the backup set, so the storage used by the backup set may increase after the deletion. The transferred data blocks are freed only when the backup set expires.

Example

A PolarDB cluster has backup set 1, which references data blocks A, B, C, D, and E from the primary version.

When the cluster releases data blocks B, C, D, and E, those data blocks are transferred to backup set 1. The storage used by backup set 1 increases at this point.

Data blocks B, C, D, and E are freed only after backup set 1 expires.
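A minimal sketch of this example, assuming a hypothetical `Cluster` model in which deleting data moves each referenced block into the newest backup set that references it, and expiring a backup set frees the blocks it stores unless another backup set references them. Blocks that no backup set references are assumed to be freed at once; the example above does not describe that case. None of the names are a PolarDB API.

```python
class Cluster:
    """Toy ROW model for deleting data; class and method names are hypothetical."""

    def __init__(self, blocks):
        self.owner = {b: "primary" for b in blocks}  # block -> "primary" or storing backup set
        self.backups = {}  # backup set -> referenced blocks, in creation order

    def create_backup(self, name):
        # Reference every block of the primary version without copying it.
        self.backups[name] = {b for b, o in self.owner.items() if o == "primary"}

    def _hand_off(self, block):
        # Give the block to the newest backup set that references it, else free it.
        holders = [n for n, refs in self.backups.items() if block in refs]
        if holders:
            self.owner[block] = holders[-1]
        else:
            del self.owner[block]

    def delete_data(self, *blocks):
        # The primary version releases the blocks; referenced ones move to a backup set.
        for block in blocks:
            self._hand_off(block)

    def expire_backup(self, name):
        del self.backups[name]
        for block, o in list(self.owner.items()):
            if o == name:
                self._hand_off(block)

    def stored_by(self, name):
        return sorted(b for b, o in self.owner.items() if o == name)


cluster = Cluster(["A", "B", "C", "D", "E"])
cluster.create_backup("backup set 1")
print(cluster.stored_by("backup set 1"))  # []
cluster.delete_data("B", "C", "D", "E")
print(cluster.stored_by("backup set 1"))  # ['B', 'C', 'D', 'E']
print(len(cluster.owner))                 # 5: nothing freed yet
cluster.expire_backup("backup set 1")
print(sorted(cluster.owner))              # ['A']
```

In the sketch, the total block count stays at five after the deletion and drops to one only when backup set 1 expires, which is the storage effect the example describes.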
