Lake storage optimization

As data accumulates in an Enterprise Edition instance, small files and expired snapshots degrade query performance and waste storage. Lake storage optimization compacts small files, cleans up orphan files, and manages snapshot lifecycles to maintain an optimal storage layout.

Note

This feature is in the invitation-based testing phase. To apply for access, submit a ticket.

Scope

| Condition | Requirement |
|---|---|
| Instance edition | Enterprise Edition. |
| Database type | Only databases and tables in AnalyticDB for MySQL lake storage. |
| Visibility | Only databases visible in Data Catalog can be configured. |

Three-level inheritance

Configuration cascades through Instance > Database > Table. Each level inherits its parent's settings by default but can override them with custom values.

Instance-level settings (global defaults)
  ├── Database A (inherits instance settings)
  │     ├── Table 1 (inherits Database A settings)
  │     └── Table 2 (custom: compaction frequency = high)
  └── Database B (custom: lake storage optimization disabled)
        └── Table 3 (inherits Database B → disabled)

| Policy | Behavior | Use case |
|---|---|---|
| Inherit | Follows parent-level changes automatically. | Databases and tables that need uniform settings. |
| Custom | Uses independent settings, unaffected by parent-level changes. | Objects with specific performance needs, or objects on which optimization must be disabled. |

Priority: Table level > Database level > Instance level. After you select Custom at a level, that level, and any child levels that inherit from it, are no longer affected by changes at the parent level.

Important

Turning off the Enable Lake Storage Optimization toggle at the instance level deactivates all database-level and table-level settings and stops optimization across the instance. Configured parameters are preserved and take effect again automatically when you turn optimization back on.
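The inheritance rules can be modeled as a walk up the Instance > Database > Table chain. The following Python sketch is a hypothetical model, not an AnalyticDB for MySQL API: the `Settings`, `Level`, and `resolve` names are illustrative, a Custom level is assumed to define a complete set of values, and the instance-level toggle overrides every level below it, as described above. The usage lines reproduce the example tree.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    # Hypothetical field names; not AnalyticDB for MySQL parameter keys.
    enabled: bool = True
    merge_frequency: str = "normal"
    snapshot_retention_days: int = 7
    orphan_retention_days: int = 3


@dataclass
class Level:
    name: str
    parent: Optional["Level"] = None
    custom: Optional[Settings] = None  # None models the Inherit policy


def resolve(level: Level, instance_toggle_on: bool) -> Optional[Settings]:
    """Return the settings in effect at `level`, or None if optimization is off."""
    # The instance-level toggle overrides every database and table setting.
    if not instance_toggle_on:
        return None
    node = level
    # Walk up (table -> database -> instance) until a level with Custom settings.
    while node.custom is None:
        if node.parent is None:
            raise ValueError(f"root level {node.name!r} must define settings")
        node = node.parent
    # A Custom level whose status is off disables optimization for itself
    # and for every child that inherits from it.
    return node.custom if node.custom.enabled else None


instance = Level("instance", custom=Settings())
db_a = Level("Database A", parent=instance)
db_b = Level("Database B", parent=instance, custom=Settings(enabled=False))
table1 = Level("Table 1", parent=db_a)
table2 = Level("Table 2", parent=db_a, custom=Settings(merge_frequency="high"))
table3 = Level("Table 3", parent=db_b)

print(resolve(table1, True).merge_frequency)  # normal
print(resolve(table2, True).merge_frequency)  # high
print(resolve(table3, True))                  # None
print(resolve(table2, False))                 # None
```

Because a Custom level stops the walk, a Custom table keeps its own values even if its database later changes; only the instance toggle reaches past it.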

Access the settings

On the instance details page, choose Data Management > Data Catalog in the left-side navigation pane.

| Level | Entry point |
|---|---|
| Instance | Click Edit in the Lake Storage Optimization section above the database list. |
| Database | Click the target database and go to the Lake Storage Optimization tab. |
| Table | Click the target table and go to the Lake Storage Optimization tab. |

Parameters

| Parameter | Description | Valid values | Default |
|---|---|---|---|
| Enable Lake Storage Optimization | Global toggle for the instance. | On / Off | Off |
| Policy | Inherit parent-level settings or use custom values. | Inherit / Custom | Inherit |
| Status | Enable or disable optimization at this level. | On / Off | - |
| Resource Group | Compute resources for running optimization tasks. | Existing resource groups | - |
| Small File Merge Frequency | How often small files are compacted. | low / normal / high | normal |
| Snapshot Retention Period | How long historical snapshots are retained. | 1 to 7 days | 7 days |
| Orphan File Retention Period | How long orphan files are retained before cleanup. | 3 / 5 / 7 / 10 / 14 days | 3 days |

Note

When the instance-level toggle is off, only the toggle is visible; the other parameters appear after you turn it on. At the database and table levels, the parameters appear after you select Custom and turn on Status.
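If you track intended settings outside the console, for example in a review checklist or a script, a small check against the valid values in the Parameters list catches out-of-range entries early. This Python sketch uses hypothetical key names (`merge_frequency`, `snapshot_retention_days`, `orphan_retention_days`) and does not call any AnalyticDB for MySQL API.

```python
from typing import Any, Dict, List

# Valid values transcribed from the Parameters list; key names are hypothetical.
MERGE_FREQUENCIES = {"low", "normal", "high"}
SNAPSHOT_RETENTION_DAYS = range(1, 8)  # 1 to 7 days
ORPHAN_RETENTION_DAYS = {3, 5, 7, 10, 14}

DEFAULTS: Dict[str, Any] = {
    "merge_frequency": "normal",
    "snapshot_retention_days": 7,
    "orphan_retention_days": 3,
}


def validate(settings: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the settings are valid."""
    errors = [f"unknown key: {key}" for key in sorted(set(settings) - set(DEFAULTS))]
    merged = {**DEFAULTS, **settings}  # unspecified keys fall back to the defaults
    if merged["merge_frequency"] not in MERGE_FREQUENCIES:
        errors.append("merge_frequency must be low, normal, or high")
    if merged["snapshot_retention_days"] not in SNAPSHOT_RETENTION_DAYS:
        errors.append("snapshot_retention_days must be between 1 and 7")
    if merged["orphan_retention_days"] not in ORPHAN_RETENTION_DAYS:
        errors.append("orphan_retention_days must be 3, 5, 7, 10, or 14")
    return errors


print(validate({"merge_frequency": "high", "orphan_retention_days": 14}))  # []
print(validate({"snapshot_retention_days": 10}))
# ['snapshot_retention_days must be between 1 and 7']
```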

Small file compaction frequency

Small file compaction merges small data files into larger ones to reduce file count and improve query efficiency.

| Level | Use case |
|---|---|
| low | Cold data tables with low write volume and infrequent changes. |
| normal (default) | General-purpose workloads. |
| high | Hot data tables with high write throughput and frequent queries. |

Higher compaction frequency consumes more compute resources. Assign a dedicated resource group to optimization tasks so that they do not compete with query workloads for resources.
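This document does not describe how compaction chooses which files to merge. As an illustration of the general technique only, the sketch below uses first-fit decreasing bin packing to group small files into batches no larger than a target size; each batch would be rewritten as one file. The 32 MB small-file threshold, the 128 MB target size, and the `plan_compaction` name are hypothetical, not AnalyticDB for MySQL settings.

```python
from typing import List

MB = 1024 * 1024
SMALL_FILE_BYTES = 32 * MB    # hypothetical: files below this size count as small
TARGET_FILE_BYTES = 128 * MB  # hypothetical: size limit for a merged file


def plan_compaction(file_sizes: List[int]) -> List[List[int]]:
    """Group small files into batches; each batch would become one larger file."""
    # Largest small files first, which tends to pack batches more tightly.
    small = sorted((s for s in file_sizes if s < SMALL_FILE_BYTES), reverse=True)
    batches: List[List[int]] = []
    totals: List[int] = []
    for size in small:
        # First fit: place the file in the first batch that still has room.
        for i, total in enumerate(totals):
            if total + size <= TARGET_FILE_BYTES:
                batches[i].append(size)
                totals[i] += size
                break
        else:
            # No existing batch has room, so start a new one.
            batches.append([size])
            totals.append(size)
    # Rewriting a single file on its own would not reduce the file count.
    return [batch for batch in batches if len(batch) > 1]


files = [10 * MB] * 20 + [200 * MB]
plan = plan_compaction(files)
print([len(batch) for batch in plan])  # [12, 8]
```

In this example, 21 files become 3: two merged files plus the untouched 200 MB file. Running such a plan more often is what a higher merge frequency amounts to, which is why it costs more compute.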

Snapshot retention period

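Snapshots capture a table's full state at a point in time and enable data rollback and time-travel queries. Excess snapshots increase storage usage and metadata overhead. Snapshots older than the retention period are cleaned up, after which they can no longer be used for rollback or time travel. The default retention period is 7 days; the minimum is 1 day.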
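The retention rule can be sketched as a filter on snapshot commit times. In this illustrative Python version, `Snapshot` and `snapshots_to_expire` are hypothetical names, and always keeping the latest snapshot regardless of age is an assumption (it holds the table's current state), not documented behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    committed_at: datetime


def snapshots_to_expire(snapshots: List[Snapshot], retention_days: int,
                        now: datetime) -> List[Snapshot]:
    """Return snapshots committed before the start of the retention window."""
    if not 1 <= retention_days <= 7:
        raise ValueError("retention_days must be between 1 and 7")
    if not snapshots:
        return []
    cutoff = now - timedelta(days=retention_days)
    # Assumption: the newest snapshot is never expired, however old it is.
    latest = max(snapshots, key=lambda s: s.committed_at)
    return [s for s in snapshots
            if s.committed_at < cutoff and s.snapshot_id != latest.snapshot_id]


now = datetime(2024, 6, 10)
snaps = [Snapshot(i, now - timedelta(days=d)) for i, d in enumerate([10, 8, 5, 1])]
print([s.snapshot_id for s in snapshots_to_expire(snaps, 7, now)])  # [0, 1]
print([s.snapshot_id for s in snapshots_to_expire(snaps, 1, now)])  # [0, 1, 2]
```

Shortening the period from 7 days to 1 day expires snapshot 2 as well, so a rollback to its state is no longer possible.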

Orphan file retention period

Orphan files are data files, left behind by writes, updates, or compaction, that no valid snapshot references. They are automatically deleted after the retention period elapses, which reclaims storage.

Important

Reducing the retention period saves storage but may affect long-running queries that are in progress. The minimum is 3 days.
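Orphan file cleanup can be sketched as a set difference between the files in storage and the files that valid snapshots reference, restricted to files older than the retention period. The `find_orphans` function, its inputs, and the file paths below are hypothetical. The age check is what leaves recently written, unreferenced files alone; shortening the retention period narrows that margin.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Set


def find_orphans(stored_files: Dict[str, datetime],
                 snapshot_manifests: Iterable[Set[str]],
                 retention_days: int,
                 now: datetime) -> List[str]:
    """Return paths that no valid snapshot references and that are past retention.

    stored_files maps each file path in storage to its last-modified time;
    snapshot_manifests holds the set of file paths each valid snapshot references.
    """
    if retention_days not in {3, 5, 7, 10, 14}:
        raise ValueError("retention_days must be 3, 5, 7, 10, or 14")
    referenced: Set[str] = set().union(*snapshot_manifests)
    cutoff = now - timedelta(days=retention_days)
    # Unreferenced files newer than the cutoff are kept for now.
    return sorted(path for path, modified in stored_files.items()
                  if path not in referenced and modified < cutoff)


now = datetime(2024, 6, 10)
files = {
    "data/a.parquet": now - timedelta(days=20),
    "data/b.parquet": now - timedelta(days=20),
    "data/c.parquet": now - timedelta(days=1),
}
manifests = [{"data/a.parquet"}]
print(find_orphans(files, manifests, 3, now))  # ['data/b.parquet']
```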

Execution history

The Lake Storage Optimization tab at the table level shows the execution history of compaction tasks, including the number of data files read, data files added, bytes read, failed data files, start time, and status.
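One way to read the execution history: a healthy compaction run reads many small data files and adds only a few larger ones. The sketch below aggregates history records (for example, transcribed from the console) into a summary. The `CompactionRun` record, its field names, and the `SUCCESS`/`FAILED` status strings are hypothetical stand-ins for the columns shown in the console, not an API type.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class CompactionRun:
    # Hypothetical record mirroring the history columns; not an API type.
    start_time: datetime
    status: str  # hypothetical values: "SUCCESS" or "FAILED"
    read_data_files: int
    added_data_files: int
    read_bytes: int
    failed_data_files: int


def summarize(runs: List[CompactionRun]) -> Dict[str, float]:
    ok = [r for r in runs if r.status == "SUCCESS"]
    return {
        "runs": len(runs),
        "failed_runs": len(runs) - len(ok),
        # Files read minus files added by successful runs = net reduction in file count.
        "net_files_removed": sum(r.read_data_files - r.added_data_files for r in ok),
        "gib_read": sum(r.read_bytes for r in ok) / 2**30,
        "failed_data_files": sum(r.failed_data_files for r in runs),
    }


runs = [
    CompactionRun(datetime(2024, 6, 1, 2), "SUCCESS", 120, 4, 3 * 2**30, 0),
    CompactionRun(datetime(2024, 6, 2, 2), "FAILED", 80, 0, 2**30, 80),
]
print(summarize(runs))
# {'runs': 2, 'failed_runs': 1, 'net_files_removed': 116, 'gib_read': 3.0, 'failed_data_files': 80}
```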

Recommended configurations

| Scenario | Recommended settings | Reason |
|---|---|---|
| High-frequency analytical tables | Compaction frequency = high; orphan file retention = 3 days. | Quickly eliminates small files and sustains query performance. |
| Low-frequency archive tables | Compaction frequency = low; orphan file retention = 14 days. | Reduces resource consumption and extends the data recovery window. |
| Mixed-workload databases | Database level = Inherit; hot tables = Custom with high; cold tables = Custom with low. | Enables per-table optimization tailored to workload patterns. |
| Development and test environments | Disable lake storage optimization at the instance level. | Conserves compute resources. |