Operations to avoid in ApsaraDB for ClickHouse

更新时间:
复制 MD 格式

This topic describes some operations to avoid when you use ApsaraDB for ClickHouse.

Avoid mutation operations

Mutations are `ALTER` queries that modify table data using delete or update operations. In ApsaraDB for ClickHouse, executing queries such as ALTER TABLE … DELETE or UPDATE creates a new data version for the mutation. This process triggers a rewrite of all data inserted before the mutation, which generates many write requests.

For update operations, you can replace the default MergeTree table engine with a specialized one, such as ReplacingMergeTree or CollapsingMergeTree, to avoid generating many write requests.

For delete operations, you can use ALTER TABLE ... DROP PARTITION or configure a table TTL policy to avoid using a large amount of compute resources.

Avoid using Optimize Final

The OPTIMIZE TABLE ... FINAL query starts an unscheduled merge for a specific table, which combines multiple data parts into one. During this process, ApsaraDB for ClickHouse reads all data parts, decompresses and merges them, compresses the result into a single part, and then rewrites the part to storage. This process consumes a large amount of CPU and I/O resources. This optimization rewrites the data even if the data is already in a single part. For more information about OPTIMIZE TABLE ... FINAL, see OPTIMIZE Statement.

Avoid nullable columns

A nullable column, such as `Nullable(String)`, creates an additional `UInt8`-type column to store `NULL` values. For more information about nullable columns, see Nullable. This additional column must be processed each time the nullable column is used. This consumes extra storage space and negatively impacts performance. To avoid creating this extra column, you can set a default value for the column. Avoid the following operation:

CREATE TABLE default.sample
(
    `x` Int8,
    `y` Nullable(Int8)
)
ENGINE = MergeTree
ORDER BY x

Instead, use the following operation to set a default value for the column and avoid creating the extra column.

CREATE TABLE default.sample2
(
    `x` Int8,
    `y` Int8 DEFAULT 0
)
ENGINE = MergeTree
ORDER BY x
Note

Set a reasonable default value as needed.