磁盘空间分析

云数据库ClickHouse集群空间使用率是日常运维中重点关注的监控项之一。集群存储空间的不足可能导致严重后果,例如数据无法写入、无法备份,以及存储空间扩容任务耗时过长等。本文介绍如何通过SQL语句查看云数据库ClickHouse集群的磁盘空间使用情况。

示例环境

以下示例以s-2-r-0节点为基础环境,在实际使用过程中,请根据您的场景修改对应参数。如果您不知道如何获取节点名称,可以通过以下方式获取。

  • 通过控制台:您可以在集群监控页面,获取节点名称。如何进入集群监控页面,请参见查看集群监控信息

  • 通过SQL语句:您可以执行以下语句,获取集群所有节点的名称。

    SELECT * FROM system.clusters;

查看表占用磁盘空间大小

通过查看表数据大小,可以识别表占用磁盘空间的大小,从而为您优化数据库性能以及合理规划存储资源提供有效分析依据。

  • 查看表数据详情

    查看s-2-r-0节点下每个表活跃数据的情况。

    SELECT
        `database`,
        table,
        formatReadableSize(sum(data_compressed_bytes) AS size) AS compressed, --压缩数据大小
        formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed, --未压缩数据大小
        round(usize / size, 2) AS compr_rate, --压缩率
        sum(rows) AS rows, --总行数
        count() AS part_count --part数量
    FROM clusterAllReplicas('default', system.parts)
    WHERE (active = 1) AND (table LIKE '%') AND (`database` LIKE '%') 
    AND substring(hostname(),38,8) = 's-2-r-0'
    GROUP BY
        `database`,
        table
    ORDER BY size DESC;
  • 查看副本表数据详情

    查看每个副本中的表数据。

    SELECT
        hostname() AS h,
       `database` ,
        table,
        count(*) AS data_part_cnt, --数据部分总量
        sum(rows) AS total_rows, --总行数
        formatReadableSize(sum(bytes_on_disk)) AS total_compressed_bytes, --压缩数据大小
        sum(data_uncompressed_bytes) AS total_uncompressed_bytes  --未压缩数据大小
    FROM clusterAllReplicas('default', system.parts) 
    WHERE active = 1
    GROUP BY h, `database`, table
    ORDER BY total_rows DESC;
    
  • 查看表占用磁盘空间排行

    查看集群中占用磁盘空间排名前十的表。

    SELECT
        `database`,
        table,
        sum(bytes_on_disk) AS bytes_on_disk
    FROM clusterAllReplicas('default', system.parts)
    WHERE active AND (`database` != 'system')
    GROUP BY
        `database`,
        table
    ORDER BY bytes_on_disk DESC
    LIMIT 10;
    

查看表各列的存储信息

查询模板

SELECT
    `database`,
    table,
    column,
    formatReadableSize(sum(column_data_compressed_bytes) AS size) AS compressed,
    formatReadableSize(sum(column_data_uncompressed_bytes) AS usize) AS uncompressed,
    round(usize / size, 2) AS compr_rate,
    sum(rows) rows_cnt,
    round(sum(column_data_uncompressed_bytes)/sum(rows) ,2) avg_row_size
FROM clusterAllReplicas('default', system.parts_columns) 
WHERE (active = <active_type>) AND (table LIKE '<table_name>')  
   AND substring(hostname(),38,8) = '<node_name>'
GROUP BY
    `database`,
    table,
    column
ORDER BY size DESC;

参数说明

参数

说明

table_name

目标数据表表名。

值为%时,表示匹配所有表。

node_name

目标集群节点名。

active_type

数据是否活跃。

  • 0:不活跃。

  • 1:活跃。

示例

查看s-2-r-0节点上query_log表中存储活跃数据的列。

SELECT
    `database`,
    table,
    column,
    formatReadableSize(sum(column_data_compressed_bytes) AS size) AS compressed,
    formatReadableSize(sum(column_data_uncompressed_bytes) AS usize) AS uncompressed,
    round(usize / size, 2) AS compr_rate,
    sum(rows) rows_cnt,
    round(sum(column_data_uncompressed_bytes)/sum(rows) ,2) avg_row_size
FROM clusterAllReplicas('default', system.parts_columns) 
WHERE (active = 1) 
AND (table LIKE 'query_log')  
AND substring(hostname(),38,8) = 's-2-r-0'
GROUP BY
    `database`,
    table,
    column
ORDER BY size DESC;

查看表分区信息

查询模板

SELECT 
    partition AS `分区`,
    sum(rows) AS `总行数`,
    formatReadableSize(sum(data_uncompressed_bytes)) AS `原始大小`,
    formatReadableSize(sum(data_compressed_bytes)) AS `压缩大小`,
    round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 0) AS `压缩率`
FROM clusterAllReplicas('default', system.parts)
WHERE (database IN ('<database_name>')) 
AND (table IN ('<table_name>')) 
AND (partition LIKE '<partition_prefix>')
GROUP BY partition
ORDER BY partition ASC

参数说明

参数

说明

database_name

数据库名。

table_name

数据表表名。

partition_prefix

分区前缀。

示例

查看default数据库中test表分区前缀为2019-12-的分区信息。

SELECT 
    partition AS `分区`,
    sum(rows) AS `总行数`,
    formatReadableSize(sum(data_uncompressed_bytes)) AS `原始大小`,
    formatReadableSize(sum(data_compressed_bytes)) AS `压缩大小`,
    round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 0) AS `压缩率`
FROM clusterAllReplicas('default', system.parts)
WHERE (`database` IN ('default')) 
AND (table IN ('test')) 
AND (partition LIKE '2019-12-%')
GROUP BY partition
ORDER BY partition ASC

查看datapart数据的大小

ClickHouse中,system.part表存储了datapart的状态、大小、创建时间等信息。您可以通过此表了解datapart的详细信息。

查看活跃datapart数据的大小

活跃的datapart是表中当前被活跃使用的数据。了解活跃数据的数据量大小,可以帮助您识别表的实际数据大小,从而识别表占用磁盘空间的大小。

查看s-2-r-0节点上所有非系统表的活跃数据分区信息。

SELECT 
    `database`, 
    table, 
    count(*) AS data_part_cnt,  --活跃数据分区的数量
    sum(rows) AS total_rows,  --总行数
    formatReadableSize(sum(bytes_on_disk)) AS total_compressed_bytes, --磁盘上的总压缩数据大小
    sum(data_uncompressed_bytes) AS total_uncompressed_bytes --未压缩的总数据大小
FROM clusterAllReplicas('default', system.parts )
WHERE active = 1 AND `database` != 'system' 
AND substring(hostname(),38,8) = 's-2-r-0'
GROUP BY `database`, table 
ORDER BY total_rows DESC;

查看非活跃datapart数据大小

非活跃的datapart可能包含过时或已标记为删除的数据。若这些数据不再需要,建议及时清理,以减少磁盘占用。

查看s-2-r-0节点上所有非系统表的非活跃数据分区信息。

SELECT 
    `database`, 
    table, 
    count(*) AS data_part_cnt, --非活跃数据分区数量
    sum(rows) AS total_rows,  --总行数
    formatReadableSize(sum(bytes_on_disk)) AS total_compressed_bytes,  --磁盘上的压缩数据总大小
    sum(data_uncompressed_bytes) AS total_uncompressed_bytes  --未压缩数据总大小
FROM clusterAllReplicas('default', system.parts )
WHERE active = 0 
AND `database` != 'system' 
AND substring(hostname(),38,8) = 's-2-r-0'
GROUP BY `database`, table;

Projection占用磁盘空间大小

Projections是ClickHouse中一种用于优化查询性能的数据结构,其类似于物化视图,存储了预先计算的聚合或者数据变换结果。了解Projections的大小有助于评估其对磁盘空间的影响。

查看s-2-r-0节点上test表的Projections占用磁盘空间大小。

SELECT
    database,
    table,
    name,
    formatReadableSize(sum(data_compressed_bytes) AS size) AS compressed, --压缩数据大小
    formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed, --未压缩数据大小
    round(usize / size, 2) AS compr_rate, --压缩比率
    sum(rows) AS rows,  --总行数
    count() AS part_count  --part总数
FROM clusterAllReplicas('default', system.projection_parts )
WHERE (table = 'test') 
AND (active = 1) AND substring(hostname(),38,8) = 's-2-r-0'
GROUP BY
    database,
    table,
    name
ORDER BY size DESC;

查看s-2-r-0节点上test表的每个Projection列占用磁盘空间大小。

SELECT
    database,
    table,
    column,
    formatReadableSize(sum(column_data_compressed_bytes) AS size) AS compressed, --压缩数据大小
    formatReadableSize(sum(column_data_uncompressed_bytes) AS usize) AS uncompressed, --未压缩数据大小
    round(usize / size, 2) AS compr_rate --压缩比率
FROM clusterAllReplicas('default', system.projection_parts_columns )
WHERE (active = 1) AND (table LIKE 'test') 
AND substring(hostname(),38,8) = 's-2-r-0'
GROUP BY
    database,
    table,
    column
ORDER BY size DESC;