云数据库ClickHouse集群空间使用率是日常运维中重点关注的监控项之一。集群存储空间的不足可能导致严重后果,例如数据无法写入、无法备份,以及存储空间扩容任务耗时过长等。本文介绍如何通过SQL语句查看云数据库ClickHouse集群的磁盘空间使用情况。
示例环境
以下示例以s-2-r-0
节点为基础环境,在实际使用过程中,请根据您的场景修改对应参数。如果您不知道如何获取节点名称,可以通过以下方式获取。
通过控制台:您可以在集群监控页面,获取节点名称。如何进入集群监控页面,请参见查看集群监控信息。
通过SQL语句:您可以执行以下语句,获取集群所有节点的名称。
SELECT * FROM system.clusters;
查看表占用磁盘空间大小
通过查看表数据大小,可以识别表占用磁盘空间的大小,从而为您优化数据库性能以及合理规划存储资源提供有效分析依据。
查看表数据详情
查看
s-2-r-0
节点下每个表活跃数据的情况。SELECT `database`, table, formatReadableSize(sum(data_compressed_bytes) AS size) AS compressed, --压缩数据大小 formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed, --未压缩数据大小 round(usize / size, 2) AS compr_rate, --压缩率 sum(rows) AS rows, --总行数 count() AS part_count --part数量 FROM clusterAllReplicas('default', system.parts) WHERE (active = 1) AND (table LIKE '%') AND (`database` LIKE '%') AND substring(hostname(),38,8) = 's-2-r-0' GROUP BY `database`, table ORDER BY size DESC;
查看副本表数据详情
查看每个副本中的表数据。
SELECT hostname() AS h, `database` , table, count(*) AS data_part_cnt, --数据部分总量 sum(rows) AS total_rows, --总行数 formatReadableSize(sum(bytes_on_disk)) AS total_compressed_bytes, --压缩数据大小 sum(data_uncompressed_bytes) AS total_uncompressed_bytes --未压缩数据大小 FROM clusterAllReplicas('default', system.parts) WHERE active = 1 GROUP BY h, `database`, table ORDER BY total_rows DESC;
查看表占用磁盘空间排行
查看集群中占用磁盘空间排名前十的表。
SELECT `database`, table, sum(bytes_on_disk) AS bytes_on_disk FROM clusterAllReplicas('default', system.parts) WHERE active AND (`database` != 'system') GROUP BY `database`, table ORDER BY bytes_on_disk DESC LIMIT 10;
查看表各列的存储信息
查询模板
SELECT
`database`,
table,
column,
formatReadableSize(sum(column_data_compressed_bytes) AS size) AS compressed,
formatReadableSize(sum(column_data_uncompressed_bytes) AS usize) AS uncompressed,
round(usize / size, 2) AS compr_rate,
sum(rows) rows_cnt,
round(sum(column_data_uncompressed_bytes)/sum(rows) ,2) avg_row_size
FROM clusterAllReplicas('default', system.parts_columns)
WHERE (active = <active_type>) AND (table LIKE '<table_name>')
AND substring(hostname(),38,8) = '<node_name>'
GROUP BY
`database`,
table,
column
ORDER BY size DESC;
参数说明
参数 | 说明 |
table_name | 目标数据表表名。 值为%时,表示匹配所有表。 |
node_name | 目标集群节点名。 |
active_type | 数据是否活跃。
|
示例
查看s-2-r-0
节点上query_log
表中存储活跃数据的列。
SELECT
`database`,
table,
column,
formatReadableSize(sum(column_data_compressed_bytes) AS size) AS compressed,
formatReadableSize(sum(column_data_uncompressed_bytes) AS usize) AS uncompressed,
round(usize / size, 2) AS compr_rate,
sum(rows) rows_cnt,
round(sum(column_data_uncompressed_bytes)/sum(rows) ,2) avg_row_size
FROM clusterAllReplicas('default', system.parts_columns)
WHERE (active = 1)
AND (table LIKE 'query_log')
AND substring(hostname(),38,8) = 's-2-r-0'
GROUP BY
`database`,
table,
column
ORDER BY size DESC;
查看表分区信息
查询模板
SELECT
partition AS `分区`,
sum(rows) AS `总行数`,
formatReadableSize(sum(data_uncompressed_bytes)) AS `原始大小`,
formatReadableSize(sum(data_compressed_bytes)) AS `压缩大小`,
round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 0) AS `压缩率`
FROM clusterAllReplicas('default', system.parts)
WHERE (database IN ('<database_name>'))
AND (table IN ('<table_name>'))
AND (partition LIKE '<partition_prefix>')
GROUP BY partition
ORDER BY partition ASC
参数说明
参数 | 说明 |
database_name | 数据库名。 |
table_name | 数据表表名。 |
partition_prefix | 分区前缀。 |
示例
查看default
数据库中test
表分区前缀为2019-12-
的分区信息。
SELECT
partition AS `分区`,
sum(rows) AS `总行数`,
formatReadableSize(sum(data_uncompressed_bytes)) AS `原始大小`,
formatReadableSize(sum(data_compressed_bytes)) AS `压缩大小`,
round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 0) AS `压缩率`
FROM clusterAllReplicas('default', system.parts)
WHERE (`database` IN ('default'))
AND (table IN ('test'))
AND (partition LIKE '2019-12-%')
GROUP BY partition
ORDER BY partition ASC
查看datapart数据的大小
在ClickHouse中,system.part
表存储了datapart
的状态、大小、创建时间等信息。您可以通过此表了解datapart
的详细信息。
查看活跃datapart数据的大小
活跃的datapart
是表中当前被活跃使用的数据。了解活跃数据的数据量大小,可以帮助您识别表的实际数据大小,从而识别表占用磁盘空间的大小。
查看s-2-r-0
节点上所有非系统表的活跃数据分区信息。
SELECT
`database`,
table,
count(*) AS data_part_cnt, --活跃数据分区的数量
sum(rows) AS total_rows, --总行数
formatReadableSize(sum(bytes_on_disk)) AS total_compressed_bytes, --磁盘上的总压缩数据大小
sum(data_uncompressed_bytes) AS total_uncompressed_bytes --未压缩的总数据大小
FROM clusterAllReplicas('default', system.parts )
WHERE active = 1 AND `database` != 'system'
AND substring(hostname(),38,8) = 's-2-r-0'
GROUP BY `database`, table
ORDER BY total_rows DESC;
查看非活跃datapart数据大小
非活跃的datapart
可能包含过时或已标记为删除的数据。若这些数据不再需要,建议及时清理,以减少磁盘占用。
查看s-2-r-0
节点上所有非系统表的非活跃数据分区信息。
SELECT
`database`,
table,
count(*) AS data_part_cnt, --非活跃数据分区数量
sum(rows) AS total_rows, --总行数
formatReadableSize(sum(bytes_on_disk)) AS total_compressed_bytes, --磁盘上的压缩数据总大小
sum(data_uncompressed_bytes) AS total_uncompressed_bytes --未压缩数据总大小
FROM clusterAllReplicas('default', system.parts )
WHERE active = 0
AND `database` != 'system'
AND substring(hostname(),38,8) = 's-2-r-0'
GROUP BY `database`, table;
Projection占用磁盘空间大小
Projections是ClickHouse中一种用于优化查询性能的数据结构,其类似于物化视图,存储了预先计算的聚合或者数据变换结果。了解Projections的大小有助于评估其对磁盘空间的影响。
查看s-2-r-0
节点上test
表的Projections占用磁盘空间大小。
SELECT
database,
table,
name,
formatReadableSize(sum(data_compressed_bytes) AS size) AS compressed, --压缩数据大小
formatReadableSize(sum(data_uncompressed_bytes) AS usize) AS uncompressed, --未压缩数据大小
round(usize / size, 2) AS compr_rate, --压缩比率
sum(rows) AS rows, --总行数
count() AS part_count --part总数
FROM clusterAllReplicas('default', system.projection_parts )
WHERE (table = 'test')
AND (active = 1) AND substring(hostname(),38,8) = 's-2-r-0'
GROUP BY
database,
table,
name
ORDER BY size DESC;
查看s-2-r-0
节点上test
表的每个Projection列占用磁盘空间大小。
SELECT
database,
table,
column,
formatReadableSize(sum(column_data_compressed_bytes) AS size) AS compressed, --压缩数据大小
formatReadableSize(sum(column_data_uncompressed_bytes) AS usize) AS uncompressed, --未压缩数据大小
round(usize / size, 2) AS compr_rate --压缩比率
FROM clusterAllReplicas('default', system.projection_parts_columns )
WHERE (active = 1) AND (table LIKE 'test')
AND substring(hostname(),38,8) = 's-2-r-0'
GROUP BY
database,
table,
column
ORDER BY size DESC;