Analyze table statistics
Starting from V1.3, Hologres provides the hologres.hg_table_info system view to collect daily statistics about the tables in your instance. You can query this view to monitor and analyze table information and take optimization measures based on the statistics.
Limitations
This feature is available only for Hologres V1.3 and later. If your instance is an earlier version, you must upgrade it. For assistance, see Common upgrade preparation errors or join the Hologres DingTalk group. For more information, see How to get more online support?.
Data in the hologres.hg_table_info view has T+1 freshness. Data for a given day is typically updated by 05:00 the next day. On the day a Hologres instance is upgraded to V1.3, no table statistics are generated. A query on this day returns the
meta warehouse store currently not availableerror. You can start querying table statistics the day after the upgrade.
Usage notes
Table statistics are retained for 30 days by default.
For non-partitioned internal tables (
type='TABLE'), you can query detailed statistics, such as storage space, the number of files, cumulative access counts, and the number of rows.For other objects, such as views, materialized views, foreign tables, and partitioned parent tables, only basic information is available. This includes the number of partitions, the name of the external table that a foreign table maps to, and definitions of views and materialized views.
The hologres.hg_table_info table belongs to the Hologres meta warehouse system. Because a failure to query hologres.hg_table_info does not affect business queries running in the instance, the stability of the hologres.hg_table_info table is not covered by the product's service level agreement (SLA).
hg_table_info view
The hg_table_info view contains the following fields.
Table statistics are stored in the hologres.hg_table_info system view. After an instance is upgraded to V1.3, table information is collected daily by default.
Some fields may be null for tables created before the instance was upgraded to V1.3 because their creation information was not collected. For tables created after the upgrade, all fields are populated.
Field | Type | Description | Remarks |
db_name | text | The name of the database where the table resides. | None |
schema_name | text | The name of the schema where the table resides. | None |
table_name | text | The name of the table. | None |
table_id | text | The unique identifier for the table. For a foreign table, the ID format is | None |
type | text | The type of the object. Valid values:
|
|
partition_spec | text | The partitioning condition. This field is valid only for partitioned child tables. | None |
is_partition | boolean | Indicates whether the table is a partitioned child table. | None |
owner_name | text | The username of the table owner. You can join this column with the | None |
create_time | timestamp with time zone | The timestamp when the table was created. | None |
last_ddl_time | timestamp with time zone | The timestamp when the table metadata was last updated. | None |
last_modify_time | timestamp with time zone | The timestamp when the table data was last modified. | None |
last_access_time | timestamp with time zone | The timestamp when the table was last accessed. | None |
view_def | text | The definition of the view. | This field is valid only for views. |
comment | text | The description of the table or view. | None |
hot_storage_size | bigint | The size of hot storage used by the table, in bytes. | The storage size reported in |
cold_storage_size | bigint | The size of cold storage used by the table, in bytes. | The storage size reported in |
hot_file_count | bigint | The number of files in hot storage for the table. | None |
cold_file_count | bigint | The number of files in cold storage for the table. | None |
table_meta | jsonb | The raw metadata of the table, in JSONB format. | None |
row_count | bigint | The number of rows in the table or partition. | For a partitioned parent table, this is the total number of rows across all its child tables. |
collect_time | timestamp with time zone | The timestamp when the statistics were collected. | None |
partition_count | bigint | The number of partitioned child tables. | This field is valid only for partitioned parent tables. |
parent_schema_name | text | The schema name of the parent table for a partitioned child table. | This field is valid only for partitioned child tables. |
parent_table_name | text | The table name of the parent table for a partitioned child table. | This field is valid only for partitioned child tables. |
total_read_count | bigint | The cumulative number of read operations on the table. This is an approximate value, as | This value is an approximation and is not recommended for accurate counting. |
total_write_count | bigint | The cumulative number of write operations on the table. This is an approximate value, as | This value is an approximation and is not recommended for accurate counting. |
read_sql_count_1d | bigint | The total number of read operations on the table on the previous day (T-1), from 00:00 to 24:00 (UTC+8). |
|
write_sql_count_1d | bigint | The total number of write operations on the table on the previous day (T-1), from 00:00 to 24:00 (UTC+8). |
|
Grant permissions
You must have specific permissions to query table statistics. This section describes the permission rules and how to grant them.
View table statistics for all databases in a Hologres instance
Grant the user superuser privileges.
A superuser can view table statistics for all databases in a Hologres instance. The following command grants superuser privileges.
-- Replace "Alibaba Cloud account ID" with the actual username. For a RAM user, prefix the ID with "p4_". ALTER USER "Alibaba Cloud account ID" SUPERUSER;Add users to the
pg_stat_scan_tableuser group.In addition to a superuser, Hologres allows users in the
pg_stat_scan_tablesuser group (for versions earlier than V1.3.44) or thepg_read_all_statsuser group (for V1.3.44 and later versions) to view statistics logs for all database tables. If a regular user needs to view all logs, they can contact a superuser to be added to the relevant user group. The authorization command is as follows.-- For versions earlier than V1.3.44 GRANT pg_stat_scan_tables TO "Alibaba Cloud account ID";-- For the expert permission model CALL spm_grant('pg_stat_scan_tables', 'Alibaba Cloud account ID'); -- For the simple permission model (SPM) CALL slpm_grant('pg_stat_scan_tables', 'Alibaba Cloud account ID'); -- For the schema-level permission model (SLPM) -- For V1.3.44 and later GRANT pg_read_all_stats TO "Alibaba Cloud account ID";-- For the expert permission model CALL spm_grant('pg_read_all_stats', 'Alibaba Cloud account ID'); -- For the simple permission model (SPM) CALL slpm_grant('pg_read_all_stats', 'Alibaba Cloud account ID'); -- For the schema-level permission model (SLPM)
View table statistics for the current database
If you enable the simple permission model (SPM) or the schema-level permission model (SLPM) and add a user to the
db_adminuser group, thedb_adminrole can view the table statistics logs for the current database.NoteA regular user can query statistics only for the tables they own in the current database.
CALL spm_grant('<db_name>_admin', 'Alibaba Cloud account ID'); -- For the simple permission model (SPM) CALL slpm_grant('<db_name>.admin', 'Alibaba Cloud account ID'); -- For the schema-level permission model (SLPM)
Sample queries
Use case 1: Monitor internal table trends
-- Monitor trends for all internal tables in an instance, including storage size, number of files, read/write counts, and number of rows.
SELECT
db_name,
schema_name,
table_name,
collect_time :: date AS collect_date,
hot_storage_size,
cold_storage_size,
hot_file_count,
cold_file_count,
read_sql_count_1d,
write_sql_count_1d,
row_count
FROM
hologres.hg_table_info
WHERE
collect_time > (current_date - interval '1 week')::timestamptz -- For the last week
AND type ='TABLE'
ORDER BY collect_date desc ;Use case 2: Check access patterns of large tables
-- Check access patterns for the top 10 tables by storage size.
SELECT
db_name,
schema_name,
table_name,
hot_storage_size + cold_storage_size AS total_storage_size,
row_count,
sum(read_sql_count_1d) AS total_read_count,
sum(write_sql_count_1d) AS total_write_count
FROM
hologres.hg_table_info
WHERE
collect_time > (current_date - interval '1 week')::timestamptz -- For the last week
AND type = 'TABLE'
AND (
cold_storage_size IS NOT NULL
OR hot_storage_size IS NOT NULL
)
GROUP BY db_name,schema_name,table_name,total_storage_size,row_count
ORDER BY total_storage_size DESC
LIMIT 10;Use case 3: Analyze trends for top 10 tables
-- Analyze access, storage, and data volume trends over the last week for the top 10 tables by storage (based on yesterday's statistics).
WITH top10_table AS (SELECT
db_name,
schema_name,
table_name,
hot_storage_size + cold_storage_size AS total_storage_size
FROM
hologres.hg_table_info
WHERE
collect_time >= (current_date - interval '1 day')::timestamptz -- Yesterday
AND collect_time < current_date
AND type = 'TABLE'
AND ( cold_storage_size IS NOT NULL OR hot_storage_size IS NOT NULL )
ORDER BY total_storage_size DESC
LIMIT 10
)
SELECT
base.db_name,
base.schema_name,
base.table_name,
base.hot_storage_size + cold_storage_size AS total_storage_size,
base.row_count,
base.read_sql_count_1d,
base.write_sql_count_1d,
base.collect_time :: date AS collect_date
FROM
hologres.hg_table_info AS base
LEFT JOIN
top10_table
ON
base.db_name = top10_table.db_name
AND base.schema_name = top10_table.schema_name
AND base.table_name = top10_table.table_name
WHERE
collect_time > (current_date - interval '1 week')::timestamptz -- For the last week
AND type = 'TABLE'
AND ( cold_storage_size IS NOT NULL OR hot_storage_size IS NOT NULL )
ORDER BY total_storage_size DESC , collect_date DESC;Use case 4: Analyze trends for bottom 10 tables
-- Analyze access, storage, and data volume trends over the last week for the 10 tables that use the least storage (based on yesterday's statistics).
WITH top10_table AS (SELECT
db_name,
schema_name,
table_name,
hot_storage_size + cold_storage_size AS total_storage_size
FROM
hologres.hg_table_info
WHERE
collect_time >= (current_date - interval '1 day')::timestamptz -- Yesterday
AND collect_time < current_date
AND type = 'TABLE'
ORDER BY total_storage_size ASC LIMIT 10
)
SELECT
base.db_name,
base.schema_name,
base.table_name,
base.hot_storage_size + cold_storage_size AS total_storage_size,
base.row_count,
base.read_sql_count_1d,
base.write_sql_count_1d,
base.collect_time :: date AS collect_date
FROM
hologres.hg_table_info AS base
RIGHT JOIN
top10_table
ON
base.db_name = top10_table.db_name
AND base.schema_name = top10_table.schema_name
AND base.table_name = top10_table.table_name
WHERE
collect_time > (current_date - interval '1 week')::timestamptz -- For the last week
AND type = 'TABLE'
ORDER BY total_storage_size ASC , collect_date DESC ;Use case 5: Identify large tables with small files
-- View the number of files and storage size for each table, and sort by average file size.
-- The table group shows the shard count only for the current database. For other databases, this value is null.
SELECT
db_name,
schema_name,
table_name,
cold_storage_size + hot_storage_size AS total_storage_size,
cold_file_count + hot_file_count AS total_file_count,
(cold_storage_size + hot_storage_size) / (cold_file_count + hot_file_count) AS avg_file_size,
tmp_table_info.table_meta ->> 'table_group' AS table_group,
tg_info.shard_count
FROM
hologres.hg_table_info tmp_table_info
LEFT JOIN (
SELECT
tablegroup_name,
property_value AS shard_count
FROM
hologres.hg_table_group_properties
WHERE
property_key = 'shard_count'
) tg_info ON tmp_table_info.table_meta ->> 'table_group' = tg_info.tablegroup_name
WHERE
collect_time > (current_date - interval '1 day')::timestamptz
AND type = 'TABLE'
AND (
cold_storage_size IS NOT NULL
OR hot_storage_size IS NOT NULL
)
AND (
cold_file_count IS NOT NULL
OR hot_file_count IS NOT NULL
)
AND cold_file_count + hot_file_count <> 0
ORDER BY avg_file_size;Use case 6: Check row count changes
-- View the last modification time and the total change in the number of rows from the last modification.
-- If the instance contains many tables, filter the CTE tmp_table_info to prevent long query times from fetching excessive data.
WITH tmp_table_info AS (
SELECT
db_name,
schema_name,
table_name,
row_count,
collect_time,
last_modify_time
FROM
hologres.hg_table_info
WHERE
last_modify_time IS NOT NULL
AND type = 'TABLE'
-- Add filters for tmp_table_info here.
-- For example: collect_time > (current_date - interval '14 day'):: timestamptz
-- For example: table_name like ''
-- For example: type = 'PARTITION'
)
SELECT
end_data.db_name AS db_name,
end_data.schema_name AS schema_name,
end_data.table_name AS table_name,
(end_data.row_count - start_data.row_count) AS modify_row_count,
end_data.row_count AS current_rows,
end_data.last_modify_time AS last_modify_time
FROM
(
SELECT
db_name,
schema_name,
table_name,
row_count,
last_modify_time
FROM
tmp_table_info
WHERE
collect_time > (current_date - interval '1 day')::timestamptz -- Query the last modification time of the table recorded yesterday.
) end_data
LEFT JOIN (
SELECT
db_name,
schema_name,
table_name,
row_count,
collect_time
FROM
tmp_table_info
) start_data ON (
end_data.db_name = start_data.db_name
AND end_data.schema_name = start_data.schema_name
AND end_data.table_name = start_data.table_name
AND end_data.last_modify_time::date = (start_data.collect_time + interval '1 day')::date
);