This topic describes the ANALYZE command in Hologres and the behavior of AUTO ANALYZE for automatic statistics collection. It explains the main behavior parameters to help you understand and control statistics collection, improving query plan quality.
Overview of Statistics and ANALYZE
Why Are Statistics Needed?
The optimizer relies on table and column statistics to generate reasonable execution plans, including:
Row count and column count
Column width (Width)
Number of Distinct Values (NDV)
Most Common Values (MCV) and their frequencies
Histogram and other distribution characteristics
This information guides the optimizer to:
Estimate operator execution cost
Prune the execution plan search space
Select appropriate Join order and Join algorithms
Estimate memory and parallelism
This results in a more optimal execution plan.
The ANALYZE command is the standard method for users to actively collect table/column statistics. When statistics are missing or inaccurate, the query plan may degrade significantly, such as abnormal Join ordering, which manifests as: query OOM, long execution time, or high instance CPU consumption.
AUTO ANALYZE is the standard method for the Hologres system to automatically collect table/column statistics. Since AUTO ANALYZE is an asynchronous background system behavior, there is a delay of tens of seconds to minutes from when table data changes to when AUTO ANALYZE discovers, schedules, and completes the automatic statistics collection. Therefore, in some specific scenarios, it is recommended that users manually run ANALYZE to ensure statistics are collected in a timely manner.
When to Manually Execute ANALYZE
Manual execution is recommended in the following scenarios:
After
INSERT/UPDATE/DELETEoperations that import/update/delete a large amount of data on a table, and the table will be queried immediately, it is recommended to run ANALYZE on the table after theINSERT/UPDATE/DELETE;When multi-table Join performance degrades significantly, execute column-level ANALYZE on key Join columns and Group By columns;
After executing
CREATE FOREIGN TABLEorIMPORT FOREIGN SCHEMAfor a foreign table, and it will be queried immediately, it is recommended to run ANALYZE on the newly created foreign table before querying to collect initial statistics;After executing
CREATE EXTERNAL DATABASE, and it will be queried immediately, it is recommended to run ANALYZE on the tables that need to be queried under the external database before querying;When one of the following errors/symptoms occurs:
Multi-table Join OOM error:
Query executor exceeded total memory limitation ..., orQuery exceed per query memory limitation ...;Multi-table Join error:
Capacity error: BinaryArray cannot contain more than 2147483646 bytes ...;Import or query tasks run abnormally long with uneven CPU utilization.
EXPLAIN <SQL>shows a Scan operator with an estimated row count of 1000,-> Seq Scan on tbl (cost=0.00..5.00 rows=1000 width=1); this indicates the table lacks statistics.EXPLAIN <SQL>shows a Scan operator with an estimated row count of 1,-> Seq Scan on tbl (cost=0.00..1.00 rows=1 width=1); this indicates the table is estimated to have 0 rows. If the actual Scan result is not 0 rows, the statistics may be outdated, and a manual ANALYZE is recommended.
In the above scenarios, it is generally recommended to first manually run ANALYZE and observe whether performance recovers, then further tune the AUTO ANALYZE configuration.
ANALYZE Command
Basic Syntax and Behavior
Collect statistics for all columns of the entire table
ANALYZE table_name;Collects the table row count and uniformly collects Width, MCV, Histogram, NDV, and other statistics for all regular columns in the table;
Uses a sampling-based approach to estimate various statistical values. The sampling operation initiates a sampling sub-SQL within the ANALYZE process.
By default, ANALYZE randomly samples 30,000 rows from the table for statistics collection and computation. If the table has fewer than 30,000 rows, all rows are sampled.
Collect statistics for specific columns only (recommended for key columns)
ANALYZE table_name(col1, col2, ...);Computes a more accurate NDV for the specified columns (typically using
APPROX_COUNT_DISTINCTlogic), which is more accurate than table-level sampling but more expensive;MCV, Histogram, Width, and other statistics are still obtained through sampling;
When
ANALYZEis executed multiple times on the same column, the later execution overwrites the old statistics for that column, but does not affect other unspecified columns.
For tables with many columns, ANALYZE table_name; is not fully equivalent to ANALYZE table_name(col1, col2, ...): the latter is typically more accurate for NDV but more costly. It is recommended to execute column-level ANALYZE on frequently used Join columns, Group By columns, and other key columns as a supplement.
Limitations and Considerations
Which columns will not be analyzed
Unsupported types: When a column type is a User Defined Type or is not in the set of types that Hologres supports for statistics, statistics collection will be skipped:
Column types that do not support ANALYZE include: "char" (single character type), BIT, VARBIT, BYTEA, NAME, JSON, TSVECTOR, TSQUERY, OID, XID, CID, INET, POINT, LINE, LSEG, BOX, CIRCLE, PATH, POLYGON, BITARRAY, VARBITARRAY, BYTEAARRAY, INT2ARRAY, MONEYARRAY, NUMERICARRAY, TIMEARRAY, TIMETZARRAY, TIMESTAMPTZARRAY, TIMESTAMPARRAY, ANYARRAY, REGCLASS, DATEARRAY, or other INTERNAL types.
When the column type is unsupported, both manual ANALYZE and AUTO ANALYZE will be ignored;
For partitioned tables, column types that do not support incremental partition statistics merging will also not be analyzed. In addition to the types listed above, unsupported types also include: BOOLARRAY, INT4ARRAY, TEXTARRAY, BPCHARARRAY, VARCHARARRAY, INT8ARRAY, FLOAT4ARRAY, FLOAT8ARRAY.
Columns marked as dropped: Logical columns retained after
ALTER TABLE ... DROP COLUMN(attisdropped = true) will not be analyzed;Explicitly disabled by column attribute:
Columns with the column attribute
enable_analyze = falsewill not be processed by ANALYZE or AUTO ANALYZE;Columns with the column attribute
enable_auto_analyze = falsewill not be processed by AUTO ANALYZE.For details about these column attributes, see the subsequent sections.
JSONB column statistics not enabled:
If the column type is
JSONBandenable_jsonb_statsis not enabled, JSONB statistics will not be collected for that column;If you need to rely on JSONB column statistics (for example, complex JSON filter conditions), you need to first enable
enable_jsonb_statsin the column attributes.
Summary: Even after executing ANALYZE table_name;, the above columns may be skipped due to type or column attribute restrictions. When you cannot find the corresponding statistics records in pg_stats, you should first check these restrictions.
ANALYZE limitations for data lake tables
ANALYZE is not supported for specific Timestamp, Version, Branch, Snapshot, or Tag individually
Typical Usage
Recommended typical usage
Execute column-level ANALYZE on Join columns, Group By columns, and filter condition columns:
ANALYZE tablename (order_id, user_id, dt);ANALYZE tablename;ANALYZE foreign_table;Skip statistics collection for specific columns by setting column attributes
Hologres supports controlling whether ANALYZE or AUTO ANALYZE is executed on a column through column-level attributes, which is applicable to the following scenarios:
Ultra-wide columns (such as very long Text) have high statistics collection overhead, and collecting statistics for them provides limited or no benefit to the query plan;
Certain columns never participate in Joins, filtering, or aggregation, and do not need statistics collection;
Need to reduce resource consumption of ANALYZE or AUTO ANALYZE.
The control parameters are the following column options:
enable_analyze: Controls whether the column participates in manual ANALYZE and automatic AUTO ANALYZE. Default istrue;enable_auto_analyze: Controls only whether the column participates in automatic AUTO ANALYZE. Default istrue.
Setting method (via
ALTER TABLEcommand):
-- 禁用某列的 ANALYZE 和 AUTO ANALYZE
ALTER TABLE t ALTER COLUMN bitmap_col SET (enable_analyze = false);
-- 禁用某列的 AUTO ANALYZE,但仍允许手动 ANALYZE
ALTER TABLE t ALTER COLUMN large_text_col SET (enable_auto_analyze = false);
-- 恢复默认行为(重新启用)
ALTER TABLE t ALTER COLUMN bitmap_col RESET (enable_analyze);
ALTER TABLE t ALTER COLUMN large_text_col RESET (enable_auto_analyze);Partitioned Tables and Incremental Partition ANALYZE
To reduce the overhead of executing ANALYZE on large partitioned tables, Hologres supports incremental partition ANALYZE:
Goals:
Avoid full sampling of the parent table each time;
Update parent table statistics by performing
ANALYZEon child partitions and merging the statistics;Skip re-collecting statistics for child partitions that have not changed.
Scenarios and Limitations
In Hologres V2.0 and later, incremental partition ANALYZE is enabled by default without manual configuration. For details, refer to the product version notes;
When enabled, executing
ANALYZEon a child partition will attempt to combine its statistics with those of other child partitions to produce the parent table statistics;The prerequisite for merging parent table statistics is that, apart from the child partition currently being analyzed, all other child partitions under the parent table must already have statistics (including row count statistics and column statistics);
Therefore, you can:
During initial setup, execute
ANALYZEon the parent table first. The system will automatically detect child partitions that needANALYZEand analyze them one by one, ultimately merging the results into the parent table statistics;When new partitions are added later, you only need to execute
ANALYZEon the newly added child partitions.
Brief Example
BEGIN;
DROP TABLE IF EXISTS t_parent;
CREATE TABLE t_parent(a int, b int) PARTITION BY LIST (a);
CREATE TABLE child1 PARTITION OF t_parent FOR VALUES IN (1);
CREATE TABLE child2 PARTITION OF t_parent FOR VALUES IN (2);
COMMIT;
insert into child1 values (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 2);
insert into child2 values (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 5), (2, 5), (2, 5), (2, 5), (2, 5);
-- child2 lacks statistics, so parent table statistics cannot be merged
dbname=# ANALYZE child1;
INFO: auto merging of leaf partition stats to calculate root partition stats is not possible because partition child2 is not analyzed
ANALYZE
-- child1 already has statistics. After analyzing child2, parent table statistics can be automatically merged
dbname=# ANALYZE child2;
ANALYZE
dbname=# SELECT tablename,
attname,
null_frac AS "NF",
avg_width,
n_distinct,
most_common_vals AS "MCV",
most_common_freqs AS "MCV_FRAQ"
FROM pg_stats
WHERE tablename = 't_parent';
tablename | attname | NF | avg_width | n_distinct | MCV | MCV_FRAQ
-----------+---------+----+-----------+------------+-------+------------
t_parent | a | 0 | 4 | 2 | {1,2} | {0.5,0.5}
t_parent | b | 0 | 4 | -0.2 | {1,5} | {0.45,0.3}
(2 rows)
-- After this, when new partitions are added, you only need to ANALYZE the child partition. Parent table statistics will be automatically merged through sampling the child partition only.ANALYZE on the Parent Table
Sometimes multiple child partitions of a partitioned table have undergone data changes, and you do not want to ANALYZE each child partition individually. In this case, you can perform a single ANALYZE on the parent table.
When incremental partition ANALYZE is enabled, manually executing ANALYZE on the parent table causes the system to adaptively select child partitions that need statistics updates, rather than blindly analyzing all child partitions. This mechanism significantly reduces the ANALYZE overhead for large partitioned tables while ensuring the accuracy of statistics.
Behavior Description
When executing ANALYZE partition_parent_table;, the system automatically identifies the following two types of child partitions, prioritizes collecting their ANALYZE statistics, and then merges them into the parent table statistics:
Stats-changed partitions
The child partition data has changed significantly since the last
ANALYZE(inserts, updates, or deletes);Determination criteria:
The number of changed rows reaches the change threshold (
1% * row count at last ANALYZE).
Unable-to-merge partitions
The child partition lacks complete statistics, making it unable to participate in the parent table statistics merge;
Common causes include:
ANALYZE has never been executed: The child partition statistics are completely missing;
Missing column statistics: Specified columns lack necessary statistics (such as MCV, Histogram, or NDV HLL Counter);
Statistics type mismatch with column type: The child partition column type has changed, causing existing statistics to be inconsistent with the current column type;
Outdated statistics version: The parent table's statistics version (statistic_version) is lower than the latest child partition's statistics version, indicating that the parent table statistics are not up-to-date and need to be re-merged.
Execution Example and Output
-- Execute ANALYZE on the parent table
ANALYZE partition_parent_table;Typical output:
INFO: will analyze 5 part tables first (stats changed 3, unable-to-merge 2)This output indicates:
The system identified 5 child partitions that need priority statistics collection;
Among them, 3 child partitions require statistics updates due to significant data changes;
Among them, 2 child partitions were marked as unable-to-merge due to missing or unusable statistics, requiring an internal ANALYZE to be triggered first.
Notes
It is not recommended to set
autovacuum_enabled = false(disabling AUTO ANALYZE at the table level) for child partitions, as this may cause the child partition to remain in an "unable-to-merge" state for an extended period unless you manually ANALYZE it again.
Through this mechanism, Hologres can significantly improve the execution efficiency of ANALYZE for large partitioned tables while ensuring the quality of statistics, providing the query optimizer with more timely and accurate decision-making data.
ANALYZE for Logical Partitioned Tables
A logical partitioned table is a partition table type unique to Hologres. Unlike standard PostgreSQL partitioned tables (which create physical child tables via PARTITION BY), a logical partitioned table remains physically as a single table and only partitions data logically based on partition column values, providing more flexible partition management capabilities.
The ANALYZE logic for logical partitioned tables is essentially the same as for physical partitioned tables. The only difference is that child partitions of a logical partitioned table are not separate tables, but like physical child partitions, they each still have their own independent row count statistics and column statistics.
Behavior Description
When executing ANALYZE on a logical partitioned table, the system uses processing logic similar to but independent from physical partitioned tables:
Automatic discovery of logical partitions
When executing
ANALYZE logical_partitioned_table;on a logical partitioned (parent) table, the system automatically queries the storage engine for all logical partitions of the table;
Adaptive selection of logical partitions requiring statistics
Similar to physical partitioned tables, the system identifies the following logical partitions and prioritizes their statistics collection:
Stats-changed partitions
The child partition data has changed significantly since the last
ANALYZE(inserts, updates, or deletes), with the same determination criteria as physical partitioned tables described above;Logical partitions with missing statistics: ANALYZE has never been executed, or specified columns lack necessary statistics (such as HLL counters);
Logical partitions with unusable statistics: Column statistics do not match the current column type, or the statistics are outdated.
Empty partitions are skipped: Logical partitions with 0 rows are automatically ignored.
The system outputs informational messages, for example:
INFO: will analyze 10 logical partitions firstPer-partition sampling and statistics collection
For each logical partition that needs statistics collection, the system constructs a sampling query with partition column filter conditions (such as
WHERE user_id = 1 AND event_date = '2024-11-04');Data sampling is performed within the scope of that logical partition to collect column-level statistics (NDV, MCV, Histogram, HLL counters, etc.);
Statistics are stored at the logical partition granularity in
hologres_statistic.hg_table_statistic, distinguished by theunique_namefield for different logical partitions.
Merging parent table statistics
After statistics collection for all logical partitions is complete, the system automatically triggers statistics merging to generate the parent table statistics.
Execution Example and Output
Example 1: Execute ANALYZE on a logical partitioned parent table
-- Create a logical partitioned table
CREATE TABLE user_events (
user_id INT NOT NULL,
event_type TEXT NOT NULL,
event_date DATE NOT NULL,
event_data INT
)
LOGICAL PARTITION BY LIST(user_id, event_date);
-- Insert data into different logical partitions
INSERT INTO user_events SELECT 1, 'login', '2024-11-01', i FROM generate_series(1, 1000) i;
INSERT INTO user_events SELECT 1, 'logout', '2024-11-02', i FROM generate_series(1, 1000) i;
INSERT INTO user_events SELECT 2, 'purchase', '2024-11-03', i FROM generate_series(1, 1000) i;
INSERT INTO user_events SELECT 3, 'view', '2024-11-04', i FROM generate_series(1, 1000) i;
-- Execute ANALYZE on the parent table
ANALYZE VERBOSE user_events;Typical output:
INFO: will analyze 4 logical partitions first
INFO: analyzing hologres table "public.user_events" PARTITION (user_id=1,event_date='2024-11-01')
INFO: analyzing hologres table "public.user_events" PARTITION (user_id=1,event_date='2024-11-02')
INFO: analyzing hologres table "public.user_events" PARTITION (user_id=2,event_date='2024-11-03')
INFO: analyzing hologres table "public.user_events" PARTITION (user_id=3,event_date='2024-11-04')
INFO: try to merge root partition
INFO: automatically merging leaf partition stats to calculate root partition statsDescription:
The system automatically discovered 4 logical partitions that need statistics updates;
Statistics were collected for each logical partition one by one;
Finally, the parent table's global statistics were automatically merged and generated.
Example 2: Execute ANALYZE on specified logical partitions
-- Execute ANALYZE on a single logical partition
ANALYZE user_events PARTITION (user_id=1, event_date='2024-11-01');
-- Execute ANALYZE on multiple logical partitions
ANALYZE user_events
PARTITION (user_id=1, event_date='2024-11-01')
PARTITION (user_id=2, event_date='2024-11-03');
-- Execute ANALYZE on specific columns of a specified logical partition
ANALYZE user_events
PARTITION (user_id=1, event_date='2024-11-01') (user_id, event_type);Description:
You can use the
PARTITIONclause to specify which logical partitions to analyze;Multiple logical partitions can be specified at the same time;
As with standard ANALYZE, you can further specify the columns to collect statistics for.
Statistics Storage
Statistics for logical partitioned tables are stored in the hologres_statistic.hg_table_statistic table:
SELECT
unique_name,
schema_name,
table_name,
total_rows,
sample_rows,
nattr
FROM hologres_statistic.hg_table_statistic
WHERE table_name = 'user_events'
ORDER BY unique_name;Example result:
| unique_name | schema_name | table_name | total_rows | sample_rows | nattr |
|-------------|-------------|------------|------------|-------------| ----- |
| user_events | public | user_events | 4000 | 0 | 4 |
| user_events.a1b2c3d4e5f6... | public | user_events | 1000 | 1000 | 4 |
| user_events.f6e5d4c3b2a1... | public | user_events | 1000 | 1000 | 4 |
| user_events.1234567890ab... | public | user_events | 1000 | 1000 | 4 |
| user_events.abcdef123456... | public | user_events | 1000 | 1000 | 4 |Field descriptions:
Parent table record:
total_rows = 4000: Total row count across all logical partitions;sample_rows = 0: Parent table statistics are derived through merging, without direct sampling.
Logical partition records (
unique_namecontains the MD5 hash identifying the statistics object (table/partition)):Each logical partition has an independent statistics record;
sample_rows = 1000: The actual number of rows sampled for that logical partition.
Best Practices for Logical Partitioned Table ANALYZE
Prefer executing ANALYZE on the logical partitioned parent table
When executing
ANALYZEon the logical partitioned parent table, the system automatically discovers and collects statistics for all logical partitions without the need to manually specify each partition;Only use the
PARTITIONclause to specify partitions when you need to quickly update statistics for specific logical partitions.
Leverage incremental statistics updates
When data is only written to a few logical partitions, you can execute ANALYZE on just those partitions;
The system will automatically trigger parent table statistics merging without the need to re-collect statistics for all partitions.
Be aware of the number of logical partitions
When there are too many logical partitions (e.g., thousands), the initial ANALYZE may take a long time. It is recommended to execute it during off-peak hours;
Use in conjunction with AUTO ANALYZE
Logical partitioned tables are also managed by AUTO ANALYZE, which automatically identifies logical partitions that need statistics updates.
Configurable Parameters for the ANALYZE Command
Parameter | Description | Supported Version | Default Value | Notes / Usage Example |
hg_experimental_analyze_foreign_partitions_access_limit | Maximum number of partitions allowed to be accessed during random sampling when executing ANALYZE on foreign tables | v0.10 and later | 0 (unlimited) | -- Only sample 100 partitions to prevent random sampling from scanning too much foreign table data ALTER DATABASE dbname SET hg_experimental_analyze_foreign_partitions_access_limit = 100; |
hg_analyze_foreign_table_max_sample_row_count | Maximum number of rows to sample during random sampling when executing ANALYZE on foreign tables | v4.1 and later | 0 (unlimited). For details, see version updates. |
AUTO ANALYZE Automatic Statistics Collection
Starting from Hologres V0.10, Hologres supports the automatic statistics collection mechanism AUTO ANALYZE:
Automatically determines which tables need statistics updates based on table creation, data writes, and data changes;
Asynchronously schedules statistics collection tasks in the background, eliminating the need for users to manually ANALYZE each table;
Reduces the risk of missing statistics due to overlooked ANALYZE operations.
Switch and Scope
Database-level Switch
The configurable parameter is
hg_enable_start_auto_analyze_worker. Starting from Hologres V0.10, it is enabled by default.
-- 查看当前数据库是否开启 AUTO ANALYZE
SHOW hg_enable_start_auto_analyze_worker;
-- 关闭(仅临时排查、规避问题时使用)
ALTER DATABASE dbname SET hg_enable_start_auto_analyze_worker = OFF;
-- 重置回默认值ON(推荐)
ALTER DATABASE dbname RESET hg_enable_start_auto_analyze_worker;Note: The above GUC is a database-level configuration. SESSION-level or ROLE-level configurations are ineffective. It must be set through ALTER DATABASE to take effect on that DB, and only superusers can modify it.
Table-level Switch
Hologres supports controlling AUTO ANALYZE behavior for individual tables.
Behavior Description
Disable AUTO ANALYZE for a single table
ALTER TABLE my_table SET (autovacuum_enabled = false);Enable AUTO ANALYZE for a single table (restore default behavior)
-- 推荐,重置以恢复默认
ALTER TABLE my_table RESET (autovacuum_enabled);
-- 或者显式设置为 true(不推荐)
ALTER TABLE my_table SET (autovacuum_enabled = true);Check the autovacuum_enabled status of a table
SELECT relname, reloptions FROM pg_class WHERE relname = 'my_table';
relname | reloptions
----------+----------------------------
my_table | {autovacuum_enabled=false}
(1 row)Typical Use Cases
Special business tables that do not need statistics
Some tables are only used for temporary storage or logging and do not participate in complex queries, so statistics collection is unnecessary;
Tables with large data volumes that change frequently but do not require statistics can avoid unnecessary resource consumption by setting
autovacuum_enabled = false.
Statistics are already maintained manually
For certain critical tables, you may have already established a regular manual ANALYZE workflow that meets your business requirements.
Temporarily disable to reduce resource usage
During peak business hours or emergency troubleshooting, you can temporarily disable AUTO ANALYZE for specific tables;
After the situation is resolved, re-enable it or manually execute ANALYZE.
Considerations
For partitioned tables:
It is not recommended to set
autovacuum_enabled = falseon child partitions, as this may cause the child partition to remain in a "cannot merge" state for an extended period, affecting the accuracy of parent table statistics;If you need to disable it, it is recommended to only set it on the parent partition table.
The table-level switch has higher priority than the database-level switch:
Even if the database-level AUTO ANALYZE is enabled, tables with
autovacuum_enabled = falsewill not be processed by AUTO ANALYZE.
Column-level Switch
Hologres supports controlling whether ANALYZE or AUTO ANALYZE is performed on specific columns through column-level attributes, which is applicable to the following scenarios:
Wide columns (such as long Text fields) have high statistics collection overhead but provide limited benefit to query plans;
Certain columns never participate in Joins, filters, or aggregations and do not need statistics collection;
Need to reduce resource consumption of AUTO ANALYZE.
The control parameter is the following column option:
enable_auto_analyze: Controls only whether the column participates in automatic AUTO ANALYZE, defaults totrue.Configuration method (through the
ALTER TABLEcommand):
-- 禁用某列的 AUTO ANALYZE,但仍允许手动 ANALYZE
ALTER TABLE t ALTER COLUMN large_text_col SET (enable_auto_analyze = false);
-- 恢复默认行为(重新启用)
ALTER TABLE t ALTER COLUMN large_text_col RESET (enable_auto_analyze);Considerations:
After setting
enable_auto_analyze = falseat the column level, AUTO ANALYZE tasks will no longer collect statistics for that column. However, you can still explicitly collect statistics through manualANALYZE table_name;orANALYZE table_name(col);;If a column has significant impact on query plans (such as Join columns or filter condition columns), it is not recommended to disable its statistics collection.
Trigger Logic of AUTO ANALYZE
AUTO ANALYZE combines multiple types of signals to determine whether statistics need to be updated for a particular table:
Data Change Volume
Applies only to regular Hologres tables (excludes foreign tables);
At 1-minute intervals, collects the row count of
INSERT/UPDATE/DELETEoperations observed by the FE on each table. If the change exceeds the threshold, an AUTO ANALYZE is triggered (fast response);At 10-minute intervals, retrieves the insert/update/delete change row counts from the storage engine. If the change exceeds the threshold, an AUTO ANALYZE is triggered (precise calibration);
Schema Changes
Applies only to regular Hologres tables (excludes foreign tables);
At 1-minute intervals, collects tables with the following changes and performs AUTO ANALYZE:
ADD / DROP COLUMN;
Partitioned tables (including logical partition tables) ATTACH / DETACH child partitions.
Missing Statistics / Unable to Merge Parent Partition Table Statistics
Applies to both regular Hologres tables and foreign tables;
At 1-minute intervals, collects tables/columns with missing statistics and child partitions that cannot meet the conditions for incremental statistics merging.
Foreign Tables
Applies only to foreign tables introduced through
CREATE FOREIGN TABLE,IMPORT FOREIGN SCHEMA, or the Auto Load mechanism under non-External Database contexts;Currently only supports AUTO ANALYZE for MaxCompute foreign tables;
At 4-hour intervals, periodically checks all foreign tables in the DB. Between two checks, if there are external data changes (the criterion is that the corresponding foreign table's last_modify_timestamp falls between the two check intervals), an AUTO ANALYZE is triggered.
Fallback Statistics
During the 1:00-5:00 AM window, a "fallback" AUTO ANALYZE is performed on tables that have continuous changes but have not triggered the threshold (>5000 row changes, <10% change volume), to capture data distribution drift in columns (for example, date-type fields written after midnight the next day are completely different from the previous day, resulting in distribution changes).
Access Hotspot Tables (External Database only)
After enabling the
enable_auto_analyzeparameter on an External Database, the Hologres system begins tracking recently accessed tables since the last system startup and adds them to the observation list;At 1-hour intervals, tables in the observation list that are "frequently accessed recently" trigger an AUTO ANALYZE;
Note that after a system restart, the accessed table list is cleared and recording starts over.
AUTO ANALYZE for External Database
Starting from Hologres V3.0, Hologres supports the External Database feature. When creating an External Database, you can enable AUTO ANALYZE (disabled by default, see release notes for details):
-- 创建 External Database 时,开启 AUTO ANALYZE
-- 参考文档:https://help.aliyun.com/zh/hologres/developer-reference/create-external-database
CREATE EXTERNAL DATABASE <ext_database_name> WITH
metastore_type 'maxcompute'
mc_project 'project_name'
enable_auto_analyze 'true';
-- 创建 External Database 之后,为其开启 AUTO ANALYZE
ALTER EXTERNAL DATABASE dbname WITH enable_auto_analyze 'true';Feature limitation: For V3.2 and earlier, AUTO ANALYZE for External Database requires that the CREATE EXTERNAL DATABASE statement is configured with Access Key and Access Secret. AUTO ANALYZE is not supported for External Databases configured with SLR or STS. There are no such limitations for V4.0 and later.
The
metastore_typescope supported by AUTO ANALYZE on External Database is:dlf, dlf-paimon, dlf-rest, maxcompute.The table formats supported by AUTO ANALYZE on External Database include:
MaxCompute, Paimon, Iceberg.AUTO ANALYZE on External Database uses the Owner identity of the External Database by default, and employs Time-based One-Time Password (TOTP) authentication to execute ANALYZE tasks. If the Owner does not have sufficient table permissions, the system cannot successfully execute AUTO ANALYZE tasks.
For a foreign table in an External Database to be eligible for AUTO ANALYZE, the following conditions must be met:
The External Database configuration
hg_enable_start_auto_analyze_worker = on; (default is on)The External Database attribute
enable_auto_analyzeis set totrue(not configured by default, which meansfalse, see release notes for details)The External Database Owner must have query permissions for the data lake Project (if applicable) and the table
The table must have been accessed at least once within the last 3 days
AUTO ANALYZE under External Database only monitors foreign tables that have been accessed at least once;
After a system restart, if a foreign table is accessed at least once, the AUTO ANALYZE system adds it to the observation list and periodically triggers AUTO ANALYZE for these tables;
Note: For foreign tables in an External Database that have never been accessed, the first access uses a fast row count estimation mechanism to obtain the row count as fallback statistics (see the "Fast Row Count Estimation" section below), which still provides a certain level of statistics coverage.
Resource Limits of AUTO ANALYZE
To prevent background AUTO ANALYZE tasks from impacting foreground user tasks, Hologres has established the following resource limits for the AUTO ANALYZE feature:
When AUTO ANALYZE tasks execute, the default memory limit per worker is
4 GB. If the table data volume is too large, sampling may exceed the memory limit, causing the AUTO ANALYZE sampling SQL to fail. In such cases, only row count information can be collected, and column distribution information (MCV, Histogram, NDV, etc.) cannot be collected. You can adjustauto_analyze_work_memory_mbto change this behavior. The larger the instance specification, the higher the available memory limit for AUTO ANALYZE.The instance-level concurrency for simultaneously scheduled AUTO ANALYZE tasks typically does not exceed 4, and in extreme cases does not exceed 6.
For child partition AUTO ANALYZE tasks under the same partitioned table, the maximum simultaneous scheduling concurrency is 3.
By default, AUTO ANALYZE collects statistics for the first 256 columns. If a table has more than 256 columns, only the first 256 are collected (for partitioned tables, partition columns are prioritized and will be collected even if they are beyond position 256). You can adjust
hg_experimental_auto_analyze_max_columns_countto change this value.The sampling sub-SQLs of AUTO ANALYZE tasks run using a low-priority Background Pool with limited query execution-level concurrency. This causes AUTO ANALYZE tasks to take longer than manual ANALYZE, which users do not need to be concerned about.
For foreign tables, AUTO ANALYZE only collects column statistics for partition columns (e.g., MCV). It does not perform sampling to collect column statistics for columns other than partition columns.
Configurable Parameters of AUTO ANALYZE
By default, the Hologres AUTO ANALYZE feature does not require any parameter modifications.
In rare business scenarios (such as infrequent data writes/updates, query workloads that do not need statistics, or system load increases caused by AUTO ANALYZE), users can modify certain default parameters to adjust AUTO ANALYZE behavior for intervention or partial performance tuning purposes.
Note: Only superusers can adjust the default behavior of AUTO ANALYZE. All parameters must be set at the DB level and take effect after the next minute.
-- Superuser,DB级别修改 AUTO ANALYZE 参数的默认值
ALTER DATABASE dbname SET <GUC> = <values>;Parameter | Description | Supported Version | Default Value | Notes / Usage Examples |
hg_enable_start_auto_analyze_worker | Enable the AUTO ANALYZE feature | V0.10 and later | on | -- Temporarily disable AUTO ANALYZE for the DB |
hg_experimental_auto_analyze_max_columns_count | Number of columns for which AUTO ANALYZE automatically collects statistics | V1.1.0 and later | 256 |
|
auto_analyze_work_memory_mb | Memory limit for a single table in AUTO ANALYZE, in MB | V1.1.54 and later | 4096 | -- Change to 9 GB |
auto_analyze_work_statement_timeout | Timeout for AUTO ANALYZE task execution, in ms | V2.0 and later | 3600000 | -- Change to 3h |
hg_experimental_auto_analyze_max_foreign_table_partitions | Maximum number of foreign table partitions accessed during AUTO ANALYZE | V1.1.54 and later | 100 | If the number of foreign table partitions exceeds 100, the row counts of the top 100 partitions by lexicographic order of the first-level partition values are used by default to estimate the overall row count and partition column MCV of the foreign table. |
hg_auto_analyze_run_with_serverless_computing | Whether AUTO ANALYZE runs using Serverless resources | V3.1 and later | off (default is on for Serverless instances) | Generally no adjustment is needed. AUTO ANALYZE typically consumes minimal resources and runs in the background using low-priority processes, with minimal impact on instance load. |
hg_auto_analyze_serverless_computing_query_priority | Serverless task priority, range 1-5, higher values indicate higher priority. Note: hg_auto_analyze_run_with_serverless_computing must be enabled first | V3.1 and later | 2 | -- Change to highest priority |
Fast Row Count Estimation (Fast Num of Rows)
Starting from Hologres V3.1, Hologres introduced the fast row count estimation feature. When it detects that tables in an executing SQL query lack statistics or that statistics may be outdated, Hologres can quickly estimate table row counts through metadata from the underlying storage or external systems, thereby generating more reasonable execution plans.
Starting from Hologres V3.2, the fast row count estimation (Fast Num of Rows) feature is enabled by default.
Switch
How to enable fast row count estimation:
-- 关闭(仅适用于临时处置,或全库明确不需要统计信息的情况)
ALTER DATABASE dbname SET hg_experimental_get_fast_num_of_rows = OFF;
-- V3.2及以上,打开(重置回默认值)
ALTER DATABASE dbname RESET hg_experimental_get_fast_num_of_rows;
-- V3.1及以下,打开
ALTER DATABASE dbname SET hg_experimental_get_fast_num_of_rows = ON;Note: Fast row count estimation cannot replace complete statistics collection. It is still recommended to keep statistics fresh on critical business tables through ANALYZE/AUTO ANALYZE.
Usage Limits
Fast row count estimation is applicable to regular tables (including partitioned tables), foreign tables in External Database, and other scenarios.
The row count estimated by the fast row count estimation feature is not guaranteed to be completely accurate. For example, for performance reasons, on Hologres partitioned tables, only the row count statistics from up to 60 partitions (by default) are used to estimate the overall row count of the partitioned table.
In V4.1.15 and earlier, fast row count estimation is not enabled by default for foreign table types (Foreign Table). It can be enabled through the parameter
hg_experimental_enable_foreign_table_get_fast_num_of_rows. It is enabled by default in V4.1.16 and later.For foreign table types, fast row count estimation only supports MaxCompute foreign tables, Paimon foreign tables, and Iceberg foreign tables.
Examples
hg_experimental_get_fast_num_of_rows example
If a table has no statistics, enabling fast row count estimation can provide more accurate row counts. (Enabled by default in Hologres V3.2 and later)
-- V3.2+
create table test_tbl (a int);
insert into test_tbl select * from generate_series (1, 999);
-- 准确的行数估计(rows=999)
explain select count(1) from test_tbl ;
QUERY PLAN
----------------------------------------------------------------------------------------
Final Aggregate (cost=0.00..5.00 rows=1 width=8)
-> Gather (cost=0.00..5.00 rows=10 width=8)
-> Partial Aggregate (cost=0.00..5.00 rows=10 width=8)
-> Local Gather (cost=0.00..5.00 rows=20 width=8)
-> Partial Aggregate (cost=0.00..5.00 rows=20 width=8)
-> Seq Scan on test_tbl (cost=0.00..5.00 rows=999 width=1)The following is a real production case:
Without this feature enabled, if a table has no statistics, the Scan node in the query plan will show rows=1000 (indicating no statistics are available, and the plan is generated based on the default estimate of 1000 rows).
After enabling this feature, if a table has no statistics, the Scan node rows in the query plan will not equal 1000 (the system called the metadata from the underlying storage engine to obtain the table's row count).
Configurable Parameters for Fast Row Count Estimation
Parameter | Description | Supported Version | Default Value | Notes / Usage Example |
hg_experimental_get_fast_num_of_rows | Whether to enable fast row count estimation. | v3.1 and later | off (V3.1) on (V3.2+) | |
hg_experimental_enable_foreign_table_get_fast_num_of_rows | Whether to enable foreign table row count estimation. | v3.1 and later | off (V4.1.15 and earlier) on (V4.1.16+) | -- How to enable: ALTER DATABASE dbname SET hg_experimental_enable_foreign_table_get_fast_num_of_rows = on; |
hg_experimental_fast_num_rows_foreign_partitions_access_limit | For foreign partitioned tables, the maximum number of partitions from which to obtain row counts as the basis for overall estimation. Note: This only takes effect when foreign table row count estimation is enabled. | v3.1 and later | 60 | This is to control the time overhead of fast row count estimation. |
hg_get_fast_num_of_rows_holo_partitions_access_limit | For Hologres physical partitioned tables, the maximum number of partitions from which to obtain row counts as the basis for overall estimation. | v3.1 and later | 60 | This is to control the time overhead of fast row count estimation. |
Viewing Historical Execution Records of ANALYZE and AUTO ANALYZE
After ANALYZE and AUTO ANALYZE are executed, their execution records are written to the Query Log. Users can view the historical execution records of ANALYZE and AUTO ANALYZE through the hologres.hg_query_log view, including information such as the executed SQL, duration, and status.
How to Identify ANALYZE and AUTO ANALYZE Records in Query Log
ANALYZE and AUTO ANALYZE, along with their sampling SQL statements, are recorded separately in the Query Log.
The execution records have the following characteristics in the Query Log:
Field | ANALYZE | AUTO ANALYZE | Description |
| The username that executed ANALYZE |
| AUTO ANALYZE uses an internal system account for execution, which is uniformly displayed as |
| The application name of the session that executed ANALYZE |
| The AUTO ANALYZE framework connection uses |
|
|
| |
|
|
| The type of command executed |
|
|
| The sampling sub-SQL is a SELECT statement. You can use this statement to check the resource consumption of sampling. |
ANALYZE and its sampling SQL are associated through the comments in the query field, or through extended_info->>'source_query_id'.
SELECT query_id,
extended_info->>'src_query_id' as "source query id",
application_name,
status,
duration
FROM hologres.hg_query_log
WHERE query_start >= now() - interval '1 hour'
AND application_name IN ('AutoAnalyze', 'Hologres SQL Generated BY AUTO ANALYZE')
ORDER BY query_start DESC
LIMIT 2;
query_id | source query id | application_name | status | duration
---------------------+---------------------+----------------------------------------+---------+----------
1004019226350863009 | 1004019226350778807 | Hologres SQL Generated BY AUTO ANALYZE | SUCCESS | 119
1004019226350778807 | | AutoAnalyze | SUCCESS | 3276Query Examples
View recent AUTO ANALYZE execution records
SELECT usename,
status,
duration,
query_start,
query_end,
query,
application_name
FROM hologres.hg_query_log
WHERE query_start >= now() - interval '1 hour'
AND application_name IN ('AutoAnalyze')
ORDER BY query_start DESC
LIMIT 20;
query_id,
extended_info->>'src_query_id' as "source query id"View the AUTO ANALYZE execution history of a specific table
SELECT status,
duration,
query_start,
query
FROM hologres.hg_query_log
WHERE query_start >= now() - interval '1 day'
AND application_name IN ('AutoAnalyze')
AND query LIKE '%my_table_name%'
ORDER BY query_start DESC;Summarize AUTO ANALYZE execution overview for the past 3 days
SELECT query_date,
status,
COUNT(*) AS task_count,
AVG(duration) AS avg_duration_ms,
MAX(duration) AS max_duration_ms
FROM hologres.hg_query_log
WHERE query_start >= CURRENT_DATE::timestamptz - interval '2 day'
AND application_name IN ('AutoAnalyze')
AND command_tag = 'ANALYZE'
AND (status = 'SUCCESS' OR (
message NOT LIKE '%does not exist%'
AND message NOT LIKE '%retry later%'))
GROUP BY query_date, status
ORDER BY query_date DESC;View failed AUTO ANALYZE tasks in the past day
SELECT query_start,
duration,
message,
query
FROM hologres.hg_query_log
WHERE query_start >= now() - interval '1 day'
AND application_name IN ('AutoAnalyze', 'Hologres SQL Generated BY AUTO ANALYZE')
AND status != 'SUCCESS'
AND (message NOT LIKE '%does not exist%'
AND message NOT LIKE '%retry later%')
ORDER BY query_start DESC;Notes
Since AUTO ANALYZE connections are initiated with an internal administrator identity, regular users need the
pg_read_all_statsrole or database administrator privileges to view complete AUTO ANALYZE execution records.If you find that there are no AUTO ANALYZE records in the Query Log for an extended period, it is recommended to check whether the AUTO ANALYZE switch is enabled.
Viewing and Troubleshooting Statistics
Viewing Table Statistics (hologres_statistic.hg_table_statistic)
Table statistics are stored in the hologres_statistic.hg_table_statistic table and can also be observed in system tables.
Query this table to get the statistics from the most recent ANALYZE.
SELECT schema_name, -- 表的schema
table_name, -- 表名
user_name, -- 最近一次ANALYZE的用户
schema_version, -- 最近一次ANALYZE的表Schema版本
total_rows, -- 最近一次ANALYZE的行数
sample_rows, -- 最近一次ANALYZE获取统计信息的采样行数
analyze_timestamp, -- 最近一次ANALYZE的结束时间
analyze_count -- 到当前为止总计的ANALYZE次数
FROM hologres_statistic.hg_table_statistic
WHERE unique_name = hologres.hg_internal_statistic_unique_name ('schemaname', 'tablename')
ORDER BY analyze_timestamp DESC;
-- 示例输出
schema_name | table_name | user_name | schema_version | total_rows | sample_rows | analyze_timestamp | analyze_count
-------------+------------+------------------+----------------+------------+-------------+---------------------+---------------
public | test_fnr | BASIC$test_fnr | -1 | 999 | 999 | 2026-03-02 22:05:29 | 2
(1 row)V3.1 and earlier:
Each table has 0 to n records in the hologres_statistic.hg_table_statistic table. 0 records means ANALYZE has never been performed, and 1 or more records means ANALYZE has been run.
If there are two or more records, the schema_version of the two records must be different, because changes to the table schema (such as ADD COLUMN, CALL SET_TABLE_PROPERTY, etc.) generate a new version, which adds a new statistics record. The record corresponding to the old schema_version is no longer used.
The following example query result shows that the same table has 2 records, and the schema_version of the second record is lower than the first. The second record is therefore invalid and will not be used, and you do not need to pay attention to it. Hologres does not currently clean up historical expired records in the hg_table_statistic table, and users do not need to worry about old data.
schema_name | table_name | user_name | schema_version | total_rows | sample_rows | analyze_timestamp | analyze_count
-------------+------------+------------------+----------------+------------+-------------+---------------------+---------------
public | test_fnr | BASIC$test_fnr | 13 | 999 | 999 | 2026-03-01 18:05:29 | 2
public | test_fnr | BASIC$test_fnr | 12 | 999 | 999 | 2026-03-01 08:05:29 | 1
(1 row)V3.1 and later:
Each table has 0 to 1 records in the hologres_statistic.hg_table_statistic table. 0 records means ANALYZE has never been performed, and 1 record means ANALYZE has been run.
The schema_version is uniformly set to -1 (representing Deprecated).
This means that many DDL operations no longer invalidate statistics. For example, CALL SET_TABLE_PROPERTY will not re-trigger AUTO ANALYZE, and the existing statistics will continue to be used. Compared to Hologres V3.0 and earlier, the trigger frequency of AUTO ANALYZE is significantly reduced.
Querying row count and other statistics
Row count information is recorded in the reltuples field of the pg_class table.
-- relallvisible > 0:表有行数统计信息
-- relallvisible = 0:表行数未知,此时不能参考reltuples
-- relallvisible < 0:表无统计信息
SELECT relallvisible, reltuples FROM pg_class WHERE relname = 'test_table';If the table has no statistics, refer to the "Viewing tables with missing statistics" section below to identify the cause.
Querying column statistics
By querying the pg_stats view, you can obtain the statistics for all columns of the current table. For example, to get the statistics for the ds column of test_table:
select * from pg_stats where tablename = 'test_table' and attname = 'ds';
schemaname | tablename | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_freqs | elem_count_histogram
------------+------------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------------------------+----------------------
public | test_table | ds | f | 0 | 4 | 1 | {20241104} | {1} | | | | |
(1 row)If the column has no statistics, refer to the "Viewing tables with missing statistics" section below to identify the cause.
Viewing Tables with Missing Statistics
Hologres provides the view HG_STATS_MISSING (the actual view name may vary by version) to check tables in the current database that are missing statistics, making it easy to identify objects that need supplementary ANALYZE.
For specific fields and usage methods, refer to the corresponding instance documentation or console help.
Common Issues and Troubleshooting
Possible reasons for 0 records returned from the hologres_statistic.hg_table_statistic table:
ANALYZE has never been executed (neither manually nor by AUTO ANALYZE);
AUTO ANALYZE is not working, or the table has not met the trigger conditions;
Recommended actions:
First, manually execute:
ANALYZE schema_name.table_name;If the table still cannot be covered by AUTO ANALYZE for an extended period, it is recommended to check the AUTO ANALYZE switch and GUC configuration, or contact technical support.
Possible reasons for an obviously outdated analyze_timestamp:
AUTO ANALYZE has been disabled or restricted (such as
autovacuum_enabled = false);AUTO ANALYZE has insufficient resources or has failed (can be investigated in conjunction with logs/monitoring);
Recommended actions:
Manually execute ANALYZE once and observe the results;
Check:
Whether
hg_enable_start_auto_analyze_workeris enabled;Whether
autovacuum_enabled = falsehas been mistakenly set or related GUC restrictions are too strict;Check whether there are errors related to AUTO ANALYZE tasks in the Query Log.
Abnormal row count estimation in query plans (significantly too small or too large)
Check whether the corresponding table is missing statistics or has outdated statistics;
Check whether predicate columns have statistics (pg_stats), and perform column-level ANALYZE on critical columns;
Check whether the fast row count estimation feature has been disabled or not enabled.
Inconsistent parent-child statistics for partitioned tables or missing statistics on the parent table
Symptom: The partition parent table has no records in
hologres_statistic.hg_table_statistic, ortotal_rowsis significantly less than the sum of all child table row counts, or the partition column statistics MCV distribution of the parent table is outdated (for example, the ds column in pg_stats is missing yesterday's distribution);Troubleshooting points:
Check whether incremental partition ANALYZE / AUTO ANALYZE has been disabled;
SHOW hg_experimental_enable_incremental_analyze;
SHOW hg_experimental_enable_incremental_auto_analyze;Use the
HG_STATS_MISSINGview to check whether there are child tables that cannot be merged;Check whether there are errors related to AUTO ANALYZE tasks in the Query Log;
Manually execute
ANALYZE child_table;on child tables with missing statistics, or executeANALYZE partition_parent_table;on the parent table during off-peak hours to let the system fill in and merge statistics.
Certain columns have no statistics in pg_stats
Symptom: The table has the column, but the corresponding column statistics cannot be found in the
pg_statsview.Troubleshooting points:
Check whether statistics have been disabled through column attributes (
enable_analyze = falseorenable_auto_analyze = false);
SELECT attoptions
FROM pg_attribute
WHERE attrelid = 'tablename'::regclass::oid
AND attname = 'columnname';
-[ RECORD 1 ]----------------------
attoptions | {enable_analyze=false}Confirm whether the column has participated in ANALYZE (manual or AUTO ANALYZE). If necessary, execute column-level
ANALYZE table_name(col_name);;
SELECT query_id,
application_name,
status,
query_start,
query
FROM hologres.hg_query_log
WHERE query_start >= now() - interval '1 hour'
AND application_name IN ('Hologres SQL Generated BY ANALYZE', 'Hologres SQL Generated BY AUTO ANALYZE')
AND query like '%tablename%'
ORDER BY query_start DESC
LIMIT 2;Whether the column is a special type (
bytea,jsonb).If the table has executed AUTO ANALYZE but no columns have statistics, consider whether the AUTO ANALYZE task executed abnormally. For example, check whether AUTO ANALYZE timed out. Since auto_analyze_work_statement_timeout defaults to 1 hour, if the AUTO ANALYZE task in
hologres.hg_query_loghas a duration >= 3600000, in this case, only row count information is collected without column statistics.
AUTO ANALYZE has executed but the query plan has not significantly improved
Symptom: AUTO ANALYZE records for a specific table can be seen in
hologres.hg_query_log, but the actual query plan still uses obviously unreasonable row count estimates or Join order;Troubleshooting points:
Check whether AUTO ANALYZE timed out. Since
auto_analyze_work_statement_timeoutdefaults to 1 hour, if the AUTO ANALYZE task inhologres.hg_query_loghas a duration >= 3600s, in this case, only row count information is collected without column statistics.Check whether there are Hints, fixed Join orders, or legacy session-level GUCs (such as
enable_nestloop, etc.) interfering with the optimizer's choices;Perform a manual
ANALYZEon critical tables/columns and then re-run the query. If there is still no improvement, it is recommended to investigate further in conjunction with Query Log, FE logs, or by reporting to technical support.
OOM / memory limit exceeded issues caused by statistics
Symptom: A complex multi-table Join reports error
Query executor exceeded total memory limitation ..., or the plan selects an extremely unreasonable Join strategy (such as a large table driving a large table Nested Loop, etc.);Troubleshooting points:
Confirm whether related tables are missing statistics or have severely outdated statistics (by combining records in
hg_table_statisticand AUTO ANALYZE execution records);Perform column-level
ANALYZEon Join columns and filter columns. If necessary, increase sampling precision or enable the fast row count estimation feature;If the issue only occurs in certain extreme data distribution or peak scenarios, consult with technical support to evaluate whether adjustments to the statistics model or related GUCs (such as AUTO ANALYZE thresholds, memory limits, etc.) are needed.
Running ANALYZE and AUTO ANALYZE with Serverless
Starting from Hologres V3.1, in Serverless instances or instances with Serverless computing resources enabled, ANALYZE and AUTO ANALYZE can run on Serverless resources to reduce CPU/memory pressure on the instance itself.
This section explains the typical behavior and recommended configuration for manual ANALYZE and AUTO ANALYZE in Serverless Computing scenarios, in conjunction with related GUCs.
Manual ANALYZE and Serverless Computing
hg_serverless_computing_enable_analyze_statementDescription: Controls whether manual
ANALYZEstatements can be executed through Serverless tasks;Behavior:
When set to
on(default), the system allows ANALYZE statements to be offloaded to Serverless computing resources for execution, minimizing the impact of statistics tasks on the instance itself;When set to
off, ANALYZE executes on the local instance resources.
Typical recommendations:
In Serverless instances, the default is
on, allowing only Serverless resources to complete statistics tasks;When troubleshooting or when there are special requirements for resource usage, you can temporarily set it to
offto explicitly run ANALYZE on local resources.
AUTO ANALYZE and Serverless Computing
When generating AUTO ANALYZE tasks, the system determines whether to use Serverless resources and how to control Serverless task behavior based on the following GUCs:
hg_auto_analyze_run_with_serverless_computingDescription: Whether to execute AUTO ANALYZE tasks through Serverless computing resources. The default is
offfor regular instances andonfor Serverless instances.Behavior:
When set to
on, AUTO ANALYZE tasks will send sampling computations to the Serverless resource pool;When set to
off, AUTO ANALYZE continues to complete statistics through the local instance resources.
-- DB级别设置
ALTER DATABASE datname SET hg_auto_analyze_run_with_serverless_computing = on;hg_auto_analyze_serverless_computing_query_priorityDescription: The priority of Serverless AUTO ANALYZE tasks, ranging from
1~5. Higher values indicate higher priority. The default is2;Behavior:
When Serverless AUTO ANALYZE is enabled, the system sets the corresponding priority for each statistics task through
SET hg_experimental_serverless_tasks_query_priority = <value>;;This is applicable when sharing a resource pool with other Serverless tasks (such as ETL and offline queries) to control the preemption capability of statistics tasks in the overall queue;
Recommendations:
In production environments, it is typically kept at a medium priority (such as the default
2or moderately increased to3) to avoid competing for too many resources with core business jobs;For scenarios with extremely high requirements for statistics timeliness and sufficient Serverless resources, the priority can be appropriately increased.
hg_auto_analyze_serverless_computing_enable_persisted_snapshotDescription: Whether to use a persisted snapshot (
persisted snapshot) in Serverless AUTO ANALYZE tasks, i.e., controlling whether to skip additional flush throughSET hg_experimental_enable_persisted_snapshot = on/off;;The default is
offfor non-Serverless instances andonfor Serverless instances.Behavior:
When set to
on, statistics tasks will try to reuse the persisted snapshot, reducing the need to trigger a flush. This is more suitable for high-frequency statistics or scenarios with a large number of partitioned tables;When set to
off, statistics tasks will try to use a more up-to-date snapshot, which is suitable for scenarios with higher requirements for consistency or specific version isolation.
Recommendations:
Do not adjust unless necessary.
Usage Recommendations for Serverless Computing Scenarios
Prioritize using Serverless Computing to handle heavy ANALYZE tasks
When performing ANALYZE on tables or foreign tables with very large data volumes, it is recommended to enable:
Manual ANALYZE:
set hg_computing_resource = 'serverless'; ANALYZE my_table;;AUTO ANALYZE:
ALTER DATABASE mydb SET hg_auto_analyze_run_with_serverless_computing = on;;
This migrates the statistics computation pressure to the Serverless resource pool, reducing interference with online queries on the instance.
Set Serverless AUTO ANALYZE task priority based on business importance
In Serverless instances, for critical business databases that are sensitive to statistics timeliness and depend on the latest statistics, you can appropriately increase
hg_auto_analyze_serverless_computing_query_priority;In regular instances, using local resources for AUTO ANALYZE is typically sufficient.
Follow the "small step trial" principle before adjusting parameters
Serverless Computing related GUCs are also advanced parameters. Before making adjustments, it is recommended to first verify in a test or staging environment:
Observe the queuing and execution duration of statistics tasks in the Serverless resource pool;
Monitor the impact on latency and resource usage of core business queries;
After confirming there are no significant side effects, roll out to the production environment by database or in phases.