How to use ANALYZE and AUTO ANALYZE to collect statistics in Hologres-Hologres(Hologres)-阿里云帮助中心

Overview of statistics and ANALYZE

Why are statistics needed?

The optimizer relies on table and column statistics to generate reasonable execution plans, including:

Row count and column count
Column width (Width)
Number of Distinct Values (NDV)
Most Common Values (MCV) and their frequencies
Histogram and other distribution characteristics

This information guides the optimizer to:

Estimate operator execution cost
Prune the execution plan search space
Select appropriate Join order and Join algorithms
Estimate memory and parallelism

This results in a more optimal execution plan.

The ANALYZE command is the standard method for users to actively collect table/column statistics. When statistics are missing or inaccurate, the query plan may degrade significantly, such as abnormal Join ordering, which manifests as: query OOM, long execution time, or high instance CPU consumption.

AUTO ANALYZE is the standard method for the Hologres system to automatically collect table/column statistics. Since AUTO ANALYZE is an asynchronous background system behavior, there is a delay of tens of seconds to minutes from when table data changes to when AUTO ANALYZE discovers, schedules, and completes the automatic statistics collection. Therefore, in some specific scenarios, it is recommended that users manually run ANALYZE to ensure statistics are collected in a timely manner.

When to manually execute ANALYZE

Manual execution is recommended in the following scenarios:

After INSERT / UPDATE / DELETE operations that import/update/delete a large amount of data on a table, and the table will be queried immediately, it is recommended to run ANALYZE on the table after the INSERT / UPDATE / DELETE;
When multi-table Join performance degrades significantly, execute column-level ANALYZE on key Join columns and Group By columns;
After executing CREATE FOREIGN TABLE or IMPORT FOREIGN SCHEMA for a foreign table, and it will be queried immediately, it is recommended to run ANALYZE on the newly created foreign table before querying to collect initial statistics;
After executing CREATE EXTERNAL DATABASE, and it will be queried immediately, it is recommended to run ANALYZE on the tables that need to be queried under the external database before querying;
When one of the following errors/symptoms occurs:
- Multi-table Join OOM error: Query executor exceeded total memory limitation ..., or Query exceed per query memory limitation ...;
- Multi-table Join error: Capacity error: BinaryArray cannot contain more than 2147483646 bytes ...;
- Import or query tasks run abnormally long with uneven CPU utilization.
- EXPLAIN <SQL> shows a Scan operator with an estimated row count of 1000, -> Seq Scan on tbl (cost=0.00..5.00 rows=1000 width=1); this indicates the table lacks statistics.
- EXPLAIN <SQL> shows a Scan operator with an estimated row count of 1, -> Seq Scan on tbl (cost=0.00..1.00 rows=1 width=1); this indicates the table is estimated to have 0 rows. If the actual Scan result is not 0 rows, the statistics may be outdated, and a manual ANALYZE is recommended.

In the above scenarios, it is generally recommended to first manually run ANALYZE and observe whether performance recovers, then further tune the AUTO ANALYZE configuration.

ANALYZE command

Basic syntax and behavior

Collect statistics for all columns of the entire table

ANALYZE table_name;

Collects the table row count and uniformly collects Width, MCV, Histogram, NDV, and other statistics for all regular columns in the table;
Uses a sampling-based approach to estimate various statistical values. The sampling operation initiates a sampling sub-SQL within the ANALYZE process.
By default, ANALYZE randomly samples 30,000 rows from the table for statistics collection and computation. If the table has fewer than 30,000 rows, all rows are sampled.
Collect statistics for specific columns only (recommended for key columns)

ANALYZE table_name(col1, col2, ...);

Computes a more accurate NDV for the specified columns (typically using APPROX_COUNT_DISTINCT logic), which is more accurate than table-level sampling but more expensive;
MCV, Histogram, Width, and other statistics are still obtained through sampling;
When ANALYZE is executed multiple times on the same column, the later execution overwrites the old statistics for that column, but does not affect other unspecified columns.

For tables with many columns, ANALYZE table_name; is not fully equivalent to ANALYZE table_name(col1, col2, ...): the latter is typically more accurate for NDV but more costly. It is recommended to execute column-level ANALYZE on frequently used Join columns, Group By columns, and other key columns as a supplement.

Limitations and considerations

Which columns will not be analyzed
- Unsupported types: When a column type is a User Defined Type or is not in the set of types that Hologres supports for statistics, statistics collection will be skipped:
  - Column types that do not support ANALYZE include: "char" (single character type), BIT, VARBIT, BYTEA, NAME, JSON, TSVECTOR, TSQUERY, OID, XID, CID, INET, POINT, LINE, LSEG, BOX, CIRCLE, PATH, POLYGON, BITARRAY, VARBITARRAY, BYTEAARRAY, INT2ARRAY, MONEYARRAY, NUMERICARRAY, TIMEARRAY, TIMETZARRAY, TIMESTAMPTZARRAY, TIMESTAMPARRAY, ANYARRAY, REGCLASS, DATEARRAY, or other INTERNAL types.
  - When the column type is unsupported, both manual ANALYZE and AUTO ANALYZE will be ignored;
  - For partitioned tables, column types that do not support incremental partition statistics merging will also not be analyzed. In addition to the types listed above, unsupported types also include: BOOLARRAY, INT4ARRAY, TEXTARRAY, BPCHARARRAY, VARCHARARRAY, INT8ARRAY, FLOAT4ARRAY, FLOAT8ARRAY.
- Columns marked as dropped: Logical columns retained after ALTER TABLE ... DROP COLUMN (attisdropped = true) will not be analyzed;
- Explicitly disabled by column attribute:
  - Columns with the column attribute enable_analyze = false will not be processed by ANALYZE or AUTO ANALYZE;
  - Columns with the column attribute enable_auto_analyze = false will not be processed by AUTO ANALYZE.
  - For details about these column attributes, see the subsequent sections.
- JSONB column statistics not enabled:
  - If the column type is JSONB and enable_jsonb_stats is not enabled, JSONB statistics will not be collected for that column;
  - If you need to rely on JSONB column statistics (for example, complex JSON filter conditions), you need to first enable enable_jsonb_stats in the column attributes.

Summary: Even after executing ANALYZE table_name;, the above columns may be skipped due to type or column attribute restrictions. When you cannot find the corresponding statistics records in pg_stats, you should first check these restrictions.

ANALYZE limitations for data lake tables
- ANALYZE is not supported for specific Timestamp, Version, Branch, Snapshot, or Tag individually

Typical usage

Recommended typical usage
- Execute column-level ANALYZE on Join columns, Group By columns, and filter condition columns:

ANALYZE tablename (order_id, user_id, dt);

- After import or batch update, execute on tables that are significantly affected:

ANALYZE tablename;

- Execute on foreign tables (FOREIGN TABLE / IMPORT FOREIGN SCHEMA / tables under EXTERNAL DATABASE) before the first query:

ANALYZE foreign_table;

Skip statistics collection for specific columns by setting column attributes
- Hologres supports controlling whether ANALYZE or AUTO ANALYZE is executed on a column through column-level attributes, which is applicable to the following scenarios:
  - Ultra-wide columns (such as very long Text) have high statistics collection overhead, and collecting statistics for them provides limited or no benefit to the query plan;
  - Certain columns never participate in Joins, filtering, or aggregation, and do not need statistics collection;
  - Need to reduce resource consumption of ANALYZE or AUTO ANALYZE.
- The control parameters are the following column options:
  - enable_analyze: Controls whether the column participates in manual ANALYZE and automatic AUTO ANALYZE. Default is true;
  - enable_auto_analyze: Controls only whether the column participates in automatic AUTO ANALYZE. Default is true.
- Setting method (via ALTER TABLE command):

-- Disable ANALYZE and AUTO ANALYZE for a specific column
ALTER TABLE t ALTER COLUMN bitmap_col SET (enable_analyze = false);

-- Disable AUTO ANALYZE for a specific column while still allowing manual ANALYZE
ALTER TABLE t ALTER COLUMN large_text_col SET (enable_auto_analyze = false);

-- Restore default behavior (re-enable)
ALTER TABLE t ALTER COLUMN bitmap_col RESET (enable_analyze);
ALTER TABLE t ALTER COLUMN large_text_col RESET (enable_auto_analyze);

- Considerations:
  - After setting enable_analyze = false, the column will be automatically skipped when executing ANALYZE table_name;, but can still be explicitly collected via manual ANALYZE table_name(col);;
  - After setting enable_auto_analyze = false, the column will not be collected by AUTO ANALYZE, but can still be explicitly collected via manual ANALYZE table_name(col);;
  - If a column has significant impact on the query plan (such as Join columns or filter condition columns), it is not recommended to disable its statistics collection.

Partitioned tables and incremental partition ANALYZE

To reduce the overhead of executing ANALYZE on large partitioned tables, Hologres supports incremental partition ANALYZE:

Goals:
- Avoid full sampling of the parent table each time;
- Update parent table statistics by performing ANALYZE on child partitions and merging the statistics;
- Skip re-collecting statistics for child partitions that have not changed.

Scenarios and limitations

In Hologres V2.0 and later, incremental partition ANALYZE is enabled by default without manual configuration. For details, refer to the product version notes;
When enabled, executing ANALYZE on a child partition will attempt to combine its statistics with those of other child partitions to produce the parent table statistics;
The prerequisite for merging parent table statistics is that, apart from the child partition currently being analyzed, all other child partitions under the parent table must already have statistics (including row count statistics and column statistics);
Therefore, you can:
- During initial setup, execute ANALYZE on the parent table first. The system will automatically detect child partitions that need ANALYZE and analyze them one by one, ultimately merging the results into the parent table statistics;
- When new partitions are added later, you only need to execute ANALYZE on the newly added child partitions.

Brief example

BEGIN;
DROP TABLE IF EXISTS t_parent;
CREATE TABLE t_parent(a int, b int) PARTITION BY LIST (a);
CREATE TABLE child1 PARTITION OF t_parent FOR VALUES IN (1);
CREATE TABLE child2 PARTITION OF t_parent FOR VALUES IN (2);
COMMIT;
insert into child1 values (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 1), (1, 2);
insert into child2 values (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 5), (2, 5), (2, 5), (2, 5), (2, 5);

-- child2 lacks statistics, so parent table statistics cannot be merged
dbname=# ANALYZE child1;
INFO:  auto merging of leaf partition stats to calculate root partition stats is not possible because partition child2 is not analyzed
ANALYZE

-- child1 already has statistics. After analyzing child2, parent table statistics can be automatically merged
dbname=# ANALYZE child2;
ANALYZE

dbname=# SELECT tablename,
       attname,
       null_frac         AS "NF",
       avg_width,
       n_distinct,
       most_common_vals  AS "MCV",
       most_common_freqs AS "MCV_FRAQ"
FROM   pg_stats
WHERE  tablename = 't_parent';

 tablename | attname | NF | avg_width | n_distinct |  MCV  |  MCV_FRAQ
-----------+---------+----+-----------+------------+-------+------------
 t_parent  | a       |  0 |         4 |          2 | {1,2} | {0.5,0.5}
 t_parent  | b       |  0 |         4 |       -0.2 | {1,5} | {0.45,0.3}
(2 rows)

-- After this, when new partitions are added, you only need to ANALYZE the child partition. Parent table statistics will be automatically merged through sampling the child partition only.

ANALYZE on the parent table

Sometimes multiple child partitions of a partitioned table have undergone data changes, and you do not want to ANALYZE each child partition individually. In this case, you can perform a single ANALYZE on the parent table.

When incremental partition ANALYZE is enabled, manually executing ANALYZE on the parent table causes the system to adaptively select child partitions that need statistics updates, rather than blindly analyzing all child partitions. This mechanism significantly reduces the ANALYZE overhead for large partitioned tables while ensuring the accuracy of statistics.

Behavior description

When executing ANALYZE partition_parent_table;, the system automatically identifies the following two types of child partitions, prioritizes collecting their ANALYZE statistics, and then merges them into the parent table statistics:

Stats-changed partitions
- The child partition data has changed significantly since the last ANALYZE (inserts, updates, or deletes);
- Determination criteria:
  - The number of changed rows reaches the change threshold (1% * row count at last ANALYZE).
Unable-to-merge partitions
- The child partition lacks complete statistics, making it unable to participate in the parent table statistics merge;
- Common causes include:
  - ANALYZE has never been executed: The child partition statistics are completely missing;
  - Missing column statistics: Specified columns lack necessary statistics (such as MCV, Histogram, or NDV HLL Counter);
  - Statistics type mismatch with column type: The child partition column type has changed, causing existing statistics to be inconsistent with the current column type;
  - Outdated statistics version: The parent table's statistics version (statistic_version) is lower than the latest child partition's statistics version, indicating that the parent table statistics are not up-to-date and need to be re-merged.

Execution example and output

-- Execute ANALYZE on the parent table
ANALYZE partition_parent_table;

Typical output:

INFO:  will analyze 5 part tables first (stats changed 3, unable-to-merge 2)

This output indicates:

The system identified 5 child partitions that need priority statistics collection;
Among them, 3 child partitions require statistics updates due to significant data changes;
Among them, 2 child partitions were marked as unable-to-merge due to missing or unusable statistics, requiring an internal ANALYZE to be triggered first.

Notes

It is not recommended to set autovacuum_enabled = false (disabling AUTO ANALYZE at the table level) for child partitions, as this may cause the child partition to remain in an "unable-to-merge" state for an extended period unless you manually ANALYZE it again.

Through this mechanism, Hologres can significantly improve the execution efficiency of ANALYZE for large partitioned tables while ensuring the quality of statistics, providing the query optimizer with more timely and accurate decision-making data.

ANALYZE for logical partitioned tables

A logical partitioned table is a partition table type unique to Hologres. Unlike standard PostgreSQL partitioned tables (which create physical child tables via PARTITION BY), a logical partitioned table remains physically as a single table and only partitions data logically based on partition column values, providing more flexible partition management capabilities.

The ANALYZE logic for logical partitioned tables is essentially the same as for physical partitioned tables. The only difference is that child partitions of a logical partitioned table are not separate tables, but like physical child partitions, they each still have their own independent row count statistics and column statistics.

Behavior description

When executing ANALYZE on a logical partitioned table, the system uses processing logic similar to but independent from physical partitioned tables:

Automatic discovery of logical partitions
- When executing ANALYZE logical_partitioned_table; on a logical partitioned (parent) table, the system automatically queries the storage engine for all logical partitions of the table;
Adaptive selection of logical partitions requiring statistics
- Similar to physical partitioned tables, the system identifies the following logical partitions and prioritizes their statistics collection:
- Stats-changed partitions
  - The child partition data has changed significantly since the last ANALYZE (inserts, updates, or deletes), with the same determination criteria as physical partitioned tables described above;
  - Logical partitions with missing statistics: ANALYZE has never been executed, or specified columns lack necessary statistics (such as HLL counters);
  - Logical partitions with unusable statistics: Column statistics do not match the current column type, or the statistics are outdated.
- Empty partitions are skipped: Logical partitions with 0 rows are automatically ignored.
- The system outputs informational messages, for example:
```
INFO:  will analyze 10 logical partitions first
```
Per-partition sampling and statistics collection
- For each logical partition that needs statistics collection, the system constructs a sampling query with partition column filter conditions (such as WHERE user_id = 1 AND event_date = '2024-11-04');
- Data sampling is performed within the scope of that logical partition to collect column-level statistics (NDV, MCV, Histogram, HLL counters, etc.);
- Statistics are stored at the logical partition granularity in hologres_statistic.hg_table_statistic, distinguished by the unique_name field for different logical partitions.
Merging parent table statistics
- After statistics collection for all logical partitions is complete, the system automatically triggers statistics merging to generate the parent table statistics.

Execution example and output

Example 1: Execute ANALYZE on a logical partitioned parent table

-- Create a logical partitioned table
CREATE TABLE user_events (
    user_id INT NOT NULL,
    event_type TEXT NOT NULL,
    event_date DATE NOT NULL,
    event_data INT
)
LOGICAL PARTITION BY LIST(user_id, event_date);

-- Insert data into different logical partitions
INSERT INTO user_events SELECT 1, 'login', '2024-11-01', i FROM generate_series(1, 1000) i;
INSERT INTO user_events SELECT 1, 'logout', '2024-11-02', i FROM generate_series(1, 1000) i;
INSERT INTO user_events SELECT 2, 'purchase', '2024-11-03', i FROM generate_series(1, 1000) i;
INSERT INTO user_events SELECT 3, 'view', '2024-11-04', i FROM generate_series(1, 1000) i;

-- Execute ANALYZE on the parent table
ANALYZE VERBOSE user_events;

Typical output:

INFO:  will analyze 4 logical partitions first
INFO:  analyzing hologres table "public.user_events" PARTITION (user_id=1,event_date='2024-11-01')
INFO:  analyzing hologres table "public.user_events" PARTITION (user_id=1,event_date='2024-11-02')
INFO:  analyzing hologres table "public.user_events" PARTITION (user_id=2,event_date='2024-11-03')
INFO:  analyzing hologres table "public.user_events" PARTITION (user_id=3,event_date='2024-11-04')
INFO:  try to merge root partition
INFO:  automatically merging leaf partition stats to calculate root partition stats

Description:

The system automatically discovered 4 logical partitions that need statistics updates;
Statistics were collected for each logical partition one by one;
Finally, the parent table's global statistics were automatically merged and generated.

Example 2: Execute ANALYZE on specified logical partitions

-- Execute ANALYZE on a single logical partition
ANALYZE user_events PARTITION (user_id=1, event_date='2024-11-01');

-- Execute ANALYZE on multiple logical partitions
ANALYZE user_events
    PARTITION (user_id=1, event_date='2024-11-01')
    PARTITION (user_id=2, event_date='2024-11-03');

-- Execute ANALYZE on specific columns of a specified logical partition
ANALYZE user_events
    PARTITION (user_id=1, event_date='2024-11-01') (user_id, event_type);

Description:

You can use the PARTITION clause to specify which logical partitions to analyze;
Multiple logical partitions can be specified at the same time;
As with standard ANALYZE, you can further specify the columns to collect statistics for.

Statistics storage

Statistics for logical partitioned tables are stored in the hologres_statistic.hg_table_statistic table:

SELECT
    unique_name,
    schema_name,
    table_name,
    total_rows,
    sample_rows,
    nattr
FROM hologres_statistic.hg_table_statistic
WHERE table_name = 'user_events'
ORDER BY unique_name;

Example result:

| unique_name | schema_name | table_name | total_rows | sample_rows | nattr |
|-------------|-------------|------------|------------|-------------| ----- |
| user_events | public | user_events | 4000 | 0 | 4 |
| user_events.a1b2c3d4e5f6... | public | user_events | 1000 | 1000 | 4 |
| user_events.f6e5d4c3b2a1... | public | user_events | 1000 | 1000 | 4 |
| user_events.1234567890ab... | public | user_events | 1000 | 1000 | 4 |
| user_events.abcdef123456... | public | user_events | 1000 | 1000 | 4 |

Field descriptions:

Parent table record:
- total_rows = 4000: Total row count across all logical partitions;
- sample_rows = 0: Parent table statistics are derived through merging, without direct sampling.
Logical partition records (unique_name contains the MD5 hash identifying the statistics object (table/partition)):
- Each logical partition has an independent statistics record;
- sample_rows = 1000: The actual number of rows sampled for that logical partition.

Best practices for logical partitioned table ANALYZE

Prefer executing ANALYZE on the logical partitioned parent table
- When executing ANALYZE on the logical partitioned parent table, the system automatically discovers and collects statistics for all logical partitions without the need to manually specify each partition;
- Only use the PARTITION clause to specify partitions when you need to quickly update statistics for specific logical partitions.
Leverage incremental statistics updates
- When data is only written to a few logical partitions, you can execute ANALYZE on just those partitions;
- The system will automatically trigger parent table statistics merging without the need to re-collect statistics for all partitions.
Be aware of the number of logical partitions
- When there are too many logical partitions (e.g., thousands), the initial ANALYZE may take a long time. It is recommended to execute it during off-peak hours;
Version-related performance considerations for logical partitioned tables
- In versions earlier than V4.1.20, executing ANALYZE on a logical partitioned table may experience significant performance degradation (extended execution time with intermittent pauses of approximately 10 seconds). This is caused by lock contention between the foreground ANALYZE process and the background statistics loading task (trigger load stats): as ANALYZE iterates through sub-partitions of the same logical partitioned table, it continuously generates statistics binlog entries, which triggers repeated background statistics loading for the same table and version, resulting in SUE (Statistics Update Exclusive) lock contention.
- In V4.2.5 and later versions, this issue has been optimized through deduplication of the background statistics loading process, eliminating redundant load operations for the same table and version. If you encounter slow ANALYZE performance on logical partitioned tables, upgrading to V4.2.5 or later is recommended.
Use in conjunction with AUTO ANALYZE
- Logical partitioned tables are also managed by AUTO ANALYZE, which automatically identifies logical partitions that need statistics updates.

Configurable parameters for the ANALYZE command

Parameter

Description

Supported Version

Default Value

Notes / Usage Example

hg_experimental_analyze_foreign_partitions_access_limit

Maximum number of partitions allowed to be accessed during random sampling when executing ANALYZE on foreign tables

v0.10 and later

0 (unlimited)

-- Only sample 100 partitions to prevent random sampling from scanning too much foreign table data

ALTER DATABASE dbname SET hg_experimental_analyze_foreign_partitions_access_limit = 100;

hg_analyze_foreign_table_max_sample_row_count

Maximum number of rows to sample during random sampling when executing ANALYZE on foreign tables

v4.1 and later

0 (unlimited). For details, see version updates.

AUTO ANALYZE automatic statistics collection

Starting from Hologres V0.10, Hologres supports the automatic statistics collection mechanism AUTO ANALYZE:

Automatically determines which tables need statistics updates based on table creation, data writes, and data changes;
Asynchronously schedules statistics collection tasks in the background, eliminating the need for users to manually ANALYZE each table;
Reduces the risk of missing statistics due to overlooked ANALYZE operations.

Switch and scope

Database-level switch

The configurable parameter is hg_enable_start_auto_analyze_worker. Starting from Hologres V0.10, it is enabled by default.

-- Check whether AUTO ANALYZE is enabled for the current database
SHOW hg_enable_start_auto_analyze_worker;

-- Disable (use only for temporary troubleshooting)
ALTER DATABASE dbname SET hg_enable_start_auto_analyze_worker = OFF;

-- Reset to the default value ON (recommended)
ALTER DATABASE dbname RESET hg_enable_start_auto_analyze_worker;

Note: The above GUC is a database-level configuration. SESSION-level or ROLE-level configurations are ineffective. It must be set through ALTER DATABASE to take effect on that DB, and only superusers can modify it.

Table-level switch

Hologres supports controlling AUTO ANALYZE behavior for individual tables.

Behavior description

Disable AUTO ANALYZE for a single table

ALTER TABLE my_table SET (autovacuum_enabled = false);

Enable AUTO ANALYZE for a single table (restore default behavior)

-- Recommended: reset to restore the default
ALTER TABLE my_table RESET (autovacuum_enabled);

-- Or explicitly set to true (not recommended)
ALTER TABLE my_table SET (autovacuum_enabled = true);

Check the autovacuum_enabled status of a table

SELECT relname, reloptions FROM pg_class WHERE relname = 'my_table';

 relname  |         reloptions
----------+----------------------------
 my_table | {autovacuum_enabled=false}
(1 row)

Typical use cases

Special business tables that do not need statistics
- Some tables are only used for temporary storage or logging and do not participate in complex queries, so statistics collection is unnecessary;
- Tables with large data volumes that change frequently but do not require statistics can avoid unnecessary resource consumption by setting autovacuum_enabled = false.
Statistics are already maintained manually
- For certain critical tables, you may have already established a regular manual ANALYZE workflow that meets your business requirements.
Temporarily disable to reduce resource usage
- During peak business hours or emergency troubleshooting, you can temporarily disable AUTO ANALYZE for specific tables;
- After the situation is resolved, re-enable it or manually execute ANALYZE.

Considerations

For partitioned tables:
- It is not recommended to set autovacuum_enabled = false on child partitions, as this may cause the child partition to remain in a "cannot merge" state for an extended period, affecting the accuracy of parent table statistics;
- If you need to disable it, it is recommended to only set it on the parent partition table.
The table-level switch has higher priority than the database-level switch:
- Even if the database-level AUTO ANALYZE is enabled, tables with autovacuum_enabled = false will not be processed by AUTO ANALYZE.

Column-level switch

Hologres supports controlling whether ANALYZE or AUTO ANALYZE is performed on specific columns through column-level attributes, which is applicable to the following scenarios:

Wide columns (such as long Text fields) have high statistics collection overhead but provide limited benefit to query plans;
Certain columns never participate in Joins, filters, or aggregations and do not need statistics collection;
Need to reduce resource consumption of AUTO ANALYZE.

The control parameter is the following column option:

enable_auto_analyze: Controls only whether the column participates in automatic AUTO ANALYZE, defaults to true.
Configuration method (through the ALTER TABLE command):

-- Disable AUTO ANALYZE for a specific column while still allowing manual ANALYZE
ALTER TABLE t ALTER COLUMN large_text_col SET (enable_auto_analyze = false);

-- Restore default behavior (re-enable)
ALTER TABLE t ALTER COLUMN large_text_col RESET (enable_auto_analyze);

Considerations:
- After setting enable_auto_analyze = false at the column level, AUTO ANALYZE tasks will no longer collect statistics for that column. However, you can still explicitly collect statistics through manual ANALYZE table_name; or ANALYZE table_name(col);;
- If a column has significant impact on query plans (such as Join columns or filter condition columns), it is not recommended to disable its statistics collection.

Trigger logic of AUTO ANALYZE

AUTO ANALYZE combines multiple types of signals to determine whether statistics need to be updated for a particular table:

Data Change Volume
- Applies only to regular Hologres tables (excludes foreign tables);
- At 1-minute intervals, collects the row count of INSERT/UPDATE/DELETE operations observed by the FE on each table. If the change exceeds the threshold, an AUTO ANALYZE is triggered (fast response);
- At 10-minute intervals, retrieves the insert/update/delete change row counts from the storage engine. If the change exceeds the threshold, an AUTO ANALYZE is triggered (precise calibration);
Schema Changes
- Applies only to regular Hologres tables (excludes foreign tables);
- At 1-minute intervals, collects tables with the following changes and performs AUTO ANALYZE:
  - ADD / DROP COLUMN;
  - Partitioned tables (including logical partition tables) ATTACH / DETACH child partitions.
Missing Statistics / Unable to Merge Parent Partition Table Statistics
- Applies to both regular Hologres tables and foreign tables;
- At 1-minute intervals, collects tables/columns with missing statistics and child partitions that cannot meet the conditions for incremental statistics merging.
Foreign Tables
- Applies only to foreign tables introduced through CREATE FOREIGN TABLE, IMPORT FOREIGN SCHEMA, or the Auto Load mechanism under non-External Database contexts;
- Currently only supports AUTO ANALYZE for MaxCompute foreign tables;
- At 4-hour intervals, periodically checks all foreign tables in the DB. Between two checks, if there are external data changes (the criterion is that the corresponding foreign table's last_modify_timestamp falls between the two check intervals), an AUTO ANALYZE is triggered.
Fallback Statistics
- During the 1:00-5:00 AM window, a "fallback" AUTO ANALYZE is performed on tables that have continuous changes but have not triggered the threshold (>5000 row changes, <10% change volume), to capture data distribution drift in columns (for example, date-type fields written after midnight the next day are completely different from the previous day, resulting in distribution changes).
Access Hotspot Tables (External Database only)
- After enabling the enable_auto_analyze parameter on an External Database, the Hologres system begins tracking recently accessed tables since the last system startup and adds them to the observation list;
- At 1-hour intervals, tables in the observation list that are "frequently accessed recently" trigger an AUTO ANALYZE;
- Note that after a system restart, the accessed table list is cleared and recording starts over.

AUTO ANALYZE for External Database

Starting from Hologres V3.0, Hologres supports the External Database feature. When creating an External Database, you can enable AUTO ANALYZE (disabled by default, see release notes for details):

-- Enable AUTO ANALYZE when creating an External Database
-- Reference: https://help.aliyun.com/zh/hologres/developer-reference/create-external-database
CREATE EXTERNAL DATABASE <ext_database_name> WITH
  metastore_type 'maxcompute'
  mc_project 'project_name'
  enable_auto_analyze 'true';

-- Enable AUTO ANALYZE for an existing External Database
ALTER EXTERNAL DATABASE dbname WITH enable_auto_analyze 'true';

Feature limitation: For V3.2 and earlier, AUTO ANALYZE for External Database requires that the CREATE EXTERNAL DATABASE statement is configured with Access Key and Access Secret. AUTO ANALYZE is not supported for External Databases configured with SLR or STS. There are no such limitations for V4.0 and later.

The metastore_type scope supported by AUTO ANALYZE on External Database is: dlf, dlf-paimon, dlf-rest, maxcompute.
The table formats supported by AUTO ANALYZE on External Database include: MaxCompute, Paimon, Iceberg.
AUTO ANALYZE on External Database uses the Owner identity of the External Database by default, and employs Time-based One-Time Password (TOTP) authentication to execute ANALYZE tasks. If the Owner does not have sufficient table permissions, the system cannot successfully execute AUTO ANALYZE tasks.
For a foreign table in an External Database to be eligible for AUTO ANALYZE, the following conditions must be met:
- The External Database configuration hg_enable_start_auto_analyze_worker = on; (default is on)
- The External Database attribute enable_auto_analyze is set to true (not configured by default, which means false, see release notes for details)
- The External Database Owner must have query permissions for the data lake Project (if applicable) and the table
- The table must have been accessed at least once within the last 3 days
  - AUTO ANALYZE under External Database only monitors foreign tables that have been accessed at least once;
  - After a system restart, if a foreign table is accessed at least once, the AUTO ANALYZE system adds it to the observation list and periodically triggers AUTO ANALYZE for these tables;

Note: For foreign tables in an External Database that have never been accessed, the first access uses a fast row count estimation mechanism to obtain the row count as fallback statistics (see the "Fast Row Count Estimation" section below), which still provides a certain level of statistics coverage.

Resource limits of AUTO ANALYZE

To prevent background AUTO ANALYZE tasks from impacting foreground user tasks, Hologres has established the following resource limits for the AUTO ANALYZE feature:

When AUTO ANALYZE tasks execute, the default memory limit per worker is 4 GB. If the table data volume is too large, sampling may exceed the memory limit, causing the AUTO ANALYZE sampling SQL to fail. In such cases, only row count information can be collected, and column distribution information (MCV, Histogram, NDV, etc.) cannot be collected. You can adjust auto_analyze_work_memory_mb to change this behavior. The larger the instance specification, the higher the available memory limit for AUTO ANALYZE.
The instance-level concurrency for simultaneously scheduled AUTO ANALYZE tasks typically does not exceed 4, and in extreme cases does not exceed 6.
For child partition AUTO ANALYZE tasks under the same partitioned table, the maximum simultaneous scheduling concurrency is 3.
By default, AUTO ANALYZE collects statistics for the first 256 columns. If a table has more than 256 columns, only the first 256 are collected (for partitioned tables, partition columns are prioritized and will be collected even if they are beyond position 256). You can adjust hg_experimental_auto_analyze_max_columns_count to change this value.
The sampling sub-SQLs of AUTO ANALYZE tasks run using a low-priority Background Pool with limited query execution-level concurrency. This causes AUTO ANALYZE tasks to take longer than manual ANALYZE, which users do not need to be concerned about.
For foreign tables, AUTO ANALYZE only collects column statistics for partition columns (e.g., MCV). It does not perform sampling to collect column statistics for columns other than partition columns.

Configurable parameters of AUTO ANALYZE

By default, the Hologres AUTO ANALYZE feature does not require any parameter modifications.

In rare business scenarios (such as infrequent data writes/updates, query workloads that do not need statistics, or system load increases caused by AUTO ANALYZE), users can modify certain default parameters to adjust AUTO ANALYZE behavior for intervention or partial performance tuning purposes.

Note: Only superusers can adjust the default behavior of AUTO ANALYZE. All parameters must be set at the DB level and take effect after the next minute.

-- Superuser: modify the default AUTO ANALYZE parameters at the database level
ALTER DATABASE dbname SET <GUC> = <values>;

Parameter	Description	Supported Version	Default Value	Notes / Usage Examples
hg_enable_start_auto_analyze_worker	Enable the AUTO ANALYZE feature	V0.10 and later	on	-- Temporarily disable AUTO ANALYZE for the DB`ALTER DATABASE dbname SET hg_enable_start_auto_analyze_worker = off;`
hg_experimental_auto_analyze_max_columns_count	Number of columns for which AUTO ANALYZE automatically collects statistics	V1.1.0 and later	256	`ALTER DATABASE dbname SET hg_experimental_auto_analyze_max_columns_count =300;`
auto_analyze_work_memory_mb	Memory limit for a single table in AUTO ANALYZE, in MB	V1.1.54 and later	4096	-- Change to 9 GB`ALTER DATABASE dbname SET auto_analyze_work_memory_mb = 9216;`Default is 4 GB per worker. The larger the instance specification, the more workers, and the higher the actual memory limit.)
auto_analyze_work_statement_timeout	Timeout for AUTO ANALYZE task execution, in ms	V2.0 and later	3600000	-- Change to 3h`ALTER DATABASE dbname SET auto_analyze_work_statement_timeout = '3h';`
hg_experimental_auto_analyze_max_foreign_table_partitions	Maximum number of foreign table partitions accessed during AUTO ANALYZE	V1.1.54 and later	100	If the number of foreign table partitions exceeds 100, the row counts of the top 100 partitions by lexicographic order of the first-level partition values are used by default to estimate the overall row count and partition column MCV of the foreign table.
hg_auto_analyze_run_with_serverless_computing	Whether AUTO ANALYZE runs using Serverless resources	V3.1 and later	off (default is on for Serverless instances)	Generally no adjustment is needed. AUTO ANALYZE typically consumes minimal resources and runs in the background using low-priority processes, with minimal impact on instance load.
hg_auto_analyze_serverless_computing_query_priority	Serverless task priority, range 1-5, higher values indicate higher priority. Note: hg_auto_analyze_run_with_serverless_computing must be enabled first	V3.1 and later	2	-- Change to highest priority`ALTER DATABASE dbname SET hg_auto_analyze_serverless_computing_query_priority = 5;`

Fast row count estimation (Fast Num of Rows)

Starting from Hologres V3.1, Hologres introduced the fast row count estimation feature. When it detects that tables in an executing SQL query lack statistics or that statistics may be outdated, Hologres can quickly estimate table row counts through metadata from the underlying storage or external systems, thereby generating more reasonable execution plans.

Starting from Hologres V3.2, the fast row count estimation (Fast Num of Rows) feature is enabled by default.

Switch

How to enable fast row count estimation:

-- Disable (use only for temporary troubleshooting or when statistics are not needed for the entire database)
ALTER DATABASE dbname SET hg_experimental_get_fast_num_of_rows = OFF;

-- V3.2 and later: enable (reset to the default value)
ALTER DATABASE dbname RESET hg_experimental_get_fast_num_of_rows;

-- V3.1 and earlier: enable
ALTER DATABASE dbname SET hg_experimental_get_fast_num_of_rows = ON;

Note: Fast row count estimation cannot replace complete statistics collection. It is still recommended to keep statistics fresh on critical business tables through ANALYZE/AUTO ANALYZE.

Usage limits

Fast row count estimation is applicable to regular tables (including partitioned tables), foreign tables in External Database, and other scenarios.
The row count estimated by the fast row count estimation feature is not guaranteed to be completely accurate. For example, for performance reasons, on Hologres partitioned tables, only the row count statistics from up to 60 partitions (by default) are used to estimate the overall row count of the partitioned table.
In V4.1.15 and earlier, fast row count estimation is not enabled by default for foreign table types (Foreign Table). It can be enabled through the parameter hg_experimental_enable_foreign_table_get_fast_num_of_rows. It is enabled by default in V4.1.16 and later.
For foreign table types, fast row count estimation only supports MaxCompute foreign tables, Paimon foreign tables, and Iceberg foreign tables.

Examples

hg_experimental_get_fast_num_of_rows example

If a table has no statistics, enabling fast row count estimation can provide more accurate row counts. (Enabled by default in Hologres V3.2 and later)

-- V3.2+
create table test_tbl (a int);
insert into test_tbl select * from generate_series (1, 999);

-- Accurate row count estimation (rows=999)
explain select count(1) from test_tbl ;
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Final Aggregate  (cost=0.00..5.00 rows=1 width=8)
   ->  Gather  (cost=0.00..5.00 rows=10 width=8)
         ->  Partial Aggregate  (cost=0.00..5.00 rows=10 width=8)
               ->  Local Gather  (cost=0.00..5.00 rows=20 width=8)
                     ->  Partial Aggregate  (cost=0.00..5.00 rows=20 width=8)
                           ->  Seq Scan on test_tbl  (cost=0.00..5.00 rows=999 width=1)

The following is a real production case:

Without this feature enabled, if a table has no statistics, the Scan node in the query plan will show rows=1000 (indicating no statistics are available, and the plan is generated based on the default estimate of 1000 rows).

After enabling this feature, if a table has no statistics, the Scan node rows in the query plan will not equal 1000 (the system called the metadata from the underlying storage engine to obtain the table's row count).

Configurable parameters for fast row count estimation

Parameter	Description	Supported Version	Default Value	Notes / Usage Example
hg_experimental_get_fast_num_of_rows	Whether to enable fast row count estimation.	v3.1 and later	off (V3.1) on (V3.2+)
hg_experimental_enable_foreign_table_get_fast_num_of_rows	Whether to enable foreign table row count estimation.	v3.1 and later	off (V4.1.15 and earlier) on (V4.1.16+)	-- How to enable: ALTER DATABASE dbname SET hg_experimental_enable_foreign_table_get_fast_num_of_rows = on;
hg_experimental_fast_num_rows_foreign_partitions_access_limit	For foreign partitioned tables, the maximum number of partitions from which to obtain row counts as the basis for overall estimation. Note: This only takes effect when foreign table row count estimation is enabled.	v3.1 and later	60	This is to control the time overhead of fast row count estimation.
hg_get_fast_num_of_rows_holo_partitions_access_limit	For Hologres physical partitioned tables, the maximum number of partitions from which to obtain row counts as the basis for overall estimation.	v3.1 and later	60	This is to control the time overhead of fast row count estimation.

Viewing historical execution records of ANALYZE and AUTO ANALYZE

After ANALYZE and AUTO ANALYZE are executed, their execution records are written to the Query Log. Users can view the historical execution records of ANALYZE and AUTO ANALYZE through the hologres.hg_query_log view, including information such as the executed SQL, duration, and status.

How to identify ANALYZE and AUTO ANALYZE records in query log

ANALYZE and AUTO ANALYZE, along with their sampling SQL statements, are recorded separately in the Query Log.

The execution records have the following characteristics in the Query Log:

Field	ANALYZE	AUTO ANALYZE	Description
`usename`	The username that executed ANALYZE	`system`	AUTO ANALYZE uses an internal system account for execution, which is uniformly displayed as `system` in the Query Log
`application_name`	The application name of the session that executed ANALYZE	`AutoAnalyze`	The AUTO ANALYZE framework connection uses `AutoAnalyze` as the application_name
`application_name` of sampling sub-SQL	`Hologres SQL Generated BY ANALYZE`	`Hologres SQL Generated BY AUTO ANALYZE`
`command_tag`	`ANALYZE`	`ANALYZE`	The type of command executed
`command_tag` of sampling sub-SQL	`SELECT`	`SELECT`	The sampling sub-SQL is a SELECT statement. You can use this statement to check the resource consumption of sampling.

ANALYZE and its sampling SQL are associated through the comments in the query field, or through extended_info->>'source_query_id'.

SELECT query_id,
       extended_info->>'src_query_id' as "source query id",
       application_name,
       status,
       duration
FROM   hologres.hg_query_log
WHERE  query_start >= now() - interval '1 hour'
  AND  application_name IN ('AutoAnalyze', 'Hologres SQL Generated BY AUTO ANALYZE')
ORDER BY query_start DESC
LIMIT  2;
      query_id       |   source query id   |            application_name            | status  | duration
---------------------+---------------------+----------------------------------------+---------+----------
 1004019226350863009 | 1004019226350778807 | Hologres SQL Generated BY AUTO ANALYZE | SUCCESS |      119
 1004019226350778807 |                     | AutoAnalyze                            | SUCCESS |     3276

Query examples

View recent AUTO ANALYZE execution records

SELECT usename,
       status,
       duration,
       query_start,
       query_end,
       query,
       application_name
FROM   hologres.hg_query_log
WHERE  query_start >= now() - interval '1 hour'
  AND  application_name IN ('AutoAnalyze')
ORDER BY query_start DESC
LIMIT  20;

       query_id,
       extended_info->>'src_query_id' as "source query id"

View the AUTO ANALYZE execution history of a specific table

SELECT status,
       duration,
       query_start,
       query
FROM   hologres.hg_query_log
WHERE  query_start >= now() - interval '1 day'
  AND  application_name IN ('AutoAnalyze')
  AND  query LIKE '%my_table_name%'
ORDER BY query_start DESC;

Summarize AUTO ANALYZE execution overview for the past 3 days

SELECT query_date,
       status,
       COUNT(*)          AS task_count,
       AVG(duration)     AS avg_duration_ms,
       MAX(duration)     AS max_duration_ms
FROM   hologres.hg_query_log
WHERE  query_start >= CURRENT_DATE::timestamptz - interval '2 day'
  AND  application_name IN ('AutoAnalyze')
  AND  command_tag = 'ANALYZE'
  AND  (status = 'SUCCESS' OR (
            message NOT LIKE '%does not exist%'
        AND message NOT LIKE '%retry later%'))
GROUP BY query_date, status
ORDER BY query_date DESC;

View failed AUTO ANALYZE tasks in the past day

SELECT query_start,
       duration,
       message,
       query
FROM   hologres.hg_query_log
WHERE  query_start >= now() - interval '1 day'
  AND  application_name IN ('AutoAnalyze', 'Hologres SQL Generated BY AUTO ANALYZE')
  AND  status != 'SUCCESS'
  AND  (message NOT LIKE '%does not exist%'
        AND message NOT LIKE '%retry later%')
ORDER BY query_start DESC;

Notes

Since AUTO ANALYZE connections are initiated with an internal administrator identity, regular users need the pg_read_all_stats role or database administrator privileges to view complete AUTO ANALYZE execution records.
If you find that there are no AUTO ANALYZE records in the Query Log for an extended period, it is recommended to check whether the AUTO ANALYZE switch is enabled.

Viewing and troubleshooting statistics

Viewing table statistics (hologres_statistic.hg_table_statistic)

Table statistics are stored in the hologres_statistic.hg_table_statistic table and can also be observed in system tables.

Query this table to get the statistics from the most recent ANALYZE.

SELECT schema_name,                -- Table schema
       table_name,                 -- Table name
       user_name,                  -- User who last ran ANALYZE
       schema_version,             -- Table schema version at last ANALYZE
       total_rows,                 -- Row count at last ANALYZE
       sample_rows,                -- Sample rows used at last ANALYZE
       analyze_timestamp,          -- Completion time of last ANALYZE
       analyze_count               -- Total ANALYZE count so far
FROM   hologres_statistic.hg_table_statistic
WHERE  unique_name = hologres.hg_internal_statistic_unique_name ('schemaname', 'tablename')
ORDER BY analyze_timestamp DESC;

-- Sample output
 schema_name | table_name |    user_name     | schema_version | total_rows | sample_rows |  analyze_timestamp  | analyze_count
-------------+------------+------------------+----------------+------------+-------------+---------------------+---------------
 public      | test_fnr   |   BASIC$test_fnr |             -1 |        999 |         999 | 2026-03-02 22:05:29 |             2
(1 row)

V3.1 and earlier:

Each table has 0 to n records in the hologres_statistic.hg_table_statistic table. 0 records means ANALYZE has never been performed, and 1 or more records means ANALYZE has been run.
If there are two or more records, the schema_version of the two records must be different, because changes to the table schema (such as ADD COLUMN, CALL SET_TABLE_PROPERTY, etc.) generate a new version, which adds a new statistics record. The record corresponding to the old schema_version is no longer used.
The following example query result shows that the same table has 2 records, and the schema_version of the second record is lower than the first. The second record is therefore invalid and will not be used, and you do not need to pay attention to it. Hologres does not currently clean up historical expired records in the hg_table_statistic table, and users do not need to worry about old data.

 schema_name | table_name |    user_name     | schema_version | total_rows | sample_rows |  analyze_timestamp  | analyze_count
-------------+------------+------------------+----------------+------------+-------------+---------------------+---------------
 public      | test_fnr   |   BASIC$test_fnr |             13 |        999 |         999 | 2026-03-01 18:05:29 |             2
 public      | test_fnr   |   BASIC$test_fnr |             12 |        999 |         999 | 2026-03-01 08:05:29 |             1
(1 row)

V3.1 and later:

Each table has 0 to 1 records in the hologres_statistic.hg_table_statistic table. 0 records means ANALYZE has never been performed, and 1 record means ANALYZE has been run.
The schema_version is uniformly set to -1 (representing Deprecated).
This means that many DDL operations no longer invalidate statistics. For example, CALL SET_TABLE_PROPERTY will not re-trigger AUTO ANALYZE, and the existing statistics will continue to be used. Compared to Hologres V3.0 and earlier, the trigger frequency of AUTO ANALYZE is significantly reduced.

Querying row count and other statistics

Row count information is recorded in the reltuples field of the pg_class table.

-- relallvisible > 0: the table has row count statistics
-- relallvisible = 0: the row count is unknown; do not rely on reltuples
-- relallvisible < 0: the table has no statistics
SELECT relallvisible, reltuples FROM pg_class WHERE relname = 'test_table';

If the table has no statistics, refer to the "Viewing tables with missing statistics" section below to identify the cause.

Querying column statistics

By querying the pg_stats view, you can obtain the statistics for all columns of the current table. For example, to get the statistics for the ds column of test_table:

select * from pg_stats where tablename = 'test_table' and attname = 'ds';

 schemaname | tablename  | attname | inherited | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation | most_common_elems | most_common_elem_freqs | elem_count_histogram
------------+------------+---------+-----------+-----------+-----------+------------+------------------+-------------------+------------------+-------------+-------------------+------------------------+----------------------
 public     | test_table | ds      | f         |         0 |         4 |          1 | {20241104}       | {1}               |                  |             |                   |                        |
(1 row)

If the column has no statistics, refer to the "Viewing tables with missing statistics" section below to identify the cause.

Viewing tables with missing statistics

Hologres provides the view HG_STATS_MISSING (the actual view name may vary by version) to check tables in the current database that are missing statistics, making it easy to identify objects that need supplementary ANALYZE.

For specific fields and usage methods, refer to the corresponding instance documentation or console help.

Common issues and troubleshooting

Possible reasons for 0 records returned from the hologres_statistic.hg_table_statistic table:

ANALYZE has never been executed (neither manually nor by AUTO ANALYZE);
AUTO ANALYZE is not working, or the table has not met the trigger conditions;

Recommended actions:

First, manually execute:

ANALYZE schema_name.table_name;

If the table still cannot be covered by AUTO ANALYZE for an extended period, it is recommended to check the AUTO ANALYZE switch and GUC configuration, or contact technical support.

Possible reasons for an obviously outdated analyze_timestamp:

AUTO ANALYZE has been disabled or restricted (such as autovacuum_enabled = false);
AUTO ANALYZE has insufficient resources or has failed (can be investigated in conjunction with logs/monitoring);

Recommended actions:

Manually execute ANALYZE once and observe the results;
Check:
- Whether hg_enable_start_auto_analyze_worker is enabled;
- Whether autovacuum_enabled = false has been mistakenly set or related GUC restrictions are too strict;
- Check whether there are errors related to AUTO ANALYZE tasks in the Query Log.

Abnormal row count estimation in query plans (significantly too small or too large)

Check whether the corresponding table is missing statistics or has outdated statistics;
Check whether predicate columns have statistics (pg_stats), and perform column-level ANALYZE on critical columns;
Check whether the fast row count estimation feature has been disabled or not enabled.

Inconsistent parent-child statistics for partitioned tables or missing statistics on the parent table

Symptom: The partition parent table has no records in hologres_statistic.hg_table_statistic, or total_rows is significantly less than the sum of all child table row counts, or the partition column statistics MCV distribution of the parent table is outdated (for example, the ds column in pg_stats is missing yesterday's distribution);
Troubleshooting points:
- Check whether incremental partition ANALYZE / AUTO ANALYZE has been disabled;

SHOW hg_experimental_enable_incremental_analyze;
SHOW hg_experimental_enable_incremental_auto_analyze;

Use the HG_STATS_MISSING view to check whether there are child tables that cannot be merged;
Check whether there are errors related to AUTO ANALYZE tasks in the Query Log;
Manually execute ANALYZE child_table; on child tables with missing statistics, or execute ANALYZE partition_parent_table; on the parent table during off-peak hours to let the system fill in and merge statistics.

Certain columns have no statistics in pg_stats

Symptom: The table has the column, but the corresponding column statistics cannot be found in the pg_stats view.
Troubleshooting points:
- Check whether statistics have been disabled through column attributes (enable_analyze = false or enable_auto_analyze = false);

SELECT attoptions
  FROM pg_attribute
 WHERE attrelid = 'tablename'::regclass::oid
   AND attname = 'columnname';
-[ RECORD 1 ]----------------------
attoptions | {enable_analyze=false}

Confirm whether the column has participated in ANALYZE (manual or AUTO ANALYZE). If necessary, execute column-level ANALYZE table_name(col_name);;

SELECT query_id,
       application_name,
       status,
       query_start,
       query
FROM   hologres.hg_query_log
WHERE  query_start >= now() - interval '1 hour'
  AND  application_name IN ('Hologres SQL Generated BY ANALYZE', 'Hologres SQL Generated BY AUTO ANALYZE')
  AND  query like '%tablename%'
ORDER BY query_start DESC
LIMIT  2;

Whether the column is a special type (bytea, jsonb).
If the table has executed AUTO ANALYZE but no columns have statistics, consider whether the AUTO ANALYZE task executed abnormally. For example, check whether AUTO ANALYZE timed out. Since auto_analyze_work_statement_timeout defaults to 1 hour, if the AUTO ANALYZE task in hologres.hg_query_log has a duration >= 3600000, in this case, only row count information is collected without column statistics.

AUTO ANALYZE has executed but the query plan has not significantly improved

Symptom: AUTO ANALYZE records for a specific table can be seen in hologres.hg_query_log, but the actual query plan still uses obviously unreasonable row count estimates or Join order;
Troubleshooting points:
- Check whether AUTO ANALYZE timed out. Since auto_analyze_work_statement_timeout defaults to 1 hour, if the AUTO ANALYZE task in hologres.hg_query_log has a duration >= 3600s, in this case, only row count information is collected without column statistics.
- Check whether there are Hints, fixed Join orders, or legacy session-level GUCs (such as enable_nestloop, etc.) interfering with the optimizer's choices;
- Perform a manual ANALYZE on critical tables/columns and then re-run the query. If there is still no improvement, it is recommended to investigate further in conjunction with Query Log, FE logs, or by reporting to technical support.

OOM / memory limit exceeded issues caused by statistics

Symptom: A complex multi-table Join reports error Query executor exceeded total memory limitation ..., or the plan selects an extremely unreasonable Join strategy (such as a large table driving a large table Nested Loop, etc.);
Troubleshooting points:
- Confirm whether related tables are missing statistics or have severely outdated statistics (by combining records in hg_table_statistic and AUTO ANALYZE execution records);
- Perform column-level ANALYZE on Join columns and filter columns. If necessary, increase sampling precision or enable the fast row count estimation feature;
- If the issue only occurs in certain extreme data distribution or peak scenarios, consult with technical support to evaluate whether adjustments to the statistics model or related GUCs (such as AUTO ANALYZE thresholds, memory limits, etc.) are needed.

Running ANALYZE and AUTO ANALYZE with Serverless

Starting from Hologres V3.1, in Serverless instances or instances with Serverless computing resources enabled, ANALYZE and AUTO ANALYZE can run on Serverless resources to reduce CPU/memory pressure on the instance itself.

This section explains the typical behavior and recommended configuration for manual ANALYZE and AUTO ANALYZE in Serverless Computing scenarios, in conjunction with related GUCs.

Manual ANALYZE and Serverless computing

hg_serverless_computing_enable_analyze_statement
- Description: Controls whether manual ANALYZE statements can be executed through Serverless tasks;
- Behavior:
  - When set to on (default), the system allows ANALYZE statements to be offloaded to Serverless computing resources for execution, minimizing the impact of statistics tasks on the instance itself;
  - When set to off, ANALYZE executes on the local instance resources.
- Typical recommendations:
  - In Serverless instances, the default is on, allowing only Serverless resources to complete statistics tasks;
  - When troubleshooting or when there are special requirements for resource usage, you can temporarily set it to off to explicitly run ANALYZE on local resources.

AUTO ANALYZE and Serverless computing

When generating AUTO ANALYZE tasks, the system determines whether to use Serverless resources and how to control Serverless task behavior based on the following GUCs:

hg_auto_analyze_run_with_serverless_computing
- Description: Whether to execute AUTO ANALYZE tasks through Serverless computing resources. The default is off for regular instances and on for Serverless instances.
- Behavior:
  - When set to on, AUTO ANALYZE tasks will send sampling computations to the Serverless resource pool;
  - When set to off, AUTO ANALYZE continues to complete statistics through the local instance resources.

-- Set at the database level
ALTER DATABASE datname SET hg_auto_analyze_run_with_serverless_computing = on;

hg_auto_analyze_serverless_computing_query_priority
- Description: The priority of Serverless AUTO ANALYZE tasks, ranging from 1~5. Higher values indicate higher priority. The default is 2;
- Behavior:
  - When Serverless AUTO ANALYZE is enabled, the system sets the corresponding priority for each statistics task through SET hg_experimental_serverless_tasks_query_priority = <value>;;
  - This is applicable when sharing a resource pool with other Serverless tasks (such as ETL and offline queries) to control the preemption capability of statistics tasks in the overall queue;
- Recommendations:
  - In production environments, it is typically kept at a medium priority (such as the default 2 or moderately increased to 3) to avoid competing for too many resources with core business jobs;
  - For scenarios with extremely high requirements for statistics timeliness and sufficient Serverless resources, the priority can be appropriately increased.
hg_auto_analyze_serverless_computing_enable_persisted_snapshot
- Description: Whether to use a persisted snapshot (persisted snapshot) in Serverless AUTO ANALYZE tasks, i.e., controlling whether to skip additional flush through SET hg_experimental_enable_persisted_snapshot = on/off;;
- The default is off for non-Serverless instances and on for Serverless instances.
- Behavior:
  - When set to on, statistics tasks will try to reuse the persisted snapshot, reducing the need to trigger a flush. This is more suitable for high-frequency statistics or scenarios with a large number of partitioned tables;
  - When set to off, statistics tasks will try to use a more up-to-date snapshot, which is suitable for scenarios with higher requirements for consistency or specific version isolation.
- Recommendations:
  - Do not adjust unless necessary.

Usage recommendations for Serverless computing scenarios

Prioritize using Serverless Computing to handle heavy ANALYZE tasks
- When performing ANALYZE on tables or foreign tables with very large data volumes, it is recommended to enable:
  - Manual ANALYZE: set hg_computing_resource = 'serverless'; ANALYZE my_table;;
  - AUTO ANALYZE: ALTER DATABASE mydb SET hg_auto_analyze_run_with_serverless_computing = on;;
- This migrates the statistics computation pressure to the Serverless resource pool, reducing interference with online queries on the instance.
Set Serverless AUTO ANALYZE task priority based on business importance
- In Serverless instances, for critical business databases that are sensitive to statistics timeliness and depend on the latest statistics, you can appropriately increase hg_auto_analyze_serverless_computing_query_priority;
- In regular instances, using local resources for AUTO ANALYZE is typically sufficient.
Follow the "small step trial" principle before adjusting parameters
- Serverless Computing related GUCs are also advanced parameters. Before making adjustments, it is recommended to first verify in a test or staging environment:
  - Observe the queuing and execution duration of statistics tasks in the Serverless resource pool;
  - Monitor the impact on latency and resource usage of core business queries;
- After confirming there are no significant side effects, roll out to the production environment by database or in phases.