Troubleshoot OOM errors in Hologres

Analyze memory consumption

View memory consumption
- Total consumption: The Hologres console shows aggregated memory consumption across all nodes. For more information, see Monitoring metrics.
- Per-query consumption: The memory_bytes field in the slow query log approximates per-query memory consumption. Treat it as an estimate rather than an exact figure. For more information, see Get and analyze slow query logs.
Handle high memory usage

Monitor overall memory usage in the Hologres console (see Monitoring metrics). Sustained usage above 80% is high. Hologres pre-allocates memory for metadata and cache, so idle usage of 30-50% is normal. Usage near 100% degrades stability and performance.
- Causes
  - High memory consumption from metadata
    
    Metadata memory grows with table volume and can cause high usage even when no tasks run. Keep each Table Group under 10,000 tables (including partitions, excluding foreign tables). Too many shards in a Table Group increase fragmentation and metadata overhead.
  - High memory consumption from computation
    
    High query memory typically results from scanning large data volumes or complex operations such as multiple COUNT DISTINCT functions, complex JOIN operations, GROUP BY on multiple columns, or window functions.
  - High memory usage in the Other module
    
    When memory monitoring shows a sudden increase in the memory usage of the Other module along with a high overall memory utilization, the cause may be the experimental parameter hg_experimental_enable_hash_partitioned_sort_v2. This parameter enables a hash-partitioned sort algorithm for window row number filtering. The algorithm has known resource consumption issues and causes unclassified memory to accumulate under the Other module category.
    
    Solution
    
    Disable the experimental parameter by executing the following SQL statement:
```
SET hg_experimental_enable_hash_partitioned_sort_v2 = off;
```
    After disabling this parameter, monitor the instance memory usage. The memory attributed to the Other module should decrease to normal levels.
- Key impacts
  - Stability
    
    Excessive memory consumption, especially from metadata, reduces memory available for queries and can cause sporadic errors such as SERVER_INTERNAL_ERROR, ERPC_ERROR_CONNECTION_CLOSED, or Total memory used by all existing queries exceeded memory limitation.
  - Performance
    
    High memory usage from excessive metadata depletes cache space, lowering cache hit rates and increasing query latency.
- Solutions
  - If high memory usage is caused by excessive metadata: Use the hg_table_info table to manage your tables. For more information, see Get and analyze table statistics. Delete unused data or tables and reduce unnecessary partitions to free memory.
  - If high memory usage is caused by computation: Optimize SQL for write and query use cases separately. For more information, see Resolve OOM errors during queries and Resolve OOM errors during data import and export.
  - General solution: Scale up compute and storage resources. For more information, see Instance list.

Identify OOM errors

An OOM error occurs when computation memory exceeds its allocated limit (e.g., 20 GB or more). A typical error message:

Total memory used by all existing queries exceeded memory limitation. 
memory usage for existing queries=(2031xxxx,184yy)(2021yyyy,85yy)(1021121xxxx,6yy)(2021xxx,18yy)(202xxxx,14yy); Used/Limit: xy1/xy2 quota/sum_quota: zz/100

Interpret the error message as follows:

- queries=(query_id, memory_used_by_query)
  
  Each entry, such as queries=(2031xxxx,184yy), shows per-query memory consumption in bytes. For example, queries=(2031xxxx,18441803528) means that the query with query_id=2031xxxx consumed about 18.4 GB on a single node. The top 5 memory-intensive queries are listed. For more information, see Get and analyze slow query logs.
- Used/Limit: xy1/xy2
  
  Shows compute_memory_used_on_node / compute_memory_limit_on_node in bytes. Used is the total compute memory consumed by all running queries on that node. For example, Used/Limit: 33288093696/33114697728 means that queries used about 33.29 GB, exceeding the limit of about 33.11 GB and triggering OOM.
- quota/sum_quota: zz/100
  
  zz is the percentage of total instance resources allocated to a resource group. For example, quota/sum_quota: 50/100 means that the resource group is allocated 50% of total instance resources.
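
The fields described above can be pulled out of an error message programmatically. The following is a minimal Python sketch, not an official tool: the function names are hypothetical, the query IDs in the sample message are made up, and the regular expressions assume the message layout shown in this section.

```python
import re

def parse_oom_message(message: str) -> dict:
    """Extract per-query usage, Used/Limit, and quota from an OOM error message."""
    # (query_id, bytes) pairs; only fully numeric pairs are matched.
    pairs = re.findall(r"\((\d+),(\d+)\)", message)
    queries = [(query_id, int(used)) for query_id, used in pairs]
    result = {"queries": sorted(queries, key=lambda q: q[1], reverse=True)}

    used_limit = re.search(r"Used/Limit:\s*(\d+)/(\d+)", message)
    if used_limit:
        used, limit = map(int, used_limit.groups())
        result.update(used=used, limit=limit, over_by=used - limit)

    quota = re.search(r"quota/sum_quota:\s*(\d+)/(\d+)", message)
    if quota:
        result["quota_fraction"] = int(quota.group(1)) / int(quota.group(2))
    return result

def to_gb(n_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (10**9 bytes), rounded to two places."""
    return round(n_bytes / 1e9, 2)

# Hypothetical message assembled from the example values in this section.
sample = (
    "Total memory used by all existing queries exceeded memory limitation. "
    "memory usage for existing queries=(20310001,18441803528)(20210002,8500000000); "
    "Used/Limit: 33288093696/33114697728 quota/sum_quota: 50/100"
)
info = parse_oom_message(sample)
print(to_gb(info["used"]), to_gb(info["limit"]), info["over_by"])
# prints: 33.29 33.11 173395968
```

Sorting the queries by bytes puts the heaviest query first, which is usually the one to look up in the slow query log.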

Basic causes of OOM errors

Hologres prioritizes in-memory computation for optimal query efficiency. Unlike systems that spill to disk when memory is insufficient, Hologres raises an OOM error directly when a query exceeds available memory.

Memory allocation and limits

A Hologres instance operates as a distributed system, comprising multiple nodes whose quantity varies with instance specifications. For more details, see Instance management.

Each node typically has 16 vCPUs and 64 GB of memory. An OOM error occurs if any single node exhausts its memory. The 64 GB is partitioned for query computation, backend processes, cache, and metadata. Before V1.1.24, compute memory was capped at 20 GB. V1.1.24 and later dynamically allocate available memory to queries when metadata consumption is low.

Resolve OOM errors during queries

Causes:
- Incorrect execution plans: This can be due to inaccurate statistics, improper join order, or other optimization issues.
- High query concurrency: Many queries simultaneously consuming substantial memory.
- Complex queries: Inherently complex queries or those scanning large data volumes.
- UNION ALL operations: Queries containing UNION ALL can increase executor parallelism, leading to higher memory usage.
- Insufficient resource group allocation: A resource group is configured but allocated inadequate resources.
- Data skew or shard pruning: These can cause unbalanced load and high memory pressure on specific nodes.

Analysis and solutions:

Cause: Insufficient resource group allocation

Solution: Use the Serverless Computing feature to supplement your instance’s dedicated resources with additional compute capacity. For an overview and usage instructions, refer to Serverless computing and Work with serverless computing.

In Hologres V3.0 and later, Query Queues automatically rerun OOM queries on serverless computing resources. For more information, see Control large queries.

Cause: Incorrect execution plan

Type 1: Inaccurate statistics

Run EXPLAIN <SQL> to view the execution plan. If a scan node shows rows=1000, the table's statistics are missing or inaccurate. The optimizer then produces an inefficient execution plan that consumes excessive resources and can trigger an OOM error.

tt=# explain  select count(1) from tmp join tmp1 on tmp.a = tmp1.b;
                                    QUERY PLAN
----------------------------------------------------------------------------------
Partial Aggregate  (cost=0.00..10.11 rows=1 width=8)
  ->  Gather Motion  (cost=0.00..10.11 rows=10 width=8)
    ->  Partial Aggregate  (cost=0.00..10.11 rows=10 width=8)
      ->  Hash Join  (cost=0.00..10.11 rows=1 width=1)
            Hash Cond: (tmp.a = tmp1.b)
            ->  Parallelism (Gather Exchange)  (cost=0.00..5.04 rows=1000 width=1)
                  ->  DecodeNode  (cost=0.00..5.04 rows=1000 width=1)
                        ->  Seq Scan on tmp  (cost=0.00..5.01 rows=1000 width=1)
            ->  Hash  (cost=5.04..5.04 rows=1000 width=1)
                  ->  Parallelism (Gather Exchange)  (cost=0.00..5.04 rows=1000 width=1)
                        ->  DecodeNode  (cost=0.00..5.04 rows=1000 width=1)
                              ->  Seq Scan on tmp1  (cost=0.00..5.01 rows=1000 width=1)
Optimizer: HQO version 0.8.0
(13 rows)

Solutions:

- Run the ANALYZE <tablename> command to update table statistics.
- Enable auto analyze to automatically update statistics. For more information, see ANALYZE and AUTO ANALYZE.
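
To spot the rows=1000 symptom without reading the plan by eye, the check can be scripted. This is a rough Python sketch under stated assumptions: the function name is hypothetical, the regular expression only recognizes Seq Scan lines in the layout shown above, and a table that really holds about 1,000 rows would be flagged as well.

```python
import re

# Matches lines such as "Seq Scan on tmp  (cost=0.00..5.01 rows=1000 width=1)".
SCAN_RE = re.compile(r"Seq Scan on (\S+)\s+\(cost=[\d.]+ rows=(\d+)")

def tables_with_default_estimate(plan_text: str, default_rows: int = 1000) -> list[str]:
    """Return scanned tables whose estimated row count equals default_rows."""
    tables: list[str] = []
    for table, rows in SCAN_RE.findall(plan_text):
        if int(rows) == default_rows and table not in tables:
            tables.append(table)
    return tables

# Excerpt of the EXPLAIN output shown above.
plan = """
            ->  Parallelism (Gather Exchange)  (cost=0.00..5.04 rows=1000 width=1)
                  ->  DecodeNode  (cost=0.00..5.04 rows=1000 width=1)
                        ->  Seq Scan on tmp  (cost=0.00..5.01 rows=1000 width=1)
            ->  Hash  (cost=5.04..5.04 rows=1000 width=1)
                              ->  Seq Scan on tmp1  (cost=0.00..5.01 rows=1000 width=1)
"""
suspects = tables_with_default_estimate(plan)
statements = [f"ANALYZE {table};" for table in suspects]
print(statements)  # prints: ['ANALYZE tmp;', 'ANALYZE tmp1;']
```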

Type 2: Incorrect join order

In a Hash Join, the smaller table should be the build side. Use EXPLAIN <SQL> to check the execution plan. If the larger table builds the hash table, the join order is inefficient and can cause OOM. Common reasons:

Outdated table statistics. For example, the statistics of the table in the upper branch of the plan were not updated, so its estimate shows rows=1000.

In the execution plan, the Result node on the left side of the Hash Left Join has an estimated row count of only 1,000, while the Hash build side on the right has an estimated row count as high as 6,754,108,416 (approximately 6.75 billion). This large discrepancy indicates that the large table was used to build the Hash table.

Gather  (cost=0.00..56428622.14 rows=6754109416 width=496)
  -> Insert  (cost=0.00..49020180.01 rows=6754109416 width=496)
    -> Result  (cost=0.00..79269.98 rows=6754109416 width=742)
      -> Hash Left Join  (cost=0.00..54211.24 rows=6754109416 width=660)
            Hash Cond: (row_pk = dws_tb_crm_itm_prf_exp_analysis_nd.row_pk)
            -> Result  (cost=0.00..7.03 rows=1000 width=600)
                  -> Redistribution  (cost=0.00..6.01 rows=1000 width=496)
                        -> Result  (cost=0.00..6.00 rows=1000 width=496)
                              Filter: ((ds = '${bizdate}'::text) AND (NOT (row_pk IS NULL)))
                              -> Forward  (cost=0.00..6.00 rows=1000 width=504)
                                    -> Sequence  (cost=0.00..5.00 rows=1000 width=504)
                                          -> Partition Selector for dws_tb_crm_itm_prf_exp_analysis_nd_extl (dynamic scan id: 1)  (cost=10.00..100.00 rows=100 width=4)
                                                Partitions selected: 0 (out of 33)
                                          -> DynamicSeqScan  (cost=0.00..5.00 rows=1000 width=504)
            -> Hash  (cost=11717.79..11717.79 rows=6754108416 width=60)
                  -> Exchange (Gather Exchange)  (cost=0.00..11717.79 rows=6754108416 width=60)
                        -> Decode  (cost=0.00..2755.96 rows=6754108416 width=60)
                              -> Seq Scan on dws_tb_crm_itm_prf_exp_analysis_nd  (cost=0.00..871.47 rows=6754108416 width=60)
Optimizer: HQO version 0.10.0
(19 rows)

The optimizer failed to generate an optimal execution plan.

Solutions:

Run ANALYZE <tablename> on all tables involved in the join to ensure up-to-date statistics. This helps the optimizer determine the correct join order.

If the join order remains incorrect after running ANALYZE <tablename>, adjust a GUC parameter. Set optimizer_join_order = query to force the optimizer to follow the join sequence specified in the SQL statement. This approach is particularly suitable for complex queries.

SET optimizer_join_order = query;
SELECT * FROM a JOIN b ON a.id = b.id; -- Table b is used as the build side of the hash table.

You can also adjust the join order policy as needed.

Parameter: set optimizer_join_order = <value>

Description: Controls the optimizer's join order algorithm. Valid values:

- query: Performs no join order transformation. Joins run strictly in the order written in the SQL statement. This setting has the lowest optimizer overhead.
- greedy: Uses a greedy algorithm to explore possible join orders. This setting has moderate optimizer overhead.
- exhaustive (default): Uses a dynamic programming algorithm to transform the join order. It aims for the optimal execution plan but has the highest optimizer overhead.

Type 3: Incorrect hash table estimation

In a hash join, the smaller input should build the hash table. However, query complexity or inaccurate statistics can cause the system to select a larger relation as the build input, creating an oversized hash table that triggers OOM.

In the following plan, Hash (cost=627353.45..627353.45 rows=970902134 width=94) is the build input, and rows=970902134 is the estimated number of rows used to build the hash table. If the actual table contains far less data, the estimate is inaccurate.

This Hash node at the bottom of the plan is estimated at about 970 million rows, a typical symptom of a build side whose data volume is severely overestimated or incorrectly selected.

-> Broadcast  (cost=0.00..5.17 rows=119488 width=16)
                    -> Exchange (Gather Exchange)  (cost=0.00..5.10 rows=1867 width=16)
                        -> Decode  (cost=0.00..5.10 rows=1867 width=16)
                            -> Seq Scan on xxx  (cost=0.00..5.00 rows=1867 width=16)
            -> Hash  (cost=5.17..5.17 rows=119488 width=16)
                -> Broadcast  (cost=0.00..5.17 rows=119488 width=16)
                    -> Exchange (Gather Exchange)  (cost=0.00..5.10 rows=1867 width=16)
                        -> Decode  (cost=0.00..5.10 rows=1867 width=16)
                            -> Seq Scan on xxx  (cost=0.00..5.00 rows=1867 width=16)
        -> Hash  (cost=5.13..5.13 rows=119488 width=8)
            -> Broadcast  (cost=0.00..5.13 rows=119488 width=8)
                -> Exchange (Gather Exchange)  (cost=0.00..5.10 rows=1867 width=8)
                    -> Decode  (cost=0.00..5.10 rows=1867 width=8)
                        -> Seq Scan on xxx  (cost=0.00..5.00 rows=1867 width=8)
    -> Hash  (cost=5.10..5.10 rows=896 width=3)
        -> Broadcast  (cost=0.00..5.10 rows=896 width=3)
            -> Exchange (Gather Exchange)  (cost=0.00..5.10 rows=14 width=3)
                -> Decode  (cost=0.00..5.10 rows=14 width=3)
                    -> Seq Scan on xxx  (cost=0.00..5.00 rows=14 width=3)
-> Hash  (cost=627353.45..627353.45 rows=970902134 width=94)
    -> Partial HashAggregate  (cost=0.00..627353.45 rows=970902134 width=94)
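
Build-side estimates like the one above can be listed automatically. The following Python sketch is illustrative only: the function names and the 100-million-row threshold are hypothetical, and the pattern assumes plan lines in the layout shown here. It deliberately skips Hash Join and HashAggregate nodes, which are not build inputs.

```python
import re

# Matches build inputs such as "-> Hash  (cost=5.10..5.10 rows=896 width=3)".
BUILD_RE = re.compile(r"->\s+Hash\s+\(cost=[\d.]+ rows=(\d+) width=(\d+)\)")

def build_side_estimates(plan_text: str) -> list[tuple[int, int]]:
    """Return (estimated_rows, width) for every Hash build node in the plan."""
    return [(int(rows), int(width)) for rows, width in BUILD_RE.findall(plan_text)]

def oversized_builds(plan_text: str, max_rows: int = 100_000_000) -> list[int]:
    """Return build-side row estimates above max_rows (an arbitrary threshold)."""
    return [rows for rows, _ in build_side_estimates(plan_text) if rows > max_rows]

# Last lines of the plan shown above.
plan = """
    -> Hash  (cost=5.10..5.10 rows=896 width=3)
        -> Broadcast  (cost=0.00..5.10 rows=896 width=3)
-> Hash  (cost=627353.45..627353.45 rows=970902134 width=94)
    -> Partial HashAggregate  (cost=0.00..627353.45 rows=970902134 width=94)
"""
print(build_side_estimates(plan))  # prints: [(896, 3), (970902134, 94)]
print(oversized_builds(plan))      # prints: [970902134]
```

A flagged estimate is only a prompt to compare it with the real row count of the input.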

Solutions:

- Verify statistics: Check if the subquery's table statistics are current and accurate. If not, run ANALYZE <tablename> to update them.
- Disable hash table estimation: Turn off the execution engine's hash table estimation using the following parameter:

Note
This parameter defaults to off. However, it may have been enabled in certain tuning scenarios. If it is currently enabled, ensure you set it back to off.
```
SET hg_experimental_enable_estimate_hash_table_size = off;
```

Type 4: Broadcasting a large table

Broadcasting copies data to all shards and is efficient only for small tables with few shards. During joins, the build input is broadcast to every shard. A large dataset or excessive shard count can consume substantial memory, causing OOM errors.

For example, an 80-million-row table might show only 1 estimated row in the execution plan. The actual broadcast of all 80 million rows consumes excessive memory, triggering OOM.

Gather  (cost=0.00..119000614.54 rows=495989952 width=5537)
  ->  Insert  (cost=0.00..112801537.07 rows=495989952 width=5537)
        ->  Redistribution  (cost=0.00..428813.57 rows=991979904 width=2320)
              ->  Result  (cost=0.00..338771.55 rows=991979904 width=2320)
                    ->  Result  (cost=0.00..338771.55 rows=991979904 width=2320)
                          ->  Hash Left Join  (cost=0.00..310003.14 rows=991979904 width=5561)
                                Hash Cond: ((olap_event_1480807263997566978.pub_distinct_id = olap_event_1480807263997566978_new.pub_distinct_id) AND (olap_event_1480807263997566978.pub_event_name = olap_event_1480807263997566978_new.pub_event_name) AND (olap_event_1480807263997566978.uuid = olap_event_1480807263997566978_new.uuid))
                                ->  Exchange (Gather Exchange)  (cost=0.00..68102.40 rows=495989952 width=5537)
                                      ->  Decode  (cost=0.00..65774.92 rows=495989952 width=5537)
                                            ->  Seq Scan on olap_event_1480807263997566978  (cost=0.00..1923.43 rows=495989952 width=5537)
                                ->  Hash  (cost=5.10..5.10 rows=80 width=24)
                                      ->  Broadcast  (cost=0.00..5.10 rows=80 width=24)
                                            ->  Exchange (Gather Exchange)  (cost=0.00..5.10 rows=1 width=24)
                                                  ->  Decode  (cost=0.00..5.10 rows=1 width=24)
                                                        ->  Seq Scan on olap_event_1480807263997566978_new  (cost=0.00..5.00 rows=1 width=24)
Optimizer: HQO version 1.1.0
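
A back-of-the-envelope calculation shows why a wrong estimate matters here. In the plans in this document, each Broadcast estimate equals the row estimate below it multiplied by a constant (1867 × 64 = 119488, 14 × 64 = 896, 1 × 80 = 80), presumably the number of receivers. That reading is an inference from the numbers, not something the plan states. The sketch below also assumes that width is the average row size in bytes, as in PostgreSQL-style plans, and ignores hash table overhead. The function names are hypothetical.

```python
def broadcast_rows(source_rows: int, receivers: int) -> int:
    """Rows materialized in total when every receiver gets a full copy."""
    return source_rows * receivers

def broadcast_bytes(source_rows: int, width_bytes: int, receivers: int) -> int:
    """Rough payload size of a broadcast, ignoring hash table overhead."""
    return broadcast_rows(source_rows, receivers) * width_bytes

# The multipliers are consistent across the plans shown in this document.
print(broadcast_rows(1867, 64), broadcast_rows(14, 64), broadcast_rows(1, 80))
# prints: 119488 896 80

# Planned versus actual payload for the 80-million-row example (width=24).
planned = broadcast_bytes(1, 24, 80)
actual = broadcast_bytes(80_000_000, 24, 80)
print(planned, actual)  # prints: 1920 153600000000
```

Under these assumptions the optimizer plans for about 2 KB while the real broadcast is on the order of 150 GB, far beyond the compute memory of a single node.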

Solutions:

- Check whether the estimated row count in the execution plan matches reality. If not, run ANALYZE <tablename> to update statistics.
- Disable broadcasting so that the optimizer uses a redistribution operator instead. Set the following GUC parameter:
```
SET optimizer_enable_motion_broadcast = off;
```

Cause: High query concurrency
High concurrency is the likely cause if QPS spikes significantly, or if the OOM error lists queries that each use little memory, as in: HGERR_detl memory usage for existing queries=(2031xxxx,184yy)(2021yyyy,85yy)(1021121xxxx,6yy)(2021xxx,18yy)(202xxxx,14yy); Solutions:
- Reduce write concurrency: If write operations are contributing, reduce their concurrency. For more information, see Resolve OOM errors during data import and export.
- Implement read/write splitting: Deploy a read/write splitting architecture with primary and secondary instances (shared storage).
- Increase the compute specifications of your instance.
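
One way to tell a concurrency problem from a single heavy query is to compare the listed queries with the Used value. Because the error message lists only the top 5 queries, a small combined share suggests that many other small queries make up the rest. The Python sketch below is a heuristic under that assumption; the function name and all sample values are hypothetical.

```python
def listed_share(listed_bytes: list[int], used_bytes: int) -> float:
    """Fraction of Used that the queries listed in the error message account for."""
    if used_bytes <= 0:
        raise ValueError("used_bytes must be positive")
    return round(sum(listed_bytes) / used_bytes, 3)

# Hypothetical case 1: five small queries on a node using 30 GB in total.
many_small = listed_share([300_000_000] * 5, 30_000_000_000)
print(many_small)  # prints: 0.05

# Hypothetical case 2: one query dominates the node.
one_heavy = listed_share([27_000_000_000, 200_000_000], 30_000_000_000)
print(one_heavy)  # prints: 0.907
```

A low share points to concurrency. A high share driven by one entry points to the complex-query case instead.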
Cause: Complex query
If a single query triggers OOM due to its complexity or large data volume, consider these approaches:
- Pre-compute data: Write pre-computed data into Hologres to avoid large-scale ETL operations within Hologres.
- Add filter conditions.
- Optimize SQL: Use techniques like Fixed Plan or Count Distinct optimization. For more information, see Optimize internal table query performance.
Cause: UNION ALL

As shown below, when an SQL statement contains many UNION ALL subqueries, the executor processes them concurrently. This can overload memory and cause an OOM error.
```
subquery1 UNION ALL subquery2 UNION ALL subquery3 ...
```
Solution: Force serial execution using the following parameters to mitigate OOM errors. Be aware that this will result in slower query performance.
```
SET hg_experimental_hqe_union_all_type=1;
SET hg_experimental_enable_fragment_instance_delay_open=on;
```
Cause: Inadequate resource group configuration
An OOM error reports: memory usage for existing queries=(3019xxx,37yy)(3022xxx,37yy)(3023xxx,35yy)(4015xxx,30yy)(2004xxx,2yy); Used/Limit: xy1/xy2 quota/sum_quota: zz/100. If zz is small, for example 10 (only 10% of resources allocated), queries in that group have limited memory, which increases the likelihood of OOM errors.
```
AcquireOrRelease] HGERR code 53200 HGERR msge Total memory used by all existing queries
exceeded memory limitation. HGERR detl memory usage for existing
queries=(7001xxx 5,4367295472)(673125xxx9,1701576)(6731xxx 3,3590736)(6731xxx 5,3510024) (673116xxx ,3088256
Used/Limit: 4408213504/42552795136 quota/sum_quota: 10/100.
```
Solution: Reset the resource group quota. Allocate at least 30% of the instance’s total resources to each resource group.
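
The numbers in the error above are worth working through, because Used (4408213504) is far below Limit (42552795136). The Python sketch below assumes that a resource group's memory ceiling is Limit × quota / sum_quota. The source does not state this formula, but it is consistent with this example. The function name is hypothetical.

```python
def group_budget(limit_bytes: int, quota: int, sum_quota: int) -> int:
    """Assumed memory ceiling of a resource group: its share of the node limit."""
    return limit_bytes * quota // sum_quota

used = 4408213504
limit = 42552795136

budget_10 = group_budget(limit, 10, 100)
print(budget_10, used > budget_10, used - budget_10)
# prints: 4255279513 True 152933991

# The same node limit with the recommended minimum quota of 30%.
budget_30 = group_budget(limit, 30, 100)
print(budget_30, used > budget_30)
# prints: 12765838540 False
```

With a 10% quota the group exceeds its assumed ceiling by about 153 MB, while a 30% quota would leave the same workload well within bounds.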

Cause: Data skew or shard pruning

If overall memory usage is low but OOM still occurs, data skew or shard pruning may be concentrating memory pressure on specific nodes.

Note

Shard pruning is a query optimization technique that scans only a subset of shards, rather than all of them.

Check for data skew: Use the following SQL query. The hg_shard_id is a built-in hidden field in every table that indicates the shard where each row resides.
```
SELECT hg_shard_id, count(1) FROM t1 GROUP BY hg_shard_id;
```
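
The per-shard counts returned by the query above can be reduced to a single skew figure. This Python sketch uses the ratio of the largest shard to the mean, which is one common convention rather than a Hologres-defined metric. The function name and sample counts are hypothetical.

```python
def skew_ratio(rows_per_shard: dict[int, int]) -> float:
    """Largest shard row count divided by the mean; 1.0 means perfectly even."""
    if not rows_per_shard:
        raise ValueError("no shard counts given")
    counts = list(rows_per_shard.values())
    mean = sum(counts) / len(counts)
    return max(counts) / mean

# Hypothetical output of the GROUP BY hg_shard_id query: shard id -> row count.
sample = {0: 1_000, 1: 1_000, 2: 1_000, 3: 5_000}
ratio = skew_ratio(sample)
print(ratio)  # prints: 2.5
```

A ratio well above 1 means that one shard, and therefore one node, carries a disproportionate share of the rows.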

Inspect shard pruning: Check the execution plan for a Shard Selector entry. For example, if the shard selector shows l0 [1], only one specific shard's data was selected for the query.

-- The distribution key is x. Based on the filter condition x=1, you can quickly locate the shard.
SELECT count(1) FROM bbb WHERE x=1 GROUP BY y;

                          QUERY PLAN
--------------------------------------------------------------
Result  (cost=0.00..5.10 rows=1 width=8)
  ->  HashAggregate  (cost=0.00..5.10 rows=1 width=8)
        Group Key: y
        ->  Exchange (Gather Exchange)  (cost=0.00..5.10 rows=1 width=4)
            ->  Decode  (cost=0.00..5.10 rows=1 width=4)
                ->  Seq Scan on bbb  (cost=0.00..5.00 rows=1 width=4)
                      Filter: (x = 1)
                      Shard Selector(Eagerly):
                        ->: l0 [1]
Optimizer: HQO version 1.3.0
(10 rows)

Solutions:

- Design an appropriate distribution key to prevent data skew.
- If business logic inherently causes data skew, modify the application logic accordingly.

Cause: High-cardinality multi-stage GROUP BY

In Hologres V3.0 and later, multi-stage aggregations on high-cardinality data can cause OOM when GROUP BY columns do not align with the distribution key (the distribution key is not a subset of the GROUP BY key). Each concurrent instance maintains a large hash table, creating high memory pressure. To mitigate this, set the following parameter:
```
-- Use a GUC parameter to set the maximum number of rows in the aggregation hash table. The following SQL statement indicates that the partial_agg_hash_table can have a maximum of 8192 rows. The default value is 0, which indicates no limit.
SET hg_experimental_partial_agg_hash_table_size = 8192;
```

Resolve OOM errors during data import and export

OOM errors can occur during data transfers in Hologres, including between internal tables, interactions with foreign tables, and imports from MaxCompute.

Solution 1: Use Serverless Computing for imports and exports

Use Serverless Computing to supplement your instance’s resources for import and export tasks, avoiding resource contention. For an overview, see Serverless computing. For usage instructions, see Work with serverless computing.

Solution 2: Control scan concurrency for wide tables or columns

In MaxCompute imports, OOM errors can arise from wide tables or columns combined with high scan concurrency. Use the following parameters to control concurrency.

Control scan concurrency for wide tables (common scenario)

Note

Apply the following parameters along with your SQL statement. Prioritize the first two parameters. If an OOM error persists, reduce their values further.

-- Set the maximum concurrency for accessing foreign tables. The default value equals the instance's vCPU count. The maximum value is 128. Avoid a large value so that queries on foreign tables, especially in data import scenarios, do not affect other queries and cause system busy errors. This parameter is effective in Hologres V1.1 and later.
SET hg_foreign_table_executor_max_dop = 32;

-- Adjust the batch size for each read from a MaxCompute table. The default value is 8192.
SET hg_experimental_query_batch_size = 4096;

-- Set the maximum concurrency for executing DML statements when accessing foreign tables. The default value is 32. This parameter is optimized for data import and export scenarios to prevent import operations from consuming excessive system resources. This parameter is effective in Hologres V1.1 and later.
SET hg_foreign_table_executor_dml_max_dop = 16;

-- Set the split size for accessing MaxCompute tables. This parameter can adjust concurrency. The default value is 64 MB. If the table is large, increase this value to prevent too many splits from affecting performance. This parameter is effective in Hologres V1.1 and later.
SET hg_foreign_table_split_size = 128;
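
The effect of the split size on concurrency can be estimated with simple arithmetic. The Python sketch below assumes that a table is cut into splits of roughly equal size, so the split count is about the table size divided by the split size. Actual splitting depends on the file layout in MaxCompute. The function name and the 1 TB table size are hypothetical.

```python
import math

def estimated_splits(table_size_mb: int, split_size_mb: int = 64) -> int:
    """Approximate number of splits for a table of the given size."""
    if split_size_mb <= 0:
        raise ValueError("split_size_mb must be positive")
    return math.ceil(table_size_mb / split_size_mb)

table_mb = 1024 * 1024  # hypothetical 1 TB table, expressed in MB

print(estimated_splits(table_mb, 64))   # prints: 16384
print(estimated_splits(table_mb, 128))  # prints: 8192
```

Doubling the split size halves the estimated number of splits, which reduces how many scan tasks compete for memory at once.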

Control scan concurrency for wide columns

If you’ve already tuned parameters for wide tables but still encounter OOM errors, check whether your data includes wide columns. If so, adjust the following parameters to resolve the issue.

-- Adjust the shuffle parallelism for wide columns to reduce data accumulation.
SET hg_experimental_max_num_record_batches_in_buffer = 32;

-- Adjust the batch size for each read from a MaxCompute table. The default value is 8192.
SET hg_experimental_query_batch_size = 128;

Cause: Excessive duplicate data in a foreign table

When a foreign table contains substantial duplicate data, import performance degrades and can cause OOM errors. For example, a table of 100 million rows with 80 million duplicates is highly duplicated. Assess duplication based on your business context.

Solution: Deduplicate data before import, or import in smaller batches.

What causes the error "The shards are incomplete, the workers or shards are unhealthy"?

This error is typically not caused directly by an out-of-memory (OOM) issue. It occurs when the CPU usage of the Hologres instance is excessively high, which causes Worker nodes or Shards to enter an unhealthy state.

To resolve this issue:

- Check the monitoring metrics of the Hologres instance to verify whether CPU usage is elevated.
- If CPU usage is high, wait for the workload to decrease and then retry the query.
- If the error persists, check whether complex queries or excessive concurrency is causing CPU saturation. Optimize resource-intensive queries or reduce concurrency to lower CPU usage.