PolarDB PostgreSQL版支持OR子句转UNION ALL的查询优化功能,能够在生成计划期间尝试将合适的OR子句转换成UNION ALL形式,再进行基于代价的路径选择,得到更优的执行计划。
背景
目前,PostgreSQL优化器对SQL中的OR子句过滤条件的优化能力较为有限。如果OR子句中的过滤条件仅涉及一张表,且所有过滤条件上均具备适当的索引,则优化器会为此类场景生成一个BitmapOr的Index Path。例如:
EXPLAIN SELECT * FROM my_test WHERE (id = 123 OR name = '123' OR salary = 123.0);
QUERY PLAN
---------------------------------------------------------------------------------------
Bitmap Heap Scan on my_test (cost=12.90..24.33 rows=3 width=15)
Recheck Cond: ((id = 123) OR ((name)::text = '123'::text) OR (salary = 123.0))
-> BitmapOr (cost=12.90..12.90 rows=3 width=0)
-> Bitmap Index Scan on my_test_id_idx (cost=0.00..4.30 rows=1 width=0)
Index Cond: (id = 123)
-> Bitmap Index Scan on my_test_name_idx (cost=0.00..4.30 rows=1 width=0)
Index Cond: ((name)::text = '123'::text)
-> Bitmap Index Scan on my_test_salary_idx (cost=0.00..4.30 rows=1 width=0)
Index Cond: (salary = 123.0)
(9 rows)
如果OR子句涉及多张表,优化器只能将该OR子句视为连接后的过滤条件,这可能导致SQL执行效率降低。例如:
EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE (t1.num = 1 OR t2.cnt = 2);
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Hash Join (cost=299.00..660.50 rows=110 width=51) (actual time=5.992..15.673 rows=110 loops=1)
Hash Cond: (t1.id = t2.id)
Join Filter: ((t1.num = 1) OR (t2.cnt = 2))
Rows Removed by Join Filter: 9890
-> Seq Scan on t1 (cost=0.00..174.00 rows=10000 width=25) (actual time=0.012..2.080 rows=10000 loops=1)
-> Hash (cost=174.00..174.00 rows=10000 width=26) (actual time=5.855..5.857 rows=10000 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 704kB
-> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=26) (actual time=0.007..1.779 rows=10000 loops=1)
Planning Time: 1.237 ms
Execution Time: 15.836 ms
(10 rows)
上述OR子句被当作一个整体,优化器无法使用t1.num
或者t2.cnt
上的索引,导致t1
与t2
进行全表扫描。实际上,OR子句在逻辑上可以转化为UNION ALL,包含两个或多个分支的查询形式。例如,上述示例可以改写为:
EXPLAIN ANALYZE
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE t1.num = 1
UNION ALL
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE t2.cnt = 2 AND (t1.num != 1 OR (t1.num = 1) IS NULL);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Append (cost=85.48..412.26 rows=110 width=51) (actual time=0.350..4.832 rows=110 loops=1)
-> Hash Join (cost=85.48..297.98 rows=100 width=51) (actual time=0.349..4.653 rows=100 loops=1)
Hash Cond: (t2.id = t1.id)
-> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=26) (actual time=0.009..1.719 rows=10000 loops=1)
-> Hash (cost=84.23..84.23 rows=100 width=25) (actual time=0.318..0.320 rows=100 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 14kB
-> Bitmap Heap Scan on t1 (cost=5.06..84.23 rows=100 width=25) (actual time=0.065..0.265 rows=100 loops=1)
Recheck Cond: (num = 1)
Heap Blocks: exact=73
-> Bitmap Index Scan on t1_num_idx (cost=0.00..5.04 rows=100 width=0) (actual time=0.037..0.037 rows=100 loops=1)
Index Cond: (num = 1)
-> Nested Loop (cost=4.65..112.63 rows=10 width=51) (actual time=0.049..0.159 rows=10 loops=1)
-> Bitmap Heap Scan on t2 t2_1 (cost=4.36..33.46 rows=10 width=26) (actual time=0.026..0.045 rows=10 loops=1)
Recheck Cond: (cnt = 2)
Heap Blocks: exact=10
-> Bitmap Index Scan on t2_cnt_idx (cost=0.00..4.36 rows=10 width=0) (actual time=0.017..0.018 rows=10 loops=1)
Index Cond: (cnt = 2)
-> Index Scan using t1_id_idx on t1 t1_1 (cost=0.29..7.91 rows=1 width=25) (actual time=0.009..0.009 rows=1 loops=10)
Index Cond: (id = t2_1.id)
Filter: ((num <> 1) OR ((num = 1) IS NULL))
Planning Time: 1.150 ms
Execution Time: 5.014 ms
(22 rows)
改写后优化器利用t1.num
和t2.cnt
上的索引,减少中间处理的数据量,从而提高执行性能。
PolarDB PostgreSQL版在生成计划期间,能够尝试将适当的OR子句转换为UNION ALL形式,随后基于代价进行路径选择,从而最终获得更优的执行计划。
前提条件
支持的PolarDB PostgreSQL版的版本如下:
PostgreSQL 14(内核小版本14.13.27.0及以上)
使用说明
OR子句转换为UNION ALL的能力受参数控制,相关的参数名称及其作用如下所示:
参数名称 | 描述 |
参数名称 | 描述 |
polar_cbqt_cost_threshold | 用于控制对SQL尝试进行OR转换的执行计划总代价阈值,如果 SQL 原始的执行计划总代价没有超过阈值,则不会进行OR转换。取值范围: 设置参数为0时,表示对任何SQL都尝试进行OR转换。不建议设置为0,可能会让所有SQL计划时间变长,影响性能。 |
polar_cbqt_convert_or_to_union_all_mode | 用于控制OR子句转换UNION ALL功能的开关,取值如下:
|
注意事项
设置
polar_cbqt_cost_threshold
为0,或者设置polar_cbqt_convert_or_to_union_all_mode
为FORCE,将会强制SQL运用优化。对于指定SQL强制OR子句转UNION ALL优化,建议使用HINT来设置,详情请参考使用HINT。如果OR子句中的条件仅涉及一张表,优化器将不会尝试对OR子句进行转换。
在强制模式下,如果SQL中存在多个合适的OR子句,将选择总代价最小的路径。
如果一个OR子句的参数超过6个,优化器将不会对该OR子句进行转换。
如果一个SQL的OR子句数量超过6个,将只对前6个OR子句进行转换选择。
示例
数据准备
CREATE TABLE t1(id int, num int, dsc text, log_date text);
CREATE TABLE t2(id int, cnt int, change text, op_date text);
INSERT INTO t1 SELECT i, i%100, 'test'||1, to_char('1990-10-10'::date + i, 'YYYY-MM-DD') FROM generate_series(1,10000)i;
INSERT INTO t2 SELECT i, i%1000, 'now'||i, to_char('1990-10-10'::date + i, 'YYYY-MM-DD') FROM generate_series(1,10000)i;
CREATE INDEX ON t1(id);
CREATE INDEX ON t1(num);
CREATE INDEX ON t2(id);
CREATE INDEX ON t2(cnt);
ANALYZE t1;
ANALYZE t2;
基本功能
关闭OR子句转换为UNION ALL功能
SET polar_cbqt_cost_threshold to 100; SET polar_cbqt_convert_or_to_union_all_mode to off; EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE (t1.num = 1 OR t2.cnt = 2);
返回结果如下:
QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Hash Join (cost=299.00..660.50 rows=110 width=51) (actual time=5.992..15.673 rows=110 loops=1) Hash Cond: (t1.id = t2.id) Join Filter: ((t1.num = 1) OR (t2.cnt = 2)) Rows Removed by Join Filter: 9890 -> Seq Scan on t1 (cost=0.00..174.00 rows=10000 width=25) (actual time=0.012..2.080 rows=10000 loops=1) -> Hash (cost=174.00..174.00 rows=10000 width=26) (actual time=5.855..5.857 rows=10000 loops=1) Buckets: 16384 Batches: 1 Memory Usage: 704kB -> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=26) (actual time=0.007..1.779 rows=10000 loops=1) Planning Time: 1.237 ms Execution Time: 15.836 ms (10 rows)
开启OR子句转换为UNION ALL功能
SET polar_cbqt_cost_threshold to 100; SET polar_cbqt_convert_or_to_union_all_mode to on; EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE (t1.num = 1 OR t2.cnt = 2);
返回结果如下:
QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- Append (cost=85.48..411.16 rows=110 width=51) (actual time=0.396..4.822 rows=110 loops=1) -> Hash Join (cost=85.48..297.98 rows=100 width=51) (actual time=0.395..4.639 rows=100 loops=1) Hash Cond: (t2.id = t1.id) -> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=26) (actual time=0.010..1.750 rows=10000 loops=1) -> Hash (cost=84.23..84.23 rows=100 width=25) (actual time=0.333..0.335 rows=100 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 14kB -> Bitmap Heap Scan on t1 (cost=5.06..84.23 rows=100 width=25) (actual time=0.056..0.247 rows=100 loops=1) Recheck Cond: (num = 1) Heap Blocks: exact=73 -> Bitmap Index Scan on t1_num_idx (cost=0.00..5.04 rows=100 width=0) (actual time=0.028..0.028 rows=100 loops=1) Index Cond: (num = 1) -> Nested Loop (cost=4.65..112.63 rows=10 width=51) (actual time=0.049..0.164 rows=10 loops=1) -> Bitmap Heap Scan on t2 (cost=4.36..33.46 rows=10 width=26) (actual time=0.027..0.044 rows=10 loops=1) Recheck Cond: (cnt = 2) Heap Blocks: exact=10 -> Bitmap Index Scan on t2_cnt_idx (cost=0.00..4.36 rows=10 width=0) (actual time=0.019..0.019 rows=10 loops=1) Index Cond: (cnt = 2) -> Index Scan using t1_id_idx on t1 (cost=0.29..7.91 rows=1 width=25) (actual time=0.010..0.010 rows=1 loops=10) Index Cond: (id = t2.id) Filter: ((num <> 1) OR ((num = 1) IS NULL)) Planning Time: 2.903 ms Execution Time: 4.980 ms (22 rows)
综上,开启OR子句转UNION ALL功能后,执行计划中可以利用t1.num
和t2.cnt
上的索引,达到和手动改写成UNION ALL相同的效果。
强制选择
开启OR子句转换为UNION ALL功能。
SET polar_cbqt_cost_threshold to 100; SET polar_cbqt_convert_or_to_union_all_mode to on; EXPLAIN ANALYZE SELECT * FROM t1, t2 WHERE t1.dsc= t2.change AND (t1.log_date = '2024-01-01' OR t2.op_date = '2024-01-01');
返回结果如下:
QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Hash Join (cost=299.00..660.50 rows=2 width=51) (actual time=14.321..14.325 rows=0 loops=1) Hash Cond: (t1.dsc = t2.change) Join Filter: ((t1.log_date = '2024-01-01'::text) OR (t2.op_date = '2024-01-01'::text)) -> Seq Scan on t1 (cost=0.00..174.00 rows=10000 width=25) (actual time=0.016..3.204 rows=10000 loops=1) -> Hash (cost=174.00..174.00 rows=10000 width=26) (actual time=6.506..6.508 rows=10000 loops=1) Buckets: 16384 Batches: 1 Memory Usage: 704kB -> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=26) (actual time=0.006..1.755 rows=10000 loops=1) Planning Time: 0.932 ms Execution Time: 14.571 ms (9 rows)
强制选择OR子句转换为UNION ALL功能。
SET polar_cbqt_cost_threshold to 100; SET polar_cbqt_convert_or_to_union_all_mode to force; EXPLAIN ANALYZE SELECT * FROM t1, t2 WHERE t1.dsc= t2.change AND (t1.log_date = '2024-01-01' OR t2.op_date = '2024-01-01');
返回结果如下:
QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Append (cost=199.01..871.05 rows=2 width=51) (actual time=9.915..9.923 rows=0 loops=1) -> Hash Join (cost=199.01..410.52 rows=1 width=51) (actual time=5.046..5.050 rows=0 loops=1) Hash Cond: (t2.change = t1.dsc) -> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=26) (actual time=0.015..0.015 rows=1 loops=1) -> Hash (cost=199.00..199.00 rows=1 width=25) (actual time=5.014..5.016 rows=0 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 8kB -> Seq Scan on t1 (cost=0.00..199.00 rows=1 width=25) (actual time=5.013..5.013 rows=0 loops=1) Filter: (log_date = '2024-01-01'::text) Rows Removed by Filter: 10000 -> Hash Join (cost=199.01..460.52 rows=1 width=51) (actual time=4.865..4.867 rows=0 loops=1) Hash Cond: (t1.dsc = t2.change) -> Seq Scan on t1 (cost=0.00..224.00 rows=9999 width=25) (actual time=0.015..0.016 rows=1 loops=1) Filter: ((log_date <> '2024-01-01'::text) OR ((log_date = '2024-01-01'::text) IS NULL)) -> Hash (cost=199.00..199.00 rows=1 width=26) (actual time=4.828..4.829 rows=0 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 8kB -> Seq Scan on t2 (cost=0.00..199.00 rows=1 width=26) (actual time=4.827..4.827 rows=0 loops=1) Filter: (op_date = '2024-01-01'::text) Rows Removed by Filter: 10000 Planning Time: 0.777 ms Execution Time: 10.088 ms (20 rows)
综上,对比转换前后的执行计划总代价,可发现不进行OR子句转换的总代价更低。在非强制模式下,将按执行计划总代价阈值设置进行OR子句转换尝试,优化器不一定会选择经过OR子句转换后的路径。然而,在强制模式下,优化器则会强制选择经过OR子句转换后的路径。强制模式适用于优化器无法准确估算代价,且需要固定查询选择OR转换为UNION ALL的场景。
使用HINT
对于指定SQL强制OR子句转UNION ALL优化,也可以使用HINT功能进行SQL级别的功能控制。
SET polar_cbqt_convert_or_to_union_all_mode to off;
EXPLAIN ANALYZE /*+ Set(polar_cbqt_convert_or_to_union_all_mode force) Set(polar_cbqt_cost_threshold 0) */ SELECT * FROM t1, t2 WHERE t1.dsc= t2.change and (t1.log_date = '2024-01-01' or t2.op_date = '2024-01-01');
返回结果如下:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Append (cost=199.01..871.05 rows=2 width=51) (actual time=9.684..9.691 rows=0 loops=1)
-> Hash Join (cost=199.01..410.52 rows=1 width=51) (actual time=4.711..4.714 rows=0 loops=1)
Hash Cond: (t2.change = t1.dsc)
-> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=26) (actual time=0.013..0.013 rows=1 loops=1)
-> Hash (cost=199.00..199.00 rows=1 width=25) (actual time=4.682..4.684 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Seq Scan on t1 (cost=0.00..199.00 rows=1 width=25) (actual time=4.681..4.681 rows=0 loops=1)
Filter: (log_date = '2024-01-01'::text)
Rows Removed by Filter: 10000
-> Hash Join (cost=199.01..460.52 rows=1 width=51) (actual time=4.969..4.970 rows=0 loops=1)
Hash Cond: (t1.dsc = t2.change)
-> Seq Scan on t1 (cost=0.00..224.00 rows=9999 width=25) (actual time=0.018..0.018 rows=1 loops=1)
Filter: ((log_date <> '2024-01-01'::text) OR ((log_date = '2024-01-01'::text) IS NULL))
-> Hash (cost=199.00..199.00 rows=1 width=26) (actual time=4.935..4.936 rows=0 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
-> Seq Scan on t2 (cost=0.00..199.00 rows=1 width=26) (actual time=4.934..4.934 rows=0 loops=1)
Filter: (op_date = '2024-01-01'::text)
Rows Removed by Filter: 10000
Planning Time: 0.798 ms
Execution Time: 9.858 ms
(20 rows)
转换阈值
在基本功能示例中,原始SQL执行计划的总cost
值为660.50。因此,我们将转换阈值设置为高于此值,并再次执行相同的SQL查询:
SET polar_cbqt_cost_threshold to 1000; -- 设置阈值
SET polar_cbqt_convert_or_to_union_all_mode to force; -- 强制选择转换后的路径
EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE (t1.num = 1 OR t2.cnt = 2);
返回结果如下:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Hash Join (cost=299.00..660.50 rows=110 width=51) (actual time=6.374..15.802 rows=110 loops=1)
Hash Cond: (t1.id = t2.id)
Join Filter: ((t1.num = 1) OR (t2.cnt = 2))
Rows Removed by Join Filter: 9890
-> Seq Scan on t1 (cost=0.00..174.00 rows=10000 width=25) (actual time=0.011..2.038 rows=10000 loops=1)
-> Hash (cost=174.00..174.00 rows=10000 width=26) (actual time=6.266..6.268 rows=10000 loops=1)
Buckets: 16384 Batches: 1 Memory Usage: 704kB
-> Seq Scan on t2 (cost=0.00..174.00 rows=10000 width=26) (actual time=0.006..1.778 rows=10000 loops=1)
Planning Time: 0.663 ms
Execution Time: 16.036 ms
(10 rows)
可以看出,即使在强制模式下,优化器仍然选择了原始路径。这是由于原始计划的总代价未超过阈值,因此不会尝试进行OR子句的转化。
- 本页导读 (1)
- 背景
- 前提条件
- 使用说明
- 注意事项
- 示例
- 数据准备
- 基本功能
- 强制选择
- 使用HINT
- 转换阈值