OR子句转UNION ALL

更新时间:2025-01-13 03:19:48

PolarDB PostgreSQL支持OR子句转UNION ALL的查询优化功能,能够在生成计划期间尝试将合适的OR子句转换成UNION ALL形式,再进行基于代价的路径选择,得到更优的执行计划。

背景

目前,PostgreSQL优化器对SQL中的OR子句过滤条件的优化能力较为有限。如果OR子句中的过滤条件仅涉及一张表,且所有过滤条件上均具备适当的索引,则优化器会为此类场景生成一个BitmapOrIndex Path。例如:

EXPLAIN SELECT * FROM my_test WHERE (id = 123 OR name = '123' OR salary = 123.0);
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Bitmap Heap Scan on my_test  (cost=12.90..24.33 rows=3 width=15)
   Recheck Cond: ((id = 123) OR ((name)::text = '123'::text) OR (salary = 123.0))
   ->  BitmapOr  (cost=12.90..12.90 rows=3 width=0)
         ->  Bitmap Index Scan on my_test_id_idx  (cost=0.00..4.30 rows=1 width=0)
               Index Cond: (id = 123)
         ->  Bitmap Index Scan on my_test_name_idx  (cost=0.00..4.30 rows=1 width=0)
               Index Cond: ((name)::text = '123'::text)
         ->  Bitmap Index Scan on my_test_salary_idx  (cost=0.00..4.30 rows=1 width=0)
               Index Cond: (salary = 123.0)
(9 rows)

如果OR子句涉及多张表,优化器只能将该OR子句视为连接后的过滤条件,这可能导致SQL执行效率降低。例如:

EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE (t1.num = 1 OR t2.cnt = 2);
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=299.00..660.50 rows=110 width=51) (actual time=5.992..15.673 rows=110 loops=1)
   Hash Cond: (t1.id = t2.id)
   Join Filter: ((t1.num = 1) OR (t2.cnt = 2))
   Rows Removed by Join Filter: 9890
   ->  Seq Scan on t1  (cost=0.00..174.00 rows=10000 width=25) (actual time=0.012..2.080 rows=10000 loops=1)
   ->  Hash  (cost=174.00..174.00 rows=10000 width=26) (actual time=5.855..5.857 rows=10000 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 704kB
         ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=26) (actual time=0.007..1.779 rows=10000 loops=1)
 Planning Time: 1.237 ms
 Execution Time: 15.836 ms
(10 rows)

上述OR子句被当作一个整体,优化器无法使用t1.num或者t2.cnt上的索引,导致t1t2进行全表扫描。实际上,OR子句在逻辑上可以转化为UNION ALL,包含两个或多个分支的查询形式。例如,上述示例可以改写为:

EXPLAIN ANALYZE
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE t1.num = 1
UNION ALL
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE t2.cnt = 2 AND (t1.num != 1 OR (t1.num = 1) IS NULL);
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=85.48..412.26 rows=110 width=51) (actual time=0.350..4.832 rows=110 loops=1)
   ->  Hash Join  (cost=85.48..297.98 rows=100 width=51) (actual time=0.349..4.653 rows=100 loops=1)
         Hash Cond: (t2.id = t1.id)
         ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=26) (actual time=0.009..1.719 rows=10000 loops=1)
         ->  Hash  (cost=84.23..84.23 rows=100 width=25) (actual time=0.318..0.320 rows=100 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 14kB
               ->  Bitmap Heap Scan on t1  (cost=5.06..84.23 rows=100 width=25) (actual time=0.065..0.265 rows=100 loops=1)
                     Recheck Cond: (num = 1)
                     Heap Blocks: exact=73
                     ->  Bitmap Index Scan on t1_num_idx  (cost=0.00..5.04 rows=100 width=0) (actual time=0.037..0.037 rows=100 loops=1)
                           Index Cond: (num = 1)
   ->  Nested Loop  (cost=4.65..112.63 rows=10 width=51) (actual time=0.049..0.159 rows=10 loops=1)
         ->  Bitmap Heap Scan on t2 t2_1  (cost=4.36..33.46 rows=10 width=26) (actual time=0.026..0.045 rows=10 loops=1)
               Recheck Cond: (cnt = 2)
               Heap Blocks: exact=10
               ->  Bitmap Index Scan on t2_cnt_idx  (cost=0.00..4.36 rows=10 width=0) (actual time=0.017..0.018 rows=10 loops=1)
                     Index Cond: (cnt = 2)
         ->  Index Scan using t1_id_idx on t1 t1_1  (cost=0.29..7.91 rows=1 width=25) (actual time=0.009..0.009 rows=1 loops=10)
               Index Cond: (id = t2_1.id)
               Filter: ((num <> 1) OR ((num = 1) IS NULL))
 Planning Time: 1.150 ms
 Execution Time: 5.014 ms
(22 rows)

改写后优化器利用t1.numt2.cnt上的索引,减少中间处理的数据量,从而提高执行性能。

PolarDB PostgreSQL在生成计划期间,能够尝试将适当的OR子句转换为UNION ALL形式,随后基于代价进行路径选择,从而最终获得更优的执行计划。

前提条件

支持的PolarDB PostgreSQL的版本如下:

PostgreSQL 14(内核小版本14.13.27.0及以上)

说明

您可通过如下语句查看PolarDB PostgreSQL的内核小版本号:

SELECT version();

如需升级内核小版本,请参考升级版本

使用说明

OR子句转换为UNION ALL的能力受参数控制,相关的参数名称及其作用如下所示:

参数名称

描述

参数名称

描述

polar_cbqt_cost_threshold

用于控制对SQL尝试进行OR转换的执行计划总代价阈值,如果 SQL 原始的执行计划总代价没有超过阈值,则不会进行OR转换。取值范围:[0, +∞),默认值为50000。

设置参数为0时,表示对任何SQL都尝试进行OR转换。不建议设置为0,可能会让所有SQL计划时间变长,影响性能。

polar_cbqt_convert_or_to_union_all_mode

用于控制OR子句转换UNION ALL功能的开关,取值如下:

  • OFF(默认):关闭OR子句转UNION ALL功能。

  • ON:开启OR子句转UNION ALL功能。

  • FORCE:开启OR子句转UNION ALL功能,如果有转换后的路径,则强制选择转换后的路径(可能并不是总代价最低的路径)。

注意事项

  • 设置polar_cbqt_cost_threshold0,或者设置polar_cbqt_convert_or_to_union_all_modeFORCE,将会强制SQL运用优化。对于指定SQL强制OR子句转UNION ALL优化,建议使用HINT来设置,详情请参考使用HINT

  • 如果OR子句中的条件仅涉及一张表,优化器将不会尝试对OR子句进行转换。

  • 在强制模式下,如果SQL中存在多个合适的OR子句,将选择总代价最小的路径。

  • 如果一个OR子句的参数超过6个,优化器将不会对该OR子句进行转换。

  • 如果一个SQLOR子句数量超过6个,将只对前6OR子句进行转换选择。

示例

数据准备

CREATE TABLE t1(id int, num int, dsc text, log_date text);
CREATE TABLE t2(id int, cnt int, change text, op_date text);

INSERT INTO t1 SELECT i, i%100, 'test'||1, to_char('1990-10-10'::date + i, 'YYYY-MM-DD') FROM generate_series(1,10000)i;
INSERT INTO t2 SELECT i, i%1000, 'now'||i, to_char('1990-10-10'::date + i, 'YYYY-MM-DD') FROM generate_series(1,10000)i;

CREATE INDEX ON t1(id);
CREATE INDEX ON t1(num);
CREATE INDEX ON t2(id);
CREATE INDEX ON t2(cnt);

ANALYZE t1;
ANALYZE t2;

基本功能

  • 关闭OR子句转换为UNION ALL功能

    SET polar_cbqt_cost_threshold to 100;
    SET polar_cbqt_convert_or_to_union_all_mode to off;
    
    EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE (t1.num = 1 OR t2.cnt = 2);

    返回结果如下:

                                                        QUERY PLAN
    -------------------------------------------------------------------------------------------------------------------
     Hash Join  (cost=299.00..660.50 rows=110 width=51) (actual time=5.992..15.673 rows=110 loops=1)
       Hash Cond: (t1.id = t2.id)
       Join Filter: ((t1.num = 1) OR (t2.cnt = 2))
       Rows Removed by Join Filter: 9890
       ->  Seq Scan on t1  (cost=0.00..174.00 rows=10000 width=25) (actual time=0.012..2.080 rows=10000 loops=1)
       ->  Hash  (cost=174.00..174.00 rows=10000 width=26) (actual time=5.855..5.857 rows=10000 loops=1)
             Buckets: 16384  Batches: 1  Memory Usage: 704kB
             ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=26) (actual time=0.007..1.779 rows=10000 loops=1)
     Planning Time: 1.237 ms
     Execution Time: 15.836 ms
    (10 rows)
  • 开启OR子句转换为UNION ALL功能

    SET polar_cbqt_cost_threshold to 100;
    SET polar_cbqt_convert_or_to_union_all_mode to on;
    
    EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE (t1.num = 1 OR t2.cnt = 2);

    返回结果如下:

                                                                   QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------------------------------
     Append  (cost=85.48..411.16 rows=110 width=51) (actual time=0.396..4.822 rows=110 loops=1)
       ->  Hash Join  (cost=85.48..297.98 rows=100 width=51) (actual time=0.395..4.639 rows=100 loops=1)
             Hash Cond: (t2.id = t1.id)
             ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=26) (actual time=0.010..1.750 rows=10000 loops=1)
             ->  Hash  (cost=84.23..84.23 rows=100 width=25) (actual time=0.333..0.335 rows=100 loops=1)
                   Buckets: 1024  Batches: 1  Memory Usage: 14kB
                   ->  Bitmap Heap Scan on t1  (cost=5.06..84.23 rows=100 width=25) (actual time=0.056..0.247 rows=100 loops=1)
                         Recheck Cond: (num = 1)
                         Heap Blocks: exact=73
                         ->  Bitmap Index Scan on t1_num_idx  (cost=0.00..5.04 rows=100 width=0) (actual time=0.028..0.028 rows=100 loops=1)
                               Index Cond: (num = 1)
       ->  Nested Loop  (cost=4.65..112.63 rows=10 width=51) (actual time=0.049..0.164 rows=10 loops=1)
             ->  Bitmap Heap Scan on t2  (cost=4.36..33.46 rows=10 width=26) (actual time=0.027..0.044 rows=10 loops=1)
                   Recheck Cond: (cnt = 2)
                   Heap Blocks: exact=10
                   ->  Bitmap Index Scan on t2_cnt_idx  (cost=0.00..4.36 rows=10 width=0) (actual time=0.019..0.019 rows=10 loops=1)
                         Index Cond: (cnt = 2)
             ->  Index Scan using t1_id_idx on t1  (cost=0.29..7.91 rows=1 width=25) (actual time=0.010..0.010 rows=1 loops=10)
                   Index Cond: (id = t2.id)
                   Filter: ((num <> 1) OR ((num = 1) IS NULL))
     Planning Time: 2.903 ms
     Execution Time: 4.980 ms
    (22 rows)

综上,开启OR子句转UNION ALL功能后,执行计划中可以利用t1.numt2.cnt上的索引,达到和手动改写成UNION ALL相同的效果。

强制选择

  • 开启OR子句转换为UNION ALL功能。

    SET polar_cbqt_cost_threshold to 100;
    SET polar_cbqt_convert_or_to_union_all_mode to on;
    
    EXPLAIN ANALYZE SELECT * FROM t1, t2 WHERE t1.dsc= t2.change AND (t1.log_date = '2024-01-01' OR t2.op_date = '2024-01-01');

    返回结果如下:

                                                        QUERY PLAN
    -------------------------------------------------------------------------------------------------------------------
     Hash Join  (cost=299.00..660.50 rows=2 width=51) (actual time=14.321..14.325 rows=0 loops=1)
       Hash Cond: (t1.dsc = t2.change)
       Join Filter: ((t1.log_date = '2024-01-01'::text) OR (t2.op_date = '2024-01-01'::text))
       ->  Seq Scan on t1  (cost=0.00..174.00 rows=10000 width=25) (actual time=0.016..3.204 rows=10000 loops=1)
       ->  Hash  (cost=174.00..174.00 rows=10000 width=26) (actual time=6.506..6.508 rows=10000 loops=1)
             Buckets: 16384  Batches: 1  Memory Usage: 704kB
             ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=26) (actual time=0.006..1.755 rows=10000 loops=1)
     Planning Time: 0.932 ms
     Execution Time: 14.571 ms
    (9 rows)
  • 强制选择OR子句转换为UNION ALL功能。

    SET polar_cbqt_cost_threshold to 100;
    SET polar_cbqt_convert_or_to_union_all_mode to force;
    
    EXPLAIN ANALYZE SELECT * FROM t1, t2 WHERE t1.dsc= t2.change AND (t1.log_date = '2024-01-01' OR t2.op_date = '2024-01-01');

    返回结果如下:

                                                       QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------
     Append  (cost=199.01..871.05 rows=2 width=51) (actual time=9.915..9.923 rows=0 loops=1)
       ->  Hash Join  (cost=199.01..410.52 rows=1 width=51) (actual time=5.046..5.050 rows=0 loops=1)
             Hash Cond: (t2.change = t1.dsc)
             ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=26) (actual time=0.015..0.015 rows=1 loops=1)
             ->  Hash  (cost=199.00..199.00 rows=1 width=25) (actual time=5.014..5.016 rows=0 loops=1)
                   Buckets: 1024  Batches: 1  Memory Usage: 8kB
                   ->  Seq Scan on t1  (cost=0.00..199.00 rows=1 width=25) (actual time=5.013..5.013 rows=0 loops=1)
                         Filter: (log_date = '2024-01-01'::text)
                         Rows Removed by Filter: 10000
       ->  Hash Join  (cost=199.01..460.52 rows=1 width=51) (actual time=4.865..4.867 rows=0 loops=1)
             Hash Cond: (t1.dsc = t2.change)
             ->  Seq Scan on t1  (cost=0.00..224.00 rows=9999 width=25) (actual time=0.015..0.016 rows=1 loops=1)
                   Filter: ((log_date <> '2024-01-01'::text) OR ((log_date = '2024-01-01'::text) IS NULL))
             ->  Hash  (cost=199.00..199.00 rows=1 width=26) (actual time=4.828..4.829 rows=0 loops=1)
                   Buckets: 1024  Batches: 1  Memory Usage: 8kB
                   ->  Seq Scan on t2  (cost=0.00..199.00 rows=1 width=26) (actual time=4.827..4.827 rows=0 loops=1)
                         Filter: (op_date = '2024-01-01'::text)
                         Rows Removed by Filter: 10000
     Planning Time: 0.777 ms
     Execution Time: 10.088 ms
    (20 rows)

综上,对比转换前后的执行计划总代价,可发现不进行OR子句转换的总代价更低。在非强制模式下,将按执行计划总代价阈值设置进行OR子句转换尝试,优化器不一定会选择经过OR子句转换后的路径。然而,在强制模式下,优化器则会强制选择经过OR子句转换后的路径。强制模式适用于优化器无法准确估算代价,且需要固定查询选择OR转换为UNION ALL的场景。

使用HINT

对于指定SQL强制OR子句转UNION ALL优化,也可以使用HINT功能进行SQL级别的功能控制。

SET polar_cbqt_convert_or_to_union_all_mode to off;

EXPLAIN ANALYZE /*+ Set(polar_cbqt_convert_or_to_union_all_mode force) Set(polar_cbqt_cost_threshold 0) */ SELECT * FROM t1, t2 WHERE t1.dsc= t2.change and (t1.log_date = '2024-01-01' or t2.op_date = '2024-01-01');

返回结果如下:

                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Append  (cost=199.01..871.05 rows=2 width=51) (actual time=9.684..9.691 rows=0 loops=1)
   ->  Hash Join  (cost=199.01..410.52 rows=1 width=51) (actual time=4.711..4.714 rows=0 loops=1)
         Hash Cond: (t2.change = t1.dsc)
         ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=26) (actual time=0.013..0.013 rows=1 loops=1)
         ->  Hash  (cost=199.00..199.00 rows=1 width=25) (actual time=4.682..4.684 rows=0 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 8kB
               ->  Seq Scan on t1  (cost=0.00..199.00 rows=1 width=25) (actual time=4.681..4.681 rows=0 loops=1)
                     Filter: (log_date = '2024-01-01'::text)
                     Rows Removed by Filter: 10000
   ->  Hash Join  (cost=199.01..460.52 rows=1 width=51) (actual time=4.969..4.970 rows=0 loops=1)
         Hash Cond: (t1.dsc = t2.change)
         ->  Seq Scan on t1  (cost=0.00..224.00 rows=9999 width=25) (actual time=0.018..0.018 rows=1 loops=1)
               Filter: ((log_date <> '2024-01-01'::text) OR ((log_date = '2024-01-01'::text) IS NULL))
         ->  Hash  (cost=199.00..199.00 rows=1 width=26) (actual time=4.935..4.936 rows=0 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 8kB
               ->  Seq Scan on t2  (cost=0.00..199.00 rows=1 width=26) (actual time=4.934..4.934 rows=0 loops=1)
                     Filter: (op_date = '2024-01-01'::text)
                     Rows Removed by Filter: 10000
 Planning Time: 0.798 ms
 Execution Time: 9.858 ms
(20 rows)

转换阈值

基本功能示例中,原始SQL执行计划的总cost值为660.50。因此,我们将转换阈值设置为高于此值,并再次执行相同的SQL查询:

SET polar_cbqt_cost_threshold to 1000; -- 设置阈值
SET polar_cbqt_convert_or_to_union_all_mode to force; -- 强制选择转换后的路径

EXPLAIN ANALYZE SELECT * FROM t1 JOIN t2 ON t1.id = t2.id WHERE (t1.num = 1 OR t2.cnt = 2);

返回结果如下:

                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=299.00..660.50 rows=110 width=51) (actual time=6.374..15.802 rows=110 loops=1)
   Hash Cond: (t1.id = t2.id)
   Join Filter: ((t1.num = 1) OR (t2.cnt = 2))
   Rows Removed by Join Filter: 9890
   ->  Seq Scan on t1  (cost=0.00..174.00 rows=10000 width=25) (actual time=0.011..2.038 rows=10000 loops=1)
   ->  Hash  (cost=174.00..174.00 rows=10000 width=26) (actual time=6.266..6.268 rows=10000 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 704kB
         ->  Seq Scan on t2  (cost=0.00..174.00 rows=10000 width=26) (actual time=0.006..1.778 rows=10000 loops=1)
 Planning Time: 0.663 ms
 Execution Time: 16.036 ms
(10 rows)

可以看出,即使在强制模式下,优化器仍然选择了原始路径。这是由于原始计划的总代价未超过阈值,因此不会尝试进行OR子句的转化。

  • 本页导读 (1)
  • 背景
  • 前提条件
  • 使用说明
  • 注意事项
  • 示例
  • 数据准备
  • 基本功能
  • 强制选择
  • 使用HINT
  • 转换阈值
AI助理

点击开启售前

在线咨询服务

你好,我是AI助理

可以解答问题、推荐解决方案等