当MySQL优化器处理包含OR/IN的复杂查询(特别是多表JOIN)时,可能无法有效利用索引,导致全表扫描,影响查询性能。PolarDB MySQL版的查询改写优化功能,通过将符合条件的OR/IN表达式改写为UNION ALL结构,并基于代价选择最优执行路径,从而让查询能够充分利用索引,显著提升执行性能。
工作原理
在MySQL中,优化器对OR子句的处理能力有限。特别是当OR条件涉及多个表时,优化器往往只能将其作为连接后的过滤条件,无法有效利用单个条件上的索引,最终可能退化为全表扫描,导致查询性能急剧下降。
例如,对于以下查询,优化器无法使用t1.b或t3.c1列上的索引,只能执行全表扫描和哈希连接,效率很低。
-- 优化前,执行计划为全表扫描,耗时较长
EXPLAIN ANALYZE SELECT * FROM t1,t3 WHERE t3.c1 > 98 OR t1.b <= 0;
-> Filter: ((t3.c1 > 98) or (t1.b <= 0)) ... (actual time=115.259..5416.434 ...)
-> Inner hash join ...
-> Table scan on t3 ...
-> Hash
-> Table scan on t1 ...从逻辑上讲,这个OR查询等价于将两个独立查询的结果合并,即UNION ALL。如果手动改写,查询可以利用各自的索引,性能会大幅提升。
-- 手动改写为 UNION ALL,执行计划可以使用索引,耗时显著缩短
EXPLAIN ANALYZE
SELECT * FROM t1 ,t3 WHERE t1.b <= 0
UNION ALL
SELECT * FROM t1,t3 WHERE t3.c1 > 98 AND (t1.b > 0 OR (t1.b <= 0) IS NULL);
-> Append (actual time=58.272..302.546 ...)
...
-> Index range scan on t3 using idx_c1 ...PolarDB的OR/IN转UNION ALL功能正是将上述手动优化过程自动化。优化器会在计划生成阶段评估OR改写为UNION ALL的潜在收益,并与原始执行计划进行代价对比,最终选择成本更低的方案来执行,从而在不修改SQL的情况下实现查询加速。
适用范围
产品系列:集群版、标准版。
内核版本:MySQL 8.0.2,且修订版本需为8.0.2.2.32及以上版本。
当前功能处于灰度阶段,在只读(RO)节点上默认开启,读写(RW)节点上需要额外设置。如您有相关需求,请提交工单联系我们处理。
开启与配置查询改写优化
通过设置相关参数来控制此优化功能的行为。
PolarDB集群参数在控制台与会话中修改方式存在差异,详细区别如下:
在PolarDB控制台上修改:
兼容性说明:部分集群参数在PolarDB控制台上均已添加MySQL配置文件的兼容性前缀loose_。
操作方法:找到并修改这些带
loose_前缀的参数。
在数据库会话中修改(使用命令行或客户端):
操作方法:当您连接到数据库,使用
SET命令修改参数时,请去掉loose_前缀,直接使用参数的原始名称进行修改。
参数名称 | 级别 | 描述与建议 |
| Global/Session | 控制该功能的主开关。
|
| Global/Session | 控制优化的触发阈值,只有当原始查询的估算代价(可通过 取值范围:0~18446744073709551615。 默认值:100000。 说明 建议保持默认值。若设置为 |
使用限制
该功能仅在满足以下所有条件时触发:
通用限制:
OR子句或IN-LIST中的参数个数不能超过10个。查询块(Query Block)中不能包含子查询、
GROUP BY、窗口函数、DISTINCT或聚合函数。
通用
UNION ALL转换(主要针对多表JOIN):OR子句:OR条件必须涉及2张及以上的表。OR子句均采用field=const模式,或均能有效利用索引。field=const模式:field是指表中的某个列,const是指一个常量值。有效利用索引:例如,
t1.f1=t2.f2中f1是t1某个索引前缀且f2是t2某个索引前缀。
IN-LIST:无需转换为UNION ALL,因为range访问方式更优。
Top-K转换(主要针对单表ORDER BY...LIMIT):OR子句:OR条件必须作用于同一列,且该列与ORDER BY排序列必须是同一个索引的前缀。例如,索引为(c2, c3),查询为WHERE c2=... OR c2=... ORDER BY c3。IN-LIST:IN-LIST的左表达式列与ORDER BY排序列必须是同一个索引的前缀。
示例:验证优化效果
数据准备
-- 创建并填充表t1
CREATE TABLE `t1` (
`a` int(11) DEFAULT NULL,
`b` int(11) DEFAULT NULL,
KEY `idx_a` (`a`)
) ENGINE=InnoDB;
-- 插入数据
INSERT INTO `t1` VALUES (1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9),(10,10);
-- 重复执行以增加数据量
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
INSERT INTO t1 SELECT * FROM t1;
-- 创建并填充表t3
CREATE TABLE `t3` (
`c1` int(11) NOT NULL,
`c2` int(11) DEFAULT NULL,
`c3` int(11) DEFAULT NULL,
`c4` int(11) DEFAULT NULL,
KEY `idx_c1`(`c1`),
KEY `idx_c2_c3` (`c2`,`c3`)
) ENGINE=InnoDB;
-- 插入大量数据
INSERT INTO `t3` VALUES (1,0,1,0),(2,0,2,0),(3,0,3,0),(4,0,4,0),(5,0,5,0),(6,0,6,0),(7,0,7,0),(8,0,8,0),(9,0,9,0),(10,0,10,0),(11,0,11,0),(12,0,12,0),(13,0,13,0),(14,0,14,0),(15,0,15,0),(16,0,16,0),(17,0,17,0),(18,0,18,0),(19,0,19,0),(20,0,20,0),(21,0,21,0),(22,0,22,0),(23,0,23,0),(24,0,24,0),(25,1,25,0),(26,1,26,0),(27,1,27,0),(28,1,28,0),(29,1,29,0),(30,1,30,0),(31,1,31,0),(32,1,32,0),(33,1,33,0),(34,1,34,0),(35,1,35,0),(36,1,36,0),(37,1,37,0),(38,1,38,0),(39,1,39,0),(40,1,40,0),(41,1,41,0),(42,1,42,0),(43,1,43,0),(44,1,44,0),(45,1,45,0),(46,1,46,0),(47,1,47,0),(48,1,48,0),(49,1,49,0),(50,1,50,1),(51,1,51,1),(52,1,52,1),(53,1,53,1),(54,1,54,1),(55,1,55,1),(56,1,56,1),(57,1,57,1),(58,1,58,1),(59,1,59,1),(60,1,60,1),(61,1,61,1),(62,1,62,1),(63,1,63,1),(64,1,64,1),(65,1,65,1),(66,1,66,1),(67,1,67,1),(68,1,68,1),(69,1,69,1),(70,1,70,1),(71,1,71,1),(72,1,72,1),(73,1,73,1),(74,1,74,1),(75,2,75,1),(76,2,76,1),(77,2,77,1),(78,2,78,1),(79,2,79,1),(80,2,80,1),(81,2,81,1),(82,2,82,1),(83,2,83,1),(84,2,84,1),(85,2,85,1),(86,2,86,1),(87,2,87,1),(88,2,88,1),(89,2,89,1),(90,2,90,1),(91,2,91,1),(92,2,92,1),(93,2,93,1),(94,2,94,1),(95,2,95,1),(96,2,96,1),(97,2,97,1),(98,2,98,1),(99,2,99,1),(100,2,100,1);
-- 重复执行以增加数据量
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;
INSERT INTO t3 SELECT * FROM t3;
-- 分析表
ANALYZE TABLE t1, t3;场景一:优化多表JOIN查询
此场景展示了OR条件跨越两张表时,优化器如何通过改写来利用索引。
关闭优化功能,观察原始执行计划。
-- 关闭优化功能 SET polar_optimizer_switch='or_expansion=off'; -- 分析语句 DESC SELECT * FROM t1,t3 WHERE t3.c1 >98 OR t1.a<5;结果分析:执行计划显示为
Hash Join,对t1和t3进行了全表扫描。优化器未能利用t1.a和t3.c1上的索引。+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+ | 1 | SIMPLE | t1 | NULL | ALL | idx_a | NULL | NULL | NULL | 1280 | 100.00 | NULL | | 1 | SIMPLE | t3 | NULL | ALL | idx_c1 | NULL | NULL | NULL | 6591 | 100.00 | Using where; Using join buffer (hash join) | +----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------------------+开启优化功能,查看改写后的执行计划。
-- 开启优化功能 SET polar_optimizer_switch='or_expansion=on'; -- 将阈值调低以便触发优化 SET cbqt_cost_threshold=1; -- 分析语句 DESC SELECT * FROM t1,t3 WHERE t3.c1 >98 OR t1.a<5;结果分析:执行计划已调整为
UNION ALL,这使得能够利用t1.a和t3.c1上的索引,从而实现与手动改写为UNION ALL相同的效果。+----+-------------+-------+------------+-------+---------------+--------+---------+------+------+----------+--------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------+---------------+--------+---------+------+------+----------+--------------------------------------------+ | 1 | PRIMARY | t1 | NULL | range | idx_a | idx_a | 5 | NULL | 256 | 100.00 | Using index condition; Using MRR | | 1 | PRIMARY | t3 | NULL | ALL | NULL | NULL | NULL | NULL | 6400 | 100.00 | Using join buffer (hash join) | | 2 | UNION | t3 | NULL | range | idx_c1 | idx_c1 | 4 | NULL | 128 | 100.00 | Using index condition; Using MRR | | 2 | UNION | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 640 | 66.67 | Using where; Using join buffer (hash join) | +----+-------------+-------+------------+-------+---------------+--------+---------+------+------+----------+--------------------------------------------+
场景二:优化Top-K查询(OR子句)
此场景展示了对单表ORDER BY ... LIMIT查询,优化器如何将OR条件改写为UNION ALL并下推LIMIT,从而避免大范围排序。
关闭优化功能,观察原始执行计划。
-- 关闭优化功能 SET polar_optimizer_switch='or_expansion=off'; -- 分析语句 DESC ANALYZE SELECT c2 FROM t3 WHERE (c2 = 2 OR c2= 0 ) ORDER BY t3.c3 DESC LIMIT 5;结果分析:执行计划通过
Index range scan获取所有满足c2=2或c2=0的行后进行Sort操作,耗时约200毫秒。+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Limit: 5 row(s) (actual time=193.389..193.393 rows=5 loops=1) -> Sort: t3.c3 DESC, limit input to 5 row(s) per chunk (cost=641.82 rows=3200) (actual time=193.386..193.388 rows=5 loops=1) -> Index range scan on t3 using idx_c2_c3, with index condition: ((t3.c2 = 2) or (t3.c2 = 0)) (actual time=0.348..187.455 rows=3200 loops=1) | +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.20 sec)开启优化功能,查看改写后的执行计划。
-- 关闭优化功能 SET polar_optimizer_switch='or_expansion=on'; -- 将阈值调低以便触发优化 SET cbqt_cost_threshold=1; -- 分析语句 DESC ANALYZE SELECT c2 FROM t3 WHERE (c2 = 2 OR c2= 0 ) ORDER BY t3.c3 DESC LIMIT 5;结果分析:执行计划变为
UNION ALL,并对每个分支(c2=2和c2=0)分别执行Index lookup并应用LIMIT 5。优化器合并两个已排序的5行结果集,无需全局排序,执行耗时降至约1毫秒。+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Limit: 5 row(s) (actual time=1.249..1.254 rows=5 loops=1) -> Sort: derived_1_2.Name_exp_1 DESC, limit input to 5 row(s) per chunk (actual time=0.104..0.106 rows=5 loops=1) -> Table scan on derived_1_2 (actual time=0.006..0.013 rows=10 loops=1) -> Union materialize (actual time=1.246..1.249 rows=5 loops=1) -> Limit: 5 row(s) (actual time=0.336..0.571 rows=5 loops=1) -> Index lookup on t3 using idx_c2_c3 (c2=2; iterate backwards) (cost=0.00 rows=5) (actual time=0.333..0.566 rows=5 loops=1) -> Limit: 5 row(s) (actual time=0.215..0.431 rows=5 loops=1) -> Index lookup on t3 using idx_c2_c3 (c2=0; iterate backwards) (cost=0.00 rows=5) (actual time=0.214..0.427 rows=5 loops=1) | +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.01 sec)
场景三:优化Top-K查询(IN-LIST)
IN-LIST在逻辑上等同于OR,因此也支持Top-K优化。
关闭优化功能,观察原始执行计划。
-- 关闭优化功能 SET polar_optimizer_switch='or_expansion=off'; -- 分析语句 DESC ANALYZE SELECT c2 FROM t3 WHERE c2 IN (2, 0) ORDER BY t3.c3 DESC LIMIT 5;结果分析:执行计划通过
Index range scan获取所有满足t3.c2 in (2,0)的行后进行Sort操作,耗时约200毫秒。+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Limit: 5 row(s) (actual time=197.497..197.501 rows=5 loops=1) -> Sort: t3.c3 DESC, limit input to 5 row(s) per chunk (cost=641.82 rows=3200) (actual time=197.494..197.496 rows=5 loops=1) -> Index range scan on t3 using idx_c2_c3, with index condition: (t3.c2 in (2,0)) (actual time=0.319..191.560 rows=3200 loops=1) | +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.20 sec)开启优化功能,查看改写后的执行计划。
-- 关闭优化功能 SET polar_optimizer_switch='or_expansion=on'; -- 将阈值调低以便触发优化 SET cbqt_cost_threshold=1; -- 分析语句 DESC ANALYZE SELECT c2 FROM t3 WHERE c2 IN (2, 0) ORDER BY t3.c3 DESC LIMIT 5;结果分析:执行计划变为
UNION ALL,并对每个分支(c2=2和c2=0)分别执行Index lookup并应用LIMIT 5。优化器合并两个已排序的5行结果集,无需全局排序。+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Limit: 5 row(s) (actual time=1.256..1.260 rows=5 loops=1) -> Sort: derived_1_2.Name_exp_1 DESC, limit input to 5 row(s) per chunk (actual time=0.090..0.093 rows=5 loops=1) -> Table scan on derived_1_2 (actual time=0.005..0.012 rows=10 loops=1) -> Union materialize (actual time=1.252..1.255 rows=5 loops=1) -> Limit: 5 row(s) (actual time=0.259..0.545 rows=5 loops=1) -> Index lookup on t3 using idx_c2_c3 (c2=2; iterate backwards) (cost=0.00 rows=5) (actual time=0.256..0.540 rows=5 loops=1) -> Limit: 5 row(s) (actual time=0.237..0.455 rows=5 loops=1) -> Index lookup on t3 using idx_c2_c3 (c2=0; iterate backwards) (cost=0.00 rows=5) (actual time=0.236..0.451 rows=5 loops=1) +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec)
使用HINT进行手动干预
在特定场景下,可使用HINT控制单个查询是否启用此优化。
NO_OR_EXPAND(@QB_NAME):强制对指定查询块禁用OR展开优化。DESC SELECT /*+NO_OR_EXPAND(@subq1) */ * FROM t1 WHERE EXISTS (SELECT /*+ QB_NAME(subq1) */ 1 FROM t3 WHERE (t1.a = 1 OR t1.b = 2) AND t3.c1 < 5 AND t1.b = t3.c1); +----+-------------+-------+------------+------+---------------+--------+---------+------------+------+----------+-----------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+------+---------------+--------+---------+------------+------+----------+-----------------------------+ | 1 | SIMPLE | t1 | NULL | ALL | idx_a | NULL | NULL | NULL | 640 | 19.00 | Using where | | 1 | SIMPLE | t3 | NULL | ref | idx_c1 | idx_c1 | 4 | test2.t1.b | 64 | 100.00 | Using index; FirstMatch(t1) | +----+-------------+-------+------------+------+---------------+--------+---------+------------+------+----------+-----------------------------+如果
WHERE子句中存在多个OR表达式,可以使用OR_EXPAND(@QB_NAME idx)强制将特定表达式转换为UNION ALL。其中,idx表示该表达式在WHERE子句中所处的位置,索引从0开始。示例中,将表达式(t3.c2 = 1 OR t1.b = 2)展开为UNION ALL形式。DESC format=tree SELECT /*+OR_EXPAND(@subq1 3) */ * FROM t1 WHERE EXISTS (SELECT /*+ QB_NAME(subq1) */ 1 FROM t3 WHERE (t3.c2 = 999 OR t1.b = 999) AND t3.c1 < 5 AND t1.b = t3.c1 AND (t3.c2= 1 OR t1.b = 2)); +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Filter: exists(select #2) (cost=64.75 rows=640) -> Table scan on t1 (cost=64.75 rows=640) -> Select #2 (subquery in condition; dependent) -> Limit: 1 row(s) -> Append -> Stream results -> Filter: (t3.c2 = 1) (cost=17.45 rows=32) -> Index lookup on t3 using idx_c1 (c1=t1.b), with index condition: ((t1.b = 999) and (t3.c1 < 5)) (cost=17.45 rows=64) -> Stream results -> Filter: (t3.c1 = 2) (cost=0.51 rows=0) -> Index lookup on t3 using idx_c2_c3 (c2=999), with index condition: ((t1.b = 2) and lnnvl((t3.c2 = 1))) (cost=0.51 rows=1)OR_EXPAND(@QB_NAME):强制对指定查询块(qb_name)启用OR展开优化。DESC SELECT /*+OR_EXPAND(@subq1) */ * FROM t1 WHERE EXISTS (SELECT /*+ QB_NAME(subq1) */ 1 FROM t3 WHERE (t3.c1 = 1 OR t1.b = 2) AND t3.c1 < 5 AND t1.b = t3.c1); +----+--------------------+-------+------------+------+---------------+--------+---------+-------+------+----------+------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------------+-------+------------+------+---------------+--------+---------+-------+------+----------+------------------------------------+ | 1 | PRIMARY | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 640 | 100.00 | Using where | | 2 | DEPENDENT SUBQUERY | t3 | NULL | ref | idx_c1 | idx_c1 | 4 | const | 64 | 100.00 | Using where; Using index | | 3 | DEPENDENT UNION | t3 | NULL | ref | idx_c1 | idx_c1 | 4 | const | 64 | 100.00 | Using index condition; Using index | +----+--------------------+-------+------------+------+---------------+--------+---------+-------+------+----------+------------------------------------+