PolarDB支持条件下推(WHERE
子句下推至IN
子查询)功能,可以优化复杂查询,从而显著提升了查询性能(嵌套子查询效果最好)。
版本限制
适用的数据库引擎版本如下,如何查看内核版本,请参见查询版本号。
MySQL 8.0.1,且小版本需为8.0.1.1.42及以上版本。
MySQL 8.0.2,且小版本需为8.0.2.19及以上版本。
使用场景
在SQL查询语句中,与GROUP BY
结合的子查询通常会被物化,例如子查询是IN
子查询时:(FIELD[1],...) IN (SELECT col[1], ... FROM ... GROUP BY ...)
,并且该子查询是主查询WHERE
条件中AND
子句的一部分,例如:COND(FIELD[1])... AND (FIELD[1],...) IN (SELECT col[1], ... FROM ... GROUP BY ...)
,结果集的每一行都满足FIELD[1] = col[1]
都成立,所以条件COND(FIELD[1])
可以被下推到子查询中。
使用限制
如果子查询(IN SUBQUERY)有
LIMIT
限制,则不支持该条件下推,例如:SELECT * FROM t WHERE a > 1 AND (a, b) IN (SELECT c, MAX(d) FROM t2 GROUP BY c LIMIT 2);
外层
WHERE
条件表达式的列和映射到物化表对应的列,存在以下情况则不支持该条件下推:该列引用了子查询或者是非确定性的(相同的输入条件下,可能会产生不同的结果),例如:
SELECT * FROM t WHERE a > 5*RAND() AND (a, b) IN (SELECT c, MAX(d) FROM t2 GROUP BY c);
该列是存储过程或者存储函数(一般存储函数较为复杂,重复执行有可能产生负收益,因此不考虑这种情况的下推),例如:
CREATE FUNCTION f1() RETURNS INT BEGIN ... END; SELECT * FROM t WHERE a > f1() AND (a, b) IN (SELECT c, MAX(d) FROM t2 GROUP BY c);
使用方法
准备工作
在使用该条件下推之前,需根据您的业务需求调整loose_subquery_cond_pushdown_mode
参数。具体操作,请参见设置集群参数和节点参数。
参数说明如下:
参数名称 | 级别 | 说明 |
loose_subquery_cond_pushdown_mode | Global | 从
|
也可以通过OPTIMIZE HINT(SUBQUERY_CONDITION_PUSHDOWN
或NO_SUBQUERY_CONDITION_PUSHDOWN
)直接控制是否下推到某个子查询(无需配置loose_subquery_cond_pushdown_mode
)。
示例如下:
CREATE TABLE t1 (a INT, b INT, c INT, d INT);
CREATE TABLE t2 (e INT, f INT, g INT);
--该查询语句分别表示当前query block 禁止将条件下推到子查询@suq1中--
SELECT /*+ NO_SUBQUERY_CONDITION_PUSHDOWN(@subq1) */ * FROM t1
WHERE t1.c<25 AND
(t1.a,t1.c) IN (SELECT /*+ QB_NAME(subq1) */ t2.e,MAX(t2.g) FROM t2 WHERE t2.e<5 GROUP BY t2.e);
--该查询语句分别表示当前query block 会将符合规则的条件下推到子查询@suq1中--
SELECT /*+ SUBQUERY_CONDITION_PUSHDOWN(@subq1) */ * FROM t1
WHERE t1.c<25 AND
(t1.a,t1.c) IN (SELECT /*+ QB_NAME(subq1) */ t2.e,MAX(t2.g) FROM t2 WHERE t2.e<5 GROUP BY t2.e);
示例1
开启从WHERE
子句下推至IN
子查询的条件下推功能前后,都执行如下代码,查看执行计划:
EXPLAIN FORMAT=TREE SELECT * FROM t1
WHERE t1.c<25 AND
(t1.a,t1.c) IN (SELECT t2.e,MAX(t2.g) FROM t2 WHERE t2.e<5 GROUP BY t2.e);
返回结果如下:
-- 开启loose_subquery_cond_pushdown_mode前
-> Filter: ((t1.c < 25) and <in_optimizer>((t1.a,t1.c),(t1.a,t1.c) in (select #2))) (cost=*** rows=***)
-> Table scan on t1 (cost=*** rows=***)
-> Select #2 (subquery in condition; run only once)
-> Filter: ((t1.a = `<materialized_subquery>`.e) and (t1.c = `<materialized_subquery>`.`MAX(t2.g)`))
-> Limit: 1 row(s)
-> Index lookup on <materialized_subquery> using <auto_distinct_key> (e=t1.a, MAX(t2.g)=t1.c)
-> Materialize with deduplication
-> Table scan on <temporary>
-> Aggregate using temporary table
-> Filter: (t2.e < 5) (cost=*** rows=***)
-> Table scan on t2 (cost=*** rows=***)
-- 开启loose_subquery_cond_pushdown_mode后
-> Filter: ((t1.c < 25) and <in_optimizer>((t1.a,t1.c),(t1.a,t1.c) in (select #2))) (cost=*** rows=***)
-> Table scan on t1 (cost=*** rows=***)
-> Select #2 (subquery in condition; run only once)
-> Filter: ((t1.a = `<materialized_subquery>`.e) and (t1.c = `<materialized_subquery>`.`MAX(t2.g)`))
-> Limit: 1 row(s)
-> Index lookup on <materialized_subquery> using <auto_distinct_key> (e=t1.a, MAX(t2.g)=t1.c)
-> Materialize with deduplication
-> Filter: (max(t2.g) < 25)
-> Table scan on <temporary>
-> Aggregate using temporary table
-> Filter: (t2.e < 5) (cost=*** rows=***)
-> Table scan on t2 (cost=*** rows=***)
分析SQL语义可以得出,查询结果表t1
的结果集必然满足t1.c
与子查询中的投影列MAX(t2.g)
是等值关系,因此根据主查询WHERE
条件中的t1.c < 25
,子查询也必然满足MAX(t2.g)<25
,通过开启WHERE
条件下推子查询的功能,优化器可以将条件t1.c<25
下推至子查询中,又因为GROUP BY
的列不包含g
列,因此只能下推到子查询的HAVING
子句上。
示例2
开启从WHERE
子句下推至IN
子查询的条件下推功能前后,都执行如下代码,查看执行计划:
EXPLAIN FORMAT=TREE SELECT * FROM t1
WHERE ((t1.a<2 OR t1.a=5) AND t1.b>3) AND
(t1.a,t1.b,t1.c) IN
(
SELECT t2.e,t2.f,MAX(t2.g)
FROM t2
WHERE t2.e<5
GROUP BY t2.e,t2.f
);
返回结果如下:
-- 开启loose_subquery_cond_pushdown_mode前
-> Filter: ((t1.b > 3) and ((t1.a < 2) or (t1.a = 5)) and <in_optimizer>((t1.a,t1.b,t1.c),(t1.a,t1.b,t1.c) in (select #2))) (cost=*** rows=***)
-> Table scan on t1 (cost=*** rows=***)
-> Select #2 (subquery in condition; run only once)
-> Filter: ((t1.a = `<materialized_subquery>`.e) and (t1.b = `<materialized_subquery>`.f) and (t1.c = `<materialized_subquery>`.`MAX(t2.g)`))
-> Limit: 1 row(s)
-> Index lookup on <materialized_subquery> using <auto_distinct_key> (e=t1.a, f=t1.b, MAX(t2.g)=t1.c)
-> Materialize with deduplication
-> Table scan on <temporary>
-> Aggregate using temporary table
-> Filter: (t2.e < 5) (cost=*** rows=***)
-> Table scan on t2 (cost=*** rows=***)
-- 开启loose_subquery_cond_pushdown_mode后
-> Filter: ((t1.b > 3) and ((t1.a < 2) or (t1.a = 5)) and <in_optimizer>((t1.a,t1.b,t1.c),(t1.a,t1.b,t1.c) in (select #2))) (cost=*** rows=***)
-> Table scan on t1 (cost=*** rows=***)
-> Select #2 (subquery in condition; run only once)
-> Filter: ((t1.a = `<materialized_subquery>`.e) and (t1.b = `<materialized_subquery>`.f) and (t1.c = `<materialized_subquery>`.`MAX(t2.g)`))
-> Limit: 1 row(s)
-> Index lookup on <materialized_subquery> using <auto_distinct_key> (e=t1.a, f=t1.b, MAX(t2.g)=t1.c)
-> Materialize with deduplication
-> Table scan on <temporary>
-> Aggregate using temporary table
-> Filter: ((t2.e < 5) and (t2.f > 3) and ((t2.e < 2) or (t2.e = 5))) (cost=*** rows=***)
-> Table scan on t2 (cost=*** rows=***)
SQL查询的主查询中WHERE
条件稍微复杂一些,但通过WHERE
条件下推子查询的原理分析,可以发现主查询表t1
中的列t1.a,t1.b,t1.c
分别等价于子查询中的投影列t2.e,t2.f,MAX(t2.g)
,因此,对于AND
连接的主查询的WHERE
条件((t1.a<2 OR t1.a=5) AND t1.b>3)
,可以推导出子查询中对应列应满足条件((t2.e<2 OR t2.e=5) AND t2.f.b>3)
。开启WHERE
条件下推子查询的功能之后,优化器可以将WHERE
条件((t1.a<2 OR t1.a=5) AND t1.b>3
下推至子查询中,分析子查询GROUP BY
的列包括t2.e,t2.f
,因此下推至子查询的WHERE
子句上。