How to identify problems
Top down: find the pain points
Walk through the execution plan from top to bottom, find out which operators the time is actually being spent on, and then analyze those specific operators in depth.
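For reference, a minimal sketch of how such a plan is produced; the UPDATE below is a hypothetical placeholder, not the statement behind the example later in this section:

-- EXPLAIN ANALYZE actually executes the statement and reports each operator's
-- actual time and row counts, so wrap DML in a transaction you can roll back.
BEGIN;
EXPLAIN ANALYZE
UPDATE t2 SET c3 = '0' WHERE c2 = '2019-05-17'::date;
ROLLBACK;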
Check costs, compare row counts
Look for anomalies in the cost estimates (values that are unusually small or unusually large), and compare the estimated row counts against the actual row counts to locate problems in cost and cardinality estimation.
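For instance, in the plan line below (taken from the example later in this section), rows=1 is the optimizer's estimate while rows=34583155 is what actually came back; a gap this large is a strong hint that statistics are missing or stale:

Broadcast Motion 96:96 (slice1; segments: 96) (cost=0.00..431.00 rows=1 width=762) (actual time=25.562..912862.203 rows=34583155 loops=1)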
Expensive operators: avoid them where possible
Analytical (AP) workloads rarely need NestLoop or Sort + GroupByAgg; be cautious whenever they show up in a plan.
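If the planner keeps choosing these operators, session-level planner switches can be used as an experiment. enable_nestloop is a standard PostgreSQL setting; enable_groupagg is assumed here to be the Greenplum-specific switch for sort-based grouping (verify against your version, and note that these settings do not necessarily apply when the ORCA optimizer is in use):

-- Discourage nested-loop joins for the current session.
SET enable_nestloop = off;
-- Prefer HashAggregate over Sort + GroupAggregate (Greenplum-specific; assumption).
SET enable_groupagg = off;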
Specific operators: are they reasonable?
Motion: Is there any unnecessary Motion? Can the distribution key be improved? Can a replicated table be used? (See the sketch after this list.)
Join: Is the inner/outer table order reasonable?
Scan: Can an index be used? Can a partitioned table be used?
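For the Motion checks above, a sketch of how distribution choices are expressed in Greenplum-style DDL; the table definitions are illustrative only, and DISTRIBUTED REPLICATED assumes Greenplum 6 or later:

-- Distribute both sides of a frequent join on the join key so the rows are
-- already co-located and no Redistribute Motion is needed.
CREATE TABLE t1_demo (c1 varchar(32), c2 date, c3 varchar(8)) DISTRIBUTED BY (c1);
-- Keep a full copy of a small dimension table on every segment so it never
-- has to be broadcast at query time.
CREATE TABLE dim_demo (id int, name text) DISTRIBUTED REPLICATED;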
Memory information: tune parameters
Check whether operators spill to disk, and adjust the statement_mem parameter appropriately based on that analysis.
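When the plan shows workfile spilling, a sketch of the memory-related adjustment; statement_mem is a Greenplum parameter and must stay within the limits set by max_statement_mem and gp_vmem_protect_limit, and the gp_toolkit view named below is assumed to be available in your version:

-- Give the statement more working memory so hash and sort operators spill less.
SET statement_mem = '2GB';
-- Inspect current workfile (spill) usage per query (Greenplum gp_toolkit view).
SELECT * FROM gp_toolkit.gp_workfile_usage_per_query;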
Worked example
Update (cost=0.00..1274.11 rows=1 width=1) (actual time=995096.707..1517097.191 rows=245136 loops=1)
  Executor Memory: 1kB Segments: 96 Max: 1kB (segment 0)
  -> Partition Selector for t2 (cost=0.00..1274.04 rows=1 width=842) (actual time=995096.480..1514408.806 rows=245136 loops=1)
       -> Redistribute Motion 96:96 (slice2; segments: 96) (cost=0.00..1274.04 rows=1 width=838) (actual time=995096.440..1513830.155 rows=245136 loops=1)
            Hash Key: t2.c1, t2.c2
            -> Split (cost=0.00..1274.04 rows=1 width=838) (actual time=995080.103..1496878.037 rows=245136 loops=1)
                 Executor Memory: 1kB Segments: 96 Max: 1kB (segment 0)
                 -> Hash Join (cost=0.00..1274.04 rows=1 width=1484) (actual time=995080.071..1496625.817 rows=122568 loops=1)
                      Hash Cond: ((t1.c1)::text = (t2.c1)::text)
                      Executor Memory: 33535270kB Segments: 96 Max: 349326kB (segment 33)
                      work_mem: 33535270kB Segments: 96 Max: 349326kB (segment 33) Workfile: (96 spilling)
                      Work_mem wanted: 26684983K bytes avg, 26684983K bytes max (seg0) to lessen workfile I/O affecting 96 workers.
                      -> Seq Scan on t1 (cost=0.00..672.28 rows=121412 width=736) (actual time=672.771..1039.167 rows=122568 loops=1)
                           Filter: ((t1.c2 = '2019-05-17'::date) AND ((t1.c3)::text = '0'::text))
                      -> Hash (cost=431.00..431.00 rows=1 width=762) (actual time=994417.443..994417.443 rows=34583155 loops=1)
                           -> Broadcast Motion 96:96 (slice1; segments: 96) (cost=0.00..431.00 rows=1 width=762) (actual time=25.562..912862.203 rows=34583155 loops=1)
                                -> Sequence (cost=0.00..431.00 rows=1 width=762) (actual time=34.475..4822.173 rows=361460 loops=1)
                                     -> Partition Selector for t2 (dynamic scan id: 1) (cost=10.00..100.00 rows=2 width=4) (never executed)
                                          Partitions selected: 27 (out of 27)
                                     -> Dynamic Seq Scan on t2 (dynamic scan id: 1) (cost=0.00..431.00 rows=1 width=762) (actual time=34.441..4680.938 rows=361460 loops=1)
                                          Partitions scanned: Avg 27.0 (out of 27) x 96 workers. Max 27 parts (seg0).
The plan above is the output of EXPLAIN ANALYZE for a SQL statement. You can locate the problem by following the guidance given earlier:
First, walk through the plan from top to bottom looking for time-consuming operators; checking each operator's actual time shows that the Hash Join accounts for most of the execution time.
Working downward from the Hash Join, its inner side spills to disk, as shown by "work_mem: 33535270kB Segments: 96 Max: 349326kB (segment 33) Workfile: (96 spilling) Work_mem wanted: 26684983K bytes avg, 26684983K bytes max (seg0) to lessen workfile I/O affecting 96 workers.".
The inner side of the Hash Join is also broadcast during the build phase, as shown by "Broadcast Motion 96:96 (slice1; segments: 96) (cost=0.00..431.00 rows=1 width=762) (actual time=25.562..912862.203 rows=34583155 loops=1)". Here the optimizer estimated only 1 row for table t2, which is far from the 34583155 rows actually returned. The diagnosis is therefore that statistics for t2 had not been collected in time, so the optimizer treated t2 as a small table: it broadcast t2 to every node and used it as the inner side to build the hash table. Because t2 is in fact large, the hash build spilled to disk, and the whole SQL statement took a very long time to run.
The fix for this problem is to re-collect statistics for t2: ANALYZE t2;
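Concretely, the fix plus a quick verification, assuming a PostgreSQL-compatible catalog as in Greenplum-based systems:

-- Re-collect statistics for t2 so the optimizer sees its real size.
ANALYZE t2;
-- Optionally confirm that statistics have now been gathered.
SELECT relname, last_analyze FROM pg_stat_all_tables WHERE relname = 't2';
-- Then rerun EXPLAIN ANALYZE on the original UPDATE: t2 should no longer be
-- broadcast, its estimated rows should be close to the actual rows, and the
-- Hash Join should no longer spill to workfiles.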