通过EXPLAIN和EXPLAIN ANALYZE分析执行计划
本文介绍如何使用EXPLAIN
和EXPLAIN ANALYZE
命令来分析查询执行计划。
前提条件
AnalyticDB MySQL版集群需为3.1.3或以上版本。
EXPLAIN
您可以通过EXPLAIN
命令来评估查询语句的执行方式,评估结果仅供参考,并不等于实际的执行结果。
语法
EXPLAIN (format text) <SELECT statement>;
说明如果查询SQL不复杂,您可以在
EXPLAIN
命令中加上(format text)
,来提高返回结果中计划树层次结构的易读性。示例
EXPLAIN (format text) SELECT count(*) FROM nation, region, customer WHERE c_nationkey = n_nationkey AND n_regionkey = r_regionkey AND r_name = 'ASIA';
返回结果如下:
Output[count(*)] │ Outputs: [count:bigint] │ Estimates: {rows: 1 (8B)} │ count(*) := count └─ Aggregate(FINAL) │ Outputs: [count:bigint] │ Estimates: {rows: 1 (8B)} │ count := count(`count_1`) └─ LocalExchange[SINGLE] () │ Outputs: [count_0_1:bigint] │ Estimates: {rows: 1 (8B)} └─ RemoteExchange[GATHER] │ Outputs: [count_0_2:bigint] │ Estimates: {rows: 1 (8B)} └─ Aggregate(PARTIAL) │ Outputs: [count_0_4:bigint] │ Estimates: {rows: 1 (8B)} │ count_4 := count(*) └─ InnerJoin[(`c_nationkey` = `n_nationkey`)][$hashvalue, $hashvalue_0_6] │ Outputs: [] │ Estimates: {rows: 302035 (4.61MB)} │ Distribution: REPLICATED ├─ Project[] │ │ Outputs: [c_nationkey:integer, $hashvalue:bigint] │ │ Estimates: {rows: 1500000 (5.72MB)} │ │ $hashvalue := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`c_nationkey`), 0)) │ └─ RuntimeFilter │ │ Outputs: [c_nationkey:integer] │ │ Estimates: {rows: 1500000 (5.72MB)} │ ├─ TableScan[adb:AdbTableHandle{schema=tpch, tableName=customer, partitionColumnHandles=[c_custkey]}] │ │ Outputs: [c_nationkey:integer] │ │ Estimates: {rows: 1500000 (5.72MB)} │ │ c_nationkey := AdbColumnHandle{columnName=c_nationkey, type=4, isIndexed=true} │ └─ RuntimeCollect │ │ Outputs: [n_nationkey:integer] │ │ Estimates: {rows: 5 (60B)} │ └─ LocalExchange[ROUND_ROBIN] () │ │ Outputs: [n_nationkey:integer] │ │ Estimates: {rows: 5 (60B)} │ └─ RuntimeScan │ Outputs: [n_nationkey:integer] │ Estimates: {rows: 5 (60B)} └─ LocalExchange[HASH][$hashvalue_0_6] ("n_nationkey") │ Outputs: [n_nationkey:integer, $hashvalue_0_6:bigint] │ Estimates: {rows: 5 (60B)} └─ Project[] │ Outputs: [n_nationkey:integer, $hashvalue_0_10:bigint] │ Estimates: {rows: 5 (60B)} │ $hashvalue_10 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`n_nationkey`), 0)) └─ RemoteExchange[REPLICATE] │ Outputs: [n_nationkey:integer] │ Estimates: {rows: 5 (60B)} └─ InnerJoin[(`n_regionkey` = `r_regionkey`)][$hashvalue_0_7, $hashvalue_0_8] │ Outputs: [n_nationkey:integer] │ Estimates: {rows: 5 (60B)} │ Distribution: REPLICATED ├─ Project[] │ │ Outputs: [n_nationkey:integer, n_regionkey:integer, $hashvalue_0_7:bigint] │ │ Estimates: {rows: 25 (200B)} │ │ $hashvalue_7 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`n_regionkey`), 0)) │ └─ RuntimeFilter │ │ Outputs: [n_nationkey:integer, n_regionkey:integer] │ │ Estimates: {rows: 25 (200B)} │ ├─ TableScan[adb:AdbTableHandle{schema=tpch, tableName=nation, partitionColumnHandles=[]}] │ │ Outputs: [n_nationkey:integer, n_regionkey:integer] │ │ Estimates: {rows: 25 (200B)} │ │ n_nationkey := AdbColumnHandle{columnName=n_nationkey, type=4, isIndexed=true} │ │ n_regionkey := AdbColumnHandle{columnName=n_regionkey, type=4, isIndexed=true} │ └─ RuntimeCollect │ │ Outputs: [r_regionkey:integer] │ │ Estimates: {rows: 1 (4B)} │ └─ LocalExchange[ROUND_ROBIN] () │ │ Outputs: [r_regionkey:integer] │ │ Estimates: {rows: 1 (4B)} │ └─ RuntimeScan │ Outputs: [r_regionkey:integer] │ Estimates: {rows: 1 (4B)} └─ LocalExchange[HASH][$hashvalue_0_8] ("r_regionkey") │ Outputs: [r_regionkey:integer, $hashvalue_0_8:bigint] │ Estimates: {rows: 1 (4B)} └─ ScanProject[table = adb:AdbTableHandle{schema=tpch, tableName=region, partitionColumnHandles=[]}] Outputs: [r_regionkey:integer, $hashvalue_0_9:bigint] Estimates: {rows: 1 (4B)}/{rows: 1 (B)} $hashvalue_9 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`r_regionkey`), 0)) r_regionkey := AdbColumnHandle{columnName=r_regionkey, type=4, isIndexed=true}
返回结果中的主要参数说明见下表。
参数
说明
Outputs: [symbol:type]
每个算子的输出列及数据类型。
Estimates: {rows: %s (%sB)}
每个算子的估算行数及数据量。估算结果可用来决定优化器的Join Order和Data Shuffle。
EXPLAIN ANALYZE
您可以通过EXPLAIN ANALYZE
命令查看查询的分布式执行计划以及实际执行代价,包括执行耗时、内存使用量,输入输出数据量等。
语法
EXPLAIN ANALYZE <SELECT statement>;
示例
EXPLAIN ANALYZE SELECT count(*) FROM nation, region, customer WHERE c_nationkey = n_nationkey AND n_regionkey = r_regionkey AND r_name = 'ASIA';
返回结果如下:
Fragment 1 [SINGLE] Output: 1 row (9B), PeakMemory: 178KB, WallTime: 1.00ns, Input: 32 rows (288B); per task: avg.: 32.00 std.dev.: 0.00 Output layout: [count] Output partitioning: SINGLE [] Aggregate(FINAL) │ Outputs: [count:bigint] │ Estimates: {rows: 1 (8B)} │ Output: 2 rows (18B), PeakMemory: 24B (0.00%), WallTime: 70.39us (0.03%) │ count := count(`count_1`) └─ LocalExchange[SINGLE] () │ Outputs: [count1:bigint] │ Estimates: {rows: ? (?)} │ Output: 64 rows (576B), PeakMemory: 8KB (0.07%), WallTime: 238.69us (0.10%) └─ RemoteSource[2] Outputs: [count2:bigint] Estimates: Output: 32 rows (288B), PeakMemory: 32KB (0.27%), WallTime: 182.82us (0.08%) Input avg.: 4.00 rows, Input std.dev.: 264.58% Fragment 2 [adb:AdbPartitioningHandle{schema=tpch, tableName=customer, dimTable=false, shards=32, tableEngineType=Cstore, partitionColumns=c_custkey, prunedBuckets= empty}] Output: 32 rows (288B), PeakMemory: 6MB, WallTime: 164.00ns, Input: 1500015 rows (20.03MB); per task: avg.: 500005.00 std.dev.: 21941.36 Output layout: [count4] Output partitioning: SINGLE [] Aggregate(PARTIAL) │ Outputs: [count4:bigint] │ Estimates: {rows: 1 (8B)} │ Output: 64 rows (576B), PeakMemory: 336B (0.00%), WallTime: 1.01ms (0.42%) │ count_4 := count(*) └─ INNER Join[(`c_nationkey` = `n_nationkey`)][$hashvalue, $hashvalue6] │ Outputs: [] │ Estimates: {rows: 302035 (4.61MB)} │ Output: 300285 rows (210B), PeakMemory: 641KB (5.29%), WallTime: 99.08ms (41.45%) │ Left (probe) Input avg.: 46875.00 rows, Input std.dev.: 311.24% │ Right (build) Input avg.: 0.63 rows, Input std.dev.: 264.58% │ Distribution: REPLICATED ├─ ScanProject[table = adb:AdbTableHandle{schema=tpch, tableName=customer, partitionColumnHandles=[c_custkey]}] │ Outputs: [c_nationkey:integer, $hashvalue:bigint] │ Estimates: {rows: 1500000 (5.72MB)}/{rows: 1500000 (5.72MB)} │ Output: 1500000 rows (20.03MB), PeakMemory: 5MB (44.38%), WallTime: 68.29ms (28.57%) │ $hashvalue := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`c_nationkey`), 0)) │ c_nationkey := AdbColumnHandle{columnName=c_nationkey, type=4, isIndexed=true} │ Input: 1500000 rows (7.15MB), Filtered: 0.00% └─ LocalExchange[HASH][$hashvalue6] ("n_nationkey") │ Outputs: [n_nationkey:integer, $hashvalue6:bigint] │ Estimates: {rows: 5 (60B)} │ Output: 30 rows (420B), PeakMemory: 394KB (3.26%), WallTime: 455.03us (0.19%) └─ Project[] │ Outputs: [n_nationkey:integer, $hashvalue10:bigint] │ Estimates: {rows: 5 (60B)} │ Output: 15 rows (210B), PeakMemory: 24KB (0.20%), WallTime: 83.61us (0.03%) │ Input avg.: 0.63 rows, Input std.dev.: 264.58% │ $hashvalue_10 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`n_nationkey`), 0)) └─ RemoteSource[3] Outputs: [n_nationkey:integer] Estimates: Output: 15 rows (75B), PeakMemory: 24KB (0.20%), WallTime: 45.97us (0.02%) Input avg.: 0.63 rows, Input std.dev.: 264.58% Fragment 3 [adb:AdbPartitioningHandle{schema=tpch, tableName=nation, dimTable=true, shards=32, tableEngineType=Cstore, partitionColumns=, prunedBuckets= empty}] Output: 5 rows (25B), PeakMemory: 185KB, WallTime: 1.00ns, Input: 26 rows (489B); per task: avg.: 26.00 std.dev.: 0.00 Output layout: [n_nationkey] Output partitioning: BROADCAST [] INNER Join[(`n_regionkey` = `r_regionkey`)][$hashvalue7, $hashvalue8] │ Outputs: [n_nationkey:integer] │ Estimates: {rows: 5 (60B)} │ Output: 11 rows (64B), PeakMemory: 152KB (1.26%), WallTime: 255.86us (0.11%) │ Left (probe) Input avg.: 25.00 rows, Input std.dev.: 0.00% │ Right (build) Input avg.: 0.13 rows, Input std.dev.: 264.58% │ Distribution: REPLICATED ├─ ScanProject[table = adb:AdbTableHandle{schema=tpch, tableName=nation, partitionColumnHandles=[]}] │ Outputs: [n_nationkey:integer, n_regionkey:integer, $hashvalue7:bigint] │ Estimates: {rows: 25 (200B)}/{rows: 25 (200B)} │ Output: 25 rows (475B), PeakMemory: 16KB (0.13%), WallTime: 178.81us (0.07%) │ $hashvalue_7 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`n_regionkey`), 0)) │ n_nationkey := AdbColumnHandle{columnName=n_nationkey, type=4, isIndexed=true} │ n_regionkey := AdbColumnHandle{columnName=n_regionkey, type=4, isIndexed=true} │ Input: 25 rows (250B), Filtered: 0.00% └─ LocalExchange[HASH][$hashvalue8] ("r_regionkey") │ Outputs: [r_regionkey:integer, $hashvalue8:bigint] │ Estimates: {rows: 1 (4B)} │ Output: 2 rows (28B), PeakMemory: 34KB (0.29%), WallTime: 57.41us (0.02%) └─ ScanProject[table = adb:AdbTableHandle{schema=tpch, tableName=region, partitionColumnHandles=[]}] Outputs: [r_regionkey:integer, $hashvalue9:bigint] Estimates: {rows: 1 (4B)}/{rows: 1 (4B)} Output: 1 row (14B), PeakMemory: 8KB (0.07%), WallTime: 308.99us (0.13%) $hashvalue_9 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`r_regionkey`), 0)) r_regionkey := AdbColumnHandle{columnName=r_regionkey, type=4, isIndexed=true} Input: 1 row (5B), Filtered: 0.00%
返回结果中的主要参数说明见下表。
参数
说明
Outputs: [symbol:type]
每个算子的输出列及数据类型。
Estimates: {rows: %s (%sB)}
每个算子的估算行数及数据量。估算结果可用来决定优化器的Join Order和Data Shuffle。
PeakMemory: %s
内存使用总和,用于分析内存使用的瓶颈点。
WallTime: %s
算子执行时间的累加总和,用于分析计算瓶颈点。
说明由于存在并行计算,所以该时间并不是真实的执行时间。
Input: %s rows (%sB)
输入行数及数据量。
per task: avg.: %s std.dev.: %s
平均行数和其标准差,用于分析Stage内部的数据倾斜。
Output: %s row (%sB)
输出行数及数据量。
使用场景
您可以通过EXPLAIN ANALYZE
分析一些常见的计划问题。
过滤器未下推
在如下两个查询中,相较于SQL 1,SQL 2中由于存在不能下推的函数
length(string_test)
,需要扫描全量数据进行计算:SQL 1
SELECT count(*) FROM test WHERE string_test = 'a';
SQL 2
SELECT count(*) FROM test WHERE length(string_test) = 1;
使用
EXPLAIN ANALYZE
分别分析上述两个查询的执行计划,对比计划中的Fragment 2,可以看出:SQL 1使用的算子是
TableScan
,且Input avg.
为0.00 rows
,说明过滤器下推成功,扫描数据量为0行。Fragment 2 [adb:AdbPartitioningHandle{schema=test4dmp, tableName=test, dimTable=false, shards=4, tableEngineType=Cstore, partitionColumns=id, prunedBuckets= empty}] Output: 4 rows (36B), PeakMemory: 0B, WallTime: 6.00ns, Input: 0 rows (0B); per task: avg.: 0.00 std.dev.: 0.00 Output layout: [count_0_1] Output partitioning: SINGLE [] Aggregate(PARTIAL) │ Outputs: [count_0_1:bigint] │ Estimates: {rows: 1 (8B)} │ Output: 8 rows (72B), PeakMemory: 0B (0.00%), WallTime: 212.92us (3.99%) │ count_0_1 := count(*) └─ TableScan[adb:AdbTableHandle{schema=test4dmp, tableName=test, partitionColumnHandles=[id]}] Outputs: [] Estimates: {rows: 4 (0B)} Output: 0 rows (0B), PeakMemory: 0B (0.00%), WallTime: 4.76ms (89.12%) Input avg.: 0.00 rows, Input std.dev.: ?%
SQL 2使用的算子是
ScanFilterProject
,且Input
为9999 rows
,同时,filterPredicate
属性不为空(即filterPredicate = (`test4dmp`.`length`(`string_test`) = BIGINT '1')
)表明没有下推的过滤器,扫描数据量为9999行。Fragment 2 [adb:AdbPartitioningHandle{schema=test4dmp, tableName=test, dimTable=false, shards=4, tableEngineType=Cstore, partitionColumns=id, prunedBuckets= empty}] Output: 4 rows (36B), PeakMemory: 0B, WallTime: 102.00ns, Input: 0 rows (0B); per task: avg.: 0.00 std.dev.: 0.00 Output layout: [count_0_1] Output partitioning: SINGLE [] Aggregate(PARTIAL) │ Outputs: [count_0_1:bigint] │ Estimates: {rows: 1 (8B)} │ Output: 8 rows (72B), PeakMemory: 0B (0.00%), WallTime: 252.23us (0.12%) │ count_0_1 := count(*) └─ ScanFilterProject[table = adb:AdbTableHandle{schema=test4dmp, tableName=test, partitionColumnHandles=[id]}, filterPredicate = (`test4dmp`.`length`(`string_test`) = BIGINT '1')] Outputs: [] Estimates: {rows: 9999 (312.47kB)}/{rows: 9999 (312.47kB)}/{rows: ? (?)} Output: 0 rows (0B), PeakMemory: 0B (0.00%), WallTime: 101.31ms (49.84%) string_test := AdbColumnHandle{columnName=string_test, type=13, isIndexed=true} Input: 9999 rows (110.32kB), Filtered: 100.00%
Bad SQL内存使用率
您可以直接查看每个Fragment的
PeakMemory
来定位资源消耗的问题。排除计划中Disaster Broadcast的情况,高PeakMemory
通常是因为连接数据膨胀、连接的左表数据量过大、TableScan
算子扫描数据量过大等,需要从业务角度加条件限制数据量。另外,您也可以通过查看每个算子的PeakMemory
百分比定位资源消耗最大的算子,再进一步分析。