使用EXPLAIN和EXPLAIN ANALYZE分析查询执行计划-云原生数据仓库AnalyticDB-阿里云

本文介绍如何使用EXPLAIN和EXPLAIN ANALYZE命令来分析查询执行计划。

前提条件

AnalyticDB MySQL版集群需为3.1.3或以上版本。

说明

如何查看集群版本，请参见查看版本。如需升级版本，请提交工单联系技术支持。

EXPLAIN

您可以通过EXPLAIN命令来评估查询语句的执行方式，评估结果仅供参考，并不等于实际的执行结果。

语法
```
EXPLAIN (format text) <SELECT statement>;
```
说明
如果查询SQL不复杂，您可以在EXPLAIN命令中加上(format text)，来提高返回结果中计划树层次结构的易读性。

示例

EXPLAIN (format text)
      SELECT count(*)
      FROM
      nation, region, customer
      WHERE
      c_nationkey = n_nationkey
      AND n_regionkey = r_regionkey
      AND r_name = 'ASIA';

返回结果如下：

Output[count(*)]
│   Outputs: [count:bigint]
│   Estimates: {rows: 1 (8B)}
│   count(*) := count
└─ Aggregate(FINAL)
   │   Outputs: [count:bigint]
   │   Estimates: {rows: 1 (8B)}
   │   count := count(`count_1`)
   └─ LocalExchange[SINGLE] ()
      │   Outputs: [count_0_1:bigint]
      │   Estimates: {rows: 1 (8B)}
      └─ RemoteExchange[GATHER]
         │   Outputs: [count_0_2:bigint]
         │   Estimates: {rows: 1 (8B)}
         └─ Aggregate(PARTIAL)
            │   Outputs: [count_0_4:bigint]
            │   Estimates: {rows: 1 (8B)}
            │   count_4 := count(*)
            └─ InnerJoin[(`c_nationkey` = `n_nationkey`)][$hashvalue, $hashvalue_0_6]
               │   Outputs: []
               │   Estimates: {rows: 302035 (4.61MB)}
               │   Distribution: REPLICATED
               ├─ Project[]
               │  │   Outputs: [c_nationkey:integer, $hashvalue:bigint]
               │  │   Estimates: {rows: 1500000 (5.72MB)}
               │  │   $hashvalue := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`c_nationkey`), 0))
               │  └─ RuntimeFilter
               │     │   Outputs: [c_nationkey:integer]
               │     │   Estimates: {rows: 1500000 (5.72MB)}
               │     ├─ TableScan[adb:AdbTableHandle{schema=tpch, tableName=customer, partitionColumnHandles=[c_custkey]}]
               │     │      Outputs: [c_nationkey:integer]
               │     │      Estimates: {rows: 1500000 (5.72MB)}
               │     │      c_nationkey := AdbColumnHandle{columnName=c_nationkey, type=4, isIndexed=true}
               │     └─ RuntimeCollect
               │        │   Outputs: [n_nationkey:integer]
               │        │   Estimates: {rows: 5 (60B)}
               │        └─ LocalExchange[ROUND_ROBIN] ()
               │           │   Outputs: [n_nationkey:integer]
               │           │   Estimates: {rows: 5 (60B)}
               │           └─ RuntimeScan
               │                  Outputs: [n_nationkey:integer]
               │                  Estimates: {rows: 5 (60B)}
               └─ LocalExchange[HASH][$hashvalue_0_6] ("n_nationkey")
                  │   Outputs: [n_nationkey:integer, $hashvalue_0_6:bigint]
                  │   Estimates: {rows: 5 (60B)}
                  └─ Project[]
                     │   Outputs: [n_nationkey:integer, $hashvalue_0_10:bigint]
                     │   Estimates: {rows: 5 (60B)}
                     │   $hashvalue_10 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`n_nationkey`), 0))
                     └─ RemoteExchange[REPLICATE]
                        │   Outputs: [n_nationkey:integer]
                        │   Estimates: {rows: 5 (60B)}
                        └─ InnerJoin[(`n_regionkey` = `r_regionkey`)][$hashvalue_0_7, $hashvalue_0_8]
                           │   Outputs: [n_nationkey:integer]
                           │   Estimates: {rows: 5 (60B)}
                           │   Distribution: REPLICATED
                           ├─ Project[]
                           │  │   Outputs: [n_nationkey:integer, n_regionkey:integer, $hashvalue_0_7:bigint]
                           │  │   Estimates: {rows: 25 (200B)}
                           │  │   $hashvalue_7 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`n_regionkey`), 0))
                           │  └─ RuntimeFilter
                           │     │   Outputs: [n_nationkey:integer, n_regionkey:integer]
                           │     │   Estimates: {rows: 25 (200B)}
                           │     ├─ TableScan[adb:AdbTableHandle{schema=tpch, tableName=nation, partitionColumnHandles=[]}]
                           │     │      Outputs: [n_nationkey:integer, n_regionkey:integer]
                           │     │      Estimates: {rows: 25 (200B)}
                           │     │      n_nationkey := AdbColumnHandle{columnName=n_nationkey, type=4, isIndexed=true}
                           │     │      n_regionkey := AdbColumnHandle{columnName=n_regionkey, type=4, isIndexed=true}
                           │     └─ RuntimeCollect
                           │        │   Outputs: [r_regionkey:integer]
                           │        │   Estimates: {rows: 1 (4B)}
                           │        └─ LocalExchange[ROUND_ROBIN] ()
                           │           │   Outputs: [r_regionkey:integer]
                           │           │   Estimates: {rows: 1 (4B)}
                           │           └─ RuntimeScan
                           │                  Outputs: [r_regionkey:integer]
                           │                  Estimates: {rows: 1 (4B)}
                           └─ LocalExchange[HASH][$hashvalue_0_8] ("r_regionkey")
                              │   Outputs: [r_regionkey:integer, $hashvalue_0_8:bigint]
                              │   Estimates: {rows: 1 (4B)}
                              └─ ScanProject[table = adb:AdbTableHandle{schema=tpch, tableName=region, partitionColumnHandles=[]}]
                                     Outputs: [r_regionkey:integer, $hashvalue_0_9:bigint]
                                     Estimates: {rows: 1 (4B)}/{rows: 1 (B)}
                                     $hashvalue_9 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`r_regionkey`), 0))
                                     r_regionkey := AdbColumnHandle{columnName=r_regionkey, type=4, isIndexed=true}

返回结果中的主要参数说明见下表。

参数	说明
`Outputs: [symbol:type]`	每个算子的输出列及数据类型。
`Estimates: {rows: %s (%sB)}`	每个算子的估算行数及数据量。估算结果可用来决定优化器的Join Order和Data Shuffle。

EXPLAIN ANALYZE

您可以通过EXPLAIN ANALYZE命令查看查询的分布式执行计划以及实际执行代价，包括执行耗时、内存使用量，输入输出数据量等。

语法
```
EXPLAIN ANALYZE <SELECT statement>;
```

示例

EXPLAIN ANALYZE
SELECT count(*)
      FROM
      nation, region, customer
      WHERE
      c_nationkey = n_nationkey
      AND n_regionkey = r_regionkey
      AND r_name = 'ASIA';

返回结果如下：

Fragment 1 [SINGLE]
    Output: 1 row (9B), PeakMemory: 178KB, WallTime: 1.00ns, Input: 32 rows (288B); per task: avg.: 32.00 std.dev.: 0.00
    Output layout: [count]
    Output partitioning: SINGLE []
    Aggregate(FINAL)
    │   Outputs: [count:bigint]
    │   Estimates: {rows: 1 (8B)}
    │   Output: 2 rows (18B), PeakMemory: 24B (0.00%), WallTime: 70.39us (0.03%)
    │   count := count(`count_1`)
    └─ LocalExchange[SINGLE] ()
       │   Outputs: [count1:bigint]
       │   Estimates: {rows: ? (?)}
       │   Output: 64 rows (576B), PeakMemory: 8KB (0.07%), WallTime: 238.69us (0.10%)
       └─ RemoteSource[2]
              Outputs: [count2:bigint]
              Estimates:
              Output: 32 rows (288B), PeakMemory: 32KB (0.27%), WallTime: 182.82us (0.08%)
              Input avg.: 4.00 rows, Input std.dev.: 264.58%

Fragment 2 [adb:AdbPartitioningHandle{schema=tpch, tableName=customer, dimTable=false, shards=32, tableEngineType=Cstore, partitionColumns=c_custkey, prunedBuckets= empty}]
    Output: 32 rows (288B), PeakMemory: 6MB, WallTime: 164.00ns, Input: 1500015 rows (20.03MB); per task: avg.: 500005.00 std.dev.: 21941.36
    Output layout: [count4]
    Output partitioning: SINGLE []
    Aggregate(PARTIAL)
    │   Outputs: [count4:bigint]
    │   Estimates: {rows: 1 (8B)}
    │   Output: 64 rows (576B), PeakMemory: 336B (0.00%), WallTime: 1.01ms (0.42%)
    │   count_4 := count(*)
    └─ INNER Join[(`c_nationkey` = `n_nationkey`)][$hashvalue, $hashvalue6]
       │   Outputs: []
       │   Estimates: {rows: 302035 (4.61MB)}
       │   Output: 300285 rows (210B), PeakMemory: 641KB (5.29%), WallTime: 99.08ms (41.45%)
       │   Left (probe) Input avg.: 46875.00 rows, Input std.dev.: 311.24%
       │   Right (build) Input avg.: 0.63 rows, Input std.dev.: 264.58%
       │   Distribution: REPLICATED
       ├─ ScanProject[table = adb:AdbTableHandle{schema=tpch, tableName=customer, partitionColumnHandles=[c_custkey]}]
       │      Outputs: [c_nationkey:integer, $hashvalue:bigint]
       │      Estimates: {rows: 1500000 (5.72MB)}/{rows: 1500000 (5.72MB)}
       │      Output: 1500000 rows (20.03MB), PeakMemory: 5MB (44.38%), WallTime: 68.29ms (28.57%)
       │      $hashvalue := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`c_nationkey`), 0))
       │      c_nationkey := AdbColumnHandle{columnName=c_nationkey, type=4, isIndexed=true}
       │      Input: 1500000 rows (7.15MB), Filtered: 0.00%
       └─ LocalExchange[HASH][$hashvalue6] ("n_nationkey")
          │   Outputs: [n_nationkey:integer, $hashvalue6:bigint]
          │   Estimates: {rows: 5 (60B)}
          │   Output: 30 rows (420B), PeakMemory: 394KB (3.26%), WallTime: 455.03us (0.19%)
          └─ Project[]
             │   Outputs: [n_nationkey:integer, $hashvalue10:bigint]
             │   Estimates: {rows: 5 (60B)}
             │   Output: 15 rows (210B), PeakMemory: 24KB (0.20%), WallTime: 83.61us (0.03%)
             │   Input avg.: 0.63 rows, Input std.dev.: 264.58%
             │   $hashvalue_10 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`n_nationkey`), 0))
             └─ RemoteSource[3]
                    Outputs: [n_nationkey:integer]
                    Estimates:
                    Output: 15 rows (75B), PeakMemory: 24KB (0.20%), WallTime: 45.97us (0.02%)
                    Input avg.: 0.63 rows, Input std.dev.: 264.58%

Fragment 3 [adb:AdbPartitioningHandle{schema=tpch, tableName=nation, dimTable=true, shards=32, tableEngineType=Cstore, partitionColumns=, prunedBuckets= empty}]
    Output: 5 rows (25B), PeakMemory: 185KB, WallTime: 1.00ns, Input: 26 rows (489B); per task: avg.: 26.00 std.dev.: 0.00
    Output layout: [n_nationkey]
    Output partitioning: BROADCAST []
    INNER Join[(`n_regionkey` = `r_regionkey`)][$hashvalue7, $hashvalue8]
    │   Outputs: [n_nationkey:integer]
    │   Estimates: {rows: 5 (60B)}
    │   Output: 11 rows (64B), PeakMemory: 152KB (1.26%), WallTime: 255.86us (0.11%)
    │   Left (probe) Input avg.: 25.00 rows, Input std.dev.: 0.00%
    │   Right (build) Input avg.: 0.13 rows, Input std.dev.: 264.58%
    │   Distribution: REPLICATED
    ├─ ScanProject[table = adb:AdbTableHandle{schema=tpch, tableName=nation, partitionColumnHandles=[]}]
    │      Outputs: [n_nationkey:integer, n_regionkey:integer, $hashvalue7:bigint]
    │      Estimates: {rows: 25 (200B)}/{rows: 25 (200B)}
    │      Output: 25 rows (475B), PeakMemory: 16KB (0.13%), WallTime: 178.81us (0.07%)
    │      $hashvalue_7 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`n_regionkey`), 0))
    │      n_nationkey := AdbColumnHandle{columnName=n_nationkey, type=4, isIndexed=true}
    │      n_regionkey := AdbColumnHandle{columnName=n_regionkey, type=4, isIndexed=true}
    │      Input: 25 rows (250B), Filtered: 0.00%
    └─ LocalExchange[HASH][$hashvalue8] ("r_regionkey")
       │   Outputs: [r_regionkey:integer, $hashvalue8:bigint]
       │   Estimates: {rows: 1 (4B)}
       │   Output: 2 rows (28B), PeakMemory: 34KB (0.29%), WallTime: 57.41us (0.02%)
       └─ ScanProject[table = adb:AdbTableHandle{schema=tpch, tableName=region, partitionColumnHandles=[]}]
              Outputs: [r_regionkey:integer, $hashvalue9:bigint]
              Estimates: {rows: 1 (4B)}/{rows: 1 (4B)}
              Output: 1 row (14B), PeakMemory: 8KB (0.07%), WallTime: 308.99us (0.13%)
              $hashvalue_9 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`r_regionkey`), 0))
              r_regionkey := AdbColumnHandle{columnName=r_regionkey, type=4, isIndexed=true}
              Input: 1 row (5B), Filtered: 0.00%

返回结果中的主要参数说明见下表。

参数	说明
`Outputs: [symbol:type]`	每个算子的输出列及数据类型。
`Estimates: {rows: %s (%sB)}`	每个算子的估算行数及数据量。估算结果可用来决定优化器的Join Order和Data Shuffle。
`PeakMemory: %s`	内存使用总和，用于分析内存使用的瓶颈点。
`WallTime: %s`	算子执行时间的累加总和，用于分析计算瓶颈点。说明由于存在并行计算，所以该时间并不是真实的执行时间。
`Input: %s rows (%sB)`	输入行数及数据量。
`per task: avg.: %s std.dev.: %s`	平均行数和其标准差，用于分析Stage内部的数据倾斜。
`Output: %s row (%sB)`	输出行数及数据量。

使用场景

您可以通过EXPLAIN ANALYZE分析一些常见的计划问题。

过滤器未下推

在如下两个查询中，相较于SQL 1，SQL 2中由于存在不能下推的函数 length(string_test) ，需要扫描全量数据进行计算：

SQL 1

SELECT count(*) FROM test WHERE string_test = 'a';

SQL 2

SELECT count(*) FROM test WHERE length(string_test) = 1;

使用EXPLAIN ANALYZE分别分析上述两个查询的执行计划，对比计划中的Fragment 2，可以看出：

SQL 1使用的算子是TableScan，且Input avg.为0.00 rows，说明过滤器下推成功，扫描数据量为0行。

Fragment 2 [adb:AdbPartitioningHandle{schema=test4dmp, tableName=test, dimTable=false, shards=4, tableEngineType=Cstore, partitionColumns=id, prunedBuckets= empty}]
    Output: 4 rows (36B), PeakMemory: 0B, WallTime: 6.00ns, Input: 0 rows (0B); per task: avg.: 0.00 std.dev.: 0.00
    Output layout: [count_0_1]
    Output partitioning: SINGLE []
    Aggregate(PARTIAL)
    │   Outputs: [count_0_1:bigint]
    │   Estimates: {rows: 1 (8B)}
    │   Output: 8 rows (72B), PeakMemory: 0B (0.00%), WallTime: 212.92us (3.99%)
    │   count_0_1 := count(*)
    └─ TableScan[adb:AdbTableHandle{schema=test4dmp, tableName=test, partitionColumnHandles=[id]}]
           Outputs: []
           Estimates: {rows: 4 (0B)}
           Output: 0 rows (0B), PeakMemory: 0B (0.00%), WallTime: 4.76ms (89.12%)
           Input avg.: 0.00 rows, Input std.dev.: ?%

SQL 2使用的算子是ScanFilterProject，且Input为9999 rows，同时，filterPredicate属性不为空（即filterPredicate = (`test4dmp`.`length`(`string_test`) = BIGINT '1')）表明没有下推的过滤器，扫描数据量为9999行。

Fragment 2 [adb:AdbPartitioningHandle{schema=test4dmp, tableName=test, dimTable=false, shards=4, tableEngineType=Cstore, partitionColumns=id, prunedBuckets= empty}]
    Output: 4 rows (36B), PeakMemory: 0B, WallTime: 102.00ns, Input: 0 rows (0B); per task: avg.: 0.00 std.dev.: 0.00
    Output layout: [count_0_1]
    Output partitioning: SINGLE []
    Aggregate(PARTIAL)
    │   Outputs: [count_0_1:bigint]
    │   Estimates: {rows: 1 (8B)}
    │   Output: 8 rows (72B), PeakMemory: 0B (0.00%), WallTime: 252.23us (0.12%)
    │   count_0_1 := count(*)
    └─ ScanFilterProject[table = adb:AdbTableHandle{schema=test4dmp, tableName=test, partitionColumnHandles=[id]}, filterPredicate = (`test4dmp`.`length`(`string_test`) = BIGINT '1')]
           Outputs: []
           Estimates: {rows: 9999 (312.47kB)}/{rows: 9999 (312.47kB)}/{rows: ? (?)}
           Output: 0 rows (0B), PeakMemory: 0B (0.00%), WallTime: 101.31ms (49.84%)
           string_test := AdbColumnHandle{columnName=string_test, type=13, isIndexed=true}
           Input: 9999 rows (110.32kB), Filtered: 100.00%

Bad SQL内存使用率
您可以直接查看每个Fragment的PeakMemory来定位资源消耗的问题。排除计划中Disaster Broadcast的情况，高PeakMemory通常是因为连接数据膨胀、连接的左表数据量过大、TableScan算子扫描数据量过大等，需要从业务角度加条件限制数据量。另外，您也可以通过查看每个算子的PeakMemory百分比定位资源消耗最大的算子，再进一步分析。