Explain是一种的查询分析工具,可以对查询语句的执行方式进行评估。Explain Analyze是一个查询性能分析工具,可以给出实际的执行情况。

本章节内容适用于版本号为3.1.3及以上的AnalyticDB for MySQL实例,之前版本请参见通过EXPLAIN查询执行计划

语法结构

3.1.3版本及以后版本可以展示格式更清晰的计划和实际执行后的代价,包括下列语法:
  • explain (format text):查看格式化的计划详情。
  • explain analyze:查看分布式计划及实际执行代价。
Explain
EXPLAIN (format text)
      SELECT count(*)
      FROM    
      grade g, student s, course c
      WHERE 
      g.id = s.id 
      AND g.course = c.cid
      AND c.cname = '语文';

上述EXPLAIN语句的返回结果如下:

| Output[count(*)]
│   Outputs: [count:bigint]
│   Estimates: {rows: 1 (8B)}
│   count(*) := count
└─ Aggregate(FINAL)
   │   Outputs: [count:bigint]
   │   Estimates: {rows: 1 (8B)}
   │   count := count(`count_2`)
   └─ LocalExchange[SINGLE] ()
      │   Outputs: [count_0_2:bigint]
      │   Estimates: {rows: 1 (8B)}
      └─ RemoteExchange[GATHER]
         │   Outputs: [count_0_2:bigint]
         │   Estimates: {rows: 1 (8B)}
         └─ Aggregate(PARTIAL)
            │   Outputs: [count_0_2:bigint]
            │   Estimates: {rows: 1 (8B)}
            │   count_2 := count(*)
            └─ InnerJoin[(`expr` = `cid`)][$hashvalue_0_5, $hashvalue_0_6]
               │   Outputs: []
               │   Estimates: {rows: 128 (1kB)}
               │   Distribution: REPLICATED
               ├─ Project[]
               │  │   Outputs: [expr:bigint, $hashvalue_0_5:bigint]
               │  │   Estimates: {rows: 1000 (3.91kB)}
               │  │   expr := CAST(`course` AS bigint)
               │  │   $hashvalue_5 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(CAST(`course` AS bigint)), 0))
               │  └─ InnerJoin[(`id` = `id_0_0`)][$hashvalue, $hashvalue_0_3]
               │     │   Outputs: [course:integer]
               │     │   Estimates: {rows: 1000 (3.91kB)}
               │     │   Distribution: REPLICATED
               │     ├─ ScanProject[table = adb:AdbTableHandle{schema=test4dmp, tableName=grade, partitionColumnHandles=[sid]}]
               │     │      Outputs: [id:bigint, course:integer, $hashvalue:bigint]
               │     │      Estimates: {rows: 1000 (11.72kB)}/{rows: 1000 (11.72kB)}
               │     │      $hashvalue := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`id`), 0))
               │     │      course := AdbColumnHandle{columnName=course, type=4, isIndexed=true}
               │     │      id := AdbColumnHandle{columnName=id, type=5, isIndexed=true}
               │     └─ LocalExchange[HASH][$hashvalue_0_3] ("id_0_0")
               │        │   Outputs: [id_0_0:bigint, $hashvalue_0_3:bigint]
               │        │   Estimates: {rows: 1000 (7.81kB)}
               │        └─ Project[]
               │           │   Outputs: [id_0_0:bigint, $hashvalue_0_4:bigint]
               │           │   Estimates: {rows: 1000 (7.81kB)}
               │           │   $hashvalue_4 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`id_0`), 0))
               │           └─ RemoteExchange[REPLICATE]
               │              │   Outputs: [id_0_0:bigint]
               │              │   Estimates: {rows: 1000 (7.81kB)}
               │              └─ TableScan[adb:AdbTableHandle{schema=test4dmp, tableName=student, partitionColumnHandles=[id]}]
               │                     Outputs: [id_0_0:bigint]
               │                     Estimates: {rows: 1000 (7.81kB)}
               │                     id_0 := AdbColumnHandle{columnName=id, type=5, isIndexed=true}
               └─ LocalExchange[HASH][$hashvalue_0_6] ("cid")
                  │   Outputs: [cid:bigint, $hashvalue_0_6:bigint]
                  │   Estimates: {rows: 128 (1kB)}
                  └─ Project[]
                     │   Outputs: [cid:bigint, $hashvalue_0_7:bigint]
                     │   Estimates: {rows: 128 (1kB)}
                     │   $hashvalue_7 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`cid`), 0))
                     └─ RemoteExchange[REPLICATE]
                        │   Outputs: [cid:bigint]
                        │   Estimates: {rows: 128 (1kB)}
                        └─ TableScan[adb:AdbTableHandle{schema=test4dmp, tableName=course, partitionColumnHandles=[cid]}]
                               Outputs: [cid:bigint]
                               Estimates: {rows: 128 (1kB)}
                               cid := AdbColumnHandle{columnName=cid, type=5, isIndexed=true}
执行计划分析
EXPLAIN ANALYZE
      SELECT count(*)
      FROM    
      grade g, student s, course c
      WHERE 
      g.id = s.id 
      AND g.course = c.cid
      AND c.cname = '语文';

上述EXPLAIN ANALYZE语句的返回结果如下:

| Fragment 1 [SINGLE]
    Output: 1 row (9B), PeakMemory: 40KB, WallTime: 4.00ns, Input: 4 rows (36B); per task: avg.: 4.00 std.dev.: 0.00
    Output layout: [count]
    Output partitioning: SINGLE []
    Aggregate(FINAL)
    │   Outputs: [count:bigint]
    │   Estimates: {rows: 1 (8B)}
    │   Output: 2 rows (18B), PeakMemory: 24B (0.00%), WallTime: 257.01us (0.14%)
    │   count := count(`count_2`)
    └─ LocalExchange[SINGLE] ()
       │   Outputs: [count_0_2:bigint]
       │   Estimates: {rows: 1 (8B)}
       │   Output: 8 rows (72B), PeakMemory: 8KB (0.39%), WallTime: 167.86us (0.09%)
       └─ RemoteSource[2]
              Outputs: [count_0_2:bigint]
              Estimates:
              Output: 4 rows (36B), PeakMemory: 16KB (0.78%), WallTime: 111.02us (0.06%)
              Input avg.: 0.50 rows, Input std.dev.: 264.58%

Fragment 2 [adb:AdbPartitioningHandle{schema=test4dmp, tableName=grade, dimTable=false, shards=4, tableEngineType=Cstore, partitionColumns=sid, prunedBuckets= empty}]
    Output: 4 rows (36B), PeakMemory: 1MB, WallTime: 133.00ns, Input: 2104 rows (32.16kB); per task: avg.: 2104.00 std.dev.: 0.00
    Output layout: [count_0_2]
    Output partitioning: SINGLE []
    Aggregate(PARTIAL)
    │   Outputs: [count_0_2:bigint]
    │   Estimates: {rows: 1 (8B)}
    │   Output: 8 rows (72B), PeakMemory: 72B (0.00%), WallTime: 410.48us (0.23%)
    │   count_2 := count(*)
    └─ INNER Join[(`expr` = `cid`)][$hashvalue_0_5, $hashvalue_0_6]
       │   Outputs: []
       │   Estimates: {rows: 128 (0B)}
       │   Output: 199 rows (1.83kB), PeakMemory: 390KB (18.75%), WallTime: 10.73ms (6.02%)
       │   Left (probe) Input avg.: 250.00 rows, Input std.dev.: 173.21%
       │   Right (build) Input avg.: 13.00 rows, Input std.dev.: 264.58%
       │   Distribution: REPLICATED
       ├─ Project[]
       │  │   Outputs: [expr:bigint, $hashvalue_0_5:bigint]
       │  │   Estimates: {rows: 1000 (3.91kB)}
       │  │   Output: 1000 rows (17.58kB), PeakMemory: 24KB (1.17%), WallTime: 64.22us (0.04%)
       │  │   Input avg.: 250.00 rows, Input std.dev.: 173.21%
       │  │   expr := CAST(`course` AS bigint)
       │  │   $hashvalue_5 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(CAST(`course` AS bigint)), 0))
       │  └─ INNER Join[(`id` = `id_0_0`)][$hashvalue, $hashvalue_0_3]
       │     │   Outputs: [course:integer]
       │     │   Estimates: {rows: 1000 (3.91kB)}
       │     │   Output: 2000 rows (22.46kB), PeakMemory: 519KB (24.97%), WallTime: 53.56ms (30.03%)
       │     │   Left (probe) Input avg.: 250.00 rows, Input std.dev.: 173.21%
       │     │   Right (build) Input avg.: 125.00 rows, Input std.dev.: 264.58%
       │     │   Distribution: REPLICATED
       │     ├─ ScanProject[table = adb:AdbTableHandle{schema=test4dmp, tableName=grade, partitionColumnHandles=[sid]}]
       │     │      Outputs: [id:bigint, course:integer, $hashvalue:bigint]
       │     │      Estimates: {rows: 1000 (11.72kB)}/{rows: 1000 (11.72kB)}
       │     │      Output: 1000 rows (22.46kB), PeakMemory: 64KB (3.11%), WallTime: 21.97ms (12.32%)
       │     │      $hashvalue := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`id`), 0))
       │     │      course := AdbColumnHandle{columnName=course, type=4, isIndexed=true}
       │     │      id := AdbColumnHandle{columnName=id, type=5, isIndexed=true}
       │     │      Input: 1000 rows (13.67kB), Filtered: 0.00%
       │     └─ LocalExchange[HASH][$hashvalue_0_3] ("id_0_0")
       │        │   Outputs: [id_0_0:bigint, $hashvalue_0_3:bigint]
       │        │   Estimates: {rows: 1000 (7.81kB)}
       │        │   Output: 2000 rows (35.16kB), PeakMemory: 521KB (25.05%), WallTime: 14.53ms (8.15%)
       │        └─ Project[]
       │           │   Outputs: [id_0_0:bigint, $hashvalue_0_4:bigint]
       │           │   Estimates: {rows: 1000 (7.81kB)}
       │           │   Output: 1000 rows (17.58kB), PeakMemory: 24KB (1.17%), WallTime: 292.09us (0.16%)
       │           │   Input avg.: 125.00 rows, Input std.dev.: 264.58%
       │           │   $hashvalue_4 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`id_0`), 0))
       │           └─ RemoteSource[3]
       │                  Outputs: [id_0_0:bigint]
       │                  Estimates:
       │                  Output: 1000 rows (8.79kB), PeakMemory: 24KB (1.17%), WallTime: 82.91us (0.05%)
       │                  Input avg.: 125.00 rows, Input std.dev.: 264.58%
       └─ LocalExchange[HASH][$hashvalue_0_6] ("cid")
          │   Outputs: [cid:bigint, $hashvalue_0_6:bigint]
          │   Estimates: {rows: 128 (1kB)}
          │   Output: 208 rows (3.66kB), PeakMemory: 390KB (18.77%), WallTime: 10.04ms (5.63%)
          └─ Project[]
             │   Outputs: [cid:bigint, $hashvalue_0_7:bigint]
             │   Estimates: {rows: 128 (1kB)}
             │   Output: 104 rows (1.83kB), PeakMemory: 16KB (0.78%), WallTime: 295.60us (0.17%)
             │   Input avg.: 13.00 rows, Input std.dev.: 264.58%
             │   $hashvalue_7 := `combine_hash`(BIGINT '0', COALESCE(`$operator$hash_code`(`cid`), 0))
             └─ RemoteSource[4]
                    Outputs: [cid:bigint]
                    Estimates:
                    Output: 104 rows (936B), PeakMemory: 16KB (0.78%), WallTime: 3.81ms (2.14%)
                    Input avg.: 13.00 rows, Input std.dev.: 264.58%

Fragment 4 [adb:AdbPartitioningHandle{schema=test4dmp, tableName=course, dimTable=false, shards=4, tableEngineType=Cstore, partitionColumns=cid, prunedBuckets= empty}]
    Output: 104 rows (936B), PeakMemory: 0B, WallTime: 20.00ns, Input: 104 rows (936B); per task: avg.: 104.00 std.dev.: 0.00
    Output layout: [cid]
    Output partitioning: BROADCAST []
    TableScan[adb:AdbTableHandle{schema=test4dmp, tableName=course, partitionColumnHandles=[cid]}]
        Outputs: [cid:bigint]
        Estimates: {rows: 128 (1kB)}
        Output: 208 rows (1.83kB), PeakMemory: 0B (0.00%), WallTime: 19.50ms (10.93%)
        cid := AdbColumnHandle{columnName=cid, type=5, isIndexed=true}

Fragment 3 [adb:AdbPartitioningHandle{schema=test4dmp, tableName=student, dimTable=false, shards=4, tableEngineType=Cstore, partitionColumns=id, prunedBuckets= empty}]
    Output: 1000 rows (8.79kB), PeakMemory: 0B, WallTime: 20.00ns, Input: 1000 rows (8.79kB); per task: avg.: 1000.00 std.dev.: 0.00
    Output layout: [id_0_0]
    Output partitioning: BROADCAST []
    TableScan[adb:AdbTableHandle{schema=test4dmp, tableName=student, partitionColumnHandles=[id]}]
        Outputs: [id_0_0:bigint]
        Estimates: {rows: 1000 (7.81kB)}
        Output: 2000 rows (17.58kB), PeakMemory: 0B (0.00%), WallTime: 20.55ms (11.52%)
        id_0 := AdbColumnHandle{columnName=id, type=5, isIndexed=true}
参数说明
  • Outputs: [symbol:type]:每个算子的输出列及类型。
  • Estimates: {rows: %s (%sB)}:每个算子的估算行数及数据量,估算数据用来决定优化器的Join Order和Data Shuffle。
  • PeakMemory: %s:内存使用总和,用于分析内存使用的瓶颈点。
  • WallTime: %s:算子执行时间的累加。由于存在并行计算,所以这个时间并不是真实的执行时间,用于分析计算瓶颈点。
  • Input: %s rows (%sB):输入行数及数据量。
  • per task: avg.: %s std.dev.: %s:平均行数和其标准差,用于分析Stage内部的数据倾斜。
  • Output: %s row (%sB):输出行数及数据量。
  • PeakMemory: %s, WallTime: %s, Output: %s:展示算子级别的真实计算统计信息,百分比表示在整个查询中的占比,用于定位瓶颈算子。

使用场景

用户可以通过explain分析几类常见计划问题:
  1. 过滤器未下推

    对比以下两个SQL:

    SQL1: explain analyze select count(*) from test where string_test = 'a';

    SQL2: explain analyze select count(*) from test where length(string_test) = 1;

    SQL2中由于有函数,所以不能下推存储,需要扫描全量数据进行计算。对比计划中的Fragment2,即TableScan的Stage,我们可以看出:
    • SQL1的算子是TableScan,且Input rows是”0 rows“,说明filter下推成功,扫描数据量为0行。
    • SQL2的算子是ScanFilterProject,且Input rows是”999 rows“,由于 ”filterPredicate = (`test4dmp`.`length`(`string_test`) = BIGINT '1')]“,即没有下推的filter条件。说明filter没有下推存储,扫描数据量为999行。
    Fragment 2 [adb:AdbPartitioningHandle{schema=test4dmp, tableName=test, dimTable=false, shards=4, tableEngineType=Cstore, partitionColumns=id, prunedBuckets= empty}]
        Output: 4 rows (36B), PeakMemory: 0B, WallTime: 6.00ns, Input: 0 rows (0B); per task: avg.: 0.00 std.dev.: 0.00
        Output layout: [count_0_1]
        Output partitioning: SINGLE []
        Aggregate(PARTIAL)
        │   Outputs: [count_0_1:bigint]
        │   Estimates: {rows: 1 (8B)}
        │   Output: 8 rows (72B), PeakMemory: 0B (0.00%), WallTime: 212.92us (3.99%)
        │   count_0_1 := count(*)
        └─ TableScan[adb:AdbTableHandle{schema=test4dmp, tableName=test, partitionColumnHandles=[id]}]
               Outputs: []
               Estimates: {rows: 4 (0B)}
               Output: 0 rows (0B), PeakMemory: 0B (0.00%), WallTime: 4.76ms (89.12%)
               Input avg.: 0.00 rows, Input std.dev.: ?%
    Fragment 2 [adb:AdbPartitioningHandle{schema=test4dmp, tableName=test, dimTable=false, shards=4, tableEngineType=Cstore, partitionColumns=id, prunedBuckets= empty}]
        Output: 4 rows (36B), PeakMemory: 0B, WallTime: 102.00ns, Input: 0 rows (0B); per task: avg.: 0.00 std.dev.: 0.00
        Output layout: [count_0_1]
        Output partitioning: SINGLE []
        Aggregate(PARTIAL)
        │   Outputs: [count_0_1:bigint]
        │   Estimates: {rows: 1 (8B)}
        │   Output: 8 rows (72B), PeakMemory: 0B (0.00%), WallTime: 252.23us (0.12%)
        │   count_0_1 := count(*)
        └─ ScanFilterProject[table = adb:AdbTableHandle{schema=test4dmp, tableName=test, partitionColumnHandles=[id]}, filterPredicate = (`test4dmp`.`length`(`string_test`) = BIGINT '1')
    ]
               Outputs: []
               Estimates: {rows: 9999 (312.47kB)}/{rows: 9999 (312.47kB)}/{rows: ? (?)}
               Output: 0 rows (0B), PeakMemory: 0B (0.00%), WallTime: 101.31ms (49.84%)
               string_test := AdbColumnHandle{columnName=string_test, type=13, isIndexed=true}
               Input: 9999 rows (110.32kB), Filtered: 100.00%
  2. Bad SQL内存利用率

    直接查看每个Stage的PeakMemory来定位资源消耗的问题。排除计划中Disaster Broadcast的情况,PeakMemory高经常是由于连接数据膨胀、连接的左表数据量太大、TableScan扫描数据量太大等原因造成的,需要从业务角度加条件限制数据量。另外,也可以通过查看每个算子的PeakMemory百分比定位资源消耗最大的算子,再进一步分析。

  3. 源头数据倾斜

    如下示例中,”skewed“为一张分区倾斜的表,分区键为”id“,只有1一个值,”customer“是一张正常的表,将”skewed“和”customer“做连接。以下是部分执行计划分析的输出计划,Stage 3做”skewed“这张表的扫描,可以看到“ per task: avg.: 1666.50 std.dev.: 3726.41”,标准差很大,且是TableScan的Stage,可以判断这张表是倾斜的,需要确认是否要调整分区键。

    Fragment 3 [adb:AdbPartitioningHandle{schema=mytest, tableName=skewed, dimTable=false, shards=8, tableEngineType=Cstore, partitionColumns=id, prunedBuckets= empty}]
        Output: 9999 rows (87.88kB), PeakMemory: 0B, WallTime: 20.00ns, Input: 9999 rows (87.88kB); per task: avg.: 1666.50 std.dev.: 3726.41
        Output layout: [id]
        Output partitioning: BROADCAST []
        TableScan[adb:AdbTableHandle{schema=mytest, tableName=skewed, partitionColumnHandles=[id]}]
            Outputs: [id:bigint]
            Estimates: {rows: 9999 (78.12kB)}
            Output: 19998 rows (175.76kB), PeakMemory: 0B (0.00%), WallTime: 19.98ms (0.17%)
            id := AdbColumnHandle{columnName=id, type=5, isIndexed=true}