本文介绍如何使用EXPLAIN语句查看执行计划输出中与并行查询相关的内容。

查询用表

本文示例中使用pq_test表进行并行查询测试,其中:

  • 表结构如下:
    SHOW CREATE TABLE pq_test\G
    *************************** 1. row ***************************
           Table: pq_test
    Create Table: CREATE TABLE `pq_test` (
      `id` BIGINT(20) NOT NULL AUTO_INCREMENT,
      `help_topic_id` INT(10) UNSIGNED NOT NULL,
      `name` CHAR(64) NOT NULL,
      `help_category_id` SMALLINT(5) UNSIGNED NOT NULL,
      `description` TEXT NOT NULL,
      `example` TEXT NOT NULL,
      `url` TEXT NOT NULL,
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB AUTO_INCREMENT=21495809 DEFAULT CHARSET=utf8
    1 row in set (0.00 sec)
  • 表大小如下:
    SHOW TABLE STATUS\G
    *************************** 1. row ***************************
               Name: pq_test
             Engine: InnoDB
            Version: 10
         Row_format: Dynamic
               Rows: 20064988
     Avg_row_length: 1898
        Data_length: 38085328896
    Max_data_length: 0
       Index_length: 0
          Data_free: 4194304
     Auto_increment: 21495809
        Create_time: 2019-07-30 01:35:27
        Update_time: NULL
         Check_time: NULL
          Collation: utf8_general_ci
           Checksum: NULL
     Create_options:
            Comment:
    1 row in set (0.02 sec)
  • 查询SQL如下:
    SELECT COUNT(*) FROM pq_test;

EXPLAIN查询语句

  • 通过EXPLAIN语句查看不使用并行查询的情况。查询语句如下:
    SET max_parallel_degree=0; EXPLAIN SELECT COUNT(*) FROM pq_test\G

    查询结果如下:

    *************************** 1. row ***************************
               Id: 1
      Select_type: SIMPLE
            Table: pq_test
      Partitions: NULL
             Type: index
    Possible_keys: NULL
              Key: PRIMARY
          Key_len: 8
              Ref: NULL
             Rows: 20064988
         Filtered: 100.00
            Extra: Using index
    1 row in set, 1 warning (0.03 sec)
  • PolarDB MySQL引擎8.0.1版本中,通过EXPLAIN语句查看使用并行查询的情况,查询语句如下:
    EXPLAIN SELECT COUNT(*) FROM pq_test\G
    查询结果如下:
    *************************** 1. row ***************************
               Id: 1
      Select_type: SIMPLE
            Table: <gather2>
       Partitions: NULL
             Type: ALL
    Possible_keys: NULL
              Key: NULL
          Key_len: NULL
              Ref: NULL
             Rows: 20064988
         Filtered: 100.00
            Extra: NULL
    *************************** 2. row ***************************
               Id: 2
      Select_type: SIMPLE
            Table: pq_test
       Partitions: NULL
             Type: index
    Possible_keys: NULL
              Key: PRIMARY
          Key_len: 8
              Ref: NULL
             Rows: 10032494
         Filtered: 100.00
            Extra: Parallel scan (2 workers); Using index
    2 rows in set, 1 warning (0.00 sec)
    PolarDB MySQL引擎8.0.2版本中,通过EXPLAIN语句查看使用并行查询的情况,可以使用更为清晰的EXPLAIN FORMAT=TREE查询。查询语句如下:
    EXPLAIN FORMAT=TREE SELECT COUNT(*) FROM pq_test\G
    查询结果如下:
    *************************** 1. row ***************************
    EXPLAIN: -> Count rows in gather  (cost=... rows=1)
        -> Gather (slice: 1; workers: 2)
            -> Parallel index scan on pq_test using PRIMARY, with parallel partitions: 2  (cost=... rows=20064988)
    从上述结果可以看出:
    • 上述并行计划中包含了Gather操作,该操作负责汇总所有Worker返回的中间结果。
    • 从执行计划输出的Extra信息中可以看到pq_test表使用了Parallel scan(并行扫描)策略,期望用2个Workers来并行执行。
  • PolarDB MySQL引擎8.0.1版本中,通过带有子查询的EXPLAIN语句查看使用并行查询的情况,查询语句如下:
    EXPLAIN SELECT
        o_orderpriority,
        COUNT(*) as order_count
    FROM
        orders
    WHERE
        o_orderdate >= '1994-04-01'
        AND o_orderdate < date_add('1994-04-01', interval '3' month)
        AND exists (
            SELECT
                *
            FROM
                lineitem
            WHERE
                l_orderkey = o_orderkey
                and l_commitdate < l_receiptdate
        )
    GROUP BY
        o_orderpriority
    ORDER BY
        o_orderpriority\G
    查询结果如下:
    *************************** 1. row ***************************
               id: 1
      select_type: PRIMARY
            table: <gather1.1>
       partitions: NULL
             type: ALL
    possible_keys: NULL
              key: NULL
          key_len: NULL
              ref: NULL
             rows: 1489068
         filtered: 100.00
            Extra: Using temporary; Using filesort
    *************************** 2. row ***************************
               id: 1
      select_type: PRIMARY
            table: orders
       partitions: NULL
             type: range
    possible_keys: i_o_orderdate
              key: i_o_orderdate
          key_len: 3
              ref: NULL
             rows: 568369
         filtered: 100.00
            Extra: Parallel scan (2 workers); Using index condition; Using where; Using temporary
    *************************** 3. row ***************************
               id: 2
      select_type: DEPENDENT SUBQUERY
            table: lineitem
       partitions: NULL
             type: ref
    possible_keys: PRIMARY,i_l_orderkey
              key: PRIMARY
          key_len: 4
              ref: tpch_10.orders.O_ORDERKEY
             rows: 4
         filtered: 33.33
            Extra: Parallel pushdown; Using where
    3 rows in set, 2 warnings (0.01 sec)

    从上述结果可以看出,select_typeDEPENDENT SUBQUERY的子查询中,Extra显示为Parallel pushdown,表示该子查询使用了Parallel pushdown策略,即整个子查询被整个下推到Worker去执行。

    PolarDB MySQL引擎8.0.2版本中,通过带有子查询的EXPLAIN语句查看使用并行查询的情况,通过FORMAT=TREE可以看到更清晰的执行计划。
    • 示例1:
      查询语句如下:
      EXPLAIN FORMAT=TREE SELECT
          o_orderpriority,
          COUNT(*) as order_count
      FROM
          orders
      WHERE
          o_orderdate >= '1994-04-01'
          AND o_orderdate < date_add('1994-04-01', interval '3' month)
          AND exists (
              SELECT
                  *
              FROM
                  lineitem
              WHERE
                  l_orderkey = o_orderkey
                  and l_commitdate < l_receiptdate
          )
      GROUP BY
          o_orderpriority
      ORDER BY
          o_orderpriority\G
      查询结果如下:
      *************************** 1. row ***************************
      EXPLAIN: -> Sort: <temporary>.o_orderpriority
          -> Table scan on <temporary>
              -> Aggregate using temporary table  (cost=1746887.76 rows=1489068)
                  -> Gather (slice: 1; workers: 2)  (cost=1597980.96 rows=1489068)
                      -> Table scan on <temporary>
                          -> Aggregate using temporary table  (cost=1486290.85 rows=744534)
                              -> Filter: exists(select #2)  (cost=772982.43 rows=568369)
                                  -> Parallel index range scan on orders using i_o_orderdate, with index condition: ((orders.O_ORDERDATE >= DATE'1994-04-01') and (orders.O_ORDERDATE < <cache>(('1994-04-01' + interval '3' month)))), with parallel partitions: 89  (cost=772982.43 rows=568369)
                                  -> Select #2 (subquery in condition; dependent)
                                      -> Limit: 1 row(s)
                                          -> Filter: (lineitem.L_COMMITDATE < lineitem.L_RECEIPTDATE)  (cost=1.14 rows=1)
                                              -> Index lookup on lineitem using PRIMARY (L_ORDERKEY=orders.O_ORDERKEY)  (cost=1.14 rows=4)
      
      1 row in set, 1 warning (0.02 sec)
    • 示例2:
      查询语句如下:
      EXPLAIN FORMAT=TREE select
      	c_name,
      	c_custkey,
      	o_orderkey,
      	o_orderdate,
      	o_totalprice,
      	sum(l_quantity)
      from
      	customer,
      	orders,
      	lineitem
      where
      	o_orderkey in (
      		select
      			l_orderkey
      		from
      			lineitem
      		group by
      			l_orderkey having
      				sum(l_quantity) > 313
      	)
      	and c_custkey = o_custkey
      	and o_orderkey = l_orderkey
      group by
      	c_name,
      	c_custkey,
      	o_orderkey,
      	o_orderdate,
      	o_totalprice
      order by
      	o_totalprice desc,
      	o_orderdate
      LIMIT 100;
      查询结果如下:
      *************************** 1. row ***************************
      EXPLAIN: -> Limit: 100 row(s)  (cost=17816010.29 rows=100)
          -> Gather (merge sort; slice: 1; workers: 32)  (cost=17816010.29 rows=3200)
              -> Limit: 100 row(s)  (cost=17815280.77 rows=100)
                  -> Sort: <temporary>.O_TOTALPRICE DESC, <temporary>.O_ORDERDATE, limit input to 100 row(s) per chunk  (cost=17815280.77 rows=1846093)
                      -> Table scan on <temporary>
                          -> Aggregate using temporary table  (cost=16483455.65 rows=1846093)
                              -> Nested loop inner join  (cost=4483831.82 rows=18460929)
                                  -> Parallel inner hash join (customer.C_CUSTKEY = orders.O_CUSTKEY)  (cost=1464180.48 rows=4660127)
                                      -> Parallel table scan on customer, with parallel partitions: 155, partition_keys: 1  (cost=102461.53 rows=461635)
                                      -> Parallel hash
                                          -> Filter: <in_optimizer>(orders.O_ORDERKEY,orders.O_ORDERKEY in (select #2))  (cost=476012.16 rows=4660127)
                                              -> Parallel table scan on orders, with parallel partitions: 1064, partition_keys: 1  (cost=476012.16 rows=4660127)
                                              -> Select #2 (subquery in condition; run only once; shared access)
                                                  -> Filter: ((orders.O_ORDERKEY = `<materialized_subquery>`.l_orderkey))
                                                      -> Limit: 1 row(s)
                                                          -> Index lookup on <materialized_subquery> using <auto_distinct_key> (l_orderkey=orders.O_ORDERKEY)
                                                              -> Materialize with deduplication
                                                                  -> Gather (slice: 1; workers: 32)  (cost=10557896.46 rows=149899005)
                                                                      -> Filter: (sum(lineitem.L_QUANTITY) > 313)
                                                                          -> Group aggregate: sum(lineitem.L_QUANTITY)  (cost=2828719.00 rows=4684344)
                                                                              -> Parallel index scan on lineitem using PRIMARY, with parallel partitions: 6225, partition_keys: 1  (cost=1900875.93 rows=18556862)
                                  -> Index lookup on lineitem using PRIMARY (L_ORDERKEY=orders.O_ORDERKEY)  (cost=0.25 rows=4)

      从上述结果可以看出,select_typeSUBQUERY的子查询中,在Select #2子查询处显示为Shared access,表示该子查询使用了Shared access策略,即PolarDB优化器选择提前并行执行该子查询并将执行结果Share给外层所有Worker做共享访问。