本文介绍云数据库 SelectDB 版的Explain功能和使用方式,为查询性能分析提供参考。
查询计划树
SQL是一个描述性语言,您可以通过一个SQL来描述需要获取的数据。一个SQL的具体执行方式依赖于数据库的实现。查询规划器就是用来决定数据库如何具体一个SQL。
例如,您指定了一个Join算子,则查询规划器需要决定使用的具体Join算法,比如Hash Join或Merge Sort Join、Shuffle或Broadcast、Join顺序是否需要调整以避免笛卡尔积、确定最终执行的节点等。
SelectDB的查询规划过程是先将一个SQL语句转换成一个单机执行计划树,如下所示。
┌────┐
│Sort│
└────┘
│
┌───────────┐
│Aggregation│
└───────────┘
│
┌────┐
│Join│
└────┘
┌───┴────┐
┌──────┐ ┌──────┐
│Scan-1│ │Scan-2│
└──────┘ └──────┘
查询规划器会根据具体的算子执行方式、数据的具体分布,将单机查询计划转换为分布式查询计划。分布式查询计划是由多个Fragment组成的,每个Fragment负责查询计划的一部分,各个Fragment之间会通过ExchangeNode算子进行数据的传输。
例如单机计划分成了两个Fragment:F1和F2。两个Fragment之间通过一个ExchangeNode节点传输数据。一个Fragment会进一步的划分为多个执行单元,执行单元是最终具体的执行实例。划分成多个执行单元有助于充分利用机器资源,提升一个Fragment的执行并发度,如下所示。
┌────┐
│Sort│
│F1 │
└────┘
│
┌───────────┐
│Aggregation│
│F1 │
└───────────┘
│
┌────┐
│Join│
│F1 │
└────┘
┌──────┴────┐
┌──────┐ ┌────────────┐
│Scan-1│ │ExchangeNode│
│F1 │ │F1 │
└──────┘ └────────────┘
│
┌──────────────┐
│DataStreamDink│
│F2 │
└──────────────┘
│
┌──────┐
│Scan-2│
│F2 │
└──────┘
查看查询计划
可以通过如下三个命令查看一个SQL的执行计划。
EXPLAIN GRAPH SELECT ...;
或DESC GRAPH SELECT ...;
。EXPLAIN SELECT ...;
。EXPLAIN VERBOSE SELECT ...;
。
EXPLAIN GRAPH SELECT或DESC GRAPH SELECT
命令EXPLAIN GRAPH SELECT ...;
或DESC GRAPH SELECT ...;
以图形化的方式展示一个查询计划,这个命令可以比较直观的展示查询计划的树形结构以及Fragment的划分情况,示例如下。
EXPALIN graph SELECT tbl1.k1, SUM(tbl1.k2) FROM tbl1 JOIN tbl2 ON tbl1.k1 = tbl2.k1 GROUP BY tbl1.k1 ORDER BY tbl1.k1;
+---------------------------------------------------------------------------------------------------------------------------------+
| Explain String |
+---------------------------------------------------------------------------------------------------------------------------------+
| |
| ┌───────────────┐ |
| │[9: ResultSink]│ |
| │[Fragment: 4] │ |
| │RESULT SINK │ |
| └───────────────┘ |
| │ |
| ┌─────────────────────┐ |
| │[9: MERGING-EXCHANGE]│ |
| │[Fragment: 4] │ |
| └─────────────────────┘ |
| │ |
| ┌───────────────────┐ |
| │[9: DataStreamSink]│ |
| │[Fragment: 3] │ |
| │STREAM DATA SINK │ |
| │ EXCHANGE ID: 09 │ |
| │ UNPARTITIONED │ |
| └───────────────────┘ |
| │ |
| ┌─────────────┐ |
| │[4: TOP-N] │ |
| │[Fragment: 3]│ |
| └─────────────┘ |
| │ |
| ┌───────────────────────────────┐ |
| │[8: AGGREGATE (merge finalize)]│ |
| │[Fragment: 3] │ |
| └───────────────────────────────┘ |
| │ |
| ┌─────────────┐ |
| │[7: EXCHANGE]│ |
| │[Fragment: 3]│ |
| └─────────────┘ |
| │ |
| ┌───────────────────┐ |
| │[7: DataStreamSink]│ |
| │[Fragment: 2] │ |
| │STREAM DATA SINK │ |
| │ EXCHANGE ID: 07 │ |
| │ HASH_PARTITIONED │ |
| └───────────────────┘ |
| │ |
| ┌─────────────────────────────────┐ |
| │[3: AGGREGATE (update serialize)]│ |
| │[Fragment: 2] │ |
| │STREAMING │ |
| └─────────────────────────────────┘ |
| │ |
| ┌─────────────────────────────────┐ |
| │[2: HASH JOIN] │ |
| │[Fragment: 2] │ |
| │join op: INNER JOIN (PARTITIONED)│ |
| └─────────────────────────────────┘ |
| ┌──────────┴──────────┐ |
| ┌─────────────┐ ┌─────────────┐ |
| │[5: EXCHANGE]│ │[6: EXCHANGE]│ |
| │[Fragment: 2]│ │[Fragment: 2]│ |
| └─────────────┘ └─────────────┘ |
| │ │ |
| ┌───────────────────┐ ┌───────────────────┐ |
| │[5: DataStreamSink]│ │[6: DataStreamSink]│ |
| │[Fragment: 0] │ │[Fragment: 1] │ |
| │STREAM DATA SINK │ │STREAM DATA SINK │ |
| │ EXCHANGE ID: 05 │ │ EXCHANGE ID: 06 │ |
| │ HASH_PARTITIONED │ │ HASH_PARTITIONED │ |
| └───────────────────┘ └───────────────────┘ |
| │ │ |
| ┌─────────────────┐ ┌─────────────────┐ |
| │[0: OlapScanNode]│ │[1: OlapScanNode]│ |
| │[Fragment: 0] │ │[Fragment: 1] │ |
| │TABLE: tbl1 │ │TABLE: tbl2 │ |
| └─────────────────┘ └─────────────────┘ |
+---------------------------------------------------------------------------------------------------------------------------------+
上述查询结果中查询计划树被分为了5个Fragment:0、1、2、3、4。如OlapScanNode节点上的[Fragment: 0]
表示这个节点属于Fragment 0。每个Fragment之间都通过DataStreamSink和ExchangeNode进行数据传输。
EXPLAIN SELECT
图形命令仅展示简化后的节点信息,如果查看更具体的节点信息,例如下推到节点上的过滤条件等,则需要通过第二个命令EXPLAIN SELECT ...;
查看更详细的文字版信息,示例如下。
EXPALIN SELECT tbl1.k1, sum(tbl1.k2) FROM tbl1 JOIN tbl2 ON tbl1.k1 = tbl2.k1 GROUP BY tbl1.k1 ORDER BY tbl1.k1;
+----------------------------------------------------------------------------------+
| EXPALIN String |
+----------------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:<slot 5> <slot 3> `tbl1`.`k1` | <slot 6> <slot 4> sum(`tbl1`.`k2`) |
| PARTITION: UNPARTITIONED |
| |
| RESULT SINK |
| |
| 9:MERGING-EXCHANGE |
| limit: 65535 |
| |
| PLAN FRAGMENT 1 |
| OUTPUT EXPRS: |
| PARTITION: HASH_PARTITIONED: <slot 3> `tbl1`.`k1` |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 09 |
| UNPARTITIONED |
| |
| 4:TOP-N |
| | order by: <slot 5> <slot 3> `tbl1`.`k1` ASC |
| | offset: 0 |
| | limit: 65535 |
| | |
| 8:AGGREGATE (merge finalize) |
| | output: sum(<slot 4> sum(`tbl1`.`k2`)) |
| | group by: <slot 3> `tbl1`.`k1` |
| | cardinality=-1 |
| | |
| 7:EXCHANGE |
| |
| PLAN FRAGMENT 2 |
| OUTPUT EXPRS: |
| PARTITION: HASH_PARTITIONED: `tbl1`.`k1` |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 07 |
| HASH_PARTITIONED: <slot 3> `tbl1`.`k1` |
| |
| 3:AGGREGATE (update serialize) |
| | STREAMING |
| | output: sum(`tbl1`.`k2`) |
| | group by: `tbl1`.`k1` |
| | cardinality=-1 |
| | |
| 2:HASH JOIN |
| | join op: INNER JOIN (PARTITIONED) |
| | runtime filter: false |
| | hash predicates: |
| | colocate: false, reason: table not in the same group |
| | equal join conjunct: `tbl1`.`k1` = `tbl2`.`k1` |
| | cardinality=2 |
| | |
| |----6:EXCHANGE |
| | |
| 5:EXCHANGE |
| |
| PLAN FRAGMENT 3 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 06 |
| HASH_PARTITIONED: `tbl2`.`k1` |
| |
| 1:OlapScanNode |
| TABLE: tbl2 |
| PREAGGREGATION: ON |
| partitions=1/1 |
| rollup: tbl2 |
| tabletRatio=3/3 |
| tabletList=105104776,105104780,105104784 |
| cardinality=1 |
| avgRowSize=4.0 |
| numNodes=6 |
| |
| PLAN FRAGMENT 4 |
| OUTPUT EXPRS: |
| PARTITION: RANDOM |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 05 |
| HASH_PARTITIONED: `tbl1`.`k1` |
| |
| 0:OlapScanNode |
| TABLE: tbl1 |
| PREAGGREGATION: ON |
| partitions=1/1 |
| rollup: tbl1 |
| tabletRatio=3/3 |
| tabletList=105104752,105104763,105104767 |
| cardinality=2 |
| avgRowSize=8.0 |
| numNodes=6 |
+----------------------------------------------------------------------------------+
EXPLAIN VERBOSE SELECT
第三个命令EXPLAIN VERBOSE SELECT ...;
相比第二个命令可以查看更详细的执行计划信息,示例如下。
EXPALIN verbose SELECT tbl1.k1, sum(tbl1.k2) FROM tbl1 JOIN tbl2 ON tbl1.k1 = tbl2.k1 GROUP BY tbl1.k1 ORDER BY tbl1.k1;
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPALIN String |
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS:<slot 5> <slot 3> `tbl1`.`k1` | <slot 6> <slot 4> sum(`tbl1`.`k2`) |
| PARTITION: UNPARTITIONED |
| |
| VRESULT SINK |
| |
| 6:VMERGING-EXCHANGE |
| limit: 65535 |
| tuple ids: 3 |
| |
| PLAN FRAGMENT 1 |
| |
| PARTITION: HASH_PARTITIONED: `default_cluster:test`.`tbl1`.`k2` |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 06 |
| UNPARTITIONED |
| |
| 4:VTOP-N |
| | order by: <slot 5> <slot 3> `tbl1`.`k1` ASC |
| | offset: 0 |
| | limit: 65535 |
| | tuple ids: 3 |
| | |
| 3:VAGGREGATE (update finalize) |
| | output: sum(<slot 8>) |
| | group by: <slot 7> |
| | cardinality=-1 |
| | tuple ids: 2 |
| | |
| 2:VHASH JOIN |
| | join op: INNER JOIN(BROADCAST)[Tables are not in the same group] |
| | equal join conjunct: CAST(`tbl1`.`k1` AS DATETIME) = `tbl2`.`k1` |
| | runtime filters: RF000[in_or_bloom] <- `tbl2`.`k1` |
| | cardinality=0 |
| | vec output tuple id: 4 | tuple ids: 0 1 |
| | |
| |----5:VEXCHANGE |
| | tuple ids: 1 |
| | |
| 0:VOlapScanNode |
| TABLE: tbl1(null), PREAGGREGATION: OFF. Reason: the type of agg on StorageEngine's Key column should only be MAX or MIN.agg expr: sum(`tbl1`.`k2`) |
| runtime filters: RF000[in_or_bloom] -> CAST(`tbl1`.`k1` AS DATETIME) |
| partitions=0/1, tablets=0/0, tabletList= |
| cardinality=0, avgRowSize=20.0, numNodes=1 |
| tuple ids: 0 |
| |
| PLAN FRAGMENT 2 |
| |
| PARTITION: HASH_PARTITIONED: `default_cluster:test`.`tbl2`.`k2` |
| |
| STREAM DATA SINK |
| EXCHANGE ID: 05 |
| UNPARTITIONED |
| |
| 1:VOlapScanNode |
| TABLE: tbl2(null), PREAGGREGATION: OFF. Reason: null |
| partitions=0/1, tablets=0/0, tabletList= |
| cardinality=0, avgRowSize=16.0, numNodes=1 |
| tuple ids: 1 |
| |
| Tuples: |
| TupleDescriptor{id=0, tbl=tbl1, byteSize=32, materialized=true} |
| SlotDescriptor{id=0, col=k1, type=DATE} |
| parent=0 |
| materialized=true |
| byteSize=16 |
| byteOffset=16 |
| nullIndicatorByte=0 |
| nullIndicatorBit=-1 |
| slotIdx=1 |
| |
| SlotDescriptor{id=2, col=k2, type=INT} |
| parent=0 |
| materialized=true |
| byteSize=4 |
| byteOffset=0 |
| nullIndicatorByte=0 |
| nullIndicatorBit=-1 |
| slotIdx=0 |
| |
| |
| TupleDescriptor{id=1, tbl=tbl2, byteSize=16, materialized=true} |
| SlotDescriptor{id=1, col=k1, type=DATETIME} |
| parent=1 |
| materialized=true |
| byteSize=16 |
| byteOffset=0 |
| nullIndicatorByte=0 |
| nullIndicatorBit=-1 |
| slotIdx=0 |
| |
| |
| TupleDescriptor{id=2, tbl=null, byteSize=32, materialized=true} |
| SlotDescriptor{id=3, col=null, type=DATE} |
| parent=2 |
| materialized=true |
| byteSize=16 |
| byteOffset=16 |
| nullIndicatorByte=0 |
| nullIndicatorBit=-1 |
| slotIdx=1 |
| |
| SlotDescriptor{id=4, col=null, type=BIGINT} |
| parent=2 |
| materialized=true |
| byteSize=8 |
| byteOffset=0 |
| nullIndicatorByte=0 |
| nullIndicatorBit=-1 |
| slotIdx=0 |
| |
| |
| TupleDescriptor{id=3, tbl=null, byteSize=32, materialized=true} |
| SlotDescriptor{id=5, col=null, type=DATE} |
| parent=3 |
| materialized=true |
| byteSize=16 |
| byteOffset=16 |
| nullIndicatorByte=0 |
| nullIndicatorBit=-1 |
| slotIdx=1 |
| |
| SlotDescriptor{id=6, col=null, type=BIGINT} |
| parent=3 |
| materialized=true |
| byteSize=8 |
| byteOffset=0 |
| nullIndicatorByte=0 |
| nullIndicatorBit=-1 |
| slotIdx=0 |
| |
| |
| TupleDescriptor{id=4, tbl=null, byteSize=48, materialized=true} |
| SlotDescriptor{id=7, col=k1, type=DATE} |
| parent=4 |
| materialized=true |
| byteSize=16 |
| byteOffset=16 |
| nullIndicatorByte=0 |
| nullIndicatorBit=-1 |
| slotIdx=1 |
| |
| SlotDescriptor{id=8, col=k2, type=INT} |
| parent=4 |
| materialized=true |
| byteSize=4 |
| byteOffset=0 |
| nullIndicatorByte=0 |
| nullIndicatorBit=-1 |
| slotIdx=0 |
| |
| SlotDescriptor{id=9, col=k1, type=DATETIME} |
| parent=4 |
| materialized=true |
| byteSize=16 |
| byteOffset=32 |
| nullIndicatorByte=0 |
| nullIndicatorBit=-1 |
| slotIdx=2 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
160 rows in set (0.00 sec)