Use EXPLAIN to read a query plan

更新时间:
复制 MD 格式

The query optimizer uses database statistics to select the query plan with the lowest total cost. The cost of a query is measured in units of disk pages fetched, which is an estimate of the required disk I/O. You can use the EXPLAIN and EXPLAIN ANALYZE statements to view and improve query plans.

The syntax for EXPLAIN is as follows:

EXPLAIN [ANALYZE] [VERBOSE] statement

EXPLAIN shows the query optimizer's estimated cost for the query plan. For example:

EXPLAIN SELECT * FROM names WHERE id=22; 
EXPLAIN ANALYZE not only displays the query plan but also runs the statement. It provides additional information, such as the actual number of rows executed and the time taken. For example:
EXPLAIN ANALYZE SELECT * FROM names WHERE id=22;

Reading the EXPLAIN output

A query plan is a tree of nodes. The plan is read and executed from the bottom up. Each node in the plan represents an operation, such as a table scan, table join, aggregation, or sort. Each node passes its results to the node directly above it. The bottom nodes of a plan are usually table scan operations. These operations include sequential scans, index scans, and bitmap index scans. If the query requires joins, aggregations, sorts, or other operations on the scanned rows, additional nodes are added above the scan nodes to perform these operations. The top-level plan nodes are typically the database's motion nodes: REDISTRIBUTE, BROADCAST, or GATHER. These operations move data between instance nodes during query processing.

The EXPLAIN output shows one line for each node in the query plan. It displays the node type and the following execution cost estimates:

  • cost: Measured in units of disk page fetches. A value of 1.0 equals one sequential disk page read. The first estimate is the startup cost to retrieve the first row. The second estimate is the total cost to retrieve all rows.
  • rows: The total number of rows that this plan node outputs. This number can be less than the number of rows processed or scanned by the node because of conditional filters. The top-level node shows the estimated number of rows to be returned, updated, or deleted.
  • width: The total number of bytes for all rows that this plan node outputs.

Note the following:

  • The cost of a node includes the cost of its child nodes. The top-level plan node shows the estimated total execution cost for the plan. This is the lowest cost estimated by the optimizer.
  • The cost only reflects the execution time within the database. It does not include time spent outside the database, such as the time to transfer result rows to the client.

EXPLAIN example

The following example shows how to read the EXPLAIN query cost for a query:

EXPLAIN SELECT * FROM names WHERE name = 'Joelle';
                     QUERY PLAN
------------------------------------------------------------
Gather Motion 4:1 (slice1) (cost=0.00..20.88 rows=1 width=13)

   -> Seq Scan on 'names' (cost=0.00..20.88 rows=1 width=13)
         Filter: name::text ~~ 'Joelle'::text

The query optimizer sequentially scans the names table. For each row, it checks the filter condition in the WHERE clause and outputs only the rows that meet the condition. The results of the scan operation are passed to a Gather Motion operation. A Gather Motion operation occurs when the Segments send all rows to the master node. In this example, four Segment nodes execute in parallel and send data to the master node. The estimated startup cost for this plan is 0.00, and the total cost is 20.88 disk page fetches. The optimizer estimates that this query will return one row.

EXPLAIN ANALYZE runs the statement in addition to showing the execution plan. The EXPLAIN ANALYZE plan shows the actual execution costs alongside the optimizer's estimates. It also provides the following additional information:

  • The total runtime of the query execution in milliseconds.
  • The memory used by each slice of the query plan, and the memory reserved for the entire query statement.
  • The number of Segment nodes involved in the plan node operation. Only Segments that return rows are counted.
  • The maximum number of rows returned by the Segment node that produced the most rows for an operation. If multiple Segment nodes produce an equal number of rows, EXPLAIN ANALYZE shows the Segment node that had the longest end time.
  • The ID of the Segment node that produced the most rows for an operation.
  • The amount of memory (work_mem) used for the operation. If the allocated `work_mem` is insufficient to execute the operation in memory, the plan shows the amount of data spilled to disk. For example:
    Work_mem used: 64K bytes avg, 64K bytes max (seg0).
    Work_mem wanted: 90K bytes avg, 90K byes max (seg0) to lessen
    workfile I/O affecting 2 workers.
  • The time in milliseconds for the Segment node that produced the most rows to retrieve the first row, and the time it took for that Segment node to retrieve all rows.
The following example uses the same query to show how to read an EXPLAIN ANALYZE query plan. The bold parts of this plan show the actual timing and number of rows returned for each plan node, along with the memory and time statistics for the entire query.
EXPLAIN ANALYZE SELECT * FROM names WHERE name = 'Joelle';
                     QUERY PLAN
------------------------------------------------------------
Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..20.88 rows=1 width=13)
Rows out: 1 rows at destination with 0.305 ms to first row, 0.537 ms to end, start offset by 0.289 ms.
        -> Seq Scan on names (cost=0.00..20.88 rows=1 width=13)
Rows out: Avg 1 rows x 2 workers. Max 1 rows (seg0) with 0.255 ms to first row, 0.486 ms to end, start offset by 0.968 ms.
                 Filter: name = 'Joelle'::text
 Slice statistics:

      (slice0) Executor memory: 135K bytes.

    (slice1) Executor memory: 151K bytes avg x 2 workers, 151K bytes max (seg0).

Statement statistics: Memory used: 128000K bytes Total runtime: 22.548 ms
The total time to run this query was 22.548 ms. For the Sequential Scan operation, only the Segment (seg0) node returned one row. It took 0.255 ms to retrieve the first row and 0.486 ms to scan all rows. The Gather Motion operation, where the Segment sends data to the master node, received one row. The total time for this operation was 0.537 ms.

Common query operators

Table scan operators (SCAN) scan rows in a table to retrieve a set of rows. They include the following types:
  • Seq Scan: Sequentially scans all rows in the table.
  • Append-only Scan: Scans a row-oriented append-optimized table.
  • Append-only Columnar Scan: Scans rows in a column-oriented append-optimized table.
  • Index Scan: Traverses a B-tree index to retrieve rows from the table.
  • Bitmap Append-only Row-oriented Scan: Collects pointers to rows in an append-only table from an index and sorts them by their location on disk.
  • Dynamic Table Scan: Uses a partition selection function to select partitions. The Function Scan node contains the name of the partition selection function, which can be one of the following:
    • gp_partition_expansion: Selects all partitions in the table.
    • gp_partition_selection: Selects a partition based on an equality expression.
    • gp_partition_inversion: Selects partitions based on a range expression.
    The Function Scan node passes the dynamically selected list of partitions to a Result node, which in turn passes them to a Sequence node.

Table join operators (JOIN) include the following types:

  • Hash Join: Builds a hash table from the smaller table. It uses the join column as the hash key, scans the larger table, calculates the hash key for the join column, and probes the hash table for rows with the same hash key. A hash join is usually the fastest join in a database. The Hash Cond in the plan identifies the columns to be joined.
  • Nested Loop Join: Iterates through every row of the outer table and, for each one, scans the inner table for matching rows. A nested loop join typically requires broadcasting one of the tables so that all rows in one table can be joined with all rows in the other. Nested loop joins perform well on smaller tables or on tables constrained by an index. However, using a nested loop join on large tables can impact performance. To make the optimizer prefer a Hash Join over a Nested Loop Join, you can set the `enable_nestloop` configuration parameter to `OFF`.
  • Merge Join: Sorts two tables on the join keys and then merges them. A merge join is fast for pre-sorted data. The optimizer will consider a Merge Join if the `enable_mergejoin` parameter is set to `ON`.
Motion operators (MOTION) move data between Segment nodes. They include the following types:
  • Broadcast motion: Each Segment node sends its rows to all other Segment nodes. This action provides each Segment node with a complete local copy of the table. The optimizer typically chooses a broadcast motion only for small tables. For large tables, a broadcast motion can be slow.
  • Redistribute motion: Each Segment node re-hashes the data and sends the rows to the Segment node that corresponds to the hash key.
  • Gather motion: Result data from all Segments is merged and sent to a node, usually the master node. This is the final operation for most query plans.
Other operators that appear in query plans include the following:
  • Materialize: The optimizer materializes the result of a subquery.
  • InitPlan: A pre-query used in dynamic partition elimination. This pre-query is executed when the optimizer does not know the values required to identify the partitions to scan at execution time.
  • Sort: Sorts all data for another operation, such as an Aggregation or a Merge Join.
  • Group By: Groups rows by one or more columns.
  • Group/Hash Aggregate: Uses hashing to perform an aggregation operation on rows.
  • Append: Concatenates datasets. For example, it is used when integrating rows scanned from each partition of a partitioned table.
  • Filter: Selects rows using conditions from a WHERE clause.
  • Limit: Limits the number of rows returned.

Determining the query optimizer

You can check the EXPLAIN output to determine whether the plan was generated by GPORCA or the legacy query optimizer. This information appears at the end of the EXPLAIN output. The Settings line shows the setting for the `OPTIMIZER` configuration parameter. The `Optimizer status` line shows which optimizer generated the plan.

Example using the GPORCA optimizer:
                       QUERY PLAN
------------------------------------------------------------------------------------
 Aggregate  (cost=0.00..296.14 rows=1 width=8)
   ->  Gather Motion 2:1  (slice1; segments: 2)  (cost=0.00..295.10 rows=1 width=8)
         ->  Aggregate  (cost=0.00..294.10 rows=1 width=8)
               ->  Table Scan on part  (cost=0.00..97.69 rows=100040 width=1)
 Settings:  optimizer=on
 Optimizer status: PQO version 1.609
(5 rows)
explain select count(*) from part;
Example using the legacy optimizer:
                       QUERY PLAN
----------------------------------------------------------------------------------------
 Aggregate  (cost=3519.05..3519.06 rows=1 width=8)
   ->  Gather Motion 2:1  (slice1; segments: 2)  (cost=3518.99..3519.03 rows=1 width=8)
         ->  Aggregate  (cost=3518.99..3519.00 rows=1 width=8)
               ->  Seq Scan on part  (cost=0.00..3018.79 rows=100040 width=1)
 Settings:  optimizer=off
 Optimizer status: legacy query optimizer
(5 rows)