Execution plans

更新时间:
复制 MD 格式

The query optimizer selects the execution plan with the lowest estimated cost for each query, using statistics maintained by the database. Cost is measured in disk I/O, expressed as the number of disk page fetches. Use EXPLAIN and EXPLAIN ANALYZE to inspect and optimize query plans.

How it works

A query plan is a tree of nodes, where each node represents a single operation — a table scan, a join, an aggregation, or a sort. Read plans from the bottom up: each node feeds its output rows into the node directly above it.

  • Bottom nodes — table scan operations such as sequential scans, index scans, or bitmap index scans.

  • Middle nodes — operations such as joins, aggregations, and sorts, stacked above the scan nodes.

  • Top nodes — motion nodes (redistribute, broadcast, or gather) that move rows between compute nodes.

The cost of any node includes the cumulative cost of all its child nodes. The topmost node carries the estimated total execution cost — the number the optimizer minimizes. Cost does not include the time to transmit result rows to the client.

EXPLAIN output

EXPLAIN prints the query plan without executing the query. Each line in the output corresponds to one plan node and shows three estimates:

FieldDescription
costExpressed as startup_cost..total_cost. Startup cost is the estimated cost to return the first row; total cost covers all rows. The unit is disk page fetches, where 1.0 equals one sequential disk page read.
rowsEstimated number of rows this node produces. For the topmost node, this approximates the rows the query returns, updates, or deletes. Filters in WHERE clauses reduce this below the number of rows scanned.
widthEstimated total size in bytes of all rows produced by this node.

Read an EXPLAIN output

EXPLAIN SELECT * FROM names WHERE name = 'Joelle';
                     QUERY PLAN
------------------------------------------------------------
Gather Motion 4:1 (slice1) (cost=0.00..20.88 rows=1 width=13)

   -> Seq Scan on 'names' (cost=0.00..20.88 rows=1 width=13)
         Filter: name::text ~~ 'Joelle'::text

Reading from the bottom up:

  1. Seq Scan — scans the names table row by row, applying the WHERE name = 'Joelle' filter. Estimated startup cost: 0.00; total cost: 20.88; estimated rows returned: 1.

  2. Gather Motion 4:1 — four compute nodes send their filtered rows to the coordinator node. The 4:1 notation means four senders, one receiver.

EXPLAIN ANALYZE output

EXPLAIN ANALYZE both plans and executes the query, so it shows actual timings alongside the optimizer's estimates. It reports the following additional information:

  • Total runtime — wall-clock time in milliseconds for the full query.

  • Memory per slice — executor memory used by each query slice, plus the total memory reserved for the statement.

  • Compute nodes involved — number of compute nodes that returned rows for each plan node operation.

  • Max rows — the highest row count returned by any single compute node for a given operation. If multiple nodes return the same row count, this reflects the slowest node.

  • Node ID — the compute node that returned the most rows.

  • work_mem — memory used per operation. If a node spills data to disk, the output shows both the memory used and the memory that would have prevented the spill.

  • Timing — time for the busiest compute node to return the first row and all rows.

Read an EXPLAIN ANALYZE output

EXPLAIN ANALYZE SELECT * FROM names WHERE name = 'Joelle';
                     QUERY PLAN
------------------------------------------------------------
Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..20.88 rows=1 width=13)
Rows out: 1 rows at destination with 0.305 ms to first row, 0.537 ms to end, start offset by 0.289 ms.
        -> Seq Scan on names (cost=0.00..20.88 rows=1 width=13)
Rows out: Avg 1 rows x 2 workers. Max 1 rows (seg0) with 0.255 ms to first row, 0.486 ms to end, start offset by 0.968 ms.
                 Filter: name = 'Joelle'::text
 Slice statistics:

      (slice0) Executor memory: 135K bytes.

    (slice1) Executor memory: 151K bytes avg x 2 workers, 151K bytes max (seg0).

Statement statistics: Memory used: 128000K bytes Total runtime: 22.548 ms

Key observations:

  • Seq Scan — only seg0 returned rows (1 row). Time to first row: 0.255 ms; time to scan all rows: 0.486 ms.

  • Gather Motion — received 1 row from the compute nodes. Total operation time: 0.537 ms.

  • Total runtime — 22.548 ms.

  • Memory — slice1 used 151K bytes per worker; the full statement reserved 128,000K bytes.

Query operators

Scan operators

Scan operators retrieve rows from tables.

OperatorDescription
Seq ScanScans all rows in a table sequentially.
Append-only ScanScans append-optimized row-oriented tables.
Append-only Columnar ScanScans append-optimized column-oriented tables.
Index ScanTraverses a B-tree index to fetch rows from a table.
Bitmap Append-only Row-oriented ScanGathers row pointers from an index for an append-optimized table, then sorts them by disk location before fetching.
Dynamic Table ScanSelects which partitions to scan at runtime using one of the following functions, contained in a Function Scan node: gp_partition_expansion (all partitions), gp_partition_selection (equality-based), or gp_partition_inversion (range-based). The Function Scan node passes the selected partition list to the Result node, which passes it to the Sequence node.

Join operators

Join operators combine rows from two tables.

OperatorDescriptionWhen to use
Hash JoinBuilds a hash table from the smaller table using the join column as a hash key, then probes it while scanning the larger table. The Hash Cond field in the plan identifies the joined columns.Default for most joins; typically the fastest option.
Nested Loop JoinIterates through every row in the larger table and scans the smaller table for each iteration. Requires broadcasting one table to all compute nodes.Small tables or queries constrained by an index. Avoid for large tables. Set enable_nestloop=off (the default) to let the optimizer prefer Hash Join instead.
Merge JoinSorts both tables and merges them.Pre-ordered data. Set enable_mergejoin=on to prefer this join type.

Motion operators

Motion operators move rows between compute nodes.

OperatorDescription
Broadcast motionEach compute node sends a full copy of its rows to every other compute node. Used for small tables. Not suitable for large tables.
Redistribute motionEach compute node rehashes its rows and sends them to the appropriate compute node based on the hash key. Used when data is not already distributed on the join key.
Gather motionAll compute nodes send their results to the coordinator node. This is the final operation in most query plans.

Other operators

OperatorDescription
MaterializeMaterializes a subquery result.
InitPlanExecutes a pre-query used in dynamic partition elimination, when the optimizer does not know partition values ahead of time.
SortSorts rows before operations that require ordering, such as Merge Join or aggregation.
Group ByGroups rows by one or more columns.
Group/Hash AggregateAggregates rows using a hash algorithm.
AppendConcatenates row sets scanned from multiple partitions of a partitioned table.
FilterFilters rows using the criterion from a WHERE clause.
LimitCaps the number of rows returned.

Identify the active query optimizer

AnalyticDB for PostgreSQL supports two query optimizers: ORCA (also called GPORCA) and the legacy query optimizer. The end of every EXPLAIN output shows which optimizer generated the plan.

ORCA (optimizer=on):

                       QUERY PLAN
------------------------------------------------------------------------------------
 Aggregate  (cost=0.00..296.14 rows=1 width=8)
   ->  Gather Motion 2:1  (slice1; segments: 2)  (cost=0.00..295.10 rows=1 width=8)
         ->  Aggregate  (cost=0.00..294.10 rows=1 width=8)
               ->  Table Scan on part  (cost=0.00..97.69 rows=100040 width=1)
 Settings:  optimizer=on
 Optimizer status: PQO version 1.609
(5 rows)

Legacy query optimizer (optimizer=off):

                       QUERY PLAN
----------------------------------------------------------------------------------------
 Aggregate  (cost=3519.05..3519.06 rows=1 width=8)
   ->  Gather Motion 2:1  (slice1; segments: 2)  (cost=3518.99..3519.03 rows=1 width=8)
         ->  Aggregate  (cost=3518.99..3519.00 rows=1 width=8)
               ->  Seq Scan on part  (cost=0.00..3018.79 rows=100040 width=1)
 Settings:  optimizer=off
 Optimizer status: legacy query optimizer
(5 rows)

The Settings line shows the optimizer parameter value. The Optimizer status line confirms whether ORCA or the legacy optimizer generated the plan. PQO in the status line stands for the Parallel Query Optimizer, the historical name for GPORCA.