Execution plans
The query optimizer selects the execution plan with the lowest estimated cost for each query, using statistics maintained by the database. Cost is measured in disk I/O, expressed as the number of disk page fetches. Use EXPLAIN and EXPLAIN ANALYZE to inspect and optimize query plans.
How it works
A query plan is a tree of nodes, where each node represents a single operation — a table scan, a join, an aggregation, or a sort. Read plans from the bottom up: each node feeds its output rows into the node directly above it.
Bottom nodes — table scan operations such as sequential scans, index scans, or bitmap index scans.
Middle nodes — operations such as joins, aggregations, and sorts, stacked above the scan nodes.
Top nodes — motion nodes (redistribute, broadcast, or gather) that move rows between compute nodes.
The cost of any node includes the cumulative cost of all its child nodes. The topmost node carries the estimated total execution cost — the number the optimizer minimizes. Cost does not include the time to transmit result rows to the client.
EXPLAIN output
EXPLAIN prints the query plan without executing the query. Each line in the output corresponds to one plan node and shows three estimates:
| Field | Description |
|---|---|
cost | Expressed as startup_cost..total_cost. Startup cost is the estimated cost to return the first row; total cost covers all rows. The unit is disk page fetches, where 1.0 equals one sequential disk page read. |
rows | Estimated number of rows this node produces. For the topmost node, this approximates the rows the query returns, updates, or deletes. Filters in WHERE clauses reduce this below the number of rows scanned. |
width | Estimated total size in bytes of all rows produced by this node. |
Read an EXPLAIN output
EXPLAIN SELECT * FROM names WHERE name = 'Joelle'; QUERY PLAN
------------------------------------------------------------
Gather Motion 4:1 (slice1) (cost=0.00..20.88 rows=1 width=13)
-> Seq Scan on 'names' (cost=0.00..20.88 rows=1 width=13)
Filter: name::text ~~ 'Joelle'::textReading from the bottom up:
Seq Scan — scans the
namestable row by row, applying theWHERE name = 'Joelle'filter. Estimated startup cost: 0.00; total cost: 20.88; estimated rows returned: 1.Gather Motion 4:1 — four compute nodes send their filtered rows to the coordinator node. The
4:1notation means four senders, one receiver.
EXPLAIN ANALYZE output
EXPLAIN ANALYZE both plans and executes the query, so it shows actual timings alongside the optimizer's estimates. It reports the following additional information:
Total runtime — wall-clock time in milliseconds for the full query.
Memory per slice — executor memory used by each query slice, plus the total memory reserved for the statement.
Compute nodes involved — number of compute nodes that returned rows for each plan node operation.
Max rows — the highest row count returned by any single compute node for a given operation. If multiple nodes return the same row count, this reflects the slowest node.
Node ID — the compute node that returned the most rows.
work_mem — memory used per operation. If a node spills data to disk, the output shows both the memory used and the memory that would have prevented the spill.
Timing — time for the busiest compute node to return the first row and all rows.
Read an EXPLAIN ANALYZE output
EXPLAIN ANALYZE SELECT * FROM names WHERE name = 'Joelle'; QUERY PLAN
------------------------------------------------------------
Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..20.88 rows=1 width=13)
Rows out: 1 rows at destination with 0.305 ms to first row, 0.537 ms to end, start offset by 0.289 ms.
-> Seq Scan on names (cost=0.00..20.88 rows=1 width=13)
Rows out: Avg 1 rows x 2 workers. Max 1 rows (seg0) with 0.255 ms to first row, 0.486 ms to end, start offset by 0.968 ms.
Filter: name = 'Joelle'::text
Slice statistics:
(slice0) Executor memory: 135K bytes.
(slice1) Executor memory: 151K bytes avg x 2 workers, 151K bytes max (seg0).
Statement statistics: Memory used: 128000K bytes Total runtime: 22.548 msKey observations:
Seq Scan — only seg0 returned rows (1 row). Time to first row: 0.255 ms; time to scan all rows: 0.486 ms.
Gather Motion — received 1 row from the compute nodes. Total operation time: 0.537 ms.
Total runtime — 22.548 ms.
Memory — slice1 used 151K bytes per worker; the full statement reserved 128,000K bytes.
Query operators
Scan operators
Scan operators retrieve rows from tables.
| Operator | Description |
|---|---|
| Seq Scan | Scans all rows in a table sequentially. |
| Append-only Scan | Scans append-optimized row-oriented tables. |
| Append-only Columnar Scan | Scans append-optimized column-oriented tables. |
| Index Scan | Traverses a B-tree index to fetch rows from a table. |
| Bitmap Append-only Row-oriented Scan | Gathers row pointers from an index for an append-optimized table, then sorts them by disk location before fetching. |
| Dynamic Table Scan | Selects which partitions to scan at runtime using one of the following functions, contained in a Function Scan node: gp_partition_expansion (all partitions), gp_partition_selection (equality-based), or gp_partition_inversion (range-based). The Function Scan node passes the selected partition list to the Result node, which passes it to the Sequence node. |
Join operators
Join operators combine rows from two tables.
| Operator | Description | When to use |
|---|---|---|
| Hash Join | Builds a hash table from the smaller table using the join column as a hash key, then probes it while scanning the larger table. The Hash Cond field in the plan identifies the joined columns. | Default for most joins; typically the fastest option. |
| Nested Loop Join | Iterates through every row in the larger table and scans the smaller table for each iteration. Requires broadcasting one table to all compute nodes. | Small tables or queries constrained by an index. Avoid for large tables. Set enable_nestloop=off (the default) to let the optimizer prefer Hash Join instead. |
| Merge Join | Sorts both tables and merges them. | Pre-ordered data. Set enable_mergejoin=on to prefer this join type. |
Motion operators
Motion operators move rows between compute nodes.
| Operator | Description |
|---|---|
| Broadcast motion | Each compute node sends a full copy of its rows to every other compute node. Used for small tables. Not suitable for large tables. |
| Redistribute motion | Each compute node rehashes its rows and sends them to the appropriate compute node based on the hash key. Used when data is not already distributed on the join key. |
| Gather motion | All compute nodes send their results to the coordinator node. This is the final operation in most query plans. |
Other operators
| Operator | Description |
|---|---|
| Materialize | Materializes a subquery result. |
| InitPlan | Executes a pre-query used in dynamic partition elimination, when the optimizer does not know partition values ahead of time. |
| Sort | Sorts rows before operations that require ordering, such as Merge Join or aggregation. |
| Group By | Groups rows by one or more columns. |
| Group/Hash Aggregate | Aggregates rows using a hash algorithm. |
| Append | Concatenates row sets scanned from multiple partitions of a partitioned table. |
| Filter | Filters rows using the criterion from a WHERE clause. |
| Limit | Caps the number of rows returned. |
Identify the active query optimizer
AnalyticDB for PostgreSQL supports two query optimizers: ORCA (also called GPORCA) and the legacy query optimizer. The end of every EXPLAIN output shows which optimizer generated the plan.
ORCA (optimizer=on):
QUERY PLAN
------------------------------------------------------------------------------------
Aggregate (cost=0.00..296.14 rows=1 width=8)
-> Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..295.10 rows=1 width=8)
-> Aggregate (cost=0.00..294.10 rows=1 width=8)
-> Table Scan on part (cost=0.00..97.69 rows=100040 width=1)
Settings: optimizer=on
Optimizer status: PQO version 1.609
(5 rows)Legacy query optimizer (optimizer=off):
QUERY PLAN
----------------------------------------------------------------------------------------
Aggregate (cost=3519.05..3519.06 rows=1 width=8)
-> Gather Motion 2:1 (slice1; segments: 2) (cost=3518.99..3519.03 rows=1 width=8)
-> Aggregate (cost=3518.99..3519.00 rows=1 width=8)
-> Seq Scan on part (cost=0.00..3018.79 rows=100040 width=1)
Settings: optimizer=off
Optimizer status: legacy query optimizer
(5 rows)The Settings line shows the optimizer parameter value. The Optimizer status line confirms whether ORCA or the legacy optimizer generated the plan. PQO in the status line stands for the Parallel Query Optimizer, the historical name for GPORCA.