Query acceleration

更新时间:
复制 MD 格式

This topic describes the single-node query acceleration capabilities, performance benchmarks, and observability methods for PolarDB for MySQL columnar tables. By default, the optimizer determines whether to use a massively parallel processing (MPP) plan. After a query is dispatched to a specific node, the node relies on single-node columnar execution optimizations to reduce latency. This topic focuses on these node-level optimizations.

Background and value

As analytical query data volumes grow, single-node execution often becomes a bottleneck in the following areas:

  • Reading columns that the query does not need

  • Scanning data blocks that are ultimately filtered out

  • Repeatedly reading and decoding the same data pages

  • Insufficient single-node cache coverage for frequently accessed data

  • High CPU and memory bandwidth demands from large-table aggregation, joins, and sorting

The columnar execution acceleration for columnar tables addresses these bottlenecks by reading only the required columns, processing only the necessary data, and reusing decoded results. This reduces query latency and improves throughput.

From an execution pipeline perspective, these optimizations include:

  • Reading only the columns required by the query, eliminating unnecessary column scans.

  • Pushing pushable predicates down to the storage layer for early stripe-level and block-level pruning.

  • Using visibility views to filter files early, preventing irrelevant rows from entering aggregation, sorting, and expression computation stages.

  • Reusing OSS page cache and metadata cache to avoid redundant reads and decodes.

  • Delivering data to the execution layer in batches to leverage vectorized batch processing.

Relationship with multi-node parallel execution

This topic focuses on single-node execution acceleration. For multi-node parallel execution capabilities, use cases, and partition design for columnar tables, see Multi-node analysis.

To determine whether a specific SQL statement uses an MPP plan, check whether the execution plan contains an Exchange operator.

EXPLAIN SELECT /*+ SET_VAR(imci_plan_use_mpp=forced) */ COUNT(*) FROM nation;

If the execution plan contains an Exchange operator, the SQL statement can use multi-node parallel execution. If the actual execution plan includes an Exchange operator, the statement runs as an MPP plan. If no Exchange operator is included, the statement runs as a single-node plan.

Performance benchmarks

The following table shows the TPC-H 100 GB benchmark results for columnar tables in single-node warm cache mode on a 32-core, 256 GB instance:

Note

The TPC-H implementation in this topic is based on TPC-H benchmarking. These test results cannot be compared with published TPC-H benchmark results because the tests do not meet all TPC-H requirements.

Query

Time (s)

Q1

1.175

Q2

0.178

Q3

0.577

Q4

0.433

Q5

0.522

Q6

0.366

Q7

0.633

Q8

0.528

Q9

2.817

Q10

0.935

Q11

0.218

Q12

0.535

Q13

1.255

Q14

0.442

Q15

0.889

Q16

0.553

Q17

0.738

Q18

2.381

Q19

0.759

Q20

0.453

Q21

1.308

Q22

0.299

TOTAL

17.994