Parallel refresh

更新时间:
复制 MD 格式

In PolarDB for MySQL, both a full refresh and the initial run of an incremental refresh for a materialized view support two parallel write modes: ETL and ePQ. The optimal mode for improving refresh efficiency depends on workload characteristics, such as wide and narrow tables, the cluster series (Cluster Edition or Multi-primary Cluster), and whether compute offloading is needed.

Overview

To improve execution efficiency, PolarDB for MySQL provides the following two parallel write modes for a complete refresh or an initial incremental refresh of a materialized view:

Mode

Activation

Architecture

Use cases

ETL Parallel Write (Default)

Default mode (automatically enabled when parallelism > 1).

This mode offloads queries to remote nodes, where multiple workers consume and write the results in parallel.

Wide tables, large data volumes, and scenarios requiring query offloading.

ePQ Parallel Write

Must be enabled manually.

The PQ framework splits the INSERT SELECT statement for parallel execution by multiple workers and supports MPP offloading.

Narrow tables and scenarios requiring end-to-end parallelism.

Both modes require materialized_view_refresh_parallelism to be set to a value greater than 1.

Compatibility

Version

Parallel refresh modes

Notes

PolarDB for MySQL 8.0.1 with minor kernel version 8.0.1.1.54 or later

ETL mode

ETL mode only.

PolarDB for MySQL 8.0.2 with minor kernel versions 8.0.2.2.34 and 8.0.2.2.35

ePQ mode

  • ePQ mode only.

  • Currently incompatible with partition MDL. If implicit partitioning is enabled, you must disable partition_level_mdl.

Refresh process

A full refresh of a materialized view consists of three stages:

  1. Create a container table: A new container table is created from the materialized view definition. If materialized_view_implicit_partition_num > 1, a hash partition is automatically added to prevent lock contention during parallel writes.

    Note

    MySQL 8.0.1 does not support the materialized_view_implicit_partition_num parameter.

  2. Write data in parallel: The system writes data in parallel according to the selected mode.

  3. Build a columnar index (Optional): If materialized_view_add_columnar_index_after_insert = ON, the system first completes the refresh and then builds a columnar index (IMCI) for the container table.

ETL parallel write mode

How it works

  1. PolarDB selects a remote node (a column-store index read-only node or a row-store read-only node) based on the offload policy and establishes a connection.

  2. The primary (RW) node sends the materialized view's definition query to the remote node for execution.

  3. The remote node streams the result set back. Compressed data transmission is supported.

  4. The primary (RW) node creates multiple worker threads. Each worker thread fetches rows in parallel and writes them to different partitions of the container table.

Features

  • Query computation runs on the remote node and does not consume CPU resources on the primary (RW) node.

  • Write operations run locally on the primary (RW) node to ensure transactional consistency.

  • In wide-table scenarios, parallel efficiency is not affected by row length.

  • Supports multi-primary cluster (Limitless).

Usage

The method for modifying PolarDB cluster parameters differs between the console and a database session. The differences are as follows:

  • In the PolarDB console

    • Compatibility: Some cluster parameters in the PolarDB console have the loose_ prefix for compatibility with MySQL configuration files.

    • Procedure: Find and modify the parameters that have the loose_ prefix.

  • In a database session (using the command line or a client)

    • Procedure: When you connect to the database and use the SET command to modify a parameter, remove the loose_ prefix and use the original parameter name.

Parameter

Example value

Modifiable per session

Description

materialized_view_refresh_parallelism

16

No

Sets the parallelism. A value greater than 1 enables parallel refresh.

materialized_view_complete_refresh_offload_policy

FORCE_IMCI

Yes

Sets the offload policy for manual refreshes on a per-session basis.

After configuring the parameters, run the following command to refresh the materialized view:

-- Refresh the materialized view
REFRESH MATERIALIZED VIEW <schema_name>.<mview_name>;

ePQ parallel write mode

How it works (MPP offloading to a remote column-store index read-only node when the offload policy is FORCE_IMCI)

  1. The system schedules PQ Workers to a remote column-store index read-only node to perform a SELECT scan.

  2. The PQ Workers send the scan results back to the primary (RW) node over the internal network, which supports compression.

  3. Multiple PQ Workers scan the source table data in parallel on the primary (RW) node.

  4. The PQ Workers write the scan results in parallel to the partitions of the container table.

  5. Finally, the primary (RW) node completes the data write locally.

Features

  • End-to-end parallelism: A single parallel execution plan handles both scans and writes.

  • Near-linear scaling for narrow tables.

  • Avoids the single-port network transfer limitation of ETL.

  • Currently not supported in Limitless clusters. The operation automatically falls back to serial execution.

Network compression (MPP offload scenario)

Prerequisites

Parameter

Example value

Settable at session level

Description

materialized_view_etl_compression_algorithm

ZSTD

Yes

Specifies the compression algorithm for data transferred from a replica node during a materialized view refresh. Must be set to ZSTD to enable compression.

  • UNCOMPRESSED (default)

  • ZSTD

materialized_view_etl_compression_level

3

Yes

Specifies the compression level. If this parameter is set to 0 (the default) or an invalid value, the system uses level 3 for the ZSTD algorithm.

  • Valid values: 0–22

  • Default value: 0

Impact

  • Reduces the amount of data transferred between nodes by 60%–80%.

  • Compression and decompression add CPU overhead. This trade-off must be considered in CPU-bound scenarios.

Recommendations

Compressed transfer is recommended only for wide tables. For narrow tables, the CPU overhead from compression might offset the network benefits, so using the UNCOMPRESSED setting is recommended. If your materialized view is a wide table and requires execution offloading, consider using ETL mode directly instead of ePQ mode with MPP offload. ETL mode is inherently well-suited for wide tables.

Procedure

Scenario A: MPP offload to remote node

Parameter

Example value

Settable at session level

Description

materialized_view_refresh_parallelism

16

No

Sets the degree of parallelism. A value greater than 1 enables parallel refresh.

materialized_view_implicit_partition_num

16

Yes

Sets the number of implicit partitions.

Note

The materialized_view_implicit_partition_num parameter is not supported in MySQL 8.0.1.

materialized_view_complete_refresh_offload_policy

FORCE_IMCI

Yes

For a manual refresh, you can specify a different offload policy for each session.

pq_mpp_allow_imci_node

ON

No

Allows offload scheduling to a column-store index read-only node.

materialized_view_etl_compression_algorithm

ZSTD

Yes

Enables compressed transfer (recommended for wide tables; can be skipped for narrow tables).

materialized_view_etl_compression_level

3

Yes

Sets the compression level for compressed transfers.

After you configure these parameters, perform the refresh:

-- Perform the refresh
REFRESH MATERIALIZED VIEW <schema_name>.<mview_name>;

Scenario B: ePQ offload to a specific node

If your cluster has multiple column-store index read-only nodes, you can use pq_mpp_allow_dispatch_to_imci_node to control which of these nodes can receive ePQ MPP scheduling requests. This parameter must be set to ON on both the primary (RW) node and the target column-store index read-only node. The system does not target a column-store index read-only node for scheduling if this parameter is not enabled on that node.

-- 1. On the primary (RW) node, disable its ability to act as a PQ Worker resource.
-- In the console, set pq_mpp_allow_imci_node = OFF;

-- 2. On the primary (RW) node, allow scheduling to an IMCI node.
-- In the console, set pq_mpp_allow_dispatch_to_imci_node = ON;

-- 3. Connect to the IMCI node that should handle the offloaded query, and enable the scheduling entry point only on that node.
-- In the console, set pq_mpp_allow_dispatch_to_imci_node = ON;

-- 4. Return to the primary (RW) node and refresh with the IMCI offload policy.
SET SESSION materialized_view_complete_refresh_offload_policy = FORCE_IMCI;
REFRESH MATERIALIZED VIEW <schema_name>.<mview_name>;

After enabling this feature, use information_schema.cluster_group_resource_view to check the status of the target node. The target column-store index read-only node should show allow_to_dispatch, and its EXTRA column should contain Allow to dispatch to IMCI node. In this mode, the column-store index read-only node only receives scheduling requests and does not act as a PQ Worker. To have all column-store index read-only nodes act as ePQ Worker resources, set pq_mpp_allow_imci_node = ON.

Offload target node selection

When the offload policy is not MASTER, the system selects a remote node to execute the SELECT query.

Policy determines node type

Policy

Candidate node type

Description

PREFER_IMCI

IMCI read-only node

Falls back to local execution on the master (RW) node and logs a warning.

FORCE_IMCI

IMCI read-only node

Fails with an error.

PREFER_TPRO

row-store read-only node

Falls back to local execution on the master (RW) node and logs a warning.

FORCE_TPRO

row-store read-only node

Fails with an error.

COST

IMCI read-only node or row-store read-only node (based on cost)

Selects the available node type if only one is available. If neither is available, it falls back to the master (RW) node.

Decision logic for the COST policy

When materialized_view_complete_refresh_offload_policy = COST, the system determines the target node by comparing the query's optimizer cost to a threshold:

  • If the query cost ≥ cost_threshold_for_imci: Offload to an IMCI read-only node. IMCI is suitable for high-cost scan and aggregation queries.

  • If the query cost < cost_threshold_for_imci: Offload to a row-store read-only node. Row-store read-only nodes are sufficient for lightweight queries.

-- Set the cost threshold (default is 0, which means all queries tend to be offloaded to IMCI).
SET SESSION cost_threshold_for_imci = 100000;

-- Use the COST policy.
SET SESSION materialized_view_complete_refresh_offload_policy = COST;

Multi-node load balancing

If multiple nodes of the same type are available, the system uses a Round-Robin approach to select a node:

  • Each refresh task selects the next node based on a global counter.

  • If the connection to the selected node fails, the system automatically tries the next node in the list until all nodes have been attempted.

  • The system distributes refresh tasks for different materialized views evenly across the nodes.

To offload ePQ + MPP tasks to an IMCI node, use one of the following two methods:

  1. Enable pq_mpp_allow_imci_node to support offloading to all IMCI read-only nodes.

  2. First, set pq_mpp_allow_imci_node = OFF, then use pq_mpp_allow_dispatch_to_imci_node to narrow down the candidate IMCI read-only nodes:

    • On the master (RW) node, set pq_mpp_allow_dispatch_to_imci_node = ON to allow the master (RW) node to accept the dispatch entry-point status from remote IMCI read-only nodes.

    • On each IMCI read-only node that you want to handle offloaded queries, set pq_mpp_allow_dispatch_to_imci_node = ON.

    • On each IMCI read-only node that you do not want to handle offloaded queries, keep pq_mpp_allow_dispatch_to_imci_node = OFF.

Recommended policies

In a production environment, use FORCE_IMCI or FORCE_TPRO. Avoid PREFER_* policies.

Reason: If the target node is unavailable, a PREFER_* policy silently falls back to local execution on the master (RW) node. This fallback causes the INSERT SELECT statement to perform a full scan and write on the master (RW) node. This creates a large transaction and can impact the performance of online workloads on the master (RW) node.

Configuration examples

-- Recommended: Force the use of an IMCI node. If unavailable, the operation fails instead of falling back.
SET SESSION materialized_view_complete_refresh_offload_policy = FORCE_IMCI;

-- Use only row-store read-only nodes. If unavailable, the operation fails.
SET SESSION materialized_view_complete_refresh_offload_policy = FORCE_TPRO;

-- Automatically choose between IMCI and TPRO based on query cost.
SET SESSION materialized_view_complete_refresh_offload_policy = COST;
SET SESSION cost_threshold_for_imci = 50000;

Parameter scopes and settings

Basic rules

  • GLOBAL parameters: are set only in the console and apply to all sessions and automatic refresh threads.

  • SESSION parameters: You can set them for the current session by using SET SESSION. You can also modify their default values in the console. Automatic refresh threads use the GLOBAL default values.

Parameter scopes for ETL parallel write mode

Parameter

Scope

Settable per session

Description

materialized_view_refresh_parallelism

GLOBAL

No, must be GLOBAL

All refresh tasks share the same parallelism.

materialized_view_complete_refresh_offload_policy

SESSION

Yes

For manual refreshes, you can specify a different offload policy for each session.

materialized_view_use_atomic_autoinc_for_refresh

SESSION

Yes

Allows you to disable atomic auto-increment optimization.

materialized_view_cache_etl_result

SESSION

Yes

Enables result set caching.

Parameter scopes for ePQ parallel write mode

Parameter

Scope

Settable per session

Description

materialized_view_refresh_parallelism

GLOBAL

No, must be GLOBAL

All refresh tasks share the same parallelism.

pq_mpp_allow_imci_node

GLOBAL

No, must be GLOBAL

Controls whether MPP tasks can be scheduled to an IMCI node.

pq_mpp_allow_dispatch_to_imci_node

GLOBAL

No, must be GLOBAL

Allows the use of the IMCI scheduling entry point. This parameter must be enabled on both the primary (RW) node and the target read-only node that contains the columnar index.

materialized_view_complete_refresh_offload_policy

SESSION

Yes

For manual refreshes, you can specify a different offload policy for each session.

materialized_view_implicit_partition_num

SESSION

Yes

Allows each materialized view to use a different number of partitions.

materialized_view_etl_compression_algorithm

SESSION

Yes

Specifies the transport compression algorithm for MPP offloads. The system automatically propagates this setting to PQ workers.

materialized_view_etl_compression_level

SESSION

Yes

Specifies the compression level. The system automatically propagates this setting to PQ workers.

Notes

  • If you use only manual refreshes, you can adjust SESSION parameters for each session.

  • If you rely on automatic refreshes, you must set appropriate default values for all SESSION parameters in the console.

  • GLOBAL parameters must be set in the console, regardless of the refresh method. Changes take effect immediately for all subsequent tasks.

  • SET SESSION settings do not persist after a cluster restart. In contrast, GLOBAL parameters modified in the console are persistent.

Parameter reference

Core parameters

Parameter

Scope

Value

Default

Description

materialized_view_refresh_parallelism

GLOBAL

1–1024

1

The number of parallel workers. A value greater than 1 enables parallel refresh.

materialized_view_implicit_partition_num

SESSION

0 to unlimited

1

The number of implicit HASH partitions for the container table. A value greater than 1 enables implicit HASH partitioning.

Note

MySQL 8.0.1 does not support the materialized_view_implicit_partition_num parameter.

materialized_view_use_atomic_autoinc_for_refresh

SESSION

ON/OFF

ON

Improves performance by using atomic auto-increment values for the implicit partition column.

Compute offloading parameters

Parameter

Scope

Value

Default

Description

materialized_view_complete_refresh_offload_policy

SESSION

See supported policy values

PREFER_IMCI

Specifies the policy for offloading query computation.

pq_mpp_allow_imci_node

GLOBAL

ON/OFF

OFF

Allows PQ workers to be scheduled on a read-only IMCI node (required for ePQ mode).

pq_mpp_allow_dispatch_to_imci_node

GLOBAL

ON/OFF

OFF

Allows the use of the IMCI dispatch entry point. This parameter must be enabled on both the primary (RW) node and the target read-only IMCI node.

Network compression parameters

Parameter

Scope

Value

Default

Description

materialized_view_etl_compression_algorithm

SESSION

  • UNCOMPRESSED

  • ZSTD

UNCOMPRESSED

Specifies the compression algorithm for the internal PQ channel. This setting applies only in ePQ MPP offload mode.

materialized_view_etl_compression_level

SESSION

0–22

0

Specifies the ZSTD compression level for the PQ channel. If set to 0, the ZSTD default level (3) is used.

materialized_view_cache_etl_result

SESSION

ON/OFF

OFF

Specifies whether to cache the entire result set in memory. This setting is effective only in ETL mode.

How compression works:

Mode

Compression location

Effective condition

ePQ (MPP offload)

Internal PQ channel between the primary (RW) node and remote nodes.

materialized_view_etl_compression_algorithm = ZSTD

ePQ (local only)

No network transfer.

Not applicable.

The materialized_view_etl_compression_level parameter takes effect only when materialized_view_etl_compression_algorithm = ZSTD. If set to 0, the ZSTD default level of 3 is used. If set to a value from 1 to 22, the specified level is used. A higher level provides a better compression ratio but increases CPU overhead.

Offload policy (materialized_view_complete_refresh_offload_policy parameter)

Value

Description

MASTER

The primary (RW) node executes all operations locally.

PREFER_TPRO

Prioritizes offloading to a read-only TPRO node. Falls back to the primary (RW) node if unavailable.

PREFER_IMCI

Prioritizes offloading to a read-only IMCI node. Falls back to the primary (RW) node if unavailable.

FORCE_TPRO

Forces computation offloading to a read-only TPRO node. If no such node is available, the operation fails.

FORCE_IMCI

Forces computation offloading to a read-only IMCI node. If no such node is available, the operation fails.

COST

Uses a cost-based model to automatically select either a read-only IMCI node or a read-only TPRO node for offloading.

Mode selection recommendations

Scenario

Recommended mode

Key configuration

Narrow table with no remote nodes

Local-only ePQ

materialized_view_complete_refresh_offload_policy=MASTER

Narrow table with column-store-indexed read-only nodes or row-store read-only nodes

ETL/ePQ with MPP offload

materialized_view_complete_refresh_offload_policy=FORCE_IMCI

Wide table or table containing large fields

ETL

materialized_view_complete_refresh_offload_policy=FORCE_IMCI

Multi-master cluster (Limitless)

ETL

Use ETL mode. ePQ is not available for multi-master clusters.

Considerations

  • No effect when the degree of parallelism is 1: Both parallel modes degrade to a serial INSERT SELECT operation.

  • Remote node availability: Both ETL mode and ePQ MPP offloading depend on remote nodes. If a remote node is unavailable, FORCE_* policies return an error, while PREFER_* policies fall back to local execution on the primary (RW) node.

  • Multi-primary environment limitations: In a multi-primary cluster (Limitless), ePQ mode is automatically disabled and falls back to serial execution. Use ETL mode for parallel refreshing.

  • IMCI column coverage: When you use ePQ MPP offloading to a read-only node with a columnar index, the columnar index must cover all columns read by the materialized view query. Otherwise, the system returns an error. ETL mode does not have this limitation.

  • Implicit partition count: We recommend setting this value to 0.5 to 2 times the degree of parallelism. Too few partitions can lead to write contention, while too many can increase file management overhead.

  • Write location: In both modes, data writes are always completed locally on the primary (RW) node. Offloading only affects where the SELECT query is executed.

  • Parameter persistence: Settings configured with SET SESSION do not persist across cluster restarts, whereas changes to GLOBAL parameters made in the console persist.

  • Compression vs. CPU trade-off: For ePQ MPP offloading, we recommend setting materialized_view_etl_compression_algorithm=ZSTD. For narrow tables, the CPU overhead from compression may offset the network benefits.