Refresh materialized views
Materialized views are often used to accelerate complex queries or simplify ETL processes. They work by pre-calculating a user-defined query and storing the results. You can define a refresh policy for a materialized view based on the write patterns of its base tables, the computational complexity of its query (query_body), and your data freshness requirements.
Choosing a refresh policy
Materialized views support two refresh policies: complete refresh (COMPLETE) and fast refresh (FAST).
-
A complete refresh runs the original query SQL to scan the data of all target partitions in the base table and completely overwrites the old data with the newly calculated data.
-
Performs a fast refresh. The system rewrites the view's query (
query_body) to scan only the changed data (fromINSERT,DELETE, andUPDATEoperations) in the base table and applies these changes to the materialized view. This avoids scanning the entire base table in each cycle and reduces the computational cost of each refresh.
The following table compares the use cases, advantages, and limitations of the two refresh policies.
|
Refresh policy |
Use cases |
Characteristics |
|
Complete refresh |
Offline scenarios:
|
Advantage: The query_body supports any SQL query. |
|
Limitation: Can only perform full batch updates. |
||
|
Fast refresh |
Real-time scenarios:
|
Advantages:
|
|
Limitations:
|
Choosing a refresh trigger
When you create a materialized view, you define both its refresh policy and its refresh trigger. Materialized views support on-demand refresh (ON DEMAND) and overwrite-triggered refresh (ON OVERWRITE). On-demand refresh can be further divided into scheduled refresh and manual refresh. If you do not specify a refresh trigger, the default is on-demand refresh.
When choosing a refresh trigger, consider your data freshness requirements and cluster load. The characteristics and use cases for each trigger are as follows:
-
Manual refresh: The materialized view does not refresh data automatically. You must run
REFRESH MATERIALIZED VIEWto manually refresh the data. This is suitable for scenarios where data consistency is not a high priority or the data changes infrequently. -
Scheduled refresh: The materialized view refreshes automatically at a specified time. If a refresh is still running when the next one is scheduled to start, the system skips the new refresh and waits for the next interval. This is suitable for scenarios where base table data changes periodically, such as new transaction records generated during fixed daily or weekly periods.
-
Overwrite-triggered refresh: The materialized view refreshes automatically when a base table is overwritten by using
INSERT OVERWRITE. This is suitable for scenarios with high requirements for real-time data and consistency.
Different refresh policies support different refresh triggers, as shown in the following table:
|
Refresh policy |
On-demand refresh (ON DEMAND) |
Overwrite-triggered refresh (ON OVERWRITE) |
|
|
Manual refresh |
Scheduled refresh |
||
|
Complete refresh |
✔️ |
✔️ |
✔️ |
|
Fast refresh |
❌ |
✔️ |
❌ |
Define refresh policies and triggers
The following examples use the customer, sales, and product tables to show how to define the refresh policy and refresh trigger for a new materialized view.
/*+ RC_DDL_ENGINE_REWRITE_XUANWUV2=false */ -- Set the table engine to XUANWU.
CREATE TABLE customer (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(255),
is_vip Boolean
);
/*+ RC_DDL_ENGINE_REWRITE_XUANWUV2=false */ -- Set the table engine to XUANWU.
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
product_id INT,
customer_id INT,
price DECIMAL(10, 2),
quantity INT,
sale_date TIMESTAMP
);/*+ RC_DDL_ENGINE_REWRITE_XUANWUV2=false */ -- Set the table engine to XUANWU.
CREATE TABLE sales (
sale_id INT PRIMARY KEY,
product_id INT,
customer_id INT,
price DECIMAL(10, 2),
quantity INT,
sale_date TIMESTAMP
);/*+ RC_DDL_ENGINE_REWRITE_XUANWUV2=false */ -- Set the table engine to XUANWU.
CREATE TABLE product (
product_id INT PRIMARY KEY,
product_name VARCHAR,
category_id INT,
unit_price DECIMAL(10, 2),
stock_quantity INT
);
Create a complete-refresh materialized view
When you create a materialized view, use the REFRESH COMPLETE keyword to specify a complete refresh policy.
A materialized view that uses complete refresh supports manual refresh, scheduled refresh, and overwrite-triggered refresh.
-
Create a complete-refresh materialized view named
compl_mv1. Because this view does not define a refresh trigger or aNEXTparameter, it defaults to an on-demand refresh that must be triggered manually.CREATE MATERIALIZED VIEW compl_mv1 REFRESH COMPLETE AS SELECT * FROM customer; -
Create a complete-refresh materialized view named
compl_mv2. This view defines an on-demand refresh (ON DEMAND) and specifies the initial (START WITH) and subsequent (NEXT) refresh times. In this example, the view refreshes automatically at 2:00 AM every day.CREATE MATERIALIZED VIEW compl_mv2 REFRESH COMPLETE ON DEMAND START WITH DATE_FORMAT(now() + interval 1 day, '%Y-%m-%d 02:00:00') NEXT DATE_FORMAT(now() + interval 1 day, '%Y-%m-%d 02:00:00') AS SELECT * FROM customer; -
Create a complete-refresh materialized view named
compl_mv3. This view is configured for an overwrite-triggered refresh (ON OVERWRITE). This mode does not require aNEXTclause.CREATE MATERIALIZED VIEW compl_mv3 REFRESH COMPLETE ON OVERWRITE AS SELECT * FROM customer;
Create a fast-refresh materialized view
When you create a materialized view, use the REFRESH FAST keyword to specify a fast refresh policy. A materialized view that uses fast refresh supports only scheduled refresh.
Enable binary logging
Before you create a materialized view that uses fast refresh, you must enable binary logging for the cluster and its base tables.
SET ADB_CONFIG BINLOG_ENABLE=true; -- For clusters with an engine version before 3.2.0.0, run this command to enable binary logging. It is enabled by default in 3.2.0.0 and later.
ALTER TABLE customer binlog=true;
ALTER TABLE sales binlog=true;
ALTER TABLE product binlog=true;
-
INSERT OVERWRITE INTOandTRUNCATEoperations are supported on tables with binary logging enabled only in engine version 3.2.0.0 and later. -
After you create a fast-refresh materialized view, you cannot disable binary logging on the base tables.
-
After you delete a materialized view that uses fast refresh, you can manually disable binary logging for the cluster and base tables by running
SET ADB_CONFIG BINLOG_ENABLE=false;andALTER TABLE <table_name> binlog=false;.
Single-table materialized views
-
Create a single-table materialized view
fast_mv1that uses fast refresh without an aggregate operation and refreshes every 10 seconds.CREATE MATERIALIZED VIEW fast_mv1 REFRESH FAST NEXT now() + INTERVAL 10 second AS SELECT sale_id, sale_date, price FROM sales WHERE price > 10; -
Create a single-table materialized view
fast_mv2that uses fast refresh with a GROUP BY aggregation and refreshes every 5 seconds.CREATE MATERIALIZED VIEW fast_mv2 REFRESH FAST NEXT now() + INTERVAL 5 second AS SELECT customer_id, sale_date, -- The system uses the GROUP BY columns as the primary key of the materialized view. COUNT(sale_id) AS cnt_sale_id, -- Aggregate output column. SUM(price * quantity) AS total_revenue, -- Aggregate output column. customer_id / 100 AS new_customer_id -- Non-aggregate output columns can use any expression. FROM sales WHERE ifnull(price, 1) > 0 -- Conditions can use any expression. GROUP BY customer_id, sale_date; -
Create a single-table materialized view
fast_mv3that uses fast refresh without a GROUP BY aggregation and refreshes every minute.CREATE MATERIALIZED VIEW fast_mv3 REFRESH FAST NEXT now() + INTERVAL 1 minute AS SELECT count(*) AS cnt -- The system generates a constant primary key to ensure that the materialized view contains only one record. FROM sales;
Multi-table materialized views
-
Create a multi-table materialized view
fast_mv4that uses fast refresh without an aggregate operation and refreshes every 5 seconds.CREATE MATERIALIZED VIEW fast_mv4 REFRESH FAST NEXT now() + INTERVAL 5 second AS SELECT c.customer_id, c.customer_name, p.product_id, s.sale_id, (s.price * s.quantity) AS revenue FROM sales s JOIN customer c ON s.customer_id = c.customer_id JOIN product p ON s.product_id = p.product_id; -
Create a multi-table materialized view
fast_mv5that uses fast refresh with a GROUP BY aggregation and refreshes every 10 seconds.CREATE MATERIALIZED VIEW fast_mv5 REFRESH FAST NEXT now() + INTERVAL 10 second AS SELECT s.sale_id, c.customer_name, p.product_name, COUNT(*) AS cnt, SUM(s.price * s.quantity) AS revenue, SUM(p.unit_price) AS sum_p FROM sales s JOIN (SELECT customer_id, customer_name FROM customer) c ON c.customer_id = s.customer_id JOIN (SELECT * FROM product WHERE stock_quantity > 0) p ON p.product_id = s.product_id GROUP BY s.sale_id, c.customer_name, p.product_name;
Limitations
The following limitations apply to materialized views that use fast refresh:
-
For clusters with an engine version earlier than 3.2.3.0, a partitioned table cannot be a base table for a materialized view that uses fast refresh.
-
For clusters with an engine version earlier than 3.2.3.1,
INSERT OVERWRITEandTRUNCATEoperations are not supported on the base tables of a materialized view that uses fast refresh. Running these operations causes an error. -
Fast refresh supports only scheduled refreshes with an interval between 5 seconds (s) and 5 minutes (min).
-
The
query_bodyof a materialized view that uses fast refresh has the following limitations:-
A materialized view must guarantee that its results are identical to a direct query on its base tables and must support all DML changes. If the query_body does not support fast refresh, the
CREATE MATERIALIZED VIEWstatement returns an error. -
Conditions cannot contain non-deterministic expressions, such as
now()orrand(). -
Only the following aggregate functions are supported: COUNT, SUM, MAX, MIN, AVG, APPROX_DISTINCT, and COUNT(DISTINCT).
-
When the query_body uses the MAX, MIN, APPROX_DISTINCT, or COUNT(DISTINCT) aggregate function, only INSERT operations are allowed on the base tables. Data-deleting operations, such as DELETE, UPDATE, REPLACE, and INSERT ON DUPLICATE KEY UPDATE, are prohibited.
-
The DISTINCT keyword is not supported for aggregate functions other than COUNT(DISTINCT).
-
COUNT(DISTINCT) supports only the INTEGER type.
-
AVG does not support the DECIMAL type.
-
The HAVING keyword is not supported for aggregate operations.
-
Window functions are not supported.
-
Sorting operations are not supported.
-
Set operations such as UNION, EXCEPT, and INTERSECT are not supported.
-
-
Multi-table materialized views that use fast refresh have the following additional limitations:
-
Multi-table materialized views currently support only INNER JOIN.
-
By default, a multi-table materialized view can be joined with a maximum of five tables.
-
The join columns in a multi-table materialized view must be original columns from the tables, have the same data type, and each join column must have an index.
-
Manually refreshing materialized views
If a materialized view is created with the ON DEMAND refresh policy and no NEXT clause is defined, it does not refresh automatically. You must manually refresh the materialized view.
REFRESH MATERIALIZED VIEW <mv_name>;
After you submit a refresh request, the system adds the refresh job to a background queue. You can continue to perform other operations without waiting for the refresh to complete.
A return message of Query OK or Success indicates that the refresh job was successfully submitted to the queue.
Querying refresh records
Querying automatic refresh records
Run the following SQL statement to query the automatic refresh records for a specific materialized view, including the start time (start_time), end time (end_time), status (state), and process ID (process_id). For more information about the fields in the returned results, see Manage materialized views.
SELECT * FROM information_schema.mv_auto_refresh_jobs where mv_name = '<mv_name>';
Querying manual refresh records
-
To query manual refresh records from the last 30 days, you can use the SQL audit feature. When querying, enter the keyword
REFRESH MATERIALIZED VIEWmv_name to find information such as the time, duration, IP address, and database account for each manual refresh.The SQL audit feature must be enabled separately. SQL operations that occurred before it was enabled are not recorded in the audit logs.
-
To query manual and automatic refresh records from the last 14 days, you can use the SQL diagnostics and optimization feature. When querying, enter the name of the materialized view, such as
compl_mv1, to find information for all related SQL queries (including creation, manual refresh, automatic refresh, and alteration), such as the start time, database account, duration, and process ID.
Stopping a running refresh job
If a refresh job takes too long, you can stop it manually by using its process ID. If this fails, contact technical support.
Notes
If you stop a refresh job by using KILL PROCESS <process_id>;, note that the next refresh still triggers at the next scheduled time or at the next base table overwrite.
Related documents
-
Create a materialized view: Describes the use cases, key features, and limitations of materialized views.
-
CREATE MATERIALIZED VIEW: Provides the complete syntax.
-
Manage materialized views: Explains how to query materialized view definitions, query refresh records, alter materialized views, and delete materialized views.