MaxCompute supports rewriting original SQL queries to use a materialized view if the queries contain filter conditions or certain operator types.
Usage notes
The core principle of materialized view query rewrite is that the materialized view must contain all data required by the query. This includes output columns and any columns used in filter conditions, aggregate functions, or JOIN conditions. A query cannot be rewritten if it requires columns that are not in the materialized view or if it uses an unsupported aggregate function.
To enable materialized view query rewrite, add the following configuration before your query statement:
SET odps.sql.materialized.view.enable.auto.rewriting=true;Query rewrite is not supported when a materialized view is in an invalid state. In this case, the query runs directly against the source table without acceleration.
Cross-project rewrite
By default, a MaxCompute project can only use its own materialized views for query rewrite. To use materialized views from other projects, you must specify a list of allowed MaxCompute projects by adding the following configuration before your query:
SET odps.sql.materialized.view.source.project.white.list = <project_name1>,<project_name2>,<project_name3>;To enable rewrites that use materialized views defined with
LEFT/RIGHT JOINorUNION ALL, add the following configuration before your query statement:SET odps.sql.materialized.view.enable.substitute.rewriting=true;
Supported operator types
The following table compares the query rewrite operator types supported by MaxCompute with those of other products.
Operator type | Classification | MaxCompute | BigQuery | Amazon Redshift | Hive |
FILTER | Full expression match | Supported | Supported | Supported | Supported |
Partial expression match | Supported | Supported | Supported | Supported | |
AGGREGATE | Single AGGREGATE | Supported | Supported | Supported | Supported |
Multiple AGGREGATEs | Not supported | Not supported | Not supported | Not supported | |
JOIN | JOIN type | INNER JOIN | Not supported | INNER JOIN | INNER JOIN |
Single JOIN | Supported | Not supported | Supported | Supported | |
Multiple JOINs | Supported | Not supported | Supported | Supported | |
AGGREGATE+JOIN | - | Supported | Not supported | Supported | Supported |
Examples
Example 1: Rewrite with filter conditions
Create a materialized view.
CREATE MATERIALIZED VIEW mv AS SELECT a,b,c FROM src WHERE a>5;The following table provides rewrite examples for the materialized view.
Original query
Rewritten query
SELECT a,b FROM src WHERE a>5;SELECT a,b FROM mv;SELECT a, b FROM src WHERE a=10;SELECT a,b FROM mv WHERE a=10;SELECT a, b FROM src WHERE a=10 AND b='3';SELECT a,b FROM mv WHERE a=10 AND b=3;SELECT a, b FROM src WHERE a>3;(SELECT a,b FROM src WHERE a>3 AND a<=5) UNION (SELECT a,b FROM mv);SELECT a, b FROM src WHERE a=10 AND d=4;Rewrite fails because the materialized view does not contain the column
d.SELECT d, e FROM src WHERE a=10;Rewrite fails because the materialized view does not contain the columns
dande.SELECT a, b FROM src WHERE a=1;Rewrite fails because the materialized view does not contain data where
a=1.
Example 2: Rewrite with aggregate functions
All aggregate functions can be rewritten if the materialized view and the query share the same aggregation key. If the aggregation keys differ, only rewrites using SUM, MIN, and MAX are supported.
Create a materialized view.
CREATE MATERIALIZED VIEW mv AS SELECT a, b, sum(c) AS sum, count(d) AS cnt FROM src GROUP BY a, b;The following table shows how queries are rewritten based on the materialized view.
Original query
Rewritten query
SELECT a, sum(c) FROM src GROUP BY a;SELECT a, sum(sum) FROM mv GROUP BY a;SELECT a, count(d) FROM src GROUP BY a, b;SELECT a, cnt FROM mv;SELECT a, count(b) FROM (SELECT a, b FROM src GROUP BY a, b) GROUP BY a;SELECT a,count(b) FROM mv GROUP BY a;SELECT a,count(b) FROM mv GROUP BY a;Rewrite fails because the view has already aggregated columns
aandb, so columnbcannot be aggregated again.SELECT a, count(c) FROM src GROUP BY a;Rewrite fails because re-aggregation of the
COUNTfunction is not supported.
If an aggregate function contains DISTINCT, the query can be rewritten only if the materialized view and the original query have the same aggregation key. Otherwise, the rewrite is not possible.
Create a materialized view.
CREATE MATERIALIZED VIEW mv AS SELECT a, b, sum(DISTINCT c) AS sum, count(DISTINCT d) AS cnt FROM src GROUP BY a, b;The following table shows how queries are rewritten based on the materialized view.
Original query
Rewritten query
SELECT a, count(DISTINCT d) FROM src GROUP BY a, b;SELECT a, cnt FROM mv;SELECT a, count(c) FROM src GROUP BY a, b;Rewrite fails because re-aggregation of the
COUNTfunction is not supported.SELECT a, count(DISTINCT c) FROM src GROUP BY a;Rewrite fails because column
arequires another aggregation.
Example 3: Rewrite with a JOIN clause
Rewrite JOIN inputs
Create materialized views.
CREATE MATERIALIZED VIEW mv1 AS SELECT a, b FROM j1 WHERE b > 10; CREATE MATERIALIZED VIEW mv2 AS SELECT a, b FROM j2 WHERE b > 10;The following table shows how queries are rewritten based on the materialized views.
Original query
Rewritten query
SELECT j1.a,j1.b,j2.a FROM (SELECT a,b FROM j1 WHERE b > 10) j1 JOIN j2 ON j1.a=j2.a;SELECT mv1.a, mv1.b, j2.a FROM mv1 JOIN j2 ON mv1.a=j2.a;SELECT j1.a,j1.b,j2.a FROM (SELECT a,b FROM j1 WHERE b > 10) j1 JOIN (SELECT a,b FROM j2 WHERE b > 10) j2 ON j1.a=j2.a;SELECT mv1.a,mv1.b,mv2.a FROM mv1 JOIN mv2 ON mv1.a=mv2.a;
JOIN with filter conditions
Create materialized views.
--Create a non-partitioned materialized view. CREATE MATERIALIZED VIEW mv1 AS SELECT j1.a, j1.b FROM j1 JOIN j2 ON j1.a=j2.a; CREATE MATERIALIZED VIEW mv2 AS SELECT j1.a, j1.b FROM j1 JOIN j2 ON j1.a=j2.a WHERE j1.a > 10; --Create a partitioned materialized view. CREATE MATERIALIZED VIEW mv LIFECYCLE 7 PARTITIONED BY (ds) AS SELECT t1.id, t1.ds AS ds FROM t1 JOIN t2 ON t1.id = t2.id;The following table shows how queries are rewritten based on the materialized views.
Original query
Rewritten query
SELECT j1.a,j1.b FROM j1 JOIN j2 ON j1.a=j2.a WHERE j1.a=4;SELECT a, b FROM mv1 WHERE a=4;SELECT j1.a,j1.b FROM j1 JOIN j2 ON j1.a=j2.a WHERE j1.a > 20;SELECT a,b FROM mv2 WHERE a>20;SELECT j1.a,j1.b FROM j1 JOIN j2 ON j1.a=j2.a WHERE j1.a > 5;(SELECT j1.a,j1.b FROM j1 JOIN j2 ON j1.a=j2.a WHERE j1.a > 5 AND j1.a <= 10) UNION SELECT * FROM mv2;SELECT key FROM t1 JOIN t2 ON t1.id= t2.id WHERE t1.ds='20210306';SELECT key FROM mv WHERE ds='20210306';SELECT key FROM t1 JOIN t2 ON t1.id= t2.id WHERE t1.ds>='20210306';SELECT key FROM mv WHERE ds>='20210306';SELECT j1.a,j1.b FROM j1 JOIN j2 ON j1.a=j2.a WHERE j2.a=4;Rewrite fails because the materialized view does not contain the column
j2.a.
Extend a JOIN
Create a materialized view.
CREATE MATERIALIZED VIEW mv AS SELECT j1.a, j1.b FROM j1 JOIN j2 ON j1.a=j2.a;The following table shows how queries are rewritten based on the materialized view.
Original query
Rewritten query
SELECT j1.a, j1.b FROM j1 JOIN j2 JOIN j3 ON j1.a=j2.a AND j1.a=j3.a;SELECT mv.a, mv.b FROM mv JOIN j3 ON mv.a=j3.a;SELECT j1.a, j1.b FROM j1 JOIN j2 JOIN j3 ON j1.a=j2.a AND j2.a=j3.a;SELECT mv.a,mv.b FROM mv JOIN j3 ON mv.a=j3.a;
These three JOIN rewrite scenarios can be combined.
Because the goal of materialized view query rewrite is to accelerate queries, MaxCompute prioritizes rewrite rules that offer the best performance. A rule is not applied if it introduces operations that would result in poor acceleration.
Example 4: Rewrite with a LEFT JOIN clause
Create a materialized view.
CREATE MATERIALIZED VIEW mv LIFECYCLE 7( user_id, job, total_amount ) AS SELECT t1.user_id, t1.job, sum(t2.order_amount) AS total_amount FROM user_info AS t1 LEFT JOIN sale_order AS t2 ON t1.user_id=t2.user_id GROUP BY t1.user_id;The following table shows how a query is rewritten based on the materialized view.
Original query
Rewritten query
SELECT t1.user_id, sum(t2.order_amount) AS total_amount FROM user_info AS t1 LEFT JOIN sale_order AS t2 ON t1.user_id=t2.user_id GROUP BY t1.user_id;SELECT user_id, total_amount FROM mv;
Example 5: Rewrite with a UNION ALL clause
Create a materialized view.
CREATE MATERIALIZED VIEW mv LIFECYCLE 7( user_id, tran_amount, tran_date ) AS SELECT user_id, tran_amount, tran_date FROM alipay_tran UNION ALL SELECT user_id, tran_amount, tran_date FROM unionpay_tran;The following table shows how a query is rewritten based on the materialized view.
Original query
Rewritten query
SELECT user_id, tran_amount FROM alipay_tran UNION ALL SELECT user_id, tran_amount FROM unionpay_tran;SELECT user_id, tran_amount FROM mv;
Example 6: Use case
Scenario
Consider a page visit table named
visit_recordsthat logs the page ID, user ID, and visit time for each visit. A frequent analysis task is to count the number of visits for different pages.In this situation, you can create a materialized view on
visit_recordsthat groups by page ID and counts the visits for each page. You can then run subsequent queries against this materialized view.The structure of
visit_recordsis as follows:+------------------------------------------------------------------------------------+ | Field | Type | Label | Comment | +------------------------------------------------------------------------------------+ | page_id | string | | | | user_id | string | | | | visit_time | string | | | +------------------------------------------------------------------------------------+Create a materialized view.
-- Create a materialized view for the visit_records table that groups by page ID and counts the visits for each page. CREATE MATERIALIZED VIEW count_mv AS SELECT page_id, count(*) FROM visit_records GROUP BY page_id;Run the following query:
SET odps.sql.materialized.view.enable.auto.rewriting=true; SELECT page_id, count(*) FROM visit_records GROUP BY page_id;When this query statement is executed, MaxCompute automatically matches the materialized view
count_mvand reads the pre-aggregated data fromcount_mv.To verify that the query was rewritten using the materialized view, run the following
EXPLAINcommand:EXPLAIN SELECT page_id, count(*) FROM visit_records GROUP BY page_id;The following result is returned:
job0 is root job In Job job0: root Tasks: M1 In Task M1: Data source: doc_test_dev.count_mv TS: doc_test_dev.count_mv FS: output: Screen schema: page_id (string) _c1 (bigint) OKThe
Data sourcein the returned result shows that the table read by the query is thedoc_test_devproject'scount_mv. This indicates that the materialized view is effective and the query rewrite was successful.
Related documents
For more information about materialized view operations, see Materialized view operations.
For more information about the scheduled update feature for materialized views, see Scheduled updates for materialized views.