Materialized view query rewrite

更新时间:
复制 MD 格式

MaxCompute supports rewriting original SQL queries to use a materialized view if the queries contain filter conditions or certain operator types.

Usage notes

  • The core principle of materialized view query rewrite is that the materialized view must contain all data required by the query. This includes output columns and any columns used in filter conditions, aggregate functions, or JOIN conditions. A query cannot be rewritten if it requires columns that are not in the materialized view or if it uses an unsupported aggregate function.

  • To enable materialized view query rewrite, add the following configuration before your query statement:

    SET odps.sql.materialized.view.enable.auto.rewriting=true;

    Query rewrite is not supported when a materialized view is in an invalid state. In this case, the query runs directly against the source table without acceleration.

  • Cross-project rewrite

    By default, a MaxCompute project can only use its own materialized views for query rewrite. To use materialized views from other projects, you must specify a list of allowed MaxCompute projects by adding the following configuration before your query:

    SET odps.sql.materialized.view.source.project.white.list = <project_name1>,<project_name2>,<project_name3>;
  • To enable rewrites that use materialized views defined with LEFT/RIGHT JOIN or UNION ALL, add the following configuration before your query statement:

    SET odps.sql.materialized.view.enable.substitute.rewriting=true;

Supported operator types

The following table compares the query rewrite operator types supported by MaxCompute with those of other products.

Operator type

Classification

MaxCompute

BigQuery

Amazon Redshift

Hive

FILTER

Full expression match

Supported

Supported

Supported

Supported

Partial expression match

Supported

Supported

Supported

Supported

AGGREGATE

Single AGGREGATE

Supported

Supported

Supported

Supported

Multiple AGGREGATEs

Not supported

Not supported

Not supported

Not supported

JOIN

JOIN type

INNER JOIN

Not supported

INNER JOIN

INNER JOIN

Single JOIN

Supported

Not supported

Supported

Supported

Multiple JOINs

Supported

Not supported

Supported

Supported

AGGREGATE+JOIN

-

Supported

Not supported

Supported

Supported

Examples

Example 1: Rewrite with filter conditions

  1. Create a materialized view.

    CREATE MATERIALIZED VIEW mv AS SELECT a,b,c FROM src WHERE a>5;
  2. The following table provides rewrite examples for the materialized view.

    Original query

    Rewritten query

    SELECT a,b FROM src WHERE a>5;
    SELECT a,b FROM mv;
    SELECT a, b FROM src WHERE a=10;
    SELECT a,b FROM mv WHERE a=10;
    SELECT a, b FROM src WHERE a=10 AND b='3';
    SELECT a,b FROM mv WHERE a=10 AND b=3;
    SELECT a, b FROM src WHERE a>3;
    (SELECT a,b FROM src WHERE a>3 AND a<=5) UNION (SELECT a,b FROM mv);
    SELECT a, b FROM src WHERE a=10 AND d=4;

    Rewrite fails because the materialized view does not contain the column d.

    SELECT d, e FROM src WHERE a=10;

    Rewrite fails because the materialized view does not contain the columns d and e.

    SELECT a, b FROM src WHERE a=1;

    Rewrite fails because the materialized view does not contain data where a=1.

Example 2: Rewrite with aggregate functions

All aggregate functions can be rewritten if the materialized view and the query share the same aggregation key. If the aggregation keys differ, only rewrites using SUM, MIN, and MAX are supported.

  1. Create a materialized view.

    CREATE MATERIALIZED VIEW mv AS 
    SELECT a, b, sum(c) AS sum, count(d) AS cnt FROM src GROUP BY a, b;
  2. The following table shows how queries are rewritten based on the materialized view.

    Original query

    Rewritten query

    SELECT a, sum(c) FROM src GROUP BY a;
    SELECT a, sum(sum) FROM mv GROUP BY a;
    SELECT a, count(d) FROM src GROUP BY a, b;
    SELECT a, cnt FROM mv;
    SELECT a, count(b) FROM 
    (SELECT a, b FROM src GROUP BY a, b) GROUP BY a;
    SELECT a,count(b) FROM mv GROUP BY a;
    SELECT a,count(b) FROM mv GROUP BY a;

    Rewrite fails because the view has already aggregated columns a and b, so column b cannot be aggregated again.

    SELECT a, count(c) FROM src GROUP BY a;

    Rewrite fails because re-aggregation of the COUNT function is not supported.

If an aggregate function contains DISTINCT, the query can be rewritten only if the materialized view and the original query have the same aggregation key. Otherwise, the rewrite is not possible.

  1. Create a materialized view.

    CREATE MATERIALIZED VIEW mv AS 
    SELECT a, b, sum(DISTINCT c) AS sum, count(DISTINCT d) AS cnt FROM src GROUP BY a, b;
  2. The following table shows how queries are rewritten based on the materialized view.

    Original query

    Rewritten query

    SELECT a, count(DISTINCT d) FROM src GROUP BY a, b;
    SELECT a, cnt FROM mv;
    SELECT a, count(c) FROM src GROUP BY a, b;

    Rewrite fails because re-aggregation of the COUNT function is not supported.

    SELECT a, count(DISTINCT c) FROM src GROUP BY a;

    Rewrite fails because column a requires another aggregation.

Example 3: Rewrite with a JOIN clause

Rewrite JOIN inputs

  1. Create materialized views.

    CREATE MATERIALIZED VIEW mv1 AS SELECT a, b FROM j1 WHERE b > 10;
    CREATE MATERIALIZED VIEW mv2 AS SELECT a, b FROM j2 WHERE b > 10;
  2. The following table shows how queries are rewritten based on the materialized views.

    Original query

    Rewritten query

    SELECT j1.a,j1.b,j2.a FROM (SELECT a,b FROM j1 WHERE b > 10) j1 JOIN j2 ON j1.a=j2.a;
    SELECT mv1.a, mv1.b, j2.a FROM mv1 JOIN j2 ON mv1.a=j2.a;
    SELECT j1.a,j1.b,j2.a FROM 
    (SELECT a,b FROM j1 WHERE b > 10) j1 
    JOIN (SELECT a,b FROM j2 WHERE b > 10) j2 ON j1.a=j2.a;
    SELECT mv1.a,mv1.b,mv2.a FROM mv1 JOIN mv2 ON mv1.a=mv2.a;

JOIN with filter conditions

  1. Create materialized views.

    --Create a non-partitioned materialized view.
    CREATE MATERIALIZED VIEW mv1 AS SELECT j1.a, j1.b FROM j1 JOIN j2 ON j1.a=j2.a;
    CREATE MATERIALIZED VIEW mv2 AS SELECT j1.a, j1.b FROM j1 JOIN j2 ON j1.a=j2.a WHERE j1.a > 10;
    --Create a partitioned materialized view.
    CREATE MATERIALIZED VIEW mv LIFECYCLE 7 PARTITIONED BY (ds) AS SELECT t1.id, t1.ds AS ds FROM t1 JOIN t2 ON t1.id = t2.id;
  2. The following table shows how queries are rewritten based on the materialized views.

    Original query

    Rewritten query

    SELECT j1.a,j1.b FROM j1 JOIN j2 ON j1.a=j2.a WHERE j1.a=4;
    SELECT a, b FROM mv1 WHERE a=4;
    SELECT j1.a,j1.b FROM j1 JOIN j2 ON j1.a=j2.a WHERE j1.a > 20;
    SELECT a,b FROM mv2 WHERE a>20;
    SELECT j1.a,j1.b FROM j1 JOIN j2 ON j1.a=j2.a WHERE j1.a > 5;
    (SELECT j1.a,j1.b FROM j1 JOIN j2 ON j1.a=j2.a WHERE j1.a > 5 AND j1.a <= 10) 
    UNION SELECT * FROM mv2;
    SELECT key FROM t1 JOIN t2 ON t1.id= t2.id WHERE t1.ds='20210306';
    SELECT key FROM mv WHERE ds='20210306';
    SELECT key FROM t1 JOIN t2 ON t1.id= t2.id WHERE t1.ds>='20210306';
    SELECT key FROM mv WHERE ds>='20210306';
    SELECT j1.a,j1.b FROM j1 JOIN j2 ON j1.a=j2.a WHERE j2.a=4;

    Rewrite fails because the materialized view does not contain the column j2.a.

Extend a JOIN

  1. Create a materialized view.

    CREATE MATERIALIZED VIEW mv AS SELECT j1.a, j1.b FROM j1 JOIN j2 ON j1.a=j2.a;
  2. The following table shows how queries are rewritten based on the materialized view.

    Original query

    Rewritten query

    SELECT j1.a, j1.b FROM j1 JOIN j2 JOIN j3 ON j1.a=j2.a AND j1.a=j3.a;
    SELECT mv.a, mv.b FROM mv JOIN j3 ON mv.a=j3.a;
    SELECT j1.a, j1.b FROM j1 JOIN j2 JOIN j3 ON j1.a=j2.a AND j2.a=j3.a;
    SELECT mv.a,mv.b FROM mv JOIN j3 ON mv.a=j3.a;

These three JOIN rewrite scenarios can be combined.

Because the goal of materialized view query rewrite is to accelerate queries, MaxCompute prioritizes rewrite rules that offer the best performance. A rule is not applied if it introduces operations that would result in poor acceleration.

Example 4: Rewrite with a LEFT JOIN clause

  1. Create a materialized view.

    CREATE MATERIALIZED VIEW mv LIFECYCLE 7(
            user_id,
            job,
            total_amount
    ) AS SELECT t1.user_id, t1.job, sum(t2.order_amount) AS total_amount 
          FROM user_info AS t1 LEFT JOIN sale_order AS t2 ON t1.user_id=t2.user_id GROUP BY t1.user_id;
  2. The following table shows how a query is rewritten based on the materialized view.

    Original query

    Rewritten query

    SELECT t1.user_id, sum(t2.order_amount) AS total_amount 
    FROM user_info AS t1 
    LEFT JOIN sale_order AS t2 ON t1.user_id=t2.user_id 
    GROUP BY t1.user_id;
    SELECT user_id, total_amount FROM mv;

Example 5: Rewrite with a UNION ALL clause

  1. Create a materialized view.

    CREATE MATERIALIZED VIEW mv LIFECYCLE 7( 
            user_id, 
            tran_amount, 
            tran_date 
    ) AS SELECT user_id, tran_amount, tran_date FROM alipay_tran UNION ALL 
    SELECT user_id, tran_amount, tran_date FROM unionpay_tran;
  2. The following table shows how a query is rewritten based on the materialized view.

    Original query

    Rewritten query

    SELECT user_id, tran_amount FROM alipay_tran 
    UNION ALL SELECT user_id, tran_amount FROM unionpay_tran;
    SELECT user_id, tran_amount FROM mv;

Example 6: Use case

  1. Scenario

    Consider a page visit table named visit_records that logs the page ID, user ID, and visit time for each visit. A frequent analysis task is to count the number of visits for different pages.

    In this situation, you can create a materialized view on visit_records that groups by page ID and counts the visits for each page. You can then run subsequent queries against this materialized view.

    The structure of visit_records is as follows:

    +------------------------------------------------------------------------------------+
    | Field           | Type       | Label | Comment                                     |
    +------------------------------------------------------------------------------------+
    | page_id         | string     |       |                                             |
    | user_id         | string     |       |                                             |
    | visit_time      | string     |       |                                             |
    +------------------------------------------------------------------------------------+
  2. Create a materialized view.

    -- Create a materialized view for the visit_records table that groups by page ID and counts the visits for each page.
    
    CREATE MATERIALIZED VIEW count_mv AS SELECT page_id, count(*) FROM visit_records GROUP BY page_id;
  3. Run the following query:

    SET odps.sql.materialized.view.enable.auto.rewriting=true; 
    SELECT page_id, count(*) FROM visit_records GROUP BY page_id;

    When this query statement is executed, MaxCompute automatically matches the materialized view count_mv and reads the pre-aggregated data from count_mv.

  4. To verify that the query was rewritten using the materialized view, run the following EXPLAIN command:

    EXPLAIN SELECT page_id, count(*) FROM visit_records GROUP BY page_id;

    The following result is returned:

    job0 is root job
    
    In Job job0:
    root Tasks: M1
    
    In Task M1:
        Data source: doc_test_dev.count_mv
        TS: doc_test_dev.count_mv
            FS: output: Screen
                schema:
                  page_id (string)
                  _c1 (bigint)
    
    
    OK

    The Data source in the returned result shows that the table read by the query is the doc_test_dev project's count_mv. This indicates that the materialized view is effective and the query rewrite was successful.

Related documents