SUBQUERY_MAPJOIN HINT

MaxCompute支持子查询操作,部分子查询在执行过程中会被转换成JOIN进行计算。您可以在子查询SUBQUERY语句中使用SUBQUERY_MAPJOIN HINT,以显式指定使用MAPJOIN算法,从而提升子查询的执行效率。本文为您介绍如何使用SUBQUERY_MAPJOIN HINT。

使用限制

  • 仅适用于SCALAR、IN、NOT IN、EXISTS和NOT EXISTS子查询,不适用于基础子查询。子查询详情请参见子查询(SUBQUERY)

  • 使用SUBQUERY_MAPJOIN HINT时不需要手动指定小表,系统默认以SUBQUERY的计算结果作为MAPJOIN的小表。

    重要

    若SUBQUERY的计算结果数据量过大,可能会导致MAPJOIN运行时出现内存溢出(Out of Memory)的情况,请确认SUBQUERY的计算结果为小表时再使用此HINT。MAPJOIN的使用限制请参见使用限制

  • 部分场景下,子查询不会被转换成JOIN来执行,此时使用SUBQUERY_MAPJOIN HINT时,系统会输出Warning进行提示。具体用法请参见示例5

使用方法

在SUBQUERY语句中,您需要使用HINT提示/*+ subquery_mapjoin */来指定MAPJOIN算法的执行,且HINT必须紧贴于SUBQUERY对应的左括号之后书写。用法如下。

假设t1、t2两张表的定义如下:

CREATE TABLE t1(a BIGINT, b BIGINT);
CREATE TABLE t2(a BIGINT, b BIGINT);
  • SCALAR SUBQUERY

    SELECT a,
           (/*+ subquery_mapjoin */ SELECT b FROM t2 WHERE a = t1.a)
    FROM t1;
  • IN和NOT IN SUBQUERY

    SELECT * FROM t1 WHERE a IN (/*+ subquery_mapjoin */ SELECT a FROM t2 WHERE b = t1.b);
    SELECT * FROM t1 WHERE a NOT IN (/*+ subquery_mapjoin */ SELECT a FROM t2 WHERE b = t1.b);
  • EXISTS和NOT EXISTS SUBQUERY

    SELECT * FROM t1 WHERE EXISTS (/*+ subquery_mapjoin */ SELECT * FROM t2 WHERE b = t1.b);
    SELECT * FROM t1 WHERE NOT EXISTS (/*+ subquery_mapjoin */ SELECT * FROM t2 WHERE b = t1.b);
  • 错误写法

    -- 如下代码为错误格式,SUBQUERY_MAPJOIN HINT没有紧贴于SUBQUERY对应的左括号之后书写
    SELECT * FROM t1 WHERE a IN (SELECT /*+ subquery_mapjoin */ a FROM t2 WHERE b = t1.b);

示例数据

为便于理解,本文为您提供源数据,基于源数据提供相关示例。

  1. 创建表sale_detail和shop_detail,并添加数据,命令示例如下:

    -- 创建一张分区表sale_detail
    CREATE TABLE if NOT EXISTS sale_detail
    (
    shop_name     STRING,
    customer_id   STRING,
    total_price   DOUBLE
    )
    PARTITIONED BY (sale_date STRING, region STRING);
    
    -- 向sale_detail表增加分区
    ALTER TABLE sale_detail ADD PARTITION (sale_date='2013', region='china') PARTITION (sale_date='2014', region='shanghai');
    
    -- 向sale_detail表追加数据
    INSERT INTO sale_detail PARTITION (sale_date='2013', region='china') VALUES ('s1','c1',100.1),('s2','c2',100.2),('s3','c3',100.3);
    INSERT INTO sale_detail PARTITION (sale_date='2014', region='shanghai') VALUES ('null','c5',null),('s6','c6',100.4),('s7','c7',100.5);
    
    -- 创建一张shop_detail表,插入sale_detail表分区为2013下的数据
    SET odps.sql.allow.fullscan=true;
    CREATE TABLE shop_detail AS SELECT shop_name,customer_id,total_price FROM sale_detail WHERE sale_date='2013'AND region='china'; 
  2. 查询分区表sale_detail中的数据,命令示例如下:

    SET odps.sql.allow.fullscan=true;
    SELECT * FROM sale_detail; 

    返回结果如下:

    +------------+-------------+-------------+------------+------------+
    | shop_name  | customer_id | total_price | sale_date  | region     |
    +------------+-------------+-------------+------------+------------+
    | s1         | c1          | 100.1       | 2013       | china      |
    | s2         | c2          | 100.2       | 2013       | china      |
    | s3         | c3          | 100.3       | 2013       | china      |
    | null       | c5          | NULL        | 2014       | shanghai   |
    | s6         | c6          | 100.4       | 2014       | shanghai   |
    | s7         | c7          | 100.5       | 2014       | shanghai   |
    +------------+-------------+-------------+------------+------------+
  3. 查询shop_detail表中的数据,命令示例如下:

    SELECT * FROM shop_detail; 

    返回结果如下:

    +------------+-------------+-------------+
    | shop_name  | customer_id | total_price |
    +------------+-------------+-------------+
    | s1         | c1          | 100.1       |
    | s2         | c2          | 100.2       |
    | s3         | c3          | 100.3       |
    +------------+-------------+-------------+

使用示例

下述示例均基于示例数据为您介绍SUBQUERY_MAPJOIN HINT的使用。

示例1:SCALAR SUBQUERY使用SUBQUERY_MAPJOIN HINT

SET odps.sql.allow.fullscan=true;
SELECT * FROM shop_detail WHERE (/*+ subquery_mapjoin */ SELECT COUNT(*) FROM sale_detail WHERE sale_detail.shop_name = shop_detail.shop_name) >= 1;

返回结果如下:

+------------+-------------+-------------+
| shop_name  | customer_id | total_price |
+------------+-------------+-------------+
| s1         | c1          | 100.1       |
| s2         | c2          | 100.2       |
| s3         | c3          | 100.3       |
+------------+-------------+-------------+

示例2:IN子 SUBQUERY使用SUBQUERY_MAPJOIN HINT

SET odps.sql.allow.fullscan=true;
SELECT * FROM sale_detail WHERE total_price IN (/*+ subquery_mapjoin */ SELECT total_price FROM shop_detail WHERE customer_id = shop_detail.customer_id);

返回结果如下:

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

示例3:EXISTS SUBQUERY使用SUBQUERY_MAPJOIN HINT

SET odps.sql.allow.fullscan=true;
SELECT * FROM sale_detail WHERE EXISTS (/*+ subquery_mapjoin */ SELECT * FROM shop_detail WHERE customer_id = sale_detail.customer_id);

返回结果如下:

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

示例4:不支持基础子查询指定使用SUBQUERY_MAPJOIN HINT

如下Query会报错,因为subquery_mapjoin不支持基础子查询。

SET odps.sql.allow.fullscan=true;
SELECT * FROM (/*+ subquery_mapjoin */ SELECT shop_name FROM sale_detail) a;

返回结果如下:

-- 报错信息
FAILED: ODPS-0130161:[1,16] Parse exception - invalid subquery_mapjoin hint, should only be used for scalar/in/exists subquery

示例5:未转成JOIN的子查询,系统会在输出查询结果的同时,进行报错提示(即WARNING信息)

-- 在表shop_detail中增加一列,并添加数据
ALTER TABLE shop_detail ADD columns if not exists(sale_date STRING);
INSERT OVERWRITE TABLE shop_detail VALUES ('s1','c1',100.1,'2013'),('s2','c2',100.2,'2013'),('s3','c3',100.3,'2013');

-- 该flag允许系统输出warning,假如project的默认配置已经满足要求,则不需要设置
SET odps.compiler.warning.disable=false;

/** 以下查询由于涉及到分区表,系统为了支持分区裁剪,没有把subquery转成join,
使用SUBQUERY_MAPJOIN HINT时,系统会输出waring信息 **/
SELECT * FROM sale_detail WHERE sale_date IN (/*+ subquery_mapjoin */ SELECT sale_date FROM shop_detail);

返回结果如下:

+------------+-------------+-------------+------------+------------+
| shop_name  | customer_id | total_price | sale_date  | region     |
+------------+-------------+-------------+------------+------------+
| s1         | c1          | 100.1       | 2013       | china      |
| s2         | c2          | 100.2       | 2013       | china      |
| s3         | c3          | 100.3       | 2013       | china      |
+------------+-------------+-------------+------------+------------+

WARNING信息如下:

WARNING:[1,47]  subquery_mapjoin hint does not work because the subquery is not converted to join