本文为您介绍UNION、UNOIN ALL、UNION DISTINCT并集,INTERSECT、INTERSECT ALL、INTERSECT DISTINCT交集,EXCEPT、EXCEPT ALL、EXCEPT DISTINCT补集等SQL语法。

语法格式如下。
select_statement UNION ALL select_statement;
select_statement UNION [DISTINCT] select_statement;
select_statement INTERSECT ALL select_statement;
select_statement INTERSECT [DISTINCT] select_statement;
select_statement EXCEPT ALL select_statement;
select_statement EXCEPT [DISTINCT] select_statement;
select_statement MINUS ALL select_statement;
select_statement MINUS [DISTINCT] select_statement;
  • UNION: 求两个数据集的并集,即将两个数据集合并成一个数据集。
  • INTERSECT:求两个数据集的交集,即输出两个数据集均包含的记录。
  • EXCEPT: 求第二个数据集在第一个数据集中的补集,即输出第一个数据集包含而第二个数据集不包含的记录。
  • MINUS: 等同于EXCEPT
示例
  • UNION ALL示例
    SELECT * FROM VALUES (1, 2), (1, 2), (3, 4) t(a, b) 
    UNION ALL 
    SELECT * FROM VALUES (1, 2), (1, 4) t(a, b);
    结果:将两个数据集合并。
    +------------+------------+
    | a          | b          |
    +------------+------------+
    | 1          | 2          |
    | 1          | 4          |
    | 1          | 2          |
    | 1          | 2          |
    | 3          | 4          |
    +------------+------------+
  • UNION DISTINCT示例
    SELECT * FROM VALUES (1, 2), (1, 2), (3, 4) t(a, b) 
    UNION 
    SELECT * FROM VALUES (1, 2), (1, 4) t(a, b);
    结果:等同于SELECT DISTINCT * FROM (< UNION ALL的结果 >) t;语句。
    +------------+------------+
    | a          | b          |
    +------------+------------+
    | 1          | 2          |
    | 1          | 4          |
    | 3          | 4          |
    +------------+------------+
  • INTERSECT ALL示例
    SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 6) t(a, b) 
    INTERSECT ALL 
    SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 7) t(a, b);
    结果:INTERSECT ALL不做去重。可以理解为相同的行,每行后面有个隐藏的序号,可用于将各行区分开显示。
    +------------+------------+
    | a          | b          |
    +------------+------------+
    | 1          | 2          |
    | 1          | 2          |
    | 3          | 4          |
    +------------+------------+
  • INTERSECT DISTINCT示例
    SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 6) t(a, b) 
    INTERSECT 
    SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 7) t(a, b);
    结果:等同于SELECT DISTINCT * FROM (< INTERSECT ALL的结果 >) t;语句。
    +------------+------------+
    | a          | b          |
    +------------+------------+
    | 1          | 2          |
    | 3          | 4          |
    +------------+------------+
  • EXCEPT ALL示例
    SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b) 
    EXCEPT ALL 
    SELECT * FROM VALUES (3, 4), (5, 6), (5, 6), (9, 10) t(a, b);
    结果:EXCEPT ALL不做去重。对于相同的行,每行后面有个隐藏的序号,将各行区分显示。
    +------------+------------+
    | a          | b          |
    +------------+------------+
    | 1          | 2          |
    | 1          | 2          |
    | 3          | 4          |
    | 7          | 8          |
    +------------+------------+
  • EXCEPT DISTINCT示例
    SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b) 
    EXCEPT
    SELECT * FROM VALUES (3, 4), (5, 6), (5, 6), (9, 10) t(a, b);
    结果:相当于SELECT DISTINCT * FROM left_branch EXCEPT ALL SELECT DISTINCT * FROM right_branch;
    +------------+------------+
    | a          | b          |
    +------------+------------+
    | 1          | 2          |
    | 7          | 8          |
    +------------+------------+
    说明
    • 集合操作的结果不一定会按序排列。
    • 集合操作左右两个分支要求列个数必须一致。如果类型不一致,可能会进行隐式类型转换。由于兼容性原因,STRING类型和非STRING类型在集合操作中的隐式转换已被禁用。
    • MaxCompute最多允许256个分支的集合操作,超出个数将报错。
    • UNION后如果有CLUSTER BYDISTRIBUTE BYSORT BYORDER BY或者LIMIT子句,当设置set odps.sql.type.system.odps2=false;时,其作用于UNION的最后一个select_statement;当set odps.sql.type.system.odps2=true;时,作用于前面所有UNION的结果。示例如下。
      set odps.sql.type.system.odps2=true;
      SELECT explode(array(3, 1)) AS (a) UNION ALL SELECT explode(array(0, 4, 2)) AS (a) ORDER BY a LIMIT 3;
      返回结果如下。
      +------+
      | a    |
      +------+
      | 0    |
      | 1    |
      | 2    |
      +------+