本文向您介绍UNOIN ALL、UNION DISTINCT并集,INTERSECT ALL、INTERSECT DISTINCT交集,EXCEPT ALL、EXCEPT DISTINCT补集等SQL语法和相关示例说明。

语法格式如下:
select_statement UNION ALL select_statement;
select_statement UNION [DISTINCT] select_statement;
select_statement INTERSECT ALL select_statement;
select_statement INTERSECT [DISTINCT] select_statement;
select_statement EXCEPT ALL select_statement;
select_statement EXCEPT [DISTINCT] select_statement;
select_statement MINUS ALL select_statement;
select_statement MINUS [DISTINCT] select_statement;
  • UNION: 求两个数据集的并集。即将两个数据集合并成一个数据集。
  • INTERSECT:求两个数据集的交集。即输出两个数据集均包含的记录。
  • EXCEPT: 求第二个数据集在第一个数据集中的补集。即输出第一个数据集包含而第二个数据集不包含的记录。
  • MINUS: 等同于EXCEPT。
示例:
  • UNOIN ALL示例:
    SELECT * FROM VALUES (1, 2), (1, 2), (3, 4) t(a, b) 
     UNION ALL 
    SELECT * FROM VALUES (1, 2), (1, 4) t(a, b);
    结果:UNOIN ALL是将两个数据集合并。
    +------------+------------+
      | a          | b          |
      +------------+------------+
      | 1          | 2          |
      | 1          | 4          |
      | 1          | 2          |
      | 1          | 2          |
      | 3          | 4          |
      +------------+------------+
  • UNION DISTINCT示例
    SELECT * FROM VALUES (1, 2), (1, 2), (3, 4) t(a, b) 
    UNION 
    SELECT * FROM VALUES (1, 2), (1, 4) t(a, b);
    结果:相当于 SELECT DISTINCT * FROM (< UNOIN ALL的结果 >) t;
    +------------+------------+
     | a          | b          |
     +------------+------------+
     | 1          | 2          |
     | 1          | 4          |
     | 3          | 4          |
     +------------+------------+
  • INTERSECT ALL示例
    SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 6) t(a, b) 
    INTERSECT ALL 
    SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 7) t(a, b);
    结果:INTERSECT ALL不做去重。可以理解为相同的行每行后面有个隐藏的序号,可以让各行区分开显示。
    +------------+------------+
     | a          | b          |
     +------------+------------+
     | 1          | 2          |
     | 1          | 2          |
     | 3          | 4          |
     +------------+------------+
  • INTERSECT DISTINCT示例
    SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 6) t(a, b) 
    INTERSECT 
    SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (5, 7) t(a, b);
    结果: SELECT DISTINCT * FROM (< INTERSECT ALL的结果 >) t;
    +------------+------------+
     | a          | b          |
     +------------+------------+
     | 1          | 2          |
     | 3          | 4          |
     +------------+------------+
  • EXCEPT ALL示例
    SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b) 
    EXCEPT ALL 
    SELECT * FROM VALUES (3, 4), (5, 6), (5, 6), (9, 10) t(a, b);
    结果:EXCEPT ALL不做去重。可以理解为相同的行每行后面有个隐藏的序号,让各行区分开显示。
    +------------+------------+
      | a          | b          |
      +------------+------------+
      | 1          | 2          |
      | 1          | 2          |
      | 3          | 4          |
      | 7          | 8          |
      +------------+------------+
  • EXCEPT DISTINCT示例
    SELECT * FROM VALUES (1, 2), (1, 2), (3, 4), (3, 4), (5, 6), (7, 8) t(a, b) 
    EXCEPT
    SELECT * FROM VALUES (3, 4), (5, 6), (5, 6), (9, 10) t(a, b);
    结果:相当于 SELECT DISTINCT * FROM left_branch EXCEPT ALL SELECT DISTINCT * FROM right_branch;
    +------------+------------+
    | a          | b          |
    +------------+------------+
    | 1          | 2          |
    | 7          | 8          |
    +------------+------------+
    说明
    • 集合操作的结果不保证排序。
    • 集合操作左右两个分支要求列个数必须一致。如果类型不一致,可能会插入隐式类型转换。由于兼容性原因,我们禁用了STRING类型和非STRING类型在集合操作中的隐式转换。
    • MaxCompute最多允许256个分支的集合操作,超出将报错。
    • UNION后如果有CLUSTER BY, DISTRIBUTE BY,SORT BY,ORDER BY或者LIMIT子句。当设置set odps.sql.type.system.odps2=false;时,其作用于UNION的最后一个select_statement;set odps.sql.type.system.odps2=true;时,作用于前面所有UNION的结果。例如:
      set odps.sql.type.system.odps2=true;
        SELECT explode(array(3, 1)) AS (a) UNION ALL SELECT explode(array(0, 4, 2)) AS (a) ORDER 
        BY a LIMIT 3;
      返回结果如下:
      +------+
       | a    |
       +------+
       | 0    |
       | 1    |
       | 2    |
       +------+