如何使用GROUPINGSETS进行多维聚合_云原生大数据计算服务 MaxCompute(MaxCompute)-阿里云帮助中心

功能介绍

grouping sets是对select语句中group by子句的扩展，允许您采用多种方式对结果分组，而不必使用多个select语句再union all来实现。这样能够使MaxCompute的引擎给出更有效的执行计划，从而提高执行性能。

与grouping sets相关联的语法如下。

类型	说明

类型	说明
`cube`	特殊的`grouping sets`，将指定列的所有可能组合作为`grouping sets`，也可以与`grouping sets`组合使用。 `group by cube (a, b, c) --等效于以下语句。 grouping sets ((a,b,c),(a,b),(a,c),(b,c),(a),(b),(c),()) group by cube ((a,b),(c,d)) --等效于以下语句。 grouping sets ( (a,b,c,d), (a,b), (c,d), () ) group by a, cube (b,c), grouping sets ((d),(e)) --等效于以下语句。 group by grouping sets ( (a, b, c, d), (a, b, c, e), (a, b, d), (a, b, e), (a, c, d), (a, c, e), (a, d), (a, e) )`
`rollup`	特殊的`grouping sets`，以按层级聚合的方式产生`grouping sets`，也可以与`grouping sets`组合使用。 `group by rollup (a,b,c) --等效价于以下语句。 grouping sets ((a,b,c),(a,b),(a),()) group by rollup (a,(b,c),d) --等效于以下语句。 grouping sets ( (a,b,c,d), (a,b,c), (a), () ) group by grouping sets((b),(c),rollup(a,b,c)) --等效于以下语句。 group by grouping sets ( (b),(c),(a,b,c),(a,b),(a),() )`
`grouping`	`grouping`用于区分占位符NULL与数据中的NULL。接受一个列名作为参数，如果结果行使用了参数列做聚合，返回0，否则返回1。
`grouping_id`	接受一个或多个列名作为参数。结果是将参数列的`grouping`结果按照`Bitmap`的方式组成整数。
`grouping__id`	`grouping__id`不带参数，用于兼容Hive查询。等价于`grouping_id（group by参数列表）`，参数与`group by`的顺序一致。说明 MaxCompute和Hive 2.3.0及以上版本兼容该函数，但在Hive 2.3.0以下版本中输出不一致，因此并不推荐使用此函数。

GROUPING SETS使用示例

grouping sets使用示例如下：

准备数据。

create table requests lifecycle 20 as 
select * from values 
    (1, 'windows', 'PC', 'Beijing'),
    (2, 'windows', 'PC', 'Shijiazhuang'),
    (3, 'linux', 'Phone', 'Beijing'),
    (4, 'windows', 'PC', 'Beijing'),
    (5, 'ios', 'Phone', 'Shijiazhuang'),
    (6, 'linux', 'PC', 'Beijing'),
    (7, 'windows', 'Phone', 'Shijiazhuang') 
as t(id, os, device, city);

对数据进行分组。您可以通过如下两种方式进行分组：

使用多个select语句进行分组。

select NULL, NULL, NULL, count(*)
from requests
union all
select os, device, NULL, count(*)
from requests group by os, device
union all
select null, null, city, count(*)
from requests group by city;

使用grouping sets进行分组。

select os,device, city ,count(*)
from requests
group by grouping sets((os, device), (city), ());

返回结果如下：

+------------+------------+------------+------------+
| os         | device     | city       | _c3        |
+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 7          |
| NULL       | NULL       | Beijing    | 4          |
| NULL       | NULL       | Shijiazhuang | 3          |
| ios        | Phone      | NULL       | 1          |
| linux      | PC         | NULL       | 1          |
| linux      | Phone      | NULL       | 1          |
| windows    | PC         | NULL       | 3          |
| windows    | Phone      | NULL       | 1          |
+------------+------------+------------+------------+

说明

分组集中不使用的表达式，会使用NULL充当占位符，使得这些结果集可以做操作。例如结果第4~8行的city列。

CUBE | ROLLUP使用示例

基于grouping sets示例表，cube及rollup使用示例如下：

示例1：通过cube枚举os、device、city的所有可能列为grouping sets。

select os,device, city, count(*)
from requests 
group by cube (os, device, city);
--等效于如下语句。
select os,device, city, count(*)
from requests 
group by grouping sets ((os, device, city),(os, device),(os, city),(device,city),(os),(device),(city),());

返回结果如下：

+------------+------------+------------+------------+
| os         | device     | city       | _c3        |
+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 7          |
| NULL       | NULL       | Beijing    | 4          |
| NULL       | NULL       | Shijiazhuang | 3          |
| NULL       | PC         | NULL       | 4          |
| NULL       | PC         | Beijing    | 3          |
| NULL       | PC         | Shijiazhuang | 1          |
| NULL       | Phone      | NULL       | 3          |
| NULL       | Phone      | Beijing    | 1          |
| NULL       | Phone      | Shijiazhuang | 2          |
| ios        | NULL       | NULL       | 1          |
| ios        | NULL       | Shijiazhuang | 1          |
| ios        | Phone      | NULL       | 1          |
| ios        | Phone      | Shijiazhuang | 1          |
| linux      | NULL       | NULL       | 2          |
| linux      | NULL       | Beijing    | 2          |
| linux      | PC         | NULL       | 1          |
| linux      | PC         | Beijing    | 1          |
| linux      | Phone      | NULL       | 1          |
| linux      | Phone      | Beijing    | 1          |
| windows    | NULL       | NULL       | 4          |
| windows    | NULL       | Beijing    | 2          |
| windows    | NULL       | Shijiazhuang | 2          |
| windows    | PC         | NULL       | 3          |
| windows    | PC         | Beijing    | 2          |
| windows    | PC         | Shijiazhuang | 1          |
| windows    | Phone      | NULL       | 1          |
| windows    | Phone      | Shijiazhuang | 1          |
+------------+------------+------------+------------+

示例2：通过cube枚举（os, device）,（device, city）所有可能列为grouping sets。

select os,device, city, count(*) 
from requests 
group by cube ((os, device), (device, city));
--等效于如下语句。
select os,device, city, count(*) 
from requests 
group by grouping sets ((os, device, city),(os, device),(device,city),());

返回结果如下：

+------------+------------+------------+------------+
| os         | device     | city       | _c3        |
+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 7          |
| NULL       | PC         | Beijing    | 3          |
| NULL       | PC         | Shijiazhuang | 1          |
| NULL       | Phone      | Beijing    | 1          |
| NULL       | Phone      | Shijiazhuang | 2          |
| ios        | Phone      | NULL       | 1          |
| ios        | Phone      | Shijiazhuang | 1          |
| linux      | PC         | NULL       | 1          |
| linux      | PC         | Beijing    | 1          |
| linux      | Phone      | NULL       | 1          |
| linux      | Phone      | Beijing    | 1          |
| windows    | PC         | NULL       | 3          |
| windows    | PC         | Beijing    | 2          |
| windows    | PC         | Shijiazhuang | 1          |
| windows    | Phone      | NULL       | 1          |
| windows    | Phone      | Shijiazhuang | 1          |
+------------+------------+------------+------------+

示例3：通过rollup对os、device、city以按层级聚合的方式产生grouping sets。

select os,device, city, count(*)
from requests 
group by rollup (os, device, city);
--等效于如下语句。
select os,device, city, count(*)
from requests 
group by grouping sets ((os, device, city),(os, device),(os),());

返回结果如下：

+------------+------------+------------+------------+
| os         | device     | city       | _c3        |
+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 7          |
| ios        | NULL       | NULL       | 1          |
| ios        | Phone      | NULL       | 1          |
| ios        | Phone      | Shijiazhuang | 1          |
| linux      | NULL       | NULL       | 2          |
| linux      | PC         | NULL       | 1          |
| linux      | PC         | Beijing    | 1          |
| linux      | Phone      | NULL       | 1          |
| linux      | Phone      | Beijing    | 1          |
| windows    | NULL       | NULL       | 4          |
| windows    | PC         | NULL       | 3          |
| windows    | PC         | Beijing    | 2          |
| windows    | PC         | Shijiazhuang | 1          |
| windows    | Phone      | NULL       | 1          |
| windows    | Phone      | Shijiazhuang | 1          |
+------------+------------+------------+------------+

示例4：通过rollup对os, (os,device), city以按层级聚合的方式产生grouping sets。

select os,device, city, count(*)
from requests 
group by rollup (os, (os,device), city);
--等效于如下语句。
select os,device, city, count(*)
from requests 
group by grouping sets ((os, device, city),(os, device),(os),());

返回结果如下：

+------------+------------+------------+------------+
| os         | device     | city       | _c3        |
+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 7          |
| ios        | NULL       | NULL       | 1          |
| ios        | Phone      | NULL       | 1          |
| ios        | Phone      | Shijiazhuang | 1          |
| linux      | NULL       | NULL       | 2          |
| linux      | PC         | NULL       | 1          |
| linux      | PC         | Beijing    | 1          |
| linux      | Phone      | NULL       | 1          |
| linux      | Phone      | Beijing    | 1          |
| windows    | NULL       | NULL       | 4          |
| windows    | PC         | NULL       | 3          |
| windows    | PC         | Beijing    | 2          |
| windows    | PC         | Shijiazhuang | 1          |
| windows    | Phone      | NULL       | 1          |
| windows    | Phone      | Shijiazhuang | 1          |
+------------+------------+------------+------------+

示例5：通过group by、cube、grouping sets组合产生grouping sets。

select os,device, city, count(*)
from requests 
group by os, cube(os,device), grouping sets(city);
--等效于如下语句。
select os,device, city, count(*)
from requests 
group by grouping sets((os,device,city),(os,city),(os,device,city));

返回结果如下：

+------------+------------+------------+------------+
| os         | device     | city       | _c3        |
+------------+------------+------------+------------+
| ios        | NULL       | Shijiazhuang | 1          |
| ios        | Phone      | Shijiazhuang | 1          |
| linux      | NULL       | Beijing    | 2          |
| linux      | PC         | Beijing    | 1          |
| linux      | Phone      | Beijing    | 1          |
| windows    | NULL       | Beijing    | 2          |
| windows    | NULL       | Shijiazhuang | 2          |
| windows    | PC         | Beijing    | 2          |
| windows    | PC         | Shijiazhuang | 1          |
| windows    | Phone      | Shijiazhuang | 1          |
+------------+------------+------------+------------+

GROUPING | GROUPING_ID使用示例

grouping及grouping_id使用示例如下：

select a,b,c,count(*),
grouping(a) ga, grouping(b) gb, grouping(c) gc, grouping_id(a,b,c) groupingid 
from values (1,2,3) as t(a,b,c)
group by cube(a,b,c);

返回结果如下：

+------------+------------+------------+------------+------------+------------+------------+------------+
| a          | b          | c          | _c3        | ga         | gb         | gc         | groupingid |
+------------+------------+------------+------------+------------+------------+------------+------------+
| NULL       | NULL       | NULL       | 1          | 1          | 1          | 1          | 7          |
| NULL       | NULL       | 3          | 1          | 1          | 1          | 0          | 6          |
| NULL       | 2          | NULL       | 1          | 1          | 0          | 1          | 5          |
| NULL       | 2          | 3          | 1          | 1          | 0          | 0          | 4          |
| 1          | NULL       | NULL       | 1          | 0          | 1          | 1          | 3          |
| 1          | NULL       | 3          | 1          | 0          | 1          | 0          | 2          |
| 1          | 2          | NULL       | 1          | 0          | 0          | 1          | 1          |
| 1          | 2          | 3          | 1          | 0          | 0          | 0          | 0          |
+------------+------------+------------+------------+------------+------------+------------+------------+

默认情况，group by列表中未使用的列会被填充为NULL。您可以通过grouping输出更有实际意义的值。基于grouping sets示例表，命令示例如下：

select
  if(grouping(os) == 0, os, 'ALL') as os,
  if(grouping(device) == 0, device, 'ALL') as device,
  if(grouping(city) == 0, city, 'ALL') as city, 
  count(*) as count 
from requests 
group by os, device, city grouping sets((os, device), (city), ());

返回结果如下：

+------------+------------+------------+------------+
| os         | device     | city       | count      |
+------------+------------+------------+------------+
| ALL        | ALL        | ALL        | 7          |
| ALL        | ALL        | Beijing    | 4          |
| ALL        | ALL        | Shijiazhuang | 3          |
| ios        | Phone      | ALL        | 1          |
| linux      | PC         | ALL        | 1          |
| linux      | Phone      | ALL        | 1          |
| windows    | PC         | ALL        | 3          |
| windows    | Phone      | ALL        | 1          |
+------------+------------+------------+------------+

GROUPING__ID使用示例

grouping__id不带参数的命令示例如下：

set odps.sql.hive.compatible=true;
select      
a, b, c, count(*), grouping__id 
from values (1,2,3) as t(a,b,c) 
group by a, b, c grouping sets ((a,b,c), (a));
--等效于如下语句。
select      
a, b, c, count(*), grouping_id(a,b,c)  
from values (1,2,3) as t(a,b,c) 
group by a, b, c grouping sets ((a,b,c), (a));

返回结果如下：

+------------+------------+------------+------------+------------+
| a          | b          | c          | _c3        | _c4        |
+------------+------------+------------+------------+------------+
| 1          | NULL       | NULL       | 1          | 3          |
| 1          | 2          | 3          | 1          | 0          |
+------------+------------+------------+------------+------------+