Grouping Sets

更新时间:
复制 MD 格式

GROUPING SETS is an extension of the GROUP BY clause that lets you aggregate query results across multiple grouping dimensions in a single query. Without GROUPING SETS, you would need to execute multiple SELECT statements to achieve the same result.

OpenSearch High-performance Search Edition generates a simpler execution plan for GROUPING SETS queries, which improves query execution performance.

Syntax

SELECT column1, column2, aggregate_function(column3)
FROM table_name
GROUP BY GROUPING SETS (
    (grouping_set_1),
    (grouping_set_2),
    ...
)

Parameters

ParameterDescription
grouping_setA comma-separated list of columns enclosed in parentheses. An empty set () computes the grand total across all rows. When a grouping set has only one column, the parentheses are optional — GROUPING SETS ((brand)) is equivalent to GROUPING SETS (brand).

A GROUPING SETS query with the sets (a), (b), and () is semantically equivalent to:

SELECT a, NULL AS b, SUM(c) FROM t GROUP BY a
UNION ALL
SELECT NULL AS a, b, SUM(c) FROM t GROUP BY b
UNION ALL
SELECT NULL AS a, NULL AS b, SUM(c) FROM t

Example

Basic aggregation

The following query calculates phone prices aggregated by brand, by size, and as a grand total — all in one statement.

SELECT brand, size, SUM(price) AS sp
FROM phone
GROUP BY GROUPING SETS ((brand), (size), ())

The query uses three grouping sets:

  • (brand) — aggregates prices by brand; the size column is set to the default value

  • (size) — aggregates prices by size; the brand column is set to the default value

  • () — computes the grand total; both brand and size are set to the default value

Result:

BRAND     SIZE  sp
""        0.0   32936   <- grand total
""        1.4   169     <- aggregated by size
""        4.7   7897
""        5.0   899
""        5.5   14684
""        5.6   5688
""        5.9   3599
"Huawei"  0     7987    <- aggregated by brand
"Meizu"   0     1299
"Nokia"   0     169
"Xiaomi"  0     899
"OPPO"    0     2999
"Samsung" 0     5688
"Apple"   0     13895

Columns not part of a given grouping set are replaced with the default value.

Distinguish aggregation-generated default values with GROUPING()

A problem arises when the original data contains empty strings or null values. For example, if a phone's brand is stored as an empty string "" in the source data, the result above cannot distinguish that legitimate empty string from the default values the engine generates for the brand column when aggregating by size.

Use the GROUPING function to tell them apart:

SELECT brand, size, SUM(price) AS sp, GROUPING(brand, size), GROUPING(brand) AS g1
FROM phone
GROUP BY GROUPING SETS ((brand), (size), ()) LIMIT 20

Result:

brand     size  sp     GROUPING(brand,size)  g1
""        0.0   32936  3                     1
""        1.4   169    2                     1
""        4.7   7897   2                     1
""        5.0   899    2                     1
""        5.5   14684  2                     1
""        5.6   5688   2                     1
""        5.9   3599   2                     1
"Huawei"  0     7987   1                     0
"Meizu"   0     1299   1                     0
"Nokia"   0     169    1                     0
"Xiaomi"  0     899    1                     0
"OPPO"    0     2999   1                     0
"Samsung" 0     5688   1                     0
"Apple"   0     13895  1                     0

The GROUPING function

GROUPING(col1, ..., colN) returns an integer. Each bit in the binary representation corresponds to one input column, with the rightmost argument mapped to the least significant bit.

Bit valueMeaning
0The column is part of the current grouping set (value comes from the data)
1The column is NOT part of the current grouping set (value is aggregation-generated default value)

The integer result is the decimal equivalent of the binary bit string.

Reading the example results

GROUPING(brand, size) encodes two columns:

  • Grand total row (row 1, size = 0.0): both columns are excluded → binary 11 = decimal 3, so GROUPING(brand, size) = 3

  • Rows aggregated by size (rows 2–7, size = 1.4–5.9):

    • brand is excluded from the grouping set → bit = 1 (high bit)

    • size is included in the grouping set → bit = 0 (low bit)

    • Binary 10 = decimal 2, so GROUPING(brand, size) = 2

  • Rows aggregated by brand (rows 8–14):

    • brand is included → high bit = 0

    • size is excluded → low bit = 1

    • Binary 01 = decimal 1, so GROUPING(brand, size) = 1

GROUPING(brand) encodes a single column:

  • Rows 1–7 (grand total and size aggregations): brand is excluded → GROUPING(brand) = 1

  • Rows 8–14 (aggregated by brand): brand is included → GROUPING(brand) = 0

Usage notes

  • The GROUPING function accepts up to 32 input parameters.

  • A GROUPING result of 0 for a column means the value in that column comes from the actual data — it may still be a null value or an empty string if the source data contains those values.

  • A GROUPING result of 1 (or any non-zero value for multi-column calls) confirms the default value was generated by the aggregation engine, not present in the source data.