全部产品

窗口函数

更新时间:2019-08-06 13:44:36

  • CUME_DIST:返回一组数值中每个值的累计分布。
  • RANK:返回数据集中每个值的排名。
  • DENSE_RANK:返回一组数值中每个数值的排名
  • NTILE:将每个窗口分区的数据分散到桶号从1到n的n个桶中。
  • ROW_NUMBER:根据行在窗口分区内的顺序,为每行数据返回一个唯一的有序行号,行号从1开始。
  • PERCENT_RANK:返回数据集中每个数据的排名百分比,其结果由(r - 1) / (n - 1)决定。其中,r为RANK()计算的当前行排名, n为当前窗口分区内总的行数。
  • FIRST_VALUE:返回窗口分区第一行的值。
  • LAST_VALUE:返回窗口分区最后一行的值。
  • LAG:返回窗口内距离当前行之前偏移offset后的值。
  • LEAD:返回窗口内距离当前行偏移offset后的值。
  • NTH_VALUE:返回窗口内偏移指定offset后的值,偏移量从1开始。
    • 概述

      窗口函数基于查询结果的行数据进行计算,窗口函数运行在HAVING子句之后、 ORDER BY子句之前。窗口函数需要特殊的关键字OVER子句来指定窗口即触发一个窗口函数。

      分析型数据库MySQL版支持三种类型的窗口函数:聚合函数、排序函数和值函数。

      语法

      1. function over (partition by a order by b RANGE|ROWS BETWEEN start AND end)

      窗口函数包含以下三个部分。

      • 分区规范:用于将输入行分散到不同的分区中,过程和GROUP BY子句的分散过程相似。

      • 排序规范:决定输入数据行在窗口函数中执行的顺序。

      • 窗口区间:指定计算数据的窗口边界。

        窗口区间支持RANGEROWS两种模式:

        • RANGE按照计算列值的范围进行定义。

        • ROWS按照计算列的行数进行范围定义。

        • RANGEROWS中可以使用BETWEEN start AND end指定边界可取值。BETWEEN start AND end取值为:

          • CURRENT ROW:当前行。
          • N PRECEDING:前n行。
          • UNBOUNDED PRECEDING:直到第1行。
          • N FOLLOWING:后n行。
          • UNBOUNDED FOLLOWING:直到最后1行。

      例如,以下查询根据当前窗口的每行数据计算profit的部分总和。

      1. select year,country,profit,sum(profit) over (partition by country order by year ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW) as slidewindow from testwindow;
      2. +------+---------+--------+-------------+
      3. | year | country | profit | slidewindow |
      4. +------+---------+--------+-------------+
      5. | 2001 | USA | 50 | 50 |
      6. | 2001 | USA | 1500 | 1550 |
      7. | 2000 | India | 75 | 75 |
      8. | 2000 | India | 75 | 150 |
      9. | 2001 | India | 79 | 229 |
      10. | 2000 | Finland | 1500 | 1500 |
      11. | 2001 | Finland | 10 | 1510 |

      而以下查询只能计算出profit的总和。

      1. select country,sum(profit) over (partition by country) from testwindow;
      2. +---------+-----------------------------------------+
      3. | country | sum(profit) OVER (PARTITION BY country) |
      4. +---------+-----------------------------------------+
      5. | India | 229 |
      6. | India | 229 |
      7. | India | 229 |
      8. | USA | 1550 |
      9. | USA | 1550 |
      10. | Finland | 1510 |
      11. | Finland | 1510 |

      注意事项

      边界值的取值有如下要求需要您注意。

      • start不能为UNBOUNDED FOLLOWING,否则提示Window frame start cannot be UNBOUNDED FOLLOWING错误。

      • end不能为UNBOUNDED PRECEDING,否则提示Window frame end cannot be UNBOUNDED PRECEDING错误。

      • startCURRENT ROW并且endN PRECEDING时,将提示Window frame starting from CURRENT ROW cannot end with PRECEDING错误。

      • startN FOLLOWING并且endN PRECEDING时,将提示Window frame starting from FOLLOWING cannot end with PRECEDING错误。

      • startN FOLLOWING并且endCURRENT ROW,将提示Window frame starting from FOLLOWING cannot end with CURRENT ROW错误。

      当模式为RANGE时:

      • start或者endN PRECEDING时,将提示Window frame RANGE PRECEDING is only supported with UNBOUNDED错误。

      • start或者endN FOLLOWING时,将提示Window frame RANGE FOLLOWING is only supported with UNBOUNDED错误。

      准备工作

      本文中的窗口函数均以testwindow表为测试数据。

      1. create table testwindow(year int, country varchar(20), product varchar(20), profit int) distributed by hash(year);
      1. insert into testwindow values (2000,'Finland','Computer',1500);
      2. insert into testwindow values (2001,'Finland','Phone',10);
      3. insert into testwindow values (2000,'India','Calculator',75);
      4. insert into testwindow values (2000,'India','Calculator',75);
      5. insert into testwindow values (2001,'India','Calculator',79);
      6. insert into testwindow values (2001,'USA','Calculator',50);
      7. insert into testwindow values (2001,'USA','Computer',1500);
      1. SELECT * FROM testwindow;
      2. +------+---------+------------+--------+
      3. | year | country | product | profit |
      4. +------+---------+------------+--------+
      5. | 2000 | Finland | Computer | 1500 |
      6. | 2001 | Finland | Phone | 10 |
      7. | 2000 | India | Calculator | 75 |
      8. | 2000 | India | Calculator | 75 |
      9. | 2001 | India | Calculator | 79 |
      10. | 2001 | USA | Calculator | 50 |
      11. | 2001 | USA | Computer | 1500 |

      聚合函数

      所有聚合函数都可以通过添加OVER子句来作为窗口函数使用,聚合函数将基于当前滑动窗口内的数据行计算每一行数据。

      例如,通过以下查询循环显示每个店员每天的订单额总和。

      1. SELECT clerk, orderdate, orderkey, totalprice,sum(totalprice) OVER (PARTITION BY clerk ORDER BY orderdate) AS rolling_sum FROM orders ORDER BY clerk, orderdate, orderkey

      排序函数

      CUME_DIST

      1. CUME_DIST()
      • 命令说明:返回一组数值中每个值的累计分布。

        返回结果:在窗口分区中对窗口进行排序后的数据集,包括当前行和当前行之前的数据行数。排序中任何关联值均会计算成相同的分布值。

      • 返回值类型:DOUBLE。

      • 示例:

        1. select year,country,product,profit,cume_dist() over (partition by country order by profit) as cume_dist from testwindow;
        2. +------+---------+------------+--------+--------------------+
        3. | year | country | product | profit | cume_dist |
        4. +------+---------+------------+--------+--------------------+
        5. | 2001 | USA | Calculator | 50 | 0.5 |
        6. | 2001 | USA | Computer | 1500 | 1.0 |
        7. | 2001 | Finland | Phone | 10 | 0.5 |
        8. | 2000 | Finland | Computer | 1500 | 1.0 |
        9. | 2000 | India | Calculator | 75 | 0.6666666666666666 |
        10. | 2000 | India | Calculator | 75 | 0.6666666666666666 |
        11. | 2001 | India | Calculator | 79 | 1.0 |

      RANK

      1. RANK()
      • 命令说明:返回数据集中每个值的排名。

        排名值是将当前行之前的行数加1,不包含当前行。因此,排序的关联值可能产生顺序上的空隙,而且这个排名会对每个窗口分区进行计算。

      • 返回值类型:BIGINT。

      • 示例:

        1. select year,country,product,profit,rank() over (partition by country order by profit) as rank from testwindow;
        2. +------+---------+------------+--------+------+
        3. | year | country | product | profit | rank |
        4. +------+---------+------------+--------+------+
        5. | 2001 | Finland | Phone | 10 | 1 |
        6. | 2000 | Finland | Computer | 1500 | 2 |
        7. | 2001 | USA | Calculator | 50 | 1 |
        8. | 2001 | USA | Computer | 1500 | 2 |
        9. | 2000 | India | Calculator | 75 | 1 |
        10. | 2000 | India | Calculator | 75 | 1 |
        11. | 2001 | India | Calculator | 79 | 3 |

      DENSE_RANK

      1. DENSE_RANK()
      • 命令说明:返回一组数值中每个数值的排名。

        DENSE_RANK()RANK()功能相似,但是DENSE_RANK()关联值不会产生顺序上的空隙。

      • 返回值类型:BIGINT。

      • 示例:

        1. select year,country,product,profit,dense_rank() over (partition by country order by profit) as dense_rank from testwindow;
        2. +------+---------+------------+--------+------------+
        3. | year | country | product | profit | dense_rank |
        4. +------+---------+------------+--------+------------+
        5. | 2001 | Finland | Phone | 10 | 1 |
        6. | 2000 | Finland | Computer | 1500 | 2 |
        7. | 2001 | USA | Calculator | 50 | 1 |
        8. | 2001 | USA | Computer | 1500 | 2 |
        9. | 2000 | India | Calculator | 75 | 1 |
        10. | 2000 | India | Calculator | 75 | 1 |
        11. | 2001 | India | Calculator | 79 | 2 |

      NTILE

      1. NTILE(n)
      • 命令说明:将每个窗口分区的数据分散到桶号从1nn个桶中。

        桶号值最多间隔1,如果窗口分区中的数据行数不能均匀地分散到每一个桶中,则剩余值将从第1个桶开始,每1个桶分1行数据。例如,有6行数据和4个桶, 最终桶号值为1 1 2 2 3 4

      • 返回值类型:BIGINT。

      • 示例:

        1. select year,country,product,profit,ntile(2) over (partition by country order by profit) as ntile2 from testwindow;
        2. +------+---------+------------+--------+--------+
        3. | year | country | product | profit | ntile2 |
        4. +------+---------+------------+--------+--------+
        5. | 2001 | USA | Calculator | 50 | 1 |
        6. | 2001 | USA | Computer | 1500 | 2 |
        7. | 2001 | Finland | Phone | 10 | 1 |
        8. | 2000 | Finland | Computer | 1500 | 2 |
        9. | 2000 | India | Calculator | 75 | 1 |
        10. | 2000 | India | Calculator | 75 | 1 |
        11. | 2001 | India | Calculator | 79 | 2 |

      ROW_NUMBER

      1. ROW_NUMBER()
      • 命令说明:根据行在窗口分区内的顺序,为每行数据返回一个唯一的有序行号,行号从1开始。

      • 返回值类型:BIGINT。

      • 示例:

        1. SELECT year, country, product, profit, ROW_NUMBER() OVER(PARTITION BY country) AS row_num1 FROM testwindow;
        2. +------+---------+------------+--------+----------+
        3. | year | country | product | profit | row_num1 |
        4. +------+---------+------------+--------+----------+
        5. | 2001 | USA | Calculator | 50 | 1 |
        6. | 2001 | USA | Computer | 1500 | 2 |
        7. | 2000 | India | Calculator | 75 | 1 |
        8. | 2000 | India | Calculator | 75 | 2 |
        9. | 2001 | India | Calculator | 79 | 3 |
        10. | 2000 | Finland | Computer | 1500 | 1 |
        11. | 2001 | Finland | Phone | 10 | 2 |

      PERCENT_RANK

      1. PERCENT_RANK()
      • 命令说明:返回数据集中每个数据的排名百分比,其结果由(r - 1) / (n - 1)决定。其中,rRANK()计算的当前行排名, n为当前窗口分区内总的行数。

      • 返回值类型:DOUBLE。

      • 示例:

        1. select year,country,product,profit,PERCENT_RANK() over (partition by country order by profit) as ntile3 from testwindow;
        2. +------+---------+------------+--------+--------+
        3. | year | country | product | profit | ntile3 |
        4. +------+---------+------------+--------+--------+
        5. | 2001 | Finland | Phone | 10 | 0.0 |
        6. | 2000 | Finland | Computer | 1500 | 1.0 |
        7. | 2001 | USA | Calculator | 50 | 0.0 |
        8. | 2001 | USA | Computer | 1500 | 1.0 |
        9. | 2000 | India | Calculator | 75 | 0.0 |
        10. | 2000 | India | Calculator | 75 | 0.0 |
        11. | 2001 | India | Calculator | 79 | 1.0 |

      值函数

      FIRST_VALUE

      1. FIRST_VALUE(x)
      • 命令说明:返回窗口分区第一行的值。

      • 返回值类型:与输入参数类型相同。

      • 示例:

        1. select year,country,product,profit,first_value(profit) over (partition by country order by profit) as firstValue from testwindow;
        2. +------+---------+------------+--------+------------+
        3. | year | country | product | profit | firstValue |
        4. +------+---------+------------+--------+------------+
        5. | 2000 | India | Calculator | 75 | 75 |
        6. | 2000 | India | Calculator | 75 | 75 |
        7. | 2001 | India | Calculator | 79 | 75 |
        8. | 2001 | USA | Calculator | 50 | 50 |
        9. | 2001 | USA | Computer | 1500 | 50 |
        10. | 2001 | Finland | Phone | 10 | 10 |
        11. | 2000 | Finland | Computer | 1500 | 10 |

      LAST_VALUE

      1. LAST_VALUE(x)
      • 命令说明:返回窗口分区最后一行的值。

      • 返回值类型:与输入参数类型相同。

      • 示例:

        1. select year,country,product,profit,last_value(profit) over (partition by country order by profit) as firstValue from testwindow;
        2. +------+---------+------------+--------+------------+
        3. | year | country | product | profit | firstValue |
        4. +------+---------+------------+--------+------------+
        5. | 2001 | Finland | Phone | 10 | 10 |
        6. | 2000 | Finland | Computer | 1500 | 1500 |
        7. | 2001 | USA | Calculator | 50 | 50 |
        8. | 2001 | USA | Computer | 1500 | 1500 |
        9. | 2000 | India | Calculator | 75 | 75 |
        10. | 2000 | India | Calculator | 75 | 75 |
        11. | 2001 | India | Calculator | 79 | 79 |

      LAG

      1. LAG(x[, offset[, default_value]])
      • 命令说明:返回窗口内距离当前行之前偏移offset后的值。

        偏移量起始值是0,也就是当前数据行。偏移量可以是标量表达式,默认offset1

        如果偏移量的值是null或者大于窗口长度,则返回default_value;如果没有指定default_value,则返回null

      • 返回值类型:与输入参数类型相同。

      • 示例:

        1. select year,country,product,profit,lag(profit) over (partition by country order by profit) as lag from testwindow;
        2. +------+---------+------------+--------+------+
        3. | year | country | product | profit | lag |
        4. +------+---------+------------+--------+------+
        5. | 2001 | USA | Calculator | 50 | NULL |
        6. | 2001 | USA | Computer | 1500 | 50 |
        7. | 2000 | India | Calculator | 75 | NULL |
        8. | 2000 | India | Calculator | 75 | 75 |
        9. | 2001 | India | Calculator | 79 | 75 |
        10. | 2001 | Finland | Phone | 10 | NULL |
        11. | 2000 | Finland | Computer | 1500 | 10 |

      LEAD

      1. LEAD(x[,offset[, default_value]])
      • 命令说明:返回窗口内距离当前行偏移offset后的值。

        偏移量offset起始值是0,也就是当前数据行。偏移量可以是标量表达式,默认offset1

        如果偏移量的值是null或者大于窗口长度,则返回default_value;如果没有指定default_value,则返回null

      • 返回值类型:与输入参数类型相同。

      • 示例:

        1. select year,country,product,profit,lead(profit) over (partition by country order by profit) as lead from testwindow;
        2. +------+---------+------------+--------+------+
        3. | year | country | product | profit | lead |
        4. +------+---------+------------+--------+------+
        5. | 2000 | India | Calculator | 75 | 75 |
        6. | 2000 | India | Calculator | 75 | 79 |
        7. | 2001 | India | Calculator | 79 | NULL |
        8. | 2001 | Finland | Phone | 10 | 1500 |
        9. | 2000 | Finland | Computer | 1500 | NULL |
        10. | 2001 | USA | Calculator | 50 | 1500 |
        11. | 2001 | USA | Computer | 1500 | NULL |

      NTH_VALUE

      1. NTH_VALUE(x, offset)
      • 命令说明:返回窗口内偏移指定offset后的值,偏移量从1开始。

        如果偏移量offsetnull或者大于窗口内值的个数,则返回null;如果偏移量offset0或者负数,则系统提示报错。

      • 返回值类型:与输入参数类型相同。

      • 示例:

        1. select year,country,product,profit,nth_value(profit,1) over (partition by country order by profit) as nth_value from testwindow;
        2. +------+---------+------------+--------+-----------+
        3. | year | country | product | profit | nth_value |
        4. +------+---------+------------+--------+-----------+
        5. | 2001 | Finland | Phone | 10 | 10 |
        6. | 2000 | Finland | Computer | 1500 | 10 |
        7. | 2001 | USA | Calculator | 50 | 50 |
        8. | 2001 | USA | Computer | 1500 | 50 |
        9. | 2000 | India | Calculator | 75 | 75 |
        10. | 2000 | India | Calculator | 75 | 75 |
        11. | 2001 | India | Calculator | 79 | 75 |