全部产品
云市场

分析函数

更新时间:2019-09-18 13:13:47

简介

分析函数(某些数据库下也叫做窗口函数)与聚集函数类似,计算总是基于一组行的集合,不同的是,聚集函数一组只能返回一行,而窗口函数每组可以返回多行,组内每一行都是基于窗口的逻辑计算的结果。分析函数可以显著优化需要self-join的查询。

功能支持

目前OceanBase支持以下分析函数:

  • SUM

  • MIN

  • MAX

  • COUNT

  • AVG

  • GROUP_CONCAT

  • ROW_NUMBER

  • RANK

  • DENSE_RANK

  • PERCENT_RANK

  • CUME_DIST

  • FIRST_VALUE

  • LAST_VALUE

  • NTH_VALUE

  • NTILE

  • LEAD

  • LAG

“窗口”也称为FRAME,OceanBase同时支持ROWS与RANGE两种FRAME语义,前者是基于物理行偏移的窗口,后者则是基于逻辑值偏移的窗口。

执行计划

OceanBase分析函数的相应算子是: WINDOW FUNCTION,该算子依赖下层算子的输出按分区 + 排序键有序,所以在必要时会分配排序算子
注:

  1. mysql> explain select row_number() over (partition by table_id order by partition_id) from __all_root_table;
  2. +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  3. | Query Plan |
  4. +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  5. | ====================================================
  6. |ID|OPERATOR |NAME |EST. ROWS|COST|
  7. ----------------------------------------------------
  8. |0 |WINDOW FUNCTION| |1000 |2240|
  9. |1 | SORT | |1000 |2049|
  10. |2 | TABLE SCAN |__all_root_table|1000 |499 |
  11. ====================================================
  12. Outputs & filters:
  13. -------------------------------------
  14. 0 - output([T_WIN_FUN_ROW_NUMBER()]), filter(nil),
  15. win_expr(T_WIN_FUN_ROW_NUMBER()), partition_by([__all_root_table.table_id]), order_by([__all_root_table.partition_id, ASC]), window_type(RANGE), upper(UNBOUNDED PRECEDING), lower(UNBOUNDED FOLLOWING)
  16. 1 - output([__all_root_table.table_id], [__all_root_table.partition_id]), filter(nil), sort_keys([__all_root_table.table_id, ASC], [__all_root_table.partition_id, ASC])
  17. 2 - output([__all_root_table.table_id], [__all_root_table.partition_id]), filter(nil),
  18. access([__all_root_table.table_id], [__all_root_table.partition_id]), partitions(p0)

排序消除

如果优化阶段已经知道下层输出是有序的,则不会分配排序算子((tenant_id,table_id)为__all_root_table主键前缀)

  1. explain select row_number() over (partition by tenant_id, table_id order by partition_id) from __all_root_table;
  2. +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  3. | Query Plan |
  4. +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  5. | ====================================================
  6. |ID|OPERATOR |NAME |EST. ROWS|COST|
  7. ----------------------------------------------------
  8. |0 |WINDOW FUNCTION| |1000 |690 |
  9. |1 | TABLE SCAN |__all_root_table|1000 |499 |
  10. ====================================================
  11. Outputs & filters:
  12. -------------------------------------
  13. 0 - output([T_WIN_FUN_ROW_NUMBER()]), filter(nil),
  14. win_expr(T_WIN_FUN_ROW_NUMBER()), partition_by([__all_root_table.tenant_id], [__all_root_table.table_id]), order_by([__all_root_table.partition_id, ASC]), window_type(RANGE), upper(UNBOUNDED PRECEDING), lower(UNBOUNDED FOLLOWING)
  15. 1 - output([__all_root_table.tenant_id], [__all_root_table.table_id], [__all_root_table.partition_id]), filter(nil),
  16. access([__all_root_table.tenant_id], [__all_root_table.table_id], [__all_root_table.partition_id]), partitions(p0)
  17. |

计算合并

当同时存在多个分析函数时,如果存在可以共享一个序的子集,子集内所有函数会在同一个算子里计算而不是拆到多个算子,这样可以有效减少内存开销和边界判断。

如下面的例子所示,row_number()与rank()两个分析函数的计算可以共用同一个排序的输出,因此,只会分配一个sort节点:

  1. mysql> explain select row_number() over (partition by table_id order by partition_id), rank() over (partition by table_id, partition_id order by svr_ip) from __all_root_table;
  2. | ====================================================
  3. |ID|OPERATOR |NAME |EST. ROWS|COST|
  4. ----------------------------------------------------
  5. |0 |WINDOW FUNCTION| |1000 |3017|
  6. |1 | SORT | |1000 |2826|
  7. |2 | TABLE SCAN |__all_root_table|1000 |499 |
  8. ====================================================
  9. Outputs & filters:
  10. -------------------------------------
  11. 0 - output([T_WIN_FUN_ROW_NUMBER()], [T_WIN_FUN_RANK()]), filter(nil),
  12. win_expr(T_WIN_FUN_RANK()), partition_by([__all_root_table.table_id], [__all_root_table.partition_id]), order_by([__all_root_table.svr_ip, ASC]), window_type(RANGE), uppe r(UNBOUNDED PRECEDING), lower(UNBOUNDED FOLLOWING)
  13. win_expr(T_WIN_FUN_ROW_NUMBER()), partition_by([__all_root_table.table_id]), order_by([__all_root_table.partition_id, ASC]), window_type(RANGE), upper(UNBOUNDED PRECEDING ), lower(UNBOUNDED FOLLOWING)
  14. 1 - output([__all_root_table.table_id], [__all_root_table.partition_id], [__all_root_table.svr_ip]), filter(nil), sort_keys([__all_root_table.table_id, ASC], [__all_root_tabl e.partition_id, ASC], [__all_root_table.svr_ip, ASC])
  15. 2 - output([__all_root_table.table_id], [__all_root_table.partition_id], [__all_root_table.svr_ip]), filter(nil),
  16. access([__all_root_table.table_id], [__all_root_table.partition_id], [__all_root_table.svr_ip]), partitions(p0)

落盘

分析函数的计算也支持执行期的落盘操作,其逻辑与排序操作符的落盘逻辑一致。