聚合函数复用功能是指如果使用可复用聚合函数对相同列进行计算,AnalyticDB PostgreSQL版会自动复用第一次的聚合计算过程,并使用各列对应的输出函数输出结果。通过该功能可以大幅减少可复用聚合函数的计算耗时。

说明 聚合函数必须是计算相同列的场景下才能复用。

如何判断聚合函数是否可以复用

聚合函数分为三个步骤,分别为初始化值、聚合计算和结果计算,只要初始化值和聚合计算一致的聚合函数就可以进行复用。您可以使用以下语句查询可复用的聚合函数。

SELECT rank() over (ORDER BY aggtransfn,agginitval), aggfnoid,aggtransfn,agginitval FROM pg_aggregate;

返回结果,请参见聚合函数复用列表

示例

准备测试表以及测试数据。

CREATE TABLE agg_test(x float8, y float8);
INSERT INTO agg_test SELECT 10000*random(), 10000*random() FROM generate_series(1,50000000);

以下内容将分别展示7.0版本和6.0版本使用聚合函数复用的效果:

  • 7.0版本
    • 单个聚合函数
      EXPLAIN (analyze,verbose,timing,costs,buffers) SELECT corr(y,x) FROM agg_test;

      执行计划返回信息如下,查看单个聚合函数的执行计划,用于和11个函数的执行计划进行对比。

                                                                        QUERY PLAN
      -----------------------------------------------------------------------------------------------------------------------------------------------
       Finalize Aggregate  (cost=0.00..1408.39 rows=1 width=8) (actual time=3972.170..3972.170 rows=1 loops=1)
         Output: corr(y, x)
         ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..1408.39 rows=1 width=8) (actual time=3748.570..3972.148 rows=3 loops=1)
               Output: (PARTIAL corr(y, x))
               ->  Partial Aggregate  (cost=0.00..1408.39 rows=1 width=8) (actual time=3748.602..3748.602 rows=1 loops=1)
                     Output: PARTIAL corr(y, x)
                     ->  Seq Scan on public.agg_test  (cost=0.00..852.66 rows=16666510 width=16) (actual time=0.249..1982.787 rows=16668088 loops=1)
                           Output: x, y
       Optimizer: Pivotal Optimizer (GPORCA)
       Planning Time: 8.248 ms
         (slice0)    Executor memory: 39K bytes.
         (slice1)    Executor memory: 37K bytes avg x 3 workers, 37K bytes max (seg0).
       Memory used:  128000kB
       Execution Time: 3973.034 ms
      (14 rows)
    • 11个可复用聚合函数
      EXPLAIN (analyze,verbose,timing,costs,buffers) SELECT corr(y,x), covar_pop(y,x), covar_samp(y,x), regr_avgx(y,x), regr_avgy(y,x), regr_intercept(y,x), regr_r2(y,x), regr_slope(y,x), regr_sxx(y,x), regr_sxy(y,x), regr_syy(y,x) FROM agg_test;

      执行计划返回信息如下,可以看出执行过程与单个聚合函数的执行过程基本没有变化,且执行耗时也没有明显增长。

                                                                                                                                                            QUERY PLAN
      ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
       Finalize Aggregate  (cost=0.00..2005.72 rows=1 width=88) (actual time=4060.219..4060.219 rows=1 loops=1)
         Output: corr(y, x), covar_pop(y, x), covar_samp(y, x), regr_avgx(y, x), regr_avgy(y, x), regr_intercept(y, x), regr_r2(y, x), regr_slope(y, x), regr_sxx(y, x), regr_sxy(y, x), regr_syy(y, x)
         ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..2005.72 rows=1 width=88) (actual time=3682.296..4060.182 rows=3 loops=1)
               Output: (PARTIAL corr(y, x)), (PARTIAL covar_pop(y, x)), (PARTIAL covar_samp(y, x)), (PARTIAL regr_avgx(y, x)), (PARTIAL regr_avgy(y, x)), (PARTIAL regr_intercept(y, x)), (PARTIAL regr_r2(y, x)), (PARTIAL regr_slope(y, x)), (PARTIAL regr_sxx(y, x)), (PARTIAL regr_sxy(y, x)), (PARTIAL regr_syy(y, x))
               ->  Partial Aggregate  (cost=0.00..2005.72 rows=1 width=88) (actual time=3681.908..3681.909 rows=1 loops=1)
                     Output: PARTIAL corr(y, x), PARTIAL covar_pop(y, x), PARTIAL covar_samp(y, x), PARTIAL regr_avgx(y, x), PARTIAL regr_avgy(y, x), PARTIAL regr_intercept(y, x), PARTIAL regr_r2(y, x), PARTIAL regr_slope(y, x), PARTIAL regr_sxx(y, x), PARTIAL regr_sxy(y, x), PARTIAL regr_syy(y, x)
                     ->  Seq Scan on public.agg_test  (cost=0.00..852.66 rows=16666510 width=16) (actual time=0.075..1909.142 rows=16668088 loops=1)
                           Output: x, y
       Optimizer: Pivotal Optimizer (GPORCA)
       Planning Time: 14.660 ms
         (slice0)    Executor memory: 84K bytes.
         (slice1)    Executor memory: 37K bytes avg x 3 workers, 37K bytes max (seg0).
       Memory used:  128000kB
       Execution Time: 4061.236 ms
      (14 rows)
  • 6.0版本
    • 单个聚合函数
      EXPLAIN (analyze,verbose,timing,costs,buffers) SELECT corr(y,x) FROM agg_test;

      执行计划返回信息如下,查看单个聚合函数的执行计划,用于和11个函数的执行计划进行对比。

                                                                          QUERY PLAN
      --------------------------------------------------------------------------------------------------------------------------------------------------
       Aggregate  (cost=692190.42..692190.43 rows=1 width=8) (actual time=10206.860..10206.861 rows=1 loops=1)
         Output: pg_catalog.corr((corr(agg_test.y, agg_test.x)))
         Executor Memory: 8kB  Segments: 1  Max: 8kB (segment -1)
         ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=692190.35..692190.40 rows=1 width=32) (actual time=10206.810..10206.812 rows=3 loops=1)
               Output: (corr(agg_test.y, agg_test.x))
               ->  Aggregate  (cost=692190.35..692190.36 rows=1 width=32) (actual time=10143.501..10143.502 rows=1 loops=1)
                     Output: corr(agg_test.y, agg_test.x)
                     Executor Memory: 24kB  Segments: 3  Max: 8kB (segment 0)
                     ->  Seq Scan on public.agg_test  (cost=0.00..567193.48 rows=16666250 width=16) (actual time=0.103..5710.533 rows=16667260 loops=1)
                           Output: agg_test.y, agg_test.x
       Planning time: 1.394 ms
         (slice0)    Executor memory: 372K bytes.
         (slice1)    Executor memory: 73K bytes avg x 3 workers, 73K bytes max (seg0).
       Memory used:  128000kB
       Optimizer: Postgres query optimizer
       Execution time: 10222.988 ms
      (16 rows)
    • 11个可复用聚合函数
      EXPLAIN (analyze,verbose,timing,costs,buffers) SELECT corr(y,x), covar_pop(y,x), covar_samp(y,x), regr_avgx(y,x), regr_avgy(y,x), regr_intercept(y,x), regr_r2(y,x), regr_slope(y,x), regr_sxx(y,x), regr_sxy(y,x), regr_syy(y,x) FROM agg_test;

      执行计划返回信息如下,可以看出执行过程与单个函数的执行过程有明显差异,且执行耗时明显增加。

      
                                                            QUERY PLAN
      
      ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      -----------------------------------------------------------------------------------------------------------------------
       Aggregate  (cost=1942159.19..1942159.20 rows=1 width=88) (actual time=39555.642..39555.643 rows=1 loops=1)
         Output: pg_catalog.corr((corr(agg_test.y, agg_test.x))), pg_catalog.covar_pop((covar_pop(agg_test.y, agg_test.x))), pg_catalog.covar_samp((covar_samp(agg_test.y, agg_test.x))), pg_catalog.regr_avgx((regr_avgx(agg_test.y, agg_test.x))), pg_catalog.regr_avgy((regr_a
      vgy(agg_test.y, agg_test.x))), pg_catalog.regr_intercept((regr_intercept(agg_test.y, agg_test.x))), pg_catalog.regr_r2((regr_r2(agg_test.y, agg_test.x))), pg_catalog.regr_slope((regr_slope(agg_test.y, agg_test.x))), pg_catalog.regr_sxx((regr_sxx(agg_test.y, agg_test.
      x))), pg_catalog.regr_sxy((regr_sxy(agg_test.y, agg_test.x))), pg_catalog.regr_syy((regr_syy(agg_test.y, agg_test.x)))
         Executor Memory: 8kB  Segments: 1  Max: 8kB (segment -1)
         ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=1942159.08..1942159.13 rows=1 width=352) (actual time=39555.594..39555.596 rows=3 loops=1)
               Output: (corr(agg_test.y, agg_test.x)), (covar_pop(agg_test.y, agg_test.x)), (covar_samp(agg_test.y, agg_test.x)), (regr_avgx(agg_test.y, agg_test.x)), (regr_avgy(agg_test.y, agg_test.x)), (regr_intercept(agg_test.y, agg_test.x)), (regr_r2(agg_test.y, agg_te
      st.x)), (regr_slope(agg_test.y, agg_test.x)), (regr_sxx(agg_test.y, agg_test.x)), (regr_sxy(agg_test.y, agg_test.x)), (regr_syy(agg_test.y, agg_test.x))
               ->  Aggregate  (cost=1942159.08..1942159.09 rows=1 width=352) (actual time=39253.556..39253.556 rows=1 loops=1)
                     Output: corr(agg_test.y, agg_test.x), covar_pop(agg_test.y, agg_test.x), covar_samp(agg_test.y, agg_test.x), regr_avgx(agg_test.y, agg_test.x), regr_avgy(agg_test.y, agg_test.x), regr_intercept(agg_test.y, agg_test.x), regr_r2(agg_test.y, agg_test.x),
      regr_slope(agg_test.y, agg_test.x), regr_sxx(agg_test.y, agg_test.x), regr_sxy(agg_test.y, agg_test.x), regr_syy(agg_test.y, agg_test.x)
                     Executor Memory: 24kB  Segments: 3  Max: 8kB (segment 0)
                     ->  Seq Scan on public.agg_test  (cost=0.00..567193.48 rows=16666250 width=16) (actual time=0.109..6214.809 rows=16667260 loops=1)
                           Output: agg_test.y, agg_test.x
       Planning time: 1.427 ms
         (slice0)    Executor memory: 781K bytes.
         (slice1)    Executor memory: 156K bytes avg x 3 workers, 156K bytes max (seg0).
       Memory used:  128000kB
       Optimizer: Postgres query optimizer
       Execution time: 39557.902 ms
      (16 rows)

通过以上测试结果可以看出,在7.0版本,单个聚合函数耗时与11个聚合函数耗时基本一致,聚合函数复用会明显缩短计算时间。在6.0版本,单个聚合函数耗时与11个聚合函数耗时会有3~4倍的差距。

聚合函数复用列表

rank列一致的聚合函数可以进行复用。

 rank |           aggfnoid            |                aggtransfn                 |     agginitval
------+-------------------------------+-------------------------------------------+---------------------
    1 | pg_catalog.sum                | float4pl                                  |
    2 | pg_catalog.var_samp           | float4_accum                              | {0,0,0}
    2 | pg_catalog.stddev_pop         | float4_accum                              | {0,0,0}
    2 | pg_catalog.avg                | float4_accum                              | {0,0,0}
    2 | pg_catalog.var_pop            | float4_accum                              | {0,0,0}
    2 | pg_catalog.stddev             | float4_accum                              | {0,0,0}
    2 | pg_catalog.variance           | float4_accum                              | {0,0,0}
    2 | pg_catalog.stddev_samp        | float4_accum                              | {0,0,0}
    9 | pg_catalog.max                | float4larger                              |
   10 | pg_catalog.min                | float4smaller                             |
   11 | pg_catalog.sum                | float8pl                                  |
   12 | pg_catalog.stddev_pop         | float8_accum                              | {0,0,0}
   12 | pg_catalog.stddev_samp        | float8_accum                              | {0,0,0}
   12 | pg_catalog.avg                | float8_accum                              | {0,0,0}
   12 | pg_catalog.stddev             | float8_accum                              | {0,0,0}
   12 | pg_catalog.var_pop            | float8_accum                              | {0,0,0}
   12 | pg_catalog.var_samp           | float8_accum                              | {0,0,0}
   12 | pg_catalog.variance           | float8_accum                              | {0,0,0}
   19 | pg_catalog.max                | float8larger                              |
   20 | pg_catalog.min                | float8smaller                             |
   21 | pg_catalog.max                | text_larger                               |
   22 | pg_catalog.min                | text_smaller                              |
   23 | pg_catalog.max                | array_larger                              |
   24 | pg_catalog.min                | array_smaller                             |
   25 | pg_catalog.max                | int4larger                                |
   26 | pg_catalog.min                | int4smaller                               |
   27 | pg_catalog.max                | int2larger                                |
   28 | pg_catalog.min                | int2smaller                               |
   29 | pg_catalog.sum                | cash_pl                                   |
   30 | pg_catalog.max                | cashlarger                                |
   31 | pg_catalog.min                | cashsmaller                               |
   32 | pg_catalog.max                | bpchar_larger                             |
   33 | pg_catalog.min                | bpchar_smaller                            |
   34 | pg_catalog.max                | date_larger                               |
   35 | pg_catalog.min                | date_smaller                              |
   36 | pg_catalog.sum                | interval_pl                               |
   37 | pg_catalog.min                | timestamptz_smaller                       |
   38 | pg_catalog.max                | timestamptz_larger                        |
   39 | pg_catalog.min                | interval_smaller                          |
   40 | pg_catalog.max                | interval_larger                           |
   41 | pg_catalog.count              | int8inc                                   | 0
   42 | pg_catalog.max                | int8larger                                |
   43 | pg_catalog.min                | int8smaller                               |
   44 | pg_catalog.max                | time_larger                               |
   45 | pg_catalog.min                | time_smaller                              |
   46 | pg_catalog.max                | timetz_larger                             |
   47 | pg_catalog.min                | timetz_smaller                            |
   48 | pg_catalog.bit_and            | bitand                                    |
   49 | pg_catalog.bit_or             | bitor                                     |
   50 | pg_catalog.min                | numeric_smaller                           |
   51 | pg_catalog.max                | numeric_larger                            |
   52 | pg_catalog.stddev_pop         | numeric_accum                             |
   52 | pg_catalog.var_pop            | numeric_accum                             |
   52 | pg_catalog.var_samp           | numeric_accum                             |
   52 | pg_catalog.variance           | numeric_accum                             |
   52 | pg_catalog.stddev_samp        | numeric_accum                             |
   52 | pg_catalog.stddev             | numeric_accum                             |
   58 | pg_catalog.stddev             | int2_accum                                |
   58 | pg_catalog.stddev_pop         | int2_accum                                |
   58 | pg_catalog.stddev_samp        | int2_accum                                |
   58 | pg_catalog.variance           | int2_accum                                |
   58 | pg_catalog.var_pop            | int2_accum                                |
   58 | pg_catalog.var_samp           | int2_accum                                |
   64 | pg_catalog.stddev_pop         | int4_accum                                |
   64 | pg_catalog.var_pop            | int4_accum                                |
   64 | pg_catalog.stddev             | int4_accum                                |
   64 | pg_catalog.var_samp           | int4_accum                                |
   64 | pg_catalog.stddev_samp        | int4_accum                                |
   64 | pg_catalog.variance           | int4_accum                                |
   70 | pg_catalog.stddev_samp        | int8_accum                                |
   70 | pg_catalog.stddev_pop         | int8_accum                                |
   70 | pg_catalog.var_pop            | int8_accum                                |
   70 | pg_catalog.variance           | int8_accum                                |
   70 | pg_catalog.var_samp           | int8_accum                                |
   70 | pg_catalog.stddev             | int8_accum                                |
   76 | pg_catalog.sum                | int2_sum                                  |
   77 | pg_catalog.sum                | int4_sum                                  |
   78 | pg_catalog.avg                | interval_accum                            | {0 second,0 second}
   79 | pg_catalog.bit_and            | int2and                                   |
   80 | pg_catalog.bit_or             | int2or                                    |
   81 | pg_catalog.bit_and            | int4and                                   |
   82 | pg_catalog.bit_or             | int4or                                    |
   83 | pg_catalog.bit_and            | int8and                                   |
   84 | pg_catalog.bit_or             | int8or                                    |
   85 | pg_catalog.avg                | int2_avg_accum                            | {0,0}
   86 | pg_catalog.avg                | int4_avg_accum                            | {0,0}
   87 | pg_catalog.max                | oidlarger                                 |
   88 | pg_catalog.min                | oidsmaller                                |
   89 | pg_catalog.min                | timestamp_smaller                         |
   90 | pg_catalog.max                | timestamp_larger                          |
   91 | pg_catalog.array_agg          | array_agg_transfn                         |
   91 | pg_catalog.gp_array_agg       | array_agg_transfn                         |
   93 | every                         | booland_statefunc                         |
   93 | bool_and                      | booland_statefunc                         |
   95 | bool_or                       | boolor_statefunc                          |
   96 | pg_catalog.sum                | int8_avg_accum                            |
   96 | pg_catalog.avg                | int8_avg_accum                            |
   98 | pg_catalog.max                | tidlarger                                 |
   99 | pg_catalog.min                | tidsmaller                                |
  100 | pg_catalog.count              | int8inc_any                               | 0
  101 | regr_count                    | int8inc_float8_float8                     | 0
  102 | regr_sxy                      | float8_regr_accum                         | {0,0,0,0,0,0}
  102 | regr_sxx                      | float8_regr_accum                         | {0,0,0,0,0,0}
  102 | regr_syy                      | float8_regr_accum                         | {0,0,0,0,0,0}
  102 | regr_avgx                     | float8_regr_accum                         | {0,0,0,0,0,0}
  102 | regr_avgy                     | float8_regr_accum                         | {0,0,0,0,0,0}
  102 | regr_r2                       | float8_regr_accum                         | {0,0,0,0,0,0}
  102 | regr_slope                    | float8_regr_accum                         | {0,0,0,0,0,0}
  102 | regr_intercept                | float8_regr_accum                         | {0,0,0,0,0,0}
  102 | covar_pop                     | float8_regr_accum                         | {0,0,0,0,0,0}
  102 | covar_samp                    | float8_regr_accum                         | {0,0,0,0,0,0}
  102 | corr                          | float8_regr_accum                         | {0,0,0,0,0,0}
  113 | pg_catalog.sum                | numeric_avg_accum                         |
  113 | pg_catalog.avg                | numeric_avg_accum                         |
  115 | xmlagg                        | xmlconcat2                                |
  116 | json_agg                      | json_agg_transfn                          |
  117 | json_object_agg               | json_object_agg_transfn                   |
  118 | jsonb_agg                     | jsonb_agg_transfn                         |
  119 | jsonb_object_agg              | jsonb_object_agg_transfn                  |
  120 | pg_catalog.min                | enum_smaller                              |
  121 | pg_catalog.max                | enum_larger                               |
  122 | pg_catalog.string_agg         | string_agg_transfn                        |
  123 | pg_catalog.string_agg         | bytea_string_agg_transfn                  |
  124 | pg_catalog.max                | network_larger                            |
  125 | pg_catalog.min                | network_smaller                           |
  126 | pg_catalog."median"           | ordered_set_transition                    |
  126 | pg_catalog.percentile_cont    | ordered_set_transition                    |
  126 | pg_catalog."median"           | ordered_set_transition                    |
  126 | pg_catalog.percentile_cont    | ordered_set_transition                    |
  126 | pg_catalog.percentile_cont    | ordered_set_transition                    |
  126 | pg_catalog.percentile_cont    | ordered_set_transition                    |
  126 | pg_catalog."median"           | ordered_set_transition                    |
  126 | pg_catalog.percentile_disc    | ordered_set_transition                    |
  126 | pg_catalog.percentile_cont    | ordered_set_transition                    |
  126 | pg_catalog.percentile_cont    | ordered_set_transition                    |
  126 | pg_catalog.percentile_disc    | ordered_set_transition                    |
  126 | pg_catalog.percentile_cont    | ordered_set_transition                    |
  126 | pg_catalog.percentile_cont    | ordered_set_transition                    |
  126 | mode                          | ordered_set_transition                    |
  126 | pg_catalog."median"           | ordered_set_transition                    |
  141 | pg_catalog.cume_dist          | ordered_set_transition_multi              |
  141 | pg_catalog.rank               | ordered_set_transition_multi              |
  141 | pg_catalog.percent_rank       | ordered_set_transition_multi              |
  141 | pg_catalog.dense_rank         | ordered_set_transition_multi              |
  145 | pg_catalog.array_agg          | array_agg_array_transfn                   |
  145 | pg_catalog.gp_array_agg       | array_agg_array_transfn                   |
  147 | array_sum                     | array_add                                 | {}
  148 | pg_catalog.sum                | int2_matrix_accum                         |
  149 | pg_catalog.sum                | int4_matrix_accum                         |
  150 | pg_catalog.sum                | int8_matrix_accum                         |
  151 | pg_catalog.sum                | float8_matrix_accum                       |
  152 | pg_catalog.pivot_sum          | int4_pivot_accum                          |
  153 | pg_catalog.pivot_sum          | int8_pivot_accum                          |
  154 | pg_catalog.pivot_sum          | float8_pivot_accum                        |
  155 | gp_hyperloglog_accum          | gp_hyperloglog_add_item_agg_default       |
  156 | pg_catalog.gp_percentile_cont | gp_percentile_cont_float8_transition      |
  157 | pg_catalog.gp_percentile_cont | gp_percentile_cont_interval_transition    |
  158 | pg_catalog.gp_percentile_cont | gp_percentile_cont_timestamp_transition   |
  159 | pg_catalog.gp_percentile_cont | gp_percentile_cont_timestamptz_transition |
  160 | gp_percentile_disc            | gp_percentile_disc_transition             |