聚合函数复用功能是指如果使用可复用聚合函数对相同列进行计算,AnalyticDB PostgreSQL版会自动复用第一次的聚合计算过程,并使用各列对应的输出函数输出结果。通过该功能可以大幅减少可复用聚合函数的计算耗时。
说明 聚合函数必须是计算相同列的场景下才能复用。
如何判断聚合函数是否可以复用
聚合函数分为三个步骤,分别为初始化值、聚合计算和结果计算,只要初始化值和聚合计算一致的聚合函数就可以进行复用。您可以使用以下语句查询可复用的聚合函数。
SELECT rank() over (ORDER BY aggtransfn,agginitval), aggfnoid,aggtransfn,agginitval FROM pg_aggregate;
返回结果,请参见聚合函数复用列表。
示例
准备测试表以及测试数据。
CREATE TABLE agg_test(x float8, y float8);
INSERT INTO agg_test SELECT 10000*random(), 10000*random() FROM generate_series(1,50000000);
以下内容将分别展示7.0版本和6.0版本使用聚合函数复用的效果:
- 7.0版本
- 单个聚合函数
EXPLAIN (analyze,verbose,timing,costs,buffers) SELECT corr(y,x) FROM agg_test;
执行计划返回信息如下,查看单个聚合函数的执行计划,用于和11个函数的执行计划进行对比。
QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=0.00..1408.39 rows=1 width=8) (actual time=3972.170..3972.170 rows=1 loops=1) Output: corr(y, x) -> Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..1408.39 rows=1 width=8) (actual time=3748.570..3972.148 rows=3 loops=1) Output: (PARTIAL corr(y, x)) -> Partial Aggregate (cost=0.00..1408.39 rows=1 width=8) (actual time=3748.602..3748.602 rows=1 loops=1) Output: PARTIAL corr(y, x) -> Seq Scan on public.agg_test (cost=0.00..852.66 rows=16666510 width=16) (actual time=0.249..1982.787 rows=16668088 loops=1) Output: x, y Optimizer: Pivotal Optimizer (GPORCA) Planning Time: 8.248 ms (slice0) Executor memory: 39K bytes. (slice1) Executor memory: 37K bytes avg x 3 workers, 37K bytes max (seg0). Memory used: 128000kB Execution Time: 3973.034 ms (14 rows)
- 11个可复用聚合函数
EXPLAIN (analyze,verbose,timing,costs,buffers) SELECT corr(y,x), covar_pop(y,x), covar_samp(y,x), regr_avgx(y,x), regr_avgy(y,x), regr_intercept(y,x), regr_r2(y,x), regr_slope(y,x), regr_sxx(y,x), regr_sxy(y,x), regr_syy(y,x) FROM agg_test;
执行计划返回信息如下,可以看出执行过程与单个聚合函数的执行过程基本没有变化,且执行耗时也没有明显增长。
QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=0.00..2005.72 rows=1 width=88) (actual time=4060.219..4060.219 rows=1 loops=1) Output: corr(y, x), covar_pop(y, x), covar_samp(y, x), regr_avgx(y, x), regr_avgy(y, x), regr_intercept(y, x), regr_r2(y, x), regr_slope(y, x), regr_sxx(y, x), regr_sxy(y, x), regr_syy(y, x) -> Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..2005.72 rows=1 width=88) (actual time=3682.296..4060.182 rows=3 loops=1) Output: (PARTIAL corr(y, x)), (PARTIAL covar_pop(y, x)), (PARTIAL covar_samp(y, x)), (PARTIAL regr_avgx(y, x)), (PARTIAL regr_avgy(y, x)), (PARTIAL regr_intercept(y, x)), (PARTIAL regr_r2(y, x)), (PARTIAL regr_slope(y, x)), (PARTIAL regr_sxx(y, x)), (PARTIAL regr_sxy(y, x)), (PARTIAL regr_syy(y, x)) -> Partial Aggregate (cost=0.00..2005.72 rows=1 width=88) (actual time=3681.908..3681.909 rows=1 loops=1) Output: PARTIAL corr(y, x), PARTIAL covar_pop(y, x), PARTIAL covar_samp(y, x), PARTIAL regr_avgx(y, x), PARTIAL regr_avgy(y, x), PARTIAL regr_intercept(y, x), PARTIAL regr_r2(y, x), PARTIAL regr_slope(y, x), PARTIAL regr_sxx(y, x), PARTIAL regr_sxy(y, x), PARTIAL regr_syy(y, x) -> Seq Scan on public.agg_test (cost=0.00..852.66 rows=16666510 width=16) (actual time=0.075..1909.142 rows=16668088 loops=1) Output: x, y Optimizer: Pivotal Optimizer (GPORCA) Planning Time: 14.660 ms (slice0) Executor memory: 84K bytes. (slice1) Executor memory: 37K bytes avg x 3 workers, 37K bytes max (seg0). Memory used: 128000kB Execution Time: 4061.236 ms (14 rows)
- 单个聚合函数
- 6.0版本
- 单个聚合函数
EXPLAIN (analyze,verbose,timing,costs,buffers) SELECT corr(y,x) FROM agg_test;
执行计划返回信息如下,查看单个聚合函数的执行计划,用于和11个函数的执行计划进行对比。
QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=692190.42..692190.43 rows=1 width=8) (actual time=10206.860..10206.861 rows=1 loops=1) Output: pg_catalog.corr((corr(agg_test.y, agg_test.x))) Executor Memory: 8kB Segments: 1 Max: 8kB (segment -1) -> Gather Motion 3:1 (slice1; segments: 3) (cost=692190.35..692190.40 rows=1 width=32) (actual time=10206.810..10206.812 rows=3 loops=1) Output: (corr(agg_test.y, agg_test.x)) -> Aggregate (cost=692190.35..692190.36 rows=1 width=32) (actual time=10143.501..10143.502 rows=1 loops=1) Output: corr(agg_test.y, agg_test.x) Executor Memory: 24kB Segments: 3 Max: 8kB (segment 0) -> Seq Scan on public.agg_test (cost=0.00..567193.48 rows=16666250 width=16) (actual time=0.103..5710.533 rows=16667260 loops=1) Output: agg_test.y, agg_test.x Planning time: 1.394 ms (slice0) Executor memory: 372K bytes. (slice1) Executor memory: 73K bytes avg x 3 workers, 73K bytes max (seg0). Memory used: 128000kB Optimizer: Postgres query optimizer Execution time: 10222.988 ms (16 rows)
- 11个可复用聚合函数
EXPLAIN (analyze,verbose,timing,costs,buffers) SELECT corr(y,x), covar_pop(y,x), covar_samp(y,x), regr_avgx(y,x), regr_avgy(y,x), regr_intercept(y,x), regr_r2(y,x), regr_slope(y,x), regr_sxx(y,x), regr_sxy(y,x), regr_syy(y,x) FROM agg_test;
执行计划返回信息如下,可以看出执行过程与单个函数的执行过程有明显差异,且执行耗时明显增加。
QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------------------------- Aggregate (cost=1942159.19..1942159.20 rows=1 width=88) (actual time=39555.642..39555.643 rows=1 loops=1) Output: pg_catalog.corr((corr(agg_test.y, agg_test.x))), pg_catalog.covar_pop((covar_pop(agg_test.y, agg_test.x))), pg_catalog.covar_samp((covar_samp(agg_test.y, agg_test.x))), pg_catalog.regr_avgx((regr_avgx(agg_test.y, agg_test.x))), pg_catalog.regr_avgy((regr_a vgy(agg_test.y, agg_test.x))), pg_catalog.regr_intercept((regr_intercept(agg_test.y, agg_test.x))), pg_catalog.regr_r2((regr_r2(agg_test.y, agg_test.x))), pg_catalog.regr_slope((regr_slope(agg_test.y, agg_test.x))), pg_catalog.regr_sxx((regr_sxx(agg_test.y, agg_test. x))), pg_catalog.regr_sxy((regr_sxy(agg_test.y, agg_test.x))), pg_catalog.regr_syy((regr_syy(agg_test.y, agg_test.x))) Executor Memory: 8kB Segments: 1 Max: 8kB (segment -1) -> Gather Motion 3:1 (slice1; segments: 3) (cost=1942159.08..1942159.13 rows=1 width=352) (actual time=39555.594..39555.596 rows=3 loops=1) Output: (corr(agg_test.y, agg_test.x)), (covar_pop(agg_test.y, agg_test.x)), (covar_samp(agg_test.y, agg_test.x)), (regr_avgx(agg_test.y, agg_test.x)), (regr_avgy(agg_test.y, agg_test.x)), (regr_intercept(agg_test.y, agg_test.x)), (regr_r2(agg_test.y, agg_te st.x)), (regr_slope(agg_test.y, agg_test.x)), (regr_sxx(agg_test.y, agg_test.x)), (regr_sxy(agg_test.y, agg_test.x)), (regr_syy(agg_test.y, agg_test.x)) -> Aggregate (cost=1942159.08..1942159.09 rows=1 width=352) (actual time=39253.556..39253.556 rows=1 loops=1) Output: corr(agg_test.y, agg_test.x), covar_pop(agg_test.y, agg_test.x), covar_samp(agg_test.y, agg_test.x), regr_avgx(agg_test.y, agg_test.x), regr_avgy(agg_test.y, agg_test.x), regr_intercept(agg_test.y, agg_test.x), regr_r2(agg_test.y, agg_test.x), regr_slope(agg_test.y, agg_test.x), regr_sxx(agg_test.y, agg_test.x), regr_sxy(agg_test.y, agg_test.x), regr_syy(agg_test.y, agg_test.x) Executor Memory: 24kB Segments: 3 Max: 8kB (segment 0) -> Seq Scan on public.agg_test (cost=0.00..567193.48 rows=16666250 width=16) (actual time=0.109..6214.809 rows=16667260 loops=1) Output: agg_test.y, agg_test.x Planning time: 1.427 ms (slice0) Executor memory: 781K bytes. (slice1) Executor memory: 156K bytes avg x 3 workers, 156K bytes max (seg0). Memory used: 128000kB Optimizer: Postgres query optimizer Execution time: 39557.902 ms (16 rows)
- 单个聚合函数
通过以上测试结果可以看出,在7.0版本,单个聚合函数耗时与11个聚合函数耗时基本一致,聚合函数复用会明显缩短计算时间。在6.0版本,单个聚合函数耗时与11个聚合函数耗时会有3~4倍的差距。
聚合函数复用列表
rank列一致的聚合函数可以进行复用。
rank | aggfnoid | aggtransfn | agginitval
------+-------------------------------+-------------------------------------------+---------------------
1 | pg_catalog.sum | float4pl |
2 | pg_catalog.var_samp | float4_accum | {0,0,0}
2 | pg_catalog.stddev_pop | float4_accum | {0,0,0}
2 | pg_catalog.avg | float4_accum | {0,0,0}
2 | pg_catalog.var_pop | float4_accum | {0,0,0}
2 | pg_catalog.stddev | float4_accum | {0,0,0}
2 | pg_catalog.variance | float4_accum | {0,0,0}
2 | pg_catalog.stddev_samp | float4_accum | {0,0,0}
9 | pg_catalog.max | float4larger |
10 | pg_catalog.min | float4smaller |
11 | pg_catalog.sum | float8pl |
12 | pg_catalog.stddev_pop | float8_accum | {0,0,0}
12 | pg_catalog.stddev_samp | float8_accum | {0,0,0}
12 | pg_catalog.avg | float8_accum | {0,0,0}
12 | pg_catalog.stddev | float8_accum | {0,0,0}
12 | pg_catalog.var_pop | float8_accum | {0,0,0}
12 | pg_catalog.var_samp | float8_accum | {0,0,0}
12 | pg_catalog.variance | float8_accum | {0,0,0}
19 | pg_catalog.max | float8larger |
20 | pg_catalog.min | float8smaller |
21 | pg_catalog.max | text_larger |
22 | pg_catalog.min | text_smaller |
23 | pg_catalog.max | array_larger |
24 | pg_catalog.min | array_smaller |
25 | pg_catalog.max | int4larger |
26 | pg_catalog.min | int4smaller |
27 | pg_catalog.max | int2larger |
28 | pg_catalog.min | int2smaller |
29 | pg_catalog.sum | cash_pl |
30 | pg_catalog.max | cashlarger |
31 | pg_catalog.min | cashsmaller |
32 | pg_catalog.max | bpchar_larger |
33 | pg_catalog.min | bpchar_smaller |
34 | pg_catalog.max | date_larger |
35 | pg_catalog.min | date_smaller |
36 | pg_catalog.sum | interval_pl |
37 | pg_catalog.min | timestamptz_smaller |
38 | pg_catalog.max | timestamptz_larger |
39 | pg_catalog.min | interval_smaller |
40 | pg_catalog.max | interval_larger |
41 | pg_catalog.count | int8inc | 0
42 | pg_catalog.max | int8larger |
43 | pg_catalog.min | int8smaller |
44 | pg_catalog.max | time_larger |
45 | pg_catalog.min | time_smaller |
46 | pg_catalog.max | timetz_larger |
47 | pg_catalog.min | timetz_smaller |
48 | pg_catalog.bit_and | bitand |
49 | pg_catalog.bit_or | bitor |
50 | pg_catalog.min | numeric_smaller |
51 | pg_catalog.max | numeric_larger |
52 | pg_catalog.stddev_pop | numeric_accum |
52 | pg_catalog.var_pop | numeric_accum |
52 | pg_catalog.var_samp | numeric_accum |
52 | pg_catalog.variance | numeric_accum |
52 | pg_catalog.stddev_samp | numeric_accum |
52 | pg_catalog.stddev | numeric_accum |
58 | pg_catalog.stddev | int2_accum |
58 | pg_catalog.stddev_pop | int2_accum |
58 | pg_catalog.stddev_samp | int2_accum |
58 | pg_catalog.variance | int2_accum |
58 | pg_catalog.var_pop | int2_accum |
58 | pg_catalog.var_samp | int2_accum |
64 | pg_catalog.stddev_pop | int4_accum |
64 | pg_catalog.var_pop | int4_accum |
64 | pg_catalog.stddev | int4_accum |
64 | pg_catalog.var_samp | int4_accum |
64 | pg_catalog.stddev_samp | int4_accum |
64 | pg_catalog.variance | int4_accum |
70 | pg_catalog.stddev_samp | int8_accum |
70 | pg_catalog.stddev_pop | int8_accum |
70 | pg_catalog.var_pop | int8_accum |
70 | pg_catalog.variance | int8_accum |
70 | pg_catalog.var_samp | int8_accum |
70 | pg_catalog.stddev | int8_accum |
76 | pg_catalog.sum | int2_sum |
77 | pg_catalog.sum | int4_sum |
78 | pg_catalog.avg | interval_accum | {0 second,0 second}
79 | pg_catalog.bit_and | int2and |
80 | pg_catalog.bit_or | int2or |
81 | pg_catalog.bit_and | int4and |
82 | pg_catalog.bit_or | int4or |
83 | pg_catalog.bit_and | int8and |
84 | pg_catalog.bit_or | int8or |
85 | pg_catalog.avg | int2_avg_accum | {0,0}
86 | pg_catalog.avg | int4_avg_accum | {0,0}
87 | pg_catalog.max | oidlarger |
88 | pg_catalog.min | oidsmaller |
89 | pg_catalog.min | timestamp_smaller |
90 | pg_catalog.max | timestamp_larger |
91 | pg_catalog.array_agg | array_agg_transfn |
91 | pg_catalog.gp_array_agg | array_agg_transfn |
93 | every | booland_statefunc |
93 | bool_and | booland_statefunc |
95 | bool_or | boolor_statefunc |
96 | pg_catalog.sum | int8_avg_accum |
96 | pg_catalog.avg | int8_avg_accum |
98 | pg_catalog.max | tidlarger |
99 | pg_catalog.min | tidsmaller |
100 | pg_catalog.count | int8inc_any | 0
101 | regr_count | int8inc_float8_float8 | 0
102 | regr_sxy | float8_regr_accum | {0,0,0,0,0,0}
102 | regr_sxx | float8_regr_accum | {0,0,0,0,0,0}
102 | regr_syy | float8_regr_accum | {0,0,0,0,0,0}
102 | regr_avgx | float8_regr_accum | {0,0,0,0,0,0}
102 | regr_avgy | float8_regr_accum | {0,0,0,0,0,0}
102 | regr_r2 | float8_regr_accum | {0,0,0,0,0,0}
102 | regr_slope | float8_regr_accum | {0,0,0,0,0,0}
102 | regr_intercept | float8_regr_accum | {0,0,0,0,0,0}
102 | covar_pop | float8_regr_accum | {0,0,0,0,0,0}
102 | covar_samp | float8_regr_accum | {0,0,0,0,0,0}
102 | corr | float8_regr_accum | {0,0,0,0,0,0}
113 | pg_catalog.sum | numeric_avg_accum |
113 | pg_catalog.avg | numeric_avg_accum |
115 | xmlagg | xmlconcat2 |
116 | json_agg | json_agg_transfn |
117 | json_object_agg | json_object_agg_transfn |
118 | jsonb_agg | jsonb_agg_transfn |
119 | jsonb_object_agg | jsonb_object_agg_transfn |
120 | pg_catalog.min | enum_smaller |
121 | pg_catalog.max | enum_larger |
122 | pg_catalog.string_agg | string_agg_transfn |
123 | pg_catalog.string_agg | bytea_string_agg_transfn |
124 | pg_catalog.max | network_larger |
125 | pg_catalog.min | network_smaller |
126 | pg_catalog."median" | ordered_set_transition |
126 | pg_catalog.percentile_cont | ordered_set_transition |
126 | pg_catalog."median" | ordered_set_transition |
126 | pg_catalog.percentile_cont | ordered_set_transition |
126 | pg_catalog.percentile_cont | ordered_set_transition |
126 | pg_catalog.percentile_cont | ordered_set_transition |
126 | pg_catalog."median" | ordered_set_transition |
126 | pg_catalog.percentile_disc | ordered_set_transition |
126 | pg_catalog.percentile_cont | ordered_set_transition |
126 | pg_catalog.percentile_cont | ordered_set_transition |
126 | pg_catalog.percentile_disc | ordered_set_transition |
126 | pg_catalog.percentile_cont | ordered_set_transition |
126 | pg_catalog.percentile_cont | ordered_set_transition |
126 | mode | ordered_set_transition |
126 | pg_catalog."median" | ordered_set_transition |
141 | pg_catalog.cume_dist | ordered_set_transition_multi |
141 | pg_catalog.rank | ordered_set_transition_multi |
141 | pg_catalog.percent_rank | ordered_set_transition_multi |
141 | pg_catalog.dense_rank | ordered_set_transition_multi |
145 | pg_catalog.array_agg | array_agg_array_transfn |
145 | pg_catalog.gp_array_agg | array_agg_array_transfn |
147 | array_sum | array_add | {}
148 | pg_catalog.sum | int2_matrix_accum |
149 | pg_catalog.sum | int4_matrix_accum |
150 | pg_catalog.sum | int8_matrix_accum |
151 | pg_catalog.sum | float8_matrix_accum |
152 | pg_catalog.pivot_sum | int4_pivot_accum |
153 | pg_catalog.pivot_sum | int8_pivot_accum |
154 | pg_catalog.pivot_sum | float8_pivot_accum |
155 | gp_hyperloglog_accum | gp_hyperloglog_add_item_agg_default |
156 | pg_catalog.gp_percentile_cont | gp_percentile_cont_float8_transition |
157 | pg_catalog.gp_percentile_cont | gp_percentile_cont_interval_transition |
158 | pg_catalog.gp_percentile_cont | gp_percentile_cont_timestamp_transition |
159 | pg_catalog.gp_percentile_cont | gp_percentile_cont_timestamptz_transition |
160 | gp_percentile_disc | gp_percentile_disc_transition |