When running SQL, if a GROUP BY aggregation takes a long time, try adjusting the following parameter:
_gbase_parallel_aggr_mode
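(1) Value range: 0, 1, 2, 3
(2) Meaning of each value: 0 means automatic evaluation (the default);
1: many distinct values and the group by buffer is large enough; use the original hash splitting;
2: few distinct values and a high duplication rate; split round-robin (rr), then aggregate;
3: many distinct values and the group by buffer is not large enough; use one-pass hash splitting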
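The three non-zero modes can be read as a small decision rule over the distinct-value ratio and the buffer capacity. The Python sketch below only illustrates that reading. The function name, its arguments, and the `low_ndv_threshold` cutoff are hypothetical; the real thresholds used by the automatic evaluation are not given here, and the choice to test the distinct-value ratio before the buffer is an assumption.

```python
def choose_aggr_mode(ndv_ratio, rows_per_thread, buffer_rows_per_thread,
                     low_ndv_threshold=0.1):
    """Return 1, 2 or 3, following the mode descriptions above.

    ndv_ratio: sampled distinct values / sampled rows (0.0 to 1.0).
    low_ndv_threshold: hypothetical cutoff, NOT a documented GBase value.
    """
    if ndv_ratio <= low_ndv_threshold:
        # Few distinct values, high duplication: round-robin split, then aggregate.
        return 2
    if rows_per_thread <= buffer_rows_per_thread:
        # Many distinct values and the buffer is large enough: original hash split.
        return 1
    # Many distinct values and the buffer is too small: one-pass hash split.
    return 3


if __name__ == "__main__":
    print(choose_aggr_mode(1.0, 100, 200))       # many distinct values, buffer fits
    print(choose_aggr_mode(0.02, 100, 200))      # few distinct values
    print(choose_aggr_mode(1.0, 112828, 10652))  # many distinct values, buffer too small
```

If the automatic choice looks wrong for a particular query, the same reasoning suggests which fixed value (1, 2 or 3) to try for _gbase_parallel_aggr_mode.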
(3) Examples
Automatic evaluation selects mode 1:
OP buffer big enough, do not use one-pass algorithm
begin distinct ratio sampling...
finish distinct ratio sampling: total 8123631 rows; sampled 198884 rows, 100 DCs. NDV: 198884, 100%.
Begin parallel splitting for aggregation (split_type = 1)
( 0)split by hash already(131072 rows).
...
divide to 72 blocks(using hash[parallel]): 113858, 112557, 113230, 112865..., 112607.
End parallel splitting for aggregation (split_type = 1)(time used: 0.307s)
Automatic evaluation selects mode 2:
finish distinct ratio sampling: total 8123631 rows; sampled 198884 rows, 100 DCs. NDV: 5001, 2%. (time used: 0.207s)
Begin parallel splitting for aggregation (split_type = 2)
divide to 72 blocks(using round-robin): 131072, 131072, 131072, 65536, 131072, 131072, 131072, 65536, 131072,
131072, 65536, 131072, 131072, 131072, 65536, 131072, 131072, 65536, 131072, 131072, 131072, 65536,
131072, 131072, 131072, 65536, 131072, 131072, 65536, 131072, 131072, 131072, 65536, 131072, 131072,
65536, 131072, 131072, 131072, 65536, 131072, 131072, 131072, 65536, 131072, 131072, 65536, 131072,
131072, 131072, 65536, 131072, 131072, 65536, 131072, 131072, 131072, 65536, 131072, 131072, 131072,
65536, 131072, 131072, 65536, 131072, 131072, 131072, 65536, 131072, 131072, 62703.
End parallel splitting for aggregation (split_type = 2)(time used: 0.001s)
( 1)BEGIN Aggregation(131072 rows)
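To see which mode the automatic evaluation picked, look for the sampling line and the split_type line in the trace. The Python sketch below pulls those fields out of log text. The regular expressions are written against the log lines quoted in this note only; the function name and dictionary keys are hypothetical, and other GBase versions may format these lines differently.

```python
import re

# Patterns modelled on the trace lines quoted in this note (an assumption
# about the log format in general).
SAMPLING_RE = re.compile(
    r"total (\d+) rows; sampled (\d+) rows, (\d+) DCs\. NDV: (\d+), (\d+)%"
)
SPLIT_RE = re.compile(
    r"Begin parallel splitting for aggregation \(split_type = (\d+)\)"
)


def parse_aggr_trace(text):
    """Extract sampling statistics and the chosen split_type from trace text."""
    info = {}
    m = SAMPLING_RE.search(text)
    if m:
        total, sampled, dcs, ndv, pct = map(int, m.groups())
        info.update(total_rows=total, sampled_rows=sampled, dcs=dcs,
                    ndv=ndv, ndv_percent=pct)
    m = SPLIT_RE.search(text)
    if m:
        info["split_type"] = int(m.group(1))
    return info


SAMPLE = (
    "finish distinct ratio sampling: total 8123631 rows; sampled 198884 rows, "
    "100 DCs. NDV: 5001, 2%. (time used: 0.207s)\n"
    "Begin parallel splitting for aggregation (split_type = 2)\n"
)

if __name__ == "__main__":
    print(parse_aggr_trace(SAMPLE))
```

In the mode 2 example, 5001 distinct values out of 198884 sampled rows is about 2.5%, which the log reports as 2%.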
Automatic evaluation selects mode 3:
BEGIN Parallel Aggregation(8123631 rows)
op buffer size: 21474836, tuple width: 28. op buffer can hold 766958 rows-----the buffer cannot hold all the rows
total tuples(per thread): 112828, tuples in op buffer(per thread): 10652
op buffer size: 21474836, tuple width: 28. op buffer can hold 766958 rows
evaluated mat_buf_size = 96, max mat_buf_size = 96
begin distinct ratio sampling...
finish distinct ratio sampling: total 8123631 rows; sampled 198884 rows, 100 DCs. NDV: 198884, 100%.
Begin parallel splitting for aggregation (split_type = 3)
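The buffer figures in the mode 3 log can be reproduced with integer division, which makes it clearer why the buffer counts as too small. The Python sketch below is a worked check, not GBase's actual code. The thread count of 72 is an assumption taken from the "divide to 72 blocks" lines in the other examples, and the function name is hypothetical.

```python
def op_buffer_check(op_buffer_size, tuple_width, total_rows, threads):
    """Recompute the capacity figures printed in the trace log."""
    buffer_rows = op_buffer_size // tuple_width       # rows the op buffer can hold
    rows_per_thread = total_rows // threads           # "total tuples(per thread)"
    buffer_rows_per_thread = buffer_rows // threads   # "tuples in op buffer(per thread)"
    fits = rows_per_thread <= buffer_rows_per_thread
    return buffer_rows, rows_per_thread, buffer_rows_per_thread, fits


if __name__ == "__main__":
    # Values from the log above; threads=72 is assumed, not shown in this log.
    print(op_buffer_check(21474836, 28, 8123631, 72))
    # → (766958, 112828, 10652, False)
```

Each thread has to aggregate 112828 rows but can keep only 10652 of them in the op buffer, and the sampled distinct ratio is 100%, so the evaluation ends up at split_type = 3.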




