在多列排序时,排序列的顺序对于排序操作的性能影响很大。通常,将NDV值大的放在前面性能更佳。以下通过例子展示。
一、构造测试数据
create table t1(id1 text, id2 text);
insert into t1 select mod(generate_series(10000000,15000000),10000)||'0000',generate_series(10000000,15000000);
insert into t1 select (mod(generate_series(10000000,15000000),10000)+1)||'0000',generate_series(10000000,15000000);
test=# select count(distinct id1),count(distinct id2),count(*) from t1;
count | count | count
-------+---------+----------
10001 | 5000001 | 10000002二、distinct 操作
1、当内存足够大时
可以看到,当内存足够大时,优化器采用hashaggr 算法,而非排序,所以性能上没有差异。
test=# explain (analyse ,buffers ) select /*+ set(work_mem 1024MB) */ distinct id1 ,id2 from t1 ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=214003.11..266948.84 rows=5294573 width=17) (actual time=6679.322..10131.612 rows=10000002 loops=1)
Group Key: id1, id2
Buffers: shared hit=64 read=63937
-> Seq Scan on t1 (cost=0.00..164002.41 rows=10000141 width=17) (actual time=0.092..1071.132 rows=10000002 loops=1)
Buffers: shared hit=64 read=63937
Planning Time: 0.074 ms
Execution Time: 10530.949 ms
(7 行记录)
test=# explain (analyse ,buffers ) select /*+ set(work_mem 1024MB) */ distinct id2,id1 from t1 ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=214003.11..266948.84 rows=5294573 width=17) (actual time=6638.296..9819.373 rows=10000002 loops=1)
Group Key: id2, id1
Buffers: shared hit=96 read=63905
-> Seq Scan on t1 (cost=0.00..164002.41 rows=10000141 width=17) (actual time=0.088..1068.998 rows=10000002 loops=1)
Buffers: shared hit=96 read=63905
Planning Time: 0.082 ms
Execution Time: 10199.573 ms
(7 行记录)2、当内存较小时
可以看到id2 (distinct 值更多)在前,性能更佳
test=# explain (analyse ,buffers,verbose ) select /*+ set(work_mem 100MB) */ distinct id2 ,id1 from t1 ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=1531777.15..1606778.21 rows=5294573 width=17) (actual time=8393.168..14878.930 rows=10000002 loops=1)
Output: id2, id1
Buffers: shared hit=128 read=63873, temp read=102913 written=103170
-> Sort (cost=1531777.15..1556777.51 rows=10000141 width=17) (actual time=8393.166..12881.589 rows=10000002 loops=1)
Output: id2, id1
Sort Key: t1.id2, t1.id1
Sort Method: external merge Disk: 272936kB
Buffers: shared hit=128 read=63873, temp read=102913 written=103170
-> Seq Scan on public.t1 (cost=0.00..164002.41 rows=10000141 width=17) (actual time=0.047..1154.290 rows=10000002 loops=1)
Output: id2, id1
Buffers: shared hit=128 read=63873
Planning Time: 0.111 ms
Execution Time: 15264.548 ms
(13 行记录)
test=# explain (analyse ,buffers,verbose ) select /*+ set(work_mem 100MB) */ distinct id1,id2 from t1 ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=1531777.15..1606778.21 rows=5294573 width=17) (actual time=23940.932..31830.909 rows=10000002 loops=1)
Output: id1, id2
Buffers: shared hit=160 read=63841, temp read=102913 written=103170
-> Sort (cost=1531777.15..1556777.51 rows=10000141 width=17) (actual time=23940.931..30050.674 rows=10000002 loops=1)
Output: id1, id2
Sort Key: t1.id1, t1.id2
Sort Method: external merge Disk: 272936kB
Buffers: shared hit=160 read=63841, temp read=102913 written=103170
-> Seq Scan on public.t1 (cost=0.00..164002.41 rows=10000141 width=17) (actual time=0.027..791.278 rows=10000002 loops=1)
Output: id1, id2
Buffers: shared hit=160 read=63841
Planning Time: 0.063 ms
Execution Time: 32206.516 ms
(13 行记录)三、group by 操作
1、当内存足够大时
同样,因为内存较大,group by 操作通过hashaggr,并未排序,所以性能无差异
test=# explain (analyse ,buffers ) select /*+ set(work_mem 1024MB) */ id1 ,id2 from t1 group by id1 ,id2 ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=214003.11..266948.84 rows=5294573 width=17) (actual time=6072.929..9338.734 rows=10000002 loops=1)
Group Key: id1, id2
Buffers: shared hit=192 read=63809
-> Seq Scan on t1 (cost=0.00..164002.41 rows=10000141 width=17) (actual time=0.076..963.138 rows=10000002 loops=1)
Buffers: shared hit=192 read=63809
Planning Time: 0.137 ms
Execution Time: 9706.713 ms
(7 行记录)
test=# explain (analyse ,buffers ) select /*+ set(work_mem 1024MB) */ id1 ,id2 from t1 group by id2,id1 ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=214003.11..266948.84 rows=5294573 width=17) (actual time=5839.798..8975.304 rows=10000002 loops=1)
Group Key: id2, id1
Buffers: shared hit=224 read=63777
-> Seq Scan on t1 (cost=0.00..164002.41 rows=10000141 width=17) (actual time=0.067..913.143 rows=10000002 loops=1)
Buffers: shared hit=224 read=63777
Planning Time: 0.081 ms
Execution Time: 9339.602 ms
(7 行记录)2、当内存较小时
同样,可以看到 id2 (distinct 值更多)在前,性能更佳
test=# explain (analyse ,buffers ) select /*+ set(work_mem 100MB) */ id1 ,id2 from t1 group by id1 ,id2 ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Group (cost=1531777.15..1606778.21 rows=5294573 width=17) (actual time=23888.203..32225.259 rows=10000002 loops=1)
Group Key: id1, id2
Buffers: shared hit=256 read=63745, temp read=102913 written=103170
-> Sort (cost=1531777.15..1556777.51 rows=10000141 width=17) (actual time=23888.197..30219.007 rows=10000002 loops=1)
Sort Key: id1, id2
Sort Method: external merge Disk: 272936kB
Buffers: shared hit=256 read=63745, temp read=102913 written=103170
-> Seq Scan on t1 (cost=0.00..164002.41 rows=10000141 width=17) (actual time=0.064..841.771 rows=10000002 loops=1)
Buffers: shared hit=256 read=63745
Planning Time: 0.114 ms
Execution Time: 32600.554 ms
(11 行记录)
test=# explain (analyse ,buffers ) select /*+ set(work_mem 100MB) */ id1 ,id2 from t1 group by id2,id1 ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Group (cost=1531777.15..1606778.21 rows=5294573 width=17) (actual time=7911.096..14479.694 rows=10000002 loops=1)
Group Key: id2, id1
Buffers: shared hit=288 read=63713, temp read=102913 written=103170
-> Sort (cost=1531777.15..1556777.51 rows=10000141 width=17) (actual time=7911.092..12323.432 rows=10000002 loops=1)
Sort Key: id2, id1
Sort Method: external merge Disk: 272936kB
Buffers: shared hit=288 read=63713, temp read=102913 written=103170
-> Seq Scan on t1 (cost=0.00..164002.41 rows=10000141 width=17) (actual time=0.039..803.117 rows=10000002 loops=1)
Buffers: shared hit=288 read=63713
Planning Time: 0.081 ms
Execution Time: 14851.643 ms
(11 行记录)四、order by 操作
由于order by 操作不能用hashaggr ,因此,内存大小只会影响排序的性能。可以看到,在同样内存的情况下,id2 在前性能更佳。
test=# explain (analyse ,buffers ) select /*+ set(work_mem 1024MB) */ id1 ,id2 from t1 order by id2,id1 ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Sort (cost=1326694.65..1351695.01 rows=10000141 width=17) (actual time=7860.700..11978.304 rows=10000002 loops=1)
Sort Key: id2, id1
Sort Method: external merge Disk: 272936kB
Buffers: shared hit=320 read=63681, temp read=102913 written=103170
-> Seq Scan on t1 (cost=0.00..164002.41 rows=10000141 width=17) (actual time=0.057..776.822 rows=10000002 loops=1)
Buffers: shared hit=320 read=63681
Planning Time: 0.058 ms
Execution Time: 12341.831 ms
(8 行记录)
test=# explain (analyse ,buffers ) select /*+ set(work_mem 1024MB) */ id1 ,id2 from t1 order by id1,id2 ;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Sort (cost=1326694.65..1351695.01 rows=10000141 width=17) (actual time=23292.115..29392.412 rows=10000002 loops=1)
Sort Key: id1, id2
Sort Method: external merge Disk: 272936kB
Buffers: shared hit=352 read=63649, temp read=102913 written=103170
-> Seq Scan on t1 (cost=0.00..164002.41 rows=10000141 width=17) (actual time=0.071..829.915 rows=10000002 loops=1)
Buffers: shared hit=352 read=63649
Planning Time: 0.103 ms
Execution Time: 29774.684 ms
(8 行记录)最后修改时间:2024-08-06 14:01:57
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




