暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL 1000亿数据量 正则匹配 速度与激情

digoal 2016-03-07
405

作者

digoal

日期

2016-03-07

标签

PostgreSQL , pg_trgm , 倒排索引 , reverse , like , 正则匹配 , 模糊查询 , gin索引


背景

本文主要讲解并验证一下PostgreSQL 1000亿数据量级的模糊查询、正则查询,使用gin索引的效率。

承接上一篇

https://yq.aliyun.com/articles/7444

测试环境

测试环境为 8台主机(16c/host)的 PostgreSQL集群,一共240个数据节点,测试数据量1008亿。

性能图表 :

_

如果要获得更快的响应速度,可以通过增加主机和节点数(或者通过增加CPU和节点数),缩短recheck的处理时间。

数据生成方法

```

!/bin/bash

截取通过random()计算得到的MD5 128bit hex的前48bit, 转成字符串,得到[0-9]和[a-f]组成的12个随机字符串。

psql digoal digoal -c "create table t_regexp_100billion distributed randomly"

for ((i=1;i<=1008;i++))
do
psql digoal digoal -c "copy (select substring(md5(random()::text),1,12) from generate_series(1,100000000)) to stdout" | psql digoal digoal -c "copy t_regexp_100billion from stdin"
done

psql digoal digoal -c "set maintenance_work_mem='4GB'; create index idx_t_regexp_100billion_1 on t_regexp_100billion(info)"
psql digoal digoal -c "set maintenance_work_mem='4GB'; create index idx_t_regexp_100billion_2 on t_regexp_100billion(reverse(info))"
psql digoal digoal -c "set maintenance_work_mem='4GB'; create index idx_t_regexp_100billion_gin on t_regexp_100billion using gin (info gin_trgm_ops)"
```

数据概貌

``` digoal=> select count(*) from t_regexp_100billion ;
count


100800000000
(1 row)
Time: 228721.386 ms
```

表大小, 4.1 TB

digoal=> \dt+ t_regexp_100billion List of relations Schema | Name | Type | Owner | Size | Description --------+---------------------+-------+--------+---------+------------- public | t_regexp_100billion | table | digoal | 4158 GB | (1 row)

索引大小

idx_t_regexp_100billion_1 2961 GB idx_t_regexp_100billion_1 2961 GB idx_t_regexp_100billion_gin 2300 GB

测试数据展示:

``` digoal=> select * from t_regexp_100billion offset 1000000 limit 10;
info


bca0fb45367e
3051ca8a9a38
fadc91a3a4de
710b9c60417e
279dd9832cc3
f4743fe2e83b
9ce9e42d4039
65e64742fd3f
db3d0e0edc52
7cfb00bb38ec
(10 rows)
```

重复度取样, 计算random() md5得到的字符串,可以确保非常低的重复度:

``` digoal=> select count(distinct info) from (select * from t_regexp_100billion offset 1299422811 limit 1000000) t;
count


999750
(1 row)
```

统计信息展示:

``` digoal=> alter table t_regexp_100billion alter column info set statistics 10000;
ALTER TABLE
digoal=> analyze t_regexp_100billion ;
ANALYZE

schemaname | public
tablename | t_regexp_100billion
attname | info
inherited | f
null_frac | 0
avg_width | 13
n_distinct | -0.836834 # 采样统计信息,约83.6834%的唯一值
most_common_vals | (pg_catalog.text){7f68d12d2205,00083380706d,00154b6d79e8,...
most_common_freqs | {1e-06,6.66667e-07,6.66667e-07,6.66667e-07,..... 单个最高频值的占比为1e-06, 也就是说1000亿记录中出现10万次。
histogram_bounds | (pg_catalog.text){0000008123b7,00066c71c9bb,000d672de234,...
correlation | 0.000237291
most_common_elems |
most_common_elem_freqs |
elem_count_histogram |
```

7f68d12d2205 实际的出现次数,可能是采样时7f68d12d2205被采样到的块较多,所以数据库认为它的占比较多:

``` digoal=> select count(*) from t_regexp_100billion where info='7f68d12d2205';
-[ RECORD 1 ]
count | 54

digoal=> select ctid from t_regexp_100billion where info='7f68d12d2205' order by 1;
ctid


(15343,114)
(62134,39)
(96808,112)
(116492,176)
(194615,143)
(328074,116)
(364037,115)
(375240,158)
(376187,152)
(602144,81)
(664026,6)
(689501,136)
(695345,130)
(697374,126)
(714719,148)
(743169,20)
(802326,139)
(833830,41)
(839417,185)
(892417,78)
(892493,149)
(907979,52)
(967078,163)
(990313,159)
(1007998,27)
(1106961,57)
(1142731,165)
(1148427,67)
(1156654,156)
(1205854,137)
(1243429,68)
(1277287,165)
(1328836,98)
(1331727,150)
(1337534,3)
(1360947,104)
(1438970,97)
(1476941,22)
(1482022,82)
(1486307,69)
(1548445,155)
(1557209,82)
(1564980,158)
(1646685,76)
(1663018,99)
(1678604,77)
(1755845,177)
(1981937,153)
(1984723,98)
(2071955,59)
(2093147,149)
(2199794,102)
(2204957,44)
(2234820,142)
(54 rows)
```

性能测试

前缀匹配查询速度:

``` digoal=> select ctid,tableoid,info from t_regexp_100billion where info ~ '^80ebcdd47';
ctid | tableoid | info
---------------+----------+--------------
(124741,60) | 16677 | 80ebcdd47006
(896121,64) | 16659 | 80ebcdd47006
(1124495,97) | 16659 | 80ebcdd47006
(1126474,141) | 16659 | 80ebcdd47006
(1059471,62) | 16659 | 80ebcdd47006
(1296562,115) | 16659 | 80ebcdd47006
(1190941,122) | 16659 | 80ebcdd47006
(680853,129) | 16659 | 80ebcdd47006
(1010667,15) | 16659 | 80ebcdd47006
(1386348,25) | 16659 | 80ebcdd47006
(1522827,90) | 16659 | 80ebcdd47006
(2204071,129) | 16659 | 80ebcdd47006
(1570431,114) | 16659 | 80ebcdd47006
(888185,38) | 16659 | 80ebcdd47006
(605886,160) | 16659 | 80ebcdd47006
(1306061,123) | 16659 | 80ebcdd47006
(757157,47) | 16659 | 80ebcdd47006
(1166290,83) | 16659 | 80ebcdd47006
(419730,1) | 16659 | 80ebcdd47006
(1833853,131) | 16659 | 80ebcdd47006
(964866,120) | 16659 | 80ebcdd47006
(904961,175) | 16659 | 80ebcdd47006
(984373,32) | 16659 | 80ebcdd47006
(891018,145) | 16659 | 80ebcdd47006
(1520483,121) | 16659 | 80ebcdd47006
(571001,124) | 16659 | 80ebcdd47006
(802093,55) | 16659 | 80ebcdd47006
(6831,172) | 16659 | 80ebcdd47006
(1169137,84) | 16659 | 80ebcdd47006
(77398,164) | 16659 | 80ebcdd47006
(24132,98) | 16659 | 80ebcdd47006
(564322,152) | 16659 | 80ebcdd47006
(357087,172) | 16659 | 80ebcdd47006
(1823628,60) | 16659 | 80ebcdd47006
(2153609,52) | 16659 | 80ebcdd47006
(816401,140) | 16659 | 80ebcdd47006
(542383,53) | 16662 | 80ebcdd47006
(1340971,64) | 16662 | 80ebcdd47006
(1239166,108) | 16662 | 80ebcdd47006
(2033648,39) | 16662 | 80ebcdd47006
(1890808,93) | 16662 | 80ebcdd47006
(1213124,4) | 16662 | 80ebcdd47006
(1025184,106) | 16662 | 80ebcdd47006
(620238,131) | 16662 | 80ebcdd47006
(583064,74) | 16662 | 80ebcdd47006
(1454680,42) | 16671 | 80ebcdd47006
(417385,74) | 16671 | 80ebcdd47006
(323669,61) | 16671 | 80ebcdd47006
(1759181,138) | 16671 | 80ebcdd47006
(2112157,146) | 16671 | 80ebcdd47006
(431326,92) | 16671 | 80ebcdd47006
(2097356,110) | 16671 | 80ebcdd47006
(52 rows)
Time: 3226.393 ms

digoal=> explain (analyze,verbose,buffers,costs,timing) select ctid,tableoid,info from t_regexp_100billion where info ~ '^80ebcdd47';
Remote Fast Query Execution (cost=0.00..0.00 rows=0 width=0) (actual time=3085.502..3112.273 rows=52 loops=1)
Output: t_regexp_100billion.ctid, t_regexp_100billion.tableoid, t_regexp_100billion.info
Node/s: h1_data1, h1_data10, h1_data11, h1_data12, h1_data13, h1_data14, h1_data15, h1_data16, h1_data17, h1_data18, h1_data19, h1_data2, h1_data20, h1_data21, h1_data22, h1_data23, h1_data24, h1_data25, h1_data26, h1_data27, h1_data2
8, h1_data29, h1_data3, h1_data30, h1_data4, h1_data5, h1_data6, h1_data7, h1_data8, h1_data9, h2_data1, h2_data10, h2_data11, h2_data12, h2_data13, h2_data14, h2_data15, h2_data16, h2_data17, h2_data18, h2_data19, h2_data2, h2_data20, h
2_data21, h2_data22, h2_data23, h2_data24, h2_data25, h2_data26, h2_data27, h2_data28, h2_data29, h2_data3, h2_data30, h2_data4, h2_data5, h2_data6, h2_data7, h2_data8, h2_data9, h3_data1, h3_data10, h3_data11, h3_data12, h3_data13, h3_d
ata14, h3_data15, h3_data16, h3_data17, h3_data18, h3_data19, h3_data2, h3_data20, h3_data21, h3_data22, h3_data23, h3_data24, h3_data25, h3_data26, h3_data27, h3_data28, h3_data29, h3_data3, h3_data30, h3_data4, h3_data5, h3_data6, h3_d
ata7, h3_data8, h3_data9, h4_data1, h4_data10, h4_data11, h4_data12, h4_data13, h4_data14, h4_data15, h4_data16, h4_data17, h4_data18, h4_data19, h4_data2, h4_data20, h4_data21, h4_data22, h4_data23, h4_data24, h4_data25, h4_data26, h4_d
ata27, h4_data28, h4_data29, h4_data3, h4_data30, h4_data4, h4_data5, h4_data6, h4_data7, h4_data8, h4_data9, h5_data1, h5_data10, h5_data11, h5_data12, h5_data13, h5_data14, h5_data15, h5_data16, h5_data17, h5_data18, h5_data19, h5_data
2, h5_data20, h5_data21, h5_data22, h5_data23, h5_data24, h5_data25, h5_data26, h5_data27, h5_data28, h5_data29, h5_data3, h5_data30, h5_data4, h5_data5, h5_data6, h5_data7, h5_data8, h5_data9, h6_data1, h6_data10, h6_data11, h6_data12,
h6_data13, h6_data14, h6_data15, h6_data16, h6_data17, h6_data18, h6_data19, h6_data2, h6_data20, h6_data21, h6_data22, h6_data23, h6_data24, h6_data25, h6_data26, h6_data27, h6_data28, h6_data29, h6_data3, h6_data30, h6_data4, h6_data5,
h6_data6, h6_data7, h6_data8, h6_data9, h7_data1, h7_data10, h7_data11, h7_data12, h7_data13, h7_data14, h7_data15, h7_data16, h7_data17, h7_data18, h7_data19, h7_data2, h7_data20, h7_data21, h7_data22, h7_data23, h7_data24, h7_data25,
h7_data26, h7_data27, h7_data28, h7_data29, h7_data3, h7_data30, h7_data4, h7_data5, h7_data6, h7_data7, h7_data8, h7_data9, h8_data1, h8_data10, h8_data11, h8_data12, h8_data13, h8_data14, h8_data15, h8_data16, h8_data17, h8_data18, h8_
data19, h8_data2, h8_data20, h8_data21, h8_data22, h8_data23, h8_data24, h8_data25, h8_data26, h8_data27, h8_data28, h8_data29, h8_data3, h8_data30, h8_data4, h8_data5, h8_data6, h8_data7, h8_data8, h8_data9
Remote query: SELECT ctid, tableoid, info FROM t_regexp_100billion WHERE (info ~ '^80ebcdd47'::text)
Planning time: 0.061 ms
Execution time: 3112.296 ms
(6 rows)
Time: 3139.928 ms
```

后缀匹配查询速度

``` digoal=> select ctid,tableoid,info from t_regexp_100billion where reverse(info) ~ '^f42d12089b';
ctid | tableoid | info
---------------+----------+--------------
(124741,26) | 16677 | f3b98021d24f
(1696888,151) | 16659 | f3b98021d24f
(1278911,101) | 16659 | f3b98021d24f
(1427480,157) | 16659 | f3b98021d24f
(449192,30) | 16659 | f3b98021d24f
(1833887,81) | 16659 | f3b98021d24f
(229525,72) | 16659 | f3b98021d24f
(1353789,17) | 16659 | f3b98021d24f
(1875911,148) | 16659 | f3b98021d24f
(1847078,35) | 16659 | f3b98021d24f
(316780,156) | 16659 | f3b98021d24f
(1265453,120) | 16659 | f3b98021d24f
(100075,60) | 16659 | f3b98021d24f
(1924176,2) | 16659 | f3b98021d24f
(279583,2) | 16659 | f3b98021d24f
(1631226,23) | 16659 | f3b98021d24f
(1906666,50) | 16659 | f3b98021d24f
(1640803,116) | 16659 | f3b98021d24f
(629651,46) | 16659 | f3b98021d24f
(134982,13) | 16659 | f3b98021d24f
(380660,123) | 16659 | f3b98021d24f
(2158193,31) | 16659 | f3b98021d24f
(324901,64) | 16659 | f3b98021d24f
(1243973,160) | 16659 | f3b98021d24f
(540958,139) | 16659 | f3b98021d24f
(441475,99) | 16659 | f3b98021d24f
(1207114,121) | 16659 | f3b98021d24f
(574598,21) | 16659 | f3b98021d24f
(1253283,185) | 16659 | f3b98021d24f
(1396717,142) | 16659 | f3b98021d24f
(149738,9) | 16659 | f3b98021d24f
(764749,26) | 16659 | f3b98021d24f
(1211899,5) | 16659 | f3b98021d24f
(1626746,65) | 16659 | f3b98021d24f
(1342895,124) | 16659 | f3b98021d24f
(733794,136) | 16659 | f3b98021d24f
(417796,2) | 16659 | f3b98021d24f
(555520,163) | 16659 | f3b98021d24f
(232038,105) | 16659 | f3b98021d24f
(355107,127) | 16659 | f3b98021d24f
(352143,175) | 16662 | f3b98021d24f
(1856293,69) | 16662 | f3b98021d24f
(1405106,105) | 16662 | f3b98021d24f
(47689,79) | 16662 | f3b98021d24f
(679310,7) | 16671 | f3b98021d24f
(1076234,164) | 16671 | f3b98021d24f
(46 rows)
Time: 3140.835 ms

digoal=> explain (verbose,costs,timing,buffers,analyze) select ctid,tableoid,info from t_regexp_100billion where reverse(info) ~ '^f42d12089b';
Remote Fast Query Execution (cost=0.00..0.00 rows=0 width=0) (actual time=3085.738..3112.216 rows=46 loops=1)
Output: t_regexp_100billion.ctid, t_regexp_100billion.tableoid, t_regexp_100billion.info
Node/s: h1_data1, h1_data10, h1_data11, h1_data12, h1_data13, h1_data14, h1_data15, h1_data16, h1_data17, h1_data18, h1_data19, h1_data2, h1_data20, h1_data21, h1_data22, h1_data23, h1_data24, h1_data25, h1_data26, h1_data27, h1_data2
8, h1_data29, h1_data3, h1_data30, h1_data4, h1_data5, h1_data6, h1_data7, h1_data8, h1_data9, h2_data1, h2_data10, h2_data11, h2_data12, h2_data13, h2_data14, h2_data15, h2_data16, h2_data17, h2_data18, h2_data19, h2_data2, h2_data20, h
2_data21, h2_data22, h2_data23, h2_data24, h2_data25, h2_data26, h2_data27, h2_data28, h2_data29, h2_data3, h2_data30, h2_data4, h2_data5, h2_data6, h2_data7, h2_data8, h2_data9, h3_data1, h3_data10, h3_data11, h3_data12, h3_data13, h3_d
ata14, h3_data15, h3_data16, h3_data17, h3_data18, h3_data19, h3_data2, h3_data20, h3_data21, h3_data22, h3_data23, h3_data24, h3_data25, h3_data26, h3_data27, h3_data28, h3_data29, h3_data3, h3_data30, h3_data4, h3_data5, h3_data6, h3_d
ata7, h3_data8, h3_data9, h4_data1, h4_data10, h4_data11, h4_data12, h4_data13, h4_data14, h4_data15, h4_data16, h4_data17, h4_data18, h4_data19, h4_data2, h4_data20, h4_data21, h4_data22, h4_data23, h4_data24, h4_data25, h4_data26, h4_d
ata27, h4_data28, h4_data29, h4_data3, h4_data30, h4_data4, h4_data5, h4_data6, h4_data7, h4_data8, h4_data9, h5_data1, h5_data10, h5_data11, h5_data12, h5_data13, h5_data14, h5_data15, h5_data16, h5_data17, h5_data18, h5_data19, h5_data
2, h5_data20, h5_data21, h5_data22, h5_data23, h5_data24, h5_data25, h5_data26, h5_data27, h5_data28, h5_data29, h5_data3, h5_data30, h5_data4, h5_data5, h5_data6, h5_data7, h5_data8, h5_data9, h6_data1, h6_data10, h6_data11, h6_data12,
h6_data13, h6_data14, h6_data15, h6_data16, h6_data17, h6_data18, h6_data19, h6_data2, h6_data20, h6_data21, h6_data22, h6_data23, h6_data24, h6_data25, h6_data26, h6_data27, h6_data28, h6_data29, h6_data3, h6_data30, h6_data4, h6_data5,
h6_data6, h6_data7, h6_data8, h6_data9, h7_data1, h7_data10, h7_data11, h7_data12, h7_data13, h7_data14, h7_data15, h7_data16, h7_data17, h7_data18, h7_data19, h7_data2, h7_data20, h7_data21, h7_data22, h7_data23, h7_data24, h7_data25,
h7_data26, h7_data27, h7_data28, h7_data29, h7_data3, h7_data30, h7_data4, h7_data5, h7_data6, h7_data7, h7_data8, h7_data9, h8_data1, h8_data10, h8_data11, h8_data12, h8_data13, h8_data14, h8_data15, h8_data16, h8_data17, h8_data18, h8_
data19, h8_data2, h8_data20, h8_data21, h8_data22, h8_data23, h8_data24, h8_data25, h8_data26, h8_data27, h8_data28, h8_data29, h8_data3, h8_data30, h8_data4, h8_data5, h8_data6, h8_data7, h8_data8, h8_data9
Remote query: SELECT ctid, tableoid, info FROM t_regexp_100billion WHERE (reverse(info) ~ '^f42d12089b'::text)
Planning time: 0.063 ms
Execution time: 3112.236 ms
(6 rows)

Time: 3139.890 ms
```

前后模糊查询速度:

``` digoal=> select ctid,tableoid,info from t_regexp_100billion where info ~ 'e7add04871';
ctid | tableoid | info
---------------+----------+--------------
(124741,45) | 16677 | be7add048713
(49315,69) | 16659 | be7add048713
(1770876,21) | 16659 | be7add048713
(199079,143) | 16659 | be7add048713
(151110,141) | 16659 | be7add048713
(1597384,137) | 16659 | be7add048713
(1693453,25) | 16659 | be7add048713
(101576,132) | 16659 | be7add048713
(1110249,50) | 16659 | be7add048713
(792326,68) | 16659 | be7add048713
(1676705,68) | 16659 | be7add048713
(1269148,101) | 16659 | be7add048713
(1027442,113) | 16659 | be7add048713
(1078144,100) | 16659 | be7add048713
(584038,141) | 16659 | be7add048713
(1245454,80) | 16659 | be7add048713
(1551184,102) | 16659 | be7add048713
(1326266,17) | 16659 | be7add048713
(432025,101) | 16659 | be7add048713
(300650,152) | 16659 | be7add048713
(1322140,15) | 16662 | be7add048713
(1424768,25) | 16662 | be7add048713
(391150,31) | 16662 | be7add048713
(254014,170) | 16662 | be7add048713
(1758616,85) | 16662 | be7add048713
(1720990,105) | 16662 | be7add048713
(345908,68) | 16662 | be7add048713
(1592333,102) | 16662 | be7add048713
(1843902,130) | 16671 | be7add048713
(898136,121) | 16671 | be7add048713
(1469985,138) | 16671 | be7add048713
(1287666,51) | 16671 | be7add048713
(32 rows)

Time: 4970.662 ms

digoal=> explain (analyze,verbose,timing,costs,buffers) select ctid,tableoid,info from t_regexp_100billion where info ~ 'e7add04871';
Remote Fast Query Execution (cost=0.00..0.00 rows=0 width=0) (actual time=4712.916..4897.512 rows=32 loops=1)
Output: t_regexp_100billion.ctid, t_regexp_100billion.tableoid, t_regexp_100billion.info
Node/s: h1_data1, h1_data10, h1_data11, h1_data12, h1_data13, h1_data14, h1_data15, h1_data16, h1_data17, h1_data18, h1_data19, h1_data2, h1_data20, h1_data21, h1_data22, h1_data23, h1_data24, h1_data25, h1_data26, h1_data27, h1_data2
8, h1_data29, h1_data3, h1_data30, h1_data4, h1_data5, h1_data6, h1_data7, h1_data8, h1_data9, h2_data1, h2_data10, h2_data11, h2_data12, h2_data13, h2_data14, h2_data15, h2_data16, h2_data17, h2_data18, h2_data19, h2_data2, h2_data20, h
2_data21, h2_data22, h2_data23, h2_data24, h2_data25, h2_data26, h2_data27, h2_data28, h2_data29, h2_data3, h2_data30, h2_data4, h2_data5, h2_data6, h2_data7, h2_data8, h2_data9, h3_data1, h3_data10, h3_data11, h3_data12, h3_data13, h3_d
ata14, h3_data15, h3_data16, h3_data17, h3_data18, h3_data19, h3_data2, h3_data20, h3_data21, h3_data22, h3_data23, h3_data24, h3_data25, h3_data26, h3_data27, h3_data28, h3_data29, h3_data3, h3_data30, h3_data4, h3_data5, h3_data6, h3_d
ata7, h3_data8, h3_data9, h4_data1, h4_data10, h4_data11, h4_data12, h4_data13, h4_data14, h4_data15, h4_data16, h4_data17, h4_data18, h4_data19, h4_data2, h4_data20, h4_data21, h4_data22, h4_data23, h4_data24, h4_data25, h4_data26, h4_d
ata27, h4_data28, h4_data29, h4_data3, h4_data30, h4_data4, h4_data5, h4_data6, h4_data7, h4_data8, h4_data9, h5_data1, h5_data10, h5_data11, h5_data12, h5_data13, h5_data14, h5_data15, h5_data16, h5_data17, h5_data18, h5_data19, h5_data
2, h5_data20, h5_data21, h5_data22, h5_data23, h5_data24, h5_data25, h5_data26, h5_data27, h5_data28, h5_data29, h5_data3, h5_data30, h5_data4, h5_data5, h5_data6, h5_data7, h5_data8, h5_data9, h6_data1, h6_data10, h6_data11, h6_data12,
h6_data13, h6_data14, h6_data15, h6_data16, h6_data17, h6_data18, h6_data19, h6_data2, h6_data20, h6_data21, h6_data22, h6_data23, h6_data24, h6_data25, h6_data26, h6_data27, h6_data28, h6_data29, h6_data3, h6_data30, h6_data4, h6_data5,
h6_data6, h6_data7, h6_data8, h6_data9, h7_data1, h7_data10, h7_data11, h7_data12, h7_data13, h7_data14, h7_data15, h7_data16, h7_data17, h7_data18, h7_data19, h7_data2, h7_data20, h7_data21, h7_data22, h7_data23, h7_data24, h7_data25,
h7_data26, h7_data27, h7_data28, h7_data29, h7_data3, h7_data30, h7_data4, h7_data5, h7_data6, h7_data7, h7_data8, h7_data9, h8_data1, h8_data10, h8_data11, h8_data12, h8_data13, h8_data14, h8_data15, h8_data16, h8_data17, h8_data18, h8_
data19, h8_data2, h8_data20, h8_data21, h8_data22, h8_data23, h8_data24, h8_data25, h8_data26, h8_data27, h8_data28, h8_data29, h8_data3, h8_data30, h8_data4, h8_data5, h8_data6, h8_data7, h8_data8, h8_data9
Remote query: SELECT ctid, tableoid, info FROM t_regexp_100billion WHERE (info ~ 'e7add04871'::text)
Planning time: 0.063 ms
Execution time: 4897.532 ms
(6 rows)

Time: 4925.741 ms
```

正则匹配查询速度

``` digoal=> select ctid,tableoid,info from t_regexp_100billion where info ~ '.3918.209f';
ctid | tableoid | info
---------------+----------+--------------
(124741,29) | 16677 | 0b39188209f2
(1443707,79) | 16659 | 0b39188209f2
(596962,50) | 16659 | 0b39188209f2
(1763787,145) | 16659 | 0b39188209f2
(2192691,24) | 16659 | 0b39188209f2
(425121,26) | 16659 | 0b39188209f2
(2157735,117) | 16659 | 0b39188209f2
(826685,32) | 16659 | 0b39188209f2
(507417,51) | 16659 | 0b39188209f2
(1168854,22) | 16659 | 0b39188209f2
(178112,96) | 16659 | 0b39188209f2
(1609343,84) | 16659 | 0b39188209f2
(1883190,161) | 16659 | 0b39188209f2
(1879921,82) | 16659 | 0b39188209f2
(187722,148) | 16659 | 0b39188209f2
(411680,31) | 16659 | 0b39188209f2
(1103474,74) | 16659 | 0b39188209f2
(1756318,139) | 16659 | 0b39188209f2
(760475,112) | 16659 | 0b39188209f2
(656098,47) | 16659 | 0b39188209f2
(2015224,31) | 16659 | 0b39188209f2
(401158,64) | 16659 | 0b39188209f2
(1001315,155) | 16659 | 0b39188209f2
(527643,24) | 16659 | 0b39188209f2
(51198,95) | 16659 | 0b39188209f2
(1709591,26) | 16659 | 0b39188209f2
(1235618,22) | 16659 | 0b39188209f2
(542813,107) | 16659 | 0b39188209f2
(331468,156) | 16659 | 0b39188209f2
(940954,68) | 16662 | 0b39188209f2
(1295686,169) | 16662 | 0b39188209f2
(825955,109) | 16668 | 0b39188209f2
(2025210,165) | 16671 | 0b39188209f2
(1639115,139) | 16671 | 0b39188209f2
(422678,79) | 16671 | 0b39188209f2
(379949,175) | 16671 | 0b39188209f2
(455206,96) | 16671 | 0b39188209f2
(1745081,184) | 16671 | 0b39188209f2
(38 rows)
Time: 3580.536 ms

digoal=> explain (verbose,analyze,timing,costs,buffers) select ctid,tableoid,info from t_regexp_100billion where info ~ '.3918.209f';
Remote Fast Query Execution (cost=0.00..0.00 rows=0 width=0) (actual time=3407.156..3621.601 rows=38 loops=1)
Output: t_regexp_100billion.ctid, t_regexp_100billion.tableoid, t_regexp_100billion.info
Node/s: h1_data1, h1_data10, h1_data11, h1_data12, h1_data13, h1_data14, h1_data15, h1_data16, h1_data17, h1_data18, h1_data19, h1_data2, h1_data20, h1_data21, h1_data22, h1_data23, h1_data24, h1_data25, h1_data26, h1_data27, h1_data2
8, h1_data29, h1_data3, h1_data30, h1_data4, h1_data5, h1_data6, h1_data7, h1_data8, h1_data9, h2_data1, h2_data10, h2_data11, h2_data12, h2_data13, h2_data14, h2_data15, h2_data16, h2_data17, h2_data18, h2_data19, h2_data2, h2_data20, h
2_data21, h2_data22, h2_data23, h2_data24, h2_data25, h2_data26, h2_data27, h2_data28, h2_data29, h2_data3, h2_data30, h2_data4, h2_data5, h2_data6, h2_data7, h2_data8, h2_data9, h3_data1, h3_data10, h3_data11, h3_data12, h3_data13, h3_d
ata14, h3_data15, h3_data16, h3_data17, h3_data18, h3_data19, h3_data2, h3_data20, h3_data21, h3_data22, h3_data23, h3_data24, h3_data25, h3_data26, h3_data27, h3_data28, h3_data29, h3_data3, h3_data30, h3_data4, h3_data5, h3_data6, h3_d
ata7, h3_data8, h3_data9, h4_data1, h4_data10, h4_data11, h4_data12, h4_data13, h4_data14, h4_data15, h4_data16, h4_data17, h4_data18, h4_data19, h4_data2, h4_data20, h4_data21, h4_data22, h4_data23, h4_data24, h4_data25, h4_data26, h4_d
ata27, h4_data28, h4_data29, h4_data3, h4_data30, h4_data4, h4_data5, h4_data6, h4_data7, h4_data8, h4_data9, h5_data1, h5_data10, h5_data11, h5_data12, h5_data13, h5_data14, h5_data15, h5_data16, h5_data17, h5_data18, h5_data19, h5_data
2, h5_data20, h5_data21, h5_data22, h5_data23, h5_data24, h5_data25, h5_data26, h5_data27, h5_data28, h5_data29, h5_data3, h5_data30, h5_data4, h5_data5, h5_data6, h5_data7, h5_data8, h5_data9, h6_data1, h6_data10, h6_data11, h6_data12,
h6_data13, h6_data14, h6_data15, h6_data16, h6_data17, h6_data18, h6_data19, h6_data2, h6_data20, h6_data21, h6_data22, h6_data23, h6_data24, h6_data25, h6_data26, h6_data27, h6_data28, h6_data29, h6_data3, h6_data30, h6_data4, h6_data5,
h6_data6, h6_data7, h6_data8, h6_data9, h7_data1, h7_data10, h7_data11, h7_data12, h7_data13, h7_data14, h7_data15, h7_data16, h7_data17, h7_data18, h7_data19, h7_data2, h7_data20, h7_data21, h7_data22, h7_data23, h7_data24, h7_data25,
h7_data26, h7_data27, h7_data28, h7_data29, h7_data3, h7_data30, h7_data4, h7_data5, h7_data6, h7_data7, h7_data8, h7_data9, h8_data1, h8_data10, h8_data11, h8_data12, h8_data13, h8_data14, h8_data15, h8_data16, h8_data17, h8_data18, h8_
data19, h8_data2, h8_data20, h8_data21, h8_data22, h8_data23, h8_data24, h8_data25, h8_data26, h8_data27, h8_data28, h8_data29, h8_data3, h8_data30, h8_data4, h8_data5, h8_data6, h8_data7, h8_data8, h8_data9
Remote query: SELECT ctid, tableoid, info FROM t_regexp_100billion WHERE (info ~ '.3918.209f'::text)
Planning time: 0.072 ms
Execution time: 3621.626 ms
(6 rows)
Time: 3650.045 ms

digoal=> select ctid,tableoid,info from t_regexp_100billion where info ~ 'ab2..d[1|f]3c8';
ctid | tableoid | info
---------------+----------+--------------
(899065,160) | 16659 | 4eab207df3c8
(2100060,157) | 16659 | a2ab2fbdf3c8
(162213,12) | 16659 | a2ab2fbdf3c8
(637030,50) | 16659 | 4eab207df3c8
(1325830,35) | 16659 | a2ab2fbdf3c8
(197454,129) | 16659 | 4eab207df3c8
(2000258,158) | 16659 | a2ab2fbdf3c8
(765698,19) | 16659 | a2ab2fbdf3c8
(935743,59) | 16659 | 4eab207df3c8
(2203339,96) | 16659 | a2ab2fbdf3c8
(701234,118) | 16659 | a2ab2fbdf3c8
(971717,57) | 16659 | a2ab2fbdf3c8
(1164498,54) | 16659 | 4eab207df3c8
(393227,147) | 16659 | 4eab207df3c8
(1439445,94) | 16659 | a2ab2fbdf3c8
(1549135,146) | 16659 | 4eab207df3c8
(1551991,36) | 16659 | 4eab207df3c8
(2206488,3) | 16659 | a2ab2fbdf3c8
(481614,118) | 16659 | 4eab207df3c8
(1809085,7) | 16659 | a2ab2fbdf3c8
(173214,139) | 16659 | 4eab207df3c8
(1021816,28) | 16659 | 4eab207df3c8
(829846,43) | 16659 | a2ab2fbdf3c8
(1899020,79) | 16659 | 4eab207df3c8
(6241,163) | 16659 | 4eab207df3c8
(1205920,5) | 16659 | a2ab2fbdf3c8
(412014,52) | 16659 | 4eab207df3c8
(1122051,14) | 16659 | 4eab207df3c8
(284493,87) | 16659 | 4eab207df3c8
(374322,83) | 16659 | a2ab2fbdf3c8
(189124,19) | 16659 | 4eab207df3c8
(747428,175) | 16659 | a2ab2fbdf3c8
(795035,152) | 16659 | a2ab2fbdf3c8
(1949396,25) | 16659 | a2ab2fbdf3c8
(154445,167) | 16659 | a2ab2fbdf3c8
(859513,82) | 16659 | a2ab2fbdf3c8
(31337,41) | 16659 | a2ab2fbdf3c8
(1393343,136) | 16659 | 4eab207df3c8
(63555,82) | 16659 | a2ab2fbdf3c8
(608980,177) | 16659 | 4eab207df3c8
(250484,31) | 16659 | a2ab2fbdf3c8
(1696502,87) | 16659 | 4eab207df3c8
(2021326,68) | 16659 | a2ab2fbdf3c8
(397967,70) | 16659 | a2ab2fbdf3c8
(2083071,101) | 16659 | a2ab2fbdf3c8
(98554,23) | 16659 | a2ab2fbdf3c8
(1247891,182) | 16659 | 4eab207df3c8
(1533143,51) | 16659 | a2ab2fbdf3c8
(1280652,28) | 16659 | a2ab2fbdf3c8
(1337921,119) | 16659 | a2ab2fbdf3c8
(446914,180) | 16659 | a2ab2fbdf3c8
(1810263,161) | 16659 | a2ab2fbdf3c8
(350272,51) | 16659 | 4eab207df3c8
(909148,37) | 16659 | 4eab207df3c8
(197153,108) | 16659 | a2ab2fbdf3c8
(207423,96) | 16659 | 4eab207df3c8
(1097934,22) | 16659 | a2ab2fbdf3c8
(12605,49) | 16659 | 4eab207df3c8
(65244,28) | 16659 | 4eab207df3c8
(10274,30) | 16659 | a2ab2fbdf3c8
(1547771,91) | 16659 | 4eab207df3c8
(55044,64) | 16659 | 4eab207df3c8
(1286116,136) | 16659 | 4eab207df3c8
(797831,10) | 16659 | a2ab2fbdf3c8
(450949,98) | 16659 | a2ab2fbdf3c8
(563308,46) | 16659 | 4eab207df3c8
(1815443,179) | 16659 | a2ab2fbdf3c8
(279403,105) | 16659 | 4eab207df3c8
(1953284,11) | 16659 | 4eab207df3c8
(2068896,15) | 16659 | 4eab207df3c8
(1230212,18) | 16659 | 4eab207df3c8
(1513277,18) | 16659 | 4eab207df3c8
(1675223,30) | 16659 | a2ab2fbdf3c8
(966609,80) | 16662 | a2ab2fbdf3c8
(118085,180) | 16662 | a2ab2fbdf3c8
(1557051,116) | 16662 | a2ab2fbdf3c8
(1848877,62) | 16662 | 4eab207df3c8
(2224775,3) | 16662 | 4eab207df3c8
(1196571,72) | 16662 | 4eab207df3c8
(1799448,154) | 16662 | 4eab207df3c8
(2246230,68) | 16662 | a2ab2fbdf3c8
(984529,120) | 16662 | a2ab2fbdf3c8
(1361482,97) | 16662 | 4eab207df3c8
(1935512,51) | 16662 | a2ab2fbdf3c8
(816119,95) | 16662 | a2ab2fbdf3c8
(770381,45) | 16662 | 4eab207df3c8
(1943960,146) | 16662 | a2ab2fbdf3c8
(346006,160) | 16671 | a2ab2fbdf3c8
(1873262,96) | 16671 | 4eab207df3c8
(1219041,118) | 16671 | a2ab2fbdf3c8
(418076,24) | 16671 | a2ab2fbdf3c8
(724463,28) | 16671 | a2ab2fbdf3c8
(1471492,164) | 16671 | a2ab2fbdf3c8
(975490,122) | 16671 | a2ab2fbdf3c8
(1885629,34) | 16671 | 4eab207df3c8
(95 rows)
Time: 4718.459 ms

digoal=> explain (verbose,timing,costs,buffers,analyze) select ctid,tableoid,info from t_regexp_100billion where info ~ 'ab2..d[1|f]3c8';
Remote Fast Query Execution (cost=0.00..0.00 rows=0 width=0) (actual time=4386.010..4648.614 rows=95 loops=1)
Output: t_regexp_100billion.ctid, t_regexp_100billion.tableoid, t_regexp_100billion.info
Node/s: h1_data1, h1_data10, h1_data11, h1_data12, h1_data13, h1_data14, h1_data15, h1_data16, h1_data17, h1_data18, h1_data19, h1_data2, h1_data20, h1_data21, h1_data22, h1_data23, h1_data24, h1_data25, h1_data26, h1_data27, h1_data2
8, h1_data29, h1_data3, h1_data30, h1_data4, h1_data5, h1_data6, h1_data7, h1_data8, h1_data9, h2_data1, h2_data10, h2_data11, h2_data12, h2_data13, h2_data14, h2_data15, h2_data16, h2_data17, h2_data18, h2_data19, h2_data2, h2_data20, h
2_data21, h2_data22, h2_data23, h2_data24, h2_data25, h2_data26, h2_data27, h2_data28, h2_data29, h2_data3, h2_data30, h2_data4, h2_data5, h2_data6, h2_data7, h2_data8, h2_data9, h3_data1, h3_data10, h3_data11, h3_data12, h3_data13, h3_d
ata14, h3_data15, h3_data16, h3_data17, h3_data18, h3_data19, h3_data2, h3_data20, h3_data21, h3_data22, h3_data23, h3_data24, h3_data25, h3_data26, h3_data27, h3_data28, h3_data29, h3_data3, h3_data30, h3_data4, h3_data5, h3_data6, h3_d
ata7, h3_data8, h3_data9, h4_data1, h4_data10, h4_data11, h4_data12, h4_data13, h4_data14, h4_data15, h4_data16, h4_data17, h4_data18, h4_data19, h4_data2, h4_data20, h4_data21, h4_data22, h4_data23, h4_data24, h4_data25, h4_data26, h4_d
ata27, h4_data28, h4_data29, h4_data3, h4_data30, h4_data4, h4_data5, h4_data6, h4_data7, h4_data8, h4_data9, h5_data1, h5_data10, h5_data11, h5_data12, h5_data13, h5_data14, h5_data15, h5_data16, h5_data17, h5_data18, h5_data19, h5_data
2, h5_data20, h5_data21, h5_data22, h5_data23, h5_data24, h5_data25, h5_data26, h5_data27, h5_data28, h5_data29, h5_data3, h5_data30, h5_data4, h5_data5, h5_data6, h5_data7, h5_data8, h5_data9, h6_data1, h6_data10, h6_data11, h6_data12,
h6_data13, h6_data14, h6_data15, h6_data16, h6_data17, h6_data18, h6_data19, h6_data2, h6_data20, h6_data21, h6_data22, h6_data23, h6_data24, h6_data25, h6_data26, h6_data27, h6_data28, h6_data29, h6_data3, h6_data30, h6_data4, h6_data5,
h6_data6, h6_data7, h6_data8, h6_data9, h7_data1, h7_data10, h7_data11, h7_data12, h7_data13, h7_data14, h7_data15, h7_data16, h7_data17, h7_data18, h7_data19, h7_data2, h7_data20, h7_data21, h7_data22, h7_data23, h7_data24, h7_data25,
h7_data26, h7_data27, h7_data28, h7_data29, h7_data3, h7_data30, h7_data4, h7_data5, h7_data6, h7_data7, h7_data8, h7_data9, h8_data1, h8_data10, h8_data11, h8_data12, h8_data13, h8_data14, h8_data15, h8_data16, h8_data17, h8_data18, h8_
data19, h8_data2, h8_data20, h8_data21, h8_data22, h8_data23, h8_data24, h8_data25, h8_data26, h8_data27, h8_data28, h8_data29, h8_data3, h8_data30, h8_data4, h8_data5, h8_data6, h8_data7, h8_data8, h8_data9
Remote query: SELECT ctid, tableoid, info FROM t_regexp_100billion WHERE (info ~ 'ab2..d[1|f]3c8'::text)
Planning time: 0.058 ms
Execution time: 4648.638 ms
(6 rows)
Time: 4676.919 ms
```

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

文章转载自digoal,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论