使用pg_bulkload比copy方式性能提升30%以上,数据量越大提升越明显!而且pg_bulkload命令不需要交互式了,方便写脚本。比较可惜的是不支持分区表的导入。
系统环境:
[postgres@lnpg o2p]$ lscpu|grep CPU
CPU op-mode(s): 32-bit, 64-bit
CPU(s): 8
On-line CPU(s) list: 0-7
CPU 系列: 6
型号名称: Intel(R) Xeon(R) Gold 6258R CPU @ 2.70GHz
CPU MHz: 2693.671
NUMA 节点0 CPU: 0-7
[postgres@lnpg o2p]$ free -g
total used free shared buff/cache available
Mem: 15 0 0 8 14 6
Swap: 7 0 7
[postgres@lnpg o2p]$ cat /etc/system-release
CentOS Linux release 7.9.2009 (Core)
[postgres@lnpg o2p]$ psql -U gistar -d resdb -p 5432 --host 192.168.207.143
psql (15.1)
Type "help" for help.
resdb=# select version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 15.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
(1 row)
resdb=# SHOW server_version_num;
server_version_num
--------------------
150001
(1 row)
resdb=# SELECT current_setting('server_version_num');
current_setting
-----------------
150001
(1 row)
resdb=# show server_version;
server_version
----------------
15.1
(1 row)
安装pg_bulkload
下载地址:https://pgxn.org/dist/pg_bulkload/
root安装编译:
unzip pg_bulkload-3.1.20.zip
cd pg_bulkload-3.1.20
make
make install
进入pg创建扩展插件:
[postgres@lnpg o2p]$ psql -U gistar -d resdb -p 5432 --host 192.168.207.143
psql (15.1)
Type "help" for help.
resdb=# create extension pg_bulkload;
CREATE EXTENSION
resdb=# \dx
List of installed extensions
Name | Version | Schema | Description
----------------+---------+------------+-----------------------------------------------------------------
oracle_fdw | 1.2 | public | foreign data wrapper for Oracle access
pg_bulkload | 1.0 | public | pg_bulkload is a high speed data loading utility for PostgreSQL
pgrouting | 2.6.3 | public | pgRouting Extension
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
postgis | 3.3.1 | public | PostGIS geometry and geography spatial types and functions
postgis_raster | 3.3.1 | public | PostGIS raster types and functions
postgis_sfcgal | 3.3.1 | public | PostGIS SFCGAL functions
(7 rows)
测试copy和pg_bulkload结果:
测试记录3687361条,
copy使用10.32秒

pg_bulkload使用6.59秒


使用pg_bulkload比copy方式性能提升30%以上,数据量越大提升越明显!而且pg_bulkload命令不需要交互式了,方便写脚本。
测试记录如下:
[postgres@lnpg o2p]$
[postgres@lnpg o2p]$ psql -U gistar -d resdb -p 5432 --host 192.168.207.143
psql (15.1)
Type "help" for help.
resdb=# truncate table jyc;
TRUNCATE TABLE
resdb=# \timing
Timing is on.
resdb=# select count(*) from jyc;
count
-------
0
(1 row)
Time: 0.991 ms
resdb=# copy jyc from '/db/o2p/9.csv' with (FORMAT csv,quote '"', DELIMITER ',',HEADER false,encoding 'UTF8');
COPY 3687361
Time: 10321.795 ms (00:10.322)
resdb=# select count(*) from jyc;
count
---------
3687361
(1 row)
Time: 199.546 ms
resdb=# truncate table jyc;
TRUNCATE TABLE
Time: 156.815 ms
resdb=# select count(*) from jyc;
count
-------
0
(1 row)
Time: 0.368 ms
resdb=# \q
[postgres@lnpg o2p]$ pg_bulkload -h 192.168.207.143 -p 5432 -P ./error.log -l ./input.log -i /db/o2p/9.csv -O jyc -d resdb -U gistar
NOTICE: BULK LOAD START
NOTICE: BULK LOAD END
0 Rows skipped.
3687361 Rows successfully loaded.
0 Rows not loaded due to parse errors.
0 Rows not loaded due to duplicate errors.
0 Rows replaced with new rows.
[postgres@lnpg o2p]$ more input.log
pg_bulkload 3.1.20 on 2023-02-12 13:12:57.084554+08
INPUT = /db/o2p/9.csv
PARSE_BADFILE = /db/o2p/error.log
LOGFILE = /db/o2p/input.log
LIMIT = INFINITE
PARSE_ERRORS = 0
CHECK_CONSTRAINTS = NO
TYPE = CSV
SKIP = 0
DELIMITER = ,
QUOTE = "\""
ESCAPE = "\""
NULL =
OUTPUT = public.jyc
MULTI_PROCESS = NO
VERBOSE = NO
WRITER = DIRECT
DUPLICATE_BADFILE = /db/postgresql/data/pg_bulkload/20230212131257_resdb_public_jyc.dup.csv
DUPLICATE_ERRORS = 0
ON_DUPLICATE_KEEP = NEW
TRUNCATE = NO
0 Rows skipped.
3687361 Rows successfully loaded.
0 Rows not loaded due to parse errors.
0 Rows not loaded due to duplicate errors.
0 Rows replaced with new rows.
Run began on 2023-02-12 13:12:57.084554+08
Run ended on 2023-02-12 13:13:04.64806+08
CPU 0.94s/5.86u sec elapsed 7.56 sec
pg_bulkload 3.1.20 on 2023-02-12 13:17:54.893514+08
INPUT = /db/o2p/9.csv
PARSE_BADFILE = /db/o2p/error.log
LOGFILE = /db/o2p/input.log
LIMIT = INFINITE
PARSE_ERRORS = 0
CHECK_CONSTRAINTS = NO
TYPE = CSV
SKIP = 0
DELIMITER = ,
QUOTE = "\""
ESCAPE = "\""
NULL =
OUTPUT = public.jyc
MULTI_PROCESS = NO
VERBOSE = NO
WRITER = DIRECT
DUPLICATE_BADFILE = /db/postgresql/data/pg_bulkload/20230212131754_resdb_public_jyc.dup.csv
DUPLICATE_ERRORS = 0
ON_DUPLICATE_KEEP = NEW
TRUNCATE = NO
0 Rows skipped.
3687361 Rows successfully loaded.
0 Rows not loaded due to parse errors.
0 Rows not loaded due to duplicate errors.
0 Rows replaced with new rows.
Run began on 2023-02-12 13:17:54.893514+08
Run ended on 2023-02-12 13:18:02.73289+08
CPU 0.68s/5.92u sec elapsed 7.84 sec
pg_bulkload 3.1.20 on 2023-02-12 13:20:28.948894+08
INPUT = /db/o2p/9.csv
PARSE_BADFILE = /db/o2p/error.log
LOGFILE = /db/o2p/input.log
LIMIT = INFINITE
PARSE_ERRORS = 0
CHECK_CONSTRAINTS = NO
TYPE = CSV
SKIP = 0
DELIMITER = ,
QUOTE = "\""
ESCAPE = "\""
NULL =
OUTPUT = public.jyc
MULTI_PROCESS = NO
VERBOSE = NO
WRITER = DIRECT
DUPLICATE_BADFILE = /db/postgresql/data/pg_bulkload/20230212132028_resdb_public_jyc.dup.csv
DUPLICATE_ERRORS = 0
ON_DUPLICATE_KEEP = NEW
TRUNCATE = NO
0 Rows skipped.
3687361 Rows successfully loaded.
0 Rows not loaded due to parse errors.
0 Rows not loaded due to duplicate errors.
0 Rows replaced with new rows.
Run began on 2023-02-12 13:20:28.948894+08
Run ended on 2023-02-12 13:20:35.540349+08
CPU 0.58s/5.87u sec elapsed 6.59 sec
[postgres@lnpg o2p]$ psql -U gistar -d resdb -p 5432 --host 192.168.207.143
psql (15.1)
Type "help" for help.
resdb=# select count(*) from jyc;
count
---------
3687361
(1 row)
resdb=#
不支持分区表:
https://github.com/ossc-db/pg_bulkload/issues/58

最后修改时间:2023-02-14 10:36:03
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




