\copy不支持\parallel on 并行导入,通过多个gsql同时后台调用起多个脚本进行并发调起,如:
nohup gsql $dblink -f sql1.sql > ./sql1.log 2>&1 &
l 如果并行任务之间的导入数据存在交集(必须不存在唯一键或主键约束),开4并发、每个文件导入100万数据直接导入最长耗时7.39秒;开到16并发数最长耗时11.8秒。但实际使用中应结合数据库总体负载合理控制并发数。
4并发执行结果:
exetime | readrows | loadrows | errorrows
-----------------+----------+----------+-----------
00:00:07.276479 | 1000000 | 1000000 | 0
00:00:07.394007 | 1000000 | 1000000 | 0
00:00:06.793076 | 1000000 | 1000000 | 0
00:00:04.966482 | 1000000 | 1000000 | 0
l 如果并行任务之间的导入数据不存在交集,开4并发、每个文件导入100万数据直接导入最长耗时6.9秒。与上例对比说明并行导入速度与数据相关性有关联。
exetime | readrows | loadrows | errorrows
-----------------+----------+----------+-----------
00:00:05.236935 | 1000000 | 1000000 | 0
00:00:06.949151 | 1000000 | 1000000 | 0
00:00:06.940827 | 1000000 | 1000000 | 0
00:00:06.849905 | 1000000 | 1000000 | 0
l 创建分区表,开4并发、按分区并行导入,每个分区100万数据,仅一个全局索引最慢进程耗时9.1秒,仅一个本地索引最慢进程耗时6.1秒。
create table copy_test_part (id int, var varchar2(100), d date, t timestamp(6)) PARTITION BY RANGE(id) (PARTITION P1 VALUES LESS THAN(1000000),
PARTITION P2 VALUES LESS THAN(2000000),
PARTITION P3 VALUES LESS THAN(3000000),
PARTITION P4 VALUES LESS THAN(MAXVALUE));
create index i_copy_test_part on copy_test_part (id,t) ;
全局索引并行导入结果:
exetime | readrows | loadrows | errorrows
-----------------+----------+----------+-----------
00:00:07.032591 | 1000000 | 1000000 | 0
00:00:08.634547 | 1000000 | 1000000 | 0
00:00:08.520107 | 1000000 | 1000000 | 0
00:00:09.118802 | 1000000 | 1000000 | 0
drop index i_copy_test_part;
create index i_copy_test_part on copy_test_part (id,t) local ;
本地索引并行导入结果:
exetime | readrows | loadrows | errorrows
-----------------+----------+----------+-----------
00:00:06.166574 | 1000000 | 1000000 | 0
00:00:05.820705 | 1000000 | 1000000 | 0
00:00:05.936633 | 1000000 | 1000000 | 0
00:00:06.133898 | 1000000 | 1000000 | 0




