PostgreSQL中数据从一个schema快速导入另一个schema

数据库杂记 2023-07-05

前言

嗯，这是一个比较经典和实用的问题。在实际的开发或生产环境中，会经常遇到。

用户可能就有这么个需求，也有同学问起，如何快速将表数据从一个schema下边导到另一个schema下边。

分析与实作

1、单表操作

即schema abc下有一张表t，我想快速把它导到abc2下边的表t。

1postgres=# create schema abc;
2CREATE SCHEMA
3postgres=# create table abc.t(id int, col2 varchar(32));
4CREATE TABLE
5postgres=# insert into abc.t select n, 'test' || n from generate_series(1, 200000) as n;
6INSERT 0 200000

建另一个schema abc2，并创建对应的表abc2.t, 然后一次性插入数据。

1postgres=# create table abc2.t (like abc.t including all);
2CREATE TABLE
3postgres=# insert into abc2.t select * from abc.t;
4INSERT 0 200000

这里使用create table * (like )语句，including all选项，则会创建一模一样的表结构。非常方便。具体使用方法可以参照：https://www.postgresql.org/docs/15/sql-createtable.html (LIKE *
source_table* [ *
like_option* ... ]
)

在表数量不多的情况下，我们可以使用这种方式来将数据从一个schema弄到另一个schema下边。

2、多表操作

如果schema abc下边的表数量比较多，并且表的数据也很多。

我们通常使用pg_dump -Fp导出为纯文本，再用psql导入。这可以对应于同一个schema的情况。它的优点是对人可读，但是不能使用并行(-j参数指定并发)

如：

1[01:05:56-postgres@sean-rh1:/var/lib/pgsql]$ pg_dump -Fp -d postgres -n abc -j 4> /iihero/tmp/a.sql
2pg_dump: option requires an argument -- 'j'
3Try "pg_dump --help" for more information.
4
5[01:07:34-postgres@sean-rh1:/var/lib/pgsql]$ pg_dump -Fc -d postgres -n abc -j 4 > /iihero/tmp/a.dmp
6pg_dump: error: parallel backup only supported by the directory format

要使用并行，可以指定目录格式：-Fd, 下边就是4个并发下的数据导出。

1[01:09:36-postgres@sean-rh1:/var/lib/pgsql]$ pg_dump -Fd -d postgres -n abc -j 4 -f /iihero/tmp/dmpabc
2[01:09:57-postgres@sean-rh1:/var/lib/pgsql]$
3[01:09:57-postgres@sean-rh1:/var/lib/pgsql]$ tree /iihero/tmp/dmpabc
4/iihero/tmp/dmpabc
5├── 4043.dat.gz
6└── toc.dat

2.1、简单多次替换的笨方法

使用pg_dump -Fp -n 指定schema导出，再使用psql导入时，schema还是用的原来的schema。我们可以把导出的文本文件中的schema那一块儿替换一下即可。

 1[01:10:20-postgres@sean-rh1:/var/lib/pgsql]$ pg_dump -Fp -d postgres -n abc -f /iihero/tmp/a.sql
 2
 3[01:20:51-postgres@sean-rh1:/var/lib/pgsql]$ head -n 38 /iihero/tmp/a.sql
 4--
 5-- PostgreSQL database dump
 6--
 7
 8-- Dumped from database version 14.8
 9-- Dumped by pg_dump version 14.8
10
11SET statement_timeout = 0;
12SET lock_timeout = 0;
13.............
14CREATE SCHEMA abc;
15
16...........
17
18ALTER SCHEMA abc OWNER TO postgres;
19
20SET default_tablespace = '';
21
22SET default_table_access_method = heap;
23
24--
25-- Name: t; Type: TABLE; Schema: abc; Owner: postgres
26--
27
28CREATE TABLE abc.t (
29    id integer,
30    col2 character varying(32)
31.........

看着上边的内容，你会感觉到替换字符串也是需要点技巧。毕竟它是比较普通的字符串。

CREATE SCHEMA abc ==> CREATE SCHEMA abc2
ALTER SCHEMA abc ==> ALTER SCHEMA abc2
TABLE abc. ==> TABLE abc2.
COPY abc. ==> COPY abc2.

完成这三个替换之后，就可以再次导入了。

1[01:27:53-postgres@sean-rh1:/iihero/tmp] sed -i 's/TABLE abc\./TABLE abc2\./g' a.sql
2[01:27:53-postgres@sean-rh1:/iihero/tmp]$ sed -i 's/CREATE SCHEMA abc/CREATE SCHEMA abc2/g' a.sql
3[01:27:53-postgres@sean-rh1:/iihero/tmp]$ sed -i 's/ALTER SCHEMA abc /ALTER SCHEMA abc2 /g' a.sql
4[01:28:41-postgres@sean-rh1:/iihero/tmp]$ sed -i 's/COPY abc\./COPY abc2\./g' a.sql

替换完成，导入试试：

 1[01:41:03-postgres@sean-rh1:/iihero/tmp]$ psql < a.sql
 2SET
 3SET
 4SET
 5SET
 6SET
 7 set_config
 8------------
 9
10(1 row)
11
12SET
13SET
14SET
15SET
16ERROR:  schema "abc2" already exists
17ALTER SCHEMA
18SET
19SET
20ERROR:  relation "t" already exists
21ALTER TABLE
22COPY 200000

还算比较顺利。

2.2、利用唯一串的灵活替换并还原

如果允许abc这个schema临时更换名字，然后再还原，我们可以变通一下：

1postgres=# alter schema abc rename to d74c04e374fb42e587f8e69c766e3086;
2ALTER SCHEMA

再dump出来进行替换：

1[01:41:08-postgres@sean-rh1:/iihero/tmp]$ pg_dump -Fp -d postgres -n d74c04e374fb42e587f8e69c766e3086 -f /iihero/tmp/a_new.sql

文本替换，这下就方便多了，一次性替换，将"d74c04e374fb42e587f8e69c766e3086"直接替换为"abc2"，即可。

1[01:58:22-postgres@sean-rh1:/iihero/tmp]$ sed -i 's/d74c04e374fb42e587f8e69c766e3086/abc2/g' a_new.sql

再行导入到schema abc2中。

 1postgres=# drop schema abc2 cascade;
 2NOTICE:  drop cascades to 2 other objects
 3DETAIL:  drop cascades to table abc2.t
 4drop cascades to table abc2.t2
 5DROP SCHEMA
 6postgres=# \q
 7[01:59:59-postgres@sean-rh1:/iihero/tmp]$ psql < a_new.sql
 8SET
 9SET
10SET
11SET
12SET
13 set_config
14------------
15
16(1 row)
17
18SET
19SET
20SET
21SET
22CREATE SCHEMA
23ALTER SCHEMA
24SET
25SET
26CREATE TABLE
27ALTER TABLE
28COPY 200000

最后再将abc这个schema名字还原一下：

1[02:00:05-postgres@sean-rh1:/iihero/tmp]$ psql -c "alter schema d74c04e374fb42e587f8e69c766e3086 rename to abc;"
2ALTER SCHEMA

这个方法相对比较灵活也不容易出错，充分利用GUID独一无二的特性，替换也不会出错。

3、直接生成建表及insert脚本

前边单表操作里头有一个比较好的思路，我直接生成相关建表语句和insert语句，基本上也能完成既定目标。这里假定abc这个schema下边有t和t2两张表。基本情况如下：

 1postgres=# \d
 2        List of relations
 3 Schema | Name | Type  |  Owner
 4--------+------+-------+----------
 5 public | t1   | table | postgres
 6
 7postgres=# create table abc.t2 (like abc.t including all);
 8CREATE TABLE
 9
10postgres=# insert into abc.t2 select * from abc.t;
11INSERT 0 200000

下边我们试图生成相关建表以及insert语句的脚本：

1postgres=# SELECT 'create table abc2.' || table_name || ' (like ' || table_schema || '.' || table_name || ' including all);' AS table_ddl FROM information_schema.tables WHERE table_schema = 'abc';
2                     table_ddl
3---------------------------------------------------
4 create table abc2.t (like abc.t including all);
5 create table abc2.t2 (like abc.t2 including all);
6(2 rows)

我们确实可以生成DDL的建表语句。利用psql本身的开关：

 1postgres=# SELECT 'create table abc2.' || table_name || ' (like ' || table_schema || '.' || table_name || ' including all);' AS table_ddl FROM information_schema.tables WHERE table_schema = 'abc';
 2                     table_ddl
 3---------------------------------------------------
 4 create table abc2.t (like abc.t including all);
 5 create table abc2.t2 (like abc.t2 including all);
 6(2 rows)
 7
 8postgres=# \gexec
 9CREATE TABLE
10CREATE TABLE

\gexec可以执行前一个查询的结果。这不，两个表都建上了。

我们再生成insert语句：

 1postgres=# SELECT 'INSERT INTO abc2.' || table_name || ' SELECT * FROM  '|| table_schema || '.' || table_name || ';' AS insert_sql FROM information_schema.tables WHERE table_schema = 'abc';
 2                 insert_sql
 3--------------------------------------------
 4 INSERT INTO abc2.t SELECT * FROM  abc.t;
 5 INSERT INTO abc2.t2 SELECT * FROM  abc.t2;
 6(2 rows)
 7
 8postgres=# \gexec
 9INSERT 0 200000
10INSERT 0 200000

这下就把数据也都导进去了。速度应该还可以的。

这里唯一有点缺陷的地方就是当众多的源表之间有依赖关系的时候，就需要调整一个建表和INSERT语句的顺序，父表在前，子表在后。这个需要你自己去独立思考并试验一下。

总结：

提供了三种方法快速导入到不同的schema的方法，单表时，直接语句生成并插入。多表时，可以生成建表及insert语句，也可以pg_dump出来，然后修改导入脚本中的schema名字，至于修改名字，也有笨办法和灵活替换的办法。

其实，我倒希望PostgreSQL能为psql或pg_restore提供一个选项，比如指定新的schema，然后一键导入，岂不快哉。

postgresql schema

文章转载自数据库杂记，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。