作者
digoal
日期
2021-12-09
标签
PostgreSQL , 分区表 , pg_rewrite
1、解决了从非分区表变更为分区表的长时间锁问题
2、需要使用logical replication, 从非分区表增量将数据复制到分区表
3、只需要短暂的排他锁, 在同步完数据后用于切换表名. 有个参数控制切换表名阶段的锁超时, 例如设置为100毫秒, 重试3次还没有拿到这个锁的话, 报错. 代码在本文末尾.
4、非分区表一定要有PK
5、注意: 分区表建议确认约束和非分区表保持一致, 例如check, not null, FK, default value 等约束
6、serial字段也记得要设置妥当
https://github.com/cybertec-postgresql/pg_rewrite
PostgreSQL table partitioning is by far one of the most popular new PostgreSQL features. Many people look to partitioned tables as a way to improve performance and broaden scalability. However, partitioning comes with a little problem: How can you partition an existing table without locking up the database? The answer is: pg_rewrite can help you with PostgreSQL table partitioning. Here you will learn how to implement pg_rewrite to help you solve partitioning problems in the most elegant way possible.
Installing pg_rewrite
pg_rewrite is Open Source and can be quickly and easily downloaded from our Github profile. Cloning the repository works as shown in the next listing:
hs@fedora src]$ git clone https://github.com/cybertec-postgresql/pg_rewrite.git
Cloning into 'pg_rewrite'...
remote: Enumerating objects: 22, done.
remote: Counting objects: 100% (22/22), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 22 (delta 3), reused 22 (delta 3), pack-reused 0
Receiving objects: 100% (22/22), 44.51 KiB | 1.78 MiB/s, done.
Resolving deltas: 100% (3/3), done.
Once this is done, we can enter the directory:
[hs@fedora src]$ cd pg_rewrite/
To build the code, we have to make sure that the correct version of pg_config is in the path. In my case, I have a working binary in my home directory, so I can effortlessly build the code:
[hs@fedora pg_rewrite]$ which pg_config
~/pg14/bin/pg_config
If pg_config is in the path, all you have to do is to run “make install” to compile and install the code:
[hs@fedora pg_rewrite]$ make install
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -O2 -fPIC -I. -I./ -I/home/hs/pg14/include/postgresql/server -I/home/hs/pg14/include/postgresql/internal -D_GNU_SOURCE -c -o pg_rewrite.o pg_rewrite.c
...
/usr/bin/install -c -m 644 concurrent.bc '/home/hs/pg14/lib/postgresql/bitcode'/pg_rewrite/./
cd '/home/hs/pg14/lib/postgresql/bitcode' && /usr/bin/llvm-lto -thinlto -thinlto-action=thinlink -o pg_rewrite.index.bc pg_rewrite/pg_rewrite.bc pg_rewrite/concurrent.bc
The next thing to do is to adjust postgresql.conf file. pg_rewrite has to be loaded as a library when the server starts. Otherwise, it won’t work. Configuring PostgreSQL to load pg_rewrite works as follows:
wal_level = logical
max_replication_slots = 1
# ... or add 1 to the current value.
shared_preload_libraries = 'pg_rewrite'
# ... or add the library to the existing ones.
The wal_level has to be adjusted to make sure that the WAL contains enough information for logical decoding to work. On top of that, you’ll need a sufficient number of replication slots to safely run logical decoding. pg_rewrite will need one slot to operate.
After these changes are done, restart your server and verify that the variables are properly set:
test=# SHOW shared_preload_libraries;
shared_preload_libraries
--------------------------
pg_rewrite
(1 row)
Finally, you have to enable the pg_rewrite extension to ensure that the partition_table function is available:
test=# CREATE EXTENSION pg_rewrite;
CREATE EXTENSION
test=# \x
Expanded display is on.
test=# \df *partition_table*
List of functions
-[ RECORD 1 ]-------+---------------------------------------------------
Schema | public
Name | partition_table
Result data type | void
Argument data types | src_table text, dst_table text, src_table_new text
Type | func
Voilà, you have a working version of pg_rewrite. Let’s take a look at how we can use it.
Creating a sample table
After installing pg_rewrite, we can create a table – which we’ll want to partition later on:
test=# CREATE TABLE t_number (x numeric);
CREATE TABLE
test=# INSERT INTO t_number SELECT random() - 0.5 FROM generate_series(1, 1000000);
INSERT 0 1000000
For the sake of simplicity, the table contains a couple of random values. Some are greater than zero and some are less than zero. Let’s verify that:
test=# SELECT x < 0, count(*) FROM t_number GROUP BY 1;
?column? | count
----------+--------
f | 499729
t | 500271
(2 rows)
The data seems correct, so we can move to the next step:
test=# ALTER TABLE t_number ADD PRIMARY KEY (x);
ALTER TABLE
PostgreSQL can only identify a row if there is a primary key. Otherwise, we’ll have problems and the code won’t work. So it’s important to ensure that there are indeed primary keys.
Creating table partitions in PostgreSQL
Before we can partition the table, we have to come up with a partitioning scheme. We want to use that scheme for our database. In my example, all I’m doing is putting negative values into one table, and positive values into another. Here’s how it works:
test=# CREATE TABLE t_part_number (x numeric PRIMARY KEY) PARTITION BY RANGE (x);
CREATE TABLE
test=# CREATE TABLE t_part_number_neg PARTITION OF t_part_number FOR VALUES FROM (MINVALUE) TO (0);
CREATE TABLE
test=# CREATE TABLE t_part_number_pos PARTITION OF t_part_number FOR VALUES FROM (0) TO (MAXVALUE);
CREATE TABLE
Make sure that the primary keys and constraints are identical. Otherwise, pg_rewrite will produce an error. Remember: we want to repartition the table – we don’t want anything else to happen.
PostgreSQL table partitioning with almost no locking
Now that all the ingredients are in place, we can rewrite the data:
test=# SELECT partition_table('t_number', 't_part_number', 't_old_number');
partition_table
-----------------
(1 row)
t_number is the source table which has to be rewritten. t_part_number is the freshly partitioned table which we want to use as intermediate storage. t_old_number is the name of the original table which will be renamed. In case something goes wrong, the old table will still be found. The advantage is that nothing can go wrong. The downside is that more storage is needed. However, that is the case DURING repartitioning anyway so it does not matter if the source table is deleted or not. We need twice the space anyway.
The final structure will look as follows:
test=# \d+
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+-------------------+-------------------+-------+-------------+---------------+---------+-------------
public | t_number | partitioned table | hs | permanent | | 0 bytes |
public | t_old_number | table | hs | permanent | heap | 42 MB |
public | t_part_number_neg | table | hs | permanent | heap | 21 MB |
public | t_part_number_pos | table | hs | permanent | heap | 21 MB |
(4 rows)
As you can see, the original table is still in place using up 42 MB of storage. Our two partitions will need the same amount of space.
pg_rewrite is a good method to turn existing tables into PostgreSQL partitions. It only needs a short lock, which makes it superior to the long and extensive table locking needed by the PostgreSQL core.
pg_rewrite aims to be a set of tools to perform maintenance tasks which
require a table to be rewritten (i.e. the table data to be copied to a new
storage) and which are expected to limit the access to the table as little as
possible.
Installation
PostgreSQL server version 13 or later is required.
-
Set PG_CONFIG environment variable to point to pg_config command of your
PostgreSQL installation. -
make
-
sudo make install
-
Apply the following settings to postgresql.conf:
wal_level = logical
max_replication_slots = 1 # ... or add 1 to the current value.
shared_preload_libraries = 'pg_rewrite' # ... or add the library to the existing ones.
-
Start the PG cluster.
-
As a superuser, run
CREATE EXTENSION pg_rewrite;
Currently the extension only contains one function, partition_table(). It
turns a non-partitioned table into partitioned one.
partition_table()
Assuming you have a table defined like this
CREATE TABLE measurement (
id serial,
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int,
PRIMARY KEY(id, logdate)
);
you need to create a partitioned table having the same columns and data types:
CREATE TABLE measurement_aux (
id serial,
city_id int not null,
logdate date not null,
peaktemp int,
unitsales int,
PRIMARY KEY(id, logdate)
) PARTITION BY RANGE (logdate);
Then create partitions for all the rows currently present in the "measurement"
table, and also for the data that might be inserted during processing:
CREATE TABLE measurement_y2006m02 PARTITION OF measurement_aux
FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');
CREATE TABLE measurement_y2006m03 PARTITION OF measurement_aux
FOR VALUES FROM ('2006-03-01') TO ('2006-04-01');
...
Note that it's essential that both the source ("measurement") and destination
("measurement_aux") table have an identity index - the easiest way to ensure
this is to create PRIMARY KEY or UNIQUE constraint. Also note that the key
(i.e. column list) of the identity index of the source and destination table
must be identical. The identity is needed to process data changes that
applications make while data is being copied from the source to the
destination table.
Also, unless you've set rewrite.check_constraints to false, make sure that
the destination table has all the constraints that the source table has.
Then, in order to copy the data into the destination table, run the
partition_table() function and pass it both the source and destination table,
as well as a new table name for the source table. For example:
SELECT partition_table('measurement', 'measurement_aux', 'measurement_old');
The call will copy data from "measurement" to "measurement_aux", then it will
lock "measurement" exclusively and rename 1) "measurement" to
"measurement_old", 2) "measurement_aux" to "measurement". Thus "measurement"
ends up to be the partitioned table, while "measurement_old" is the original,
non-partitioned table.
Please consider the following before you try to use the function:
* Foreign table partitions are not supported.
* It's not expected that the table that you try to partition is referenced
by any foreign key. The problem is that the destination table is
initially empty, so you won't be able to create the foreign keys that
reference it.
Configuration
Following is description of the configuration variables that affect behavior
of the functions of this extension.
- rewrite.check_constraints
Before copying of the data starts, it's checked whether the destination table
has the same constraints as the source table, and throws an ERROR if a
difference is found. The point is that due to (accidentally) missing
constraint on the destination table, data that violate constraints on the
source table would be allowed to appear in the destination table as soon as
the processing is finished. Even an extra constraint on the destination table
is a problem because the extension only assumes that all the data it copies do
satisfy constraints on the source table, however it does not validate them
against the additional constraints on the destination table.
By setting rewrite.check_constraints to false, the user can turn off the
constraint checks. Please be very cautions before you do so.
The default value is true.
- rewrite.max_xlock_time
Although the table being processed is available for both read and write
operations by other transactions most of the time, an exclusive lock is needed
to finalize the processing (i.e. to process the remaining concurrent changes
and to rename the tables). If the extension function seems to block access to
tables too much, consider setting "rewrite.max_xlock_time" GUC parameter. For
example
SET rewrite.max_xlock_time TO 100;
tells that the exclusive lock shouldn't be held for more than 0.1 second (100
milliseconds). If more time is needed for the final stage, the particular
function releases the exclusive lock, processes the changes committed by the
other transactions in between and tries the final stage again. Error is
reported if the lock duration is exceeded a few more times. If that happens,
you should either increase the setting or try to process the problematic table
later, when the write activity is lower.
The default value is 0, meaning that the final stage can take as much time as
it needs.
Notes on Concurrency
-
The extension does not prevent other transactions from altering table at
certain stages of the processing. If a "disruptive command" (i.e. ALTER TABLE)
manages to commit before the processing could finish, the table processing
aborts and all changes done are rolled back. -
The functions of this extension allow for MVCC-unsafe behavior described in
the first paragraph of [1].
Locking
Since the table renaming requires an exclusive lock, applications won't be
able to access the table that you try to process for very short time. However,
if a significant amount of changes took place in the source table while the
extension was waiting for the lock, the outage will take proportionally longer
time. The point is that those changes need to be propagated to the destination
table before the exclusive lock can be released.
References
[1] https://www.postgresql.org/docs/13/static/mvcc-caveats.html
Authors
The extension was developed by CYBERTEC PostgreSQL International GmbH
(https://www.cybertec-postgresql.com). Please use this page to report issues:
https://github.com/cybertec-postgresql/pg_rewrite/issues
如果你想100%成功, 可以修改一下这个部分, 例如sleep后再重试, 或者terminate lock再重试.
/*
* Try a few times to perform the stage that requires exclusive lock on
* the source relation.
*
* XXX Not sure the number of attempts should be configurable. If it fails
* several times, admin should either increase partition_max_xlock_time or
* disable it.
*/
source_finalized = false;
for (i = 0; i < 4; i++)
{
if (perform_final_merge(estate, mtstate, proute,
relid_src, indexes_src, nindexes,
rel_dst, ident_key, ident_key_nentries,
cat_state, ctx, partitions, conv_map))
{
source_finalized = true;
break;
}
else
elog(DEBUG1,
"pg_rewrite exclusive lock on table %u had to be released.",
relid_src);
}
if (!source_finalized)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_IN_USE),
errmsg("pg_rewrite: \"max_xlock_time\" prevented partitioning from completion")));
期望 PostgreSQL 增加什么功能?
类似Oracle RAC架构的PostgreSQL已开源: 阿里云PolarDB for PostgreSQL云原生分布式开源数据库!
PostgreSQL 解决方案集合
德哥 / digoal's github - 公益是一辈子的事.





