Background
Video replay: https://www.bilibili.com/video/BV1Yb4y1879P/
This article describes a deployment practice of the open-source PolarDB for PostgreSQL in an environment with shared storage plus multiple compute nodes.
The shared storage is provided by NBD devices. This kind of environment is easy to obtain: any servers will do, which lowers the barrier to learning PolarDB for PG. However, it is not recommended for production; in production use a commercial SAN or a commercial distributed shared block storage instead. The deployment procedure is similar to this article: just skip the NBD steps and use another shared block device in its place.
Environment:
8C 32G
1 ECS for shared storage, with 3 network-shared disks
172.25.9.67
1 ECS as the RW node
172.25.9.68
2 ECS as RO nodes
172.25.9.69
172.25.9.70
PS: The ECS instances need access to the public internet, because the installation involves fetching packages via cpan, yum, etc.
PS: PolarDB for PostgreSQL is open-sourced at: https://github.com/ApsaraDB
1. Deploy the OS (all ECS)
Refer to 《network block device(nbd) 共享网络块设备 - 用于测试RAC和PolarDB for PG共享存储版》
2. Install the nbd software (all ECS)
Refer to 《network block device(nbd) 共享网络块设备 - 用于测试RAC和PolarDB for PG共享存储版》
3. Export the nbd shared block devices (shared-storage ECS)
Refer to 《network block device(nbd) 共享网络块设备 - 用于测试RAC和PolarDB for PG共享存储版》
The block devices reserved for nbd are shown below; vdb, vdc, and vdd are the 3 disks to be exported:
[root@iZbp10sz66ubwpqzg2ry5gZ ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 253:0 0 100G 0 disk
└─vda1 253:1 0 100G 0 part /
vdb 253:16 0 100G 0 disk
vdc 253:32 0 100G 0 disk
vdd 253:48 0 100G 0 disk
Create the nbd server configuration file. Note: no configuration line in nbd.conf may end with trailing whitespace, otherwise nbd-server will fail to start.
# vi /root/nbd.conf
# This is a comment
[generic]
# The [generic] section is required, even if nothing is specified
# there.
# When either of these options are specified, nbd-server drops
# privileges to the given user and group after opening ports, but
# _before_ opening files.
# user = nbd
# group = nbd
listenaddr = 0.0.0.0
port = 1921
[export1]
exportname = /dev/vdb
readonly = false
multifile = false
copyonwrite = false
flush = true
fua = true
sync = true
[export2]
exportname = /dev/vdc
readonly = false
multifile = false
copyonwrite = false
flush = true
fua = true
sync = true
[export3]
exportname = /dev/vdd
readonly = false
multifile = false
copyonwrite = false
flush = true
fua = true
sync = true
Start nbd-server:
# nbd-server -C /root/nbd.conf
# netstat -anp|grep 1921
tcp 0 0 0.0.0.0:1921 0.0.0.0:* LISTEN 1296/nbd-server
4. Attach the nbd shared block devices (RW and RO ECS)
Refer to 《network block device(nbd) 共享网络块设备 - 用于测试RAC和PolarDB for PG共享存储版》
After a server reboot, the nbd module must be loaded with modprobe nbd before the nbd devices can be attached again (see the /etc/rc.local sketch after the commands below).
nbd-client 172.25.9.67 1921 -N export1 /dev/nbd0
nbd-client 172.25.9.67 1921 -N export2 /dev/nbd1
nbd-client 172.25.9.67 1921 -N export3 /dev/nbd2
[root@iZbp13tgwor95f2508zo4oZ ~]# nbd-client 172.25.9.67 1921 -N export1 /dev/nbd0
Negotiation: ..size = 102400MB
bs=1024, sz=107374182400 bytes
[root@iZbp13tgwor95f2508zo4oZ ~]# nbd-client 172.25.9.67 1921 -N export2 /dev/nbd1
Negotiation: ..size = 102400MB
bs=1024, sz=107374182400 bytes
[root@iZbp13tgwor95f2508zo4oZ ~]# nbd-client 172.25.9.67 1921 -N export3 /dev/nbd2
Negotiation: ..size = 102400MB
bs=1024, sz=107374182400 bytes
vi /etc/rc.local
nbd-client 172.25.9.67 1921 -N export1 /dev/nbd0
nbd-client 172.25.9.67 1921 -N export2 /dev/nbd1
nbd-client 172.25.9.67 1921 -N export3 /dev/nbd2
#nbd-client -d /dev/nbd0
#nbd-client -d /dev/nbd1
#nbd-client -d /dev/nbd2
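As noted above, the nbd kernel module is not loaded automatically after a reboot, so /etc/rc.local has to load it before re-attaching the devices. A minimal sketch of the complete rc.local entries, assuming the IP, port, and export names used in this article:
# load the nbd kernel module first (required after every reboot)
modprobe nbd
nbd-client 172.25.9.67 1921 -N export1 /dev/nbd0
nbd-client 172.25.9.67 1921 -N export2 /dev/nbd1
nbd-client 172.25.9.67 1921 -N export3 /dev/nbd2
On CentOS 7-style systems, /etc/rc.d/rc.local must also be executable (chmod +x /etc/rc.d/rc.local), otherwise it will not run at boot.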
5. Install the PFS software (RW and RO ECS)
https://github.com/ApsaraDB/PolarDB-FileSystem/blob/master/Readme-CN.md
yum install -y cmake3 gcc gcc-c++ libaio-devel git unzip
ln -s /usr/bin/cmake3 /usr/bin/cmake
https://github.com/HardySimpson/zlog/releases
wget https://github.com/HardySimpson/zlog/archive/refs/tags/1.2.15.tar.gz
tar -zxvf 1.2.15.tar.gz
cd zlog-1.2.15/
make
make install
echo "/usr/local/lib" >> /etc/ld.so.conf
ldconfig
ldconfig -v|grep zlog
... ...
libzlog.so.1.2 -> libzlog.so.1.2
su - root
cd ~
# Note the branch: the stable or release branch is recommended; see the corresponding documentation for details.
git clone https://github.com/ApsaraDB/PolarDB-FileSystem
cd PolarDB-FileSystem
./autobuild.sh
... ...
[100%] Linking CXX executable ../../../bin/pfsdaemon
[100%] Built target pfsdaemon
~/PolarDB-FileSystem-master
end compile, binary's in ./bin, library's in ./lib
./install.sh
install pfsd success!
6. Configure PFS (RW and RO ECS)
Different block devices can be mapped to different paths, and different directories can be created within the same block device. In this way block devices are mapped to directories, which in turn can be mapped to database tablespaces.
Block device renaming
PFS only accesses block devices whose names start with specific prefixes, so it is recommended that every node accessing the shared block devices use the same names via symbolic links.
Run on the NBD client hosts:
ln -s /dev/nbd0 /dev/nvme0n1
ln -s /dev/nbd1 /dev/nvme0n2
ln -s /dev/nbd2 /dev/nvme0n3
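Since /dev is recreated at every boot, these symlinks do not survive a reboot. A simple approach (an assumption for this setup, not part of the original steps) is to recreate them from /etc/rc.local right after the nbd-client commands:
# recreate the PFS-compatible device names once the nbd devices are attached
ln -s /dev/nbd0 /dev/nvme0n1
ln -s /dev/nbd1 /dev/nvme0n2
ln -s /dev/nbd2 /dev/nvme0n3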
Block device initialization
Note: format the shared block devices with PFS on the RW node only:
pfs -C disk mkfs nvme0n1
pfs -C disk mkfs nvme0n2
pfs -C disk mkfs nvme0n3
...
pfs mkfs succeeds!
Block device mounting
On the RW and RO nodes, start PFS and mount the shared disks:
/usr/local/polarstore/pfsd/bin/start_pfsd.sh -p nvme0n1
/usr/local/polarstore/pfsd/bin/start_pfsd.sh -p nvme0n2
/usr/local/polarstore/pfsd/bin/start_pfsd.sh -p nvme0n3
... ...
pfsdaemon nvme0n1 start success
pfsdaemon nvme0n2 start success
pfsdaemon nvme0n3 start success
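Optionally (an extra check, not part of the original steps), confirm that the three pfsd daemons are running:
ps -ef | grep pfsdaemon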
vi /etc/rc.local
/usr/local/polarstore/pfsd/bin/start_pfsd.sh -p nvme0n1
/usr/local/polarstore/pfsd/bin/start_pfsd.sh -p nvme0n2
/usr/local/polarstore/pfsd/bin/start_pfsd.sh -p nvme0n3
# /usr/local/polarstore/pfsd/bin/stop_pfsd.sh nvme0n1
# /usr/local/polarstore/pfsd/bin/stop_pfsd.sh nvme0n2
# /usr/local/polarstore/pfsd/bin/stop_pfsd.sh nvme0n3
7. Install the PolarDB software (RW and RO ECS)
Refer to 《PolarDB for PostgreSQL 开源版 - 计算存储分离版(类似Oracle RAC架构) 部署指南》
su - root
yum install -y readline-devel zlib-devel perl-CPAN bison flex git
cpan -fi Test::More IPC::Run
useradd polardb
su - polardb
cd ~
wget https://github.com/ApsaraDB/PolarDB-for-PostgreSQL/archive/refs/heads/POLARDB_11_STABLE.zip
unzip POLARDB_11_STABLE.zip
cd PolarDB-for-PostgreSQL-POLARDB_11_STABLE
./polardb_build.sh --noinit --with-pfsd
8. Initialize the RW node (RW ECS)
su - polardb
cd ~
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/initdb -D primary -E UTF8 --lc-collate=C --lc-ctype=en_US.utf8 -U polardb
# Initialize the shared storage
su - root
pfs -C disk mkdir /nvme0n1/shared_data
# This step is also executed as root
/home/polardb/tmp_basedir_polardb_pg_1100_bld/bin/polar-initdb.sh /home/polardb/primary/ /nvme0n1/shared_data/
Node configuration
su - polardb
cd ~/primary # note
Open postgresql.conf and add the following settings:
listen_addresses = '0.0.0.0'
port = 5432
max_connections = 1000
unix_socket_directories = '., /tmp'
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 10
shared_buffers = 8GB
maintenance_work_mem = 1GB
dynamic_shared_memory_type = posix
parallel_leader_participation = off
random_page_cost = 1.1
log_destination = 'csvlog'
logging_collector = on
log_directory = 'log'
log_truncate_on_rotation = on
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
log_lock_waits = on
log_statement = 'ddl'
log_timezone = 'Asia/Shanghai'
log_autovacuum_min_duration = 0
autovacuum_vacuum_cost_delay = 0ms
datestyle = 'iso, mdy'
timezone = 'Asia/Shanghai'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'
polar_hostid=1 # note
polar_enable_shared_storage_mode=on
polar_disk_name='nvme0n1' # note
polar_datadir='/nvme0n1/shared_data/' # note
polar_vfs.localfs_mode=off
shared_preload_libraries='$libdir/polar_vfs,$libdir/polar_worker'
polar_storage_cluster_name='disk'
synchronous_standby_names='replica1, replica2' # note
Open pg_hba.conf and add the following entries:
host replication polardb 172.25.9.68/32 trust
host replication polardb 172.25.9.69/32 trust
host replication polardb 172.25.9.70/32 trust
host all all 0.0.0.0/0 md5
9. Start the RW node (RW ECS)
Start and check
su - polardb
Start:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/pg_ctl start -D $HOME/primary
Check:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -d postgres -c 'select version();'
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 11.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
(1 row)
Configure environment variables for convenience:
su - polardb
vi ~/.bashrc
# append
export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=5432
export PGDATA=/home/polardb/primary
export LANG=en_US.utf8
export PGHOME=/home/polardb/tmp_basedir_polardb_pg_1100_bld
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=127.0.0.1
export PGUSER=polardb
export PGDATABASE=postgres
alias rm='rm -i'
alias ll='ls -lh'
unalias vi
10. Prepare streaming replication for the read-only nodes (RW ECS)
Create the replication slots that will be used for physical streaming replication by the read-only nodes created next:
su - polardb
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -d postgres -c "select pg_create_physical_replication_slot('replica1');"
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -d postgres -c "select pg_create_physical_replication_slot('replica2');"
11. Create tablespaces (mapped to different PFS paths) (RW ECS)
Mapping multiple block devices to different tablespaces is not supported yet.
So how do you make use of multiple disks? Manage them with lvm2 logical volumes: configure LVM on the nbd server and combine the disks into one large volume (a minimal sketch follows after the reference below), using:
pvcreate
vgcreate
lvcreate
lvextend
These steps are not covered in detail here; see:
《PostgreSQL 11 1万亿 tpcb 性能测试 on 阿里云ECS + ESSD + zfs/lvm2条带 + block_size=32K》
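For reference, a minimal lvm2 sketch on the nbd server that combines the three disks into one striped logical volume, which could then be exported as a single nbd device; the volume group and logical volume names are illustrative assumptions:
pvcreate /dev/vdb /dev/vdc /dev/vdd
vgcreate vgdata /dev/vdb /dev/vdc /dev/vdd
# use all free extents, striped across the 3 physical volumes
lvcreate -l 100%FREE -i 3 -n lvdata vgdata
# /dev/vgdata/lvdata can then be used as the exportname in nbd.conf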
In addition, PFS can be used to inspect the contents of the shared storage after initialization:
[root@iZbp13tgwor95f2508zo4oZ ~]# pfs -C disk ls /nvme0n1/
File 1 4194304 Mon Nov 8 14:21:58 2021 .pfs-paxos
File 1 1073741824 Mon Nov 8 14:22:03 2021 .pfs-journal
Dir 1 1280 Mon Nov 8 15:23:05 2021 shared_data
total 2105344 (unit: 512Bytes)
[root@iZbp13tgwor95f2508zo4oZ ~]# pfs -C disk ls /nvme0n1/shared_data
Dir 1 512 Mon Nov 8 15:11:38 2021 base
Dir 1 7424 Mon Nov 8 15:11:39 2021 global
Dir 1 0 Mon Nov 8 15:11:39 2021 pg_tblspc
Dir 1 10368 Mon Nov 8 18:06:41 2021 pg_wal
Dir 1 896 Mon Nov 8 18:06:41 2021 pg_logindex
Dir 1 0 Mon Nov 8 15:11:39 2021 pg_twophase
Dir 1 512 Mon Nov 8 17:57:42 2021 pg_xact
Dir 1 0 Mon Nov 8 15:11:40 2021 pg_commit_ts
Dir 1 256 Mon Nov 8 15:11:40 2021 pg_multixact
Dir 1 512 Mon Nov 8 15:23:18 2021 polar_fullpage
total 0 (unit: 512Bytes)
12. Initialize the RO nodes (RO ECS)
RO 1:
su - polardb
Node initialization
cd ~
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/initdb -D replica1 # note
Node configuration
cd ~/replica1 # note
Open postgresql.conf and add the following settings:
listen_addresses = '0.0.0.0'
port = 5432
max_connections = 1000
unix_socket_directories = '., /tmp'
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 10
shared_buffers = 8GB
maintenance_work_mem = 1GB
dynamic_shared_memory_type = posix
parallel_leader_participation = off
random_page_cost = 1.1
log_destination = 'csvlog'
logging_collector = on
log_directory = 'log'
log_truncate_on_rotation = on
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
log_lock_waits = on
log_statement = 'ddl'
log_timezone = 'Asia/Shanghai'
log_autovacuum_min_duration = 0
autovacuum_vacuum_cost_delay = 0ms
datestyle = 'iso, mdy'
timezone = 'Asia/Shanghai'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'
polar_hostid=2 # note
polar_enable_shared_storage_mode=on
polar_disk_name='nvme0n1' # note
polar_datadir='/nvme0n1/shared_data/' # note
polar_vfs.localfs_mode=off
shared_preload_libraries='$libdir/polar_vfs,$libdir/polar_worker'
polar_storage_cluster_name='disk'
Open pg_hba.conf and add the following entries:
host replication polardb 172.25.9.68/32 trust
host replication polardb 172.25.9.69/32 trust
host replication polardb 172.25.9.70/32 trust
host all all 0.0.0.0/0 md5
Create recovery.conf with the following settings:
polar_replica='on'
recovery_target_timeline='latest'
primary_slot_name='replica1' # note
primary_conninfo='host=172.25.9.68 port=5432 user=polardb dbname=postgres application_name=replica1' # note
Configure environment variables for convenience:
su - polardb
vi ~/.bashrc
# append
export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=5432
export PGDATA=/home/polardb/replica1 # note
export LANG=en_US.utf8
export PGHOME=/home/polardb/tmp_basedir_polardb_pg_1100_bld
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=127.0.0.1
export PGUSER=polardb
export PGDATABASE=postgres
alias rm='rm -i'
alias ll='ls -lh'
unalias vi
RO 2:
Node initialization
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/initdb -D replica2 # note
Node configuration
cd ~/replica2 # note
Open postgresql.conf and add the following settings:
listen_addresses = '0.0.0.0'
port = 5432
max_connections = 1000
unix_socket_directories = '., /tmp'
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 10
shared_buffers = 8GB
maintenance_work_mem = 1GB
dynamic_shared_memory_type = posix
parallel_leader_participation = off
random_page_cost = 1.1
log_destination = 'csvlog'
logging_collector = on
log_directory = 'log'
log_truncate_on_rotation = on
log_checkpoints = on
log_connections = on
log_disconnections = on
log_error_verbosity = verbose
log_lock_waits = on
log_statement = 'ddl'
log_timezone = 'Asia/Shanghai'
log_autovacuum_min_duration = 0
autovacuum_vacuum_cost_delay = 0ms
datestyle = 'iso, mdy'
timezone = 'Asia/Shanghai'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'
polar_hostid=3 # note
polar_enable_shared_storage_mode=on
polar_disk_name='nvme0n1' # note
polar_datadir='/nvme0n1/shared_data/' # note
polar_vfs.localfs_mode=off
shared_preload_libraries='$libdir/polar_vfs,$libdir/polar_worker'
polar_storage_cluster_name='disk'
Open pg_hba.conf and add the following entries:
host replication polardb 172.25.9.68/32 trust
host replication polardb 172.25.9.69/32 trust
host replication polardb 172.25.9.70/32 trust
host all all 0.0.0.0/0 md5
Create recovery.conf with the following settings:
polar_replica='on'
recovery_target_timeline='latest'
primary_slot_name='replica2' # note
primary_conninfo='host=172.25.9.68 port=5432 user=polardb dbname=postgres application_name=replica2' # note
Configure environment variables for convenience:
su - polardb
vi ~/.bashrc
# append
export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=5432
export PGDATA=/home/polardb/replica2 # note
export LANG=en_US.utf8
export PGHOME=/home/polardb/tmp_basedir_polardb_pg_1100_bld
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=127.0.0.1
export PGUSER=polardb
export PGDATABASE=postgres
alias rm='rm -i'
alias ll='ls -lh'
unalias vi
13. Start the RO nodes (RO ECS)
RO 1:
Start and check
su - polardb
Start:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/pg_ctl start -D $HOME/replica1
Check:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -d postgres -c 'select version();'
RO 2:
Start and check
su - polardb
Start:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/pg_ctl start -D $HOME/replica2
Check:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -d postgres -c 'select version();'
14. Database usage test (RW and RO ECS)
Instance check and test
After deployment, check and test the instances to make sure the primary (RW) node can write data and the read-only nodes can read it.
Log in to the RW (primary) node, create a test table, and insert some sample data:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -q -p 5432 -d postgres -c "create table t(t1 int primary key, t2 int);insert into t values (1, 1),(2, 3),(3, 3);"
Log in to any RO (read-only) node and query the sample data just inserted:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -q -p 5432 -d postgres -c "select * from t;"
t1 | t2
----+----
1 | 1
2 | 3
3 | 3
(3 rows)
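As an additional check (not part of the original steps), you can confirm that an RO node is indeed running as a replica; on an RO node this should return t:
$HOME/tmp_basedir_polardb_pg_1100_bld/bin/psql -p 5432 -d postgres -c "select pg_is_in_recovery();"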
15. Other checks
On the RW node, check the replication status:
postgres=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
pid | 17361
usesysid | 10
usename | polardb
application_name | replica1
client_addr | 172.25.9.69
client_hostname |
client_port | 56684
backend_start | 2021-11-08 15:34:04.711213+08
backend_xmin |
state | streaming
sent_lsn | 0/166A090
write_lsn | 0/166A090
flush_lsn | 0/166A090
replay_lsn | 0/166A090
write_lag |
flush_lag |
replay_lag |
sync_priority | 1
sync_state | sync
-[ RECORD 2 ]----+------------------------------
pid | 17363
usesysid | 10
usename | polardb
application_name | replica2
client_addr | 172.25.9.70
client_hostname |
client_port | 45858
backend_start | 2021-11-08 15:34:30.442495+08
backend_xmin |
state | streaming
sent_lsn | 0/166A090
write_lsn | 0/166A090
flush_lsn | 0/166A090
replay_lsn | 0/166A090
write_lag |
flush_lag |
replay_lag |
sync_priority | 2
sync_state | potential
postgres=# select * from pg_replication_slots ;
-[ RECORD 1 ]-------+----------
slot_name | replica1
plugin |
slot_type | physical
datoid |
database |
temporary | f
active | t
active_pid | 17361
xmin |
catalog_xmin |
restart_lsn | 0/1669C78
confirmed_flush_lsn |
-[ RECORD 2 ]-------+----------
slot_name | replica2
plugin |
slot_type | physical
datoid |
database |
temporary | f
active | t
active_pid | 17363
xmin |
catalog_xmin |
restart_lsn | 0/1669C78
confirmed_flush_lsn |
Benchmark (before tuning):
pgbench -i -s 100
The read-only benchmark can be run on all nodes at the same time:
pgbench -M prepared -n -r -P 1 -c 32 -j 32 -T 120 -S
The read-write benchmark is executed on the RW node:
pgbench -M prepared -n -r -P 1 -c 8 -j 8 -T 120
Query throughput scales almost linearly with the number of nodes (because the data is entirely in memory).
Benchmark run
RW node:
Read-write benchmark:
pgbench -M prepared -n -r -P 1 -c 8 -j 8 -T 1200
progress: 192.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 193.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 194.0 s, 559.0 tps, lat 58.408 ms stddev 456.270
progress: 195.0 s, 1616.7 tps, lat 4.959 ms stddev 18.420
progress: 196.0 s, 2153.2 tps, lat 3.709 ms stddev 1.102
progress: 197.0 s, 646.0 tps, lat 3.635 ms stddev 1.042
progress: 198.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 199.0 s, 0.0 tps, lat 0.000 ms stddev 0.000
progress: 200.0 s, 283.1 tps, lat 104.779 ms stddev 595.861
progress: 201.0 s, 2214.0 tps, lat 3.620 ms stddev 1.123
progress: 202.0 s, 2153.0 tps, lat 3.709 ms stddev 1.096
progress: 203.0 s, 2377.8 tps, lat 3.369 ms stddev 0.977
progress: 204.0 s, 2313.3 tps, lat 3.460 ms stddev 0.987
progress: 205.0 s, 2329.9 tps, lat 3.429 ms stddev 0.976
progress: 206.0 s, 2283.7 tps, lat 3.508 ms stddev 1.052
progress: 207.0 s, 2098.3 tps, lat 3.809 ms stddev 1.895
progress: 208.0 s, 2340.1 tps, lat 3.417 ms stddev 0.977
progress: 209.0 s, 2052.9 tps, lat 3.902 ms stddev 6.712
progress: 210.0 s, 2346.1 tps, lat 3.408 ms stddev 0.965
progress: 211.0 s, 2316.9 tps, lat 3.452 ms stddev 0.994
RO nodes 1 and 2:
Read-only benchmark:
pgbench -M prepared -n -r -P 1 -c 32 -j 32 -T 1200 -S
progress: 254.0 s, 30921.5 tps, lat 1.034 ms stddev 0.892
progress: 255.0 s, 32351.0 tps, lat 0.990 ms stddev 1.660
progress: 256.0 s, 33540.0 tps, lat 0.953 ms stddev 0.859
progress: 257.0 s, 33027.0 tps, lat 0.971 ms stddev 1.040
progress: 258.0 s, 32791.1 tps, lat 0.976 ms stddev 0.631
progress: 259.0 s, 32839.6 tps, lat 0.975 ms stddev 1.837
progress: 260.0 s, 33539.4 tps, lat 0.954 ms stddev 0.527
progress: 261.0 s, 34344.5 tps, lat 0.932 ms stddev 0.984
progress: 262.0 s, 32383.9 tps, lat 0.988 ms stddev 0.618
progress: 263.0 s, 33186.0 tps, lat 0.964 ms stddev 0.512
progress: 264.0 s, 33253.3 tps, lat 0.962 ms stddev 0.497
progress: 265.0 s, 32584.0 tps, lat 0.982 ms stddev 0.466
progress: 266.0 s, 32959.8 tps, lat 0.967 ms stddev 1.310
progress: 267.0 s, 32392.0 tps, lat 0.991 ms stddev 0.701
progress: 268.0 s, 33307.8 tps, lat 0.961 ms stddev 0.505
progress: 269.0 s, 33255.8 tps, lat 0.962 ms stddev 0.589
progress: 270.0 s, 33994.6 tps, lat 0.941 ms stddev 0.449
progress: 271.0 s, 34127.2 tps, lat 0.937 ms stddev 1.441
pgbench -M prepared -n -r -P 1 -c 32 -j 32 -T 1200 -S
progress: 253.0 s, 32289.2 tps, lat 0.991 ms stddev 0.621
progress: 254.0 s, 32778.7 tps, lat 0.976 ms stddev 0.658
progress: 255.0 s, 32314.9 tps, lat 0.984 ms stddev 1.320
progress: 256.0 s, 31984.6 tps, lat 1.006 ms stddev 1.159
progress: 257.0 s, 31889.4 tps, lat 1.004 ms stddev 0.679
progress: 258.0 s, 32466.4 tps, lat 0.986 ms stddev 0.565
progress: 259.0 s, 33572.2 tps, lat 0.953 ms stddev 0.517
progress: 260.0 s, 33482.7 tps, lat 0.956 ms stddev 0.532
progress: 261.0 s, 33047.4 tps, lat 0.968 ms stddev 0.487
progress: 262.0 s, 32951.0 tps, lat 0.971 ms stddev 0.634
progress: 263.0 s, 34039.1 tps, lat 0.940 ms stddev 1.068
progress: 264.0 s, 33919.8 tps, lat 0.944 ms stddev 0.936
progress: 265.0 s, 34062.7 tps, lat 0.940 ms stddev 0.648
progress: 266.0 s, 31726.4 tps, lat 1.009 ms stddev 0.567
progress: 267.0 s, 34335.7 tps, lat 0.932 ms stddev 1.252
progress: 268.0 s, 33604.4 tps, lat 0.952 ms stddev 0.571
progress: 269.0 s, 34043.3 tps, lat 0.940 ms stddev 0.673
progress: 270.0 s, 33909.3 tps, lat 0.944 ms stddev 0.547
Observe the replication lag; it is very low:
postgres=# select * from pg_stat_replication ;
-[ RECORD 1 ]----+------------------------------
pid | 17361
usesysid | 10
usename | polardb
application_name | replica1
client_addr | 172.25.9.69
client_hostname |
client_port | 56684
backend_start | 2021-11-08 15:34:04.711213+08
backend_xmin |
state | streaming
sent_lsn | 2/456D3BF8
write_lsn | 2/456D3BF8
flush_lsn | 2/456D3BF8
replay_lsn | 2/456D3BF8
write_lag | 00:00:00.000147
flush_lag | 00:00:00.000147
replay_lag | 00:00:00.000244
sync_priority | 1
sync_state | sync
-[ RECORD 2 ]----+------------------------------
pid | 17363
usesysid | 10
usename | polardb
application_name | replica2
client_addr | 172.25.9.70
client_hostname |
client_port | 45858
backend_start | 2021-11-08 15:34:30.442495+08
backend_xmin |
state | streaming
sent_lsn | 2/456D3BF8
write_lsn | 2/456D3BF8
flush_lsn | 2/456D3BF8
replay_lsn | 2/456D3BF8
write_lag | 00:00:00.000517
flush_lag | 00:00:00.000517
replay_lag | 00:00:00.00052
sync_priority | 2
sync_state | potential
Analyze the wait events:
The waits are concentrated on the WAL writer. Since NBD runs over a TCP network, high latency is expected. The intervals of 0 TPS also indicate that IO was throttled: this environment uses ECS plus cloud disks as the NBD server, and both the ECS network layer and the cloud disks apply throttling, so 0-TPS intervals can occur. Next time, a server with fast local SSDs could be used as the nbd server.
If response time is the bottleneck but IO bandwidth is not, group commit can improve performance. If IO bandwidth itself is also saturated, there is no further way to improve performance.
postgres=# select wait_event_type,wait_event,count(*) from pg_stat_activity group by 1,2 order by 3 desc;
wait_event_type | wait_event | count
-----------------+---------------------+-------
LWLock | WALWriteLock | 8
Activity | WalSenderMain | 2
Activity | CheckpointerMain | 1
IO | WALInitWrite | 1
Activity | AutoVacuumMain | 1
| | 1
IO | VFSFileOpen | 1
Activity | LogicalLauncherMain | 1
(8 rows)
On the nbd server, observe network and disk throughput:
dstat
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
0 2 80 17 0 0| 0 51M| 54M 33M| 0 0 | 61k 99k
0 2 79 18 0 0| 0 50M| 54M 35M| 0 0 | 63k 102k
0 1 89 10 0 0| 0 26M| 28M 36M| 0 0 | 45k 69k
0 1 92 7 0 0| 0 17M| 19M 31M| 0 0 | 36k 55k
0 1 92 7 0 0| 0 18M| 19M 28M| 0 0 | 36k 53k
iostat -x 1
avg-cpu: %user %nice %system %iowait %steal %idle
0.13 0.00 0.77 7.17 0.00 91.93
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vdb 0.00 1142.00 0.00 6946.00 0.00 18460.00 5.32 0.55 0.08 0.00 0.08 0.14 99.20
vdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vdd 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Monitor IO performance
su - root
pfsadm mountstat nvme0n1
Enable group commit
// Sets the delay in microseconds between transaction commit and flushing WAL to disk.
postgres=# alter role polardb set commit_delay ='10';
ALTER ROLE
// Sets the minimum concurrent open transactions before performing commit_delay.
postgres=# alter role polardb set commit_siblings =5;
ALTER ROLE
Increase the read-write benchmark to 32 concurrent clients:
pgbench -M prepared -n -r -P 1 -c 32 -j 32 -T 1200
No noticeable improvement, because the storage IO bandwidth was already saturated.
After switching to unlogged tables, group commit with 32 concurrent clients improves performance noticeably. (Note: unlogged tables and temporary tables are not supported on RO nodes; an unlogged table created on the primary cannot be queried on the RO nodes.)
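A minimal sketch of how the pgbench tables could be switched to unlogged for this test, assuming the standard PostgreSQL ALTER TABLE ... SET UNLOGGED syntax applies here (these exact statements are an assumption, not taken from the original run; the ALTER rewrites each table, so run it before the benchmark):
psql -c "alter table pgbench_accounts set unlogged;"
psql -c "alter table pgbench_branches set unlogged;"
psql -c "alter table pgbench_tellers set unlogged;"
psql -c "alter table pgbench_history set unlogged;"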
progress: 164.0 s, 4019.8 tps, lat 7.965 ms stddev 2.940
progress: 165.0 s, 1676.1 tps, lat 8.152 ms stddev 2.298
progress: 166.0 s, 4027.8 tps, lat 12.503 ms stddev 63.648
progress: 167.0 s, 4445.0 tps, lat 7.197 ms stddev 2.822
progress: 168.0 s, 4357.3 tps, lat 7.342 ms stddev 2.804
progress: 169.0 s, 4567.0 tps, lat 7.006 ms stddev 3.003
progress: 170.0 s, 4648.9 tps, lat 6.881 ms stddev 2.792
progress: 171.0 s, 4427.9 tps, lat 7.226 ms stddev 3.254
progress: 172.0 s, 4468.9 tps, lat 7.163 ms stddev 3.111
progress: 173.0 s, 4571.2 tps, lat 7.003 ms stddev 3.023
progress: 174.0 s, 4695.8 tps, lat 6.814 ms stddev 2.940
progress: 175.0 s, 4627.2 tps, lat 6.914 ms stddev 2.644
progress: 176.0 s, 4466.9 tps, lat 7.159 ms stddev 3.036
progress: 177.0 s, 4508.4 tps, lat 7.109 ms stddev 2.564
progress: 178.0 s, 4474.7 tps, lat 7.143 ms stddev 2.683
progress: 179.0 s, 4476.1 tps, lat 7.156 ms stddev 2.609
progress: 180.0 s, 4622.0 tps, lat 6.924 ms stddev 2.884
progress: 181.0 s, 4726.6 tps, lat 6.770 ms stddev 2.798
progress: 182.0 s, 4480.2 tps, lat 7.142 ms stddev 2.644
progress: 183.0 s, 2224.2 tps, lat 6.899 ms stddev 2.849
progress: 184.0 s, 3633.6 tps, lat 13.391 ms stddev 70.771
progress: 185.0 s, 4314.0 tps, lat 7.416 ms stddev 2.274
Stay tuned for future content:
- Backup, recovery, monitoring, diagnostics, optimization, and routine maintenance practices for the open-source PolarDB for PG.




