暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL 17 流复制中开启逻辑复制槽同步,以及逻辑槽的故障转移

原创 之乎者也 20小时前
67


PostgreSQL 17 开始支持逻辑复制槽的流复制同步,在流复制故障转移之后,保证了逻辑复制的正常工作,本文以此为契机,做一个逻辑复制槽故障转移的全面测试。

0,测试环境

主库+发布:192.168.152.136

备库+逻辑槽同步:192.168.152.137

订阅库:192.168.152.138

•PostgreSQL 版本:17.6

•目标:主库宕机后,备库提升为主库,逻辑复制可继续工作

1,PostgreSQL 一键安装

参考:https://www.cnblogs.com/wy123/p/19075391

2,PostgreSQL流复制一键配置

#!/bin/bash
# PostgreSQL standby initialization script (simplified)
# Can be run as root; PostgreSQL commands run as postgres user
# All logs are printed to the console

# ========== Configuration ==========
PRIMARY_HOST="192.168.152.136"
PRIMARY_PORT=9700
# Primary superuser credentials for creating replication user
PRIMARY_SUPERUSER="postgres"
PRIMARY_SUPERUSER_PASS="pg_password_001"
# repl user
REPL_USER="repl_user"
REPL_PASS="repl_user_pwd"
SLOT_NAME="pgstandby_slot01"
# make sure primary pg_hba.conf for replication user
# host 	replication     replica  		0.0.0.0/0 	md5

# replica info
REPLICA_HOST="192.168.152.137"
REPLICA_PORT=9700
PGDATA="/usr/local/pgsql17/pg9700/data"
PG_BIN="/usr/local/pgsql17/server/bin"
# replica instance name 
REPLICA_INSTANCE_NAME="postgresql9700"



# ========== Start Execution ==========
echo "=== PostgreSQL Standby Initialization Start ==="
date

# 1. Check if the target port is already in use
if ss -tln | grep -q ":$REPLICA_PORT"; then
  echo "************ Error: Port $REPLICA_PORT is already in use. Please stop any running instance using this port."
  exit 1
fi

# 2. Check if PostgreSQL binary directory exists
if [ ! -d "$PG_BIN" ]; then
  echo "************ Error: PostgreSQL binary directory $PG_BIN does not exist."
  exit 1
fi

# 3. Check if data directory exists; if not, create it
if [ ! -d "$PGDATA" ]; then
  echo "############ Data directory $PGDATA does not exist. Creating it..."
  mkdir -p "$PGDATA"
  if [ $? -ne 0 ]; then
    echo "************Failed to create data directory $PGDATA. Exiting."
    exit 1
  fi
fi

# 4. Check if data directory already contains files
if [ "$(ls -A "$PGDATA" 2>/dev/null)" ]; then
  echo "************ Warning: Data directory $PGDATA is not empty."
  BACKUP_DIR="${PGDATA}_bak_$(date +%F_%H%M%S)"
  echo "############ Backing up existing data directory to  $BACKUP_DIR"
  mv "$PGDATA" "$BACKUP_DIR"
  mkdir -p "$PGDATA"
fi

# Check and create replication user on primary if it doesn't exist
echo "############ Checking replication user $REPL_USER on primary..."

# Set password for superuser if provided
if [ -n "$PRIMARY_SUPERUSER_PASS" ]; then
  export PGPASSWORD="$PRIMARY_SUPERUSER_PASS"
else
  unset PGPASSWORD
fi

# Check if user exists
USER_EXISTS=$($PG_BIN/psql -h "$PRIMARY_HOST" -p "$PRIMARY_PORT" -U "$PRIMARY_SUPERUSER" -d postgres -tAc "SELECT 1 FROM pg_roles WHERE rolname='$REPL_USER';" 2>/dev/null)

if [ "$USER_EXISTS" != "1" ]; then
  echo "############ Replication user $REPL_USER does not exist. Creating it..."
  $PG_BIN/psql -h "$PRIMARY_HOST" -p "$PRIMARY_PORT" -U "$PRIMARY_SUPERUSER" -d postgres -c "CREATE USER $REPL_USER WITH REPLICATION LOGIN ENCRYPTED PASSWORD '$REPL_PASS';" > /dev/null 2>&1
  if [ $? -ne 0 ]; then
    echo "************ Failed to create replication user $REPL_USER. Exiting."
    exit 1
  fi
  echo "############ Replication user $REPL_USER created successfully."
else
  echo "############ Replication user $REPL_USER already exists."
fi

# 5. Check if the replication slot exists on primary, create if missing
echo "############ Checking replication slot $SLOT_NAME on primary..."
export PGPASSWORD="$REPL_PASS"
SLOT_EXISTS=$($PG_BIN/psql "host=$PRIMARY_HOST port=$PRIMARY_PORT user=$REPL_USER dbname=postgres" -tAc "SELECT 1 FROM pg_replication_slots WHERE slot_name='$SLOT_NAME';")
if [ "$SLOT_EXISTS" != "1" ]; then
  echo "############ Replication slot $SLOT_NAME does not exist. Creating it..."
  psql "host=$PRIMARY_HOST port=$PRIMARY_PORT user=$REPL_USER dbname=postgres" -c "SELECT pg_create_physical_replication_slot('$SLOT_NAME');"
else
  echo "############ Replication slot $SLOT_NAME already exists."
fi


# 6. Perform base backup from primary (as postgres user)
echo "############ Performing pg_basebackup from primary node..."
export PGPASSWORD="$REPL_PASS"
$PG_BIN/pg_basebackup -h $PRIMARY_HOST -p $PRIMARY_PORT -U $REPL_USER -R -P -v --slot=$SLOT_NAME -D $PGDATA
if [ $? -ne 0 ]; then
  echo "************ pg_basebackup failed. Exiting."
  exit 1
fi

# 7. Fix permissions (can be done as root)
chown -R postgres:postgres "$PGDATA"
chmod -R 700 "$PGDATA"

# 8. Start the standby (as postgres user)
echo "############ Starting PostgreSQL standby..."
#systemctl restart "$REPLICA_INSTANCE_NAME"

echo "=== Standby Initialization Complete ==="
date

===

3,相关参数修改

主节点192.168.152.136参数

wal_level = logical						#这个只能为logical,因为后续要配置逻辑复制
max_replication_slots = 10					#默认为10,根据具体的情况适当增大
max_wal_senders = 10						#默认为10,根据具体的情况适当增大
synchronous_standby_names = '*'					#同步从库的名字,这里设置为*,代表所有备库都是同步模式,当然这里只有一个备库
synchronized_standby_slots = 'pgstandby_slot01'			#这个名字是流复制配置中,流复制的复制槽名称

备节点192.168.152.137参数

hot_standby_feedback = on;					#默认关闭的,必须打开,看似这个参数与复制槽无关,但是经测试之后必须打开,如果不打开,即便是打开sync_replication_slots ,,会出现流复制正常,但是逻辑复制槽无法复制
sync_replication_slots = on;					#默认关闭的,必须打开
#synchronous_standby_names = '*'				#因为备库是pgbasebackup备份过来的,跟主库参数一样,所以这里备库的synchronous_standby_names要注释掉
#synchronized_standby_slots = 'pgstandby_slot01'		#因为备库是pgbasebackup备份过来的,跟主库参数一样,所以这里备库的synchronized_standby_slots要注释掉

特别说明:
1,sync_replication_slots默认值为off,打开后会同步逻辑复制槽。
2,hot_standby_feedback默认值为off,打开后,备库会将自己的查询快照反馈给主库,防止主库因 VACUUM 清理掉备库正在使用的数据版本而导致备库查询报错或冲突。这个参数看似无关,但是必须打开

两个参数都要打开,经笔者验证,如果只开启sync_replication_slots,流复制会正常运行,但是不会同步逻辑复制槽。


对于备节点上的postgresql.auto.conf,也即流复制连接信息。
因为 PG17开始支持的逻辑复制槽同步功能, slot sync worker 做的事情不是 WAL 级别协议,而是同时需要 replication protocol + 普通 SQL 连接能力,所以需要手动修改primary_conninfo,在连接串中指定一个dbname,默认的连接串为

# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'user=repl_user password=repl_user_pwd channel_binding=prefer host=192.168.152.136 port=9700 sslmode=prefer sslnegotiation=postgres sslcompression=0 sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable'
primary_slot_name = 'pgstandby_slot01'

需要再primary_conninfo中指定一个dbname=postgres参数,如下

# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'user=repl_user password=repl_user_pwd channel_binding=prefer host=192.168.152.136 port=9700 dbname=postgres sslmode=prefer sslnegotiation=postgres sslcompression=0 sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable'
primary_slot_name = 'pgstandby_slot01'

否则备库启动的时候会类似如下的错误: replication slot synchronization requires “dbname” to be specified in “primary_conninfo”

2026-06-29 08:37:37.106 UTC [2241336] LOG:  starting PostgreSQL 17.6 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0, 64-bit
2026-06-29 08:37:37.106 UTC [2241336] LOG:  listening on IPv4 address "0.0.0.0", port 9700
2026-06-29 08:37:37.106 UTC [2241336] LOG:  listening on IPv6 address "::", port 9700
2026-06-29 08:37:37.108 UTC [2241336] LOG:  listening on Unix socket "/tmp/.s.PGSQL.9700"
2026-06-29 08:37:37.110 UTC [2241340] LOG:  database system was interrupted; last known up at 2026-06-29 08:36:52 UTC
2026-06-29 08:37:37.117 UTC [2241340] LOG:  starting backup recovery with redo LSN 0/A000028, checkpoint LSN 0/A000080, on timeline ID 1
2026-06-29 08:37:37.117 UTC [2241340] LOG:  entering standby mode
2026-06-29 08:37:37.122 UTC [2241340] LOG:  redo starts at 0/A000028
2026-06-29 08:37:37.123 UTC [2241340] LOG:  completed backup recovery with redo LSN 0/A000028 and end LSN 0/A000120
2026-06-29 08:37:37.123 UTC [2241340] LOG:  consistent recovery state reached at 0/A000120
2026-06-29 08:37:37.123 UTC [2241336] LOG:  database system is ready to accept read-only connections
2026-06-29 08:37:37.125 UTC [2241342] LOG:  slot sync worker started
2026-06-29 08:37:37.127 UTC [2241342] ERROR:  replication slot synchronization requires "dbname" to be specified in "primary_conninfo"
2026-06-29 08:37:37.299 UTC [2241341] LOG:  started streaming WAL from primary at 0/B000000 on timeline 1
2026-06-29 08:38:37.148 UTC [2241734] LOG:  slot sync worker started
2026-06-29 08:38:37.149 UTC [2241734] ERROR:  replication slot synchronization requires "dbname" to be specified in "primary_conninfo"
2026-06-29 08:39:37.205 UTC [2242122] LOG:  slot sync worker started
2026-06-29 08:39:37.206 UTC [2242122] ERROR:  replication slot synchronization requires "dbname" to be specified in "primary_conninfo"
2026-06-29 08:40:37.270 UTC [2242510] LOG:  slot sync worker started
2026-06-29 08:40:37.271 UTC [2242510] ERROR:  replication slot synchronization requires "dbname" to be specified in "primary_conninfo"
2026-06-29 08:41:37.333 UTC [2242905] LOG:  slot sync worker started
2026-06-29 08:41:37.333 UTC [2242905] ERROR:  replication slot synchronization requires "dbname" to be specified in "primary_conninfo"

启动从库,流复制自动开启,可以看到walreceiver进程启动,说明目前节点是一个备用节点

root@ubuntu27:/usr/local/pg_install# bash pg_streaming_replication_setup.sh
=== PostgreSQL Standby Initialization Start ===
Mon Jun 29 05:30:37 PM CST 2026
############ Data directory /usr/local/pgsql17/pg9700/data does not exist. Creating it...
############ Checking replication user repl_user on primary...
############ Replication user repl_user already exists.
############ Checking replication slot pgstandby_slot01 on primary...
############ Replication slot pgstandby_slot01 already exists.
############ Performing pg_basebackup from primary node...
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/E000060 on timeline 1
pg_basebackup: starting background WAL receiver
30961/30961 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/E000158
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed
############ Starting PostgreSQL standby...
=== Standby Initialization Complete ===
Mon Jun 29 05:30:40 PM CST 2026
root@ubuntu27:/usr/local/pg_install#
root@ubuntu27:/usr/local/pg_install# systemctl restart postgresql9700
root@ubuntu27:/usr/local/pg_install#
root@ubuntu27:/usr/local/pg_install# systemctl status postgresql9700
● postgresql9700.service - PostgreSQL database server
     Loaded: loaded (/etc/systemd/system/postgresql9700.service; enabled; preset: enabled)
     Active: active (running) since Mon 2026-06-29 17:33:34 CST; 8s ago
    Process: 2263119 ExecStart=/usr/local/pgsql17/server/bin/pg_ctl start -D ${PGDATA} -s -w -t ${PGSTARTTIMEOUT} (code=exited, status=0/SUCCESS)
   Main PID: 2263123 (postgres)
      Tasks: 6 (limit: 4543)
     Memory: 63.8M (peak: 64.0M)
        CPU: 124ms
     CGroup: /system.slice/postgresql9700.service
             ├─2263123 /usr/local/pgsql17/server/bin/postgres -D /usr/local/pgsql17/pg9700/data
             ├─2263130 "postgres: logger "
             ├─2263131 "postgres: checkpointer "
             ├─2263132 "postgres: background writer "
             ├─2263133 "postgres: startup recovering 00000001000000000000000F"
             └─2263139 "postgres: walreceiver streaming 0/F000060"
Jun 29 17:33:34 ubuntu27 systemd[1]: Starting postgresql9700.service - PostgreSQL database server...
Jun 29 17:33:34 ubuntu27 pg_ctl[2263123]: 2026-06-29 09:33:34.306 UTC [2263123] LOG:  redirecting log output to logging collector process
Jun 29 17:33:34 ubuntu27 pg_ctl[2263123]: 2026-06-29 09:33:34.306 UTC [2263123] HINT:  Future log output will appear in directory "/usr/local/pgsql17/pg9700/log".
Jun 29 17:33:34 ubuntu27 systemd[1]: Started postgresql9700.service - PostgreSQL database server.
root@ubuntu27:/usr/local/pg_install#

订阅节点192.168.152.138参数

无特殊要求

4,流复制状态检查

192.168.152.136和192.168.152.137之间的流复制已完成同步,检查流复制的状态

主节点192.168.152.136上的状态

root@ubuntu26:/usr/local/pg_install# psql -h 127.0.0.1 -p 9700  postgres postgres
Password for user postgres:
psql (17.6)
Type "help" for help.
postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 3019576
usesysid         | 16388
usename          | repl_user
application_name | walreceiver
client_addr      | 192.168.152.137
client_hostname  |
client_port      | 45264
backend_start    | 2026-06-29 09:03:23.143699+00
backend_xmin     |
state            | streaming
sent_lsn         | 0/D000060
write_lsn        | 0/D000060
flush_lsn        | 0/D000060
replay_lsn       | 0/D000060
write_lag        | 00:00:00.022827
flush_lag        | 00:00:00.023348
replay_lag       | 00:00:00.154594
sync_priority    | 1
sync_state       | sync
reply_time       | 2026-06-29 09:03:23.182874+00

postgres=#

备节点192.168.152.137上的状态

root@ubuntu27:/usr/local/pg_install# psql -h 127.0.0.1 -p 9700  postgres postgres
Password for user postgres:
psql (17.6)
Type "help" for help.

postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_wal_receiver;
-[ RECORD 1 ]---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid                   | 2251271
status                | streaming
receive_start_lsn     | 0/D000000
receive_start_tli     | 1
written_lsn           | 0/D000060
flushed_lsn           | 0/D000060
received_tli          | 1
last_msg_send_time    | 2026-06-29 09:04:53.210286+00
last_msg_receipt_time | 2026-06-29 09:04:53.21439+00
latest_end_lsn        | 0/D000060
latest_end_time       | 2026-06-29 09:03:23.156472+00
slot_name             | pgstandby_slot01
sender_host           | 192.168.152.136
sender_port           | 9700
conninfo              | user=repl_user password=******** channel_binding=prefer dbname=replication host=192.168.152.136 port=9700 fallback_application_name=walreceiver sslmode=prefer sslnegotiation=postgres sslcompression=0 sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable

postgres=#

===

5,创建发布

创建逻辑复制用户

###创建复制用户
--drop user app_replication;
create user logic_repl_user with password 'logic_repl_user';
alter role logic_repl_user with replication;

###给复制用户授权,切换到目标数据库db01下执行
GRANT CONNECT ON DATABASE db01 TO logic_repl_user;
GRANT USAGE ON SCHEMA public TO logic_repl_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO logic_repl_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO logic_repl_user;

创建发布

create database db01;
--切换到db01下面

--创建测试表
create table test_slot_failover
(
	c1 int generated always as identity primary key,
	c2 timestamptz 
);

--创建逻辑复制槽,注意最后一个参数是是否failover,默认是false,要明确指出failover为true
select * from pg_create_logical_replication_slot('logic_repl_slot01','pgoutput', false,false, true);
slot_name        |lsn      |
-----------------+---------+
logic_repl_slot01|0/31A9BB0|

--创建发布
create publication mysqlpg_db01_pulication for table test_slot_failover with (publish = 'insert,update');

Updated Rows	0
Execute time	0.008s
Start time	Mon Jun 29 14:12:15 CST 2026
Finish time	Mon Jun 29 14:12:15 CST 2026
Query	create publication mysqlpg_db01_pulication for table test_slot_failover with (publish = 'insert,update')

查看逻辑复制槽的状态

主节点上

postgres=# SELECT slot_name, database, failover, synced,active  FROM pg_replication_slots;
     slot_name     | database | failover | synced | active
-------------------+----------+----------+--------+--------
 pgstandby_slot01  |          | f        | f      | t											#注意这是物理复制槽,只有主节点上有
 logic_repl_slot01 | db01     | t        | f      | f
(2 rows)

备节点上

postgres=#
postgres=# SELECT slot_name, database, failover, synced,active  FROM pg_replication_slots;
     slot_name     | database | failover | synced | active
-------------------+----------+----------+--------+--------
 logic_repl_slot01 | db01     | t        | t      | f										#可以发现逻辑复制槽已经被复制到备节点
(1 row)

===

6,创建订阅

订阅服务器上创建测试库db01,当然不一定要求库名一致。

--切换到db01下创建订阅表
create table test_slot_failover
(
	c1 int generated always as identity primary key,
	c2 timestamptz 
);

创建订阅

--创建订阅
create subscription mypg_db01_subscription 
connection 'host=192.168.152.136 port=9700 dbname=db01 user=logic_repl_user  password=logic_repl_user' 
publication mypg_db01_pulication 
with 
(
    create_slot = false,						#这里不创建复制槽,因为前面已经提前创建好了
    slot_name = logic_repl_slot01,
    streaming = parallel,
    copy_data = true,
    failover = true								#注意如果没有创建复制槽,上面的create_slot为true的话,这里也一定要为failover = true,
);


Updated Rows	0
Execute time	0.051s
Start time	Mon Jun 29 14:29:59 CST 2026
Finish time	Mon Jun 29 14:29:59 CST 2026
	create subscription mypg_db01_subscription 
	connection 'host=192.168.152.136 port=9700 dbname=db01 user=logic_repl_user  password=logic_repl_user' 
	publication mypg_db01_pulication 
	with 
	(
	    create_slot = false,
	    slot_name = logic_repl_slot01,
	    streaming = parallel,
	    copy_data = true,
	    failover = true
	)


--查看订阅状态
select * from pg_subscription; 
oid  |subdbid|subskiplsn|subname               |subowner|subenabled|subbinary|substream|subtwophasestate|subdisableonerr|subpasswordrequired|subrunasowner|subfailover|subconninfo                                                                              |subslotname      |subsynccommit|subpublications        |suborigin|
-----+-------+----------+----------------------+--------+----------+---------+---------+----------------+---------------+-------------------+-------------+-----------+-----------------------------------------------------------------------------------------+-----------------+-------------+-----------------------+---------+
16444|  16388|0/0       |mypg_db01_subscription|      10|true      |false    |p        |d               |false          |true               |false        |false      |host=192.168.152.136 port=9700 dbname=db01 user=logic_repl_user  password=logic_repl_user|logic_repl_slot01|off          |{mypg_db01_publication}|any      |

创建订阅的另一种方式,同步创建逻辑复制槽

create subscription mypg_db01_subscription 
connection 'host=192.168.152.136 port=9700 dbname=db01 user=logic_repl_user  password=logic_repl_user' 
publication mypg_db01_publication 
with 
(
    create_slot = true,                #注意下面的逻辑复制槽不能已存在,这里会主动创建
    slot_name = logic_repl_slot01,
    streaming = parallel,
    copy_data = true,
    failover = true                    #这里要设置为运行故障转移
);

 

===

7,PostgreSQL流复制故障转移

模拟主节点故障,备用节点提升为主节点

1,关闭主节点                        systemctl stop postgresql9700

root@ubuntu26:/usr/local/pg_install# systemctl stop postgresql9700
root@ubuntu26:/usr/local/pg_install#

2,从节点直接pg_promote    select pg_promote()

postgres=# select pg_promote();
 pg_promote
------------
 t
(1 row)

此时逻辑复制的订阅端开始报错,提示找不到发布端主机: could not connect to the publisher: connection to server at “192.168.152.136”, port 9700 failed: Connection refused  Is the server running on that host and accepting TCP/IP connections?

2026-06-29 09:14:43.546 UTC [2247682] LOG:  background worker "logical replication apply worker" (PID 2265592) exited with exit code 1
2026-06-29 09:14:48.543 UTC [2265603] LOG:  logical replication apply worker for subscription "mypg_db01_subscription" has started
2026-06-29 09:14:48.544 UTC [2265603] ERROR:  could not connect to the publisher: connection to server at "192.168.152.136", port 9700 failed: Connection refused
		Is the server running on that host and accepting TCP/IP connections?
2026-06-29 09:14:48.545 UTC [2247682] LOG:  background worker "logical replication apply worker" (PID 2265603) exited with exit code 1
2026-06-29 09:14:53.544 UTC [2265611] LOG:  logical replication apply worker for subscription "mypg_db01_subscription" has started
2026-06-29 09:14:53.545 UTC [2265611] ERROR:  could not connect to the publisher: connection to server at "192.168.152.136", port 9700 failed: Connection refused
		Is the server running on that host and accepting TCP/IP connections?
2026-06-29 09:14:53.546 UTC [2247682] LOG:  background worker "logical replication apply worker" (PID 2265611) exited with exit code 1
2026-06-29 09:14:58.556 UTC [2265623] LOG:  logical replication apply worker for subscription "mypg_db01_subscription" has started
2026-06-29 09:14:58.557 UTC [2265623] ERROR:  could not connect to the publisher: connection to server at "192.168.152.136", port 9700 failed: Connection refused
		Is the server running on that host and accepting TCP/IP connections?
2026-06-29 09:14:58.559 UTC [2247682] LOG:  background worker "logical replication apply worker" (PID 2265623) exited with exit code 1
2026-06-29 09:15:03.568 UTC [2265636] LOG:  logical replication apply worker for subscription "mypg_db01_subscription" has started
2026-06-29 09:15:03.569 UTC [2265636] ERROR:  could not connect to the publisher: connection to server at "192.168.152.136", port 9700 failed: Connection refused
		Is the server running on that host and accepting TCP/IP connections?
2026-06-29 09:15:03.570 UTC [2247682] LOG:  background worker "logical replication apply worker" (PID 2265636) exited with exit code 1

8,逻辑复制故障转移

此时需要对订阅端执行故障转移,也即先停止订阅,然后修改订阅的连接串中的主机信息,指向新的主节点,最后启动订阅

--停止订阅
ALTER SUBSCRIPTION mypg_db01_subscription DISABLE;
--修改订阅的连接信息,执行新的主节点
ALTER SUBSCRIPTION mypg_db01_subscription CONNECTION  'host=192.168.152.137 port=9700 dbname=db01 user=logic_repl_user  password=logic_repl_user'
--启动订阅
ALTER SUBSCRIPTION mypg_db01_subscription ENABLE;

查看新的订阅信息

oid  |subdbid|subskiplsn|subname               |subowner|subenabled|subbinary|substream|subtwophasestate|subdisableonerr|subpasswordrequired|subrunasowner|subfailover|subconninfo                                                                              |subslotname      |subsynccommit|subpublications        |suborigin|
-----+-------+----------+----------------------+--------+----------+---------+---------+----------------+---------------+-------------------+-------------+-----------+-----------------------------------------------------------------------------------------+-----------------+-------------+-----------------------+---------+
16447|  16388|0/0       |mypg_db01_subscription|      10|true      |false    |p        |d               |false          |true               |false        |false      |host=192.168.152.137 port=9700 dbname=db01 user=logic_repl_user  password=logic_repl_user|logic_repl_slot01|off          |{mypg_db01_publication}|any      |

在新的主库192.168.152.137上,往发布表写入数据,可以正常同步到订阅端。

验证略

9,总结

PostgreSQL 17 已经可以在物理备库上同步逻辑复制槽,实现“逻辑复制 Failover 的基础能力”,本文简单通过一个简单的测试案例来测试逻辑复制槽的failover效果,订阅端可以基于新的主节点做逻辑复制,但订阅端的连接切换仍需人工干预,尚未实现全自动高可用。


第一次在墨天轮写东西,这个格式还是不太适应

最后修改时间:2026-06-30 09:42:09
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论