
KingbaseES V8R6 cluster: test case for a standby NIC going down

数据猿 2022-12-23

Database version:

test=# select version();
                                                       version
----------------------------------------------------------------------------------------------------------------------
 KingbaseES V008R006C005B0041 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)

Host node information:

[kingbase@node101 bin]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.101   node101     # primary
192.168.1.102   node102     # standby

Cluster node information:

 ID | Name    | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node101 | primary | * running |          | running | 11180 | no      | n/a
 2  | node102 | standby |   running | node101  | running | 9242  | no      | 0 second(s) ago

I. Check cluster status and configuration

1. Cluster node status

[kingbase@node101 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                         
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node101 | primary | * running |          | default  | 100      | 1        | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node102 | standby |   running | node101  | default  | 100      | 1        | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

2. Cluster configuration


II. Test: take the standby NIC down

1. Bring the standby NIC down
[root@node102 ~]# ifconfig enp0s3 down
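
Before looking at the cluster logs, it can help to confirm that the interface really is down. The following is a minimal sketch, assuming a Linux host that exposes link state under /sys/class/net; the function name `iface_state` and its optional second argument (an alternate sysfs root, useful only for testing) are illustrative and not part of the original test. On systems without `ifconfig`, `ip link set enp0s3 down` is the iproute2 equivalent of the command above.

```shell
# Print the kernel's reported link state for an interface ("up", "down", ...).
# $1: interface name, e.g. enp0s3 (taken from the test above)
# $2: sysfs root, defaults to /sys/class/net (hypothetical override for testing)
iface_state() {
    local iface="$1" root="${2:-/sys/class/net}"
    local f="$root/$iface/operstate"
    if [ -r "$f" ]; then
        cat "$f"
    else
        # The interface does not exist or sysfs is not available.
        echo "missing"
        return 1
    fi
}

# Usage on the standby:
#   iface_state enp0s3
```

After the `ifconfig enp0s3 down` step, the call would be expected to report `down` rather than `up`.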


2. Check the messages log on the standby


3. hamgr.log on the standby

= The log shows that the repmgrd service was closed and could no longer provide normal service. =


4. Check cluster node status from the primary

[kingbase@node101 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status        | Upstream | Location | Priority | Timeline | Connection string                                 
----+---------+---------+---------------+----------+----------+----------+----------+------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node101 | primary | * running     |          | default  | 100      | 1        | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node102 | standby | ? unreachable | node101  | default  | 100      | ?        | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

WARNING: following issues were detected
  - unable to connect to node "node102" (ID: 2)
  - node "node102" (ID: 2) is registered as an active standby but is unreachable

=== As the output above shows, the cluster did not trigger a primary/standby switchover. ===
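
The manual check above could be scripted. Here is a rough sketch, assuming the `repmgr cluster show` output keeps the pipe-separated layout shown in this article (ID, Name, Role, Status, ...); the helper name `unhealthy_nodes` is hypothetical. It reads the command output on stdin and prints the name and status of every node whose Status column is anything other than "running".

```shell
# Read `repmgr cluster show` output on stdin.
# Print "<name> <status>" for each node that is not "running";
# return 1 if at least one such node is found, 0 otherwise.
unhealthy_nodes() {
    awk -F'|' '
        # Data rows have a numeric ID in the first column; the header,
        # the separator line and the WARNING lines do not.
        NF >= 4 && $1 ~ /^[ ]*[0-9]+[ ]*$/ {
            name = $2
            status = $4
            gsub(/^[ ]+|[ ]+$/, "", name)
            # Strip padding and the leading marker characters (*, ?, !, -).
            gsub(/^[ *?!-]+|[ ]+$/, "", status)
            if (status != "running") {
                print name, status
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage on the primary:
#   ./repmgr cluster show | unhealthy_nodes
```

For the output captured in this step, the sketch would list node102 as unreachable while leaving node101 unreported, which matches the observation that the primary kept its role.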

III. Bring the standby NIC back up

1. Check cluster status

[kingbase@node101 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                     
----+---------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node101 | primary | * running |          | default  | 100      | 1        | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node102 | standby |   running | node101  | default  | 100      | 1        | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

2. Check hamgr.log on the standby

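= As the log below shows, once the standby NIC is back up, the standby receives the WAL stream, performs recovery, and resynchronizes with the primary. =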

[2022-03-29 16:11:45] [INFO] node "node102" (ID: 2) monitoring upstream node "node101" (ID: 1) in normal state
[2022-03-29 16:11:45] [ERROR] unable to determine if server is in recovery
[2022-03-29 16:11:45] [DETAIL]
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

[2022-03-29 16:11:45] [DETAIL] query text is:
SELECT pg_catalog.pg_is_in_recovery()
[2022-03-29 16:11:47] [NOTICE] upstream is available but upstream connection has gone away, resetting
[2022-03-29 16:12:24] [ERROR] is_rep_sync_streaming(): get 2 tuples
[2022-03-29 16:12:45] [ERROR] is_wal_all_recevied(): get 0 tuples
[2022-03-29 16:12:45] [ERROR] is_rep_sync_streaming(): get 0 tuples
[2022-03-29 16:12:47] [ERROR] is_wal_all_recevied(): get 0 tuples
[2022-03-29 16:12:47] [ERROR] is_rep_sync_streaming(): get 0 tuples
[2022-03-29 16:12:49] [ERROR] is_wal_all_recevied(): get 0 tuples
[2022-03-29 16:12:49] [ERROR] is_rep_sync_streaming(): get 0 tuples
[2022-03-29 16:16:47] [INFO] node "node102" (ID: 2) monitoring upstream node "node101" (ID: 1) in normal state
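
To see at a glance how long repmgrd kept reporting problems, the log can be summarized. This is only a sketch, assuming every log record starts with a `[YYYY-MM-DD HH:MM:SS]` timestamp as in the excerpt above; the function name `outage_window` is made up for illustration. It reports the first [ERROR] timestamp, the number of [ERROR] lines seen before recovery, and the timestamp of the first later "in normal state" message.

```shell
# Read hamgr.log on stdin and summarize the window between the first
# [ERROR] record and the next "in normal state" [INFO] record.
outage_window() {
    awk '
        # Count [ERROR] records until the node is reported healthy again.
        /^\[[0-9-]+ [0-9:]+\] \[ERROR\]/ && recovered == "" {
            if (first == "") first = substr($0, 2, 19)
            errors++
            next
        }
        # The first "in normal state" record after an error marks recovery.
        /\[INFO\].*in normal state/ && first != "" && recovered == "" {
            recovered = substr($0, 2, 19)
        }
        END {
            if (first == "") { print "no errors found"; exit }
            print "first_error=" first
            print "error_lines=" errors
            print "recovered=" (recovered == "" ? "n/a" : recovered)
        }
    '
}

# Usage on the standby:
#   outage_window < hamgr.log
```

The timestamps reflect when repmgrd logged each event, so the window is an approximation of the outage rather than an exact measure of replication catch-up.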

IV. Summary

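1. For a standby, a network failure caused by its NIC going down does not trigger a primary/standby switchover. Once the NIC is back up, the cluster returns to normal.
2. If the database service on the standby goes down and recovery is set to 'automatic' or 'standby', the standby's database service is restored automatically.
3. This case was tested on a one-primary, one-standby architecture. With one primary and multiple standbys, if the NIC of the standby whose sync state is 'sync' goes down, the remaining standbys hold an election and one of them is promoted to the 'sync' state.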
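
The second point depends on how `recovery` is configured, so it may be worth checking the setting before running a similar test. The sketch below assumes the setting is stored as a `recovery='...'` line in the cluster's repmgr.conf; the article does not show the file, so both the location and the exact line format are assumptions, and `recovery_mode` is a hypothetical helper name.

```shell
# Print the value of the recovery setting found in a config file.
# $1: path to the config file (assumed to be the cluster's repmgr.conf)
# Returns 1 if no recovery= line is present.
recovery_mode() {
    local file="$1" line val=""
    while IFS= read -r line || [ -n "$line" ]; do
        line="${line%%#*}"                              # drop comments
        line="$(printf '%s' "$line" | tr -d " \t'")"    # drop blanks and quotes
        case "$line" in
            recovery=*) val="${line#recovery=}" ;;      # last match wins
        esac
    done < "$file"
    [ -n "$val" ] && printf '%s\n' "$val"
}

# Usage (path is a placeholder):
#   case "$(recovery_mode /path/to/repmgr.conf)" in
#       automatic|standby) echo "standby service is restored automatically" ;;
#       *)                 echo "standby service is not restored automatically" ;;
#   esac
```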

