
Notes on Deploying a DRBD + NFS + Keepalived High-Availability Environment on CentOS 7.4

请叫我水哥 2020-08-13

I. Experiment Overview

II. Preparation (run on both machines)

III. DRBD Configuration (run on both machines)

IV. Add a Disk on Both DRBD Hosts

V. Create the DRBD Device and Activate the r0 Resource on Both Machines

VI. Initial Device Synchronization

    6.1 Choosing the initial synchronization source

    6.2 Starting the initial full synchronization

VII. Mount the /data Filesystem (primary node only)

VIII. DRBD Promotion and Demotion Test (manual primary/secondary switchover)

IX. NFS Installation (install on both nodes)

X. Keepalived Installation (install and configure on both nodes)

XI. NFS Client Mount Test

XII. Automatic Failover Test of the Whole HA Stack

XIII. Summary

I. Experiment Overview

  1. Install keepalived on both machines; the VIP is 192.168.48.200

  2. DRBD's mount point /data is exported via NFS; remote clients mount the share through the VIP address

  3. When the Primary host goes down or its NFS service fails, the Secondary host is promoted to the DRBD primary node and the VIP moves over to it

  4. When the Primary host recovers, it becomes the DRBD primary node again and takes the VIP back, completing the failover cycle

Experiment topology diagram:

II. Preparation (run on both machines)

  1. Install the latest drbd90 release

  2. Deploy the DRBD environment on both the Primary and Secondary hosts

  3. The Primary host (192.168.48.180) acts as the DRBD primary node by default; the DRBD mount point is /data

  4. The Secondary host (192.168.48.181) is the DRBD backup node

1) Server information (CentOS 7.4)
192.168.48.180   primary server   hostname: Primary
192.168.48.181   backup server    hostname: Secondary


2) The two machines' firewalls must allow traffic from each other; the simplest approach is to disable SELinux and the firewall (same on both machines)
[root@Primary ~]# setenforce 0   # temporary; for a permanent change, set SELINUX=disabled in /etc/sysconfig/selinux
[root@Primary ~]# systemctl stop firewalld && systemctl stop iptables
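To make both changes persistent across reboots, something like the following is commonly used (a minimal sketch; on CentOS 7 /etc/sysconfig/selinux is a symlink to /etc/selinux/config):
[root@Primary ~]# sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config   # takes effect after the next reboot
[root@Primary ~]# systemctl disable firewalld                                    # keep firewalld from starting again at boot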


3) Set up the hosts file (same on both machines)
[root@Primary ~]# vim /etc/hosts
192.168.48.180 Primary
192.168.48.181 Secondary


4) Synchronize the time on both machines
[root@Primary ~]# yum install -y ntpdate
[root@Primary ~]# ntpdate ntp1.aliyun.com
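To keep the clocks in sync over time, one simple option is a cron entry on both machines (a sketch; chronyd, which ships with CentOS 7, works just as well):
[root@Primary ~]# crontab -e
*/30 * * * * /usr/sbin/ntpdate ntp1.aliyun.com >/dev/null 2>&1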


5) Install and configure DRBD (same on both machines)
Current CentOS 7.4 kernel:
[root@secondary ~]# uname -r
3.10.0-693.el7.x86_64
Install via yum here (the Tsinghua or Aliyun mirrors can also be used):
# rpm -ivh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
# yum -y install drbd90-utils kmod-drbd90   # note: on CentOS 7.4 this updates the kernel to 3.10.0-1127.18.2.el7; reboot into the new kernel afterwards
=========================================================================================================================================
Package                Arch       Version                        Repository    Size
=========================================================================================================================================
Installing:
drbd90-utils x86_64 9.12.2-1.el7.elrepo elrepo 720 k
kernel x86_64 3.10.0-1127.18.2.el7 updates 50 M
kmod-drbd90 x86_64 9.0.22-2.el7_8.elrepo elrepo 284 k
Updating for dependencies:
linux-firmware noarch 20191203-76.gite8a0f4c.el7 base 81 M
## Because the new packages update the kernel, reboot the system so that it boots with the new kernel
[root@primary ~]# uname -r
3.10.0-1127.18.2.el7.x86_64


6) Load the kernel module:
[root@primary ~]# modprobe drbd
[root@primary ~]# lsmod |grep drbd
drbd 573100 0
libcrc32c 12644 4 xfs,drbd,nf_nat,nf_conntrack
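To have the module loaded automatically after a reboot, it can be registered with systemd-modules-load (an optional but convenient extra step):
[root@primary ~]# echo drbd > /etc/modules-load.d/drbd.conf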

III. DRBD Configuration (run on both machines)

1) Check the main configuration file
[root@primary ~]# cat /etc/drbd.conf
# You can find an example in /usr/share/doc/drbd.../drbd.conf.example
include "drbd.d/global_common.conf"; #global和common配置路径
include "drbd.d/*.res"; #资源文件配置路径


2) Edit the global configuration file and the resource configuration file
[root@primary ~]# cat /etc/drbd.d/global_common.conf |egrep -v "^.*#|^$"
global {
    usage-count yes;
}
common {
    protocol C;   # fully synchronous replication
    handlers {
    }
    startup {
        wfc-timeout 240;
        degr-wfc-timeout 240;
        outdated-wfc-timeout 240;
    }
    options {
    }
    disk {
        on-io-error detach;
    }
    net {
        cram-hmac-alg md5;
        shared-secret "testdrbd";
    }
}
[root@primary ~]# cat /etc/drbd.d/r0.res
resource r0 {
    on Primary {
        device /dev/drbd0;             # DRBD virtual block device on the Primary host; do not format it in advance
        disk /dev/sdb1;
        address 192.168.48.180:7789;   # replication port used in the official example; it can be customized
        meta-disk internal;
    }
    on Secondary {
        device /dev/drbd0;
        disk /dev/sdb1;
        address 192.168.48.181:7789;
        meta-disk internal;
    }
}
## Copy the configuration to the backup node
[root@primary ~]# cd /etc/drbd.d/
[root@primary drbd.d]# scp r0.res global_common.conf root@192.168.48.181:/etc/drbd.d
root@192.168.48.181's password:
r0.res 100% 222 219.6KB/s 00:00
global_common.conf 100% 2743 1.6MB/s 00:00
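Before moving on, the configuration can be sanity-checked on both nodes; drbdadm dump parses the config files and prints the resulting resource definition, so a typo in r0.res or global_common.conf shows up here:
[root@primary drbd.d]# drbdadm dump r0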

IV. Add a Disk on Both DRBD Hosts

On the Primary host, add a 10 GB disk for DRBD, partition it as /dev/sdb1, and do not format it:
[root@Primary ~]# fdisk -l
......
[root@Primary ~]# fdisk /dev/sdb
Enter "n -> p -> 1 -> 1 -> Enter -> w" in turn   # after the partition is created, run "fdisk -l /dev/sdb" (or press p inside fdisk) to see the new partition, e.g. /dev/sdb1

On the Secondary host, likewise add a 10 GB disk for DRBD, partition it as /dev/sdb1 without formatting, and create a local /data directory without mounting anything on it.
[root@Secondary ~]# fdisk -l
......
[root@Secondary ~]# fdisk /dev/sdb
Enter "n -> p -> 1 -> 1 -> Enter -> w" in turn
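If you prefer a non-interactive way to create the same partition, a parted one-liner works as well (a sketch assuming the whole disk is dedicated to DRBD):
[root@Primary ~]# parted -s /dev/sdb mklabel msdos mkpart primary 1MiB 100%
[root@Primary ~]# lsblk /dev/sdb   # should now list /dev/sdb1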

V. Create the DRBD Device and Activate the r0 Resource on Both Machines

1) Create the DRBD device node
[root@primary ~]# mknod /dev/drbd0 b 147 0
[root@primary ~]# ll /dev/drbd0
brw-r--r-- 1 root root 147, 0 Aug 8 10:15 /dev/drbd0


2) Create the DRBD metadata on the device
[root@Primary ~]# drbdadm create-md r0
--== Thank you for participating in the global usage survey ==--
The server's response is:
you are the 13439th user to install this version
initializing activity log
initializing bitmap (320 KB) to all zero
Writing meta data...
New drbd meta data block successfully created.
success


3) Run the command again to activate the r0 resource (alternatively, the resource can be brought up directly with drbdadm up r0, which is what the official documentation recommends)
[root@primary drbd.d]# drbdadm create-md r0
open(/dev/sdb1) failed: Device or resource busy
Exclusive open failed. Do it anyways?
[need to type 'yes' to confirm] yes   # type yes
# Output might be stale, since minor 0 is attached
Device '0' is configured!
Command 'drbdmeta 0 v09 /dev/sdb1 internal create-md 1' terminated with exit code 20
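For reference, the one-step alternative mentioned above looks like this (run on both nodes after create-md; at this stage both sides will still report Inconsistent):
[root@primary drbd.d]# drbdadm up r0       # attach the backing disk and connect to the peer
[root@primary drbd.d]# drbdadm status r0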

4) Start the drbd service (note: it must be started on both nodes before it takes effect)
[root@primary drbd.d]# systemctl start drbd
[root@primary drbd.d]# systemctl status drbd
● drbd.service - DRBD -- please disable. Unless you are NOT using a cluster manager.
Loaded: loaded (/usr/lib/systemd/system/drbd.service; disabled; vendor preset: disabled)
Active: active (exited) since Sat 2020-08-08 10:43:08 EDT; 5s ago
Process: 1726 ExecStart=/lib/drbd/drbd start (code=exited, status=0/SUCCESS)
Main PID: 1726 (code=exited, status=0/SUCCESS)
Aug 08 10:42:23 primary systemd[1]: Starting DRBD -- please disable. Unless you are NOT using a cluster manager....
Aug 08 10:42:23 primary drbd[1726]: Starting DRBD resources: /lib/drbd/drbd: line 148: /var/lib/linstor/loop_device_mapping: N...irectory
Aug 08 10:42:23 primary drbd[1726]: [
Aug 08 10:42:23 primary drbd[1726]: ]
Aug 08 10:43:08 primary drbd[1726]: WARN: stdin/stdout is not a TTY; using /dev/console
Aug 08 10:43:08 primary drbd[1726]: .
Aug 08 10:43:08 primary systemd[1]: Started DRBD -- please disable. Unless you are NOT using a cluster manager..
Hint: Some lines were ellipsized, use -l to show in full.
[root@primary drbd.d]# ps -ef |grep drbd
root 1267 2 0 09:52 ? 00:00:00 [drbd-reissue]
root 1684 2 0 10:39 ? 00:00:00 [drbd_w_r0]
root 1686 2 0 10:39 ? 00:00:00 [drbd0_submit]
root 1698 2 0 10:39 ? 00:00:00 [drbd_s_r0]
root 1704 2 0 10:39 ? 00:00:00 [drbd_r_r0]
root 1740 2 0 10:43 ? 00:00:00 [drbd_a_r0]
root 1741 2 0 10:43 ? 00:00:00 [drbd_as_r0]

5) Check the status (run on both machines)
[root@primary drbd.d]# cat /proc/drbd
version: 9.0.22-2 (api:2/proto:86-116)
GIT-hash: 719792f2cc1360c65c848ffdb66090959e27fde5 build by mockbuild@, 2020-04-05 03:16:50
Transports (api:16): tcp (9.0.22-2)
6) Check the disk state; at this point both sides should show Inconsistent/Inconsistent
[root@primary drbd.d]# drbdadm status
r0 role:Secondary
disk:Inconsistent
Secondary role:Secondary
peer-disk:Inconsistent
So far DRBD has successfully allocated its disk and network resources and is ready to run. However, it does not yet know which node should be used as the source of the initial device synchronization.

VI. Initial Device Synchronization

Two more steps are required before DRBD is fully operational:

6.1 Choosing the initial synchronization source

If you are working with newly initialized, empty disks, this choice is entirely arbitrary. If, however, one of your nodes already holds valuable data that must be preserved, it is critical to select that node as the synchronization source: if the initial device synchronization runs in the wrong direction, that data will be lost, so be very careful here.

6.2 Starting the initial full synchronization

This step must be performed on one node only, only during the initial resource configuration, and only on the node you selected as the synchronization source. To perform it, issue the following command:

[root@primary drbd.d]# drbdadm primary --force r0
After this command is issued, the initial full synchronization starts. You can monitor its progress with drbdadm status; depending on the size of the device it may take some time.
[root@primary drbd.d]# drbdadm status
r0 role:Primary
disk:UpToDate
Secondary role:Secondary
replication:SyncSource peer-disk:Inconsistent done:36.82   # the initial full synchronization is in progress
......
[root@primary drbd.d]# drbdadm status
r0 role:Primary
disk:UpToDate
Secondary role:Secondary
replication:SyncSource peer-disk:Inconsistent done:40.02
[root@primary drbd.d]# drbdadm status
r0 role:Primary
disk:UpToDate
Secondary role:Secondary
peer-disk:UpToDate   # UpToDate/UpToDate means the primary/secondary pair is configured successfully

Your DRBD device is now fully operational, even before the initial synchronization has completed (albeit with slightly reduced performance). If you started with empty disks, you can already create a filesystem on the device, use it as a raw block device, mount it, and perform any other operation on an accessible block device.

VII. Mount the /data Filesystem (primary node only)

1) First, format /dev/drbd0
[root@primary drbd.d]# mkfs.xfs /dev/drbd0
meta-data=/dev/drbd0 isize=512 agcount=4, agsize=655274 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=0, sparse=0
data = bsize=4096 blocks=2621095, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal log bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
2) Create the mount point, then mount the DRBD device
[root@Primary ~]# mkdir /data
[root@Primary ~]# mount /dev/drbd0 /data
[root@primary drbd.d]# df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/centos-root xfs 39G 3.0G 36G 8% /
devtmpfs devtmpfs 475M 0 475M 0% /dev
tmpfs tmpfs 487M 0 487M 0% /dev/shm
tmpfs tmpfs 487M 6.7M 480M 2% /run
tmpfs tmpfs 487M 0 487M 0% /sys/fs/cgroup
/dev/sda1 xfs 1014M 155M 860M 16% /boot
/dev/mapper/centos-home xfs 19G 33M 19G 1% /home
tmpfs tmpfs 98M 0 98M 0% /run/user/0
/dev/drbd0 xfs 10G 33M 10G 1% /data
Important note:
No operations are allowed on the DRBD device on the Secondary node, not even read-only access; all reads and writes must go through the Primary node.
Only when the Primary node goes down can the Secondary node be promoted to the primary role.
3) On the backup node, only the mount point needs to be created
[root@secondary ~]# mkdir /data
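As a quick (optional) sanity check of the note above, any attempt to use the device on the Secondary is expected to be rejected while it holds the secondary role:
[root@secondary ~]# mount /dev/drbd0 /data   # expected to fail while r0 is Secondary on this node
[root@secondary ~]# drbdadm status r0        # confirms the local role is still Secondary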

VIII. DRBD Promotion and Demotion Test (manual primary/secondary switchover)

In single-primary mode (DRBD's default), any resource is in the primary role on only one node at any given time while the connection state is Connected. Consequently, issuing drbdadm primary <resource> on one node while that resource is in the primary role on the other node results in an error.

A resource configured for dual-primary mode can be switched to the primary role on both nodes at the same time; this is required, for example, for live migration of virtual machines.

1. DRBD promotion and demotion test
A resource's role is switched manually from secondary to primary (promotion), or the other way around (demotion), with one of the commands shown below; the following test exercises a full switchover:
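These are the standard drbdadm role-change commands (replace <resource> with r0 in this setup):
drbdadm primary <resource>     # promote the resource to the primary role on the local node
drbdadm secondary <resource>   # demote the resource to the secondary role on the local node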
## Create some test files
[root@primary drbd.d]# cd /data/
[root@primary data]# touch file{1..5}
[root@primary data]# ls
file1 file2 file3 file4 file5
## Switchover test
[root@primary /]# drbdadm secondary r0   # demote the primary node and hand the resource to the peer
r0: State change failed: (-12) Device is held open by someone   # the device is still in use, so unmount the filesystem first and then demote
additional info from kernel:
/dev/drbd0 opened by mount (pid 1924) at 2020-08-08 15:18:58.781
Command 'drbdsetup secondary r0' terminated with exit code 11
[root@primary /]# umount /data/
[root@primary /]# drbdadm secondary r0
## Check on the secondary node
[root@secondary data]# drbdadm primary r0   # promote the backup node
[root@secondary /]# mount /dev/drbd0 /data/
[root@secondary /]# cd /data/
[root@secondary data]# ls
file1 file2 file3 file4 file5   # the data has already been replicated
## Test whether data can be written on the new primary
[root@secondary data]# touch slave-{1..3}
[root@secondary data]# ls
file1 file2 file3 file4 file5 slave-1 slave-2 slave-3   # writes succeed, so the switchover works as expected
## Check the DRBD cluster status at this point
[root@primary ~]# drbdadm status
r0 role:Secondary   # the original primary is now the secondary
disk:UpToDate
Secondary role:Primary   # the original secondary is now the primary
peer-disk:UpToDate

IX. NFS Installation (install on both nodes)

Install NFS on both the Primary and Secondary hosts (for reference: http://www.cnblogs.com/kevingrace/p/6084604.html)
[root@Primary ~]# yum install rpcbind nfs-utils
[root@Primary ~]# vim /etc/exports
/data 192.168.48.0/24(rw,sync,no_root_squash)

[root@primary ~]# systemctl start rpcbind.service
[root@primary ~]# systemctl enable rpcbind.service
[root@primary ~]# systemctl start nfs-server.service
[root@primary ~]# systemctl enable nfs-server.service
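A quick way to confirm the export is active (run on whichever node currently has /data mounted):
[root@primary ~]# exportfs -rav            # re-export everything in /etc/exports and list it
[root@primary ~]# showmount -e localhost   # should list /data 192.168.48.0/24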

X. Keepalived Installation (install and configure on both nodes)

## Install keepalived via yum
[root@primary data]# yum install keepalived

## keepalived.conf on the Primary host
[root@primary keepalived]# cp keepalived.conf keepalived.conf.bak
[root@Primary ~]# vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    notification_email {
        root@localhost
    }
    notification_email_from keepalived@localhost
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id DRBD_HA_MASTER
}

vrrp_script chk_nfs {
    script "/etc/keepalived/check_nfs.sh"
    interval 5
}

vrrp_instance VI_1 {
    state MASTER
    interface ens33
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    track_script {
        chk_nfs
    }
    notify_stop /etc/keepalived/notify_stop.sh
    notify_master /etc/keepalived/notify_master.sh
    virtual_ipaddress {
        192.168.48.200
    }
}

## Start the keepalived service
[root@primary keepalived]# systemctl start keepalived.service
[root@primary keepalived]# systemctl enable keepalived.service
Created symlink from /etc/systemd/system/multi-user.target.wants/keepalived.service to /usr/lib/systemd/system/keepalived.service.
[root@primary keepalived]# ps -ef |grep keepalived
root 2666 1 0 11:58 ? 00:00:00 /usr/sbin/keepalived -D
root 2667 2666 0 11:58 ? 00:00:00 /usr/sbin/keepalived -D
root 2668 2666 0 11:58 ? 00:00:00 /usr/sbin/keepalived -D
root 2695 1235 0 11:58 pts/0 00:00:00 grep --color=auto keepalived

## Check the VIP
[root@primary keepalived]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:30:b5:bd brd ff:ff:ff:ff:ff:ff
inet 192.168.48.180/24 brd 192.168.48.255 scope global ens33
valid_lft forever preferred_lft forever
inet 192.168.48.200/32 scope global ens33   # the VIP address
valid_lft forever preferred_lft forever
inet6 fe80::bb8e:bf99:3f66:42cc/64 scope link
valid_lft forever preferred_lft forever

## keepalived.conf on the Secondary host
[root@primary keepalived]# scp keepalived.conf root@192.168.48.181:/etc/keepalived/keepalived.conf   # copy the config from the primary node
root@192.168.48.181's password:
keepalived.conf 100% 694 836.6KB/s 00:00
[root@Secondary ~]# vim /etc/keepalived/keepalived.conf   # then adjust it as follows
! Configuration File for keepalived
global_defs {
    notification_email {
        root@localhost
    }
    notification_email_from keepalived@localhost
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id DRBD_HA_BACKUP
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 51
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    notify_master /etc/keepalived/notify_master.sh   # run when this machine becomes the keepalived MASTER
    notify_backup /etc/keepalived/notify_backup.sh   # run when this machine becomes the keepalived BACKUP
    virtual_ipaddress {
        192.168.48.200
    }
}
Start the keepalived service
[root@secondary ~]# systemctl start keepalived.service
[root@secondary ~]# systemctl enable keepalived.service
Created symlink from /etc/systemd/system/multi-user.target.wants/keepalived.service to /usr/lib/systemd/system/keepalived.service.
[root@secondary ~]# ps -ef |grep keepalived
root 13520 1 0 00:03 ? 00:00:00 /usr/sbin/keepalived -D
root 13521 13520 0 00:03 ? 00:00:00 /usr/sbin/keepalived -D
root 13522 13520 0 00:03 ? 00:00:00 /usr/sbin/keepalived -D
root 13548 12284 0 00:03 pts/0 00:00:00 grep --color=auto keepalived

## The four check/notification scripts ---------------
1) This script is configured only on the Primary host
[root@Primary ~]# vim /etc/keepalived/check_nfs.sh
#!/bin/bash
## Check whether the NFS service is healthy
systemctl status nfs-server &> /dev/null
if [ $? -ne 0 ];then   # -ne: true if the two integers are not equal
    ## If the service is unhealthy, restart it
    systemctl restart nfs-server
    systemctl status nfs-server &>/dev/null
    if [ $? -ne 0 ];then
        ## If it is still unhealthy after the restart, unmount drbd0 and demote to the secondary role
        umount /dev/drbd0
        drbdadm secondary r0
        ## Stop keepalived so that the VIP fails over
        systemctl stop keepalived
    fi
fi
[root@Primary ~]# chmod 755 /etc/keepalived/check_nfs.sh

2) This script is configured only on the Primary host
[root@Primary ~]# mkdir /etc/keepalived/logs
[root@primary keepalived]# cat notify_stop.sh
#!/bin/bash
time=`date "+%F %H:%M:%S"`
echo -e "$time ------notify_stop------\n" >> /etc/keepalived/logs/notify_stop.log
systemctl stop nfs-server &>> /etc/keepalived/logs/notify_stop.log
umount /data &>> /etc/keepalived/logs/notify_stop.log
drbdadm secondary r0 &>> /etc/keepalived/logs/notify_stop.log
echo -e "\n" >> /etc/keepalived/logs/notify_stop.log
[root@Primary ~]# chmod 755 /etc/keepalived/notify_stop.sh

3) This script must be configured on both machines
[root@primary keepalived]# cat notify_master.sh
#!/bin/bash
time=`date "+%F %H:%M:%S"`
echo -e "$time ------notify_master------\n" >> /etc/keepalived/logs/notify_master.log
drbdadm primary r0 &>> /etc/keepalived/logs/notify_master.log
mount /dev/drbd0 /data &>> /etc/keepalived/logs/notify_master.log
systemctl restart nfs-server &>> /etc/keepalived/logs/notify_master.log
echo -e "\n" >> /etc/keepalived/logs/notify_master.log
[root@Primary ~]# chmod 755 /etc/keepalived/notify_master.sh

4) This script is configured only on the Secondary host
[root@Secondary ~]# mkdir /etc/keepalived/logs
[root@secondary keepalived]# cat notify_backup.sh
#!/bin/bash
time=`date "+%F %H:%M:%S"`
echo -e "$time ------notify_stop------\n" >> /etc/keepalived/logs/notify_backup.log
systemctl stop nfs-server &>> /etc/keepalived/logs/notify_backup.log
umount /data &>> /etc/keepalived/logs/notify_backup.log
drbdadm secondary r0 &>> /etc/keepalived/logs/notify_backup.log
echo -e "\n" >> /etc/keepalived/logs/notify_backup.log
[root@Secondary ~]# chmod 755 /etc/keepalived/notify_backup.sh
-----------------------------------------------------------------------------------------------------------

XI. NFS Client Mount Test

Mount the NFS share on a remote client.
The client only needs the rpcbind and nfs-utils packages installed, with rpcbind running:
[root@nfsclient ~]# yum -y install rpcbind nfs-utils
[root@nfsclient ~]# systemctl start rpcbind && systemctl enable rpcbind
Mount the NFS share through the VIP (the /drbd mount point must already exist on the client):
[root@nfsclient ~]# mount -t nfs 192.168.48.200:/data /drbd
As the following output shows, the NFS share has been mounted successfully:
[root@nfsclient ~]# df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/mapper/centos-root xfs 36G 6.8G 29G 20% /
devtmpfs devtmpfs 897M 0 897M 0% /dev
tmpfs tmpfs 912M 0 912M 0% /dev/shm
tmpfs tmpfs 912M 9.0M 903M 1% /run
tmpfs tmpfs 912M 0 912M 0% /sys/fs/cgroup
/dev/sda1 xfs 1014M 180M 835M 18% /boot
tmpfs tmpfs 183M 32K 183M 1% /run/user/0
192.168.48.200:/data nfs4 10G 33M 10G 1% /drbd   # the mounted NFS share

[root@nfsclient ~]# cd /drbd/
[root@nfsclient drbd]# ls
file1 file2 file3 file4 file5 slave-1 slave-2 slave-3
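If the client should mount the share automatically at boot, an /etc/fstab entry along these lines can be used (a sketch; hard,timeo=600,retrans=3 is a common choice so that client I/O stalls and then resumes during a failover instead of returning errors):
192.168.48.200:/data  /drbd  nfs  defaults,_netdev,hard,timeo=600,retrans=3  0 0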

XII. Automatic Failover Test of the Whole HA Stack

12.1 First, stop the keepalived service on the Primary host. The VIP then moves over to the Secondary host; at the same time NFS on the Primary is shut down (by notify_stop.sh) and the Secondary is promoted to the DRBD primary node.

## Stop keepalived on the primary node; the VIP should float away
[root@primary keepalived]# systemctl stop keepalived.service
[root@primary keepalived]# ip addr |grep ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
inet 192.168.48.180/24 brd 192.168.48.255 scope global ens33
## Check whether the backup node has taken over the VIP
[root@secondary keepalived]# ip addr |grep ens33
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
inet 192.168.48.181/24 brd 192.168.48.255 scope global ens33
inet 192.168.48.200/32 scope global ens33
## Check the system logs for the VIP failover messages
[root@primary keepalived]# tailf /var/log/messages   # failover records on the primary node
Aug 8 12:34:50 primary Keepalived_vrrp[2668]: Sending gratuitous ARP on ens33 for 192.168.48.200
Aug 8 12:45:53 primary chronyd[714]: Selected source 119.28.206.193
Aug 8 12:45:58 primary Keepalived[2666]: Stopping
Aug 8 12:45:58 primary systemd: Stopping LVS and VRRP High Availability Monitor...
Aug 8 12:45:58 primary Keepalived_vrrp[2668]: VRRP_Instance(VI_1) sent 0 priority
Aug 8 12:45:58 primary Keepalived_vrrp[2668]: VRRP_Instance(VI_1) removing protocol VIPs.
Aug 8 12:45:58 primary Keepalived_healthcheckers[2667]: Stopped
Aug 8 12:45:59 primary Keepalived_vrrp[2668]: Stopped
Aug 8 12:45:59 primary Keepalived[2666]: Stopped Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Aug 8 12:45:59 primary systemd: Stopped LVS and VRRP High Availability Monitor.
[root@secondary keepalived]# tailf /var/log/messages   # failover records on the backup node
Aug 9 00:34:45 secondary Keepalived_vrrp[13522]: VRRP_Instance(VI_1) Entering BACKUP STATE
Aug 9 00:45:59 secondary Keepalived_vrrp[13522]: VRRP_Instance(VI_1) Transition to MASTER STATE
Aug 9 00:46:00 secondary Keepalived_vrrp[13522]: VRRP_Instance(VI_1) Entering MASTER STATE
Aug 9 00:46:00 secondary Keepalived_vrrp[13522]: VRRP_Instance(VI_1) setting protocol VIPs.
Aug 9 00:46:00 secondary Keepalived_vrrp[13522]: Sending gratuitous ARP on ens33 for 192.168.48.200
Aug 9 00:46:00 secondary Keepalived_vrrp[13522]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens33 for 192.168.48.200


## The notify_stop script should have been triggered: NFS stopped, /data unmounted, DRBD demoted. Verify each in turn:
[root@primary keepalived]# ps -ef |grep nfs   # no nfs processes left
root 3557 1235 0 13:03 pts/0 00:00:00 grep --color=auto nfs
[root@primary keepalived]# df -hT |grep /data |wc -l   # /data has been unmounted
0
[root@primary keepalived]# drbdadm status
r0 role:Secondary   # the original primary node is now the secondary
disk:UpToDate
Secondary role:Primary   # the original backup node is now the primary
peer-disk:UpToDate
## Checks on the backup node
[root@secondary keepalived]# df -hT|grep /data
/dev/drbd0 xfs 10G 33M 10G 1% /data
[root@secondary keepalived]# cd /data/
[root@secondary data]# ls
file1 file2 file3 file4 file5 slave-1 slave-2 slave-3


## Checks on the client
[root@nfsclient drbd]# ls
file1 file2 file3 file4 file5 slave-1 slave-2 slave-3
[root@secondary data]# rm -rf slave-*   # delete data on the new primary and check whether the client sees the change
[root@nfsclient drbd]# ls
file1 file2 file3 file4 file5   # the slave-* files are gone, so replication is working

When the keepalived service on the Primary host comes back up, it forcibly takes the VIP back (this can be followed in the /var/log/messages system log), and the Primary once again becomes the DRBD primary node.
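The failback can be verified the same way as the failover (a quick sketch of the checks; remounting /data and restarting NFS on the Primary is handled by notify_master.sh):
[root@primary keepalived]# systemctl start keepalived.service
[root@primary keepalived]# ip addr |grep ens33    # 192.168.48.200 should be back on the primary
[root@primary keepalived]# drbdadm status         # r0 should show role:Primary again
[root@primary keepalived]# df -hT |grep /data     # /dev/drbd0 should be mounted on /data again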

  

12.2 Stop the NFS service on the Primary host. According to the monitoring script, keepalived will first try to restart NFS on its own; only when that restart fails does it forcibly demote the DRBD primary to the secondary role and stop keepalived, which triggers the same failover flow as above.

[root@primary keepalived]# systemctl stop nfs-server   # stop the NFS service
[root@primary keepalived]# ps -ef |grep nfs-server
root 4367 1235 0 13:22 pts/0 00:00:00 grep --color=auto nfs-server
[root@primary keepalived]# ps -ef |grep nfs   # the chk_nfs check (check_nfs.sh) has detected the failure and restarted the NFS service automatically
root 4328 2 0 13:22 ? 00:00:00 [nfsd4_callbacks]
root 4332 2 0 13:22 ? 00:00:00 [nfsd]
root 4333 2 0 13:22 ? 00:00:00 [nfsd]
root 4334 2 0 13:22 ? 00:00:00 [nfsd]
root 4335 2 0 13:22 ? 00:00:00 [nfsd]
root 4336 2 0 13:22 ? 00:00:00 [nfsd]
root 4337 2 0 13:22 ? 00:00:00 [nfsd]
root 4338 2 0 13:22 ? 00:00:00 [nfsd]
root 4339 2 0 13:22 ? 00:00:00 [nfsd]
root 4369 1235 0 13:22 pts/0 00:00:00 grep --color=auto nfs
[root@primary keepalived]# drbdadm status   # the DRBD primary is not forcibly demoted unless NFS cannot be restarted
r0 role:Primary
disk:UpToDate
Secondary role:Secondary
peer-disk:UpToDate
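To exercise the other branch of check_nfs.sh, where the restart itself fails, one hypothetical way is to mask the unit before stopping it (undo the mask after the test):
[root@primary keepalived]# systemctl mask nfs-server    # any restart attempt will now fail
[root@primary keepalived]# systemctl stop nfs-server
# within about 5 seconds check_nfs.sh should fail the restart, unmount /data, demote r0 and stop keepalived,
# so the VIP and the DRBD primary role move to the Secondary host
[root@primary keepalived]# systemctl unmask nfs-server  # clean up afterwards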

XIII. Summary

In the primary/secondary switchover above, the NFS mount on the client keeps working; there is only a short delay. This also demonstrates DRBD's data consistency guarantees (including open-file and modification state): from the client's point of view, the whole switchover looks like a single NFS restart (NFS stops on the primary and starts on the backup).


