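Problem description
The following test simulates the loss of the voting disk (VOTEDISK) on one node of the cluster, which causes that host to reboot.
1, Environment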
[root@cisser2 ~]# crsctl query crs activeversion
CRS active version on the cluster is [10.2.0.5.0]
[root@cisser2 ~]# lsb_release -a
LSB Version: :core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 5.11 (Tikanga)
Release: 5.11
Codename: Tikanga
2, Check the disk information
[root@cisser1 tmp]# dmsetup ls
disk1_votep1 (253, 5)
disk1_vote (253, 2)
disk1_ocr (253, 3)
VolGroup00-LogVol01 (253, 0)
disk1_data1 (253, 4)
VolGroup00-LogVol00 (253, 1)
disk1_ocrp1 (253, 6)
[root@cisser1 tmp]# raw -qa
/dev/raw/raw1: bound to major 253, minor 6
/dev/raw/raw4: bound to major 253, minor 5
[root@cisser1 ~]# crsctl query css votedisk
0. 0 /dev/raw/raw4
located 1 votedisk(s).
[root@cisser1 tmp]# multipath -ll
disk1_vote (36000c291eaeb9a8cb897fed3bb029eb7) dm-2 VMware,,VMware Virtual
[size=307M][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
 \_ 2:0:0:0 sdb 8:16 [active][ready]
disk1_ocr (36000c293ecddddd9af5f396457322054) dm-3 VMware,,VMware Virtual
[size=307M][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
 \_ 2:0:1:0 sdc 8:32 [active][ready]
disk1_data1 (36000c294078123daee865a29e3b1ea63) dm-4 VMware,,VMware Virtual
[size=50G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=1][active]
 \_ 2:0:2:0 sdd 8:48 [active][ready]
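Here we can see that the votedisk is /dev/raw/raw4. It is bound to major 253, minor 5, which dmsetup lists as disk1_votep1. That device sits on the multipath disk /dev/dm-2 (alias disk1_vote), whose only path is the disk sdb.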
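The mapping above was read off by hand: `raw -qa` reports a major/minor pair, and `dmsetup ls` reports the device-mapper name that owns that pair. The same lookup can be sketched as a small filter. This is only a sketch, assuming the `name (major, minor)` output format shown above; the function name `dm_name_for` is hypothetical and not part of any tool.

```shell
# dm_name_for MAJOR MINOR   (hypothetical helper)
# Reads `dmsetup ls`-style lines on stdin, e.g. "disk1_votep1 (253, 5)",
# and prints the device-mapper name whose (major, minor) pair matches.
dm_name_for() {
    awk -v maj="$1" -v min="$2" '
        {
            line = $0
            gsub(/[(),]/, " ", line)     # "name (253, 5)" -> "name  253  5 "
            n = split(line, f, " ")      # split on runs of whitespace
            if (n >= 3 && f[2] == maj && f[3] == min) print f[1]
        }'
}

# Usage on a live system (needs root):
#   dmsetup ls | dm_name_for 253 5
```

With the listing from this environment, looking up 253 and 5 returns disk1_votep1, which matches the binding of /dev/raw/raw4.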
Expert answer
3, Delete the sdb disk
[root@cisser1 device]# pwd
/sys/block/sdb/device
[root@cisser1 device]# ls -l delete
--w------- 1 root root 4096 Mar 29 11:46 delete
[root@cisser1 dm-2]# echo 1 > /sys/block/sdb/device/delete
[root@cisser1 dm-2]# multipath -ll
disk1_vote (36000c291eaeb9a8cb897fed3bb029eb7) dm-2 ,
[size=307M][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ #:#:#:# - #:# [failed][faulty]
The output shows that the path to disk1_vote has been lost.
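Spotting the lost path in a longer `multipath -ll` listing is easier with a filter. The following is a minimal sketch, assuming the legacy output layout shown in this article (a header line ending in `) dm-N ...`, followed by path lines carrying `[failed][faulty]`); `failed_path_maps` is a hypothetical name, and newer multipath-tools releases print a different layout that this would not parse.

```shell
# failed_path_maps   (hypothetical helper)
# Reads `multipath -ll` output in the legacy layout on stdin and prints the
# alias of every map that has at least one path in [failed][faulty] state.
failed_path_maps() {
    awk '
        /\) dm-[0-9]+/ { map = $1 }                       # map header line
        /\[failed\]\[faulty\]/ && map != "" && !seen[map]++ { print map }
    '
}

# Usage on a live system (needs root):
#   multipath -ll | failed_path_maps
```

Each alias is printed once, even when several of its paths have failed.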
4, Check the cssd log on node 1
[root@cisser1 cssd]# tail -f ocssd.log
Disk errors start being reported from this point:
[ CSSD]2015-03-29 13:16:35.485 [843671872] >ERROR: Internal Error Information: Category: 1234 Operation: scls_block_read Location: fread_failed Other: fread unable to read buffer Dep: 5
…………………………
[ CSSD]2015-03-29 13:16:35.485 [843671872] >ERROR: clssnmvReadBlocks: read failed 1 at offset 529 of /dev/raw/raw4
[ CSSD]2015-03-29 13:16:35.485 [843671872] >TRACE: clssnmDiskStateChange: state from 4 to 3 disk (0//dev/raw/raw4)
[ CSSD]2015-03-29 13:16:35.485 [832436544] >ERROR: Internal Error Information: Category: 1234 Operation: scls_block_write Location: fwrite_faile Other: fwrite unable to write buffer Dep: 5
The 12th occurrence of the read error, followed by the countdown to termination:
[ CSSD]2015-03-29 13:19:26.604 [832436544] >ERROR: Internal Error Information: Category: 1234 Operation: scls_block_read Location: fread_failed Other: fread unable to read buffer Dep: 5
[ CSSD]2015-03-29 13:19:26.604 [832436544] >ERROR: clssnmvReadBlocks: read failed 1 at offset 4 of /dev/raw/raw4
[ CSSD]2015-03-29 13:19:27.814 [1013770560] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2015-03-29 13:19:27.814 [1013770560] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
[ CSSD]2015-03-29 13:19:31.820 [1013770560] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2015-03-29 13:19:31.820 [1013770560] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
[ CSSD]2015-03-29 13:19:35.615 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 19900 ms, disk (0//dev/raw/raw4)
[ CSSD]2015-03-29 13:19:36.615 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 18900 ms, disk (0//dev/raw/raw4)
[ CSSD]2015-03-29 13:19:36.825 [1013770560] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2015-03-29 13:19:36.825 [1013770560] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
[ CSSD]2015-03-29 13:19:37.617 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 17900 ms, disk (0//dev/raw/raw4)
[ CSSD]2015-03-29 13:19:37.892 [992381248] >TRACE: clssgmDispatchCMXMSG(): msg type(3) src(2) dest(1) size(420) tag(01f7002a) incarnation(5)
[ CSSD]2015-03-29 13:19:37.892 [992381248] >TRACE: clssgmHandleMasterAdd(): src(2) dest(1) size(420)
[ CSSD]2015-03-29 13:19:37.892 [992381248] >TRACE: clssgmHandleMasterAdd(): grock(SRVM.DATABASE.NODEAPPS.cisser2) memberNo(-1) node(2) client(1f7002a) type(3).
[ CSSD]2015-03-29 13:19:37.892 [992381248] >TRACE: clssgmAddMember: granted member(0) flags(0x1) node(2) grock (0x8c66cb0/SRVM.DATABASE.NODEAPPS.cisser2)
[ CSSD]2015-03-29 13:19:37.892 [992381248] >TRACE: clssgmCommonAddMember: Remote member(0) node(2) flags 0x1 0x1 grock (3/0x8c66cb0/SRVM.DATABASE.NODEAPPS.cisser2)
[ CSSD]2015-03-29 13:19:37.898 [992381248] >TRACE: clssgmDispatchCMXMSG(): msg type(4) src(2) dest(1) size(352) tag(01f8002a) incarnation(5)
[ CSSD]2015-03-29 13:19:37.898 [992381248] >TRACE: clssgmHandleMasterExit(): src(2) dest(1) size(352)
[ CSSD]2015-03-29 13:19:37.898 [992381248] >TRACE: clssgmRemoveMember: grock(SRVM.DATABASE.NODEAPPS.cisser2) member(0/0x8c013b0) nodeNum(2) flags(0x1) type(3)
[ CSSD]2015-03-29 13:19:38.618 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 16900 ms, disk (0//dev/raw/raw4)
[ CSSD]2015-03-29 13:19:39.619 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 15890 ms, disk (0//dev/raw/raw4)
[ CSSD]2015-03-29 13:19:39.701 [992381248] >TRACE: clssgmDispatchCMXMSG(): msg type(12) src(2) dest(1) size(360) tag(01f9002a) incarnation(5)
[ CSSD]2015-03-29 13:19:40.620 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 14890 ms, disk (0//dev/raw/raw4)
[ CSSD]2015-03-29 13:19:41.621 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 13890 ms, disk (0//dev/raw/raw4)
[ CSSD]2015-03-29 13:19:41.831 [1013770560] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2015-03-29 13:19:41.831 [1013770560] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
[ CSSD]2015-03-29 13:19:42.623 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 12890 ms, disk (0//dev/raw/raw4)
[ CSSD]2015-03-29 13:19:43.237 [950012224] >TRACE: clssgmAllocProc: (0x8c23c60) allocated
[ CSSD]2015-03-29 13:19:43.238 [971401536] >TRACE: Connect request from user root
[ CSSD]2015-03-29 13:19:43.238 [950012224] >TRACE: clssgmClientConnectMsg: Connect from con(0x8c013b0) proc(0x8c23c60) pid() proto(10:2:1:1)
[ CSSD]2015-03-29 13:19:43.239 [950012224] >TRACE: clssgmRegisterClient: proc(17/0x8c23c60), client(1/0x8bd2110)
[ CSSD]2015-03-29 13:19:43.239 [950012224] >TRACE: clssgmExecuteClientRequest: GRKJOIN recvd from client 1 (0x8bd2110)
[ CSSD]2015-03-29 13:19:43.239 [950012224] >TRACE: clssgmJoinGrock: grock SRVM.DATABASE.NODEAPPS.cisser1 new client 0x8bd2110 with con 0x8bfdef0, requested num -1
[ CSSD]2015-03-29 13:19:43.239 [950012224] >TRACE: clssgmAddGrockMember: adding member to grock SRVM.DATABASE.NODEAPPS.cisser1
[ CSSD]2015-03-29 13:19:43.239 [950012224] >TRACE: clssgmAddMember: granted member(0) flags(0x1) node(1) grock (0x8c64fa0/SRVM.DATABASE.NODEAPPS.cisser1)
[ CSSD]2015-03-29 13:19:43.239 [950012224] >TRACE: clssgmQueueGrockEvent: lockName(SRVM.DATABASE.NODEAPPS.cisser1) type(2) count (1/1) xwaiters(0) event(1) to memberNo(0)
[ CSSD]2015-03-29 13:19:43.239 [950012224] >TRACE: clssgmCommonAddMember: Local member(0) node(1) flags 0x1 0x1 grock (3/0x8c64fa0/SRVM.DATABASE.NODEAPPS.cisser1)
[ CSSD]2015-03-29 13:19:43.244 [950012224] >TRACE: clssgmExecuteClientRequest: GRKEXIT recvd from client 1 (0x8bd2110)
[ CSSD]2015-03-29 13:19:43.244 [950012224] >TRACE: clssgmExitGrock: client 1 (0x8bd2110), grock SRVM.DATABASE.NODEAPPS.cisser1, member 0
[ CSSD]2015-03-29 13:19:43.244 [950012224] >TRACE: clssgmUnregisterClient(): removing proc 17 client 1, flags 0x04000000
[ CSSD]2015-03-29 13:19:43.244 [950012224] >TRACE: clssgmRemoveMember: grock(SRVM.DATABASE.NODEAPPS.cisser1) member(0/0x8c22fa0) nodeNum(1) flags(0x1) type(3)
[ CSSD]2015-03-29 13:19:43.244 [950012224] >TRACE: clssgmUnregisterClient: client 0x8bd2110 expiring
[ CSSD]2015-03-29 13:19:43.452 [950012224] >TRACE: clssgmDeadProc: proc 0x8c23c60
[ CSSD]2015-03-29 13:19:43.452 [950012224] >TRACE: clssgmDeleteClientListener: deleting cmProc (0x8c23c60), with 0 clients
[ CSSD]2015-03-29 13:19:43.452 [950012224] >TRACE: clssgmDeleteClientListener: cleanup for proc(0x8c23c60) con(0x8c013b0) pid()
[ CSSD]2015-03-29 13:19:43.625 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 11890 ms, disk (0//dev/raw/raw4)
[ CSSD]2015-03-29 13:19:44.627 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 10890 ms, disk (0//dev/raw/raw4)
[ CSSD]2015-03-29 13:19:45.616 [832436544] >ERROR: Internal Error Information: Category: 1234 Operation: scls_block_read Location: fread_failed Other: fread unable to read buffer Dep: 5
[ CSSD]2015-03-29 13:19:45.616 [832436544] >ERROR: clssnmvReadBlocks: read failed 1 at offset 4 of /dev/raw/raw4
[ CSSD]2015-03-29 13:19:45.616 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 9900 ms, disk (0//dev/raw/raw4)
[ CSSD]2015-03-29 13:19:46.617 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 8900 ms, disk (0//dev/raw/raw4)
[ CSSD]2015-03-29 13:19:46.839 [1013770560] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2015-03-29 13:19:46.839 [1013770560] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
[ CSSD]2015-03-29 13:19:47.618 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 7900 ms, disk (0//dev/raw/raw4)
[ CSSD]2015-03-29 13:19:48.620 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 6900 ms, disk (0//dev/raw/raw4)
[ CSSD]2015-03-29 13:19:49.621 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 5890 ms, disk (0//dev/raw/raw4)
[ CSSD]2015-03-29 13:19:50.623 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 4890 ms, disk (0//dev/raw/raw4)
[ CSSD]2015-03-29 13:19:50.845 [1013770560] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2015-03-29 13:19:50.845 [1013770560] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
[ CSSD]2015-03-29 13:19:51.625 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 3890 ms, disk (0//dev/raw/raw4)
[ CSSD]2015-03-29 13:19:52.626 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 2890 ms, disk (0//dev/raw/raw4)
[ CSSD]2015-03-29 13:19:53.627 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 1890 ms, disk (0//dev/raw/raw4)
[ CSSD]2015-03-29 13:19:54.628 [854161728] >WARNING: clssnmDiskPMT: voting device offline at 90% fatal, termination in 890 ms, disk (0//dev/raw/raw4)
[ CSSD]2015-03-29 13:19:55.530 [854161728] >TRACE: clssnmDiskPMT: offline disk (200010 ms) (0//dev/raw/raw4)
[ CSSD]2015-03-29 13:19:55.530 [854161728] >ERROR: clssnmDiskPMT: Aborting, 1 of 1 voting disks unavailable
[ CSSD]2015-03-29 13:19:55.533 [854161728] >ERROR: ###################################
[ CSSD]2015-03-29 13:19:55.533 [854161728] >ERROR: clssscExit: CSSD aborting from thread clssnmvDiskPingMonitorThread
[ CSSD]2015-03-29 13:19:55.533 [854161728] >ERROR: ###################################
[ CSSD]2015-03-29 13:19:55.533 [854161728] >TRACE: clssgmDiscOmonReady: omon was posted for member 1
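The key message is "clssnmDiskPMT: Aborting, 1 of 1 voting disks unavailable". Because host 1 can no longer access its only VOTEDISK, the CSSD process is aborted from the clssnmvDiskPingMonitorThread thread, and the host reboots.
5, OCSSD log on host 2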
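The timing in the log is worth checking. The first read failure is logged at 13:16:35.485 and CSSD aborts at 13:19:55.530, about 200 seconds later, which agrees with the "offline disk (200010 ms)" trace. This suggests a disk timeout of roughly 200 seconds on this cluster; that is an inference from the log, not a value queried from the cluster. To follow the countdown without the surrounding noise, the warnings can be filtered out of ocssd.log. A minimal sketch, assuming the line format shown above; `pmt_countdown` is a hypothetical name.

```shell
# pmt_countdown   (hypothetical helper)
# Reads ocssd.log lines on stdin and, for every clssnmDiskPMT
# "termination in N ms" warning, prints "<time> <N>".
pmt_countdown() {
    awk '
        /clssnmDiskPMT: voting device offline/ {
            ts = ""
            for (i = 1; i < NF; i++) {
                if ($i ~ /^CSSD\]/) ts = $(i + 1)      # field after "CSSD]<date>" is the time
                if ($i == "in" && $(i + 2) == "ms,") print ts, $(i + 1)
            }
        }'
}

# Usage:
#   pmt_countdown < ocssd.log
```

On the log above this yields one line per second, from 19900 ms at 13:19:35.615 down to 890 ms at 13:19:54.628.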
Here host 2 detects that node 1 has failed, and the network heartbeat warnings then begin at 50% heartbeat fatal.
[ CSSD]2015-03-29 13:19:58.691 [1518352704] >TRACE: clssgmPeerListener: discarded 0 future msgsfor 1
[ CSSD]2015-03-29 13:19:58.691 [1396734272] >WARNING: clssnmeventhndlr: Receive failure with node 1 (cisser1), state 3, con(0xf66b090), probe((nil)), rc=11
…………………………………
[ CSSD]2015-03-29 13:20:28.778 [1529252160] >WARNING: clssnmPollingThread: node cisser1 (1) at 50% heartbeat fatal, eviction in 29.200 seconds seedhbimpd 0
[ CSSD]2015-03-29 13:20:28.778 [1529252160] >TRACE: clssnmPollingThread: node cisser1 (1) is impending reconfig, flag 1, misstime 30800
[ CSSD]2015-03-29 13:20:28.778 [1529252160] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[ CSSD]2015-03-29 13:20:32.905 [1539742016] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2015-03-29 13:20:32.905 [1539742016] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
[ CSSD]2015-03-29 13:20:36.910 [1539742016] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2015-03-29 13:20:36.910 [1539742016] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
[ CSSD]2015-03-29 13:20:39.826 [1407633728] >TRACE: clssgmAllocateRPCIndex: allocated rpc 507 (0x2abe50b1cd90)
[ CSSD]2015-03-29 13:20:39.826 [1407633728] >TRACE: clssgmpeersend: send failed type 12, node 1, unreachable, flags 0x0, quiesced 0
[ CSSD]2015-03-29 13:20:39.826 [1407633728] >TRACE: clssgmFreeRPCIndex: freeing rpc 507
[ CSSD]2015-03-29 13:20:41.917 [1539742016] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2015-03-29 13:20:41.917 [1539742016] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
[ CSSD]2015-03-29 13:20:43.780 [1529252160] >WARNING: clssnmPollingThread: node cisser1 (1) at 75% heartbeat fatal, eviction in 14.200 seconds seedhbimpd 1
[ CSSD]2015-03-29 13:20:45.923 [1539742016] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2015-03-29 13:20:45.923 [1539742016] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
[ CSSD]2015-03-29 13:20:49.930 [1539742016] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2015-03-29 13:20:49.930 [1539742016] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
[ CSSD]2015-03-29 13:20:52.784 [1529252160] >WARNING: clssnmPollingThread: node cisser1 (1) at 90% heartbeat fatal, eviction in 5.200 seconds seedhbimpd 1
[ CSSD]2015-03-29 13:20:53.785 [1529252160] >WARNING: clssnmPollingThread: node cisser1 (1) at 90% heartbeat fatal, eviction in 4.200 seconds seedhbimpd 1
[ CSSD]2015-03-29 13:20:54.787 [1529252160] >WARNING: clssnmPollingThread: node cisser1 (1) at 90% heartbeat fatal, eviction in 3.200 seconds seedhbimpd 1
[ CSSD]2015-03-29 13:20:54.938 [1539742016] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2015-03-29 13:20:54.938 [1539742016] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
[ CSSD]2015-03-29 13:20:55.788 [1529252160] >WARNING: clssnmPollingThread: node cisser1 (1) at 90% heartbeat fatal, eviction in 2.200 seconds seedhbimpd 1
[ CSSD]2015-03-29 13:20:56.790 [1529252160] >WARNING: clssnmPollingThread: node cisser1 (1) at 90% heartbeat fatal, eviction in 1.200 seconds seedhbimpd 1
[ CSSD]2015-03-29 13:20:57.791 [1529252160] >WARNING: clssnmPollingThread: node cisser1 (1) at 90% heartbeat fatal, eviction in 0.190 seconds seedhbimpd 1
[ CSSD]2015-03-29 13:20:57.984 [1529252160] >TRACE: clssnmPollingThread: Eviction started for node cisser1 (1), flags 0x0001, state 3, wt4c 0 seedhbimpd 1
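Node 2 starts evicting node 1 at 2015-03-29 13:20:57.984. By that time host 1 has in fact already rebooted: the network heartbeat was lost because node 1 restarted, and only then did the eviction messages appear. The eviction is a consequence of the reboot, not its cause.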
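The ordering can be checked with a little arithmetic on the node 2 log. At 13:20:28.778 node 2 reports misstime 30800 (ms) with 29.200 seconds left, so the heartbeat window is 30800 + 29200 = 60000 ms, and the last heartbeat from node 1 was received about 30.8 seconds earlier. A minimal sketch of that calculation; `to_ms` and `from_ms` are hypothetical helpers, and the input values are copied from the log lines above.

```shell
# to_ms HH:MM:SS.mmm -> milliseconds since midnight   (hypothetical helper)
to_ms() {
    echo "$1" | awk -F'[:.]' '{ print (($1 * 3600 + $2 * 60 + $3) * 1000 + $4) }'
}

# from_ms N -> HH:MM:SS.mmm   (hypothetical helper)
from_ms() {
    awk -v t="$1" 'BEGIN {
        printf "%02d:%02d:%02d.%03d\n",
            int(t / 3600000), int(t / 60000) % 60, int(t / 1000) % 60, t % 1000
    }'
}

warn=$(to_ms 13:20:28.778)       # time of the "50% heartbeat fatal" warning
misstime=30800                   # ms without a heartbeat, from the log
remaining=29200                  # "eviction in 29.200 seconds", from the log

echo "last heartbeat from node 1: $(from_ms $(( warn - misstime )))"
echo "expected eviction time:     $(from_ms $(( warn + remaining )))"
echo "heartbeat window (ms):      $(( misstime + remaining ))"
```

The computed last heartbeat, 13:19:57.978, falls a couple of seconds after CSSD on node 1 aborted at 13:19:55.533, and the expected eviction time, 13:20:57.978, matches the "Eviction started" trace at 13:20:57.984.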