Problem description
The following simulates a host reboot caused by the OCSSD process (ocssd.bin) hanging.
Expert answer
1. Environment
[root@cisser2 ~]# crsctl query crs activeversion
CRS active version on the cluster is [10.2.0.5.0]
[root@cisser2 ~]# lsb_release -a
LSB Version: :core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 5.11 (Tikanga)
Release: 5.11
Codename: Tikanga
2. Manually suspend the OCSSD process
[root@cisser1 ~]# ps -ef|grep d.bin
oracle 4666 4665 0 11:23 ? 00:00:00 /oracle/app/oracle/product/10.2.0/crs_1/bin/evmd.bin
root 4746 4063 0 11:23 ? 00:00:00 /oracle/app/oracle/product/10.2.0/crs_1/bin/crsd.bin reboot
root 5258 4874 0 11:23 ? 00:00:00 /oracle/app/oracle/product/10.2.0/crs_1/bin/oprocd.bin run -t 1000 -m 500 -f
oracle 5348 4903 0 11:23 ? 00:00:00 /oracle/app/oracle/product/10.2.0/crs_1/bin/ocssd.bin
root 14621 13718 0 11:44 pts/1 00:00:00 grep d.bin
[root@cisser1 ~]# kill -19 5348
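Signal 19 is SIGSTOP on x86/x86_64 Linux. SIGSTOP cannot be caught or ignored: the kernel stops the process, which keeps its PID and resources but executes nothing until it receives SIGCONT. A stopped ocssd.bin therefore can neither send network heartbeats nor answer oclsomon. The minimal POSIX shell sketch below, assuming Linux and its `/proc/PID/stat` layout, shows the effect on a throwaway `sleep` process standing in for ocssd.bin. The helper names are hypothetical, and `kill -STOP` is used instead of `kill -19` because signal numbers differ on some other architectures.

```sh
#!/bin/sh
# Minimal sketch (Linux only): the effect of "kill -19" (SIGSTOP on x86 Linux).
# A harmless "sleep" stands in for ocssd.bin.

# Print the one-letter state of a PID: field 3 of /proc/PID/stat.
# (Field 2 is "(comm)"; for "sleep" it contains no spaces.)
proc_state() {
    cut -d' ' -f3 "/proc/$1/stat"
}

# Start a dummy process, stop it, print its state, then clean up.
demo_sigstop() {
    sleep 300 >/dev/null 2>&1 &
    pid=$!
    kill -STOP "$pid"              # same signal as "kill -19" on x86 Linux
    tries=0
    state=$(proc_state "$pid")
    # Poll briefly in case the stop has not taken effect yet.
    while [ "$state" != "T" ] && [ "$tries" -lt 5 ]; do
        sleep 1
        tries=$((tries + 1))
        state=$(proc_state "$pid")
    done
    echo "$state"
    kill -KILL "$pid"              # SIGKILL also terminates a stopped process
    wait "$pid" 2>/dev/null || true
}
```

Running `demo_sigstop` should print `T`, the same state `ps` shows in its STAT column for a stopped process; `kill -CONT` would resume it instead of killing it.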
About 30 seconds later, the following log entries are generated.
3. Host messages log
Mar 29 11:45:23 cisser1 logger: Oracle clsomon failed with fatal status 13.
Here the status is 13. Different status codes point to different causes of failure; the common codes are listed below.
/* 10-39 are reserved for various kinds of steady state errors
 * i.e. anything that comes after the group registration. */
clssomonretMEM = 11, /* memory allocation failure */
clssomonretCSS = 12, /* misc error in CSS layer */
clssomonretFATAL = 13, /* failure in CSS layer that should cause a reboot*/
clssomonretOCR = 14, /* misc error in OCR layer */
clssomonretOSD = 15, /* error in OSD layer used by generic code*/
/* 40-69 are reserved for various kinds of initialization errors
 * i.e. anything that comes before the group registration. */
clssomonretCRSHOME = 40, /* CRS home is unavailable. */
clssomonretHOSTNAME = 42, /* unable to fetch hostname */
clssomonretSTDERR = 43, /* failure redirecting stderr */
clssomonretSTDOUT = 44, /* failure redirecting stdout */
clssomonretCHDIR = 45, /* failure redirecting corefile */
clssomonretARGS = 50, /* error processing arguments */
clssomonretCSSINIT = 51, /* failure initializing CSS-objects/APIs */
clssomonretOCRINIT = 52, /* failure initializing OCR-objects/APIs */
clssomonretOSDINIT = 53, /* error in OSD layer used by generic code*/
clssomonretMEMINIT = 54, /* unable to allocate memory during init */
clssomonretREINIT = 55, /* exceeded the CSS context reinit limit */
clssomonretINUSE = 56, /* duplicate oclsomon found */
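The code list above can be turned into a small lookup. The POSIX shell sketch below uses hypothetical helper names and is not an Oracle tool: it extracts the number from a messages line worded like the one in section 3, maps it to the enum name, and reports which reserved range (steady state 10-39, initialization 40-69) it falls in. It only recognizes the exact wording `clsomon failed with fatal status N.`; other message variants are not handled.

```sh
#!/bin/sh
# Hypothetical helpers: decode the status that oclsomon reports via logger,
# using the enum values listed above.

# Map a numeric oclsomon status to its enum name.
clsomon_status_name() {
    case "$1" in
        11) echo clssomonretMEM ;;
        12) echo clssomonretCSS ;;
        13) echo clssomonretFATAL ;;
        14) echo clssomonretOCR ;;
        15) echo clssomonretOSD ;;
        40) echo clssomonretCRSHOME ;;
        42) echo clssomonretHOSTNAME ;;
        43) echo clssomonretSTDERR ;;
        44) echo clssomonretSTDOUT ;;
        45) echo clssomonretCHDIR ;;
        50) echo clssomonretARGS ;;
        51) echo clssomonretCSSINIT ;;
        52) echo clssomonretOCRINIT ;;
        53) echo clssomonretOSDINIT ;;
        54) echo clssomonretMEMINIT ;;
        55) echo clssomonretREINIT ;;
        56) echo clssomonretINUSE ;;
        *)  echo unknown ;;
    esac
}

# Classify a status by its reserved range.
clsomon_status_phase() {
    if [ "$1" -ge 10 ] && [ "$1" -le 39 ]; then
        echo steady-state
    elif [ "$1" -ge 40 ] && [ "$1" -le 69 ]; then
        echo initialization
    else
        echo other
    fi
}

# Extract N from "... Oracle clsomon failed with fatal status N."
clsomon_status_from_line() {
    printf '%s\n' "$1" |
        sed -n 's/.*clsomon failed with fatal status \([0-9][0-9]*\)\..*/\1/p'
}
```

For the line in section 3, `clsomon_status_from_line` yields `13`, which decodes to `clssomonretFATAL`, a steady-state error described as a CSS-layer failure "that should cause a reboot".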
4. ocssd and oclsomon logs
Because the ocssd process had been suspended, ocssd itself wrote no log entries at all.
The oclsomon log shows the following:
[root@cisser1 ~]# tail -f /oracle/app/oracle/product/10.2.0/crs_1/log/cisser1/cssd/oclsomon/oclsomon.log
2015-03-29 11:23:05.781: clsc_connect: (0x8467d70) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_cisser1_))
2015-03-29 11:23:08.435: clsc_connect: (0x8466450) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_cisser1_))
2015-03-29 11:23:15.684 clssomon: end of cssinit, status 0
2015-03-29 11:23:15.685 Reconfig event. (1/1/1)
2015-03-29 11:23:16.186 Reconfig event. (2/2/1)
2015-03-29 11:45:23.250 clssomon: Timeout waiting for CSS response.
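In a real incident, the entry to look for is the last `Timeout waiting for CSS response`, since that is the one that lines up with the reboot message in section 3 (both at 11:45:23). A hypothetical helper that prints its timestamp from an oclsomon.log-style file, assuming the `YYYY-MM-DD HH:MM:SS.mmm` prefix shown above:

```sh
#!/bin/sh
# Hypothetical helper: print the timestamp (first 23 characters,
# "YYYY-MM-DD HH:MM:SS.mmm") of the last "Timeout waiting for CSS response"
# entry in an oclsomon.log-style file. Prints nothing if there is none.
last_css_timeout() {
    grep 'clssomon: Timeout waiting for CSS response' "$1" | tail -n 1 | cut -c1-23
}
```

On the log above it would print `2015-03-29 11:45:23.250`.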
5. Logs on node 2
5.1 ocssd log
[ CSSD]2015-03-29 11:44:55.852 [633092416] >WARNING: clssnmPollingThread: node cisser1 (1) at 50% heartbeat fatal, eviction in 29.810 seconds seedhbimpd 0
[ CSSD]2015-03-29 11:44:55.852 [633092416] >TRACE: clssnmPollingThread: node cisser1 (1) is impending reconfig, flag 1039, misstime 30190
[ CSSD]2015-03-29 11:44:55.852 [633092416] >TRACE: clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[ CSSD]2015-03-29 11:44:56.854 [633092416] >WARNING: clssnmPollingThread: node cisser1 (1) at 50% heartbeat fatal, eviction in 28.810 seconds seedhbimpd 1
[ CSSD]2015-03-29 11:44:58.709 [643582272] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2015-03-29 11:44:58.709 [643582272] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
[ CSSD]2015-03-29 11:45:02.716 [643582272] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2015-03-29 11:45:02.716 [643582272] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
[ CSSD]2015-03-29 11:45:07.725 [643582272] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2015-03-29 11:45:07.725 [643582272] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
[ CSSD]2015-03-29 11:45:10.855 [633092416] >WARNING: clssnmPollingThread: node cisser1 (1) at 75% heartbeat fatal, eviction in 14.810 seconds seedhbimpd 1
[ CSSD]2015-03-29 11:45:11.857 [633092416] >WARNING: clssnmPollingThread: node cisser1 (1) at 75% heartbeat fatal, eviction in 13.810 seconds seedhbimpd 1
[ CSSD]2015-03-29 11:45:12.733 [643582272] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2015-03-29 11:45:12.733 [643582272] >TRACE: clssnmSendingThread: sent 5 status msgs to all nodes
[ CSSD]2015-03-29 11:45:16.738 [643582272] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2015-03-29 11:45:16.738 [643582272] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
[ CSSD]2015-03-29 11:45:19.858 [633092416] >WARNING: clssnmPollingThread: node cisser1 (1) at 90% heartbeat fatal, eviction in 5.810 seconds seedhbimpd 1
[ CSSD]2015-03-29 11:45:20.744 [643582272] >TRACE: clssnmSendingThread: sending status msg to all nodes
[ CSSD]2015-03-29 11:45:20.744 [643582272] >TRACE: clssnmSendingThread: sent 4 status msgs to all nodes
[ CSSD]2015-03-29 11:45:20.859 [633092416] >WARNING: clssnmPollingThread: node cisser1 (1) at 90% heartbeat fatal, eviction in 4.810 seconds seedhbimpd 1
[ CSSD]2015-03-29 11:45:21.860 [633092416] >WARNING: clssnmPollingThread: node cisser1 (1) at 90% heartbeat fatal, eviction in 3.810 seconds seedhbimpd 1
[ CSSD]2015-03-29 11:45:22.673 [579823936] >TRACE: clssgmAllocateRPCIndex: allocated rpc 326 (0x2b7b1f5a1310)
[ CSSD]2015-03-29 11:45:22.673 [579823936] >TRACE: clssgmRPC: rpc 0x2b7b1f5a1310 (RPC#326) tag(146002a) sent to node 1
[ CSSD]2015-03-29 11:45:22.861 [633092416] >WARNING: clssnmPollingThread: node cisser1 (1) at 90% heartbeat fatal, eviction in 2.810 seconds seedhbimpd 1
[ CSSD]2015-03-29 11:45:23.863 [633092416] >WARNING: clssnmPollingThread: node cisser1 (1) at 90% heartbeat fatal, eviction in 1.800 seconds seedhbimpd 1
[ CSSD]2015-03-29 11:45:24.855 [633092416] >WARNING: clssnmPollingThread: node cisser1 (1) at 90% heartbeat fatal, eviction in 0.810 seconds seedhbimpd 1
[ CSSD]2015-03-29 11:45:25.667 [633092416] >TRACE: clssnmPollingThread: Eviction started for node cisser1 (1), flags 0x040f, state 3, wt4c 0 seedhbimpd 1
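Each `heartbeat fatal` warning states how long remains before eviction, so adding that figure to the warning's own timestamp should give roughly the same projected eviction time on every line. The first warning is also telling: `misstime 30190` (ms) plus `29.810` seconds remaining is 60 s, which suggests a CSS misscount of 60 s on this cluster (inferred from the log, not from the configuration). A POSIX shell/awk sketch of the projection, assuming the 10.2 log layout above and that all lines fall on the same day; the function name is hypothetical:

```sh
#!/bin/sh
# Sketch: for each "heartbeat fatal, eviction in N seconds" line, print the
# warning time and the projected eviction time (warning time + N).
# Reads the file given as $1, or stdin if no argument is given.
projected_evictions() {
    awk '
    # "HH:MM:SS.mmm" -> integer milliseconds since midnight
    function to_ms(t,    p) {
        split(t, p, ":")
        return int(((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + 0.5)
    }
    # integer milliseconds since midnight -> "HH:MM:SS.mmm"
    function fmt(ms,    h, m) {
        h = int(ms / 3600000); ms -= h * 3600000
        m = int(ms / 60000);   ms -= m * 60000
        return sprintf("%02d:%02d:%02d.%03d", h, m, int(ms / 1000), ms % 1000)
    }
    /heartbeat fatal, eviction in/ {
        if (!match($0, /[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9][0-9][0-9]/)) next
        t = substr($0, RSTART, RLENGTH)
        if (!match($0, /eviction in [0-9.]+ seconds/)) next
        split(substr($0, RSTART, RLENGTH), w, " ")   # w[3] = seconds left
        print t, fmt(to_ms(t) + int(w[3] * 1000 + 0.5))
    }' "${1:--}"
}
```

Applied to the log above, the projections all fall between 11:45:25.662 and 11:45:25.671, within a few milliseconds of the `Eviction started` entry at 11:45:25.667.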
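Here we can see that node 2 only started evicting node 1 at 11:45:25, but node 1 had already begun rebooting at 11:45:23 (the oclsomon timeout at 11:45:23.250 and the fatal status 13 in messages). The reboot was therefore caused by the oclsomon process, not by node eviction.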
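The conclusion rests on ordering two timestamps. A minimal POSIX shell/awk sketch of that comparison follows; the function name is hypothetical. Note that it compares times recorded on two different hosts, so the result is only meaningful if the nodes' clocks are synchronized (for example with NTP), and it assumes both times fall on the same day.

```sh
#!/bin/sh
# Sketch: seconds from time-of-day A to time-of-day B ("HH:MM:SS.mmm").
# A positive result means A happened before B.
seconds_between() {
    awk -v a="$1" -v b="$2" '
    function to_ms(t,    p) {
        split(t, p, ":")
        return int(((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + 0.5)
    }
    BEGIN { printf "%.3f\n", (to_ms(b) - to_ms(a)) / 1000 }'
}

oclsomon_timeout="11:45:23.250"   # oclsomon.log on node 1 (node 1 clock)
eviction_start="11:45:25.667"     # ocssd.log on node 2 (node 2 clock)
seconds_between "$oclsomon_timeout" "$eviction_start"   # prints 2.417
```

The positive result (2.417 s) places the oclsomon timeout on node 1 before node 2 began the eviction.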
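[Copyright notice] This article is original content by a Motianlun (墨天轮) user. When reposting, you must credit the source (Motianlun), the article link, the author, and other basic information; otherwise the author and Motianlun reserve the right to pursue liability. If you find content on Motianlun suspected of plagiarism or infringement, please report it by email to contact@modb.pro with supporting evidence; once verified, Motianlun will remove the content immediately.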