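After rebooting the node 2 server, the following errors appeared and OHASD could not start normally.
Check the relevant alert logs
Understand the related concepts (reference output from the healthy node cwglrac11):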
[root@cwglrac11 ~]# /u01/app/11.2.0/grid/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
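Layer 1: OHAS layer, initializes the cluster's resources and processes
Layer 2: CSS layer, builds the cluster and keeps it consistent
Layer 3: CRS layer, manages the cluster's application resources
Layer 4: EVM layer, propagates cluster events between the nodes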
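To map crsctl check crs output onto these four layers at a glance, here is a minimal bash sketch. It only looks for the "<service> is online" wording shown in the output above; the function name crs_layer_summary is hypothetical, and it reads the text from stdin so it can be tried without a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize `crsctl check crs` output per stack layer.
# Assumption: a layer counts as up only if crsctl printed "<service> is online",
# as in the healthy-node output above. Anything else is reported as NOT online.
crs_layer_summary() {
  local out entry layer name
  out=$(cat)   # whole crsctl output from stdin
  # "layer:service name as printed by crsctl", in the layer order listed above
  for entry in "OHAS:Oracle High Availability Services" \
               "CSS:Cluster Synchronization Services" \
               "CRS:Cluster Ready Services" \
               "EVM:Event Manager"; do
    layer=${entry%%:*}
    name=${entry#*:}
    if grep -q "$name is online" <<<"$out"; then
      echo "$layer: online"
    else
      echo "$layer: NOT online"
    fi
  done
}

# On a real node:
# /u01/app/11.2.0/grid/bin/crsctl check crs | crs_layer_summary
```

On node 2 in this case, the single line "CRS-4639: Could not contact Oracle High Availability Services" would make all four layers show NOT online, which points at layer 1 first.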
1. Check the cluster resource status; it fails immediately
[root@cwglrac12 ~]# /u01/app/11.2.0/grid/bin/crsctl status res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.
2. Check how far the clusterware stack got during startup
[root@cwglrac12 ~]# /u01/app/11.2.0/grid/bin/crsctl check crs
CRS-4639: Could not contact Oracle High Availability Services
3. Check the processes
[root@cwglrac12 ~]# ps -ef|grep d.bin
root 14343 1 0 14:06 ? 00:00:00 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
root 34104 33560 0 14:45 pts/0 00:00:00 grep d.bin
[root@cwglrac12 ~]# ps -ef| grep ohasd
root 6920 1 0 14:05 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
root 14343 1 0 14:06 ? 00:00:00 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
root 35353 33560 0 14:47 pts/0 00:00:00 grep ohasd
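The ps output shows only ohasd.bin; none of the daemons for the higher layers are running. A small bash sketch that lists which of the four layer daemons are absent from ps -ef output follows. The daemon names (ohasd.bin, ocssd.bin, crsd.bin, evmd.bin) are an assumption based on the 11.2 layout above, and missing_crs_daemons is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each layer daemon missing from `ps -ef` output (stdin).
# Assumption: one daemon per layer (OHAS, CSS, CRS, EVM) in 11.2 Grid Infrastructure.
missing_crs_daemons() {
  local ps_out d
  ps_out=$(cat)
  for d in ohasd.bin ocssd.bin crsd.bin evmd.bin; do
    # drop the grep process itself, then look for the binary as a path component
    if ! grep -v grep <<<"$ps_out" | grep -q "/$d"; then
      echo "$d"
    fi
  done
}

# On a real node:
# ps -ef | missing_crs_daemons
```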
4. Check the system log
tail -3000f /var/log/messages
Apr 16 14:14:51 cwglrac12 logger: autorun file for ohasd is missing
5. Check the ohasd logs (OHASD initializes the cluster's resources and processes)
tail -3000f /u01/app/11.2.0/grid/log/cwglrac12/ohasd/ohasd.log
2021-04-16 14:16:10.707: [ default][1283712800] Created alert : (:OHAS00117:) : TIMED OUT WAITING FOR OHASD MONITOR
tail -3000f /u01/app/11.2.0/grid/log/cwglrac12/ohasd/ohasdOUT.log
2021-04-16 15:17:39
Changing directory to /u01/app/11.2.0/grid/log/cwglrac12/ohasd
OHASD starting
Timed out waiting for init.ohasd script to start; posting an alert
OHASD stderr redirected to ohasdOUT.log
6. Check the ocssd log (CSSD builds the cluster and keeps it consistent)
tail -3000f /u01/app/11.2.0/grid/log/cwglrac12/cssd/ocssd.log
Key point: nothing has been logged since Apr 16 14:14:51.
7. Check the crsd log (CRSD manages the cluster's application resources)
tail -3000f /u01/app/11.2.0/grid/log/cwglrac12/crsd/crsd.log
Key point: nothing has been logged since Apr 16 14:14:51.
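Instead of scrolling through tail output to see when a daemon stopped writing, the timestamp of the last entry can be pulled out directly. This is a bash sketch, assuming lines start with "YYYY-MM-DD HH:MM:SS[.mmm]" as in the ohasd.log excerpt above; last_log_timestamp is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the timestamp of the last timestamped line in a GI log.
# Assumption: entries begin with "YYYY-MM-DD HH:MM:SS" optionally followed by ".mmm".
last_log_timestamp() {
  grep -Eo '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?' "$1" | tail -n 1
}

# On the node, e.g.:
# last_log_timestamp /u01/app/11.2.0/grid/log/cwglrac12/cssd/ocssd.log
# last_log_timestamp /u01/app/11.2.0/grid/log/cwglrac12/crsd/crsd.log
```

If the result is older than the reboot, the daemon never got as far as writing its log, which is consistent with the stack stopping at the OHAS layer.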
8. Check the clusterware alert log
tail -3000f /u01/app/11.2.0/grid/log/cwglrac12/alertcwglrac12.log
2021-04-16 14:16:10.706:
[ohasd(14343)]CRS-0715:Oracle High Availability Service has timed out waiting for init.ohasd to be started.
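Steps 4 to 8 each tail one file by hand. A bash sketch that greps all of them at once for the three signatures seen in this case ("autorun file for ohasd is missing", OHAS00117, CRS-0715) is below. The signature list comes from the excerpts above; the function name is hypothetical, and the usage paths assume the GRID_HOME and host name used in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report file:line:text for known OHASD startup-failure signatures.
scan_ohasd_signatures() {
  local f
  for f in "$@"; do
    # skip files that do not exist or cannot be read, but say so on stderr
    [ -r "$f" ] || { echo "skip (unreadable): $f" >&2; continue; }
    grep -HnE 'autorun file for ohasd is missing|OHAS00117|CRS-0715' "$f"
  done
  return 0
}

# On the node (as root):
# scan_ohasd_signatures /var/log/messages \
#   /u01/app/11.2.0/grid/log/cwglrac12/ohasd/ohasd.log \
#   /u01/app/11.2.0/grid/log/cwglrac12/alertcwglrac12.log
```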
=========================================================
Hypothesis: temporary workaround 1
The clusterware alert log message "Oracle High Availability Service has timed out waiting for init.ohasd to be started" points at init.ohasd, whose process is still running:
[root@cwglrac12 ~]# ps -ef| grep ohasd
root 6920 1 0 14:05 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
root 14343 1 0 14:06 ? 00:00:00 /u01/app/11.2.0/grid/bin/ohasd.bin reboot
root 35353 33560 0 14:47 pts/0 00:00:00 grep ohasd
kill -9 6920
That resolves the problem.
But why does it come back automatically after I kill it??
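Workaround 1 relies on reading the PID (6920 here) off the ps output by eye. A bash sketch that extracts it instead is shown below; it matches only the "/etc/init.d/init.ohasd run" shell, not ohasd.bin or the grep line. init_ohasd_pids is a hypothetical name; on a live node, pgrep -f '/etc/init.d/init.ohasd run' should return the same PID.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the PID(s) of the init.ohasd shell from `ps -ef` output (stdin).
# Column 2 of `ps -ef` is the PID; lines containing "grep" are excluded.
init_ohasd_pids() {
  awk '/\/etc\/init\.d\/init\.ohasd run/ && !/grep/ {print $2}'
}

# On the node, review the PID before killing anything:
# ps -ef | init_ohasd_pids
```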
=========================================================
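Hypothesis: temporary workaround 2
Create a task that runs at boot
User level
Add it to /etc/rc.local (recommended; on enterprise servers this file usually serves as the record of local boot-time customizations, and it is the last thing Linux runs during startup)
vim /etc/rc.local # every such command must go into rc.local; add a comment and keep a backup
The command is added as the last line: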
[root@cwglrac12 ~]# vim /etc/rc.local
#!/bin/sh
touch /var/lock/subsys/local
ps -ef|grep init.ohasd|grep -v grep|awk '{print $2}'|xargs -r kill -9
=========================================================
Final fix:
vim /etc/init.d/init.ohasd
sleep 30
######### Shell functions #########
##############################################################
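Many thanks to Brother Chen for all his help!!!! Hope you find a girlfriend soon~
One last soul-searching question: what is the root cause of this problem? I hope the experts can help. Many thanks.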
=================================================================
Thanks to Brother Qiang for solving this for me; his solution is pasted below.
Check against the following MOS document:
autorun file for ohasd is missing (Doc ID 1427234.1)
A. Bug 15869775 - init.ohasd starts before hostname is set
The bug is fixed in 11.2.0.3 GI PSU9, 11.2.0.4 GI PSU4, 12.1.0.1 onwards
It’s recommended to apply the patch.
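Whether a given Grid Infrastructure home already contains the fix follows directly from the fix list above. The bash sketch below encodes only that list (11.2.0.3 PSU9, 11.2.0.4 PSU4, 12.1.0.1 onwards); reporting every other version/PSU combination as "no" is an assumption, and how you read the version and PSU level from your own home is left to you. bug15869775_fixed is a hypothetical name; sort -V is GNU coreutils version sorting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: usage bug15869775_fixed <GI version> [<PSU number>]
# Prints "yes" if the fix list above covers the combination, otherwise "no".
bug15869775_fixed() {
  local ver=$1 psu=${2:-0}
  # 12.1.0.1 and later: if 12.1.0.1 sorts first (or equal), ver >= 12.1.0.1
  if [ "$(printf '%s\n' 12.1.0.1 "$ver" | sort -V | head -n 1)" = "12.1.0.1" ]; then
    echo yes; return 0
  fi
  case $ver in
    11.2.0.4*) [ "$psu" -ge 4 ] && { echo yes; return 0; } ;;
    11.2.0.3*) [ "$psu" -ge 9 ] && { echo yes; return 0; } ;;
  esac
  # not covered by the quoted fix list (assumption: treat as not fixed)
  echo no
  return 0
}
```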
Before the patch can be applied, the workaround is to modify init.ohasd script:
Add the following line before line “######### Shell functions #########” so it will look like the following:
vim /etc/init.d/init.ohasd
sleep 30
######### Shell functions #########
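The workaround above is a manual vim edit. A minimal bash sketch of the same edit done non-interactively is shown below, assuming GNU sed (the one-line "i text" form) and that the marker line appears exactly as quoted in the MOS note. add_ohasd_sleep is a hypothetical name, the "already patched" check is a simple heuristic, and it is worth trying on a copy of the script first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: insert "sleep 30" just before the Shell functions marker.
# Keeps a backup as <script>.bak and skips the edit if "sleep 30" is already a line.
add_ohasd_sleep() {
  local script=${1:-/etc/init.d/init.ohasd}
  local marker='^######### Shell functions #########$'
  grep -q "$marker" "$script" || { echo "marker not found in $script" >&2; return 1; }
  if grep -qx 'sleep 30' "$script"; then
    echo "already patched: $script"
    return 0
  fi
  # GNU sed: /regex/i inserts the text as a new line before each matching line
  sed -i.bak "/$marker/i sleep 30" "$script"
}

# On the node (as root):
# add_ohasd_sleep /etc/init.d/init.ohasd
```

The fixed 30-second delay is the MOS workaround as written; applying the patch removes the need for it.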