
Problem Symptoms:
After the operating system on Oracle RAC node 1 was rebooted, CRS failed to start.
The error reported was:
2022-06-15 19:41:22.077:[cssd(29365)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in oracle/grid/product/11.0/log/cjc-db-01/cssd/ocssd.log
Check the logs
Checking /oracle/grid/product/11.0/log/cjc-db-01/cssd/ocssd.log turned up no further details.
The voting files cannot be discovered; could there be a problem with the shared storage?
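Before blaming the storage, it is worth confirming exactly how far the stack gets. The following is only a quick read-only check sequence, not taken from the original article; the log path is the one quoted in the error above.

# Run as root on the failing node (node 1).
crsctl check crs               # shows which daemons (CRS / CSS / EVM) are up
crsctl stat res -t -init       # state of the low-level startup resources
# Confirm the voting-file discovery error is still being logged:
grep CRS-1714 /oracle/grid/product/11.0/log/cjc-db-01/cssd/ocssd.log | tail -5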

Problem Analysis:
1. Compare the disk permissions on both nodes: they are completely identical.
Check the shared storage permissions (node 1 shown below; node 2 looks the same):
root@cjc-db-01:/oradata# ll -rth
total 159G
-rw-rw---- 1 grid asmadmin  20G Jun 16 09:23 cjcdata5
-rw-rw---- 1 grid asmadmin  20G Jun 16 09:24 cjcdata3
-rw-rw---- 1 grid asmadmin  20G Jun 16 09:25 cjcdata7
-rw-rw---- 1 grid asmadmin  20G Jun 16 09:25 cjcdata4
-rw-rw---- 1 grid asmadmin  20G Jun 16 09:25 cjcdata6
-rw-rw---- 1 grid asmadmin  20G Jun 16 09:25 cjcdata1
-rw-rw---- 1 grid asmadmin 6.0G Jun 16 09:25 ocr2
-rw-rw---- 1 grid asmadmin 6.0G Jun 16 09:25 ocr3
-rw-rw---- 1 grid asmadmin 6.0G Jun 16 09:25 ocr1
-rw-rw---- 1 grid asmadmin  20G Jun 16 09:25 cjcdata2
Compare the NAS mount parameters on both nodes: they are completely identical.
root@cjc-db-01:/oradata# mount | grep oradata
192.168.0.10:/cjc_db_oradata_01_nfs on /oradata type nfs (rw,relatime,sync,vers=3,rsize=32768,wsize=32768,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,noac,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.0.10,mountvers=3,mountport=2050,mountproto=tcp,local_lock=none,addr=192.168.0.10)

root@cjc-db-02:/root# mount | grep oradata
192.168.0.10:/cjc_db_oradata_01_nfs on /oradata type nfs (rw,relatime,sync,vers=3,rsize=32768,wsize=32768,namlen=255,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0,hard,noac,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.0.10,mountvers=3,mountport=2050,mountproto=tcp,local_lock=none,addr=192.168.0.10)
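Eyeballing such long option strings is error-prone; a small sketch to diff the two nodes' NFS mount options automatically (it assumes password-less SSH as root between the nodes, which the original article does not mention):

# Compare the oradata mount options on both RAC nodes.
diff \
  <(ssh root@cjc-db-01 'mount | grep oradata') \
  <(ssh root@cjc-db-02 'mount | grep oradata') \
  && echo "mount options are identical"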
2. Are the voting files and OCR intact?
Log in to node 2 and check the cluster voting files and OCR: both are normal, no problems found.
root@cjc-db-02:/root# su - grid
Check the OCR:
grid@cjc-db-02:/home/grid$ ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          3
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2920
         Available space (kbytes) :     259200
         ID                       :  550682119
         Device/File Name         :       +SYS
                                    Device/File integrity check succeeded
         Device/File Name         :      +DATA
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
         Cluster registry integrity check succeeded
         Logical corruption check bypassed due to non-privileged user
Check the voting disks:
grid@cjc-db-02:/home/grid$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   64c8b1ebd5fd4f19bfa1c7ef88397359 (/dev/rac/ocr1) [SYS]
 2. ONLINE   1953c55c29bd4fa9bf3d04d7fe001112 (/dev/rac/ocr2) [SYS]
 3. ONLINE   7d719d3170a34ff7bfff5cc5abae15cb (/dev/rac/ocr3) [SYS]
Located 3 voting disk(s).
Hmm, something is off: why are the shared disks shown under /dev/rac rather than under /oradata?
Log in to node 2 and check the disk paths; every shared disk is reported under the /dev/rac directory:
select path from v$asm_disk;
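Only the bare SQL is quoted above; for reference, a minimal way to run it from the shell on node 2 (this assumes the grid user's environment already points at the ASM instance, as the earlier ocrcheck session suggests):

# As root on node 2, query the ASM disk paths via the grid user.
su - grid -c 'sqlplus -s / as sysasm' <<'EOF'
set linesize 200 pagesize 100
col path format a40
select path, name, header_status from v$asm_disk;
EOF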
Could symbolic links be involved?
Check node 2: symbolic links do indeed exist:
root@cjc-db-02:/root# ll -rth /dev/rac/
total 0
lrwxrwxrwx 1 root root 14 Dec 12  2019 ocr1 -> /oradata/ocr1
lrwxrwxrwx 1 root root 14 Dec 12  2019 ocr2 -> /oradata/ocr2
lrwxrwxrwx 1 root root 14 Dec 12  2019 ocr3 -> /oradata/ocr3
lrwxrwxrwx 1 root root 15 Dec 13  2019 cjcdata1 -> /oradata/cjcdata1
lrwxrwxrwx 1 root root 15 Dec 13  2019 cjcdata2 -> /oradata/cjcdata2
lrwxrwxrwx 1 root root 15 Dec 13  2019 cjcdata3 -> /oradata/cjcdata3
lrwxrwxrwx 1 root root 15 Dec 13  2019 cjcdata4 -> /oradata/cjcdata4
lrwxrwxrwx 1 root root 15 Dec 13  2019 cjcdata5 -> /oradata/cjcdata5
lrwxrwxrwx 1 root root 15 Dec 13  2019 cjcdata6 -> /oradata/cjcdata6
lrwxrwxrwx 1 root root 15 Dec 13  2019 cjcdata7 -> /oradata/cjcdata7
Check node 1 for the symbolic links: there are none.
root@cjc-db-01:/root# ll -rth /dev/rac*
ls: cannot access /dev/rac*: No such file or directory
Checking the shell history shows no command that deleted the symbolic links.
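Since the recreated links have to match node 2 exactly (see the solution below), it is safer to capture them from the healthy node than to retype them. A sketch, assuming GNU find on node 2 and root SSH between the nodes:

# On node 1: record the link layout of the healthy node.
ssh root@cjc-db-02 'ls -l /dev/rac' > /tmp/devrac_node2.txt

# Generate the matching ln commands straight from node 2's links
# (in GNU find's -printf, %l is the link target and %p is the link path).
ssh root@cjc-db-02 \
  'find /dev/rac -maxdepth 1 -type l -printf "ln -s %l %p\n"' \
  > /tmp/recreate_links.sh
cat /tmp/recreate_links.sh    # review, then run on node 1 as root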

Solution:
First, recreate the symbolic links.
Create the /dev/rac directory, set its permissions, and create the symbolic links.
Note that the symbolic link paths must be exactly the same as on the other node.
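The directory and permission step is not shown in the original; a minimal sketch follows. The 0755/root ownership is an assumption: check ls -ld /dev/rac on node 2 and match whatever it reports.

# Run as root on node 1.
mkdir -p /dev/rac
chown root:root /dev/rac   # assumption: match the actual owner/group on node 2
chmod 755 /dev/rac         # assumption: match the actual mode on node 2

Then create the links exactly as they exist on node 2: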
ln -s /oradata/cjcdata1 /dev/rac/cjcdata1
ln -s /oradata/cjcdata2 /dev/rac/cjcdata2
ln -s /oradata/cjcdata3 /dev/rac/cjcdata3
ln -s /oradata/cjcdata4 /dev/rac/cjcdata4
ln -s /oradata/cjcdata5 /dev/rac/cjcdata5
ln -s /oradata/cjcdata6 /dev/rac/cjcdata6
ln -s /oradata/cjcdata7 /dev/rac/cjcdata7
ln -s /oradata/ocr1 /dev/rac/ocr1
ln -s /oradata/ocr2 /dev/rac/ocr2
ln -s /oradata/ocr3 /dev/rac/ocr3
Check the permissions and names:
root@cjc-db-01:/dev# ll -rht rac/*
lrwxrwxrwx 1 root root 15 Jun 15 19:49 rac/cjcdata1 -> /oradata/cjcdata1
lrwxrwxrwx 1 root root 15 Jun 15 19:49 rac/cjcdata2 -> /oradata/cjcdata2
lrwxrwxrwx 1 root root 15 Jun 15 19:49 rac/cjcdata3 -> /oradata/cjcdata3
lrwxrwxrwx 1 root root 15 Jun 15 19:49 rac/cjcdata4 -> /oradata/cjcdata4
lrwxrwxrwx 1 root root 15 Jun 15 19:49 rac/cjcdata5 -> /oradata/cjcdata5
lrwxrwxrwx 1 root root 15 Jun 15 19:49 rac/cjcdata6 -> /oradata/cjcdata6
lrwxrwxrwx 1 root root 15 Jun 15 19:49 rac/cjcdata7 -> /oradata/cjcdata7
lrwxrwxrwx 1 root root 14 Jun 15 19:51 rac/ocr1 -> /oradata/ocr1
lrwxrwxrwx 1 root root 14 Jun 15 19:51 rac/ocr2 -> /oradata/ocr2
lrwxrwxrwx 1 root root 14 Jun 15 19:51 rac/ocr3 -> /oradata/ocr3
Start CRS again; the cluster returns to normal.
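The start-up commands themselves are not shown above; a typical sequence on node 1, run as root, would look roughly like this:

crsctl start crs             # start the Clusterware stack on node 1
crsctl check crs             # all daemons should report online
crsctl query css votedisk    # the three voting files should now be located
crsctl stat res -t           # cluster resources back ONLINE on both nodes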
Back to the original question: why did the shared storage symbolic links disappear after the server was rebooted?
The cause is where the symbolic links were placed: they cannot live under /dev, and in fact there is no real need to use symbolic links here at all.
Testing confirms that any file created manually under /dev, including a symbolic link, is gone after an operating system reboot: on modern Linux systems /dev is a memory-backed devtmpfs that is rebuilt at boot, so nothing created there by hand survives a restart.
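If the /dev/rac paths do have to be kept (for example because the ASM disk string references them), one common way on systemd-based Linux to have the links recreated at every boot is a tmpfiles.d fragment. The file name and entries below are only an illustrative sketch, not part of the original fix:

# Hypothetical example: describe the links in a tmpfiles.d fragment so that
# systemd-tmpfiles recreates them under /dev at every boot.
cat > /etc/tmpfiles.d/oracle-rac-links.conf <<'EOF'
d /dev/rac 0755 root root -
L /dev/rac/ocr1     - - - - /oradata/ocr1
L /dev/rac/ocr2     - - - - /oradata/ocr2
L /dev/rac/ocr3     - - - - /oradata/ocr3
L /dev/rac/cjcdata1 - - - - /oradata/cjcdata1
L /dev/rac/cjcdata2 - - - - /oradata/cjcdata2
L /dev/rac/cjcdata3 - - - - /oradata/cjcdata3
L /dev/rac/cjcdata4 - - - - /oradata/cjcdata4
L /dev/rac/cjcdata5 - - - - /oradata/cjcdata5
L /dev/rac/cjcdata6 - - - - /oradata/cjcdata6
L /dev/rac/cjcdata7 - - - - /oradata/cjcdata7
EOF
systemd-tmpfiles --create oracle-rac-links.conf   # apply immediately without a reboot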

This article is reprinted from IT小Chen.




