
A Case Study: Diagnosing and Repairing an ASM Disk Failure with KFED

Original, by eygle, 2011-09-14
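I recently ran into an ASM disk failure at a customer site, so I am reposting an article by Lao Yang here for reference.
Lao Yang's original article: http://yangtingkun.itpub.net/post/468/521325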

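A customer's RAC environment failed: the operating system on one node crashed, and after the system was reinstalled the node was added back to the cluster successfully, but adding the ASM instance raised an error.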




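I first got a quick overview by phone. The environment was Oracle 10.2.0.4 RAC for Linux x86-64. Some time earlier an operating system failure had forced one node to be reinstalled, and the rebuild ran into an error at the step that adds the ASM instance.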


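Generally, the hardest part of rebuilding a node is removing it from the cluster and adding it back. Once the node has rejoined the cluster, what remains is database-level work, which is much simpler. Based on the information above, my first guess was that the problem lay in one of a few places: the ORACLE_HOME version on the new node did not match the old node, the 10.2.0.4 patch set had not been installed on the clusterware, the oracle user had not been granted the required permissions on the new node, or the shared storage was not mounted correctly on the new node.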


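On site, the problem turned out to be quite different from what I had guessed. The database version, permissions and so on were all fine. In fact, the ASM instance on the new node was already running and one of the two disk groups was mounted, but the other disk group could not be mounted: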


SQL> alter diskgroup datag1 mount;
alter diskgroup datag1 mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATAG1"


ORA-15032 is only a generic, descriptive error. The key error is ORA-15063, which is seen less often.


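The other node had stayed up ever since the failure in the RAC environment, so the state of the disk groups and disks could be checked from its ASM instance: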


SQL> select group_number, name, state, total_mb, free_mb from v$asm_diskgroup;

GROUP_NUMBER NAME       STATE         TOTAL_MB    FREE_MB
------------ ---------- ----------- ---------- ----------
           1 DATAG1     MOUNTED         512000     217396
           2 DATAG2     MOUNTED        1024000     347457

SQL> select group_number, disk_number, mount_status, header_status, name, path from v$asm_disk where group_number = 1;

GROUP_NUMBER DISK_NUMBER MOUNT_STATUS HEADER_STATUS  NAME        PATH
------------ ----------- ------------ -------------- ----------- -------------
           1           0 CACHED       PROVISIONED    DATAG1_0000 /dev/raw/raw4
           1           1 CACHED       MEMBER         DATAG1_0001 /dev/raw/raw5

SQL> select instance_number, instance_name from v$instance;

INSTANCE_NUMBER INSTANCE_NAME
--------------- --------------------------------
              2 +ASM2


As shown, the header status of disk DATAG1_0000 in disk group DATAG1 is wrong. Checking the ASM disk information from the newly added instance 1:


SQL> select group_number, name, state, total_mb, free_mb from v$asm_diskgroup;

GROUP_NUMBER NAME       STATE         TOTAL_MB    FREE_MB
------------ ---------- ----------- ---------- ----------
           1 DATAG1     DISMOUNTED           0          0
           2 DATAG2     MOUNTED        1024000     347457

SQL> select group_number, disk_number, mount_status, header_status, name, path from v$asm_disk;

GROUP_NUMBER DISK_NUMBER MOUNT_STATUS HEADER_STATUS  NAME        PATH
------------ ----------- ------------ -------------- ----------- --------------
           0           0 CLOSED       FOREIGN                    /dev/raw/raw1
           0           1 CLOSED       FOREIGN                    /dev/raw/raw2
           0           2 CLOSED       FOREIGN                    /dev/raw/raw8
           0           3 CLOSED       FOREIGN                    /dev/raw/raw10
           0           4 CLOSED       FOREIGN                    /dev/raw/raw9
           0           8 CLOSED       PROVISIONED                /dev/raw/raw4
           0           5 CLOSED       MEMBER                     /dev/raw/raw5
           2           1 CACHED       MEMBER         DATAG2_0001 /dev/raw/raw7
           2           0 CACHED       MEMBER         DATAG2_0000 /dev/raw/raw6

SQL> select instance_number, instance_name from v$instance;

INSTANCE_NUMBER INSTANCE_NAME
--------------- --------------------------------
              1 +ASM1


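Because disk group 1 cannot be mounted on node 1, the two disks /dev/raw/raw4 and /dev/raw/raw5 show a GROUP_NUMBER of 0 and a MOUNT_STATUS of CLOSED. The HEADER_STATUS of raw4 is PROVISIONED here as well, and this incorrect status is clearly what prevents the current node from mounting the disk group.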
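The reasoning above amounts to a simple rule: a disk that a mounted disk group is actively using (MOUNT_STATUS of CACHED) would normally be expected to report a HEADER_STATUS of MEMBER. A minimal Python sketch of that check follows. It is only an illustration and was not part of the original session; the rows are copied by hand from the +ASM2 output, and the tuple layout and the function name suspicious_disks are assumptions of this sketch.

```python
# Rows copied by hand from the v$asm_disk output of the healthy instance (+ASM2).
# Tuple layout (an assumption of this sketch):
#   (group_number, mount_status, header_status, path)
ROWS_ASM2 = [
    (1, "CACHED", "PROVISIONED", "/dev/raw/raw4"),
    (1, "CACHED", "MEMBER", "/dev/raw/raw5"),
]


def suspicious_disks(rows):
    """Return paths of disks that are in use by a mounted group
    but whose header status is something other than MEMBER."""
    return [
        path
        for _group, mount_status, header_status, path in rows
        if mount_status == "CACHED" and header_status != "MEMBER"
    ]


if __name__ == "__main__":
    for path in suspicious_disks(ROWS_ASM2):
        print("header status needs attention:", path)
```

On an instance where the group is not mounted, such as +ASM1 here, the disks show CLOSED, so this rule only helps on the instance that still has the group open.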


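Because the other instance had never been shut down, the abnormal disk header did not affect normal use of the disk group there. If that ASM instance were restarted at this point, however, the disk group would certainly fail to mount again.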


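According to the Oracle documentation, the PROVISIONED and CANDIDATE states differ. Both mean that the disk is a candidate that does not belong to any disk group, but CANDIDATE is the normal candidate state, whereas PROVISIONED indicates that the disk has received some special handling, for example from an operating system tool.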


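Since the disk's status was PROVISIONED, could adding it to the disk group again bring the header back to a normal state? I tried adding the disk again on the problem node, but the operation failed because the disk group was not mounted there. Trying the same on the node that was still working also raised an error: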


SQL> alter diskgroup datag1 add disk '/dev/raw/raw4';
alter diskgroup datag1 add disk '/dev/raw/raw4'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15029: disk '/dev/raw/raw4' is already mounted by this instance


Conventional means clearly could not solve this, which left only two options: rebuild the ASM disks, or modify the ASM disk header directly.


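A rebuild means a long outage. Even though the customer's environment had Data Guard configured, rebuilding the ASM disk group would not be easy. Both the primary and the standby database were two-node RAC environments, and Streams capture was also configured in the database, which made the architecture unusually complex and a switchover comparatively difficult. Moreover, once the currently running node was shut down, its ASM disk group would very likely fail to mount, which would turn the planned switchover into a failover and might even cost the online redo log files. From every angle, a rebuild was not the best solution. If the ASM disk header could be rewritten correctly in place, that would clearly be the most direct and least costly fix.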

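Modifying an ASM disk header requires Oracle's kfed utility, which is not built by default. Go to the $ORACLE_HOME/rdbms/lib directory, where ins_rdbms.mk lives, and build kfed: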


[oracle@node1 lib]$ make -f ins_rdbms.mk ikfed


Linking KFED utility (kfed)

rm -f /u01/app/oracle/product/10.2.0/db_1/rdbms/lib/kfed

gcc -o /u01/app/oracle/product/10.2.0/db_1/rdbms/lib/kfed
-L/u01/app/oracle/product/10.2.0/db_1/rdbms/lib/
-L/u01/app/oracle/product/10.2.0/db_1/lib/ -L/u01/app/oracle/product/10.2.0/db_1/lib/stubs/
/u01/app/oracle/product/10.2.0/db_1/lib/s0main.o
/u01/app/oracle/product/10.2.0/db_1/rdbms/lib/sskfeded.o
/u01/app/oracle/product/10.2.0/db_1/rdbms/lib/skfedpt.o
/u01/app/oracle/product/10.2.0/db_1/rdbms/lib/defopt.o -ldbtools10 -lclntsh
`cat /u01/app/oracle/product/10.2.0/db_1/lib/ldflags` -lnsslb10 -lncrypt10
-lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnro10 `cat
/u01/app/oracle/product/10.2.0/db_1/lib/ldflags` -lnsslb10 -lncrypt10 -lnsgr10
-lnzjs10 -ln10 -lnnz10 -lnl10 -lclient10 -lnnetd10 -lvsn10 -lcommon10
-lgeneric10 -lmm -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10
-lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `cat
/u01/app/oracle/product/10.2.0/db_1/lib/ldflags` -lnsslb10 -lncrypt10 -lnsgr10
-lnzjs10 -ln10 -lnnz10 -lnl10 -lnro10 `cat
/u01/app/oracle/product/10.2.0/db_1/lib/ldflags` -lnsslb10 -lncrypt10 -lnsgr10
-lnzjs10 -ln10 -lnnz10 -lnl10 -lclient10 -lnnetd10 -lvsn10 -lcommon10
-lgeneric10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10
-lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lclient10
-lnnetd10 -lvsn10 -lcommon10 -lgeneric10 -lsnls10 -lnls10 -lcore10 -lsnls10
-lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10
-lcore10 -lnls10 `cat /u01/app/oracle/product/10.2.0/db_1/lib/sysliblist`
-Wl,-rpath,/u01/app/oracle/product/10.2.0/db_1/lib -lm `cat
/u01/app/oracle/product/10.2.0/db_1/lib/sysliblist` -ldl -lm
-L/u01/app/oracle/product/10.2.0/db_1/lib

mv -f /u01/app/oracle/product/10.2.0/db_1/bin/kfed /u01/app/oracle/product/10.2.0/db_1/bin/kfedO

mv: cannot stat `/u01/app/oracle/product/10.2.0/db_1/bin/kfed': No such file or directory

make: [ikfed] Error 1 (ignored)

mv /u01/app/oracle/product/10.2.0/db_1/rdbms/lib/kfed /u01/app/oracle/product/10.2.0/db_1/bin/kfed

chmod 751 /u01/app/oracle/product/10.2.0/db_1/bin/kfed


kfed can now be used to inspect the header of each ASM raw device:


[oracle@node1 data]$ kfed read /dev/raw/raw4 > raw4.txt
[oracle@node1 data]$ kfed read /dev/raw/raw5 > raw5.txt
[oracle@node1 data]$ kfed read /dev/raw/raw6 > raw6.txt


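The headers of the two DATAG1 disks and of the first disk in the other disk group, DATAG2, are dumped to text files. Comparing the three files reveals the abnormal part of raw4.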
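The comparison was done by eye in the original session. As a rough illustration of the same idea, the Python sketch below parses kfed text dumps into dictionaries and reports the fields on which every reference disk agrees but the suspect disk does not. It assumes each field sits on a single line in the form "name: value ; offset: decoded", as kfed prints it; the function names parse_kfed_dump and odd_one_out are made up for this sketch, and the excerpts are a few lines taken from the raw4 and raw5 dumps in this article.

```python
import re

# One kfed field per line, e.g. "kfdhdb.dsknum: 0 ; 0x024: 0x0000"
FIELD_RE = re.compile(r"^(\S+):\s*(.*?)\s*;\s*(0x[0-9a-fA-F]+):\s*(.*)$")


def parse_kfed_dump(text):
    """Map field name -> value (as a string) for every field line in a dump."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            name, value, _offset, _decoded = match.groups()
            fields[name] = value
    return fields


def odd_one_out(suspect, references):
    """Return {field: (suspect_value, reference_value)} for fields where all
    reference dumps share one value and the suspect dump has a different one."""
    result = {}
    for name, value in suspect.items():
        ref_values = {ref.get(name) for ref in references}
        if len(ref_values) == 1 and value not in ref_values:
            result[name] = (value, next(iter(ref_values)))
    return result


# Excerpts from the raw4 and raw5 dumps (not the full headers).
RAW4_EXCERPT = """\
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.ub4spare[42]: 0 ; 0x194: 0x00000000
kfdhdb.ub4spare[43]: 104436 ; 0x198: 0x000197f4
kfdhdb.acdb.ub2spare: 43605 ; 0x1de: 0xaa55
"""
RAW5_EXCERPT = """\
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.ub4spare[42]: 0 ; 0x194: 0x00000000
kfdhdb.ub4spare[43]: 0 ; 0x198: 0x00000000
kfdhdb.acdb.ub2spare: 0 ; 0x1de: 0x0000
"""

if __name__ == "__main__":
    raw4 = parse_kfed_dump(RAW4_EXCERPT)
    raw5 = parse_kfed_dump(RAW5_EXCERPT)
    for field, (bad, good) in odd_one_out(raw4, [raw5]).items():
        print(f"{field}: suspect={bad} reference={good}")
```

Using more than one reference dump matters in practice: fields such as the disk number, disk name and checksum legitimately differ from disk to disk, and requiring all references to agree filters most of those out.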


Because there are two disk groups with two disks each, the comparison went smoothly and pinpointed two abnormal flag fields in raw4:


[oracle@nccpxdb1 lib]$ kfed read /dev/raw/raw4
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check: 4024565597 ; 0x00c: 0xefe1ff5d
kfbh.fcn.base: 952047 ; 0x010: 0x000e86ef
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum: 0 ; 0x024: 0x0000
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DATAG1_0000 ; 0x028: length=11
kfdhdb.grpname: DATAG1 ; 0x048: length=6
kfdhdb.fgname: DATAG1_0000 ; 0x068: length=11
kfdhdb.capname: ; 0x088: length=0
kfdhdb.crestmp.hi: 32928501 ; 0x0a8: HOUR=0x15 DAYS=0x17 MNTH=0xc YEAR=0x7d9
kfdhdb.crestmp.lo: 2195144704 ; 0x0ac: USEC=0x0 MSEC=0x1d0 SECS=0x2d MINS=0x20
kfdhdb.mntstmp.hi: 32940275 ; 0x0b0: HOUR=0x13 DAYS=0x7 MNTH=0x8 YEAR=0x7da
kfdhdb.mntstmp.lo: 3201116160 ; 0x0b4: USEC=0x0 MSEC=0x34a SECS=0x2c MINS=0x2f
kfdhdb.secsize: 512 ; 0x0b8: 0x0200
kfdhdb.blksize: 4096 ; 0x0ba: 0x1000
kfdhdb.ausize: 1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact: 113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize: 512000 ; 0x0c4: 0x0007d000
kfdhdb.pmcnt: 6 ; 0x0c8: 0x00000006
kfdhdb.fstlocn: 1 ; 0x0cc: 0x00000001
kfdhdb.altlocn: 2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002
kfdhdb.redomirrors[0]: 0 ; 0x0d8: 0x0000
kfdhdb.redomirrors[1]: 65535 ; 0x0da: 0xffff
kfdhdb.redomirrors[2]: 65535 ; 0x0dc: 0xffff
kfdhdb.redomirrors[3]: 65535 ; 0x0de: 0xffff
kfdhdb.dbcompat: 168820736 ; 0x0e0: 0x0a100000
kfdhdb.grpstmp.hi: 32928501 ; 0x0e4: HOUR=0x15 DAYS=0x17 MNTH=0xc YEAR=0x7d9
kfdhdb.grpstmp.lo: 2195053568 ; 0x0e8: USEC=0x0 MSEC=0x177 SECS=0x2d MINS=0x20
kfdhdb.ub4spare[0]: 0 ; 0x0ec: 0x00000000
kfdhdb.ub4spare[1]: 0 ; 0x0f0: 0x00000000
kfdhdb.ub4spare[2]: 0 ; 0x0f4: 0x00000000
kfdhdb.ub4spare[3]: 0 ; 0x0f8: 0x00000000
kfdhdb.ub4spare[4]: 0 ; 0x0fc: 0x00000000
kfdhdb.ub4spare[5]: 0 ; 0x100: 0x00000000
kfdhdb.ub4spare[6]: 0 ; 0x104: 0x00000000
kfdhdb.ub4spare[7]: 0 ; 0x108: 0x00000000
kfdhdb.ub4spare[8]: 0 ; 0x10c: 0x00000000
kfdhdb.ub4spare[9]: 0 ; 0x110: 0x00000000
kfdhdb.ub4spare[10]: 0 ; 0x114: 0x00000000
kfdhdb.ub4spare[11]: 0 ; 0x118: 0x00000000
kfdhdb.ub4spare[12]: 0 ; 0x11c: 0x00000000
kfdhdb.ub4spare[13]: 0 ; 0x120: 0x00000000
kfdhdb.ub4spare[14]: 0 ; 0x124: 0x00000000
kfdhdb.ub4spare[15]: 0 ; 0x128: 0x00000000
kfdhdb.ub4spare[16]: 0 ; 0x12c: 0x00000000
kfdhdb.ub4spare[17]: 0 ; 0x130: 0x00000000
kfdhdb.ub4spare[18]: 0 ; 0x134: 0x00000000
kfdhdb.ub4spare[19]: 0 ; 0x138: 0x00000000
kfdhdb.ub4spare[20]: 0 ; 0x13c: 0x00000000
kfdhdb.ub4spare[21]: 0 ; 0x140: 0x00000000
kfdhdb.ub4spare[22]: 0 ; 0x144: 0x00000000
kfdhdb.ub4spare[23]: 0 ; 0x148: 0x00000000
kfdhdb.ub4spare[24]: 0 ; 0x14c: 0x00000000
kfdhdb.ub4spare[25]: 0 ; 0x150: 0x00000000
kfdhdb.ub4spare[26]: 0 ; 0x154: 0x00000000
kfdhdb.ub4spare[27]: 0 ; 0x158: 0x00000000
kfdhdb.ub4spare[28]: 0 ; 0x15c: 0x00000000
kfdhdb.ub4spare[29]: 0 ; 0x160: 0x00000000
kfdhdb.ub4spare[30]: 0 ; 0x164: 0x00000000
kfdhdb.ub4spare[31]: 0 ; 0x168: 0x00000000
kfdhdb.ub4spare[32]: 0 ; 0x16c: 0x00000000
kfdhdb.ub4spare[33]: 0 ; 0x170: 0x00000000
kfdhdb.ub4spare[34]: 0 ; 0x174: 0x00000000
kfdhdb.ub4spare[35]: 0 ; 0x178: 0x00000000
kfdhdb.ub4spare[36]: 0 ; 0x17c: 0x00000000
kfdhdb.ub4spare[37]: 0 ; 0x180: 0x00000000
kfdhdb.ub4spare[38]: 0 ; 0x184: 0x00000000
kfdhdb.ub4spare[39]: 0 ; 0x188: 0x00000000
kfdhdb.ub4spare[40]: 0 ; 0x18c: 0x00000000
kfdhdb.ub4spare[41]: 0 ; 0x190: 0x00000000
kfdhdb.ub4spare[42]: 0 ; 0x194: 0x00000000
kfdhdb.ub4spare[43]: 104436 ; 0x198: 0x000197f4
kfdhdb.ub4spare[44]: 0 ; 0x19c: 0x00000000
kfdhdb.ub4spare[45]: 0 ; 0x1a0: 0x00000000
kfdhdb.ub4spare[46]: 0 ; 0x1a4: 0x00000000
kfdhdb.ub4spare[47]: 0 ; 0x1a8: 0x00000000
kfdhdb.ub4spare[48]: 0 ; 0x1ac: 0x00000000
kfdhdb.ub4spare[49]: 0 ; 0x1b0: 0x00000000
kfdhdb.ub4spare[50]: 0 ; 0x1b4: 0x00000000
kfdhdb.ub4spare[51]: 0 ; 0x1b8: 0x00000000
kfdhdb.ub4spare[52]: 0 ; 0x1bc: 0x00000000
kfdhdb.ub4spare[53]: 0 ; 0x1c0: 0x00000000
kfdhdb.ub4spare[54]: 0 ; 0x1c4: 0x00000000
kfdhdb.ub4spare[55]: 0 ; 0x1c8: 0x00000000
kfdhdb.ub4spare[56]: 0 ; 0x1cc: 0x00000000
kfdhdb.ub4spare[57]: 0 ; 0x1d0: 0x00000000
kfdhdb.acdb.aba.seq: 0 ; 0x1d4: 0x00000000
kfdhdb.acdb.aba.blk: 0 ; 0x1d8: 0x00000000
kfdhdb.acdb.ents: 0 ; 0x1dc: 0x0000
kfdhdb.acdb.ub2spare: 43605 ; 0x1de: 0xaa55


On raw5, the corresponding flag fields are all 0:


[oracle@nccpxdb1 lib]$ kfed read /dev/raw/raw5
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483649 ; 0x008: TYPE=0x8 NUMB=0x1
kfbh.check: 1802212223 ; 0x00c: 0x6b6b937f
kfbh.fcn.base: 0 ; 0x010: 0x00000000
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
kfbh.spare2: 0 ; 0x01c: 0x00000000
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
kfdhdb.driver.reserved[0]: 0 ; 0x008: 0x00000000
kfdhdb.driver.reserved[1]: 0 ; 0x00c: 0x00000000
kfdhdb.driver.reserved[2]: 0 ; 0x010: 0x00000000
kfdhdb.driver.reserved[3]: 0 ; 0x014: 0x00000000
kfdhdb.driver.reserved[4]: 0 ; 0x018: 0x00000000
kfdhdb.driver.reserved[5]: 0 ; 0x01c: 0x00000000
kfdhdb.compat: 168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum: 1 ; 0x024: 0x0001
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DATAG1_0001 ; 0x028: length=11
kfdhdb.grpname: DATAG1 ; 0x048: length=6
kfdhdb.fgname: DATAG1_0001 ; 0x068: length=11
kfdhdb.capname: ; 0x088: length=0
kfdhdb.crestmp.hi: 32928501 ; 0x0a8: HOUR=0x15 DAYS=0x17 MNTH=0xc YEAR=0x7d9
kfdhdb.crestmp.lo: 2195144704 ; 0x0ac: USEC=0x0 MSEC=0x1d0 SECS=0x2d MINS=0x20
kfdhdb.mntstmp.hi: 32940275 ; 0x0b0: HOUR=0x13 DAYS=0x7 MNTH=0x8 YEAR=0x7da
kfdhdb.mntstmp.lo: 3201116160 ; 0x0b4: USEC=0x0 MSEC=0x34a SECS=0x2c MINS=0x2f
kfdhdb.secsize: 512 ; 0x0b8: 0x0200
kfdhdb.blksize: 4096 ; 0x0ba: 0x1000
kfdhdb.ausize: 1048576 ; 0x0bc: 0x00100000
kfdhdb.mfact: 113792 ; 0x0c0: 0x0001bc80
kfdhdb.dsksize: 512000 ; 0x0c4: 0x0007d000
kfdhdb.pmcnt: 6 ; 0x0c8: 0x00000006
kfdhdb.fstlocn: 1 ; 0x0cc: 0x00000001
kfdhdb.altlocn: 2 ; 0x0d0: 0x00000002
kfdhdb.f1b1locn: 0 ; 0x0d4: 0x00000000
kfdhdb.redomirrors[0]: 0 ; 0x0d8: 0x0000
kfdhdb.redomirrors[1]: 0 ; 0x0da: 0x0000
kfdhdb.redomirrors[2]: 0 ; 0x0dc: 0x0000
kfdhdb.redomirrors[3]: 0 ; 0x0de: 0x0000
kfdhdb.dbcompat: 168820736 ; 0x0e0: 0x0a100000
kfdhdb.grpstmp.hi: 32928501 ; 0x0e4: HOUR=0x15 DAYS=0x17 MNTH=0xc YEAR=0x7d9
kfdhdb.grpstmp.lo: 2195053568 ; 0x0e8: USEC=0x0 MSEC=0x177 SECS=0x2d MINS=0x20
kfdhdb.ub4spare[0]: 0 ; 0x0ec: 0x00000000
kfdhdb.ub4spare[1]: 0 ; 0x0f0: 0x00000000
kfdhdb.ub4spare[2]: 0 ; 0x0f4: 0x00000000
kfdhdb.ub4spare[3]: 0 ; 0x0f8: 0x00000000
kfdhdb.ub4spare[4]: 0 ; 0x0fc: 0x00000000
kfdhdb.ub4spare[5]: 0 ; 0x100: 0x00000000
kfdhdb.ub4spare[6]: 0 ; 0x104: 0x00000000
kfdhdb.ub4spare[7]: 0 ; 0x108: 0x00000000
kfdhdb.ub4spare[8]: 0 ; 0x10c: 0x00000000
kfdhdb.ub4spare[9]: 0 ; 0x110: 0x00000000
kfdhdb.ub4spare[10]: 0 ; 0x114: 0x00000000
kfdhdb.ub4spare[11]: 0 ; 0x118: 0x00000000
kfdhdb.ub4spare[12]: 0 ; 0x11c: 0x00000000
kfdhdb.ub4spare[13]: 0 ; 0x120: 0x00000000
kfdhdb.ub4spare[14]: 0 ; 0x124: 0x00000000
kfdhdb.ub4spare[15]: 0 ; 0x128: 0x00000000
kfdhdb.ub4spare[16]: 0 ; 0x12c: 0x00000000
kfdhdb.ub4spare[17]: 0 ; 0x130: 0x00000000
kfdhdb.ub4spare[18]: 0 ; 0x134: 0x00000000
kfdhdb.ub4spare[19]: 0 ; 0x138: 0x00000000
kfdhdb.ub4spare[20]: 0 ; 0x13c: 0x00000000
kfdhdb.ub4spare[21]: 0 ; 0x140: 0x00000000
kfdhdb.ub4spare[22]: 0 ; 0x144: 0x00000000
kfdhdb.ub4spare[23]: 0 ; 0x148: 0x00000000
kfdhdb.ub4spare[24]: 0 ; 0x14c: 0x00000000
kfdhdb.ub4spare[25]: 0 ; 0x150: 0x00000000
kfdhdb.ub4spare[26]: 0 ; 0x154: 0x00000000
kfdhdb.ub4spare[27]: 0 ; 0x158: 0x00000000
kfdhdb.ub4spare[28]: 0 ; 0x15c: 0x00000000
kfdhdb.ub4spare[29]: 0 ; 0x160: 0x00000000
kfdhdb.ub4spare[30]: 0 ; 0x164: 0x00000000
kfdhdb.ub4spare[31]: 0 ; 0x168: 0x00000000
kfdhdb.ub4spare[32]: 0 ; 0x16c: 0x00000000
kfdhdb.ub4spare[33]: 0 ; 0x170: 0x00000000
kfdhdb.ub4spare[34]: 0 ; 0x174: 0x00000000
kfdhdb.ub4spare[35]: 0 ; 0x178: 0x00000000
kfdhdb.ub4spare[36]: 0 ; 0x17c: 0x00000000
kfdhdb.ub4spare[37]: 0 ; 0x180: 0x00000000
kfdhdb.ub4spare[38]: 0 ; 0x184: 0x00000000
kfdhdb.ub4spare[39]: 0 ; 0x188: 0x00000000
kfdhdb.ub4spare[40]: 0 ; 0x18c: 0x00000000
kfdhdb.ub4spare[41]: 0 ; 0x190: 0x00000000
kfdhdb.ub4spare[42]: 0 ; 0x194: 0x00000000
kfdhdb.ub4spare[43]: 0 ; 0x198: 0x00000000
kfdhdb.ub4spare[44]: 0 ; 0x19c: 0x00000000
kfdhdb.ub4spare[45]: 0 ; 0x1a0: 0x00000000
kfdhdb.ub4spare[46]: 0 ; 0x1a4: 0x00000000
kfdhdb.ub4spare[47]: 0 ; 0x1a8: 0x00000000
kfdhdb.ub4spare[48]: 0 ; 0x1ac: 0x00000000
kfdhdb.ub4spare[49]: 0 ; 0x1b0: 0x00000000
kfdhdb.ub4spare[50]: 0 ; 0x1b4: 0x00000000
kfdhdb.ub4spare[51]: 0 ; 0x1b8: 0x00000000
kfdhdb.ub4spare[52]: 0 ; 0x1bc: 0x00000000
kfdhdb.ub4spare[53]: 0 ; 0x1c0: 0x00000000
kfdhdb.ub4spare[54]: 0 ; 0x1c4: 0x00000000
kfdhdb.ub4spare[55]: 0 ; 0x1c8: 0x00000000
kfdhdb.ub4spare[56]: 0 ; 0x1cc: 0x00000000
kfdhdb.ub4spare[57]: 0 ; 0x1d0: 0x00000000
kfdhdb.acdb.aba.seq: 0 ; 0x1d4: 0x00000000
kfdhdb.acdb.aba.blk: 0 ; 0x1d8: 0x00000000
kfdhdb.acdb.ents: 0 ; 0x1dc: 0x0000
kfdhdb.acdb.ub2spare: 0 ; 0x1de: 0x0000


The flag fields of raw6 and raw7, the two disks in disk group DATAG2, are also all 0.
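One observation that the original article does not make, offered here only as a hypothesis: the positions of the two non-zero fields are suggestive. If the kfdhdb offsets are counted from the end of the 32-byte kfbh block header (an assumption, based on the kfdhdb offsets restarting at 0x000 in the dump), the short Python calculation below places them at bytes 440 and 510 of the disk's first sector. Those are the positions where a classic MBR keeps its 4-byte disk signature and its 55 aa boot signature, which would fit the documentation's remark that PROVISIONED points to handling by an operating system tool. The names KFBH_SIZE and absolute_offset are made up for this sketch.

```python
# Assumption: kfdhdb.* offsets are relative to the end of the kfbh.* block
# header, which occupies offsets 0x000 through 0x01f in the dump.
KFBH_SIZE = 0x20

# kfdhdb-relative offsets of the two non-zero fields in the raw4 dump.
SUSPECT_FIELDS = {
    "kfdhdb.ub4spare[43]": 0x198,
    "kfdhdb.acdb.ub2spare": 0x1DE,
}


def absolute_offset(relative_offset):
    """Byte offset from the start of the disk, under the assumption above."""
    return KFBH_SIZE + relative_offset


if __name__ == "__main__":
    for name, rel in SUSPECT_FIELDS.items():
        off = absolute_offset(rel)
        print(f"{name}: byte {off} (0x{off:03x})")
    # 43605 is 0xaa55; on little-endian x86-64 it is stored as bytes 55 aa.
    print((43605).to_bytes(2, "little").hex())
```

Whatever the actual cause was, the article's fix does not depend on it: the two fields are simply reset to the values seen on the healthy disks.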


With the problem identified, edit the text file by hand and set both kfdhdb.ub4spare[43] and kfdhdb.acdb.ub2spare to 0:


kfdhdb.ub4spare[43]: 0 ; 0x198: 0x00000000
kfdhdb.ub4spare[44]: 0 ; 0x19c: 0x00000000
.
.
.
kfdhdb.acdb.ents: 0 ; 0x1dc: 0x0000
kfdhdb.acdb.ub2spare: 0 ; 0x1de: 0x0000
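The edit was made by hand in the original session. For anyone who prefers a repeatable edit, the Python sketch below shows one way to do the same rewrite on a text dump. It assumes one field per line in the form "name: value ; offset: 0xHEX" and zeroes both the decimal value and the hex column, mirroring the edited lines above. The function name zero_fields is made up for this sketch, and the file names in the comment follow the ones used in this article.

```python
import re

# Matches numeric fields only, e.g. "kfdhdb.ub4spare[43]: 104436 ; 0x198: 0x000197f4"
NUMERIC_FIELD_RE = re.compile(
    r"^(\S+):\s*(.*?)\s*;\s*(0x[0-9a-fA-F]+):\s*0x([0-9a-fA-F]+)\s*$"
)


def zero_fields(dump_text, field_names):
    """Return the dump with the named fields set to 0 and an all-zero hex
    column of the same width; every other line is passed through unchanged."""
    out = []
    for line in dump_text.splitlines():
        match = NUMERIC_FIELD_RE.match(line)
        if match and match.group(1) in field_names:
            name, _value, offset, hex_digits = match.groups()
            line = f"{name}: 0 ; {offset}: 0x{'0' * len(hex_digits)}"
        out.append(line)
    return "\n".join(out) + "\n"


SAMPLE = """\
kfdhdb.ub4spare[43]: 104436 ; 0x198: 0x000197f4
kfdhdb.acdb.ents: 0 ; 0x1dc: 0x0000
kfdhdb.acdb.ub2spare: 43605 ; 0x1de: 0xaa55
"""

if __name__ == "__main__":
    fixed = zero_fields(SAMPLE, {"kfdhdb.ub4spare[43]", "kfdhdb.acdb.ub2spare"})
    print(fixed, end="")
    # With real files this would read raw4.txt and write raw4_new.txt, e.g.:
    #   with open("raw4.txt") as src, open("raw4_new.txt", "w") as dst:
    #       dst.write(zero_fields(src.read(), {...}))
```

It is worth diffing the new file against the original dump before writing anything back, so that only the two intended lines differ.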


Then write it back to the disk header with kfed merge:


[oracle@node1 data]$ kfed merge /dev/raw/raw4 text=raw4_new.txt
[oracle@node1 data]$ export ORACLE_SID=+ASM1
[oracle@node1 data]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Mon Jun 13 18:45:54 2011

Copyright (c) 1982, 2007, Oracle. All Rights Reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Oracle Label Security, OLAP,
Data Mining and Real Application Testing options

SQL> set pages 100 lines 120
SQL> select group_number, name, state, total_mb, free_mb from v$asm_diskgroup;

GROUP_NUMBER NAME       STATE         TOTAL_MB    FREE_MB
------------ ---------- ----------- ---------- ----------
           1 DATAG1     DISMOUNT             0          0
           2 DATAG2     MOUNTED        1024000     347457

SQL> select group_number, disk_number, mount_status, header_status, name, path from v$asm_disk;

GROUP_NUMBER DISK_NUMBER MOUNT_STATUS HEADER_STATUS  NAME        PATH
------------ ----------- ------------ -------------- ----------- --------------
           0           0 CLOSED       FOREIGN                    /dev/raw/raw1
           0           1 CLOSED       FOREIGN                    /dev/raw/raw2
           0           2 CLOSED       FOREIGN                    /dev/raw/raw8
           0           3 CLOSED       FOREIGN                    /dev/raw/raw10
           0           4 CLOSED       FOREIGN                    /dev/raw/raw9
           0           8 CLOSED       MEMBER                     /dev/raw/raw4
           0           5 CLOSED       MEMBER                     /dev/raw/raw5
           2           1 CACHED       MEMBER         DATAG2_0001 /dev/raw/raw7
           2           0 CACHED       MEMBER         DATAG2_0000 /dev/raw/raw6

SQL> alter diskgroup datag1 mount;

Diskgroup altered.

SQL> shutdown immediate
ASM diskgroups dismounted
ASM instance shutdown
SQL> startup
ASM instance started

Total System Global Area  130023424 bytes
Fixed Size                  2082208 bytes
Variable Size             102775392 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted
SQL> exit


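After the disk header was modified with merge, the disk status was normal and the disk group mounted without trouble. Checking the disk header information on node 1 at this point showed that its status had returned to normal as well.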


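While the merge command ran, the disk group was never dismounted and the database kept running normally. In other words, the whole operation involved not a single second of downtime; every change was made online.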


Once the disk group was back to normal, the rest was simple: manually add the second instance to the database.



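[Copyright notice] This article is original content from a 墨天轮 (modb.pro) user. Any repost must credit the source (墨天轮), the article link, and the author; otherwise the author and 墨天轮 reserve the right to pursue liability. If you find content on 墨天轮 that appears to be plagiarized or infringing, please report it by email to contact@modb.pro with supporting evidence. Once verified, 墨天轮 will remove the content immediately.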
