
ORA-15137 | Two Oracle ASM Disk Group Failure Cases

Original article by JiekeXu, 2025-10-31

Hello everyone, I am JiekeXu, known in the community as "Qiang Ge". I am chairman of the 青学会 MOP technical community, an Oracle ACE Pro, a KingbaseES (金仓) Most Valuable Advocate (KVA), a YashanDB (崖山) Most Valuable Expert (YVP), a member of the IvorySQL open source community expert advisory board, a Motianlun (墨天轮) MVP, and a Motianlun annual "MoLi Star". I hold Oracle OCP/OCM certifications, MySQL 5.7/8.0 OCP certifications, KingbaseES KCA, KCP, KCM, and KCSM certificates, and many other domestic-database certificates such as PCA, PCTA, OBCA, and OGCA. Welcome to follow my WeChat official account "JiekeXu DBA之路"; tap the three dots in the top-right corner and "set as starred" so that new posts reach you first. Thank you!

Preface

Recently I ran into a couple of small Oracle ASM disk group problems that seemed fairly typical; many of you will probably hit them at work too, so I am writing them down here. Oracle publishes an online book, the Automatic Storage Management Administrator's Guide, which covers ASM administration in detail: using ASMCA and the ASMCMD command-line utility to manage the Oracle ASM instance, disk groups, disk group file permissions, files and directories within disk groups, disk group templates, and volumes. About six years ago I also wrote "Oracle ASM 磁盘组基础知识整理收藏版" (a collector's-edition summary of Oracle ASM disk group basics), which is fairly long and has been widely read, so it seems to have been well received.

Problem 1: Handling an OCR Disk in DROPPED State

Recently I set up a production-simulation RAC environment. The system engineer had misconfigured multipathing, leaving only a single path configured, so high-availability testing showed that the multipath links were not actually highly available. During the tests an OCR disk ran into trouble and was kicked out of the OCR disk group, ending up in DROPPED state.
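Before touching ASM, it is worth verifying the multipath layer itself. The sketch below is an assumption on my part (Linux device-mapper-multipath; the `count_paths` helper is hypothetical): it counts the underlying paths per LUN from `multipath -ll` output, where a healthy dual-fabric setup should show at least two paths per device.

```shell
# count_paths: read `multipath -ll` output on stdin and print the number
# of underlying paths seen for each multipath device.
count_paths() {
  awk '
    / dm-[0-9]/                   { dev = $1 }   # topology header: "mpatha (wwid) dm-2 ..."
    /[0-9]+:[0-9]+:[0-9]+:[0-9]+/ { n[dev]++ }   # path lines carry an H:C:T:L tuple
    END { for (d in n) printf "%s %d path(s)\n", d, n[d] }'
}

# On a node (as root):
#   multipath -ll | count_paths
# Any device reporting only 1 path points at the single-path
# misconfiguration described above.
```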

Checking OCR Status

The crsctl query command shows only two voting disks; one is missing. Because the OCR disk group uses normal redundancy (the default), the database can still serve requests for now, but it cannot afford to lose another disk; if it does, the cluster will go down.

$ crsctl query css votedisk
##  STATE    File Universal Id                File Name                    Disk group
--  -----    -----------------                ---------                    ----------
 1. ONLINE   47bc00a3d8c34f2fbf56c11613ce71e7 (/dev/asmdisks/otwrac-ocr01) [OCR]
 2. ONLINE   060ea9af3ca04f20bf3d01e5ef7e9f91 (/dev/asmdisks/otwrac-ocr02) [OCR]
Located 2 voting disk(s).

Querying from the ASM instance (or a DB instance) with SQL indeed shows one 5 GB disk in UNKNOWN header status; its name, _DROPPED_0002_OCR, confirms it is the dropped OCR disk.

SQL> set lin 1000 pagesize 999
SQL> col NAME for a25
SQL> col FAILGROUP for a15
SQL> col path for a35
SQL> select GROUP_NUMBER,DISK_NUMBER,OS_MB/1024,TOTAL_MB/1024,FREE_MB/1024,NAME,FAILGROUP,PATH,FAILGROUP_TYPE,header_status,state from v$asm_disk order by 1;

GROUP_NUMBER DISK_NUMBER OS_MB/1024 TOTAL_MB/1024 FREE_MB/1024 NAME                 FAILGROUP  PATH                            FAILGRO HEADER_STATU STATE
------------ ----------- ---------- ------------- ------------ -------------------- ---------- ------------------------------- ------- ------------ --------
           0           0          5             0            0                                 /dev/asmdisks/otedwrac-ocr03    REGULAR MEMBER       NORMAL
           1           0       2048          2048     384.5625 ARCH_0000            ARCH_0000  /dev/asmdisks/otedwrac-arch01   REGULAR MEMBER       NORMAL
           1           2       2048          2048   384.402344 ARCH_0002            ARCH_0002  /dev/asmdisks/otedwrac-arch03   REGULAR MEMBER       NORMAL
           1           1       2048          2048   384.332031 ARCH_0001            ARCH_0001  /dev/asmdisks/otedwrac-arch02   REGULAR MEMBER       NORMAL
           2           1       2048          2048   768.371094 DATA_0001            DATA_0001  /dev/asmdisks/otedwrac-data02   REGULAR MEMBER       NORMAL
           2           8       2048          2048   768.421875 DATA_0008            DATA_0008  /dev/asmdisks/otedwrac-data09   REGULAR MEMBER       NORMAL
           2           5       2048          2048   768.554688 DATA_0005            DATA_0005  /dev/asmdisks/otedwrac-data06   REGULAR MEMBER       NORMAL
           2           0       2048          2048   768.414063 DATA_0000            DATA_0000  /dev/asmdisks/otedwrac-data01   REGULAR MEMBER       NORMAL
           2           6       2048          2048   768.453125 DATA_0006            DATA_0006  /dev/asmdisks/otedwrac-data07   REGULAR MEMBER       NORMAL
           2           3       2048          2048   768.417969 DATA_0003            DATA_0003  /dev/asmdisks/otedwrac-data04   REGULAR MEMBER       NORMAL
           2           2       2048          2048   768.484375 DATA_0002            DATA_0002  /dev/asmdisks/otedwrac-data03   REGULAR MEMBER       NORMAL
           2           9       2048          2048   768.414063 DATA_0009            DATA_0009  /dev/asmdisks/otedwrac-data10   REGULAR MEMBER       NORMAL
           2           7       2048          2048   768.445313 DATA_0007            DATA_0007  /dev/asmdisks/otedwrac-data08   REGULAR MEMBER       NORMAL
           2           4       2048          2048   768.421875 DATA_0004            DATA_0004  /dev/asmdisks/otedwrac-data05   REGULAR MEMBER       NORMAL
           3           2          0             5   4.82421875 _DROPPED_0002_OCR    OCR_0002                                   REGULAR UNKNOWN      FORCING
           3           1          5             5   4.59765625 OCR_0001             OCR_0001   /dev/asmdisks/otedwrac-ocr02    REGULAR MEMBER       NORMAL
           3           0          5             5   4.59765625 OCR_0000             OCR_0000   /dev/asmdisks/otedwrac-ocr01    REGULAR MEMBER       NORMAL

-- The lsdsk -p command in ASMCMD can also be used for this query.
-- Example: normal output
ASMCMD> lsdsk -p
Group_Num  Disk_Num  Incarn      Mount_Stat  Header_Stat  Mode_Stat  State   Path
        2         0  4081510609  CACHED      MEMBER       ONLINE     NORMAL  /dev/sdc
        2         1  4081510610  CACHED      MEMBER       ONLINE     NORMAL  /dev/sdd
        1         0  4081510605  CACHED      MEMBER       ONLINE     NORMAL  /dev/sde
        3         0  4081510612  CACHED      MEMBER       ONLINE     NORMAL  /dev/sdf
        3         1  4081510613  CACHED      MEMBER       ONLINE     NORMAL  /dev/sdg
        3         2  4081510611  CACHED      MEMBER       ONLINE     NORMAL  /dev/sdh
        2         2  4081510608  CACHED      MEMBER       ONLINE     NORMAL  /dev/sdi
        1         1  4081510604  CACHED      MEMBER       ONLINE     NORMAL  /dev/sdj
        2         4  4081510607  CACHED      MEMBER       ONLINE     NORMAL  /dev/sdk
        2         3  4081510606  CACHED      MEMBER       ONLINE     NORMAL  /dev/sdl
        1         2  4081510603  CACHED      MEMBER       ONLINE     NORMAL  /dev/sdm
ASMCMD>


Checking the ASM alert log, we found that this OCR disk had indeed been force-dropped by the ASM instance. Only the log path is given here; if you run into a similar problem, check the log yourself.

-- 19c
/u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
grep OCR_00 alert_+ASM1.log

Force Re-adding the Disk

Without FORCE, the ADD DISK here may fail with an error saying the disk already belongs to a disk group. But when you do add with FORCE, make absolutely sure it is the right disk: force in a healthy disk that belongs to some other disk group and you have a much bigger problem, the kind you run away from...

--alter diskgroup OCR drop disk OCR_02 FORCE;
alter diskgroup OCR add disk '/dev/asmdisks/otwrac-ocr03' force;

Problem 2: ORA-15137 When Adding an ASM Disk

On another occasion, in a test environment, a tablespace ran short of space and an ASM disk had to be added, but the ADD DISK statement in the ASM instance failed with ORA-15137.

su - grid
sqlplus / as sysasm
alter diskgroup DATA add disk '/dev/asmdisks/asm-data04' rebalance power 10;
ORA-15032: not all alterations performed
ORA-15137: The ASM cluster is in rolling patch state.


Confirm whether the cluster is currently in rolling patch state

(+ASM1)$ crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [19.0.0.0.0].
The cluster upgrade state is [ROLLING PATCH].
The cluster active patch level is [3509829075].

(+ASM1)$ asmcmd showclusterstate
In Rolling Patch

Check that the patches on all nodes are consistent

$ /u01/app/19.0.0.0/grid/bin/kfod op=patches
(+ASM1)$ kfod op=patches
kgfnGetFacility: facility=0x15f4d90
kgfnInitDiag: diagctx=0x14c1fc0
---------------
List of Patches
===============
29517242
29517247
29585399
36233263
36233343
36240578
36383196
36460248

$ /u01/app/19.0.0.0/grid/bin/kfod op=PATCHLVL
kgfnGetFacility: facility=0x24a8d90
kgfnInitDiag: diagctx=0x2375fc0
-------------------
Current Patch level
===================
1545812015

(+ASM2)$ kfod op=patches
kgfnGetFacility: facility=0x2161d90
kgfnInitDiag: diagctx=0x202efc0
---------------
List of Patches
===============
29517242
29517247
29585399
36233263
36233343
36240578
36383196
36460248

(+ASM2)$ kfod op=PATCHLVL
kgfnGetFacility: facility=0x1badd90
kgfnInitDiag: diagctx=0x1a7afc0
-------------------
Current Patch level
===================
1545812015

$ opatch lspatches
36460248;TOMCAT RELEASE UPDATE 19.0.0.0.0 (36460248)
36383196;DBWLM RELEASE UPDATE 19.0.0.0.0 (36383196)
36240578;OCW RELEASE UPDATE 19.23.0.0.0 (36240578)
36233343;ACFS RELEASE UPDATE 19.23.0.0.0 (36233343)
36233263;Database Release Update : 19.23.0.0.240416 (36233263)
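Eyeballing the kfod output of every node is error-prone; the comparison can be scripted. This is only a sketch: `compare_patches` is a hypothetical helper of my own, and the ssh one-liners in the comment assume passwordless SSH as the grid user with made-up node names.

```shell
# compare_patches: compare two whitespace-separated patch-number lists
# (order-insensitive) and report whether they match.
compare_patches() {
  p1=$(printf '%s\n' $1 | sort | tr '\n' ' ')   # normalize: one per line, sorted
  p2=$(printf '%s\n' $2 | sort | tr '\n' ' ')
  if [ "$p1" = "$p2" ]; then
    echo "patch lists match"
  else
    echo "patch lists DIFFER: [$p1] vs [$p2]"
    return 1
  fi
}

# Gathering the lists from two nodes might look like:
#   p1=$(ssh node1 $GRID_HOME/bin/kfod op=patches | grep -E '^[0-9]+$')
#   p2=$(ssh node2 $GRID_HOME/bin/kfod op=patches | grep -E '^[0-9]+$')
#   compare_patches "$p1" "$p2"
```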

The checks above are consistent across nodes. If they were not, you would either apply the missing patches or roll back the extra one-off patches. Next, check the following four items.

$ <GRID_HOME>/bin/crsctl query crs releasepatch 
$ <GRID_HOME>/bin/crsctl query crs softwarepatch 
$ <GRID_HOME>/bin/crsctl query crs releaseversion 
$ <GRID_HOME>/bin/crsctl query crs softwareversion

(+ASM1)$ crsctl query crs releasepatch
Oracle Clusterware release patch level is [1545812015] and the complete list of patches [29517242 29517247 29585399 36233263 36233343 36240578 36383196 36460248 ] have been applied on the local node. The release patch string is [19.23.0.0.0].
Oracle Clusterware release patch level is [1545812015] and the complete list of patches [29517242 29517247 29585399 36233263 36233343 36240578 36383196 36460248 ] have been applied on the local node. The release patch string is [19.23.0.0.0].

(+ASM1)$ crsctl query crs softwarepatch 
Oracle Clusterware patch level on node dwrac-86 is [3509829075].
Oracle Clusterware patch level on node dwrac-87 is [1545812015].

(+ASM1)$ crsctl query crs releaseversion
Oracle High Availability Services release version on the local node is [19.0.0.0.0]
Oracle High Availability Services release version on the local node is [19.0.0.0.0]

(+ASM1)$ crsctl query crs softwareversion
Oracle Clusterware version on node [dwrac-86] is [19.0.0.0.0]
Oracle Clusterware version on node [dwrac-87] is [19.0.0.0.0]

(+ASM1)$ cluutil -ckpt -oraclebase $ORACLE_BASE -chkckpt -name ROOTCRS_PREPATCH -status
FAIL
(+ASM2)$ cluutil -ckpt -oraclebase $ORACLE_BASE -chkckpt -name ROOTCRS_PREPATCH -status
START

If the patch levels are found to be inconsistent, the inconsistent node must be repaired. Here, the softwarepatch level differs between the two nodes.

Solution

According to the MOS note "ORA-15137 After Rolling Back Failed Patch at GI_HOME (Doc ID 1968727.1)":

Case 1. Update the patch level on all nodes, as the root user:

# <GRID_HOME>/bin/clscfg -patch
clscfg: -patch mode specified
clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 12c Release 1.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.

Then validate, also as the root user:

[root@dwrac-86 ~]# /u01/app/19.0.0.0/grid/bin/clscfg -patch
clscfg: -patch mode specified
clscfg: EXISTING configuration version 19 detected.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.

[root@dwrac-87 ~]# /u01/app/19.0.0.0/grid/bin/clscfg -patch
clscfg: -patch mode specified
clscfg: EXISTING configuration version 19 detected.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.

Then re-check with crsctl query crs softwarepatch. Example:

# <GRID_HOME>/bin/crsctl query crs softwarepatch
Oracle Clusterware patch level on node host01 is [1964077727].
# <GRID_HOME>/bin/crsctl query crs softwarepatch
Oracle Clusterware patch level on node host02 is [1964077727].

dwrac-86:/home/grid(+ASM1)$ crsctl query crs softwarepatch
Oracle Clusterware patch level on node dwrac-86 is [1545812015].
dwrac-87:/home/grid(+ASM2)$ crsctl query crs softwarepatch
Oracle Clusterware patch level on node dwrac-87 is [1545812015].

If both nodes are at the same patch level, stop rolling patch mode from one of the nodes as the root user:

# crsctl stop rollingpatch

Then start the cluster and create the ASM disk group. Example:

[root@dwrac-86 ~]# /u01/app/19.0.0.0/grid/bin/crsctl stop rollingpatch
CRS-1161: The cluster was successfully patched to patch level [1545812015].

The other case is when the patch levels really are inconsistent: apply the same patches on all nodes, or roll back the extra one-off patch. I did not hit that case here, so it is not covered further.

To repair the inconsistent node, run the following on the node that carries the extra patch:

# Stop the database
$ shutdown immediate
# Pre-patch preparation
$ ./rootcrs.sh -prepatch
# Commit and roll back the extra patch (assuming its patch ID is 29757256)
$ ./patchgen commit -rb 29757256
# Post-patch step
$ ./rootcrs.sh -postpatch
# Run prepatch and postpatch once more to ensure consistency
$ ./rootcrs.sh -prepatch
$ ./rootcrs.sh -postpatch

Verify that the cluster state changes from rolling patch back to normal:

$ asmcmd showclusterstate
In Rolling Patch -> Normal
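The final verification step can also be scripted, so that any automation fails fast if the cluster has not really left rolling-patch mode. A minimal sketch (`check_cluster_state` is a hypothetical helper that only parses the output of `asmcmd showclusterstate`):

```shell
# check_cluster_state: read the output of `asmcmd showclusterstate`
# on stdin; succeed only when the cluster reports "Normal".
check_cluster_state() {
  state=$(cat)
  if [ "$state" = "Normal" ]; then
    echo "cluster state OK"
  else
    echo "cluster still in: $state"
    return 1
  fi
}

# As the grid user:
#   asmcmd showclusterstate | check_cluster_state || echo "do not retry ADD DISK yet"
```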

Other Commonly Used ASM Commands and SQL

-- Use the ASMCMD showclustermode command to determine whether Oracle Flex ASM is enabled
$ asmcmd showclustermode
ASM cluster : Flex mode enabled - Direct Storage Access

-- View Oracle Flex ASM connections with SQL*Plus and ASMCMD
col INSTANCE_NAME for a35
set line 345
SELECT instance_name, db_name, status FROM V$ASM_CLIENT;

INSTANCE_NAME   DB_NAME  STATUS
--------------- -------- ------------
+ASM1           +ASM     CONNECTED
orcl1           orcl     CONNECTED
orcl2           orcl     CONNECTED

$ asmcmd lsct data
DB_Name  Status     Software_Version  Compatible_version  Instance_Name  Disk_Group
+ASM     CONNECTED  19.0.0.0.0        19.0.0.0.0          +ASM           DATA
orcl     CONNECTED  19.0.0.0.0        19.0.0.0.0          orcl1          DATA
orcl     CONNECTED  19.0.0.0.0        19.0.0.0.0          orcl2          DATA

-- Example 6-1: Viewing disk group attributes with the V$ASM_ATTRIBUTE view
-- This example joins V$ASM_ATTRIBUTE and V$ASM_DISKGROUP. For a disk group to appear
-- in V$ASM_ATTRIBUTE output, COMPATIBLE.ASM must be set to 11.1 or higher. Attributes
-- marked read-only (Y) can only be set when the disk group is created.
SELECT SUBSTR(dg.name,1,12) AS diskgroup, SUBSTR(a.name,1,24) AS name,
       SUBSTR(a.value,1,24) AS value, read_only
  FROM V$ASM_DISKGROUP dg, V$ASM_ATTRIBUTE a
 WHERE dg.name = 'DATA'
   AND dg.group_number = a.group_number
   AND a.name NOT LIKE '%template%';

DISKGROUP    NAME                     VALUE                    READ_ON
------------ ------------------------ ------------------------ -------
DATA         idp.type                 dynamic                  N
DATA         idp.boundary             auto                     N
DATA         vam_migration_done       false                    Y
DATA         scrub_metadata.enabled   TRUE                     N
DATA         scrub_async_limit        1                        N
DATA         content_hardcheck.enable FALSE                    N
DATA         access_control.umask     066                      N
DATA         access_control.enabled   FALSE                    N
DATA         cell.sparse_dg           allnonsparse             N
DATA         cell.smart_scan_capable  FALSE                    N
DATA         compatible.advm          19.0.0.0.0               N
DATA         compatible.rdbms         19.0.0.0.0               N
DATA         compatible.asm           19.0.0.0.0               N
DATA         appliance._partnering_ty GENERIC                  Y
DATA         au_size                  1048576                  Y
DATA         content.check            FALSE                    N
DATA         content.type             data                     N
DATA         logical_sector_size      512                      N
DATA         sector_size              512                      N
DATA         ate_conversion_done      true                     Y
DATA         preferred_read.enabled   FALSE                    N
DATA         thin_provisioned         FALSE                    N
DATA         failgroup_repair_time    24.0h                    N
DATA         phys_meta_replicated     true                     Y
DATA         disk_repair_time         12.0h                    N

-- Example 6-3: Viewing disks in a disk group with V$ASM_DISK
-- This example runs against the Oracle ASM instance and joins V$ASM_DISK and
-- V$ASM_DISKGROUP, showing the disks associated with each disk group along with
-- their mount status, state, and failure group.
SELECT SUBSTR(dg.name,1,16) AS diskgroup, SUBSTR(d.name,1,16) AS asmdisk,
       d.mount_status, d.state, SUBSTR(d.failgroup,1,16) AS failgroup
  FROM V$ASM_DISKGROUP dg, V$ASM_DISK d
 WHERE dg.group_number = d.group_number;

DISKGROUP        ASMDISK          MOUNT_S STATE    FAILGROUP
---------------- ---------------- ------- -------- ----------------
DATA             DATA_0008        CACHED  NORMAL   DATA_0008

-- Example 6-4: Viewing disks in a disk group with V$ASM_DISK_STAT
-- This example joins V$ASM_DISK_STAT and V$ASM_DISKGROUP_STAT, showing the disks of a
-- particular disk group along with mount status, state, and read/write statistics.
SELECT SUBSTR(dgs.name,1,10) AS diskgroup, SUBSTR(ds.name,1,10) AS asmdisk,
       ds.mount_status, ds.state, ds.reads, ds.writes, ds.read_time,
       ds.write_time, bytes_read, bytes_written
  FROM V$ASM_DISKGROUP_STAT dgs, V$ASM_DISK_STAT ds
 WHERE dgs.group_number = ds.group_number
   AND dgs.name = 'DATA';

DISKGROUP  ASMDISK    MOUNT_S STATE    READS  WRITES READ_TIME WRITE_TIME BYTES_READ BYTES_WRITTEN
---------- ---------- ------- ------- ------ ------ --------- ---------- ---------- -------------
DATA       DATA_0000  CACHED  NORMAL     841  10407  1.212218   3.511977   23818240     178369024
DATA       DATA_0008  CACHED  NORMAL   26065   1319  1.592524    .297728  436203520      38358528
DATA       DATA_0010  CACHED  NORMAL     561    868   .794849    .337575   18631680      22584320

-- Example 6-5: Viewing disk group clients with V$ASM_CLIENT
-- This example joins V$ASM_CLIENT and V$ASM_DISKGROUP, showing the disk groups along
-- with information about the connected database client instances.
SELECT dg.name AS diskgroup, SUBSTR(c.instance_name,1,12) AS instance,
       SUBSTR(c.db_name,1,12) AS dbname, SUBSTR(c.SOFTWARE_VERSION,1,12) AS software,
       SUBSTR(c.COMPATIBLE_VERSION,1,12) AS compatible
  FROM V$ASM_DISKGROUP dg, V$ASM_CLIENT c
 WHERE dg.group_number = c.group_number;

DISKGROUP                      INSTANCE     DBNAME   SOFTWARE     COMPATIBLE
------------------------------ ------------ -------- ------------ ------------
DATA                           +ASM         +ASM     19.0.0.0.0   19.0.0.0.0
DATA                           orcl         orcl     19.0.0.0.0   19.0.0.0.0
DATA                           +ASM         asmvol   19.0.0.0.0   19.0.0.0.0
FRA                            orcl         orcl     19.0.0.0.0   19.0.0.0.0
...

-- Example 6-9: Viewing Oracle ASM file access control information with V$ASM_FILE
-- This example shows the file permission information displayed in the V$ASM_FILE view.
SELECT dg.name AS diskgroup, a.name, f.permissions, f.user_number, u.os_name,
       f.usergroup_number, ug.name
  FROM V$ASM_DISKGROUP dg, V$ASM_USER u, V$ASM_USERGROUP ug, V$ASM_FILE f, V$ASM_ALIAS a
 WHERE dg.name = 'FRA'
   AND dg.group_number = u.group_number
   AND u.group_number = ug.group_number
   AND ug.group_number = f.group_number
   AND f.group_number = a.group_number
   AND f.usergroup_number = ug.usergroup_number
   AND f.user_number = u.user_number
   AND f.file_number = a.file_number;

DISKGROUP NAME                   PERMISSIONS USER_NUMBER OS_NAME USERGROUP_NUMBER NAME
--------- ---------------------- ----------- ----------- ------- ---------------- -------
DATA      USERS.259.685366091    rw-r-----             3 grid                   1 asm_fra
DATA      TEMP.264.685366227     rw-r-----             3 grid                   1 asm_fra
...

References

https://docs.oracle.com/en/database/oracle/oracle-database/19/ostmg/asm-intro.html#GUID-D577D6A5-0A08-4070-B4FA-A39C5D548CD2
https://docs.oracle.com/en/database/oracle/oracle-database/26/ostmg/create-diskgroups.html#GUID-D8A4F1E1-D5F2-4372-8BF8-2A6CB9724930
ORA-15137 After Rolling Back Failed Patch at GI_HOME (Doc ID 1968727.1)

That's the end of this article. I hope it helps you as you read. If you found it useful, share it with your friends and colleagues; share it with whoever you care about, and let's learn and improve together!

Welcome to follow my WeChat official account [JiekeXu DBA之路] and learn new things together!
——————————————————————————
WeChat official account: JiekeXu DBA之路
Motianlun: https://www.modb.pro/u/4347
CSDN: https://blog.csdn.net/JiekeXu
ITPUB: https://blog.itpub.net/69968215
Tencent Cloud: https://cloud.tencent.com/developer/user/5645107
——————————————————————————

Last modified: 2025-11-04 23:50:25
[Copyright Notice] This is original content by a Motianlun (墨天轮) user. When reposting, you must credit the source (Motianlun), the article link, and the author; otherwise the author and Motianlun reserve the right to pursue liability. If you find content on Motianlun suspected of plagiarism or infringement, please email contact@modb.pro with supporting evidence; once verified, Motianlun will remove the content immediately.
