
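The log mentioned ofsctl and asm, which pointed at the ACFS/AFD drivers. Reviewing what the engineer had done during the rollback confirmed that only the software was rolled back; prepatch and postpatch were never run. My heart sank. A search on MOS confirmed it: there is a known bug.
'blk_cloned_rq_check_limits: over max segments limit' Kernel Messages After Grid Infrastructure Patching to 19.10 DB RU (Doc ID 2754950.1)
32437246: HIGH PERFORMANCE ISSUE AFTER APPLYING 19.10 ACFSRU
19.10, AFD, I/O performance problems: everything matched. So AFD had to be disabled.
On to the procedure itself.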
I. Converting AFD disks back to ordinary disks
1. Stop CRS
/u01/app/19.0.0/grid/bin/crsctl stop crs -f
2. Check the AFD state
[root@ora19c02 ~]# export ORACLE_BASE=/u01/app/grid
[root@ora19c02 ~]# /u01/app/19.0.0/grid/bin/asmcmd afd_state
ASMCMD-9526: The AFD state is 'LOADED' and filtering is 'ENABLED' on host 'ora19c02'
3. Disable the AFD filter
[root@ora19c02 ~]# /u01/app/19.0.0/grid/bin/asmcmd afd_filter -d
[root@ora19c02 ~]# /u01/app/19.0.0/grid/bin/asmcmd afd_state
ASMCMD-9526: The AFD state is 'LOADED' and filtering is 'DISABLED' on host 'ora19c02'
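When this check is repeated on several nodes, it can help to reduce the ASMCMD-9526 message to its two values. The following is a minimal sketch: parse_afd_state is a hypothetical helper name, and it only reformats text read from stdin in the format shown above; it never calls asmcmd itself.

```shell
# Hypothetical helper: turn an ASMCMD-9526 status line (stdin) into
# "<driver state> <filtering state>", e.g. "LOADED DISABLED".
parse_afd_state() {
  sed -n "s/.*AFD state is '\([A-Z_ ]*\)' and filtering is '\([A-Z_ ]*\)'.*/\1 \2/p"
}

# Possible usage on a node (as root, with ORACLE_BASE exported as above):
#   /u01/app/19.0.0/grid/bin/asmcmd afd_state | parse_afd_state
```

If the message format differs on another version, the helper prints nothing, which is a safe signal to read the raw output instead.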
4. Check the disk status
[root@ora19c02 ~]# /u01/app/19.0.0/grid/bin/asmcmd afd_lsdsk
--------------------------------------------------------------------------------
Label Filtering Path
================================================================================
DATA00 DISABLED /dev/asm-data0
DATA2DG00 DISABLED /dev/asm-data2dg0
DATA2DG01 DISABLED /dev/asm-data2dg1
DATA2DG02 DISABLED /dev/asm-data2dg2
DATA2DG03 DISABLED /dev/asm-data2dg3
FRA00 DISABLED /dev/asm-fra0
MGMT00 DISABLED /dev/asm-mgmt0
OCR00 DISABLED /dev/asm-ocr0
OCR01 DISABLED /dev/asm-ocr1
OCR02 DISABLED /dev/asm-ocr2
5. On the healthy node, add the ordinary disk path to the ASM discovery string
[grid@ora19c01 ~]$ asmcmd dsget
parameter:AFD:*
profile:AFD:*
[grid@ora19c01 ~]$ asmcmd dsset '/dev/asm-*','AFD:*'
[grid@ora19c01 ~]$ asmcmd dsget
parameter:/dev/asm-*, AFD:*
profile:/dev/asm-*,AFD:*
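The idea in this step is to put the plain device path in front of the existing AFD:* entry so that both kinds of path are discovered during the transition. Below is a small sketch of building such a string without duplicating an entry. add_discovery_path is a hypothetical helper; it only manipulates strings and assumes a comma-separated list with no spaces, like the profile line above.

```shell
# Hypothetical helper: prepend path $1 to the comma-separated discovery
# string $2 unless it is already one of its entries.
add_discovery_path() {
  case ",$2," in
    *",$1,"*) printf '%s\n' "$2" ;;      # already present: leave unchanged
    *)        printf '%s\n' "$1,$2" ;;   # otherwise put the new path first
  esac
}

# Example: add_discovery_path '/dev/asm-*' 'AFD:*'
# The result would then be passed to asmcmd dsset as in the step above.
```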
6. Stop the ACFS and AFD drivers and deconfigure AFD
[root@ora19c02 ~]# /u01/app/19.0.0/grid/bin/acfsload stop
[root@ora19c02 ~]# /u01/app/19.0.0/grid/bin/afdload stop
[root@ora19c02 ~]# /u01/app/19.0.0/grid/bin/asmcmd afd_deconfigure
AFD-632: Existing AFD installation detected.
AFD-634: Removing previous AFD installation.
AFD-635: Previous AFD components successfully removed.
Modifying resource dependencies - this may take some time.
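Note that AFD cannot be deconfigured unless the ACFS driver is stopped first. The ACFS driver is only stopped here, not disabled, so ACFS is still brought up when the cluster starts. AFD, once deconfigured, is simply no longer used.
It is also worth looking at what afd_deconfigure actually does.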
01-Apr-21 00:09 ASMCMD (PID = 1362) Given command - afd_deconfigure
01-Apr-21 00:09 NOTE: asmcmdafd_is_exadata /etc/oracle/cell/network-config/cellip.ora not found
01-Apr-21 00:09 NOTE: Verifying AFD driver state : supported
01-Apr-21 00:09 NOTE: Verifying AFD driver state : installed
01-Apr-21 00:09 NOTE: Status of clusterware stack : Not online
01-Apr-21 00:09 NOTE: command execution (/u01/app/19.0.0/grid/bin/acfsdriverstate loaded) returned : 1
01-Apr-21 00:09 NOTE: Verifying ACFS driver state : Not loaded
01-Apr-21 00:09 NOTE: Uninstalling AFD...
01-Apr-21 00:10 NOTE: Successfully uninstalled AFD
01-Apr-21 00:10 NOTE: command execution (/bin/rpm -q sles-release 2>&1) returned : 1
01-Apr-21 00:10 NOTE: asmcmdafd_is_siha Opening /etc/oracle/ocr.loc
01-Apr-21 00:10 NOTE: asmcmdafd_is_siha Value (FALSE) set for local_only
01-Apr-21 00:10 NOTE: Starting OHASD...
01-Apr-21 00:10 NOTE: asmcmdafd_is_siha Opening /etc/oracle/ocr.loc
01-Apr-21 00:10 NOTE: asmcmdafd_is_siha Value (FALSE) set for local_only
The lines above are the checks and the uninstall of AFD.
01-Apr-21 00:10 cssd res modify command : /u01/app/19.0.0/grid/bin/crsctl modify res ora.cssd -attr "STOP_DEPENDENCIES='hard(shutdown:ora.gipcd,shutdown:ora.diskmon,intermediate:ora.cssdmonitor, shutdown:ora.gpnpd)'" -init
01-Apr-21 00:10 cssdmonitor res modify command : /u01/app/19.0.0/grid/bin/crsctl modify res ora.cssdmonitor -attr "START_DEPENDENCIES='hard(ora.gpnpd)'" -init
These two commands modify the resource dependencies stored in the OLR.
01-Apr-21 00:10 NOTE: Modified ora.cssd and ora.cssdmonitor resource attributes
01-Apr-21 00:10 NOTE: asmcmdafd_is_siha Opening /etc/oracle/ocr.loc
01-Apr-21 00:10 NOTE: asmcmdafd_is_siha Value (FALSE) set for local_only
01-Apr-21 00:10 NOTE: Checking existence of AFD resource
01-Apr-21 00:10 NOTE: command execution (/u01/app/19.0.0/grid/bin/crsctl stop resource ora.driver.afd -init) returned : 1
01-Apr-21 00:10 NOTE: AFD resource stopped or not running
01-Apr-21 00:10 NOTE: Deleted AFD resource
The lines above delete the AFD resource from the cluster: ora.driver.afd is removed from the -init resources.
01-Apr-21 00:10 NOTE: asmcmdafd_is_siha Opening /etc/oracle/ocr.loc
01-Apr-21 00:10 NOTE: asmcmdafd_is_siha Value (FALSE) set for local_only
01-Apr-21 00:10 NOTE: Stopping OHASD...
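To see only the dependency changes that afd_deconfigure made, the trace can be filtered for its "res modify command" lines. This is a rough sketch: extract_res_modify_cmds is a hypothetical helper that reads the trace text from stdin, and it assumes the line format shown above. The location of the trace file is not given here, so it is left to the reader.

```shell
# Hypothetical helper: print just the crsctl commands recorded on
# "... res modify command : <command>" lines of the asmcmd trace (stdin).
extract_res_modify_cmds() {
  sed -n 's/^.* res modify command : //p'
}

# Example: extract_res_modify_cmds < saved_trace.txt
```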
7. Start CRS
[root@ora19c02 ~]#
[root@ora19c02 ~]# /u01/app/19.0.0/grid/bin/crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
This is not the end; only one node is done so far.
8. Repeat steps 1-4 on node 1.
[root@ora19c01 ~]# /u01/app/19.0.0/grid/bin/crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'ora19c01'
…………
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'ora19c01' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@ora19c01 ~]# export ORACLE_BASE=/u01/app/grid
[root@ora19c01 ~]# /u01/app/19.0.0/grid/bin/asmcmd afd_state
ASMCMD-9526: The AFD state is 'LOADED' and filtering is 'ENABLED' on host 'ora19c01'
[root@ora19c01 ~]# /u01/app/19.0.0/grid/bin/asmcmd afd_filter -d
[root@ora19c01 ~]# /u01/app/19.0.0/grid/bin/asmcmd afd_lsdsk
--------------------------------------------------------------------------------
Label Filtering Path
================================================================================
DATA00 DISABLED /dev/asm-data0
DATA2DG00 DISABLED /dev/asm-data2dg0
DATA2DG01 DISABLED /dev/asm-data2dg1
DATA2DG02 DISABLED /dev/asm-data2dg2
DATA2DG03 DISABLED /dev/asm-data2dg3
FRA00 DISABLED /dev/asm-fra0
MGMT00 DISABLED /dev/asm-mgmt0
OCR00 DISABLED /dev/asm-ocr0
OCR01 DISABLED /dev/asm-ocr1
OCR02 DISABLED /dev/asm-ocr2
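With many disks it is easy to miss one that is still filtered. A minimal sketch for checking the listing above: afd_still_filtered is a hypothetical helper that reads the afd_lsdsk output from stdin and prints any label whose Filtering column is still ENABLED. It assumes the three-column layout shown above.

```shell
# Hypothetical helper: print the labels of disks that are still filtered.
# Header and separator lines are skipped automatically because their second
# field is never the literal word ENABLED.
afd_still_filtered() {
  awk '$2 == "ENABLED" { print $1 }'
}

# Example: /u01/app/19.0.0/grid/bin/asmcmd afd_lsdsk | afd_still_filtered
# Empty output means every listed disk reports DISABLED.
```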
9. The key step: remove the AFD labels
(According to the official document this step is not mandatory: "As a root user unlabel the ASMFD disks. (it is an optional step, if you skip this step you can ignore step 7)")
/u01/app/19.0.0/grid/bin/asmcmd afd_lsdsk |grep DISABLED |awk '{print "/u01/app/19.0.0/grid/bin/asmcmd afd_unlabel " $1 " -f"}'
/u01/app/19.0.0/grid/bin/asmcmd afd_unlabel DATA00 -f
/u01/app/19.0.0/grid/bin/asmcmd afd_unlabel DATA2DG00 -f
/u01/app/19.0.0/grid/bin/asmcmd afd_unlabel DATA2DG01 -f
/u01/app/19.0.0/grid/bin/asmcmd afd_unlabel DATA2DG02 -f
/u01/app/19.0.0/grid/bin/asmcmd afd_unlabel DATA2DG03 -f
/u01/app/19.0.0/grid/bin/asmcmd afd_unlabel FRA00 -f
/u01/app/19.0.0/grid/bin/asmcmd afd_unlabel MGMT00 -f
/u01/app/19.0.0/grid/bin/asmcmd afd_unlabel OCR00 -f
/u01/app/19.0.0/grid/bin/asmcmd afd_unlabel OCR01 -f
/u01/app/19.0.0/grid/bin/asmcmd afd_unlabel OCR02 -f
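The command list above can be generated rather than typed. The sketch below only prints the commands so they can be reviewed before anything is run. gen_unlabel_cmds is a hypothetical helper, and GRID_HOME defaults to the path used in this walkthrough.

```shell
GRID_HOME=${GRID_HOME:-/u01/app/19.0.0/grid}

# Hypothetical helper: read `asmcmd afd_lsdsk` output on stdin and print one
# afd_unlabel command per disk whose filtering is DISABLED. Requiring the
# third field to start with "/" skips the header and separator lines.
gen_unlabel_cmds() {
  awk -v asmcmd="$GRID_HOME/bin/asmcmd" \
    '$2 == "DISABLED" && $3 ~ /^\// { print asmcmd " afd_unlabel " $1 " -f" }'
}

# Example (review the output first, then run the printed commands as root):
#   "$GRID_HOME/bin/asmcmd" afd_lsdsk | gen_unlabel_cmds
```

Printing instead of executing is deliberate: unlabeling rewrites the disk header, so a review step before running is cheap insurance.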
Afterwards the disk header shows the change. The first read below is from before unlabeling, the second from after:
[grid@ora19c02 alert]$ kfed read /dev/asm-data0 |grep provstr
kfdhdb.driver.provstr: ORCLDISKDATA00 ; 0x000: length=14
[grid@ora19c02 alert]$ kfed read /dev/asm-data0 |grep provstr
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
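The difference between the two reads is the label appended to ORCLDISK in the provstr field. A short sketch for checking this across devices: afd_label_from_kfed is a hypothetical helper that reads kfed output from stdin and prints whatever follows ORCLDISK, so an empty result means no label remains. It assumes the field layout shown above.

```shell
# Hypothetical helper: extract the AFD label stored in kfdhdb.driver.provstr.
# "ORCLDISKDATA00" yields "DATA00"; a bare "ORCLDISK" yields an empty line.
afd_label_from_kfed() {
  awk '$1 == "kfdhdb.driver.provstr:" { sub(/^ORCLDISK/, "", $2); print $2 }'
}

# Example: kfed read /dev/asm-data0 | afd_label_from_kfed
```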
10. Stop the ACFS and AFD drivers and deconfigure AFD
[root@ora19c01 ~]# lsmod |grep oracle
oracleacfs 5581810 0
oracleadvm 1231385 0
oracleoks 721311 2 oracleacfs,oracleadvm
oracleafd 214072 0
[root@ora19c01 ~]# /u01/app/19.0.0/grid/bin/asmcmd afd_state
ASMCMD-9526: The AFD state is 'LOADED' and filtering is 'DISABLED' on host 'ora19c01'
[root@ora19c01 ~]# /u01/app/19.0.0/grid/bin/acfsload stop
[root@ora19c01 ~]# /u01/app/19.0.0/grid/bin/afdload stop
[root@ora19c01 ~]# lsmod |grep oracle
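The empty result of the second lsmod is the confirmation that the drivers are unloaded. In a script the same check might look like the sketch below. oracle_modules_unloaded is a hypothetical helper that reads lsmod output from stdin, and the module names are the four shown in the listing above.

```shell
# Hypothetical helper: succeed (exit status 0) only when none of the Oracle
# ACFS/ADVM/OKS/AFD kernel modules appear in the lsmod output on stdin.
oracle_modules_unloaded() {
  ! awk '$1 ~ /^oracle(acfs|advm|oks|afd)$/ { found = 1 } END { exit !found }'
}

# Example:
#   lsmod | oracle_modules_unloaded && echo "drivers unloaded"
```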
11. Start CRS
[root@ora19c01 ~]# /u01/app/19.0.0/grid/bin/crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
12. Change the ASM disk discovery path
[grid@ora19c02 ~]$ asmcmd dsget
parameter:/dev/asm-*, AFD:*
profile:/dev/asm-*,AFD:*
[grid@ora19c02 ~]$ asmcmd dsset '/dev/asm-*'
[grid@ora19c02 ~]$ asmcmd dsget
parameter:/dev/asm-*
profile:/dev/asm-*
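This completes disabling AFD and converting back to ordinary disks. Judging from this test run, the procedure can be done in a rolling fashion, without stopping the whole cluster.
II. Options for resolving the incident
Back to the incident itself.
Workable option 1: disable AFD and convert to ordinary disks, one node at a time
1. Stop CRS: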
/u01/app/19.0.0/grid/bin/crsctl stop crs
2. Disable the AFD filter:
/u01/app/19.0.0/grid/bin/asmcmd afd_filter -d
3. Stop the ACFS and AFD drivers and deconfigure AFD
/u01/app/19.0.0/grid/bin/acfsload stop
/u01/app/19.0.0/grid/bin/afdload stop
/u01/app/19.0.0/grid/bin/asmcmd afd_deconfigure
4. Start CRS:
/u01/app/19.0.0/grid/bin/crsctl start crs
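The four steps of option 1 can be wrapped in a small per-node script. The following is a sketch under stated assumptions rather than a tested runbook: run and disable_afd_on_node are hypothetical names, GRID_HOME and ORACLE_BASE default to the paths used earlier in this article, and DRY_RUN defaults to 1 so the commands are only printed unless it is explicitly set to 0.

```shell
GRID_HOME=${GRID_HOME:-/u01/app/19.0.0/grid}
export ORACLE_BASE=${ORACLE_BASE:-/u01/app/grid}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode; otherwise execute it and stop at the
# first failure so that later steps never run on a half-finished node.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@" || { echo "failed: $*" >&2; exit 1; }
  fi
}

# The commands of option 1, in the order given above (run as root, one node
# at a time).
disable_afd_on_node() {
  run "$GRID_HOME/bin/crsctl" stop crs
  run "$GRID_HOME/bin/asmcmd" afd_filter -d
  run "$GRID_HOME/bin/acfsload" stop
  run "$GRID_HOME/bin/afdload" stop
  run "$GRID_HOME/bin/asmcmd" afd_deconfigure
  run "$GRID_HOME/bin/crsctl" start crs
}

# Example: call disable_afd_on_node to review the plan, then set DRY_RUN=0
# and call it again to execute.
```

Stopping at the first error is a design choice: afd_deconfigure depends on the drivers being stopped, so continuing past a failed step would only produce confusing follow-on errors.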
Workable option 2: start the cluster with the older ACFS and AFD drivers:
1. Stop CRS:
crsctl stop crs
2. Move the 19.10 drivers aside
mv /opt/oracle/app/grid_home/usm/install/Oracle/EL7/x86_64/3.10.0-862/3.10.0-862-x86_64 /opt/oracle/app/grid_home/usm/install/Oracle/EL7/x86_64/3.10.0-862/3.10.0-862-x86_64_19.10
3. Restore the drivers from the original version
cp -r /opt/oracle/app/grid_home/.patch_storage/32218663_*/files/usm/install/Oracle/EL7/x86_64/3.10.0-862/3.10.0-862-x86_64 /opt/oracle/app/grid_home/usm/install/Oracle/EL7/x86_64/3.10.0-862/
chmod 755 /opt/oracle/app/grid_home/usm/install/Oracle/EL7/x86_64/3.10.0-862/3.10.0-862-x86_64
chmod 755 /opt/oracle/app/grid_home/usm/install/Oracle/EL7/x86_64/3.10.0-862/3.10.0-862-x86_64/bin
cd /opt/oracle/app/grid_home/usm/install/Oracle/EL7/x86_64/3.10.0-862/3.10.0-862-x86_64/bin
unzip oracka_mod_kga.zip
unzip oracka.zip
unzip oracleacfs.zip
unzip oracleadvm.zip
unzip oracleafd.zip
unzip oracleoks.zip
unzip oracleolfs.zip
4. Delete the driver files the system has already loaded
rm -f /lib/modules/3.10.0-957.el7.x86_64/weak-updates/oracle/oracleafd.ko
rm -f /lib/modules/3.10.0-957.el7.x86_64/weak-updates/usm/oracleacfs.ko
rm -f /lib/modules/3.10.0-957.el7.x86_64/weak-updates/usm/oracleadvm.ko
rm -f /lib/modules/3.10.0-957.el7.x86_64/weak-updates/usm/oracleoks.ko
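5. Start CRS
III. Summary
AFD is a mixed blessing. Put simply, it is a disk-binding scheme that strengthens protection of the disks. It relies on a low-level operating system driver to block accidental formatting of the disks from the current system; it does not put a special format on the disks. That is why AFD and the underlying block device (raw device) paths can coexist. Knowing how to convert between AFD and ordinary disks also helps in certain situations such as this one.
References:
ASMFD : How to Migrate ASM Diskgroups from ASMFD (ASM Filter Driver) to ASMLIB on Oracle Grid Infrastructure (RAC) or Oracle Restart (Doc ID 2172803.1)
'blk_cloned_rq_check_limits: over max segments limit' Kernel Messages After Grid Infrastructure Patching to 19.10 DB RU (Doc ID 2754950.1)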
32437246: HIGH PERFORMANCE ISSUE AFTER APPLYING 19.10 ACFSRU




