
Problems Caused by Inconsistent Node Patch Levels During ASM Disk Group Expansion

Original article by 肖雪松, 2022-02-09

While patching node 2 we ran into some problems and a few patches were not rolled back, leaving the two nodes at different patch levels. This was only discovered when expanding an ASM disk group:
ORA-15032: not all alterations performed
ORA-15137: The ASM cluster is in rolling patch state.
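
The rolling-patch state and the per-node patch levels can be checked quickly with the commands below (a minimal sketch; run as the grid user on every node and compare the output across nodes):

     $ crsctl query crs activeversion -f     <------- cluster upgrade state; [NORMAL] is expected outside of patching
     $ crsctl query crs softwarepatch        <------- software patch level of this node; values should match on all nodes
     $ $ORACLE_HOME/bin/kfod op=patches      <------- patches registered with ASM on this node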

The issue can be resolved by following this MOS note:

[OCI] Scale-up Failed in DBCS With ORA-15032: not all alterations performed, ORA-15137: The ASM cluster is in rolling patch state (Doc ID 2681040.1)

APPLIES TO:
Oracle Cloud Infrastructure - Database Service - Version N/A to N/A [Release 1.0]
Linux x86-64
SYMPTOMS
The customer initiated a scale-up, which failed with the errors below:
Job details
----------------------------------------------------------------
ID: df2ddb25-fd86-4a71-993d-586626b77b7f
Description: Storage scaling
Status: Failure
Created: June 14, 2020 4:00:00 AM UTC
Progress: 89%
Message: DCS-11021:VM storage scaling : Fail to scale the storage.ORA-15032: not all alterations performed
ORA-15137: The ASM cluster is in rolling patch state.
Task Name Start Time End Time Status
------------------------------------------------------------------------ ----------------------------------- ----------------------------------- ----------
Storage scaling June 14, 2020 4:00:02 AM UTC June 14, 2020 4:04:09 AM UTC Failure
Storage scaling June 14, 2020 4:00:02 AM UTC June 14, 2020 4:04:09 AM UTC Failure
Storage discovery June 14, 2020 4:00:02 AM UTC June 14, 2020 4:03:46 AM UTC Success
disk creation June 14, 2020 4:03:46 AM UTC June 14, 2020 4:04:09 AM UTC Success
Altering Disk Group June 14, 2020 4:04:09 AM UTC June 14, 2020 4:04:09 AM UTC Failure
From dcs-agent.log
2020-06-14 04:04:09,426 DEBUG [Altering Disk Group : JobId=df2ddb25-fd86-4a71-993d-586626b77b7f] [] c.o.d.c.u.CommonsUtils:
run: cmd= '[su,
-,
grid,
-c,
export ORACLE_SID=+ASM1;
export ORACLE_HOME=/u01/app/18.0.0.0/grid;
/u01/app/18.0.0.0/grid/bin/sqlplus -S -L / as sysasm @ /tmp/dcsserver/asm/asm2020-06-14_04-04-09-7653683477625128586.sql]'
2020-06-14 04:04:09,524 DEBUG [Altering Disk Group : JobId=df2ddb25-fd86-4a71-993d-586626b77b7f] [] c.o.d.c.u.c.CommandExecutor: Return code: 0
2020-06-14 04:04:09,524 DEBUG [Altering Disk Group : JobId=df2ddb25-fd86-4a71-993d-586626b77b7f] [] c.o.d.c.u.CommonsUtils: Output :
alter diskgroup DATA add disk '/dev/DATADISK5',
'/dev/DATADISK6',
'/dev/DATADISK7',
'/dev/DATADISK8' drop disk 'DATA_0000',
'DATA_0003',
'DATA_0002',
'DATA_0001'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15137: The ASM cluster is in rolling patch state.
From ASM instance
SQL> select sys_context('SYS_CLUSTER_PROPERTIES', 'CLUSTER_STATE') from dual;

SYS_CONTEXT('SYS_CLUSTER_PROPERTIES','CLUSTER_STATE')
--------------------------------------------------------------------------------
In Rolling Patch
From ASMCMD
ASMCMD> showclusterstate
In Rolling Patch
CHANGES
 
CAUSE
We collected the below details from both nodes:
# sudo su - grid

# hostname

# $GRID_HOME/bin/kfod op=patches
# $GRID_HOME/bin/kfod op=PATCHLVL

# opatch lsinv -oh <GRID HOME>

# crsctl query crs softwarepatch
# crsctl query crs activeversion -f
We identified that patch 29757256 is present in the kfod op=patches output on node 1 but not on node 2. This patch is also not present in the lsinventory output of either node.
From node 1
[grid@node 1 ~]$ hostname
node 1

[grid@node 1 ~]$ $ORACLE_HOME/bin/kfod op=patches
---------------
List of Patches
===============
28531803
28655963
29173957
29301631
29301643
29302264
29757256 <---------------------------------- Extra patch entry
30872794
30882568
30888855

[grid@node 1 ~]$ $ORACLE_HOME/bin/kfod op=PATCHLVL
-------------------
Current Patch level
===================
960551215

[grid@node 1 OPatch]$ crsctl query crs softwarepatch
Oracle Clusterware patch level on node node 1 is [960551215].

[grid@node 1 OPatch]$ crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [18.0.0.0.0]. The cluster upgrade state is [ROLLING PATCH]. The cluster active patch level is [438287255].
From node 2
[opc@node 2 ~]$ sudo su - grid
[grid@node 2 ~]$ hostname
node 2

[grid@node 2 ~]$ $ORACLE_HOME/bin/kfod op=patches
---------------
List of Patches
===============
28531803
28655963
29173957
29301631
29301643
29302264
30872794
30882568
30888855

[grid@node 2 ~]$ $ORACLE_HOME/bin/kfod op=PATCHLVL
-------------------
Current Patch level
===================
143095269

[grid@node 2 OPatch]$ crsctl query crs softwarepatch
Oracle Clusterware patch level on node node 2 is [143095269].

[grid@node 2 OPatch]$ crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [18.0.0.0.0]. The cluster upgrade state is [ROLLING PATCH]. The cluster active patch level is [438287255].
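
The discrepancy is also easy to spot by diffing the kfod output of the two nodes (a sketch; node1 and node2 are placeholder hostnames, passwordless ssh for the grid user is assumed, and the grid home path is taken from the log above):

     $ diff <(ssh grid@node1 'export ORACLE_HOME=/u01/app/18.0.0.0/grid; $ORACLE_HOME/bin/kfod op=patches') \
            <(ssh grid@node2 'export ORACLE_HOME=/u01/app/18.0.0.0/grid; $ORACLE_HOME/bin/kfod op=patches')

Only the extra patch (29757256 here) should show up in the diff, and only on node 1.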
SOLUTION
We suggested the following action plan, which fixed the issue.

1.  Execute below steps as root

     # sudo -s
     # $GRID_HOME/crs/install/rootcrs.sh -prepatch    <------- this stops the clusterware stack and the DB instances on this node

2.  Execute the below steps to remove from kfod

     # sudo su - grid
     # $GRID_HOME/bin/patchgen commit -rb 29757256

3.  Execute below steps as root

     # $GRID_HOME/crs/install/rootcrs.sh -postpatch

4.  Then collect the outputs of the below commands again
     # $GRID_HOME/bin/kfod op=patches
     # $GRID_HOME/bin/kfod op=PATCHLVL
     # opatch lsinv -oh <GRID HOME>
     # crsctl query crs softwarepatch
     # crsctl query crs activeversion -f
5.  Also check
     # sudo su - grid
     ASMCMD> showclusterstate
     # sqlplus / as sysasm
    SQL> select sys_context('SYS_CLUSTER_PROPERTIES', 'CLUSTER_STATE') from dual;
    SQL> select sys_context('SYS_CLUSTER_PROPERTIES', 'CURRENT_PATCHLVL') from dual;
Note: The commands must be executed on the node where the extra patch is present in the kfod output (node 1 in this case).
After completing this action plan, the scaling can be re-initiated (it completed successfully for the customer).
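
In a manual scenario like the one at the top of this article, the failed disk group change can simply be re-run once showclusterstate returns Normal and both nodes report the same software patch level. A sketch using the statement from the failed job above (disk paths and disk names are from that example):

     SQL> alter diskgroup DATA add disk '/dev/DATADISK5', '/dev/DATADISK6',
            '/dev/DATADISK7', '/dev/DATADISK8'
            drop disk 'DATA_0000', 'DATA_0001', 'DATA_0002', 'DATA_0003';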