1.故障描述
因硬件故障,导致数据库集群无法正常运行,查看备库集群状态:
PS C:\Users\Administrator> crsctl.exe stat res -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
ONLINE ONLINE bzrdsrv1 STABLE
ONLINE ONLINE bzrdsrv2 STABLE
ora.LISTENER1.lsnr
ONLINE ONLINE bzrdsrv1 STABLE
ONLINE ONLINE bzrdsrv2 STABLE
ora.net1.network
ONLINE ONLINE bzrdsrv1 STABLE
ONLINE ONLINE bzrdsrv2 STABLE
ora.net2.network
ONLINE ONLINE bzrdsrv1 STABLE
ONLINE ONLINE bzrdsrv2 STABLE
ora.ons
ONLINE ONLINE bzrdsrv1 STABLE
ONLINE ONLINE bzrdsrv2 STABLE
ora.proxy_advm
OFFLINE OFFLINE bzrdsrv1 STABLE
OFFLINE OFFLINE bzrdsrv2 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
1 ONLINE ONLINE bzrdsrv1 STABLE
2 ONLINE ONLINE bzrdsrv2 STABLE
3 ONLINE OFFLINE STABLE
ora.DATA.dg(ora.asmgroup)
1 ONLINE ONLINE bzrdsrv1 STABLE
2 ONLINE ONLINE bzrdsrv2 STABLE
3 OFFLINE OFFLINE STABLE
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE bzrdsrv2 STABLE
ora.OCR.dg(ora.asmgroup)
1 ONLINE ONLINE bzrdsrv1 STABLE
2 ONLINE ONLINE bzrdsrv2 STABLE
3 OFFLINE OFFLINE STABLE
ora.asm(ora.asmgroup)
1 ONLINE ONLINE bzrdsrv1 Started,STABLE
2 ONLINE ONLINE bzrdsrv2 Started,STABLE
3 OFFLINE OFFLINE STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
1 ONLINE ONLINE bzrdsrv1 STABLE
2 ONLINE ONLINE bzrdsrv2 STABLE
3 OFFLINE OFFLINE STABLE
ora.atsdg.db
1 ONLINE INTERMEDIATE bzrdsrv1 Mounted (Closed),HOM
E=E:\app\oracle\prod
uct\19.3.0\dbhome_1,
STABLE
2 ONLINE INTERMEDIATE bzrdsrv2 Mounted (Closed),HOM
E=E:\app\oracle\prod
uct\19.3.0\dbhome_1,
STABLE
ora.bzrdsrv1.vip
1 ONLINE ONLINE bzrdsrv1 STABLE
ora.bzrdsrv1_2.vip
1 ONLINE ONLINE bzrdsrv1 STABLE
ora.bzrdsrv2.vip
1 ONLINE ONLINE bzrdsrv2 STABLE
ora.bzrdsrv2_2.vip
1 ONLINE ONLINE bzrdsrv2 STABLE
ora.cvu
1 ONLINE ONLINE bzrdsrv2 STABLE
ora.qosmserver
1 ONLINE ONLINE bzrdsrv2 STABLE
ora.scan1.vip
1 ONLINE ONLINE bzrdsrv2 STABLE
--------------------------------------------------------------------------------
PS C:\Users\Administrator>
查看数据库的运行日志,从告警日志总获取到了
2025-01-06T03:10:42.998488+08:00
Warning: VKTM detected a time stall.
Time drifts can result in unexpected behavior such as time-outs.
Please see the VKTM trace file for more details:
E:\APP\ORACLE\diag\rdbms\atsdg\ats1\trace\ats1_vktm_8976.trc
2025-01-06T03:13:57.473509+08:00
WARNING: too many parse errors, count=4900 SQL hash=0x6c60d746
PARSE ERROR: ospid=10992, error=6550 for statement:
2025-01-06T03:13:57.475500+08:00
BEGIN DBMS_SESSION.USE_DEFAULT_EDITION_DEFERRED; END;
Additional information: hd=00007FF6C3F597E0 phd=00007FF66FD7D9A8 flg=0x20 cisid=0 sid=0 ciuid=2147483620 uid=2147483620 sqlid=0qc1hv1q61pu6
...Current username=SYSRAC
...Application: oraagent.exe Action:
2.根据问题分析
按照 http://bytehouse.org.cn/doc/3059/?highlight=VKTM 进行处理,数据库运行于mount 状态。
查看主库状态,确认主库一个节点未运行,且改节点 osysmond 进程占用内存达到 78% ,重启集群节点 解决该问题 http://bytehouse.org.cn/doc/3061/; 此时集群节点与实例编号不对应 http://bytehouse.org.cn/doc/2424/?highlight=pinned ,该问题暂时搁置。
此时主库已经正常启动,备库处于 mount 状态。
查看主库 角色
SQL> set line 1000
SQL> col status for a10
SQL> col type for a10
SQL> col error for a20
SQL> col gap_status for a20
SQL> col synchronization_status for a30
SQL> col recovery_mode for a60
SQL> SELECT DATABASE_ROLE, DB_UNIQUE_NAME INSTANCE, OPEN_MODE, PROTECTION_MODE, PROTECTION_LEVEL, SWITCHOVER_STATUS FROM V$DATABASE;
DATABASE_ROLE INSTANCE OPEN_MODE PROTECTION_MODE PROTECTION_LEVEL SWITCHOVER_STATUS
---------------- ------------------------------ -------------------- -------------------- -------------------- --------------------
PRIMARY ats READ WRITE MAXIMUM PERFORMANCE MAXIMUM PERFORMANCE NOT ALLOWED
SQL>
查看 备库角色
SQL> set line 1000
SQL> col status for a10
SQL> col type for a10
SQL> col error for a20
SQL> col gap_status for a20
SQL> col synchronization_status for a30
SQL> col recovery_mode for a60
SQL> SELECT DATABASE_ROLE, DB_UNIQUE_NAME INSTANCE, OPEN_MODE, PROTECTION_MODE, PROTECTION_LEVEL, SWITCHOVER_STATUS FROM V$DATABASE;
DATABASE_ROLE INSTANCE OPEN_MODE PROTECTION_MODE PROTECTION_LEVEL SWITCHOVER_STATUS
---------------- ------------------------------ -------------------- -------------------- -------------------- --------------------
PRIMARY atsdg MOUNTED MAXIMUM PERFORMANCE UNPROTECTED NOT ALLOWED
SQL>
主备库角色都为 PRIMARY 。
手工尝试切换备数据库为备库
--- 将standby数据库转化为primary角色
alter database commit to switchover to physical standby;
---在原Primary切换成standby角色后,查看standby是否支持角色切换
select SWITCHOVER_STATUS from v$DATABASE;
切换失败并报错。该环境配置了dg_broker ,使用 dg_broker 查看主备库的状态
---主库
DGMGRL> SHOW CONFIGURATION;
配置 - ats
保护模式: MaxPerformance
成员:
ats - 主数据库
错误: ORA-16825: 检测到数据库的多个错误或警告, 包括与快速启动故障转移相关的错误或警告
atsdg - (*) 物理备用数据库 (disabled)
ORA-16661: 需要恢复备用数据库
快速启动故障转移: 在潜在数据丢失模式中启用
配置状态:
ERROR (状态已在 1 秒前更新)
DGMGRL>
---备库
DGMGRL> show CONFIGURATION
配置 - ats
保护模式: MaxPerformance
成员:
atsdg - 主数据库
ats - (*) 物理备用数据库
快速启动故障转移: 在潜在数据丢失模式中启用
配置状态:
DISABLED
DGM-17290: 检测到角色更改。此数据库可能不再是主数据库。
DGMGRL>
在备库执行恢复
DGMGRL> reinstate database atsdg;
正在恢复数据库 "atsdg", 请稍候...
错误: ORA-16815: 不需要恢复成员
失败。
恢复数据库 "atsdg" 失败
DGMGRL>
在主库上执行
DGMGRL> reinstate database atsdg;
正在恢复数据库 "atsdg", 请稍候...
已成功恢复数据库 "atsdg"
DGMGRL>
此时在备库查看数据库的角色
SQL> SELECT DATABASE_ROLE, DB_UNIQUE_NAME INSTANCE, OPEN_MODE, PROTECTION_MODE, PROTECTION_LEVEL, SWITCHOVER_STATUS FROM V$DATABASE;
DATABASE_ROLE INSTANCE OPEN_MODE PROTECTION_MODE PROTECTION_LEVEL SWITCHOVER_STATUS
---------------- ------------------------------ -------------------- -------------------- -------------------- --------------------
PHYSICAL STANDBY atsdg READ ONLY WITH APPLY MAXIMUM PERFORMANCE MAXIMUM PERFORMANCE NOT ALLOWED
SQL>
备库的集群状态恢复正常。
PS C:\Users\Administrator> crsctl.exe stat res -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
ONLINE ONLINE bzrdsrv1 STABLE
ONLINE ONLINE bzrdsrv2 STABLE
ora.LISTENER1.lsnr
ONLINE ONLINE bzrdsrv1 STABLE
ONLINE ONLINE bzrdsrv2 STABLE
ora.net1.network
ONLINE ONLINE bzrdsrv1 STABLE
ONLINE ONLINE bzrdsrv2 STABLE
ora.net2.network
ONLINE ONLINE bzrdsrv1 STABLE
ONLINE ONLINE bzrdsrv2 STABLE
ora.ons
ONLINE ONLINE bzrdsrv1 STABLE
ONLINE ONLINE bzrdsrv2 STABLE
ora.proxy_advm
OFFLINE OFFLINE bzrdsrv1 STABLE
OFFLINE OFFLINE bzrdsrv2 STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
1 ONLINE ONLINE bzrdsrv1 STABLE
2 ONLINE ONLINE bzrdsrv2 STABLE
3 ONLINE OFFLINE STABLE
ora.DATA.dg(ora.asmgroup)
1 ONLINE ONLINE bzrdsrv1 STABLE
2 ONLINE ONLINE bzrdsrv2 STABLE
3 OFFLINE OFFLINE STABLE
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE bzrdsrv2 STABLE
ora.OCR.dg(ora.asmgroup)
1 ONLINE ONLINE bzrdsrv1 STABLE
2 ONLINE ONLINE bzrdsrv2 STABLE
3 OFFLINE OFFLINE STABLE
ora.asm(ora.asmgroup)
1 ONLINE ONLINE bzrdsrv1 Started,STABLE
2 ONLINE ONLINE bzrdsrv2 Started,STABLE
3 OFFLINE OFFLINE STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
1 ONLINE ONLINE bzrdsrv1 STABLE
2 ONLINE ONLINE bzrdsrv2 STABLE
3 OFFLINE OFFLINE STABLE
ora.atsdg.db
1 ONLINE ONLINE bzrdsrv1 Open,Readonly,HOME=E
:\app\oracle\product
\19.3.0\dbhome_1,STA
BLE
2 ONLINE ONLINE bzrdsrv2 Open,Readonly,HOME=E
:\app\oracle\product
\19.3.0\dbhome_1,STA
BLE
ora.bzrdsrv1.vip
1 ONLINE ONLINE bzrdsrv1 STABLE
ora.bzrdsrv1_2.vip
1 ONLINE ONLINE bzrdsrv1 STABLE
ora.bzrdsrv2.vip
1 ONLINE ONLINE bzrdsrv2 STABLE
ora.bzrdsrv2_2.vip
1 ONLINE ONLINE bzrdsrv2 STABLE
ora.cvu
1 ONLINE ONLINE bzrdsrv2 STABLE
ora.qosmserver
1 ONLINE ONLINE bzrdsrv2 STABLE
ora.scan1.vip
1 ONLINE ONLINE bzrdsrv2 STABLE
--------------------------------------------------------------------------------
PS C:\Users\Administrator>
DGMGRL> SHOW CONFIGURATION;
配置 - ats
保护模式: MaxPerformance
成员:
ats - 主数据库
atsdg - (*) 物理备用数据库
快速启动故障转移: 在潜在数据丢失模式中启用
配置状态:
SUCCESS (状态已在 45 秒前更新)
DGMGRL>
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




