暂无图片
暂无图片
3
暂无图片
暂无图片
暂无图片

Oracle X7一体机存储节点重启导致u01使用率不断增大

原创 Xiaofei Huangfu 2023-03-09
1051

背景:存储节点发生存储,数据文件状态不支持,计算节点2产生trace过多导致u01使用率不断增高
基础环境:Oracle X7一体机 数据库版本:12.2.0.1.180717
以下是分析和处理过程:

1、数据库日志


ORA-00376: file 7 cannot be read at this time
ORA-01110: data file 7: ‘+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405’
2023-02-15T16:00:58.833384+08:00
Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl18_107783.trc:
ORA-00376: file 7 cannot be read at this time
ORA-01110: data file 7: ‘+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405’
2023-02-15T16:00:58.833398+08:00
Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl11_267126.trc:
ORA-00376: file 7 cannot be read at this time
ORA-01110: data file 7: ‘+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405’
2023-02-15T16:00:58.833460+08:00
Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl07_6783.trc:
ORA-00376: file 7 cannot be read at this time
ORA-01110: data file 7: ‘+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405’
2023-02-15T16:00:58.833478+08:00
Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl17_163546.trc:
ORA-00376: file 7 cannot be read at this time
ORA-01110: data file 7: ‘+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405’
2023-02-15T16:00:58.833498+08:00
Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl32_372326.trc:
ORA-00376: file 7 cannot be read at this time
ORA-01110: data file 7: ‘+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405’
2023-02-15T16:00:58.833511+08:00
Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl00_372255.trc:
ORA-00376: file 7 cannot be read at this time
ORA-01110: data file 7: ‘+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405’
2023-02-15T16:00:58.833515+08:00
Errors in file /u01/app/oracle/diag/rdbms/SKEDCDB/SKEDCDB2/trace/SKEDCDB2_cl24_54355.trc:
ORA-00376: file 7 cannot be read at this time

数据库日志忠报错为ORA-00376,ORA-0110,从日志看读取undo表空间的’+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405’数据文件没有成功,产生多个trace。

2、trace日志

*** 2023-02-15T15:25:46.303098+08:00 (CDB$ROOT(1))
kssxdl signal 376: 0xbcd85410 = transaction (78) @ false_loc 0
  ----------------------------------------
  SO: 0xbcd85410, type: 78, owner: 0x79849860, flag: INIT/-/-/-/0x00 if: 0x3 c: 0x3
   proc=0xe81373f0, name=transaction, file=ktccts.h LINE:451, pg=0, conuid=1
   SGA version=(1,0)
ORA-00376: file 7 cannot be read at this time
ORA-01110: data file 7: '+DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405'
kssxdl success: so 0x3fb9b5a00 = session
PMON waiting for 1000 csecs

trace日志的内容和alert一致,读取数据文件失败,PMON进程有等待。
## 3、ASM日志
2023-02-06T01:05:00.756765+08:00
Exadata cell: o/..... is no longer accessible. I/O errors to disks on this might get suppressed
2023-02-06T01:05:00.868575+08:00
Some DDE async actions failed or were cancelled
NOTE: failed resync of disk group 1 disks
DATAC1_CD_02_CRMPCELADM03 (4)
DATAC1_CD_00_CRMPCELADM03 (8)
DATAC1_CD_03_CRMPCELADM03 (10)
DATAC1_CD_04_CRMPCELADM03 (16)
DATAC1_CD_09_CRMPCELADM03 (22)
DATAC1_CD_10_CRMPCELADM03 (33)
DATAC1_CD_05_CRMPCELADM03 (39)
DATAC1_CD_11_CRMPCELADM03 (40)

ORA-15032: not all alterations performed
ORA-15080: synchronous I/O operation failed to write block 0 of disk 22 in disk group DATAC1
ORA-15080: synchronous I/O operation failed to write block 0 of disk 4 in disk group DATAC1
ORA-15080: synchronous I/O operation failed to write block 0 of disk 33 in disk group DATAC1
ORA-15080: synchronous I/O operation failed to write block 0 of disk 39 in disk group DATAC1

2023-02-06T01:05:01.106577+08:00
ERROR: /* Exadata Auto Mgmt: ONLINE ASM Disk */
alter diskgroup DATAC1 online disk DATAC1_CD_04_CRMPCELADM03
  , DATAC1_CD_02_CRMPCELADM03
  , DATAC1_CD_05_CRMPCELADM03
  , DATAC1_CD_03_CRMPCELADM03
  , DATAC1_CD_00_CRMPCELADM03
  , DATAC1_CD_10_CRMPCELADM03
  , DATAC1_CD_11_CRMPCELADM03
  , DATAC1_CD_09_CRMPCELADM03
nowait

WARNING: Exadata Auto Management: OS PID: 331356 Operation ID: 911602: ONLINE disk RECOC1_CD_04_CRMPCELADM03 in diskgroup RECOC1 Failed
ASM日志中有I/O错误,自动添加磁盘没有成功。	ORA-15080 Synchronous I/O Operation Failed With Exadata error:'HARD check failed' (Doc ID 2412871.1)
对于该错误按官方建议可以打补丁,也可以通过以下方式
Workaround:-

关闭ASM实例的磁盘检测
```sql
SQL>alter diskgroup RECOC1 set attribute 'hard_check.enabled' = 'FALSE';

ALTER DISKGROUP RECOC1 REBALANCE POWER 32;
完成rebalance后再开启磁盘检测
SQL>alter diskgroup RECOC1 set attribute 'hard_check.enabled' = 'TRUE';

4、处理过程

(1)对数据库进行rman全备。
(2)检测数据文件状态

SQL> select file#,name,status,enabled from v$datafile where status='RECOVER'; FILE# NAME STATUS ENABLED ------ -------------------------------- --------- --------- 3 +DATAC1/SKEDCDB/DATAFILE/sysaux.1002.1066835393 RECOVER READ WRITE 7 +DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405 RECOVER READ WRITE 10 +DATAC1/SKEDCDB/PDB/DATAFILE/sysaux.975.1066838165 RECOVER READ WRITE

有3个数据文件状态为RECOVER状态,其中CDB中2个sysaux和udnotbs2表空间各1个,pdb中sysaux表空间1个数据文件。
无法查询表空间使用率

SQL> select a.tablespace_name, round(a.s,2) "CURRENT_TOTAL(MB)", round(a.s - f.s,2) "USED(MB)", f.s "FREE(MB)", round((a.s-f.s)/ a.s * 100, 2) "USED%", g.autoextensible, round(a.ms,2) "MAX_TOTAL(MB)", round((a.s-f.s)/ a.ms * 100, 2) "MAX USED%" from (select d.tablespace_name, sum(bytes / 1024 / 1024) s, sum(decode(maxbytes, 0, bytes, maxbytes) / 1024 / 1024) ms from dba_data_files d group by d.tablespace_name) a, (select f.tablespace_name, sum(f.bytes / 1024 / 1024) s from dba_free_space f group by f.tablespace_name) f, (select distinct tablespace_name, autoextensible from DBA_DATA_FILES where autoextensible = 'YES' union select distinct tablespace_name, autoextensible from DBA_DATA_FILES where autoextensible = 'NO' and tablespace_name not in (select distinct tablespace_name from DBA_DATA_FILES where autoextensible = 'YES')) g where a.tablespace_name = f.tablespace_name and g.tablespace_name = f.tablespace_name order by "MAX USED%"; from dba_data_files d *

ERROR at line 12:
ORA-00376: file 3 cannot be read at this time
ORA-01110: data file 3: ‘+DATAC1/SKEDCDB/DATAFILE/sysaux.1002.1066835393’
(3)重建undo表空间

SQL> create undo tablespace UNDOTBS3 datafile '+DATAC1/SKEDCDB/DATAFILE/undotbs3_01.dbf' size 5G autoextend on;

(4)修复数据文件

rman target / Recovery Manager: Release 12.2.0.1.0 - Production on Mon Feb 20 22:22:50 2023 Copyright (c) 1982, 2017, Oracle and/or its affiliates. All rights reserved. connected to target database: PLANNING (DBID=2741129465) RMAN> RMAN> recover datafile 3; Starting recover at 20-FEB-23 using target database control file instead of recovery catalog allocated channel: ORA_DISK_1 channel ORA_DISK_1: SID=2185 instance=SKEDCDB1 device type=DISK starting media recovery archived log for thread 1 with sequence 14340 is already on disk as file +DATAC1/SKEDCDB/ARCHIVELOG/2023_02_20/thread_1_seq_14340.794.1129321103 ...

通过rman修复3个数据文件,然后将数据文件onlin
在CDB中进行3号数据文件online

SQL> alter database datafile 3 online; SQL> alter database datafile 10 online; alter database datafile 10 online * ERROR at line 1: ORA-01516: nonexistent log file, data file, or temporary file "10" in the current container PDB的数据文件需要在pdb中进行online

(5)检查数据文件状态

SQL> select file#,name,status,enabled from v$datafile where status='RECOVER'; FILE# NAME STATUS ENABLED ------ -------------------------------- --------- --------- 3 +DATAC1/SKEDCDB/DATAFILE/sysaux.1002.1066835393 ONLINE READ WRITE 7 +DATAC1/SKEDCDB/DATAFILE/undotbs2.982.1066835405 ONLINE READ WRITE 10 +DATAC1/SKEDCDB/PDB/DATAFILE/sysaux.975.1066838165 ONLINE READ WRITE

数据文件状态正常
(6)删除undotbs2表空间

SQL> drop tablespace undotbs2 including contents and datafiles; drop tablespace undotbs2 including contents and datafiles * ERROR at line 1: ORA-01548: active rollback segment '_SYSSMU11_759994052$' found, terminate dropping tablespace SQL> Select segment_name, status, tablespace_name 2 from dba_rollback_segs 3 where status not in ('ONLINE', 'OFFLINE'); SEGMENT_NAME STATUS TABLESPACE_NAME ------------------------------ ---------------- ------------------------------ _SYSSMU11_759994052$ NEEDS RECOVERY UNDOTBS2 _SYSSMU12_3581412579$ NEEDS RECOVERY UNDOTBS2 _SYSSMU13_398964664$ NEEDS RECOVERY UNDOTBS2 _SYSSMU14_2753826883$ NEEDS RECOVERY UNDOTBS2 _SYSSMU15_1616211766$ NEEDS RECOVERY UNDOTBS2 _SYSSMU16_2927909258$ NEEDS RECOVERY UNDOTBS2 _SYSSMU17_4161369910$ NEEDS RECOVERY UNDOTBS2 _SYSSMU18_2116035987$ NEEDS RECOVERY UNDOTBS2 _SYSSMU19_2637726138$ NEEDS RECOVERY UNDOTBS2 _SYSSMU20_3386602257$ NEEDS RECOVERY UNDOTBS2 10 rows selected.

参数文件加入以下内容
_CORRUPTED_ROLLBACK_SEGMENTS=(_SYSSMU11_759994052,SYSSMU123581412579,_SYSSMU12_3581412579,_SYSSMU13_398964664,SYSSMU142753826883,_SYSSMU14_2753826883,_SYSSMU15_1616211766,SYSSMU162927909258,_SYSSMU16_2927909258,_SYSSMU17_4161369910,SYSSMU182116035987,_SYSSMU18_2116035987,_SYSSMU19_2637726138,SYSSMU203386602257,_SYSSMU20_3386602257)
使用pfile重新启动数据库

SQL> startup pfile='/home/oracle/pfile_skedcdb_0220.ora'; ORA-32006: SEC_CASE_SENSITIVE_LOGON initialization parameter has been deprecated ORACLE instance started. Total System Global Area 2.6844E+10 bytes Fixed Size 26502656 bytes Variable Size 1.2176E+10 bytes Database Buffers 1.4563E+10 bytes Redo Buffers 77959168 bytes Database mounted. Database opened. SQL> SQL> SQL> show pdbs CON_ID CON_NAME OPEN MODE RESTRICTED ---------- ------------------------------ ---------- ---------- 2 PDB$SEED READ ONLY NO 3 SKEDCDB MOUNTED 删除undotbs2表空间 SQL> drop tablespace undotbs2 including contents and datafiles; Tablespace dropped. SQL> SQL> 重启数据库 SQL> shutdown immediate Database closed. Database dismounted. ORACLE instance shut down. SQL> SQL> SQL> SQL> SQL> SQL> SQL> startup ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance ORA-29707: inconsistent value 0 for initialization parameter _lm_use_tx_tsn with other instances undo参数2个节点不一致导致该报错 SQL> SQL> SQL> create pfile='/tmp/pfile.ora' from spfile; create pfile='/tmp/pfile.ora' from spfile * ERROR at line 1: ORA-01565: error in identifying file '?=/dbs/spfile@.ora' ORA-27037: unable to obtain file status Linux-x86_64 Error: 2: No such file or directory Additional information: 7 SQL> startup pfile='/home/oracle/pfileskedcdb_0220.ora'; ORA-32006: SEC_CASE_SENSITIVE_LOGON initialization parameter has been deprecated ORACLE instance started. Total System Global Area 2.6844E+10 bytes Fixed Size 26502656 bytes Variable Size 1.2176E+10 bytes Database Buffers 1.4563E+10 bytes Redo Buffers 77959168 bytes Database mounted. Database opened. SQL> 【小结】数据文件状态修复后,数据库alert日志不在产生读取数据文件失败相关的trace,u01目录使用率正常;建议生产环境数据库日志和数据文件状态等关键指标应该加强监控和巡检,发现问题及时处理。 -the end-
最后修改时间:2023-03-14 09:58:46
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论