暂无图片
暂无图片
3
暂无图片
暂无图片
暂无图片

ORA-19501&ORA-15081&11g_RAC迁移到单点报错

原创 董小姐 2024-04-09
676

问题描述

清明节假期计划将11g RAC核心业务库迁移到11g 单点实例中,最初采用rman联机备份的方式在节点1上进行备份,备份中途提示icpsp53.dbf文件备份报错:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================

RMAN-03009: failure of backup command on c2 channel at 04/04/2024 01:21:29
ORA-19501: read error on file "+DATA/icpsp/datafile/icpsp53.dbf", block number 156672 (block size=8192)
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk

根据群里大佬们查MOS说是存储IO有问题,可是同事反馈如果存储IO有问题,平时办理业务是正常。

节点1会话界面等待期间不断输出如下报错:

查询操作系统日志也是频繁记录如上报错,现场同事反馈从服务器上架就一直存在该报错,不影响平常使用。我也就没放心上,和群里大佬请教后才得知这是内存有故障的意思,会导致读取数据文件报错,实例可能crash。和头头沟通后关闭节点1,在节点2上进行备份,规避内存报错可能引发备份报错。但是后面在节点2备份报错依旧,也可以说明可能不是内存报错引发的数据文件读取报错。

分析过程

更换备份方法

更改备份方式1:采用rman冷备份备份

--关闭整个实例
srvctl stop database -d icpsp

--启动实例至mount状态
srvctl start database -d icpsp -n rac02 -o mount

Usage: srvctl start database -d <db_unique_name> [-o <start_options>] [-n <node>]
    -d <db_unique_name>      Unique name for the database
    -o <start_options>       Options to startup command (e.g. OPEN, MOUNT, or 'READ ONLY')
    -n <node>                Node on which to start the database (only for RAC One Node databases)
    -h                       Print usage

或
--节点2  mount库  
登录节点2实例下 startup mount

推测是由于归档5分钟1清理导致,和客户申请停止业务进行rman冷备份(mount下rman备份),依然报同样错误。

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================

RMAN-03009: failure of backup command on c2 channel at 04/04/2024 01:21:29
ORA-19501: read error on file "+DATA/icpsp/datafile/icpsp53.dbf", block number 156672 (block size=8192)
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk

更改备份方式2:采用cp数据文件备份

--关闭整个实例
srvctl stop database -d icpsp

和头头申请闭库cp数据文件的方式,依然报错,不过报错和rman联机、冷备不一样,报错如下:

查询坏块

如果有坏块通常会2提示ORA-01578的报错,但是备份中的没有ORA-01578的报错,存储又是第三方且早已过保,抱着查坏块的思路尝试

使用dbv工具校验坏块

--使用dbv工具校验坏块
[oracle@rac01 20240403]$ dbv file=+DATA/icpsp/datafile/icpsp53.dbf  userid=sys/icpspa123

DBVERIFY: Release 11.2.0.4.0 - Production on Thu Apr 4 07:55:46 2024

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

DBVERIFY - Verification starting : FILE = +DATA/icpsp/datafile/icpsp53.dbf

DBV-00111: OCI failure (4159) (ORA-19501: read error on file "+DATA/icpsp/datafile/icpsp53.dbf", block number 156672 (block size=8192)
ORA-15081: failed to submit an I/O operation to a disk
ORA-15081: failed to submit an I/O operation to a disk
ORA-06512: at "SYS.DBMS_DBVERIFY", line 24
ORA-06512: at line 1
)

使用RMAN工具校验

--RMAN工具校验
RMAN> backup validate datafile 69
RMAN就像真正备份时一样读取整个要备份的文件,但RMAN实际上并不产生任何备份集或镜像副本。

长时间无输出

或者
RMAN> validate datafile 69;
RMAN> validate datafile 69 block 156672;
使用validate命令可以手动检查数据库文件中物理损坏或逻辑损坏和数据文件丢失,或者确定备份集是否可以用于还原。

长时间无输出

RMAN> validate database;
长时间无输出

查询记录坏块视图

--校验出来的坏块会记录在v$database_block_corruption视图中
SQL> select * from v$database_block_corruption;

无输出

查询报错的块对应的对象是表还是索引

--根据报错提示的文件号块号确定坏块对象
set linesize 999 pagesize 999
select tablespace_name,segment_type,owner,segment_name from dba_extents where file_id=69 and 156672 between block_id AND block_id + blocks - 1;

TABLESPACE_NAME 					     SEGMENT_TYPE			  OWNER 						       SEGMENT_NAME
------------------------------------------------------------ ------------------------------------ ------------------------------------------------------------ ------------------------------------------------------------------------------------------------------------------------------------------------------------------
ICPSP					        		     INDEX				      ICPSPA						       SYS_C0017272

--查看索引对应哪个表的哪个列
SQL> select index_name ,table_name from dba_indexes where index_name='SYS_C0017272';

INDEX_NAME						     TABLE_NAME
------------------------------------------------------------ ------------------------------------------------------------
SYS_C0017272						     UPLOAD_MATERIAL_LOCATION

SQL> set linesize 999 pagesize 999
SQL> col index_owner for a30
SQL> col index_name for a30
SQL> col table_owner for a30
SQL> col column_name for a30
SQL> select index_owner,index_name,table_owner,table_name,column_name from dba_ind_columns where table_name='UPLOAD_MATERIAL_LOCATION' and index_name='SYS_C0017272';

INDEX_OWNER		       INDEX_NAME		      TABLE_OWNER		     TABLE_NAME 						  COLUMN_NAME
------------------------------ ------------------------------ ------------------------------ ------------------------------------------------------------ ------------------------------
ICPSPA			       SYS_C0017272		      ICPSPA			     UPLOAD_MATERIAL_LOCATION					  ID

重建索引

alter index SYS_C0017272 rebuild ;

重建索引后备份还是同样的报错。

更改备份方式3:数据泵备份

--查询undo_rentention
SQL> show parameter undo

NAME				     TYPE		    VALUE
------------------------------------ ---------------------- ------------------------------
_gc_undo_affinity		     boolean		    FALSE
undo_management 		     string		    AUTO
undo_retention			     integer		    900
undo_tablespace 		     string		    UNDOTBS1


--扩大undo_rentention  规避导出过程中快照过旧
alter system set undo_retention=18000

--查询目录
col DIRECTORY_PATH for a50
set lin 1000
select OWNER,DIRECTORY_NAME,DIRECTORY_PATH from dba_directories;

--创建目录
create directory expdp_backup as '/75data/expdp';
grant read,write on directory expdp_backup to icpspa;

--编写备份文件  rac备份必须加cluster=no参数
cat icpspaexp20240404.par
directory=expdp_backup dumpfile=icpsp_full_20240404_%U.dmp filesize=100g logfile=icpsp_full_20240404.log  schemas=ICPSPA cluster=no

--在节点2上导出备份
nohup expdp icpspa/icpspa123 parfile=icpspaexp20240404.par  &

有大佬提醒如果坏块不处理,可能数据泵抽取数据也会报错,果不其然,备份2张大表中有如下报错:

业务已停止,undo_retention参数值已扩大。

郑重提示:数据泵中途报错千万不要手动停止,即使报错会继续备份其他对象,

解决办法

对整库采用数据泵导出导入的方式进行迁移

--编写备份文件  rac备份必须加cluster=no参数
cat icpspaexp20240404.par
directory=expdp_backup dumpfile=icpsp_full_20240404_%U.dmp filesize=100g logfile=icpsp_full_20240404.log  schemas=ICPSPA cluster=no

--在节点2上导出备份
nohup expdp icpspa/icpspa123 parfile=icpspaexp20240404.par  &

针对数据泵导出过程中的报错采用对提示报错的表通过时间范围分批进行导出并导入,得亏导出过程中未发生报错,数据都导出并导入。

--导出
[oracle@rac02 ~]$ cat expdp_reg_busmaiinf_20240406.sh
source ~/.bash_profile
export ORACLE_SID=icpsp2

expdp \'/ as sysdba\' directory=expdp_backup dumpfile=expdp_reg_busmaiinf_xmldata_20240406_0.dmp logfile=expdp_reg_busmaiinf_xmldata_20240406_0.log tables=icpspa.reg_busmaiinf_xmldata query=\"where timestamp\>to_date\(\'20240101\',\'yyyy-mm-dd\'\)\" cluster=N VERSION=11.2 exclude=STATISTICS


[oracle@rac02 ~]$ cat expdp_reg_busmaiinf_20240407_20200101_20200531.sh
source ~/.bash_profile
export ORACLE_SID=icpsp2

expdp \'/ as sysdba\' directory=expdp_backup dumpfile=expdp_reg_busmaiinf_xmldata_20240407_20200101_20200531.dmp logfile=expdp_reg_busmaiinf_xmldata_20240407_20200101_20200531.log tables=icpspa.reg_busmaiinf_xmldata query=\"where timestamp\>to_date\(\'20200101\',\'yyyy-mm-dd\'\) and timestamp\<=to_date\(\'20200601\',\'yyyy-mm-dd\'\)\" cluster=N VERSION=11.2 exclude=STATISTICS

[oracle@rac02 ~]$ cat expdp_reg_busmaiinf_20240407_20200601_20201231.sh
source ~/.bash_profile
export ORACLE_SID=icpsp2

expdp \'/ as sysdba\' directory=expdp_backup dumpfile=expdp_reg_busmaiinf_xmldata_20240407_20200601_20201231.dmp logfile=expdp_reg_busmaiinf_xmldata_20240407_20200601_20201231.log tables=icpspa.reg_busmaiinf_xmldata query=\"where timestamp\>to_date\(\'20200601\',\'yyyy-mm-dd\'\) and timestamp\<=to_date\(\'20210101\',\'yyyy-mm-dd\'\)\" cluster=N VERSION=11.2 exclude=STATISTICS
[oracle@rac02 ~]$ 


--导入
nohup impdp icpspa/icpspa123 parfile=icpspaimp20240406_reg_busmaiinf_xmldata_20230101_20230531.par  &

cat icpspaimp20240406_reg_busmaiinf_xmldata_20230101_20230531.par
directory=impdp1_backup dumpfile=expdp_reg_busmaiinf_xmldata_20240406_2.dmp logfile=reg_busmaiinf_xmldata_20230101_20230531_impdp_20240406.log  CONTENT=DATA_ONLY    TABLE_EXISTS_ACTION=APPEND

最后经过3天加2夜的加班终于完成了业务库迁移,感谢JiekeXu DBA 技术交流群中的大佬们清明节假期献计献策。

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论