暂无图片
暂无图片
5
暂无图片
暂无图片
暂无图片

ADG 11.2.0.4同步延迟之经典报错:ORA-16191

原创 董小姐 2024-08-14
718

环境描述

RHEL7.9 11.2.0.4 ADG,一主一备。

问题描述

同事巡检时发现备库归档目录下无归档文件,且备库的最大SCN号落后于主库。查看告警日志发现如下报错:

Wed Aug 14 13:47:08 2024
Error 1017 received logging on to the standby
------------------------------------------------------------
Check that the primary and standby are using a password file
and remote_login_passwordfile is set to SHARED or EXCLUSIVE,
and that the SYS password is same in the password files.
      returning error ORA-16191


问题原因

主备库密码文件不一致,甲方对主库sys用户进行了密码更改。

问题复现

检查主备库同步SCN情况

确保主备库Current log sequence一致

su - oracle
sqlplus / as sysdba
archive log list;

主库输出如下:

idle 14-AUG-24> archive log list;
Database log mode	       Archive Mode
Automatic archival	       Enabled
Archive destination	       /data/oradata/archivelog/
Oldest online log sequence     0
Next log sequence to archive   0
Current log sequence	       102

备库输出如下:

sys@FUWA 2024-08-14 06:41:03> archive log list;
Database log mode	       Archive Mode
Automatic archival	       Enabled
Archive destination	       /data/oradata/archivelog/
Oldest online log sequence     99
Next log sequence to archive   102
Current log sequence	       102

检查备库同步状态

说明问题复现前同步是正常的。

检查同步时间

v$dataguard_stats 该视图只能在备库查,主库无返回记录。

select value from v$dataguard_stats where NAME='apply lag';
或
col name for a50
set linesize 999
select name,value from v$dataguard_stats;

输出如下:

NAME						             VALUE
-------------------------------------------------- -----------------------------------
transport lag					       +00 00:00:00
apply lag
apply finish time
estimated startup time				   8

补充:v$dataguard_stats视图说明

name字段

transport lag:没有传到备库的redo量或者在已经传输到备库但是备库没有应用的redo量。

apply lag:备库通过应用主库传过来的redo日志与主库同步所延迟的时间。

apply finish time :表示在备库上完成redo应用所需要的时间。

estimated startup time:启动和打开物理备库需要的时间,不适用逻辑备库

value字段

参数的值

unit字段

各个参数的时间单元

TIME_COMPUTED字段

物理备库上估算各个参数的本地时间

检查进程

select process,status,client_process,thread#,sequence# from v$managed_standby;

输出如下:

sys@FUWA 2024-08-14 09:26:11> select process,status,client_process,thread#,sequence# from v$managed_standby;

PROCESS   STATUS       CLIENT_P    THREAD#  SEQUENCE#
--------- ------------ -------- ---------- ----------
ARCH	  CLOSING      ARCH		 1	  101
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CLOSING      ARCH		 1	   97
ARCH	  CLOSING      ARCH		 1	  102
ARCH	  CLOSING      ARCH		 1	  103
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
RFS	    IDLE	       ARCH		 0	    0
RFS	    IDLE	       UNKNOWN		 0	    0
RFS	    IDLE	       UNKNOWN		 0	    0
RFS	    IDLE	       UNKNOWN		 0	    0
RFS	    IDLE	       LGWR		     1	  104
RFS	    IDLE	       UNKNOWN		 0	    0
RFS	    IDLE	       UNKNOWN		 0	    0

37 rows selected.

MRP0不存在,RFS进程存在

主库更改sys用户密码

su - oracle
sqlplus / as sysdba
alter user sys identified by "top@123";

主库重启远程归档线程参数

生产环境不需要更改该参数,数据量达到一定程度后会触发同步报错。

--远程归档线程参数   参数defer是禁用重新发起的连接,但是不对已有连接处理 defer:推迟 
alter system set log_archive_dest_state_2='defer';

--远程归档线程参数   启动
alter system set log_archive_dest_state_2='enable';

主库构建测试数据

生成一百万测试数据

--创建一个表,并同时添加1000000条数据
create table TestTable as 
select rownum as id,
               to_char(sysdate + rownum/24/3600, 'yyyy-mm-dd hh24:mi:ss') as inc_datetime,
               trunc(dbms_random.value(0, 100)) as random_id,
               dbms_random.string('x', 20) random_string
          from dual
        connect by level <= 1000000;

主库切归档

sqlplus / as sysdba
alter system switch logfile;

分析过程

检查主备库同步SCN情况

发现主备库Current log sequence不一致

su - oracle
sqlplus / as sysdba
archive log list;

主库输出如下:

sys@FUWA 2024-08-14 14:09:39> archive log list;
Database log mode	       Archive Mode
Automatic archival	       Enabled
Archive destination	       /data/oradata/archivelog/
Oldest online log sequence     106
Next log sequence to archive   109
Current log sequence	         109    #主库比备库大

备库输出如下:

sys@FUWA 2024-08-14 14:09:29> archive log list;
Database log mode	       Archive Mode
Automatic archival	       Enabled
Archive destination	       /data/oradata/archivelog/
Oldest online log sequence     101
Next log sequence to archive   0
Current log sequence	         104

查看主备库参数

remote_login_passwordfile 值为 EXCLUSIVE 表示设置正常

主库

sys@FUWA 2024-08-14 09:02:31> show parameter remote_login_passwordfile

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
remote_login_passwordfile	     string	 EXCLUSIVE

备库

sys@FUWA 2024-08-14 09:09:39> show parameter remote_login_passwordfile

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
remote_login_passwordfile	     string	 EXCLUSIVE

查主备库告警日志

主备库告警日志位置:$ORACLE_BASE/diag/rdbms/服务名/实例名/trace/alter_sid.log

主库

cd /data/u01/app/oracle/diag/rdbms/fuwa/fuwa/trace
tail -300f alert_fuwa.log

输出如下:

Wed Aug 14 13:47:08 2024
Error 1017 received logging on to the standby
------------------------------------------------------------
Check that the primary and standby are using a password file
and remote_login_passwordfile is set to SHARED or EXCLUSIVE,
and that the SYS password is same in the password files.
      returning error ORA-16191

备库

cd /data/u01/app/oracle/diag/rdbms/wafu/wafu/trace
tail -300f alert_wafu.log
无报错

查看归档同步报错

select dest_name,status,error from v$archive_dest where dest_id=1 or dest_id=2;

主库

sys@FUWA 2024-08-14 13:54:36> select dest_name,status,error from v$archive_dest where dest_id=1 or dest_id=2;

DEST_NAME																															 STATUS    ERROR
-------------- -----------------------------------------------------------------
LOG_ARCHIVE_DEST_1																														 VALID
LOG_ARCHIVE_DEST_2																														 ERROR ORA-16191: Primary log shipping client not logged on standby

备库

备库无报错

sys@FUWA 2024-08-14 13:51:27> select dest_name,status,error from v$archive_dest where dest_id=1 or dest_id=2;

DEST_NAME																															 STATUS    ERROR
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --------- -----------------------------------------------------------------
LOG_ARCHIVE_DEST_1																														 VALID
LOG_ARCHIVE_DEST_2																														 VALID

检查备库同步状态

检查同步时间

v$dataguard_stats 该视图只能在备库查,主库无返回记录。

select value from v$dataguard_stats where NAME='apply lag';
或
col name for a50
set linesize 999
select name,value from v$dataguard_stats;

输出如下:

NAME						             VALUE
-------------------------------------------------- -----------------------------------
transport lag					       +00 00:00:00
apply lag
apply finish time
estimated startup time				   8

补充:v$dataguard_stats视图说明

name字段

transport lag:没有传到备库的redo量或者在已经传输到备库但是备库没有应用的redo量。

apply lag:备库通过应用主库传过来的redo日志与主库同步所延迟的时间。

apply finish time :表示在备库上完成redo应用所需要的时间。

estimated startup time:启动和打开物理备库需要的时间,不适用逻辑备库

value字段

参数的值

unit字段

各个参数的时间单元

TIME_COMPUTED字段

物理备库上估算各个参数的本地时间

检查进程

select process,status,client_process,thread#,sequence# from v$managed_standby;

输出如下:

sys@FUWA 2024-08-14 09:26:11> select process,status,client_process,thread#,sequence# from v$managed_standby;

PROCESS   STATUS       CLIENT_P    THREAD#  SEQUENCE#
--------- ------------ -------- ---------- ----------
ARCH	  CLOSING      ARCH		 1	  101
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CLOSING      ARCH		 1	   97
ARCH	  CLOSING      ARCH		 1	  102
ARCH	  CLOSING      ARCH		 1	  103
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
ARCH	  CONNECTED    ARCH		 0	    0
RFS	    IDLE	       ARCH		 0	    0
MRP0    WAIT_FOR_LOG N/A               1       104
RFS	    IDLE	       UNKNOWN		 0	    0
RFS	    IDLE	       UNKNOWN		 0	    0
RFS	    IDLE	       UNKNOWN		 0	    0
RFS	    IDLE	       LGWR		     1	  104
RFS	    IDLE	       UNKNOWN		 0	    0
RFS	    IDLE	       UNKNOWN		 0	    0

37 rows selected.

MRP0、RFS进程存在。

注意:

有的模拟场景MRP0进程不存在且apply lag 的 value 为空。

解决办法

将主库的密码文件传输至备库并重命名

主库

将主库的密码文件传输至备库

scp $ORACLE_HOME/dbs/orapwfuwa oracle@192.168.40.56:/data/u01/app/oracle/product/11.2.0/db/dbs/

备库

备份原密码文件,重命名主库传输至备库的密码文件

su - oracle
cd $ORACLE_HOME/dbs/
mv orapwfuwa orapwwafu

再次查看主备库同步状态,恢复正常。

总结

sys主备库密码不一致不同版本报错不一样:
Oracle 版本:11.2.0.3 报错为:ORA-01031: insufficient privileges
Oracle 版本:11.2.0.4 报错为:ORA-16191: Primary log shipping client not logged on standby

11G 修改sys 或密码文件均需要同步至备库,否则将影响同步,12c以Oracle推出了db passwd asm存储的特性,RAC TO RAC ADG的情况下,DB的口令文件存储在ASM中,那么主库修改的操作,可以同步到DG环境,避免每次修改都需要人为修改!

在密码不一致的情况下,主库报错:ORA-01031或ORA-16191后,备库v$dataguard_stats视图:apply lag 的值依然:+00 00:00:00。
进程RFS、MRP0未见异常,可能随着时间长会有更新变动。但根据v$dataguard_stats 的apply lag 值做同步时间监控,就不是很敏感了。

从测试上来看,密码修改或密码文件清除,如果不立即启动重传归档的话,不影响归档传输。案例的报错应该是之前修改过sys密码,没有同步密码文件。本身库数据变化就少,归档也没多少,所以等了几天后触发了重传归档,才有了0点才出现报错的问题。



参考链接:DG 同步延迟之奇怪的经典报错:ORA-16191 - 墨天轮 (modb.pro)

一次特别的ADG ORA-16191故障解决-CSDN博客


最后修改时间:2024-08-15 18:07:01
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论