环境描述
RHEL7.9 11.2.0.4 ADG,一主一备。
问题描述
同事巡检时发现备库归档目录下无归档文件,且备库的最大SCN号落后于主库。查看告警日志发现如下报错:
Wed Aug 14 13:47:08 2024
Error 1017 received logging on to the standby
------------------------------------------------------------
Check that the primary and standby are using a password file
and remote_login_passwordfile is set to SHARED or EXCLUSIVE,
and that the SYS password is same in the password files.
returning error ORA-16191问题原因
主备库密码文件不一致,甲方对主库sys用户进行了密码更改。
问题复现
检查主备库同步SCN情况
确保主备库Current log sequence一致
su - oracle
sqlplus / as sysdba
archive log list;主库输出如下:
idle 14-AUG-24> archive log list;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /data/oradata/archivelog/
Oldest online log sequence 0
Next log sequence to archive 0
Current log sequence 102备库输出如下:
sys@FUWA 2024-08-14 06:41:03> archive log list;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /data/oradata/archivelog/
Oldest online log sequence 99
Next log sequence to archive 102
Current log sequence 102检查备库同步状态
说明问题复现前同步是正常的。
检查同步时间
v$dataguard_stats 该视图只能在备库查,主库无返回记录。
select value from v$dataguard_stats where NAME='apply lag';
或
col name for a50
set linesize 999
select name,value from v$dataguard_stats;输出如下:
NAME VALUE
-------------------------------------------------- -----------------------------------
transport lag +00 00:00:00
apply lag
apply finish time
estimated startup time 8
补充:v$dataguard_stats视图说明
name字段
transport lag:没有传到备库的redo量或者在已经传输到备库但是备库没有应用的redo量。
apply lag:备库通过应用主库传过来的redo日志与主库同步所延迟的时间。
apply finish time :表示在备库上完成redo应用所需要的时间。
estimated startup time:启动和打开物理备库需要的时间,不适用逻辑备库
value字段
参数的值
unit字段
各个参数的时间单元
TIME_COMPUTED字段
物理备库上估算各个参数的本地时间
检查进程
select process,status,client_process,thread#,sequence# from v$managed_standby;输出如下:
sys@FUWA 2024-08-14 09:26:11> select process,status,client_process,thread#,sequence# from v$managed_standby;
PROCESS STATUS CLIENT_P THREAD# SEQUENCE#
--------- ------------ -------- ---------- ----------
ARCH CLOSING ARCH 1 101
ARCH CONNECTED ARCH 0 0
ARCH CLOSING ARCH 1 97
ARCH CLOSING ARCH 1 102
ARCH CLOSING ARCH 1 103
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
RFS IDLE ARCH 0 0
RFS IDLE UNKNOWN 0 0
RFS IDLE UNKNOWN 0 0
RFS IDLE UNKNOWN 0 0
RFS IDLE LGWR 1 104
RFS IDLE UNKNOWN 0 0
RFS IDLE UNKNOWN 0 0
37 rows selected.
MRP0不存在,RFS进程存在
主库更改sys用户密码
su - oracle
sqlplus / as sysdba
alter user sys identified by "top@123";主库重启远程归档线程参数
生产环境不需要更改该参数,数据量达到一定程度后会触发同步报错。
--远程归档线程参数 参数defer是禁用重新发起的连接,但是不对已有连接处理 defer:推迟
alter system set log_archive_dest_state_2='defer';
--远程归档线程参数 启动
alter system set log_archive_dest_state_2='enable';主库构建测试数据
生成一百万测试数据
--创建一个表,并同时添加1000000条数据
create table TestTable as
select rownum as id,
to_char(sysdate + rownum/24/3600, 'yyyy-mm-dd hh24:mi:ss') as inc_datetime,
trunc(dbms_random.value(0, 100)) as random_id,
dbms_random.string('x', 20) random_string
from dual
connect by level <= 1000000;主库切归档
sqlplus / as sysdba
alter system switch logfile;分析过程
检查主备库同步SCN情况
发现主备库Current log sequence不一致
su - oracle
sqlplus / as sysdba
archive log list;主库输出如下:
sys@FUWA 2024-08-14 14:09:39> archive log list;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /data/oradata/archivelog/
Oldest online log sequence 106
Next log sequence to archive 109
Current log sequence 109 #主库比备库大备库输出如下:
sys@FUWA 2024-08-14 14:09:29> archive log list;
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /data/oradata/archivelog/
Oldest online log sequence 101
Next log sequence to archive 0
Current log sequence 104查看主备库参数
remote_login_passwordfile 值为 EXCLUSIVE 表示设置正常
主库
sys@FUWA 2024-08-14 09:02:31> show parameter remote_login_passwordfile
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
remote_login_passwordfile string EXCLUSIVE备库
sys@FUWA 2024-08-14 09:09:39> show parameter remote_login_passwordfile
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
remote_login_passwordfile string EXCLUSIVE查主备库告警日志
主备库告警日志位置:$ORACLE_BASE/diag/rdbms/服务名/实例名/trace/alter_sid.log
主库
cd /data/u01/app/oracle/diag/rdbms/fuwa/fuwa/trace
tail -300f alert_fuwa.log输出如下:
Wed Aug 14 13:47:08 2024
Error 1017 received logging on to the standby
------------------------------------------------------------
Check that the primary and standby are using a password file
and remote_login_passwordfile is set to SHARED or EXCLUSIVE,
and that the SYS password is same in the password files.
returning error ORA-16191备库
cd /data/u01/app/oracle/diag/rdbms/wafu/wafu/trace
tail -300f alert_wafu.log
无报错查看归档同步报错
select dest_name,status,error from v$archive_dest where dest_id=1 or dest_id=2;主库
sys@FUWA 2024-08-14 13:54:36> select dest_name,status,error from v$archive_dest where dest_id=1 or dest_id=2;
DEST_NAME STATUS ERROR
-------------- -----------------------------------------------------------------
LOG_ARCHIVE_DEST_1 VALID
LOG_ARCHIVE_DEST_2 ERROR ORA-16191: Primary log shipping client not logged on standby
备库
备库无报错
sys@FUWA 2024-08-14 13:51:27> select dest_name,status,error from v$archive_dest where dest_id=1 or dest_id=2;
DEST_NAME STATUS ERROR
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --------- -----------------------------------------------------------------
LOG_ARCHIVE_DEST_1 VALID
LOG_ARCHIVE_DEST_2 VALID检查备库同步状态
检查同步时间
v$dataguard_stats 该视图只能在备库查,主库无返回记录。
select value from v$dataguard_stats where NAME='apply lag';
或
col name for a50
set linesize 999
select name,value from v$dataguard_stats;输出如下:
NAME VALUE
-------------------------------------------------- -----------------------------------
transport lag +00 00:00:00
apply lag
apply finish time
estimated startup time 8
补充:v$dataguard_stats视图说明
name字段
transport lag:没有传到备库的redo量或者在已经传输到备库但是备库没有应用的redo量。
apply lag:备库通过应用主库传过来的redo日志与主库同步所延迟的时间。
apply finish time :表示在备库上完成redo应用所需要的时间。
estimated startup time:启动和打开物理备库需要的时间,不适用逻辑备库
value字段
参数的值
unit字段
各个参数的时间单元
TIME_COMPUTED字段
物理备库上估算各个参数的本地时间
检查进程
select process,status,client_process,thread#,sequence# from v$managed_standby;输出如下:
sys@FUWA 2024-08-14 09:26:11> select process,status,client_process,thread#,sequence# from v$managed_standby;
PROCESS STATUS CLIENT_P THREAD# SEQUENCE#
--------- ------------ -------- ---------- ----------
ARCH CLOSING ARCH 1 101
ARCH CONNECTED ARCH 0 0
ARCH CLOSING ARCH 1 97
ARCH CLOSING ARCH 1 102
ARCH CLOSING ARCH 1 103
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
ARCH CONNECTED ARCH 0 0
RFS IDLE ARCH 0 0
MRP0 WAIT_FOR_LOG N/A 1 104
RFS IDLE UNKNOWN 0 0
RFS IDLE UNKNOWN 0 0
RFS IDLE UNKNOWN 0 0
RFS IDLE LGWR 1 104
RFS IDLE UNKNOWN 0 0
RFS IDLE UNKNOWN 0 0
37 rows selected.
MRP0、RFS进程存在。
注意:
有的模拟场景MRP0进程不存在且apply lag 的 value 为空。
解决办法
将主库的密码文件传输至备库并重命名
主库
将主库的密码文件传输至备库
scp $ORACLE_HOME/dbs/orapwfuwa oracle@192.168.40.56:/data/u01/app/oracle/product/11.2.0/db/dbs/备库
备份原密码文件,重命名主库传输至备库的密码文件
su - oracle
cd $ORACLE_HOME/dbs/
mv orapwfuwa orapwwafu再次查看主备库同步状态,恢复正常。
总结
sys主备库密码不一致不同版本报错不一样:
Oracle 版本:11.2.0.3 报错为:ORA-01031: insufficient privileges
Oracle 版本:11.2.0.4 报错为:ORA-16191: Primary log shipping client not logged on standby
11G 修改sys 或密码文件均需要同步至备库,否则将影响同步,12c以Oracle推出了db passwd asm存储的特性,RAC TO RAC ADG的情况下,DB的口令文件存储在ASM中,那么主库修改的操作,可以同步到DG环境,避免每次修改都需要人为修改!
在密码不一致的情况下,主库报错:ORA-01031或ORA-16191后,备库v$dataguard_stats视图:apply lag 的值依然:+00 00:00:00。
进程RFS、MRP0未见异常,可能随着时间长会有更新变动。但根据v$dataguard_stats 的apply lag 值做同步时间监控,就不是很敏感了。
从测试上来看,密码修改或密码文件清除,如果不立即启动重传归档的话,不影响归档传输。案例的报错应该是之前修改过sys密码,没有同步密码文件。本身库数据变化就少,归档也没多少,所以等了几天后触发了重传归档,才有了0点才出现报错的问题。
参考链接:DG 同步延迟之奇怪的经典报错:ORA-16191 - 墨天轮 (modb.pro)





