暂无图片
暂无图片
5
暂无图片
暂无图片
暂无图片

诡异的ORA-17503: ksfdopn: 2 δDATA/jyc/PASSWORD/pwdjyc.263.1131705075

原创 jieguo 2023-03-28
1330

故障现象:

oracle linux 7.9+oracle 19.18 rac(集群和数据库补丁打了最新的20230117PSU)
突然发现连续几天节点1在凌晨2:05左右节点1oracle alert日志报错如下:报错8次。
image.png

2023-03-18T02:05:11.597975+08:00
Errors in file /oracle/app/oracle/diag/rdbms/jyc/jyc1/trace/jyc1_ora_19622.trc:
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/jyc/PASSWORD/pwdjyc.263.1131705075
ORA-27300: 操作系统系统相关操作: open 失败, 状态为: 13
ORA-27301: 操作系统故障消息: Permission denied
ORA-27302: 错误发生在: sskgmsmr_7
2023-03-18T02:05:11.602151+08:00
Errors in file /oracle/app/oracle/diag/rdbms/jyc/jyc1/trace/jyc1_ora_19622.trc:
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/jyc/PASSWORD/pwdjyc.263.1131705075
ORA-27300: 操作系统系统相关操作: open 失败, 状态为: 13
ORA-27301: 操作系统故障消息: Permission denied
ORA-27302: 错误发生在: sskgmsmr_7
ORA-01017: 用户名/口令无效; 登录被拒绝
2023-03-18T02:05:12.644246+08:00
Errors in file /oracle/app/oracle/diag/rdbms/jyc/jyc1/trace/jyc1_ora_19727.trc:
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/jyc/PASSWORD/pwdjyc.263.1131705075
ORA-27300: 操作系统系统相关操作: open 失败, 状态为: 13
ORA-27301: 操作系统故障消息: Permission denied
ORA-27302: 错误发生在: sskgmsmr_7
2023-03-18T02:05:12.650235+08:00
Errors in file /oracle/app/oracle/diag/rdbms/jyc/jyc1/trace/jyc1_ora_19727.trc:
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/jyc/PASSWORD/pwdjyc.263.1131705075
ORA-27300: 操作系统系统相关操作: open 失败, 状态为: 13
ORA-27301: 操作系统故障消息: Permission denied
ORA-27302: 错误发生在: sskgmsmr_7
ORA-01017: 用户名/口令无效; 登录被拒绝
2023-03-18T02:05:13.246784+08:00
Errors in file /oracle/app/oracle/diag/rdbms/jyc/jyc1/trace/jyc1_ora_19778.trc:
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/jyc/PASSWORD/pwdjyc.263.1131705075
ORA-27300: 操作系统系统相关操作: open 失败, 状态为: 13
ORA-27301: 操作系统故障消息: Permission denied
ORA-27302: 错误发生在: sskgmsmr_7
2023-03-18T02:05:13.251266+08:00
Errors in file /oracle/app/oracle/diag/rdbms/jyc/jyc1/trace/jyc1_ora_19778.trc:
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/jyc/PASSWORD/pwdjyc.263.1131705075
ORA-27300: 操作系统系统相关操作: open 失败, 状态为: 13
ORA-27301: 操作系统故障消息: Permission denied
ORA-27302: 错误发生在: sskgmsmr_7
ORA-01017: 用户名/口令无效; 登录被拒绝

Trace file /oracle/app/oracle/diag/rdbms/jyc/jyc1/trace/jyc1_ora_19622.trc
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.18.0.0.0
Build label:    RDBMS_19.18.0.0.0DBRU_LINUX.X64_230111
ORACLE_HOME:    /oracle/app/oracle/product/19c/dbhome_1
System name:	Linux
Node name:	rac1
Release:	5.4.17-2102.201.3.el7uek.x86_64
Version:	#2 SMP Fri Apr 23 09:05:55 PDT 2021
Machine:	x86_64
Instance name: jyc1
Redo thread mounted by this instance: 1
Oracle process number: 170
Unix process pid: 19622, image: oracle@rac1 (TNS V1-V3)


*** 2023-03-18T02:05:11.597802+08:00 (CDB$ROOT(1))
*** SESSION ID:(2826.63186) 2023-03-18T02:05:11.597839+08:00
*** CLIENT ID:() 2023-03-18T02:05:11.597845+08:00
*** SERVICE NAME:(SYS$USERS) 2023-03-18T02:05:11.597849+08:00
*** MODULE NAME:(sqlplus@rac1 (TNS V1-V3)) 2023-03-18T02:05:11.597854+08:00
*** ACTION NAME:() 2023-03-18T02:05:11.597859+08:00
*** CLIENT DRIVER:(SERVER) 2023-03-18T02:05:11.597863+08:00
*** CONTAINER ID:(1) 2023-03-18T02:05:11.597868+08:00
 
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/jyc/PASSWORD/pwdjyc.263.1131705075
ORA-27300: 操作系统系统相关操作: open 失败, 状态为: 13
ORA-27301: 操作系统故障消息: Permission denied
ORA-27302: 错误发生在: sskgmsmr_7
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/jyc/PASSWORD/pwdjyc.263.1131705075
ORA-27300: 操作系统系统相关操作: open 失败, 状态为: 13
ORA-27301: 操作系统故障消息: Permission denied
ORA-27302: 错误发生在: sskgmsmr_7
<error barrier> at 0x7ffd140a6538 placed kzia.c@2649
ORA-01017: 用户名/口令无效; 登录被拒绝
 

集群和数据库服务并没发现异常:

[oracle@rac1 trace]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.chad
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.net1.network
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
ora.ons
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
      3        ONLINE  OFFLINE                               STABLE
ora.DATA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rac1                     STABLE
ora.RECO.dg(ora.asmgroup)
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asm(ora.asmgroup)
      1        ONLINE  ONLINE       rac1                     Started,STABLE
      2        ONLINE  ONLINE       rac2                     Started,STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       rac1                     STABLE
      2        ONLINE  ONLINE       rac2                     STABLE
      3        OFFLINE OFFLINE                               STABLE
ora.jyc.db
      1        ONLINE  ONLINE       rac1                     Open,HOME=/oracle/ap
                                                             p/oracle/product/19c
                                                             /dbhome_1,STABLE
      2        ONLINE  ONLINE       rac2                     Open,HOME=/oracle/ap
                                                             p/oracle/product/19c
                                                             /dbhome_1,STABLE
ora.cvu
      1        ONLINE  ONLINE       rac1                     STABLE
ora.qosmserver
      1        ONLINE  ONLINE       rac1                     STABLE
ora.rac1.vip
      1        ONLINE  ONLINE       rac1                     STABLE
ora.rac2.vip
      1        ONLINE  ONLINE       rac2                     STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       rac1                     STABLE
--------------------------------------------------------------------------------
[oracle@rac1 trace]$ crsctl stat res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       rac1                     STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       rac1                     STABLE
ora.crf
      1        ONLINE  ONLINE       rac1                     STABLE
ora.crsd
      1        ONLINE  ONLINE       rac1                     STABLE
ora.cssd
      1        ONLINE  ONLINE       rac1                     STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       rac1                     STABLE
ora.ctssd
      1        ONLINE  ONLINE       rac1                     ACTIVE:0,STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       rac1                     STABLE
ora.evmd
      1        ONLINE  ONLINE       rac1                     STABLE
ora.gipcd
      1        ONLINE  ONLINE       rac1                     STABLE
ora.gpnpd
      1        ONLINE  ONLINE       rac1                     STABLE
ora.mdnsd
      1        ONLINE  ONLINE       rac1                     STABLE
ora.storage
      1        ONLINE  ONLINE       rac1                     STABLE
--------------------------------------------------------------------------------

故障分析:

第一感觉一定是数据库的定时任务引起,否则不可能这么规律。
第二想法是数据库集群服务都正常,为何会报这样的错误?权限肯定都是正常的,实在没道理。
查遍数据库里的所有任务,并没有异常。都是success的。

生成2:00-2:30的awr报告:符合trc报错特征的信息如下:
image.png
image.png
image.png

set line 160
set wrap off
col owner for a10
col JOB_NAME for a30
col errors for a30
col RUN_DURATION for a10
col STATUS for a10
select * from (SELECT INSTANCE_ID,STATUS,owner,JOB_NAME,RUN_DURATION,ACTUAL_START_DATE
FROM dba_scheduler_job_run_details order by actual_start_date desc) a where rownum < 50;

select job_name,ENABLED,STATE,RUN_COUNT,LAST_START_DATE,NEXT_RUN_DATE,LAST_RUN_DURATION from dba_scheduler_jobs;

参考:
	
New 11g Default Jobs (Doc ID 755838.1)

1. ORA$AUTOTASK_CLEAN
The job is created by the 11g script catmwin.sql - Catalog script for Maintenance WINdow
catmwin.sql defines maintenance window and stats collection job. This job is an autotask repository data ageing job and deletes obsolete AUTOTASK repository data. The delete of this old data will be performed by the procedure ora$age_autotask_data which will be executed by the job.

2. HM_CREATE_OFFLINE_DICTIONARY
The job is created by the 11g script catmwin.sql which mentions that this is a job for creation of offline dictionary for Database Repair Advisor.
From unpublished Note 602459.1 - 'What Is OFFLINE_DICTIONARY': 'The system job SYS.HM_CREATE_OFFLINE_DICTIONARY executes the dbms_hm.create_offline_dictionary package which creates a LogMiner offline dictionary in the ADR for DRA name translation service. The job for generating the logminer dictionary is scheduled during the maintenance window. This job can be disabled. '

3. DRA_REEVALUATE_OPEN_FAILURES
The job is created by the 11g script catmwin.sql which mentions that this is a job for reevaluate open failures for Database Recovery Advisor. In other words it is used to get assistance in case of loss or corruption of datafiles, redo log files or controlfiles.
The job executes the procedure dbms_ir.reevaluateopenfailures.
Note.466682.1 - 'Data Recovery Advisor -Reference Guide'.

4. MGMT_CONFIG_JOB - comes with the OCM(Oracle Configuration Manager) installation - This is a configuration collection job.
The job is created by the script ocmjb10.sql by running procedure 'ORACLE_OCM.MGMT_CONFIG.collect_config'.

5. MGMT_STATS_CONFIG_JOB
This is an OCM Statistics collection job created in ocmjb10.sql by running 'ORACLE_OCM.MGMT_CONFIG.collect_stats'.

6. BSLN_MAINTAIN_STATS_JOB
This job is a compute statistics job. This job runs the  BSLN_MAINTAIN_STATS_PROG program on the BSLN_MAINTAIN_STATS_SCHED schedule. The program BSLN_MAINTAIN_STATS_PROG will keep the default baseline's statistics up-to-date.

7. XMLDB_NFS_CLEANUP_JOB
The job is created in xdbu102.sql and runs the procedure dbms_xdbutil_int.cleanup_expired_nfsclients.

8. RSE$CLEAN_RECOVERABLE_SCRIPT - this job is STREAMS related.
It is created in execstr.sql and is new in 11.2. The script execstr.sql executes anonymous blocks for STReam. The job is responsible with the cleaning of recoverable scripts and calls 'sys.dbms_streams_auto_int.clean_recoverable_script;'.

9. SM$CLEAN_AUTO_SPLIT_MERGE - this job is STREAMS related. It is created in file execstr.sql and is new in 11.2 too. The job is responsible with the cleaning of auto split merge views.

image.png
image.png
image.png
image.png
尝试处理:在cdb下禁用在2点中启动的任务SM$CLEAN_AUTO_SPLIT_MERGE。
唯一有2点中开始的任务也显示是success的,尝试禁用,无效还原。

SQL> execute dbms_scheduler.disable('SM$CLEAN_AUTO_SPLIT_MERGE');

PL/SQL procedure successfully completed.

检查oracle metalink文章也没有找到符合的情况:
ORA-17503/ORA-27300/ORA-27301/ORA-27302 occurred if connected database via an os user who is not in “oinstall” group (Doc ID 2310640.1)

SP execution fails with error - ORA-29780: Unable To Connect To GPnP Daemon [CLSGPNP_ERR] in RAC Database. (Doc ID 2627124.1)
ORA-27300: OS system dependent operation:open failed with status: 13 (Doc ID 2347696.1)
甚至还尝试了如下操作,修改两节点/oracle/app/19c/grid/gpnp/节点名称/wallets目录权限,无效。(观察几天后还报错所以还原)
https://www.cnblogs.com/yuweijade/p/14026862.html
反复查阅资料发现:
查看ls -alt /tmp/下有个文件.orachk.loc的生成时间是在2:04左右的orachk相关名称的文件,时间上匹配。所以查阅了orachk相关的定时任务:orachk -autostatus果然时间上匹配到了,2:03开始执行。
image.png
再去查阅资料,发现有人也遇到过类似问题。
https://support.oracle.com/epmos/faces/CommunityDisplay?resultUrl=https%3A%2F%2Fcommunity.oracle.com%2Fmosc%2Fdiscussion%2Fcomment%2F16908759&_afrLoop=350147762629898&resultTitle=Re%3A+ora-17503+ora-27301+ora-1017+duration+10+second+at+02%3A04+every+day&commId=&displayIndex=1&_afrWindowMode=0&_adf.ctrl-state=1ajw0ejz6e_231#Comment_16908759

根本原因是run orachk with ORA-17503/ORA-27300/ORA-27301/ORA-27302

https://community.oracle.com/mosc/discussion/4523714/run-orachk-with-ora-17503-ora-27300-ora-27301-ora-27302
手动执行orachk,观察alert输出确定引起错误信息。
由于升级到当前最新版orachk23.2也未解决,并且orachk这个工具并不需要经常执行,一般初次安装的时候用来核查一下健康状况而已,所以暂时停用了该健康检查的任务。计划等后续再出最新版本后尝试升级看是否可修复该bug。

禁用任务命令:orachk -autostop

关于升级orachk请参看另外写的一篇文章:https://www.modb.pro/db/620052

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论