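The application suddenly started raising "No more data to read from socket" alerts. Checking the database, I found that instance 2 of the production RAC database had restarted. The ASM alert log shows the following: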
Thu Apr 11 17:38:58 2019
NOTE: ASM client epay2:epay disconnected unexpectedly.
NOTE: check client alert log.
NOTE: Trace records dumped in trace file /epayrac/grid/crs_base/diag/asm/+asm/+ASM2/trace/+ASM2_ora_27530.trc
Thu Apr 11 17:39:21 2019
NOTE: client epay2:epay registered, osid 5058, mbr 0x2
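The database alert log is attached.
The ASM process is only a victim here; the key is the ORA-07445 error raised before the failure. Please upload the following 3 files: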
/epayrac/oracle/diag/rdbms/epay/epay2/incident/incdir_272028/epay2_pmon_27464_i272028.trc
/epayrac/oracle/diag/rdbms/epay/epay2/trace/epay2_pmon_27464.trc
/epayrac/oracle/diag/rdbms/epay/epay2/trace/epay2_diag_27472_20190411173859.trc
Uploaded. Could you please take a look?
Node 1 is now showing new alert log messages:
Thu Apr 11 20:35:00 2019
Reconfiguration started (old inc 34, new inc 36)
List of instances:
1 2 (myinst: 1)
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Thu Apr 11 20:35:00 2019
LMS 4: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Thu Apr 11 20:35:00 2019
Thu Apr 11 20:35:00 2019
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Thu Apr 11 20:35:00 2019
LMS 5: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Thu Apr 11 20:35:00 2019
Thu Apr 11 20:35:00 2019
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Thu Apr 11 20:35:01 2019
minact-scn: Master returning as live inst:2 has inc# mismatch instinc:34 cur:36 errcnt:0
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
Node 2's alert log at the same time:
Thu Apr 11 20:34:59 2019
Reconfiguration started (old inc 34, new inc 36)
List of instances:
1 2 (myinst: 2)
Global Resource Directory frozen
Communication channels reestablished
Thu Apr 11 20:35:00 2019
* domain 0 valid = 1 according to instance 1
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Thu Apr 11 20:35:00 2019
Thu Apr 11 20:35:00 2019
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 3: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Thu Apr 11 20:35:00 2019
Thu Apr 11 20:35:00 2019
Thu Apr 11 20:35:00 2019
LMS 4: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Thu Apr 11 20:35:00 2019
LMS 5: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
I checked the instance startup time: the instance was not restarted before or after this point.
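When comparing the two nodes' excerpts above, the useful fields are the incarnation numbers and the instance list in the reconfiguration header. The sketch below pulls them out of alert-log text so the two sides can be compared. It is a hypothetical helper (`parse_reconfig` is not an Oracle tool) and its regexes only match the message formats quoted in this thread; the authoritative check for a restart is still the instance startup time (for example the `STARTUP_TIME` column of `GV$INSTANCE`).

```python
import re

# Matches e.g. "Reconfiguration started (old inc 34, new inc 36)".
RECONF_RE = re.compile(r"Reconfiguration started \(old inc (\d+), new inc (\d+)\)")
# Matches "List of instances:" followed by e.g. "1 2 (myinst: 1)".
MEMBERS_RE = re.compile(r"List of instances:\s*([\d ]+?)\s*\(myinst: (\d+)\)")


def parse_reconfig(alert_text):
    """Return (old_inc, new_inc, instances, myinst) for the first
    reconfiguration in an alert-log excerpt, or None if none is found.

    Hypothetical helper: it only understands the message formats quoted
    in this thread; it is not an Oracle-provided tool.
    """
    reconf = RECONF_RE.search(alert_text)
    if reconf is None:
        return None
    # Look for the member list only after the reconfiguration header.
    members = MEMBERS_RE.search(alert_text, reconf.end())
    if members is None:
        return None
    instances = [int(n) for n in members.group(1).split()]
    return (int(reconf.group(1)), int(reconf.group(2)),
            instances, int(members.group(2)))


if __name__ == "__main__":
    node1 = ("Reconfiguration started (old inc 34, new inc 36)\n"
             "List of instances:\n"
             " 1 2 (myinst: 1)\n")
    node2 = ("Reconfiguration started (old inc 34, new inc 36)\n"
             "List of instances:\n"
             " 1 2 (myinst: 2)\n")
    a, b = parse_reconfig(node1), parse_reconfig(node2)
    # Compare incarnation jump and member list as seen from both nodes.
    print(a, b, a[:3] == b[:3])
```

For the excerpts in this thread, both nodes report the same incarnation change (34 to 36) with instances 1 and 2 still listed, which fits the observation that this particular reconfiguration did not evict or restart either instance.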
A reconfiguration does not necessarily mean a restart. Please also provide the following log file:
/epayrac/oracle/diag/rdbms/epay/epay2/incident/incdir_272812/epay2_ora_2485_i272812.trc
The trace file has been uploaded.
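Judging from the trace, it was caused by a procedure that calls a SQL Server table that does not exist (via a transparent gateway?).
Check whether the object exists on the remote side and whether the privileges are correct.
Test access with a direct SQL query and fix the problem from there.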


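We do use a transparent gateway here; in every case Oracle calls tables on Sybase through a dblink. Querying through PL/SQL connected to each of the two nodes separately works fine (the objects exist and the privileges are correct). However, the alert log also shows the following error: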
Errors in file /epayrac/oracle/diag/rdbms/epay/epay2/trace/epay2_reco_5052.trc:
ORA-01017: invalid username/password; logon denied
[Oracle][ODBC Sybase Wire Protocol driver][Sybase ASE]Login Failed. Check for valid user ID, server name and password. {28000,NativeErr = 4002}
ORA-02063: preceding 2 lines from UMPS
PS: Why would a failed call cause the node to restart? Or is this a suspected split-brain?
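When an alert log is long, it can help to group the ORA- codes under the trace file named in each "Errors in file ...:" line, as in the RECO excerpt above, before deciding which trace to upload. The sketch below does that; `ora_errors_by_trace` is a hypothetical helper, not an Oracle tool, and it assumes the plain-text alert-log layout shown in this thread (a timestamp line starts a new entry and closes the current block).

```python
import re
from collections import defaultdict

# "Errors in file <trace>:" opens a block of errors for that trace file.
ERRFILE_RE = re.compile(r"^Errors in file (\S+):$")
# An ORA-nnnnn code at the start of a line, e.g. "ORA-01017: invalid ...".
ORA_RE = re.compile(r"^(ORA-\d{5})\b")
# Alert-log timestamp lines such as "Thu Apr 11 17:38:58 2019" end a block.
TS_RE = re.compile(r"^(Mon|Tue|Wed|Thu|Fri|Sat|Sun) \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$")


def ora_errors_by_trace(alert_text):
    """Map each trace file named in an 'Errors in file ...:' line to the
    ORA- codes that follow it. Hypothetical triage helper, not an Oracle tool."""
    result = defaultdict(list)
    current = None
    for raw in alert_text.splitlines():
        line = raw.strip()
        if TS_RE.match(line):
            current = None  # a new timestamped entry starts; close the block
            continue
        m = ERRFILE_RE.match(line)
        if m:
            current = m.group(1)
            continue
        code = ORA_RE.match(line)
        if code and current is not None:
            result[current].append(code.group(1))
    return dict(result)


if __name__ == "__main__":
    with open("alert_epay2.log") as f:  # hypothetical alert-log path
        for trace, codes in ora_errors_by_trace(f.read()).items():
            print(trace, codes)
```

Non-ORA lines inside a block (such as the ODBC driver message) are skipped, so only the error codes are collected per trace file.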
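kgxExclusive - kernel generic mutex/communication mutex get in X mode
This error is related to parsing. It is what made those sessions fail; PMON then failed while cleaning up the mutex, which caused the instance to restart.
There is nothing similar on MOS, so it may be a related bug. I suggest debugging that stored procedure or upgrading the transparent gateway software.
BTW:
A transparent gateway handles data exchange between different databases and is relatively unstable by nature. If possible, consider replicating the data locally with software such as OGG, which is easier to handle.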
