2020-05-18
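RAC node 1 frequently restarts on its own; neither the crsd log nor the alert log on node 1 shows a specific cause, but gipcd.log keeps repeating the same messages in a loop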
A question for the experts:
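Our situation is as follows: we run an 11.2.0.3 RAC database on RHEL 6.2 servers and are seeing these problems:
- Server 1 frequently reboots on its own, or hangs for no apparent reason;
- OSW monitoring at the time of each failure shows CPU, memory, and disk all normal;
- Neither the alert log nor the crsd log on node 1 shows an obvious cause;
- The crsd log does contain the following message: 2020-05-18 08:41:23.282: [GIPCXCPT][2775185152] gipchaInternalResolve: failed to resolve ret gipcretKeyNotFound (36), host ‘xxxxxx’, port ‘5031-eeef-1b4a-9685’
- So I checked gipcd.log and found that it keeps repeating the following messages: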
2020-05-18 08:44:59.776: [GIPCDMON][1029220096] gipcdMonitorCssCheck: found node xxxxxxxnode1
2020-05-18 08:44:59.777: [GIPCDMON][1029220096] gipcdMonitorCssCheck: found node xxxxxxxnode2
2020-05-18 08:44:59.777: [GIPCDMON][1029220096] gipcdMonitorCssCheck: updating timeout node xxxxxxxnode2
2020-05-18 08:44:59.777: [GIPCDMON][1029220096] gipcdMonitorCssCheck: updating timeout node xxxxxxxnode2
2020-05-18 08:44:59.777: [GIPCDMON][1029220096] gipcdMonitorFailZombieNodes: skipping live node ‘xxxxxxxnode2’, time 0 ms, endp 0000000000000000, 0000000000000920
2020-05-18 08:44:59.777: [GIPCDMON][1029220096] gipcdMonitorFailZombieNodes: skipping live node ‘xxxxxxxnode2’, time 0 ms, endp 0000000000000000, 00000000000009db
2020-05-18 08:44:59.777: [GIPCDCLT][1033422592] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 0000000000000357
2020-05-18 08:44:59.777: [GIPCDCLT][1033422592] gipcdClientInterfaceMetrics: Received type(gipcdmsgtypeInterfaceMetrics), endp(0000000000000357), len(1032), buf(0x7fab34266fa8), inf(ip: 300.300.300.5:56171, mask: 255.255.255.0, subnet: 300.300.300.0, mac: , ifname: ) time(0), retry(0), stamp(15), send(15), recv(15)
2020-05-18 08:44:59.778: [GIPCDCLT][1033422592] gipcdClientInterfaceMetrics: enqueue local interface metrics (1) to worklist
2020-05-18 08:45:00.539: [GIPCDCLT][1033422592] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 0000000000000c6d
2020-05-18 08:45:00.539: [GIPCDCLT][1033422592] gipcdClientInterfaceMetrics: Received type(gipcdmsgtypeInterfaceMetrics), endp(0000000000000c6d), len(1032), buf(0x7fab34266fa8), inf(ip: 300.300.300.5:41064, mask: 255.255.255.0, subnet: 300.300.300.0, mac: , ifname: ) time(0), retry(0), stamp(0), send(0), recv(0)
2020-05-18 08:45:00.539: [GIPCDCLT][1033422592] gipcdClientInterfaceMetrics: enqueue local interface metrics (1) to worklist
2020-05-18 08:45:02.916: [GIPCDCLT][1033422592] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 0000000000000129
2020-05-18 08:45:02.916: [GIPCDCLT][1033422592] gipcdClientInterfaceMetrics: Received type(gipcdmsgtypeInterfaceMetrics), endp(0000000000000129), len(1032), buf(0x7fab34266fa8), inf(ip: 300.300.300.5:10654, mask: 255.255.255.0, subnet: 300.300.300.0, mac: , ifname: ) time(10), retry(0), stamp(3), send(3), recv(3)
2020-05-18 08:45:02.916: [GIPCDCLT][1033422592] gipcdClientInterfaceMetrics: enqueue local interface metrics (1) to worklist
2020-05-18 08:45:03.342: [GIPCDCLT][1033422592] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 000000000000088b
2020-05-18 08:45:03.342: [GIPCDCLT][1033422592] gipcdClientInterfaceMetrics: Received type(gipcdmsgtypeInterfaceMetrics), endp(000000000000088b), len(1032), buf(0x7fab340b3398), inf(ip: 300.300.300.5:34596, mask: 255.255.255.0, subnet: 300.300.300.0, mac: , ifname: ) time(0), retry(0), stamp(0), send(0), recv(0)
2020-05-18 08:45:03.342: [GIPCDCLT][1033422592] gipcdClientInterfaceMetrics: enqueue local interface metrics (1) to worklist
2020-05-18 08:45:04.037: [ CLSINET][1029220096] Returning NETDATA: 1 interfaces
2020-05-18 08:45:04.037: [ CLSINET][1029220096] # 0 Interface ‘bond0’,ip=‘300.300.300.5’,mac=‘00-e0-ed-28-80-d0’,mask=‘255.255.255.0’,net=‘300.300.300.0’,use=‘cluster_interconnect’
2020-05-18 08:45:04.777: [GIPCDCLT][1033422592] gipcdClientThread: req from local client of type gipcdmsgtypeInterfaceMetrics, endp 0000000000000408
2020-05-18 08:45:04.778: [GIPCDCLT][1033422592] gipcdClientInterfaceMetrics: Received type(gipcdmsgtypeInterfaceMetrics), endp(0000000000000408), len(1032), buf(0x7fab340b3398), inf(ip: 300.300.300.5:32930, mask: 255.255.255.0, subnet: 300.300.300.0, mac: , ifname: ) time(0), retry(0), stamp(0), send(0), recv(0)
2020-05-18 08:45:04.778: [GIPCDCLT][1033422592] gipcdClientInterfaceMetrics: enqueue local interface metrics (1)
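I have suspected all along that private-network (interconnect) communication between the two nodes is at fault, but OSW monitoring shows the link as healthy the whole time; connectivity is lost only at the moment the node restarts. The nodes can also ping each other and connect over ssh without problems.
Does anyone have ideas on how to analyze this?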
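To make the repeating gipcd.log entries easier to compare across time, here is a minimal Python sketch that counts entries per component tag and pulls the numeric fields out of the gipcdClientInterfaceMetrics lines. The function name `summarize_gipcd` and the regular expressions are my own assumptions, written only against the line layout pasted above; they are not an Oracle tool or a documented log format, and the sketch does not interpret what the `time`, `stamp`, `send`, and `recv` counters mean. It expects each log entry on a single line, so entries that wrapped when pasted need to be rejoined first.

```python
import re
from collections import Counter

# Timestamp, component tag (may carry leading spaces, e.g. "[ CLSINET]"),
# thread id, and the rest of the message. Layout assumed from the pasted log.
ENTRY_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"\[\s*(\w+)\]\[(\d+)\]\s*(.*)$"
)

# Numeric fields of a gipcdClientInterfaceMetrics "Received" entry.
METRICS_RE = re.compile(
    r"inf\(ip: ([\d.]+):(\d+),.*?\) "
    r"time\((\d+)\), retry\((\d+)\), stamp\((\d+)\), "
    r"send\((\d+)\), recv\((\d+)\)"
)


def summarize_gipcd(lines):
    """Return (entries per component, list of parsed interface-metrics rows)."""
    counts = Counter()
    metrics = []
    for line in lines:
        entry = ENTRY_RE.match(line.strip())
        if entry is None:
            continue  # skip blank or unrecognized lines
        timestamp, component, _thread, message = entry.groups()
        counts[component] += 1
        found = METRICS_RE.search(message)
        if found:
            ip, port, time_, retry, stamp, send, recv = found.groups()
            metrics.append({
                "timestamp": timestamp,
                "ip": ip,
                "port": int(port),
                "time": int(time_),
                "retry": int(retry),
                "stamp": int(stamp),
                "send": int(send),
                "recv": int(recv),
            })
    return counts, metrics


# Three entries copied from the log above (wrapped line rejoined).
SAMPLE = [
    "2020-05-18 08:44:59.776: [GIPCDMON][1029220096] gipcdMonitorCssCheck: found node xxxxxxxnode1",
    "2020-05-18 08:44:59.777: [GIPCDCLT][1033422592] gipcdClientInterfaceMetrics: "
    "Received type(gipcdmsgtypeInterfaceMetrics), endp(0000000000000357), len(1032), "
    "buf(0x7fab34266fa8), inf(ip: 300.300.300.5:56171, mask: 255.255.255.0, "
    "subnet: 300.300.300.0, mac: , ifname: ) time(0), retry(0), stamp(15), send(15), recv(15)",
    "2020-05-18 08:45:04.037: [ CLSINET][1029220096] Returning NETDATA: 1 interfaces",
]

if __name__ == "__main__":
    counts, metrics = summarize_gipcd(SAMPLE)
    print(dict(counts))
    for row in metrics:
        print(row)
```

Running the same function over gipcd.log from both nodes for the minutes before a restart would at least show whether the mix of messages or the extracted counters change ahead of the failure.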