
About the author
Li Wei (李伟), Senior Engineer, 北京科讯华通科技发展有限公司


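Alert
On generic Linux platforms, Oracle Database - Enterprise Edition 11.2.0.1 and later may experience IPC Send timeouts, node evictions, and similar failures accompanied by a high number of packet reassembly failures. This needs attention and preventive measures.
Alert level: High
Symptoms
After upgrading to 6.6 on Red Hat Enterprise Linux, or on Oracle Linux running the Red Hat Compatible Kernel, the database/node fails with messages such as: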
Errors in file xddv1covd/oracle/diag/rdbms/xrcovd/XRCOVD3/trace/XRCOVD3_lms0_28660.trc:
IPC Send timeout detected. Receiver ospid 28670 [oracle@xxxxx (LMS1)]
Fri May 01 03:05:53 2015
Errors in file xddv1covd/oracle/diag/rdbms/xrcovd/XRCOVD3/trace/XRCOVD3_lms1_28670.trc:
Fri May 01 03:06:00 2015
IPC Send timeout detected. Receiver ospid 31414 [oracle@xxxxx (PZ98)]
Fri May 01 03:06:00 2015
Errors in file xddv1covd/oracle/diag/rdbms/xrcovd/XRCOVD3/trace/XRCOVD3_pz98_31414.trc:
Fri May 01 03:06:13 2015
IPC Send timeout detected. Receiver ospid 1835 [oracle@xxxxx (PZ97)]
While this is happening, "netstat" shows a huge jump in the "packet reassembles failed" counter.
Before the issue, this counter is more or less stable or increases slowly.
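To watch for the jump described above, a small script can track the counter over time. The following is a minimal sketch, assuming the Linux kernel exposes the same counter as the `ReasmFails` field of the `Ip:` lines in `/proc/net/snmp` (the value that `netstat -s` prints as "packet reassembles failed"). The function names and the threshold of 1000 are hypothetical and chosen only for illustration.

```python
def read_reasm_fails(snmp_text: str) -> int:
    """Return the Ip ReasmFails counter from /proc/net/snmp-style text.

    The file holds pairs of lines per protocol: a header line with field
    names and a value line with the matching numbers.
    """
    ip_lines = [line.split() for line in snmp_text.splitlines()
                if line.startswith("Ip:")]
    if len(ip_lines) < 2:
        raise ValueError("no Ip header/value pair found")
    header, values = ip_lines[0], ip_lines[1]
    # Skip the leading "Ip:" token and pair each field name with its value.
    fields = dict(zip(header[1:], values[1:]))
    return int(fields["ReasmFails"])


def is_jump(previous: int, current: int, threshold: int = 1000) -> bool:
    """Report whether the counter grew by at least `threshold` between samples.

    The default threshold is an arbitrary illustration value, not a
    recommendation from the referenced note.
    """
    return current - previous >= threshold
```

In use, one would read `/proc/net/snmp` at a fixed interval, pass each snapshot to `read_reasm_fails`, and compare consecutive values with `is_jump`; a suitable interval and threshold depend on the normal interconnect traffic of the cluster.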
Other symptoms:
1. Node eviction
2. After an instance/node eviction, the instance/node will not rejoin the cluster until the node where "packet reassembles failed" is occurring is rebooted
Solution
The issue is not fixed at the time of this writing. The temporary workaround is to enable jumbo frames
or
increase the values of the following kernel parameters as shown:
net.ipv4.ipfrag_high_thresh = 16M
net.ipv4.ipfrag_low_thresh = 15M
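These two kernel parameters are expressed in bytes, so 16M and 15M would normally be written out as 16777216 and 15728640 when set through sysctl. The sketch below shows that conversion and renders the corresponding lines; the helper names `to_bytes` and `sysctl_lines` are hypothetical and exist only for this example.

```python
from typing import List

# Binary multipliers for the size suffixes used above.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(size: str) -> int:
    """Convert a size such as '16M' to a byte count; plain digits pass through."""
    size = size.strip().upper()
    if size and size[-1] in UNITS:
        return int(size[:-1]) * UNITS[size[-1]]
    return int(size)


def sysctl_lines(high: str = "16M", low: str = "15M") -> List[str]:
    """Render the two fragment-reassembly thresholds with values in bytes."""
    high_bytes, low_bytes = to_bytes(high), to_bytes(low)
    # The low threshold should not exceed the high one.
    if low_bytes > high_bytes:
        raise ValueError("ipfrag_low_thresh must not exceed ipfrag_high_thresh")
    return [
        f"net.ipv4.ipfrag_high_thresh = {high_bytes}",
        f"net.ipv4.ipfrag_low_thresh = {low_bytes}",
    ]


if __name__ == "__main__":
    print("\n".join(sysctl_lines()))
```

The printed lines can typically be added to `/etc/sysctl.conf` and loaded with `sysctl -p`; whether the change should be persistent is a decision for the system administrator.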
References:
RHEL 6.6: IPC Send timeout/node eviction etc with high packet reassembles failure (Doc ID 2008933.1)




