Oracle RAC hits the MTU pitfall again: cssd fails to start!

Original · Lucifer三思而后行 · 2025-11-11
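Preface

I recently deployed an Oracle 19c RAC environment. After installation the servers were shut down, packed, and shipped to a remote data center. Once they were racked, one node would not start properly: the cluster was stuck at the cssd resource startup phase. The interconnect (heartbeat) network runs over fiber through a dedicated VLAN on a switch, and the nodes could ping each other, yet RAC could only run on a single node. After a closer look, the fault was traced to an MTU configuration problem.

This post records the analysis and the fix in detail.

Symptoms

One node of the Oracle RAC failed to start, and the cluster status showed: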

[root@orcl02:/root]# crsctl stat res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE                               STABLE
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE                               STABLE
ora.crf
      1        ONLINE  ONLINE       orcl02                   STABLE
ora.crsd
      1        ONLINE  OFFLINE                               STABLE
ora.cssd
      1        ONLINE  OFFLINE      orcl02                   STARTING
ora.cssdmonitor
      1        ONLINE  ONLINE       orcl02                   STABLE
ora.ctssd
      1        ONLINE  OFFLINE                               STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       orcl02                   STABLE
ora.evmd
      1        ONLINE  INTERMEDIATE orcl02                   STABLE
ora.gipcd
      1        ONLINE  ONLINE       orcl02                   STABLE
ora.gpnpd
      1        ONLINE  ONLINE       orcl02                   STABLE
ora.mdnsd
      1        ONLINE  ONLINE       orcl02                   STABLE
ora.storage
      1        ONLINE  OFFLINE                               STABLE

The cluster hangs at the cssd resource startup stage.
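
To pick the stalled resources out of a listing like the one above, a small filter can help. This is a minimal sketch, not an Oracle tool: the function name `offline_resources` is hypothetical, and it assumes the layout shown above, where a resource name line is followed by numbered instance lines.

```shell
# offline_resources: read `crsctl stat res -t -init` output on stdin and print
# every resource whose target is ONLINE but whose state is something else.
# Assumption: each resource is a name line ("ora.xxx") followed by instance
# lines of the form "<n> <target> <state> [<server>] <details>".
# Lines without a leading instance number are ignored.
offline_resources() {
    awk '
        /^ora\./ { name = $1; next }      # remember the current resource name
        $1 ~ /^[0-9]+$/ && $2 == "ONLINE" && $3 != "ONLINE" {
            printf "%s %s\n", name, $3    # e.g. "ora.cssd OFFLINE"
        }
    '
}
```

A possible use is `crsctl stat res -t -init | offline_resources`, which for the listing above would include ora.cssd among the resources that have not reached ONLINE.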

Check the CRS alert log ($ORACLE_BASE/diag/crs/orcl02/crs/trace/alert.log):

2025-11-11 12:21:57.020 [GIPCD(127382)]CRS-7517: The Oracle Grid Interprocess Communication (GIPC) failed to identify the Fast Node Death Detection (FNDD).
2025-11-11 12:23:07.241 [OCSSD(130761)]CRS-1621: The IPMI configuration data for this node stored in the Oracle registry is incomplete; details at (:CSSNK00002:) in /u01/app/grid/diag/crs/orcl02/crs/trace/ocssd.trc
2025-11-11 12:23:07.241 [OCSSD(130761)]CRS-1617: The information required to do node kill for node orcl02 is incomplete; details at (:CSSNM00004:) in /u01/app/grid/diag/crs/orcl02/crs/trace/ocssd.trc
2025-11-11 12:23:37.324 [OCSSD(130761)]CRS-7500: The Oracle Grid Infrastructure process 'ocssd' failed to establish Oracle Grid Interprocess Communication (GIPC) high availability connection with remote node 'orcl01'.
2025-11-11 12:28:39.020 [OCSSD(130761)]CRS-7500: The Oracle Grid Infrastructure process 'ocssd' failed to establish Oracle Grid Interprocess Communication (GIPC) high availability connection with remote node 'orcl01'.
2025-11-11 12:32:58.470 [OCSSD(130761)]CRS-1609: This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; details at (:CSSNM00086:) in /u01/app/grid/diag/crs/orcl02/crs/trace/ocssd.trc.
2025-11-11 12:32:58.469 [CSSDAGENT(130651)]CRS-5818: Aborted command 'start' for resource 'ora.cssd'. Details at (:CRSAGF00113:) {0:5:4} in /u01/app/grid/diag/crs/orcl02/crs/trace/ohasd_cssdagent_root.trc.
2025-11-11 12:32:58.506 [OHASD(126708)]CRS-2757: Command 'Start' timed out waiting for response from the resource 'ora.cssd'. Details at (:CRSPE00221:) {0:5:4} in /u01/app/grid/diag/crs/orcl02/crs/trace/ohasd.trc.
2025-11-11 12:32:59.470 [OCSSD(130761)]CRS-1656: The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/grid/diag/crs/orcl02/crs/trace/ocssd.trc
2025-11-11 12:32:59.470 [OCSSD(130761)]CRS-1603: CSSD on node orcl02 has been shut down.
2025-11-11 12:33:00.151 [OCSSD(130761)]CRS-1609: This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; details at (:CSSNM00086:) in /u01/app/grid/diag/crs/orcl02/crs/trace/ocssd.trc.
2025-11-11T12:33:04.480316+08:00
Errors in file /u01/app/grid/diag/crs/orcl02/crs/trace/ocssd.trc (incident=17):
CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: /u01/app/grid/diag/crs/orcl02/crs/incident/incdir_17/ocssd_i17.trc
2025-11-11 12:33:04.471 [OCSSD(130761)]CRS-8503: Oracle Clusterware process OCSSD with operating system process ID 130761 experienced fatal signal or exception code 6.

The log clearly shows that node orcl02 cannot establish a GIPC high availability connection with node orcl01.
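
When an alert log is long, a tally of message codes makes repeated errors such as CRS-7500 stand out. The sketch below uses only standard text tools; the function name `crs_code_counts` is hypothetical.

```shell
# crs_code_counts: read a Clusterware alert log on stdin and print each
# CRS-nnnn message code followed by its number of occurrences,
# most frequent first (ties ordered by code).
crs_code_counts() {
    grep -oE 'CRS-[0-9]+' \
        | sort \
        | uniq -c \
        | sort -k1,1nr -k2,2 \
        | awk '{ print $2, $1 }'
}
```

It could be run as `crs_code_counts < $ORACLE_BASE/diag/crs/orcl02/crs/trace/alert.log`, using the log path given above.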

Next, look at the cssd trace ($ORACLE_BASE/diag/crs/orcl02/crs/trace/ocssd.trc):

2025-11-11 13:07:10.563 : CSSD:909854464: [ INFO] clssnmvDHBValidateNCopy: node 1, orcl01, has a disk HB, but no network HB, DHB has rcfg 658329638, wrtcnt, 36406, LATS 5334194, lastSeqNo 36403, uniqueness 1762834827, timestamp 1762837626/5323124
2025-11-11 13:07:10.564 : CSSD:897136384: [ INFO] clssscSelect: gipcwait returned with status gipcretTimeout (16)

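orcl02 can see orcl01's disk heartbeat but not its network heartbeat, so the shared storage is reachable and the problem lies in communication over the interconnect.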
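
The "has a disk HB, but no network HB" message is worth searching for directly. A minimal sketch, assuming the message wording shown in the trace above; the function name `nodes_without_network_hb` is hypothetical.

```shell
# nodes_without_network_hb: read ocssd.trc on stdin and print the unique names
# of remote nodes reported as "has a disk HB, but no network HB".
# Assumption: the message format matches the clssnmvDHBValidateNCopy line
# quoted in this article.
nodes_without_network_hb() {
    sed -n 's/.*clssnmvDHBValidateNCopy: node [0-9][0-9]*, \([^,]*\), has a disk HB, but no network HB.*/\1/p' \
        | sort -u
}
```

Any node name printed here is one whose voting-disk heartbeat arrives while its interconnect heartbeat does not, which narrows the search to the private network.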

Analysis

Initial checks

First, check basic network connectivity:

[root@orcl02:/tmp/mcasttest]# ping orcl01-priv
PING orcl01-priv (1.1.1.1) 56(84) bytes of data.
64 bytes from orcl01-priv (1.1.1.1): icmp_seq=1 ttl=64 time=0.053 ms
64 bytes from orcl01-priv (1.1.1.1): icmp_seq=2 ttl=64 time=0.044 ms
64 bytes from orcl01-priv (1.1.1.1): icmp_seq=3 ttl=64 time=0.116 ms

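The private IP responds to ping, and the firewall is disabled. Note that a default ping carries only a 56-byte payload (an 84-byte IP packet), so this result says nothing about whether larger packets get through.

MOS reference

A search of MOS turned up a similar case: OCI DBCS : Failed to start CRS on first RAC node - (GIPC) failed to identify the Fast Node Death Detection (FNDD). (Doc ID 2969313.1)

That case attributes the problem to inconsistent MTU settings between the nodes.

MTU configuration check

Check the MTU setting on both nodes: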

## Node 1
[root@orcl01:/home/grid]# ifconfig bond1|grep mtu
bond1: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST> mtu 9000
## Node 2
[root@orcl02:/home/grid]$ ifconfig bond1|grep mtu
bond1: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST> mtu 9000

The MTU is the same on both nodes: 9000.
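
For a scripted comparison, the MTU value can be pulled out of the ifconfig line. A minimal sketch, assuming the `mtu <number>` format shown above; the function name `mtu_from_ifconfig` is hypothetical.

```shell
# mtu_from_ifconfig: read `ifconfig <iface>` output on stdin and print the
# interface MTU. Assumption: the flags line contains "mtu <number>", as in
# the bond1 output shown in this article.
mtu_from_ifconfig() {
    sed -n 's/.*[[:space:]]mtu \([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

On each node, `ifconfig bond1 | mtu_from_ifconfig` should print the same number. This compares only the host side: matching values on the two servers say nothing about the devices between them.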

CVU check

Use the Oracle Cluster Verification Utility (CVU) to verify network connectivity between the RAC nodes:

[grid@orcl01:/home/grid]$ cluvfy comp nodecon -n all -verbose

Performing following verification checks ...

  Node Connectivity ...
    Hosts File ...
  Node Name                             Status
  ------------------------------------  ------------------------
  orcl01                                passed
  orcl02                                passed
    Hosts File ...PASSED

Interface information for node "orcl02"

 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 bond0  192.168.6.206   192.168.6.0     0.0.0.0         192.168.6.1     8C:84:74:75:EF:00 1500
 bond1  1.1.1.2         1.1.1.0         0.0.0.0         192.168.6.1     8C:84:74:DA:B8:E0 9000

Interface information for node "orcl01"

 Name   IP Address      Subnet          Gateway         Def. Gateway    HW Address        MTU
 ------ --------------- --------------- --------------- --------------- ----------------- ------
 bond0  192.168.6.205   192.168.6.0     0.0.0.0         192.168.6.1     8C:84:74:75:F2:00 1500
 bond0  192.168.6.207   192.168.6.0     0.0.0.0         192.168.6.1     8C:84:74:75:F2:00 1500
 bond0  192.168.6.209   192.168.6.0     0.0.0.0         192.168.6.1     8C:84:74:75:F2:00 1500
 bond0  192.168.6.208   192.168.6.0     0.0.0.0         192.168.6.1     8C:84:74:75:F2:00 1500
 bond1  1.1.1.1         1.1.1.0         0.0.0.0         192.168.6.1     8C:84:74:DA:D3:10 9000

Check: MTU consistency on the private interfaces of subnet "1.1.1.0"

  Node              Name          IP Address    Subnet        MTU
  ----------------  ------------  ------------  ------------  ----------------
  orcl02            bond1         1.1.1.2       1.1.1.0       9000
  orcl01            bond1         1.1.1.1       1.1.1.0       9000

Check: MTU consistency of the subnet "192.168.6.0".

  Node              Name          IP Address    Subnet        MTU
  ----------------  ------------  ------------  ------------  ----------------
  orcl02            bond0         192.168.6.206 192.168.6.0   1500
  orcl01            bond0         192.168.6.205 192.168.6.0   1500
  orcl01            bond0         192.168.6.207 192.168.6.0   1500
  orcl01            bond0         192.168.6.209 192.168.6.0   1500
  orcl01            bond0         192.168.6.208 192.168.6.0   1500

  Source                          Destination                     Connected?
  ------------------------------  ------------------------------  ----------------
  orcl01[bond0:192.168.6.205]     orcl02[bond0:192.168.6.206]     yes
  orcl01[bond0:192.168.6.205]     orcl01[bond0:192.168.6.207]     yes
  orcl01[bond0:192.168.6.205]     orcl01[bond0:192.168.6.209]     yes
  orcl01[bond0:192.168.6.205]     orcl01[bond0:192.168.6.208]     yes
  orcl02[bond0:192.168.6.206]     orcl01[bond0:192.168.6.207]     yes
  orcl02[bond0:192.168.6.206]     orcl01[bond0:192.168.6.209]     yes
  orcl02[bond0:192.168.6.206]     orcl01[bond0:192.168.6.208]     yes
  orcl01[bond0:192.168.6.207]     orcl01[bond0:192.168.6.209]     yes
  orcl01[bond0:192.168.6.207]     orcl01[bond0:192.168.6.208]     yes
  orcl01[bond0:192.168.6.209]     orcl01[bond0:192.168.6.208]     yes

  Source                          Destination                     Connected?
  ------------------------------  ------------------------------  ----------------
  orcl01[bond1:1.1.1.1]           orcl02[bond1:1.1.1.2]           yes
    Check that maximum (MTU) size packet goes through subnet ...FAILED (PRVG-12885, PRVG-12884, PRVG-2043)
    subnet mask consistency for subnet "192.168.6.0" ...PASSED
    subnet mask consistency for subnet "1.1.1.0" ...PASSED
  Node Connectivity ...FAILED (PRVG-12885, PRVG-12884, PRVG-2043)
  Multicast or broadcast check ...
Checking subnet "1.1.1.0" for multicast communication with multicast group "224.0.0.251"
  Multicast or broadcast check ...PASSED

Verification of node connectivity was unsuccessful on all the specified nodes.

Failures were encountered during execution of CVU verification request "node connectivity".

Node Connectivity ...FAILED
  Check that maximum (MTU) size packet goes through subnet ...FAILED
  PRVG-12885 : ICMP packet of MTU size "9000" does not go through subnet "1.1.1.0".
  PRVG-12884 : Maximum (MTU) size packet check failed on subnets "1.1.1.0"

  orcl01: PRVG-2043 : Command "/bin/ping 1.1.1.2 -c 1 -w 3 -M do -s 8972 " failed on node "orcl01" and produced the following output:
          PING 1.1.1.2 (1.1.1.2) 8972(9000) bytes of data.

          --- 1.1.1.2 ping statistics ---
          3 packets transmitted, 0 received, 100% packet loss, time 2074ms

CVU operation performed:      node connectivity
Date:                         Nov 11, 2025 1:10:21 PM
CVU version:                  19.28.0.0.0 (070125x8664)
Clusterware version:          19.0.0.0.0
CVU home:                     /u01/app/19.3.0/grid
Grid home:                    /u01/app/19.3.0/grid
User:                         grid
Operating system:             Linux4.18.0-553.el8_10.x86_64

CVU found that packets of MTU size 9000 cannot get through the private subnet:

Node Connectivity ...FAILED
  Check that maximum (MTU) size packet goes through subnet ...FAILED
  PRVG-12885 : ICMP packet of MTU size "9000" does not go through subnet "1.1.1.0".
  PRVG-12884 : Maximum (MTU) size packet check failed on subnets "1.1.1.0"

  orcl01: PRVG-2043 : Command "/bin/ping 1.1.1.2 -c 1 -w 3 -M do -s 8972 " failed on node "orcl01" and produced the following output:
          PING 1.1.1.2 (1.1.1.2) 8972(9000) bytes of data.

          --- 1.1.1.2 ping statistics ---
          3 packets transmitted, 0 received, 100% packet loss, time 2074ms
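
The `-s 8972` in CVU's ping is not arbitrary: it is the largest ICMP payload that fits in a single 9000-byte IPv4 packet once the 20-byte IPv4 header (without options) and the 8-byte ICMP header are subtracted, which is why ping reports "8972(9000) bytes". A small sketch of that arithmetic; the function name `icmp_payload_for_mtu` is hypothetical.

```shell
# icmp_payload_for_mtu MTU: print the largest ICMP echo payload that fits in
# one unfragmented IPv4 packet of the given MTU.
# 20 bytes = IPv4 header without options, 8 bytes = ICMP echo header.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

icmp_payload_for_mtu 9000   # payload for a jumbo-frame MTU
icmp_payload_for_mtu 1500   # payload for the standard Ethernet MTU
```

The two calls print 8972 and 1472, the values to pass to `ping -s` when testing MTU 9000 and MTU 1500.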

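MTU test verification

Run the MTU-size test by hand, based on the command CVU used (/bin/ping 1.1.1.2 -c 1 -w 3 -M do -s 8972). Here -M do forbids fragmentation and -s sets the ICMP payload size: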

# Test MTU 9000 - fails
[grid@orcl01:/home/grid]$ ping 1.1.1.2 -c 3 -M do -s 8972
PING 1.1.1.2 (1.1.1.2) 8972(9000) bytes of data.
^C
--- 1.1.1.2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1021ms

# Test MTU 1500 - succeeds
[grid@orcl01:/home/grid]$ ping 1.1.1.2 -c 3 -M do -s 1472
PING 1.1.1.2 (1.1.1.2) 1472(1500) bytes of data.
1480 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.177 ms
1480 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.127 ms
^C
--- 1.1.1.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1032ms
rtt min/avg/max/mdev = 0.127/0.152/0.177/0.025 ms

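The results are clear: traffic at MTU 1500 works, while MTU 9000 sees 100% packet loss. The network engineer then confirmed that jumbo frame support had not been configured on the switch. Case closed!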
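
When a jumbo ping fails, it can be useful to find the largest size the path does carry instead of testing only two values. The following is a rough sketch of a binary search over payload sizes, assuming the Linux iputils ping options already used above (-M do, -s); the function names `probe` and `max_payload` and the 1-second timeout are hypothetical choices.

```shell
# probe SIZE HOST: succeed if one unfragmentable ICMP echo with SIZE payload
# bytes gets a reply from HOST within 1 second.
probe() {
    ping -c 1 -W 1 -M do -s "$1" "$2" >/dev/null 2>&1
}

# max_payload HOST [LOW] [HIGH]: binary-search the largest payload between
# LOW and HIGH (default 0..8972) that passes; prints -1 if none does.
# Assumes that if a size passes, every smaller size passes too.
max_payload() {
    local host=$1 low=${2:-0} high=${3:-8972} mid best=-1
    while [ "$low" -le "$high" ]; do
        mid=$(( (low + high) / 2 ))
        if probe "$mid" "$host"; then
            best=$mid
            low=$(( mid + 1 ))
        else
            high=$(( mid - 1 ))
        fi
    done
    echo "$best"
}
```

Running `max_payload 1.1.1.2` and adding 28 to the result gives the effective path MTU. A result near 1472 on an interface set to 9000 would point at a device in the middle that is still at the standard 1500.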

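Solution

Switch configuration

Ask the network engineer to enable jumbo frame support on the switch, with the MTU set to a value greater than 9000. Once the switch has been configured, test again: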

[grid@orcl01:/home/grid]$ /bin/ping 1.1.1.2 -c 1 -w 3 -M do -s 8972
PING 1.1.1.2 (1.1.1.2) 8972(9000) bytes of data.
8980 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.232 ms

--- 1.1.1.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.232/0.232/0.232/0.000 ms

The MTU 9000 test now succeeds.
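
One plausible reason for asking for a switch MTU above 9000, offered here as an assumption since the article does not say and vendors count MTU differently, is that some switches measure the whole Ethernet frame and not just the IP packet. The sketch below adds the standard Ethernet overhead to an IP MTU; the function name `ethernet_frame_size` is hypothetical.

```shell
# ethernet_frame_size IP_MTU [VLAN_TAGS]: print the on-wire Ethernet frame
# size (without preamble) needed to carry an IP packet of IP_MTU bytes.
# 14 bytes = Ethernet header, 4 bytes = frame check sequence,
# 4 bytes per 802.1Q VLAN tag (default: no tag).
ethernet_frame_size() {
    local ip_mtu=$1 tags=${2:-0}
    echo $(( ip_mtu + 14 + 4 + 4 * tags ))
}

ethernet_frame_size 9000      # untagged frame
ethernet_frame_size 9000 1    # frame with one VLAN tag
```

These print 9018 and 9022. Whether a particular switch expects 9000, 9018, 9022, or a larger built-in jumbo value depends on the device, so the final setting is best confirmed with the end-to-end ping shown above.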

Restart the cluster services

Restart the cluster services on both nodes:

crsctl stop crs -f
crsctl start crs

Check the cluster status:

[root@orcl01:/soft]# crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       orcl01                   STABLE
               ONLINE  ONLINE       orcl02                   STABLE
ora.chad
               ONLINE  ONLINE       orcl01                   STABLE
               ONLINE  ONLINE       orcl02                   STABLE
ora.net1.network
               ONLINE  ONLINE       orcl01                   STABLE
               ONLINE  ONLINE       orcl02                   STABLE
ora.ons
               ONLINE  ONLINE       orcl01                   STABLE
               ONLINE  ONLINE       orcl02                   STABLE
ora.proxy_advm
               OFFLINE OFFLINE      orcl01                   STABLE
               OFFLINE OFFLINE      orcl02                   STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ARCH.dg(ora.asmgroup)
      1        ONLINE  ONLINE       orcl01                   STABLE
      2        ONLINE  ONLINE       orcl02                   STABLE
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       orcl01                   STABLE
      2        ONLINE  ONLINE       orcl02                   STABLE
ora.DATA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       orcl01                   STABLE
      2        ONLINE  ONLINE       orcl02                   STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       orcl02                   STABLE
ora.OCR.dg(ora.asmgroup)
      1        ONLINE  ONLINE       orcl01                   STABLE
      2        ONLINE  ONLINE       orcl02                   STABLE
ora.asm(ora.asmgroup)
      1        ONLINE  ONLINE       orcl01                   Started,STABLE
      2        ONLINE  ONLINE       orcl02                   Started,STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       orcl01                   STABLE
      2        ONLINE  ONLINE       orcl02                   STABLE
ora.cvu
      1        ONLINE  ONLINE       orcl02                   STABLE
ora.orcl.db
      1        ONLINE  ONLINE       orcl01                   Open,HOME=/u01/app/o
                                                             racle/product/19.3.0
                                                             /db,STABLE
      2        ONLINE  ONLINE       orcl02                   Open,HOME=/u01/app/o
                                                             racle/product/19.3.0
                                                             /db,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       orcl02                   STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       orcl02                   STABLE
ora.orcl01.vip
      1        ONLINE  ONLINE       orcl01                   STABLE
ora.orcl02.vip
      1        ONLINE  ONLINE       orcl02                   STABLE
--------------------------------------------------------------------------------

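All cluster resources started normally, and the problem is solved.

How the MTU setting works

The interconnect NIC defaults to an MTU of 1500, and so does the switch. If the NIC is raised to MTU 9000 at the operating system level while the switch is left unchanged, the switch drops every frame larger than it accepts. Small packets such as a default ping still get through, but full-size 9000-byte packets see 100% loss.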
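
Put differently, the usable MTU of a path is the smallest MTU of any device along it. A minimal sketch of that rule; the function name `effective_path_mtu` is hypothetical, and the hop values in the example are the ones from this incident (host, switch, host).

```shell
# effective_path_mtu MTU...: print the smallest of the given per-hop MTUs,
# which is the largest packet the whole path can carry without fragmentation.
effective_path_mtu() {
    local min=$1 m
    for m in "$@"; do
        if [ "$m" -lt "$min" ]; then
            min=$m
        fi
    done
    echo "$min"
}

effective_path_mtu 9000 1500 9000   # host NIC, switch, host NIC
```

The example prints 1500: the hosts are configured for 9000, but the path as a whole can only carry 1500-byte packets until the switch is changed.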

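Why does Oracle recommend an MTU of 9000 for the interconnect NIC?

See the MOS document: Recommendation for the Real Application Cluster Interconnect and Jumbo Frames (Doc ID 341788.1)

Benefits of jumbo frames

  1. Less protocol overhead: lower TCP, UDP, and Ethernet header overhead
  2. Higher throughput: avoids packet fragmentation and improves transfer efficiency
  3. Lower latency: fewer buffer transfers and shorter Oracle block transfer latency
  4. CPU savings: noticeably better performance in CPU-bound scenarios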
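
The fragmentation point in the list above can be made concrete with a little arithmetic. The sketch below counts the IPv4 fragments needed for one UDP datagram at a given MTU. The function name `ipv4_fragments` is hypothetical, and the 8192-byte figure (a common database block size, sent as a single datagram with no extra message headers) is an assumption for illustration, not a value from the article.

```shell
# ipv4_fragments DATAGRAM_BYTES MTU: print how many IPv4 packets are needed
# to carry one UDP datagram with DATAGRAM_BYTES of payload.
# Assumes a 20-byte IPv4 header without options and an 8-byte UDP header.
ipv4_fragments() {
    local total=$(( $1 + 8 ))          # UDP payload plus UDP header
    local cap=$(( $2 - 20 ))           # IP payload capacity of one packet
    local per_frag=$(( cap / 8 * 8 ))  # non-final fragments carry a multiple of 8 bytes
    if [ "$total" -le "$cap" ]; then
        echo 1                         # fits in a single packet
    else
        # the final packet can hold up to cap bytes, earlier ones per_frag each
        echo $(( 1 + (total - cap + per_frag - 1) / per_frag ))
    fi
}

ipv4_fragments 8192 1500   # standard MTU
ipv4_fragments 8192 9000   # jumbo frames
```

Under these assumptions the calls print 6 and 1: at MTU 1500 the datagram is split into six fragments that must all arrive and be reassembled, while at MTU 9000 it travels as a single packet.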

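To change the private network MTU, see the MOS document: 如何在 oracle 集群环境下修改私网信息 ("How to modify private network information in an Oracle cluster environment", Doc ID 2103317.1).

Closing thoughts

The root cause of this failure was an inconsistent end-to-end MTU configuration. The node operating systems were set to a 9000-byte MTU, but the network device between them kept the default 1500 bytes, so large packets could not get through.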

Last modified: 2026-04-19 15:21:08