
11.2.0.3 CRS starts slow and cssd.log shows 'Msg-reply has soap fault 10' — a case study

Original post by Anbob, 2016-11-01
An 11.2 RAC environment hit a very strange problem: every time CRS on node 1 was restarted, it took about 30 minutes to come up, and during that window the logs showed nothing but repeated retries in the GPnP startup phase. The symptoms looked very much like MOS bug 12356910, but the problem persisted after installing that patch. We also raised the trace level of the process (crsctl set log gpnp "GPNP=5") without finding the root cause. Once CRS was finally up, gpnptool find could retrieve the list of the other nodes, and no cluster with the same name was found on the network. We even did a Deconfigure/Reconfigure of CRS and, in the end, moved to a different machine, reinstalled CRS and brought the DB up there. Everything was fine during testing, but as soon as we switched back to the IP addresses production had been using, the problem reappeared.

I had seen a similar case before that was finally fixed by isolating the nodes in a VLAN at the network layer, but in this environment the network is complex, the VLAN boundaries are hard to draw, and such a change would leave a trap for later. The environment is Oracle 11.2.0.3, 2-node RAC on HP-UX IA 11.31; this version is already past Oracle's support window, so we could no longer engage development to chase the bug. The cleanest options would be to upgrade or to change the IP addresses. GPnP normally uses the public network to send UDP multicast (mDNS) packets, so that the mDNS daemon (mdnsd.bin) on any Oracle CRS host in the same segment can resolve the local cluster_id and cluster_name and locate the other nodes of the same cluster. In theory, changing the public IP should therefore be enough, but the application tier includes middleware that connects to the database directly over the public IP, and there was no way to identify and change all of it in the short term. So how do we fix the slow CRS startup while sparing the middleware, or at least buying time to sort it out? Below is the approach I used.
First, a segment of the CRS error logs. Below is an excerpt covering the same time window across all log files, merged and formatted with our own script.
2016-09-21 00:26:06.870@./ohasd/ohasd.log [   CRSPE][29] {073} State change received from qdyya1 for ora.diskmon 1 1
2016-09-21 00:26:06.871@./agent/ohasd/orarootagent_root/orarootagent_root.log [ AGFW][10] {073} Agent is exiting with exit code 1
2016-09-21 00:26:06.871@./agent/ohasd/orarootagent_root/orarootagent_root.log [ AGFW][10] {073} Agent is shutting down.
2016-09-21 00:26:06.871@./ohasd/ohasd.log [ CRSPE][29] {073} RI [ora.diskmon 1 1] new external state [OFFLINE] old value [ONLINE] on qdyya1 label = []
2016-09-21 00:26:06.871@./ohasd/ohasd.log [ CRSPE][29] {073} RI [ora.diskmon 1 1] new target state [OFFLINE] old value [ONLINE]
2016-09-21 00:26:06.872@./ohasd/ohasd.log [ CRSPE][29] {073} Processing unplanned state change for [ora.diskmon 1 1]
2016-09-21 00:26:06.872@./ohasd/ohasd.log [ CRSOCR][27] {073} Multi Write Batch processing...
2016-09-21 00:26:06.875@./ohasd/ohasd.log [ CRSOCR][27] {073} Multi Write Batch done.
2016-09-21 00:26:06.879@./ohasd/ohasd.log [ CRSPE][29] {073} Failover cannot be completed for [ora.diskmon 1 1]. Stopping it and the resource tree
2016-09-21 00:26:06.879@./ohasd/ohasd.log [ CRSPE][29] {073} Target is not ONLINE, not recovering [ora.diskmon 1 1]
2016-09-21 00:26:06.880@./ohasd/ohasd.log [ CRSPE][29] {073} Op 600000000260ac20 has 4 WOs
2016-09-21 00:26:06.887@./ohasd/ohasd.log [ CRSPE][29] {073} ICE has queued an operation. Details Operation [STOP of [ora.diskmon 1 1] on [qdyya1] 600000000260ac20] cannot run cause it needs R lock for
2016-09-21 00:26:06.985@./ohasd/ohasd.log [ CRSCOMM][21] Ipc Client disconnected.
2016-09-21 00:26:06.985@./ohasd/ohasd.log [ CRSCOMM][21] IpcL connection to member 7 has been removed
2016-09-21 00:26:06.985@./ohasd/ohasd.log [ CRSCOMM][21][FFAIL] Ipc Couldnt clscreceive message, no message 11
2016-09-21 00:26:06.985@./ohasd/ohasd.log [ CRSCOMM][21][FFAIL] IpcL Listener got clsc error 11 for memNum. 7
2016-09-21 00:26:06.985@./ohasd/ohasd.log [CLSFRAME][21] Disconnected from AGENT process {Relative|Node0|Process7|Type3}
2016-09-21 00:26:06.985@./ohasd/ohasd.log [CLSFRAME][21] Removing IPC Member{Relative|Node0|Process7|Type3}
2016-09-21 00:26:06.986@./ohasd/ohasd.log [ AGFW][24] {0090} /oracle/app/11.2.0.3/grid/bin/orarootagent_root disconnected.
2016-09-21 00:26:06.986@./ohasd/ohasd.log [ AGFW][24] {0090} Agent /oracle/app/11.2.0.3/grid/bin/orarootagent_root[6161] stopped!
2016-09-21 00:26:06.986@./ohasd/ohasd.log [ AGFW][24] {0090} Agfw Proxy Server received process disconnected notification, count=1
2016-09-21 00:26:06.986@./ohasd/ohasd.log [ CRSPE][29] {0088} Disconnected from server
2016-09-21 00:26:06.986@./ohasd/ohasd.log [ CRSCOMM][24] {0090} IpcL removeConnection Member 7 does not exist.
2016-09-21 00:26:07.179@./cssd/ocssd.l01 [ GPNP][1]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:08.241@./ohasd/ohasd.log [ CRSCCL][18]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:08.310@./gpnpd/gpnpd.log [ OCRMSG][3]GIPC error [29] msg [gipcretConnectionRefused]
2016-09-21 00:26:09.189@./cssd/ocssd.l01 [ GPNP][1]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:10.262@./ohasd/ohasd.log [ CRSCCL][18]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:11.199@./cssd/ocssd.l01 [ GPNP][1]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:12.281@./ohasd/ohasd.log [ CRSCCL][18]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:13.219@./cssd/ocssd.l01 [ GPNP][1]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:14.301@./ohasd/ohasd.log [ CRSCCL][18]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:15.229@./cssd/ocssd.l01 [ GPNP][1]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:16.321@./ohasd/ohasd.log [ CRSCCL][18]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:17.239@./cssd/ocssd.l01 [ GPNP][1]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:18.341@./ohasd/ohasd.log [ CRSCCL][18]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:19.249@./cssd/ocssd.l01 [ GPNP][1]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:20.361@./ohasd/ohasd.log [ CRSCCL][18]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:21.260@./cssd/ocssd.l01 [ GPNP][1]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:22.382@./ohasd/ohasd.log [ CRSCCL][18]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:22.452@./gpnpd/gpnpd.log [ OCRMSG][3]GIPC error [29] msg [gipcretConnectionRefused]
2016-09-21 00:26:23.279@./cssd/ocssd.l01 [ GPNP][1]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:24.401@./ohasd/ohasd.log [ CRSCCL][18]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:25.289@./cssd/ocssd.l01 [ GPNP][1]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:26.421@./ohasd/ohasd.log [ CRSCCL][18]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:27.299@./cssd/ocssd.l01 [ GPNP][1]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:28.441@./ohasd/ohasd.log [ CRSCCL][18]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:29.309@./cssd/ocssd.l01 [ GPNP][1]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:30.461@./ohasd/ohasd.log [ CRSCCL][18]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:30.963@./agent/ohasd/oraagent_grid/oraagent_grid.l01 [ora.mdnsd][8] {002} [check] clsdmc_respget return status=0, ecode=0
... --- The log keeps looping like this; after roughly half an hour the retries are finally abandoned and CRS starts.
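For completeness, the checks mentioned earlier (the mDNS daemon, the GPnP profile, peer discovery) can be run along these lines. This is only a sketch of the kind of commands we used; $GRID_HOME stands for the Grid Infrastructure home, and exact output and option support vary by platform and version:

ps -ef | grep -E 'mdnsd.bin|gpnpd.bin' | grep -v grep   # are mdnsd/gpnpd running?
netstat -an | grep 5353                                 # mDNS listens on UDP port 5353
$GRID_HOME/bin/gpnptool get                             # dump the local GPnP profile (cluster name, network profile)
$GRID_HOME/bin/gpnptool lfind                           # locate the local gpnpd
$GRID_HOME/bin/gpnptool find                            # discover gpnpd daemons on the subnet via mDNS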

The fix itself is actually quite simple:
Because CRS records the public network by interface name rather than by IP address, we can change the original NIC to the new IP, add another physical NIC, and assign it the original public IP. The new IP becomes the CRS public IP; the original IP is served by a manually added Oracle listener so the existing applications keep working. Below is a walk-through on my virtual machines.
Since only the public IP changes, and it stays on the same interface name and in the same subnet, CRS itself needs no reconfiguration; all that is required is one CRS restart around the OS-level IP change. The following is done on node 1 only. The steps are:
1. Shut down the Oracle Clusterware stack
2. Change to the new IP address at the network layer and update DNS and the /etc/hosts file to reflect the change
3. Add a new network interface and assign it the original IP
4. Restart the Oracle Clusterware stack
5. Manually add an additional Oracle listener on the original IP
192.168.1.116 node1 node1.anbob.com -- to be changed to 192.168.1.16
192.168.1.126 node2 node2.anbob.com
192.168.1.216 node1-vip
192.168.1.226 node2-vip
172.168.1.116 node1-priv
172.168.1.126 node2-priv
192.168.1.200 anbob-cluster anbob-cluster-scan

[root@node1 ~]# ifconfig
eth0 Link encap:Ethernet HWaddr 08:00:27:5F:EC:1A
inet addr:192.168.1.116 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe5f:ec1a/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:17081 errors:0 dropped:0 overruns:0 frame:0
TX packets:13480 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1405616 (1.3 MiB) TX bytes:6599214 (6.2 MiB)
eth0:1 Link encap:Ethernet HWaddr 08:00:27:5F:EC:1A
inet addr:192.168.1.216 Bcast:192.168.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth1 Link encap:Ethernet HWaddr 08:00:27:93:EE:5E
inet addr:172.168.1.116 Bcast:172.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe93:ee5e/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:27881 errors:0 dropped:0 overruns:0 frame:0
TX packets:12911 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:15779267 (15.0 MiB) TX bytes:3670283 (3.5 MiB)
eth1:1 Link encap:Ethernet HWaddr 08:00:27:93:EE:5E
inet addr:169.254.193.48 Bcast:169.254.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth2 Link encap:Ethernet HWaddr 08:00:27:73:D2:CE
inet addr:192.168.56.20 Bcast:192.168.56.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe73:d2ce/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:3014 errors:0 dropped:0 overruns:0 frame:0
TX packets:1794 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:272606 (266.2 KiB) TX bytes:304859 (297.7 KiB)
eth3 Link encap:Ethernet HWaddr 08:00:27:7C:D7:DA
inet addr:192.168.1.16 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe7c:d7da/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:16960 errors:0 dropped:0 overruns:0 frame:0
TX packets:268 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1385423 (1.3 MiB) TX bytes:36566 (35.7 KiB)

Note:
eth0 carries the public IP, eth1 the cluster interconnect (private IP), eth2 is a host-only network between my physical host and the VMs (ignore it), and eth3 is the newly added physical NIC.
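Before making any change it is worth comparing what the OS presents with what Clusterware has stored; a quick check along these lines (the oifcfg options shown are the usual 11.2 ones):

oifcfg iflist -p -n    # interfaces/subnets visible to Clusterware, with type and netmask
oifcfg getif           # interfaces actually registered in the OCR/GPnP profile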
[root@node1 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[grid@node1 ~]$ crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.3.0]
[grid@node1 ~]$ crsctl query crs releaseversion
Oracle High Availability Services release version on the local node is [11.2.0.3.0]

[grid@node1 ~]$ oifcfg getif
eth0 192.168.1.0 global public
eth1 172.168.1.0 global cluster_interconnect
[grid@node1 ~]$ srvctl config nodeapps -a
Network exists: 1/192.168.1.0/255.255.255.0/eth0, type static
VIP exists: /node1-vip/192.168.1.216/192.168.1.0/255.255.255.0/eth0, hosting node node1
VIP exists: /node2-vip/192.168.1.226/192.168.1.0/255.255.255.0/eth0, hosting node node2
[grid@node1 ~]$ lsnrctl status
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 01-NOV-2016 15:18:18
Copyright (c) 1991, 2011, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date 01-NOV-2016 15:15:46
Uptime 0 days 0 hr. 2 min. 34 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/11.2.0.3/grid/network/admin/listener.ora
Listener Log File /u01/app/grid/diag/tnslsnr/node1/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.116)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.216)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM1", status READY, has 1 handler(s) for this service...
The command completed successfully
[root@node1 ~]$ crsctl stop crs

Note:
Node 2 needs no change at all. After the new NIC is added on node 1 and CRS is stopped, the IP addresses of eth0 and eth3 can be swapped at the OS level.
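On this Linux test VM the swap is just an edit of the interface configuration files before bouncing the interfaces below; a minimal sketch, assuming the usual /etc/sysconfig/network-scripts layout (on the real HP-UX system the equivalent settings live in /etc/rc.config.d/netconf):

vi /etc/sysconfig/network-scripts/ifcfg-eth0   # IPADDR=192.168.1.116 -> IPADDR=192.168.1.16
vi /etc/sysconfig/network-scripts/ifcfg-eth3   # IPADDR=192.168.1.16  -> IPADDR=192.168.1.116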
[root@node1 ~]# ifdown eth3
[root@node1 ~]# ifdown eth0
[root@node1 ~]# ifup eth0
[root@node1 ~]# ifup eth3
[root@node1 ~]# ifconfig
eth0 Link encap:Ethernet HWaddr 08:00:27:5F:EC:1A
inet addr:192.168.1.16 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe5f:ec1a/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:40411 errors:0 dropped:0 overruns:0 frame:0
TX packets:31081 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3345829 (3.1 MiB) TX bytes:14142844 (13.4 MiB)
eth1 Link encap:Ethernet HWaddr 08:00:27:93:EE:5E
inet addr:172.168.1.116 Bcast:172.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe93:ee5e/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:64788 errors:0 dropped:0 overruns:0 frame:0
TX packets:35273 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:36053768 (34.3 MiB) TX bytes:10979476 (10.4 MiB)
eth2 Link encap:Ethernet HWaddr 08:00:27:73:D2:CE
inet addr:192.168.56.20 Bcast:192.168.56.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe73:d2ce/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:6400 errors:0 dropped:0 overruns:0 frame:0
TX packets:3946 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:560483 (547.3 KiB) TX bytes:698482 (682.1 KiB)
eth3 Link encap:Ethernet HWaddr 08:00:27:7C:D7:DA
inet addr:192.168.1.116 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe7c:d7da/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:42434 errors:0 dropped:0 overruns:0 frame:0
TX packets:498 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3363456 (3.2 MiB) TX bytes:71937 (70.2 KiB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:35592 errors:0 dropped:0 overruns:0 frame:0
TX packets:35592 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:30547907 (29.1 MiB) TX bytes:30547907 (29.1 MiB)
vi /etc/hosts
# 192.168.1.116 node1 node1.anbob.com
192.168.1.16 node1 node1.anbob.com
192.168.1.126 node2 node2.anbob.com
192.168.1.216 node1-vip
192.168.1.226 node2-vip
172.168.1.116 node1-priv
172.168.1.126 node2-priv
192.168.1.200 anbob-cluster anbob-cluster-scan
[root@node1 ~]$ crsctl start crs
[grid@node1 ~]$ lsnrctl status
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 01-NOV-2016 15:47:26
Copyright (c) 1991, 2011, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
Alias LISTENER
Version TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date 01-NOV-2016 15:47:21
Uptime 0 days 0 hr. 0 min. 18 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/11.2.0.3/grid/network/admin/listener.ora
Listener Log File /u01/app/11.2.0.3/grid/log/diag/tnslsnr/node1/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.16)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.216)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM1", status READY, has 1 handler(s) for this service...
The command completed successfully

Note:
After the IP addresses are changed at the OS level and /etc/hosts is updated, start CRS. The listener now listens on the new public IP; if anything relies on static registration, remember to update the relevant parameter files by hand.
Next, manually add a listener on the original IP with static registration by appending the following to listener.ora. Note that I named it listener2 here.

SID_LIST_LISTENER2 =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = anbob)
      (ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1)   # this is the DB ORACLE_HOME, not the GI home
      (SID_NAME = anbob1)
    )
  )

LISTENER2 =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.116)(PORT = 1521)(IP = FIRST))
    )
  )
[grid@node1 admin]$ lsnrctl start listener2
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 01-NOV-2016 19:29:28
Copyright (c) 1991, 2011, Oracle. All rights reserved.
Starting /u01/app/11.2.0.3/grid/bin/tnslsnr: please wait...
TNSLSNR for Linux: Version 11.2.0.3.0 - Production
System parameter file is /u01/app/11.2.0.3/grid/network/admin/listener.ora
Log messages written to /u01/app/11.2.0.3/grid/log/diag/tnslsnr/node1/listener2/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.116)(PORT=1521)))
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.116)(PORT=1521)(IP=FIRST)))
STATUS of the LISTENER
------------------------
Alias listener2
Version TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date 01-NOV-2016 19:29:29
Uptime 0 days 0 hr. 0 min. 2 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/11.2.0.3/grid/network/admin/listener.ora
Listener Log File /u01/app/11.2.0.3/grid/log/diag/tnslsnr/node1/listener2/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.116)(PORT=1521)))
Services Summary...
Service "anbob" has 1 instance(s).
Instance "anbob1", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully

Note:
We added a new listener named listener2, listening on the original public IP and on the same port. Note that we deliberately do not register this listener with CRS: the new NIC exists on node 1 only, adding a listener resource would also require adding a new network resource, and we were worried that letting CRS use the original public IP again would bring the slow-start problem back. That means we have to maintain this listener ourselves with a small shell script; a sketch follows, and after that we test connections from the client side.
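A minimal sketch of such a watchdog, run from the grid user's crontab; the script name, the TNS- check and the schedule are illustrative assumptions rather than part of the original setup:

#!/bin/sh
# check_listener2.sh (hypothetical): restart listener2 if it is not responding
ORACLE_HOME=/u01/app/11.2.0.3/grid
export ORACLE_HOME
# a down listener makes lsnrctl status print a TNS- error
if $ORACLE_HOME/bin/lsnrctl status listener2 2>&1 | grep -q "TNS-"; then
  $ORACLE_HOME/bin/lsnrctl start listener2
fi

It can then be scheduled every few minutes from the grid user's cron.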
[grid@node1 admin]$ ps -ef|grep lsnr
grid 4070 1 0 19:29 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/tnslsnr listener2 -inherit
grid 5439 1 0 19:43 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
grid 5476 1 0 19:43 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/tnslsnr LISTENER -inherit
[grid@node1 admin]$ netstat -an|grep 1521
tcp 0 0 192.168.1.216:1521 0.0.0.0:* LISTEN
tcp 0 0 192.168.1.16:1521 0.0.0.0:* LISTEN
tcp 0 0 192.168.1.200:1521 0.0.0.0:* LISTEN
tcp 0 0 192.168.1.116:1521 0.0.0.0:* LISTEN
tcp 0 0 192.168.1.216:1521 192.168.1.216:58796 ESTABLISHED
tcp 0 0 192.168.1.216:58796 192.168.1.216:1521 ESTABLISHED
tcp 0 0 192.168.1.200:1521 192.168.1.200:60114 ESTABLISHED
tcp 0 0 192.168.1.216:58825 192.168.1.216:1521 ESTABLISHED
tcp 0 0 192.168.1.200:60114 192.168.1.200:1521 ESTABLISHED
tcp 0 0 192.168.1.216:1521 192.168.1.216:58825 ESTABLISHED
unix 3 [ ] STREAM CONNECTED 61521
# Add the following entries to tnsnames.ora
# new public ip
anbob16=
(DESCRIPTION=
(ADDRESS_LIST=
(ADDRESS=(PROTOCOL=TCP) (HOST=192.168.1.16) (PORT=1521) )
)
(CONNECT_DATA=
(SERVER = DEDICATED)
(SERVICE_NAME=anbob)
)
)

# original public ip
anbob116=
(DESCRIPTION=
(ADDRESS_LIST=
(ADDRESS=(PROTOCOL=TCP) (HOST=192.168.1.116) (PORT=1521) )
)
(CONNECT_DATA=
(SERVER = DEDICATED)
(SERVICE_NAME=anbob)
)
)
[oracle@node1 admin]$ sqlplus anbob/anbob@anbob16
SQL*Plus: Release 11.2.0.3.0 Production on Tue Nov 1 20:21:20 2016
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL>

[oracle@node1 admin]$ sqlplus anbob/anbob@anbob116
SQL*Plus: Release 11.2.0.3.0 Production on Tue Nov 1 20:25:17 2016
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL>

Note:
Testing shows that connections work over both the new and the original public IP. The original VIP still sits on eth0, its address is unchanged, and it remains registered with the listener, so it can of course still be connected to; that is not demonstrated again.
When I first started testing I tried adding a new NIC without swapping the IPs, e.g. node 1 using eth3/eth1 while node 2 kept eth0/eth1. That, of course, failed. A few lessons learned:
1. Changing the public interface with
oifcfg delif -global eth0 -n node1
oifcfg setif -global eth3/192.168.1.0:public
cascades the change to the other node as well; this happens even if you use the per-node form instead of -global.
2. If the other node is down, the change above completes without error, but when you later try to roll back and delete eth3 you get errors like:
[grid@node1 ~]$ oifcfg delif -global eth3
PRIF-33: Failed to set or delete interface because hosts could not be discovered
CRS-02307: No GPnP services on requested remote hosts.
PRIF-32: Error in checking for profile availability for host node2
CRS-02306: GPnP service on host "node2" not found.

The fix is to bring the other node back up, make sure all nodes are in the RUNNING state, and add -force to the delete (see the command after this list).
3. If the other node is down, the change in #1 can still be completed, but after node 2 comes back up its VIP will not come back online, because the VIP depends on the interface name and eth3 does not exist on node 2.
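For reference, the rollback in case #2, once all nodes are back up and RUNNING, is the same delete with -force appended:

oifcfg delif -global eth3 -force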
Note! Sometimes the cause is a GIPC communication problem between the nodes; in that case, try killing the gipcd.bin process on the nodes other than the problem node. The process is restarted automatically after being killed, and CRS availability is not affected.
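A sketch of that workaround, run as root on a node other than the problem node (the pid lookup is illustrative):

ps -ef | grep gipcd.bin | grep -v grep   # find the gipcd.bin pid
kill -9 <gipcd_pid>                      # it is respawned automatically; CRS stays available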
--------- update 2018-10 -----------
There is a MOS note that matches this problem on this version quite well, referencing bug 12356910:
CSSD Fails to Start After Repeated Message: Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) (Doc ID 1588034.1)

Solution
The fix is included in 11.2.0.4 and 12.1; apply interim patch 12356910 if the business is impacted.
The workaround is to restart GI on all nodes.
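In practice that workaround is just a bounce of the stack, run as root on each node (the same commands already used above):

crsctl stop crs
crsctl start crs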
 