Problem Description
There is an 11.2 RAC environment with a very odd problem: every time the cluster restarts, CRS on node 1 takes about 30 minutes to come up, and during that window it can only be seen looping through retries in the GPnP startup phase. This looks very much like MOS bug 12356910, but installing that patch did not resolve the issue. We also raised the log level of the process (crsctl set log gpnp "GPNP=5") without finding the root cause. Once CRS is up, gpnptool find can retrieve the list of the other nodes, and no cluster with the same name was discovered. We performed a Deconfigure/Reconfigure of CRS, and in the end even reinstalled CRS on a replacement machine and remounted the DB: everything was normal during testing, but as soon as we switched back to the IPs previously used in production, the problem reappeared.

I have seen a similar case that was eventually solved by segregating the network with a VLAN, but this environment's network is complex, the VLAN boundaries are hard to draw, and a VLAN would plant a mine for later. The environment is Oracle 11.2.0.3, 2-node RAC on HP-UX IA 11.31. This version is past Oracle's support window, so we can no longer get development to pin down the bug; the best options are probably to upgrade, or to change the IPs.

GPnP normally uses the public IP to send UDP broadcast packets; within the same network segment, the mDNS service (mdnsd.bin) of any host running Oracle CRS resolves the local cluster_id and cluster_name to find the other nodes of the same cluster. So in theory changing the public IP should be enough. However, middleware in the application tier still connects to the database via the public IP and cannot be sorted out and changed on short notice. How can we solve the slow CRS startup while avoiding middleware changes, or at least buy the middleware team time to sort things out?
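For reference, the peer-discovery check mentioned above looks roughly like this on a healthy two-node cluster; the output shape is illustrative, and the host names, ports and PIDs below are lab placeholders, not values from this case:

[grid@node1 ~]$ gpnptool find
Found 2 instances of service 'gpnp'.
 mdns:service:gpnp._tcp.local.://node1:17280/agent=gpnpd,cname=anbob-cluster,host=node1,pid=4066/gpnpd h:node1 c:anbob-cluster
 mdns:service:gpnp._tcp.local.://node2:26719/agent=gpnpd,cname=anbob-cluster,host=node2,pid=3208/gpnpd h:node2 c:anbob-cluster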
Expert Answer
Here is one approach.
First, an excerpt of the CRS error logs. The following is a merged fragment of all log files from the same time window, formatted with our own script.
2016-09-21 00:26:06.870@./ohasd/ohasd.log [ CRSPE][29] {073} State change received from qdyya1 for ora.diskmon 1 1
2016-09-21 00:26:06.871@./agent/ohasd/orarootagent_root/orarootagent_root.log [ AGFW][10] {073} Agent is exiting with exit code 1
2016-09-21 00:26:06.871@./agent/ohasd/orarootagent_root/orarootagent_root.log [ AGFW][10] {073} Agent is shutting down.
2016-09-21 00:26:06.871@./ohasd/ohasd.log [ CRSPE][29] {073} RI [ora.diskmon 1 1] new external state [OFFLINE] old value [ONLINE] on qdyya1 label = []
2016-09-21 00:26:06.871@./ohasd/ohasd.log [ CRSPE][29] {073} RI [ora.diskmon 1 1] new target state [OFFLINE] old value [ONLINE]
2016-09-21 00:26:06.872@./ohasd/ohasd.log [ CRSPE][29] {073} Processing unplanned state change for [ora.diskmon 1 1]
2016-09-21 00:26:06.872@./ohasd/ohasd.log [ CRSOCR][27] {073} Multi Write Batch processing...
2016-09-21 00:26:06.875@./ohasd/ohasd.log [ CRSOCR][27] {073} Multi Write Batch done.
2016-09-21 00:26:06.879@./ohasd/ohasd.log [ CRSPE][29] {073} Failover cannot be completed for [ora.diskmon 1 1]. Stopping it and the resource tree
2016-09-21 00:26:06.879@./ohasd/ohasd.log [ CRSPE][29] {073} Target is not ONLINE, not recovering [ora.diskmon 1 1]
2016-09-21 00:26:06.880@./ohasd/ohasd.log [ CRSPE][29] {073} Op 600000000260ac20 has 4 WOs
2016-09-21 00:26:06.887@./ohasd/ohasd.log [ CRSPE][29] {073} ICE has queued an operation. Details Operation [STOP of [ora.diskmon 1 1] on [qdyya1] 600000000260ac20] cannot run cause it needs R lock for
2016-09-21 00:26:06.985@./ohasd/ohasd.log [ CRSCOMM][21] Ipc Client disconnected.
2016-09-21 00:26:06.985@./ohasd/ohasd.log [ CRSCOMM][21] IpcL connection to member 7 has been removed
2016-09-21 00:26:06.985@./ohasd/ohasd.log [ CRSCOMM][21][FFAIL] Ipc Couldnt clscreceive message, no message 11
2016-09-21 00:26:06.985@./ohasd/ohasd.log [ CRSCOMM][21][FFAIL] IpcL Listener got clsc error 11 for memNum. 7
2016-09-21 00:26:06.985@./ohasd/ohasd.log [CLSFRAME][21] Disconnected from AGENT process {Relative|Node0|Process7|Type3}
2016-09-21 00:26:06.985@./ohasd/ohasd.log [CLSFRAME][21] Removing IPC Member{Relative|Node0|Process7|Type3}
2016-09-21 00:26:06.986@./ohasd/ohasd.log [ AGFW][24] {0090} /oracle/app/11.2.0.3/grid/bin/orarootagent_root disconnected.
2016-09-21 00:26:06.986@./ohasd/ohasd.log [ AGFW][24] {0090} Agent /oracle/app/11.2.0.3/grid/bin/orarootagent_root[6161] stopped!
2016-09-21 00:26:06.986@./ohasd/ohasd.log [ AGFW][24] {0090} Agfw Proxy Server received process disconnected notification, count=1
2016-09-21 00:26:06.986@./ohasd/ohasd.log [ CRSPE][29] {0088} Disconnected from server
2016-09-21 00:26:06.986@./ohasd/ohasd.log [ CRSCOMM][24] {0090} IpcL removeConnection Member 7 does not exist.
2016-09-21 00:26:07.179@./cssd/ocssd.l01 [ GPNP][1]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:08.241@./ohasd/ohasd.log [ CRSCCL][18]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:08.310@./gpnpd/gpnpd.log [ OCRMSG][3]GIPC error [29] msg [gipcretConnectionRefused]
2016-09-21 00:26:09.189@./cssd/ocssd.l01 [ GPNP][1]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:10.262@./ohasd/ohasd.log [ CRSCCL][18]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:11.199@./cssd/ocssd.l01 [ GPNP][1]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:12.281@./ohasd/ohasd.log [ CRSCCL][18]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:13.219@./cssd/ocssd.l01 [ GPNP][1]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:14.301@./ohasd/ohasd.log [ CRSCCL][18]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:15.229@./cssd/ocssd.l01 [ GPNP][1]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:16.321@./ohasd/ohasd.log [ CRSCCL][18]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:17.239@./cssd/ocssd.l01 [ GPNP][1]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:18.341@./ohasd/ohasd.log [ CRSCCL][18]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:19.249@./cssd/ocssd.l01 [ GPNP][1]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:20.361@./ohasd/ohasd.log [ CRSCCL][18]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:21.260@./cssd/ocssd.l01 [ GPNP][1]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:22.382@./ohasd/ohasd.log [ CRSCCL][18]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:22.452@./gpnpd/gpnpd.log [ OCRMSG][3]GIPC error [29] msg [gipcretConnectionRefused]
2016-09-21 00:26:23.279@./cssd/ocssd.l01 [ GPNP][1]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:24.401@./ohasd/ohasd.log [ CRSCCL][18]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:25.289@./cssd/ocssd.l01 [ GPNP][1]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:26.421@./ohasd/ohasd.log [ CRSCCL][18]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:27.299@./cssd/ocssd.l01 [ GPNP][1]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:28.441@./ohasd/ohasd.log [ CRSCCL][18]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:29.309@./cssd/ocssd.l01 [ GPNP][1]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:30.461@./ohasd/ohasd.log [ CRSCCL][18]clsgpnpm_newWiredMsg [at clsgpnpm.c741] Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) [uri "http//www.grid-pnp.org/2005/12/gpnp-errors#"]
2016-09-21 00:26:30.963@./agent/ohasd/oraagent_grid/oraagent_grid.l01 [ora.mdnsd][8] {002} [check] clsdmc_respget return status=0, ecode=0
...
--- The log shows the retries continuing for about half an hour before GPnP finally gives up and CRS starts.
The method itself is actually quite simple:
Because CRS records the public network by interface name, the idea is: change the original NIC to the new IP, then add another physical NIC and give it the original public IP. The new IP becomes the CRS public IP, while the original IP serves a manually added Oracle listener that keeps the existing applications working. Below is a test in my virtual machine.
Because only the public IP changes, staying on the same NIC and the same subnet, CRS itself needs no reconfiguration; the clusterware just has to be restarted once around the OS-level IP change. All of the following is done on node 1 only. The steps are as follows (a condensed command sketch follows the list):
1. Shut down the Oracle Clusterware stack
2. Change to the new IP address at the network layer, and update DNS and the /etc/hosts file to reflect the change
3. Add a new network interface and assign it the original IP
4. Restart the Oracle Clusterware stack
5. Manually add an additional Oracle listener on the original IP
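On the Linux test system used below, the whole sequence condenses to roughly the following sketch; the interface names match the demo, and on HP-UX the OS-level commands would of course differ:

[root@node1 ~]# crsctl stop crs                  # step 1
# steps 2-3: swap the addresses of eth0 (old public) and eth3 (new NIC)
# in their ifcfg files, update DNS and /etc/hosts, then bounce both NICs
[root@node1 ~]# ifdown eth0; ifdown eth3
[root@node1 ~]# ifup eth0; ifup eth3
[root@node1 ~]# crsctl start crs                 # step 4
[grid@node1 ~]$ lsnrctl start listener2          # step 5, once listener.ora has the new entry

The /etc/hosts entries before the change: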
192.168.1.116 node1 node1.anbob.com    # to be changed to 192.168.1.16
192.168.1.126 node2 node2.anbob.com
192.168.1.216 node1-vip
192.168.1.226 node2-vip
172.168.1.116 node1-priv
172.168.1.126 node2-priv
192.168.1.200 anbob-cluster anbob-cluster-scan
[root@node1 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 08:00:27:5F:EC:1A
          inet addr:192.168.1.116  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe5f:ec1a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:17081 errors:0 dropped:0 overruns:0 frame:0
          TX packets:13480 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1405616 (1.3 MiB)  TX bytes:6599214 (6.2 MiB)

eth0:1    Link encap:Ethernet  HWaddr 08:00:27:5F:EC:1A
          inet addr:192.168.1.216  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth1      Link encap:Ethernet  HWaddr 08:00:27:93:EE:5E
          inet addr:172.168.1.116  Bcast:172.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe93:ee5e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:27881 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12911 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:15779267 (15.0 MiB)  TX bytes:3670283 (3.5 MiB)

eth1:1    Link encap:Ethernet  HWaddr 08:00:27:93:EE:5E
          inet addr:169.254.193.48  Bcast:169.254.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

eth2      Link encap:Ethernet  HWaddr 08:00:27:73:D2:CE
          inet addr:192.168.56.20  Bcast:192.168.56.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe73:d2ce/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3014 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1794 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:272606 (266.2 KiB)  TX bytes:304859 (297.7 KiB)

eth3      Link encap:Ethernet  HWaddr 08:00:27:7C:D7:DA
          inet addr:192.168.1.16  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe7c:d7da/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:16960 errors:0 dropped:0 overruns:0 frame:0
          TX packets:268 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1385423 (1.3 MiB)  TX bytes:36566 (35.7 KiB)
Note:
eth0 carries the public IP, eth1 the cluster_interconnect (private IP); eth2 is a host-only network between my physical machine and the VM (ignore it); eth3 is the newly added physical NIC.
[root@node1 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

[grid@node1 ~]$ crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.3.0]

[grid@node1 ~]$ crsctl query crs releaseversion
Oracle High Availability Services release version on the local node is [11.2.0.3.0]

[grid@node1 ~]$ oifcfg getif
eth0  192.168.1.0  global  public
eth1  172.168.1.0  global  cluster_interconnect

[grid@node1 ~]$ srvctl config nodeapps -a
Network exists: 1/192.168.1.0/255.255.255.0/eth0, type static
VIP exists: /node1-vip/192.168.1.216/192.168.1.0/255.255.255.0/eth0, hosting node node1
VIP exists: /node2-vip/192.168.1.226/192.168.1.0/255.255.255.0/eth0, hosting node node2

[grid@node1 ~]$ lsnrctl status
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 01-NOV-2016 15:18:18
Copyright (c) 1991, 2011, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                01-NOV-2016 15:15:46
Uptime                    0 days 0 hr. 2 min. 34 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0.3/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/node1/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.116)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.216)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
The command completed successfully

[root@node1 ~]# crsctl stop crs
Note:
Node 2 needs no changes. After adding the new NIC on node 1 and stopping CRS there, the IP addresses of eth0 and eth3 can be swapped at the OS level.
[root@node1 ~]# ifdown eth3
[root@node1 ~]# ifdown eth0
[root@node1 ~]# ifup eth0
[root@node1 ~]# ifup eth3
[root@node1 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 08:00:27:5F:EC:1A
          inet addr:192.168.1.16  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe5f:ec1a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:40411 errors:0 dropped:0 overruns:0 frame:0
          TX packets:31081 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3345829 (3.1 MiB)  TX bytes:14142844 (13.4 MiB)

eth1      Link encap:Ethernet  HWaddr 08:00:27:93:EE:5E
          inet addr:172.168.1.116  Bcast:172.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe93:ee5e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:64788 errors:0 dropped:0 overruns:0 frame:0
          TX packets:35273 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:36053768 (34.3 MiB)  TX bytes:10979476 (10.4 MiB)

eth2      Link encap:Ethernet  HWaddr 08:00:27:73:D2:CE
          inet addr:192.168.56.20  Bcast:192.168.56.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe73:d2ce/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6400 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3946 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:560483 (547.3 KiB)  TX bytes:698482 (682.1 KiB)

eth3      Link encap:Ethernet  HWaddr 08:00:27:7C:D7:DA
          inet addr:192.168.1.116  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe7c:d7da/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:42434 errors:0 dropped:0 overruns:0 frame:0
          TX packets:498 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3363456 (3.2 MiB)  TX bytes:71937 (70.2 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:35592 errors:0 dropped:0 overruns:0 frame:0
          TX packets:35592 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:30547907 (29.1 MiB)  TX bytes:30547907 (29.1 MiB)

[root@node1 ~]# vi /etc/hosts
# 192.168.1.116 node1 node1.anbob.com
192.168.1.16  node1 node1.anbob.com
192.168.1.126 node2 node2.anbob.com
192.168.1.216 node1-vip
192.168.1.226 node2-vip
172.168.1.116 node1-priv
172.168.1.126 node2-priv
192.168.1.200 anbob-cluster anbob-cluster-scan

[root@node1 ~]# crsctl start crs

[grid@node1 ~]$ lsnrctl status
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 01-NOV-2016 15:47:26
Copyright (c) 1991, 2011, Oracle.  All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                01-NOV-2016 15:47:21
Uptime                    0 days 0 hr. 0 min. 18 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0.3/grid/network/admin/listener.ora
Listener Log File         /u01/app/11.2.0.3/grid/log/diag/tnslsnr/node1/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.16)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.216)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
The command completed successfully
Note:
After changing the IP at the OS level and updating /etc/hosts, start CRS: the listener now listens on the new public IP. If static registration is in use anywhere, remember to update the parameter files by hand.
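For instance, if the database hard-codes the old address in LOCAL_LISTENER (a hypothetical setting, not used in this demo), it would need something like:

SQL> alter system set local_listener='(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.16)(PORT=1521))' scope=both sid='anbob1';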
Now manually add a listener on the original IP with static registration by appending the following to listener.ora; note that mine is named listener2.
SID_LIST_LISTENER2 =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = anbob)
      # this is the DB ORACLE_HOME, not the GI home
      (ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1)
      (SID_NAME = anbob1)
    )
  )

LISTENER2 =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.116)(PORT = 1521)(IP = FIRST))
    )
  )

[grid@node1 admin]$ lsnrctl start listener2
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 01-NOV-2016 19:29:28
Copyright (c) 1991, 2011, Oracle.  All rights reserved.
Starting /u01/app/11.2.0.3/grid/bin/tnslsnr: please wait...
TNSLSNR for Linux: Version 11.2.0.3.0 - Production
System parameter file is /u01/app/11.2.0.3/grid/network/admin/listener.ora
Log messages written to /u01/app/11.2.0.3/grid/log/diag/tnslsnr/node1/listener2/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.116)(PORT=1521)))
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.116)(PORT=1521)(IP=FIRST)))
STATUS of the LISTENER
------------------------
Alias                     listener2
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                01-NOV-2016 19:29:29
Uptime                    0 days 0 hr. 0 min. 2 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0.3/grid/network/admin/listener.ora
Listener Log File         /u01/app/11.2.0.3/grid/log/diag/tnslsnr/node1/listener2/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.116)(PORT=1521)))
Services Summary...
Service "anbob" has 1 instance(s).
  Instance "anbob1", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
Note:
A new listener named listener2 has been added by hand, listening on the original public IP and the same port. Note that we deliberately do not register this listener with CRS: the new NIC exists only on node 1, and adding a new listener resource would also require a new network resource, with the risk that CRS again uses the original public IP and the startup problem returns. The listener therefore has to be kept alive by a shell script of our own; a minimal sketch follows.
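A minimal watchdog sketch, assuming it runs from the grid user's crontab and that the paths match this demo (they are assumptions for any other environment):

#!/bin/sh
# check_listener2.sh -- restart the manually-added listener2 if it is down
ORACLE_HOME=/u01/app/11.2.0.3/grid
export ORACLE_HOME
LOG=/tmp/check_listener2.log

# the [t] trick keeps grep from matching its own process entry
if ! ps -ef | grep "[t]nslsnr listener2" > /dev/null 2>&1
then
    echo "`date`: listener2 not running, restarting" >> $LOG
    $ORACLE_HOME/bin/lsnrctl start listener2 >> $LOG 2>&1
fi

Scheduled once a minute, e.g.: * * * * * /home/grid/check_listener2.sh

With the listener kept alive this way, we can test connections from the client side.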
[grid@node1 admin]$ ps -ef | grep lsnr
grid      4070     1  0 19:29 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/tnslsnr listener2 -inherit
grid      5439     1  0 19:43 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
grid      5476     1  0 19:43 ?        00:00:00 /u01/app/11.2.0.3/grid/bin/tnslsnr LISTENER -inherit

[grid@node1 admin]$ netstat -an | grep 1521
tcp        0      0 192.168.1.216:1521          0.0.0.0:*                   LISTEN
tcp        0      0 192.168.1.16:1521           0.0.0.0:*                   LISTEN
tcp        0      0 192.168.1.200:1521          0.0.0.0:*                   LISTEN
tcp        0      0 192.168.1.116:1521          0.0.0.0:*                   LISTEN
tcp        0      0 192.168.1.216:1521          192.168.1.216:58796         ESTABLISHED
tcp        0      0 192.168.1.216:58796         192.168.1.216:1521          ESTABLISHED
tcp        0      0 192.168.1.200:1521          192.168.1.200:60114         ESTABLISHED
tcp        0      0 192.168.1.216:58825         192.168.1.216:1521          ESTABLISHED
tcp        0      0 192.168.1.200:60114         192.168.1.200:1521          ESTABLISHED
tcp        0      0 192.168.1.216:1521          192.168.1.216:58825         ESTABLISHED
unix  3      [ ]         STREAM     CONNECTED     61521

# Add the following to tnsnames.ora:
# new public ip
anbob16 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.16)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = anbob)
    )
  )

# original public ip
anbob116 =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.116)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = anbob)
    )
  )

[oracle@node1 admin]$ sqlplus anbob/anbob@anbob16
SQL*Plus: Release 11.2.0.3.0 Production on Tue Nov 1 20:21:20 2016
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL>

[oracle@node1 admin]$ sqlplus anbob/anbob@anbob116
SQL*Plus: Release 11.2.0.3.0 Production on Tue Nov 1 20:25:17 2016
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL>
Note:
Testing shows that both the new and the original public IP accept connections. The original VIP still sits on eth0 with an unchanged address, so it also remains reachable through the listener; that test is not shown here.
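As a transition aid for the middleware, a client-side alias can also list both addresses, so connections fail over automatically if the old listener is eventually retired (an illustrative entry, not part of the original test; service name as in the demo):

anbob_both =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (FAILOVER = on)
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.116)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.1.16)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = anbob)
    )
  )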
When I started testing I first tried adding the new NIC without swapping IPs, i.e. node 1 on eth3/eth1 while node 2 stayed on eth0/eth1. That of course failed. What I found:
1. Modifying the public interface:
oifcfg delif -global eth0 -n node1
oifcfg setif -global eth3/192.168.1.0:public
cascades the change to the other node, and it does so even if you use a per-node setting instead of -global.
2. If the other node is down, the change above completes without error, but trying to roll back and delete eth3 then fails as follows:
[grid@node1 ~]$ oifcfg delif -global eth3
PRIF-33: Failed to set or delete interface because hosts could not be discovered
CRS-02307: No GPnP services on requested remote hosts.
PRIF-32: Error in checking for profile availability for host node2
CRS-02306: GPnP service on host "node2" not found.
The fix is to bring the other node back up, make sure all nodes are in RUNNING state, and add -force to the delete (see the sketch after this list).
3. If the other node is down, operation #1 still completes, but after node 2 starts, its VIP will not fail back, because the VIP depends on the interface name and eth3 does not exist on node 2.
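For item 2 above, the forced rollback would look roughly like this (flag placement per the 11.2 oifcfg usage; verify with oifcfg -help on your release):

[grid@node1 ~]$ oifcfg delif -global eth3 -force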
Note! Sometimes the cause is a GIPC communication problem between the nodes; in that case you can try killing the gipcd.bin process on the nodes other than the problem node. The process restarts automatically after being killed, and CRS availability is not affected.
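For example, on the healthy node (the PID comes from the ps output; ohasd respawns gipcd.bin on its own):

[root@node2 ~]# ps -ef | grep [g]ipcd.bin      # note the PID of gipcd.bin
[root@node2 ~]# kill -9 <pid>                  # the daemon is respawned within seconds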
--------- Update 2018-10 ---------
There is a MOS note that closely matches this problem and version, for bug 12356910:
CSSD Fails to Start After Repeated Message: Msg-reply has soap fault 10 (Operation returned Retry (error CLSGPNP_CALL_AGAIN)) (Doc ID 1588034.1)
Solution
The fix is included in 11.2.0.4 and 12.1; apply interim patch 12356910 if the business is impacted.
The workaround is to restart GI on all nodes.
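That is, as root on each node in turn:

[root@node1 ~]# crsctl stop crs
[root@node1 ~]# crsctl start crs
(repeat on node2)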