
Troubleshooting: Diagnosing and Analyzing a Node Startup Failure in a 12cR2 Flex ASM Environment

张维照 2019-03-17

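Author | 张维照, technical expert at Enmotech (云和恩墨), Oracle ACEA. He has worked in database administration since 2006 and moved to Oracle in 2009. Since then he has maintained and tuned many TB-scale provincial databases for industry and commerce, healthcare, transportation, human resources and social security, and telecom operators. He specializes in analyzing and resolving Oracle performance problems, diagnosing Oracle failures, and Oracle upgrades and migrations.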



Flex ASM

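Before 12c, database instances connected to the ASM instance with operating system authentication, because the ASM client (the database instance) and the ASM server always ran on the same host. The Flex ASM architecture introduced in 12c allows database instances to run on hosts other than the ASM hosts. For that reason Flex ASM authenticates with an ASM user and a password file. The ASM password file is stored in an ASM disk group, and an ASM user is created by default when Flex ASM is set up. Flex ASM also supports pre-12c RDBMS releases, and it is the recommended ASM architecture.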


ASM Network

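With Flex ASM, Oracle 12c introduces a new network type called the ASM network. It carries traffic between ASM and its ASM clients, and among all nodes. Every ASM client in the cluster can reach all ASM networks. Alternatively, you can configure a single network that serves as both the private network and the ASM network.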


ASM Listeners

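ASM listeners provide access to Flex ASM. A set of ASM listeners is configured for each ASM network. Each ASM client database instance registers at most three ASM listener addresses as remote listeners, and all client connections are load-balanced across the whole set of ASM instances. The default listener name is ASMNET1LSNR_ASM.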
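The "at most three" rule can be made concrete with a toy model in Python. The helper `asm_remote_listener_list` is hypothetical; Oracle performs this registration internally. It simply keeps the first three (host, port) pairs and formats them as a TNS ADDRESS_LIST string. Port 1526 and 192.168.43.33 come from the ASMNET1LSNR_ASM output in this case, and the other hosts are made up. The sketch does not model which three listeners Oracle actually chooses.

```python
# Hypothetical helper illustrating the "at most three ASM listeners" rule.
# Oracle does this registration internally; this is only a toy model.
MAX_REMOTE_ASM_LISTENERS = 3


def asm_remote_listener_list(endpoints):
    """Format up to three (host, port) ASM listener endpoints as a TNS ADDRESS_LIST."""
    chosen = list(endpoints)[:MAX_REMOTE_ASM_LISTENERS]  # keep only the first three
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))" for host, port in chosen
    )
    return f"(ADDRESS_LIST={addresses})"


if __name__ == "__main__":
    # 192.168.43.33:1526 appears in this case; the other hosts are made up.
    print(asm_remote_listener_list([
        ("192.168.43.33", 1526),
        ("192.168.43.34", 1526),
        ("192.168.43.35", 1526),
        ("192.168.43.36", 1526),
    ]))
```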


Case study

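With this background, let's analyze a Flex ASM case I ran into recently. The environment is a 12c R2 two-node RAC with Data Guard. While checking the standby side, I found that instance 2 on standby node 2 was not running, and redo was being received and applied on standby node 1. The problem surfaced when I tried to start instance 2.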

root@anbobstb02:~$crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

root@anbobstb02:~$crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager
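When you check many nodes, the `crsctl check crs` output above can be summarized mechanically. The sketch below assumes the message format shown here: `CRS-nnnn: <daemon> is online` or `CRS-nnnn: Cannot communicate with <daemon>`. The colon after the code is treated as optional, and lines in any other form are ignored.

```python
import re

# Matches "CRS-4529: Cluster Synchronization Services is online" style lines;
# the colon after the message code is treated as optional.
CRS_LINE = re.compile(r"^(CRS-\d{4}):?\s*(.*)$")


def parse_crsctl_check(output):
    """Map each daemon named in `crsctl check crs` output to 'online' or 'unreachable'."""
    status = {}
    for line in output.splitlines():
        m = CRS_LINE.match(line.strip())
        if not m:
            continue
        _code, text = m.groups()
        if text.endswith(" is online"):
            status[text[: -len(" is online")]] = "online"
        elif text.startswith("Cannot communicate with "):
            status[text[len("Cannot communicate with "):]] = "unreachable"
    return status
```

On node 2 this flags Cluster Ready Services and Event Manager as unreachable, which matches the crsd and evmd states in the next listing.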

grid@anbobstb02$ crsctl stat res -t -init
--------------------------------------------------------------------------------
Name           Target  State        Server                      State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       anbobstb02                  STABLE
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       anbobstb02                  STABLE
ora.crf
      1        ONLINE  ONLINE       anbobstb02                  STABLE
ora.crsd
      1        ONLINE  OFFLINE                               STABLE
ora.cssd
      1        ONLINE  ONLINE       anbobstb02                  STABLE
ora.cssdmonitor
      1        ONLINE  ONLINE       anbobstb02                  STABLE
ora.ctssd
      1        ONLINE  ONLINE       anbobstb02                  OBSERVER,STABLE
ora.diskmon
      1        OFFLINE OFFLINE                               STABLE
ora.drivers.acfs
      1        ONLINE  ONLINE       anbobstb02                  STABLE
ora.evmd
      1        ONLINE  INTERMEDIATE anbobstb02                  STABLE
ora.gipcd
      1        ONLINE  ONLINE       anbobstb02                  STABLE
ora.gpnpd
      1        ONLINE  ONLINE       anbobstb02                  STABLE
ora.mdnsd
      1        ONLINE  ONLINE       anbobstb02                  STABLE
ora.storage
      1        ONLINE  ONLINE       anbobstb02                  STABLE
--------------------------------------------------------------------------------

grid@anbobstb02:~$crsctl get cluster mode status
Cluster is running in "flex" mode
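The same triage works for `crsctl stat res -t -init`: list every resource whose Target is ONLINE but whose State is something else. The sketch assumes the layout shown above, with the resource name on its own line followed by rows that start with an instance number. On node 2 it picks out ora.crsd (OFFLINE) and ora.evmd (INTERMEDIATE).

```python
def unhealthy_resources(table_text):
    """Return (resource, instance, target, state) for rows whose Target is ONLINE
    but whose State is something else, from `crsctl stat res -t -init` output."""
    problems = []
    current = None  # most recent "ora.xxx" resource name seen
    for raw in table_text.splitlines():
        line = raw.strip()
        if line.startswith("ora."):
            current = line
            continue
        fields = line.split()
        # Instance rows look like: <n> <target> <state> [server] <details>
        if current and len(fields) >= 3 and fields[0].isdigit():
            instance, target, state = int(fields[0]), fields[1], fields[2]
            if target == "ONLINE" and state != "ONLINE":
                problems.append((current, instance, target, state))
    return problems
```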


Note:
In this Flex ASM environment, CRS failed to start on node 2: ora.crsd stays OFFLINE and ora.evmd is INTERMEDIATE.

CRS alert log

2019-02-12 10:50:05.229 [OCSSD(51008)]CRS-1713: CSSD daemon is started in hub mode
2019-02-12 10:50:06.670000 +08:00
2019-02-12 10:50:06.670 [OCSSD(51008)]CRS-1707: Lease acquisition for node anbobstb02 number 2 completed
2019-02-12 10:50:07.756000 +08:00
2019-02-12 10:50:07.756 [OCSSD(51008)]CRS-1605: CSSD voting file is online: /dev/asm-disk55; details in /oracle/app/grid/diag/crs/anbobstb02/crs/trace/ocssd.trc.
2019-02-12 10:50:07.759 [OCSSD(51008)]CRS-1605: CSSD voting file is online: /dev/asm-disk52; details in /oracle/app/grid/diag/crs/anbobstb02/crs/trace/ocssd.trc.
2019-02-12 10:50:07.763 [OCSSD(51008)]CRS-1605: CSSD voting file is online: /dev/asm-disk51; details in /oracle/app/grid/diag/crs/anbobstb02/crs/trace/ocssd.trc.
2019-02-12 10:50:14.376000 +08:00
2019-02-12 10:50:14.376 [OCSSD(51008)]CRS-1601: CSSD Reconfiguration complete. Active nodes are anbobstb01 anbobstb02 .
2019-02-12 10:50:17.177000 +08:00
2019-02-12 10:50:17.176 [OCTSSD(55374)]CRS-8500: Oracle Clusterware OCTSSD process is starting with operating system process ID 55374
2019-02-12 10:50:17.193 [OCSSD(51008)]CRS-1720: Cluster Synchronization Services daemon (CSSD) is ready for operation.
2019-02-12 10:50:18.157 [OCTSSD(55374)]CRS-2403: The Cluster Time Synchronization Service on host anbobstb02 is in observer mode.
2019-02-12 10:50:19.266000 +08:00
2019-02-12 10:50:19.266 [OCTSSD(55374)]CRS-2407: The new Cluster Time Synchronization Service reference node is host anbobstb01.
2019-02-12 10:50:19.266 [OCTSSD(55374)]CRS-2401: The Cluster Time Synchronization Service started on host anbobstb02.
2019-02-12 10:50:35.725000 +08:00
2019-02-12 10:50:35.725 [ORAROOTAGENT(50588)]CRS-5019: All OCR locations are on ASM disk groups [OCRDG], and none of these disk groups are mounted. Details are at "(:CLSN00140:)" in "/oracle/app/grid/diag/crs/anbobstb02/crs/trace/ohasd_orarootagent_root.trc".


Trace file ohasd_orarootagent_root.trc
    adrci> show trace /oracle/app/grid/diag/crs/anbobstb02/crs/trace/ohasd_orarootagent_root.trc


    2019-02-12 10:50:25.765 : AGFW:2530133760: {0:5:3} Agent sending reply for: RESOURCE_START[ora.cluster_interconnect.haip 1 1] ID 4098:403
    2019-02-12 10:50:25.765 : USRTHRD:2519627520: {0:5:3} Check: 0-1
    2019-02-12 10:50:25.766 : AGFW:2530133760: {0:5:3} ora.cluster_interconnect.haip 1 1 state changed from: STARTING to: ONLINE
    2019-02-12 10:50:25.766 : AGFW:2530133760: {0:5:3} RECYCLE_AGENT attribute not found
    2019-02-12 10:50:25.766 : AGFW:2530133760: {0:5:3} Started implicit monitor for [ora.cluster_interconnect.haip 1 1] interval=30000 delay=30000
    2019-02-12 10:50:25.766 : AGFW:2530133760: {0:5:3} Agent sending last reply for: RESOURCE_START[ora.cluster_interconnect.haip 1 1] ID 4098:403
    2019-02-12 10:50:25.768 : USRTHRD:2512111360: {0:5:3} got pubgrpdata, 1-8-2-2-2
    2019-02-12 10:50:25.770 : USRTHRD:2512111360: {0:5:3} Completed 1 HAIP assignment, start complete
    2019-02-12 10:50:25.770 : USRTHRD:2512111360: {0:5:3} to verify inf event
    2019-02-12 10:50:25.813 : AGFW:2530133760: {0:5:3} Agent received the message: RESOURCE_START[ora.storage 1 1] ID 4098:438
    2019-02-12 10:50:25.813 : AGFW:2530133760: {0:5:3} Preparing START command for: ora.storage 1 1
    2019-02-12 10:50:25.813 : AGFW:2530133760: {0:5:3} ora.storage 1 1 state changed from: OFFLINE to: STARTING
    2019-02-12 10:50:25.813 : AGFW:2530133760: {0:5:3} RECYCLE_AGENT attribute not found
    2019-02-12 10:50:25.813 :CLSDYNAM:2519627520: [ora.storage]{0:5:3} [start] (:CLSN00107:) clsn_agent::start {
    2019-02-12 10:50:25.814 :CLSDYNAM:2519627520: [ora.storage]{0:5:3} [start] StorageAgent::init NodeRole = 1
    2019-02-12 10:50:25.814 :CLSDYNAM:2519627520: [ora.storage]{0:5:3} [start] StorageAgent::check NODEROLE_HUB getOCRdetails
    2019-02-12 10:50:25.832 : default:2519627520: clsvactversion:4: Retrieving Active Version from local storage.
    2019-02-12 10:50:25.840 :GIPCXCPT:2519627520: gipcInternalSetAttribute: failed during gipcInternalSetAttribute, ret gipcretInvalidAttribute (5)
    2019-02-12 10:50:25.840 :GIPCXCPT:2519627520: gipcSetAttributeNativeF [clscrsconGipcConnect : clscrscon.c : 655]: EXCEPTION[ ret gipcretInvalidAttribute (5) ] failure for obj 0x7f8664436460 [0000000000000955] { gipcEndpoint : localAddr '', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp (nil) status 13flags 0x20000000, flags-2 0x0, usrFlags 0x0 }, name 'traceLevel', val 0x7f86962cf004, len 4, flags 0x0
    2019-02-12 10:50:25.855 : CLSNS:2519627520: clsns_SetTraceLevel:trace level set to 1.
    2019-02-12 10:50:25.859 : default:2519627520: Inited LSF context: 0x7f866453c5d0
    2019-02-12 10:50:25.863 : CLSCRED:2519627520: clsCredCommonInit: Inited singleton credctx.
    2019-02-12 10:50:25.863 : CLSCRED:2519627520: (:CLSCRED0101:)clsCredDomInitRootDom: Using user given storage context for repository access.
    2019-02-12 10:50:25.886 : USRTHRD:2519627520: {0:5:3} 8154 Error 4 querying length of attr ASM_DISCOVERY_ADDRESS
    2019-02-12 10:50:25.889 : USRTHRD:2519627520: {0:5:3} 8154 Error 4 querying length of attr ASM_STATIC_DISCOVERY_ADDRESS
    2019-02-12 10:50:25.924 : CLSCRED:2519627520: (:CLSCRED1079:)clsCredOcrKeyExists: Obj dom : SYSTEM.credentials.domains.root.ASM.Self.bb15e951dcbc4fc2ff3aec3bfe1f0424.root not found
    2019-02-12 10:50:25.924 : USRTHRD:2519627520: {0:5:3} 7872 Error 4 opening dom root in 0x7f866441d590
    2019-02-12 10:50:25.929 :GIPCXCPT:2519627520: gipcInternalSetAttribute: failed during gipcInternalSetAttribute, ret gipcretInvalidAttribute (5)
    2019-02-12 10:50:25.929 :GIPCXCPT:2519627520: gipcSetAttributeNativeF [clscrsconGipcConnect : clscrscon.c : 655]: EXCEPTION[ ret gipcretInvalidAttribute (5) ] failure for obj 0x7f86647145d0 [0000000000000fc1] { gipcEndpoint : localAddr '', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp (nil) status 13flags 0x20000000, flags-2 0x0, usrFlags 0x0 }, name 'traceLevel', val 0x7f86962cf004, len 4, flags 0x0
    2019-02-12 10:50:26.014 : AGFW:2743070784: Recvd request to shed the threads
    2019-02-12 10:50:26.014 :CLSFRAME:2743070784: TM [MultiThread] is changing desired thread # to 8. Current # is 9
    2019-02-12 10:50:26.014 :CLSFRAME:2532235008: {0:1:5} Worker thread is exiting in TM [MultiThread] to meet the desired count of 8. New count is 8



    2019-02-12 10:50:29.219 : USRTHRD:2519627520: {0:5:3} 7872 Error 4 opening dom root in 0x7f866465b680
    2019-02-12 10:50:29.222 :GIPCXCPT:2519627520: gipcInternalSetAttribute: failed during gipcInternalSetAttribute, ret gipcretInvalidAttribute (5)
    2019-02-12 10:50:29.222 :GIPCXCPT:2519627520: gipcSetAttributeNativeF [clscrsconGipcConnect : clscrscon.c : 655]: EXCEPTION[ ret gipcretInvalidAttribute (5) ] failure for obj 0x7f8664546cd0 [0000000000002068] { gipcEndpoint : localAddr '', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp (nil) status 13flags 0x20000000, flags-2 0x0, usrFlags 0x0 }, name 'traceLevel', val 0x7f86962cf004, len 4, flags 0x0

    2019-02-12 10:50:34.664 : USRTHRD:2519627520: {0:5:3} 7872 Error 4 opening dom root in 0x7f86645abf50
    2019-02-12 10:50:34.668 :GIPCXCPT:2519627520: gipcInternalSetAttribute: failed during gipcInternalSetAttribute, ret gipcretInvalidAttribute (5)
    2019-02-12 10:50:34.668 :GIPCXCPT:2519627520: gipcSetAttributeNativeF [clscrsconGipcConnect : clscrscon.c : 655]: EXCEPTION[ ret gipcretInvalidAttribute (5) ] failure for obj 0x7f8664707b20 [000000000000392d] { gipcEndpoint : localAddr '', remoteAddr '', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj (nil), sendp (nil) status 13flags 0x20000000, flags-2 0x0, usrFlags 0x0 }, name 'traceLevel', val 0x7f86962cf004, len 4, flags 0x0
    2019-02-12 10:50:35.715 : USRTHRD:2509133568: HAIP: event GIPCD_METRIC_UPDATE
    2019-02-12 10:50:35.715 : USRTHRD:2512111360: {0:5:3} to verify inf event
    2019-02-12 10:50:35.724 : default:2519627520: clsCredDomClose: Credctx deleted 0x7f866443f840
    2019-02-12 10:50:35.724 : USRTHRD:2519627520: {0:5:3} --- trace dump on error exit ---
    2019-02-12 10:50:35.724 : USRTHRD:2519627520: {0:5:3} Error [kgfoAl06] in [kgfokge] at kgfo.c:3115
    2019-02-12 10:50:35.724 : USRTHRD:2519627520: {0:5:3} ORA-12547: TNS:lost contact
    ORA-12547: TNS:lost contact
    ORA-15077: could not locate ASM instance serving a required diskgroup


    2019-02-12 10:50:35.724 : USRTHRD:2519627520: {0:5:3} Category: 7
    2019-02-12 10:50:35.724 : USRTHRD:2519627520: {0:5:3} DepInfo: 12547
    2019-02-12 10:50:35.724 : USRTHRD:2519627520: {0:5:3} --- trace dump end ---
    2019-02-12 10:50:35.724 :CLSDYNAM:2519627520: [ora.storage]{0:5:3} [start] StorageAgent::parsekgforetcodes retcode = 7, kgfoCheckMount(OCRDG), flag 2
    2019-02-12 10:50:35.724 :CLSDYNAM:2519627520: [ora.storage]{0:5:3} [start] (null) category: 7, operation: kgfoAl06, loc: kgfokge, OS error: 12547, other: ORA-12547: TNS:lost contact
    ORA-12547: TNS:lost contact
    ORA-15077: could not locate ASM instance serving a required diskgroup
    2019-02-12 10:50:35.724 :CLSDYNAM:2519627520: [ora.storage]{0:5:3} [start] StorageAgent::check kgfo returncode 1
    2019-02-12 10:50:35.724 :CLSDYNAM:2519627520: [ora.storage]{0:5:3} [start] (:CLSN00140:)StorageAgent::parsekgforretcodes OCR dgName OCRDG state 1
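Traces like this one are long and repetitive. Listing each distinct error code once, in order of first appearance, makes the failure chain easier to follow. The sketch assumes codes look like `ORA-nnnnn`, `TNS-nnnnn` or `CRS-nnnn`. Applied to the alert log line and the trace above, it surfaces CRS-5019, ORA-12547 and ORA-15077.

```python
import re

# ORA-nnnnn / TNS-nnnnn / CRS-nnnn style message codes
ERROR_CODE = re.compile(r"\b(?:ORA|TNS|CRS)-\d{4,5}\b")


def distinct_error_codes(text):
    """Return each error code once, in order of first appearance."""
    return list(dict.fromkeys(ERROR_CODE.findall(text)))
```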


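Note:
The logs suggest that when CRS started, it could not find the ASM disk group holding the OCR (OCRDG). The ora.storage resource failed while retrieving the ASM authentication credentials, reporting ORA-12547 and ORA-15077. In Flex ASM, the ASM server connects over the ASM network(s) when it starts, so the next step is to check the ASM listener on node 1.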


    grid@anbobstb01:/home/grid> crsctl stat res -t
    --------------------------------------------------------------------------------
    Name           Target  State        Server                      State details
    --------------------------------------------------------------------------------
    Local Resources
    --------------------------------------------------------------------------------
    ora.ARCHDG.dg
                   ONLINE  ONLINE       anbobstb01                  STABLE
    ora.ASMNET1LSNR_ASM.lsnr
                   ONLINE  ONLINE       anbobstb01                  STABLE
    ora.DATADG.dg
                   ONLINE  ONLINE       anbobstb01                  STABLE
    ora.LISTENER.lsnr
                   ONLINE  ONLINE       anbobstb01                  STABLE
    ora.MGMT.dg
                   ONLINE  ONLINE       anbobstb01                  STABLE
    ora.OCRDG.dg
                   ONLINE  ONLINE       anbobstb01                  STABLE
    ora.chad
                   ONLINE  ONLINE       anbobstb01                  STABLE
    ora.net1.network
                   ONLINE  ONLINE       anbobstb01                  STABLE
    ora.ons
                   ONLINE  ONLINE       anbobstb01                  STABLE
    --------------------------------------------------------------------------------
    Cluster Resources
    --------------------------------------------------------------------------------
    ora.LISTENER_SCAN1.lsnr
          1        ONLINE  ONLINE       anbobstb01                  STABLE
    ora.MGMTLSNR
          1        ONLINE  ONLINE       anbobstb01                  169.254.143.82 192.1
                                                                    68.43.33,STABLE
    ora.asm
          1        ONLINE  ONLINE       anbobstb01                  Started,STABLE
          2        ONLINE  OFFLINE                                  STABLE
          3        OFFLINE OFFLINE                                  STABLE
    ora.cvu
          1        ONLINE  ONLINE       anbobstb01                  STABLE
    ora.mgmtdb
          1        ONLINE  ONLINE       anbobstb01                  Open,STABLE
    ora.anbobstb01.vip
          1        ONLINE  ONLINE       anbobstb01                  STABLE
    ora.anbobstb02.vip
          1        ONLINE  INTERMEDIATE anbobstb01                  FAILED OVER,STABLE
    ora.qosmserver
          1        ONLINE  ONLINE       anbobstb01                  STABLE
    ora.rptstby.db
          1        OFFLINE OFFLINE                                  Instance Shutdown,ST
                                                                    ABLE
          2        ONLINE  ONLINE       anbobstb01                  Open,Readonly,HOME=/
                                                                    oracle/app/oracle/pr
                                                                    oduct/12.2.0/db_1,ST
                                                                    ABLE
    ora.scan1.vip
          1        ONLINE  ONLINE       anbobstb01                  STABLE
    --------------------------------------------------------------------------------
    grid@anbobstb01:/home/grid> crsctl stat res -t -init
    --------------------------------------------------------------------------------
    Name           Target  State        Server                      State details
    --------------------------------------------------------------------------------
    Cluster Resources
    --------------------------------------------------------------------------------
    ora.asm
          1        ONLINE  ONLINE       anbobstb01                  Started,STABLE
    ora.cluster_interconnect.haip
          1        ONLINE  ONLINE       anbobstb01                  STABLE
    ora.crf
          1        ONLINE  ONLINE       anbobstb01                  STABLE
    ora.crsd
          1        ONLINE  ONLINE       anbobstb01                  STABLE
    ora.cssd
          1        ONLINE  ONLINE       anbobstb01                  STABLE
    ora.cssdmonitor
          1        ONLINE  ONLINE       anbobstb01                  STABLE
    ora.ctssd
          1        ONLINE  ONLINE       anbobstb01                  OBSERVER,STABLE
    ora.diskmon
          1        OFFLINE OFFLINE                                  STABLE
    ora.drivers.acfs
          1        ONLINE  ONLINE       anbobstb01                  STABLE
    ora.evmd
          1        ONLINE  ONLINE       anbobstb01                  STABLE
    ora.gipcd
          1        ONLINE  ONLINE       anbobstb01                  STABLE
    ora.gpnpd
          1        ONLINE  ONLINE       anbobstb01                  STABLE
    ora.mdnsd
          1        ONLINE  ONLINE       anbobstb01                  STABLE
    ora.storage
          1        ONLINE  ONLINE       anbobstb01                  STABLE
    --------------------------------------------------------------------------------


    grid@anbobstb01:/home/grid> ocrdump tmp/ocr.dmp
    PROT-310: Not all keys were dumped due to permissions.
    grid@anbobstb01:/home/grid> vi tmp/ocr.dmp


    [SYSTEM.ASM.CREDENTIALS.USERS.CRSUSER__ASM_001]
    ORATEXT : bb15e951dcbc4fc2ff3aec3bfe1f0424:grid   <-- the credential exists
    SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ, OTHER_PERMISSION : PROCR_NONE, USER_NAME : grid, GROUP_NAME : oinstall}
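The credential check comes down to comparing two identifiers. The first is the 32-hex-digit ID in the domain that the node 2 trace could not find (`ASM.Self.<id>.root not found`). The second is the ID stored in the OCR key dumped on node 1. The sketch assumes exactly the text formats shown in this case. A match only shows that the same ASM credential ID is registered in the OCR; it says nothing about whether every credential domain is intact.

```python
import re

# ID of the credential domain the node 2 trace reports as missing
TRACE_MISSING = re.compile(r"ASM\.Self\.([0-9a-f]{32})\.root not found")
# ID stored in the OCR credential key, e.g. "ORATEXT : <id>:grid"
OCR_CRED = re.compile(r"ORATEXT\s*:\s*([0-9a-f]{32}):")


def credential_ids_in_both(trace_text, ocrdump_text):
    """Return the credential IDs that appear both in the trace and in the OCR dump."""
    missing = set(TRACE_MISSING.findall(trace_text))
    stored = set(OCR_CRED.findall(ocrdump_text))
    return missing & stored
```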


    grid@anbobstb01:~$oifcfg getif
    bond0 133.96.43.0 global public
    bond1 192.168.43.0 global cluster_interconnect,asm
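`oifcfg getif` prints one line per interface: the name, the subnet, the scope, and a comma-separated list of roles. A small parser can pick out the subnets that carry the ASM network, and those are the subnets a listener whitelist has to admit. The sketch assumes the four-column format shown above.

```python
def networks_by_role(getif_output):
    """Group `oifcfg getif` lines (iface, subnet, scope, roles) by role."""
    roles = {}
    for line in getif_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or unexpected lines
        iface, subnet, _scope, role_list = fields[:4]
        for role in role_list.split(","):
            roles.setdefault(role, []).append((iface, subnet))
    return roles
```

Here bond1 (192.168.43.0) is returned for both `cluster_interconnect` and `asm`: one network serves as both the private network and the ASM network.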


    grid@anbobstb01:/oracle/app/12.2.0/grid/bin> ps -ef|grep lsnr
    grid 1411 1 0 2018 ? 00:01:53 /oracle/app/12.2.0/grid/bin/tnslsnr LISTENER_SCAN1 -no_crs_notify -inherit
    grid 6585 1 0 2018 ? 00:00:21 /oracle/app/12.2.0/grid/bin/tnslsnr listener_dg -inherit
    grid 25826 19172 0 10:57 pts/3 00:00:00 grep --color=auto lsnr
    grid 72783 1 0 2018 ? 00:02:01 /oracle/app/12.2.0/grid/bin/tnslsnr MGMTLSNR -no_crs_notify -inherit
    grid 78242 1 0 Feb12 ? 00:00:05 /oracle/app/12.2.0/grid/bin/tnslsnr LISTENER -no_crs_notify -inherit
    grid 80521 1 0 Feb12 ? 00:00:35 /oracle/app/12.2.0/grid/bin/tnslsnr ASMNET1LSNR_ASM -no_crs_notify -inherit


    grid@anbobstb01:/oracle/app/12.2.0/grid/bin> lsnrctl status ASMNET1LSNR_ASM
    LSNRCTL for Linux: Version 12.2.0.1.0 - Production on 13-FEB-2019 10:57:43
    Copyright (c) 1991, 2016, Oracle. All rights reserved.
    Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=ASMNET1LSNR_ASM)))
    STATUS of the LISTENER
    ------------------------
    Alias ASMNET1LSNR_ASM
    Version TNSLSNR for Linux: Version 12.2.0.1.0 - Production
    Start Date 12-FEB-2019 18:21:02
    Uptime 0 days 16 hr. 36 min. 40 sec
    Trace Level off
    Security ON: Local OS Authentication
    SNMP OFF
    Listener Parameter File /oracle/app/12.2.0/grid/network/admin/listener.ora
    Listener Log File /oracle/app/grid/diag/tnslsnr/anbobstb01/asmnet1lsnr_asm/alert/log.xml
    Listening Endpoints Summary...
    (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=ASMNET1LSNR_ASM)))
    (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.43.33)(PORT=1526)))
    The listener supports no services
    The command completed successfully
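Two facts matter in this `lsnrctl status` output: the TCP endpoint, and the line `The listener supports no services`. Both can be checked automatically. The sketch assumes the lsnrctl text layout shown above and ignores IPC endpoints.

```python
import re

TCP_ENDPOINT = re.compile(r"\(PROTOCOL=tcp\)\(HOST=([^)]+)\)\(PORT=(\d+)\)", re.IGNORECASE)


def listener_summary(status_text):
    """Extract TCP endpoints and whether the listener reports no services."""
    return {
        "tcp_endpoints": [(host, int(port)) for host, port in TCP_ENDPOINT.findall(status_text)],
        "no_services": "The listener supports no services" in status_text,
    }
```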


    grid@anbobstb01:/oracle/app/grid/diag/tnslsnr/anbobstb01/asmnet1lsnr_asm/trace> vi asmnet1lsnr_asm.log


    2019-02-13T10:57:16.203184+08:00
    Incoming connection from 192.168.43.34 rejected
    13-FEB-2019 10:57:16 * 12546
    TNS-12546: TNS:permission denied
    TNS-12560: TNS:protocol adapter error
    TNS-00516: Permission denied
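A rejected connection appears in the listener log as an `Incoming connection from <ip> rejected` line followed by a TNS error stack. The sketch below pairs each rejected source address with the first TNS code that follows it, assuming the log layout above. For this excerpt it yields 192.168.43.34, an address on the ASM network subnet, paired with TNS-12546.

```python
import re

REJECTED = re.compile(r"Incoming connection from (\S+) rejected")
TNS_CODE = re.compile(r"^\s*(TNS-\d{5})")


def rejected_connections(log_text):
    """Pair each rejected source address with the first TNS error code that follows it."""
    results = []
    pending = None  # source IP waiting for its error code
    for line in log_text.splitlines():
        m = REJECTED.search(line)
        if m:
            pending = m.group(1)
            continue
        e = TNS_CODE.match(line)
        if pending and e:
            results.append((pending, e.group(1)))
            pending = None
    return results
```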
      
    grid@anbobstb01:/oracle/app/grid/diag/tnslsnr/anbobstb01/asmnet1lsnr_asm/trace> telnet 192.168.43.33 1526
    Trying 192.168.43.33...
    Connected to 192.168.43.33.
    Escape character is '^]'.
    Connection closed by foreign host.
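The telnet test can also be scripted. A TNS listener normally waits for the client to send first, so a healthy listener keeps an idle connection open. A listener that rejects the client closes or resets the connection right away. The sketch below uses that heuristic; the `probe` helper and its return labels are made up for illustration.

```python
import socket


def probe(host, port, timeout=3.0):
    """Rough telnet replacement for a TNS listener port.

    Returns 'refused', 'unreachable', 'closed-by-peer' or 'open'.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"  # nothing listening on the port
    except OSError:
        return "unreachable"  # timeout, no route, etc.
    with sock:
        try:
            data = sock.recv(1)  # a TNS listener normally waits for the client to speak first
        except socket.timeout:
            return "open"  # connection stayed idle: the listener kept it
        except ConnectionResetError:
            return "closed-by-peer"  # peer sent a TCP reset
        # recv() returning b"" means the peer closed the connection (FIN)
        return "closed-by-peer" if data == b"" else "open"
```

For example, `probe("192.168.43.33", 1526)` run from node 2 would show whether the ASM listener drops that node's connections.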


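Note:
Seen from instance 1, everything currently in use works. However, the ASM listener running there has no services, and a telnet to it is rejected almost immediately. iptables imposed no restrictions, and tcpdump showed that the reset packets were sent by the listener process itself. If the ASM listener serves nothing, the Flex ASM cluster nodes have no way to communicate. A restriction on listener connections most likely comes from sqlnet.ora.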

      grid@anbobstb01:/oracle/app/12.2.0/grid/network/admin> vi sqlnet.ora
      NAMES.DIRECTORY_PATH= (TNSNAMES,EZCONNECT)

      ADR_BASE = /oracle/app/grid
      TCP.VALIDNODE_CHECKING=yes
      TCP.INVITED_NODES=(...)
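With TCP.VALIDNODE_CHECKING=yes, the listener accepts connections only from addresses listed in TCP.INVITED_NODES. The real list is elided above, so the test uses a hypothetical whitelist. The sketch assumes entries are literal IP addresses, CIDR blocks, or trailing-`*` wildcards; check the Net Services Reference for the forms your release accepts. It does not resolve host names.

```python
import ipaddress
import re


def invited_entries(sqlnet_text):
    """Return the comma-separated entries of TCP.INVITED_NODES, or [] if absent."""
    m = re.search(r"TCP\.INVITED_NODES\s*=\s*\(([^)]*)\)", sqlnet_text, re.IGNORECASE)
    if not m:
        return []
    return [e.strip() for e in m.group(1).split(",") if e.strip()]


def is_invited(ip, entries):
    """Check an IP against literal, CIDR and trailing-* wildcard entries."""
    addr = ipaddress.ip_address(ip)
    for entry in entries:
        if "*" in entry:
            if ip.startswith(entry.split("*", 1)[0]):  # e.g. 10.10.1.* -> prefix "10.10.1."
                return True
        elif "/" in entry:
            if addr in ipaddress.ip_network(entry, strict=False):
                return True
        else:
            try:
                if addr == ipaddress.ip_address(entry):
                    return True
            except ValueError:
                pass  # host name: this sketch does not resolve names
    return False
```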


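Note:
Sure enough, sqlnet.ora contained a whitelist. But this sqlnet.ora had been copied from the primary database, and the private network (which is also the ASM network) is on a different subnet on the primary than on the standby. As a result, the whitelist on the standby side did not include the ASM network, and the ASM listener was left with no services.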


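Solution

From here the fix is simple: add the ASM network subnet to the whitelist in sqlnet.ora. A reminder for the future: whenever you extend a listener whitelist, include the public network, the private network, the SCAN IPs and the ASM network in addition to the front-end application IPs.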
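The fix can also be written as a small function. It appends every required subnet that no existing CIDR entry already covers, then renders the new parameter line. This is a sketch only. The /24 masks are an assumption, because `oifcfg getif` does not print netmasks, so check the real masks on the hosts. Host-name and wildcard entries are kept but not used for the coverage test. The starting list in the test is hypothetical, since the real one is elided above.

```python
import ipaddress


def add_missing_networks(entries, required_cidrs):
    """Append each required CIDR not already contained in an existing CIDR entry."""
    existing = [ipaddress.ip_network(e, strict=False) for e in entries if "/" in e]
    result = list(entries)
    for cidr in required_cidrs:
        net = ipaddress.ip_network(cidr, strict=False)
        covered = any(net.version == ex.version and net.subnet_of(ex) for ex in existing)
        if not covered:
            result.append(str(net))
            existing.append(net)
    return result


def render_invited_nodes(entries):
    """Format the entries as a sqlnet.ora TCP.INVITED_NODES line."""
    return "TCP.INVITED_NODES=(" + ", ".join(entries) + ")"
```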


Original article: 张维照


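Last modified: 2020-05-08 00:18:19
Reposted from 张维照. If you believe this infringes your rights, please email contact@modb.pro with supporting evidence; once verified, 墨天轮 will remove the content immediately.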
