ORA-32701 error in alert.log M000 hang event ‘not in wait’ during flush AWR

张维照 2019-05-31


环境是一套11.2.0.3  2nodes RAC on hpux-ia31, alert中出现ora-32701 hangmgr错误, 从trace文件中发现是m000进程是mmon的辅助进程,用于flush AWR相关数据,有一个wait event: enq: WF – contention, 这也是flush AWR数据时相关的enqueue等待,但是blocker进程是not in wait, (另mmon hang是可以直接kill 该进程spid,  通常会在不重启实例的前掉下重启该进程)。



# db alert

Thread 2 advanced to log sequence 170773 (LGWR switch)
  Current log# 8 seq# 170773 mem# 0: /dev/tza_oravg02/rtza_redo08
Thu May 26 17:08:11 2016
Errors in file /oracle/app/oracle/diag/rdbms/ucisa/anbob2/trace/anbob2_dia0_29439.trc  (incident=720137):
ORA-32701: Possible hangs up to hang ID=0 detected
Incident details in: /oracle/app/oracle/diag/rdbms/ucisa/anbob2/incident/incdir_720137/anbob2_dia0_29439_i720137.trc
Thu May 26 17:08:12 2016
Sweep [inc][720137]: completed
System State dumped to trace file /oracle/app/oracle/diag/rdbms/ucisa/anbob2/incident/incdir_720137/anbob2_m001_22862_i720137_a.trc
DIA0 terminating blocker (ospid: 13964 sid: 5649 ser#: 7241) of hang with ID = 3
    requested by master DIA0 process on instance 1
    Hang Resolution Reason: Although the number of affected sessions did not
    justify automatic hang resolution initially, this previously ignored
    hang was automatically resolved.
    by terminating session sid: 5649 ospid: 13964
DIA0 successfully terminated session sid:5649 ospid:13964 with status 31.
Thu May 26 17:09:46 2016
Errors in file /oracle/app/oracle/diag/rdbms/ucisa/anbob2/trace/anbob2_dia0_29439.trc  (incident=720138):
ORA-32701: Possible hangs up to hang ID=0 detected
Incident details in: /oracle/app/oracle/diag/rdbms/ucisa/anbob2/incident/incdir_720138/anbob2_dia0_29439_i720138.trc
Thu May 26 17:09:47 2016
Sweep [inc][720138]: completed
Thu May 26 17:09:47 2016
Sweep [inc2][720138]: completed
Sweep [inc2][720137]: completed
System State dumped to trace file /oracle/app/oracle/diag/rdbms/ucisa/anbob2/incident/incdir_720138/anbob2_m003_23913_i720138_a.trc
DIA0 terminating blocker (ospid: 13964 sid: 5649 ser#: 7241) of hang with ID = 3
    requested by master DIA0 process on instance 1
    Hang Resolution Reason: Although the number of affected sessions did not
    justify automatic hang resolution initially, this previously ignored
    hang was automatically resolved.
   by terminating the process
DIA0 successfully terminated process ospid:13964.
Thu May 26 17:17:16 2016
Thread 2 advanced to log sequence 170774 (LGWR switch)
  Current log# 9 seq# 170774 mem# 0: /dev/tza_oravg02/rtza_redo09

# trace file: /oracle/app/oracle/diag/rdbms/ucisa/anbob2/trace/anbob2_dia0_29439.trc

Dump continued from file: /oracle/app/oracle/diag/rdbms/ucisa/anbob2/trace/anbob2_dia0_29439.trc
ORA-32701: Possible hangs up to hang ID=0 detected

========= Dump for incident 720137 (ORA 32701) ========
----- Beginning of Customized Incident Dump(s) -----
There are resolvable hangs on your system.  Hang Manger will
attempt to resolve these hangs.  Some information about these
is output below.  Complete information is available in an
incident trace file on instance 1.
If the hang is to be resolved by terminating the session or
process that is the root or victim of the hang, additional
information will be output on the local instance of that
session or process.  Below are the hangs for which resolution
will be attempted.

*** 2016-05-26 17:08:11.765
Resolvable Hangs in the System
                      Root       Chain Total               Hang               
   Hang Hang          Inst Root  #hung #hung  Hang   Hang  Resolution         
     ID Type Status   Num  Sess   Sess  Sess  Conf   Span  Action             
  ----- ---- -------- ---- ----- ----- ----- ------ ------ -------------------      3 HANG RSLNPEND    2  5649     2     2   HIGH GLOBAL Terminate Process  
  Hang Resolution Reason: Although the number of affected sessions did not
    justify automatic hang resolution initially, this previously ignored
    hang was automatically resolved.
      inst# SessId  Ser#     OSPID PrcNm Event
      ----- ------ ----- --------- ----- -----          1   6238 38599      3697  M000 enq: WF - contention
          2   5649  7241     13964  M000 not in wait
 Dumping process info of pid[212.13964] (sid:5649, ser#:7241)
    requested by master DIA0 process on instance 1.

*** 2016-05-26 17:08:11.765
Process diagnostic dump for oracle@qdtza2 (M000), OS id=13964,
pid: 212, proc_ser: 66, sid: 5649, sess_ser: 7241 
os thread scheduling delay history: (sampling every 1.000000 secs)
  0.000000 secs at [ 17:08:11 ]
    NOTE: scheduling delay has not been sampled for 0.103181 secs  0.000667 secs from [ 17:08:07 - 17:08:12 ], 5 sec avg
  0.003777 secs from [ 17:07:12 - 17:08:12 ], 1 min avg
  0.000976 secs from [ 17:03:12 - 17:08:12 ], 5 min avg
loadavg : 0.19 0.18 0.17                                                                                    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Swapinfo : 
        Avail = 380207.03Mb Used = 165771.44Mb
        Swap free = 214385.22Mb Kernel rsvd = 15198.44Mb
        Free Mem  = 83933.84Mb 
  F S      UID   PID  PPID  C PRI NI             ADDR   SZ            WCHAN    STIME TTY       TIME COMD
1401 R   oracle 13964     1 254 178 20 e000000d77ef2040 102092                - 17:00:09 ?         7:57 ora_m000_anbob2        <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Short stack dump: 
Process diagnostic dump actual duration=0.392000 sec
  (max dump time=15.000000 sec)

*** 2016-05-26 17:08:12.156
current sql: insert into wrh$_sql_bind_metadata   (snap_id, dbid,    sql_id, name, position, dup_position,    datatype, datatype_string,    character_sid, precision, scale, max_length)  SELECT /*+ ordered use_nl(bnd) index(bnd sql_id) */      :lah_snap_id, :dbid,      bnd.sql_id, name, position, dup_position,      datatype, dataty
                    SO: 0xc000002071d7c428, type: 2, owner: 0x0000000000000000, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
                     proc=0xc000002071d7c428, name=process, file=ksu.h LINE:12616 ID:, pg=0
                    (process) Oracle pid:212, ser:66, calls cur/top: 0xc000001fb321ec20/0xc000001fb31300e8
                              flags : (0x2) SYSTEM
                              flags2: (0x30),  flags3: (0x10) 
                              intr error: 0, call error: 0, sess error: 0, txn error 0
                              intr queue: empty
                      ksudlp FALSE at location: 0
  (post info) last post received: 0 0 80
              last post received-location: kji.h LINE:3418 ID:kjata: wake up enqueue owner
              last process to post me: c0000020b1d72b30 2 6
              last post sent: 0 0 26
              last post sent-location: ksa2.h LINE:285 ID:ksasnd
              last process posted by me: c000002071d60198 2 6
                      (latch info) wait_event=0 bits=0
                      Process Group: DEFAULT, pseudo proc: 0xc0000020a20d45a0
                      O/S info: user: oracle, term: UNKNOWN, ospid: 13964 
                      OSD pid info: Unix process pid: 13964, image: oracle@qdtza2 (M000)
                      SO: 0xc0000020505807f8, type: 10, owner: 0xc000002071d7c428, flag: INIT/-/-/0x00 if: 0x1 c: 0x1
                       proc=0xc000002071d7c428, name=FileOpenBlock, file=ksfd.h LINE:6448 ID:, pg=0
                      (FOB) c0000020505807f8 flags=2050 fib=c0000020c8afa058 incno=0 pending i/o cnt=0
                       fno=820 lblksz=16384 fsiz=1105919
				service name: SYS$BACKGROUND
                      Current Wait Stack:
                        Not in wait; last wait ended 7 min 48 sec ago           <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                      There are 1 sessions blocked by this session.
                      Dumping one waiter:
                        inst: 1, sid: 6238, ser: 38599
                        wait event: 'enq: WF - contention'
                          p1: 'name|mode'=0x57460006
                          p2: '0'=0x46
                          p3: '0'=0x0
                        row_wait_obj#: 4294967295, block#: 0, row#: 0, file# 0         <<<<<<<<<<<<<<<<<<<<<<<<<<
                        min_blocked_time: 445 secs, waiter_cache_ver: 50989
                      Wait State:
                        fixed_waits=0 flags=0x20 boundary=0x0000000000000000/-1
                      Session Wait History:
                          elapsed time of 7 min 48 sec since last wait
                       0: waited for 'gc cr block 2-way'
                          =0x3, =0x115a3, =0x1
                          wait_id=5905 seq_num=5906 snap_id=1
                          wait times: snap=0.000424 sec, exc=0.000424 sec, total=0.000424 sec
                          wait times: max=infinite
                          wait counts: calls=1 os=1
                          occurred after 0.000712 sec of elapsed time
                       1: waited for 'gc current block 2-way'
                          =0x3, =0x13148, =0x1
                          wait_id=5904 seq_num=5905 snap_id=1
                          wait times: snap=0.000325 sec, exc=0.000325 sec, total=0.000325 sec
                          wait times: max=infinite
                          wait counts: calls=1 os=1
                          occurred after 0.001039 sec of elapsed time
                       2: waited for 'enq: WF - contention'
                          name|mode=0x57460006, 0=0x38, 0=0x0
                          wait_id=5903 seq_num=5904 snap_id=1
                          wait times: snap=0.000257 sec, exc=0.000257 sec, total=0.000257 sec
                          wait times: max=infinite
                          wait counts: calls=2 os=2
                          occurred after 1.078632 sec of elapsed time
                       3: waited for 'enq: WF - contention'
                          name|mode=0x57460006, 0=0x46, 0=0x0
                          wait_id=5902 seq_num=5903 snap_id=1
                          wait times: snap=0.000282 sec, exc=0.000282 sec, total=0.000282 sec
                          wait times: max=infinite
                          wait counts: calls=2 os=2
                          occurred after 0.002117 sec of elapsed time
                       4: waited for 'gc current block 2-way'
                          =0x123, =0x31410, =0x2000001
                          wait_id=5901 seq_num=5902 snap_id=1
                          wait times: snap=0.000276 sec, exc=0.000276 sec, total=0.000276 sec
                          wait times: max=infinite
                          wait counts: calls=1 os=1
                          occurred after 0.000529 sec of elapsed time
                       5: waited for 'gc current block 2-way'
                          =0x123, =0x71a2, =0x2000009
                          wait_id=5900 seq_num=5901 snap_id=1
                          wait times: snap=0.000141 sec, exc=0.000141 sec, total=0.000141 sec
                          wait times: max=infinite
                          wait counts: calls=1 os=1
                          occurred after 0.000231 sec of elapsed time

有个类似 Bug 20733574 : HANG DETECTED WHILE CREATING AWR SNAPSHOT, 但当时因为收集信息不足,开发未确认为bug;
还有一种解决方法是跳过收集与该表相关的数据(这个案例是wrh$_sql_bind_metadata记录的是SQL Bind Metadata),使用下面的命令:
alter system set “_awr_disabled_flush_tables” = ‘wrh$_sql_bind_metadata’;
该命令是动态的,不需要重启实例, 禁用多个表时参数用逗号分隔;

原理禁用了和上面表相关的统计信息flush ash to awr, 除了上述的bug ,还有其它一些原因时如空间不足、hang、也可以禁用,除了上面的禁用方法还有另一种禁用方法,先查询AWR Table Info (KEW layer)使用下面的SQL:

select table_id_kewrtb, table_name_kewrtb from x$kewrtb order by 1;

从上面的查询中得到table_id用于下面的level 值,比如我们这里表wrh$_sql_bind_metadata对应的是56

alter session set events ‘immediate trace name awr_flush_table_off level 56’;

后期如果想打开flushing to awr如果使用第一种方法就是把”_awr_disabled_flush_tables” reset掉;如果第二种方法禁用,打开使用

alter session set events ‘immediate trace name awr_flush_table_on level 56’;