最近刚好在复习文件头相关的知识,刚好把dump相关的内容梳理了一下,这次主要针对hanganalyze和dump systemstate分析死锁的过程做个记录。
1、数据库死锁
开启两个会话,模拟数据库死锁操作。
--session 1,进行update object_id=101的记录,不提交
SQL> show user;
USER is "DC"
SQL> select count(*) from test;
COUNT(*)
----------
86307
SQL> select OBJECT_NAME from test where object_id=101;
OBJECT_NAME
--------------------------------------------------------------------------------
FIXED_OBJ$
SQL> update /* session 1 */ test set object_name='dc' where object_id=101;
1 row updated.
SQL>
--session 2,进行update object_id=101的记录,不提交
[oracle@db01 ~]$ dc
SQL*Plus: Release 11.2.0.4.0 Production on Fri Jun 6 14:24:47 2025
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> update /* session 2 */ test set object_name='hp' where object_id=101;
可以看到,session 2在进行update操作后,因为session 1没有提交,一直就处于hang住的状态,如果连接数和并发足够,数据库系统可能也会被hang住没法使用。
2、hanganalyze dump
现在要分析数据库目前hang住的原因,进行hanganalyze dump操作。如果此时数据库无法登录,可以通过“sqlplus -prelim / as sysdba”进行登录操作。
--hanganalyze 级别
10 Dump all processes (IGN state)
5 Level 4 + Dump all processes involved in wait chains (NLEAF state)
4 Level 3 + Dump leaf nodes (blockers) in wait chains (LEAF,LEAF_NW,IGN_DMP state)
3 Level 2 + Dump only processes thought to be in a hang (IN_HANG state)
1-2 Only HANGANALYZE output, no process dump at all
SQL> oradebug setmypid
Statement processed.
SQL> oradebug hanganalyze 3
Hang Analysis in /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_30882.trc
SQL> exit
--dump日志如下
[oracle@db01 ~]$ more /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_30882.trc
*** 2025-06-06 15:03:44.078
Processing Oradebug command 'hanganalyze 3'
*** 2025-06-06 15:03:44.131
===============================================================================
HANG ANALYSIS:
instances (db_name.oracle_sid): orcl.orcl
oradebug_node_dump_level: 3
analysis initiated by oradebug
os thread scheduling delay history: (sampling every 1.000000 secs)
0.000000 secs at [ 15:03:43 ]
NOTE: scheduling delay has not been sampled for 0.587238 secs 0.000000 secs from [ 15:03:39 - 15:03:44 ], 5 sec avg
0.000000 secs from [ 15:02:44 - 15:03:44 ], 1 min avg
0.000000 secs from [ 14:58:44 - 15:03:44 ], 5 min avg
vktm time drift history
09:52:22.682 HR fwd drift by 1075918 us at 1749174742681732
09:52:26.334 HR fwd drift by 3653060 us at 1749174746334792
09:52:26.334 LR time stalled at 1749174746
09:52:27.335 LR stall ended at 1749174747 drift 1 us
===============================================================================
Chains most likely to have caused the hang:
[a] Chain 1 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'
Chain 1 Signature Hash: 0x38c48850
===============================================================================
Non-intersecting chains:
-------------------------------------------------------------------------------
Chain 1:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 1 (orcl.orcl)
os id: 28050
process id: 20, oracle@db01 (TNS V1-V3)
session id: 12
session serial #: 827
}
is waiting for 'enq: TX - row lock contention' with wait info:
{
p1: 'name|mode'=0x54580006
p2: 'usn<<16 | slot'=0x2000e
p3: 'sequence'=0x81d
time in wait: 20.313051 sec
timeout after: never
wait id: 122
blocking: 0 sessions
current sql: update /* session 2 */ test set object_name='hp' where object_id=101
short stack: ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+160<-ksl
iwat()+2022<-kslwaitctx()+163<-ksqcmi()+2848<-ksqgtlctx()+3501<-ksqgelctx()+557<-ktuGetTxForXid()+131<-ktcwit1()+336<-kdddgb()+8364<-kdusru
()+461<-updrowFastPath()+1209<-qerupFetch()+2366<-updaul()+1321<-updThreePhaseExe()+318<-updexe()+418<-opiexe()+10378<-kpoal8()+2118<-opiod
r()+917<-ttcpip()+2183<-opitsk()+1710<-opiino()+969<-opiodr()+917<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+26
wait history:
* time between current wait and wait #1: 0.000685 sec
1. event: 'SQL*Net message from client'
time waited: 1 min 9 sec
wait id: 121 p1: 'driver id'=0x62657100
p2: '#bytes'=0x1
* time between wait #1 and #2: 0.000004 sec
2. event: 'SQL*Net message to client'
time waited: 0.000001 sec
wait id: 120 p1: 'driver id'=0x62657100
p2: '#bytes'=0x1
* time between wait #2 and #3: 0.000021 sec
3. event: 'log file sync'
time waited: 0.000372 sec
wait id: 119 p1: 'buffer#'=0x75c
p2: 'sync scn'=0x381858
}
and is blocked by
=> Oracle session identified by:
{
instance: 1 (orcl.orcl)
os id: 28307
process id: 21, oracle@db01 (TNS V1-V3)
session id: 762
session serial #: 11101
}
which is waiting for 'SQL*Net message from client' with wait info:
{
p1: 'driver id'=0x62657100
p2: '#bytes'=0x1
time in wait: 35.218218 sec
timeout after: never
wait id: 42
blocking: 1 session
current sql: <none>
short stack: ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-read()+14<-ntpfprd()+117<-nsbasic_brc
()+376<-nsbrecv()+69<-nioqrc()+495<-opikndf2()+978<-opitsk()+831<-opiino()+969<-opiodr()+917<-opidrv()+570<-sou2o()+103<-opimai_real()+133<
-ssthrdmain()+265<-main()+201<-__libc_start_main()+245
wait history:
* time between current wait and wait #1: 0.000004 sec
1. event: 'SQL*Net message to client'
time waited: 0.000001 sec
wait id: 41 p1: 'driver id'=0x62657100
p2: '#bytes'=0x1
* time between wait #1 and #2: 0.004361 sec
2. event: 'SQL*Net message from client'
time waited: 49.733491 sec
wait id: 40 p1: 'driver id'=0x62657100
p2: '#bytes'=0x1
* time between wait #2 and #3: 0.000004 sec
3. event: 'SQL*Net message to client'
time waited: 0.000001 sec
wait id: 39 p1: 'driver id'=0x62657100
p2: '#bytes'=0x1
}
Chain 1 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'
Chain 1 Signature Hash: 0x38c48850
-------------------------------------------------------------------------------
===============================================================================
Extra information that will be dumped at higher levels:
[level 4] : 1 node dumps -- [LEAF] [LEAF_NW]
[level 5] : 1 node dumps -- [NO_WAIT] [INVOL_WT] [SINGLE_NODE] [NLEAF] [SINGLE_NODE_NW]
State of ALL nodes
([nodenum]/cnode/sid/sess_srno/session/ospid/state/[adjlist]):
[11]/1/12/827/0xf2c6e940/28050/NLEAF/[761]
[761]/1/762/11101/0xf1653540/28307/LEAF/分析trc日志可以发现,阻塞会话的源头session id: 762,session serial #: 11101,在系统中kill掉这个会话,session 2的会话不再hang住。
3、systemstate分析
再看下dump systemstate的结果分析
SQL> oradebug setmypid
Statement processed.
SQL> oradebug dump systemstate 10
Statement processed.
SQL> oradebug tracefile_name
/u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_32013.trc
--使用awk工具进行格式化
[oracle@db01 ~]$ awk -f ass109.awk /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_32013.trc
Starting Systemstate 1
.................................
Ass.Awk Version 1.0.9 - Processing /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_32013.trc
System State 1
~~~~~~~~~~~~~~~~
1:
2: 0: waiting for 'pmon timer'
3: 0: waiting for 'rdbms ipc message'
4: 0: waiting for 'VKTM Logical Idle Wait'
5: 0: waiting for 'rdbms ipc message'
6: 0: waiting for 'DIAG idle wait'
7: 0: waiting for 'rdbms ipc message'
8: 0: waiting for 'DIAG idle wait'
9: 0: waiting for 'rdbms ipc message'
10: 0: waiting for 'rdbms ipc message'
11: 0: waiting for 'rdbms ipc message'
12: 0: waiting for 'rdbms ipc message'
13: 0: waiting for 'smon timer'
14: 0: waiting for 'rdbms ipc message'
15: 0: waiting for 'rdbms ipc message'
16: 0: waiting for 'rdbms ipc message'
17:
18:
19: 0: waiting for 'rdbms ipc message'
20: 0: waiting for 'SQL*Net message from client'
21: 0: waiting for 'SQL*Net message from client'
22: 0: waiting for 'Space Manager: slave idle wait'
23: 0: waiting for 'rdbms ipc message'
24: 0: waiting for 'rdbms ipc message'
25: 0: waiting for 'rdbms ipc message'
26: 0: waiting for 'rdbms ipc message'
27: 0: waiting for 'Streams AQ: qmn coordinator idle wait'
28: 0: waiting for 'rdbms ipc message'
29: 0: waiting for 'enq: TX - row lock contention'[Enqueue TX-00050005-0000081E]
30: 0: waiting for 'SQL*Net message from client'
31:
38: 0: waiting for 'Streams AQ: qmn slave idle wait'
39: 0: waiting for 'Streams AQ: waiting for time management or cleanup tasks'
Blockers
~~~~~~~~
Above is a list of all the processes. If they are waiting for a resource
then it will be given in square brackets. Below is a summary of the
waited upon resources, together with the holder of that resource.
Notes:
~~~~~
o A process id of '???' implies that the holder was not found in the
systemstate.
Resource Holder State
Enqueue TX-00050005-0000081E 20: 0: waiting for 'SQL*Net message from client'
Object Names
~~~~~~~~~~~~
Enqueue TX-00050005-0000081E
20142 Lines Processed.
可以看到格式化出来的结果,pid 29请求Enqueue TX-00050005-0000081E被pid 20阻塞
SQL> select s.sid,s.serial# from v$session s,v$process p where s.PADDR=p.ADDR and p.pid=20;
SID SERIAL#
---------- ----------
12 831
这里是重新登入会话进行的模拟,所以和上面章节的hanganalyze结果不一致 ,kill掉源头或者commit之后,session 2恢复正常「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




