这是一个典型的性能低下的数据库,Top 5等待事件都值得关注。注意到这个数据库中Latch Free是最严重的竞争。
Top 5 Wait Events ~~~~~~~~~~~~~~~~~ Wait % Total Event Waits Time (cs) Wt Time -------------------------------------------- ------------ ------------ ------- latch free 149,015 14,299,942 53.72 db file scattered read 1,781,670 2,793,591 10.49 buffer busy waits 45,001 2,386,174 8.96 log file switch (checkpoint incomplete) 19,527 1,991,844 7.48 enqueue 6,523 1,809,849 6.80 -------------------------------------------------------------
由于Latch Free是一个汇总等待事件,我们需要从v$latch视图获得具体的Latch竞争主要是由哪些Latch引起的。在Statspack Report中同样存在这样一部分数据:
Latch Sleep breakdown for DB: HHCIMS Instance: hhcims Snaps: 154 -174 -> ordered by misses desc Get Spin & Latch Name Requests Misses Sleeps Sleeps 1->4 -------------------------- -------------- ------- ------ ----------------------- cache buffers chains 5,186,232,773 617,065 80,524 610224/6250/158/433/0 library cache 108,899,100 144,642 61,327 99020/31173/13558/891/0 row cache objects 83,115,714 46,477 1,610 44892/1565/17/3/0 shared pool 7,091,076 7,403 3,669 5426/420/1464/93/0 cache buffers lru chain 18,885,002 6,405 400 6024/369/9/3/0 enqueues 41,504,627 1,559 110 1486/59/6/8/0 redo writing 370,920 1,524 178 1347/176/1/0/0 redo allocation 4,170,274 1,184 95 1098/78/7/1/0
注意到cache buffers chains正是主要的Latch竞争。实际上,这个数据库在繁忙的时段,基本上处于停顿状态,大量进程等待latch free竞争,这些获得Session的等待事件可以很容易地从v$session_wait视图中查询得到:
SID SEQ# EVENT ---------- ---------- ----------------------------------------------- 4 14378 latch free 43 1854 latch free 176 977 latch free 187 4393 latch free 111 8715 latch free 209 48534 latch free 379 1008 latch free 455 1974 latch free 478 24713 latch free 388 444 latch free 369 855 latch free 264 567 enqueue 438 563 enqueue 355 563 enqueue 531 567 enqueue 513 819 enqueue 612 55 refresh controlfile command
如果需要具体确定热点对象,可以从v$latch_children中查询具体的子Latch信息,以下是一个生产环境中的部分信息摘录:
SQL> SELECT * 2 FROM (SELECT addr, child#, gets, misses, sleeps, immediate_gets igets, 3 immediate_misses imiss, spin_gets sgets 4 FROM v$latch_children 5 WHERE NAME = 'cache buffers chains' 6 ORDER BY sleeps DESC) 7 WHERE ROWNUM < 11; ADDR CHILD# GETS MISSES SLEEPS IGETS IMISS SGETS ---------------- ---------- ---------- ---------- ---------- ---------- ---------- ----- 0000000406F24860 1093 1759617932 518883201 2587204 9765719 32224 0 0000000406F180E0 1081 509119236 13183658 623513 9720398 16398 0 0000000406F19180 1082 3264036800 7000899 336205 9669611 10077 0 0000000406E7F500 934 3252332119 40742230 218898 10945581 43083 0 00000004072A4A30 1757 1356574143 8959581 134591 7165568 17482 0 0000000406B6A4D0 175 1252888934 873697 103414 14699959 5887 0 0000000406F1A220 1083 2190392429 2054269 98862 9692515 5444 0 0000000406E462A0 879 840225741 4716843 79799 10789540 5743 0 0000000406BAAF70 237 2697876674 7349818 78180 8168375 3124 0 0000000406B9A470 221 3051410653 6924588 69161 9942446 4118 0 10 rows selected.
X$BH中还存在另外一个关键字段HLADDR,即Hash Chain Latch Address,这个字段可以和v$latch_child.addr进行关联,这样就可以把具体的Latch竞争和数据块关联起来,再结合dba_extents视图,就可以找到具体的热点竞争对象。
到具体热点竞争对象之后,可以进一步地结合v$sqlarea或者v$sqltext视图,找到频繁操作这些对象的SQL,对其进行优化,就可以缓解或解决热点块竞争的问题。通过以下查询可以实现以上的思想,获取当前持有最热点数据块的Latch及Buffer信息:
SQL> SELECT b.addr, a.ts#, a.dbarfil, a.dbablk, a.tch, b.gets, b.misses, b.sleeps 2 FROM (SELECT * 3 FROM (SELECT addr, ts#, file#, dbarfil, dbablk, tch, hladdr 4 FROM x$bh 5 ORDER BY tch DESC) 6 WHERE ROWNUM < 11) a, 7 (SELECT addr, gets, misses, sleeps 8 FROM v$latch_children 9 WHERE NAME = 'cache buffers chains') b 10 WHERE a.hladdr = b.addr 11 / ADDR TS# DBARFIL DBABLK TCH GETS MISSES SLEEPS ---------------- --- ---------- ---------- ---- ---------- ---------- ---------- 0000000406AF3670 42 51 209612 216 2068740950 1396285 18734 0000000406B14C70 42 51 209644 216 1840436663 1671667 20622 0000000406B36270 42 51 209676 216 3447942770 2985525 29424 0000000406B57870 42 51 209708 216 1902183007 1913809 18806 0000000406B78E70 42 51 209740 216 2035257361 2314202 21304 0000000406B9A470 42 51 209772 216 3061630199 6932017 69275 0000000406E962C0 41 50 558579 217 723102299 32807 1112 00000004071BDF70 41 50 190251 217 3409957883 657572 9196 00000004071F9310 41 50 587887 217 1492232461 88013 4418 000000040721A710 32 33 4703 224 756763990 69890 1470 10 rows selected.
利用前面提到的SQL,可以找到这些热点Buffer的对象信息:
SQL> SELECT distinct e.owner, e.segment_name, e.segment_type 2 FROM dba_extents e, 3 (SELECT * 4 FROM (SELECT addr, ts#, file#, dbarfil, dbablk, tch 5 FROM x$bh 6 ORDER BY tch DESC) 7 WHERE ROWNUM < 11) b 8 WHERE e.relative_fno = b.dbarfil 9 AND e.block_id <= b.dbablk 10 AND e.block_id + e.blocks > b.dbablk; OWNER SEGMENT_NAME SEGMENT_TYPE --------------- ------------------------------ -------------------- BOSSV2 HYUIDX_MOBILESEG INDEX BOSSV2 HY_PLATFORM TABLE BOSSV2 MAIDX_BATTASK_USERLIST_DATSEND INDEX PARTITION BOSSV2 MAIDX_BATTASK_USERLIST_STATUS INDEX PARTITION HSQUERY HSOLAP_RPTJOB TABLE
结合v$sqltext或v$sqlarea,可以找到操作这些对象的相关SQL,让我们继续查询:
SQL> break on hash_value skip 1 SQL> SELECT /*+ rule */ hash_value,sql_text 2 FROM v$sqltext 3 WHERE (hash_value, address) IN ( 4 SELECT a.hash_value, a.address 5 FROM v$sqltext a, 6 (SELECT DISTINCT a.owner, a.segment_name, a.segment_type 7 FROM dba_extents a, 8 (SELECT dbarfil, dbablk 9 FROM (SELECT dbarfil, dbablk 10 FROM x$bh 11 ORDER BY tch DESC) 12 WHERE ROWNUM < 11) b 13 WHERE a.relative_fno = b.dbarfil 14 AND a.block_id <= b.dbablk 15 AND a.block_id + a.blocks > b.dbablk) b 16 WHERE upper(a.sql_text) LIKE '%' || b.segment_name || '%' 17 AND b.segment_type = 'TABLE') 18 ORDER BY hash_value, address, piece 19 / HASH_VALUE SQL_TEXT ---------- ---------------------------------------------------------------- 175397557 select sum(snum) from HS_UNISMS_ORDERLOG_XUDING f where f.snum>: 1 and f.c_id=:2 180602532 select * from HS_UNISMS_PAYLOG_99DVD t where status=:1 and t.re pterrorcode=9999 order by t.paylog_id 388500828 select * from HS_UNISMS_PAYLOG_99DVD t where status=:1 and t.re pterrorcode=9999 and t.paytime >=to_date('2006-05-14 00:00:00', 'yyyy-mm-dd hh24:mi:ss') and t.paytime <=to_date('2006-05-16 00 :00:00','yyyy-mm-dd hh24:mi:ss') order by t.paylog_id 724740081 select HS_SCSMS_ORDERLOG_SEQ.NEXTVAL from DUAL 773054905 select count(*) from HS_UNISMS_PAYLOG_99DVD t where t.paytime>=t o_date(:1,'yyyy-mm-dd hh24:mi:ss') and t.paytime<=to_date(:2,'yy yy-mm-dd hh24:mi:ss') and t.status=0 1071368535 select count(snum) from HS_UNISMS_ORDERLOG_XUDING f where f.snum >:1 and f.c_id=:2 1109655340 insert into UM_PUT_M_ORDER values (UM_M_ORDER_SEQ.NEXTVAL,:1,:2, :3,:4,sysdate,:5,:6,:7,:8,:9,:10,null,:11,:12,:13) 1451196467 select HS_ZJSMS_ORDERLOG_SEQ.NEXTVAL from DUAL 1542951187 SELECT * FROM UM_PUT_M_ORDER 1669311552 insert into hm_user_info(USER_ID,PASSWORD,user_add2,user_seq_id) values(:1,:2,:3,HM_USER_INFO_SEQ.NEXTVAL) 1669935485 insert into HS_UNISMS_PAYLOG_99DVD values (HS_UNISMS_PAYLOG_99DV D_seq.NEXTVAL,:1,sysdate,:2,:3,:4,null,9999,9999,9999,9999,00000 0,null,null) 1697705163 insert into hm_user_info(USER_ID,M_PHONE,password,user_seq_id) v alues(:1,:2,:3,HM_USER_INFO_SEQ.NEXTVAL) 1866895271 select * from UM_PUT_M_ORDER 1979706000 select mobile,c_id,status,snum from HS_UNISMS_ORDERLOG_XUDING t where t.status=:1 2152157475 select HS_UNISMS_ORDERLOG_SEQ.NEXTVAL from DUAL 2248706418 select * from UM_PUT_M_ORDER where MOBILEID=:1 and ORDERID=:2 or der by datetime desc 2336420675 select HS_JSSMS_ORDERLOG_SEQ.NEXTVAL from DUAL 2963798180 select * from HS_UNISMS_PAYLOG_99DVD t where status=:1 and t.re pterrorcode=9999 and t.paytime <=to_date('2006-05-15 12:00:00', 'yyyy-mm-dd hh24:mi:ss') order by t.paylog_id 。。。。。。。。。 83 rows selected.
找到这些SQL之后,剩下的问题就简单了,可以通过优化SQL减少数据的访问,避免或优化某些容易引起争用的操作(如connect by等操作)来减少热点块竞争。