问题描述
library cache: mutex X等待事件是当一个会话以 exclusive mode持有library cache mutex时,另一个会话当前无论何时申请此mutex时必须等待其释放。很多时候对 library cache做不同的操作时都需要申请一个mutex, 所以最重要的是确认mutex的位置”location”, 该位置有助于分析该类等待事件的原因。
对应参数:
P1 = “idn” = Unique Mutex Identifier
P2 = Mutex “value” = in order to find the mutex value, read note 1298015.1
P3 = “where” = location in code (internal identifier) where mutex is being waited for
P1 is idn – can be used for finding the cursor related to mutex
For library cache: mutex X
1. P1 represents library cache hash bucket number (if idn <= 131072)
2. P1 represents hash value of the library cache object under protection (if idn > 131072)
mutex的问题排查较为复杂,通常一个现象有多个原因,并且mutex问题又会引起其它级连问题。常见原因:
1. 频繁的hard parse
2. High sql version count,需要检查一个很长的chain of versions
3. 引为某些原因导致的Invalidations and reloads
4. shared pool配置过小
5. cursor_sharing=similar和session_cached_cursors配置不当
6. BUGs
专家解答
类问题时通常需要引集的信息:
• AWR Report
• ASH Report
• Hanganalyze trace
• Systemstate dump (contain call stack level)
系统级可用的相关视图:GV$MUTEX_SLEEP 和GV$MUTEX_SLEEP_HISTORY
分析该事件的相关SQL:
-- which library cache objects are the target of the operation. select * from (select case when (kglhdadr = kglhdpar) then 'Parent' else 'Child '||kglobt09 end cursor, kglhdadr ADDRESS, substr(kglnaobj,1,20) NAME, kglnahsh HASH_VALUE, kglobtyd TYPE, kglobt23 LOCKED_TOTAL, kglobt24 PINNED_TOTAL, kglhdexc EXECUTIONS, kglhdnsp NAMESPACE from x$kglob -- where kglobtyd != 'CURSOR' order by kglobt24 desc) where rownum <= 10; -- to find blocker session SELECT count (*), to_number(substr(to_char(rawtohex(p2raw)), 1, 8),'XXXXXXXX') blocking_sid FROM v$session WHERE event = 'library cache: mutex X' group by to_number(substr(to_char(rawtohex(p2raw)),1, 8), 'XXXXXXXX'); -- to find mutex from ash select event, p1, count(1) from v$active_session_history where sample_time > (sysdate - 20/1440) and event = 'library cache: mutex X' group by event, p1 order by 3; -- if idn > 131072, the P1 is the object the mutex is against.For the topmost P1 run: select KGLNAOBJ, KGLNAOWN, KGLHDNSP, KGLOBTYP from x$kglob where KGLNAHSH= {value of P1}
最近遇到过一个案例数据库几个session 在执行相同的insert values sql时library cache: mutex X争用严重.
SQL> select child_number,count(*) from v$sql where sql_id='b92jmb1qngsyd' group by child_number; CHILD_NUMBER COUNT(*) ------------ ---------- 0 47 1 47 2 47 3 47 4 47 5 46 6 46 7 47 8 48 9 46 10 47 11 47 12 47 13 47 14 47 15 48 16 47 17 47 ... 83 47 84 47 85 47 86 47 87 47 88 47 89 47 90 47 91 47 92 47 93 46 94 47 95 47 96 47 97 47 98 47 99 47 100 rows selected. sql> @no_shared SQL_ID NONSHARED_REASON COUNT(*) ------------- ----------------------------- ---------- b92jmb1qngsyd BIND_MISMATCH 4730 b92jmb1qngsyd PURGED_CURSOR 2 b92jmb1qngsyd HASH_MATCH_FAILED 1 b92jmb1qngsyd BIND_LENGTH_UPGRADEABLE 952 select /*+rule*/m.position,m.bind_name , m.max_length,count(*) child_cursor_count from v$sql s, v$sql_bind_metadata m where s.sql_id = 'b92jmb1qngsyd' and s.child_address = m.address group by m.position,m.bind_name , m.max_length order by 1, 2; -- 确认存在部分变量varchar长度区间问题 SQL> select mutex_type,location_id,location,sleeps from x$mutex_sleep where location_id=85; MUTEX_TYPE LOCATION_ID LOCATION SLEEPS -------------------------------- ----------- ---------------------------------------- ---------- Library Cache 85 kgllkdl1 85 15001254
当时有做SSD 但级别不够(10), 确认了blocker 是no in wait, 并且是on cpu, 执行的是相同的insert sql, 该sql是应用同步数据使用执行频率较高。
当时也有做errorstack
----- Abridged Call Stack Trace ----- 146297 ksedsts()+544<-kjzdssdmp()+400<-kjzdpcrshnfy()+512<-kstdmp()+416<-dbkedDefDump()+6032<-ksedmp()+64<-ksdxfdmp()+1360< -ksdxcb()+3216<-sspuser()+688<-<kernel><-kglrdtin()+1121<-kglrdti()+80<-kksAllocCursorStat()+1152<-kksLoadChild()+14400<-kkslod()+112<-kglobld()+1872 146298 <-kglobpn()+1744<-kglpim()+832<-kglpin()+2496<-kxsGetRuntimeLock()+1840 146299 ----- End of Abridged Call Stack Trace -----
以call stack和event为关键字,在MOS中发现一篇Session Spin on Kglrdtin Holding ‘Library Cache: Mutex X’ (文档 ID 2219897.1)
和当前的版本现象较为相似因当前的平台hpux无相关one off patch,且无升级11.2.0.4计划记划,这里只记录一下该问题。
— another case
SQL> select count(*) from v$sql where sql_id='4uqp4jczbf1qt'; COUNT(*) ---------- 21372 SQL> select version_count from v$sqlarea where sql_id='4uqp4jczbf1qt'; VERSION_COUNT ------------- 74 SQL> @no_shared Enter value for sqlid: 4uqp4jczbf1qt SQL_ID NONSHARED_REASON COUNT(*) ------------- ----------------------------- ---------- 4uqp4jczbf1qt BIND_MISMATCH 21428 4uqp4jczbf1qt HASH_MATCH_FAILED 290 4uqp4jczbf1qt BIND_LENGTH_UPGRADEABLE 17489 SQL> @pd obso Show all parameters and session values from x$ksppi/x$ksppcv... INDX I_HEX NAME VALUE DESCRIPTION ---------- ----- -------------------------------- ---------- ------------------------------------------------ 1723 6BB _cursor_obsolete_threshold 100 Number of cursors per parent before obsoletion. SQL> select /*+rule*/ m.position,m.bind_name , m.max_length,count(*) child_cursor_count 2 from v$sql s, v$sql_bind_metadata m 3 where s.sql_id = '4uqp4jczbf1qt' 4 and m.datatype=1 5 and s.child_address = m.address group by m.position,m.bind_name , m.max_length 6 order by 1, 2; POSITION BIND_NAME MAX_LENGTH CHILD_CURSOR_COUNT ---------- ------------------------------ ---------- ------------------ 1 V0 32 22192 2 V1 32 22192 7 V6 32 22192 8 V7 32 22192 9 V8 32 22192 10 V9 32 22192 12 V11 32 22192 13 V12 32 18682 13 V12 128 3188 13 V12 2000 322 16 V15 32 22192 17 V16 32 21977 17 V16 128 215 18 V17 32 22192 19 V18 32 22192 20 V19 32 22192 21 V20 32 22192 22 V21 32 22192 30 V29 32 22192 31 V30 32 22192 32 V31 32 22192 33 V32 32 22192 34 V33 32 22192 35 V34 32 22192 36 V35 32 22192 37 V36 32 22192 38 V37 32 22192 39 V38 32 20037 39 V38 128 2155 42 V41 32 22192 43 V42 32 22192 47 V46 32 22192
Event 10503 level <buffer length>
EVENT 10503 enables users to specify a character bind buffer length. Depending on the length used, the character binds in the child cursor can all be created using the same bind length, skipping bind graduation and keeping the child chain relatively small. This helps to alleviate a potential cursor-sharing problem related to graduated binds.
SQL> alter system set events='10503 trace name context forever, level 2000'; System altered. SQL> oradebug setmypid Statement processed. SQL> oradebug eventdump system 10503 trace name context forever, level 2000 10949 trace name context forever,level 1 28401 trace name context forever,level 1 SQL> select address,hash_value from v$sqlarea where sql_id='4uqp4jczbf1qt'; ADDRESS HASH_VALUE ---------------- ---------- C000001F574C4F90 1052182233 SQL> exec dbms_shared_pool.purge('C000001F574C4F90,1052182233','C',65); PL/SQL procedure successfully completed. SQL> select address,hash_value,last_load_time from v$sqlarea where sql_id='4uqp4jczbf1qt'; ADDRESS HASH_VALUE LAST_LOAD_TIME ---------------- ---------- ----------------- C000001F574C4F90 1052182233 20190211 15:00:02 SQL> select count(*) from v$sql where sql_id='4uqp4jczbf1qt'; COUNT(*) ---------- 22700 SQL> alter system flush shared_pool; System altered. SQL> select count(*) from v$sql where sql_id='4uqp4jczbf1qt'; COUNT(*) ---------- 0 SQL> select /*+rule*/ m.position,m.bind_name , m.max_length,count(*) child_cursor_count 2 from v$sql s, v$sql_bind_metadata m 3 where s.sql_id = '4uqp4jczbf1qt' 4 and m.datatype=1 5 and s.child_address = m.address group by m.position,m.bind_name , m.max_length 6 order by 1, 2; POSITION BIND_NAME MAX_LENGTH CHILD_CURSOR_COUNT ---------- ------------------------------ ---------- ------------------ 1 V0 2000 86 2 V1 2000 86 7 V6 2000 86 8 V7 2000 86 9 V8 2000 86 10 V9 2000 86 12 V11 2000 86 13 V12 2000 86 16 V15 2000 86 17 V16 2000 86 18 V17 2000 86 19 V18 2000 86 20 V19 2000 86 21 V20 2000 86 22 V21 2000 86 30 V29 2000 86 31 V30 2000 86 32 V31 2000 86 33 V32 2000 86 34 V33 2000 86 35 V34 2000 86 36 V35 2000 86 37 V36 2000 86 38 V37 2000 86 39 V38 2000 86 42 V41 2000 86 43 V42 2000 86 47 V46 2000 86