暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

Performance Tunning: cursor: mutex X & cursor: mutex S high wait in 12.2.0.1

原创 Anbob 2017-08-12
4069
朋友那一套12C R2多组户架构3节点RAC数据库反应数据库响应缓慢,找到我,因远程不方便,只拿到事后一份AWR,数据库出现较高的cursor: mutex X 和cursor: mutex S, 12.2目前的资料较少,这个问题个人感觉BUG的可能性大,但MOS中未发现匹配的NOTE.这里仅记录一下。
DB Name	DB Id	Unique Name	Role	Edition	Release	RAC	CDB
ANBOB 3056822707 ANBOB PRIMARY EE 12.2.0.1.0 YES YES

Snap Id Snap Time Sessions Cursors/Session Instances Pluggable Databases Open
Begin Snap: 3264 03-Aug-17 11:00:16 230 1.9 3 4
End Snap: 3265 03-Aug-17 12:00:04 1034 2.1 3 4
Elapsed: 59.79 (mins)
DB Time: 57,548.34 (mins)

Per Second Per Transaction Per Exec Per Call
DB Time(s): 962.4 248.5 9.51 9.73
DB CPU(s): 5.1 1.3 0.05 0.05
Background CPU(s): 0.6 0.1 0.01 0.00
Redo size (bytes): 190,081.8 49,081.9
Logical read (blocks): 369,483.3 95,406.1
Block changes: 1,458.0 376.5
Physical read (blocks): 3,537.2 913.4
Physical write (blocks): 1,765.4 455.8
Read IO requests: 126.6 32.7
Write IO requests: 89.7 23.2
Read IO (MB): 27.6 7.1
Write IO (MB): 13.8 3.6
IM scan rows: 0.0 0.0
Session Logical Read IM: 0.0 0.0
Global Cache blocks received: 3,082.4 795.9
Global Cache blocks served: 2,789.8 720.4
User calls: 98.9 25.5
Parses (SQL): 53.2 13.7
Hard parses (SQL): 10.3 2.7
SQL Work Area (MB): 11.0 2.9
Logons: 1.7 0.4
Executes (SQL): 101.2 26.1
Rollbacks: 0.1 0.0
Transactions: 3.9





























































































EventWaitsTotal Wait Time (sec)Avg Wait% DB timeWait Class
cursor: mutex X20,558,7133.3M159.05ms94.7Concurrency
cursor: mutex S14,188,12385.5K6.03ms2.5Concurrency
enq: TX - row lock contention21829.9K137.16 s0.9Application
DB CPU18.4K0.5
gc buffer busy acquire4,853,4643703.3763.02us0.1Cluster
gc cr block busy2,036,1722445.81.20ms0.1Cluster
gc cr block 2-way4,170,5401718.3412.01us0Cluster
inactive session1,3791379.81000.60ms0Other
gc cr block 3-way2,071,6641102.1531.98us0Cluster
gc cr multi block mixed551,888554.51.00ms0Cluster

time model































































Statistic NameTime (s)% of DB Time% of Total CPU Time
parse time elapsed3,401,559.9298.51
failed parse elapsed time2,578,581.0974.68
sql execute elapsed time49,749.931.44
DB CPU18,373.740.5390.26
connection management call elapsed time393.190.01
hard parse elapsed time233.070.01
hard parse (sharing criteria) elapsed time75.110
PL/SQL compilation elapsed time11.990
sequence load elapsed time4.250

SQL ordered by Version Count



  • Only Statements with Version Count greater than 20 are displayed





































































































Version CountExecutionsSQL IdSQL ModulePDB NameSQL Text
7,87320f0h5rpzmhju11QZDB3PDBselect SYS_CONTEXT('USERENV', ...
3,225awxuv71j1wzbhXXHCPDBselect distinct s.sowner, s.vn...
2,666f0h5rpzmhju11XXHCPDBselect SYS_CONTEXT('USERENV', ...
2,187gtuz08rcbk69yXXHCPDBselect distinct s.sowner, s.vn...
1,4667n9pz49qw4f39XXHCPDBselect decode(upper(failover_m...
1,0389zg9qd9bm4spuXXHCPDBupdate user$ set spare6=DECODE...
6866nas5twtqzkk0CDB$ROOTselect /*+ no_monitor no_state...
20725121ffmrc95v7gCDB$ROOTselect i.obj#, i.ts#, i.file#,...
936a1xgxtssv5rrpsqlplus@node1 (TNS V1-V3)QZDB3PDBselect sum(used_blocks), ts.ts...
936a1xgxtssv5rrpsqlplus@node1 (TNS V1-V3)XXHCPDBselect sum(used_blocks), ts.ts...
936a1xgxtssv5rrpsqlplus@node1 (TNS V1-V3)AJZPDBselect sum(used_blocks), ts.ts...

Mutex Sleep Summary
ordered by number of sleeps desc

























































Mutex TypeLocationSleepsWait Time (ms)
Cursor ParentkkscsAddChildNode [KKSPRTLOC28]334,812,2742,336,374,941
Cursor ParentkkscsPruneChild [KKSPRTLOC27]81,186,15030,246,997
Library Cachekgllkc1 57722,195110,128
Library Cachekgllkdl1 85211,65031,718
Cursor Pinkkslce [KKSCHLPIN2]75,33363,530
Cursor Parentkksheqd [KKSPRTLOC12]1,96612,282
Cursor Parentkksfbc [KKSPRTLOC3]1,2145,765
Library CachekglReleaseHandleReference 1241,114148

f0h5rpzmhju11
select SYS_CONTEXT('USERENV', 'SERVER_HOST'), SYS_CONTEXT('USERENV', 'DB_UNIQUE_NAME'), SYS_CONTEXT('USERENV', 'INSTANCE_NAME'), SYS_CONTEXT('USERENV', 'INSTANCE'), STARTUP_TIME, SYS_CONTEXT('USERENV', 'DB_DOMAIN'), SYS_CONTEXT('USERENV', 'SERVICE_NAME') from v$instance
awxuv71j1wzbh
select distinct s.sowner, s.vname from sys.snap$ s, sys.snap_reftime$ r where r.masobj# in (73811) and s.sowner = r.sowner and s.vname = r.vname and bitand(s.flag3, 67108864) = 67108864
gtuz08rcbk69y
select distinct s.sowner, s.vname from sys.snap$ s, sys.snap_reftime$ r where r.masobj# in (73811) and s.sowner = r.sowner and s.vname = r.vname and s.mowner = r.mowner and s.master = r.master and bitand(s.flag3, 67108864) = 67108864

note:
从AWR可以看出数据库的解析出现了严重的问题,cursor: mutex X占了94%的DB TIME同时cursor: mutex S伴随出现。同时mutex 函数kkscsAddChildNode [KKSPRTLOC28]和kkscsPruneChild [KKSPRTLOC27] 增加和裁剪子游标时产生了问题,数据库存在high version SQL, version高于200的都是数据库的递归SQL,最大的达到了7000多。同时ASH 中显示也是该上面的3个SQL.
关于MUTEX
mutex很像latch,mutex 是一种低级的粒度串行机制控制访问SGA中的共享数据结构, 但是相比mutex更小更轻量并且更快, get时少几倍的指令调用, 在10g r2 版本引入,属于KGX (Kernel Generic Mutex Module). 每个结构对象都有自己的mutex保护,父游标和子游标都有各自的mutex; 每个结构又被多个的mutex保护,在11G R2中有超过60种原因会导致不能重用已存在的child cursor,hard parse产生新的child cursor;soft parse选择已存在的child cursor,查找已存在的child cursor时,持有的是cursor: mutex S,  而增加新的Child Cursor持有的是cursor: mutex X.
分析此类问题需要收集的信息:
• AWR Report
• ASH Report
• Hanganalyze trace
• Systemstate dump
What are Wait Events (on Mutexes)?
•Somebody requests a Mutex
•Can not get it by spinning
•And thus, sleeps
•Sleeping is recorded as wait time
•Spinning is not recorded as wait-, but as CPU time
Cursor: Mutex X情景:
Wants: Exclusive mode on Mutex of Parent / Child
• Building a new cursor under a parent
• Although this operation is cheaper, building many cursors under a parent cursor is not recommended.
• Capture SQL bind data(PEEK)
• Build or Update statistics blocks,Modify cursor-related statistics
• Mutex is in the parent cursor.
Cursor: mutex S情景:
Wants: Shared mode on Mutex of Parent / Child
• Change the reference count (“in flux”)= “new guy is interested / spinning”
• Parent examination
• When finding a cursor to execute, the parent must be examined. The examination of the parent is performed using the mutex, cursor: mutex S.
• When the parent cursor has many child cursors involved, this waits will come as the server process has to traverse the entire list of child cursors under the parent to find a match.
• Mutex is in the parent cursor.
Complex - cursor: mutex S/X情景:
•Root-caused by invalidated child cursor(s) => Too many cursor objects in Library Cache
Diagnosis:
•Oracle Wait Interface
•10046 Level 12 session trace (=sql_trace event)  根据bind oacdty检查绑定变量的类型是否存在bind mismatch
•v$sql_shared_cursor plus cursor trace [296377.1]
SQL不能共享的原因:
 select * from
(select sql_id, nonshared_reason, count(*) from v$sql_shared_cursor
unpivot
(nonshared_value for nonshared_reason in (
UNBOUND_CURSOR as 'UNBOUND_CURSOR',
SQL_TYPE_MISMATCH as 'SQL_TYPE_MISMATCH',
OPTIMIZER_MISMATCH as 'OPTIMIZER_MISMATCH',
OUTLINE_MISMATCH as 'OUTLINE_MISMATCH',
STATS_ROW_MISMATCH as 'STATS_ROW_MISMATCH',
LITERAL_MISMATCH as 'LITERAL_MISMATCH',
FORCE_HARD_PARSE as 'FORCE_HARD_PARSE',
EXPLAIN_PLAN_CURSOR as 'EXPLAIN_PLAN_CURSOR',
BUFFERED_DML_MISMATCH as 'BUFFERED_DML_MISMATCH',
PDML_ENV_MISMATCH as 'PDML_ENV_MISMATCH',
INST_DRTLD_MISMATCH as 'INST_DRTLD_MISMATCH',
SLAVE_QC_MISMATCH as 'SLAVE_QC_MISMATCH',
TYPECHECK_MISMATCH as 'TYPECHECK_MISMATCH',
AUTH_CHECK_MISMATCH as 'AUTH_CHECK_MISMATCH',
BIND_MISMATCH as 'BIND_MISMATCH',
DESCRIBE_MISMATCH as 'DESCRIBE_MISMATCH',
LANGUAGE_MISMATCH as 'LANGUAGE_MISMATCH',
TRANSLATION_MISMATCH as 'TRANSLATION_MISMATCH',
BIND_EQUIV_FAILURE as 'BIND_EQUIV_FAILURE',
INSUFF_PRIVS as 'INSUFF_PRIVS',
INSUFF_PRIVS_REM as 'INSUFF_PRIVS_REM',
REMOTE_TRANS_MISMATCH as 'REMOTE_TRANS_MISMATCH',
LOGMINER_SESSION_MISMATCH as 'LOGMINER_SESSION_MISMATCH',
INCOMP_LTRL_MISMATCH as 'INCOMP_LTRL_MISMATCH',
OVERLAP_TIME_MISMATCH as 'OVERLAP_TIME_MISMATCH',
EDITION_MISMATCH as 'EDITION_MISMATCH',
MV_QUERY_GEN_MISMATCH as 'MV_QUERY_GEN_MISMATCH',
USER_BIND_PEEK_MISMATCH as 'USER_BIND_PEEK_MISMATCH',
TYPCHK_DEP_MISMATCH as 'TYPCHK_DEP_MISMATCH',
NO_TRIGGER_MISMATCH as 'NO_TRIGGER_MISMATCH',
FLASHBACK_CURSOR as 'FLASHBACK_CURSOR',
ANYDATA_TRANSFORMATION as 'ANYDATA_TRANSFORMATION',
PDDL_ENV_MISMATCH as 'PDDL_ENV_MISMATCH',
TOP_LEVEL_RPI_CURSOR as 'TOP_LEVEL_RPI_CURSOR',
DIFFERENT_LONG_LENGTH as 'DIFFERENT_LONG_LENGTH',
LOGICAL_STANDBY_APPLY as 'LOGICAL_STANDBY_APPLY',
DIFF_CALL_DURN as 'DIFF_CALL_DURN',
BIND_UACS_DIFF as 'BIND_UACS_DIFF',
PLSQL_CMP_SWITCHS_DIFF as 'PLSQL_CMP_SWITCHS_DIFF',
CURSOR_PARTS_MISMATCH as 'CURSOR_PARTS_MISMATCH',
STB_OBJECT_MISMATCH as 'STB_OBJECT_MISMATCH',
CROSSEDITION_TRIGGER_MISMATCH as 'CROSSEDITION_TRIGGER_MISMATCH',
PQ_SLAVE_MISMATCH as 'PQ_SLAVE_MISMATCH',
TOP_LEVEL_DDL_MISMATCH as 'TOP_LEVEL_DDL_MISMATCH',
MULTI_PX_MISMATCH as 'MULTI_PX_MISMATCH',
BIND_PEEKED_PQ_MISMATCH as 'BIND_PEEKED_PQ_MISMATCH',
MV_REWRITE_MISMATCH as 'MV_REWRITE_MISMATCH',
ROLL_INVALID_MISMATCH as 'ROLL_INVALID_MISMATCH',
OPTIMIZER_MODE_MISMATCH as 'OPTIMIZER_MODE_MISMATCH',
PX_MISMATCH as 'PX_MISMATCH',
MV_STALEOBJ_MISMATCH as 'MV_STALEOBJ_MISMATCH',
FLASHBACK_TABLE_MISMATCH as 'FLASHBACK_TABLE_MISMATCH',
LITREP_COMP_MISMATCH as 'LITREP_COMP_MISMATCH',
PLSQL_DEBUG as 'PLSQL_DEBUG',
LOAD_OPTIMIZER_STATS as 'LOAD_OPTIMIZER_STATS',
ACL_MISMATCH as 'ACL_MISMATCH',
FLASHBACK_ARCHIVE_MISMATCH as 'FLASHBACK_ARCHIVE_MISMATCH',
LOCK_USER_SCHEMA_FAILED as 'LOCK_USER_SCHEMA_FAILED',
REMOTE_MAPPING_MISMATCH as 'REMOTE_MAPPING_MISMATCH',
LOAD_RUNTIME_HEAP_FAILED as 'LOAD_RUNTIME_HEAP_FAILED',
HASH_MATCH_FAILED as 'HASH_MATCH_FAILED',
PURGED_CURSOR as 'PURGED_CURSOR',
BIND_LENGTH_UPGRADEABLE as 'BIND_LENGTH_UPGRADEABLE',
USE_FEEDBACK_STATS as 'USE_FEEDBACK_STATS'
))
where nonshared_value = 'Y'
group by sql_id, nonshared_reason
)
where sql_id = '&sqlid' ;

模拟一种 cursor: mutex S/X wait 情况:
•Generating 64 child cursors for one SQL_ID
•Accessing them 20x parallel
➔ Delay to create new children (X)
➔ Delay to select good child (S)
Similar Problem with CHAR binds
•Bind buffers are size 32, 128, 2000 or 4000 bytes
•Changing CHAR bind length invalidates
•Reason BIND_LENGTH_UPGRADEABLE= 4^n cursor versions
情况较复杂应该根据自己的实际情况分析,可能的一些解决方案如下:

• One Quick Fix for cursor: mutex S/X System is loaded with heavy mutex waits due to high number of cursors (=version count)
=> frequently flush this cursor with dbms_shared_pool.purge (look out for new parsing issues = CPU)
•  One solution for cursor: mutex S/X Application uses jdbc setter methods now properly on INTEGER column (=2)
#•setNUMBER(2) => Bind Var. is NUMBER
#setNULL(2, java.sql.Types.INTEGER)
=> Bind Var. is NUMBER
•Check how the application does handle bind variables Avoid BIND MISMATCH at (nearly) any cost
•Reduce the number of cursor versions below 100 More will lead to overhead
•Look for matching Oracle bugs in your RDBMS release
•Upgrade to 11.2.0.3 or higher 11.2.0.2 is worst version for cursor issues IMHO
• Increase the SGA if the shared pool is too small and heavy hard parse occurs due to reloads.
• For 11g, make sure cursor_sharing is not similar, as it has been deprecated. This may also cause cursor: mutex S waits:Recommended to use FORCE.
• Increase SESSION_CACHED_CURSORS to higher value if there are too many parses and waits on cursor: pin S.
• Use CURSOR_SHARING to FORCE to avoid waits on cursor: mutex S & cursor: mutex X due to huge version counts.
• If possible run with _LIBRARY_CACHE_ADVICE = FALSE and see if that helps
• On HP platforms ensure HPUX_SCHED_NOAGE parameter is set to 178.
• Obsoleting parent cursors if VERSION_COUNT exceeds a threshold (cursor: mutex S).
• When the child cursors grow beyond certain count be it 20 or 100, it obsoletes the parent cursors. In order to activate this enhancement bug set following:
For 11.2.0.2, apply the 11.2.0.2.2 PSU 12431716 atleast and do the following.
• _CURSOR_FEATURES_ENABLED to 1026
• event 106001 with value 100 (as the parameter _CURSOR_OBSOLETE_THRESHOLD is not present)
• Set , event= 106001 trace name context forever, level 100 in init.ora
For 11.2.0.3 and above, do the following.
• _CURSOR_OBSOLETE_THRESHOLD to 100 (default. this is the number of child cursor after which we obsolete it)
在有些情况下是存在真的hot对象,可以上面的方法都无法解决,需要手动打散:
To identify if the issue is a hot mutex use the following SQL :
SQL> select KGLNAOBJ, KGLNAOWN, KGLHDNSP,
KGLOBTYP, KGLNAHSV, KGLOBT23, KGLOBT24
from X$KGLOB
where KGLOBT23 > 100000 or KGLOBT24 > 100000
order by KGLOBT24;

Note:
相隔几分钟运行上面的SQL,如果KGLOBT23 或KGLOBT24 短时间内暴涨就说明该对象是HOT OBJ.或者从ASH或v$session数据根据当时的p1值得到。
Mutex wait Parameter Description
P1 Hash value of cursor
P2 Mutex value (top 2 bytes contain SID holding mutex in exclusive mode, and bottom two bytes usually hold the value 0)
P3 Mutex where (an internal code locator) OR'd with Mutex Sleeps
-- find hot obj
select KGLNAOBJ, KGLNAOWN, KGLHDNSP, KGLOBTYP,
KGLNAHSV
from X$KGLOB
where KGLNAHSH= {value of P1};
-- find blocker
SELECT count (*),
to_number(substr(to_char(rawtohex(p2raw)), 1, 8),'XXXXXXXX') blocking_sid FROM v$session
WHERE event = 'library cache: mutex X‘
group by to_number(substr(to_char(rawtohex(p2raw)),1, 8), 'XXXXXXXX');
-- hot copy
You can apply the hot copy fix from 11.2.0.2 onwards w/o anyinterim patches.
• Set the following parameters in init.ora:-
_KGL_HOT_OBJECT_COPIES={a}

_KGL_DEBUG="name='{b}' schema='{c}' namespace={d} debug=33554432"
or (where the object is a cursor)
_KGL_DEBUG="hash='{e}' debug=33554432"
NOTE: Only with Oracle Support Guidance
Where:
{a} - half the CPU count
{b} - KGLNAOBJ from SQL
{c} - KGLNAOWN from SQL
{d} - KGLHDNSP from SQL
{e} - KGLNAHSV from SQL.
The same can be performed using dbms_shared_pool APIs in 11.2.0.2+.
SQL> EXEC DBMS_SHARED_POOL.MARKHOT('username','object_name',KGLHDNSP);
<or, for a cursor>
SQL> EXEC DBMS_SHARED_POOL.MARKHOT('KGLNAHSV', 0);

NOTE: dbms_shared_pool.markhot只是实例周期的改变,重启后失效,需要每次重启后重复手动配置.
12.2 这个案例的临时解决方案
上面介绍了那么多,再回到这个12.2的案例, 这么高的mutex等待与跟high version脱离不了关系,简单咨询客户后确认了没有对12.2默认参数做调整,sql verison能达到7000多,我查询了12.2的隐藏参数_CURSOR_OBSOLETE_THRESHOLD原来现在默认已加大到了8192, 该参数在11G中也只是100,12C R1版本是也在1024. 并且high version的SQL都是递归内部SQL,无法调应用解决,于是让客户调整_CURSOR_OBSOLETE_THRESHOLD参数调到200,并重启实例生效。问题消失了,并且上面的那个high version的SQL从TOP中消失。
当然如果再深入查,至少要在当时做SSD或ash 中找到进程信息,或尝试禁用_optim_peek_user_binds, sql ACS\\ECS这些特性,或者是个未知的BUG, 我相信未来肯定会有客户同样遇到,希望提供一个分析的方向。
 
--   update      2018-5-2  ---
如果以上修改无效,不访安装bug 25054064 的12.2 DBRU.







25054064TST&PERF:EBS: VERY HIGH VERSION COUNT AND CURSOR MUTEX WAITS AFTER PLUG INTO CDB
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论