暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

【Oracle】child cursor过多高并发环境下产生cursor:mutex x和cursor:mutex s等待

故障描述:

客户数据库Oracle 12.1.0.2 RAC某个节点16点30分钟左右SQL_ID gr5jm3av0n5sx出现了大量的cursor:mutex x、cursor:mutex s等待,CPU使用率接近100%

系统环境:Oracle 12.1C RAC HP-UNIX

1.数据库出现大量的cursor:mutex x和cursor :mutex s等待

交易库01.png

2.查看相关SQL_ID gr5jm3av0n5sx的相关信息

SQL_ID  gr5jm3av0n5sx, child number 107
-------------------------------------
UPDATE T_BPM_PROCESSINSTANCE T_BPM_PROCESSINSTANCE
   SET PROCESSSTATE       = :p1,
       PREVIOUSSTATE      = :p2,
       CLUSTERNODEID      = :p3,
       CURACTIVITYDEFID   = :p4,
       CURACTIVITYID      = :p5,
       PREVACTIVITYINSTID = :p6,
       CURACTIVITYSTATE   = :p7,
       ELEMENTLOOPCOUNTER = :p8,
       STARTTIME          = :p9,
       LASTUPDATETIME     = :p10,
       ACTIVITYCOUNT      = :p11,
       CURSEQUENCENUM     = :p12,
       TIMERSTATE         = :p13,
       PREDACTIVITIES     = :p14,
       SUSPENDACTIVITIES  = :p15,
       SKIPPEDACTIVITIES  = :p16,
       JOINEDELEMENTIDS   = :p17,
       JOINEDTRANSITIONS  = :p18,
       METADATA           = :p19,
       ISEXECUTING        = :p20,
       ASYNCELEMENTDEFID  = :p21,
       ASYNCRETRYCOUNT    = :p22,
       ASYNCELEMENTINSTID = :p23,
       CURELMNTSTARTTIME  = :p24,
       BUSINESSSTATUS     = :p25,
       PI_EXTN_05         = :p26,
       PI_EXTN_08         = :p27,
       PI_EXTN_12         = :p28,
       PI_EXTN_13         = :p29,
       PI_EXTN_21         = :p30,
       PI_EXTN_26         = :p31,
       PI_EXTN_27         = :p32
 WHERE PROCESSINSTANCEID = :p33
   AND L2_PARTITION_KEY = :p34

Plan hash value: 2404250912
--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT                    |                          |       |       |     4 (100)|          |       |       |
|   1 |  UPDATE                             | T_BPM_PROCESSINSTANCE    |       |       |            |          |       |       |
|*  2 |   TABLE ACCESS BY GLOBAL INDEX ROWID| T_BPM_PROCESSINSTANCE    |     1 |   369 |     4   (0)| 00:00:01 | ROWID | ROWID |
|*  3 |    INDEX UNIQUE SCAN                | PK_T_BPM_PROCESSINSTANCE |     1 |       |     3   (0)| 00:00:01 |       |       |
--------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("L2_PARTITION_KEY"=:P34)
   3 - access("PROCESSINSTANCEID"=:P33)

sys@TRDDB1>select version_count from v$sqlarea where sql_id='gr5jm3av0n5sx';

VERSION_COUNT
-------------
          108

该SQL的历史执行信息:

BEGIN_TIME          INSTANCE_NUMBER MODULE                                        PLAN_HASH_VALUE       EXEC    PER_GET   PER_ROWS    TIME_MS   PER_READ
------------------- --------------- --------------------------------------------- --------------- ---------- ---------- ---------- ---------- ----------
2019-06-04 16:45:24               1 ddsserver@wbesngx202 (TNS V1-V3)                   2404250912     359560         11          1        .49          0
2019-06-04 16:45:23               2 ddsserver@wbesesb203 (TNS V1-V3)                   2404250912     775550         12          1      46.17          0
2019-06-04 16:30:04               2 ddsserver@wbesesb203 (TNS V1-V3)                   2404250912     489342         26          1     4984.1          0
2019-06-04 16:30:04               1 ddsserver@wbesngx202 (TNS V1-V3)                   2404250912     322380         12          1        .46          0
2019-06-04 16:15:31               2 ddsserver@wbesesb203 (TNS V1-V3)                   2404250912     717255         19          1    1652.25          0
2019-06-04 16:15:31               1 ddsserver@wbesngx202 (TNS V1-V3)                   2404250912     298229         12          1        .46          0
2019-06-04 16:00:26               2 ddsserver@wbesesb203 (TNS V1-V3)                   2404250912     691853         12          1        .79          0
2019-06-04 16:00:26               1 ddsserver@wbesngx202 (TNS V1-V3)                   2404250912     330588         11          1        .45          0
2019-06-04 15:45:20               2 ddsserver@wbesesb203 (TNS V1-V3)                   2404250912     679735         12          1        .81          0
2019-06-04 15:45:20               1 ddsserver@wbesngx202 (TNS V1-V3)                   2404250912     381596         12          1        .43          0
2019-06-04 15:30:16               2 ddsserver@wbesesb203 (TNS V1-V3)                   2404250912     775566         13          1      20.56          0
2019-06-04 15:30:15               1 ddsserver@wbesngx202 (TNS V1-V3)                   2404250912     359154         11          1        .42          0
2019-06-04 15:15:10               2 ddsserver@wbesesb203 (TNS V1-V3)                   2404250912     751541         13          1      19.75          0
2019-06-04 15:15:10               1 ddsserver@wbesngx202 (TNS V1-V3)                   2404250912     483641         11          1        .44          0
2019-06-04 15:00:05               2 ddsserver@wbesesb203 (TNS V1-V3)                   2404250912     437762         13          1        .46          0
2019-06-04 15:00:05               1 ddsserver@wbesngx202 (TNS V1-V3)                   2404250912     367565         12          1        .43          0
2019-06-04 14:45:21               2 ddsserver@wbesesb203 (TNS V1-V3)                   2404250912     436110         13          1        .83          0
2019-06-04 14:45:21               1 ddsserver@wbesngx202 (TNS V1-V3)                   2404250912     345284         12          1        .43          0
2019-06-04 14:30:16               2 ddsserver@wbesesb203 (TNS V1-V3)                   2404250912     546521         13          1       2.03          0
2019-06-04 14:30:16               1 ddsserver@wbesngx202 (TNS V1-V3)                   2404250912     336119         12          1        .43          0
2019-06-04 14:15:11               2 ddsserver@wbesesb203 (TNS V1-V3)                   2404250912     455908         12          1         .5          0
2019-06-04 14:15:11               1 ddsserver@wbesngx202 (TNS V1-V3)                   2404250912     370684         11          1        .42          0
2019-06-04 14:00:06               2 ddsserver@wbesesb203 (TNS V1-V3)                   2404250912     418343         11          1        .42          0

3、cursor:mutex x、cursor:mutex s等待事件

cursor:mutex x、cursor:mutex s是属于解析类等待事件,是因为解析该SQL时引起(软解析也会有该等待事件)oracle有如下的解释

cursor: mutex S
A session waits on this event when it is requesting a mutex in shared mode, when another session is currently holding a this mutex in exclusive mode on the same cursor object.

Cursor: mutex S情景:
Wants: Shared mode on Mutex of Parent / Child
• Change the reference count (“in flux”)= “new guy is interested / spinning”
• Parent examination
• When finding a cursor to execute, the parent must be examined. The examination of the parent is performed using the mutex, cursor: mutex S.
• When the parent cursor has many child cursors involved, this waits will come as the server process has to traverse the entire list of child cursors under the parent to find a match.
• Mutex is in the parent cursor.

cursor: mutex X:
The session requests the mutex for a cursor object in exclusive mode, and it must wait because the resource is busy. The mutex is busy because either the mutex is being held in exclusive mode by another session or the mutex is being held shared by one or more sessions. The existing mutex holder(s) must release the mutex before the mutex can be granted exclusively.

cursor: mutex X情景:
Wants: Exclusive mode on Mutex of Parent / Child
• Building a new cursor under a parent
• Although this operation is cheaper, building many cursors under a parent cursor is not recommended.
• Capture SQL bind data(PEEK)
• Build or Update statistics blocks,Modify cursor-related statistics
• Mutex is in the parent cursor.
交易库02.png

4、parent cursor出现high version count:绑定变量不匹配、统计信息引起、绑定变量长度不一致

交易库03.png
交易库04.png

绑定变量类型不一致或者分级主要发生在以下几个变量:P4、P5、P6、P9、P14、P21
交易库05.png

5、High Waits On cursor: mutex X After Upgrading The Database to 12c, 18c & 19c (Doc ID 2625815.1)

APPLIES TO:
Oracle Database - Enterprise Edition - Version 12.1.0.2 to 19.5.0.0.0 [Release 12.1 to 18]
Information in this document applies to any platform.
SYMPTOMS
• High waits on cursor: mutex X after upgrading the database to 12.1.0.2, 12.2.0.1, 18c or 19c.
• CPU utilization spikes to 100% due to the mutex waits leading to database hang or application outage.
• Huge version count is seen for the SQL statements during the problem period.
• Login or connection storm is observed during the problem.
• Huge sleeps on functions kkscsAddChildNode & kkscsPruneChild as per ‘Mutex Sleep Summary’ from AWR report.
Mutex Type Location Sleeps Wait Time (ms)
• Cursor Parent kkscsAddChildNode [KKSPRTLOC34] 302,348,122 140,622
• Cursor Parent kkscsPruneChild [KKSPRTLOC35] 4,763,695 -493,302
• Cursor Pin kkslce [KKSCHLPIN2] 1,232,845 659,278

• Hanganalyze trace shows most of the chains running the same cursor from different processes waiting on ‘cursor: mutex X’ with the below stack.
The stack trace:
<-kgxExclusive<-kkscsAddChildNode<-kxscod<-kkscsCompareBinds<-kkscscid_bnd_eval<-kkscsCheckCriteria<-kkscsCheckCursor<-kkscsSearchChildList<-kksfbc<-

CHANGES
Upgrading the database to 12c and higher.
CAUSE
This is due to below internal bug which is fixed in 20.1.
Unpublished Bug 28889389 : HIGH WAITS ON CURSOR: MUTEX X AFTER UPGRADE TO 12.1.0.2
This bug is hit after having observed that multiple unique sessions are waiting for a parent cursor mutex in exclusive mode on the same cursor under the following stack:-
ksfwaitctx()+28<-kgxWait()+1164<-kgxExclusive()+652<-kkscsAddChildNode()
‘cursor: mutex X’ would have been taken on the parent cursor to find out the matching child cursor for that particular SQL. Some kind of logon storm where all sessions are executing the same cursor at the same time leading to this mutex contention. The child cursor sharing code has been optimized to reduce the Mutex Contention so that exclusive mutex is not acquired for searching the child cursors for match under the same parent.

SOLUTION
To Apply the Patch 28889389 which has the optimized code for cursor mutex while searching the parent cursor for the match.

可能触碰到上述BUG

6、建议:

1 采取定期purge version count child的方式来规避(也通过规范绑定变量和控制统计信息收集策略来控制产生child cursor的数量)

2 设置隐含参数_cursor_obsolete_threshold控制子游标数量,可以设置为100

3 将该SQL进一步打散
将频繁执行的SQL分区拆解,分散竞争,如以下SQL通过注释将同一条SQL分解为2条,就分散了竞争:

    select /*SQL 1*/ a from t_a where id=?
    select /*SQL 2*/ a from t_a where id=?

4 打上相关patchset规避Oracle Bug引起的cursor竞争。

最后修改时间:2020-03-27 13:23:36
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论