暂无图片
分享
大量library cache lock导致连接数爆满

早上发现RAC两个节点连接数都达到5000+,直接导致新连接不能进来,关掉应用后sqlPlus /as sysdba进入数据库查看等待事件有大量library cache lock,也看不到有执行的sql_id

请问专家如何排查问题根本原因,部分信息和AWR如下

SQL> select count(1),sql_id from v$session where event='library cache lock'  group by sql_id;


  COUNT(1) SQL_ID

---------- -------------

      4702


SQL> select event,count(1) from v$session group by event;


EVENT                                                              COUNT(1)

---------------------------------------------------------------- ----------

Backup: MML write backup piece                                            1

db file sequential read                                                   3

PING                                                                      1

gcs remote message                                                        6

GCR sleep                                                                 1

Space Manager: slave idle wait                                            1

Backup: MML create a backup piece                                         2

SQL*Net message to client                                                 1

SQL*Net message from client                                             315

pmon timer                                                                1

smon timer                                                                1


EVENT                                                              COUNT(1)

---------------------------------------------------------------- ----------

DIAG idle wait                                                            2

rdbms ipc message                                                        42

VKTM Logical Idle Wait                                                    1

ges remote message                                                        1

ASM background timer                                                      1

wait for unread message on broadcast channel                              2

Streams AQ: qmn slave idle wait                                           1

library cache lock                                                     4702

JOX Jit Process Sleep                                                     1

PX Deq: Execution Msg                                                     2

LNS ASYNC end of log                                                      1


EVENT                                                              COUNT(1)

---------------------------------------------------------------- ----------

Streams AQ: qmn coordinator idle wait                                     1

class slave wait                                                         11

Streams AQ: waiting for time management or cleanup tasks                  1


25 rows selected.





Specify the Report Name

~~~~~~~~~~~~~~~~~~~~~~~

The default report file name is addmrpt_2_63116_63117.txt.  To use this name,

press <return> to continue, otherwise enter an alternative.


Enter value for report_name: 


Using the report name addmrpt_2_63116_63117.txt



Running the ADDM analysis on the specified pair of snapshots ...



Generating the ADDM report for this analysis ...



          ADDM Report for Task 'TASK_266909'

          ----------------------------------


Analysis Period

---------------

AWR snapshot range from 63116 to 63117.

Time period starts at 18-JUN-19 09.00.06 AM

Time period ends at 18-JUN-19 10.00.41 AM


Analysis Target

---------------

Database 'GZKDB' with DB ID 3858906879.

Database version 11.2.0.3.0.

ADDM performed an analysis of instance GZKDB2, numbered 2 and hosted at

gzkserver2.


Activity During the Analysis Period

-----------------------------------

Total database time was 1486301 seconds.

The average number of active sessions was 408.89.


Summary of Findings

-------------------

   Description                     Active Sessions      Recommendations

                                   Percent of Activity

   ------------------------------  -------------------  ---------------

1  Session Connect and Disconnect  363.49 | 88.9        1

2  Shared Pool Latches             363 | 88.78          1

3  "User I/O" wait Class           27.24 | 6.66         0



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



          Findings and Recommendations

          ----------------------------


Finding 1: Session Connect and Disconnect

Impact is 366.93 active sessions, 88.9% of total activity.

----------------------------------------------------------

Session connect and disconnect calls were consuming significant database time.


   Recommendation 1: Application Analysis

   Estimated benefit is 363.49 active sessions, 88.9% of total activity.

   ---------------------------------------------------------------------

   Action

      Investigate application logic for possible reduction of connect and

      disconnect calls. For example, you might use a connection pool scheme in

      the middle tier.



Finding 2: Shared Pool Latches

Impact is 366.43 active sessions, 88.78% of total activity.

-----------------------------------------------------------

Contention for latches related to the shared pool was consuming significant

database time.

Waits for "library cache lock" amounted to 88% of database time.


   Recommendation 1: Application Analysis

   Estimated benefit is 363 active sessions, 88.78% of total activity.

   -------------------------------------------------------------------

   Action

      Investigate the cause for latch contention using the given blocking

      sessions or modules.

   Rationale

      The session with ID 7079 and serial number 30793 in instance number 1

      was the blocking session responsible for 14% of this recommendation's

      benefit.


   Symptoms That Led to the Finding:

   ---------------------------------

      Wait class "Concurrency" was consuming significant database time.

      Impact is 363.4 active sessions, 88.88% of total activity.



Finding 3: "User I/O" wait Class

Impact is 27.5 active sessions, 6.66% of total activity.

--------------------------------------------------------

Wait class "User I/O" was consuming significant database time.

The throughput of the I/O subsystem was not significantly lower than expected.

The Oracle instance memory (SGA and PGA) was adequately sized.


   No recommendations are available.




~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


          Additional Information

          ----------------------


Miscellaneous Information

-------------------------

Wait class "Application" was not consuming significant database time.

Wait class "Cluster" was not consuming significant database time.

Wait class "Commit" was not consuming significant database time.

Wait class "Configuration" was not consuming significant database time.

CPU was not a bottleneck for the instance.

Wait class "Network" was not consuming significant database time.

The network latency of the cluster interconnect was within acceptable limits

of 1 milliseconds.

Hard parsing of SQL statements was not consuming significant database time.

End of Report

Report written to addmrpt_2_63116_63117.txt

SQL> 


收藏
分享
6条回答
默认
最新
Albert

select to_char(sample_time, 'yyyymmdd hh24:mi:ss') sample_time,

       event,

       BLOCKING_INST_ID||'_'||BLOCKING_SESSION,

       count(*)

  from gv$active_session_history 

where to_char(sample_time, 'yyyymmdd hh24:mi:ss') between

       '**LIBRARY CACHE LOCK时间段1***' and '**LIBRARY CACHE LOCK时间段2**'

group by to_char(sample_time, 'yyyymmdd hh24:mi:ss'), event,BLOCKING_INST_ID||'_'||BLOCKING_SESSION

order by 1

 看一下大量ACTIVE被那个会话阻塞,阻塞会话在干什么,另外确认一下V$SGA_RESIZE_OPS中是否存在SGA 剧烈调整

暂无图片 评论
暂无图片 有用 0
Albert

看到AWR中有大量Shared Pool Latches,确认故障期间段是否操作系统存在换页问题、SHARED_POOL大小,Library cache reload情况

暂无图片 评论
暂无图片 有用 0

故障期间页面文件使用率确实很高80%以上。因为现在已经把这些会话全kill掉了··需要等再次出现才能跑您给的SQL来出结果了

暂无图片 评论
暂无图片 有用 0
章芋文

应该是用错误密码频繁链接数据库导致:

connection management call elapsed time: % DB time 88.9


2) 28401事件用于关闭11g数据库中用户持续输入错误密码时的延迟用户验证特性,避免用户持续输入错误密码时产生大量的row cache lock或library cache lock等待,严重时使数据库完全不能登录。

暂无图片 评论
暂无图片 有用 1
问题已关闭: 问题已经得到解决
暂无图片 评论
暂无图片 有用 0
回答交流
提交
问题信息
请登录之后查看
邀请回答
暂无人订阅该标签,敬请期待~~
暂无图片墨值悬赏