大量library cache lock导致连接数爆满

好名字可以让你的朋友更容易记住你

2019-06-18

oracle

早上发现RAC两个节点连接数都达到5000+，直接导致新连接不能进来，关掉应用后sqlPlus /as sysdba进入数据库查看等待事件有大量library cache lock，也看不到有执行的sql_id

请问专家如何排查问题根本原因，部分信息和AWR如下

SQL> select count(1),sql_id from v$session where event='library cache lock' group by sql_id;

COUNT(1) SQL_ID

---------- -------------

4702

SQL> select event,count(1) from v$session group by event;

EVENT COUNT(1)

---------------------------------------------------------------- ----------

Backup: MML write backup piece 1

db file sequential read 3

PING 1

gcs remote message 6

GCR sleep 1

Space Manager: slave idle wait 1

Backup: MML create a backup piece 2

SQL*Net message to client 1

SQL*Net message from client 315

pmon timer 1

smon timer 1

EVENT COUNT(1)

---------------------------------------------------------------- ----------

DIAG idle wait 2

rdbms ipc message 42

VKTM Logical Idle Wait 1

ges remote message 1

ASM background timer 1

wait for unread message on broadcast channel 2

Streams AQ: qmn slave idle wait 1

library cache lock 4702

JOX Jit Process Sleep 1

PX Deq: Execution Msg 2

LNS ASYNC end of log 1

EVENT COUNT(1)

---------------------------------------------------------------- ----------

Streams AQ: qmn coordinator idle wait 1

class slave wait 11

Streams AQ: waiting for time management or cleanup tasks 1

25 rows selected.

Specify the Report Name

~~~~~~~~~~~~~~~~~~~~~~~

The default report file name is addmrpt_2_63116_63117.txt. To use this name,

press <return> to continue, otherwise enter an alternative.

Enter value for report_name:

Using the report name addmrpt_2_63116_63117.txt

Running the ADDM analysis on the specified pair of snapshots ...

Generating the ADDM report for this analysis ...

ADDM Report for Task 'TASK_266909'

----------------------------------

Analysis Period

---------------

AWR snapshot range from 63116 to 63117.

Time period starts at 18-JUN-19 09.00.06 AM

Time period ends at 18-JUN-19 10.00.41 AM

Analysis Target

---------------

Database 'GZKDB' with DB ID 3858906879.

Database version 11.2.0.3.0.

ADDM performed an analysis of instance GZKDB2, numbered 2 and hosted at

gzkserver2.

Activity During the Analysis Period

-----------------------------------

Total database time was 1486301 seconds.

The average number of active sessions was 408.89.

Summary of Findings

-------------------

Description Active Sessions Recommendations

Percent of Activity

------------------------------ ------------------- ---------------

1 Session Connect and Disconnect 363.49 | 88.9 1

2 Shared Pool Latches 363 | 88.78 1

3 "User I/O" wait Class 27.24 | 6.66 0

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Findings and Recommendations

----------------------------

Finding 1: Session Connect and Disconnect

Impact is 366.93 active sessions, 88.9% of total activity.

----------------------------------------------------------

Session connect and disconnect calls were consuming significant database time.

Recommendation 1: Application Analysis

Estimated benefit is 363.49 active sessions, 88.9% of total activity.

---------------------------------------------------------------------

Action

Investigate application logic for possible reduction of connect and

disconnect calls. For example, you might use a connection pool scheme in

the middle tier.

Finding 2: Shared Pool Latches

Impact is 366.43 active sessions, 88.78% of total activity.

-----------------------------------------------------------

Contention for latches related to the shared pool was consuming significant

database time.

Waits for "library cache lock" amounted to 88% of database time.

Recommendation 1: Application Analysis

Estimated benefit is 363 active sessions, 88.78% of total activity.

-------------------------------------------------------------------

Action

Investigate the cause for latch contention using the given blocking

sessions or modules.

Rationale

The session with ID 7079 and serial number 30793 in instance number 1

was the blocking session responsible for 14% of this recommendation's

benefit.

Symptoms That Led to the Finding:

---------------------------------

Wait class "Concurrency" was consuming significant database time.

Impact is 363.4 active sessions, 88.88% of total activity.

Finding 3: "User I/O" wait Class

Impact is 27.5 active sessions, 6.66% of total activity.

--------------------------------------------------------

Wait class "User I/O" was consuming significant database time.

The throughput of the I/O subsystem was not significantly lower than expected.

The Oracle instance memory (SGA and PGA) was adequately sized.

No recommendations are available.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Additional Information

----------------------

Miscellaneous Information

-------------------------

Wait class "Application" was not consuming significant database time.

Wait class "Cluster" was not consuming significant database time.

Wait class "Commit" was not consuming significant database time.

Wait class "Configuration" was not consuming significant database time.

CPU was not a bottleneck for the instance.

Wait class "Network" was not consuming significant database time.

The network latency of the cluster interconnect was within acceptable limits

of 1 milliseconds.

Hard parsing of SQL statements was not consuming significant database time.

End of Report

Report written to addmrpt_2_63116_63117.txt

SQL>

6条回答

默认

最新

好名字可以让你的朋友更容易记住你

AWR分析结果: 1560841445231_session42_106.log.html

有用 0

Albert

select to_char(sample_time, 'yyyymmdd hh24:mi:ss') sample_time,

event,

BLOCKING_INST_ID||'_'||BLOCKING_SESSION,

count(*)

from gv$active_session_history

where to_char(sample_time, 'yyyymmdd hh24:mi:ss') between

'**LIBRARY CACHE LOCK时间段1***' and '**LIBRARY CACHE LOCK时间段2**'

group by to_char(sample_time, 'yyyymmdd hh24:mi:ss'), event,BLOCKING_INST_ID||'_'||BLOCKING_SESSION

order by 1

看一下大量ACTIVE被那个会话阻塞，阻塞会话在干什么，另外确认一下V$SGA_RESIZE_OPS中是否存在SGA 剧烈调整

有用 0

Albert

看到AWR中有大量Shared Pool Latches，确认故障期间段是否操作系统存在换页问题、SHARED_POOL大小，Library cache reload情况

有用 0

好名字可以让你的朋友更容易记住你

故障期间页面文件使用率确实很高80%以上。因为现在已经把这些会话全kill掉了··需要等再次出现才能跑您给的SQL来出结果了

有用 0

章芋文

应该是用错误密码频繁链接数据库导致：

connection management call elapsed time: % DB time 88.9

2） 28401事件用于关闭11g数据库中用户持续输入错误密码时的延迟用户验证特性，避免用户持续输入错误密码时产生大量的row cache lock或library cache lock等待，严重时使数据库完全不能登录。

有用 1

好名字可以让你的朋友更容易记住你

问题已关闭: 问题已经得到解决

有用 0

回答交流

提交

问题信息

请登录之后查看

邀请回答

暂无人订阅该标签，敬请期待~~

墨值悬赏