故障概述
故障分析
### 截取时间26分开始的event
SAMPLE_TIME EVENT COUNT(*)
----------------- ---------------------------------------------------------------- ----------
20240319 09:25:59 db file sequential read 25
20240319 09:26:07 db file sequential read 19
20240319 09:26:09 23
20240319 09:26:09 db file sequential read 24
20240319 09:26:17 db file sequential read 11
20240319 09:26:19 14
20240319 09:26:19 db file sequential read 21
20240319 09:26:29 16
20240319 09:26:29 db file sequential read 43
20240319 09:26:37 db file sequential read 14
20240319 09:26:40 19
SAMPLE_TIME EVENT COUNT(*)
----------------- ---------------------------------------------------------------- ----------
20240319 09:26:40 db file sequential read 40
20240319 09:26:50 12
20240319 09:26:50 db file sequential read 44
20240319 09:26:57 gc buffer busy acquire 33
20240319 09:27:00 db file sequential read 12
20240319 09:27:00 gc buffer busy release 18
20240319 09:27:00 buffer busy waits 37
20240319 09:27:00 enq: SQ - contention 58
20240319 09:27:00 enq: CT - CTWR process start/stop 373 <<<<<< CTWR 开始出现
20240319 09:27:07 cr request retry 31
20240319 09:27:07 buffer busy waits 37
SAMPLE_TIME EVENT COUNT(*)
----------------- ---------------------------------------------------------------- ----------
20240319 09:27:07 library cache lock 145
20240319 09:27:07 gc buffer busy acquire 231
20240319 09:27:10 13
20240319 09:27:10 gc buffer busy release 18
20240319 09:27:10 library cache lock 95
20240319 09:27:10 latch: redo allocation 101
20240319 09:27:10 buffer busy waits 112
20240319 09:27:10 enq: SQ - contention 339
20240319 09:27:10 enq: CT - CTWR process start/stop 565
20240319 09:27:17 gc cr failure 11
20240319 09:27:17 cr request retry 29
SAMPLE_TIME EVENT COUNT(*)
----------------- ---------------------------------------------------------------- ----------
20240319 09:27:17 buffer busy waits 44
20240319 09:27:17 gc buffer busy acquire 245
20240319 09:27:17 library cache lock 350
20240319 09:27:20 11
20240319 09:27:20 gc buffer busy release 17
20240319 09:27:20 buffer busy waits 160
20240319 09:27:20 latch: redo allocation 225
20240319 09:27:20 library cache lock 289
20240319 09:27:20 enq: SQ - contention 511
20240319 09:27:20 enq: CT - CTWR process start/stop 565
20240319 09:27:27 cr request retry 36
SAMPLE_TIME EVENT COUNT(*)
----------------- ---------------------------------------------------------------- ----------
20240319 09:27:27 buffer busy waits 47
……
SAMPLE_TIME EVENT COUNT(*)
----------------- ---------------------------------------------------------------- ----------
20240319 09:28:11 latch: redo copy 11
20240319 09:28:11 gc current block 2-way 12
20240319 09:28:11 gc buffer busy acquire 17
20240319 09:28:11 23
20240319 09:28:11 enq: CT - CTWR process start/stop 30
20240319 09:28:11 enq: US - contention 32 <<<<<<< enq: US - contention 出现
20240319 09:28:11 db file sequential read 32
20240319 09:28:11 enq: SQ - contention 41
20240319 09:28:11 enq: TX - index contention 206
20240319 09:28:11 latch: redo allocation 242
20240319 09:28:11 row cache lock 391
SAMPLE_TIME EVENT COUNT(*)
----------------- ---------------------------------------------------------------- ----------
20240319 09:28:11 buffer busy waits 638
20240319 09:28:11 gc buffer busy release 981
20240319 09:28:18 gc cr grant 2-way 19
20240319 09:28:18 db file sequential read 27
20240319 09:28:18 buffer busy waits 31
20240319 09:28:18 gc buffer busy acquire 87
20240319 09:28:18 enq: TX - index contention 114
20240319 09:28:18 gc buffer busy release 125
20240319 09:28:22 12
20240319 09:28:22 db file sequential read 17
20240319 09:28:22 gc current grant busy 18
SAMPLE_TIME EVENT COUNT(*)
----------------- ---------------------------------------------------------------- ----------
20240319 09:28:22 gc index operation 30
20240319 09:28:22 gc current block 2-way 59
20240319 09:28:22 gc cr multi block request 67
20240319 09:28:22 gc buffer busy acquire 71
20240319 09:28:22 buffer busy waits 102
20240319 09:28:22 gc current grant 2-way 166
20240319 09:28:22 gc cr grant 2-way 355
20240319 09:28:22 enq: US - contention 467
20240319 09:28:22 enq: TX - index contention 590
20240319 09:28:28 11
### SGA内存抖动
INST_ID COMPONENT STATUS OPER_TYPE OPER_MODE PARAMETER INITIAL_MB TARGET_MB FINAL_MB START_TIME END_TIME
---------- ------------------------- --------- ------------- --------- ------------------------- ---------- ---------- ---------- ----------------- -----------------
2 streams pool COMPLETE GROW IMMEDIATE streams_pool_size 1024 1280 1280 24-03-16 23:35:39 24-03-16 23:35:49
2 streams pool COMPLETE GROW IMMEDIATE streams_pool_size 1024 1280 1280 24-03-16 23:35:39 24-03-16 23:35:50
2 DEFAULT buffer cache COMPLETE SHRINK IMMEDIATE db_cache_size 57856 57600 57600 24-03-16 23:35:39 24-03-16 23:35:50
2 streams pool COMPLETE GROW IMMEDIATE streams_pool_size 1280 1536 1536 24-03-16 23:35:51 24-03-16 23:35:52
2 DEFAULT buffer cache COMPLETE SHRINK IMMEDIATE db_cache_size 57600 57344 57344 24-03-16 23:35:51 24-03-16 23:35:52
2 streams pool COMPLETE GROW IMMEDIATE streams_pool_size 1536 1792 1792 24-03-16 23:35:53 24-03-16 23:35:53
2 DEFAULT buffer cache COMPLETE SHRINK IMMEDIATE db_cache_size 57344 57088 57088 24-03-16 23:35:53 24-03-16 23:35:53
2 streams pool COMPLETE GROW IMMEDIATE streams_pool_size 1536 1792 1792 24-03-16 23:35:53 24-03-16 23:35:54
2 DEFAULT buffer cache COMPLETE SHRINK IMMEDIATE db_cache_size 57344 57088 57088 24-03-16 23:35:53 24-03-16 23:35:54
2 streams pool COMPLETE GROW IMMEDIATE streams_pool_size 1536 1792 1792 24-03-16 23:35:53 24-03-16 23:35:55
2 DEFAULT buffer cache COMPLETE SHRINK IMMEDIATE db_cache_size 57344 57088 57088 24-03-16 23:35:53 24-03-16 23:35:55
2 DEFAULT buffer cache COMPLETE SHRINK IMMEDIATE db_cache_size 57088 56832 56832 24-03-16 23:35:56 24-03-16 23:35:57
2 streams pool COMPLETE GROW IMMEDIATE streams_pool_size 1792 2048 2048 24-03-16 23:35:56 24-03-16 23:35:57
2 streams pool COMPLETE GROW IMMEDIATE streams_pool_size 2048 2304 2304 24-03-16 23:35:58 24-03-16 23:35:58
2 DEFAULT buffer cache COMPLETE SHRINK IMMEDIATE db_cache_size 56832 56576 56576 24-03-16 23:35:58 24-03-16 23:35:58
2 streams pool COMPLETE GROW IMMEDIATE streams_pool_size 2048 2304 2304 24-03-16 23:35:58 24-03-16 23:36:42
2 DEFAULT buffer cache COMPLETE SHRINK IMMEDIATE db_cache_size 56832 56576 56576 24-03-16 23:35:58 24-03-16 23:36:42
2 DEFAULT buffer cache COMPLETE SHRINK IMMEDIATE db_cache_size 56832 56576 56576 24-03-16 23:35:58 24-03-16 23:36:45
2 streams pool COMPLETE GROW IMMEDIATE streams_pool_size 2048 2304 2304 24-03-16 23:35:58 24-03-16 23:36:45
2 DEFAULT buffer cache COMPLETE SHRINK IMMEDIATE db_cache_size 56576 56320 56320 24-03-16 23:37:12 24-03-16 23:37:13
2 streams pool COMPLETE GROW IMMEDIATE streams_pool_size 2304 2560 2560 24-03-16 23:37:12 24-03-16 23:37:13
2 java pool COMPLETE SHRINK DEFERRED java_pool_size 1792 1536 1536 24-03-18 00:44:20 24-03-18 00:44:20
2 DEFAULT buffer cache COMPLETE GROW DEFERRED db_cache_size 56320 56576 56576 24-03-18 00:44:20 24-03-18 00:44:20
2 DEFAULT buffer cache COMPLETE GROW DEFERRED db_cache_size 56576 56832 56832 24-03-18 15:45:37 24-03-18 15:45:37
2 java pool COMPLETE SHRINK DEFERRED java_pool_size 1536 1280 1280 24-03-18 15:45:37 24-03-18 15:45:37
2 DEFAULT buffer cache COMPLETE GROW DEFERRED db_cache_size 56832 57088 57088 24-03-19 00:42:46 24-03-19 00:42:46
2 java pool COMPLETE SHRINK DEFERRED java_pool_size 1280 1024 1024 24-03-19 00:42:46 24-03-19 00:42:46
2 DEFAULT buffer cache ERROR SHRINK IMMEDIATE db_cache_size 57088 56832 57088 24-03-19 09:27:06 24-03-19 09:27:30
2 large pool ERROR GROW IMMEDIATE large_pool_size 16384 16640 16384 24-03-19 09:27:06 24-03-19 09:27:30
2 DEFAULT buffer cache COMPLETE SHRINK IMMEDIATE db_cache_size 57088 56832 56832 24-03-19 09:27:41 24-03-19 09:27:44
2 large pool COMPLETE GROW IMMEDIATE large_pool_size 16384 16640 16640 24-03-19 09:27:41 24-03-19 09:27:44
###alert log
2024-03-19T09:26:25.344548+08:00
LOGMINER: End mining logfile for session 1 thread 2 sequence 506739, +DATA/db02/ONLINELOG/group26_261.log
2024-03-19T09:26:27.913945+08:00
LOGMINER: Begin mining logfile for session 1 thread 2 sequence 506740, +DATA/db02/ONLINELOG/group27_271.log
2024-03-19T09:26:30.445225+08:00
Archived Log entry 1763450 added for T-2.S-506739 ID 0x2611a2c0 LAD:1
2024-03-19T09:27:44.459395+08:00
public DBA buffer size has already reached the maximum size of 231112704. <<<<<<09:27:44,达到最大值。
public DBA buffer size has already reached the maximum size of 231112704.
public DBA buffer size has already reached the maximum size of 231112704.
public DBA buffer size has already reached the maximum size of 231112704.
public DBA buffer size has already reached the maximum size of 231112704.
public DBA buffer size has already reached the maximum size of 231112704.
public DBA buffer size has already reached the maximum size of 231112704.
####ctwr进程情况
db02:/home/oracle(db022)$ps -ef|grep ctwr
oracle 11731356 1 7 Mar 16 - 35:39 ora_ctwr_db022
oracle 66781778 36112030 0 10:09:51 pts/8 0:00 grep ctwr
db02:/home/oracle(db022)$ora log
'TAIL-1000F'||A.VALUE||'/'||'ALERT_'||B.INSTANCE_NAME||'.LOG'
----------------------------------------------------------------------------------------------------
tail -1000f /oracle/app/oracle/diag/rdbms/db02/db022/trace/alert_db022.log
db02:/home/oracle(db022)$cd /oracle/app/oracle/diag/rdbms/db02/db022/trace/
db02:/oracle/app/oracle/diag/rdbms/db02/db022/trace(db022)$
db02:/oracle/app/oracle/diag/rdbms/db02/db022/trace(db022)$ls *ctwr*
db022_ctwr_11731356.trc db022_ctwr_11731356.trm
db02:/oracle/app/oracle/diag/rdbms/db02/db022/trace(db022)$ls -l *ctwr*
-rw-r----- 1 oracle asmadmin 22475 Mar 19 09:28 db022_ctwr_11731356.trc
-rw-r----- 1 oracle asmadmin 5406 Mar 19 09:28 db022_ctwr_11731356.trm
db02:/oracle/app/oracle/diag/rdbms/db02/db022/trace(db022)$
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production
Build label: RDBMS_12.2.0.1.0_AIX.PPC64_170125.170330
ORACLE_HOME: oracle/app/oracle/product/12.2.0/db
System name: AIX
Node name: db02
Release: 2
Version: 7
Machine: 00C057F84C00
Instance name: db022
Redo thread mounted by this instance: 2
Oracle process number: 111
Unix process pid: 11731356, image: oracle@ db02 (CTWR)
*** SESSION ID:(10435.14611) 2024-03-19T09:27:05.997453+08:00
*** CLIENT ID:() 2024-03-19T09:27:05.997469+08:00
*** SERVICE NAME:(SYS$BACKGROUND) 2024-03-19T09:27:05.997478+08:00
*** MODULE NAME:() 2024-03-19T09:27:05.997486+08:00
*** ACTION NAME:() 2024-03-19T09:27:05.997493+08:00
*** CLIENT DRIVER:() 2024-03-19T09:27:05.997500+08:00
*** TRACE FILE RECREATED AFTER BEING REMOVED ***
*** 2024-03-19T09:27:05.945577+08:00
*** 2024-03-19T09:27:06.048862+08:00
*** 2024-03-19T09:27:44.277588+08:00
*** 2024-03-19T09:27:44.459500+08:00
分析:BCT 是使用的large pool 来记录下数据块变化的。
### 问题时间段,回滚段争用的SQL
SQL> select sql_id,count(*)
2 from dba_hist_active_sess_history
3 where to_char(sample_time,'yyyymmdd hh24:mi:ss') between '20240319 09:20:00' and '20240319 09:40:00'
4 and event = 'enq: US - contention'
5 group by sql_id having count(*) >10
6 order by count(*);
SQL_ID COUNT(*)
------------- ----------
4znn9r0p7umd9 11
7a7xv00dvrxvu 19
azw81fzvdfwca 20
0rs28rxgacsf1 20
czvbh2m27qs8p 21
6t2846xks3a81 22
44stykyzw08fa 23
94mjrjps3pr6u 25
cmnwf6d828r63 26
81v5pbhnj71zc 30
73n4whp1snxkf 31
SQL_ID COUNT(*)
------------- ----------
2gxg2xxc0pc2w 40
cynayy51r7zpt 48
99b0p63w1zrv8 57
gfqwhjsjvkhr5 66
3607cxxpdqdtp 66
1x6vr8rj0f8jf 67
1dtvp0yk8214q 69
fyy6aquyrjats 77
138a4dcps6ssa 87
2kgaujdnr44rs 107
4mvbqc4zp3g13 109
SQL_ID COUNT(*)
------------- ----------
ac5aahr706km9 126
5t52qm5vhzvag 141
dpk5hxj79y5fd 166
ay26uw9mxy2hy 243
aty5x3fkfrqhb 526
4wgq4vxaf18zp 639
2q83h41tbg7rn 1952
cwzr69ka04rms 3507
30 rows selected
分析总结及建议
1)在09:27分时,数据库出现大量enq: CT - CTWR process start/stop及少量的enq: SQ – contention等待事件。启用BCT功能,用于Rman备份的块跟踪功能。
2)在09:27:07 出现 大量的library cache lock 等待事件,持续到09:28:01时。
3)在09:28:11时,enq: CT - CTWR process start/stop消失 ,enq: US – contention和enq: TX - index contention事件开始出现,在09:28:52时达到最大值。09:29:03时, enq: IV - contention开始出现。
4)在09:32:55时, 数据库恢复正常。期间未手动处理会话。
enq: CT - CTWR process start/stop 是针对BCT的等待事件,在alert日志中显示了public DBA buffer size has already reached the maximum size of 231112704信息,同时查询SGA在9:27分钟发现large pool有grow的操作,说明内存有抖动。 enq: US – contention 针对的online rollback segments的争用,当前数据库已经设置隐藏参数 _rollback_segment_count = 2000 。

本文作者:李 波(上海新炬中北团队)
本文来源:“IT那活儿”公众号

文章转载自IT那活儿,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




