公司在2017-04-14 :11:40分钟的时候交易缓慢报警了! 应用器报某个INSERT语句插不进去,在等待,等待了6分钟后才OK!
立即做了个ASH报表,做的时候已经12:30分了,为此做个50分钟之内的报表。
Top User Events
| Event | Event Class | % Event | Avg Active Sessions |
|---|---|---|---|
| row cache lock | Concurrency | 69.98 | 29.27 |
| log file switch (checkpoint incomplete) | Configuration | 14.67 | 6.14 |
| buffer busy waits | Concurrency | 4.74 | 1.98 |
| CPU + Wait for CPU | CPU | 3.85 | 1.61 |
| enq: KO - fast object checkpoint | Application | 1.21 | 0.51 |
从事件上看 ROW CACHE LOCK占用很大的比例,为此想到的是插入语句的序列CACHE SIZE的值,经过检查全为0!为此建议开发人员增加该值大小。
不过冷静下来想了想,这个值一直都是0,而且运行近两年的时间,也没有发生过该问题,主要并发量小,而一直以来交易量保存每天一万笔。 所以必然还有其它的原因。 我们继续看第二个等待事件是:log file switch (...) 日志发生了切换。本库是JAVA开发人员按照默认方式安装的。日志才三组,每组大小50MB,而且11.2.0.1版本ORACLE会不到50MB就发生切换,基本上是25MB切换 一次。平时也是这样的,其中挂号的是什么鬼呢? checkpiont incomplete 增量检查点,对日志切换不仅是把日志归档成归档日志造成的IO外,还有发生检查点事件,触发DBWR进程写入脏数据到数据文件中的IO消耗量。最后个等待事件是ENQ:KO-FAST OBJECT CHECKPOINT。另外个BUFFER BUSY WAITS一般表示热点数据块,内存中的内存块获取之类的。
Top Background Events
| Event | Event Class | % Activity | Avg Active Sessions |
|---|---|---|---|
| db file async I/O submit | System I/O | 1.50 | 0.63 |
背景事件是异步IO提交,似乎没啥
Top Event P1/P2/P3 Values
| Event | % Event | P1 Value, P2 Value, P3 Value | % Activity | Parameter 1 | Parameter 2 | Parameter 3 |
|---|---|---|---|---|---|---|
| row cache lock | 69.98 | "13","0","5" | 69.98 | cache id | mode | request |
| buffer busy waits | 5.18 | "1","668","1" | 1.91 | file# | block# | class# |
| db file async I/O submit | 1.50 | "3618","0","0" | 0.04 | requests | interrupt | timeout |
| enq: KO - fast object checkpoint | 1.21 | "1263468550","65709","1" | 0.04 | name|mode | 2 | 0 |
--查询rowcache 名称
select * from v$rowcache where cache# =13;
13 PARENT dc_sequences 38 38 0 86378558 7657 0 0 0 86378557 86378557 0 0 0
Top SQL Command Types
| SQL Command Type | Distinct SQLIDs | % Activity | Avg Active Sessions |
|---|---|---|---|
| INSERT | 9 | 71.28 | 29.82 |
| SELECT | 95 | 18.64 | 7.80 |
| UPDATE | 8 | 4.42 | 1.85 |
| DELETE | 1 | 1.41 | 0.59 |
插入语句影响重大
Top SQL with Top Events
| SQL ID | Planhash | Sampled # of Executions | % Activity | Event | % Event | Top Row Source | % RwSrc | SQL Text |
|---|---|---|---|---|---|---|---|---|
| fxp5m3qftzm88 | 2651026499 | 485 | 58.39 | row cache lock | 57.69 | SEQUENCE | 57.69 | INSERT INTO CCPS_UNNORMAL_TRAD... |
| f5hzarwyhpfvw | 2510948005 | 138 | 11.20 | log file switch (checkpoint incomplete) | 7.00 | TABLE ACCESS - BY INDEX ROWID | 7.00 | select trd.tr_no, trd.tr_refer... |
| CPU + Wait for CPU | 3.13 | TABLE ACCESS - BY INDEX ROWID | 2.95 | |||||
| 5g9cws88pf1h3 | 2744291311 | 78 | 10.09 | row cache lock | 9.45 | SEQUENCE | 9.45 | INSERT INTO CCPS_CREDITINFO_RE... |
| 6n4kmsrhkwtuv | 2418717600 | 12 | 1.50 | log file switch (checkpoint incomplete) | 1.50 | UPDATE | 1.50 | UPDATE CCPS_TRADERECORD SET TR... |
| d5mkmt2q9a209 | 424208048 | 11 | 1.47 | row cache lock | 0.84 | SEQUENCE | 0.84 | insert into PBS_FIRST_FAILURE_... |
这里看到具体语句与等待事件的关联
Top SQL with Top Row Sources
| SQL ID | PlanHash | Sampled # of Executions | % Activity | Row Source | % RwSrc | Top Event | % Event | SQL Text |
|---|---|---|---|---|---|---|---|---|
| fxp5m3qftzm88 | 2651026499 | 485 | 58.39 | SEQUENCE | 57.69 | row cache lock | 57.69 | INSERT INTO CCPS_UNNORMAL_TRAD... |
| f5hzarwyhpfvw | 2510948005 | 138 | 11.20 | TABLE ACCESS - BY INDEX ROWID | 11.01 | log file switch (checkpoint incomplete) | 7.00 | select trd.tr_no, trd.tr_refer... |
| 5g9cws88pf1h3 | 2744291311 | 78 | 10.09 | SEQUENCE | 9.45 | row cache lock | 9.45 | INSERT INTO CCPS_CREDITINFO_RE... |
| 6n4kmsrhkwtuv | 2418717600 | 12 | 1.50 | UPDATE | 1.50 | log file switch (checkpoint incomplete) | 1.50 | UPDATE CCPS_TRADERECORD SET TR... |
| d5mkmt2q9a209 | 424208048 | 11 | 1.47 | SEQUENCE | 0.84 | row cache lock | 0.84 | insert into PBS_FIRST_FAILURE_... |
Top Sessions
| Sid, Serial# | % Activity | Event | % Event | User | Program | # Samples Active | XIDs |
|---|---|---|---|---|---|---|---|
| 196, 1 | 1.50 | db file async I/O submit | 1.50 | SYS | oracle@oraclemain (DBW0) | 189/300 [ 63%] | 0 |
| 630,32770 | 1.02 | log file switch (checkpoint incomplete) | 0.43 | OSSC | JDBC Thin Client | 54/300 [ 18%] | 0 |
这里看到是IO不够用
Top Blocking Sessions
| Blocking Sid (Inst) | % Activity | Event Caused | % Event | User | Program | # Samples Active | XIDs |
|---|---|---|---|---|---|---|---|
| 548,52773( 1) | 32.18 | row cache lock | 31.13 | OSSC | JDBC Thin Client | 69/300 [ 23%] | 0 |
| buffer busy waits | 1.05 | ||||||
| 326,11183( 1) | 22.58 | row cache lock | 21.41 | OSSC | JDBC Thin Client | 67/300 [ 22%] | 0 |
| buffer busy waits | 1.18 | ||||||
| 294, 1( 1) | 15.55 | log file switch (checkpoint incomplete) | 14.85 | SYS | oracle@oraclemain (LGWR) | 27/300 [ 9%] | 0 |
| 121,49180( 1) | 4.55 | row cache lock | 4.55 | OSSC | JDBC Thin Client | 42/300 [ 14%] | 0 |
| 632, 5109( 1) | 4.52 | row cache lock | 4.45 | OSSC | JDBC Thin Client | 54/300 [ 18%] | 1 |
这里列举的是TOP阻塞会话,当没有列出之间的阻塞关系!
平时AWR报表TOP事件
Top 5 Timed Foreground Events
| Event | Waits | Time(s) | Avg wait (ms) | % DB time | Wait Class |
|---|---|---|---|---|---|
| DB CPU | 125,832 | 82.74 | |||
| db file sequential read | 2,266,724,412 | 14,841 | 0 | 9.76 | User I/O |
| read by other session | 501,071,537 | 8,295 | 0 | 5.45 | User I/O |
| log file sync | 211,543 | 2,358 | 11 | 1.55 | Commit |
| direct path read | 2,618,600 | 706 | 0 | 0.46 | User I/O |
故障的时候 一天的AWR
Top 5 Timed Foreground Events
| Event | Waits | Time(s) | Avg wait (ms) | % DB time | Wait Class |
|---|---|---|---|---|---|
| DB CPU | 125,211 | 42.65 | |||
| row cache lock | 2,906 | 95,587 | 32893 | 32.56 | Concurrency |
| log file switch (checkpoint incomplete) | 164 | 21,510 | 131160 | 7.33 | Configuration |
| db file sequential read | 2,296,221,457 | 17,368 | 0 | 5.92 | User I/O |
| read by other session | 507,074,109 | 9,899 | 0 | 3.37 | User I/O |
从时间模型来看 平时大部分消耗在CPU上
Time Model Statistics
| Statistic Name | Time (s) | % of DB Time |
|---|---|---|
| sql execute elapsed time | 131,197.46 | 86.27 |
| DB CPU | 125,832.36 | 82.74 |
| RMAN cpu time (backup/restore) | 3,051.49 | 2.01 |
| parse time elapsed | 192.19 | 0.13 |
| hard parse elapsed time | 109.57 | 0.07 |
| PL/SQL execution elapsed time | 84.78 | 0.06 |
| sequence load elapsed time | 63.43 | 0.04 |
| connection management call elapsed time | 7.02 | 0.00 |
| hard parse (sharing criteria) elapsed time | 6.96 | 0.00 |
| hard parse (bind mismatch) elapsed time | 6.61 | 0.00 |
| inbound PL/SQL rpc elapsed time | 5.64 | 0.00 |
| PL/SQL compilation elapsed time | 2.44 | 0.00 |
| repeated bind elapsed time | 0.07 | 0.00 |
| DB time | 152,084.87 | |
| background elapsed time | 16,324.51 | |
| background cpu time | 3,717.39 |
故障时间模型序列LOAD时间消耗很多
Time Model Statistics
| Statistic Name | Time (s) | % of DB Time |
|---|---|---|
| sql execute elapsed time | 268,546.98 | 91.47 |
| DB CPU | 125,210.95 | 42.65 |
| sequence load elapsed time | 102,802.97 | 35.01 |
| parse time elapsed | 258.69 | 0.09 |
| RMAN cpu time (backup/restore) | 252.58 | 0.09 |
| PL/SQL execution elapsed time | 148.45 | 0.05 |
| hard parse elapsed time | 135.99 | 0.05 |
| hard parse (sharing criteria) elapsed time | 9.09 | 0.00 |
| hard parse (bind mismatch) elapsed time | 8.83 | 0.00 |
| connection management call elapsed time | 7.92 | 0.00 |
| inbound PL/SQL rpc elapsed time | 5.62 | 0.00 |
| PL/SQL compilation elapsed time | 3.32 | 0.00 |
| repeated bind elapsed time | 0.10 | 0.00 |
| DB time | 293,604.60 | |
| background elapsed time | 20,707.04 | |
| background cpu time | 1,032.05 |
负载
| Host Name | Platform | CPUs | Cores | Sockets | Memory (GB) |
|---|---|---|---|---|---|
| oraclemain | Linux x86 64-bit | 8 | 8 | 2 | 31.36 |
| Snap Id | Snap Time | Sessions | Cursors/Session | |
|---|---|---|---|---|
| Begin Snap: | 15639 | 12-Apr-17 01:00:57 | 305 | 2.0 |
| End Snap: | 15662 | 13-Apr-17 00:00:11 | 302 | 2.6 |
| Elapsed: | 1,379.23 (mins) | |||
| DB Time: | 2,534.75 (mins) |
| Snap Id | Snap Time | Sessions | Cursors/Session | |
|---|---|---|---|---|
| Begin Snap: | 15687 | 14-Apr-17 01:00:32 | 301 | 1.9 |
| End Snap: | 15710 | 15-Apr-17 00:00:01 | 299 | 2.2 |
| Elapsed: | 1,379.48 (mins) | |||
| DB Time: | 4,893.41 (mins) |
负载公式:DBTIME/ELAPSED/CPUS*100||'%'
故障时候:4,893.41 1,379.48/8*100=44.34%
平时:2,534.75/1379.48/8=22.96%
ASH系统活跃时间事件列表发现 11:45-11:50 之间的10分钟内,其他时间就恢复了正常
Activity Over Time
| Slot Time (Duration) | Slot Count | Event | Event Count | % Event |
|---|---|---|---|---|
| 11:39:18 (42 secs) | 132 | row cache lock | 63 | 0.50 |
| CPU + Wait for CPU | 28 | 0.22 | ||
| log file switch (checkpoint incomplete) | 19 | 0.15 | ||
| 11:40:00 (5.0 min) | 1,638 | row cache lock | 1,059 | 8.43 |
| log file switch (checkpoint incomplete) | 358 | 2.85 | ||
| buffer busy waits | 129 | 1.03 | ||
| 11:45:00 (5.0 min) | 3,363 | row cache lock | 2,789 | 22.20 |
| log file switch (checkpoint incomplete) | 333 | 2.65 | ||
| buffer busy waits | 219 | 1.74 | ||
| 11:50:00 (5.0 min) | 5,279 | row cache lock | 4,186 | 33.32 |
| log file switch (checkpoint incomplete) | 814 | 6.48 | ||
| buffer busy waits | 243 | 1.93 | ||
| 11:55:00 (5.0 min) | 1,647 | row cache lock | 694 | 5.52 |
| log file switch (checkpoint incomplete) | 342 | 2.72 | ||
| CPU + Wait for CPU | 182 | 1.45 | ||
| 12:00:00 (5.0 min) | 178 | enq: KO - fast object checkpoint | 61 | 0.49 |
| CPU + Wait for CPU | 60 | 0.48 | ||
| db file async I/O submit | 31 | 0.25 | ||
| 12:05:00 (5.0 min) | 118 | CPU + Wait for CPU | 34 | 0.27 |
| db file async I/O submit | 30 | 0.24 | ||
| enq: KO - fast object checkpoint | 16 | 0.13 | ||
| 12:10:00 (5.0 min) | 59 | CPU + Wait for CPU | 34 | 0.27 |
| db file async I/O submit | 8 | 0.06 | ||
| enq: KO - fast object checkpoint | 6 | 0.05 | ||
| 12:15:00 (5.0 min) | 63 | CPU + Wait for CPU | 45 | 0.36 |
| db file async I/O submit | 6 | 0.05 | ||
| db file sequential read | 4 | 0.03 | ||
| 12:20:00 (5.0 min) | 42 | CPU + Wait for CPU | 32 | 0.25 |
| read by other session | 4 | 0.03 | ||
| db file async I/O submit | 2 | 0.02 | ||
| 12:25:00 (4.4 min) | 43 | CPU + Wait for CPU | 38 | 0.30 |
| LNS wait on SENDREQ | 1 | 0.01 | ||
| control file parallel write | 1 | 0.01 |
再对照AWR报表的LOAD PROFILE
平时Load Profile
| Per Second | Per Transaction | Per Exec | Per Call | |
|---|---|---|---|---|
| DB Time(s): | 1.8 | 0.6 | 0.07 | 0.02 |
| DB CPU(s): | 1.5 | 0.5 | 0.05 | 0.01 |
| Redo size: | 11,543.2 | 3,670.9 | ||
| Logical reads: | 553,397.7 | 175,987.7 | ||
| Block changes: | 52.9 | 16.8 | ||
| Physical reads: | 33,193.0 | 10,555.8 | ||
| Physical writes: | 9.2 | 2.9 | ||
| User calls: | 105.7 | 33.6 | ||
| Parses: | 23.3 | 7.4 | ||
| Hard parses: | 0.4 | 0.1 | ||
| W/A MB processed: | 0.8 | 0.3 | ||
| Logons: | 0.1 | 0.0 | ||
| Executes: | 27.8 | 8.9 | ||
| Rollbacks: | 1.1 | 0.3 | ||
| Transactions: | 3.1 |
故障Load Profile
| Per Second | Per Transaction | Per Exec | Per Call | |
|---|---|---|---|---|
| DB Time(s): | 3.6 | 1.0 | 0.12 | 0.03 |
| DB CPU(s): | 1.5 | 0.4 | 0.05 | 0.01 |
| Redo size: | 18,203.0 | 5,029.0 | ||
| Logical reads: | 559,076.9 | 154,458.5 | ||
| Block changes: | 113.9 | 31.5 | ||
| Physical reads: | 33,314.6 | 9,204.0 | ||
| Physical writes: | 19.1 | 5.3 | ||
| User calls: | 121.1 | 33.5 | ||
| Parses: | 25.2 | 7.0 | ||
| Hard parses: | 0.5 | 0.1 | ||
| W/A MB processed: | 0.6 | 0.2 | ||
| Logons: | 0.1 | 0.0 | ||
| Executes: | 30.8 | 8.5 | ||
| Rollbacks: | 1.2 | 0.3 | ||
| Transactions: | 3.6 |
故障的时候发生了大量的更改,造成脏数据比较多!
通过追查DBA_HIST_ACTIVE_SESS_HISTORY查看之间的阻塞关系
从这里发现 log file switch (checkpoint incomplete) 事件阻塞了 insert 交易表;
而insert 交易表(686,60887)阻塞了 insert CREDITINFO_RECURRING表 导致buffer busy waits;
并且还阻塞同类insert 交易表导致row cache lock;
SID=347 是个查询语句 该语句查看了下是全表扫描,它等待ENQ:KO- FAST-OBEJCT CHECKPOINT;并且该语句查询表是CREDITINFO_RECURRING,大小520MB
该等待事件是由于当进行TABLE FULL SCAN或并行查询整个段时,对象级别发生checkpoint,由于direct path read必须要从磁盘中读入到PGA中,因此必须等待checkpoint完成,将脏块写回磁盘。相当于写阻塞了读。当发生该等待事件,一个简单的查询就将变得非常慢。难道我没有关掉直接路径读取参数?经检查虽然是开着的
Log file switch(checkpoint incomplete)等待事件
Oracle日志切换会产生一个增量检查点,但这个检查点,不同于alter system checkpoint,后者会启动dbwn进程,将内存中的已修改的数据立即写入数据文件。但前者不会,前者只需要保证整个日志组一轮切换后,比如日志组2,切换到3,然后1,在到第二组的时候,必须将“脏”数据写入数据文件(因为如果此时还没有写入数据库,再次由1切换到2的时候,假设此后实例崩溃,就造成数据的丢失)。
检查点的工作职责是通知dbwn写数据,更新某些数据文件头信息,更新控制文件信息。既然是日志切换时检查点未完成。那么基本上就是日志切换了一圈,dbwn进程还没有完成工作。由此可见原因如下
1.日志文件过小
2.日志组过少
3.dbwn进程工作的慢 (oracle11g中db_write_process参数是cpu数/8得来的,一般情况下不去修改此参数)
增加日志组
alter database add logfile group 4 '? REDO04.LOG' size 1g;
最终是加了4组1g的日志文件
alter system switch logfile;
alter database drop logfile group 1;
对于要删除的日志组,除了当前组不能删除,还有一种active状态下的文件组不可删除,因为该日志组的信息还没有完成checkpoint。
alter system checkpoint;
在切换日志删除
Oracle给出的log file awitch 类型的等待事件有五种:
目前解决的办法:
1.添加日志组
2.按照实际情况增大日志组成员的大小。或者同时做1,2
服务器进程正要写入重做记录的时刻,若重做日志文件已满不能继续写入操作,则进程想LGWR请求执行对日志文件的切换。服务器进程由于LGWR,直到日志文件切换结束为止,需等待log file switch completion事件。但日志文件的切换结束时,如果将要投入使用的重做日志文件,还有没有完成的工作,就需要另外等待如下事件。
(1)如果对欲重新使用的重做日志文件尚未结束检查点,进程就应该等待由DBWR来结束检查点。这时,进程将等待log file switch (checkpoint incomplete)事件。
(2)如果对欲重新使用的重做日志文件尚未完成对党工作,进程就应该等待ARCH进程来结束归档工作。这时,进程将等待log file switch (archiving needed)事件。这个事件只在归档模式数据库上发生。
(3)如果对于欲重新使用的重做日志文件尚未完成对private strand的flush工作,就应该等待这个工作结束。这时,进程将等待log file switch (private strand flush incomplete)事件。
以上的三种等待现象在重做日志文件被循环使用的情况下,将生成许多重做数据,所以在尚未完成工作就重新使用时发生。因此这些等待现象一向是与log file switch completion等待现象一起出现的。准确的说,服务器进程首先等待log file switch completion事件,特殊情况下还会等待log file switch (checkpoint incomplete)、log file switch (archiving needed)和log file switch (private strand flush incomplete)事件。
因为名字相似,所以给管理人员带来了相当混乱的几个等待现象,发生原因和解决方式相同。发生原因是比起事务所创建的重做数据,重做日志文件过小。所以解决方法是将重做日志文件的大小调整为足够大。而且,使用Direct load operation或nologging选项对减少重做数据的量也是有帮助的。
我们理下思绪来
1 insert 语句 在等待 row cache lock 阻塞它的是 另外一条 同样并非的insert语句 同一张表
2 最上阻塞insert 语句在等待 log file switch (checkpoint incomplete)
3 log file switch (checkpoint incomplete) 等待事件是说 我要进行日志组切换了,需要DBWR把脏数据写回磁盘数据文件中。在完成之前日志不能写redo log file。 而应用程序JAVA是插入并提交,在提交过程中等待日志写,日志写又等待日志切换,日志切换在等待DBWR写。在数据库层次上没有该提交行为,为批量提交不等待。
SQL> show parameter commit
NAME TYPE VALUE
------------------------------ ----------- ------------------------------
commit_logging string
commit_point_strength integer 1
commit_wait string
commit_write string
SQL>
这里就有个疑问了 是什么影响了DBWR和 LOG 写IO 带宽的影响?
因为平时的时候都没啥问题啊
从这个日志归档切换来看, 2017-04-14日11点产出10个归档,是整个一天最大的量。如果说因为切换太频繁导致阻塞的话,就说不过去,虽然11G默认安装3组日志,每组50MB,平时25MB切换一下。可你看红色框的其他时间点切换也很频繁的。尤其是我在做索引重建压缩时产生110个归档日志,也没报警阻塞了交易业务。
从历史活跃会话还发现个一个全表查询产生了直接路径读取,读取前要把该表的内存块写回数据文件中。该表有520MB。
发现这个语句也比较经常运行!好像也不是主犯!
从故障AWR TOP SQL 中发现有个UPDATE语句每次执行了150秒
SQL ordered by Elapsed Time
| Elapsed Time (s) | Executions | Elapsed Time per Exec (s) | %Total | %CPU | %IO | SQL Text |
|---|---|---|---|---|---|---|
| 3,069.77 | 20 | 153.49 | 1.05 | 4.02 | 40.37 | UPDATE CCPS_TRADERECORD T SET ... |
每次CPU花费时间不多
SQL ordered by CPU Time
| CPU Time (s) | Executions | CPU per Exec (s) | %Total | Elapsed Time (s) | %CPU | %IO | SQL Text |
|---|---|---|---|---|---|---|---|
| 123.48 | 20 | 6.17 | 0.10 | 3,069.77 | 4.02 | 40.37 | UPDATE CCPS_TRADERECORD T SET ... |
| User I/O Time (s) | Executions | UIO per Exec (s) | %Total | Elapsed Time (s) | %CPU | %IO | SQL Text |
|---|---|---|---|---|---|---|---|
| 21,805.12 | 3,675 | 5.93 | 75.11 | 125,315.54 | 87.23 | 17.40 | select trd.tr_no, trd.tr_refer... |
| 1,239.21 | 20 | 61.96 | 4.27 | 3,069.77 | 4.02 | 40.37 | UPDATE CCPS_TRADERECORD T SET ... |
SQL ordered by Reads
| Physical Reads | Executions | Reads per Exec | %Total | Elapsed Time (s) | %CPU | %IO | SQL Id | SQL Text |
|---|---|---|---|---|---|---|---|---|
| 2,151,293 | 20 | 107,564.65 | 0.08 | 3,069.77 | 4.02 | 40.37 | 42xh9862twhsh | UPDATE CCPS_TRADERECORD T SET ... |
通过历史性关联查询出来这个语句实际的情况
select p.begin_interval_time, s.*,b.*
from dba_hist_sqlstat s
inner join dba_hist_snapshot p on p.snap_id = s.snap_id
left join dba_hist_sqlbind b on s.snap_id = b.snap_id and s.sql_id = b.sql_id
where s.sql_id = '42xh9862twhsh'
order by 1
当天执行了20次 累积影响了99万行,每次修改近5万行。自然造成大量的脏数据。从时间点上看很接近故障发生时间点。 而且查遍故障14日前后到19日所有的AWR报表中没有该语句再次执行。
经业务后台日志也发现其他同事手工修改执行该语句。因此该功能将被废弃。
CPU资源监控:
%steal:管理程序(hypervisor)为另一个虚拟进程提供服务而等待虚拟 CPU 的百分比
%nice:显示在用户级别,用于nice操作,所占用 CPU 总时间的百分比。
sar -u
Linux 2.6.32-504.23.4.el6.x86_64 (oraclemain) 04/14/17 _x86_64_ (8 CPU)
00:00:01 CPU %user %nice %system %iowait %steal %idle
10:00:01 all 17.76 0.00 3.15 9.47 0.00 69.62
10:10:01 all 18.76 0.00 3.43 27.63 0.00 50.18
10:20:01 all 18.48 0.00 3.37 23.54 0.00 54.62
10:30:01 all 16.83 0.00 2.69 3.61 0.00 76.87
10:40:01 all 17.59 0.00 2.61 1.80 0.00 78.00
10:50:01 all 20.04 0.00 3.04 1.73 0.00 75.19
11:00:01 all 18.18 0.00 2.78 1.90 0.00 77.14
11:10:01 all 16.20 0.00 2.92 2.03 0.00 78.85
11:20:01 all 15.84 0.00 2.91 1.59 0.00 79.65
11:30:01 all 15.54 0.00 2.94 9.14 0.00 72.38
11:40:01 all 15.35 0.00 3.39 19.69 0.00 61.57
11:50:01 all 8.52 0.00 2.18 12.12 0.00 77.17
12:00:01 all 22.85 0.00 3.39 10.39 0.00 63.37
12:10:01 all 20.00 0.00 3.57 13.66 0.00 62.77
12:20:01 all 16.21 0.00 3.12 3.28 0.00 77.39
12:30:01 all 14.58 0.00 2.59 1.55 0.00 81.28
19:10:01 all 17.89 0.00 4.50 18.83 0.00 58.78
那条UPDATE语句发生在10和11点两个时间段中,从上面可以看出IO等待比平时时段高,基本上是20%,在10点的达到27%,而故障点时候连续10-19之间。19点RMAN备份也是18.83%。可以说IO吞吐量还是可以的,难道会事IO并发能力不行?
系统交换活动信息监控
sar -W:
pswpin/s:每秒系统换入的交换页面(swap page)数量
pswpout/s:每秒系统换出的交换页面(swap page)数量
Linux 2.6.32-504.23.4.el6.x86_64 (oraclemain) 04/14/17 _x86_64_ (8 CPU)
00:00:01 pswpin/s pswpout/s
09:50:01 0.37 0.74
10:00:01 1.19 8.00
10:10:01 0.35 13.09
10:20:01 0.09 16.09
10:30:01 0.04 2.72
11:20:01 0.44 0.06
11:30:01 0.12 4.90
11:40:01 0.22 8.30
11:50:01 1.87 0.94
12:00:01 1.76 1.80
12:10:01 1.60 3.04
12:20:01 0.37 0.00
12:30:01 0.58 0.00
那个UPDATE对应两个时间点都发生了交换内存换到物理内存,自然增加了速度缓慢!
优化方案
1 增加序列CACHE值
2 增加两组日志
3 优化全表扫描增加索引
4 废弃该功能
AWR和ASH 报表使用方法
sqlplus / as sysdba
@?/rdbms/admin/awrrpt.sql;
@?/rdbms/admin/ashrpt.sql;
欢迎关注,欢迎留言,欢迎分享!




