一、现象描述
业务反馈,正常一条sql执行时间为7分钟左右,今天执行3小时仍然未执行完成。检查业务用户会话和sql,未发现存在性能问题。
二、数据库检查
1、数据库等待事件
I EVENT HCOUNT
-- ---------------------------------------- ----------
1 direct path read 28
log file sync 9
[CPU]:direct path read 4
SQL*Net message from client 2
PX Deq: Execute Reply 2
control file sequential read 2
DFS lock handle 2
SQL*Net message from dblink 2
[CPU]:PX Deq: Execution Msg 1
db file sequential read 1
OFS idle 1
[CPU]:SQL*Net message to client 1
2 DFS lock handle 2
SQL*Net message from dblink 2
[CPU]:ASM IO for non-blocking poll 1
[CPU]:PX Deq: Execution Msg 1
OFS idle 1
观察发现1时左右出现部分direct path read 和少量log file sync,看起来很正常的等待事件,但是注意到这两个等待事件均与io存在一定联系。决定导出awr报告整体检查一下。 业务人员的insert语句可能与log file sync有关。
2、DB TIME分析
Host Name | Platform | CPUs | Cores | Sockets | Memory (GB) |
---|---|---|---|---|---|
XXXXXX | Linux x86 64-bit | 136 | 80 | 4 | .51 |
Snap Id | Snap Time | Sessions | Cursors/Session | Instances | |
---|---|---|---|---|---|
Begin Snap: | 33733 | 01-Jun-23 00:00:51 | 4539 | 6.7 | 2 |
End Snap: | 33735 | 01-Jun-23 01:00:03 | 4513 | 6.8 | 2 |
Elapsed: | 59.19 (mins) | ||||
DB Time: | 5,808.30 (mins) |
计算得出AAS=5,808.30/59.19=98.44,得出DBtime约为98.44。
3、Load Profile分析
Load Profile
Per Second | Per Transaction | Per Exec | Per Call | |
---|---|---|---|---|
DB Time(s): | 98.1 | 0.1 | 0.00 | 0.00 |
DB CPU(s): | 11.6 | 0.0 | 0.00 | 0.00 |
Background CPU(s): | 0.9 | 0.0 | 0.00 | 0.00 |
Redo size (bytes): | 13,223,011.7 | 8,083.6 | ||
Logical read (blocks): | 1,216,670.5 | 743.8 | ||
Block changes: | 61,679.0 | 37.7 | ||
Physical read (blocks): | 248,251.2 | 151.8 | ||
Physical write (blocks): | 4,921.1 | 3.0 | ||
Read IO requests: | 4,221.4 | 2.6 | ||
Write IO requests: | 2,005.4 | 1.2 | ||
Read IO (MB): | 1,939.5 | 1.2 | ||
Write IO (MB): | 38.5 | 0.0 | ||
IM scan rows: | 0.0 | 0.0 | ||
Session Logical Read IM: | 0.0 | 0.0 | ||
Global Cache blocks received: | 63.5 | 0.0 | ||
Global Cache blocks served: | 248.5 | 0.2 | ||
User calls: | 39,401.9 | 24.1 | ||
Parses (SQL): | 1,875.6 | 1.2 | ||
Hard parses (SQL): | 6.3 | 0.0 | ||
SQL Work Area (MB): | 5.1 | 0.0 | ||
Logons: | 5.8 | 0.0 | ||
User logons: | 4.0 | 0.0 | ||
Executes (SQL): | 26,525.6 | 16.2 | ||
Rollbacks: | 292.5 | 0.2 | ||
Transactions: | 1,635.8 |
由Load Profile可以看到DB Time(s)=98.1,计算误差不大。已知DB TIME=DB CPU+前台等待等待消耗时间总和。
则DB CPU消耗比为11.6/98.1=11.82%,那么前台等待占用系统CPU为98.1-11.6=86.5。占比为88.18%。
看来DB TIME主要消耗在前台等待事件上。CPU并不是很繁忙,此时如果检查操作系统CPU负载应该是11.82左右。
由DB CPU/DB TIME=11.6/98.1=11.82%,说明数据库压力不在CPU上而在其他资源。又后文Top 10 Foreground Events by Total Wait Time也可以看出这台数据库不是CPU繁忙型数据库。 Per Transaction/s=0.1说明数据库事务量很小。看来只是由很少量的事务导致。
同时观察发现Logical read达到了1,216,670.5blocks=9.28G/s,Physical read为248,251blocks=1,939.46M/s=1.894G/s。IO很高。可以猜测数据库目前IO压力很大。
4、Wait Event分析
Top 10 Foreground Events by Total Wait Time
Event | Waits | Total Wait Time (sec) | Avg Wait | % DB time | Wait Class |
---|---|---|---|---|---|
direct path read | 6,833,084 | 76.1K | 11.14ms | 21.8 | User I/O |
log file sync | 4,606,859 | 73.4K | 15.94ms | 21.1 | Commit |
read by other session | 7,578,839 | 70.4K | 9.29ms | 20.2 | User I/O |
DB CPU | 41.3K | 11.8 | |||
latch: cache buffers chains | 3,671,533 | 24.4K | 6.65ms | 7.0 | Concurrency |
db file sequential read | 5,073,866 | 23.7K | 4.67ms | 6.8 | User I/O |
gc buffer busy acquire | 1,617,397 | 15.5K | 9.58ms | 4.4 | Cluster |
DFS lock handle | 134,504 | 6116.4 | 45.47ms | 1.8 | Other |
db file parallel read | 231,471 | 3728.7 | 16.11ms | 1.1 | User I/O |
log file switch completion | 4,238 | 3311.4 | 781.35ms | 1.0 | Configuration |
Wait Classes by Total Wait Time
Wait Class | Waits | Total Wait Time (sec) | Avg Wait Time | % DB time | Avg Active Sessions |
---|---|---|---|---|---|
User I/O | 37,273,106 | 174,732 | 4.69ms | 50.1 | 49.2 |
Commit | 4,606,863 | 73,423 | 15.94ms | 21.1 | 20.7 |
DB CPU | 41,267 | 11.8 | 11.6 | ||
Concurrency | 5,006,864 | 28,564 | 5.70ms | 8.2 | 8.0 |
Cluster | 2,474,600 | 15,886 | 6.42ms | 4.6 | 4.5 |
Other | 5,515,050 | 13,208 | 2.39ms | 3.8 | 3.7 |
System I/O | 7,352,088 | 9,984 | 1.36ms | 2.9 | 2.8 |
Configuration | 580,624 | 3,423 | 5.90ms | 1.0 | 1.0 |
Network | 133,976,692 | 1,815 | 13.55us | .5 | 0.5 |
Application | 787,056 | 92 | 117.25us | .0 | 0.0 |
Scheduler | 31 | 0 | 3.24ms | .0 | 0.0 |
**正常时间段AWR Event部分**
Top 10 Foreground Events by Total Wait Time
Event | Waits | Total Wait Time (sec) | Avg Wait | % DB time | Wait Class |
---|---|---|---|---|---|
log file sync | 4,367,701 | 21.1K | 4.84ms | 59.1 | Commit |
DB CPU | 12.2K | 34.0 | |||
DFS lock handle | 57,467 | 6152.1 | 107.05ms | 17.2 | Other |
db file sequential read | 2,015,152 | 1247.2 | 618.94us | 3.5 | User I/O |
inactive transaction branch | 84,879 | 1048.3 | 12.35ms | 2.9 | Other |
db file parallel read | 319,735 | 286.8 | 897.08us | .8 | User I/O |
SQL*Net break/reset to client | 937,966 | 94 | 100.18us | .3 | Application |
log file switch completion | 1,124 | 61.3 | 54.52ms | .2 | Configuration |
gc cr grant 2-way | 560,582 | 61.2 | 109.15us | .2 | Cluster |
control file sequential read | 132,926 | 59.7 | 449.48us | .2 | System I/O |
Wait Classes by Total Wait Time
Wait Class | Waits | Total Wait Time (sec) | Avg Wait Time | % DB time | Avg Active Sessions |
---|---|---|---|---|---|
Commit | 4,367,704 | 21,122 | 4.84ms | 59.1 | 5.9 |
DB CPU | 12,156 | 34.0 | 3.4 | ||
Other | 6,688,822 | 11,978 | 1.79ms | 33.5 | 3.3 |
System I/O | 4,611,304 | 1,700 | 368.74us | 4.8 | 0.5 |
User I/O | 2,709,399 | 1,600 | 590.50us | 4.5 | 0.4 |
Cluster | 1,266,518 | 267 | 210.72us | .7 | 0.1 |
Application | 940,842 | 96 | 102.13us | .3 | 0.0 |
Configuration | 179,494 | 68 | 381.21us | .2 | 0.0 |
Concurrency | 303,737 | 51 | 167.34us | .1 | 0.0 |
Network | 31,835,193 | 44 | 1.38us | .1 | 0.0 |
由Top 10 Foreground Events by Total Wait Time看到存在大量direct path read、log file sync、read by other session、latch: cache buffers chains、db file sequential read、gc buffer busy acquire等待,等待时间和次数均很高。且大部分为IO相关等待事件。
- direct path read:说明数据库由数据文件直接读取至pga,绕过buffer cache,会产生大量io。多数与全表扫描有关,可能产生的原因有并行查询、表过大等
- log file sync:redo向量由log buffer写入redo log,与io效率相关或者sql产生大量redo信息有关。
- read by other session:这个等待事件来自buffer busy waits分裂产生,11g之后版本存在read by other session和buffer busy waits。read by other session表示等待其他会话将该block读入buffer cache,也就是多个会话访问的数据块不在buffer cache中。
- buffer busy waits:表示这个数据块已经在buffer cache中,多个会话一起读取,但是这个数据块可能已经被pin住。
- latch: cache buffers chains:表示多个会话访问同一个cbc链表。有可能是同一个数据块,也可能是不同数据块但是被同一个cbc latch保护。
- db file sequential read:单块读
- gc buffer busy acquire:本节点会话访问远程节点数据块,但是该块正在被远程节点另外一个会话读取或者pin住
同时看到direct path read的平均等待时间达到11.14ms,log file sync平均等待时间15.94ms。同时对比正常时间段Commit类IO延迟约为4.84ms ,此时达到15.94ms。正常时间段USER IO类延迟约为590.50us,此时为4.69ms。说明数据库存在严重IO延迟。
Host CPU
CPUs | Cores | Sockets | Load Average Begin | Load Average End | %User | %System | %WIO | %Idle |
---|---|---|---|---|---|---|---|---|
136 | 80 | 4 | 11.98 | 19.29 | 8.9 | 2.7 | 5.7 | 88.2 |
Instance CPU
%Total CPU | %Busy CPU | %DB time waiting for CPU (Resource Manager) |
---|---|---|
9.4 | 79.3 | 0.0 |
IO Profile
Read+Write Per Second | Read per Second | Write Per Second | |
---|---|---|---|
Total Requests: | 7,701.7 | 4,272.4 | 3,429.3 |
Database Requests: | 6,226.8 | 4,221.4 | 2,005.4 |
Optimized Requests: | 0.0 | 0.0 | 0.0 |
Redo Requests: | 1,410.1 | 15.7 | 1,394.4 |
Total (MB): | 2,034.5 | 1,956.0 | 78.5 |
Database (MB): | 1,977.9 | 1,939.5 | 38.5 |
Optimized Total (MB): | 0.0 | 0.0 | 0.0 |
Redo (MB): | 42.0 | 15.4 | 26.6 |
Database (blocks): | 253,172.3 | 248,251.2 | 4,921.1 |
Via Buffer Cache (blocks): | 7,466.4 | 2,549.2 | 4,917.2 |
Direct (blocks): | 245,706.0 | 245,702.0 | 3.9 |
IO Profile
Read+Write Per Second | Read per Second | Write Per Second | |
---|---|---|---|
Total Requests: | 7,701.7 | 4,272.4 | 3,429.3 |
Database Requests: | 6,226.8 | 4,221.4 | 2,005.4 |
Optimized Requests: | 0.0 | 0.0 | 0.0 |
Redo Requests: | 1,410.1 | 15.7 | 1,394.4 |
Total (MB): | 2,034.5 | 1,956.0 | 78.5 |
Database (MB): | 1,977.9 | 1,939.5 | 38.5 |
Optimized Total (MB): | 0.0 | 0.0 | 0.0 |
Redo (MB): | 42.0 | 15.4 | 26.6 |
Database (blocks): | 253,172.3 | 248,251.2 | 4,921.1 |
Via Buffer Cache (blocks): | 7,466.4 | 2,549.2 | 4,917.2 |
Direct (blocks): | 245,706.0 | 245,702.0 | 3.9 |
由IO Profile部分也可以看到,IO压力主要来自Direct (blocks)部分,也就是这个前文提到的直接路径读。
5、Time Model
Statistic Name | Time (s) | % of DB Time | % of Total CPU Time |
---|---|---|---|
sql execute elapsed time | 266,976.48 | 76.61 | |
DB CPU | 41,267.22 | 11.84 | 93.01 |
parse time elapsed | 492.06 | 0.14 | |
connection management call elapsed time | 469.13 | 0.13 | |
hard parse elapsed time | 101.33 | 0.03 | |
sequence load elapsed time | 67.83 | 0.02 | |
PL/SQL execution elapsed time | 66.14 | 0.02 | |
inbound PL/SQL rpc elapsed time | 46.38 | 0.01 | |
RMAN cpu time (backup/restore) | 18.96 | 0.01 | 0.04 |
hard parse (sharing criteria) elapsed time | 9.38 | 0.00 | |
PL/SQL compilation elapsed time | 8.44 | 0.00 | |
hard parse (bind mismatch) elapsed time | 4.30 | 0.00 | |
failed parse elapsed time | 0.12 | 0.00 | |
repeated bind elapsed time | 0.10 | 0.00 | |
DB time | 348,497.74 | ||
background elapsed time | 15,250.23 | ||
background cpu time | 3,100.49 | 6.99 | |
total CPU time | 44,367.71 |
由时间模型部分可以看出,所有的等待均发生在sql执行上。
6、sql定位
Wait Class | Waits | %Time -outs | Total Wait Time (s) | Avg wait | %DB time |
---|---|---|---|---|---|
User I/O | 33,635,499 | 0 | 174,232 | 5.18ms | 50.00 |
Commit | 4,606,859 | 0 | 73,423 | 15.94ms | 21.07 |
DB CPU | 41,267 | 11.84 | |||
Concurrency | 4,757,628 | 0 | 28,497 | 5.99ms | 8.18 |
Cluster | 2,421,470 | 0 | 15,864 | 6.55ms | 4.55 |
Other | 2,523,748 | 2 | 10,088 | 4.00ms | 2.89 |
Configuration | 580,392 | 0 | 3,420 | 5.89ms | 0.98 |
Network | 133,311,696 | 0 | 1,811 | 13.58us | 0.52 |
System I/O | 87,968 | 0 | 299 | 3.40ms | 0.09 |
Application | 781,215 | 0 | 76 | 97.02us | 0.02 |
Scheduler | 31 | 0 | 0 | 3.24ms | 0.00 |
User I/O Time (s) | Executions | UIO per Exec (s) | %Total | Elapsed Time (s) | %CPU | %IO | SQL Id | SQL Module | SQL Text |
---|---|---|---|---|---|---|---|---|---|
74,083.15 | 37 | 2,002.25 | 42.40 | 79,983.03 | 7.22 | 92.62 | dxhqr0gqdymq6 | reprocessing_proc@t3-12 (TNS V1-V3) | t3. |
37,420.62 | 100 | 374.21 | 21.42 | 65,040.52 | 12.08 | 57.53 | 3tdshgkcnsrkp | t1-2 (TNS V1-V3) | SELECTt3 |
35,292.96 | 100 | 352.93 | 20.20 | 62,183.62 | 12.00 | 56.76 | 3naxcf9vfw6pb | t1-2 (TNS V1-V3) | SELECT ....... |
6,049.21 | 18,238 | 0.33 | 3.46 | 6,118.74 | 1.37 | 98.86 | 5nrfydnna14c9 | t1-2 (TNS V1-V3) | SELECT orcl.......(:... |
6,046.19 | 18,227 | 0.33 | 3.46 | 6,094.33 | 1.30 | 99.21 | 62aucwpgnjk9m | t1-2 (TNS V1-V3) | SELECT * FROM ...... |
5,934.97 | 12,533,839 | 0.00 | 3.40 | 6,292.34 | 3.91 | 94.32 | 2jgqnryypnru9 | score_turn@ljfmgnx-1 (TNS V1-V3) | select ...... |
3,013.67 | 1,048 | 2.88 | 1.72 | 3,266.44 | 7.47 | 92.26 | gnh9xtq72j73c | event_mv@t3-12 (TNS V1-V3) | MERGE INTO ...... |
2,656.72 | 1 | 2,656.72 | 1.52 | 3,057.36 | 13.16 | 86.90 | 8dz4mja4jf547 | oracle@t1 (TNS V1-V3) | SELECT /*+ OPAQUE_TRANSFORM PA... |
2,466.72 | 579,812 | 0.00 | 1.41 | 2,730.23 | 5.34 | 90.35 | cdzd9cwna6s5n | t3c@t3-11 (TNS V1-V3) | insert into ...... |
2,240.69 | 1,113,233 | 0.00 | 1.28 | 2,806.99 | 6.70 | 79.83 | 1ftpd1ckx73f0 | t3c@t3-6 (TNS V1-V3) | insert into ...... |
可以看到dxhqr0gqdymq6这条sql占整体io等待42.40%。检查这条sql看一下
log file sync 9
direct path read dxhqr0gqdymq6 30
0 2 SQL*Net message from dblink 38acyh9ad8asp 1
SQL*Net message from dblink f9f43ahgkf079 1
OFS idle 1
DFS lock handle 03555fs1w4dxz 1
[CPU]:PX Deq: Execution Msg d6cmxrws2zdth 1
DFS lock handle gsa7fruxqfcd3 1
[CPU]:PGA memory operation 9zg9qd9bm4spu 1
[CPU]:ASM IO for non-blocking poll
等待事件direct path read 对应的sql的确是这条sql。而且一般在oltp系统中,存在直接路径读不会是好现象。怀疑执行计划出现问题
三、sql分析
1、执行计划
SQL> select * from table(dbms_xplan.display_cursor('&sql_id',null,'advanced'));
Enter value for sql_id: dxhqr0gqdymq6
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID dxhqr0gqdymq6, child number 0
-------------------------------------
Plan hash value: 2670919594
---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 72 (100)| | | |
| 1 | PARTITION RANGE ALL | | 1 | 569 | 72 (0)| 00:00:01 | 1 | 35 |
|* 2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| yyyyyyyyyyyyyy | 1 | 569 | 72 (0)| 00:00:01 | 1 | 35 |
|* 3 | INDEX RANGE SCAN | xxxxxxxxxx | 1 | | 71 (0)| 00:00:01 | 1 | 35 |
---------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 179K(100)| | | | | | |
| 1 | PX COORDINATOR | | | | | | | | | | |
| 2 | PX SEND QC (RANDOM)| :TQ10000 | 4 | 2276 | 179K (1)| 00:00:08 | | | Q1,00 | P->S | QC (RAND) |
| 3 | PX BLOCK ITERATOR | | 4 | 2276 | 179K (1)| 00:00:08 | 1 | 35 | Q1,00 | PCWC | |
|* 4 | TABLE ACCESS FULL| xxxxxx | 4 | 2276 | 179K (1)| 00:00:08 | 1 | 35 | Q1,00 | PCWP | |
------------------------------------------------------------------------------------------------------------------------------------
查看执行计划果然存在两个,第二个执行计划出现了全表扫描。
查看一下执行计划产生时间
SQL> SELECT distinct sql_id,plan_hash_value,to_char(timestamp, 'yyyy-mm-dd hh24:mi:ss') timestamp from dba_hist_sql_plan where sql_id='dxhqr0gqdymq6' order by timestamp;
SQL_ID PLAN_HASH_VALUE TIMESTAMP
------------------ --------------- --------------------
dxhqr0gqdymq6 2670919594 2022-11-20 00:50:02
dxhqr0gqdymq6 1342217163 2023-06-01 00:10:08
2023-06-01 00:10:08发生了执行计划变化,生成新的执行计划
2、历史执行情况
PLAN GET DISK WRITE ROWS ROWS USER_IO(MS) AVG(MS) ELA(MS) CPU(MS) CLUSTER(MS) PLSQL
END_TI I NAME HASH VALUE EXEC PRE EXEC PRE EXEC PER EXEC ROW_P PRE EXEC PRE FETCH PER EXEC IO_SPEED PRE EXEC PRE EXEC PER EXEC PER EXEC
------ - --------------- ------------- ---------- ------------ -------- -------- ----- ----------- --------- ----------- -------- -------- -------- ----------- --------
01 00 1 orcl 1342217163 12 24,182,514 ######## 0 28 2 1 ######## 10 ######## 152,397 0 0
01 00 1 orcl 2670919594 2 356,426 240 0 8 4 2 138 0 2,441 2,292 13 0
01 01 1 orcl 1342217163 23 26,766,445 ######## 0 2.4K 1,06 2 ######## 11 ######## 171,443 0 0
01 01 1 orcl 2670919594 0 0 0 0 0 0 0 0 0 0 0 0 0
01 01 1 orcl 1342217163 29 25,478,411 ######## 0 3.3K 1,16 2 ######## 9 ######## 158,599 0 0
01 01 1 orcl 2670919594 0 0 0 0 0 0 0 0 0 0 0 0 0
01 02 1 orcl 2670919594 0 0 0 0 0 0 0 0 0 0 0 0 0
01 02 1 orcl 1342217163 28 26,608,808 ######## 0 1.9K 70 2 ######## 9 ######## 165,539 0 0
01 02 1 orcl 1342217163 27 26,512,324 ######## 0 876 32 2 ######## 9 ######## 164,108 0 0
01 02 1 orcl 2670919594 0 0 0 0 0 0 0 0 0 0 0 0 0
01 03 1 orcl 2670919594 0 0 0 0 0 0 0 0 0 0 0 0 0
01 03 1 orcl 2670919594 0 0 0 0 0 0 0 0 0 0 0 0 0
01 03 1 orcl 1342217163 29 25,360,225 ######## 0 971 33 1 ######## 9 ######## 156,660 0 0
01 03 1 orcl 1342217163 28 0 0 0 1.0K 36 2 ######## 9 ######## 165,974 0 0
01 04 1 orcl 2670919594 0 0 0 0 0 0 0 0 0 0 0 0 0
01 04 1 orcl 1342217163 28 26,130,755 ######## 0 1.4K 50 2 ######## 9 ######## 161,758 0 0
01 04 1 orcl 1342217163 29 25,777,698 ######## 0 851 29 2 ######## 9 ######## 159,442 0 0
01 04 1 orcl 2670919594 0 0 0 0 0 0 0 0 0 0 0 0 0
29 01 1 orcl 2670919594 12 32,044 0 0 1.7K 1,48 1 0 0 111 110 0 0
29 01 1 orcl 2670919594 12 30,452 0 0 1.4K 1,21 1 0 0 104 103 0 0
29 02 1 orcl 2670919594 12 26,985 0 0 1.1K 92 1 0 0 87 86 0 0
29 08 1 orcl 2670919594 12 45,213 0 0 4.1K 3,47 1 0 0 146 145 0 0
29 08 1 orcl 2670919594 12 37,641 3 0 3K 2,50 1 1 23 125 123 0 0
29 09 1 orcl 2670919594 14 50,952 0 0 5.6K 4,05 1 0 0 169 168 0 0
既然是执行计划发生变化,旧的执行计划效率比新的好很多,那么修改为原先的执行计划就可以了
3、调整执行计划
SQL> @coe_xfr_sql_profile.sql
Parameter 1:
SQL_ID (required)
Enter value for 1: dxhqr0gqdymq6
PLAN_HASH_VALUE AVG_ET_SECS
--------------- -----------
2670919594 2.441
1342217163 2034.511
coe_xfr_sql_profile.sql计算出来的平均执行时间来看,新的执行计划的确比原始的执行计划慢很多,平均执行时间达到2034秒。
Parameter 2:
PLAN_HASH_VALUE (required)
Enter value for 2: 2670919594
Values passed to coe_xfr_sql_profile:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SQL_ID : "dxhqr0gqdymq6"
PLAN_HASH_VALUE: "2670919594"
SQL>BEGIN
2 IF :sql_text IS NULL THEN
3 RAISE_APPLICATION_ERROR(-20100, 'SQL_TEXT for SQL_ID &&sql_id. was not found in memory (gv$sqltext_with_newlines) or AWR (dba_hist_sqltext).');
4 END IF;
5 END;
6 /
SQL>SET TERM OFF;
SQL>BEGIN
2 IF :other_xml IS NULL THEN
3 RAISE_APPLICATION_ERROR(-20101, 'PLAN for SQL_ID &&sql_id. and PHV &&plan_hash_value. was not found in memory (gv$sql_plan) or AWR (dba_hist_sql_plan).');
4 END IF;
5 END;
6 /
SQL>SET TERM OFF;
Execute coe_xfr_sql_profile_dxhqr0gqdymq6_2670919594.sql
on TARGET system in order to create a custom SQL Profile
with plan 2670919594 linked to adjusted sql_text.
COE_XFR_SQL_PROFILE completed.
固定执行计划
SQL>@coe_xfr_sql_profile_dxhqr0gqdymq6_2670919594.sql
SQL>REM
SQL>REM $Header: 215187.1 coe_xfr_sql_profile_dxhqr0gqdymq6_2670919594.sql 11.4.4.4 2023/06/01 carlos.sierra $
SQL>REM
SQL>REM Copyright (c) 2000-2012, Oracle Corporation. All rights reserved.
SQL>REM
SQL>REM AUTHOR
SQL>REM carlos.sierra@oracle.com
SQL>REM
SQL>REM SCRIPT
SQL>REM coe_xfr_sql_profile_dxhqr0gqdymq6_2670919594.sql
SQL>REM
SQL>REM DESCRIPTION
SQL>REM This script is generated by coe_xfr_sql_profile.sql
SQL>REM It contains the SQL*Plus commands to create a custom
SQL>REM SQL Profile for SQL_ID dxhqr0gqdymq6 based on plan hash
SQL>REM value 2670919594.
SQL>REM The custom SQL Profile to be created by this script
SQL>REM will affect plans for SQL commands with signature
SQL>REM matching the one for SQL Text below.
SQL>REM Review SQL Text and adjust accordingly.
SQL>REM
SQL>REM PARAMETERS
SQL>REM None.
SQL>REM
SQL>REM EXAMPLE
SQL>REM SQL> START coe_xfr_sql_profile_dxhqr0gqdymq6_2670919594.sql;
SQL>REM
SQL>REM NOTES
SQL>REM 1. Should be run as SYSTEM or SYSDBA.
SQL>REM 2. User must have CREATE ANY SQL PROFILE privilege.
SQL>REM 3. SOURCE and TARGET systems can be the same or similar.
SQL>REM 4. To drop this custom SQL Profile after it has been created:
SQL>REM EXEC DBMS_SQLTUNE.DROP_SQL_PROFILE('coe_dxhqr0gqdymq6_2670919594');
SQL>REM 5. Be aware that using DBMS_SQLTUNE requires a license
SQL>REM for the Oracle Tuning Pack.
SQL>REM 6. If you modified a SQL putting Hints in order to produce a desired
SQL>REM Plan, you can remove the artifical Hints from SQL Text pieces below.
SQL>REM By doing so you can create a custom SQL Profile for the original
SQL>REM SQL but with the Plan captured from the modified SQL (with Hints).
SQL>REM
SQL>WHENEVER SQLERROR EXIT SQL.SQLCODE;
SQL>REM
SQL>VAR signature NUMBER;
SQL>VAR signaturef NUMBER;
SQL>REM
SQL>DECLARE
2 sql_txt CLOB;
3 h SYS.SQLPROF_ATTR;
4 PROCEDURE wa (p_line IN VARCHAR2) IS
5 BEGIN
6 DBMS_LOB.WRITEAPPEND(sql_txt, LENGTH(p_line), p_line);
7 END wa;
8 BEGIN
9 DBMS_LOB.CREATETEMPORARY(sql_txt, TRUE);
10 DBMS_LOB.OPEN(sql_txt, DBMS_LOB.LOB_READWRITE);
11 -- SQL Text pieces below do not have to be of same length.
12 -- So if you edit SQL Text (i.e. removing temporary Hints),
13 -- there is no need to edit or re-align unmodified pieces.
...
43 DBMS_LOB.CLOSE(sql_txt);
44 h := SYS.SQLPROF_ATTR(
45 q'[BEGIN_OUTLINE_DATA]',
46 q'[IGNORE_OPTIM_EMBEDDED_HINTS]',
47 q'[OPTIMIZER_FEATURES_ENABLE('19.1.0')]',
48 q'[DB_VERSION('19.1.0')]',
49 q'[OPT_PARAM('_px_adaptive_dist_method' 'off')]',
50 q'[OPT_PARAM('_optimizer_strans_adaptive_pruning' 'false')]',
51 q'[OPT_PARAM('_optimizer_nlj_hj_adaptive_join' 'false')]',
52 q'[OPT_PARAM('optimizer_dynamic_sampling' 7)]',
53 q'[ALL_ROWS]',
...
58 :signature := DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE(sql_txt);
59 :signaturef := DBMS_SQLTUNE.SQLTEXT_TO_SIGNATURE(sql_txt, TRUE);
60 DBMS_SQLTUNE.IMPORT_SQL_PROFILE (
61 sql_text => sql_txt,
62 profile => h,
63 name => 'coe_dxhqr0gqdymq6_2670919594',
64 description => 'coe dxhqr0gqdymq6 2670919594 '||:signature||' '||:signaturef||'',
65 category => 'DEFAULT',
66 validate => TRUE,
67 replace => TRUE,
68 force_match => TRUE /* TRUE:FORCE (match even when different literals in SQL). FALSE:EXACT (similar to CURSOR_SHARING) */ );
69 DBMS_LOB.FREETEMPORARY(sql_txt);
70 END;
71 /
PL/SQL procedure successfully completed.
SQL>WHENEVER SQLERROR CONTINUE
SQL>SET ECHO OFF;
SIGNATURE
---------------------
216342918363871086
SIGNATUREF
---------------------
11874508927927845661
... manual custom SQL Profile has been created
COE_XFR_SQL_PROFILE_dxhqr0gqdymq6_2670919594 completed
SQL>SQL>
SQL>
SQL>
4、检查固定后的执行计划
SQL>select * from table(dbms_xplan.display_cursor('&sql_id',null,'advanced'));
Enter value for sql_id: dxhqr0gqdymq6
old 1: select * from table(dbms_xplan.display_cursor('&sql_id',null,'advanced'))
new 1: select * from table(dbms_xplan.display_cursor('dxhqr0gqdymq6',null,'advanced'))
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------------
SQL_ID dxhqr0gqdymq6, child number 0
-------------------------------------
Plan hash value: 2670919594
---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 72 (100)| | | |
| 1 | PARTITION RANGE ALL | | 1 | 569 | 72 (0)| 00:00:01 | 1 | 35 |
|* 2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| XXXXXXXXXXXXXX | 1 | 569 | 72 (0)| 00:00:01 | 1 | 35 |
|* 3 | INDEX RANGE SCAN | IDX_YYYYYY | 1 | | 71 (0)| 00:00:01 | 1 | 35 |
---------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('19.1.0')
DB_VERSION('19.1.0')
OPT_PARAM('_px_adaptive_dist_method' 'off')
OPT_PARAM('_optimizer_strans_adaptive_pruning' 'false')
OPT_PARAM('_optimizer_nlj_hj_adaptive_join' 'false')
OPT_PARAM('optimizer_dynamic_sampling' 7)
ALL_ROWS
END_OUTLINE_DATA
*/
Predicate Information (identified by operation id):
---------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
Query Block Registry:
---------------------
<q o="2" f="y"><n><![CDATA[SEL$1]]></n><f><h><t><![CDATA[YYYYYYYYYYYYY]]></t><s><![CDATA[SEL$1]]></s></h></f></q>
SQL_ID dxhqr0gqdymq6, child number 1
-------------------------------------
Plan hash value: 2670919594
---------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 72 (100)| | | |
| 1 | PARTITION RANGE ALL | | 1 | 569 | 72 (0)| 00:00:01 | 1 | 35 |
|* 2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| YYYYYYYYYYYYY | 1 | 569 | 72 (0)| 00:00:01 | 1 | 35 |
|* 3 | INDEX RANGE SCAN | IDX_YYYYYYYYYYYYY | 1 | | 71 (0)| 00:00:01 | 1 | 35 |
---------------------------------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('19.1.0')
DB_VERSION('19.1.0')
OPT_PARAM('optimizer_dynamic_sampling' 7)
OPT_PARAM('_px_adaptive_dist_method' 'off')
OPT_PARAM('_optimizer_strans_adaptive_pruning' 'false')
OPT_PARAM('_optimizer_nlj_hj_adaptive_join' 'false')
ALL_ROWS
END_OUTLINE_DATA
*/
Predicate Information (identified by operation id):
---------------------------------------------------
Column Projection Information (identified by operation id):
-----------------------------------------------------------
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 10
---------------------------------------------------------------------------
0 - STATEMENT
- ALL_ROWS
- DB_VERSION('19.1.0')
- IGNORE_OPTIM_EMBEDDED_HINTS
- OPTIMIZER_FEATURES_ENABLE('19.1.0')
- OPT_PARAM('_optimizer_nlj_hj_adaptive_join' 'false')
- OPT_PARAM('_optimizer_strans_adaptive_pruning' 'false')
- OPT_PARAM('_px_adaptive_dist_method' 'off')
- OPT_PARAM('optimizer_dynamic_sampling' 7)
Note
-----
- SQL profile coe_dxhqr0gqdymq6_2670919594 used for this statement
Query Block Registry:
---------------------
<q o="2" f="y"><n><![CDATA[SEL$1]]></n><f><h><t><![CDATA[YYYYYYYYYYYYY]]></t><s><![CDATA[SEL$1]]></s></h></f></q>
288 rows selected.
已经使用了profile coe_dxhqr0gqdymq6_2670919594,让子弹飞一会,检查数据库状态
C I EVENT HCOUNT
- -- ---------------------------------------- ----------
0 1 log file sync 3
SQL*Net message from client 2
SQL*Net message from dblink 2
DFS lock handle 2
OFS idle 1
[CPU]:PX Deq: Execution Msg 1
PX Deq: Execute Reply 1
[CPU]:db file sequential read 1
[CPU]:SQL*Net message from client 1
0 2 DFS lock handle 2
SQL*Net message from dblink 2
[CPU]:PX Deq: Execution Msg 1
OFS idle 1
等待事件恢复正常。几分钟后业务回复重新执行脚本7分钟内完成。那么问题可以确定是那条select查询语句执行计划改变导致大量物理读,从而影响到数据库整体的IO性能
四、原因分析
1、检查统计信息
SQL>col SUBPARTITION_NAME for a50
SQL>SELECT SUBPARTITION_NAME,OWNER,TABLE_NAME,LAST_ANALYZED,STALE_STATS,OBJECT_TYPE,PARTITION_NAME from dba_tab_stATistics where TABLE_NAME='YYYYYYYYYYYYY' ;
TABLE TABLE LAST OLD PARTITION
SUBPARTITION_NAME OWNER NAME ANALYZED STATS OBJECT_TYPE NAME
-------------------------------------------------- --------------- ----------------------------------- -------- ----- ------------ --------------------
orcl YYYYYYYYYYYYY 20230421 YES TABLE
orcl YYYYYYYYYYYYY 20221231 NO PARTITION PART_202212
orcl YYYYYYYYYYYYY 20230301 NO PARTITION PART_202302
orcl YYYYYYYYYYYYY 20230331 NO PARTITION PART_202303
orcl YYYYYYYYYYYYY 20221201 NO PARTITION PART_202211
orcl YYYYYYYYYYYYY 20221030 NO PARTITION PART_202210
orcl YYYYYYYYYYYYY 20230131 NO PARTITION PART_202301
orcl YYYYYYYYYYYYY 20230531 NO PARTITION PART_202305
orcl YYYYYYYYYYYYY 20230525 YES PARTITION PART_202306
orcl YYYYYYYYYYYYY 20230430 NO PARTITION PART_202304
orcl YYYYYYYYYYYYY 20230525 NO PARTITION PART_MAX
orcl YYYYYYYYYYYYY 20221101 NO PARTITION PART_202203
orcl YYYYYYYYYYYYY 20221027 NO PARTITION PART_202011
orcl YYYYYYYYYYYYY 20221101 NO PARTITION PART_202201
orcl YYYYYYYYYYYYY 20230125 NO PARTITION PART_202207
orcl YYYYYYYYYYYYY 20221107 NO PARTITION PART_202204
orcl YYYYYYYYYYYYY 20221107 NO PARTITION PART_202205
orcl YYYYYYYYYYYYY 20221027 NO PARTITION PART_202010
orcl YYYYYYYYYYYYY 20230125 NO PARTITION PART_202206
orcl YYYYYYYYYYYYY 20221027 NO PARTITION PART_202009
orcl YYYYYYYYYYYYY 20230125 NO PARTITION PART_202209
orcl YYYYYYYYYYYYY 20221101 NO PARTITION PART_202202
orcl YYYYYYYYYYYYY 20230125 NO PARTITION PART_202208
orcl YYYYYYYYYYYYY 20221027 NO PARTITION PART_202012
orcl YYYYYYYYYYYYY 20220701 NO PARTITION PART_202108
orcl YYYYYYYYYYYYY 20220701 NO PARTITION PART_202111
orcl YYYYYYYYYYYYY 20220701 NO PARTITION PART_202107
orcl YYYYYYYYYYYYY 20220701 NO PARTITION PART_202106
orcl YYYYYYYYYYYYY 20220701 NO PARTITION PART_202103
orcl YYYYYYYYYYYYY 20220701 NO PARTITION PART_202104
orcl YYYYYYYYYYYYY 20220701 NO PARTITION PART_202105
orcl YYYYYYYYYYYYY 20220701 NO PARTITION PART_202112
orcl YYYYYYYYYYYYY 20220701 NO PARTITION PART_202110
orcl YYYYYYYYYYYYY 20220701 NO PARTITION PART_202101
orcl YYYYYYYYYYYYY 20220701 NO PARTITION PART_202102
orcl YYYYYYYYYYYYY 20220701 NO PARTITION PART_202109
36 rows selected.
分区PART_202306统计信息失效。
2、检查统计信息失效原因
SQL> select * from mon_mods_all$ where OBJ#='3193250';
OBJ# INSERTS UPDATES DELETES TIMESTAMP FLAGS DROP_SEGMENTS
---------- ---------- ---------- ---------- --------- ---------- -------------
3193250 2374840 0 25164 01-JUN-23 0 0
查询该分表dml记录发现该分区6月1日0点过后存在大量dml操作,导致分区统计信息失效,从而导致执行计划由索引范围扫描该表为全表扫描。由于该表很大,数据库为避免全表扫描涌入buffer cache导致内存紧张及shared pool相关latch争用,选择了直接路径读,绕过buffer cache直接进入pga。大量的直接路径读导致了主机io压力较大。
io压力过大导致出现io延迟,从而影响数据库整体性能。导致业务sql执行效率下降。将执行计划修改为索引范围扫描后io减少,数据库整理性能马上恢复。