**版本架构:**Oracle 19c 双节点RAC+双节点DG
**摘要:**记录一次TOP SQL由于统计信息缺失引发的AAS和CPU突刺,优化器只能基于统计信息来判断最优执行路径。如果统计信息缺失或过期,系统随时可能出现严重性能问题。
一、问题背景
2024年11月29号早上,生产营业系统突然收到CPU占用率超过阈值的告警。通过监控发现,数据库侧的 AAS(Active Session Average)飙升到95,有一条新上线的SQL消耗异常,导致CPU使用率直线拉升。

二、问题定位
1. Top SQL定位
从监控(上面图片)看,左面有一条TOP SQL占比31%,大概率是元凶,继续分析分线,该SQL第一次出现时间在 07:15:29,说明是新上线的SQL,从监控中快速定位:
-- 默认显示最近15分钟的TOP SQL
SQL_ID 4cwfmj3fkzrh8
PLAN_HASH_VALUE 2962550836
FIRST_LOAD_TIME 2024-11-29/07:15:29
LAST_LOAD_TIME 2024-11-29 10:35:25
-- sql信息
SELECT SUBSID, CITY, POORDERNUMBER, BIZTYPE, GOODSID, ORDERID,
CUSTOMERPHONE, SERVICECODE, SERVICESTATUS, RECEIVETIME, EFFTIME,
EXPTIME, PROVINCERELATIONID, SERVICECODETYPE, OPPOORDERSOURCE,
INFORMSOURCE, INTIME, DATASOURCE, ORDERNUMBER, APICODE, USERTYPE,
SELLERMEMO FROM OJ_RES_GOOD_SYN
WHERE SUBSID = :1
AND EXPTIME IS NOT NULL
AND EXPTIME >= SYSDATE
AND EXPTIME <= :2
AND CITY = 1117
–查看sql的第一次载入时间
select FIRST_LOAD_TIME from v$sql where sql_id=&1;
2.分析原因
在监控(上面图片)右侧显示的实时的等待事件,此时占比最高的是 ON CPU / runqueue,这个等待事件不是select distinct WAIT_CLASS from v$event_name;13个类别里面的,一般可以通过V$session status='ACTIVE' and state!='WAITING'、或者V$ASH中event列为NULL判断,也是说如果没有等待事件此时SQL正在使用或等待CPU。到这里可以断定问题大概率发生在sql执行流程中的执行阶段,可能占用CPU较高的是sql的排序操作、复杂算子的计算、函数调用等
接着分析SQL的执行计划
SQL_ID 4cwfmj3fkzrh8, child number 0
-------------------------------------
SELECT SUBSID, CITY, POORDERNUMBER, BIZTYPE, GOODSID, ORDERID,
CUSTOMERPHONE, SERVICECODE, SERVICESTATUS, RECEIVETIME, EFFTIME,
EXPTIME, PROVINCERELATIONID, SERVICECODETYPE, OPPOORDERSOURCE,
INFORMSOURCE, INTIME, DATASOURCE, ORDERNUMBER, APICODE, USERTYPE,
SELLERMEMO FROM OJ_RES_GOOD_SYN WHERE SUBSID = :1 AND
EXPTIME IS NOT NULL AND EXPTIME >= SYSDATE AND EXPTIME <= :2 AND
CITY = 1117
Plan hash value: 881293612
------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1470 (100)| | | |
|* 1 | FILTER | | | | | | | |
|* 2 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| OJ_RES_GOOD_SYN | 15 | 13680 | 1470 (0)| 00:00:01 | ROWID | ROWID |
|* 3 | INDEX RANGE SCAN | IDX_RIGHTGOOD_SERVICEGOOD | 239K| | 28 (0)| 00:00:01 | | |
------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(:2>=SYSDATE@!)
2 - filter(("EXPTIME"<=:2 AND "SUBSID"=TO_NUMBER(:1) AND "EXPTIME">=SYSDATE@!))
3 - access("CITY"=1117)
索引信息
SYS@PANDAe1>@ind %.OJ_RES_GOOD_SYN
Display indexes where table or index name matches %%.OJ_RES_GOOD_SYN%...
TABLE_OWNER TABLE_NAME INDEX_NAME POS# COLUMN_NAME DSC
-------------------- ------------------------------ ------------------------------ ---- ------------------------------ ----
PANDA OJ_RES_GOOD_SYN IDX_CS_REC_SUBS_RIGHT_SUBSID 1 SUBSID <<<< 正确的
IDX_RIGHTGOOD_CUSTOMERPHONE 1 CUSTOMERPHONE
IDX_RIGHTGOOD_SERVICEGOOD 1 CITY <<<< 错误的
2 GOODSID
3 SERVICECODE
INDEX_OWNER TABLE_NAME INDEX_NAME IDXTYPE UNIQ STATUS PART TEMP H LFBLKS NDK NUM_ROWS CLUF LAST_ANALYZED DEGREE VISIBILIT
-------------------- ------------------------------ ------------------------------ ---------- ---- -------- ---- ---- -- ---------- ------------- ---------- ---------- ------------------- ------ ---------
PANDA OJ_RES_GOOD_SYN IDX_CS_REC_SUBS_RIGHT_SUBSID NORMAL NO N/A YES N 3 129071 7690240 34905836 34226961 2024-11-29 10:35:23 1 VISIBLE <<<<<<<<<刚才这个索引是没有统计信息的
OJ_RES_GOOD_SYN IDX_RIGHTGOOD_CUSTOMERPHONE NORMAL NO N/A YES N 3 148744 6300672 34905829 34425431 2024-11-29 10:35:01 1 VISIBLE
OJ_RES_GOOD_SYN IDX_RIGHTGOOD_SERVICEGOOD NORMAL NO VALID NO N 4 379096 13397 34905828 9643184 2024-11-29 10:34:44 1 VISIBLE
执行计划使用了索引 IDX_RIGHTGOOD_SERVICEGOOD走了了INDEX RANGE SCAN并且谓词中只通过CITY字段进行了过滤,这个字段是一个分区键,选择性很差。在where条件后面有一个SUBSID选择新很好并且存在索引IDX_CS_REC_SUBS_RIGHT_SUBSID
select * from (select CITY,count(*) from PANDA.OJ_RES_GOOD_SYN group by CITY order by count(*) desc)where rownum<=10;
CITY COUNT(*)
---------- ----------
1116 24489838
1117 11353009
1118 7236
3 rows selected.
select * from (select SUBSID,count(*) from PANDA.OJ_RES_GOOD_SYN group by SUBSID order by count(*) desc)where rownum<=10;
SUBSID COUNT(*)
---------- ----------
3.1639E+12 2944
3.1641E+12 2855
3.1639E+12 2476
3.1639E+12 2204
3.1421E+12 2194
3.1640E+12 2143
3.1621E+12 2099
3.1641E+12 2015
3.1640E+12 1939
3.1640E+12 1909
10 rows selected.
3. 检查统计信息
执行计划异常,有好的索引不走,检查涉及表的统计信息,发现统计信息过期 (STALE_STATS=YES/NULL忘记了**),这意味着优化器可能生成了不合理的执行计划。**
SELECT owner, table_name, partition_name, stale_stats, last_analyzed
FROM dba_tab_statistics
WHERE table_name = UPPER('OJ_RES_GOOD_SYN')
AND owner = 'PANDA';
OWNER TABLE_NAME PARTITION_NAME STALE LAST_ANALYZED
---------- -------------------- -------------------- ----- ----------------
OWNER TABLE_NAME PARTITION_NAME 29-AUG-20
三、处理过程
**之前 ****OJ_RES_GOOD_SYN**没有统计信息,优化器无法正确评估成本。收集表统计信息
exec DBMS_STATS.GATHER_TABLE_STATS ( ownname => 'PANDA', tabname => 'OJ_RES_GOOD_SYN', cascade => true, estimate_percent => dbms_stats.auto_sample_size, method_opt => 'FOR TABLE FOR ALL COLUMNS SIZE REPEAT', degree => 8, no_invalidate => false );
收集后,执行计划重新生成,正确走上IDX_CS_REC_SUBS_RIGHT_SUBSID索引,也走了CITY的分区裁剪
SQL_ID 4cwfmj3fkzrh8, child number 3
-------------------------------------
SELECT SUBSID, CITY, POORDERNUMBER, BIZTYPE, GOODSID, ORDERID,
CUSTOMERPHONE, SERVICECODE, SERVICESTATUS, RECEIVETIME, EFFTIME,
EXPTIME, PROVINCERELATIONID, SERVICECODETYPE, OPPOORDERSOURCE,
INFORMSOURCE, INTIME, DATASOURCE, ORDERNUMBER, APICODE, USERTYPE,
SELLERMEMO FROM OJ_RES_GOOD_SYN WHERE SUBSID = :1 AND
EXPTIME IS NOT NULL AND EXPTIME >= SYSDATE AND EXPTIME <= :2 AND
CITY = 1117
Plan hash value: 2962550836
--------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 134 (100)| | | |
|* 1 | FILTER | | | | | | | |
| 2 | PARTITION RANGE ITERATOR | | 1 | 912 | 134 (0)| 00:00:01 | 131 | 196 |
|* 3 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| OJ_RES_GOOD_SYN | 1 | 912 | 134 (0)| 00:00:01 | 131 | 196 |
|* 4 | INDEX RANGE SCAN | IDX_CS_REC_SUBS_RIGHT_SUBSID | 32 | | 133 (0)| 00:00:01 | 131 | 196 |
--------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(:2>=SYSDATE@!)
3 - filter(("CITY"=1117 AND "EXPTIME"<=:2 AND "EXPTIME">=SYSDATE@!))
4 - access("SUBSID"=TO_NUMBER(:1))
四、效果验证
- 统计信息收集后,执行计划恢复正确,CPU瞬间回落,系统恢复正常。
- SQL性能指标对比:
| 计划 | LIO/Exec(块次) | CPU/Exec(secs) | Elapsed/Exec(secs) |
|---|---|---|---|
| 坏计划 881293612 | 2,112,709.98 | 38.84 | 43.73 |
| 好计划 2962550836 | 173.73 | 0.00 | 0.01 |
Summary Execution Statistics Over Time of SQL_ID:4cwfmj3fkzrh8
Avg Avg
Snapshot Avg LIO Avg PIO CPU (secs) Elapsed (secs)
Time INSTANCE_NUMBER Execs Per Exec Per Exec Per Exec Per Exec
------------ --------------- -------- ------------------- ------------------- ------------------- -------------------
29-NOV 07:30 1 6 2,465,321.50 108,749.83 18.59 49.14
29-NOV 08:00 1 257 8,457,635.70 1,496.14 34.31 36.93
29-NOV 08:30 1 814 0.00 0.00 36.69 36.74
29-NOV 09:00 1 1,329 2,255,022.27 0.00 39.74 39.84
29-NOV 09:30 1 1,706 0.00 0.00 42.84 42.99
29-NOV 10:00 1 1,757 1,434,805.84 0.00 45.29 45.58
29-NOV 10:30 1 102 176,184.53 7.97 54.41 54.88
29-NOV 10:30 1 1,669 173.73 5.38 0.00 0.01
-------- ------------------- ------------------- ------------------- -------------------
avg 1,848,642.95 13,782.42 33.98 38.26
sum 7,640
Per-Plan Execution Statistics Over Time
Avg Avg
Plan Snapshot Avg Rows Avg LIO Avg PIO CPU (secs) Elapsed (secs)
Hash Value Time INSTANCE_NUMBER Execs Per Exec Per Exec Per Exec Per Exec Per Exec
---------- ------------ --------------- -------- ------------------- ------------------- ------------------- ------------------- -------------------
881293612 29-NOV 07:30 1 6 0.00 2,465,321.50 108,749.83 18.59 49.14
29-NOV 08:00 1 257 0.00 8,457,635.70 1,496.14 34.31 36.93
29-NOV 08:30 1 814 0.01 0.00 0.00 36.69 36.74
29-NOV 09:00 1 1,329 0.02 2,255,022.27 0.00 39.74 39.84
29-NOV 09:30 1 1,706 0.01 0.00 0.00 42.84 42.99
29-NOV 10:00 1 1,757 0.01 1,434,805.84 0.00 45.29 45.58
29-NOV 10:30 1 102 0.02 176,184.53 7.97 54.41 54.88
********** -------- ------------------- ------------------- ------------------- ------------------- -------------------
avg 0.01 2,112,709.98 15,750.56 38.84 43.73
sum 5,971
2962550836 29-NOV 10:30 1 1,669 0.01 173.73 5.38 0.00 0.01
********** -------- ------------------- ------------------- ------------------- ------------------- -------------------
avg 0.01 173.73 5.38 0.00 0.01
sum




