暂无图片
暂无图片
3
暂无图片
暂无图片
暂无图片

一次新上的SQL引发的CPU飙升

**版本架构:**Oracle 19c 双节点RAC+双节点DG

**摘要:**记录一次TOP SQL由于统计信息缺失引发的AAS和CPU突刺,优化器只能基于统计信息来判断最优执行路径。如果统计信息缺失或过期,系统随时可能出现严重性能问题。

一、问题背景

2024年11月29号早上,生产营业系统突然收到CPU占用率超过阈值的告警。通过监控发现,数据库侧的 AAS(Active Session Average)飙升到95,有一条新上线的SQL消耗异常,导致CPU使用率直线拉升。

二、问题定位

1. Top SQL定位

从监控(上面图片)看,左面有一条TOP SQL占比31%,大概率是元凶,继续分析分线,该SQL第一次出现时间在 07:15:29,说明是新上线的SQL,从监控中快速定位:

-- 默认显示最近15分钟的TOP SQL SQL_ID 4cwfmj3fkzrh8 PLAN_HASH_VALUE 2962550836 FIRST_LOAD_TIME 2024-11-29/07:15:29 LAST_LOAD_TIME 2024-11-29 10:35:25 -- sql信息 SELECT SUBSID, CITY, POORDERNUMBER, BIZTYPE, GOODSID, ORDERID, CUSTOMERPHONE, SERVICECODE, SERVICESTATUS, RECEIVETIME, EFFTIME, EXPTIME, PROVINCERELATIONID, SERVICECODETYPE, OPPOORDERSOURCE, INFORMSOURCE, INTIME, DATASOURCE, ORDERNUMBER, APICODE, USERTYPE, SELLERMEMO FROM OJ_RES_GOOD_SYN WHERE SUBSID = :1 AND EXPTIME IS NOT NULL AND EXPTIME >= SYSDATE AND EXPTIME <= :2 AND CITY = 1117

–查看sql的第一次载入时间

select FIRST_LOAD_TIME from v$sql where sql_id=&1;

2.分析原因

在监控(上面图片)右侧显示的实时的等待事件,此时占比最高的是 ON CPU / runqueue,这个等待事件不是select distinct WAIT_CLASS from v$event_name;13个类别里面的,一般可以通过V$session status='ACTIVE' and state!='WAITING'、或者V$ASH中event列为NULL判断,也是说如果没有等待事件此时SQL正在使用或等待CPU。到这里可以断定问题大概率发生在sql执行流程中的执行阶段,可能占用CPU较高的是sql的排序操作、复杂算子的计算、函数调用等

接着分析SQL的执行计划

SQL_ID 4cwfmj3fkzrh8, child number 0 ------------------------------------- SELECT SUBSID, CITY, POORDERNUMBER, BIZTYPE, GOODSID, ORDERID, CUSTOMERPHONE, SERVICECODE, SERVICESTATUS, RECEIVETIME, EFFTIME, EXPTIME, PROVINCERELATIONID, SERVICECODETYPE, OPPOORDERSOURCE, INFORMSOURCE, INTIME, DATASOURCE, ORDERNUMBER, APICODE, USERTYPE, SELLERMEMO FROM OJ_RES_GOOD_SYN WHERE SUBSID = :1 AND EXPTIME IS NOT NULL AND EXPTIME >= SYSDATE AND EXPTIME <= :2 AND CITY = 1117 Plan hash value: 881293612 ------------------------------------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| E-Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | | | 1470 (100)| | | | |* 1 | FILTER | | | | | | | | |* 2 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| OJ_RES_GOOD_SYN | 15 | 13680 | 1470 (0)| 00:00:01 | ROWID | ROWID | |* 3 | INDEX RANGE SCAN | IDX_RIGHTGOOD_SERVICEGOOD | 239K| | 28 (0)| 00:00:01 | | | ------------------------------------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(:2>=SYSDATE@!) 2 - filter(("EXPTIME"<=:2 AND "SUBSID"=TO_NUMBER(:1) AND "EXPTIME">=SYSDATE@!)) 3 - access("CITY"=1117)

索引信息

SYS@PANDAe1>@ind %.OJ_RES_GOOD_SYN Display indexes where table or index name matches %%.OJ_RES_GOOD_SYN%... TABLE_OWNER TABLE_NAME INDEX_NAME POS# COLUMN_NAME DSC -------------------- ------------------------------ ------------------------------ ---- ------------------------------ ---- PANDA OJ_RES_GOOD_SYN IDX_CS_REC_SUBS_RIGHT_SUBSID 1 SUBSID <<<< 正确的 IDX_RIGHTGOOD_CUSTOMERPHONE 1 CUSTOMERPHONE IDX_RIGHTGOOD_SERVICEGOOD 1 CITY <<<< 错误的 2 GOODSID 3 SERVICECODE INDEX_OWNER TABLE_NAME INDEX_NAME IDXTYPE UNIQ STATUS PART TEMP H LFBLKS NDK NUM_ROWS CLUF LAST_ANALYZED DEGREE VISIBILIT -------------------- ------------------------------ ------------------------------ ---------- ---- -------- ---- ---- -- ---------- ------------- ---------- ---------- ------------------- ------ --------- PANDA OJ_RES_GOOD_SYN IDX_CS_REC_SUBS_RIGHT_SUBSID NORMAL NO N/A YES N 3 129071 7690240 34905836 34226961 2024-11-29 10:35:23 1 VISIBLE <<<<<<<<<刚才这个索引是没有统计信息的 OJ_RES_GOOD_SYN IDX_RIGHTGOOD_CUSTOMERPHONE NORMAL NO N/A YES N 3 148744 6300672 34905829 34425431 2024-11-29 10:35:01 1 VISIBLE OJ_RES_GOOD_SYN IDX_RIGHTGOOD_SERVICEGOOD NORMAL NO VALID NO N 4 379096 13397 34905828 9643184 2024-11-29 10:34:44 1 VISIBLE

执行计划使用了索引 IDX_RIGHTGOOD_SERVICEGOOD走了了INDEX RANGE SCAN并且谓词中只通过CITY字段进行了过滤,这个字段是一个分区键,选择性很差。在where条件后面有一个SUBSID选择新很好并且存在索引IDX_CS_REC_SUBS_RIGHT_SUBSID

select * from (select CITY,count(*) from PANDA.OJ_RES_GOOD_SYN group by CITY order by count(*) desc)where rownum<=10; CITY COUNT(*) ---------- ---------- 1116 24489838 1117 11353009 1118 7236 3 rows selected. select * from (select SUBSID,count(*) from PANDA.OJ_RES_GOOD_SYN group by SUBSID order by count(*) desc)where rownum<=10; SUBSID COUNT(*) ---------- ---------- 3.1639E+12 2944 3.1641E+12 2855 3.1639E+12 2476 3.1639E+12 2204 3.1421E+12 2194 3.1640E+12 2143 3.1621E+12 2099 3.1641E+12 2015 3.1640E+12 1939 3.1640E+12 1909 10 rows selected.

3. 检查统计信息

执行计划异常,有好的索引不走,检查涉及表的统计信息,发现统计信息过期 (STALE_STATS=YES/NULL忘记了**),这意味着优化器可能生成了不合理的执行计划。**

SELECT owner, table_name, partition_name, stale_stats, last_analyzed FROM dba_tab_statistics WHERE table_name = UPPER('OJ_RES_GOOD_SYN') AND owner = 'PANDA'; OWNER TABLE_NAME PARTITION_NAME STALE LAST_ANALYZED ---------- -------------------- -------------------- ----- ---------------- OWNER TABLE_NAME PARTITION_NAME 29-AUG-20

三、处理过程

**之前 ****OJ_RES_GOOD_SYN**没有统计信息,优化器无法正确评估成本。收集表统计信息

exec DBMS_STATS.GATHER_TABLE_STATS ( ownname => 'PANDA', tabname => 'OJ_RES_GOOD_SYN', cascade => true, estimate_percent => dbms_stats.auto_sample_size, method_opt => 'FOR TABLE FOR ALL COLUMNS SIZE REPEAT', degree => 8, no_invalidate => false );

收集后,执行计划重新生成,正确走上IDX_CS_REC_SUBS_RIGHT_SUBSID索引,也走了CITY的分区裁剪

SQL_ID 4cwfmj3fkzrh8, child number 3 ------------------------------------- SELECT SUBSID, CITY, POORDERNUMBER, BIZTYPE, GOODSID, ORDERID, CUSTOMERPHONE, SERVICECODE, SERVICESTATUS, RECEIVETIME, EFFTIME, EXPTIME, PROVINCERELATIONID, SERVICECODETYPE, OPPOORDERSOURCE, INFORMSOURCE, INTIME, DATASOURCE, ORDERNUMBER, APICODE, USERTYPE, SELLERMEMO FROM OJ_RES_GOOD_SYN WHERE SUBSID = :1 AND EXPTIME IS NOT NULL AND EXPTIME >= SYSDATE AND EXPTIME <= :2 AND CITY = 1117 Plan hash value: 2962550836 -------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | -------------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 134 (100)| | | | |* 1 | FILTER | | | | | | | | | 2 | PARTITION RANGE ITERATOR | | 1 | 912 | 134 (0)| 00:00:01 | 131 | 196 | |* 3 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| OJ_RES_GOOD_SYN | 1 | 912 | 134 (0)| 00:00:01 | 131 | 196 | |* 4 | INDEX RANGE SCAN | IDX_CS_REC_SUBS_RIGHT_SUBSID | 32 | | 133 (0)| 00:00:01 | 131 | 196 | -------------------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(:2>=SYSDATE@!) 3 - filter(("CITY"=1117 AND "EXPTIME"<=:2 AND "EXPTIME">=SYSDATE@!)) 4 - access("SUBSID"=TO_NUMBER(:1))

四、效果验证

  • 统计信息收集后,执行计划恢复正确,CPU瞬间回落,系统恢复正常。
  • SQL性能指标对比:
计划 LIO/Exec(块次) CPU/Exec(secs) Elapsed/Exec(secs)
坏计划 881293612 2,112,709.98 38.84 43.73
好计划 2962550836 173.73 0.00 0.01
Summary Execution Statistics Over Time of SQL_ID:4cwfmj3fkzrh8 Avg Avg Snapshot Avg LIO Avg PIO CPU (secs) Elapsed (secs) Time INSTANCE_NUMBER Execs Per Exec Per Exec Per Exec Per Exec ------------ --------------- -------- ------------------- ------------------- ------------------- ------------------- 29-NOV 07:30 1 6 2,465,321.50 108,749.83 18.59 49.14 29-NOV 08:00 1 257 8,457,635.70 1,496.14 34.31 36.93 29-NOV 08:30 1 814 0.00 0.00 36.69 36.74 29-NOV 09:00 1 1,329 2,255,022.27 0.00 39.74 39.84 29-NOV 09:30 1 1,706 0.00 0.00 42.84 42.99 29-NOV 10:00 1 1,757 1,434,805.84 0.00 45.29 45.58 29-NOV 10:30 1 102 176,184.53 7.97 54.41 54.88 29-NOV 10:30 1 1,669 173.73 5.38 0.00 0.01 -------- ------------------- ------------------- ------------------- ------------------- avg 1,848,642.95 13,782.42 33.98 38.26 sum 7,640 Per-Plan Execution Statistics Over Time Avg Avg Plan Snapshot Avg Rows Avg LIO Avg PIO CPU (secs) Elapsed (secs) Hash Value Time INSTANCE_NUMBER Execs Per Exec Per Exec Per Exec Per Exec Per Exec ---------- ------------ --------------- -------- ------------------- ------------------- ------------------- ------------------- ------------------- 881293612 29-NOV 07:30 1 6 0.00 2,465,321.50 108,749.83 18.59 49.14 29-NOV 08:00 1 257 0.00 8,457,635.70 1,496.14 34.31 36.93 29-NOV 08:30 1 814 0.01 0.00 0.00 36.69 36.74 29-NOV 09:00 1 1,329 0.02 2,255,022.27 0.00 39.74 39.84 29-NOV 09:30 1 1,706 0.01 0.00 0.00 42.84 42.99 29-NOV 10:00 1 1,757 0.01 1,434,805.84 0.00 45.29 45.58 29-NOV 10:30 1 102 0.02 176,184.53 7.97 54.41 54.88 ********** -------- ------------------- ------------------- ------------------- ------------------- ------------------- avg 0.01 2,112,709.98 15,750.56 38.84 43.73 sum 5,971 2962550836 29-NOV 10:30 1 1,669 0.01 173.73 5.38 0.00 0.01 ********** -------- ------------------- ------------------- ------------------- ------------------- ------------------- avg 0.01 173.73 5.38 0.00 0.01 sum
最后修改时间:2025-08-27 11:05:14
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论