暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

巡检之定位异常负载波动

思考一下,当你维护大量数据库时,100台以上,巡检时,怎么快速发现下面的异常负载波动
image.png
image.png

这里我们关注两种异常
1、正常周期性波动的指标的异常发现
2、突增突降异常发现

巡检方法举例
如下当你有一张VMSTAT的监控表
image.png
可以使用以下SQL进行监控(根据你的暗具体需求调整)

with t as (select to_char(EXECTIME,'yyyy-mm-dd')d,to_char(EXECTIME,'hh24') h,hostname,round(avg(100-id)) CPU from SYS_VMSTAT  where 
EXECTIME between trunc(sysdate-15,'dd') and  trunc(sysdate,'dd')
group by to_char(EXECTIME,'yyyy-mm-dd'),to_char(EXECTIME,'hh24') ,hostname)
,t2 as (select d,h,hostname,CPU,
count(CPU)over(partition by h,hostname order by d ROWS BETWEEN 2 PRECEDING and 1 PRECEDING) pre_ct,
sum(CPU)over(partition by h,hostname order by d ROWS BETWEEN 2 PRECEDING and 1 PRECEDING) pre,
count(CPU)over(partition by h,hostname order by d ROWS BETWEEN current row and 1 following) cnt,
sum(CPU)over(partition by h,hostname order by d ROWS BETWEEN current row and 1 following) sum_CPU
from t)
select D||' '||H ETIME,hostname,CPU,PRE/PRE_CT PRE,PRE_CT,
case 
when (CPU>(pre/pre_ct)*2  and CPU>40) then 'CPU>40,超过前两天同时段200%' 
when (CPU>(pre/pre_ct)*1.5 and CPU>60) then 'CPU>60,超过前两天同时段150%'
when ((sum_CPU/cnt)>(pre/pre_ct)*1.3 and CPU>60) then 'CPU>60,连续2天超过前两天同时段150%'
when (CPU>(pre/pre_ct)*1.3 and CPU>70)  then 'CPU>70,超过之前两天同时段150%'
else 'other' end message
from t2 where pre is not null and
(
(CPU>(pre/pre_ct)*2  and CPU>40) --CPU>40,超过前两天同时段200%
or (CPU>(pre/pre_ct)*1.5 and CPU>60) --CPU>60,超过前两天同时段150%
or ((sum_CPU/cnt)>(pre/pre_ct)*1.3 and CPU>60)  --CPU>60,连续2天超过前两天同时段150%
or (CPU>(pre/pre_ct)*1.3 and CPU>70) --CPU>70,超过之前两天同时段150%
) 
and  h between '08' and '17' --只关注上班时段
order by etime desc,hostname;

输出如下
image.png

最后修改时间:2025-08-13 09:47:47
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论