

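“About the AMP platform: AMP is an automated operations management platform. It ships with a large set of built-in atomic operations scenarios and can automate operations tasks for common databases, middleware, cloud platforms, network devices, and similar targets. It supports automation for eight major operations scenarios, covering most routine operations work.”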

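Inspection targets and content
The one-click inspection runs a comprehensive check of 宝兰德 (Baolande) BES covering six areas: CPU status, SERVER running status, JVM heap usage, thread usage, queue usage, and JDBC status.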
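As a compact summary of what "abnormal" means for each area, the limits can be written as a small lookup table. This is only a sketch: the numbers are the ones the script compares against (each usage figure multiplied by 100), while the function names limit_for and over_limit are made up for illustration and are not part of the original script.

```shell
#!/bin/bash
# limit_for ITEM: print the limit (usage * 100) that the inspection script
# compares ITEM against. Function names here are illustrative assumptions.
limit_for() {
    case "$1" in
        CpuUsed)        echo 20000 ;;
        JVMUsed)        echo 8000 ;;   # the same limit is applied to PermUsage
        CurrentThread)  echo 8500 ;;
        CurrentQueue)   echo 8000 ;;
        dataSourceName) echo 9500 ;;
        *)              return 1 ;;    # e.g. Status has no numeric limit; it must equal "OK"
    esac
}

# over_limit ITEM SCALED: succeed when SCALED (integer, usage * 100) reaches
# the limit for ITEM; return 2 when ITEM has no numeric limit.
over_limit() {
    local limit
    limit=$(limit_for "$1") || return 2
    [ "$2" -ge "$limit" ]
}

over_limit JVMUsed 8123 && echo "JVMUsed: over limit"
over_limit CurrentThread 4200 || echo "CurrentThread: within limit"
```

Keeping the limits in one place makes it easier to adjust a single limit without touching the comparison logic.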
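Data source
The raw data for the inspection script is bes.log, the log written by the BES monitoring program. A new round of samples is appended to this log every ten seconds (see the figure below).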
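To make the record layout concrete, here is a minimal sketch of pulling the timestamp and a few fields out of one record. The sample line is hypothetical: it was constructed only to match the field positions the script reads (timestamp in square brackets, pipe-separated fields with the instance name second and the monitor item third), so the real bes.log may look different.

```shell
#!/bin/bash
# Hypothetical record; the real bes.log layout may differ.
line='[2021-08-02 10:15:30]|server1|JVMUsed|512|JVMMax|1024|JVMUsage|50.25|PermUsed|64|PermMax|256|PermUsage|25.00'

# Timestamp: the text between the first '[' and the following ']'.
ts=$(printf '%s\n' "$line" | awk -F '[' '{print $2}' | awk -F ']' '{print $1}')

# Instance name, monitor item and usage figure, read with a single awk call.
read -r instance item usage <<EOF
$(printf '%s\n' "$line" | awk -F '|' '{print $2, $3, $8}')
EOF

# The shell's [ -ge ] test only handles integers, so scale the decimal
# usage figure by 100 and round it before comparing.
scaled=$(awk -v v="$usage" 'BEGIN { printf "%.0f", v * 100 }')

echo "time=${ts} instance=${instance} item=${item} usage_x100=${scaled}"
```

Passing the value with `awk -v` instead of splicing it into the awk program text keeps an unexpected field value from turning into an awk syntax error.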

Implementation script
#!/bin/bash
## Get the data for the most recent round to inspect.
## V_TIME_1 is the timestamp of the last log line; V_TIME_2 is the round 10 seconds before it.
LOG_DIR="/bes/monitor_shsnc/BES95xMon.V01/bin"
V_TIME_1=$(tail -n 1 "$LOG_DIR/bes.log" | awk -F "[" '{print $2}' | awk -F "]" '{print $1}')
V_TIME_2=$(date -d "10 second ago $V_TIME_1" +"%Y-%m-%d %H:%M:%S")
V_LOGNUM=$(grep -c "$V_TIME_2" "$LOG_DIR/bes.log")
## Define the output parameters: 0 means the inspection result is normal; any other value means abnormal.
MONI_CPU_COUNT=0
MONI_STATUS_COUNT=0
MONI_JVM_COUNT=0
MONI_THREAD_COUNT=0
MONI_QUEUE_COUNT=0
MONI_JDBC_COUNT=0
## Check whether each inspection metric is normal
if [ "$V_LOGNUM" -eq 24 ]
then
    ## The loop reads from process substitution instead of a pipe, so it runs in
    ## the current shell and the MONI_*_COUNT values are still set after "done".
    while read -r line
    do
        InstanceName=$(echo "${line}" | awk -F '|' '{print $2}')
        MonitorItem=$(echo "${line}" | awk -F '|' '{print $3}')
        if [ "${MonitorItem}" == "Status" ]    ## Check the SERVER status
        then
            Status=$(echo "${line}" | awk -F '|' '{print $4}')
            if [ "${Status}" != "OK" ]
            then
                ((MONI_STATUS_COUNT++))
                echo "${InstanceName} Status is ${Status},Please check! "
            fi
        elif [ "${MonitorItem}" == "CpuUsed" ]    ## Check the CPU status
        then
            CpuUsed_1=$(echo "${line}" | awk -F '|' '{print $4}')
            ## Scale by 100 and round to an integer, because [ -ge ] only compares integers
            CpuUsed=$(awk -v v="$CpuUsed_1" 'BEGIN { printf "%.0f", v * 100 }')
            if [ "$CpuUsed" -ge 20000 ]
            then
                ((MONI_CPU_COUNT++))
                echo "${InstanceName} CpuUsed is ${CpuUsed},Please check! "
            fi
        elif [ "${MonitorItem}" == "JVMUsed" ]    ## Check the JVM status
        then
            JVMUsed=$(echo "${line}" | awk -F '|' '{print $4}')
            JVMMax=$(echo "${line}" | awk -F '|' '{print $6}')
            JVMUsage_1=$(echo "${line}" | awk -F '|' '{print $8}')
            JVMUsage=$(awk -v v="$JVMUsage_1" 'BEGIN { printf "%.0f", v * 100 }')
            PermUsed=$(echo "${line}" | awk -F '|' '{print $10}')
            PermMax=$(echo "${line}" | awk -F '|' '{print $12}')
            PermUsage_1=$(echo "${line}" | awk -F '|' '{print $14}')
            PermUsage=$(awk -v v="$PermUsage_1" 'BEGIN { printf "%.0f", v * 100 }')
            if [ "$JVMUsage" -ge 8000 ] || [ "$PermUsage" -ge 8000 ]
            then
                ((MONI_JVM_COUNT++))
                echo "${InstanceName} JVMUsed is ${JVMUsed},Please check! "
            fi
        elif [ "${MonitorItem}" == "CurrentThread" ]    ## Check the thread status
        then
            CurrentThread=$(echo "${line}" | awk -F '|' '{print $4}')
            MaxThread=$(echo "${line}" | awk -F '|' '{print $6}')
            ThreadUsage_1=$(echo "${line}" | awk -F '|' '{print $8}')
            ThreadUsage=$(awk -v v="$ThreadUsage_1" 'BEGIN { printf "%.0f", v * 100 }')
            BusyThread=$(echo "${line}" | awk -F '|' '{print $10}')
            if [ "$ThreadUsage" -ge 8500 ]
            then
                ((MONI_THREAD_COUNT++))
                echo "${InstanceName} CurrentThread is ${CurrentThread},Please check! "
            fi
        elif [ "${MonitorItem}" == "CurrentQueue" ]    ## Check the queue status
        then
            CurrentQueue=$(echo "${line}" | awk -F '|' '{print $4}')
            MaxQueue=$(echo "${line}" | awk -F '|' '{print $6}')
            QueueUsage_1=$(echo "${line}" | awk -F '|' '{print $8}')
            QueueUsage=$(awk -v v="$QueueUsage_1" 'BEGIN { printf "%.0f", v * 100 }')
            if [ "$QueueUsage" -ge 8000 ]
            then
                ((MONI_QUEUE_COUNT++))
                echo "${InstanceName} CurrentQueue is ${CurrentQueue},Please check! "
            fi
        elif [ "${MonitorItem}" == "dataSourceName" ]    ## Check the JDBC data source status
        then
            dataSourceName=$(echo "${line}" | awk -F '|' '{print $4}')
            CurrentActiveNum=$(echo "${line}" | awk -F '|' '{print $6}')
            createJDBCCount=$(echo "${line}" | awk -F '|' '{print $8}')
            maxJDBCCount=$(echo "${line}" | awk -F '|' '{print $10}')
            spareJDBCCount=$(echo "${line}" | awk -F '|' '{print $12}')
            JDBCUsage_1=$(echo "${line}" | awk -F '|' '{print $14}')
            JDBCUsage=$(awk -v v="$JDBCUsage_1" 'BEGIN { printf "%.0f", v * 100 }')
            if [ "$JDBCUsage" -ge 9500 ]
            then
                ((MONI_JDBC_COUNT++))
                echo "${InstanceName} dataSourceName is ${dataSourceName},Please check! "
            fi
        else
            echo "${MonitorItem} is error!"
        fi
    done < <(grep "$V_TIME_2" "$LOG_DIR/bes.log" | grep -v "Connected to")
else
    echo "$LOG_DIR/bes.log is wrong,please check!"
fi
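One bash behaviour matters for the counters in a script of this shape: when a while loop sits on the right-hand side of a pipe, bash runs it in a subshell by default, so variables incremented inside the loop are back at their old values once the loop ends. The sketch below demonstrates the effect with made-up input (the sample function and the two counter names are illustrative only) and shows a loop fed by redirection instead, which keeps the counts.

```shell
#!/bin/bash
# Three made-up lines stand in for the filtered log output.
sample() { printf '%s\n' one two three; }

# Variant 1: loop on the right-hand side of a pipe. With bash's default
# settings the loop body runs in a subshell, so the increments are lost.
piped_count=0
sample | while read -r line; do
    piped_count=$((piped_count + 1))
done

# Variant 2: loop fed by a here-document, so it runs in the current shell.
# In bash, `done < <(sample)` (process substitution) has the same effect.
direct_count=0
while read -r line; do
    direct_count=$((direct_count + 1))
done <<EOF
$(sample)
EOF

echo "piped_count=${piped_count} direct_count=${direct_count}"
```

Under bash with default options this prints piped_count=0 direct_count=3, which is why output counters that must be read after the loop should not be updated inside a piped loop.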
Configuration details and results





Author: 程红 (王翦 team, 上海新炬)
Source: the “IT那活儿” WeChat public account





