EMCC: How to Use Scripts and SNMPv1 Traps

Original, by bicewow, 2022-06-13

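Overview

This article explains how to use the Notifications > Scripts and SNMPv1 Traps feature in EMCC 13. In EMCC you can create a rule under Incidents > Incident Rules and associate it with a method defined under Notifications > Scripts and SNMPv1 Traps. The rule intercepts EMCC alerts and hands them to your script, which writes them out in whatever format you define, making monitoring more flexible, easier to control, and easier to read.
The following scenario demonstrates the feature:
Monitor the running state of the agent process on a target host
Case 1: the agent process stops unexpectedly. An alert is required, and the alert details are written out.
Case 2: the agent process is restarted as part of planned maintenance. No alert is raised; only the relevant information is recorded.

Test environment:
EMCC 13.5.0.0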

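Procedure

The whole workflow takes four steps:
1. Write a script that implements the required logic
2. Place the script in a chosen location on the OMS host
3. Define a method under Notifications > Scripts and SNMPv1 Traps
4. Associate that method with a rule under Incidents > Incident Rules

Write the script

The script is shown below. It is executed on the OMS host, and the variables it references are built into EMCC; see the official EMCC variable reference for details.
In outline, the script probes the relevant ports on the target host to decide whether the agent stopped abnormally or was restarted as planned.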

[root@em13c os_shell]# pwd
/backup/scripts/os_shell
[root@em13c os_shell]# cat itsm.sh
#!/bin/bash
# Notification script executed by the OMS. SEVERITY_CODE, TARGET_TYPE,
# HOST_NAME, TITLE and MESSAGE are environment variables supplied by EMCC.
# Uses bash because of the (( ... )) arithmetic below.

TEST_AGENT=/backup/scripts/os_shell/log/test_agent.log
JAVA=/u01/app/oracle/product/19.0.0/db_1/jdk/bin/java
TJDATE=$(date '+%Y-%m-%d %H:%M:%S')
echo "$TJDATE itsm.sh RUN ....." >> "$TEST_AGENT"
. /home/oracle/.bash_profile
echo "$TJDATE SEVERITY_CODE:$SEVERITY_CODE TARGET_TYPE:$TARGET_TYPE ....." >> "$TEST_AGENT"

# Only handle FATAL, WARNING or CRITICAL events raised against an agent target.
if [ "$SEVERITY_CODE" = "FATAL" -o "$SEVERITY_CODE" = "WARNING" -o "$SEVERITY_CODE" = "CRITICAL" ] && [ "$TARGET_TYPE" = "Agent" ]
then
    # Look up the target host in hosts.txt; the IP address is the third column.
    HOST_IP_1=$(grep -w "$HOST_NAME$" /backup/scripts/os_shell/hosts.txt | awk '{print $3}')
    n=1
    TEL_OS_RESULT="0"
    TEL_DB_RESULT="0"
    # Probe up to 15 times, 5 seconds apart, and stop early once both ports answer.
    while (( n <= 15 ))
    do
        sleep 5
        wdate=$(date '+%Y-%m-%d %H:%M:%S')
        # Telnet is a Java class not shown in this article; the script expects it
        # to print "successful" when the port accepts a connection.
        TEL_OS_RESULT=$("$JAVA" Telnet "$HOST_IP_1" 22 | grep -c "successful")    # SSH port
        TEL_DB_RESULT=$("$JAVA" Telnet "$HOST_IP_1" 3872 | grep -c "successful")  # agent port
        echo "$wdate telnt $n times TEL_OS_RESULT:$TEL_OS_RESULT TEL_DB_RESULT:$TEL_DB_RESULT ....." >> "$TEST_AGENT"
        if [ "$TEL_OS_RESULT" = "1" -a "$TEL_DB_RESULT" = "1" ]; then
            break
        fi
        (( n++ ))
    done
    wdate=$(date '+%Y-%m-%d %H:%M:%S')
    echo "$wdate TEL_OS_RESULT:$TEL_OS_RESULT TEL_DB_RESULT:$TEL_DB_RESULT ....." >> "$TEST_AGENT"
    if [ "$TEL_OS_RESULT" = "1" -a "$TEL_DB_RESULT" = "1" ]
    then
        # Both ports came back within the retry window: treat as a planned restart.
        echo "$TJDATE $TITLE $MESSAGE , WRONG MESSAGE, OS and Agent is Running, EM Agent Restarted." >> "$TEST_AGENT"
    else
        # Still unreachable after all retries: treat as an unexpected shutdown.
        echo "$TJDATE $TITLE $MESSAGE , OS and Agent is Not Running" >> "$TEST_AGENT"
    fi
fi

exit 0
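
The script depends on a `Telnet` Java class whose source is not included here. As a possible substitute, the sketch below probes a TCP port with bash's built-in `/dev/tcp` redirection and `timeout` from coreutils. The function name `probe_port` and the 3-second default are assumptions made for illustration; the function prints 1 or 0 so that it lines up with the values the script stores in TEL_OS_RESULT and TEL_DB_RESULT.

```shell
#!/bin/bash
# probe_port HOST PORT [TIMEOUT_SECONDS]   (hypothetical helper, not part of EMCC)
# Prints 1 if a TCP connection to HOST:PORT succeeds, 0 otherwise.
probe_port() {
    local host=$1 port=$2 wait=${3:-3}
    # Opening /dev/tcp/HOST/PORT makes bash attempt a TCP connect;
    # timeout bounds the attempt so a filtered port cannot hang the loop.
    if timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
        echo 1
    else
        echo 0
    fi
}
```

Inside the loop it could be called as `TEL_DB_RESULT=$(probe_port "$HOST_IP_1" 3872)`. Unlike the Java class, this only confirms that the port accepts a connection, which is an assumption about what the original check verifies.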

Place the script on the OMS host

Copy the script to the target directory on the OMS host so that it is ready for use.

[root@em13c os_shell]# pwd
/backup/scripts/os_shell
[root@em13c os_shell]# ll
total 12
-rw-r--r-- 1 oracle oinstall 1497 Jun 13 10:35 hosts.txt
-rwxr-xr-x 1 oracle oinstall 1810 Jun 13 13:36 itsm.sh
-rwxr-xr-x 1 root   root     3445 Jun 13 11:20 itsm.sh2022
drwxr-xr-x 2 oracle oinstall   88 Jun 13 11:28 log
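
Before wiring the script into EMCC it can help to confirm that it is executable and that its log directory is writable. The sketch below is one way to do that, assuming the OMS runs the script as the `oracle` user that owns the files in the listing; `preflight` is a hypothetical helper name.

```shell
#!/bin/bash
# preflight SCRIPT LOG_DIR   (hypothetical helper)
# Returns 0 when SCRIPT is executable and LOG_DIR exists and is writable
# for the current user; otherwise prints each problem and returns 1.
preflight() {
    local script=$1 log_dir=$2 status=0
    [ -x "$script" ] || { echo "not executable: $script"; status=1; }
    [ -d "$log_dir" ] && [ -w "$log_dir" ] || { echo "log dir not writable: $log_dir"; status=1; }
    return "$status"
}
```

Run as the `oracle` user, a call such as `preflight /backup/scripts/os_shell/itsm.sh /backup/scripts/os_shell/log` should print nothing and return 0.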

Define the method under Notifications > Scripts and SNMPv1 Traps

[Screenshots: 111.png, 2.png, 3.png, 4.png]

Associate the method under Incidents > Incident Rules

[Screenshots: 12.png, 13.png, 14.png, 15.png, 16.png, 17.png, 18.png, 19.png, 21.png, 22.png, 23.png]

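Results

Case 1: the agent process stops unexpectedly
The agent port is still unreachable when the retry window ends, so the event is judged an unexpected shutdown.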

[oracle@zstest bin]$ ./emctl stop agent
Oracle Enterprise Manager Cloud Control 13c Release 5  
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
Stopping agent ... stopped.

-- Log output on the OMS host
2022-06-13 13:59:15 itsm.sh RUN .....
2022-06-13 13:59:15 SEVERITY_CODE:CRITICAL TARGET_TYPE:Agent .....
2022-06-13 13:59:20 telnt 1 times TEL_OS_RESULT:1 TEL_DB_RESULT:0 .....
2022-06-13 13:59:25 telnt 2 times TEL_OS_RESULT:1 TEL_DB_RESULT:0 .....
2022-06-13 13:59:30 telnt 3 times TEL_OS_RESULT:1 TEL_DB_RESULT:0 .....
2022-06-13 13:59:35 telnt 4 times TEL_OS_RESULT:1 TEL_DB_RESULT:0 .....
2022-06-13 13:59:41 telnt 5 times TEL_OS_RESULT:1 TEL_DB_RESULT:0 .....
2022-06-13 13:59:46 telnt 6 times TEL_OS_RESULT:1 TEL_DB_RESULT:0 .....
2022-06-13 13:59:51 telnt 7 times TEL_OS_RESULT:1 TEL_DB_RESULT:0 .....
2022-06-13 13:59:56 telnt 8 times TEL_OS_RESULT:1 TEL_DB_RESULT:0 .....
2022-06-13 14:00:01 telnt 9 times TEL_OS_RESULT:1 TEL_DB_RESULT:0 .....
2022-06-13 14:00:07 telnt 10 times TEL_OS_RESULT:1 TEL_DB_RESULT:0 .....
2022-06-13 14:00:12 telnt 11 times TEL_OS_RESULT:1 TEL_DB_RESULT:0 .....
2022-06-13 14:00:17 telnt 12 times TEL_OS_RESULT:1 TEL_DB_RESULT:0 .....
2022-06-13 14:00:22 telnt 13 times TEL_OS_RESULT:1 TEL_DB_RESULT:0 .....
2022-06-13 14:00:27 telnt 14 times TEL_OS_RESULT:1 TEL_DB_RESULT:0 .....
2022-06-13 14:00:33 telnt 15 times TEL_OS_RESULT:1 TEL_DB_RESULT:0 .....
2022-06-13 14:00:33 TEL_OS_RESULT:1 TEL_DB_RESULT:0 .....
2022-06-13 13:59:15  Agent Unreachable (REASON = Unable to connect to the agent at https://zstest:3872/emd/main/ [Connection refused (Connection refused)]). Host is reachable. , OS and Agent is Not Running

Case 2: the agent process is restarted as planned
The log records the event as a restart.

[oracle@zstest bin]$ ./emctl stop agent
Oracle Enterprise Manager Cloud Control 13c Release 5  
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
Stopping agent ... stopped.

-- Wait a moment

[oracle@zstest bin]$ ./emctl start agent
Oracle Enterprise Manager Cloud Control 13c Release 5  
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
Starting agent ............... started.

-- Log output on the OMS host
2022-06-13 13:53:51 itsm.sh RUN .....
2022-06-13 13:53:51 SEVERITY_CODE:FATAL TARGET_TYPE:Agent .....
2022-06-13 13:53:56 telnt 1 times TEL_OS_RESULT:1 TEL_DB_RESULT:0 .....
2022-06-13 13:54:02 telnt 2 times TEL_OS_RESULT:1 TEL_DB_RESULT:0 .....
2022-06-13 13:54:07 telnt 3 times TEL_OS_RESULT:1 TEL_DB_RESULT:0 .....
2022-06-13 13:54:12 telnt 4 times TEL_OS_RESULT:1 TEL_DB_RESULT:0 .....
2022-06-13 13:54:17 telnt 5 times TEL_OS_RESULT:1 TEL_DB_RESULT:0 .....
2022-06-13 13:54:22 telnt 6 times TEL_OS_RESULT:1 TEL_DB_RESULT:0 .....
2022-06-13 13:54:27 telnt 7 times TEL_OS_RESULT:1 TEL_DB_RESULT:0 .....
2022-06-13 13:54:32 telnt 8 times TEL_OS_RESULT:1 TEL_DB_RESULT:0 .....
2022-06-13 13:54:38 telnt 9 times TEL_OS_RESULT:1 TEL_DB_RESULT:1 .....
2022-06-13 13:54:38 TEL_OS_RESULT:1 TEL_DB_RESULT:1 .....
2022-06-13 13:53:51  Agent has stopped monitoring. The following errors are reported : agent shutdown. , WRONG MESSAGE, OS and Agent is Running, EM Agent Restarted.
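
The two runs differ only in the final pair of probe values. A minimal sketch of that decision, pulled out as a function so it can be exercised without EMCC, is shown below. The name `classify_agent_event` and the returned labels are illustrative, and the three-way split (separating "agent down" from "host unreachable") is an extension that the original script does not make: it logs the same message for both.

```shell
#!/bin/bash
# classify_agent_event OS_RESULT AGENT_RESULT   (hypothetical helper)
# Each argument is 1 (port reachable) or 0 (not reachable), matching
# TEL_OS_RESULT and TEL_DB_RESULT in the log lines.
classify_agent_event() {
    local os=$1 agent=$2
    if [ "$os" = "1" ] && [ "$agent" = "1" ]; then
        echo "restarted"          # ports 22 and 3872 both answered in time
    elif [ "$os" = "1" ]; then
        echo "agent_down"         # host answers on 22, agent port 3872 does not
    else
        echo "host_unreachable"   # port 22 never answered
    fi
}
```

With the final values from the logs, case 1 (`1 0`) maps to `agent_down` and case 2 (`1 1`) maps to `restarted`.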

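This concludes the demonstration of Scripts and SNMPv1 Traps (OS Commands and Scripts) in EMCC.

Appendix

Script timeout parameter

The script calls sleep 5 inside a loop of up to 15 iterations, so a full run takes more than 75 seconds (the case 1 run above took 78 seconds, from 13:59:15 to 14:00:33). EMCC applies a default timeout of 30 seconds to script execution and kills any script that runs longer, so this value must be adjusted to fit the script's actual runtime.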
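
The arithmetic behind that estimate can be sketched as follows. The function names are hypothetical, and the per-probe duration is an assumption, since the article does not state how long each port check takes.

```shell
#!/bin/bash
# worst_case_seconds ATTEMPTS SLEEP_SECONDS PROBE_SECONDS   (hypothetical helper)
# Each loop iteration sleeps once and then runs two port probes.
worst_case_seconds() {
    echo $(( $1 * ($2 + 2 * $3) ))
}

# timeout_ok TIMEOUT ATTEMPTS SLEEP_SECONDS PROBE_SECONDS   (hypothetical helper)
# Succeeds only when TIMEOUT is strictly larger than the worst case.
timeout_ok() {
    [ "$1" -gt "$(worst_case_seconds "$2" "$3" "$4")" ]
}
```

With 15 attempts, a 5-second sleep and probes assumed to return instantly, `worst_case_seconds 15 5 0` gives the 75-second floor mentioned above, so `timeout_ok 60 15 5 0` fails; any real probe time pushes the requirement higher.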

Adjusting from the command line

Run the following commands on the EMCC server host.
[oracle@em13c bin]$ pwd
/u02/app/middleware/bin
[oracle@em13c bin]$ ./emctl get property -name  oracle.sysman.core.notification.os_cmd_timeout -sysman_pwd "weblogic123"
Oracle Enterprise Manager Cloud Control 13c Release 5  
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
Value for property oracle.sysman.core.notification.os_cmd_timeout at Global level is 30
[oracle@em13c bin]$ ./emctl set property -name oracle.sysman.core.notification.os_cmd_timeout -value 60 -sysman_pwd "weblogic123"
Oracle Enterprise Manager Cloud Control 13c Release 5  
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
Property oracle.sysman.core.notification.os_cmd_timeout has been set to value 60 for all Management Servers
OMS restart is not required to reflect the new property value
[oracle@em13c bin]$ ./emctl get property -name  oracle.sysman.core.notification.os_cmd_timeout -sysman_pwd "weblogic123"
Oracle Enterprise Manager Cloud Control 13c Release 5  
Copyright (c) 1996, 2021 Oracle Corporation.  All rights reserved.
Value for property oracle.sysman.core.notification.os_cmd_timeout at Global level is 60
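
When the value needs to be checked from another script, the number can be pulled out of the `emctl get property` output. The sketch below assumes the output keeps the "Value for property ... is N" wording shown in the transcript above; `extract_timeout` is a hypothetical helper name.

```shell
#!/bin/bash
# extract_timeout   (hypothetical helper)
# Reads `emctl get property` output on stdin and prints the numeric value
# from the "Value for property ... is N" line; prints nothing if absent.
extract_timeout() {
    sed -n 's/^Value for property .* is \([0-9][0-9]*\)$/\1/p'
}
```

Piping the output of the `emctl get property` command shown above into `extract_timeout` would print just the number, for example 60 after the change.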

Adjusting from the console

[Screenshots: 31.png, 32.png, 33.png]
