
0032.E Handling the IB switch error "WARNING Autodisabled ports / FAILURE - 1 sensors NOT OK"

rundba 2021-05-01

Author: Wang Kun, WeChat official account: rundba. Please credit the source when reposting.

To repost on an official account, contact wx:landnow.

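Background: port 10 on ib01 and port 10 on ib02 were previously cabled directly to each other and worked normally. The cable was later moved to port 5.

Normally, ports can be swapped freely after a shutdown and the switch keeps providing service.

After the move to port 5, however, the link LED stayed off. Checking the IB switch health showed errors. Clearing the errors and re-enabling port 5 with the --automatic option of enableswitchport restored normal operation.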

1. Errors after moving to port 5

Check the health status; errors are reported:

    [root@exasw-ibb01 ~]# showunhealthy
    WARNING Autodisabled ports
    FAILURE - 1 sensors NOT OK
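
When this check is repeated on several switches, it can help to classify the showunhealthy output automatically. The following is a minimal sketch, assuming the output has already been captured as text (for example over SSH) and that it only contains the three line types seen in this article (OK, WARNING, FAILURE). The function name parse_showunhealthy is hypothetical.

```python
def parse_showunhealthy(output: str) -> dict:
    """Classify showunhealthy lines (assumed prefixes: OK, WARNING, FAILURE)."""
    result = {"healthy": False, "warnings": [], "failures": []}
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("OK"):
            result["healthy"] = True
        elif line.startswith("WARNING"):
            result["warnings"].append(line[len("WARNING"):].strip())
        elif line.startswith("FAILURE"):
            # Drop the " - " separator that follows the FAILURE keyword
            result["failures"].append(line[len("FAILURE"):].strip(" -"))
    # Any warning or failure overrides an OK line
    if result["warnings"] or result["failures"]:
        result["healthy"] = False
    return result


sample = """WARNING Autodisabled ports
FAILURE - 1 sensors NOT OK"""
status = parse_showunhealthy(sample)
print(status)
```

With the output shown above, the sketch reports one warning and one failure, which is the state this article sets out to clear.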

2. Environment test

Run env_test:

    [root@exasw-ibb01 ~]# env_test
    Environment test started:
    Starting Environment Daemon test:
    Environment daemon running
    Environment Daemon test returned OK
    Starting Voltage test:
    Voltage ECB OK
    Measured 3.3V Main = 3.28 V
    Measured 3.3V Standby = 3.35 V
    Measured 12V = 11.90 V
    Measured 5V = 4.99 V
    Measured VBAT = 3.03 V
    Measured 2.5V = 2.49 V
    Measured 1.8V = 1.78 V
    Measured I4 1.2V = 1.22 V
    Voltage test returned OK
    Starting PSU test:
    PSU 0 present OK
    PSU 1 present OK
    PSU test returned OK
    Starting Temperature test:
    Back temperature 35
    Front temperature 37
    SP temperature 51
    Switch temperature 48, maxtemperature 49
    Temperature test returned OK
    Starting FAN test:
    Fan 0 not present
    Fan 1 running at rpm 12426
    Fan 2 running at rpm 12317
    Fan 3 running at rpm 12099
    Fan 4 not present
    FAN test returned OK
    Starting Connector test:
    Connector test returned OK
    Starting Onboard ibdevice test:
    Switch OK
    All Internal ibdevices OK
    Onboard ibdevice test returned OK
    Starting SSD test:
    SSD test returned OK
    Starting Auto-link-disable test:
    WARNING Autodisabled ports
    Auto-link-disable test returned 1 faults
    Environment test FAILED    # the test failed
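
The env_test output is long, and only one sub-test fails here. As a rough sketch, the failing sub-tests can be picked out by matching the "... test returned ..." lines. This assumes every sub-test ends with either "returned OK" or "returned N faults", as in the transcript above. The name failed_tests is hypothetical.

```python
import re

# Matches e.g. "Voltage test returned OK" or "Auto-link-disable test returned 1 faults"
RESULT_RE = re.compile(
    r"^(?P<name>.+?) test returned (?P<result>OK|(?P<count>\d+) faults?)"
)


def failed_tests(output: str) -> dict:
    """Map each failing sub-test name to its reported fault count."""
    failures = {}
    for line in output.splitlines():
        m = RESULT_RE.match(line.strip())
        if m and m.group("result") != "OK":
            failures[m.group("name")] = int(m.group("count"))
    return failures


sample = """Starting SSD test:
SSD test returned OK
Starting Auto-link-disable test:
WARNING Autodisabled ports
Auto-link-disable test returned 1 faults
Environment test FAILED"""
env_failures = failed_tests(sample)
print(env_failures)
```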

3. Inspect the fault

An auto-link-disable fault is reported:

    spsh
    -> show faulty
    Target                   | Property                       | Value
    -------------------------+--------------------------------+--------------------------------------
    /SP/faultmgmt/0          | fru                            | SYS
    /SP/faultmgmt/0/faults/0 | class                          | fault.device.ib.auto-link-disable    # the auto-link-disable fault is reported here
    /SP/faultmgmt/0/faults/0 | sunw-msg-id                    | ---
    /SP/faultmgmt/0/faults/0 | component                      | SYS
    /SP/faultmgmt/0/faults/0 | uuid                           | cf425a70-59e4-6711-cb37-a48938f5e257
    /SP/faultmgmt/0/faults/0 | timestamp                      | 2020-06-11/09:30:51
    /SP/faultmgmt/0/faults/0 | fru_serial_number              | AK00276771
    /SP/faultmgmt/0/faults/0 | fru_part_number                | 7052970
    /SP/faultmgmt/0/faults/0 | fru_name                       | Sun Datacenter InfiniBand Switch 36
    /SP/faultmgmt/0/faults/0 | fru_manufacturer               | Sun Microsystems
    /SP/faultmgmt/0/faults/0 | system_component_manufacturer  | Sun Microsystems
    /SP/faultmgmt/0/faults/0 | system_component_name          | Sun Datacenter InfiniBand Switch 36
    /SP/faultmgmt/0/faults/0 | system_component_part_number   | 7052970
    /SP/faultmgmt/0/faults/0 | system_component_serial_number | AK00276771
    /SP/faultmgmt/0/faults/0 | chassis_manufacturer           | Sun Microsystems
    /SP/faultmgmt/0/faults/0 | chassis_name                   | Sun Datacenter InfiniBand Switch 36
    /SP/faultmgmt/0/faults/0 | chassis_part_number            | 7052970
    /SP/faultmgmt/0/faults/0 | chassis_serial_number          | AK00276771
    /SP/faultmgmt/0/faults/0 | system_manufacturer            | Sun Microsystems
    /SP/faultmgmt/0/faults/0 | system_name                    | Sun Datacenter InfiniBand Switch 36
    /SP/faultmgmt/0/faults/0 | system_part_number             | 7052970
    /SP/faultmgmt/0/faults/0 | system_serial_number           | AK00276771
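
The uuid in this table is what the repair step later needs, so it is worth extracting it reliably rather than copying it by hand. Below is a small sketch that parses the pipe-separated "show faulty" rows into a dictionary and looks up the UUIDs for a given fault class. It assumes the three-column "Target | Property | Value" layout shown above. The function names are hypothetical.

```python
def parse_show_faulty(output: str) -> dict:
    """Group 'Target | Property | Value' rows by target."""
    faults = {}
    for line in output.splitlines():
        parts = [p.strip() for p in line.split("|", 2)]
        # Skip the header, the dashed separator and any non-table lines
        if len(parts) != 3 or parts[0] == "Target":
            continue
        target, prop, value = parts
        faults.setdefault(target, {})[prop] = value
    return faults


def uuids_for_class(faults: dict, fault_class: str) -> list:
    """Return the UUIDs of all faults whose class equals fault_class."""
    return [
        props["uuid"]
        for props in faults.values()
        if props.get("class") == fault_class and "uuid" in props
    ]


sample = """Target | Property | Value
---+---+---
/SP/faultmgmt/0 | fru | SYS
/SP/faultmgmt/0/faults/0 | class | fault.device.ib.auto-link-disable
/SP/faultmgmt/0/faults/0 | uuid | cf425a70-59e4-6711-cb37-a48938f5e257
/SP/faultmgmt/0/faults/0 | timestamp | 2020-06-11/09:30:51"""
faults = parse_show_faulty(sample)
uuids = uuids_for_class(faults, "fault.device.ib.auto-link-disable")
print(uuids)
```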

4. Clear historical errors and counters

Clear the historical errors:

    [root@exasw-iba01 ~]# ibclearerrors
    ## Summary: 7 nodes cleared 0 errors

Clear the historical counters:

    [root@exasw-iba01 ~]# ibclearcounters
    ## Summary: 7 nodes cleared 0 errors
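
If these commands are run from a script, the summary line is the only part worth checking. A minimal sketch, assuming the summary always has the exact form shown above; parse_clear_summary is a hypothetical helper name.

```python
import re

SUMMARY_RE = re.compile(r"^## Summary: (\d+) nodes cleared (\d+) errors$")


def parse_clear_summary(output: str):
    """Return (nodes, errors) from the summary line, or None if it is missing."""
    for line in output.splitlines():
        m = SUMMARY_RE.match(line.strip())
        if m:
            return int(m.group(1)), int(m.group(2))
    return None


summary = parse_clear_summary("## Summary: 7 nodes cleared 0 errors")
print(summary)  # (7, 0)
```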

5. List the currently connected IB ports

Connector 5A is shown as connected:

    [root@exasw-iba01 ~]# listlinkup
    Connector 0A Not present
    Connector 1A Not present
    Connector 2A Not present
    Connector 3A Not present
    Connector 4A Not present
    Connector 5A Present <-> Switch Port 30 is up (Enabled)
    Connector 6A Present <-> Switch Port 35 is up (Enabled)
    Connector 7A Present <-> Switch Port 33 is up (Enabled)
    Connector 8A Present <-> Switch Port 31 is up (Enabled)
    Connector 9A Present <-> Switch Port 14 is up (Enabled)
    Connector 10A Not present
    Connector 11A Present <-> Switch Port 18 is up (Enabled)
    Connector 12A Present <-> Switch Port 11 is up (Enabled)
    Connector 13A Present <-> Switch Port 09 is up (Enabled)
    Connector 14A Present <-> Switch Port 07 is up (Enabled)
    Connector 15A Present <-> Switch Port 05 is up (Enabled)
    Connector 16A Present <-> Switch Port 03 is up (Enabled)
    Connector 17A Present <-> Switch Port 01 is up (Enabled)
    Connector 0B Not present
    Connector 1B Not present
    Connector 2B Not present
    Connector 3B Not present
    Connector 4B Not present
    Connector 5B Not present
    Connector 6B Not present
    Connector 7B Not present
    Connector 8B Not present
    Connector 9B Not present
    Connector 10B Not present
    Connector 11B Not present
    Connector 12B Not present
    Connector 13B Not present
    Connector 14B Not present
    Connector 15B Not present
    Connector 16B Not present
    Connector 17B Not present
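
With 36 connectors listed, checking a single one by eye is error-prone. The sketch below turns the listlinkup lines into a connector-to-port mapping. It is written only against the two line shapes visible above ("Not present" and "Present <-> Switch Port N is up (Enabled)"); other states the switch may print are matched loosely and are an assumption. The name parse_listlinkup is hypothetical.

```python
import re

LINE_RE = re.compile(
    r"^Connector (?P<conn>\d+[AB]) "
    r"(?:Not present"
    r"|Present <-> Switch Port (?P<port>\d+) is (?P<state>\w+) \((?P<admin>[^)]*)\))$"
)


def parse_listlinkup(output: str) -> dict:
    """Map connector name to link details, or to None when no cable is present."""
    links = {}
    for line in output.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore prompts and anything that is not a connector line
        if m.group("port") is None:
            links[m.group("conn")] = None
        else:
            links[m.group("conn")] = {
                "port": int(m.group("port")),
                "state": m.group("state"),
                "admin": m.group("admin"),
            }
    return links


sample = """Connector 4A Not present
Connector 5A Present <-> Switch Port 30 is up (Enabled)
Connector 10A Not present
Connector 13A Present <-> Switch Port 09 is up (Enabled)"""
links = parse_listlinkup(sample)
print(links["5A"])
```

In this case the mapping confirms what the article observes: connector 5A is present and up on switch port 30, while 10A no longer has a cable.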

6. Clear the alert from the Fault Management Shell

Enable port 5 with the --automatic option:

    enableswitchport --automatic 5      # wrong syntax: the connector should be written as 5A
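
The mistake above is easy to make because the cable is usually described by its number alone. A small guard can catch it before the command is run. This sketch assumes connector names follow the pattern seen in the listlinkup output of this article: a number from 0 to 17 followed by A or B. The helper is_valid_connector is hypothetical and is not part of the switch CLI.

```python
import re

CONNECTOR_RE = re.compile(r"^(\d{1,2})([AB])$")


def is_valid_connector(name: str) -> bool:
    """Check that name looks like a listlinkup connector, e.g. 5A or 10B."""
    m = CONNECTOR_RE.match(name.strip())
    # Range 0..17 is taken from the listlinkup output in this article
    return bool(m) and 0 <= int(m.group(1)) <= 17


for candidate in ("5", "5A", "10A", "18A"):
    print(candidate, is_valid_connector(candidate))
```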

Check the faults again:

    spsh
    -> show faulty

If ILOM still reports the IB auto-link-disable fault, clear it from the Fault Management Shell.

Log in to ILOM:

    # spsh

Start a Fault Management session (CLI):

    -> start SP/faultmgmt/shell
    Are you sure you want to start SP/faultmgmt/shell (y/n)? y

    faultmgmtsp> fmadm faulty
    faultmgmtsp> fmadm repair cf425a70-59e4-6711-cb37-a48938f5e257

Verify that no faults remain:

    faultmgmtsp> fmadm faulty
    No problems found                      # no faults
    faultmgmtsp> exit

    -> show faulty    # empty
    -> exit

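7. Alert clears after enabling the ports with --automatic

Check the health status; errors are reported:

    [root@exasw-ibb01 ~]# showunhealthy
    WARNING Autodisabled ports
    FAILURE - 1 sensors NOT OK

Enable both connector 5A and connector 10A with --automatic:

    enableswitchport --automatic 5A
    enableswitchport --automatic 10A

Check the health status again; the errors are gone:

    [root@exasw-iba01 ~]# showunhealthy
    OK - No unhealthy sensors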

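8. Summary

Moving the cable to a different port left the link in an abnormal state. Further checks showed that auto link had been disabled on the port. After the errors were cleared and the ports were enabled again with --automatic, the errors disappeared.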
