[PanWeiDB] Cluster Status Abnormal Because the Firewall Was Not Disabled

Original · Darcy · 2024-09-08

Problem

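  • After deployment the cluster state was abnormal: the CM server on each of the three nodes reported itself as Standby and the other two as Down, and the DN (datanode) state was abnormal as well.
  • Node 1: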
[omm@xxx cm_server]$ gs_om -t status --detail
[  CMServer State   ]

node        node_ip         instance                                      state
---------------------------------------------------------------------------------
1  xxx xxx   1    /panweidb/database/panweidb/cm/cm_server Standby
2  xxx xxx   2    /panweidb/database/panweidb/cm/cm_server Down
3  xxx xxx   3    /panweidb/database/panweidb/cm/cm_server Down

cm_ctl: can't connect to cm_server.
Maybe cm_server is not running, or timeout expired. Please try again.

[omm@xxx cm_server]$ gs_om -t query
[   Cluster State   ]

cluster_state   : Unavailable
redistributing  : No
current_az      : AZ_ALL

[  Datanode State   ]

    node    node_ip         port      instance     state
------------------------------------------------------------------------
1  xxx xxx   15400      6001       P Pending Need repair(Disconnected)
2  xxx xxx   15400      6002       S Pending Need repair(Disconnected)
3  xxx xxx   15400      6003       S Pending Need repair(Disconnected)
  • Node 2:
[omm@xxx ~]$ gs_om -t status --detail
[  CMServer State   ]

node        node_ip         instance                                      state
---------------------------------------------------------------------------------
1  xxx xxx   1    /panweidb/database/panweidb/cm/cm_server Down
2  xxx xxx   2    /panweidb/database/panweidb/cm/cm_server Standby
3  xxx xxx   3    /panweidb/database/panweidb/cm/cm_server Down

cm_ctl: can't connect to cm_server.
Maybe cm_server is not running, or timeout expired. Please try again.
  • Node 3:
[omm@xxx ~]$ gs_om -t status --detail
[  CMServer State   ]

node        node_ip         instance                                      state
---------------------------------------------------------------------------------
1  xxx xxx   1    /panweidb/database/panweidb/cm/cm_server Down
2  xxx xxx   2    /panweidb/database/panweidb/cm/cm_server Down
3  xxx xxx   3    /panweidb/database/panweidb/cm/cm_server Standby

cm_ctl: can't connect to cm_server.
Maybe cm_server is not running, or timeout expired. Please try again.

Analysis

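  • At first, broken ssh or pssh mutual trust was suspected, but a check showed it was working normally.
  • After stopping the node with cm_ctl stop -n 1 and starting the database with gs_ctl start, the database state was normal, which suggested the database instance itself was healthy.
  • A primary-election problem was suspected next. The DCC-related logs were checked, and dcc.dlog contained the following error: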
UTC+8 2024-09-02 11:39:27.614|DCF|8703|ERROR>[MEC]cs_connect fail,peer_url=xxx:15001, err code 501, err msg Failed to establish tcp connection to [xxx]:[15001], errno 113. [/root/component/dcf/DCF/src/network/mec/mec_func.c:463]
  • Checking the firewall status showed that the firewall had not been turned off.
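
For reference, errno 113 on Linux is EHOSTUNREACH ("No route to host"), which a client commonly sees when a host firewall rejects the connection rather than the port simply being closed. The sketch below pulls the errno out of a dcc.dlog line and probes the DCF port from one node to its peers. The function names, the PEERS variable, and the 3-second wait are illustrative assumptions and not part of PanWeiDB; 15001 is the port shown in the log line above.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and PEERS are illustrative, not PanWeiDB tooling.

# Read log lines on stdin and print the number from each "errno N." field.
extract_errno() {
    sed -n 's/.*errno \([0-9][0-9]*\)\..*/\1/p'
}

# Try a TCP connect through bash's /dev/tcp redirection.
# Exit status 0 means the port accepted the connection.
probe_port() {
    local host="$1" port="$2" wait_s="${3:-3}"
    timeout "$wait_s" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# PEERS: space-separated IPs of the other nodes (assumption: set by the caller).
PEERS="${PEERS:-}"
DCF_PORT="${DCF_PORT:-15001}"   # port taken from the dcc.dlog error

for peer in $PEERS; do
    if probe_port "$peer" "$DCF_PORT"; then
        echo "$peer:$DCF_PORT reachable"
    else
        echo "$peer:$DCF_PORT NOT reachable (check firewall rules on $peer)"
    fi
done
```

Saved under a hypothetical name such as probe.sh, it could be run on each node as PEERS="<ip-of-node-2> <ip-of-node-3>" bash probe.sh. A failed probe alone does not prove a firewall is responsible, but together with errno 113 in dcc.dlog it makes the firewall a reasonable next thing to check.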

Solution

  • After the firewall was turned off on all three nodes, the cluster returned to normal.
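
The post does not say which firewall service the nodes run. As a minimal sketch, assuming firewalld managed by systemd (the default on many RHEL-family distributions), the fix on each node could look like the following. The run wrapper and the DRY_RUN switch are illustrative additions so that the commands can be reviewed before they are executed as root.

```shell
#!/usr/bin/env bash
# Sketch only: assumes firewalld under systemd; run as root on every node.

# Print the command when DRY_RUN=1 (the default), execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

disable_firewall() {
    run systemctl stop firewalld      # stops it now; an enabled service still starts at boot
    run systemctl disable firewalld   # keeps it from starting at boot
}

disable_firewall
```

If security policy requires the firewall to stay on, the alternative is to open the cluster's ports between the nodes instead; the full port list depends on the deployment and is not given in the post.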

Cause

  • The suspected cause is that the firewall was not permanently disabled during deployment.
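
One way to test this suspicion is to check not only whether the firewall is running but also whether it is enabled at boot, since a service that was only stopped during deployment starts again after a reboot. A minimal sketch, again assuming firewalld under systemd; firewall_verdict is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: assumes firewalld under systemd.

# Classify the firewall from the answers of
# "systemctl is-active" ($1) and "systemctl is-enabled" ($2).
firewall_verdict() {
    local active="$1" enabled="$2"
    if [ "$active" = "active" ]; then
        echo "running"
    elif [ "$enabled" = "enabled" ]; then
        echo "stopped but enabled at boot"
    else
        echo "stopped and not enabled"
    fi
}

# Only query systemd when systemctl is present on this host.
if command -v systemctl >/dev/null 2>&1; then
    firewall_verdict "$(systemctl is-active firewalld 2>/dev/null)" \
                     "$(systemctl is-enabled firewalld 2>/dev/null)"
fi
```

A result of "stopped but enabled at boot" on any node would match the scenario described here: the firewall was turned off by hand at some point and came back later.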
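[Copyright notice] This article is original content by a 墨天轮 (modb.pro) user. Reposts must state the article's source (墨天轮), the article link, the author, and other basic information; otherwise the author and 墨天轮 reserve the right to pursue liability. If you find content on 墨天轮 that appears to be plagiarized or infringing, please report it by email to contact@modb.pro with supporting evidence; once verified, 墨天轮 will remove the content immediately.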
