
GBASE (南大通用) GBase 8a MPP Cluster Management 01: Cluster State Management

Original post by 飞天, 2024-06-24

GBase 8a MPP Cluster: product introduction, installation, and deployment

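GBase 8a cluster management tool: gcadmin

gcadmin is the tool that DBAs use to manage and monitor the cluster. It is installed together with the GBase 8a database and is located in the gcware/bin directory.
Check the version: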

[gbase@node1 ~]$ gcadmin -V
gcadmin 9.5 build 509af275af

Show the help information:

[gbase@node1 ~]$ gcadmin --help

 Usage: gcadmin <command> [arg1[, arg2...]]

 1.  gcadmin distribution <gcChangeInfo.xml> <p num> [d num] [extension] [pattern 1|2]
                          [db_user user_name] [db_pwd password] [dba_os_password password]       
                          [vc vc_name]                                                             : generate distribution, db_user and db_pwd shall input
                                                                                                     if database password changed and new distribution data nodes
                                                                                                     more than old distribution
 2.  gcadmin rmdistribution [ID] [vc vc_name]                                                      : remove distribution from vc
 3.  gcadmin addnodes gcChangeInfo.xml [vc_name | single_vc_add_to_rc]                             : add nodes to cluster or vc, parameter [single_vc_add_to_rc]                                                                                                                 used for single vc mode add nodes to root cluster
 4.  gcadmin rmnodes gcChangeInfo.xml [vc_name | single_vc_rm_to_rc]                               : remove nodes from cluster or vc, parameter [single_vc_rm_to_rc]                                                                                                             used for single vc mode remove nodes from default vc to root cluster
 5.  gcadmin showdistribution [node | f] [vc vc_name]                                              : show cluster distribution, or segments on nodes when
                                                                                                     use parameter [node],
                                                                                                     [vc vc_name] is unnecessary if only one vc
 6.  gcadmin switchmode <mode> [vc vc_name | coordinator]                                          : switch cluster mode, mode take value in
                                                                                                     [ normal | readonly | recovery ],
                                                                                                     [vc vc_name] is unnecessary if only one vc
 7.  gcadmin showlock [f]                                                                          : show current cluster lock information,
                                                                                                     include lock name, lock owner ip address, etc
 8.  gcadmin showddlevent [detail] [<tablename segname nodeip> | <tablename nodeip> | <max_fevent_num>]
                          [f] [vc vc_name]                                                         : show cluster ddl fail event,
                                                                                                     replicated table segname is [n0],
                                                                                                     [vc vc_name] is unnecessary if only one vc
 9.  gcadmin showdmlevent [detail] [<tablename segname nodeip> | <max_fevent_num>] [f] [vc vc_name]         : show current cluster dml fail event,                                                                                                                                        replicated table segname is [n0],
                                                                                                     [vc vc_name] is unnecessary if only one vc
 10. gcadmin showdmlstorageevent [detail] [[table_id segname nodeip] | <max_fevent_num>] [f] [vc vc_name]   : show current cluster dml storage fail event,
                                                                                                     replicated table segname is [n0],
                                                                                                     [vc vc_name] is unnecessary if only one vc
 11. gcadmin showcluster [c | vc vcname] [d] [g] [f] [nrt]                                         : show vc or cluster information, include all nodes,
                                                                                                     cluster state and cluster node information
 12. gcadmin getdistribution <ID> <distribution_info.xml> [vc vc_name]                             : get distribution information
 13. gcadmin setnodestate ip <state>                                                               : set one node state,state take value in: failure unavailable normal
 14. gcadmin showfailover [f]                                                                      : show failover information
 15. gcadmin showfailoverdetail <commitId> [xml_file_name]                                         : write failover information to file [xml_file_name]
 16. gcadmin createvc <create_vc.xml | e example_file_name>                                        : create virtual cluster
 17. gcadmin rmvc <vc_name>                                                                        : remove virtual cluster
 18. gcadmin importvc <import_vc.xml | e example_file_name>                                        : import vc_name corresponding vc to current vc
 19. gcadmin startvc <vc_name1 vc_name2 ...> <os_dba_user_name> <os_dba_password>                  : start virtual cluster
 20. gcadmin stopvc <vc_name1 vc_name2 ...> <os_dba_user_name> <os_dba_password>                   : stop virtual cluster
 21. gcadmin renamevc <old_vc_name> <new_vc_name>                                                  : rename virtual cluster
 22. gcadmin rmfeventlog ip                                                                        : remove all feventlog about ip
 23. gcadmin --help                                                                                : show help info
 24. gcadmin -V,--version                                                                          : show version info

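Cluster state management

Viewing cluster information

Running gcadmin with no arguments shows the cluster state, the virtual cluster mode, and information about the gcware nodes, the gcluster nodes, and the gnode nodes.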
[gbase@node1 ~]$ gcadmin
CLUSTER STATE:         ACTIVE
VIRTUAL CLUSTER MODE:  NORMAL

======================================
|  GBASE GCWARE CLUSTER INFORMATION  |
======================================
| NodeName |   IpAddress    | gcware |
--------------------------------------
| gcware1  | 192.168.100.10 |  OPEN  |
--------------------------------------
| gcware2  | 192.168.100.12 |  OPEN  |
--------------------------------------
| gcware3  | 192.168.100.14 |  OPEN  |
--------------------------------------
========================================================
|        GBASE COORDINATOR CLUSTER INFORMATION         |
========================================================
|   NodeName   |   IpAddress    | gcluster | DataState |
--------------------------------------------------------
| coordinator1 | 192.168.100.10 |   OPEN   |     0     |
--------------------------------------------------------
| coordinator2 | 192.168.100.12 |   OPEN   |     0     |
--------------------------------------------------------
| coordinator3 | 192.168.100.14 |   OPEN   |     0     |
--------------------------------------------------------
=========================================================================================================
|                                    GBASE DATA CLUSTER INFORMATION                                     |
=========================================================================================================
| NodeName |                IpAddress                 | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------------------------------------
|  node1   |              192.168.100.12              |       1        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node2   |              192.168.100.10              |       1        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node3   |              192.168.100.14              |       1        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------

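# The bare gcadmin command is shorthand for gcadmin showcluster. It prints the same node tables as gcadmin showcluster c d g.
Parameters:
c: show only gcluster nodes;
d: show only gnode nodes;
g: show only gcware nodes.

# Show information for all nodes in the cluster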
[gbase@node1 ~]$ gcadmin showcluster c d g
CLUSTER STATE:         ACTIVE
CLUSTER MODE:          NORMAL

======================================
|  GBASE GCWARE CLUSTER INFORMATION  |
======================================
| NodeName |   IpAddress    | gcware |
--------------------------------------
| gcware1  | 192.168.100.10 |  OPEN  |
--------------------------------------
| gcware2  | 192.168.100.12 |  OPEN  |
--------------------------------------
| gcware3  | 192.168.100.14 |  OPEN  |
--------------------------------------
========================================================
|        GBASE COORDINATOR CLUSTER INFORMATION         |
========================================================
|   NodeName   |   IpAddress    | gcluster | DataState |
--------------------------------------------------------
| coordinator1 | 192.168.100.10 |   OPEN   |     0     |
--------------------------------------------------------
| coordinator2 | 192.168.100.12 |   OPEN   |     0     |
--------------------------------------------------------
| coordinator3 | 192.168.100.14 |   OPEN   |     0     |
--------------------------------------------------------
=========================================================================================================
|                                    GBASE DATA CLUSTER INFORMATION                                     |
=========================================================================================================
| NodeName |                IpAddress                 | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------------------------------------
|  node1   |              192.168.100.12              |       1        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node2   |              192.168.100.10              |       1        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node3   |              192.168.100.14              |       1        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
# Show gcluster node information
[gbase@node1 ~]$ gcadmin showcluster c 
CLUSTER STATE:         ACTIVE
CLUSTER MODE:          NORMAL

========================================================
|        GBASE COORDINATOR CLUSTER INFORMATION         |
========================================================
|   NodeName   |   IpAddress    | gcluster | DataState |
--------------------------------------------------------
| coordinator1 | 192.168.100.10 |   OPEN   |     0     |
--------------------------------------------------------
| coordinator2 | 192.168.100.12 |   OPEN   |     0     |
--------------------------------------------------------
| coordinator3 | 192.168.100.14 |   OPEN   |     0     |
--------------------------------------------------------

# Show gnode node information
[gbase@node1 ~]$ gcadmin showcluster d
CLUSTER STATE:         ACTIVE
VIRTUAL CLUSTER MODE:  NORMAL

=========================================================================================================
|                                    GBASE DATA CLUSTER INFORMATION                                     |
=========================================================================================================
| NodeName |                IpAddress                 | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------------------------------------
|  node1   |              192.168.100.12              |       1        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node2   |              192.168.100.10              |       1        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------
|  node3   |              192.168.100.14              |       1        | OPEN  |    OPEN    |     0     |
---------------------------------------------------------------------------------------------------------

# Show gcware node information
[gbase@node1 ~]$ gcadmin showcluster g
CLUSTER STATE:         ACTIVE
VIRTUAL CLUSTER MODE:  NORMAL

======================================
|  GBASE GCWARE CLUSTER INFORMATION  |
======================================
| NodeName |   IpAddress    | gcware |
--------------------------------------
| gcware1  | 192.168.100.10 |  OPEN  |
--------------------------------------
| gcware2  | 192.168.100.12 |  OPEN  |
--------------------------------------
| gcware3  | 192.168.100.14 |  OPEN  |
--------------------------------------
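
The tables printed by gcadmin showcluster are regular enough to read from a script. The following is a minimal Python sketch, not part of GBase 8a: the function name parse_showcluster and the SAMPLE text (abridged from the output above) are illustrative assumptions, and the parsing assumes the exact layout shown in this article.

```python
SAMPLE = """\
CLUSTER STATE:         ACTIVE
VIRTUAL CLUSTER MODE:  NORMAL

======================================
|  GBASE GCWARE CLUSTER INFORMATION  |
======================================
| NodeName |   IpAddress    | gcware |
--------------------------------------
| gcware1  | 192.168.100.10 |  OPEN  |
--------------------------------------
| gcware2  | 192.168.100.12 |  OPEN  |
--------------------------------------
"""


def parse_showcluster(text):
    """Split showcluster-style text into a header dict and a list of row dicts.

    Assumption: every table consists of a one-cell title row, then a
    column-header row, then data rows, as in the article's sample output.
    """
    header = {}
    rows = []
    columns = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            # Summary lines look like "CLUSTER STATE:         ACTIVE".
            # Border lines ("====", "----") contain no colon and are skipped.
            if ":" in line:
                key, value = line.split(":", 1)
                header[key.strip()] = value.strip()
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) == 1:
            columns = None      # table title: the next row names the columns
        elif columns is None:
            columns = cells     # column-header row
        else:
            rows.append(dict(zip(columns, cells)))
    return header, rows


header, rows = parse_showcluster(SAMPLE)
```

On a real node the text could be captured with subprocess.run(["gcadmin", "showcluster"], capture_output=True, text=True).stdout and passed to the same function.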

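Checking cluster status

1. CLUSTER STATE: the working state of the cluster. It takes one of two values: ACTIVE or shrinkOnly.

  • ACTIVE: the cluster is working normally.
  • shrinkOnly: the cluster's data volume has reached the capacity limit. Once the limit is reached, the cluster state is set to shrinkOnly, which blocks insert and load operations. Users can free space with operations such as drop table and then run show license, which refreshes the capacity figure and updates the cluster state.
    Note: when the number of online gcware nodes is less than or equal to half of the total number of gcware nodes, the gcware cluster cannot elect a Leader. In that case gcadmin cannot display the cluster state, and the database cannot be connected to or used.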
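
The note on gcware amounts to a strict-majority rule. A small Python sketch of that arithmetic follows; the function name is an illustrative assumption, and the rule is taken only from the statement above, not from GBase 8a source code.

```python
def gcware_can_elect_leader(online, total):
    """Return True when strictly more than half of the gcware nodes are online.

    The article states that election fails when online <= total / 2,
    so a Leader can be elected only when 2 * online > total.
    """
    return 2 * online > total


# With the three gcware nodes used in this article:
three_node_cases = {n: gcware_can_elect_leader(n, 3) for n in range(4)}
```

With three gcware nodes, losing one still leaves a majority, while losing two does not. With four nodes, two online nodes are exactly half and therefore not enough.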


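2. VIRTUAL CLUSTER MODE: the cluster mode. It takes one of three values: normal, readonly, or recovery.

  • Normal mode: the regular mode, in which all SQL operations can be executed.
  • Readonly mode: only SQL queries can be executed; DDL, DML, and Loader operations are rejected.
    The cluster stays in readonly mode for a period of time during scale-out, node replacement, or data backup.
  • Recovery mode: the backup and restore mode, generally used when restoring cluster data or in other specific scenarios.
    No SQL operations are allowed in this mode.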
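
The three modes can be summarized as a lookup table. This Python sketch only restates the list above; the operation category names ("query", "ddl", "dml", "load") are labels chosen for illustration, not GBase 8a identifiers.

```python
# Operation categories each mode permits, as described in the list above.
ALLOWED_OPERATIONS = {
    "NORMAL": {"query", "ddl", "dml", "load"},
    "READONLY": {"query"},
    "RECOVERY": set(),
}


def is_allowed(mode, operation):
    """Return True if the given operation category may run in the given mode."""
    return operation in ALLOWED_OPERATIONS[mode.upper()]


readonly_allows_load = is_allowed("readonly", "load")
```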

Switching the cluster mode:

[gbase@node1 ~]$ gcadmin switchmode readonly

========== switch cluster mode...
  switch pre mode:                 [NORMAL]
  switch mode to                   [READONLY]
  switch after mode:                 [READONLY]
[gbase@node1 ~]$ gcadmin
CLUSTER STATE:         ACTIVE
VIRTUAL CLUSTER MODE:  READONLY

======================================
|  GBASE GCWARE CLUSTER INFORMATION  |
======================================
| NodeName |   IpAddress    | gcware |
--------------------------------------
| gcware1  | 192.168.100.10 |  OPEN  |
--------------------------------------
| gcware2  | 192.168.100.12 |  OPEN  |
--------------------------------------
| gcware3  | 192.168.100.14 |  OPEN  |
--------------------------------------
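
A script that switches modes may want to confirm that the switch took effect. The sketch below parses the three "switch ... mode" lines printed by gcadmin switchmode; it assumes the wording shown in the transcript above, and the function names are illustrative.

```python
import re

SAMPLE = """\
========== switch cluster mode...
  switch pre mode:                 [NORMAL]
  switch mode to                   [READONLY]
  switch after mode:                 [READONLY]
"""


def parse_switchmode(text):
    """Return a dict with the keys 'pre mode', 'mode to', 'after mode' where present."""
    pairs = re.findall(r"switch (pre mode|mode to|after mode):?\s*\[(\w+)\]", text)
    return dict(pairs)


def switch_succeeded(text):
    """True when the mode reported after the switch equals the requested mode."""
    modes = parse_switchmode(text)
    return "mode to" in modes and modes.get("after mode") == modes["mode to"]


result = parse_switchmode(SAMPLE)
```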

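3. Module process state: takes one of three values: open, close, or offline.
The module process state is used to monitor the processes of the cluster's key functional components.

  • Open:
    The module is working normally.
  • Offline:
    The module process is offline, usually because of a hardware fault. Check whether the machine lost power or network connectivity. After fixing the fault, restart the affected processes.
  • Close:
    The module process failed to start or was closed unexpectedly. Common causes include a port that is already in use and wrong permissions or parameters in a configuration file. Check the relevant logs to find the cause, then restart the affected processes.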
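
The state descriptions above translate directly into a first-pass triage routine. The Python sketch below works on rows shaped like those in the gcadmin tables (one dict per node); the function name, the hint wording, and the sample row with a CLOSE state are illustrative assumptions, not real output.

```python
# Follow-up suggested by the article for each abnormal state.
HINTS = {
    "OFFLINE": "check for power or network failure, then restart the process",
    "CLOSE": "check the logs (port in use, config permissions or parameters), then restart",
}

# Columns of the gcadmin tables that carry a module process state.
PROCESS_COLUMNS = ("gcware", "gcluster", "gnode", "syncserver")


def triage(rows):
    """Return (node, module, state, hint) for every module that is not OPEN."""
    findings = []
    for row in rows:
        for column in PROCESS_COLUMNS:
            state = row.get(column, "OPEN").upper()
            if state != "OPEN":
                hint = HINTS.get(state, "unrecognized state")
                findings.append((row["NodeName"], column, state, hint))
    return findings


# Hypothetical rows: node2's syncserver is shown as CLOSE for illustration only.
rows = [
    {"NodeName": "node1", "gnode": "OPEN", "syncserver": "OPEN"},
    {"NodeName": "node2", "gnode": "OPEN", "syncserver": "CLOSE"},
]
findings = triage(rows)
```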

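Module process types:

  • GCware node:
    gcware process name: gcware. The gcware process is the main program of the gcware module and is responsible for sharing information among the gcluster instances on the nodes.
  • Coordinator node:
    gcluster process name: gclusterd. The gclusterd process is the main program of the gcluster module and is responsible for SQL parsing and optimization, execution plan generation, and execution scheduling.
    Auto-recovery process name: gcrecover. The gcrecover process is the main program that restores data after it becomes inconsistent.
  • Data node:
    gnode process name: gbased. The gbased process is the main program of the gnode module and is responsible for the node's actual storage and for executing the decomposed SQL.
    syncserver process name: gc_sync_server. The gc_sync_server process is the main program that assists with data synchronization after gnode data becomes inconsistent.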
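System auto-recovery:
gcrecover restores DDL operations first and then calls the synchronization service gc_sync_server to restore the data. After recovery, the system automatically resets the data consistency state (DataState) from 1 to 0.
How auto-recovery works:
When a command fails on a node, the recovery tool detects the error log and calls the synchronization tool to repair the inconsistent data on that node automatically, which keeps the data consistent across nodes.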

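Module monitoring tools:

A module monitoring tool is a watchdog that reports the state of module processes and protects them: when a monitored module process closes unexpectedly, the tool automatically tries to restart it so that cluster services keep running.

Monitoring tool   Runs on               Monitored module
gcware_monit      gcware node           gcware
gcware_monit      gcware node           gcware_mmonit
gcware_mmonit     gcware node           gcware_monit
gcmonit           gcluster node         gcluster
gcmonit           gcluster node         gcrecover
gcmonit           gnode node            gbase_<node IP>
gcmonit           gnode node            syncserver_<node IP>
gcmonit           gcluster/gnode node   gcmmonit
gcmmonit          gcluster/gnode node   gcmonit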
Using the module monitoring tools:

Syntax: gcware_monit <--start|--stop|--restart|--status[=<prog_name>]|--help|--version>
[gbase@node1 ~]$ gcware_monit --status
+-----------------------------------------------------------------------------------------------------------------------------------+
|SEG_NAME                                 PROG_NAME                                                   STATUS              PID       |
+-----------------------------------------------------------------------------------------------------------------------------------+
|gcware                                   gcware                                                      Running             11999     |
|gcware_mmonit                            gcware_mmonit                                               Running             12861     |
+-----------------------------------------------------------------------------------------------------------------------------------+

Syntax: gcware_mmonit <--start|--stop|--restart|--help|--version>
Syntax: gcmonit <--start|--stop|--restart|--status[=<prog_name>]|--help|--version>
[gbase@node1 ~]$ gcmonit --status
+-----------------------------------------------------------------------------------------------------------------------------------+
|SEG_NAME                                 PROG_NAME                                                   STATUS              PID       |
+-----------------------------------------------------------------------------------------------------------------------------------+
|gcluster                                 gclusterd                                                   Running             20444     |
|gcrecover                                gcrecover                                                   Running             21000     |
|gbase_192.168.100.10                     /data/192.168.100.10/gnode/server/bin/gbased                Running             19387     |
|syncserver_192.168.100.10                /data/192.168.100.10/gnode/server/bin/gc_sync_server        Running             20429     |
|gcmmonit                                 gcmmonit                                                    Running             20439     |
+-----------------------------------------------------------------------------------------------------------------------------------+
Syntax: gcmmonit <--start|--stop|--restart|--help|--version>
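
The --status output of gcware_monit and gcmonit can be checked from a script. The following Python sketch parses the four whitespace-separated columns; the sample is abridged from the gcmonit output above with the column padding shortened, the function names are illustrative, and treating anything other than "Running" as a problem is an assumption because the article shows no other status value.

```python
SAMPLE = """\
+---------------------------------------------------------------+
|SEG_NAME          PROG_NAME                    STATUS   PID    |
+---------------------------------------------------------------+
|gcluster          gclusterd                    Running  20444  |
|gcrecover         gcrecover                    Running  21000  |
|gbase_192.168.100.10  /data/192.168.100.10/gnode/server/bin/gbased  Running  19387  |
|syncserver_192.168.100.10  /data/192.168.100.10/gnode/server/bin/gc_sync_server  Running  20429  |
|gcmmonit          gcmmonit                     Running  20439  |
+---------------------------------------------------------------+
"""


def parse_monit_status(text):
    """Return one dict per monitored program from --status style output."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip "+-----+" borders and blank lines
        parts = line.strip("|").split()
        if len(parts) < 4 or parts[0] == "SEG_NAME":
            continue  # skip the column-header row
        rows.append({"seg": parts[0], "prog": parts[1],
                     "status": parts[-2], "pid": parts[-1]})
    return rows


def not_running(rows):
    """Return the segment names whose status is not 'Running'."""
    return [row["seg"] for row in rows if row["status"] != "Running"]


rows = parse_monit_status(SAMPLE)
```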

Module start/stop tools

gcware_services is the start/stop tool for gcware-related processes. Its all argument covers the gcware, gcware_monit, and gcware_mmonit processes.

Syntax: gcware_services <gcware|all> <start|stop [--force]|restart [--force]|info>
[gbase@node1 ~]$ gcware_services all restart
Stopping GCWareMonit success!
Stopping gcware :                                          [  OK  ]
Starting gcware :                                          [  OK  ]
Starting GCWareMonit success!

gcluster_services is the start/stop tool for the processes on gcluster nodes and gnode nodes. Its all argument also covers the gcmonit and gcmmonit processes.

Syntax: gcluster_services <gcluster|gcrecover|gbase|syncserver|gbase_ip|syncserver_ip|all> <start|stop [--force]|restart [--force]|info>
[gbase@node1 ~]$ gcluster_services all info
/data/192.168.100.10/gcluster/server/bin/gclusterd is running
/data/192.168.100.10/gcluster/server/bin/gcrecover is running
/data/192.168.100.10/gnode/server/bin/gbased is running
/data/192.168.100.10/gnode/server/bin/gc_sync_server is running
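
The info output is one "<path> is <state>" line per process, which makes it easy to turn into a lookup. The Python sketch below does that for the output shown above; the function name is illustrative, and only the "running" wording appears in the article, so any other state string is simply passed through.

```python
import posixpath

SAMPLE = """\
/data/192.168.100.10/gcluster/server/bin/gclusterd is running
/data/192.168.100.10/gcluster/server/bin/gcrecover is running
/data/192.168.100.10/gnode/server/bin/gbased is running
/data/192.168.100.10/gnode/server/bin/gc_sync_server is running
"""


def parse_services_info(text):
    """Map each process name (the basename of its path) to its reported state."""
    status = {}
    for line in text.splitlines():
        path, separator, state = line.strip().partition(" is ")
        if separator:  # ignore lines that do not match "<path> is <state>"
            status[posixpath.basename(path)] = state
    return status


status = parse_services_info(SAMPLE)
```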

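4. Data consistency state: takes one of two values: 0 or 1.
• 0: the primary and replica shards are consistent.
• 1: the primary and replica shards are inconsistent.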
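
In the gcadmin tables this value appears in the DataState column, so inconsistent nodes can be listed with a simple filter. The Python sketch below assumes rows shaped like the table rows (one dict per node, values as strings); the function name is illustrative and the row with DataState 1 is made up for the example.

```python
def inconsistent_nodes(rows):
    """Return the names of nodes whose DataState column is 1."""
    return [row["NodeName"] for row in rows if row.get("DataState") == "1"]


# Hypothetical rows: node2 is marked inconsistent for illustration only.
rows = [
    {"NodeName": "node1", "IpAddress": "192.168.100.12", "DataState": "0"},
    {"NodeName": "node2", "IpAddress": "192.168.100.10", "DataState": "1"},
    {"NodeName": "node3", "IpAddress": "192.168.100.14", "DataState": "0"},
]
needs_recovery = inconsistent_nodes(rows)
```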

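Cluster synchronization log management

DDL event log

When a DDL statement succeeds overall, this log records the nodes on which an exception during execution left the primary and replica shards inconsistent.

Syntax: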
gcadmin showddlevent [detail] [<tablename segname nodeip> | <tablename nodeip> | <max_fevent_num>] [f] [vc vc_name]
[gbase@node1 ~]$ gcadmin showddlevent 
Vc event count:0
[gbase@node1 ~]$ 

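DML event log

When a DML statement succeeds overall, this log records the nodes on which an exception left the primary and replica shards inconsistent.

Syntax: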
gcadmin showdmlevent [detail] [<tablename segname nodeip> | <max_fevent_num>] [f] [vc vc_name] 
[gbase@node1 ~]$ gcadmin showdmlevent
Vc event count:0
[gbase@node1 ~]$ 

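DML storage event log

When the metadata on a failed node is corrupted and the data cannot be recovered automatically from the DML event log, the failure is recorded in the DML storage event log and a metadata rebuild is attempted.

Syntax: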
gcadmin showdmlstorageevent [detail] [[table_id segname nodeip] | <max_fevent_num>] [f] [vc vc_name] 
[gbase@node1 ~]$ gcadmin showdmlstorageevent
Vc event count:0
[gbase@node1 ~]$ 
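
All three show...event commands above report a "Vc event count" line, which is a convenient signal for monitoring: a non-zero count means there are pending fail events. A minimal Python sketch, assuming the line format shown in the transcripts; the function name is illustrative.

```python
import re


def vc_event_count(text):
    """Return the integer after 'Vc event count:', or None if the line is absent."""
    match = re.search(r"Vc event count:\s*(\d+)", text)
    return int(match.group(1)) if match else None


count = vc_event_count("Vc event count:0")
```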

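Failover log

The failover log safeguards consistency during SQL execution. When the coordinator (management) node executing a SQL statement fails and cannot complete the operation, the node that takes over reads the failover log recorded in gcware and rolls the operation back, so that data stays consistent across nodes.

Syntax:
gcadmin showfailover [f]                                : show failover information
gcadmin showfailoverdetail <commitId> [xml_file_name]   : write failover information to file [xml_file_name]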
[gbase@node1 ~]$ gcadmin showfailover

gcadmin showfailover: no gcluster failover information now

Summary

This article covered cluster state management in GBase 8a. The next part will cover managing cluster distribution information.

Last modified: 2024-06-24 22:36:06