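GBase 8a cluster management tool: gcadmin
gcadmin is a tool that lets DBAs manage and monitor the cluster. It is installed together with the GBase 8a database and lives in the gcware/bin directory.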

Check the version:
[gbase@node1 ~]$ gcadmin -V
gcadmin 9.5 build 509af275af
View the help text:
[gbase@node1 ~]$ gcadmin --help
Usage: gcadmin <command> [arg1[, arg2...]]
1. gcadmin distribution <gcChangeInfo.xml> <p num> [d num] [extension] [pattern 1|2]
[db_user user_name] [db_pwd password] [dba_os_password password]
[vc vc_name] : generate distribution, db_user and db_pwd shall input
if database password changed and new distribution data nodes
more than old distribution
2. gcadmin rmdistribution [ID] [vc vc_name] : remove distribution from vc
3. gcadmin addnodes gcChangeInfo.xml [vc_name | single_vc_add_to_rc] : add nodes to cluster or vc, parameter [single_vc_add_to_rc] used for single vc mode add nodes to root cluster
4. gcadmin rmnodes gcChangeInfo.xml [vc_name | single_vc_rm_to_rc] : remove nodes from cluster or vc, parameter [single_vc_rm_to_rc] used for single vc mode remove nodes from default vc to root cluster
5. gcadmin showdistribution [node | f] [vc vc_name] : show cluster distribution, or segments on nodes when
use parameter [node],
[vc vc_name] is unnecessary if only one vc
6. gcadmin switchmode <mode> [vc vc_name | coordinator] : switch cluster mode, mode take value in
[ normal | readonly | recovery ],
[vc vc_name] is unnecessary if only one vc
7. gcadmin showlock [f] : show current cluster lock information,
include lock name, lock owner ip address, etc
8. gcadmin showddlevent [detail] [<tablename segname nodeip> | <tablename nodeip> | <max_fevent_num>]
[f] [vc vc_name] : show cluster ddl fail event,
replicated table segname is [n0],
[vc vc_name] is unnecessary if only one vc
9. gcadmin showdmlevent [detail] [<tablename segname nodeip> | <max_fevent_num>] [f] [vc vc_name] : show current cluster dml fail event, replicated table segname is [n0],
[vc vc_name] is unnecessary if only one vc
10. gcadmin showdmlstorageevent [detail] [[table_id segname nodeip] | <max_fevent_num>] [f] [vc vc_name] : show current cluster dml storage fail event,
replicated table segname is [n0],
[vc vc_name] is unnecessary if only one vc
11. gcadmin showcluster [c | vc vcname] [d] [g] [f] [nrt] : show vc or cluster information, include all nodes,
cluster state and cluster node information
12. gcadmin getdistribution <ID> <distribution_info.xml> [vc vc_name] : get distribution information
13. gcadmin setnodestate ip <state> : set one node state,state take value in: failure unavailable normal
14. gcadmin showfailover [f] : show failover information
15. gcadmin showfailoverdetail <commitId> [xml_file_name] : write failover information to file [xml_file_name]
16. gcadmin createvc <create_vc.xml | e example_file_name> : create virtual cluster
17. gcadmin rmvc <vc_name> : remove virtual cluster
18. gcadmin importvc <import_vc.xml | e example_file_name> : import vc_name corresponding vc to current vc
19. gcadmin startvc <vc_name1 vc_name2 ...> <os_dba_user_name> <os_dba_password> : start virtual cluster
20. gcadmin stopvc <vc_name1 vc_name2 ...> <os_dba_user_name> <os_dba_password> : stop virtual cluster
21. gcadmin renamevc <old_vc_name> <new_vc_name> : rename virtual cluster
22. gcadmin rmfeventlog ip : remove all feventlog about ip
23. gcadmin --help : show help info
24. gcadmin -V,--version : show version info
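Cluster state management
Viewing cluster information
Running gcadmin with no arguments shows the cluster state, the virtual cluster mode, and the gcware, gcluster, and gnode node information.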
[gbase@node1 ~]$ gcadmin
CLUSTER STATE: ACTIVE
VIRTUAL CLUSTER MODE: NORMAL
======================================
| GBASE GCWARE CLUSTER INFORMATION |
======================================
| NodeName | IpAddress | gcware |
--------------------------------------
| gcware1 | 192.168.100.10 | OPEN |
--------------------------------------
| gcware2 | 192.168.100.12 | OPEN |
--------------------------------------
| gcware3 | 192.168.100.14 | OPEN |
--------------------------------------
========================================================
| GBASE COORDINATOR CLUSTER INFORMATION |
========================================================
| NodeName | IpAddress | gcluster | DataState |
--------------------------------------------------------
| coordinator1 | 192.168.100.10 | OPEN | 0 |
--------------------------------------------------------
| coordinator2 | 192.168.100.12 | OPEN | 0 |
--------------------------------------------------------
| coordinator3 | 192.168.100.14 | OPEN | 0 |
--------------------------------------------------------
=========================================================================================================
| GBASE DATA CLUSTER INFORMATION |
=========================================================================================================
| NodeName | IpAddress | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------------------------------------
| node1 | 192.168.100.12 | 1 | OPEN | OPEN | 0 |
---------------------------------------------------------------------------------------------------------
| node2 | 192.168.100.10 | 1 | OPEN | OPEN | 0 |
---------------------------------------------------------------------------------------------------------
| node3 | 192.168.100.14 | 1 | OPEN | OPEN | 0 |
---------------------------------------------------------------------------------------------------------
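# The bare gcadmin command is shorthand for gcadmin showcluster. It shows the same node information as gcadmin showcluster c d g.
Parameters:
c: list only the gcluster nodes;
d: list only the gnode nodes;
g: list only the gcware nodes.
# Show information for all cluster nodes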
[gbase@node1 ~]$ gcadmin showcluster c d g
CLUSTER STATE: ACTIVE
CLUSTER MODE: NORMAL
======================================
| GBASE GCWARE CLUSTER INFORMATION |
======================================
| NodeName | IpAddress | gcware |
--------------------------------------
| gcware1 | 192.168.100.10 | OPEN |
--------------------------------------
| gcware2 | 192.168.100.12 | OPEN |
--------------------------------------
| gcware3 | 192.168.100.14 | OPEN |
--------------------------------------
========================================================
| GBASE COORDINATOR CLUSTER INFORMATION |
========================================================
| NodeName | IpAddress | gcluster | DataState |
--------------------------------------------------------
| coordinator1 | 192.168.100.10 | OPEN | 0 |
--------------------------------------------------------
| coordinator2 | 192.168.100.12 | OPEN | 0 |
--------------------------------------------------------
| coordinator3 | 192.168.100.14 | OPEN | 0 |
--------------------------------------------------------
=========================================================================================================
| GBASE DATA CLUSTER INFORMATION |
=========================================================================================================
| NodeName | IpAddress | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------------------------------------
| node1 | 192.168.100.12 | 1 | OPEN | OPEN | 0 |
---------------------------------------------------------------------------------------------------------
| node2 | 192.168.100.10 | 1 | OPEN | OPEN | 0 |
---------------------------------------------------------------------------------------------------------
| node3 | 192.168.100.14 | 1 | OPEN | OPEN | 0 |
---------------------------------------------------------------------------------------------------------
# Show the cluster's gcluster node information
[gbase@node1 ~]$ gcadmin showcluster c
CLUSTER STATE: ACTIVE
CLUSTER MODE: NORMAL
========================================================
| GBASE COORDINATOR CLUSTER INFORMATION |
========================================================
| NodeName | IpAddress | gcluster | DataState |
--------------------------------------------------------
| coordinator1 | 192.168.100.10 | OPEN | 0 |
--------------------------------------------------------
| coordinator2 | 192.168.100.12 | OPEN | 0 |
--------------------------------------------------------
| coordinator3 | 192.168.100.14 | OPEN | 0 |
--------------------------------------------------------
# Show the cluster's gnode node information
[gbase@node1 ~]$ gcadmin showcluster d
CLUSTER STATE: ACTIVE
VIRTUAL CLUSTER MODE: NORMAL
=========================================================================================================
| GBASE DATA CLUSTER INFORMATION |
=========================================================================================================
| NodeName | IpAddress | DistributionId | gnode | syncserver | DataState |
---------------------------------------------------------------------------------------------------------
| node1 | 192.168.100.12 | 1 | OPEN | OPEN | 0 |
---------------------------------------------------------------------------------------------------------
| node2 | 192.168.100.10 | 1 | OPEN | OPEN | 0 |
---------------------------------------------------------------------------------------------------------
| node3 | 192.168.100.14 | 1 | OPEN | OPEN | 0 |
---------------------------------------------------------------------------------------------------------
# Show the cluster's gcware node information
[gbase@node1 ~]$ gcadmin showcluster g
CLUSTER STATE: ACTIVE
VIRTUAL CLUSTER MODE: NORMAL
======================================
| GBASE GCWARE CLUSTER INFORMATION |
======================================
| NodeName | IpAddress | gcware |
--------------------------------------
| gcware1 | 192.168.100.10 | OPEN |
--------------------------------------
| gcware2 | 192.168.100.12 | OPEN |
--------------------------------------
| gcware3 | 192.168.100.14 | OPEN |
--------------------------------------
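The tables printed by `gcadmin showcluster` are regular enough to read from a script. The following is a minimal sketch, assuming the output keeps the pipe-delimited layout shown above; `parse_showcluster` and `nodes_not_open` are hypothetical helper names, not part of GBase.

```python
def parse_showcluster(text):
    """Parse `gcadmin showcluster` text into a list of row dicts.

    Assumption: every table row looks like "| a | b | c |" and each table
    starts with a header row whose first cell is "NodeName".
    """
    rows, header = [], None
    for line in text.splitlines():
        line = line.strip()
        # Border lines ("----", "====") and the state lines carry no cells.
        if len(line) < 2 or not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if len(cells) == 1:
            # A single cell is a section banner; a new table begins.
            header = None
        elif cells[0] == "NodeName":
            header = cells
        elif header is not None and len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


def nodes_not_open(rows):
    """Return (NodeName, column) pairs whose process column is not OPEN."""
    process_columns = ("gcware", "gcluster", "gnode", "syncserver")
    return [
        (row["NodeName"], column)
        for row in rows
        for column in process_columns
        if column in row and row[column] != "OPEN"
    ]
```

Feeding the captured output of `gcadmin showcluster c d g` to `parse_showcluster` and then to `nodes_not_open` gives a list that is empty when every process column reads OPEN.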
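Checking cluster state
1. CLUSTER STATE: the working state of the cluster. It takes one of two values: ACTIVE / shrinkOnly
- ACTIVE: the cluster is working normally.
- shrinkOnly: the cluster's data volume has reached the capacity limit. Once the limit is reached, the cluster state is set to shrinkOnly, which blocks insert and load operations. Free up space with operations such as drop table, then run show license, which refreshes the capacity figure and updates the cluster state.
Note: when the number of online gcware nodes is less than or equal to half of the total number of gcware nodes, the gcware cluster cannot elect a Leader. In that situation gcadmin cannot display the cluster state and the database cannot be connected to or used.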
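The majority rule in the note above comes down to one comparison. A minimal sketch follows; `gcware_has_quorum` is a hypothetical name used only for illustration.

```python
def gcware_has_quorum(online, total):
    """True when strictly more than half of the gcware nodes are online."""
    if total <= 0 or not 0 <= online <= total:
        raise ValueError("expected total > 0 and 0 <= online <= total")
    # Integer form of online > total / 2, which avoids float rounding.
    return 2 * online > total
```

With the three gcware nodes used in this article, two online nodes still form a majority, while a single online node does not. An even node count gains nothing here: with four nodes, two online nodes is exactly half and election still fails.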

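2. VIRTUAL CLUSTER MODE: the cluster mode. It takes one of three values: normal / readonly / recovery
- Normal mode: the regular mode, in which all SQL operations can run.
- Readonly mode: only SQL queries can run; DDL, DML, and Loader operations cannot. The cluster stays in read-only mode for a period of time during scale-out, node replacement, or data backup.
- Recovery mode: the backup-recovery mode, generally used when restoring cluster data or in specific scenarios. No SQL operations are allowed in this mode.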

Switching the cluster mode:
[gbase@node1 ~]$ gcadmin switchmode readonly
========== switch cluster mode...
switch pre mode: [NORMAL]
switch mode to [READONLY]
switch after mode: [READONLY]
[gbase@node1 ~]$ gcadmin
CLUSTER STATE: ACTIVE
VIRTUAL CLUSTER MODE: READONLY
======================================
| GBASE GCWARE CLUSTER INFORMATION |
======================================
| NodeName | IpAddress | gcware |
--------------------------------------
| gcware1 | 192.168.100.10 | OPEN |
--------------------------------------
| gcware2 | 192.168.100.12 | OPEN |
--------------------------------------
| gcware3 | 192.168.100.14 | OPEN |
--------------------------------------
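When the mode switch is driven from a script, it helps to validate the mode before calling gcadmin. The sketch below builds the argument list from the `switchmode <mode> [vc vc_name | coordinator]` syntax in the help text. The function names are hypothetical, only the `vc` form is covered, and the assumption that gcadmin reports failure through a non-zero exit status is not confirmed by the source.

```python
import subprocess

VALID_MODES = ("normal", "readonly", "recovery")


def switchmode_argv(mode, vc_name=None):
    """Build the argument list for `gcadmin switchmode`."""
    mode = mode.lower()
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    argv = ["gcadmin", "switchmode", mode]
    if vc_name is not None:
        # Per the help text, [vc vc_name] is unnecessary with a single vc.
        argv += ["vc", vc_name]
    return argv


def switch_mode(mode, vc_name=None):
    """Run the switch and return gcadmin's standard output.

    Assumption: a failed switch yields a non-zero exit status, which
    check=True turns into subprocess.CalledProcessError.
    """
    result = subprocess.run(
        switchmode_argv(mode, vc_name),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Passing the arguments as a list rather than a shell string keeps a virtual cluster name with unusual characters from being interpreted by the shell.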
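3. Module process state: takes one of three values: open / close / offline
The module process state is used to monitor whether the processes of the cluster's key functional components are running.
- Open: the module is working normally.
- Offline: the module process is offline, usually because of a hardware fault. Check whether the machine suddenly lost power or network connectivity. After fixing the fault, restart the affected processes.
- Close: the module process failed to start or shut down unexpectedly. Common causes include a port that is already in use, or wrong permissions or parameters in a configuration file. Check the relevant logs to find the cause, then restart the affected processes.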

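Module process types:
- GCware node:
gcware process name: gcware. The gcware process is the main program of the gcware module and shares information among the gcluster instances on each node.
- Coordinator node:
gcluster process name: gclusterd. The gclusterd process is the main program of the gcluster module and handles SQL parsing, optimization, execution-plan generation, and execution scheduling.
Auto-recovery process name: gcrecover. The gcrecover process is the main program that restores data after an inconsistency occurs.
- Data node:
gnode process name: gbased. The gbased process is the main program of the gnode module and handles the node's actual storage and the execution of the decomposed SQL.
syncserver process name: gc_sync_server. The gc_sync_server process is the main program that assists with data synchronization after gnode data becomes inconsistent.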

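Automatic system recovery:
gcrecover restores DDL operations first, then calls the synchronization service gc_sync_server to restore the data. After recovery, the system automatically changes the data consistency state from 1 back to 0.
How automatic recovery works:
When a command fails on a node, the data recovery tool detects it through the error log and then calls the synchronization tool to repair the node's inconsistent data automatically, keeping the data on all nodes consistent.
Module monitoring tools:
A module monitoring tool checks and protects module processes. When a monitored module process shuts down unexpectedly, the monitoring tool automatically tries to bring it back up, which keeps the cluster service running.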
| Monitoring tool | Runs on | Monitored module |
|---|---|---|
| gcware_monit | gcware node | gcware |
| gcware_monit | gcware node | gcware_mmonit |
| gcware_mmonit | gcware node | gcware_monit |
| gcmonit | gcluster node | gcluster |
| gcmonit | gcluster node | gcrecover |
| gcmonit | gnode node | gbase_<node IP> |
| gcmonit | gnode node | syncserver_<node IP> |
| gcmonit | gcluster/gnode node | gcmmonit |
| gcmmonit | gcluster/gnode node | gcmonit |
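The table describes a mutual-watchdog arrangement: each monitor is itself watched by its partner, so nothing in the chain is left unprotected. The sketch below encodes the table as a dictionary and checks that property. `WATCHES` and `unwatched` are hypothetical names, and the per-IP module names are shortened to `gbase` and `syncserver` for illustration.

```python
# Who watches whom, transcribed from the table above.
WATCHES = {
    "gcware_monit": {"gcware", "gcware_mmonit"},
    "gcware_mmonit": {"gcware_monit"},
    "gcmonit": {"gcluster", "gcrecover", "gbase", "syncserver", "gcmmonit"},
    "gcmmonit": {"gcmonit"},
}


def unwatched(watches):
    """Return the sorted names that no monitor is watching."""
    watched = set().union(*watches.values())
    everything = set(watches) | watched
    return sorted(everything - watched)
```

For the table as given, `unwatched(WATCHES)` returns an empty list. Removing either the `gcmmonit` or the `gcware_mmonit` entry would leave its partner monitor with nothing to restart it.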
Using the module monitoring tools:
Syntax: gcware_monit <--start|--stop|--restart|--status[=<prog_name>]|--help|--version>
[gbase@node1 ~]$ gcware_monit --status
+-----------------------------------------------------------------------------------------------------------------------------------+
|SEG_NAME PROG_NAME STATUS PID |
+-----------------------------------------------------------------------------------------------------------------------------------+
|gcware gcware Running 11999 |
|gcware_mmonit gcware_mmonit Running 12861 |
+-----------------------------------------------------------------------------------------------------------------------------------+
Syntax: gcware_mmonit <--start|--stop|--restart|--help|--version>
Syntax: gcmonit <--start|--stop|--restart|--status[=<prog_name>]|--help|--version>
[gbase@node1 ~]$ gcmonit --status
+-----------------------------------------------------------------------------------------------------------------------------------+
|SEG_NAME PROG_NAME STATUS PID |
+-----------------------------------------------------------------------------------------------------------------------------------+
|gcluster gclusterd Running 20444 |
|gcrecover gcrecover Running 21000 |
|gbase_192.168.100.10 /data/192.168.100.10/gnode/server/bin/gbased Running 19387 |
|syncserver_192.168.100.10 /data/192.168.100.10/gnode/server/bin/gc_sync_server Running 20429 |
|gcmmonit gcmmonit Running 20439 |
+-----------------------------------------------------------------------------
Syntax: gcmmonit <--start|--stop|--restart|--help|--version>
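The `--status` listings of `gcware_monit` and `gcmonit` can be checked from a script in the same spirit. This is a minimal sketch, assuming each data row holds exactly four whitespace-separated fields as in the samples above. How a stopped program is printed is not shown in this article, so rows with any other shape are skipped. `parse_monit_status` and `not_running` are hypothetical names.

```python
def parse_monit_status(text):
    """Parse `gcmonit --status` style output into a list of dicts."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        # Border lines start with "+"; data and header rows start with "|".
        if not line.startswith("|"):
            continue
        fields = line.strip("|").split()
        if len(fields) != 4 or fields[0] == "SEG_NAME":
            continue
        seg_name, prog_name, status, pid = fields
        entries.append(
            {"seg_name": seg_name, "prog_name": prog_name,
             "status": status, "pid": pid}
        )
    return entries


def not_running(entries):
    """Return the SEG_NAME of every entry whose status is not Running."""
    return [e["seg_name"] for e in entries if e["status"] != "Running"]
```

The PID is kept as a string on purpose, since the sketch makes no assumption about what that column contains for a process that is not running.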
Module start/stop tools
gcware_services starts and stops the gcware-related processes. Its all argument covers the gcware, gcware_monit, and gcware_mmonit processes.
Syntax: gcware_services <gcware|all> <start|stop [--force]|restart [--force]|info>
[gbase@node1 ~]$ gcware_services all restart
Stopping GCWareMonit success!
Stopping gcware : [ OK ]
Starting gcware : [ OK ]
Starting GCWareMonit success!
gcluster_services starts and stops the processes on gcluster and gnode nodes. Its all argument also covers the gcmonit and gcmmonit processes.
Syntax: gcluster_services <gcluster|gcrecover|gbase|syncserver|gbase_ip|syncserver_ip|all> <start|stop [--force]|restart [--force]|info>
[gbase@node1 ~]$ gcluster_services all info
/data/192.168.100.10/gcluster/server/bin/gclusterd is running
/data/192.168.100.10/gcluster/server/bin/gcrecover is running
/data/192.168.100.10/gnode/server/bin/gbased is running
/data/192.168.100.10/gnode/server/bin/gc_sync_server is running
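4. Data consistency state: takes one of two values: 0 / 1
• 0: the primary and replica segment data are consistent
• 1: the primary and replica segment data are inconsistent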

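Cluster synchronization log management
DDL event log
When a DDL statement succeeds overall, this log records the nodes on which an exception during execution left the primary and replica segments inconsistent.
Syntax: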
gcadmin showddlevent [detail] [<tablename segname nodeip> | <tablename nodeip> | <max_fevent_num>] [f] [vc vc_name]
[gbase@node1 ~]$ gcadmin showddlevent
Vc event count:0
[gbase@node1 ~]$
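DML event log
When a DML statement succeeds overall, this log records the abnormal nodes that left the primary and replica segments inconsistent.
Syntax: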
gcadmin showdmlevent [detail] [<tablename segname nodeip> | <max_fevent_num>] [f] [vc vc_name]
[gbase@node1 ~]$ gcadmin showdmlevent
Vc event count:0
[gbase@node1 ~]$
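DML storage event log
When the metadata on a failed node is damaged and the data cannot be recovered automatically through the DML event log, the failure is recorded in the DML storage event log and a metadata rebuild is attempted.
Syntax: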
gcadmin showdmlstorageevent [detail] [[table_id segname nodeip] | <max_fevent_num>] [f] [vc vc_name]
[gbase@node1 ~]$ gcadmin showdmlstorageevent
Vc event count:0
[gbase@node1 ~]$
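All three event commands above print a `Vc event count:N` line, which makes a simple health check possible. A minimal sketch, assuming that line format holds in general; `event_count` is a hypothetical helper.

```python
import re


def event_count(output):
    """Extract N from a "Vc event count:N" line, or None if absent."""
    match = re.search(r"event count:\s*(\d+)", output)
    return int(match.group(1)) if match else None
```

A monitoring script could run `gcadmin showddlevent`, `gcadmin showdmlevent`, and `gcadmin showdmlstorageevent` in turn and raise an alert whenever the returned count is greater than zero.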
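Failover log
A log that safeguards consistency while SQL executes. When the management node executing a SQL statement fails and cannot complete the operation, the node that takes over reads the failover log recorded in gcware and performs the rollback, which keeps execution consistent across nodes.
Syntax: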
gcadmin showfailover [f] : show failover information
gcadmin showfailoverdetail <commitId> [xml_file_name] : write failover information to file [xml_file_name]
[gbase@node1 ~]$ gcadmin showfailover
gcadmin showfailover: no gcluster failover information now
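Summary
This article covered cluster state management in GBase 8a. The next installment will cover cluster distribution information management.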




