
Huawei GaussDB T gs_check

墨天轮 2019-09-28

gs_check

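The gs_check tool helps users check the cluster running status (cluster, primary/standby, and CM status), cluster deployment inspection items (directory permissions, database version, environment variables, parameters, and so on), runtime inspection items (connection status, number of locks, number of cursors, number of connections, and so on), and database object management options, to make sure that the database is in a normal, usable state.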

Prerequisites

  • The cluster has been preinstalled successfully.
  • The cluster has been installed successfully.

Syntax

  • Health check (run as the user who installed the cluster)
    gs_check -i ITEM [...] [-U USER] [-L] [-X XMLFILE] [-l LOGFILE] [-o OUTPUTDIR] [--skip-root-items] [--format]
    gs_check -e SCENE_NAME [-U USER] [-L] [-X XMLFILE] [-l LOGFILE] [-o OUTPUTDIR] [--skip-root-items] [--format] [--time-out=SECS]
  • Display help information
    gs_check -? | --help
  • Display version information
    gs_check -V | --version

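Parameters

  • -U

    Name of the user who runs the cluster.

    Value range: a user name of this cluster.

  • -i

    Specifies the check items. The value of -i is case-sensitive. Several items can be queried at once, separated by commas (","). Format: -i item or -i item1,item2. For the full list of check items for the cluster runtime environment, see Table 1 (Cluster status checklist).

  • -e

    Specifies a check group. The value of -e is case-sensitive, and querying several check groups at once is not supported. Format: -e SCENE_NAME. For the full list of check groups for the cluster runtime environment, see Table 1 (Cluster status checklist).

  • -X

    Specifies the configuration file.

    Format: -X XMLFILE.

  • -l

    Specifies the log file path.

    If this parameter is not specified, log files are stored by default in $GPHOME/script/gspylib/inspection/output/log. No logs are collected when the check runs normally.

  • -o

    Specifies the output directory for the check report.

    If this parameter is not specified, the output directory defaults to $GPHOME/script/gspylib/inspection/output.

  • -L

    Runs the check on the local node only.

  • --time-out

    Sets the timeout.

  • --format

    Sets the output format. The default is default, which is the only supported value.

  • --cid

    Specifies the check ID. For internal use by the inspection tool.

  • --skip-root-items

    Skips items that require root privileges.

  • -?, --help

    Displays help information.

  • -V, --version

    Displays version information.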
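The parameter rules above can be sketched as a small helper that assembles a gs_check argument list. This is only an illustration under stated assumptions: the function name `build_gs_check_args` and its keyword arguments are hypothetical, the set of check groups is copied from Table 1 of this article, and treating -i and -e as mutually exclusive is inferred from the two separate syntax forms rather than confirmed by the tool.

```python
from typing import List, Optional, Sequence

# Check groups (SCENE_NAME values) as listed in Table 1 of this article.
KNOWN_SCENES = {"os", "cluster", "instStatus", "dbStatus", "dbInst", "raid"}


def build_gs_check_args(
    items: Optional[Sequence[str]] = None,
    scene: Optional[str] = None,
    xml_file: Optional[str] = None,
    user: Optional[str] = None,
    local_only: bool = False,
    skip_root_items: bool = False,
) -> List[str]:
    """Assemble an argument list for gs_check (hypothetical helper)."""
    # Assumption: the two syntax forms are alternatives, so require exactly one.
    if bool(items) == bool(scene):
        raise ValueError("specify either items (-i) or a scene (-e), not both")
    args = ["gs_check"]
    if items:
        # Several items are joined with commas: -i item1,item2
        args += ["-i", ",".join(items)]
    else:
        # -e accepts a single, case-sensitive check group.
        if scene not in KNOWN_SCENES:
            raise ValueError(f"unknown check group: {scene!r}")
        args += ["-e", scene]
    if user:
        args += ["-U", user]
    if local_only:
        args.append("-L")
    if xml_file:
        args += ["-X", xml_file]
    if skip_root_items:
        args.append("--skip-root-items")
    return args
```

The resulting list could be handed to `subprocess.run` on a host where gs_check is installed; building a list instead of a shell string avoids quoting problems with paths.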

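Table 1 Cluster status checklist

| Check category | Check group (SCENE_NAME) | Check item | Description (OK: as expected; NG: not as expected) | Condition for returning NG |
|---|---|---|---|---|
| Runtime environment | os | CheckTimeZone | Checks time zone consistency. | Time zones are inconsistent. |
| | | CheckEncoding | Checks the encoding format. | Encoding formats are inconsistent. |
| | | CheckFirewall | Checks the firewall status. | The firewall is off. |
| | | CheckKernelVer | Checks the kernel version. | OS kernel versions are inconsistent. |
| | | CheckMaxHandle | Checks the maximum handle setting. | The file handle configuration does not meet the requirements. |
| | | CheckSysParams | Checks system parameters. | OS parameter settings differ from the recommended values. |
| | | CheckNTPD | Checks the NTPD service. | The NTPD service is not running. |
| | | CheckPing | Checks network connectivity. | Nodes cannot reach each other over the network. |
| | | CheckDirPermissions | Checks directory permissions. | Directory permissions do not match the standard. |
| | | CheckEnvProfile | Checks environment variables. | An environment variable is missing or configured incorrectly. |
| | | CheckProcessCount | Checks the number of system processes. | The configuration does not conform to what the parameter specification script defines. |
| | | CheckClusterVer | Checks the database version. | The database version does not match the patch matrix. |
| | | CheckCpuUsage | CPU usage. | CPU usage is greater than or equal to 80%. |
| | | CheckDiskUsage | System disk usage. | Usage of all database-related mounted disks exceeds 85%. |
| | | CheckHandleUsage | File handle usage. | File handle usage is greater than or equal to 80%. |
| | | CheckMemUsage | Memory usage. | Memory usage is greater than or equal to 80%. |
| | | CheckProcessStatus | Database instance process status. | The database instance has no zombie processes. |
| | | CheckSwapUsage | Checks swap usage. | Not applicable. |
| Cluster status | cluster | CheckClusterState | Checks the cluster status. | The cluster status is abnormal or a process does not exist. |
| | | CheckDnVersion | Checks the database version. | Primary and standby database versions are inconsistent. |
| | | CheckClusterBalance | Checks the primary/standby balance. | The cluster primary/standby distribution is unbalanced. |
| Instance status | instStatus | CheckConnCount | Checks the number of instance connections. | Returns the actual and maximum number of connections. |
| Database status | dbStatus | CheckTableCount | Checks the number of database tables. | Returns the number of database tables. |
| | | CheckUncommXacts | Checks pending transactions. | Returns the number of pending transactions. |
| | | CheckBufferUsage | Buffer pool hit rate. | Returns the buffer usage. |
| | | CheckBackup | Checks database backups. | No backup was made within the last week (not checked if the installation is less than a week old). |
| | | CheckDBConn | Checks database connectivity (local/remote). | No CN or DN accepts local/remote connections. |
| | | CheckDBStatus | Checks the database status. | A CN or DN is not in the OPEN state, or the open status of a primary DN or CN is not READ WRITE, or the open status of a standby DN is not READ ONLY. |
| | | CheckDBUser | Checks the database user status. | A database user's status is abnormal, or the password expires in less than 7 days. |
| Database objects | dbInst | CheckTablespaceUsage | Tablespace usage. | Returns the tablespace usage. |
| | | CheckLockNum | Checks the number of database locks. | Returns the number of locks. |
| | | CheckInstMemUsage | Checks instance memory usage. | Not applicable. |
| | | CheckLogLogicalLimit | Checks the database log logical limit. | Not applicable. |
| | | CheckArchiveLogSpace | Checks the archive log space setting. | Not applicable. |
| RAID card check | raid | CheckRaidHlth | Checks the RAID card health status. | Hlth in the RAID card's System Overview is not Opt, or State in the RAID group's Virtual Drives information is not Optl. |
| | | CheckCacheHlthDetail | Checks detailed supercapacitor health information. | The Firmware_Status information of any RAID does not satisfy: NVCache State is ok, Replacement required is no, No space to cache offload is no, Module microcode update required is no. |
| | | CheckCacheDischarge | Checks supercapacitor charge/discharge information. | The last charge/discharge was abnormal. |
| | | CheckCacheHlth | Checks the health of the RAID supercapacitor. | State in the capacitor's Cachevault_Info is not Optimal. |
| | - | CheckRaidEnv | Checks whether the environment meets the RAID conditions. | The RAID card check conditions are not met. |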
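Table 1 gives explicit percentage thresholds for four usage checks. The sketch below encodes just those stated NG conditions so the boundary cases are unambiguous (80% is already NG for CPU, handles, and memory, while disk usage must exceed 85%). The names `NG_RULES` and `classify_usage` are hypothetical, each check is reduced to a single percentage, and this is not how gs_check itself is implemented.

```python
from typing import Callable, Dict

# NG conditions copied from Table 1; values are percentages.
NG_RULES: Dict[str, Callable[[float], bool]] = {
    "CheckCpuUsage": lambda pct: pct >= 80,     # NG at 80% or more
    "CheckDiskUsage": lambda pct: pct > 85,     # NG only above 85%
    "CheckHandleUsage": lambda pct: pct >= 80,  # NG at 80% or more
    "CheckMemUsage": lambda pct: pct >= 80,     # NG at 80% or more
}


def classify_usage(item: str, percent: float) -> str:
    """Return "NG" if the usage meets the item's NG condition, else "OK"."""
    rule = NG_RULES[item]  # raises KeyError for items without a threshold
    return "NG" if rule(percent) else "OK"
```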

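Notes on Table 1:
  • NG: not as expected; OK: as expected.
  • Node-level checks (runtime environment and RAID card checks) return results per node; cluster-level checks (cluster status) return results per cluster; instance-level checks (instance status, database status, and database objects) return results per instance.
  • RAID card checks: before checking the RAID group items, gs_check first verifies the conditions below. If all of them are met, it goes on to the RAID group checks; as soon as one of them is not met, the check ends and the result is returned.
    • The server model is TaiShan 2280 V2. If not, the check ends with result OK;
    • The CPU uses the ARM architecture. If not, the check ends with result OK;
    • The operating system is EulerOS 2.0 SP8 or later. If not, the check ends with result OK;
    • The RAID card model is SAS3508. If not, the check ends with result NG.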
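The RAID precondition logic described above can be sketched as a short decision function. It assumes the four conditions are evaluated in the order listed, which the text does not state explicitly. The `HostFacts` type and `raid_precheck` name are hypothetical, and how the facts are collected from a real host is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostFacts:
    """Facts about one node; field names are illustrative assumptions."""
    server_model: str
    cpu_is_arm: bool
    os_is_euleros_2_sp8_or_later: bool
    raid_card_model: str


def raid_precheck(host: HostFacts) -> Optional[str]:
    """Return the early result ("OK" or "NG"), or None to continue with RAID group checks."""
    if host.server_model != "TaiShan 2280 V2":
        return "OK"  # other server models: check ends, result OK
    if not host.cpu_is_arm:
        return "OK"  # non-ARM CPU: check ends, result OK
    if not host.os_is_euleros_2_sp8_or_later:
        return "OK"  # older or different OS: check ends, result OK
    if host.raid_card_model != "SAS3508":
        return "NG"  # unsupported RAID card: check ends, result NG
    return None      # all conditions met: proceed to the RAID group checks
```

Only the last condition produces NG; the first three end the check with OK, so an OK from a raid check does not by itself show that the RAID group items were actually examined.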

Examples

  • Query specified items with -i:
    omm@plat1:/opt/software/gaussdb/script> gs_check -i CheckEncoding -X /opt/software/gaussdb/clusterconfig.xml
    Parsing the check items config file successfully
    Distribute the context file to remote hosts successfully
    Start to health check for the cluster. Total Items:1 Nodes:3
    plat1 [=========================] 1/1
    plat2 [=========================] 1/1
    plat3 [=========================] 1/1
    Start to analysis the check result
    CheckEncoding...............................OK
    {"description": "The encoding of each node in the cluster is consistent.", "item": "CheckEncoding", "error_msg": [], "type": 2, "result": "OK", "time": "2019-06-04 14:09:27", "data": {"plat1": {"encoding": "LANG=en_US.UTF-8"}, "plat2": {"encoding": "LANG=en_US.UTF-8"}, "plat3": {"encoding": "LANG=en_US.UTF-8"}}}
    ==============================================
    Success. All check items run completed. Total:1 Success:1
    For more infomation please refer to /home/dbdata/zenith/om/script/inspection/output/CheckReport_2018020973511.tar.gz
    omm@plat1:/opt/software/gaussdb/script> gs_check -i CheckEncoding -X /opt/software/gaussdb/clusterconfig.xml -L
    start check ping
    successful check ping.
    [HOST] plat1
    [NAM] CheckEncoding
    [RST] OK
    [VAL] {"encoding": "LANG=en_US.UTF-8"}
    [DESC] The encoding of each node in the cluster is consistent.
    [TIME] 2019-06-04 14:11:05
    [RAW] bash -c "unset LANG;unset SSH_SENDS_LOCALE;source /etc/profile.d/lang.sh > /dev/null 2>&1;source /etc/profile > /dev/null 2>&1;source ~/.bashrc; > /dev/null 2>&1;source ~/.profile > /dev/null 2>&1;source ~/.bash_profile > /dev/null 2>&1;locale | grep '^LANG='"
  • Query a check group with -e:
    omm@plat1:/opt/software/gaussdb/script> gs_check -e dbStatus -X /opt/software/gaussdb/clusterconfig.xml
    Parsing the check items config file successfully
    Distribute the context file to remote hosts successfully
    Start to health check for the cluster. Total Items:3 Nodes:3
    plat1 [=========================] 3/3
    plat2 [=========================] 3/3
    plat3 [=========================] 3/3
    Start to analysis the check result
    CheckTableCount.............................OK
    {"description": "The number of database tables in the cluster does not exceed 1000.", "item": "CheckTableCount", "result": "OK", "time": "2019-07-15 19:54:43", "data": {"cn_401": {"GAUSS": "0"}}, "error_msg": []}
    ==============================================
    CheckUncommXacts............................OK
    {"description": "There are no pending transactions for CN instances in the cluster.", "item": "CheckUncommXacts", "result": "OK", "time": "2019-07-15 19:54:43", "data": {"cn_401": {"uncommitActs": 0}}, "error_msg": []}
    ==============================================
    CheckBufferUsage............................OK
    {"description": "The database buffer pool hit rate in the cluster is not less than 80%.", "item": "CheckBufferUsage", "result": "OK", "time": "2019-07-15 19:54:43", "data": {"DB2_3": {"USAGE": "96.69%", "BUFFER_GETS": "72624", "DISK_READS": "2402"}, "DB1_1": {"USAGE": "96.70%", "BUFFER_GETS": "72721", "DISK_READS": "2401"}}, "error_msg": []}
    ==============================================
    CheckBackup.................................OK
    {"description": "Check if the backup service process starts.", "item": "CheckBackup", "result": "OK", "time": "2019-07-15 19:54:43", "data": {"DB2_3": "", "DB2_4": "", "DB1_2": "", "DB1_1": "", "cn_401": ""}, "error_msg": []}
    ==============================================
    CheckDBConn.................................OK
    {"description": "Check the database connection status.", "item": "CheckDBConn", "result": "OK", "time": "2019-07-15 19:54:43", "data": {"plat1": {"DB2_3": "connectable", "DB2_4": "connectable", "cn_401": "connectable", "DB1_1": "connectable", "DB1_2": "connectable"}, "plat1": {"DB2_3": "connectable", "DB2_4": "connectable", "cn_401": "connectable", "DB1_1": "connectable", "DB1_2": "connectable"}, "plat3": {"DB2_3": "connectable", "DB2_4": "connectable", "cn_401": "connectable", "DB1_1": "connectable", "DB1_2": "connectable"}}, "error_msg": []}
    ==============================================
    CheckDBStatus...............................OK
    {"description": "The database is OPEN and master DN and CN open_status are READ WRITE, and the standby DN status is READ ONLY.", "item": "CheckDBStatus", "result": "OK", "time": "2019-07-15 19:54:43", "data": {"DB2_3": {"GAUSS": {"status": "OPEN", "open_status": "READ WRITE"}}, "DB2_4": {"GAUSS": {"status": "OPEN", "open_status": "READ ONLY"}}, "DB1_2": {"GAUSS": {"status": "OPEN", "open_status": "READ ONLY"}}, "DB1_1": {"GAUSS": {"status": "OPEN", "open_status": "READ WRITE"}}, "cn_401": {"GAUSS": {"status": "OPEN", "open_status": "READ WRITE"}}}, "error_msg": []}
    ==============================================
    CheckDBUser.................................OK
    {"description": "Check database user status.", "item": "CheckDBUser", "result": "OK", "time": "2019-07-15 19:54:43", "data": {"DB2_3": {"SYS": {"account_status": "OPEN", "cryptoperiod": "+0000179"}, "PERFADM": {"account_status": "OPEN", "cryptoperiod": "+0000179"}, "PUBLIC": {"account_status": "OPEN", "cryptoperiod": "+0000179"}}, "DB2_4": {"SYS": {"account_status": "OPEN", "cryptoperiod": "+0000179"}, "PERFADM": {"account_status": "OPEN", "cryptoperiod": "+0000179"}, "PUBLIC": {"account_status": "OPEN", "cryptoperiod": "+0000179"}}, "DB1_2": {"SYS": {"account_status": "OPEN", "cryptoperiod": "+0000179"}, "PERFADM": {"account_status": "OPEN", "cryptoperiod": "+0000179"}, "PUBLIC": {"account_status": "OPEN", "cryptoperiod": "+0000179"}}, "DB1_1": {"SYS": {"account_status": "OPEN", "cryptoperiod": "+0000179"}, "PERFADM": {"account_status": "OPEN", "cryptoperiod": "+0000179"}, "PUBLIC": {"account_status": "OPEN", "cryptoperiod": "+0000179"}}, "cn_401": {"SYS": {"account_status": "OPEN", "cryptoperiod": "+0000179"}, "PERFADM": {"account_status": "OPEN", "cryptoperiod": "+0000179"}, "PUBLIC": {"account_status": "OPEN", "cryptoperiod": "+0000179"}}}, "error_msg": []}
    ==============================================
    Analysis the check result successfully
    Success. All check items run completed. Total:7 Success:7
    For more information please refer to /home/dbdata/zenith/om/script/gspylib/inspection/output/CheckReport_dbStatus_201907157167126183.tar.gz
    omm@plat1:/opt/software/gaussdb/script> gs_check -e dbStatus -X /opt/software/gaussdb/clusterconfig.xml -L
    [HOST] plat1
    [NAM] CheckTableCount
    [RST] NONE
    [VAL]
    [DESC] The number of database tables in the cluster does not exceed 1000.
    [TIME] 2019-07-15 20:07:24
    [RAW]
    [HOST] plat1
    [NAM] CheckUncommXacts
    [RST] NONE
    [VAL]
    [DESC] There are no pending transactions for CN instances in the cluster.
    [TIME] 2019-07-15 20:07:24
    [RAW]
    [HOST] plat1
    [NAM] CheckBufferUsage
    [RST] OK
    [VAL] {"DB2_3": {"USAGE": "97.21%", "BUFFER_GETS": "86370", "DISK_READS": "2410"}}
    [DESC] The database buffer pool hit rate in the cluster is not less than 80%.
    [TIME] 2019-07-15 20:07:24
    [RAW] SELECT SUM(DISK_READS), SUM(BUFFER_GETS) FROM DV_SESSIONS;
    [HOST] plat1
    [NAM] CheckBackup
    [RST] OK
    [VAL] {"DB2_3": "", "DB1_2": ""}
    [DESC] Check if the backup service process starts.
    [TIME] 2019-07-15 20:07:24
    [RAW] SELECT START_TIME FROM SYS_BACKUP_SETS; SELECT CREATED FROM ADM_USERS WHERE USERNAME = 'SYS';
    [HOST] plat1
    [NAM] CheckDBConn
    [RST] OK
    [VAL] {"plat1": {"DB2_3": "connectable", "DB2_4": "connectable", "cn_401": "connectable", "DB1_1": "connectable", "DB1_2": "connectable"}}
    [DESC] Check the database connection status.
    [TIME] 2019-07-15 20:07:24
    [RAW] SELECT * FROM DV_VERSION;
    [HOST] plat1
    [NAM] CheckDBStatus
    [RST] OK
    [VAL] {"DB2_3": {"GAUSS": {"status": "OPEN", "open_status": "READ WRITE"}}, "DB1_2": {"GAUSS": {"status": "OPEN", "open_status": "READ ONLY"}}}
    [DESC] The database is OPEN and master DN and CN open_status are READ WRITE, and the standby DN status is READ ONLY.
    [TIME] 2019-07-15 20:07:24
    [RAW] SELECT NAME,STATUS,OPEN_STATUS FROM DV_DATABASE;
    [HOST] plat1
    [NAM] CheckDBUser
    [RST] OK
    [VAL] {"DB2_3": {"SYS": {"account_status": "OPEN", "cryptoperiod": "+0000179"}, "PERFADM": {"account_status": "OPEN", "cryptoperiod": "+0000179"}, "PUBLIC": {"account_status": "OPEN", "cryptoperiod": "+0000179"}}, "DB1_2": {"SYS": {"account_status": "OPEN", "cryptoperiod": "+0000179"}, "PERFADM": {"account_status": "OPEN", "cryptoperiod": "+0000179"}, "PUBLIC": {"account_status": "OPEN", "cryptoperiod": "+0000179"}}}
    [DESC] Check database user status.
    [TIME] 2019-07-15 20:07:24
    [RAW] SELECT DB_USERS.USERNAME,ADM_USERS.ACCOUNT_STATUS,DB_USERS.CRYPTOPERIOD FROM DB_USERS,ADM_USERS WHERE ADM_USERS.USERNAME = DB_USERS.USERNAME;
  • Check the cluster status with -e:
    omm@plat1:/opt/software/gaussdb/script> gs_check -e cluster -X /opt/software/gaussdb/clusterconfig.xml
    Parsing the check items config file successfully
    Distribute the context file to remote hosts successfully
    Start to health check for the cluster. Total Items:3 Nodes:3
    plat1 [=========================] 3/3
    plat2 [=========================] 3/3
    plat3 [=========================] 3/3
    Start to analysis the check result
    CheckClusterState...........................OK
    {"description": "The cluster status is normal.", "item": "CheckClusterState", "error_msg": [], "type": 1, "result": "OK", "time": "2019-06-04 14:20:10", "data": {"clusterStatus": "OK"}}
    ==============================================
    CheckDnVersion..............................OK
    {"description": "The database is in the same version.", "item": "CheckDnVersion", "error_msg": [], "type": 1, "result": "OK", "time": "2019-06-04 14:20:10", "data": {"DB1_1": "3e2b1c1", "DB1_2": "3e2b1c1", "DB2_1": "3e2b1c1", "DB2_2": "3e2b1c1"}}
    ==============================================
    CheckClusterBalance.........................OK
    {"description": "Cluster-based backup state.", "item": "CheckClusterBalance", "error_msg": [], "type": 1, "result": "OK", "time": "2019-06-04 14:20:10", "data": {"balanced": "OK"}}
    ==============================================
    Analysis the check result successfully
    Success. All check items run completed. Total:3 Success:3
    For more infomation please refer to /home/dbdata/zenith/om/script/inspection/output/CheckReport_cluster_2018020973893.tar.gz
  • Query as a specified user with -U (the run fails for a user that does not belong to this cluster; if -U is omitted, the current cluster user is used by default):
    omm@plat1:/opt/software/gaussdb/script> gs_check -i CheckEncoding -X /opt/software/gaussdb/clusterconfig.xml -U omm
    Parsing the check items config file successfully
    Distribute the context file to remote hosts successfully
    Start to health check for the cluster. Total Items:1 Nodes:3
    plat1 [=========================] 1/1
    plat2 [=========================] 1/1
    plat3 [=========================] 1/1
    Start to analysis the check result
    CheckEncoding...............................OK
    {"description": "The encoding of each node in the cluster is consistent.", "item": "CheckEncoding", "error_msg": [], "type": 2, "result": "OK", "time": "2019-06-04 14:22:24", "data": {"plat1": {"encoding": "LANG=en_US.UTF-8"}, "plat2": {"encoding": "LANG=en_US.UTF-8"}, "plat3": {"encoding": "LANG=en_US.UTF-8"}}}
    ==============================================
    Analysis the check result successfully
    Success. All check items run completed. Total:1 Success:1
    For more infomation please refer to /home/dbdata/zenith/om/script/inspection/output/CheckReport_2018020974137.tar.gz
  • Skip root check items with --skip-root-items (assuming that CheckFirewall is a root check item):
    omm@plat1:/opt/software/gaussdb/script> gs_check -i CheckProcessCount,CheckTimeZone,CheckFirewall -X /opt/software/gaussdb/clusterconfig.xml --skip-root-items
    Parsing the check items config file successfully
    Distribute the context file to remote hosts successfully
    Start to health check for the cluster. Total Items:2 Nodes:3
    plat1 [=========================] 2/2
    plat2 [=========================] 2/2
    plat3 [=========================] 2/2
    Start to analysis the check result
    CheckProcessCount...........................OK
    {"description": "The current number of processes and the maximum number of processes on each node in the cluster.", "item": "CheckProcessCount", "time": "2019-06-04 14:25:53", "data": {"plat1": {"maxuproc": "62712", "pscount": "27"}, "plat2": {"maxuproc": "62712", "pscount": "14"}, "plat3": {"maxuproc": "62712", "pscount": "24"}}, "error_msg": [], "result": "OK"}
    ==============================================
    CheckTimeZone...............................OK
    {"description": "The time zones of each node in the cluster are consistent.", "item": "CheckTimeZone", "time": "2019-06-04 14:25:53", "data": {"plat1": {"time_zone": "+0800"}, "plat2": {"time_zone": "+0800"}, "plat3": {"time_zone": "+0800"}}, "result": "OK", "error_msg": []}
    ==============================================
    Analysis the check result successfully
    Success. All check items run completed. Total:2 Success:2
    For more infomation please refer to /home/dbdata/zenith/om/script/inspection/output/CheckReport_201809274251549598.tar.gz
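In the cluster-wide runs shown above, each check item is followed by a JSON object carrying "item" and "result" fields. A script that post-processes captured output could pull out those objects and build a per-item summary. The sketch below is one way to do that, assuming only what the sample transcripts show; the function name `summarize_results` is hypothetical, and the parser scans for JSON objects anywhere in the text, so it does not depend on where line breaks fall.

```python
import json
from typing import Dict


def summarize_results(output: str) -> Dict[str, str]:
    """Map each check item to its result string, based on JSON objects found in the text."""
    decoder = json.JSONDecoder()
    results: Dict[str, str] = {}
    pos = output.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at pos and
            # returns it together with the index where it ended.
            obj, end = decoder.raw_decode(output, pos)
        except json.JSONDecodeError:
            # This brace does not start valid JSON; try the next one.
            pos = output.find("{", pos + 1)
            continue
        if isinstance(obj, dict) and "item" in obj and "result" in obj:
            results[obj["item"]] = obj["result"]
        # Continue after the decoded object, skipping its nested braces.
        pos = output.find("{", end)
    return results
```

Any item whose result is not "OK" can then be picked out with a plain dictionary comprehension. The per-node records printed with -L use a different, tag-based layout ([HOST], [NAM], [RST], ...) and are not covered by this sketch.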

Related commands

gs_preinstall, gs_install
