
Chapter 3: Dynamic Node Extension in a DMDSC Cluster

Original article by 随风, 2024-01-22



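1. Requirements for dynamic node extension

  A DMDSC cluster supports dynamic node extension: each extension adds one node to the existing cluster. Before extending, every node in the current DMDSC cluster must be in the OK state, and every dmserver instance must be in the OPEN state and accessible.
  Note: during the extension, do not perform any operation that changes the database state or mode.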

2. Deployment

2.1 Check the current cluster status

[dmdba@dmdsc01 bin]$ dmcssm dmcssm.ini
[monitor] 2022-06-16 13:44:57: CSS MONITOR V8
[monitor] 2022-06-16 13:45:17: CSS MONITOR SYSTEM IS READY.
[monitor] 2022-06-16 13:45:17: Wait CSS Control Node choosed...
[monitor] 2022-06-16 13:45:18: Wait CSS Control Node choosed succeed.
show
monitor current time:2022-06-16 13:45:21, n_group:3
=================== group[name = GRP_CSS, seq = 0, type = CSS, Control Node = 1] ========================================
[CSS0] auto check = TRUE, global info:
[ASM0] auto restart = TRUE
[DSC0] auto restart = TRUE
[CSS1] auto check = TRUE, global info:
[ASM1] auto restart = TRUE
[DSC1] auto restart = TRUE
ep:     css_time               inst_name    seqno    port    mode            inst_status    vtd_status    is_ok    active    guid        ts
        2022-06-16 13:45:20    CSS0         0        5336    Normal Node     OPEN           WORKING       OK       TRUE      13419125    13421771
        2022-06-16 13:45:20    CSS1         1        5337    Control Node    OPEN           WORKING       OK       TRUE      13451753    13454404
=================== group[name = GRP_ASM, seq = 1, type = ASM, Control Node = 0] ========================================
n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep:     css_time               inst_name    seqno    port    mode            inst_status    vtd_status    is_ok    active    guid        ts
        2022-06-16 13:45:20    ASM0         0        5436    Control Node    OPEN           WORKING       OK       TRUE      13432204    13434808
        2022-06-16 13:45:20    ASM1         1        5437    Normal Node     OPEN           WORKING       OK       TRUE      13465137    13467745
=================== group[name = GRP_DSC, seq = 2, type = DB, Control Node = 0] ========================================
n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep:     css_time               inst_name    seqno    port    mode            inst_status    vtd_status    is_ok    active    guid        ts
        2022-06-16 13:45:20    DSC0         0        5236    Control Node    OPEN           WORKING       OK       TRUE      14261853    14263514
        2022-06-16 13:45:20    DSC1         1        5236    Normal Node     OPEN           WORKING       OK       TRUE      14300882    14302532
==================================================================================================================
Time used: 2.891 (ms). Execution ID: 313.
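Section 1 requires every node to be OK and every dmserver to be OPEN before extending. Section 2.14 uses the same criterion to confirm success. The following is a minimal Python sketch that checks this from a `show` output you have saved as text, for example by copying it from the dmcssm console. It verifies that each group lists the expected number of endpoints and that every endpoint is OPEN, OK and active. The column order is taken from the output above. Rows whose `mode` is something other than "Normal Node" or "Control Node" would not be counted. `check_cluster` is a hypothetical helper and is not part of the DM toolset.

```python
import re

# Group header, e.g. "group[name = GRP_CSS, seq = 0, type = CSS, Control Node = 1]".
GROUP_RE = re.compile(r"group\[name = (\w+),")
# One endpoint row; column order copied from the dmcssm "ep:" header.
ROW_RE = re.compile(
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\s+"        # css_time
    r"(?P<inst_name>\w+)\s+\d+\s+\d+\s+"             # inst_name, seqno, port
    r"(?:Normal|Control) Node\s+"                    # mode
    r"(?P<inst_status>\w+)\s+\w+\s+"                 # inst_status, vtd_status
    r"(?P<is_ok>\w+)\s+(?P<active>\w+)\s+\d+\s+\d+"  # is_ok, active, guid, ts
)


def check_cluster(show_output: str, expected_eps: int) -> list[str]:
    """Return problems found in a saved dmcssm `show` output (empty list = healthy)."""
    headers = list(GROUP_RE.finditer(show_output))
    if not headers:
        return ["no group sections found"]
    problems = []
    for i, header in enumerate(headers):
        # A group's rows run until the next group header (or the end of the text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(show_output)
        rows = list(ROW_RE.finditer(show_output, header.end(), end))
        group = header.group(1)
        if len(rows) != expected_eps:
            problems.append(f"{group}: {len(rows)} ep(s), expected {expected_eps}")
        for row in rows:
            state = (row["inst_status"], row["is_ok"], row["active"])
            if state != ("OPEN", "OK", "TRUE"):
                problems.append(
                    f"{group}/{row['inst_name']}: status={state[0]} "
                    f"is_ok={state[1]} active={state[2]}"
                )
    return problems
```

On the output above, `check_cluster(text, 2)` should return an empty list. After the extension, the output in 2.14 should pass with `expected_eps=3`.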

2.2 Export the DCR configuration (dmdcr_cfg) file

[dmdba@dmdsc01 ~]$ dmasmcmd
DMASMCMD V8
ASM>export dcrdisk '/dev/raw/raw1' to '/home/dmdba/dmdcr_cfg_bak3.ini'
ASMCMD export DCRDISK success.
Used time: 4.969(ms).

2.3 Add redo log files for the new node DSC2

[dmdba@dmdsc01 ~]$ disql SYSDBA/SYSDBA
Server[LOCALHOST:5236]: in normal OPEN state
Login time: 5.016(ms)
disql V8
SQL> alter database add node logfile '+DMLOG/log/dsc2_log01.log' size 2048,'+DMLOG/log/dsc2_log02.log' size 2048;
Operation executed
Time used: 504.634(ms). Execution ID: 300.
--Check whether the log files were added
[dmdba@dmdsc02 ~]$ dmasmtool dcr_ini=/dm8/dsc/config/dmdcr.ini
DMASMTOOL V8
ASM>ls +
disk groups total [4]......
NO.1 name: DMLOG
NO.2 name: DMDATA
NO.3 name: VOTE
NO.4 name: DCR
Used time: 4.446(ms).
ASM>ls -r +DMLOG
+DMLOG:
dir : log
+DMLOG/log:
file : dsc0_log01.log
file : dsc0_log02.log
file : dsc1_log01.log
file : dsc1_log02.log
file : dsc2_log01.log
file : dsc2_log02.log
Used time: 32.808(ms).
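The log files for each node follow one naming pattern here: `dsc<seqno>_log<NN>.log` under `+DMLOG/log`, with `size 2048`. The sketch below assumes that convention, which is this example's choice and not a DM requirement. It builds the statement for any node number and checks a saved `ls -r +DMLOG` listing for the expected files. The helper names are hypothetical.

```python
import re


def logfile_names(seqno: int, count: int = 2) -> list[str]:
    """File names for node `seqno`, following the dsc<seqno>_log<NN>.log pattern above."""
    return [f"dsc{seqno}_log{i:02d}.log" for i in range(1, count + 1)]


def add_node_logfile_sql(seqno: int, log_dir: str = "+DMLOG/log",
                         size: int = 2048, count: int = 2) -> str:
    """Build the ALTER DATABASE ADD NODE LOGFILE statement for node `seqno`."""
    files = [f"'{log_dir}/{name}' size {size}" for name in logfile_names(seqno, count)]
    return "alter database add node logfile " + ",".join(files) + ";"


def missing_logfiles(ls_output: str, seqno: int, count: int = 2) -> list[str]:
    """Return the expected files for node `seqno` that a saved `ls -r` listing lacks."""
    # Works whether the listing is one entry per line or flattened onto one line.
    listed = set(re.findall(r"file\s*:\s*(\S+)", ls_output))
    return [name for name in logfile_names(seqno, count) if name not in listed]
```

`add_node_logfile_sql(2)` reproduces the statement used above. An empty result from `missing_logfiles(listing, 2)` means both DSC2 log files are present.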

2.4 Copy the DSC0 configuration files to the new server

[dmdba@dmdsc01 config]$ scp -r dsc0_config/ 192.168.10.102:/dm8/dsc/config/dsc2_config
dmdba@192.168.10.102's password:
dminit20220607111325.log      100% 1157     1.1KB/s   00:00
sqllog.ini                    100%  481     0.5KB/s   00:00
dmwatcher.ini                 100%  892     0.9KB/s   00:00
dmmal.ini                     100%  841     0.8KB/s   00:00
dmarch.ini                    100%  395     0.4KB/s   00:00
dm.ini

2.5 Modify the dmarch.ini file

--On node DSC2
[dmdba@dmdw01 dsc2_config]$ vi dmarch.ini
ARCH_LOCAL_SHARE = 1
[ARCHIVE_LOCAL]
ARCH_TYPE = LOCAL
ARCH_DEST = +DMDATA/DSC2/arch
ARCH_FILE_SIZE = 1024
ARCH_SPACE_LIMIT = 51200
[ARCHIVE_REMOTE1]
ARCH_TYPE = REMOTE
ARCH_DEST = DSC0
ARCH_INCOMING_PATH = +DMDATA/DSC0/arch
ARCH_FILE_SIZE = 1024
ARCH_SPACE_LIMIT = 51200
[ARCHIVE_REMOTE2]
ARCH_TYPE = REMOTE
ARCH_DEST = DSC1
ARCH_INCOMING_PATH = +DMDATA/DSC1/arch
ARCH_FILE_SIZE = 1024
ARCH_SPACE_LIMIT = 51200
--On nodes DSC0 and DSC1, add an archive destination for DSC2 to each dmarch.ini
[dmdba@dmdsc02 dsc1_config]$ vi dmarch.ini
[ARCHIVE_REMOTE2]
ARCH_TYPE = REMOTE
ARCH_DEST = DSC2
ARCH_INCOMING_PATH = +DMDATA/DSC2/arch
ARCH_FILE_SIZE = 1024
ARCH_SPACE_LIMIT = 51200
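Every node's dmarch.ini in this example follows the same pattern. There is one local archive under `+DMDATA/<instance>/arch`, plus one REMOTE destination per peer instance, numbered in cluster order. The sketch below renders the file for any node. It assumes the disk group, file size and space limit used above. `render_dmarch` is a hypothetical helper.

```python
def render_dmarch(local: str, nodes: list[str], disk: str = "+DMDATA",
                  file_size: int = 1024, space_limit: int = 51200) -> str:
    """Render dmarch.ini for instance `local`: one local archive, one remote per peer."""
    lines = [
        "ARCH_LOCAL_SHARE = 1",
        "[ARCHIVE_LOCAL]",
        "ARCH_TYPE = LOCAL",
        f"ARCH_DEST = {disk}/{local}/arch",
        f"ARCH_FILE_SIZE = {file_size}",
        f"ARCH_SPACE_LIMIT = {space_limit}",
    ]
    peers = [n for n in nodes if n != local]  # keep the cluster order
    for i, peer in enumerate(peers, start=1):
        lines += [
            f"[ARCHIVE_REMOTE{i}]",
            "ARCH_TYPE = REMOTE",
            f"ARCH_DEST = {peer}",
            f"ARCH_INCOMING_PATH = {disk}/{peer}/arch",
            f"ARCH_FILE_SIZE = {file_size}",
            f"ARCH_SPACE_LIMIT = {space_limit}",
        ]
    return "\n".join(lines) + "\n"
```

With `nodes = ["DSC0", "DSC1", "DSC2"]`, `render_dmarch("DSC2", nodes)` reproduces the DSC2 file above. For DSC1, the generated `[ARCHIVE_REMOTE2]` section points to DSC2, which matches the section added on the existing nodes.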

2.6 Modify the dmmal.ini file

--The configuration is identical on DSC0, DSC1 and DSC2
[dmdba@dmdsc02 dsc1_config]$ vi dmmal.ini
#DaMeng Database Mail Configuration file
#this is comments
MAL_CHECK_INTERVAL = 30
MAL_COMBIN_BUF_SIZE = 0
MAL_SEND_THRESHOLD = 2048
MAL_CONN_FAIL_INTERVAL = 10
MAL_LOGIN_TIMEOUT = 15
MAL_BUF_SIZE = 100
MAL_SYS_BUF_SIZE = 0
MAL_VPOOL_SIZE = 128
MAL_COMPRESS_LEVEL = 0
MAL_TEMP_PATH =
[MAL_INST0]
MAL_INST_NAME = DSC0
MAL_HOST = 192.168.10.100
MAL_PORT = 5736
MAL_INST_HOST = 192.168.2.100
MAL_INST_PORT = 5236
MAL_DW_PORT = 5836
MAL_LINK_MAGIC = 0
MAL_INST_DW_PORT = 5936
[MAL_INST1]
MAL_INST_NAME = DSC1
MAL_HOST = 192.168.10.101
MAL_PORT = 5737
MAL_INST_HOST = 192.168.2.101
MAL_INST_PORT = 5236
MAL_DW_PORT = 5837
MAL_LINK_MAGIC = 0
MAL_INST_DW_PORT = 5937
[MAL_INST2]
MAL_INST_NAME = DSC2
MAL_HOST = 192.168.10.102
MAL_PORT = 5738
MAL_INST_HOST = 192.168.2.102
MAL_INST_PORT = 5236
MAL_DW_PORT = 5838
MAL_LINK_MAGIC = 0
MAL_INST_DW_PORT = 5938
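The per-instance sections above differ only in name, addresses and ports. In this example the ports are a base value plus the node's seqno (5736/5836/5936 + seqno), and MAL_INST_PORT is always 5236. The sketch below generates the whole file from a node list under that assumption. The port scheme is this example's choice, and `render_dmmal` is a hypothetical helper. Generating the file once and copying it to every node also keeps the three copies identical, as this step requires.

```python
GLOBAL_PARAMS = """\
MAL_CHECK_INTERVAL = 30
MAL_COMBIN_BUF_SIZE = 0
MAL_SEND_THRESHOLD = 2048
MAL_CONN_FAIL_INTERVAL = 10
MAL_LOGIN_TIMEOUT = 15
MAL_BUF_SIZE = 100
MAL_SYS_BUF_SIZE = 0
MAL_VPOOL_SIZE = 128
MAL_COMPRESS_LEVEL = 0
MAL_TEMP_PATH =
"""


def render_dmmal(nodes: list[tuple[str, str, str]], inst_port: int = 5236,
                 mal_port: int = 5736, dw_port: int = 5836,
                 inst_dw_port: int = 5936) -> str:
    """nodes: (instance name, MAL interconnect IP, instance service IP) in seqno order."""
    parts = [GLOBAL_PARAMS]
    for i, (name, mal_host, inst_host) in enumerate(nodes):
        # Port numbers are base + seqno, matching the example above.
        parts.append(
            f"[MAL_INST{i}]\n"
            f"MAL_INST_NAME = {name}\n"
            f"MAL_HOST = {mal_host}\n"
            f"MAL_PORT = {mal_port + i}\n"
            f"MAL_INST_HOST = {inst_host}\n"
            f"MAL_INST_PORT = {inst_port}\n"
            f"MAL_DW_PORT = {dw_port + i}\n"
            "MAL_LINK_MAGIC = 0\n"
            f"MAL_INST_DW_PORT = {inst_dw_port + i}\n"
        )
    return "".join(parts)
```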

2.7 Modify the dmdcr.ini file

--dmdcr.ini on DSC2
DMDCR_PATH = /dev/raw/raw1
DMDCR_MAL_PATH = /dm8/dsc/config/dmasvrmal.ini
DMDCR_SEQNO = 2
DMDCR_AUTO_OPEN_CHECK = 90
DMDCR_ASM_RESTART_INTERVAL = 30 #interval after which CSS treats ASM as failed and restarts it
DMDCR_ASM_STARTUP_CMD = /dm8/bin/dmasmsvr dcr_ini=/dm8/dsc/config/dmdcr.ini
DMDCR_DB_RESTART_INTERVAL = 60 #interval after which CSS treats the DSC instance as failed and restarts it
DMDCR_DB_STARTUP_CMD = /dm8/bin/dmserver path=/dm8/dsc/config/dsc2_config/dm.ini dcr_ini=/dm8/dsc/config/dmdcr.ini
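In this layout, two things tie dmdcr.ini to a specific node: DMDCR_SEQNO, which must be the new node's seqno (2 here), and the `dsc<seqno>_config` directory in the dmserver startup command. The sketch below renders the file for a given seqno and assumes the other nodes use the same paths and intervals as DSC2 above. The inline comments from the original are left out. `render_dmdcr` is a hypothetical helper.

```python
def render_dmdcr(seqno: int, dm_home: str = "/dm8", cfg_dir: str = "/dm8/dsc/config",
                 dcr_path: str = "/dev/raw/raw1") -> str:
    """Render dmdcr.ini for node `seqno`, following the directory layout used above."""
    dcr_ini = f"{cfg_dir}/dmdcr.ini"
    return "\n".join([
        f"DMDCR_PATH = {dcr_path}",
        f"DMDCR_MAL_PATH = {cfg_dir}/dmasvrmal.ini",
        f"DMDCR_SEQNO = {seqno}",  # must match this node's seqno in the DCR groups
        "DMDCR_AUTO_OPEN_CHECK = 90",
        "DMDCR_ASM_RESTART_INTERVAL = 30",
        f"DMDCR_ASM_STARTUP_CMD = {dm_home}/bin/dmasmsvr dcr_ini={dcr_ini}",
        "DMDCR_DB_RESTART_INTERVAL = 60",
        f"DMDCR_DB_STARTUP_CMD = {dm_home}/bin/dmserver "
        f"path={cfg_dir}/dsc{seqno}_config/dm.ini dcr_ini={dcr_ini}",
    ]) + "\n"
```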

2.8 Modify dmasvrmal.ini

--Modify dmasvrmal.ini on DSC0, DSC1 and DSC2
[MAL_INST1]
MAL_INST_NAME = ASM0
MAL_HOST = 192.168.10.100 #heartbeat address
MAL_PORT = 5636 #MAL listening port
[MAL_INST2]
MAL_INST_NAME = ASM1
MAL_HOST = 192.168.10.101
MAL_PORT = 5637
[MAL_INST3]
MAL_INST_NAME = ASM2
MAL_HOST = 192.168.10.102
MAL_PORT = 5638
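dmasvrmal.ini, like dmmal.ini in 2.6, should end up the same on DSC0, DSC1 and DSC2. The small sketch below compares the copies after you have collected their contents, for example with scp or cat. Fetching the files is left out. Blank lines and trailing whitespace are ignored so harmless formatting differences are not reported. `mismatched_copies` is a hypothetical helper.

```python
import hashlib


def fingerprint(text: str) -> str:
    """SHA-256 of the file content, ignoring blank lines and trailing whitespace."""
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def mismatched_copies(copies: dict[str, str]) -> list[str]:
    """Given {node: file content}, return the nodes whose copy differs from the first node's."""
    items = list(copies.items())
    if not items:
        return []
    reference = fingerprint(items[0][1])
    return [node for node, text in items[1:] if fingerprint(text) != reference]
```

For example, `mismatched_copies({"DSC0": a, "DSC1": b, "DSC2": c})` returns the names of the nodes to fix before running the extension.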

2.9 Modify dmdcr_cfg_bak3.ini

Write the entries group by group, in order; port numbers must not repeat.
--Modify dmdcr_cfg_bak3.ini
[dmdba@dmdsc01 ~]$ vi dmdcr_cfg_bak3.ini
# the file is auto-created by system, self edit is invalid!
#DCR HDR
DCR_N_GRP = 3
DCR_VTD_PATH = /dev/raw/raw2
DCR_OGUID = 45331
[GRP]
DCR_GRP_TYPE = CSS
DCR_GRP_NAME = GRP_CSS
DCR_GRP_N_EP = 3
DCR_GRP_EP_ARR = {0,1,2}
DCR_GRP_N_ERR_EP = 0
DCR_GRP_ERR_EP_ARR = {}
DCR_GRP_DSKCHK_CNT = 60
[GRP]
DCR_GRP_TYPE = ASM
DCR_GRP_NAME = GRP_ASM
DCR_GRP_N_EP = 3
DCR_GRP_EP_ARR = {0,1,2}
DCR_GRP_N_ERR_EP = 0
DCR_GRP_ERR_EP_ARR = {}
DCR_GRP_DSKCHK_CNT = 60
[GRP]
DCR_GRP_TYPE = DB
DCR_GRP_NAME = GRP_DSC
DCR_GRP_N_EP = 3
DCR_GRP_EP_ARR = {0,1,2}
DCR_GRP_N_ERR_EP = 0
DCR_GRP_ERR_EP_ARR = {}
DCR_GRP_DSKCHK_CNT = 60
[GRP_CSS]
DCR_EP_NAME = CSS0
DCR_EP_HOST = 192.168.10.100
DCR_EP_PORT = 5336
[GRP_CSS]
DCR_EP_NAME = CSS1
DCR_EP_HOST = 192.168.10.101
DCR_EP_PORT = 5337
[GRP_CSS]
DCR_EP_NAME = CSS2
DCR_EP_HOST = 192.168.10.102
DCR_EP_PORT = 5338
[GRP_ASM]
DCR_EP_NAME = ASM0
DCR_EP_SHM_KEY = 93360
DCR_EP_SHM_SIZE = 10
DCR_EP_HOST = 192.168.10.100
DCR_EP_PORT = 5436
DCR_EP_ASM_LOAD_PATH = /dev/raw
[GRP_ASM]
DCR_EP_NAME = ASM1
DCR_EP_SHM_KEY = 93361
DCR_EP_SHM_SIZE = 10
DCR_EP_HOST = 192.168.10.101
DCR_EP_PORT = 5437
DCR_EP_ASM_LOAD_PATH = /dev/raw
[GRP_ASM]
DCR_EP_NAME = ASM2
DCR_EP_SHM_KEY = 93362
DCR_EP_SHM_SIZE = 10
DCR_EP_HOST = 192.168.10.102
DCR_EP_PORT = 5438
DCR_EP_ASM_LOAD_PATH = /dev/raw
[GRP_DSC]
DCR_EP_NAME = DSC0
DCR_EP_SEQNO = 0
DCR_EP_PORT = 5236
DCR_CHECK_PORT = 5536
[GRP_DSC]
DCR_EP_NAME = DSC1
DCR_EP_SEQNO = 1
DCR_EP_PORT = 5236
DCR_CHECK_PORT = 5537
[GRP_DSC]
DCR_EP_NAME = DSC2
DCR_EP_SEQNO = 2
DCR_EP_PORT = 5236
DCR_CHECK_PORT = 5538
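Mistakes in this hand-edited file are a common cause of failed extensions, so it can help to check it before writing it back in 2.10. The sketch below parses the exported format (repeated `[GRP]` and `[GRP_xxx]` sections, which Python's `configparser` rejects as duplicates). It then checks three rules taken from the example above:

- Endpoint sections come group by group in the `[GRP]` order.
- Each group's DCR_GRP_N_EP and DCR_GRP_EP_ARR match its endpoints.
- CSS/ASM endpoint ports and DSC check ports do not repeat.

DSC instance ports (all 5236 here, one per host) are deliberately not checked. These rules are this example's reading of "ports must not repeat", not an official validator. `check_dcr_cfg` is a hypothetical helper.

```python
from collections import Counter


def parse_dcr_cfg(text: str) -> list[tuple[str, dict[str, str]]]:
    """Split the exported DCR file into (section name, {key: value}) pairs, in file order."""
    sections: list[tuple[str, dict[str, str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1], {}))
        elif "=" in line and sections:  # header keys before the first section are skipped
            key, value = line.split("=", 1)
            sections[-1][1][key.strip()] = value.strip()
    return sections


def check_dcr_cfg(text: str) -> list[str]:
    """Return problems found in an edited DCR config (empty list = nothing found)."""
    sections = parse_dcr_cfg(text)
    groups = [body for name, body in sections if name == "GRP"]
    eps = [(name, body) for name, body in sections if name != "GRP"]
    problems = []
    # Endpoint sections must come group by group, in the same order as the [GRP] sections.
    group_order = [g.get("DCR_GRP_NAME") for g in groups]
    ep_order = [name for i, (name, _) in enumerate(eps) if i == 0 or eps[i - 1][0] != name]
    if ep_order != group_order:
        problems.append(f"ep section order {ep_order} does not match group order {group_order}")
    for g in groups:
        name = g.get("DCR_GRP_NAME")
        n_ep = int(g.get("DCR_GRP_N_EP", "0"))
        members = [body for sec, body in eps if sec == name]
        if len(members) != n_ep:
            problems.append(f"{name}: DCR_GRP_N_EP = {n_ep} but {len(members)} ep section(s)")
        expected_arr = "{" + ",".join(str(i) for i in range(n_ep)) + "}"
        if g.get("DCR_GRP_EP_ARR", "").replace(" ", "") != expected_arr:
            problems.append(f"{name}: DCR_GRP_EP_ARR should be {expected_arr}")
    # Ports that must not repeat: CSS/ASM endpoint ports and DSC check ports.
    ports = [b["DCR_EP_PORT"] for _, b in eps if "DCR_EP_HOST" in b and "DCR_EP_PORT" in b]
    ports += [b["DCR_CHECK_PORT"] for _, b in eps if "DCR_CHECK_PORT" in b]
    dups = sorted(p for p, n in Counter(ports).items() if n > 1)
    if dups:
        problems.append(f"duplicate ports: {dups}")
    return problems
```

A stray endpoint section placed among the `[GRP]` headers, for example, is reported as an ordering problem.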

2.10 Write the new node information back to disk with dmasmcmd

--On DSC0, use dmasmcmd to write the new node information back to the DCR disk; the new node is recorded as an error node
[dmdba@dmdsc01 ~]$ dmasmcmd
DMASMCMD V8
ASM>extend dcrdisk '/dev/raw/raw1' from '/home/dmdba/dmdcr_cfg_bak3.ini'
ASMCMD extend node for dcr disk success.
ASMCMD extend node for vote disk success.
Used time: 102.029(ms).

2.11 Run the extend node command in the dmcssm console

--Run on node DSC0
[dmdba@dmdsc01 bin]$ dmcssm dmcssm.ini
[monitor] 2022-06-16 13:44:57: CSS MONITOR V8
[monitor] 2022-06-16 13:45:17: CSS MONITOR SYSTEM IS READY.
[monitor] 2022-06-16 13:45:17: Wait CSS Control Node choosed...
[monitor] 2022-06-16 13:45:18: Wait CSS Control Node choosed succeed.
show
monitor current time:2022-06-16 13:45:21, n_group:3
=================== group[name = GRP_CSS, seq = 0, type = CSS, Control Node = 1] ========================================
[CSS0] auto check = TRUE, global info:
[ASM0] auto restart = TRUE
[DSC0] auto restart = TRUE
[CSS1] auto check = TRUE, global info:
[ASM1] auto restart = TRUE
[DSC1] auto restart = TRUE
ep:     css_time               inst_name    seqno    port    mode            inst_status    vtd_status    is_ok    active    guid        ts
        2022-06-16 13:45:20    CSS0         0        5336    Normal Node     OPEN           WORKING       OK       TRUE      13419125    13421771
        2022-06-16 13:45:20    CSS1         1        5337    Control Node    OPEN           WORKING       OK       TRUE      13451753    13454404
=================== group[name = GRP_ASM, seq = 1, type = ASM, Control Node = 0] ========================================
n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep:     css_time               inst_name    seqno    port    mode            inst_status    vtd_status    is_ok    active    guid        ts
        2022-06-16 13:45:20    ASM0         0        5436    Control Node    OPEN           WORKING       OK       TRUE      13432204    13434808
        2022-06-16 13:45:20    ASM1         1        5437    Normal Node     OPEN           WORKING       OK       TRUE      13465137    13467745
=================== group[name = GRP_DSC, seq = 2, type = DB, Control Node = 0] ========================================
n_ok_ep = 2
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep:     css_time               inst_name    seqno    port    mode            inst_status    vtd_status    is_ok    active    guid        ts
        2022-06-16 13:45:20    DSC0         0        5236    Control Node    OPEN           WORKING       OK       TRUE      14261853    14263514
        2022-06-16 13:45:20    DSC1         1        5236    Normal Node     OPEN           WORKING       OK       TRUE      14300882    14302532
==================================================================================================================
extend node
[monitor] 2022-06-16 13:45:28: Executing the extend node action
[monitor] 2022-06-16 13:45:31: Notifying the active CSS instances to perform cleanup
[monitor] 2022-06-16 13:45:31: Cleanup request to CSS(0) succeeded
[monitor] 2022-06-16 13:45:32: Cleanup request to CSS(1) succeeded
[monitor] 2022-06-16 13:45:32: Command EXTENT NODE executed successfully

2.12 Start the CSS service on DSC2

--Start the new DMCSS service on DSC2; DMCSS automatically brings up the DMASM and DMDSC (dmserver) services
[dmdba@dmdw01 ~]$ cd /dm8/bin
[dmdba@dmdw01 bin]$ ./dmcss dcr_ini=/dm8/dsc/config/dmdcr.ini
DMCSS V8
DMCSS IS READY
[2022-06-16 17:16:42:869] [CSS]: Setting EP CSS0[0] as the control node

2.13 Modify the dmcssm.ini configuration file

--Modify on node 1 (dmdsc01)
[dmdba@dmdsc01 ~]$ cd /dm8/bin
[dmdba@dmdsc01 bin]$ vi dmcssm.ini
CSSM_OGUID = 45331
CSSM_CSS_IP = 192.168.10.100:5336
CSSM_CSS_IP = 192.168.10.101:5337
CSSM_CSS_IP = 192.168.10.102:5338
CSSM_LOG_PATH = ../log
CSSM_LOG_FILE_SIZE = 512
CSSM_LOG_SPACE_LIMIT = 2048
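The CSSM_CSS_IP entries correspond to the CSS endpoints registered in the DCR configuration: the host and DCR_EP_PORT of each `[GRP_CSS]` section in 2.9. The sketch below derives them from the edited DCR file so the two stay in sync. It assumes each `[GRP_CSS]` section lists DCR_EP_NAME, DCR_EP_HOST and DCR_EP_PORT in that order, as in this example. `cssm_css_ip_lines` is a hypothetical helper.

```python
import re

# Relies on the key order used in this example's [GRP_CSS] sections.
CSS_EP_RE = re.compile(
    r"\[GRP_CSS\]\s+DCR_EP_NAME\s*=\s*\S+\s+"
    r"DCR_EP_HOST\s*=\s*(\S+)\s+DCR_EP_PORT\s*=\s*(\d+)"
)


def cssm_css_ip_lines(dcr_cfg_text: str) -> list[str]:
    """Derive dmcssm.ini CSSM_CSS_IP entries from the CSS endpoints in the DCR config."""
    return [f"CSSM_CSS_IP = {host}:{port}" for host, port in CSS_EP_RE.findall(dcr_cfg_text)]
```

Applied to the file from 2.9, this should yield the three CSSM_CSS_IP lines shown above.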

2.14 Start the dmcssm monitor

--Check the cluster status: if all nodes are OK and all dmserver instances are OPEN and accessible, the dynamic node extension has succeeded
[dmdba@dmdsc01 bin]$ dmcssm dmcssm.ini
[monitor] 2022-06-16 17:13:14: CSS MONITOR V8
[monitor] 2022-06-16 17:13:34: CSS MONITOR SYSTEM IS READY.
[monitor] 2022-06-16 17:13:34: Wait CSS Control Node choosed...
[monitor] 2022-06-16 17:13:35: Wait CSS Control Node choosed succeed.
[CSS1] [2022-06-16 17:13:38:290] [CSS]: Restarting the local ASM instance, command: [/dm8/bin/dmasmsvr dcr_ini=/dm8/dsc/config/dmdcr.ini]
show
monitor current time:2022-06-16 17:20:11, n_group:3
=================== group[name = GRP_CSS, seq = 0, type = CSS, Control Node = 0] ========================================
[CSS0] auto check = TRUE, global info:
[ASM0] auto restart = TRUE
[DSC0] auto restart = TRUE
[CSS1] auto check = TRUE, global info:
[ASM1] auto restart = TRUE
[DSC1] auto restart = TRUE
[CSS2] auto check = TRUE, global info:
[ASM2] auto restart = TRUE
[DSC2] auto restart = TRUE
ep:     css_time               inst_name    seqno    port    mode            inst_status    vtd_status    is_ok    active    guid        ts
        2022-06-16 17:20:10    CSS0         0        5336    Control Node    OPEN           WORKING       OK       TRUE      18144846    18145294
        2022-06-16 17:20:10    CSS1         1        5337    Normal Node     OPEN           WORKING       OK       TRUE      18185244    18185672
        2022-06-16 17:20:10    CSS2         2        5338    Normal Node     OPEN           WORKING       OK       TRUE      116040      116253
=================== group[name = GRP_ASM, seq = 1, type = ASM, Control Node = 0] ========================================
n_ok_ep = 3
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
(2, 2)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep:     css_time               inst_name    seqno    port    mode            inst_status    vtd_status    is_ok    active    guid        ts
        2022-06-16 17:20:10    ASM0         0        5436    Control Node    OPEN           WORKING       OK       TRUE      18158229    18158634
        2022-06-16 17:20:10    ASM1         1        5437    Normal Node     OPEN           WORKING       OK       TRUE      18198309    18198695
        2022-06-16 17:20:10    ASM2         2        5438    Normal Node     OPEN           WORKING       OK       TRUE      129433      129602
=================== group[name = GRP_DSC, seq = 2, type = DB, Control Node = 0] ========================================
n_ok_ep = 3
ok_ep_arr(index, seqno):
(0, 0)
(1, 1)
(2, 2)
sta = OPEN, sub_sta = STARTUP
break ep = NULL
recover ep = NULL
crash process over flag is TRUE
ep:     css_time               inst_name    seqno    port    mode            inst_status    vtd_status    is_ok    active    guid        ts
        2022-06-16 17:20:10    DSC0         0        5236    Control Node    OPEN           WORKING       OK       TRUE      18916013    18916316
        2022-06-16 17:20:10    DSC1         1        5236    Normal Node     OPEN           WORKING       OK       TRUE      18951417    18951721
        2022-06-16 17:20:10    DSC2         2        5236    Normal Node     OPEN           WORKING       OK       TRUE      144464      144602
==================================================================================================================

2.15 Register the services

--Create the CSS service
[root@dmdw01 root]# ./dm_service_installer.sh -t dmcss -dcr_ini /dm8/dsc/config/dmdcr.ini -p CSS
Created symlink from /etc/systemd/system/multi-user.target.wants/DmCSSServiceCSS.service to /usr/lib/systemd/system/DmCSSServiceCSS.service.
Creating service (DmCSSServiceCSS) completed
--Create the ASM service
[root@dmdw01 root]# ./dm_service_installer.sh -t dmasmsvr -dcr_ini /dm8/dsc/config/dmdcr.ini -y DmCSSServiceCSS.service -p ASM
Created symlink from /etc/systemd/system/multi-user.target.wants/DmASMSvrServiceASM.service to /usr/lib/systemd/system/DmASMSvrServiceASM.service.
Creating service (DmASMSvrServiceASM) completed
--Create the dmserver service
[root@dmdw01 root]# ./dm_service_installer.sh -t dmserver -dm_ini /dm8/dsc/config/dsc2_config/dm.ini -dcr_ini /dm8/dsc/config/dmdcr.ini -y DmASMSvrServiceASM.service -p DSC
Created symlink from /etc/systemd/system/multi-user.target.wants/DmServiceDSC.service to /usr/lib/systemd/system/DmServiceDSC.service.
Creating service (DmServiceDSC) completed
[root@dmdw01 root]#

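3. Problem: the newly extended node DSC2 fails to start

After the extend command was executed and the CSS service on DSC2 was started, the dmserver service on DSC2 failed to start.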

Log output:
2022-06-16 15:10:18.660 [FATAL] database P0000003969 T0000000000000003969 os_sema2_create_low, exist other server is running, sema_value:2, after dec:1, errno:10!
2022-06-16 15:10:18.660 [INFO] database P0000003969 T0000000000000003969 Create semaphore for path[+DMDATA/data/dsc//dev/raw/raw1] failed, it is being startup by other process!
2022-06-16 15:10:18.660 [FATAL] database P0000003969 T0000000000000003969 instance DSC2 is running.

Troubleshooting: check the following points first.
1. Before extending, the user must make sure that all dmcss/dmasmsvr/dmserver nodes are OK and active.
2. Each extension can add only one node; once an extension completes, you can extend again.
3. No operation that changes the instance state or mode may be performed during the extension.
4. If a dmcss/dmasmsvr/dmserver instance fails during the extension, the extension fails.
5. Mistakes during the extension (for example, not updating dmmal.ini or dmasvrmal.ini, or not adding the log files) cause the extension to fail.
6. After running the extend node command, check the log files to confirm that the extension succeeded.
7. A failed extension may leave the cluster in an abnormal state. In that case, shut down all dmcss/dmasmsvr/dmserver instances and re-initialize the DCR disk (init dcr).

Solution:
--Stop the DMSERVER, ASM and CSS services on all nodes
[dmdba@dmdsc01 bin]$ DmServiceDSC stop && DmASMSvrServiceASM stop && DmCSSServiceCSS stop
Stopping DmServiceDSC: [ OK ]
Stopping DmASMSvrServiceASM: [ OK ]
Stopping DmCSSServiceCSS: [ OK ]
[dmdba@dmdsc01 bin]$
--Clear the err_ep_arr information
[dmdba@dmdsc01 bin]$ dmasmcmd
DMASMCMD V8
ASM>clear dcrdisk err_ep_arr '/dev/raw/raw1' 'GRP_DSC'
Used time: 00:00:14.530.
ASM>
--Restart the CSS service on all nodes
[dmdba@dmdsc01 bin]$ DmCSSServiceCSS start
Starting DmCSSServiceCSS: [ OK ]
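Point 6 above says to check the logs after running extend node. The sketch below pulls FATAL and ERROR entries out of a log file's text. It assumes the line layout of the excerpt above: timestamp, `[LEVEL]`, component, `P<pid>`, `T<tid>`, message. Other DM logs may be formatted differently. Where the log files live depends on your installation, so the function takes the text rather than a path. `problem_lines` is a hypothetical helper.

```python
import re

# Layout inferred from the dmserver log excerpt above.
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+\[(?P<level>[A-Z]+)\]\s+"
    r"\S+\s+P\d+\s+T\d+\s+(?P<msg>.*)$",
    re.MULTILINE,
)


def problem_lines(log_text: str, levels: tuple[str, ...] = ("FATAL", "ERROR")
                  ) -> list[tuple[str, str, str]]:
    """Return (timestamp, level, message) for every log entry whose level is in `levels`."""
    return [
        (m["ts"], m["level"], m["msg"].strip())
        for m in LOG_RE.finditer(log_text)
        if m["level"] in levels
    ]
```

On the excerpt above this returns the two FATAL entries, the second being "instance DSC2 is running.". Pass `levels=("FATAL", "ERROR", "INFO")` to also see the semaphore message.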

Community: https://eco.dameng.com

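[Copyright notice] This article is original content by a 墨天轮 user. When reposting, you must credit the source (墨天轮), the article link, the author and other basic information; otherwise the author and 墨天轮 reserve the right to pursue liability. If you find content on 墨天轮 suspected of plagiarism or infringement, please report it by email to contact@modb.pro with supporting evidence; once verified, 墨天轮 will remove the content immediately.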
