Installing openGauss 3.0.0 on openEuler: one primary, one standby, one cascade standby (and all the pitfalls I hit)

Original post by 木底木叉, 云和恩墨, 2022-05-24

I. Environment configuration

1. hosts
172.16.220.141 op01
172.16.220.142 op02
172.16.220.143 op03

This step is optional; gs_preinstall adds these entries automatically.
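If you do maintain the hosts file by hand, a small guard keeps repeated runs from piling up duplicate lines. The following is only a sketch: the function name add_host_entry and the HOSTS_FILE override are my own illustration, not part of openGauss or gs_preinstall.

```shell
# Append an "IP hostname" pair to a hosts file only if the hostname is not mapped yet.
# HOSTS_FILE defaults to /etc/hosts; point it at a scratch file to try this out safely.
add_host_entry() {
    local ip="$1" name="$2" file="${HOSTS_FILE:-/etc/hosts}"
    # Look for the hostname as a whole field on a line that is not commented out.
    if ! grep -Eq "^[^#]*[[:space:]]${name}([[:space:]]|\$)" "$file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Example (as root, on every host):
#   add_host_entry 172.16.220.141 op01
#   add_host_entry 172.16.220.142 op02
#   add_host_entry 172.16.220.143 op03
```

Running the same calls a second time leaves the file unchanged, which is convenient when the setup script is re-run after a failed attempt.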

2. Firewall
systemctl stop firewalld.service
systemctl disable firewalld.service
3. SELinux
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
setenforce 0
4. Character set
cat>> /etc/profile<<EOF
export LANG=en_US.UTF-8
EOF
5. Time zone
rm -fr /etc/localtime
ln -s /usr/share/zoneinfo/Asia/Shanghai  /etc/localtime
date -R
hwclock
6. Swap
swapoff -a

cp /etc/fstab /etc/fstab.bak
sed -i '/swap/s/^/#/' /etc/fstab
cat /etc/fstab|grep -v ^#|grep -v '^$'
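One caveat with the sed call above: it matches any line that contains the string swap, and it prepends another # every time it runs. A slightly stricter variant, offered only as a sketch (comment_out_swap is a hypothetical helper name), comments out just the active entries whose filesystem type column is swap:

```shell
# Comment out active swap entries in an fstab-style file.
# Lines that are already commented are skipped, so the edit can be re-run safely.
comment_out_swap() {
    local file="$1"
    # Skip lines starting with '#'; match only when the third field is "swap".
    sed -i.bak -E '/^[[:space:]]*#/!s/^([^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+swap[[:space:]].*)$/#\1/' "$file"
}

# Example (as root):
#   comment_out_swap /etc/fstab
```

The -i.bak suffix leaves a copy of the previous file next to the edited one, in addition to the manual backup taken above.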
7、MTU
ifconfig ens33 mtu 1500
8. RemoveIPC
sed -i '/^RemoveIPC/d' /etc/systemd/logind.conf
sed -i '/^RemoveIPC/d' /usr/lib/systemd/system/systemd-logind.service
echo "RemoveIPC=no"  >> /etc/systemd/logind.conf
echo "RemoveIPC=no"  >> /usr/lib/systemd/system/systemd-logind.service
systemctl daemon-reload
systemctl restart systemd-logind
9. Remote root login
sed -i '/Banner/s/^/#/' /etc/ssh/sshd_config
sed -i '/PermitRootLogin/s/^/#/' /etc/ssh/sshd_config
echo -e "\n" >> /etc/ssh/sshd_config
echo "Banner none " >> /etc/ssh/sshd_config
echo "PermitRootLogin yes" >> /etc/ssh/sshd_config
systemctl restart sshd
10. Kernel parameters
cat >> /etc/sysctl.conf << EOF
net.ipv4.tcp_max_tw_buckets = 10000
net.ipv4.tcp_tw_reuse = 1
# net.ipv4.tcp_tw_recycle was removed in Linux 4.12; leave it out on newer kernels
# net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_keepalive_time = 30
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_retries1 = 5
net.ipv4.tcp_syn_retries = 5
net.ipv4.tcp_synack_retries = 5
net.sctp.path_max_retrans = 10
net.sctp.max_init_retransmits = 10
net.sctp.association_max_retrans = 10
net.sctp.hb_interval = 30000
net.ipv4.tcp_retries2 = 12
vm.overcommit_memory = 0
net.sctp.sndbuf_policy = 0
net.sctp.rcvbuf_policy = 0
net.sctp.sctp_mem = 94500000 915000000 927000000
net.sctp.sctp_rmem = 8192 250000 16777216
net.sctp.sctp_wmem = 8192 250000 16777216
net.ipv4.tcp_rmem = 8192 250000 16777216
net.ipv4.tcp_wmem = 8192 250000 16777216
net.core.wmem_max = 21299200
net.core.rmem_max = 21299200
net.core.wmem_default = 21299200
net.core.rmem_default = 21299200
net.ipv4.ip_local_port_range = 26000 65535
kernel.sem = 250 6400000 1000 25600
# suggested value: 5% of physical memory
vm.min_free_kbytes = 419430
net.core.somaxconn = 65535
net.ipv4.tcp_syncookies = 1
net.sctp.addip_enable = 0
net.core.netdev_max_backlog = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_fin_timeout = 60
kernel.shmall = 1152921504606846720
kernel.shmmax = 18446744073709551615
net.ipv4.tcp_sack = 1
net.ipv4.tcp_timestamps = 1
vm.extfrag_threshold = 500
vm.overcommit_ratio = 90
net.ipv4.ip_local_reserved_ports = 20050-20057,26000-26007

EOF

sysctl -p

Better not to set these by hand; leave them to gs_preinstall.
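If you set the values yourself anyway, vm.min_free_kbytes is the one that depends on the machine: the comment next to it suggests 5% of physical memory, and 419430 is what that works out to for 8 GB. A minimal sketch of the arithmetic follows; suggest_min_free_kbytes is a name I made up for illustration.

```shell
# Print 5% of physical memory in kB, the value suggested for vm.min_free_kbytes.
# The argument is MemTotal in kB; without an argument it is read from /proc/meminfo.
suggest_min_free_kbytes() {
    local mem_kb="${1:-$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)}"
    # Integer arithmetic, so the result is rounded down.
    echo $(( mem_kb * 5 / 100 ))
}

# Example:
#   suggest_min_free_kbytes 8388608   # 8 GB expressed in kB
#   suggest_min_free_kbytes           # use this machine's MemTotal
```

For a 4 GB virtual machine the same calculation gives roughly half the value listed above.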

11. Resource limits
echo "* soft stack 3072" >> /etc/security/limits.conf
echo "* hard stack 3072" >> /etc/security/limits.conf
echo "* soft nofile 1000000" >> /etc/security/limits.conf
echo "* hard nofile 1000000" >> /etc/security/limits.conf
echo "* soft nproc unlimited" >> /etc/security/limits.d/90-nproc.conf
tail -n 4 /etc/security/limits.conf
tail -n 1 /etc/security/limits.d/90-nproc.conf
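12. Transparent huge pages

This setup runs on openEuler, so I skipped this step. Anyone who has worked with Oracle knows how to disable transparent huge pages; the exact method differs between operating systems, so check the official documentation.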
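For reference, on most Linux distributions the current setting can be read from /sys/kernel/mm/transparent_hugepage/enabled, where the active mode is the word in square brackets. The helper below is only a sketch for reading that value (thp_active_mode is a hypothetical name); how to make a change persistent still depends on the operating system.

```shell
# Extract the active mode (the bracketed word) from a transparent_hugepage
# "enabled" line such as: always madvise [never]
thp_active_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

# Example on a live system:
#   thp_active_mode "$(cat /sys/kernel/mm/transparent_hugepage/enabled)"
```

If the result is not never, writing never into the same file as root (echo never > /sys/kernel/mm/transparent_hugepage/enabled) turns the feature off until the next reboot.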

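13. SSH trust for root
ssh-keygen -t rsa    # run on all three hosts

Hosts 1 and 2 each send their public key to host 3:
scp .ssh/id_rsa.pub  op03:~/.ssh/pub-1    # on op01
scp .ssh/id_rsa.pub  op03:~/.ssh/pub-2    # on op02

Merge the keys on host 3:
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
cat pub-1 >> authorized_keys
cat pub-2 >> authorized_keys

Host 3 sends the merged file to hosts 1 and 2:
scp authorized_keys op01:~/.ssh/
scp authorized_keys op02:~/.ssh/

Verify:
ssh op02 date
ssh op03 date

gs_preinstall wipes this trust setup. It prompts to configure root trust itself, but that did not succeed for me. Whether you configure trust here or not makes little difference, since it gets removed either way.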

14. Local yum repository
mount /dev/cdrom /mnt
cd /etc/yum.repos.d
mkdir bk
mv *.repo bk/
echo "[EL]" >> /etc/yum.repos.d/openEuler.repo
echo "name =Linux 7.x DVD" >> /etc/yum.repos.d/openEuler.repo
echo "baseurl=file:///mnt" >> /etc/yum.repos.d/openEuler.repo
echo "gpgcheck=0" >> /etc/yum.repos.d/openEuler.repo
echo "enabled=1" >> /etc/yum.repos.d/openEuler.repo
cat /etc/yum.repos.d/openEuler.repo
15. Installation packages
mkdir /soft
chmod 777 -R /soft/

tar -zxvf openGauss-3.0.0-openEuler-64bit-all.tar.gz 
tar -zxvf openGauss-3.0.0-openEuler-64bit-om.tar.gz

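Before downloading, confirm the required operating system and the installation package that matches it. Do not pick the SP3 release of the operating system; it is a serious trap.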

16. Configuration file
<?xml version="1.0" encoding="UTF-8"?>
<ROOT>
    <!-- Overall openGauss cluster information -->
    <CLUSTER>
        <PARAM name="clusterName" value="gscluster" />
        <PARAM name="nodeNames" value="op01,op02,op03" />
        <PARAM name="gaussdbAppPath" value="/gauss/app" />
        <PARAM name="gaussdbLogPath" value="/gauss/log" />
        <PARAM name="tmpMppdbPath" value="/gauss/tmp"/>
        <PARAM name="gaussdbToolPath" value="/gauss/om" />
        <PARAM name="corePath" value="/gauss/corefile"/>
        <PARAM name="backIp1s" value="172.16.220.141,172.16.220.142,172.16.220.143"/>
    </CLUSTER>

    <!-- Node deployment information for each server -->
    <DEVICELIST>
        <!-- Deployment information for node1 -->
        <DEVICE sn="op01">
            <PARAM name="name" value="op01"/>
            <PARAM name="azName" value="AZ1"/>
            <PARAM name="azPriority" value="1"/>
            <!-- If the server has only one usable NIC, set backIp1 and sshIp1 to the same IP -->
            <PARAM name="backIp1" value="172.16.220.141"/>
            <PARAM name="sshIp1" value="172.16.220.141"/>
            <!-- dn -->
            <PARAM name="dataNum" value="1"/>
            <PARAM name="dataPortBase" value="26000"/>
            <PARAM name="dataNode1" value="/gauss/data/db1,op02,/gauss/data/db1,op03,/gauss/data/db1"/>
            <PARAM name="dataNode1_syncNum" value="0"/>
        </DEVICE>

        <!-- Deployment information for node2; the value of "name" must be the host name -->
        <DEVICE sn="op02">
            <PARAM name="name" value="op02"/>
            <PARAM name="azName" value="AZ1"/>
            <PARAM name="azPriority" value="1"/>
            <!-- If the server has only one usable NIC, set backIp1 and sshIp1 to the same IP -->
            <PARAM name="backIp1" value="172.16.220.142"/>
            <PARAM name="sshIp1" value="172.16.220.142"/>
        </DEVICE>

        <!-- Deployment information for node3; the value of "name" must be the host name -->
        <DEVICE sn="op03">
            <PARAM name="name" value="op03"/>
            <PARAM name="azName" value="AZ1"/>
            <PARAM name="azPriority" value="1"/>
            <!-- If the server has only one usable NIC, set backIp1 and sshIp1 to the same IP -->
            <PARAM name="backIp1" value="172.16.220.143"/>
            <PARAM name="sshIp1" value="172.16.220.143"/>
            <PARAM name="cascadeRole" value="on"/>
        </DEVICE>
    </DEVICELIST>
</ROOT>
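A mismatch between nodeNames and the DEVICE blocks is easy to introduce when editing this file by hand. The check below is a rough sketch, not an openGauss tool: check_cluster_nodes is a hypothetical helper, and it relies on the layout used above (one PARAM per line, DEVICE tags written as <DEVICE sn="...">) rather than on a real XML parser.

```shell
# Check that every host listed in nodeNames has a matching DEVICE block.
# Prints one line per missing host and returns non-zero if any are missing.
check_cluster_nodes() {
    local xml="$1" names node missing=0
    # Pull the comma-separated host list out of the nodeNames PARAM.
    names=$(sed -n 's/.*name="nodeNames"[[:space:]]*value="\([^"]*\)".*/\1/p' "$xml")
    for node in $(printf '%s' "$names" | tr ',' ' '); do
        if ! grep -q "<DEVICE sn=\"${node}\">" "$xml"; then
            echo "missing DEVICE for ${node}"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   check_cluster_nodes /soft/script/my_cluster.xml && echo "node list looks consistent"
```

It only catches missing DEVICE entries; gs_preinstall still does the real validation when it parses the file.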

II. Installation

1. Preinstallation
cd /soft/script
chmod -R 775 /soft/script 
chown -R omm:dbgroup /soft/script 
[root@op01 script]# ./gs_preinstall -U omm -G dbgroup -X my_cluster.xml 
Parsing the configuration file.
Successfully parsed the configuration file.
Installing the tools on the local node.
Successfully installed the tools on the local node.
Are you sure you want to create trust for root (yes/no)?yes
Please enter password for root
Password: 
Successfully created SSH trust for the root permission user.
Setting host ip env
Successfully set host ip env.
Distributing package.
Begin to distribute package to tool path.
Successfully distribute package to tool path.
Begin to distribute package to package path.
Successfully distribute package to package path.
Successfully distributed package.
Are you sure you want to create the user[omm] and create trust for it (yes/no)? yes
Please enter password for cluster user.
Password: 
Please enter password for cluster user again.
Password: 
Generate cluster user password files successfully.

Successfully created [omm] user on all nodes.
Preparing SSH service.
Successfully prepared SSH service.
Installing the tools in the cluster.
Successfully installed the tools in the cluster.
Checking hostname mapping.
Successfully checked hostname mapping.
Creating SSH trust for [omm] user.
Please enter password for current user[omm].
Password: 
Checking network information.
All nodes in the network are Normal.
Successfully checked network information.
Creating SSH trust.
Creating the local key file.
Successfully created the local key files.
Appending local ID to authorized_keys.
Successfully appended local ID to authorized_keys.
Updating the known_hosts file.
Successfully updated the known_hosts file.
Appending authorized_key on the remote node.
Successfully appended authorized_key on all remote node.
Checking common authentication file content.
Successfully checked common authentication content.
Distributing SSH trust file to all node.
Distributing trust keys file to all node successfully.
Successfully distributed SSH trust file to all node.
Verifying SSH trust on all hosts.
Successfully verified SSH trust on all hosts.
Successfully created SSH trust.
Successfully created SSH trust for [omm] user.
Checking OS software.
Successfully check os software.
Checking OS version.
Successfully checked OS version.
Creating cluster's path.
Successfully created cluster's path.
Set and check OS parameter.
Setting OS parameters.
Successfully set OS parameters.
Warning: Installation environment contains some warning messages.
Please get more details by "/soft/script/gs_checkos -i A -h op01,op02,op03 --detail".
Set and check OS parameter completed.
Preparing CRON service.
Successfully prepared CRON service.
Setting user environmental variables.
Successfully set user environmental variables.
Setting the dynamic link library.
Successfully set the dynamic link library.
Setting Core file
Successfully set core path.
Setting pssh path
Successfully set pssh path.
Setting Cgroup.
Successfully set Cgroup.
Set ARM Optimization.
No need to set ARM Optimization.
Fixing server package owner.
Setting finish flag.
Successfully set finish flag.
Preinstallation succeeded.
2. OS check
./gs_checkos -i A -h op01,op02,op03 --detail 

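gs_checkos prompts for the root password of several nodes, or for yes/no answers. The prompts are confusing and I never figured them out, so I ignored this step.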

3. Installation
[omm@op01 script]$ ./gs_install -X my_cluster.xml 
Parsing the configuration file.
Check preinstall on every node.
Successfully checked preinstall on every node.
Creating the backup directory.
Last time end with Start cluster.
Continue this step.
Successfully created the backup directory.
begin deploy..
[SUCCESS] op01:
Using omm:dbgroup to install database.
Using installation program path : /gauss/app_02c14696
$GAUSSHOME points to /gauss/app_02c14696, no need to create symbolic link.
[2022-05-24 08:29:27.624][59583][][gs_ctl]: gs_ctl started,datadir is /gauss/data/db1 
[2022-05-24 08:29:27.961][59583][][gs_ctl]: waiting for server to start...
.0 LOG:  [Alarm Module]can not read GAUSS_WARNING_TYPE env.
0 LOG:  [Alarm Module]Host Name: op01 
0 LOG:  [Alarm Module]Host IP: 172.16.220.141                                             
0 LOG:  [Alarm Module]Cluster Name: gscluster 
0 LOG:  [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 57

0 WARNING:  failed to open feature control file, please check whether it exists: FileName=gaussdb.version, Errno=2, Errmessage=No such file or directory.
0 WARNING:  failed to parse feature control file: gaussdb.version.
0 WARNING:  Failed to load the product control file, so gaussdb cannot distinguish product version.
0 LOG:  bbox_dump_path is set to /gauss/corefile/
2022-05-24 08:29:28.140 628c26e8.1 [unknown] 140196187737152 [unknown] 0 dn_6001_6002_6003 DB010  0 [REDO] LOG:  Recovery parallelism, cpu count = 2, max = 4, actual = 2
2022-05-24 08:29:28.140 628c26e8.1 [unknown] 140196187737152 [unknown] 0 dn_6001_6002_6003 DB010  0 [REDO] LOG:  ConfigRecoveryParallelism, true_max_recovery_parallelism:4, max_recovery_parallelism:4
gaussdb.state does not exist, and skipt setting since it is optional.2022-05-24 08:29:28.145 628c26e8.1 [unknown] 140196187737152 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  [Alarm Module]can not read GAUSS_WARNING_TYPE env.

2022-05-24 08:29:28.145 628c26e8.1 [unknown] 140196187737152 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  [Alarm Module]Host Name: op01 

2022-05-24 08:29:28.146 628c26e8.1 [unknown] 140196187737152 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  [Alarm Module]Host IP: 172.16.220.141 

2022-05-24 08:29:28.146 628c26e8.1 [unknown] 140196187737152 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  [Alarm Module]Cluster Name: gscluster 

2022-05-24 08:29:28.146 628c26e8.1 [unknown] 140196187737152 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 57

2022-05-24 08:29:28.153 628c26e8.1 [unknown] 140196187737152 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  loaded library "security_plugin"
2022-05-24 08:29:28.155 628c26e8.1 [unknown] 140196187737152 [unknown] 0 dn_6001_6002_6003 01000  0 [BACKEND] WARNING:  could not create any HA TCP/IP sockets
2022-05-24 08:29:28.157 628c26e8.1 [unknown] 140196187737152 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  InitNuma numaNodeNum: 1 numa_distribute_mode: none inheritThreadPool: 0.
2022-05-24 08:29:28.158 628c26e8.1 [unknown] 140196187737152 [unknown] 0 dn_6001_6002_6003 01000  0 [BACKEND] WARNING:  Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (3300 Mbytes) is larger.
2022-05-24 08:29:28.783 628c26e8.1 [unknown] 140196187737152 [unknown] 0 dn_6001_6002_6003 00000  0 [CACHE] LOG:  set data cache  size(805306368)
2022-05-24 08:29:29.141 628c26e8.1 [unknown] 140196187737152 [unknown] 0 dn_6001_6002_6003 00000  0 [CACHE] LOG:  set metadata cache  size(268435456)
2022-05-24 08:29:33.682 628c26e8.1 [unknown] 140196187737152 [unknown] 0 dn_6001_6002_6003 00000  0 [SEGMENT_PAGE] LOG:  Segment-page constants: DF_MAP_SIZE: 8156, DF_MAP_BIT_CNT: 65248, DF_MAP_GROUP_EXTENTS: 4175872, IPBLOCK_SIZE: 8168, EXTENTS_PER_IPBLOCK: 1021, IPBLOCK_GROUP_SIZE: 4090, BMT_HEADER_LEVEL0_TOTAL_PAGES: 8323072, BktMapEntryNumberPerBlock: 2038, BktMapBlockNumber: 25, BktBitMaxMapCnt: 512
.2022-05-24 08:29:34.517 628c26e8.1 [unknown] 140196187737152 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  gaussdb: fsync file "/gauss/data/db1/gaussdb.state.temp" success
2022-05-24 08:29:34.517 628c26e8.1 [unknown] 140196187737152 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  create gaussdb state file success: db state(STARTING_STATE), server mode(Primary), connection index(1)
2022-05-24 08:29:34.549 628c26e8.1 [unknown] 140196187737152 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  max_safe_fds = 974, usable_fds = 1000, already_open = 16
bbox_dump_path is set to /gauss/corefile/
2022-05-24 08:29:34.589 628c26e8.1 [unknown] 140196187737152 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  user configure file is not found, it will be created.
.
[2022-05-24 08:29:36.485][59583][][gs_ctl]:  done
[2022-05-24 08:29:36.485][59583][][gs_ctl]: server started (/gauss/data/db1)
[FAILURE] op02:
[GAUSS-50201] : The /soft/script/my_cluster.xml does not exist.
[FAILURE] op03:
[GAUSS-50201] : The /soft/script/my_cluster.xml does not exist.

gs_preinstall strips the permissions from the gs_install.sh file, for reasons I could not work out. Just set it back to 777.

[Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 57

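According to the log, memory is insufficient. This is a virtual machine with 4 GB of RAM, which is clearly not enough, so I tried raising it to 8 GB.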

III. Startup

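1. Adjust the memory parameters

The installation reported insufficient memory, so adjust the memory parameters on all three nodes.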

vi postgresql.conf
max_process_memory = 3GB
max_connections = 1000
shared_buffers = 256kB
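Editing the file with vi on three nodes gets tedious, so the same change can be scripted. This is just a sketch: set_conf_param is a hypothetical helper of mine, not an openGauss command, and it simply removes any active line for the key before appending the new value.

```shell
# Set "key = value" in a postgresql.conf-style file.
# Any active (uncommented) line for the same key is dropped first, so the
# file ends up with exactly one active setting; commented defaults are kept.
set_conf_param() {
    local file="$1" key="$2" value="$3"
    sed -i.bak -E "/^[[:space:]]*${key}[[:space:]]*=/d" "$file"
    printf '%s = %s\n' "$key" "$value" >> "$file"
}

# Example (as omm, on each node; the path follows the dataNode1 directory above):
#   set_conf_param /gauss/data/db1/postgresql.conf max_process_memory 3GB
#   set_conf_param /gauss/data/db1/postgresql.conf max_connections 1000
```

The new values only take effect after the instance is restarted.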
2. Start the nodes
[omm@op01 ~]$ gs_om -t start
Starting cluster.
=========================================
Enter passphrase for key '/home/omm/.ssh/id_rsa': 
omm@op02's password: 
Enter passphrase for key '/home/omm/.ssh/id_rsa': 
omm@op03's password: 
[SUCCESS] op01
2022-05-24 08:56:55.003 628c2d56.1 [unknown] 139750693022784 [unknown] 0 dn_6001_6002_6003 01000  0 [BACKEND] WARNING:  could not create any HA TCP/IP sockets
2022-05-24 08:56:55.004 628c2d56.1 [unknown] 139750693022784 [unknown] 0 dn_6001_6002_6003 01000  0 [BACKEND] WARNING:  Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (1478 Mbytes) is larger.
=========================================
[GAUSS-53600]: Can not start the database, the cmd is source /home/omm/.bashrc; python3 '/gauss/om/script/local/StartInstance.py' -U omm -R /gauss/app -t 300 --security-mode=off,  Error:
[GAUSS-51400] : Failed to execute the command: source /home/omm/.bashrc; python3 '/gauss/om/script/local/StartInstance.py' -U omm -R /gauss/app -t 300 --security-mode=off. Error:
[FAILURE] op02:
.[GAUSS-51400] : Failed to execute the command: source /home/omm/.bashrc; python3 '/gauss/om/script/local/StartInstance.py' -U omm -R /gauss/app -t 300 --security-mode=off. Error:
[FAILURE] op03:

SSH trust stopped working again; ignore it.

3. Check the status
[omm@op01 ~]$ gs_om -t status --detail
[   Cluster State   ]

cluster_state   : Degraded
redistributing  : No
current_az      : AZ_ALL

[  Datanode State   ]

   node node_ip         port      instance                state
-------------------------------------------------------------------------------
1  op01 172.16.220.141  26000      6001 /gauss/data/db1   P Primary Normal
2  op02 172.16.220.142  26000      6002 /gauss/data/db1   S Down    Manually stopped
3  op03 172.16.220.143  26000      6003 /gauss/data/db1   C Down    Manually stopped
[omm@op01 ~]$ gsql -d postgres -p 26000
gsql ((openGauss 3.0.0 build 02c14696) compiled at 2022-04-01 18:12:19 commit 0 last mr  )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

openGauss=# select * from dbe_perf.replication_stat;
 pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | state | sender_sent_location | receiver_write_location | receiver_flush_location | receiver_replay_location | sync_priority | s
ync_state 
-----+----------+---------+------------------+-------------+-----------------+-------------+---------------+-------+----------------------+-------------------------+-------------------------+--------------------------+---------------+--
----------
(0 rows)

openGauss=# exit
openGauss-# ^[[A\
Invalid command \. Try \? for help.

Only the primary node came up; try starting the others manually.

4. Manually start node 2
[omm@op02 ~]$ gs_om -t start
Starting cluster.
=========================================
[SUCCESS] op01:
[2022-05-24 09:10:09.927][26665][][gs_ctl]: gs_ctl started,datadir is /gauss/data/db1 
[2022-05-24 09:10:09.932][26665][][gs_ctl]:  another server might be running; Please use the restart command

[SUCCESS] op02
2022-05-24 09:10:11.338 628c3073.1 [unknown] 139686372580416 [unknown] 0 dn_6001_6002_6003 01000  0 [BACKEND] WARNING:  could not create any HA TCP/IP sockets
2022-05-24 09:10:11.340 628c3073.1 [unknown] 139686372580416 [unknown] 0 dn_6001_6002_6003 01000  0 [BACKEND] WARNING:  Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (777 Mbytes) is larger.
[SUCCESS] op03:
[2022-05-24 09:10:14.414][22674][][gs_ctl]: gs_ctl started,datadir is /gauss/data/db1 
[2022-05-24 09:10:14.418][22674][][gs_ctl]:  another server might be running; Please use the restart command

Waiting for check cluster state...
Waiting for check cluster state...
Waiting for check cluster state...
Waiting for check cluster state...
[GAUSS-51607] : Failed to start cluster. After startup, the last check results were Degraded. Please check manually.
[omm@op02 ~]$ gs_om -t start                    
Starting cluster.
=========================================
[SUCCESS] op01:
[2022-05-24 09:11:04.166][28199][][gs_ctl]: gs_ctl started,datadir is /gauss/data/db1 
[2022-05-24 09:11:04.170][28199][][gs_ctl]:  another server might be running; Please use the restart command

[SUCCESS] op02:
[2022-05-24 09:11:05.495][29738][][gs_ctl]: gs_ctl started,datadir is /gauss/data/db1 
[2022-05-24 09:11:05.499][29738][][gs_ctl]:  another server might be running; Please use the restart command
[SUCCESS] op03:
[2022-05-24 09:11:07.538][24166][][gs_ctl]: gs_ctl started,datadir is /gauss/data/db1 
[2022-05-24 09:11:07.541][24166][][gs_ctl]:  another server might be running; Please use the restart command

Waiting for check cluster state...
Waiting for check cluster state...
Waiting for check cluster state...
^CTraceback (most recent call last):
  File "/gauss/om/script/gs_om", line 830, in <module>
    main()
  File "/gauss/om/script/gs_om", line 799, in main
    impl.doStart()
  File "/gauss/om/script/impl/om/OmImpl.py", line 88, in doStart
    self.doStartCluster()
  File "/gauss/om/script/impl/om/OLAP/OmImplOLAP.py", line 248, in doStartCluster
    time.sleep(5)
KeyboardInterrupt
5. Manually start node 3
[omm@op03 ~]$ gs_om -t start
Starting cluster.
=========================================
[SUCCESS] op01:
[2022-05-24 08:37:36.598][65024][][gs_ctl]: gs_ctl started,datadir is /gauss/data/db1 
[2022-05-24 08:37:36.603][65024][][gs_ctl]:  another server might be running; Please use the restart command

[SUCCESS] op02:
[2022-05-24 08:37:38.377][42759][][gs_ctl]: gs_ctl started,datadir is /gauss/data/db1 
[2022-05-24 08:37:38.381][42759][][gs_ctl]:  another server might be running; Please use the restart command

[SUCCESS] op03:
[2022-05-24 08:37:39.204][35582][][gs_ctl]: gs_ctl started,datadir is /gauss/data/db1 
[2022-05-24 08:37:39.209][35582][][gs_ctl]:  another server might be running; Please use the restart command
Waiting for check cluster state...
Waiting for check cluster state...
Waiting for check cluster state...
Waiting for check cluster state...
[GAUSS-51607] : Failed to start cluster. After startup, the last check results were Degraded. Please check manually.
6. Check the cluster status
[omm@op01 db1]$ gs_om -t status --detail
[   Cluster State   ]

cluster_state   : Degraded
redistributing  : No
current_az      : AZ_ALL

[  Datanode State   ]

   node node_ip         port      instance                state
-------------------------------------------------------------------------------
1  op01 172.16.220.141  26000      6001 /gauss/data/db1   P Primary Normal
2  op02 172.16.220.142  26000      6002 /gauss/data/db1   S Standby Need repair(System)
3  op03 172.16.220.143  26000      6003 /gauss/data/db1   C Cascade Need repair(System)

The state is wrong, so repair it.

7. Try a rebuild

Run this on both node 2 and node 3:

[omm@op03 ~]$  gs_ctl build -D /gauss/data/db1/ -b full
[2022-05-24 09:20:05.484][28612][][gs_ctl]: gs_ctl full build ,datadir is /gauss/data/db1
waiting for server to shut down.... done
server stopped
[2022-05-24 09:20:06.496][28612][][gs_ctl]: current workdir is (/home/omm).
[2022-05-24 09:20:06.496][28612][][gs_ctl]: fopen build pid file "/gauss/data/db1/gs_build.pid" success
[2022-05-24 09:20:06.496][28612][][gs_ctl]: fprintf build pid file "/gauss/data/db1/gs_build.pid" success
[2022-05-24 09:20:06.498][28612][][gs_ctl]: fsync build pid file "/gauss/data/db1/gs_build.pid" success
[2022-05-24 09:20:06.500][28612][][gs_ctl]: set gaussdb state file when full build build:db state(BUILDING_STATE), server mode(STANDBY_MODE), build mode(FULL_BUILD).
[2022-05-24 09:20:06.520][28612][dn_6001_6002_6003][gs_ctl]: build try host(172.16.220.141) port(26001) success
[2022-05-24 09:20:06.520][28612][dn_6001_6002_6003][gs_ctl]: connect to server success, build started.
[2022-05-24 09:20:06.520][28612][dn_6001_6002_6003][gs_ctl]: create build tag file success
[2022-05-24 09:20:06.583][28612][dn_6001_6002_6003][gs_ctl]: clear old target dir success
[2022-05-24 09:20:06.583][28612][dn_6001_6002_6003][gs_ctl]: create build tag file again success
[2022-05-24 09:20:06.584][28612][dn_6001_6002_6003][gs_ctl]: get system identifier success
[2022-05-24 09:20:06.584][28612][dn_6001_6002_6003][gs_ctl]: receiving and unpacking files...
[2022-05-24 09:20:06.585][28612][dn_6001_6002_6003][gs_ctl]: create backup label success
[2022-05-24 09:20:06.671][28612][dn_6001_6002_6003][gs_ctl]: xlog start point: 0/23F3AE8
[2022-05-24 09:20:06.671][28612][dn_6001_6002_6003][gs_ctl]: begin build tablespace list
[2022-05-24 09:20:06.671][28612][dn_6001_6002_6003][gs_ctl]: finish build tablespace list
[2022-05-24 09:20:06.671][28612][dn_6001_6002_6003][gs_ctl]: begin get xlog by xlogstream
[2022-05-24 09:20:06.671][28612][dn_6001_6002_6003][gs_ctl]: starting background WAL receiver
[2022-05-24 09:20:06.672][28612][dn_6001_6002_6003][gs_ctl]: starting walreceiver
[2022-05-24 09:20:06.672][28612][dn_6001_6002_6003][gs_ctl]: begin receive tar files
[2022-05-24 09:20:06.672][28612][dn_6001_6002_6003][gs_ctl]: receiving and unpacking files...
[2022-05-24 09:20:06.686][28612][dn_6001_6002_6003][gs_ctl]: build try host(172.16.220.141) port(26001) success
[2022-05-24 09:20:06.688][28612][dn_6001_6002_6003][gs_ctl]: check identify system success
[2022-05-24 09:20:06.689][28612][dn_6001_6002_6003][gs_ctl]: send START_REPLICATION 0/2000000 success
[2022-05-24 09:20:07.463][28612][dn_6001_6002_6003][gs_ctl]: finish receive tar files
[2022-05-24 09:20:07.463][28612][dn_6001_6002_6003][gs_ctl]: xlog end point: 0/4000058
[2022-05-24 09:20:07.463][28612][dn_6001_6002_6003][gs_ctl]: fetching MOT checkpoint
[2022-05-24 09:20:07.466][28612][dn_6001_6002_6003][gs_ctl]: waiting for background process to finish streaming...
[2022-05-24 09:20:12.033][28612][dn_6001_6002_6003][gs_ctl]: starting fsync all files come from source.
[2022-05-24 09:20:12.600][28612][dn_6001_6002_6003][gs_ctl]: finish fsync all files.
[2022-05-24 09:20:12.601][28612][dn_6001_6002_6003][gs_ctl]: build dummy dw file success
[2022-05-24 09:20:12.601][28612][dn_6001_6002_6003][gs_ctl]: rename build status file success
[2022-05-24 09:20:12.604][28612][dn_6001_6002_6003][gs_ctl]: full build build completed(/gauss/data/db1).
[2022-05-24 09:20:12.642][28612][dn_6001_6002_6003][gs_ctl]: waiting for server to start...
.0 LOG:  [Alarm Module]can not read GAUSS_WARNING_TYPE env.

0 LOG:  [Alarm Module]Host Name: op03 

0 LOG:  [Alarm Module]Host IP: 172.16.220.143 

0 LOG:  [Alarm Module]Cluster Name: gscluster 

0 LOG:  [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 57

0 WARNING:  failed to open feature control file, please check whether it exists: FileName=gaussdb.version, Errno=2, Errmessage=No such file or directory.
0 WARNING:  failed to parse feature control file: gaussdb.version.
0 WARNING:  Failed to load the product control file, so gaussdb cannot distinguish product version.
The core dump path is an invalid directory
2022-05-24 09:20:12.717 628c32cc.1 [unknown] 140271149931584 [unknown] 0 dn_6001_6002_6003 DB010  0 [REDO] LOG:  Recovery parallelism, cpu count = 2, max = 4, actual = 2
2022-05-24 09:20:12.717 628c32cc.1 [unknown] 140271149931584 [unknown] 0 dn_6001_6002_6003 DB010  0 [REDO] LOG:  ConfigRecoveryParallelism, true_max_recovery_parallelism:4, max_recovery_parallelism:4
2022-05-24 09:20:12.724 628c32cc.1 [unknown] 140271149931584 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  [Alarm Module]can not read GAUSS_WARNING_TYPE env.

2022-05-24 09:20:12.724 628c32cc.1 [unknown] 140271149931584 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  [Alarm Module]Host Name: op03 

2022-05-24 09:20:12.724 628c32cc.1 [unknown] 140271149931584 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  [Alarm Module]Host IP: 172.16.220.143 

2022-05-24 09:20:12.724 628c32cc.1 [unknown] 140271149931584 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  [Alarm Module]Cluster Name: gscluster 

2022-05-24 09:20:12.724 628c32cc.1 [unknown] 140271149931584 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 57

2022-05-24 09:20:12.727 628c32cc.1 [unknown] 140271149931584 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  loaded library "security_plugin"
2022-05-24 09:20:12.728 628c32cc.1 [unknown] 140271149931584 [unknown] 0 dn_6001_6002_6003 01000  0 [BACKEND] WARNING:  could not create any HA TCP/IP sockets
2022-05-24 09:20:12.730 628c32cc.1 [unknown] 140271149931584 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  InitNuma numaNodeNum: 1 numa_distribute_mode: none inheritThreadPool: 0.
2022-05-24 09:20:12.731 628c32cc.1 [unknown] 140271149931584 [unknown] 0 dn_6001_6002_6003 01000  0 [BACKEND] WARNING:  Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (777 Mbytes) is larger.
2022-05-24 09:20:12.780 628c32cc.1 [unknown] 140271149931584 [unknown] 0 dn_6001_6002_6003 00000  0 [CACHE] LOG:  set data cache  size(805306368)
2022-05-24 09:20:12.819 628c32cc.1 [unknown] 140271149931584 [unknown] 0 dn_6001_6002_6003 00000  0 [CACHE] LOG:  set metadata cache  size(268435456)
2022-05-24 09:20:12.943 628c32cc.1 [unknown] 140271149931584 [unknown] 0 dn_6001_6002_6003 00000  0 [SEGMENT_PAGE] LOG:  Segment-page constants: DF_MAP_SIZE: 8156, DF_MAP_BIT_CNT: 65248, DF_MAP_GROUP_EXTENTS: 4175872, IPBLOCK_SIZE: 8168, EXTENTS_PER_IPBLOCK: 1021, IPBLOCK_GROUP_SIZE: 4090, BMT_HEADER_LEVEL0_TOTAL_PAGES: 8323072, BktMapEntryNumberPerBlock: 2038, BktMapBlockNumber: 25, BktBitMaxMapCnt: 512
2022-05-24 09:20:12.987 628c32cc.1 [unknown] 140271149931584 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  gaussdb: fsync file "/gauss/data/db1/gaussdb.state.temp" success
2022-05-24 09:20:12.988 628c32cc.1 [unknown] 140271149931584 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  create gaussdb state file success: db state(STARTING_STATE), server mode(Standby), connection index(1)
2022-05-24 09:20:13.022 628c32cc.1 [unknown] 140271149931584 [unknown] 0 dn_6001_6002_6003 00000  0 [BACKEND] LOG:  max_safe_fds = 972, usable_fds = 1000, already_open = 18
The core dump path is an invalid directory

[2022-05-24 09:20:13.666][28612][dn_6001_6002_6003][gs_ctl]:  done
[2022-05-24 09:20:13.666][28612][dn_6001_6002_6003][gs_ctl]: server started (/gauss/data/db1)
[2022-05-24 09:20:13.666][28612][dn_6001_6002_6003][gs_ctl]: fopen build pid file "/gauss/data/db1/gs_build.pid" success
[2022-05-24 09:20:13.666][28612][dn_6001_6002_6003][gs_ctl]: fprintf build pid file "/gauss/data/db1/gs_build.pid" success
[2022-05-24 09:20:13.667][28612][dn_6001_6002_6003][gs_ctl]: fsync build pid file "/gauss/data/db1/gs_build.pid" success
8. Check the status
[omm@op03 ~]$ gs_om -t status --detail
[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
current_az      : AZ_ALL

[  Datanode State   ]

   node node_ip         port      instance                state
-------------------------------------------------------------------------------
1  op01 172.16.220.141  26000      6001 /gauss/data/db1   P Primary Normal
2  op02 172.16.220.142  26000      6002 /gauss/data/db1   S Standby Normal
3  op03 172.16.220.143  26000      6003 /gauss/data/db1   C Standby Normal


[omm@op01 db1]$ gsql -d postgres -p 26000
gsql ((openGauss 3.0.0 build 02c14696) compiled at 2022-04-01 18:12:19 commit 0 last mr  )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

openGauss=# select * from dbe_perf.replication_stat;
       pid       | usesysid | usename |       application_name        |  client_addr   | client_hostname | client_port |         backend_start         |   state   | sender_sent_location | receiver_write_location | receiver_flush_locatio
n | receiver_replay_location | sync_priority | sync_state 
-----------------+----------+---------+-------------------------------+----------------+-----------------+-------------+-------------------------------+-----------+----------------------+-------------------------+-----------------------
--+--------------------------+---------------+------------
 139693219632896 |       10 | omm     | WalSender to Standby[dn_6003] | 172.16.220.143 |                 |       28752 | 2022-05-24 09:20:13.295994+08 | Streaming | 0/5000DA8            | 0/5000DA8               | 0/5000DA8             
  | 0/5000DA8                |             0 | Async
 139693066417920 |       10 | omm     | WalSender to Standby[dn_6002] | 172.16.220.142 |                 |       27824 | 2022-05-24 09:19:34.80319+08  | Streaming | 0/5000DA8            | 0/5000DA8               | 0/5000DA8             
  | 0/5000DA8                |             0 | Async
(2 rows)
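Reading the status table by eye works for three nodes, but the same check is easy to script. The function below is a sketch built only on the output format shown in this post; cluster_is_healthy is a hypothetical name, and it treats Down or Need repair in any row as unhealthy.

```shell
# Read `gs_om -t status --detail` output on stdin.
# Succeeds only when cluster_state is Normal and no row reports Down or "Need repair".
cluster_is_healthy() {
    awk '
        /^cluster_state/   { state = $NF }   # last field is the state, e.g. Normal or Degraded
        /Down|Need repair/ { bad = 1 }       # any broken datanode row
        END { exit !(state == "Normal" && !bad) }
    '
}

# Example (as omm):
#   gs_om -t status --detail | cluster_is_healthy && echo "cluster OK"
```

This is handy in a loop after a rebuild, to wait until the standby and the cascade standby have both come back.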
9. Try a restart
[omm@op03 db1]$ gs_om -t restart
Stopping cluster.
=========================================
Successfully stopped cluster.
=========================================
End stop cluster.
Starting cluster.
=========================================
[SUCCESS] op01
2022-05-24 10:20:45.110 628c40fd.1 [unknown] 140049071512640 [unknown] 0 dn_6001_6002_6003 01000  0 [BACKEND] WARNING:  could not create any HA TCP/IP sockets
2022-05-24 10:20:45.111 628c40fd.1 [unknown] 140049071512640 [unknown] 0 dn_6001_6002_6003 01000  0 [BACKEND] WARNING:  Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (777 Mbytes) is larger.
[SUCCESS] op02
2022-05-24 10:20:48.129 628c4100.1 [unknown] 140500375858240 [unknown] 0 dn_6001_6002_6003 01000  0 [BACKEND] WARNING:  could not create any HA TCP/IP sockets
2022-05-24 10:20:48.130 628c4100.1 [unknown] 140500375858240 [unknown] 0 dn_6001_6002_6003 01000  0 [BACKEND] WARNING:  Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (777 Mbytes) is larger.
[SUCCESS] op03
2022-05-24 10:20:51.132 628c4103.1 [unknown] 140383888440384 [unknown] 0 dn_6001_6002_6003 01000  0 [BACKEND] WARNING:  could not create any HA TCP/IP sockets
2022-05-24 10:20:51.134 628c4103.1 [unknown] 140383888440384 [unknown] 0 dn_6001_6002_6003 01000  0 [BACKEND] WARNING:  Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (777 Mbytes) is larger.
=========================================
Successfully started.
[omm@op03 db1]$ gs_om -t status --detail
[   Cluster State   ]

cluster_state   : Normal
redistributing  : No
current_az      : AZ_ALL

[  Datanode State   ]

   node node_ip         port      instance                state
-------------------------------------------------------------------------------
1  op01 172.16.220.141  26000      6001 /gauss/data/db1   P Primary Normal
2  op02 172.16.220.142  26000      6002 /gauss/data/db1   S Standby Normal
3  op03 172.16.220.143  26000      6003 /gauss/data/db1   C Cascade Normal

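So the parameters still seem to be the problem.

IV. Fixing the errors above

All of the errors above were actually caused by insufficient memory, so capping the memory-related parameters at install time is enough: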

./gs_install -X my_cluster.xml --gsinit-parameter="--encoding=UTF8" --dn-guc="max_connections=1000" --dn-guc="max_process_memory=2GB" --dn-guc="shared_buffers=128MB" --dn-guc="bulk_write_ring_size=128MB" --dn-guc="cstore_buffers=16MB"
Last modified: 2022-05-24 22:20:34