openGauss Startup Failure: Cause and Solution

Original article by Nightingale, 2023-01-29

The startup failure

[omm@dba data]$ gs_om -t start
Starting cluster.
=========================================
=========================================
[GAUSS-53600]: Can not start the database, the cmd is source /home/omm/.bashrc; python3 '/opt/software/opengauss/om/script/local/StartInstance.py' -U omm -R /opt/software/opengauss/install/app -t 300 --security-mode=off, Error:
[FAILURE] dba:
[GAUSS-51607] : Failed to start instance. Error: Please check the gs_ctl log for failure details.
[2023-01-29 11:58:04.941][3089][][gs_ctl]: gs_ctl started,datadir is /opt/software/opengauss/install/data
[2023-01-29 11:58:04.991][3089][][gs_ctl]: waiting for server to start...
.0 LOG: [Alarm Module]can not read GAUSS_WARNING_TYPE env.
0 LOG: [Alarm Module]Host Name: dba
0 LOG: [Alarm Module]Host IP: dba. Copy hostname directly in case of taking 10s to use 'gethostbyname' when /etc/hosts does not contain <HOST IP>
0 LOG: [Alarm Module]Cluster Name: dbCluster
0 LOG: [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 57
0 WARNING: failed to open feature control file, please check whether it exists: FileName=gaussdb.version, Errno=2, Errmessage=No such file or directory.
0 WARNING: failed to parse feature control file: gaussdb.version.
0 WARNING: Failed to load the product control file, so gaussdb cannot distinguish product version.
2023-01-29 11:58:05.080 63d5eecd.1 [unknown] 140549185963904 [unknown] 0 dn_6001 DB010 0 [REDO] LOG: Recovery parallelism, cpu count = 4, max = 4, actual = 4
2023-01-29 11:58:05.080 63d5eecd.1 [unknown] 140549185963904 [unknown] 0 dn_6001 DB010 0 [REDO] LOG: ConfigRecoveryParallelism, true_max_recovery_parallelism:4, max_recovery_parallelism:4
2023-01-29 11:58:05.085 63d5eecd.1 [unknown] 140549185963904 [unknown] 0 dn_6001 00000 0 [BACKEND] LOG: [Alarm Module]can not read GAUSS_WARNING_TYPE env.
2023-01-29 11:58:05.085 63d5eecd.1 [unknown] 140549185963904 [unknown] 0 dn_6001 00000 0 [BACKEND] LOG: [Alarm Module]Host Name: dba
2023-01-29 11:58:05.085 63d5eecd.1 [unknown] 140549185963904 [unknown] 0 dn_6001 00000 0 [BACKEND] LOG: [Alarm Module]Host IP: dba. Copy hostname directly in case of taking 10s to use 'gethostbyname' when /etc/hosts does not contain <HOST IP>
2023-01-29 11:58:05.085 63d5eecd.1 [unknown] 140549185963904 [unknown] 0 dn_6001 00000 0 [BACKEND] LOG: [Alarm Module]Cluster Name: dbCluster
2023-01-29 11:58:05.085 63d5eecd.1 [unknown] 140549185963904 [unknown] 0 dn_6001 00000 0 [BACKEND] LOG: [Alarm Module]Invalid data in AlarmItem file! Read alarm English name failed! line: 57
2023-01-29 11:58:05.087 63d5eecd.1 [unknown] 140549185963904 [unknown] 0 dn_6001 00000 0 [BACKEND] LOG: loaded library "security_plugin"
2023-01-29 11:58:05.088 63d5eecd.1 [unknown] 140549185963904 [unknown] 0 dn_6001 42809 0 [BACKEND] LOG: could not bind IPv4 socket at the 0 time: Cannot assign requested address
2023-01-29 11:58:05.088 63d5eecd.1 [unknown] 140549185963904 [unknown] 0 dn_6001 42809 0 [BACKEND] HINT: Port 15400 is used, run 'netstat -anop|grep 15400' or 'lsof -i:15400'(need root) to see who is using this port.
.2023-01-29 11:58:06.090 63d5eecd.1 [unknown] 140549185963904 [unknown] 0 dn_6001 42809 0 [BACKEND] LOG: could not bind IPv4 socket at the 1 time: Cannot assign requested address
2023-01-29 11:58:06.090 63d5eecd.1 [unknown] 140549185963904 [unknown] 0 dn_6001 42809 0 [BACKEND] HINT: Port 15400 is used, run 'netstat -anop|grep 15400' or 'lsof -i:15400'(need root) to see who is using this port.
.2023-01-29 11:58:07.091 63d5eecd.1 [unknown] 140549185963904 [unknown] 0 dn_6001 42809 0 [BACKEND] LOG: could not bind IPv4 socket at the 2 time: Cannot assign requested address
2023-01-29 11:58:07.091 63d5eecd.1 [unknown] 140549185963904 [unknown] 0 dn_6001 42809 0 [BACKEND] HINT: Port 15400 is used, run 'netstat -anop|grep 15400' or 'lsof -i:15400'(need root) to see who is using this port.
.2023-01-29 11:58:08.094 63d5eecd.1 [unknown] 140549185963904 [unknown] 0 dn_6001 00000 0 [BACKEND] LOG: exec cmd: lsof -i:15400
sh: lsof: command not found
2023-01-29 11:58:08.100 63d5eecd.1 [unknown] 140549185963904 [unknown] 0 dn_6001 00000 0 [BACKEND] LOG: exec cmd: netstat -anp | grep 15400
(Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)
2023-01-29 11:58:08.128 63d5eecd.1 [unknown] 140549185963904 [unknown] 0 dn_6001 00000 0 [BACKEND] LOG: <netstat>:tcp 0 0 127.0.0.1:15400 0.0.0.0:* LISTEN 3092/gaussdb
2023-01-29 11:58:08.128 63d5eecd.1 [unknown] 140549185963904 [unknown] 0 dn_6001 00000 0 [BACKEND] LOG: <netstat>:tcp6 0 0 ::1:15400 :::* LISTEN 3092/gaussdb
2023-01-29 11:58:08.129 63d5eecd.1 [unknown] 140549185963904 [unknown] 0 dn_6001 42809 0 [BACKEND] FATAL: could not create listen socket for "192.168.162.78:15400"
[2023-01-29 11:58:09.000][3089][][gs_ctl]: waitpid 3092 failed, exitstatus is 256, ret is 2
[2023-01-29 11:58:09.000][3089][][gs_ctl]: stopped waiting
[2023-01-29 11:58:09.000][3089][][gs_ctl]: could not start server
Examine the log output..
[omm@dba data]$

Locating the error: the FATAL line

2023-01-29 11:58:08.129 63d5eecd.1 [unknown] 140549185963904 [unknown] 0 dn_6001 42809 0 [BACKEND] FATAL: could not create listen socket for "192.168.162.78:15400"
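
When the gs_ctl output is long, filtering for the FATAL entries can be scripted. The following is a minimal Python sketch, assuming the log output has already been captured into a string; the helper name `find_fatal_lines` is hypothetical and not part of any openGauss tooling.

```python
import re

# "FATAL:" with the trailing colon, so that names such as
# IPV4_FAILURE_FATAL in other files are not matched.
FATAL_RE = re.compile(r"\bFATAL:\s*(.*)")


def find_fatal_lines(log_text):
    """Return the message part of every log line that contains 'FATAL:'."""
    messages = []
    for line in log_text.splitlines():
        match = FATAL_RE.search(line)
        if match:
            messages.append(match.group(1).strip())
    return messages


# Two lines copied from the log above.
sample_log = (
    "2023-01-29 11:58:08.128 63d5eecd.1 [unknown] 140549185963904 [unknown] 0 "
    "dn_6001 00000 0 [BACKEND] LOG: <netstat>:tcp6 0 0 ::1:15400 :::* LISTEN 3092/gaussdb\n"
    "2023-01-29 11:58:08.129 63d5eecd.1 [unknown] 140549185963904 [unknown] 0 "
    'dn_6001 42809 0 [BACKEND] FATAL: could not create listen socket for "192.168.162.78:15400"\n'
)

for message in find_fatal_lines(sample_log):
    print(message)
```

The sketch only isolates the message. Interpreting it still requires reading the surrounding LOG and HINT lines.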

Troubleshooting approach

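The error message shows that the problem lies with the IP address. The repeated "Cannot assign requested address" means that 192.168.162.78 is not currently assigned to any local interface, and the "Port 15400 is used" hint is misleading here: the listener that netstat reports (PID 3092) is the very gaussdb process being started. First check postgresql.conf and pg_hba.conf, then check the operating system's network interface configuration.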

postgresql.conf

# - Connection Settings -
listen_addresses = 'localhost,192.168.162.78'    # what IP address(es) to listen on;
                                        # comma-separated list of addresses;
                                        # defaults to 'localhost'; use '*' for all
                                        # (change requires restart)
local_bind_address = '192.168.162.78'
port = 15400                            # (change requires restart)
max_connections = 5000                  # (change requires restart)
# Note: Increasing max_connections costs ~400 bytes of shared memory per
# connection slot, plus lock space (see max_locks_per_transaction).
#sysadmin_reserved_connections = 3      # (change requires restart)
unix_socket_directory = '/opt/software/opengauss/tmp'    # (change requires restart)
#unix_socket_group = ''                 # (change requires restart)
unix_socket_permissions = 0700          # begin with 0 to use octal notation
                                        # (change requires restart)
# - Security and Authentication -
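
Whether each configured listen address can actually be bound on this host can be checked before starting the database. The sketch below is an illustration under stated assumptions: `parse_listen_addresses` and `can_bind` are hypothetical helper names, the parser handles only the single-quoted form shown above, and the bind test uses an ephemeral port (port 0), so it tells you whether the address is local, not whether port 15400 is free.

```python
import re
import socket


def parse_listen_addresses(conf_text):
    """Return the addresses from the first active listen_addresses line."""
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # commented-out setting
        match = re.match(r"listen_addresses\s*=\s*'([^']*)'", stripped)
        if match:
            return [a.strip() for a in match.group(1).split(",") if a.strip()]
    return []


def can_bind(address):
    """Return True if a TCP socket can be bound to address on an ephemeral port."""
    try:
        family, socktype, proto, _, sockaddr = socket.getaddrinfo(
            address, 0, type=socket.SOCK_STREAM)[0]
        with socket.socket(family, socktype, proto) as sock:
            sock.bind(sockaddr)
        return True
    except OSError:
        # Covers resolution failures and "Cannot assign requested address".
        return False


# The listen_addresses line from the configuration above.
sample_conf = (
    "listen_addresses = 'localhost,192.168.162.78'    "
    "# what IP address(es) to listen on;\n"
)

for addr in parse_listen_addresses(sample_conf):
    print(addr, "bindable" if can_bind(addr) else "NOT bindable on this host")
```

On a host whose address has changed, the stale entry would be expected to show up as not bindable, which matches the "Cannot assign requested address" lines in the log.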

pg_hba.conf

[omm@dba data]$ tail -10 pg_hba.conf
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
host    all             all             192.168.162.78/32       sha256
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
#local   replication     omm                                trust
#host    replication     omm        127.0.0.1/32            trust
#host    replication     omm        ::1/128                 trust
[omm@dba data]$
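
The address column in pg_hba.conf matches client addresses, so the entry for the host's own IP decides whether connections arriving from that address are allowed. As a rough illustration, the Python sketch below lists which `host` entries cover a given client IP. The helper names are hypothetical, and the parser is deliberately limited to the five-field CIDR form shown above (it ignores the separate-netmask form, hostnames, and keywords).

```python
import ipaddress


def parse_hba_host_entries(hba_text):
    """Return active 'host' entries written as: host db user address/prefix method."""
    entries = []
    for line in hba_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) == 5 and fields[0] == "host":
            entries.append({
                "database": fields[1],
                "user": fields[2],
                "address": fields[3],
                "method": fields[4],
            })
    return entries


def entries_covering(hba_text, client_ip):
    """Return the host entries whose CIDR address contains client_ip."""
    target = ipaddress.ip_address(client_ip)
    covering = []
    for entry in parse_hba_host_entries(hba_text):
        try:
            network = ipaddress.ip_network(entry["address"], strict=False)
        except ValueError:
            continue  # not a CIDR address; skipped in this sketch
        if target in network:
            covering.append(entry)
    return covering


# Entries copied from the pg_hba.conf output above.
sample_hba = """\
# IPv4 local connections:
host    all    all    127.0.0.1/32         trust
host    all    all    192.168.162.78/32    sha256
# IPv6 local connections:
host    all    all    ::1/128              trust
#host    replication     omm        127.0.0.1/32            trust
"""

print(entries_covering(sample_hba, "192.168.162.78"))
print(entries_covering(sample_hba, "192.168.43.43"))
```

With the entries above, an address that appears in no entry returns an empty list, meaning no rule in this excerpt would admit a client connecting from it.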

Operating system network interface configuration

[omm@dba data]$ cat /etc/sysconfig/network-scripts/ifcfg-ens33
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="dhcp"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="ens33"
UUID="6896f8ed-e7a9-408e-9401-09ae5dcaabba"
DEVICE="ens33"
ONBOOT="yes"
MTU="8192"
[omm@dba data]$
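
The key setting in this file is BOOTPROTO. A small Python sketch can flag interfaces that obtain their address via DHCP. The helper names `parse_ifcfg` and `uses_dhcp` are hypothetical, and the parser assumes simple KEY="value" assignments like those shown above.

```python
import shlex


def parse_ifcfg(text):
    """Parse KEY="value" assignments from an ifcfg-style file into a dict."""
    settings = {}
    # shlex handles the shell-style quoting; comments=True drops '#' comments.
    for token in shlex.split(text, comments=True):
        key, sep, value = token.partition("=")
        if sep:
            settings[key] = value
    return settings


def uses_dhcp(settings):
    """Return True if the interface is configured to obtain its address via DHCP."""
    return settings.get("BOOTPROTO", "").lower() == "dhcp"


# A few lines copied from ifcfg-ens33 above.
sample_ifcfg = '''TYPE="Ethernet"
BOOTPROTO="dhcp"
NAME="ens33"
DEVICE="ens33"
ONBOOT="yes"
'''

settings = parse_ifcfg(sample_ifcfg)
print(settings["DEVICE"], "uses DHCP:", uses_dhcp(settings))
```

A DHCP-assigned address is not guaranteed to stay the same across reboots, which is why any IP written into database configuration files can go stale.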

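This reveals the cause: the operating system is configured to obtain its IP address automatically (BOOTPROTO="dhcp"), so the address changed after the system rebooted, while the two openGauss configuration files, postgresql.conf and pg_hba.conf, were not updated and still referenced the old address.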

Solution: update the configuration files to match the current IP address

Edit postgresql.conf

# - Connection Settings -
listen_addresses = 'localhost,192.168.43.43'    # what IP address(es) to listen on;
                                        # comma-separated list of addresses;
                                        # defaults to 'localhost'; use '*' for all
                                        # (change requires restart)
local_bind_address = '192.168.43.43'
port = 15400                            # (change requires restart)
max_connections = 5000                  # (change requires restart)
# Note: Increasing max_connections costs ~400 bytes of shared memory per
# connection slot, plus lock space (see max_locks_per_transaction).
#sysadmin_reserved_connections = 3      # (change requires restart)
unix_socket_directory = '/opt/software/opengauss/tmp'    # (change requires restart)
#unix_socket_group = ''                 # (change requires restart)
unix_socket_permissions = 0700          # begin with 0 to use octal notation
                                        # (change requires restart)

Edit pg_hba.conf

[omm@dba data]$ tail -10 pg_hba.conf
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
host    all             all             192.168.43.43/32        sha256
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
#local   replication     omm                                trust
#host    replication     omm        127.0.0.1/32            trust
#host    replication     omm        ::1/128                 trust
[omm@dba data]$
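
The same substitution has to be made in both files, so it can be scripted. The following Python sketch operates on the file contents as strings and leaves reading, backing up, and writing the files to the reader. The helper name `replace_ip` is hypothetical. It matches the old address only as a whole IPv4 literal, so a longer address that merely starts with the same digits is left alone.

```python
import re


def replace_ip(text, old_ip, new_ip):
    """Replace whole occurrences of old_ip with new_ip.

    Returns a (new_text, number_of_replacements) tuple.
    """
    pattern = re.compile(
        r"(?<![\d.])"        # not preceded by a digit or dot
        + re.escape(old_ip)
        + r"(?!\d|\.\d)"     # not followed by more digits or another octet
    )
    return pattern.subn(lambda _match: new_ip, text)


# Lines copied from the two configuration files above.
old_conf = (
    "listen_addresses = 'localhost,192.168.162.78'\n"
    "local_bind_address = '192.168.162.78'\n"
)
old_hba = "host    all    all    192.168.162.78/32    sha256\n"

new_conf, conf_count = replace_ip(old_conf, "192.168.162.78", "192.168.43.43")
new_hba, hba_count = replace_ip(old_hba, "192.168.162.78", "192.168.43.43")

print(new_conf, end="")
print(new_hba, end="")
print("replacements:", conf_count + hba_count)
```

As the comments in postgresql.conf note, changes to these settings take effect only after a restart.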

Successful startup

[omm@dba data]$ gs_om -t start
Starting cluster.
=========================================
[SUCCESS] dba
2023-01-29 12:45:15.831 63d5f9db.1 [unknown] 140479240804224 [unknown] 0 dn_6001 01000 0 [BACKEND] WARNING: could not create any HA TCP/IP sockets
2023-01-29 12:45:15.831 63d5f9db.1 [unknown] 140479240804224 [unknown] 0 dn_6001 01000 0 [BACKEND] WARNING: could not create any HA TCP/IP sockets
2023-01-29 12:45:15.833 63d5f9db.1 [unknown] 140479240804224 [unknown] 0 dn_6001 01000 0 [BACKEND] WARNING: Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (3608 Mbytes) is larger.
=========================================
Successfully started.
[omm@dba data]$ gs_om -t status
-----------------------------------------------------------------------
cluster_name    : dbCluster
cluster_state   : Normal
redistributing  : No
-----------------------------------------------------------------------
[omm@dba data]$ gsql -d postgres -p 15400
gsql ((openGauss 3.1.1 build 70980198) compiled at 2023-01-06 09:34:59 commit 0 last mr )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.
openGauss=#
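
To verify the result from a script rather than by eye, the output of `gs_om -t status` can be parsed. This is a minimal Python sketch, assuming the output has already been captured as text (for example with `subprocess.run`). The helper name `parse_status` is hypothetical, and only the three fields visible in the output above are recognized.

```python
import re

# Only the fields that appear in the status output shown above.
STATUS_RE = re.compile(r"\b(cluster_name|cluster_state|redistributing)\s*:\s*(\S+)")


def parse_status(status_text):
    """Extract the known 'key : value' fields from gs_om -t status output."""
    return {key: value for key, value in STATUS_RE.findall(status_text)}


# Status output copied from the session above.
sample_status = """\
-----------------------------------------------------------------------
cluster_name    : dbCluster
cluster_state   : Normal
redistributing  : No
-----------------------------------------------------------------------
"""

status = parse_status(sample_status)
print(status)
print("cluster is normal:", status.get("cluster_state") == "Normal")
```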