GBase 8a Logs: A Detailed Look at system.log

Le 2022-02-22


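gcluster writes three kinds of logs: the trc log records query plans; express.log records SQL execution along with any warnings and errors raised during execution; system.log mainly records gcluster startup, shutdown, and crash information. The logs are stored in /opt/gcluster/log/gcluster.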

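Whenever gcluster starts or stops normally, fails to start, or crashes at run time, the event is recorded in system.log, so this log is the place to look when diagnosing a failed startup or a run-time crash. For a crash, system.log records the stack trace at the point of failure, while the core file holds the detailed stack information. Core files are written under /opt/gcluster/userdata/gcluster/ and can be inspected with gdb (the shipped gcluster is a release build that includes symbols). Start gdb on the binary and the core file, then enter thread apply all bt at the gdb prompt: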

gdb /opt/gcluster/server/bin/gclusterd /opt/gcluster/userdata/gcluster/corefile
thread apply all bt
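
The same gdb session can be run non-interactively, which is convenient when a backtrace has to be collected from several nodes. The following is a minimal sketch, assuming gdb is installed and using gdb's standard -batch and -ex options; the function names build_gdb_command and dump_backtraces are illustrative and not part of GBase.

```python
import subprocess


def build_gdb_command(binary: str, core: str) -> list[str]:
    """Build a gdb command line that prints every thread's backtrace and exits."""
    # -batch makes gdb exit once the -ex commands have run.
    return ["gdb", "-batch", "-ex", "thread apply all bt", binary, core]


def dump_backtraces(binary: str, core: str) -> str:
    """Run gdb against a core file and return whatever it printed to stdout."""
    result = subprocess.run(
        build_gdb_command(binary, core),
        capture_output=True,
        text=True,
        check=False,  # gdb may exit non-zero on a damaged core; keep the output anyway
    )
    return result.stdout


if __name__ == "__main__":
    # Paths as given in the article; "corefile" stands for the actual core file name.
    print(dump_backtraces(
        "/opt/gcluster/server/bin/gclusterd",
        "/opt/gcluster/userdata/gcluster/corefile",
    ))
```

Keeping the command construction separate from the subprocess call makes it easy to log or review the exact command before running it.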

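To see the detailed stack information from the core file after a crash, the feature must be enabled in the configuration files at both the cluster level and the node level:
Cluster level: on every cluster node machine, open the configuration file gbase_8a_gcluster.cnf under /opt/gcluster/config and remove the comment character "#" in front of the core-file parameter.
Node level: on every cluster node machine, open the configuration file gbase_8a_gbase.cnf under /opt/gnode/config and remove the comment character "#" in front of the core-file parameter.
After making these changes, switch to the root user on every node machine and run service gcware restart to restart the cluster service so that the new settings take effect.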
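
The edit to the two configuration files can be scripted. The sketch below assumes the parameter appears on a line of its own as "#core-file" (possibly with spaces around the "#"); the actual layout of your configuration files may differ, so check the result before restarting. The function name enable_core_file is illustrative.

```python
import re

# Assumption: the option is written as a bare "core-file" line that has been commented out.
_COMMENTED_CORE_FILE = re.compile(r"^(\s*)#\s*(core-file)\s*$")


def enable_core_file(config_text: str) -> str:
    """Return config_text with a commented-out core-file line uncommented."""
    lines = []
    for line in config_text.splitlines():
        match = _COMMENTED_CORE_FILE.match(line)
        # Keep the original indentation and drop only the comment character.
        lines.append(match.group(1) + match.group(2) if match else line)
    trailing_newline = "\n" if config_text.endswith("\n") else ""
    return "\n".join(lines) + trailing_newline


if __name__ == "__main__":
    # Paths as given in the article; run as a user allowed to edit them.
    for path in (
        "/opt/gcluster/config/gbase_8a_gcluster.cnf",
        "/opt/gnode/config/gbase_8a_gbase.cnf",
    ):
        with open(path, encoding="utf-8") as fh:
            original = fh.read()
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(enable_core_file(original))
```

Lines that are already uncommented, and any other options, pass through unchanged, so running the script twice is harmless.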
The following example shows what crash information looks like in system.log:

120323 0:18:09 [Note] Event Scheduler: Purging the queue. 6 events
120323 0:18:09 - gbased got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.
key_buffer_size=8384512
read_buffer_size=131072
max_used_connections=6
max_threads=5000
threads_connected=0
It is possible that gbased could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 10950336 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
thd: 0x675a560
Attempting backtrace. You can use the following information to find out
where gbased died. If you see no messages after this, something went
terribly wrong...
stack_bottom = (nil) thread_stack 0x40000
gclusterd(my_print_stacktrace+0x2e) [0xea49fd]
gclusterd(handle_segfault+0x32e) [0x6e6b36]
/lib64/libpthread.so.0 [0x361300eb70]
gclusterd(ExpressChannel::lock(unsigned int)+0x130) [0xb68746]
gclusterd(DMLLog(int, std::string, char const*)+0xbf) [0x9308ff]
gclusterd(GCQueryExecutorStatement::UnionTableHandler::Execute()+0x14d0) [0x8f5ec0]
gclusterd(GCQueryExecutorStatement::TaskThread::ExecuteUnionTableTask(char*, unsigned int)+0x93) [0x8dd23f]
gclusterd(GCQueryExecutorStatement::TaskThread::execute()+0x100) [0x8df976]
gclusterd(CGCThreadPool::GCThreadDispatchFunc(void*)+0x17) [0x92b5b1]
gclusterd(wrapper_fn(void*)+0x28) [0x92cafe]
/lib64/libpthread.so.0 [0x361300673d]
/lib64/libc.so.6(clone+0x6d) [0x36124d44bd]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at (nil) is an invalid pointer
thd->thread_id=0
thd->killed=NOT_KILLED
Writing a core file
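
When scanning many system.log files, it can help to pull out just the signal number and the stack frames. The sketch below is one way to do that, assuming each frame sits on a single line in the "module(symbol+0xoffset) [0xaddress]" shape seen in the excerpt above; the names Frame and parse_crash are illustrative, and the sample lines are copied from that excerpt.

```python
import re
from typing import Iterable, NamedTuple, Optional


class Frame(NamedTuple):
    module: str
    symbol: Optional[str]   # None when the line carries no symbol, e.g. libpthread
    offset: Optional[int]
    address: int


_SIGNAL = re.compile(r"got signal (\d+)")
_FRAME = re.compile(
    r"^(?P<module>\S+?)"
    r"(?:\((?P<symbol>.+)\+0x(?P<offset>[0-9a-f]+)\))?"
    r"\s*\[0x(?P<address>[0-9a-f]+)\]$"
)


def parse_crash(lines: Iterable[str]) -> tuple[Optional[int], list[Frame]]:
    """Return (signal number or None, stack frames in log order)."""
    signal_number = None
    frames = []
    for raw in lines:
        line = raw.strip()
        found = _SIGNAL.search(line)
        if found:
            signal_number = int(found.group(1))
            continue
        match = _FRAME.match(line)
        if match:
            offset = match.group("offset")
            frames.append(Frame(
                module=match.group("module"),
                symbol=match.group("symbol"),
                offset=int(offset, 16) if offset else None,
                address=int(match.group("address"), 16),
            ))
    return signal_number, frames


SAMPLE = """\
120323 0:18:09 - gbased got signal 11 ;
stack_bottom = (nil) thread_stack 0x40000
gclusterd(my_print_stacktrace+0x2e) [0xea49fd]
gclusterd(handle_segfault+0x32e) [0x6e6b36]
/lib64/libpthread.so.0 [0x361300eb70]
gclusterd(ExpressChannel::lock(unsigned int)+0x130) [0xb68746]
"""

if __name__ == "__main__":
    sig, frames = parse_crash(SAMPLE.splitlines())
    print("signal:", sig)
    for frame in frames:
        print(frame.module, frame.symbol or "<no symbol>", hex(frame.address))
```

In this excerpt the first frames below the signal handler (ExpressChannel::lock called from DMLLog) are usually the most useful starting point, since my_print_stacktrace and handle_segfault belong to the crash reporting itself.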
