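1. Background
SequoiaDB (巨杉数据库) is a financial-grade distributed database that offers three storage modes: distributed NewSQL, distributed file system and object storage, and high-performance NoSQL. These correspond to distributed online transactions, unstructured data and content management, and massive data management with high-performance access, respectively.
A cluster normally keeps three replicas to protect the data. If a hardware failure or similar problem causes a node failure or a cluster anomaly, the database administrator should analyze and diagnose the system methodically so that the cluster keeps working and users are not affected. This article shares some basic diagnostic methods for SequoiaDB.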
2. Database Cluster Diagnosis
1) Determine the SequoiaDB installation path
# cat /etc/default/sequoiadb
NAME=sdbcm
SDBADMIN_USER=sdbadmin
INSTALL_DIR=/opt/sequoiadb
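When later steps are scripted, it can help to read the installation path from that file instead of hard-coding it. The following is a minimal sketch; the function name sdb_install_dir is hypothetical, and it assumes the file uses plain KEY=VALUE lines as in the listing above.

```shell
# Print the INSTALL_DIR value from a SequoiaDB defaults file.
# Usage: sdb_install_dir [file]   (file defaults to /etc/default/sequoiadb)
# Assumption: the file contains plain KEY=VALUE lines, as shown above.
sdb_install_dir() {
    # Keep only INSTALL_DIR= lines, strip the key, and use the last one found.
    sed -n 's/^INSTALL_DIR=//p' "${1:-/etc/default/sequoiadb}" | tail -n 1
}
```

For example, `cd "$(sdb_install_dir)/bin"` would switch to the directory that holds the sdb tools.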
2) List the cluster node information
$ sdblist -l
$ sdblist -t all

Starting node 11820 with sdbstart fails here, and the output reports a problem with the db role value:
$ sdbstart -p 11820
11820: 69 bytes out==>
db role: data_test error
Failed resolving arguments(error=-6), exit
<==
Error: Start [/opt/sequoiadb/bin/../conf/local/11820] failed, rc: 127(Invalid Argument)
Total: 1; Succeed: 0; Failed: 1
The node's configuration file confirms that role is set to data_test:
$ vi /opt/sequoiadb/conf/local/11820/sdb.conf
svcname=11820
dbpath=/opt/sequoiadb/database/data/11820
logfilesz=64
weight=10
sortbuf=256
sharingbreak=180000
role=data_test
catalogaddr=sdb1:11803,sdb2:11803,sdb3:11803
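A role typo like the one above can be caught before sdbstart is run. The sketch below is illustrative only: check_role and SDB_VALID_ROLES are hypothetical names, and the list of accepted role values is an assumption that should be confirmed against the SequoiaDB documentation for the version in use.

```shell
# Accepted role values. This list is an ASSUMPTION for illustration;
# confirm it against the documentation for your SequoiaDB version.
SDB_VALID_ROLES="data coord catalog standalone"

# check_role FILE: report whether the role= line in a node's sdb.conf
# holds one of the accepted values. Returns 0 if it does, 1 otherwise.
check_role() {
    _role=$(sed -n 's/^role=//p' "$1" | tail -n 1)
    for _r in $SDB_VALID_ROLES; do
        if [ "$_role" = "$_r" ]; then
            echo "role=$_role ok"
            return 0
        fi
    done
    echo "role=$_role invalid"
    return 1
}
```

Run against the file shown above, `check_role /opt/sequoiadb/conf/local/11820/sdb.conf` would flag data_test as invalid under this assumed list.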
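3) Check whether the cluster nodes are healthy
Connect to the coord node and take a database snapshot. The ErrNodes field lists the nodes that reported an error:
$ cd /opt/sequoiadb/bin/
$ sdb 'db = new Sdb("localhost",11810,"username","password")'
$ sdb 'db.snapshot(SDB_SNAP_DATABASE)'
{
  "TotalNumConnects": 0,
  "TotalDataRead": 787373,
  "TotalIndexRead": 0,
  ……
  "ErrNodes": [
    {"NodeName": "sdb1:11820", "Flag": -129},
    {"NodeName": "sdb2:11820", "Flag": -129}
  ]
}
Return 1 row(s).
Takes 0.27826s.
When no node reports an error, ErrNodes is empty:
$ sdb 'db.snapshot(SDB_SNAP_DATABASE)'
{
  "TotalNumConnects": 1,
  ……
  "ErrNodes": []
}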
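For monitoring scripts, the ErrNodes entries can be pulled out of the snapshot text. This is a rough sketch rather than a proper JSON parser: err_nodes is a hypothetical helper, and it assumes each entry has "NodeName" immediately followed by "Flag", as in the output above.

```shell
# err_nodes: read snapshot output on stdin and print one "node flag" pair
# per line for every {"NodeName": ..., "Flag": ...} entry.
# Assumption: "NodeName" is directly followed by "Flag" inside each entry.
err_nodes() {
    # Strip whitespace so single-line and pretty-printed output look the same.
    tr -d ' \n\t' |
        grep -o '"NodeName":"[^"]*","Flag":-\{0,1\}[0-9]*' |
        sed 's/"NodeName":"\([^"]*\)","Flag":\(.*\)/\1 \2/'
}
```

A possible use is `sdb 'db.snapshot(SDB_SNAP_DATABASE)' | err_nodes`; empty output then means the snapshot listed no error nodes.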
Connecting directly to a data node shows that node's own state. Here sdb2:11820 is not the primary, ServiceStatus is false, and Status is FullSync:
$ sdb 'data = new Sdb("sdb2",11820)'
$ sdb 'data.snapshot(SDB_SNAP_DATABASE)'
{
  "NodeName": "sdb2:11820",
  "HostName": "sdb2",
  "ServiceName": "11820",
  "GroupName": "dg1",
  "IsPrimary": false,
  "ServiceStatus": false,
  "Status": "FullSync",
  ......
The diagnostic log records that the node found its data abnormal and started a full synchronization:
2019-11-08-21.38.26.332510 Level:EVENT
PID:3151 TID:3208
Function:_onAttach Line:217
File:SequoiaDB/engine/cls/clsReplSession.cpp
Message:
Session[Type:Sync-Dest,NodeID:1008,TID:1]: The db data is abnormal, need to synchronize full data
2019-11-08-21.38.26.333890 Level:EVENT
PID:3151 TID:3208
Function:_fullSync Line:722
File:SequoiaDB/engine/cls/clsReplSession.cpp
Message:
Session[Type:Sync-Dest,NodeID:1008,TID:1]: Start the synchronization of full
4) Check whether the cluster is available
[How to check]
$ cd /opt/sequoiadb/bin/
$ sdb 'db = new Sdb("localhost",11810)'
$ sdb 'db.sample.employee.insert({"code":1,"name":"test1"})'
$ sdb 'db.sample.employee.find()'
$ sdb 'db.sample.employee.count()'
In this example the query fails with error -5 (File Exist):
$ sdb 'db.sample.employee.find()'
sdb.js:505 uncaught exception: -5
File Exist
Check the diagnostic log of the coord node:
$ vi /opt/sequoiadb/database/coord/11810/diaglog/sdbdiag.log
2019-11-08-21.38.26.971524 Level:ERROR
PID:89651 TID:90037
Function:_queryOrDoOnCL Line:1076
File:SequoiaDB/engine/coord/coordQueryOperator.cpp
Message:
Query failed on node[{ GroupID:1000, NodeID:1002, ServiceID:2(SHARD) }], rc: -5
2019-11-08-21.38.26.971661 Level:ERROR
PID:89651 TID:90037
Function:execute Line:491
File:SequoiaDB/engine/coord/coordQueryOperator.cpp
Message:
Query failed, rc: -5
2019-11-08-21.38.26.971679 Level:ERROR
PID:89651 TID:90037
Function:_onQueryReqMsg Line:1850
File:SequoiaDB/engine/pmd/pmdProcessor.cpp
Message:
Execute operator[Query] failed, rc: -5
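Paging through sdbdiag.log in vi works, but filtering by level is quicker when the log is long. The awk sketch below is one way to do that. The helper name diag_errors is hypothetical, and it assumes each record starts with a timestamp line that carries Level:<LEVEL> and has its text on, or right after, a line beginning with Message:.

```shell
# diag_errors FILE [LEVEL]: print "timestamp | message" for every record
# at the given level (default: ERROR).
# Assumption: a record begins with "<timestamp> Level:<LEVEL>" and its text
# sits on the "Message:" line or on the line directly after it.
diag_errors() {
    awk -v level="${2:-ERROR}" '
        # Record header: remember the timestamp and whether the level matches.
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-/ {
            ts = $1
            keep = ($0 ~ ("Level:" level "([^A-Z]|$)"))
            next
        }
        # Message marker: take the rest of the line, or the following line.
        /^Message:/ {
            msg = substr($0, 9)
            if (msg == "") { getline msg }
            if (keep) print ts " | " msg
        }
    ' "$1"
}
```

For instance, `diag_errors /opt/sequoiadb/database/coord/11810/diaglog/sdbdiag.log` would list only the ERROR records, one per line.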
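In the log, the message "Query failed on node[{ GroupID:1000, NodeID:1002, ServiceID:2(SHARD) }], rc: -5" shows that the real error comes from a data node: replica group 1000, node ID 1002, ServiceID 2, error code -5. Next, run db.listReplicaGroups() on the command line to get the replica group information: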
$ sdb 'db.listReplicaGroups()'
{
  ……
    {
      "HostName": "sdb3",
      "Status": 1,
      "dbpath": "/opt/sequoiadb/database/data/11820/",
      "Service": [
        {"Type": 0, "Name": "11820"},
        {"Type": 1, "Name": "11821"},
        {"Type": 2, "Name": "11822"}
      ],
      "NodeID": 1002
    },
  ],
  "GroupID": 1000,
  "GroupName": "dg1",
  "PrimaryNode": 1002,
  "Role": 0,
  "SecretID": 1969965962,
  "Status": 1,
  "Version": 7,
  "_id": {"$oid": "5d843fd23e28e361958a76bc"}
}
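Matching the coord log message against this output means reading GroupID, NodeID, and rc out of the message by hand. A small sed sketch can extract them instead. The name parse_failed_node is hypothetical, and the pattern assumes the exact "failed on node[{ ... }], rc: N" wording seen in the coord log.

```shell
# parse_failed_node: read log text on stdin and print the group, node,
# service, and return code from each "failed on node[{ ... }], rc: N" line.
# Assumption: the message wording matches the coord log excerpt above.
parse_failed_node() {
    sed -n 's/.*GroupID:\([0-9]*\), NodeID:\([0-9]*\), ServiceID:\([0-9]*\)[^,]*, rc: \(-\{0,1\}[0-9]*\).*/group=\1 node=\2 service=\3 rc=\4/p'
}
```

Feeding it the message from the coord log yields group 1000 and node 1002, which the listing above maps to host sdb3 and dbpath /opt/sequoiadb/database/data/11820/.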
Node 1002 runs on sdb3 with dbpath /opt/sequoiadb/database/data/11820/, so check that node's diagnostic log:
$ vi /opt/sequoiadb/database/data/11820/diaglog/sdbdiag.log
2019-11-08-21.38.26.584673 Level:ERROR
PID:4347 TID:4370
Function:open Line:66
File:SequoiaDB/engine/oss/ossMmap.cpp
Message:
Failed to open file, rc: -5
2019-11-08-21.38.26.584698 Level:ERROR
PID:4347 TID:4370
Function:openStorage Line:700
File:SequoiaDB/engine/dms/dmsStorageBase.cpp
Message:
Failed to open opt/sequoiadb/database/data/11820/sample.1.data, rc=-5
2019-11-08-21.38.26.584721 Level:ERROR
PID:4347 TID:4370
Function:open Line:1172
File:SequoiaDB/engine/dms/dmsStorageUnit.cpp
Message:
Open storage data su failed, rc: -5
2019-11-08-21.38.26.584756 Level:ERROR
PID:4347 TID:4370
Function:rtnCreateCollectionSpaceCommand Line:1160
File:SequoiaDB/engine/rtn/rtnCommandImpl.cpp
Message:
Failed to create collection space sample at /opt/sequoiadb/database/data/11820/, rc: -5
Listing the data directory shows that sample.1.data and sample.1.idx are both 0 bytes:
[sdbadmin@sdb3 11820]$ ll
total 1233564
drwxrwxrwx. 2 sdbadmin sdbadmin_group 4096 Sep 19 19:56 archivelog
drwxrwxrwx. 2 sdbadmin sdbadmin_group 4096 Sep 19 19:56 bakfile
drwxrwxrwx. 2 sdbadmin sdbadmin_group 4096 Nov 8 06:11 diaglog
-rw-r-----. 1 sdbadmin sdbadmin_group 0 Nov 8 05:27 sample.1.data
-rw-r-----. 1 sdbadmin sdbadmin_group 0 Nov 8 05:27 sample.1.idx
……
drwxrwxrwx. 2 sdbadmin sdbadmin_group 4096 Sep 19 19:56 tmp
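Zero-byte data or index files like these can also be spotted without reading the listing by eye. The sketch below uses find for that; empty_data_files is a hypothetical helper name, and it only looks at the .data and .idx suffixes visible in the listing above.

```shell
# empty_data_files DIR: list zero-byte *.data and *.idx files directly
# under a node's dbpath, sorted by name.
# Assumption: only the .data and .idx suffixes from the listing are checked.
empty_data_files() {
    find "$1" -maxdepth 1 -type f \( -name '*.data' -o -name '*.idx' \) -size 0 | sort
}
```

Running `empty_data_files /opt/sequoiadb/database/data/11820` on sdb3 in the state shown above would print the two sample.1 files.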
Copy the sample.1.* files over from sdb2, then restart replica group dg1:
$ scp -r sdbadmin@sdb2:/opt/sequoiadb/database/data/11820/sample.1.* .
$ sdb 'var dg = db.getRG("dg1")'
$ sdb 'dg.stop()'
$ sdb 'dg.start()'
The query now succeeds:
$ sdb 'db.sample.employee.find()'
{
  "_id": {"$oid": "5dc5755ec73f4486ee4efe40"},
  "a": 1
}
Return 1 row(s)
3. Summary
Last modified: 2019-12-04 09:46:44
Reposted from SequoiaDB (巨杉数据库).