
HBase initialization failure in production (2021): a fast, precise fix

若泽大数据 2021-03-04

1. HBase fails to start: check the master log

Error 1: Master startup cannot progress, in holding-pattern until region onlined

2021-02-08 08:45:25,520 INFO [PEWorker-15] procedure.ServerCrashProcedure: removed crashed server hadoop04,16020,1612702831541 after splitting done
2021-02-08 08:45:25,520 INFO [PEWorker-12] procedure.ServerCrashProcedure: removed crashed server hadoop03,16020,1612702830065 after splitting done
2021-02-08 08:45:25,520 INFO [PEWorker-16] procedure.ServerCrashProcedure: removed crashed server hadoop05,16020,1612702830065 after splitting done
2021-02-08 08:45:25,520 INFO [PEWorker-14] procedure.ServerCrashProcedure: removed crashed server hadoop02,16020,1612702829990 after splitting done
2021-02-08 08:45:25,520 INFO [PEWorker-13] procedure.ServerCrashProcedure: removed crashed server hadoop01,16020,1612702830108 after splitting done
2021-02-08 08:45:25,539 INFO [PEWorker-16] procedure2.ProcedureExecutor: Finished pid=4, state=SUCCESS; ServerCrashProcedure server=hadoop05,16020,1612702830065, splitWal=true, meta=false in 5.4110sec
2021-02-08 08:45:25,539 INFO [PEWorker-14] procedure2.ProcedureExecutor: Finished pid=1, state=SUCCESS; ServerCrashProcedure server=hadoop02,16020,1612702829990, splitWal=true, meta=false in 5.8310sec
2021-02-08 08:45:25,539 INFO [PEWorker-13] procedure2.ProcedureExecutor: Finished pid=2, state=SUCCESS; ServerCrashProcedure server=hadoop01,16020,1612702830108, splitWal=true, meta=false in 5.6480sec
2021-02-08 08:45:25,539 INFO [PEWorker-15] procedure2.ProcedureExecutor: Finished pid=3, state=SUCCESS; ServerCrashProcedure server=hadoop04,16020,1612702831541, splitWal=true, meta=false in 5.5300sec
2021-02-08 08:45:25,539 INFO [PEWorker-12] procedure2.ProcedureExecutor: Finished pid=5, state=SUCCESS; ServerCrashProcedure server=hadoop03,16020,1612702830065, splitWal=true, meta=true in 5.3010sec
2021-02-08 08:45:25,615 WARN [master/hadoop01:16000:becomeActiveMaster] master.HMaster: hbase:namespace,,1579505733727.1008ebe9ec36aa590581795aa076361f. is NOT online; state={1008ebe9ec36aa590581795aa076361f state=CLOSED, ts=1612745125446, server=hadoop03,16020,1612424044665}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
2021-02-08 08:45:25,651 WARN [master/hadoop01:16000.Chore.1] master.CatalogJanitor: atlas_entity_audit,,1611409787700.771f4de62962bf16ff21dce6af291f4d., unknown_server=hadoop01,16020,1612690760022/IDSS:GOOD,,1611409014714.6814da0f242a0cac9d69739ccc14b82f., unknown_server=hadoop01,16020,1612690760022/IDSS:GOOD,?\xFF\xFF\xFF,1611409014714.e5e73d64068a282e2a56612354eea262., unknown_server=hadoop01,16020,1612690760022/IDSS:GOOD,\x7F\xFF\xFF\xFE,1611409014714.e770ad8ba684f288756a75f46584ed82., unknown_server=hadoop02,16020,1612690751717/IDSS:GOOD,\xBF\xFF\xFF\xFD,1611409014714.98592e766049e7f8d58db2f21a2d7344., unknown_server=hadoop04,16020,1612690754642/hbase:acl,,1579505735886.218a8bd215747fd4091b92fc47edc3d5., unknown_server=hadoop03,16020,1612424044665/hbase:namespace,,1579505733727.1008ebe9ec36aa590581795aa076361f., unknown_server=hadoop04,16020,1612690754642/IDSS:TIMES,,1612424053581.7b65d91d68148c890a7953c8914adf57., unknown_server=hadoop05,16020,1612690756464/IDSS:CITY,,1612412337308.aa217d2a3c4edd49549c1b26e12e4680.
2021-02-08 08:45:26,615 WARN [master/hadoop01:16000:becomeActiveMaster] master.HMaster: hbase:namespace,,1579505733727.1008ebe9ec36aa590581795aa076361f. is NOT online; state={1008ebe9ec36aa590581795aa076361f state=CLOSED, ts=1612745125446, server=hadoop03,16020,1612424044665}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
2021-02-08 08:45:26,703 INFO [HBase-Metrics2-1] impl.GlobalMetricRegistriesAdapter: Registering Master,sub=Coprocessor.Master.CP_org.apache.hadoop.hbase.security.access.AccessController Metrics about HBase MasterObservers
2021-02-08 08:45:28,616 WARN [master/hadoop01:16000:becomeActiveMaster] master.HMaster: hbase:namespace,,1579505733727.1008ebe9ec36aa590581795aa076361f. is NOT online; state={1008ebe9ec36aa590581795aa076361f state=CLOSED, ts=1612745125446, server=hadoop03,16020,1612424044665}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
2021-02-08 08:45:32,616 WARN [master/hadoop01:16000:becomeActiveMaster] master.HMaster: hbase:namespace,,1579505733727.1008ebe9ec36aa590581795aa076361f. is NOT online; state={1008ebe9ec36aa590581795aa076361f state=CLOSED, ts=1612745125446, server=hadoop03,16020,1612424044665}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
2021-02-08 08:45:40,617 WARN [master/hadoop01:16000:becomeActiveMaster] master.HMaster: hbase:namespace,,1579505733727.1008ebe9ec36aa590581795aa076361f. is NOT online; state={1008ebe9ec36aa590581795aa076361f state=CLOSED, ts=1612745125446, server=hadoop03,16020,1612424044665}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.

Error 2: Found 0 OPEN regions on dead servers and 34 OPEN regions on unknown servers
2021-02-08 09:10:30,210 WARN [master/hadoop01:16000.Chore.1] master.CatalogJanitor: atlas_entity_audit,,1611409787700.771f4de62962bf16ff21dce6af291f4d., unknown_server=hadoop01,16020,1612690760022/IDSS:GOOD,,1611409014714.6814da0f242a0cac9d69739ccc14b82f., unknown_server=hadoop01,16020,1612690760022/IDSS:GOOD,?\xFF\xFF\xFF,1611409014714.e5e73d64068a282e2a56612354eea262., unknown_server=hadoop01,16020,1612690760022/IDSS:GOOD,\x7F\xFF\xFF\xFE,1611409014714.e770ad8ba684f288756a75f46584ed82., unknown_server=hadoop02,16020,1612690751717/IDSS:GOOD,\xBF\xFF\xFF\xFD,1611409014714.98592e766049e7f8d58db2f21a2d7344., unknown_server=hadoop04,16020,1612690754642/hbase:acl,,1579505735886.218a8bd215747fd4091b92fc47edc3d5., unknown_server=hadoop03,16020,1612424044665/hbase:namespace,,1579505733727.1008ebe9ec36aa590581795aa076361f., unknown_server=hadoop04,16020,1612690754642/IDSS:TIMES,,1612424053581.7b65d91d68148c890a7953c8914adf57., unknown_server=hadoop05,16020,1612690756464/IDSS:CITY,,1612412337308.aa217d2a3c4edd49549c1b26e12e4680.
2021-02-08 09:11:25,455 INFO [ProcExecTimeout] assignment.AssignmentManager: Found 0 OPEN regions on dead servers and 34 OPEN regions on unknown servers
2021-02-08 09:13:25,455 INFO [ProcExecTimeout] assignment.AssignmentManager: Found 0 OPEN regions on dead servers and 34 OPEN regions on unknown servers
Mon Feb 8 09:13:30 CST 2021 Stopping hbase (via master)

2. Analysis

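The hbase:namespace region is recorded as hosted on hadoop03,16020,1612424044665, but the hadoop03 RegionServer that is running now is hadoop03,16020,1612702830065. The start codes differ, so the master treats the recorded server as unknown and reports OPEN regions on unknown servers.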
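The mismatch is easy to miss by eye, because an HBase server name has the form host,port,startcode and only the last field changes when a RegionServer restarts. A minimal shell sketch of the comparison, using the two names from the log; the helper names `startcode_of` and `same_incarnation` are hypothetical and not part of HBase:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of HBase): split a "host,port,startcode"
# server name and compare two names.

# Print the start code, i.e. the third comma-separated field.
startcode_of() {
  printf '%s\n' "$1" | cut -d, -f3
}

# Succeed only if both names share host, port and start code.
same_incarnation() {
  [ "$1" = "$2" ]
}

recorded='hadoop03,16020,1612424044665'   # server recorded for hbase:namespace
running='hadoop03,16020,1612702830065'    # hadoop03 as currently started

if same_incarnation "$recorded" "$running"; then
  echo "same server instance"
else
  echo "start code changed: $(startcode_of "$recorded") -> $(startcode_of "$running")"
fi
```

With the two names above, the else branch runs, which matches the master's view that the recorded server no longer exists.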

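From experience, this points to a problem with hbase:meta, and rebuilding it is the fix. On HBase 1.x the repair code ships with HBase itself and is easy to run. This cluster runs 2.x, where the repair tooling has been split out into a separate project, so it takes a little more work.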

3. Check the HBase files for missing or corrupt blocks

[bigdata@hadoop01 ~]$ hdfs fsck hbase
Connecting to namenode via http://hadoop01:50070/fsck?ugi=bigdata&path=%2Fhbase
FSCK started by bigdata (auth:SIMPLE) from 192.168.1.34 for path hbase at Mon Feb 08 08:50:15 CST 2021
....................................................................................................
...................................................................................Status: HEALTHY
Total size: 146900394 B
Total dirs: 271
Total files: 183
Total symlinks: 0 (Files currently being written: 8)
Total blocks (validated): 129 (avg. block size 1138762 B) (Total open file blocks (not validated): 6)
Minimally replicated blocks: 129 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 5
Number of racks: 1

FSCK ended at Mon Feb 08 08:50:15 CST 2021 in 4 milliseconds



The filesystem under path '/hbase' is HEALTHY
[bigdata@hadoop01 ~]$

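No blocks are missing or corrupt. If any were, the damaged blocks would have to be repaired first, for example with the hdfs debug command.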
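The check can be turned into a gate before the repair steps that follow. A minimal sketch, assuming the fsck report has the same shape as the one above; the helper name `fsck_is_healthy` is hypothetical, and the sample text is copied from that report rather than produced by a live cluster:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `hdfs fsck` output on stdin and succeed only
# if the report carries a HEALTHY verdict and lists zero corrupt blocks.
fsck_is_healthy() {
  local report
  report=$(cat)
  printf '%s\n' "$report" | grep -q 'is HEALTHY' || return 1
  printf '%s\n' "$report" | grep -Eq 'Corrupt blocks:[[:space:]]*0[[:space:]]*$' || return 1
}

# Sample lines copied from the report above. On a cluster one would pipe
# in the real output instead, e.g.:  hdfs fsck /hbase | fsck_is_healthy
sample="Corrupt blocks: 0
The filesystem under path '/hbase' is HEALTHY"

if printf '%s\n' "$sample" | fsck_is_healthy; then
  echo "no corrupt blocks, safe to continue"
else
  echo "repair HDFS first"
fi
```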

4. Check Dead Region Servers in the HBase web UI

HBase web UI: http://hadoop01:16010

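The screenshot was copied from the web for illustration. To clear the dead region servers, move the existing WAL directories aside and restart HBase: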

hdfs dfs -mkdir hbase/WALs2021020801
hdfs dfs -mv hbase/WALs/* hbase/WALs2021020801/

sh stop-hbase.sh
sh start-hbase.sh

Checking again, there are no dead region servers, but the errors above persist.

5. Repair the metadata offline

[hbase@hadoop01 ~]$ hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair
Java HotSpot(TM) 64-Bit Server VM warning: UseCMSCompactAtFullCollection is deprecated and will likely be removed in a future release.
Java HotSpot(TM) 64-Bit Server VM warning: CMSFullGCsBeforeCompaction is deprecated and will likely be removed in a future release.
This tool is no longer supported in HBase-2+. Please refer to https://hbase.apache.org/book.html#HBCK2
[hbase@hadoop01 ~]$

HBase 2.2.2 no longer supports the 1.x tool, so hbase-operator-tools (HBCK2) has to be built separately.

6. Build hbase-operator-tools

https://github.com/apache/hbase-operator-tools/archive/rel/1.0.0.tar.gz

Import the project into IDEA and build it with Maven.

Upload hbase-hbck2-1.0.0.jar to hadoop01:

[hbase@hadoop01 ~]$ cd jar
[hbase@hadoop01 jar]$ ll
total 2528
-rw-r--r-- 1 hbase hbase 2588263 Feb 7 19:32 hbase-hbck2-1.0.0.jar
[hbase@hadoop01 jar]$

7. Repair the metadata offline again

7.1 Check the environment variables

[hbase@hadoop01 ~]$ echo $HADOOP_HOME
/home/hbase/software/hadoop
[hbase@hadoop01 ~]$ echo $HBASE_HOME
/home/hbase/software/hbase
[hbase@hadoop01 ~]$

[hbase@hadoop01 ~]$ cd software/hadoop/etc/hadoop/

7.2 Add the hbck jar to HADOOP_CLASSPATH
[hbase@hadoop01 hadoop]$ vi hadoop-env.sh
export HADOOP_CLASSPATH=/home/hbase/jar/hbase-hbck2-1.0.0.jar:$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/common/hadoop-lzo-0.4.20.jar

7.3 Verify
[hbase@hadoop01 hadoop]$ hadoop classpath
/home/hbase/software/hadoop-2.9.2/etc/hadoop:/home/hbase/software/hadoop-2.9.2/share/hadoop/common/lib/*:/home/hbase/software/hadoop-2.9.2/share/hadoop/common/*:/home/hbase/software/hadoop-2.9.2/share/hadoop/hdfs:/home/hbase/software/hadoop-2.9.2/share/hadoop/hdfs/lib/*:/home/hbase/software/hadoop-2.9.2/share/hadoop/hdfs/*:/home/hbase/software/hadoop-2.9.2/share/hadoop/yarn:/home/hbase/software/hadoop-2.9.2/share/hadoop/yarn/lib/*:/home/hbase/software/hadoop-2.9.2/share/hadoop/yarn/*:/home/hbase/software/hadoop-2.9.2/share/hadoop/mapreduce/lib/*:/home/hbase/software/hadoop-2.9.2/share/hadoop/mapreduce/*:/home/hbase/jar/hbase-hbck2-1.0.0.jar:/home/bigdata/software/hadoop/contrib/capacity-scheduler/*.jar:/home/bigdata/software/hadoop/share/hadoop/common/hadoop-lzo-0.4.20.jar
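The classpath output is long, so scanning it by eye is error-prone. A minimal sketch of an automated check; the helper name `classpath_has_jar` is hypothetical, and the sample string is a shortened copy of the output above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a colon-separated classpath string
# contains an entry whose file name equals the given jar name.
classpath_has_jar() {
  # Split on ':', strip the directory part, then match the name exactly.
  printf '%s\n' "$1" | tr ':' '\n' | sed 's|.*/||' | grep -qxF -- "$2"
}

# Shortened sample of the output above. On the cluster one would pass
# "$(hadoop classpath)" instead.
sample_cp='/home/hbase/software/hadoop-2.9.2/etc/hadoop:/home/hbase/jar/hbase-hbck2-1.0.0.jar'

if classpath_has_jar "$sample_cp" 'hbase-hbck2-1.0.0.jar'; then
  echo "HBCK2 jar is on the classpath"
else
  echo "HBCK2 jar is missing from the classpath"
fi
```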

7.4 Rebuild hbase:meta

This is the most critical step.

Stop HBase first, then run the command below, which rebuilds the hbase:meta table.
[hbase@hadoop01 ~]$ sh stop-hbase.sh

[hbase@hadoop01 ~]$ cd software/hbase
[hbase@hadoop01 hbase]$

./bin/hbase org.apache.hbase.hbck1.OfflineMetaRepair -details


......
......
2021-02-08 10:13:44,590 INFO [main] regionserver.HRegion: Closed hbase:meta,,1.1588230740
2021-02-08 10:13:44,627 INFO [main] wal.AbstractFSWAL: Closed WAL: AsyncFSWAL hregion-26930097.meta:.meta(num 1612750423613)
2021-02-08 10:13:44,642 INFO [main] hbck1.HBaseFsck: Deleting hdfs://xsbhdfs/hbase/WALs/hregion-26930097, result=true
2021-02-08 10:13:44,642 INFO [main] hbck1.HBaseFsck:

Success! hbase:meta table rebuilt.
Old hbase:meta moved into hdfs://xsbhdfs/hbase/.hbck/hbase-1612750422175


The line "Success! hbase:meta table rebuilt." indicates that the metadata was rebuilt successfully.


7.5 Start the cluster. It does not come up completely, because hbase:meta gets stuck:
2021-02-08 10:05:32,779 WARN [master/hadoop01:16000:becomeActiveMaster] master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN, ts=1612749927967, server=hadoop01,16020,1612749770805}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.


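7.6 Run the assigns command. The -skip option is required to skip the Master version check. Without it, the command fails with PleaseHoldException, the same exception the assign command in hbase shell raises, because the Master has not finished starting.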


[hbase@hadoop01 hbase]$ ./bin/hbase org.apache.hbase.HBCK2 -skip assigns 1588230740


7.7 hbase:namespace gets stuck:

2021-02-08 10:18:02,231 WARN [master/hadoop01:16000:becomeActiveMaster] master.HMaster: hbase:namespace,,1579505733727.1008ebe9ec36aa590581795aa076361f. is NOT online; state={1008ebe9ec36aa590581795aa076361f state=CLOSED, ts=1612750681026, server=null}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
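The argument that assigns expects is the encoded region name, the hex string that appears inside state={...} in the warning. A minimal sketch for pulling it out of the master log; the helper name `stuck_regions` is hypothetical, and the sample is a shortened copy of the line above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read master log lines on stdin and print the encoded
# name of every region reported as "is NOT online", once each.
stuck_regions() {
  sed -n 's/.*is NOT online; state={\([0-9a-f]*\) state=.*/\1/p' | sort -u
}

# Sample line shortened from the log above. On the cluster one would feed
# in the active master's log file instead.
sample='WARN master.HMaster: hbase:namespace,,1579505733727.1008ebe9ec36aa590581795aa076361f. is NOT online; state={1008ebe9ec36aa590581795aa076361f state=CLOSED, ts=1612750681026, server=null}; ServerCrashProcedures=true.'

printf '%s\n' "$sample" | stuck_regions
```

Because the warning repeats while the master waits, the `sort -u` step collapses the repeats into one name per region.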


7.8 Assign hbase:namespace manually

[hbase@hadoop01 hbase]$ ./bin/hbase org.apache.hbase.HBCK2 -skip assigns 1008ebe9ec36aa590581795aa076361f


7.9 The master completes initialization

2021-02-08 10:18:32,385 INFO [master/hadoop01:16000:becomeActiveMaster] master.HMaster: Master has completed initialization 165.412sec

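7.10 After hbase:meta is rebuilt, the tables are in the DISABLED state and their regions are CLOSED.
The user tables must be re-enabled to bring all of their regions online.
Enable them one at a time, or enable all of them at once with enable_all ".*".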

[hbase@hadoop01 hbase]$ hbase shell
hbase(main):001:0> list
TABLE
IDSS:ORDER
IDSS:LOG
IDSS:USER
SYSTEM:CATALOG
SYSTEM:FUNCTION
SYSTEM:LOG
SYSTEM:MUTEX
SYSTEM:SEQUENCE
SYSTEM:STATS
IDSS:ORDERITEMS
IDSS:WAREHOUSE
IDSS:GOOD
IDSS:TIMES
IDSS:CITY
14 row(s)
Took 0.3822 seconds
=> ["IDSS:ORDER", "IDSS:LOG", "IDSS:USER", "SYSTEM:CATALOG", "SYSTEM:FUNCTION", "SYSTEM:LOG", "SYSTEM:MUTEX", "SYSTEM:SEQUENCE", "SYSTEM:STATS", "IDSS:ORDERITEMS", "IDSS:WAREHOUSE", "IDSS:GOOD", "IDSS:TIMES", "IDSS:CITY"]

hbase(main):003:0> is_disabled 'IDSS:USER'   # confirm that the table really is disabled

true

Took 0.0957 seconds

=> 1



hbase(main):004:0> enable_all ".*"
IDSS:ORDER
IDSS:LOG
IDSS:USER
SYSTEM:CATALOG
SYSTEM:FUNCTION
SYSTEM:LOG
SYSTEM:MUTEX
SYSTEM:SEQUENCE
SYSTEM:STATS
IDSS:ORDERITEMS
IDSS:WAREHOUSE
IDSS:GOOD
IDSS:TIMES
IDSS:CITY

Enable the above 14 tables (y/n)?   (enter y; this takes a little while, and the active HBase master log can be followed in real time)
y
14 tables successfully enabled
Took 32.6373 seconds
hbase(main):005:0>
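For the one-table-at-a-time route, the statements can be generated rather than typed. A minimal sketch; the helper name `enable_statements` is hypothetical, and feeding its output to a non-interactive hbase shell session is an assumption about the installed version, not something shown in the session above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn table names (one per line on stdin) into
# hbase shell `enable` statements, skipping blank lines.
enable_statements() {
  local table
  while IFS= read -r table; do
    if [ -n "$table" ]; then
      printf "enable '%s'\n" "$table"
    fi
  done
}

# Two of the table names from the listing above.
printf '%s\n' 'IDSS:USER' 'IDSS:GOOD' | enable_statements
# The generated statements could then be piped into a non-interactive
# session, e.g. `... | hbase shell -n` (assumption: the -n flag is
# available in the installed version).
```

Enabling tables individually makes it easier to see which table is slow or failing, at the cost of more round trips than enable_all.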

8. Restart HBase, check again, and test by creating a table and inserting data

8.1 Restart and verify again
sh stop-hbase.sh
sh start-hbase.sh

8.2 Check the master log: no errors, startup succeeded
2021-02-08 10:21:21,997 INFO [master/hadoop01:16000:becomeActiveMaster] master.HMaster: Master has completed initialization 8.151sec

8.3 Open hbase shell, create a table, insert a row, and scan it
create 'user','info'
put 'user','r1','info:name','JJ'
scan 'user'



