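Introduction to HBase
HBase is a distributed, column-oriented, open-source non-relational (NoSQL) database that is well suited to storing unstructured data. It offers high availability, high performance, column-oriented storage, and horizontal scalability. These properties make it possible to build a large-scale storage cluster on inexpensive commodity servers.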
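Typical scenarios where HBase is used in production:
The data volume is large, and access must be random with fast response times.
The system must be able to scale out dynamically.
Relational database features (such as transactions, joins, and cross-table queries) are not required.
Writes require high throughput.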
HBase architecture diagram

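Responsibilities of the HMaster:
Assigns HRegions to HRegionServers
Balances load across the HRegionServers
Detects failed HRegionServers and reassigns their HRegions
Cleans up garbage files on HDFS
Handles schema update requests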
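The assignment and failover duties listed above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class name ToyMaster and the "fewest regions wins" rule are hypothetical, and HBase's real balancer weighs many more factors.

```python
from typing import Dict, List


class ToyMaster:
    """Toy stand-in for the HMaster's assignment bookkeeping (not HBase code)."""

    def __init__(self, servers: List[str]) -> None:
        # region server name -> names of the regions it currently hosts
        self.assignments: Dict[str, List[str]] = {s: [] for s in servers}

    def assign(self, region: str) -> str:
        # Pick the server hosting the fewest regions; break ties by name.
        target = min(self.assignments, key=lambda s: (len(self.assignments[s]), s))
        self.assignments[target].append(region)
        return target

    def handle_failure(self, server: str) -> None:
        # Remove the failed server, then reassign the regions it was hosting.
        orphaned = self.assignments.pop(server)
        for region in orphaned:
            self.assign(region)


master = ToyMaster(["dn1", "dn2", "dn3"])
for i in range(6):
    master.assign(f"region-{i}")
master.handle_failure("dn3")
print(master.assignments)
```

After the simulated failure of dn3, its two regions end up on the surviving servers, which is the behavior the list describes.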
Responsibilities of the HRegionServer:
Maintains the HRegions assigned to it by the HMaster and handles I/O requests for them
Splits HRegions that grow too large at runtime
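HRegion:
A table is split horizontally, by row key range, into multiple HRegions. The HRegion is the smallest unit of distributed storage and load balancing in HBase: different HRegions can live on different HRegionServers, but a single HRegion is never spread across several HRegionServers. HRegions are split by size. A new table normally starts with a single HRegion; as data is inserted the HRegion grows, and once one of its column families exceeds a threshold it is split into two new HRegions. The threshold is set by hbase.hregion.max.filesize; it defaulted to 256 MB in early HBase releases and is 10 GB in recent ones.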
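The splitting behavior can be sketched with a small in-memory model. It is an illustration only: the names ToyRegion and ToyTable are hypothetical, the threshold counts rows instead of bytes, and the split point is simply the middle row key.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Toy threshold in rows; the real limit is a byte size (hbase.hregion.max.filesize).
SPLIT_THRESHOLD = 4


@dataclass
class ToyRegion:
    start_key: str  # inclusive; "" means "from the first key"
    end_key: str    # exclusive; "" means "to the last key"
    rows: Dict[str, str] = field(default_factory=dict)

    def contains(self, key: str) -> bool:
        return key >= self.start_key and (self.end_key == "" or key < self.end_key)


class ToyTable:
    def __init__(self) -> None:
        # A new table starts with one region covering the whole key space.
        self.regions: List[ToyRegion] = [ToyRegion("", "")]

    def put(self, key: str, value: str) -> None:
        region = next(r for r in self.regions if r.contains(key))
        region.rows[key] = value
        if len(region.rows) > SPLIT_THRESHOLD:
            self._split(region)

    def _split(self, region: ToyRegion) -> None:
        keys = sorted(region.rows)
        mid = keys[len(keys) // 2]  # the middle row key becomes the boundary
        left = ToyRegion(region.start_key, mid,
                         {k: region.rows[k] for k in keys if k < mid})
        right = ToyRegion(mid, region.end_key,
                          {k: region.rows[k] for k in keys if k >= mid})
        idx = self.regions.index(region)
        self.regions[idx:idx + 1] = [left, right]  # replace parent with daughters


table = ToyTable()
for i in range(10):
    table.put(f"row{i:02d}", "value")
for r in table.regions:
    print(repr(r.start_key), repr(r.end_key), sorted(r.rows))
```

Each region keeps a contiguous key range, so a row key always maps to exactly one region, which is what allows regions to be placed on different servers independently.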
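Store
Every HRegion consists of one or more Stores. HBase keeps data that is accessed together in the same Store by creating one Store per column family, so a region has as many Stores as the table has column families. A Store consists of one MemStore and zero or more StoreFiles. HBase uses the size of the Stores to decide whether an HRegion needs to be split.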
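MemStore
The MemStore lives in memory and holds modified data as KeyValues. When the MemStore reaches a threshold (hbase.hregion.memstore.flush.size: 64 MB in early releases, 128 MB by default in recent ones), it is flushed to a file: a snapshot of its contents is taken and written out. The flush is carried out by a dedicated background thread.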
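The relationship between a Store, its MemStore, and its StoreFiles can be sketched as follows. This is a simplified model, not HBase code: ToyStore is a hypothetical name, the threshold is a tiny byte count, and "files" are just sorted lists kept in memory.

```python
from typing import Dict, List, Optional, Tuple

# Toy threshold in bytes; stands in for hbase.hregion.memstore.flush.size.
FLUSH_THRESHOLD_BYTES = 64


class ToyStore:
    """One Store per column family: one MemStore plus zero or more StoreFiles."""

    def __init__(self) -> None:
        self.memstore: Dict[str, str] = {}
        self.memstore_bytes = 0
        # Each "store file" is an immutable, key-sorted list of KeyValues.
        self.store_files: List[List[Tuple[str, str]]] = []

    def put(self, key: str, value: str) -> None:
        old = self.memstore.get(key)
        if old is not None:
            self.memstore_bytes -= len(key) + len(old)
        self.memstore[key] = value
        self.memstore_bytes += len(key) + len(value)
        if self.memstore_bytes >= FLUSH_THRESHOLD_BYTES:
            self.flush()

    def flush(self) -> None:
        if not self.memstore:
            return
        # Snapshot the MemStore, write it out sorted, then start a fresh one.
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore = {}
        self.memstore_bytes = 0

    def get(self, key: str) -> Optional[str]:
        if key in self.memstore:  # newest data is in memory
            return self.memstore[key]
        for store_file in reversed(self.store_files):  # then newest file first
            for k, v in store_file:
                if k == key:
                    return v
        return None


store = ToyStore()
for i in range(10):
    store.put(f"row{i}", "x" * 12)
print(len(store.store_files), "store files;", len(store.memstore), "keys still in memory")
```

Reads check the MemStore first and then the files from newest to oldest, so the most recent write for a key always wins.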
StoreFile
Once the in-memory MemStore data has been written to a file, that file is a StoreFile. Underneath, a StoreFile is stored in the HFile format.
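HFile
HFile is the storage format for HBase KeyValue data; it is a binary file format on Hadoop. An HFile has variable length; only two of its sections have a fixed length: the Trailer and FileInfo. The Trailer holds pointers to the start of the other sections, and FileInfo records meta information about the file. The Data Block is the basic unit of HBase I/O; to improve efficiency, the HRegionServer has an LRU-based Block Cache. The size of each Data Block can be set by a parameter when a table is created (the default block size is 64 KB): larger blocks favor sequential scans, while smaller blocks favor random reads. Apart from the Magic at its start, each Data Block is a concatenation of KeyValue pairs. The Magic is a short marker sequence whose purpose is to detect data corruption.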
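The LRU idea behind the Block Cache can be sketched in a few lines. This is a generic least-recently-used cache under stated assumptions, not HBase's implementation: ToyBlockCache and the load_block callback are hypothetical names, and capacity is counted in blocks instead of bytes.

```python
from collections import OrderedDict
from typing import Callable


class ToyBlockCache:
    """Minimal LRU cache keyed by block id."""

    def __init__(self, capacity: int, load_block: Callable[[int], bytes]) -> None:
        self.capacity = capacity
        self.load_block = load_block  # called on a miss, e.g. to read from disk
        self.blocks: "OrderedDict[int, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, block_id: int) -> bytes:
        if block_id in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return self.blocks[block_id]
        self.misses += 1
        data = self.load_block(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data


cache = ToyBlockCache(capacity=2, load_block=lambda i: f"block-{i}".encode())
for block_id in (1, 2, 1, 3, 2):
    cache.get(block_id)
print("hits:", cache.hits, "misses:", cache.misses, "cached:", list(cache.blocks))
```

With a small cache, a sequential scan over many blocks evicts everything it touches, which is one reason block size and access pattern interact the way the paragraph above describes.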
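HLog
HLog (the WAL): WAL stands for write-ahead log and is used for disaster recovery. The HLog records every change to the data, so if a region server goes down, its data can be recovered by replaying the log.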
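The write-ahead principle can be shown with a minimal sketch: every edit is appended to a log file before it is applied in memory, and recovery replays the log. The class ToyRegionServer and the JSON-lines log format are assumptions made for illustration; the real HLog format is different.

```python
import json
import os
import tempfile
from typing import Dict


class ToyRegionServer:
    def __init__(self, wal_path: str) -> None:
        self.wal_path = wal_path
        self.memstore: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        # 1. Append the edit to the log and force it to disk.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps({"key": key, "value": value}) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        # 2. Only then apply it to the in-memory store.
        self.memstore[key] = value

    @classmethod
    def recover(cls, wal_path: str) -> "ToyRegionServer":
        # Rebuild the in-memory state by replaying the log in order.
        server = cls(wal_path)
        if os.path.exists(wal_path):
            with open(wal_path, encoding="utf-8") as wal:
                for line in wal:
                    edit = json.loads(line)
                    server.memstore[edit["key"]] = edit["value"]
        return server


wal_file = os.path.join(tempfile.mkdtemp(), "toy-wal.log")
server = ToyRegionServer(wal_file)
server.put("rowkey1", "value1")
server.put("rowkey2", "value2")
del server  # simulate a crash: the in-memory state is lost
recovered = ToyRegionServer.recover(wal_file)
print(recovered.memstore)
```

Because the log is written first, any edit acknowledged to a client can be reconstructed even if the process dies before the MemStore is flushed.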
LogFlusher
Periodically writes the buffered log entries to the log file.
LogRoller
Manages and maintains the log files.
Installation
Download from the official site
https://downloads.apache.org/hbase/
Extract the package and move the directory
[hadoop@nna opt]$ tar -zxvf hbase-2.2.6-bin.tar.gz
[hadoop@nna opt]$ sudo mv hbase-2.2.6 /usr/local/
Set the Java environment variable in the HBase configuration file
# vi /usr/local/hbase-2.2.6/conf/hbase-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_102
Add the region server nodes to the regionservers configuration file
# vi /usr/local/hbase-2.2.6/conf/regionservers
dn1
dn2
dn3
Add the environment variables to /etc/profile
export HBASE_HOME=/usr/local/hbase-2.2.6
export PATH=$PATH:$HBASE_HOME/bin
Copy the configuration file to the nns node
sudo scp /etc/profile nns:/etc/profile
Edit the hbase-site.xml configuration file
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- ZooKeeper quorum addresses used by HBase -->
<property>
<name>hbase.zookeeper.quorum</name>
<value>dn1:2181,dn2:2181,dn3:2181</value>
<description>
Comma-separated list of servers in the ZooKeeper ensemble.
</description>
</property>
<!-- Port on which ZooKeeper accepts client connections -->
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<!-- Local directory where ZooKeeper stores its snapshot data -->
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/data/hbase/zk</value>
<description>
Property from ZooKeeper config zoo.cfg.The directory
where the snapshot is stored.
</description>
</property>
<!-- HDFS directory shared by the RegionServers, where HBase persists its data -->
<property>
<name>hbase.rootdir</name>
<value>hdfs://cluster1/hbase</value>
<description>
The directory shared by RegionServers.
</description>
</property>
<!-- Enable distributed mode; false means the cluster runs in standalone mode -->
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>
Possible values are false:standalone and pseudo-distributed setups with managed
Zookeeper true:fully-distributed with unmanaged Zookeeper Quorum(see hbase-env.sh)
</description>
</property>
</configuration>
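Before copying the file to the other nodes, it can help to check it mechanically. The sketch below parses a Hadoop-style configuration file with Python's standard library and reports obvious problems. The function names and the three checks are hypothetical choices made for this walkthrough, and SAMPLE_XML is a shortened copy of the settings above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Shortened copy of the hbase-site.xml shown above (descriptions omitted).
SAMPLE_XML = """<configuration>
  <property><name>hbase.zookeeper.quorum</name><value>dn1:2181,dn2:2181,dn3:2181</value></property>
  <property><name>hbase.zookeeper.property.clientPort</name><value>2181</value></property>
  <property><name>hbase.rootdir</name><value>hdfs://cluster1/hbase</value></property>
  <property><name>hbase.cluster.distributed</name><value>true</value></property>
</configuration>"""


def load_properties(xml_text: str) -> Dict[str, str]:
    """Return the name -> value pairs of a Hadoop-style configuration file."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_distributed_config(props: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []
    if props.get("hbase.cluster.distributed") != "true":
        problems.append("hbase.cluster.distributed is not 'true'")
    if not props.get("hbase.rootdir", "").startswith("hdfs://"):
        problems.append("hbase.rootdir does not point at HDFS")
    if not props.get("hbase.zookeeper.quorum"):
        problems.append("hbase.zookeeper.quorum is empty")
    return problems


properties = load_properties(SAMPLE_XML)
print(check_distributed_config(properties) or "no problems found")
```

To check the real file, read /usr/local/hbase-2.2.6/conf/hbase-site.xml and pass its contents to load_properties in place of SAMPLE_XML.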
On the nna node, copy the configured hbase directory to the other nodes (run from /usr/local)
sudo scp -r hbase-2.2.6 nns:/usr/local
sudo scp -r hbase-2.2.6 dn1:/usr/local
sudo scp -r hbase-2.2.6 dn2:/usr/local
sudo scp -r hbase-2.2.6 dn3:/usr/local
Then change the owner and group of the HBase directory
sudo chown -R hadoop:hadoop /usr/local/hbase-2.2.6/
Start the HBase cluster services on the nna node
[hadoop@nna ~]$ start-hbase.sh
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase-2.2.6/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase-2.2.6/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
dn1: running zookeeper, logging to /usr/local/hbase-2.2.6/bin/../logs/hbase-hadoop-zookeeper-dn1.out
dn3: running zookeeper, logging to /usr/local/hbase-2.2.6/bin/../logs/hbase-hadoop-zookeeper-dn3.out
dn2: running zookeeper, logging to /usr/local/hbase-2.2.6/bin/../logs/hbase-hadoop-zookeeper-dn2.out
running master, logging to /usr/local/hbase-2.2.6/logs/hbase-hadoop-master-nna.out
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase-2.2.6/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
dn3: running regionserver, logging to /usr/local/hbase-2.2.6/bin/../logs/hbase-hadoop-regionserver-dn3.out
dn2: running regionserver, logging to /usr/local/hbase-2.2.6/bin/../logs/hbase-hadoop-regionserver-dn2.out
dn1: running regionserver, logging to /usr/local/hbase-2.2.6/bin/../logs/hbase-hadoop-regionserver-dn1.out
[hadoop@nna ~]$
Start a second HMaster process on the nns node to form a high-availability setup
[hadoop@nns ~]$ hbase-daemon.sh start master
running master, logging to /usr/local/hbase-2.2.6/logs/hbase-hadoop-master-nns.out
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase-2.2.6/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[hadoop@nns ~]$ jps
2866 Jps
1430 NameNode
2793 HMaster
1500 DFSZKFailoverController
After startup, check the processes on each node with jps
nna
[hadoop@nna ~]$ jps
1476 NameNode
1558 DFSZKFailoverController
5320 Jps
5036 HMaster
nns
[hadoop@nns ~]$ jps
1430 NameNode
2793 HMaster
2970 Jps
1500 DFSZKFailoverController
dn1 to dn3 (output from dn2 shown)
[hadoop@dn2 logs]$ jps
2945 Jps
1334 DataNode
2760 HRegionServer
1241 QuorumPeerMain
1402 JournalNode
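Comparing jps output by eye across five nodes is error-prone, so the check can be scripted. The sketch below parses jps output and reports which expected daemons are missing. The EXPECTED table is an assumption derived from the listings above, and the role names "master" and "worker" are hypothetical labels.

```python
from typing import Dict, Set

# Daemons seen in the listings above: nna/nns act as masters, dn1 to dn3 as workers.
EXPECTED: Dict[str, Set[str]] = {
    "master": {"NameNode", "DFSZKFailoverController", "HMaster"},
    "worker": {"DataNode", "JournalNode", "QuorumPeerMain", "HRegionServer"},
}


def running_daemons(jps_output: str) -> Set[str]:
    """Extract the process names from lines of the form '<pid> <name>'."""
    names: Set[str] = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            names.add(parts[1])
    return names


def missing_daemons(role: str, jps_output: str) -> Set[str]:
    """Return the expected daemons for the role that jps did not list."""
    return EXPECTED[role] - running_daemons(jps_output)


DN2_OUTPUT = """2945 Jps
1334 DataNode
2760 HRegionServer
1241 QuorumPeerMain
1402 JournalNode"""

print(missing_daemons("worker", DN2_OUTPUT))  # prints set() when nothing is missing
```

The output of jps can be captured per node (for example over ssh) and passed to missing_daemons with the matching role.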
Error at startup
2021-05-18 10:54:30,548 ERROR [main] regionserver.HRegionServer: Failed construction RegionServer
java.lang.IllegalArgumentException: java.net.UnknownHostException: cluster1
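Cause: HBase cannot resolve the HDFS nameservice ID (cluster1) used in hbase.rootdir.
Solution: copy Hadoop's two configuration files, core-site.xml and hdfs-site.xml, into HBase's conf directory; HBase then starts successfully.
On every node, copy the Hadoop configuration files hdfs-site.xml and core-site.xml into HBase's conf directory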
[hadoop@dn1 hadoop]$ cp /usr/local/hadoop-2.7.4/etc/hadoop/hdfs-site.xml /usr/local/hbase-2.2.6/conf
[hadoop@dn1 hadoop]$ cp /usr/local/hadoop-2.7.4/etc/hadoop/core-site.xml /usr/local/hbase-2.2.6/conf
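The reason the copy fixes the error is that cluster1 is a logical nameservice, not a real hostname, so it only resolves when the HDFS client settings are on HBase's classpath. A minimal sketch of that consistency check follows. It assumes that hdfs-site.xml declares the nameservice under dfs.nameservices (the standard HDFS HA property); the function name and the dictionaries standing in for parsed configuration files are hypothetical.

```python
from typing import Dict
from urllib.parse import urlparse


def rootdir_nameservice_known(rootdir: str, hdfs_props: Dict[str, str]) -> bool:
    """Check that the authority in hbase.rootdir is a declared HDFS nameservice."""
    nameservice = urlparse(rootdir).netloc  # "cluster1" for hdfs://cluster1/hbase
    declared = [s.strip() for s in hdfs_props.get("dfs.nameservices", "").split(",")]
    return nameservice != "" and nameservice in declared


# With hdfs-site.xml visible to HBase (the value here is an assumed example):
print(rootdir_nameservice_known("hdfs://cluster1/hbase", {"dfs.nameservices": "cluster1"}))
# Without it, HBase sees no nameservices and treats cluster1 as a hostname:
print(rootdir_nameservice_known("hdfs://cluster1/hbase", {}))
```

The second case corresponds to the UnknownHostException above: with no nameservice definition available, the client falls back to a DNS lookup for cluster1 and fails.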
HBase web UI address (the default port is 16010)

Using HBase shell commands
Enter the console
[hadoop@nna ~]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hbase-2.2.6/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.2.6, r88c9a386176e2c2b5fd9915d0e9d3ce17d0e456e, Tue Sep 15 17:36:14 CST 2020
Took 0.0023 seconds
hbase(main):001:0>
Create a table
hbase(main):014:0> create 'test','family'
Created table test
Took 2.7345 seconds
=> Hbase::Table - test
Insert data
hbase(main):019:0* put 'test','rowkey1','family','value1'
Took 0.2555 seconds
Scan the whole table (the table name must be quoted)
hbase(main):024:0> scan test
ArgumentError: wrong number of arguments (0 for 2)
hbase(main):025:0> scan 'test'
ROW COLUMN+CELL
rowkey1 column=family:, timestamp=1621307645946, value=value1
rowkey2 column=family:, timestamp=1621307661835, value=value1
2 row(s)
Took 0.1080 seconds
Get a row by its row key
hbase(main):028:0* get 'test','rowkey1'
COLUMN CELL
family: timestamp=1621307645946, value=value1
1 row(s)
Took 0.0506 seconds
Drop a table (it must be disabled first)
hbase(main):029:0> drop 'test'
ERROR: Table test is enabled. Disable it first.
For usage try 'help "drop"'
Took 0.0248 seconds
hbase(main):030:0> disable 'test'
Took 1.2947 seconds
hbase(main):031:0> drop 'test'
Took 0.4682 seconds
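The shell session above can be read more easily with the data model in mind: a table maps a row key to columns of the form family:qualifier, and each cell keeps timestamped values. The toy class below mirrors create, put, get, and scan in memory. It is a sketch for illustration only; ToyHTable is a hypothetical name and this is not a client for a real cluster.

```python
import time
from typing import Dict, Iterator, List, Optional, Tuple


class ToyHTable:
    def __init__(self, name: str, *families: str) -> None:
        self.name = name
        self.families = set(families)
        # row key -> column ("family:qualifier") -> list of (timestamp, value)
        self.rows: Dict[str, Dict[str, List[Tuple[int, str]]]] = {}

    def put(self, row: str, column: str, value: str, ts: Optional[int] = None) -> None:
        if ":" not in column:
            column += ":"  # an empty qualifier, shown as "family:" by the shell
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if ts is None:
            ts = int(time.time() * 1000)  # milliseconds, as in the shell output
        self.rows.setdefault(row, {}).setdefault(column, []).append((ts, value))

    def get(self, row: str) -> Dict[str, str]:
        # Return the newest value of every column in the row.
        return {col: max(cells, key=lambda c: c[0])[1]
                for col, cells in self.rows.get(row, {}).items()}

    def scan(self) -> Iterator[Tuple[str, Dict[str, str]]]:
        for row in sorted(self.rows):  # rows come back in row key order
            yield row, self.get(row)


test = ToyHTable("test", "family")
test.put("rowkey1", "family", "value1", ts=1621307645946)
test.put("rowkey2", "family", "value1", ts=1621307661835)
print(test.get("rowkey1"))
for row, cells in test.scan():
    print(row, cells)
```

Writing to a column family that was not declared at creation time fails in this model, which matches HBase's rule that families are part of the table schema while qualifiers are free-form.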




