HBase Basics

HBase cluster setup
HBase table design
HBase underlying storage model

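HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system.
With HBase, a large-scale structured storage cluster can be built on inexpensive commodity PC servers.
HBase uses Hadoop HDFS as its file storage system, uses Hadoop MapReduce
to process the massive amounts of data stored in HBase, and uses ZooKeeper as its coordination service.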

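Primary key: Row Key
The row key is the primary key used to retrieve records. There are only three ways to access rows in an HBase table:
1. By a single row key
2. By a range of row keys
3. By a full table scan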
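
The three access paths above can be sketched with a toy in-memory store that keeps its row keys sorted. This is only an illustration of the idea, not HBase code: the class SortedRowStore and its methods are hypothetical names, and the half-open range (start inclusive, stop exclusive) is an assumption chosen to mirror how a scan range is usually expressed.

```python
import bisect


class SortedRowStore:
    """Toy stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys in lexicographic order
        self._rows = {}   # row key -> value

    def put(self, row_key, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = value

    def get(self, row_key):
        # 1. access by a single row key
        return self._rows.get(row_key)

    def scan(self, start_row=None, stop_row=None):
        # 2. access by a row key range; 3. full scan when no bounds are given
        lo = 0 if start_row is None else bisect.bisect_left(self._keys, start_row)
        hi = len(self._keys) if stop_row is None else bisect.bisect_left(self._keys, stop_row)
        return [(key, self._rows[key]) for key in self._keys[lo:hi]]


store = SortedRowStore()
for key in ("rk0003", "rk0001", "rk0002"):
    store.put(key, "value-of-" + key)

single = store.get("rk0002")
ranged = store.scan("rk0001", "rk0003")
full = store.scan()
print(single, ranged, full)
```

Because the keys are sorted, a range lookup only has to find two positions and read the slice between them, which is why row key design matters so much for read performance.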

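Column family: Column Family
Column families are declared when the table is created. Each column family can contain multiple columns, and the data in every column is stored as raw bytes with no data type.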

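Timestamp: timestamp
In HBase, the storage unit identified by a row and a column is called a cell. Each cell holds multiple versions of the same piece of data,
and the versions are indexed by timestamp.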
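
As a rough mental model of the cell described above, the sketch below maps (row, column) to a set of timestamped values and returns the newest ones first. VersionedTable is a hypothetical name, and the small integer timestamps are illustrative only; real HBase timestamps are milliseconds since the epoch by default.

```python
from collections import defaultdict


class VersionedTable:
    """Toy model: each (row, column) cell holds several timestamped values."""

    def __init__(self):
        self._cells = defaultdict(dict)   # (row, column) -> {timestamp: value}

    def put(self, row, column, value, timestamp):
        self._cells[(row, column)][timestamp] = value

    def get(self, row, column, versions=1):
        cell = self._cells.get((row, column), {})
        newest_first = sorted(cell.items(), reverse=True)
        return newest_first[:versions]


table = VersionedTable()
table.put("rk0001", "info:size", "34", timestamp=1)
table.put("rk0001", "info:size", "33", timestamp=2)
table.put("rk0001", "info:size", "32", timestamp=3)

latest = table.get("rk0001", "info:size")
two_newest = table.get("rk0001", "info:size", versions=2)
print(latest, two_newest)
```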

Configuring HBase
1. hbase-env.sh
export JAVA_HOME=/data/jdk/

2. vim hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:///data/hbase/data/</value>
</property>
</configuration>

3. Start HBase
/data/hbase/bin/start-hbase.sh

4. Use the hbase command
[root@hbase1 bin]# ./hbase
Usage: hbase [<options>] <command> [<args>]

Commands:
Some commands take arguments. Pass no args or -h for usage.
shell Run the HBase shell
hbck Run the hbase 'fsck' tool
hlog Write-ahead-log analyzer
snapshot Create a new snapshot of a table
snapshotinfo Tool for dumping snapshot information
hfile Store file analyzer
zkcli Run the ZooKeeper shell
upgrade Upgrade hbase
master Run an HBase HMaster node
regionserver Run an HBase HRegionServer node
zookeeper Run a Zookeeper server
rest Run an HBase REST server
thrift Run the HBase Thrift server
thrift2 Run the HBase Thrift2 server
clean Run the HBase clean up script
classpath Dump hbase CLASSPATH
mapredcp Dump CLASSPATH entries required by mapreduce
pe Run PerformanceEvaluation
ltt Run LoadTestTool
canary Run the Canary tool
version Print the version
CLASSNAME Run the class named CLASSNAME

5. Run the HBase shell
[root@hbase1 bin]# ./hbase shell

6. Create tables and column families
hbase(main):001:0> help 'create'    # show the help for the create command
hbase(main):001:0> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}    # create table t1 with column families f1, f2, f3
hbase(main):003:0> create 'people', {NAME => 'info',VERSIONS => 3},{NAME => 'data',VERSIONS => 1}
hbase(main):004:0> list    # list all tables in HBase
hbase(main):005:0> scan 'people'    # show the data in the people table
hbase(main):006:0> describe 'people'    # show the schema of the people table
hbase(main):007:0> put 'people', 'rk0001' ,'info:name','feng'    # insert data
# put arguments: table, row key, column family:column, value

hbase(main):020:0> scan 'people'    # query by scanning the whole table
ROW COLUMN+CELL
rk0001 column=info:name, timestamp=1479361079185, value=feng
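
The timestamp in the scan output is, by default, the write time in milliseconds since the Unix epoch. A small sketch of converting it to a readable UTC time follows; the helper name hbase_ts_to_datetime is hypothetical.

```python
from datetime import datetime, timezone


def hbase_ts_to_datetime(ts_millis):
    """Convert a millisecond epoch timestamp to an aware UTC datetime."""
    seconds, millis = divmod(ts_millis, 1000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base.replace(microsecond=millis * 1000)


written_at = hbase_ts_to_datetime(1479361079185)
print(written_at.isoformat())
```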

Add another column to the same row
hbase(main):021:0> put 'people' ,'rk0001','info:gender','man'

Query again; the result is still a single row
hbase(main):022:0> scan 'people'
ROW COLUMN+CELL
rk0001 column=info:gender, timestamp=1479361501486, value=man
rk0001 column=info:name, timestamp=1479361079185, value=feng
1 row(s) in 0.0200 seconds

hbase(main):026:0> put 'people','rk0001','info:size','34'
0 row(s) in 0.0150 seconds

hbase(main):028:0> scan 'people'
ROW COLUMN+CELL
rk0001 column=info:gender, timestamp=1479361501486, value=man
rk0001 column=info:name, timestamp=1479361079185, value=feng
rk0001 column=info:size, timestamp=1479361885383, value=34
1 row(s) in 0.0210 seconds

Create a phone column in the data column family
hbase(main):030:0> put 'people','rk0001','data:phone','123456789'
0 row(s) in 0.0090 seconds

hbase(main):031:0> scan 'people'
ROW COLUMN+CELL
rk0001 column=data:phone, timestamp=1479362039723, value=123456789
rk0001 column=info:gender, timestamp=1479361501486, value=man
rk0001 column=info:name, timestamp=1479361079185, value=feng
rk0001 column=info:size, timestamp=1479361885383, value=34
1 row(s) in 0.0190 seconds

Insert a row with a new row key

hbase(main):032:0> put 'people','rk0002','info:name','laomao'
0 row(s) in 0.0110 seconds

hbase(main):036:0> scan 'people'
ROW COLUMN+CELL
rk0001 column=data:phone, timestamp=1479362039723, value=123456789
rk0001 column=info:gender, timestamp=1479361501486, value=man
rk0001 column=info:name, timestamp=1479361079185, value=feng
rk0001 column=info:size, timestamp=1479361885383, value=34
rk0002 column=info:name, timestamp=1479362256298, value=laomao
2 row(s) in 0.0320 seconds

hbase(main):037:0> put 'people','rk0002','info:book','中国'
0 row(s) in 0.0110 seconds

hbase(main):038:0> scan 'people'
ROW COLUMN+CELL
rk0001 column=data:phone, timestamp=1479362039723, value=123456789
rk0001 column=info:gender, timestamp=1479361501486, value=man
rk0001 column=info:name, timestamp=1479361079185, value=feng
rk0001 column=info:size, timestamp=1479361885383, value=34
rk0002 column=info:book, timestamp=1479363113630, value=\xE4\xB8\xAD\xE5\x9B\xBD
rk0002 column=info:name, timestamp=1479362256298, value=laomao
2 row(s) in 0.0150 seconds
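
The info:book value is shown as hexadecimal bytes (E4 B8 AD E5 9B BD) because HBase stores raw bytes and the shell escapes anything outside printable ASCII. The sketch below shows that these bytes are the UTF-8 encoding of the string 中国; the variable names are illustrative.

```python
value = "中国"

# Encode the string the way a client would before writing it to a cell.
encoded = value.encode("utf-8")

# Render the bytes in the escaped style the shell uses for non-ASCII data.
shell_style = "".join("\\x{:02X}".format(byte) for byte in encoded)

# Going the other way: decode the hex bytes from the scan output.
decoded = bytes.fromhex("E4B8ADE59BBD").decode("utf-8")

print(shell_style)
print(decoded)
```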

############################################################

Insert more values into the info:size column
1. put 'people' ,'rk0001','info:size','33'
2. put 'people' ,'rk0001','info:size','32'

hbase(main):038:0> scan 'people'
ROW COLUMN+CELL
rk0001 column=data:phone, timestamp=1479362039723, value=123456789
rk0001 column=info:gender, timestamp=1479361501486, value=man
rk0001 column=info:name, timestamp=1479361079185, value=feng
rk0001 column=info:size, timestamp=1479361885383, value=32
rk0002 column=info:book, timestamp=1479363113630, value=\xE4\xB8\xAD\xE5\x9B\xBD
rk0002 column=info:name, timestamp=1479362256298, value=laomao
2 row(s) in 0.0150 seconds

Because the info column family was created with VERSIONS => 3, up to three versions are kept
scan 'people',{COLUMNS => 'info:size',VERSIONS => 3}

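Insert one more value. A normal read still returns only 3 versions, but the oldest version has not been physically deleted yet, so a RAW scan still shows it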
put 'people' ,'rk0001','info:size','31'
scan 'people', {RAW => true, VERSIONS => 10}
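
The difference between a normal read and a RAW scan can be sketched as follows. This is a simplified model, not HBase internals: ColumnVersions is a hypothetical class, the integer timestamps are illustrative, and the single cleanup() step stands in for whatever later housekeeping physically drops the excess versions.

```python
class ColumnVersions:
    """Toy model of one column that keeps at most max_versions readable versions."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self._stored = []   # (timestamp, value) pairs, newest first

    def put(self, timestamp, value):
        self._stored.append((timestamp, value))
        self._stored.sort(reverse=True)

    def read(self):
        # A normal read never returns more than max_versions entries.
        return self._stored[:self.max_versions]

    def raw(self):
        # A raw view shows everything that is still physically stored.
        return list(self._stored)

    def cleanup(self):
        # Stand-in for the later step that discards the excess versions.
        self._stored = self._stored[:self.max_versions]


size = ColumnVersions(max_versions=3)
for ts, value in enumerate(["34", "33", "32", "31"], start=1):
    size.put(ts, value)

visible = size.read()
raw_before = size.raw()
size.cleanup()
raw_after = size.raw()
print(visible, len(raw_before), len(raw_after))
```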

HBase cluster configuration
192.168.20.190 hmaster1 HMaster
192.168.20.191 hmaster2 HMaster
192.168.20.194 hregionserver1 hregionserver
192.168.20.195 hregionserver2 hregionserver

[root@hbase1 conf]# vim hbase-env.sh
export JAVA_HOME=/data/jdk/
export HBASE_MANAGES_ZK=false

[root@hbase1 conf]# vim hbase-site.xml
<configuration>

<property>
<name>hbase.rootdir</name>
<value>hdfs://ns1/hbase/</value>
</property>


<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>


<property>
<name>hbase.zookeeper.quorum</name>
<value>zookeeper1:2181,zookeeper2:2181,zookeeper3:2181</value>
</property>
</configuration>
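
To sanity-check a file like the one above before distributing it, the properties can be read back with a few lines of Python. This is a sketch only: load_properties is a hypothetical helper, and the XML is inlined here as a string instead of being read from the real conf directory.

```python
import xml.etree.ElementTree as ET

SITE_XML = """<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ns1/hbase/</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zookeeper1:2181,zookeeper2:2181,zookeeper3:2181</value>
  </property>
</configuration>"""


def load_properties(xml_text):
    """Return the <property> entries of a Hadoop-style config as a dict."""
    root = ET.fromstring(xml_text)
    return {prop.findtext("name"): prop.findtext("value")
            for prop in root.findall("property")}


props = load_properties(SITE_XML)
quorum_hosts = [entry.split(":")[0]
                for entry in props["hbase.zookeeper.quorum"].split(",")]
print(props["hbase.rootdir"], quorum_hosts)
```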

Configure the region servers (the workers managed by the HMaster)
[root@hbase1 conf]# vim regionservers

hregionserver1
hregionserver2

Copy Hadoop's core-site.xml and hdfs-site.xml into HBase's conf directory
[root@hmaster1 conf]# cp /data/hadoop/etc/hadoop/core-site.xml /data/hbase/conf/
[root@hmaster1 conf]# cp /data/hadoop/etc/hadoop/hdfs-site.xml /data/hbase/conf/

Copy HBase from hmaster1 to the other nodes
[root@hmaster1 data]# scp -r hbase 192.168.20.191:/data/
[root@hmaster1 data]# scp -r hbase 192.168.20.194:/data/
[root@hmaster1 data]# scp -r hbase 192.168.20.195:/data/

Set the owner of the HBase directory to the hadoop user
[root@hmaster1 data]# chown -R hadoop:hadoop /data/hbase
[root@hmaster2 data]# chown -R hadoop:hadoop /data/hbase
[root@hregionserver1 data]# chown -R hadoop:hadoop /data/hbase
[root@hregionserver2 data]# chown -R hadoop:hadoop /data/hbase

Set up passwordless SSH login on hmaster1, which is used to start the HRegionServer processes
[hadoop@hmaster1 bin]$ ssh-keygen -t rsa
[hadoop@hmaster1 bin]$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@hregionserver1
[hadoop@hmaster1 bin]$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@hregionserver2

Set up passwordless SSH login on hmaster2, which is used to start the HRegionServer processes
[hadoop@hmaster2 bin]$ ssh-keygen -t rsa
[hadoop@hmaster2 bin]$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@hregionserver1
[hadoop@hmaster2 bin]$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@hregionserver2

Start the HBase processes on hmaster1 as the hadoop user
[hadoop@hmaster1 bin]$ ./start-hbase.sh

[hadoop@hmaster1 bin]$ jps
7186 Jps
6696 HMaster

[root@hregionserver1 ~]# jps
4242 Jps
3740 HRegionServer

[root@hregionserver2 ~]# jps
4164 Jps
3686 HRegionServer

Open the hmaster1 web UI
http://192.168.20.190:60010

Start the HMaster process on hmaster2 as the hadoop user
[hadoop@hmaster2 bin]$ ./hbase-daemon.sh start master

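Common HBase Shell Commands

Enter the HBase shell console
$HBASE_HOME/bin/hbase shell
If Kerberos authentication is enabled, first authenticate with the appropriate keytab (using the kinit command). After authenticating, enter the HBase shell; the whoami command shows the current user.
hbase(main)> whoami

Table management
1) List the tables
hbase(main)> list
2) Create a table

# Syntax: create <table>, {NAME => <family>, VERSIONS => <VERSIONS>}
# Example: create table t1 with two column families, f1 and f2, each keeping 2 versions
hbase(main)> create 't1',{NAME => 'f1', VERSIONS => 2},{NAME => 'f2', VERSIONS => 2}
3) Delete a table
Two steps: first disable, then drop
Example: delete table t1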

hbase(main)> disable 't1'
hbase(main)> drop 't1'
4) Show the table schema

# Syntax: describe <table>
# Example: show the schema of table t1
hbase(main)> describe 't1'
5) Alter the table schema
The table must be disabled before its schema is altered

# Syntax: alter 't1', {NAME => 'f1'}, {NAME => 'f2', METHOD => 'delete'}
# Example: set the TTL of the column families of table test1 to 180 days
hbase(main)> disable 'test1'
hbase(main)> alter 'test1',{NAME=>'body',TTL=>'15552000'},{NAME=>'meta', TTL=>'15552000'}
hbase(main)> enable 'test1'
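
The TTL value is given in seconds, so 180 days has to be converted first. A quick check of the arithmetic (the helper name ttl_seconds is illustrative):

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86400


def ttl_seconds(days):
    """Convert a retention period in days to a TTL in seconds."""
    return days * SECONDS_PER_DAY


ttl_180_days = ttl_seconds(180)
print(ttl_180_days)
```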
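
Permission management
1) Grant permissions
# Syntax: grant <user> <permissions> <table> <column family> <column qualifier>, with the arguments separated by commas
# Permissions are expressed with five letters: "RWXCA".
# READ('R'), WRITE('W'), EXEC('X'), CREATE('C'), ADMIN('A')
# Example: grant user 'test' read and write permissions on table t1
hbase(main)> grant 'test','RW','t1'
2) View permissions

# Syntax: user_permission <table>
# Example: list the permissions on table t1
hbase(main)> user_permission 't1'
3) Revoke permissions

# Similar to granting, syntax: revoke <user> <table> <column family> <column qualifier>
# Example: revoke user test's permissions on table t1
hbase(main)> revoke 'test','t1'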
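
The permission letters used by grant can be expanded into their names with a small lookup. This is a sketch for reading grant strings such as 'RW', not part of HBase; expand_permissions is a hypothetical helper.

```python
PERMISSIONS = {
    "R": "READ",
    "W": "WRITE",
    "X": "EXEC",
    "C": "CREATE",
    "A": "ADMIN",
}


def expand_permissions(letters):
    """Translate a permission string such as 'RW' into permission names."""
    unknown = [letter for letter in letters if letter not in PERMISSIONS]
    if unknown:
        raise ValueError("unknown permission letters: " + "".join(unknown))
    return [PERMISSIONS[letter] for letter in letters]


granted = expand_permissions("RW")
everything = expand_permissions("RWXCA")
print(granted, everything)
```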
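
Table data: insert, delete, update, query
1) Add data
# Syntax: put <table>,<rowkey>,<family:column>,<value>,<timestamp>
# Example: add a record to table t1 with rowkey rowkey001, family name f1, column name col1, value value01, and the system default timestamp
hbase(main)> put 't1','rowkey001','f1:col1','value01'
The usage is straightforward.
2) Query data
a) Get a single row

# Syntax: get <table>,<rowkey>,[<family:column>,....]
# Example: get the value of col1 under f1 in row rowkey001 of table t1
hbase(main)> get 't1','rowkey001', 'f1:col1'
# Or:
hbase(main)> get 't1','rowkey001', {COLUMN=>'f1:col1'}
# Get all column values of row rowkey001 in table t1
hbase(main)> get 't1','rowkey001'
b) Scan a table

# Syntax: scan <table>, {COLUMNS => [ <family:column>,.... ], LIMIT => num}
# Advanced options such as STARTROW, TIMERANGE, and FILTER can also be added
# Example: scan the first 5 rows of table t1
hbase(main)> scan 't1',{LIMIT=>5}
c) Count the rows in a table

# Syntax: count <table>, {INTERVAL => intervalNum, CACHE => cacheNum}
# INTERVAL sets how many rows pass between progress lines (each showing the count and the current rowkey), default 1000; CACHE is the scanner cache size per fetch, default 10, and raising it can speed up the count
# Example: count the rows in table t1, reporting every 100 rows, with a cache of 500
hbase(main)> count 't1', {INTERVAL => 100, CACHE => 500}
3) Delete data
a) Delete one column value in a row

# Syntax: delete <table>, <rowkey>, <family:column> , <timestamp>; the column name is required
# Example: delete the data of f1:col1 in row rowkey001 of table t1
hbase(main)> delete 't1','rowkey001','f1:col1'
Note: this deletes all versions of the f1:col1 column in that row
b) Delete a row

# Syntax: deleteall <table>, <rowkey>, <family:column> , <timestamp>; the column name is optional, and omitting it deletes the whole row
# Example: delete the data of row rowkey001 in table t1
hbase(main)> deleteall 't1','rowkey001'
c) Delete all data in a table

# Syntax: truncate <table>
# Internally this runs: disable table -> drop table -> create table
# Example: delete all data in table t1
hbase(main)> truncate 't1'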

Region management
1) Move a region
# Syntax: move 'encodedRegionName', 'ServerName'
# encodedRegionName is the encoded suffix at the end of the region name; ServerName comes from the Region Servers list on the master-status page
# Example
hbase(main)>move '4343995a58be8e5bbc739af1e91cd72d', 'db-41.xxx.xxx.org,60020,1390274516739'
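
The ServerName argument packs three comma-separated fields: host name, port, and a start code. The sketch below splits such a string into its parts; the ServerName tuple and parse_server_name are hypothetical helper names, and the host is the masked example from the command above.

```python
from typing import NamedTuple


class ServerName(NamedTuple):
    host: str
    port: int
    start_code: int


def parse_server_name(text):
    """Split 'host,port,startcode' into typed fields."""
    host, port, start_code = text.rsplit(",", 2)
    return ServerName(host, int(port), int(start_code))


target = parse_server_name("db-41.xxx.xxx.org,60020,1390274516739")
print(target)
```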
2) Enable/disable the load balancer

# Syntax: balance_switch true|false
hbase(main)> balance_switch true
3) Manual split

# Syntax: split 'regionName', 'splitKey'
4) Manually trigger a major compaction

# Syntax:
#Compact all regions in a table:
#hbase> major_compact 't1'
#Compact an entire region:
#hbase> major_compact 'r1'
#Compact a single column family within a region:
#hbase> major_compact 'r1', 'c1'
#Compact a single column family within a table:
#hbase> major_compact 't1', 'c1'

Configuration management and node restarts
1) Change the HDFS configuration
HDFS configuration location: /etc/hadoop/conf
# Sync the HDFS configuration
cat /home/hadoop/slaves|xargs -i -t scp /etc/hadoop/conf/hdfs-site.xml hadoop@{}:/etc/hadoop/conf/hdfs-site.xml
# Stop:
cat /home/hadoop/slaves|xargs -i -t ssh hadoop@{} "sudo /home/hadoop/cdh4/hadoop-2.0.0-cdh4.2.1/sbin/hadoop-daemon.sh --config /etc/hadoop/conf stop datanode"
# Start:
cat /home/hadoop/slaves|xargs -i -t ssh hadoop@{} "sudo /home/hadoop/cdh4/hadoop-2.0.0-cdh4.2.1/sbin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode"
2) Change the HBase configuration
HBase configuration location:

# Sync the HBase configuration
cat /home/hadoop/hbase/conf/regionservers|xargs -i -t scp /home/hadoop/hbase/conf/hbase-site.xml hadoop@{}:/home/hadoop/hbase/conf/hbase-site.xml

# Graceful restart
cd ~/hbase
bin/graceful_stop.sh --restart --reload --debug inspurXXX.xxx.xxx.org