Using HBase

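HBase
Tall tables perform better than wide tables (by 50% or more).
Concepts:
**cell** The storage unit identified by a row and a column is called a cell.
**timestamp** Each cell keeps multiple versions of the same data. Versions are indexed by timestamp, which is a 64-bit integer.
**Family** Column families must be defined up front, when the table is created or altered; columns can be inserted dynamically.
**rowKey** Rows are sorted by rowkey in ASCII order, that is, byte-by-byte lexicographic order.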
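
Byte-wise ordering often surprises people who expect numeric order, so here is a minimal sketch in plain Java. It does not use any HBase class. `RowKeyOrderDemo`, `sortAsRowKeys`, and `paddedKey` are names made up for this illustration, and the 10-digit padding width is an arbitrary assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: mimics byte-wise lexicographic ordering with plain Java,
// without touching any HBase class.
final class RowKeyOrderDemo {

    // Sort keys by comparing their raw bytes as unsigned values.
    static List<String> sortAsRowKeys(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort((a, b) -> Arrays.compareUnsigned(
                a.getBytes(StandardCharsets.UTF_8),
                b.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    // Hypothetical helper: left-pad a numeric id with zeros so that
    // byte order agrees with numeric order.
    static String paddedKey(String prefix, long id) {
        return prefix + String.format("%010d", id);
    }

    public static void main(String[] args) {
        // Uppercase sorts before lowercase, and "row10" sorts before "row2".
        // prints [Row3, row1, row10, row2]
        System.out.println(sortAsRowKeys(List.of("row2", "row10", "row1", "Row3")));

        // With zero padding, 1 < 2 < 10 holds in byte order as well.
        System.out.println(sortAsRowKeys(List.of(
                paddedKey("row", 2), paddedKey("row", 10), paddedKey("row", 1))));
    }
}
```

If range scans over numeric ids matter, fixed-width keys like the padded ones above are one common way to keep them in the expected order.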
 
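# Create a table
create 'table',{NAME=>'dataset',DATA_BLOCK_ENCODING=>'PREFIX'} # specify the table name, column family, and data block encoding
create 'table',{NAME=>'1'},{NAME=>'2'} # create a table with two column families
alter 'table','family' # add a column family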
# Delete
disable 'table' # to drop a table, disable it first, then drop it
drop 'table'

alter 'table',{NAME=>'1',METHOD=>'delete'} # delete a column family
delete 'table','row','family:column' # delete a column: delete <table>,<rowkey>,<family:column>
deleteall 'table','row' # delete a row: deleteall <table>,<rowkey>[,<family:column>]
e.g.:
deleteall 'annotation_task','oilT2My9Asrsi85CV0M.6.xj8upd8kbypm7vIQsoE'
deleteall 'annotation_task',"oilT2My9Asrsi85CV0M.\x5C\x00\x5C\x00\x5C\x00\x5C\x06.xj8upd8kbypm7vIQsoE" # use double quotes so that the \x escapes are interpreted as bytes
# Insert
put <table>,<rowkey>,<family:column>,<value>[,<timestamp>]
put 'table','sfsfsf','id:lisi','1993' # columns can be created on the fly; the family and column are separated by ':'
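
To make the role of the optional timestamp in `put` concrete, here is a minimal model in plain Java of a single cell that holds several timestamped versions. It is not HBase code. `VersionedCellDemo` and its methods are hypothetical, and the model ignores the column family's limit on how many versions are retained.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative model only, not an HBase class: one cell that keeps several
// timestamped versions of its value, newest first.
final class VersionedCellDemo {

    // Keys are 64-bit timestamps, ordered from newest to oldest.
    private final NavigableMap<Long, String> versions =
            new TreeMap<>(Comparator.<Long>reverseOrder());

    // Mirrors the shape of `put ..., <value>, <timestamp>`.
    void put(String value, long timestamp) {
        versions.put(timestamp, value);
    }

    // A read with no explicit version returns the value with the largest timestamp.
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // Return up to maxVersions values, newest first (the role VERSIONS plays in get/scan).
    List<String> newest(int maxVersions) {
        List<String> out = new ArrayList<>();
        for (String value : versions.values()) {
            if (out.size() == maxVersions) {
                break;
            }
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        VersionedCellDemo cell = new VersionedCellDemo();
        cell.put("1993", 100L);
        cell.put("1994", 200L);
        cell.put("1992", 50L); // written last, but with the oldest timestamp
        System.out.println(cell.latest());   // prints 1994
        System.out.println(cell.newest(2));  // prints [1994, 1993]
    }
}
```

The point of the model is that "latest" is decided by the timestamp, not by the order in which the writes arrived.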
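# Query
count 'table',{INTERVAL => 100, CACHE => 500} # count the rows in the table; print progress every 100 rows, with a scanner cache of 500
get 'table','row','family:column'

scan 'table',{COLUMNS=>'info'} # scan the info column family
scan 'table',{COLUMNS=>'info:birthday'} # scan a specific column
scan 'table', {STARTROW => 'Sariel', LIMIT=>1, VERSIONS=>1}
# Besides COLUMNS, scan also supports LIMIT (caps the number of rows returned), STARTROW (the rowkey to start from; HBase first uses this key to locate the region, then scans forward), STOPROW (the row to stop at), TIMERANGE (restricts the timestamp range), VERSIONS (number of versions), FILTER (filters rows by condition), and more. For example, to start at a given rowkey (such as Sariel above) and fetch only the latest version of the first row at or after that key:
scan 'table', {STARTROW => 'rowKey', LIMIT=>1, VERSIONS=>1}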
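
The interaction of STARTROW, STOPROW, and LIMIT is easier to see on a small model. The sketch below uses a sorted map in plain Java as a stand-in for a table. It is not the HBase client API. `ScanModelDemo` and `scan` are hypothetical names, and keys are assumed to be ASCII so that Java's `String` order matches byte order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative model only: a sorted map stands in for a table sorted by rowkey.
final class ScanModelDemo {

    // Return the rowkeys a scan would visit: startRow is inclusive,
    // stopRow is exclusive (null means "no stop row"), and at most limit rows.
    static List<String> scan(NavigableMap<String, String> rows,
                             String startRow, String stopRow, int limit) {
        List<String> out = new ArrayList<>();
        // An empty or inverted range yields nothing.
        if (stopRow != null && stopRow.compareTo(startRow) <= 0) {
            return out;
        }
        // Jump to the first key >= startRow, then walk forward in key order.
        NavigableMap<String, String> range = (stopRow == null)
                ? rows.tailMap(startRow, true)
                : rows.subMap(startRow, true, stopRow, false);
        for (String key : range.keySet()) {
            if (out.size() == limit) {
                break;
            }
            out.add(key);
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> rows = new TreeMap<>();
        rows.put("Adam", "a");
        rows.put("Sariel", "b");
        rows.put("Tom", "c");
        rows.put("Zed", "d");

        System.out.println(scan(rows, "Sariel", null, 1)); // prints [Sariel]
        System.out.println(scan(rows, "S", null, 2));      // prints [Sariel, Tom]
        System.out.println(scan(rows, "B", "Tom", 10));    // prints [Sariel]
    }
}
```

Note that the start key does not have to exist: the scan simply begins at the first row that sorts at or after it.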
# FILTER is a very powerful modifier that lets you combine a series of conditions. For example, to keep only cells whose value equals 26:
scan 'table', FILTER=>"ValueFilter(=,'binary:26')"
scan 'member', FILTER=>"ValueFilter(=,'substring:6')" # value contains 6
scan 'member', FILTER=>"ColumnPrefixFilter('birth')" # column name starts with birth
scan 'table',FILTER=>"PrefixFilter('rowPrefix')" # filter by rowkey prefix
scan 'member', FILTER=>"ColumnPrefixFilter('birth') AND ValueFilter(=,'substring:1988')" # combine several conditions
scan 'hbase:meta',FILTER=>"PrefixFilter('table')" # get the region information of the given table
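# Other
exists 'table' # check whether the table exists
disable 'table' # to change the table schema: disable first, alter, then enable
alter 'table',{NAME=>'1',TTL=>'18888'} # set the TTL (in seconds) of column family 1
enable 'table'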
# Create a lemon table
create 'sample_set_lemon',
{NAME => 's', DATA_BLOCK_ENCODING => 'FAST_DIFF'},
{NAME => 'l', DATA_BLOCK_ENCODING => 'FAST_DIFF'},
METADATA => {
    'lemon.autoindex.enabled' => 'true',
    'lemon.index.enabled' => 'true',
    'lemon.index.regions' => '1',
    'lemon.update.enabled' =>'true',
    'lemon.index.meta' => '{"indexes":[
        {"nameType":"E","family":"s","column":"sample_id","termExtractor":"com.huaweicloud.gaia.annotation.lemon.DatasetQualifierValueExtractor"},
        {"nameType":"E","family":"s","column":"sample_name","termExtractor":"com.huaweicloud.gaia.annotation.lemon.DatasetQualifierValueMd5Extractor"},
        {"nameType":"E","family":"s","column":"sample_dir","termExtractor":"com.huaweicloud.gaia.annotation.lemon.DatasetQualifierValueMd5Extractor"},
        {"nameType":"E","family":"s","column":"sample_size","termExtractor":"com.huaweicloud.gaia.annotation.lemon.SampleSizeExtractor"},
        {"nameType":"E","family":"s","column":"sample_time","termExtractor":"com.huaweicloud.gaia.annotation.lemon.SampleDateExtractor"},
        {"nameType":"E","family":"s","column":"sample_status","termExtractor":"com.huaweicloud.gaia.annotation.lemon.DatasetQualifierValueExtractor"},
        {"nameType":"E","family":"s","column":"annotation_status","termExtractor":"com.huaweicloud.gaia.annotation.lemon.DatasetQualifierValueExtractor"},
        {"nameType":"E","family":"s","column":"annotated_by","termExtractor":"com.huaweicloud.gaia.annotation.lemon.DatasetQualifierValueMappingExtractor"},
        {"nameType":"E","family":"s","column":"reviewer","termExtractor":"com.huaweicloud.gaia.annotation.lemon.DatasetQualifierValueMappingExtractor"},
        {"nameType":"E","family":"s","column":"review_score","termExtractor":"com.huaweicloud.gaia.annotation.lemon.DatasetQualifierValueExtractor"},
        {"nameType":"E","family":"s","column":"create_time","termExtractor":"com.huaweicloud.gaia.annotation.lemon.SampleDateExtractor"},
        {"nameType":"E","family":"s","column":"update_time","termExtractor":"com.huaweicloud.gaia.annotation.lemon.SampleDateExtractor"},
        {"nameType":"E","family":"s","column":"metadata","termExtractor":"com.huaweicloud.gaia.annotation.lemon.SampleMetadataExtractor"},
        {"nameType":"F","family":"l","termExtractor":"com.huaweicloud.gaia.annotation.lemon.SampleLabelsExtractor"}
    ]}'
}
 
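Working with HBase tables through the Java API
 
The right way to manage HBase connections:

One application (process) should hold exactly one Connection. Each thread in the application calls the Connection's getTable method to obtain a Table instance; these instances are lightweight because they rely on resources, such as the thread pool, that the Connection maintains. Close each Table once its reads and writes are done, and close the Connection only when the whole process exits. When multiple threads need access, create the Connection (HConnection in older client versions) up front to avoid the heavy cost of setting it up repeatedly. Connection is thread-safe, whereas Table and Admin are not, so the correct approach is to share a single Connection across the process and use separate Table and Admin objects in each thread.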
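
The lifecycle described above can be sketched without an HBase dependency. In the sketch below, `FakeConnection` and `FakeTable` are hypothetical stand-ins that only count calls; they are not the real `Connection` and `Table` interfaces. The assumption is that the real objects would be used in the same shape: one shared connection, and one short-lived table handle per thread, closed with try-with-resources.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration of the lifecycle only. FakeConnection and FakeTable are
// hypothetical stand-ins, not HBase classes.
final class SharedConnectionDemo {

    // Stand-in for the heavyweight, thread-safe connection shared by the process.
    static final class FakeConnection implements AutoCloseable {
        final AtomicInteger openTables = new AtomicInteger();
        final AtomicInteger writes = new AtomicInteger();
        private volatile boolean closed;

        FakeTable getTable(String name) {
            if (closed) {
                throw new IllegalStateException("connection already closed");
            }
            openTables.incrementAndGet();
            return new FakeTable(this);
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Stand-in for the lightweight, non-thread-safe table handle.
    static final class FakeTable implements AutoCloseable {
        private final FakeConnection conn;

        FakeTable(FakeConnection conn) {
            this.conn = conn;
        }

        void put(String row, String value) {
            conn.writes.incrementAndGet();
        }

        @Override
        public void close() {
            conn.openTables.decrementAndGet();
        }
    }

    // Run `threads` workers against one shared connection.
    // Returns {number of writes, number of table handles left open}.
    static int[] run(int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FakeConnection conn = new FakeConnection()) {
            for (int i = 0; i < threads; i++) {
                final int id = i;
                pool.submit(() -> {
                    // Each task gets its own handle and closes it when done.
                    try (FakeTable table = conn.getTable("table")) {
                        table.put("row" + id, "value");
                    }
                });
            }
            pool.shutdown();
            // The connection is closed only after every worker has finished.
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
            return new int[] {conn.writes.get(), conn.openTables.get()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
    }

    public static void main(String[] args) {
        int[] result = run(4);
        System.out.println("writes=" + result[0] + ", open tables=" + result[1]);
    }
}
```

The design choice worth copying is the nesting: the connection's scope encloses all worker activity, while each table handle lives only as long as one unit of work.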

For details, see: https://www.jianshu.com/p/fd0cddb43222

 
Original post: https://www.cnblogs.com/luckyboylch/p/12327298.html