I. importtsv
Pull data from HDFS into an HBase table.
1. Prepare the data
## student.tsv (fields are tab-separated)
[root@hadoop-senior datas]# cat student.tsv
10001	zhangsan	35	male	beijing	0109876543
10002	lisi	32	male	shanghia	0109876563
10003	zhaoliu	35	female	hangzhou	01098346543
10004	qianqi	35	male	shenzhen	01098732543

## upload the file to HDFS
[root@hadoop-senior hadoop-2.5.0]# bin/hdfs dfs -mkdir -p /user/root/hbase/importtsv
[root@hadoop-senior hadoop-2.5.0]# bin/hdfs dfs -put /opt/datas/student.tsv /user/root/hbase/importtsv

## create the HBase table
hbase(main):005:0> create 'student', 'info'
0 row(s) in 0.1530 seconds

=> Hbase::Table - student
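importtsv treats rows whose field count does not match the column mapping as bad lines and (by default) skips them, so it can help to verify locally that every row has exactly 6 tab-separated fields before uploading. A minimal sketch, assuming student.tsv is in the current directory:

```shell
# Flag any line whose tab-separated field count is not 6;
# exit non-zero if at least one bad line was found.
awk -F'\t' 'NF != 6 { print "line " NR ": " NF " fields"; bad = 1 }
            END { exit bad }' student.tsv \
  && echo "OK: all lines have 6 fields"
```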
2. Run the import
## run all of the following commands together
export HBASE_HOME=/opt/modules/hbase-0.98.6-hadoop2
export HADOOP_HOME=/opt/modules/hadoop-2.5.0

HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp`:${HBASE_HOME}/conf \
${HADOOP_HOME}/bin/yarn jar ${HBASE_HOME}/lib/hbase-server-0.98.6-hadoop2.jar importtsv \
-Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:age,info:sex,info:address,info:phone \
student \
hdfs://hadoop-senior.ibeifeng.com:8020/user/root/hbase/importtsv

Note: the column list after -Dimporttsv.columns= must be a single argument with no spaces after the commas.

## check the result
hbase(main):006:0> scan 'student'
ROW        COLUMN+CELL
 10001     column=info:address, timestamp=1558594471571, value=beijing
 10001     column=info:age, timestamp=1558594471571, value=35
 10001     column=info:name, timestamp=1558594471571, value=zhangsan
 10001     column=info:phone, timestamp=1558594471571, value=0109876543
 10001     column=info:sex, timestamp=1558594471571, value=male
 10002     column=info:address, timestamp=1558594471571, value=shanghia
 10002     column=info:age, timestamp=1558594471571, value=32
 10002     column=info:name, timestamp=1558594471571, value=lisi
 10002     column=info:phone, timestamp=1558594471571, value=0109876563
 10002     column=info:sex, timestamp=1558594471571, value=male
 10003     column=info:address, timestamp=1558594471571, value=hangzhou
 10003     column=info:age, timestamp=1558594471571, value=35
 10003     column=info:name, timestamp=1558594471571, value=zhaoliu
 10003     column=info:phone, timestamp=1558594471571, value=01098346543
 10003     column=info:sex, timestamp=1558594471571, value=female
 10004     column=info:address, timestamp=1558594471571, value=shenzhen
 10004     column=info:age, timestamp=1558594471571, value=35
 10004     column=info:name, timestamp=1558594471571, value=qianqi
 10004     column=info:phone, timestamp=1558594471571, value=01098732543
 10004     column=info:sex, timestamp=1558594471571, value=male
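The -Dimporttsv.columns mapping is positional: the first TSV field becomes the rowkey (HBASE_ROW_KEY) and each following field maps to the column at the same position in the list. The mapping can be sketched locally with awk (the column list is hard-coded to match the example above; this is an illustration only, not part of the actual import):

```shell
# Emulate importtsv's positional mapping: field 1 -> rowkey,
# fields 2..6 -> info:name, info:age, info:sex, info:address, info:phone.
awk -F'\t' '{
  n = split("info:name info:age info:sex info:address info:phone", cols, " ")
  for (i = 2; i <= NF && i - 1 <= n; i++)
    print $1, "column=" cols[i-1] ",", "value=" $i
}' student.tsv
```

For row 10001 this prints one line per cell, e.g. `10001 column=info:name, value=zhangsan`, mirroring the cells seen in the scan output.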
II. Bulk load
1. Bulk load
HBase supports bulk load as an ingestion path. It exploits the fact that HBase persists its data in HDFS in a fixed on-disk format: the persistent HFile-format files are generated directly in HDFS and then moved into place, which lets very large datasets be loaded quickly. The generation step runs as a MapReduce job, so it is efficient and convenient, and it consumes no region resources and adds no extra load to the cluster; for large write volumes it greatly improves write throughput and reduces the write pressure on the HBase nodes.

Compared with writing directly through HTableOutputFormat, generating the HFiles first and then bulk-loading them into HBase has two benefits:
(1) it eliminates insert pressure on the HBase cluster;
(2) it speeds up the job, shortening its execution time.
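One reason this path is cheap for the cluster: HFiles store cells in strict byte-wise rowkey order, so the HFile-generating MapReduce job must fully sort the data before writing (importtsv does this when -Dimporttsv.bulk.output is set). That ordering can be previewed locally with a byte-wise sort — a conceptual sketch only, not a step of the actual load:

```shell
# HBase compares rowkeys as raw bytes, so use LC_ALL=C to get
# the same byte-wise ordering the cells will have in the HFile.
LC_ALL=C sort student.tsv
```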
2. Generate the HFiles
## create the table
hbase(main):007:0> create 'student2', 'info'
0 row(s) in 0.1320 seconds

=> Hbase::Table - student2

## generate the HFiles
export HBASE_HOME=/opt/modules/hbase-0.98.6-hadoop2
export HADOOP_HOME=/opt/modules/hadoop-2.5.0

HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp`:${HBASE_HOME}/conf \
${HADOOP_HOME}/bin/yarn jar ${HBASE_HOME}/lib/hbase-server-0.98.6-hadoop2.jar importtsv \
-Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:age,info:sex,info:address,info:phone \
-Dimporttsv.bulk.output=hdfs://hadoop-senior.ibeifeng.com:8020/user/root/hbase/hfileoutput \
student2 \
hdfs://hadoop-senior.ibeifeng.com:8020/user/root/hbase/importtsv

## inspect the output
[root@hadoop-senior hadoop-2.5.0]# bin/hdfs dfs -ls /user/root/hbase/hfileoutput/info
Found 1 items
-rw-r--r--   1 root supergroup       1888 2019-05-24 13:31 /user/root/hbase/hfileoutput/info/8c28c6c654bc4fe2aa2c32ef54480771
3. Load the HFiles into table student2
## load the data
export HBASE_HOME=/opt/modules/hbase-0.98.6-hadoop2
export HADOOP_HOME=/opt/modules/hadoop-2.5.0

HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp`:${HBASE_HOME}/conf \
${HADOOP_HOME}/bin/yarn jar ${HBASE_HOME}/lib/hbase-server-0.98.6-hadoop2.jar completebulkload \
hdfs://hadoop-senior.ibeifeng.com:8020/user/root/hbase/hfileoutput \
student2

## scan student2
hbase(main):008:0> scan 'student2'
ROW        COLUMN+CELL
 10001     column=info:address, timestamp=1558675878109, value=beijing
 10001     column=info:age, timestamp=1558675878109, value=35
 10001     column=info:name, timestamp=1558675878109, value=zhangsan
 10001     column=info:phone, timestamp=1558675878109, value=0109876543
 10001     column=info:sex, timestamp=1558675878109, value=male
 10002     column=info:address, timestamp=1558675878109, value=shanghia
 10002     column=info:age, timestamp=1558675878109, value=32
 10002     column=info:name, timestamp=1558675878109, value=lisi
 10002     column=info:phone, timestamp=1558675878109, value=0109876563
 10002     column=info:sex, timestamp=1558675878109, value=male
 10003     column=info:address, timestamp=1558675878109, value=hangzhou
 10003     column=info:age, timestamp=1558675878109, value=35
 10003     column=info:name, timestamp=1558675878109, value=zhaoliu
 10003     column=info:phone, timestamp=1558675878109, value=01098346543
 10003     column=info:sex, timestamp=1558675878109, value=female
 10004     column=info:address, timestamp=1558675878109, value=shenzhen
 10004     column=info:age, timestamp=1558675878109, value=35
 10004     column=info:name, timestamp=1558675878109, value=qianqi
 10004     column=info:phone, timestamp=1558675878109, value=01098732543
 10004     column=info:sex, timestamp=1558675878109, value=male
4 row(s) in 0.0420 seconds
4. Generating HFiles in a MapReduce job