HBase-Hive integration, and Sqoop installation, configuration and usage

Goal: take data stored in HBase and make it available in Hive.

Go to the Hive website, click Wiki, then Hive HBase Integration. Note: the two products can only be integrated if their versions are compatible, as listed on the wiki.

Before integrating, Hive's handler jar hive-hbase-handler-x.y.z.jar must be made available to Hive (passed with --auxpath below).
Start command (single node); the --auxpath value is one comma-separated list without spaces:
$HIVE_SRC/build/dist/bin/hive --auxpath $HIVE_SRC/build/dist/lib/hive-hbase-handler-0.9.0.jar,$HIVE_SRC/build/dist/lib/hbase-0.92.0.jar,$HIVE_SRC/build/dist/lib/zookeeper-3.3.4.jar,$HIVE_SRC/build/dist/lib/guava-r09.jar --hiveconf hbase.master=hbase.yoyodyne.com:60000
Start command for a cluster (here with a three-node ZooKeeper quorum):
$HIVE_SRC/build/dist/bin/hive --auxpath $HIVE_SRC/build/dist/lib/hive-hbase-handler-0.9.0.jar,$HIVE_SRC/build/dist/lib/hbase-0.92.0.jar,$HIVE_SRC/build/dist/lib/zookeeper-3.3.4.jar,$HIVE_SRC/build/dist/lib/guava-r09.jar --hiveconf hbase.zookeeper.quorum=zk1.yoyodyne.com,zk2.yoyodyne.com,zk3.yoyodyne.com

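--The Hive and HBase jars have to be shared between the two installations:
a) First copy the HBase jars into Hive's lib directory (run from HBase's lib directory):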
cp ./*.jar /root/apache-hive-1.2.1-bin/lib/
b) Copy Hive's handler jar into HBase's lib directory (run from Hive's lib directory):
cp ./hive-hbase-handler-*.jar $HBASE_HOME/lib/
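c) On the Hive client machine, start HBase and Hive; the order does not matter.
Start HBase: start-hbase.sh
Start Hive: first start the metastore service on the server: hive --service metastore
then start the Hive CLI on the client: ./hive
Start the HBase shell on the client: hbase shell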
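Steps a) and b) can be sketched as one shell function. The two directory arguments are assumptions (for example $HBASE_HOME/lib and $HIVE_HOME/lib); skipping jars whose names already exist in Hive's lib directory is a choice of this sketch, meant to avoid overwriting Hive's own copies, not something the steps above prescribe.

```bash
#!/usr/bin/env bash
# Sketch of steps a) and b). $1 = HBase lib dir, $2 = Hive lib dir
# (e.g. $HBASE_HOME/lib and $HIVE_HOME/lib; adjust to your installation).
share_jars() {
  local hbase_lib="$1" hive_lib="$2" jar name
  # a) HBase jars -> Hive lib, skipping names Hive already has
  for jar in "$hbase_lib"/*.jar; do
    [ -e "$jar" ] || continue            # glob matched nothing
    name=$(basename "$jar")
    [ -e "$hive_lib/$name" ] || cp "$jar" "$hive_lib/"
  done
  # b) Hive's HBase storage handler jar -> HBase lib
  for jar in "$hive_lib"/hive-hbase-handler-*.jar; do
    [ -e "$jar" ] || continue
    cp "$jar" "$hbase_lib/"
  done
  return 0
}
```

Usage: share_jars "$HBASE_HOME/lib" "$HIVE_HOME/lib", then restart Hive so it picks up the new jars.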
Map the HBase columns into Hive.
3. Create the tables in Hive

CREATE EXTERNAL TABLE tmp_order
(key string, id string, user_id string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,order:order_id,order:user_id")
TBLPROPERTIES ("hbase.table.name" = "t_order");

CREATE TABLE hbasetbl(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz", "hbase.mapred.output.outputtable" = "xyz");

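Rows inserted through the Hive table are ultimately written to HBase.
To create an EXTERNAL Hive table such as tmp_order, first create the table (t_order) in HBase, then create the table in Hive. For a non-external table such as hbasetbl, Hive creates the HBase table (xyz) itself.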
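Since the HBase table must exist before the EXTERNAL table is declared, the HBase shell `create` command can be derived from the hbase.columns.mapping string: every entry except the row key (:key) has the form family:qualifier, and each family must exist in HBase. The helper below is a hypothetical sketch of that derivation; its output is meant to be pasted into `hbase shell`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the HBase shell `create` command for a table
# from a Hive "hbase.columns.mapping" value. Entries starting with ':'
# (such as :key) are not column families and are dropped.
hbase_create_stmt() {
  local table="$1" mapping="$2" families f
  # one entry per line -> drop ':' entries -> keep the family part
  # -> de-duplicate, keeping first-seen order
  families=$(printf '%s\n' "$mapping" | tr ',' '\n' \
    | grep -v '^:' | cut -d: -f1 | awk '!seen[$0]++')
  local stmt="create '$table'"
  for f in $families; do
    stmt="$stmt, '$f'"
  done
  printf '%s\n' "$stmt"
}

hbase_create_stmt t_order ':key,order:order_id,order:user_id'
```

For the tmp_order mapping this prints create 't_order', 'order', which creates the t_order table with the single column family order.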

Sqoop notes
The Sqoop client connects directly to Hadoop: each command is parsed into a corresponding MapReduce job, which then runs on the cluster.
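Installation steps:
1. Unpack the Sqoop tarball.
2. Set the environment variables:
export SQOOP_HOME=/XX/sqoop.xx
export PATH=$PATH:$SQOOP_HOME/bin
3. Add the database driver jar to Sqoop's lib directory:
cp mysql-connector-java-5.1.10.jar /sqoop-install-path/lib
4. Rename the configuration template (in the conf directory):
mv sqoop-env-template.sh sqoop-env.sh
Test the installation by listing the MySQL databases:
sqoop list-databases --connect jdbc:mysql://node1:3306 --username root --password 123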
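Steps 3 and 4 can be sketched as a shell function. Its name and the idea of passing the install directory and driver path as arguments are assumptions for illustration; it only copies the driver and renames the template, and leaves an existing sqoop-env.sh alone.

```bash
#!/usr/bin/env bash
# Sketch of install steps 3 and 4. $1 = Sqoop install dir (SQOOP_HOME),
# $2 = path to the JDBC driver jar. Both are assumptions for illustration.
setup_sqoop() {
  local home="$1" driver="$2"
  [ -d "$home/lib" ] && [ -d "$home/conf" ] || {
    echo "not a Sqoop directory: $home" >&2
    return 1
  }
  # Step 3: put the JDBC driver on Sqoop's classpath
  cp "$driver" "$home/lib/" || return 1
  # Step 4: create sqoop-env.sh from the template, unless already done
  if [ ! -e "$home/conf/sqoop-env.sh" ]; then
    mv "$home/conf/sqoop-env-template.sh" "$home/conf/sqoop-env.sh"
  fi
}
```

Usage: setup_sqoop "$SQOOP_HOME" mysql-connector-java-5.1.10.jar, then run the list-databases command above to check the connection.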

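Import three columns of table test2 into the HDFS directory /sqoop/ (-m 1 runs one map task; --delete-target-dir removes /sqoop/ first if it exists):
sqoop import --connect jdbc:mysql://node1:3306/test --username root --password 123 --columns start_ip,end_ip,country --delete-target-dir -m 1 --table test2 --target-dir /sqoop/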

A Sqoop command can also be written to a file and executed from there.
Options-file format (one option or one value per line):
import
--connect
jdbc:mysql://node1:3306/test
--username
root
--password
123
--columns
start_ip,end_ip,country
--delete-target-dir
-m
1
--table
test2
--target-dir
/sqoop/

Command to run the file: sqoop --options-file option
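Writing such a file by hand is error-prone, so here is a hypothetical helper that turns an ordinary argument list into an options file, one token per line. It adds a leading '#' comment line, which Sqoop ignores in options files, as it does blank lines.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a Sqoop argument list as an options file,
# one option or value per line, preceded by a comment line.
write_options_file() {
  local out="$1"; shift
  {
    echo "# Sqoop options file: one option or value per line"
    printf '%s\n' "$@"
  } > "$out"
}

# The import command from above, written to a file named "option"
write_options_file option import \
  --connect jdbc:mysql://node1:3306/test \
  --username root --password 123 \
  --columns start_ip,end_ip,country \
  --delete-target-dir -m 1 \
  --table test2 --target-dir /sqoop/
```

The resulting file is then run with sqoop --options-file option, exactly as above.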

2. Put a specific SQL query in the file
import
--connect
jdbc:mysql://node1:3306/test
--username
root
--password
123
--delete-target-dir
-m
1
--target-dir
/sqoop/
--query
select start_ip,end_ip from test2 where $CONDITIONS

Command to run the file: sqoop --options-file option1
Note: the SQL query must contain the token $CONDITIONS in its WHERE clause.
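Why the token is required: Sqoop replaces $CONDITIONS with a different predicate for each map task, so each mapper reads its own slice of the query result (with -m 1 there is only one, trivially true, condition; with more mappers a --split-by column is needed). The sketch below only imitates that idea. The function names, the id column and the split arithmetic are hypothetical, and Sqoop's real splitter differs in detail. It also shows why, on a shell command line, the query must be in single quotes: otherwise the shell expands $CONDITIONS to an empty string. In an options file no shell is involved.

```bash
#!/usr/bin/env bash
# Illustration only: imitates how per-mapper predicates could replace
# $CONDITIONS. Names and split arithmetic are hypothetical.

# Split the inclusive range [lo, hi] of column $1 into at most $4 predicates.
split_predicates() {
  local col="$1" lo="$2" hi="$3" n="$4"
  local size=$(( (hi - lo + n) / n ))   # ceil((hi - lo + 1) / n)
  local i start end
  for (( i = 0; i < n; i++ )); do
    start=$(( lo + i * size ))
    if (( start > hi )); then break; fi
    end=$(( start + size ))
    if (( end > hi )); then
      printf '%s >= %d AND %s <= %d\n' "$col" "$start" "$col" "$hi"
    else
      printf '%s >= %d AND %s < %d\n' "$col" "$start" "$col" "$end"
    fi
  done
}

# Replace the literal token $CONDITIONS with one predicate.
render_query() {
  local query="$1" predicate="$2"
  printf '%s\n' "${query//\$CONDITIONS/($predicate)}"
}

# Single quotes keep the shell from expanding $CONDITIONS itself.
query='select start_ip,end_ip from test2 where $CONDITIONS'
while IFS= read -r p; do
  render_query "$query" "$p"
done < <(split_predicates id 1 100 4)
```

With four hypothetical mappers over id values 1 to 100, the first rendered query ends in where (id >= 1 AND id < 26) and the last in where (id >= 76 AND id <= 100).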

View the imported files from the Hive CLI:

dfs -cat /sqoop/*;

Add a filter condition to the import (--where):
import
--connect
jdbc:mysql://node1:3306/test
--username
root
--password
123
--columns
start_ip,end_ip,country
--delete-target-dir
-m
1
--table
test2
--target-dir
/sqoop/
--where
"country = 'US'"

---Import MySQL data directly into a Hive table
import
--connect
jdbc:mysql://node1:3306/test
--username
root
--password
123
--columns
start_ip,end_ip,country
--delete-target-dir
-m
1
--table
test2
--hive-import
--create-hive-table
--hive-table
sqoophive

Mind the options-file format: each option and each value goes on its own line.

---Export Hive data to MySQL
export
--connect
jdbc:mysql://node1:3306/test
--username
root
--password
123
--columns
start_ip,end_ip,country
-m
1
--table
h_sql
--export-dir
/sqoop/

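Note: to export Hive data into MySQL, the files must be in a format MySQL can accept (field delimiter, number of columns and value types matching the target table); otherwise not all of the data will be exported.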
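A quick pre-export check can catch the most common format problem, a wrong delimiter or column count. Sqoop's default text field delimiter is a comma, so files written by the sqoop import above can be checked with ","; tables created by Hive itself usually use '\001' (Ctrl-A), in which case sqoop export also needs --input-fields-terminated-by '\001'. The function name and file paths below are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical pre-export check: every line of a local copy of the data
# file must split into exactly $3 fields on delimiter $2.
# Prints the offending lines and returns 1 if any line does not match.
check_field_count() {
  local file="$1" delim="$2" expected="$3"
  awk -F "$delim" -v n="$expected" '
    NF != n { printf "line %d: %d fields, expected %d\n", NR, NF, n; bad = 1 }
    END { exit (bad ? 1 : 0) }
  ' "$file"
}
```

Usage: copy a part file out of HDFS first, e.g. hdfs dfs -cat /sqoop/part-m-00000 > /tmp/part-m-00000, then run check_field_count /tmp/part-m-00000 , 3 (or pass $'\001' as the delimiter for a Hive-written table).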

Example HiveQL: for each platform (pl) and day, count how many distinct users (u_ud) fall into each page-view-depth bucket.

from (
  select
    pl,
    from_unixtime(cast(s_time/1000 as bigint), 'yyyy-MM-dd') as day,
    u_ud,
    (case when count(p_url) = 1 then "pv1"
          when count(p_url) = 2 then "pv2"
          when count(p_url) = 3 then "pv3"
          when count(p_url) = 4 then "pv4"
          when count(p_url) >= 5 and count(p_url) < 10 then "pv5_10"
          when count(p_url) >= 10 and count(p_url) < 30 then "pv10_30"
          when count(p_url) >= 30 and count(p_url) < 60 then "pv30_60"
          else 'pv60_plus' end) as pv
  from event_logs
  where en = 'e_pv'
    and p_url is not null
    and pl is not null
    and s_time >= unix_timestamp('2019-03-15', 'yyyy-MM-dd') * 1000
    and s_time < unix_timestamp('2019-03-16', 'yyyy-MM-dd') * 1000
  group by pl, from_unixtime(cast(s_time/1000 as bigint), 'yyyy-MM-dd'), u_ud
) as tmp
insert overwrite table stats_view_depth_tmp
select pl, day, pv, count(distinct u_ud) as ct
where u_ud is not null
group by pl, day, pv;
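The two-level aggregation of this query (page views per user, then distinct users per bucket) can be imitated locally with awk, which helps sanity-check the bucket boundaries on a small sample. The input format, one page view per line as "pl day u_ud" already filtered like the WHERE clause, and the function name are assumptions of this sketch.

```bash
#!/usr/bin/env bash
# Sketch of the query's two-level aggregation in awk.
# Input (hypothetical): one page view per line, "pl day u_ud".
# Output: "pl day pv_bucket distinct_users", in no particular order.
view_depth() {
  awk '
    # inner query: page views per (pl, day, user)
    { views[$1 SUBSEP $2 SUBSEP $3]++ }
    END {
      for (k in views) {
        split(k, p, SUBSEP)
        n = views[k]
        # same buckets as the CASE expression
        if (n == 1) b = "pv1"
        else if (n == 2) b = "pv2"
        else if (n == 3) b = "pv3"
        else if (n == 4) b = "pv4"
        else if (n < 10) b = "pv5_10"
        else if (n < 30) b = "pv10_30"
        else if (n < 60) b = "pv30_60"
        else b = "pv60_plus"
        # outer query: keys are unique per user, so this counts distinct users
        users[p[1] SUBSEP p[2] SUBSEP b]++
      }
      for (k in users) {
        split(k, p, SUBSEP)
        print p[1], p[2], p[3], users[k]
      }
    }'
}
```

Usage: printf '%s\n' 'web 2019-03-15 u1' 'web 2019-03-15 u1' 'web 2019-03-15 u2' | view_depth | sort puts u1 (two views) in pv2 and u2 (one view) in pv1.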