day05

Today's Learning Process and Summary

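I. Hive Basics

Hive is a data warehouse tool that maps structured data files to tables and lets you query them with SQL-like statements. In essence, it translates HQL into MapReduce programs.

1) The data Hive processes is stored on HDFS

2) The underlying data analysis is implemented with MapReduce

3) The programs run on YARN

Data warehouse workloads are read-heavy and write-light, so modifying data in Hive is not recommended.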
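The statement above that HQL is essentially turned into MapReduce can be pictured with a small sketch. It is only a conceptual model, not how Hive actually compiles queries: it assumes a query such as select deptno, count(*) from emp group by deptno, and the function names and sample rows are made up for illustration.

```python
from collections import defaultdict

def map_phase(rows):
    """Emit one (key, 1) pair per input row, keyed by department number."""
    for deptno, _ename in rows:
        yield deptno, 1

def shuffle(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values of each key to get the per-department row count."""
    return {key: sum(values) for key, values in grouped.items()}

# Made-up sample rows (deptno, ename), for illustration only.
rows = [(10, "a"), (20, "b"), (10, "c")]
counts = reduce_phase(shuffle(map_phase(rows)))
print(counts)  # {10: 2, 20: 1}
```

The group by column plays the role of the MapReduce key, and the aggregate function is what the reduce step computes per key.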

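II. Installing Hive

1) Download apache-hive-1.2.1-bin.tar.gz and extract it into the /opt/module/ directory

2) Rename the extracted directory to hive

3) In the conf directory, rename hive-env.sh.template to hive-env.sh

4) Configure hive-env.sh: set the Hadoop path and the Hive path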

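III. Basic Hive Operations

1) Start Hive: hive

2) List the databases: show databases;

3) Switch to the default database: use default;

4) List all tables: show tables;

5) Create a table: create table tablename(id int, name string) row format delimited fields terminated by ','; (state the data file's field delimiter when creating the table)

6) Query data: select * from tablename;

7) Exit: quit;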

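IV. Example: Importing a Local File into Hive

1) Create a local file student.txt, paying attention to the delimiter between the fields in the file

2) load data local inpath 'file path' into table table_name; Note that the keyword local means the file is on the local file system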

V. Installing the MySQL Database

VI. Storing the Hive Metastore in MySQL

1) Download and extract mysql-connector-java-5.1.27.tar.gz

2) Copy mysql-connector-java-5.1.27-bin.jar into hive/lib/

3) Point the metastore at MySQL by editing the hive-site.xml file

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <description>username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <description>password to use against metastore database</description>
    </property>
</configuration>
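Because a malformed hive-site.xml is easy to produce by hand, a quick well-formedness check can help before starting Hive. The following is a minimal sketch using only the Python standard library. The helper names read_properties and missing_properties are hypothetical, and the sample string contains only the two properties whose values appear in these notes.

```python
import xml.etree.ElementTree as ET

# The four metastore connection properties named in hive-site.xml above.
REQUIRED = [
    "javax.jdo.option.ConnectionURL",
    "javax.jdo.option.ConnectionDriverName",
    "javax.jdo.option.ConnectionUserName",
    "javax.jdo.option.ConnectionPassword",
]

SAMPLE = """
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
</configuration>
"""

def read_properties(xml_text):
    """Parse a Hadoop-style configuration and return {name: value}.
    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed."""
    root = ET.fromstring(xml_text.strip())
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}

def missing_properties(props, required=REQUIRED):
    """List required property names that are absent or have no value."""
    return [name for name in required if not props.get(name)]

props = read_properties(SAMPLE)
print(missing_properties(props))
```

With this sample, the check reports the user name and password properties as missing, which is the kind of gap worth catching before the metastore connection is attempted.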

VII. Accessing Hive via JDBC

1) Start the hiveserver2 service: bin/hiveserver2

2) Start beeline: beeline

3) Connect to hiveserver2

beeline> !connect jdbc:hive2://hadoop102:10000 (press Enter)
Connecting to jdbc:hive2://hadoop102:10000
Enter username for jdbc:hive2://hadoop102:10000: atguigu (press Enter)
Enter password for jdbc:hive2://hadoop102:10000: (just press Enter)
Connected to: Apache Hive (version 1.2.1)
Driver: Hive JDBC (version 1.2.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hadoop102:10000> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| default        |

VIII. External Tables

1) Table creation statements

Create the department table

create external table if not exists default.dept(
    deptno int,
    dname string,
    loc int
)
row format delimited fields terminated by ' ';

2) Create the employee table

create external table if not exists default.emp(
    empno int,
    ename string,
    job string,
    mgr int,
    hiredate string,
    sal double,
    comm double,
    deptno int
)
row format delimited fields terminated by ' ';

3) View the created tables

hive (default)> show tables;

tab_name

dept

emp

4) Load data into the external tables

hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table default.dept;

hive (default)> load data local inpath '/opt/module/datas/emp.txt' into table default.emp;

5) Query the results

hive (default)> select * from emp;

hive (default)> select * from dept;

6) View the table's formatted metadata

hive (default)> desc formatted dept;

Table Type: EXTERNAL_TABLE

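IX. Partitioned Tables

A partition of a table corresponds to a separate directory on HDFS, and that directory holds all the data files of the partition. A partition in Hive is simply a subdirectory: a large dataset is split into smaller ones according to business needs. At query time, an expression in the WHERE clause selects only the partitions the query needs, which makes such queries much more efficient.

Motivation for partitioned tables (logs need to be managed by date)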

/user/hive/warehouse/log_partition/20170702/20170702.log

/user/hive/warehouse/log_partition/20170703/20170703.log

/user/hive/warehouse/log_partition/20170704/20170704.log

Syntax for creating a partitioned table

hive (default)> create table dept_partition(
    deptno int, dname string, loc string
)
partitioned by (month string)
row format delimited fields terminated by ' ';

Load data into the partitioned table

hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201709');

hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201708');

hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table default.dept_partition partition(month='201707');
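To make the idea that a partition is just a directory concrete, here is a small sketch of where the three loads above would land and which directory a filter on month would need to read. It assumes the default warehouse location /user/hive/warehouse and Hive's column=value naming for partition directories. The helper names partition_dir and prune are hypothetical, and the code only models the layout without talking to HDFS.

```python
import posixpath

WAREHOUSE_DIR = "/user/hive/warehouse"  # assumed default warehouse location

def partition_dir(table, column, value):
    """Return the HDFS directory that holds one partition of a table."""
    return posixpath.join(WAREHOUSE_DIR, table, f"{column}={value}")

def prune(table, column, values, wanted):
    """Keep only the directories whose partition value matches the filter."""
    return [partition_dir(table, column, v) for v in values if v == wanted]

months = ["201709", "201708", "201707"]
all_dirs = [partition_dir("dept_partition", "month", m) for m in months]
scanned = prune("dept_partition", "month", months, "201708")
print(scanned)  # ['/user/hive/warehouse/dept_partition/month=201708']
```

A query such as select * from dept_partition where month='201708'; only has to read the one matching directory, which is where the efficiency gain described above comes from.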

Renaming a table:

Syntax

ALTER TABLE table_name RENAME TO new_table_name

Hands-on example

hive (default)> alter table dept_partition2 rename to dept_partition3;

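Summary of Problems Encountered

I. When using Hive, the error "hadoop is not allowed to impersonate hadoop (state=08S01,code=0)" appeared. It is a permissions problem that shows up from hive2 onward.

Solution: add the following to Hadoop's core-site.xml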

<property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
</property>
<property>
    <name>hadoop.proxyuser.hadoop.groups</name>
</property>

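Then restart Hadoop.

II. After creating a table in Hive, data loaded from a local file comes up empty

Solution: when creating the table in Hive, specify the field delimiter (create table tablename(id int, name string) row format delimited fields terminated by ',';), and make sure the fields in the local file are separated by that same delimiter. Only then does the local data load into Hive with non-empty values (load data local inpath 'file path' into table table_name;).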
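The empty result can be reproduced with a simplified model of how a delimited text row is read. This is a sketch, not Hive's actual parsing code: it assumes that a field which is missing or cannot be converted to the column type is read back as NULL (None here), and parse_row is a hypothetical helper.

```python
def parse_row(line, delimiter, column_types):
    """Split one text line into typed columns. Fields that are missing or
    cannot be converted to the column type become None (NULL)."""
    fields = line.rstrip("\n").split(delimiter)
    row = []
    for index, cast in enumerate(column_types):
        if index >= len(fields):
            row.append(None)  # the line has fewer fields than the table has columns
            continue
        try:
            row.append(cast(fields[index]))
        except ValueError:
            row.append(None)  # the text is not a valid value for this column type
    return row

schema = [int, str]  # mirrors (id int, name string)

matching = parse_row("1001,zhangsan", ",", schema)    # file uses the declared delimiter
mismatched = parse_row("1001 zhangsan", ",", schema)  # file uses a space instead

print(matching)    # [1001, 'zhangsan']
print(mismatched)  # [None, None]
```

In the mismatched case the whole line is treated as a single field, so it fails to parse as an int and the second column has nothing left to read.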