Hive QL Operations

I. Data Definition (DDL) Operations

1. Create a table

--create table creates a table with the given name
create [external] table table_name
--the external keyword lets the user create an external table

Example of creating a table:

create table page_view
(
    viewTime INT,
    userid BIGINT,
    page_url  STRING,
    referrer_url STRING,
    ip STRING COMMENT 'IP ADDRESS of the User'
)
COMMENT 'This is the page view table'  --describes what the table is for
PARTITIONED BY(dt STRING,country STRING)  --defines the partition columns
STORED AS SEQUENCEFILE;
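On disk, each partition declared with PARTITIONED BY becomes its own subdirectory under the table's location, named column=value and nested in declaration order. The Python sketch below builds such a path. The warehouse root /user/hive/warehouse (the usual default of hive.metastore.warehouse.dir) is an assumption, partition_path is a hypothetical helper, and the escaping Hive applies to special characters in partition values is left out.

```python
def partition_path(warehouse, table, partition_spec):
    """Build the directory a partition would live in (hypothetical helper).

    partition_spec is a list of (column, value) pairs in PARTITIONED BY order.
    Hive's escaping of special characters in values is deliberately omitted.
    """
    parts = [f"{col}={val}" for col, val in partition_spec]
    return "/".join([warehouse.rstrip("/"), table] + parts)


# Assumed default warehouse root; the partition values are illustrative.
path = partition_path("/user/hive/warehouse", "page_view",
                      [("dt", "2008-06-08"), ("country", "US")])
print(path)  # /user/hive/warehouse/page_view/dt=2008-06-08/country=US
```

Because partition values are encoded in directory names, a query that filters on dt or country only has to read the matching directories.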

2. Drop a table

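--deletes the table's metadata and data (for an external table only the metadata is deleted; the data files stay in place)
drop table table_name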

3. Alter a table or partition

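--changes the structure of an existing table, for example by adding columns or partitions
alter table table_name add columns (col_name data_type)
alter table table_name add partition (partcol1=val1, partcol2=val2)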

4. Create and drop views

--views are read-only: they cannot be used as the target of INSERT or LOAD
create view view_name as select ...

create view onion_referrences
as
select distinct referrer_url
from page_view
where page_url='http://www.theonion.com';

--deletes the metadata of the specified view
drop view view_name
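To see the read-only behaviour and what DROP VIEW touches, the sketch below replays the onion_referrences example in SQLite through Python's built-in sqlite3 module. SQLite is only a stand-in engine here, not Hive, and the page_view rows are made-up sample data.

```python
import sqlite3

# SQLite stand-in for Hive; column types simplified to SQLite's.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE page_view (
    viewTime INTEGER, userid INTEGER, page_url TEXT,
    referrer_url TEXT, ip TEXT)""")
conn.executemany(
    "INSERT INTO page_view VALUES (?, ?, ?, ?, ?)",
    [  # hypothetical sample rows
        (1, 10, "http://www.theonion.com", "http://a.example", "1.1.1.1"),
        (2, 11, "http://www.theonion.com", "http://a.example", "1.1.1.2"),
        (3, 12, "http://www.theonion.com", "http://b.example", "1.1.1.3"),
        (4, 13, "http://other.example", "http://c.example", "1.1.1.4"),
    ],
)
conn.execute("""CREATE VIEW onion_referrences AS
    SELECT DISTINCT referrer_url FROM page_view
    WHERE page_url = 'http://www.theonion.com'""")

referrers = sorted(r[0] for r in conn.execute(
    "SELECT referrer_url FROM onion_referrences"))
print(referrers)  # ['http://a.example', 'http://b.example']

# Writing to the view is rejected, as Hive rejects INSERT/LOAD into a view.
try:
    conn.execute("INSERT INTO onion_referrences VALUES ('http://x.example')")
    view_writable = True
except sqlite3.OperationalError:
    view_writable = False
print(view_writable)  # False

# Dropping the view removes only its definition; the base table is untouched.
conn.execute("DROP VIEW onion_referrences")
base_rows = conn.execute("SELECT COUNT(*) FROM page_view").fetchone()[0]
print(base_rows)  # 4
```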

5. Create and drop functions

--registers a function implemented by a Java class for the current session only
create temporary function function_name as 'class_name'

drop temporary function function_name
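A temporary function exists only in the session that created it. Hive's real mechanism is a Java UDF class; the sketch below only illustrates the session scoping, using Python's sqlite3 Connection.create_function as an analogous per-connection registration. The function name my_upper and its body are hypothetical.

```python
import sqlite3


def to_upper(s):
    """Hypothetical scalar function: upper-cases a string, passing NULL through."""
    return None if s is None else s.upper()


# One connection plays the role of one Hive session (an analogy, not Hive's API).
session = sqlite3.connect(":memory:")
session.create_function("my_upper", 1, to_upper)
result = session.execute("SELECT my_upper('hive')").fetchone()[0]
print(result)  # HIVE

# A different session never sees the function.
other_session = sqlite3.connect(":memory:")
try:
    other_session.execute("SELECT my_upper('hive')")
    visible_elsewhere = True
except sqlite3.OperationalError:  # "no such function: my_upper"
    visible_elsewhere = False
print(visible_elsewhere)  # False
```

Closing the connection discards the registration, much as ending a Hive session discards its temporary functions.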

6. SHOW and DESCRIBE statements

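--list tables (optionally filtered by name)
show tables page_view
--list the partitions of a table
show partitions table_name
--list functions whose names match a regular expression
show functions "a.*"  --".*" lists all functions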
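The argument to SHOW FUNCTIONS is a regular expression. The sketch below filters a hypothetical list of function names, assuming (as Java's Matcher.matches does) that the pattern must match the whole name rather than just a prefix.

```python
import re

# Hypothetical subset of function names, for illustration only.
function_names = ["abs", "acos", "array", "avg", "concat", "count", "upper"]


def show_functions(pattern, names):
    """Return the names the regex matches in full (assumed whole-name matching)."""
    regex = re.compile(pattern)
    return sorted(n for n in names if regex.fullmatch(n))


print(show_functions("a.*", function_names))       # ['abs', 'acos', 'array', 'avg']
print(len(show_functions(".*", function_names)))   # 7
```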

II. Data Manipulation (DML)

This mainly covers loading files into tables, inserting query results into tables, and querying.

1. Load files into a table

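When data is loaded into a table, Hive does not transform it in any way: LOAD only copies (with LOCAL, from the local filesystem) or moves (from HDFS) the data files into the location that corresponds to the Hive table.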

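load data [local] inpath 'filepath' [overwrite]
 into table table_name [partition (partcol1=val1, partcol2=val2)]
--filepath can be a relative path, an absolute path, or a full URI
--the target can be a table or a partition of a table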
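The "move, don't transform" behaviour can be mimicked on a local filesystem. In the sketch below, load_data_inpath, the staging and warehouse directories, and the file name are all hypothetical; real Hive performs the move on HDFS and also updates the metastore, which is not modelled here.

```python
import os
import shutil
import tempfile


def load_data_inpath(src_file, table_dir):
    """Move a file into a table directory without parsing or converting it."""
    os.makedirs(table_dir, exist_ok=True)
    dest = os.path.join(table_dir, os.path.basename(src_file))
    shutil.move(src_file, dest)
    return dest


root = tempfile.mkdtemp()
src = os.path.join(root, "staging", "page_view_2008-06-08.txt")
os.makedirs(os.path.dirname(src))
raw = b"1\t10\thttp://www.theonion.com\n"  # made-up tab-separated row
with open(src, "wb") as f:
    f.write(raw)

dest = load_data_inpath(src, os.path.join(root, "warehouse", "page_view"))
with open(dest, "rb") as f:
    loaded = f.read()
source_still_there = os.path.exists(src)
print(loaded == raw)        # True: bytes are unchanged
print(source_still_there)   # False: the file was moved, not copied
shutil.rmtree(root)
```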

2. Insert

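INSERT OVERWRITE writes a query result into a table, replacing the existing contents of the target table (or of the specified partition):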

insert overwrite table table_name [partition (partcol1=val1, partcol2=val2)]
    select_statement1 from from_statement
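The overwrite semantics can be emulated in SQLite (via Python's sqlite3, a stand-in rather than Hive) by deleting the target's rows and inserting the query result in a single transaction. insert_overwrite is a hypothetical helper and the tables src/dst are made-up examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (k INTEGER, v TEXT)")
conn.execute("CREATE TABLE dst (k INTEGER, v TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.execute("INSERT INTO dst VALUES (99, 'stale')")  # old content to be replaced


def insert_overwrite(conn, table, select_sql):
    """Emulate INSERT OVERWRITE: replace every row of `table` with the query result.

    `table` and `select_sql` are interpolated directly, so only pass trusted SQL.
    """
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(f"DELETE FROM {table}")
        conn.execute(f"INSERT INTO {table} {select_sql}")


insert_overwrite(conn, "dst", "SELECT k, v FROM src")
rows = conn.execute("SELECT k, v FROM dst ORDER BY k").fetchall()
print(rows)  # [(1, 'a'), (2, 'b')]
```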

3. Query

Queries use the SELECT syntax:

select [all|distinct] select_expr1,select_expr2,...
from table
[where condition]
[group by col_list]
[limit number]

select * from sales where amount>10 and region="US"
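The sales query can be tried against a few made-up rows in SQLite through Python's sqlite3 module (a stand-in engine, not Hive). Note that SQLite expects single-quoted string literals, whereas Hive also accepts the double quotes used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount INTEGER, region TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",  # hypothetical rows
                 [(1, 5, "US"), (2, 20, "US"), (3, 30, "EU"), (4, 15, "US")])

# Both conditions must hold: amount above 10 AND region equal to 'US'.
rows = conn.execute(
    "SELECT * FROM sales WHERE amount > 10 AND region = 'US' ORDER BY id"
).fetchall()
print(rows)  # [(2, 20, 'US'), (4, 15, 'US')]
```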

all/distinct: specifies whether duplicate rows are returned. If neither is given, the default is all, which keeps duplicate rows.

limit: caps the number of rows returned, somewhat like the LIMIT keyword MySQL uses for pagination.
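The difference between all, distinct, and limit shows up even on three rows. The sketch below runs in SQLite via Python's sqlite3 (a stand-in, not Hive) with a hypothetical one-column table t.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (region TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("US",), ("US",), ("EU",)])

all_rows = conn.execute("SELECT ALL region FROM t").fetchall()            # keeps duplicates
distinct_rows = conn.execute("SELECT DISTINCT region FROM t").fetchall()  # drops them
limited = conn.execute("SELECT region FROM t ORDER BY region LIMIT 2").fetchall()
print(len(all_rows), len(distinct_rows), limited)  # 3 2 [('EU',), ('US',)]
```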

Hive versions before 0.7.0 do not support the HAVING clause, so the same filter has to be written as a subquery:

--standard SQL using HAVING
select col1 from table1 group by col1 having sum(col2)>10
--equivalent Hive statement using a subquery
select col1 from (select col1,sum(col2) as col2sum from table1 group by col1) table2 where table2.col2sum>10
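To check that the rewrite is equivalent, the sketch below runs both statements on the same made-up data in SQLite via Python's sqlite3 (a stand-in engine, not Hive). An ORDER BY is appended to each only so the results can be compared directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 TEXT, col2 INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",  # hypothetical rows
                 [("a", 4), ("a", 8), ("b", 3), ("b", 5), ("c", 11)])

having_sql = ("SELECT col1 FROM table1 GROUP BY col1 "
              "HAVING SUM(col2) > 10 ORDER BY col1")
# Compute the aggregate in a subquery, then filter it with an ordinary WHERE.
subquery_sql = ("SELECT col1 FROM (SELECT col1, SUM(col2) AS col2sum "
                "FROM table1 GROUP BY col1) table2 "
                "WHERE table2.col2sum > 10 ORDER BY col1")

with_having = conn.execute(having_sql).fetchall()
with_subquery = conn.execute(subquery_sql).fetchall()
print(with_having)                   # [('a',), ('c',)]
print(with_having == with_subquery)  # True
```

Group a sums to 12 and c to 11, so both pass the filter, while b (8) is excluded by either form.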