Hive Notes: SQL Syntax in Detail and the Java API

Hive SQL syntax in detail: http://blog.csdn.net/hguisu/article/details/7256833
Hive SQL study notes (commonly used statements): http://blog.sina.com.cn/s/blog_66474b16010182yu.html
Partitions in Hive: http://blog.csdn.net/jiedushi/article/details/6660185

Hive basics: http://www.csdn.net/article/2014-01-07/2818052-about-hive

Hive Java API: http://787141854-qq-com.iteye.com/blog/2068303
GROUP BY in Hive is slow because the query has to run as Hadoop MapReduce jobs. The same aggregation can be done in Spark instead.

Start HiveServer2: hive --service hiveserver2

Common statements:

Create a table: CREATE TABLE pokes (foo INT, bar STRING);

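Create a partitioned table: the partition columns are date and pos, and the description of the ip column, 'IP Address of the User', is set with COMMENT.
Fields are separated by ' ' and rows by a newline.
If the data file is plain text, use STORED AS TEXTFILE. If the data needs to be compressed, use STORED AS SEQUENCEFILE.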

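-- date is a reserved keyword in Hive 1.2.0 and later, so it is quoted with backticks.
-- Rows end with a newline; '\n' is the only line terminator Hive accepts.
CREATE TABLE par_table(viewTime INT, userid BIGINT,
page_url STRING, referrer_url STRING,
ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(`date` STRING, pos STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LINES TERMINATED BY '\n'
STORED AS SEQUENCEFILE;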
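
The table above uses SEQUENCEFILE. For the plain-text case mentioned earlier, a minimal sketch might look like the following. The table name par_table_text and the tab field delimiter are assumptions made for illustration and do not come from the original notes.

```sql
-- Hypothetical plain-text variant of par_table.
-- The table name and the '\t' field delimiter are assumptions.
CREATE TABLE par_table_text (
  viewTime INT,
  userid BIGINT,
  page_url STRING,
  referrer_url STRING,
  ip STRING COMMENT 'IP Address of the User'
)
COMMENT 'This is the page view table'
PARTITIONED BY (`date` STRING, pos STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

-- Show the storage format and partition columns that Hive recorded.
DESCRIBE FORMATTED par_table_text;
```

DESCRIBE FORMATTED is a quick way to confirm which input/output format and partition columns the table actually ended up with.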

Working with partitions

(1) Defining and creating partitions

Create a partitioned table:

hive> create table test(name string,sex int) partitioned by (birth string, age string);

Add three partitions:

hive> alter table test add partition (birth='1980', age ='30');

hive> alter table test add partition (birth='1981', age ='29');

hive> alter table test add partition (birth='1982', age ='28');

hive> show partitions test;

birth=1980/age=30
birth=1981/age=29
birth=1982/age=28

(2) Dropping a partition
hive> alter table test drop partition (birth='1980',age='30');
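
As a hedged variant of the drop above: adding IF EXISTS lets the statement succeed even when the partition is already gone, which can be convenient in scripts that are rerun. The partition values below reuse the ones added in (1).

```sql
-- Drop the partition only if it is present; no error if it is missing.
ALTER TABLE test DROP IF EXISTS PARTITION (birth='1981', age='29');

-- Confirm which partitions remain.
SHOW PARTITIONS test;
```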

(3) Loading data into a specific partition (if the partition does not exist yet, LOAD DATA creates it)
hive> load data local inpath '/home/hadoop/data.log' overwrite into table test partition(birth='1980-01-01',age='30');

Guideline for creating partitions: the minimum-granularity principle

(4) Inserting data into a partition of partition_test:
hive> insert overwrite table partition_test partition(stat_date='20110728',province='henan') select member_id,name from partition_test_input where stat_date='20110728' and province='henan';
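
The notes do not show how partition_test and partition_test_input are defined. The sketch below gives one possible pair of schemas that the insert above would run against. The STRING column types are assumptions, as is modelling stat_date and province as ordinary columns of the input table. It also shows a dynamic-partition variant, a different technique from the static insert above, in which Hive takes the partition values from the trailing SELECT columns.

```sql
-- Assumed schemas; the column types are guesses, not taken from the notes.
CREATE TABLE partition_test (member_id STRING, name STRING)
PARTITIONED BY (stat_date STRING, province STRING);

CREATE TABLE partition_test_input (
  member_id STRING,
  name STRING,
  stat_date STRING,
  province STRING
);

-- Dynamic-partition variant: enable the feature for this session.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The last two SELECT columns map, in order, to (stat_date, province).
INSERT OVERWRITE TABLE partition_test PARTITION (stat_date, province)
SELECT member_id, name, stat_date, province
FROM partition_test_input;
```

The static form names the target partition explicitly and writes exactly one partition. The dynamic form can write many partitions in a single statement, at the cost of relaxing the strict-mode safeguard.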

(5) Selecting all the data in a given partition

hive> select * from test where birth = '1982';
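
Because test is partitioned by both birth and age, a filter on birth alone matches every partition with that birth value. A minimal sketch for reading exactly one partition, using the values added in (1), pins both keys:

```sql
-- Filter on both partition keys to read exactly one partition.
SELECT * FROM test WHERE birth = '1982' AND age = '28';

-- List the partitions that match a partial specification.
SHOW PARTITIONS test PARTITION (birth='1982');
```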