Getting Started with Hive

Connecting and logging in

!connect jdbc:hive2://localhost:10000
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000: hadoop
Enter password for jdbc:hive2://localhost:10000:

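Creating a table

One difference between Hive and MySQL is that in Hive you have to specify the data format when creating a table.

create table t_sz02(id int, name string) row format delimited fields terminated by ',';

Importing data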

[hadoop@mini2 study]$ hadoop fs -put sz4.dat /user/hive/warehouse/myhive.db/t_sz02

1,oo
2,pp
3,ll
4,i9i
5,kkj
6,ujn
7,aa
8,zx
9,sdfa
10,4sad
11,3d3
12,sadf
13,gdh
14,asdf4
15,asdfsadf
16,asdf
17,asddd

Then run the query: select * from t_sz02;
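Why this works without any import step: the table definition only tells Hive how to read the text file at query time. The following is a rough Python sketch of that idea, not Hive's actual reader. The function name parse_line and the extra sample line "x,bad" are made up for illustration; the NULL-on-mismatch behaviour is modelled with None.

```python
# Simplified model of reading a text file with the schema
# (id int, name string) ... fields terminated by ','
def parse_line(line, delimiter=","):
    """Split one text line into (id, name) as the table schema describes it."""
    fields = line.rstrip("\n").split(delimiter)
    raw_id = fields[0] if len(fields) > 0 else None
    name = fields[1] if len(fields) > 1 else None  # missing field becomes NULL
    try:
        row_id = int(raw_id)
    except (TypeError, ValueError):
        row_id = None  # a value that does not fit the int column becomes NULL
    return (row_id, name)

# The first three lines come from sz4.dat; "x,bad" is a hypothetical bad row.
sample = ["1,oo", "2,pp", "4,i9i", "x,bad"]
for line in sample:
    print(parse_line(line))
```

The file itself is never converted; changing the delimiter in the table definition only changes how the same bytes are split.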

CREATE TABLE syntax:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]

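CREATE TABLE creates a table with the given name. If a table with the same name already exists, an exception is thrown; use the IF NOT EXISTS option to ignore it.
The EXTERNAL keyword creates an external table and lets you specify a path to the actual data (LOCATION) at creation time. For an internal (managed) table, Hive moves the data into the path the data warehouse points to; for an external table, it only records where the data lives and does not change the data's location. When a table is dropped, an internal table's metadata and data are deleted together, whereas for an external table only the metadata is deleted and the data is kept.
If the file data is plain text, use STORED AS TEXTFILE. If the data needs to be compressed, use STORED AS SEQUENCEFILE.
Partitioned tables are created with the PARTITIONED BY clause. A table can have one or more partitions, and each partition lives in its own directory. In addition, tables and partitions can be bucketed with CLUSTERED BY on one or more columns, which distributes rows into buckets according to the values of those columns. SORTED BY can also be used to sort the data within each bucket. This can improve performance for particular workloads.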
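To make the bucketing idea concrete, here is a small Python sketch of how rows might be assigned to buckets. It assumes that an int key hashes to itself and that the bucket is the non-negative hash modulo the bucket count; the real hash function depends on the Hive version and the column type, so treat this as a model only. The name bucket_for and the choice of 4 buckets are made up for the example.

```python
def bucket_for(value, num_buckets):
    """Return the bucket index for an integer key (simplified model)."""
    hash_code = value  # assumption: an int key hashes to its own value
    return (hash_code & 0x7FFFFFFF) % num_buckets

# Distribute the ids 1..17 (as in sz4.dat) over 4 hypothetical buckets.
buckets = {}
for row_id in range(1, 18):
    buckets.setdefault(bucket_for(row_id, 4), []).append(row_id)

for index in sorted(buckets):
    print(index, buckets[index])
```

Rows with the same key always land in the same bucket, which is what makes bucketed sampling and some joins cheaper.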

Creating a regular table

create table if not exists mytable(id int,name string) row format delimited fields terminated by '\005' stored as textfile;

External table (data is imported the same way)

create external table if not exists myexternaltable(id int,name string) row format delimited fields terminated by ',' location 'hdfs://mini2:9000/user/myhive/warehouse/myexternaltable';

desc extended myexternaltable; shows more detailed table information

desc formatted myexternaltable; shows the detailed information in a formatted layout

Loading data
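The difference between dropping an internal table and an external one can be modelled with a few lines of Python. This is only a toy simulation on the local file system: the dictionary called metastore and the function drop_table are made-up stand-ins, not Hive APIs.

```python
import os
import shutil
import tempfile

def drop_table(metastore, name):
    """Remove the table's metadata; delete its data only for a managed table."""
    table = metastore.pop(name)
    if not table["external"]:
        shutil.rmtree(table["location"])

# Set up two toy tables, each with one data file, under a temporary directory.
root = tempfile.mkdtemp()
metastore = {}
for name, external in [("mytable", False), ("myexternaltable", True)]:
    location = os.path.join(root, name)
    os.makedirs(location)
    with open(os.path.join(location, "sz4.dat"), "w") as handle:
        handle.write("1,oo\n")
    metastore[name] = {"external": external, "location": location}

drop_table(metastore, "mytable")
drop_table(metastore, "myexternaltable")

managed_data_left = os.path.exists(os.path.join(root, "mytable"))
external_data_left = os.path.exists(os.path.join(root, "myexternaltable"))
print(managed_data_left, external_data_left)

shutil.rmtree(root)  # clean up the temporary directory
```

Both tables disappear from the metadata, but only the external table's files are still there afterwards.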

0: jdbc:hive2://localhost:10000> load data local inpath '/home/hadoop/study/sz4.dat' overwrite into table myexternaltable; (overwrite replaces the existing data; leave it out if you do not want to overwrite)

Viewing HDFS from within Hive

0: jdbc:hive2://localhost:10000> dfs -ls /user/hive/warehouse/myhive.db/;

Partitioned tables

0: jdbc:hive2://localhost:10000> create table parttable(id int ,name string) partitioned by (country string)
0: jdbc:hive2://localhost:10000> row format delimited fields terminated by ',';

When loading data, you must specify which partition to load it into

load data local inpath '/home/hadoop/study/sz4.dat' into table parttable partition(country='US');

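Query: select * from parttable where country='US';

The country column in the result is a pseudo-column: its value comes from the partition, not from the data file.

Before any data has been inserted, you can alter the table to add partitions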
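A short Python sketch of where the partition value lives: each partition is a directory named column=value under the table directory, and the value is read back from the path. The table directory used here is an assumption (it supposes parttable was created in the myhive database under the default warehouse path), and both function names are made up for the example.

```python
import posixpath

def partition_dir(table_dir, spec):
    """Build the directory for a partition, e.g. .../country=US."""
    parts = ["{}={}".format(col, val) for col, val in spec]
    return posixpath.join(table_dir, *parts)

def partition_values(path, table_dir):
    """Recover partition column values from a file path under the table."""
    relative = posixpath.relpath(path, table_dir)
    values = {}
    for segment in relative.split("/"):
        if "=" in segment:
            col, val = segment.split("=", 1)
            values[col] = val
    return values

# Assumed location of the table; adjust to your own warehouse path.
table_dir = "/user/hive/warehouse/myhive.db/parttable"
file_path = posixpath.join(partition_dir(table_dir, [("country", "US")]), "sz4.dat")
print(file_path)
print(partition_values(file_path, table_dir))
```

This is also why a filter on country can skip whole directories instead of scanning every file.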

ALTER TABLE table_name ADD [IF NOT EXISTS] partition_spec [LOCATION 'location1']
    partition_spec [LOCATION 'location2'] ...

partition_spec: PARTITION (partition_col = partition_val, partition_col = partition_val, ...)

ALTER TABLE table_name DROP partition_spec, partition_spec, ...

Concrete example

alter table t1 add partition(part='a') partition(part='b');
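When many partitions have to be added, for example from a script, the statement can be generated instead of typed. A minimal Python sketch, assuming a single string partition column; the function name add_partitions_sql is made up, and values containing a single quote are rejected rather than escaped.

```python
def add_partitions_sql(table, column, values):
    """Build one ALTER TABLE ... ADD statement covering several partition values."""
    specs = []
    for value in values:
        if "'" in value:
            # Keep the sketch simple: refuse values that would need escaping.
            raise ValueError("unsupported partition value: {!r}".format(value))
        specs.append("partition({}='{}')".format(column, value))
    return "alter table {} add if not exists {};".format(table, " ".join(specs))

statement = add_partitions_sql("t1", "part", ["a", "b"])
print(statement)
```

The generated text can then be passed to beeline or the hive command line like any other statement.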

Difference between partitioning and bucketing

http://www.cnblogs.com/xiohao/p/6429305.html

Describing tables

http://blog.csdn.net/lskyne/article/details/38427895

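Viewing a table's partitions (they can also be browsed on the web page)

show partitions parttable;

The date_sub function: in a script, as long as the date argument is in the correct format, it will be parsed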

hive> select date_sub('2017-07-01',11) from dual;
OK
2017-06-20
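The same calculation can be checked outside Hive. A minimal Python sketch, assuming the yyyy-MM-dd format used above; this local date_sub is a stand-in written for the example, not a Hive binding, and unlike Hive it raises ValueError on a badly formatted date.

```python
from datetime import datetime, timedelta

def date_sub(start_date, days):
    """Subtract a number of days from a 'YYYY-MM-DD' date string."""
    parsed = datetime.strptime(start_date, "%Y-%m-%d").date()
    return (parsed - timedelta(days=days)).isoformat()

print(date_sub("2017-07-01", 11))  # 2017-06-20
```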

Listing files on HDFS from within Hive is very fast

hive> dfs -ls /test;
Found 4 items
drwxr-xr-x   - root supergroup          0 2017-09-01 12:22 /test/outpt
drwxr-xr-x   - root supergroup          0 2017-09-01 13:22 /test/outpt1
drwxr-xr-x   - root supergroup          0 2017-09-01 13:33 /test/outpt2
drwxr-xr-x   - root supergroup          0 2017-09-03 09:09 /test/outptx