1 Basic Concepts

1.1 What Is a Partition

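A Hive query normally scans the entire contents of a table, which can take a long time. Often only part of the table's data is needed, so Hive lets you declare partitions when you create a table.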

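Each partition of a table corresponds to a directory under the table's directory, and all of that partition's data is stored in that directory. Partitioning therefore makes it easy to query only a subset of the data.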

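Note:

Partitioning must be defined when the table is created, through the PARTITIONED BY clause of the CREATE TABLE statement.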

a) Single-partition table: create table day_table (id int, content string) partitioned by (dt string);

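This is a single-partition table, partitioned by day. The table schema contains three columns: id, content, and dt. The data is split into one directory per dt value.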
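As a minimal sketch of how the single-partition table can be used, the statements below register one partition and then query only that partition. The date value '2015-02-26' is a hypothetical example, and the directory names in the comments assume Hive's usual column=value partition directory naming under the table's directory.

```sql
-- Table from the text above, repeated so the sketch is self-contained.
CREATE TABLE IF NOT EXISTS day_table (id INT, content STRING)
PARTITIONED BY (dt STRING);

-- Register a partition; Hive creates a directory such as
-- <table directory>/dt=2015-02-26/ for it.
ALTER TABLE day_table ADD IF NOT EXISTS PARTITION (dt='2015-02-26');

-- List the partitions known to the metastore.
SHOW PARTITIONS day_table;

-- Filtering on the partition column lets Hive read only the
-- dt=2015-02-26 directory instead of scanning the whole table.
SELECT id, content FROM day_table WHERE dt = '2015-02-26';
```

Note that dt behaves like a regular column in queries, but its value comes from the directory name rather than from the data files.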

b) Two-level partition table: create table day_hour_table (id int, content string) partitioned by (dt string, hour string);

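This is a two-level partitioned table, partitioned by day and by hour. The table schema gains two additional columns, dt and hour. The data is split first into dt directories, and then into hour subdirectories within each dt directory.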
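The two-level layout can be illustrated with a small sketch. The row values and the partition values ('2015-02-26', '08') are hypothetical, and INSERT ... VALUES assumes a Hive version that supports it (0.14 or later).

```sql
-- Table from the text above, repeated so the sketch is self-contained.
CREATE TABLE IF NOT EXISTS day_hour_table (id INT, content STRING)
PARTITIONED BY (dt STRING, hour STRING);

-- Write one row into a specific day/hour partition. The data lands in a
-- nested directory such as <table directory>/dt=2015-02-26/hour=08/.
INSERT INTO TABLE day_hour_table PARTITION (dt='2015-02-26', hour='08')
VALUES (1, 'hello');

-- Filtering on both partition columns restricts the scan to one subdirectory.
SELECT id, content
FROM day_hour_table
WHERE dt = '2015-02-26' AND hour = '08';
```

Filtering on dt alone would read every hour subdirectory under that day's directory.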

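For a partitioned external table, you must run ALTER TABLE table_name ADD PARTITION for each partition. Otherwise the data cannot be accessed at all, because Hive only reads partitions that are registered in its metastore.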

Assume Hive has an external table empl_ext that is partitioned by (logdate string):

alter table empl_ext add partition (logdate='2015-02-26') location 'hdfs://nameservice1/vod_pb/';

load data inpath 'hdfs://nameservice1/vod_pb/' overwrite into table empl_ext partition (logdate='2015-02-26');

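When data is loaded into a table, it is not transformed in any way. The LOAD operation only places the files in the location that corresponds to the Hive table: with LOAD DATA INPATH the files are moved within HDFS, and with LOAD DATA LOCAL INPATH they are copied from the local file system. During the load, a directory for the partition is created automatically under the table's directory, and the files are stored in that partition directory.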
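The two forms of LOAD can be sketched as follows, using the day_table defined earlier. Both file paths are hypothetical placeholders, not paths from the original setup.

```sql
-- LOCAL: the file is copied from the local file system into the
-- partition directory; the local original stays in place.
LOAD DATA LOCAL INPATH '/tmp/day_table_20150226.txt'
INTO TABLE day_table PARTITION (dt='2015-02-26');

-- Without LOCAL: the file already lives in HDFS and is moved (not copied)
-- into the partition directory. OVERWRITE replaces any existing files
-- in that partition first.
LOAD DATA INPATH '/staging/day_table/20150227.txt'
OVERWRITE INTO TABLE day_table PARTITION (dt='2015-02-27');
```

In both cases Hive does not parse or convert the file contents, so the files must already match the table's storage format.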