HIVE中的分区表是什么,我们先看操作,然后再来体会。
创建一个分区表,分区的单位时dt和国家名 hive> create table logs(ts bigint,line string) > partitioned by (dt String,country string);
接下来我们创建要给分区
hive> load data local inpath '/root/hive/partitions/file1' into table logs > partition (dt='2001-01-01',country='GB');
上面语句的效果是在hdfs系统上建立了一个层级目录
-logs
-dt=2001-01-01
-country=GB
我们继续执行下面语句,先看一下什么效果 hive> load data local inpath '/root/hive/partitions/file2' into table logs > partition (dt='2001-01-01',country='GB'); Loading data to table default.logs partition (dt=2001-01-01, country=GB) OK Time taken: 1.379 seconds hive> load data local inpath '/root/hive/partitions/file3' into table logs > partition (dt='2001-01-01',country='US'); Loading data to table default.logs partition (dt=2001-01-01, country=US) OK Time taken: 1.307 seconds hive> load data local inpath '/root/hive/partitions/file4' into table logs > partition (dt='2001-01-02',country='GB'); Loading data to table default.logs partition (dt=2001-01-02, country=GB) OK Time taken: 1.253 seconds hive> load data local inpath '/root/hive/partitions/file5' into table logs > partition (dt='2001-01-02',country='US'); Loading data to table default.logs partition (dt=2001-01-02, country=US) OK Time taken: 1.07 seconds hive> load data local inpath '/root/hive/partitions/file6' into table logs > partition (dt='2001-01-02',country='US'); Loading data to table default.logs partition (dt=2001-01-02, country=US) OK Time taken: 1.227 seconds
我们到HDFS上查看,发现建立了下面层级目录
/user/hive/warehouse/logs
├── dt=2001-01-01/
│ ├── country=GB/
│ │ ├── file1
│ │ └── file2
│ └── country=US/
│ └── file3
└── dt=2001-01-02/
├── country=GB/
│ └── file4
└── country=US/
├── file5
└── file6
是加上所有files的内容基本上一样,蓝色的^A是系统默认分隔符。八进制是‘