Creating Bucketed Tables in Hive

  • Create a bucketed table
hive> create table house_1(id int,name string,age int) clustered by (id) into 3 buckets row format delimited fields terminated by ',';
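Note: on Hive versions before 2.0, bucketing is not enforced on insert by default, so the insert shown later might not produce three bucket files. A sketch of the setting you may need first (it was removed in Hive 2.x, where bucketing is always enforced):
hive> set hive.enforce.bucketing=true;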
  • View the table's detailed information
hive> desc formatted house_1;
OK
# col_name              data_type               comment

id                      int
name                    string
age                     int

# Detailed Table Information
Database:               cr
Owner:                  xiaoqiu
CreateTime:             Mon Aug 13 10:59:08 EDT 2018
LastAccessTime:         UNKNOWN
Retention:              0
Location:               hdfs://mycluster/user/hive/warehouse/cr.db/house_1
Table Type:             MANAGED_TABLE
Table Parameters:
        COLUMN_STATS_ACCURATE   {"BASIC_STATS":"true"}
        numFiles                0
        numRows                 0
        rawDataSize             0
        totalSize               0
        transient_lastDdlTime   1534172348

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            3  # number of buckets
Bucket Columns:         [id]  # bucketed by id
Sort Columns:           []
Storage Desc Params:
        field.delim             ,
        serialization.format    ,
Time taken: 2.636 seconds, Fetched: 33 row(s)
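Sort Columns is empty because the DDL above only clustered the table. If you also want the rows inside each bucket file kept sorted (which helps sort-merge bucket joins), a variant of the same DDL would look like the sketch below; house_sorted is a hypothetical table name:
hive> create table house_sorted(id int,name string,age int) clustered by (id) sorted by (age) into 3 buckets row format delimited fields terminated by ',';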
  • Load data into the bucketed table: a direct load fails, so use insert into ... select ... (see the staging-table sketch after the error below)
hive> load data local inpath '/home/xiaoqiu/teacher.txt' into table house_1;
FAILED: SemanticException Please load into an intermediate table and use 'insert... select' to allow Hive to enforce bucketing. Load into bucketed tables are disabled for safety reasons. If you know what you are doing, please set hive.strict.checks.bucketing to false and that hive.mapred.mode is not set to 'strict' to proceed. Note that if you may get errors or incorrect results if you make a mistake while using some of the unsafe features.
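The error asks us to load the raw file into an ordinary (non-bucketed) staging table first and then insert ... select from it, which is what the house table used below is for. A minimal sketch of creating such a staging table, assuming teacher.txt has the same three comma-delimited columns:
hive> create table house(id int,name string,age int) row format delimited fields terminated by ',';
hive> load data local inpath '/home/xiaoqiu/teacher.txt' into table house;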
hive> insert into house_1 select * from house;
hive> select * from house;
OK
2       'bom1'  24
3       'rom1'  25
4       'som1'  26
5       'tom10' 23
6       'tom9'  20
7       'tom8'  11
8       'tom7'  12
9       'tom6'  13
10      'tom5'  33
12      'tom4'  45
22      'tom3'  20
32      'tom2'  23
NULL    NULL    NULL
hive> select * from house_1; # rows are distributed into buckets by id
OK
3       'rom1'  25
12      'tom4'  45
9       'tom6'  13
NULL    NULL    NULL
6       'tom9'  20
22      'tom3'  20
4       'som1'  26
10      'tom5'  33
7       'tom8'  11
8       'tom7'  12
32      'tom2'  23
5       'tom10' 23
2       'bom1'  24
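The rows now come back grouped by bucket file rather than in insertion order: Hive assigns each row to bucket pmod(hash(id), 3), and for an int column the hash is simply the value, so ids 3, 12, 9, 6 fall in bucket 0, ids 22, 4, 10, 7 in bucket 1, and ids 8, 32, 5, 2 in bucket 2 (NULL hashes to 0). You can verify the assignment yourself with the built-in hash and pmod UDFs:
hive> select id, pmod(hash(id), 3) as bucket from house;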

View the HDFS directory

[xiaoqiu@s150 /home/xiaoqiu]$ hadoop fs -lsr /
drwxr-xr-x   - xiaoqiu   supergroup          0 2018-08-13 11:45 /user/hive/warehouse/cr.db/house_1
-rwxr-xr-x   3 xiaoqiu   supergroup         58 2018-08-13 11:43 /user/hive/warehouse/cr.db/house_1/000000_0  # bucket 1
-rwxr-xr-x   3 xiaoqiu   supergroup         50 2018-08-13 11:44 /user/hive/warehouse/cr.db/house_1/000001_0  # bucket 2
-rwxr-xr-x   3 xiaoqiu   supergroup         50 2018-08-13 11:43 /user/hive/warehouse/cr.db/house_1/000002_0  # bucket 3
[xiaoqiu@s150 /home/xiaoqiu]$ hadoop fs -cat /user/hive/warehouse/cr.db/house_1/000000_0  # bucket 1
3,'rom1',25
12,'tom4',45
9,'tom6',13
\N,\N,\N
6,'tom9',20
[xiaoqiu@s150 /home/xiaoqiu]$ hadoop fs -cat /user/hive/warehouse/cr.db/house_1/000001_0  # bucket 2
22,'tom3',20
4,'som1',26
10,'tom5',33
7,'tom8',11
[xiaoqiu@s150 /home/xiaoqiu]$ hadoop fs -cat /user/hive/warehouse/cr.db/house_1/000002_0  # bucket 3
8,'tom7',12
32,'tom2',23
5,'tom10',23
2,'bom1',24
[xiaoqiu@s150 /home/xiaoqiu]$
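One payoff of bucketing is cheap sampling: a query can read a single bucket file instead of scanning the whole table. A sketch using the standard TABLESAMPLE clause, which here reads only the first bucket:
hive> select * from house_1 tablesample(bucket 1 out of 3 on id);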
