Hive_Hive的数据模型_分区表

Hive的数据模型之分区表

准备数据表:

create table sampledata
(sid int, sname string, gender string, language int, math int, english int)
row format delimited fields terminated by ',' stored as textfile;

准备文本数据:

sampledata.txt
1,Tom,M,60,80,96
2,Mary,F,11,22,33
3,Jerry,M,90,11,23
4,Rose,M,78,77,76
5,Mike,F,99,98,98

将文本数据插入到数据表:

hive> load data local inpath '/root/pl62716/hive/sampledata.txt' into table sampledata;

-partition对应于数据库中的Partition 列的密集索引

-在Hive中,表中的一个Partition对应于表下的一个目录,所有的Partition的数据都存储在对应的目录中。

创建分区表:

create table partition_table
(sid int, sname string)
partitioned by (gender string)
row format delimited fields terminated by ',';

向分区表中插入数据:

hive> insert into table partition_table partition(gender='M') select sid, sname from sampledata where gender='M';
hive> insert into table partition_table partition(gender='F') select sid, sname from sampledata where gender='F';

从内部表解析比从分区表解析效率低:

(1)内部表:

hive> explain select * from sampledata where gender='M';
OK
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: sampledata
          Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column stats: NONE
          Filter Operator
            predicate: (gender = 'M') (type: boolean)
            Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: sid (type: int), sname (type: string), 'M' (type: string), language (type: int), math (type: int), english (type: int)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
              Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column stats: NONE
              ListSink

Time taken: 0.046 seconds, Fetched: 20 row(s)

(2)分区表:

hive> explain select * from partition_table where gender='M';
OK
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: partition_table
          Statistics: Num rows: 2 Data size: 13 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: sid (type: int), sname (type: string), 'M' (type: string)
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 2 Data size: 13 Basic stats: COMPLETE Column stats: NONE
            ListSink

Time taken: 0.187 seconds, Fetched: 17 row(s)
原文地址:https://www.cnblogs.com/liupuLearning/p/6322609.html