Hive create, insert, drop, and truncate

1. Creating a table in Hive: create

create table if not exists db_name.test_tb(id string, name string, age string, province string, score string) partitioned by (str_date string) row format delimited fields terminated by '1'

-- db_name: the name of the database

-- partitioned by (str_date string): declares str_date as the partition column
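
To confirm that the table was created as intended, Hive's metadata commands can help. The following is a minimal sketch, assuming the db_name.test_tb table above already exists:

```sql
-- List the columns; str_date appears again under the partition information section
describe db_name.test_tb;

-- Show full metadata, including the storage location and the field delimiter
describe formatted db_name.test_tb;

-- Print the DDL that Hive would use to recreate the table
show create table db_name.test_tb;
```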

2. Appending records: insert into

insert into table db_name.test_tb partition(str_date='2020-04-20') values('1','花木兰','24','北京','98')

insert into table db_name.test_tb partition(str_date='2020-04-20') values('2','李白','28','南京','90')

insert into table db_name.test_tb partition(str_date='2020-04-21') values('3','妲己','26','南京','95')

insert into table db_name.test_tb partition(str_date='2020-04-22') values('4','王昭君','22','上海','96')

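You can also insert the result of a query:

insert into table db_name.test_tb partition(str_date='2020-04-22') select * from db_name.tb_name   -- insert from another table or query; the select must return exactly the five non-partition columns

If the table has no partitions, drop the partition clause and use a plain insert.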
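
As an illustration of both cases, here is a small sketch. The table db_name.plain_tb is hypothetical and exists only for this example, and the second statement assumes that db_name.tb_name has columns named id, name, age, province and score:

```sql
-- Hypothetical non-partitioned table, used only for illustration
create table if not exists db_name.plain_tb(id string, name string);

-- No partition clause is needed for a non-partitioned table
insert into table db_name.plain_tb values('1','花木兰');

-- For a partitioned target, naming the columns is safer than select *,
-- because the column count and order are then explicit
insert into table db_name.test_tb partition(str_date='2020-04-22')
select id, name, age, province, score from db_name.tb_name;
```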

3. Querying: select

select * from db_name.test_tb

4. Creating a table from the result of a select (create table as select)

create table if not exists db_name.test_tb_2 as select * from db_name.test_tb

select * from db_name.test_tb_2

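5. insert overwrite: erase the existing data, then write the new data

insert overwrite table db_name.test_tb_2 partition(str_date='2020-04-24') values('5','陈咬金','30','北京','85')  -- fails: a table created from a select does not inherit the partitioning of the original table

show partitions db_name.test_tb  -- list the partitions of the original table

Since the partitioning was not copied, treat test_tb_2 as an ordinary table in which str_date is a regular column: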

insert overwrite table db_name.test_tb_2 values('5','陈咬金','30','北京','85','2020-04-24')

select * from db_name.test_tb_2

Likewise, you can overwrite the table with the result of a query:

insert overwrite table db_name.test_tb_2 select * from db_name.tb_name   -- overwrite from another table or query
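
The partitioned insert overwrite that failed on test_tb_2 can work if the copy keeps the partition layout of the original. One way to get that is create table ... like, which copies the schema (including partitioning) but no data. A minimal sketch, where the table name test_tb_3 is hypothetical:

```sql
-- LIKE copies the table definition, including the partition column, but no rows
create table if not exists db_name.test_tb_3 like db_name.test_tb;

-- The partition clause is now accepted, because test_tb_3 is partitioned by str_date
insert overwrite table db_name.test_tb_3 partition(str_date='2020-04-24')
select id, name, age, province, score
from db_name.test_tb
where str_date='2020-04-22';

-- Should now list the partition that was just written
show partitions db_name.test_tb_3;
```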

# ---------------------------------------------------------------------------------------------------------------------------

The above covers insert overwrite on a table without partitions. What about a partitioned table?

insert overwrite table db_name.test_tb partition(str_date='2020-04-24') values('6','狄仁杰','27','北京','96')

select * from db_name.test_tb

You may wonder why the other records are still there even though this was an insert overwrite. Let's run it once more.

insert overwrite table db_name.test_tb partition(str_date='2020-04-24') values('7','李元芳','24','武汉','92')

select * from db_name.test_tb

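The data did change, but only in the partition str_date='2020-04-24'. The records that remain belong to other partitions.

To summarize:

On a partitioned table, insert overwrite rewrites only the named partition (it erases that partition's existing data, then writes the new data) and leaves the data in all other partitions untouched.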
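
One way to check this behavior is to look at the table partition by partition. A small sketch, assuming the statements above were run against db_name.test_tb:

```sql
-- List all partitions of the table
show partitions db_name.test_tb;

-- Only the rows from the most recent overwrite should remain in this partition
select * from db_name.test_tb where str_date='2020-04-24';

-- Row count per partition; the other partitions should be unchanged
select str_date, count(*) as cnt
from db_name.test_tb
group by str_date;
```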

# ------------------------------------------------------------------------------------------------------------------------

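Note: the new data can also come from a select. For example, to overwrite the '2020-04-24' partition again using the earlier table test_tb_2:

insert overwrite table db_name.test_tb partition(str_date='2020-04-24') select id, name, age, province, score from db_name.test_tb_2
-- select * is not used here because str_date is already fixed by the partition clause, so the inserted rows must not include that column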
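
When the partition value should come from the data itself rather than from a fixed literal, Hive's dynamic partitioning is an alternative. This goes slightly beyond the walkthrough above, so treat it as a sketch. It assumes that test_tb_2 still carries str_date as a regular column:

```sql
-- Enable dynamic partitioning for this session
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column goes last in the select list; Hive routes each row
-- to the partition named by its str_date value. Only the partitions that
-- receive rows from this query are overwritten.
insert overwrite table db_name.test_tb partition(str_date)
select id, name, age, province, score, str_date
from db_name.test_tb_2;
```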

6. Deleting the data but keeping the table structure: truncate

truncate table db_name.test_tb_2

select * from db_name.test_tb_2

The query simply returns an empty result and does not raise an error, because the table still exists.
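
On a partitioned table, the same idea can be applied to a single partition. A minimal sketch against db_name.test_tb; note that in most Hive versions truncate is limited to managed (non-external) tables:

```sql
-- Remove the rows of one partition only; other partitions keep their data
truncate table db_name.test_tb partition(str_date='2020-04-24');

-- Alternatively, drop the partition itself (its data and its metadata entry)
alter table db_name.test_tb drop if exists partition(str_date='2020-04-24');
```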

7. Dropping the table (both data and structure): drop

drop table db_name.test_tb_2

select * from db_name.test_tb_2  -- fails, because the table no longer exists
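
In scripts that may run more than once, a plain drop fails on the second run for the same reason. A small sketch of the usual guard:

```sql
-- Does not fail when the table is already gone
drop table if exists db_name.test_tb_2;

-- Variant: PURGE skips the trash directory, so the data cannot be recovered afterwards
-- drop table if exists db_name.test_tb_2 purge;
```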
