Hive: DISTRIBUTE BY and SORT BY

Data

B 10 store_B_4
A 12 store_A_1
A 14 store_A_2
B 15 store_B_1
B 19 store_B_2
B 30 store_B_3

Create the table and load the data

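-- Columns are separated by a single space, one record per line.
create table if not exists store(
sid string,
amount string,
name string
)
row format delimited fields terminated by ' '
lines terminated by '\n'
stored as textfile
;
-- Load the sample rows from the local file system.
load data local inpath '/opt/wangyuqi/store.txt' into table store;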
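
A quick check after loading can confirm that the rows were parsed into three columns. Note that amount is declared as string, so any ordering on it compares text rather than numbers. The sample values all have two digits, so the results in this post are unaffected, but a cast gives a numeric value. The cast and the alias amount_num are illustrative additions, not part of the original steps.

```sql
-- Show the column names and types of the table created above.
describe store;

-- Read the rows back. cast(amount as int) is an illustrative addition
-- that turns the string column into a number for numeric comparison;
-- amount_num is a hypothetical alias.
select sid, cast(amount as int) as amount_num, name
from store;
```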

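In Hive, DISTRIBUTE BY followed by a column controls how map output is distributed to reducers: rows with the same value in that column are sent to the same reducer. SORT BY then sorts the rows within each reducer, so the result is ordered per reducer rather than globally.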

select * from store distribute by sid sort by amount desc;
result:
A    14    store_A_2
A    12    store_A_1
B    30    store_B_3
B    19    store_B_2
B    15    store_B_1
B    10    store_B_4
Time taken: 224.482 seconds
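
With a single reducer, DISTRIBUTE BY has no visible effect, because every row lands in the same reducer anyway. The sketch below assumes the MapReduce execution engine and uses a hypothetical output directory, /tmp/store_by_sid. It requests two reducers and writes the result to local files so that each reducer's output can be inspected separately.

```sql
-- Assumption: Hive running on MapReduce; request two reduce tasks.
set mapreduce.job.reduces=2;

-- Each reducer writes its own file under the (hypothetical) directory.
-- All rows with the same sid end up in the same file, sorted by amount
-- in descending order within that file.
insert overwrite local directory '/tmp/store_by_sid'
row format delimited fields terminated by ' '
select * from store distribute by sid sort by amount desc;
```

Which reducer a given sid is sent to depends on its hash, so two different sid values may still share one output file.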

CLUSTER BY: equivalent to combining DISTRIBUTE BY and SORT BY on the same column. It sorts in ascending order only and does not accept ASC or DESC.

select * from store cluster by sid;
result:
A    14    store_A_2
A    12    store_A_1
B    30    store_B_3
B    19    store_B_2
B    15    store_B_1
B    10    store_B_4
Time taken: 126.178 seconds, Fetched: 6 row(s)
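
The equivalence can be written out explicitly. As a sketch: the first two queries below should distribute and sort rows in the same way, and the third shows the spelled-out form to use when descending order is needed, since CLUSTER BY does not take a sort direction.

```sql
-- Shorthand form.
select * from store cluster by sid;

-- Spelled-out form: same column for distribution and for sorting.
select * from store distribute by sid sort by sid;

-- Descending order within each reducer requires the spelled-out form.
select * from store distribute by sid sort by sid desc;
```

None of these queries orders rows that share the same sid relative to each other; add amount to the SORT BY list if that order matters.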
Original article: https://www.cnblogs.com/youchi/p/13551421.html