0. 说明
通过 Hive 对 duowan 数据进行简单处理
1. 操作流程
1.1 建表
create table duowan(id int, name string, pass string, mail string, nickname string) row format delimited fields terminated by ' ' lines terminated by ' ' stored as textfile;
1.2 加载数据
load data inpath '/duowan_user.txt' into table duowan;
1.3 开始执行
select pass , count(*) as count from duowan group by pass order by count desc limit 10;
1.4 设置 reduce 个数
set mapreduce.job.reduces=2;