hive实现单词统计

参考博客：一个hive小案例：使用HIVE进行单词统计, 并把结果存入mysql

问题：统计客户某个年龄有多少人

客户表信息

hive> desc customer_info;
OK
id                      int
name                    string
age                     int
Time taken: 0.213 seconds, Fetched: 3 row(s)
hive> select * from customer_info limit 10;
OK
1       tom1    11
2       tom2    12
3       tom3    13
4       tom4    14
5       tom5    15
6       tom6    16
7       tom7    17
8       tom8    18
9       tom9    19

切割出age列，并保存到每个数组里面，并命名新的列名为AGE

hive> select split(age,',') as AGE from customer_info limit 10;
OK
["11"]
["12"]
["13"]
["14"]
["15"]
["16"]
["17"]
["18"]
["19"]
["20"]

将每个数组炸开

hive> select explode(split(age,',')) as AGE from customer_info limit 10;
OK
11
12
13
14
15
16
17
18
19
20

统计每个年龄的数量

hive> select t.AGE,count(*) from ( select explode(split(age,',')) as AGE from customer_info ) as t group by t.AGE;
11      1
12      1
13      1
14      1
15      1
16      1
17      1
18      1
19      1
20      2
21      2
22      2
23      2
24      2
25      2
26      2
27      2
28      3
29      3
30      3
31      3
32      3
33      3
34      3
35      3
36      3
37      3
38      2
39      2
40      1
41      1
42      1
43      1
44      1

将结果写入新的表中

 create table customer_age_count as select t.AGE,count(*) from ( select explode(split(age,',')) as AGE from customer_info ) as t group by t.AGE;

hive> export table customer_age_count to '/hive_export';
[xiaoqiu@s150 /home/xiaoqiu]$ hadoop fs -lsr /hive_export
lsr: DEPRECATED: Please use 'ls -R' instead.
-rwxr-xr-x   3 xiaoqiu supergroup       1303 2018-08-13 23:28 /hive_export/_metadata
drwxr-xr-x   - xiaoqiu supergroup          0 2018-08-13 23:28 /hive_export/data
-rwxr-xr-x   3 xiaoqiu supergroup        170 2018-08-13 23:28 /hive_export/data/000000_0

欢迎关注我的公众号：小秋的博客 CSDN博客：https://blog.csdn.net/xiaoqiu_cr github:https://github.com/crr121 联系邮箱：rongchen633@gmail.com 有什么问题可以给我留言噢~