hive实现单词统计

参考博客:一个hive小案例:使用HIVE进行单词统计, 并把结果存入mysql

问题:统计客户某个年龄有多少人

  • 客户表信息
hive> desc customer_info;
OK
id                      int
name                    string
age                     int
Time taken: 0.213 seconds, Fetched: 3 row(s)
hive> select * from customer_info limit 10;
OK
1       tom1    11
2       tom2    12
3       tom3    13
4       tom4    14
5       tom5    15
6       tom6    16
7       tom7    17
8       tom8    18
9       tom9    19

  • 切割出age列,并保存到每个数组里面,并命名新的列名为AGE
hive> select split(age,',') as AGE from customer_info limit 10;
OK
["11"]
["12"]
["13"]
["14"]
["15"]
["16"]
["17"]
["18"]
["19"]
["20"]
  • 将每个数组炸开
hive> select explode(split(age,',')) as AGE from customer_info limit 10;
OK
11
12
13
14
15
16
17
18
19
20
  • 统计每个年龄的数量
hive> select t.AGE,count(*) from ( select explode(split(age,',')) as AGE from customer_info ) as t group by t.AGE;
11      1
12      1
13      1
14      1
15      1
16      1
17      1
18      1
19      1
20      2
21      2
22      2
23      2
24      2
25      2
26      2
27      2
28      3
29      3
30      3
31      3
32      3
33      3
34      3
35      3
36      3
37      3
38      2
39      2
40      1
41      1
42      1
43      1
44      1
  • 将结果写入新的表中
 create table customer_age_count as select t.AGE,count(*) from ( select explode(split(age,',')) as AGE from customer_info ) as t group by t.AGE;
  • 将表导出到HDFS目录下
hive> export table customer_age_count to '/hive_export';
[xiaoqiu@s150 /home/xiaoqiu]$ hadoop fs -lsr /hive_export
lsr: DEPRECATED: Please use 'ls -R' instead.
-rwxr-xr-x   3 xiaoqiu supergroup       1303 2018-08-13 23:28 /hive_export/_metadata
drwxr-xr-x   - xiaoqiu supergroup          0 2018-08-13 23:28 /hive_export/data
-rwxr-xr-x   3 xiaoqiu supergroup        170 2018-08-13 23:28 /hive_export/data/000000_0
  •  
欢迎关注我的公众号:小秋的博客 CSDN博客:https://blog.csdn.net/xiaoqiu_cr github:https://github.com/crr121 联系邮箱:rongchen633@gmail.com 有什么问题可以给我留言噢~
原文地址:https://www.cnblogs.com/flyingcr/p/10326857.html