Hive练习--蚂蚁森林习题一

问题:

假设2017年1月1日开始记录低碳数据(user_low_carbon),假设2017年10月1日之前满足申领条件的用户都申领了一颗p004-胡杨,剩余的能量全部用来领取“p002-沙柳” 。统计在10月1日累计申领“p002-沙柳” 排名前10的用户信息;以及他比后一名多领了几颗沙柳。

数据的下载链接:https://files.cnblogs.com/files/yxym2016/%E8%9A%82%E8%9A%81%E6%A3%AE%E6%9E%97%E6%95%B0%E6%8D%AE.zip

得到的统计结果如下表样式:

user_id  plant_count less_count(比后一名多领了几颗沙柳)
u_101    1000         100
u_088    900          400
u_103    500

 一、准备工作

1、创建表

create table user_low_carbon(user_id String,data_dt String,low_carbon int) row format delimited fields terminated by '	';
create table plant_carbon(plant_id string,plant_name String,low_carbon int) row format delimited fields terminated by '	';

2、加载数据

load data local inpath "/opt/module/data/low_carbon.txt" into table user_low_carbon;
load data local inpath "/opt/module/data/plant_carbon.txt" into table plant_carbon;

二、解决步骤

1、统计每个用户在2017年10月1日之前收集的总碳量,并按照碳量进行倒序排序

select user_id  sum(low_carbon) as sum_low_carbon 
from user_low_carbon 
where data_dt < '2017/10/1' 
group by user_id 
order by sum_low_carbon desc
limit 11;  a1  # 优化操作,过滤掉无关的数据,减少数据量

输出结果:

user_id    sum_low_carbon
u_007    1470
u_013    1430
u_008    1240
u_005    1100
u_010    1080
u_014    1060
u_011    960
u_009    930
u_006    830
u_002    659
u_004    640
u_003    620
u_001    475
u_015    290
u_012    250

2、每个用户的总碳量减去一颗胡杨的碳量,然后用剩余的碳量除以沙柳的碳量

# 获取每颗胡杨所需要的碳量
select low_carbon from plant_carbon where plant_id = 'p004'; a2  
215
# 获取每颗沙柳所需要的碳量 
select low_carbon from plant_carbon where plant_id = 'p002'; a3  
19

select user_id,(a1.sum_low_carbon - a2.low_carbon)/a3.low_carbon as plant_count from a1,a2,a3;  a4

整合一下:

select user_id,(a1.sum_low_carbon - a2.low_carbon)/a3.low_carbon as plant_count from 
(select user_id,sum(low_carbon) as sum_low_carbon 
from user_low_carbon 
where data_dt < '2017/10/1' 
group by user_id 
order by sum_low_carbon desc) a1,
(select low_carbon from plant_carbon where plant_id = 'p004') a2,
(select low_carbon from plant_carbon where plant_id = 'p002') a3; a4

输出结果:

user_id    plant_count
u_007    66.05263157894737
u_013    63.94736842105263
u_008    53.94736842105263
u_005    46.578947368421055
u_010    45.526315789473685
u_014    44.473684210526315
u_011    39.21052631578947
u_009    37.63157894736842
u_006    32.36842105263158
u_002    23.36842105263158
u_004    22.36842105263158
u_003    21.31578947368421
u_001    13.68421052631579
u_015    3.9473684210526314
u_012    1.8421052631578947

3、按照每个人领取的沙柳棵数进行倒序排序,并获取当前记录的下一条记录所领取的沙柳的棵数

select user_id,plant_count,lead(plant_count,1,'') over(order by plant_count desc) next_count from a4;a5

将a4带入上面的SQL语句:

select user_id,plant_count,lead(plant_count,1,'0') over(order by plant_count desc) next_count from (
select user_id,(a1.sum_low_carbon - a2.low_carbon)/a3.low_carbon as plant_count from 
(select user_id,sum(low_carbon) as sum_low_carbon 
from user_low_carbon 
where data_dt < '2017/10/1' 
group by user_id 
order by sum_low_carbon desc) a1,
(select low_carbon from plant_carbon where plant_id = 'p004') a2,
(select low_carbon from plant_carbon where plant_id = 'p002') a3) a4
limit 10;

输出结果:

user_id    plant_count    next_count
u_007    66.05263157894737    63.94736842105263
u_013    63.94736842105263    53.94736842105263
u_008    53.94736842105263    46.578947368421055
u_005    46.578947368421055    45.526315789473685
u_010    45.526315789473685    44.473684210526315
u_014    44.473684210526315    39.21052631578947
u_011    39.21052631578947    37.63157894736842
u_009    37.63157894736842    32.36842105263158
u_006    32.36842105263158    23.36842105263158
u_002    23.36842105263158    22.36842105263158
u_004    22.36842105263158    21.31578947368421
u_003    21.31578947368421    13.68421052631579
u_001    13.68421052631579    3.9473684210526314
u_015    3.9473684210526314    1.8421052631578947
u_012    1.8421052631578947    NULL

4、将每一条记录对应的用户领取的沙柳棵数和排在他下面的用户领取的棵数,进行相减,就是当前用户比他后一名所多的棵数

select user_id,plant_count,(plant_count-next_count) less_count from a5;

将a5表带入上面的sql语句:

select user_id,plant_count,(plant_count-next_count) less_count from (
select user_id,plant_count,lead(plant_count,1,'0') over(order by plant_count desc) next_count from (
select user_id,(a1.sum_low_carbon - a2.low_carbon)/a3.low_carbon as plant_count from 
(select user_id,sum(low_carbon) as sum_low_carbon 
from user_low_carbon 
where data_dt < '2017/10/1' 
group by user_id 
order by sum_low_carbon desc) a1,
(select low_carbon from plant_carbon where plant_id = 'p004') a2,
(select low_carbon from plant_carbon where plant_id = 'p002') a3) a4) a5;

输出结果:

# 输出结果
user_id    plant_count    less_count
u_007    66.05263157894737    2.10526315789474
u_013    63.94736842105263    10.0
u_008    53.94736842105263    7.368421052631575
u_005    46.578947368421055    1.05263157894737
u_010    45.526315789473685    1.05263157894737
u_014    44.473684210526315    5.2631578947368425
u_011    39.21052631578947    1.5789473684210549
u_009    37.63157894736842    5.263157894736835
u_006    32.36842105263158    9.000000000000004
u_002    23.36842105263158    1.0
u_004    22.36842105263158    1.05263157894737
u_003    21.31578947368421    7.6315789473684195
u_001    13.68421052631579    9.736842105263158
u_015    3.9473684210526314    2.1052631578947367
u_012    1.8421052631578947    1.8421052631578947

5、使用floor函数,对进行取整操作,我们只需在获取棵数那里进行取整即可,因为那里涉及了除的操作,所以会产生小数,所以在这里进行取整即可。

select user_id,plant_count,(plant_count-next_count) less_count from (
select user_id,plant_count,lead(plant_count,1,'0') over(order by plant_count desc) next_count from (
select user_id,floor((a1.sum_low_carbon - a2.low_carbon)/a3.low_carbon) as plant_count from 
(select user_id,sum(low_carbon) as sum_low_carbon 
from user_low_carbon 
where data_dt < '2017/10/1' 
group by user_id 
order by sum_low_carbon desc) a1,
(select low_carbon from plant_carbon where plant_id = 'p004') a2,
(select low_carbon from plant_carbon where plant_id = 'p002') a3) a4) a5
limit 10;

输出结果:

user_id    plant_count    less_count
u_007    66    3
u_013    63    10
u_008    53    7
u_005    46    1
u_010    45    1
u_014    44    5
u_011    39    2
u_009    37    5
u_006    32    9
u_002    23    1
原文地址:https://www.cnblogs.com/yxym2016/p/13254316.html