Hive练习题1-累计报表

如下测试数据

A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
C,2015-01,10
C,2015-01,20
A,2015-02,4
A,2015-02,6
C,2015-02,30
C,2015-02,10
B,2015-02,10
B,2015-02,5
A,2015-03,14
A,2015-03,6
B,2015-03,20
B,2015-03,25
C,2015-03,10
C,2015-03,20
View Code

需求:

求单月访问次数和总访问次数,截至当前最大访问次数

 

建表

create table t_access(name string, month string, time int) row format delimited fields terminated by ',';

导数

load data local inpath "/home/access_time.txt" into table t_access;

查询

select * from t_access;

传统做法

求每个用户,每月的总访问次数,数据放到一张临时表

create table t_access_tmp as
select name, month, sum(time) as month_times from t_access group by name,month;

 临时表自连接

select a.*, b.*
from t_access_tmp a join t_access_tmp b on a.name = b.name
where a.month<=b.month

按照b.name,b.month聚合,求最大值和求和。

select
t.bname,
t.bmonth,
sum(t.month_times) as accu_times,
max(t.month_times) as max_times
from(
select b.name as bname, b.month as bmonth, a.month_times as month_times from t_access_tmp a 
join t_access_tmp b on a.name = b.name where a.month<=b.month) as t group by t.bname,t.bmonth;

或者简写成

select a.name, a.month, sum(b.times) as accumulate 
from (select name,month,sum(time) as times from t_access group by name,month) a join (select name,month,sum(time) as times from t_access group by name,month) b on a.name = b.name where a.month >= b.month group by a.name, a.month;

 


 

用开窗函数做法,效率更高。

curent_month_times是当前用户,当前月的访问次数

accu_times是当前用户,截至当前月的累计访问次数

max_times是当前用户,截至当前月的最大的月访问次数

 

select
t.name,
t.month,
t.month_times as curent_month_times,
sum(t.month_times)over(partition by t.name order by t.month rows between unbounded preceding and current row ) as accu_times,
max(t.month_times)over(partition by t.name order by t.month rows between unbounded preceding and current row ) as max_times
from(
select name, month, sum(time) as month_times from t_access group by name,month
) t;
首先对原始数据,根据name,month聚合。聚合后的结果用分析函数求累计
在这个题中,需要加rows between unbounded preceding and current row,意思是从这一组的第一行开始到当前行需要进行聚合计算。

 

 

 

 

 

原文地址:https://www.cnblogs.com/lucas-zhao/p/11905278.html