yhd日志分析(二)

yhd日志分析(二)

继续yhd日志分析,统计数据

日期uvpv登录人数游客人数平均访问时长二跳率独立ip数

1 分析

登录人数

count(distinct endUserId)

游客人数

count(distinct guid) - count(distinct endUserId)

平均访问时长

先把tracktime转换为unix timestamp, 相同sessionId的tracktime中,max(tracktime)-min(tracktime),得到用户停留时间。所有用户的停留时间相加,得到总停留时间。总停留时间和总访问次数的比例,就是平均访问时长

select sum(stay_time) as total_stay_time
from (select max(to_unix_timestamp(trackTime)) - min(to_unix_timestamp(trackTime)) as stay_time from yhd_log group by sessionId) stay

用户访问总数

count(distinct sessionId)

二跳率

sessionViewNo=2的用户,即为二跳用户。统计出二跳用户和uv的比例

select count(distinct guid) from yhd_log where sessionViewNo=2

独立ip数

count(distinct ip)

实现

  1. 借助中间表,分别存放停留时间和二次跳用户总数

     // 存放总停留时间
     
     create table if not exists yhd_log_total_stay_time(
     	date string,
     	total_stay_time bigint
     )
     row format delimited fields terminated by '	'
     stored as textfile;
     
     // 存放二次跳用户总数
     
     create table if not exists yhd_log_total_second_jump(
     	date string,
     	total_second_jump bigint
     )
     row format delimited fields terminated by '	'
     stored as textfile;
    
  2. 计算总停留时间,存放在yhd_log_total_stay_time, 按日期分组

     insert overwrite table yhd_log_total_stay_time
     select date, sum(stay_time) as total_stay_time
     from (select max(to_unix_timestamp(trackTime)) - min(to_unix_timestamp(trackTime)) as stay_time, date from yhd_log group by date, sessionId) stay
     group by date
    
  3. 计算二次跳用户总数,存放在yhd_log_total_second_jump, 按日期分组

     insert overwrite table yhd_log_total_second_jump
     select date, count(distinct guid)
     from yhd_log
     where sessionViewNo=2 
     group by date
    
  4. 统计

     把yhd_log_total_stay_time,yhd_log_total_second_jump和yhd_log按照 date连接查询
    
     select date, pv, uv, user_count, guest_count, 
     	total_stay_time/total_visit as average_stay_time,
     	total_second_jump/ uv as second_jump_rate, indepent_ip
     from (
     select log.date, 
     	count(url) as pv, 
     	count(distinct guid) as uv,
     	count(distinct endUserId) as user_count,
     	count(distinct guid) - count(distinct endUserId) as guest_count,
     	count(distinct sessionId) as total_visit,
     	min(stay.total_stay_time) as total_stay_time,
     	min(second.total_second_jump) as total_second_jump,
     	count(distinct ip) as indepent_ip
     from yhd_log log inner join yhd_log_total_stay_time stay on stay.date=log.date inner join yhd_log_total_second_jump second on second.date=log.date
     group by log.date
     ) stat
    

结果

datepvuvuser_countguest_countaverage_stay_timesecond_jump_rateindepent_ip
20150828126134390071768721320745.97973932447510.1311815827928320730462
原文地址:https://www.cnblogs.com/ivanny/p/yhd_log_statatis_2.html