访问记录切割新会话

需求,获取到的访问日志,现根据某用户间隔30分钟后为新的会话。
测试环境:postgresql
 
author : jackical
datetime : 2019-01-04
  
select *,sum(f1) over(order by timestamp_) from (
select timestamp_,case when  timestamp_ - lag(timestamp_,1) over(order by timestamp_)  > 60*30 then 1 else 0 end as f1 from (
select EXTRACT(epoch FROM CAST(data_lastchangetime AS TIMESTAMP)) as timestamp_ from bt_apppackages where data_lastchangetime>'2019-01-01' order by data_lastchangetime
) tab order by timestamp_  
) tab order by timestamp_ limit 100;
 
解说,由于窗口函数在over子句中有两中的边界定义方式: 1, ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW 
                                                                                         2,  RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
 
 
 
关于边界的使用:
SELECT cookieid,
    createtime,
    pv,
    SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
    SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1
    SUM(pv) OVER(PARTITION BY cookieid) AS pv3,    --分组内所有行
    SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4,  --当前行+往前3行
    SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5,  --当前行+往前3行+往后1行
    SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6  ---当前行+往后所有行  
    FROM lxw1234;
     
    cookieid createtime     pv      pv1     pv2     pv3     pv4     pv5      pv6
    -----------------------------------------------------------------------------
    cookie1  2015-04-10      1       1       1       26      1       6       26
    cookie1  2015-04-11      5       6       6       26      6       13      25
    cookie1  2015-04-12      7       13      13      26      13      16      20
    cookie1  2015-04-13      3       16      16      26      16      18      13
    cookie1  2015-04-14      2       18      18      26      17      21      10
    cookie1  2015-04-15      4       22      22      26      16      20      8
    cookie1  2015-04-16      4       26      26      26      13      13      4
 
 
关于边界定义心得:1,默认边界定义,在有order by时为 起点到当前行
          2,没有在over中指定排序时,则边界为当前窗口(所有行)
           3,貌似有些资料说hive 在默认的边界定认为 range between unbounded preceding and current row,不管是range还是row 都是从起始到当前行
 
 
这种内容也不是第一次接触了,刚刚别人问了下,再想想比较困难,索性将结果与心得记下来。下次不必再去想,拿来就用。。
 
原文地址:https://www.cnblogs.com/jackicalSong/p/10221053.html