再度思索:从配送表中选出订单号和配送者相同时的最新记录

在前文 https://www.cnblogs.com/heyang78/p/12079017.html 中,我主要描述了现象,给出了结论,这次试图对过程进行一点思索。

需求:从配送表里,当订单号和配送者一样时,取时间最靠近现在即时间值最大的一条记录。

配送表表结构:

create table peisong(
id int,
order_no int, --订单号
shipper_cd int,--配送者号
create_time timestamp,
primary key(id))

填充实验值:

insert into peisong(id,order_no,shipper_cd,create_time) values('0',1,1,to_date('2004-05-07','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('1',1,1,to_date('2004-05-08','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('2',1,2,to_date('2004-05-09','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('3',1,2,to_date('2004-05-10','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('4',1,3,to_date('2004-05-11','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('5',2,1,to_date('2004-05-12','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('6',2,1,to_date('2004-05-13','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('7',2,2,to_date('2004-05-14','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('8',2,2,to_date('2004-05-15','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('9',2,3,to_date('2004-05-16','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('10',2,3,to_date('2004-05-17','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('11',3,1,to_date('2004-05-18','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('12',3,2,to_date('2004-05-19','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('13',3,2,to_date('2004-05-20','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('14',3,3,to_date('2004-05-21','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('15',3,4,to_date('2004-05-22','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('16',3,4,to_date('2004-05-23','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('17',3,4,to_date('2004-05-24','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('18',3,4,to_date('2004-05-25','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('19',3,4,to_date('2004-05-26','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('20',3,4,to_date('2004-05-27','yyyy-mm-dd'));

有多种方式可以达成需求,以下列举四种:

方案一:先分组再内联方案

select
    a.id,
    a.order_no,
    a.shipper_cd,
    to_char(a.create_time,'yyyy-mm-dd') as create_time
from
    peisong a,
    (
    select
        order_no,
        shipper_cd,
        max(create_time) as create_time
    from
        peisong
    group by
        order_no,
        shipper_cd) b
where
    a.order_no = b.order_no and 
    a.shipper_cd = b.shipper_cd and 
    a.create_time = b.create_time
order by
    a.id,
    a.order_no,
    a.shipper_cd

或:

select a.id, a.order_no, a.shipper_cd, to_char(a.create_time,'yyyy-mm-dd') as create_time from peisong a, ( select order_no, shipper_cd, max(create_time) as create_time from peisong group by order_no, shipper_cd) b where a.order_no = b.order_no and a.shipper_cd = b.shipper_cd and a.create_time = b.create_time order by a.id, a.order_no, a.shipper_cd

筛选结果:

        ID   ORDER_NO SHIPPER_CD CREATE_TIM
---------- ---------- ---------- ----------
         1          1          1 2004-05-08
         3          1          2 2004-05-10
         4          1          3 2004-05-11
         6          2          1 2004-05-13
         8          2          2 2004-05-15
        10          2          3 2004-05-17
        11          3          1 2004-05-18
        13          3          2 2004-05-20
        14          3          3 2004-05-21
        20          3          4 2004-05-27

已选择10行。

 如果我们把order_no,shipper_cd相同的记录划分成一组,那么由b表的形成条件可知,无论组内元素有多少条,最终只会选出create_time最大(即最近)的那条,两表内联查询便会在21*10=210的结果集选,因此这种查询方式适合组内数量稍多的情况,比较均衡。

方案二:左连接方案

SELECT
    a.id,
    a.order_no,
    a.shipper_cd,
    to_char(a.create_time,'yyyy-mm-dd') as create_time
from
    peisong a left JOIN 
    peisong b
on
    a.order_no = b.order_no and 
    a.shipper_cd = b.shipper_cd and 
    a.create_time < b.create_time
where
    b.create_time  IS  NULL
order by
    a.id,
    a.order_no,
    a.shipper_cd

或:

SELECT a.id, a.order_no, a.shipper_cd, to_char(a.create_time,'yyyy-mm-dd') as create_time from peisong a left JOIN peisong b on a.order_no = b.order_no and a.shipper_cd = b.shipper_cd and a.create_time < b.create_time where b.create_time IS NULL order by a.id, a.order_no, a.shipper_cd

结果:

        ID   ORDER_NO SHIPPER_CD CREATE_TIM
---------- ---------- ---------- ----------
         1          1          1 2004-05-08
         3          1          2 2004-05-10
         4          1          3 2004-05-11
         6          2          1 2004-05-13
         8          2          2 2004-05-15
        10          2          3 2004-05-17
        11          3          1 2004-05-18
        13          3          2 2004-05-20
        14          3          3 2004-05-21
        20          3          4 2004-05-27

已选择10行。

这个左联SQL的代码稍微有点费解,我在这里赘述两句。

首先可以看看peisong表左联自己,条件是左表时间小于右表的情况。

SQL:

SELECT
    a.id,
    to_char(a.create_time,'yyyy-mm-dd') as actime,
    to_char(b.create_time,'yyyy-mm-dd') as bctime
from
    peisong a left JOIN 
    peisong b
on
    a.order_no = b.order_no and 
    a.shipper_cd = b.shipper_cd and 
    a.create_time < b.create_time
order by
    a.id,
    a.order_no,
    a.shipper_cd

查询结果:

SQL> SELECT
  2      a.id,
  3      to_char(a.create_time,'yyyy-mm-dd') as actime,
  4      to_char(b.create_time,'yyyy-mm-dd') as bctime
  5  from
  6      peisong a left JOIN
  7      peisong b
  8  on
  9      a.order_no = b.order_no and
 10      a.shipper_cd = b.shipper_cd and
 11      a.create_time < b.create_time
 12  order by
 13      a.id,
 14      a.order_no,
 15      a.shipper_cd;

        ID ACTIME     BCTIME
---------- ---------- ----------
         0 2004-05-07 2004-05-08
         1 2004-05-08
         2 2004-05-09 2004-05-10
         3 2004-05-10
         4 2004-05-11
         5 2004-05-12 2004-05-13
         6 2004-05-13
         7 2004-05-14 2004-05-15
         8 2004-05-15
         9 2004-05-16 2004-05-17
        10 2004-05-17
        11 2004-05-18
        12 2004-05-19 2004-05-20
        13 2004-05-20
        14 2004-05-21
        15 2004-05-22 2004-05-23
        15 2004-05-22 2004-05-26
        15 2004-05-22 2004-05-25
        15 2004-05-22 2004-05-27
        15 2004-05-22 2004-05-24
        16 2004-05-23 2004-05-24
        16 2004-05-23 2004-05-27
        16 2004-05-23 2004-05-26
        16 2004-05-23 2004-05-25
        17 2004-05-24 2004-05-27
        17 2004-05-24 2004-05-26
        17 2004-05-24 2004-05-25
        18 2004-05-25 2004-05-27
        18 2004-05-25 2004-05-26
        19 2004-05-26 2004-05-27
        20 2004-05-27

已选择31行。

由上面的查询结果可以看出来,当左表的create_time字段已经是最大时,右表里已经找不出order_no,shipper_cd相同,而create_time字段更大的记录,于是按连接规则补NULL,这个NULL在上面的查询中就是以空档显示的。

通过观察我们就能知道,左表acreate_time字段有数据,右表bcreate_time字段是空的情况,这就是我们想要的记录,于是筛选条件where b.create_time  IS  NULL就出现了。

让我们会看ab两表连接条件“on a.order_no = b.order_no and a.shipper_cd = b.shipper_cd and a.create_time < b.create_time”,如果我们把order_no,shipper_cd相同的记录划到一组,会发现组越大,产生的连接结果就越多,最典型是15到20这一组(共5条记录),通过左联接产生了16条记录(5+4+3+2+1+1),如果有n条,那么就是n*(n+1)/2+1条,也就说说同组数量越大,产生记录会和数量平方成正比,这就是在原先实验中组小基本是一两条时左联接最快,组数量越大越慢的根本原因。综合分析可以得出结论,左联方案只是完成需求,对优化没有多考虑。

方案三:not exists方案

select
    a.id,
    a.order_no,
    a.shipper_cd,
    to_char(a.create_time,'yyyy-mm-dd') as create_time
from
    peisong a
where
      not exists  (
    select
        1
    from
        peisong b
    where
        b.shipper_cd = a.shipper_cd and 
        b.order_no = a.order_no and 
        b.create_time > a.create_time)
order by
    a.id,
    a.order_no,
    a.shipper_cd

或:

select a.id, a.order_no, a.shipper_cd, to_char(a.create_time,'yyyy-mm-dd') as create_time from peisong a where not exists ( select 1 from peisong b where b.shipper_cd = a.shipper_cd and b.order_no = a.order_no and b.create_time > a.create_time) order by a.id, a.order_no, a.shipper_cd

结果:

        ID   ORDER_NO SHIPPER_CD CREATE_TIM
---------- ---------- ---------- ----------
         1          1          1 2004-05-08
         3          1          2 2004-05-10
         4          1          3 2004-05-11
         6          2          1 2004-05-13
         8          2          2 2004-05-15
        10          2          3 2004-05-17
        11          3          1 2004-05-18
        13          3          2 2004-05-20
        14          3          3 2004-05-21
        20          3          4 2004-05-27

Not exists方式会形成n*n的结构,但右边的n只做条件不筛选,然后对左边的n过滤, 最适合按 order_no,shipper_cd分组,而组内元素较多>10的情况,和左联正好相反。

方案四:最好理解又高效的rank分析函数方案

Oracle提供了分析函数rank,它可以按order_no,shipper_cd分组,按create_time逆序排序,然后将组内顺序赋上顺序值。

像这样

select id,rank() over (partition by order_no,shipper_cd order by create_time desc) as seq from peisong

之后我们只要取出seq=1的记录就达成需求了。

最终SQL:

select
id,
order_no,
shipper_cd,
to_char(create_time,'yyyy-mm-dd') as create_time
from peisong where id in (select a.id from (select id,rank() over (partition by order_no,shipper_cd order by create_time desc) as seq from peisong) a where a.seq=1)
order by id

执行结果:

SQL> select
  2      id,
  3      order_no,
  4      shipper_cd,
  5      to_char(create_time,'yyyy-mm-dd') as create_time
  6  from peisong where id in (select a.id from (select id,rank() over (partition by order_no,shipper_cd order by create_time desc) as seq from peisong) a where a.seq=1)
  7  order by id;

        ID   ORDER_NO SHIPPER_CD CREATE_TIM
---------- ---------- ---------- ----------
         1          1          1 2004-05-08
         3          1          2 2004-05-10
         4          1          3 2004-05-11
         6          2          1 2004-05-13
         8          2          2 2004-05-15
        10          2          3 2004-05-17
        11          3          1 2004-05-18
        13          3          2 2004-05-20
        14          3          3 2004-05-21
        20          3          4 2004-05-27

已选择10行。

这种方案由于rank的使用,第二个子查询就抛下了大多数无用数据行,因此效率也不错,还好理解,值得推荐。

可以看出,以上四种SQL文都能达成需求,具体选择哪种方案,需要根据实际数据分布进行实验,再选最合适的一种。

这次的分析应该比上次:https://www.cnblogs.com/heyang78/p/12079017.html 要更深入一些。

END

原文地址:https://www.cnblogs.com/heyang78/p/15206050.html