再度思索：从配送表中选出订单号和配送者相同时的最新记录

在前文 https://www.cnblogs.com/heyang78/p/12079017.html 中，我主要描述了现象，给出了结论，这次试图对过程进行一点思索。

需求：从配送表里，当订单号和配送者一样时，取时间最靠近现在即时间值最大的一条记录。

配送表表结构：

create table peisong(
id int,
order_no int, --订单号
shipper_cd int,--配送者号
create_time timestamp,
primary key(id))

填充实验值：

insert into peisong(id,order_no,shipper_cd,create_time) values('0',1,1,to_date('2004-05-07','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('1',1,1,to_date('2004-05-08','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('2',1,2,to_date('2004-05-09','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('3',1,2,to_date('2004-05-10','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('4',1,3,to_date('2004-05-11','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('5',2,1,to_date('2004-05-12','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('6',2,1,to_date('2004-05-13','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('7',2,2,to_date('2004-05-14','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('8',2,2,to_date('2004-05-15','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('9',2,3,to_date('2004-05-16','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('10',2,3,to_date('2004-05-17','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('11',3,1,to_date('2004-05-18','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('12',3,2,to_date('2004-05-19','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('13',3,2,to_date('2004-05-20','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('14',3,3,to_date('2004-05-21','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('15',3,4,to_date('2004-05-22','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('16',3,4,to_date('2004-05-23','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('17',3,4,to_date('2004-05-24','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('18',3,4,to_date('2004-05-25','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('19',3,4,to_date('2004-05-26','yyyy-mm-dd'));
insert into peisong(id,order_no,shipper_cd,create_time) values('20',3,4,to_date('2004-05-27','yyyy-mm-dd'));

有多种方式可以达成需求，以下列举四种：

方案一：先分组再内联方案

select
    a.id,
    a.order_no,
    a.shipper_cd,
    to_char(a.create_time,'yyyy-mm-dd') as create_time
from
    peisong a,
    (
    select
        order_no,
        shipper_cd,
        max(create_time) as create_time
    from
        peisong
    group by
        order_no,
        shipper_cd) b
where
    a.order_no = b.order_no and 
    a.shipper_cd = b.shipper_cd and 
    a.create_time = b.create_time
order by
    a.id,
    a.order_no,
    a.shipper_cd

或：

select a.id, a.order_no, a.shipper_cd, to_char(a.create_time,'yyyy-mm-dd') as create_time from peisong a, ( select order_no, shipper_cd, max(create_time) as create_time from peisong group by order_no, shipper_cd) b where a.order_no = b.order_no and a.shipper_cd = b.shipper_cd and a.create_time = b.create_time order by a.id, a.order_no, a.shipper_cd

筛选结果：

        ID   ORDER_NO SHIPPER_CD CREATE_TIM
---------- ---------- ---------- ----------
         1          1          1 2004-05-08
         3          1          2 2004-05-10
         4          1          3 2004-05-11
         6          2          1 2004-05-13
         8          2          2 2004-05-15
        10          2          3 2004-05-17
        11          3          1 2004-05-18
        13          3          2 2004-05-20
        14          3          3 2004-05-21
        20          3          4 2004-05-27

已选择10行。

如果我们把order_no，shipper_cd相同的记录划分成一组，那么由b表的形成条件可知，无论组内元素有多少条，最终只会选出create_time最大（即最近）的那条，两表内联查询便会在21*10=210的结果集选，因此这种查询方式适合组内数量稍多的情况，比较均衡。

方案二：左连接方案

SELECT
    a.id,
    a.order_no,
    a.shipper_cd,
    to_char(a.create_time,'yyyy-mm-dd') as create_time
from
    peisong a left JOIN 
    peisong b
on
    a.order_no = b.order_no and 
    a.shipper_cd = b.shipper_cd and 
    a.create_time < b.create_time
where
    b.create_time  IS  NULL
order by
    a.id,
    a.order_no,
    a.shipper_cd

或：

SELECT a.id, a.order_no, a.shipper_cd, to_char(a.create_time,'yyyy-mm-dd') as create_time from peisong a left JOIN peisong b on a.order_no = b.order_no and a.shipper_cd = b.shipper_cd and a.create_time < b.create_time where b.create_time IS NULL order by a.id, a.order_no, a.shipper_cd

结果：

        ID   ORDER_NO SHIPPER_CD CREATE_TIM
---------- ---------- ---------- ----------
         1          1          1 2004-05-08
         3          1          2 2004-05-10
         4          1          3 2004-05-11
         6          2          1 2004-05-13
         8          2          2 2004-05-15
        10          2          3 2004-05-17
        11          3          1 2004-05-18
        13          3          2 2004-05-20
        14          3          3 2004-05-21
        20          3          4 2004-05-27

已选择10行。

这个左联SQL的代码稍微有点费解，我在这里赘述两句。

首先可以看看peisong表左联自己，条件是左表时间小于右表的情况。

SQL：

SELECT
    a.id,
    to_char(a.create_time,'yyyy-mm-dd') as actime,
    to_char(b.create_time,'yyyy-mm-dd') as bctime
from
    peisong a left JOIN 
    peisong b
on
    a.order_no = b.order_no and 
    a.shipper_cd = b.shipper_cd and 
    a.create_time < b.create_time
order by
    a.id,
    a.order_no,
    a.shipper_cd

查询结果：

SQL> SELECT
  2      a.id,
  3      to_char(a.create_time,'yyyy-mm-dd') as actime,
  4      to_char(b.create_time,'yyyy-mm-dd') as bctime
  5  from
  6      peisong a left JOIN
  7      peisong b
  8  on
  9      a.order_no = b.order_no and
 10      a.shipper_cd = b.shipper_cd and
 11      a.create_time < b.create_time
 12  order by
 13      a.id,
 14      a.order_no,
 15      a.shipper_cd;

        ID ACTIME     BCTIME
---------- ---------- ----------
         0 2004-05-07 2004-05-08
         1 2004-05-08
         2 2004-05-09 2004-05-10
         3 2004-05-10
         4 2004-05-11
         5 2004-05-12 2004-05-13
         6 2004-05-13
         7 2004-05-14 2004-05-15
         8 2004-05-15
         9 2004-05-16 2004-05-17
        10 2004-05-17
        11 2004-05-18
        12 2004-05-19 2004-05-20
        13 2004-05-20
        14 2004-05-21
        15 2004-05-22 2004-05-23
        15 2004-05-22 2004-05-26
        15 2004-05-22 2004-05-25
        15 2004-05-22 2004-05-27
        15 2004-05-22 2004-05-24
        16 2004-05-23 2004-05-24
        16 2004-05-23 2004-05-27
        16 2004-05-23 2004-05-26
        16 2004-05-23 2004-05-25
        17 2004-05-24 2004-05-27
        17 2004-05-24 2004-05-26
        17 2004-05-24 2004-05-25
        18 2004-05-25 2004-05-27
        18 2004-05-25 2004-05-26
        19 2004-05-26 2004-05-27
        20 2004-05-27

已选择31行。

由上面的查询结果可以看出来，当左表的create_time字段已经是最大时，右表里已经找不出order_no，shipper_cd相同，而create_time字段更大的记录，于是按连接规则补NULL，这个NULL在上面的查询中就是以空档显示的。

通过观察我们就能知道，左表acreate_time字段有数据，右表bcreate_time字段是空的情况，这就是我们想要的记录，于是筛选条件where b.create_time IS NULL就出现了。

让我们会看ab两表连接条件“on a.order_no = b.order_no and a.shipper_cd = b.shipper_cd and a.create_time < b.create_time”，如果我们把order_no，shipper_cd相同的记录划到一组，会发现组越大，产生的连接结果就越多，最典型是15到20这一组（共5条记录），通过左联接产生了16条记录（5+4+3+2+1+1），如果有n条，那么就是n*(n+1)/2+1条，也就说说同组数量越大，产生记录会和数量平方成正比，这就是在原先实验中组小基本是一两条时左联接最快，组数量越大越慢的根本原因。综合分析可以得出结论，左联方案只是完成需求，对优化没有多考虑。

方案三：not exists方案

select
    a.id,
    a.order_no,
    a.shipper_cd,
    to_char(a.create_time,'yyyy-mm-dd') as create_time
from
    peisong a
where
      not exists  (
    select
        1
    from
        peisong b
    where
        b.shipper_cd = a.shipper_cd and 
        b.order_no = a.order_no and 
        b.create_time > a.create_time)
order by
    a.id,
    a.order_no,
    a.shipper_cd

或:

select a.id, a.order_no, a.shipper_cd, to_char(a.create_time,'yyyy-mm-dd') as create_time from peisong a where not exists ( select 1 from peisong b where b.shipper_cd = a.shipper_cd and b.order_no = a.order_no and b.create_time > a.create_time) order by a.id, a.order_no, a.shipper_cd

结果：

        ID   ORDER_NO SHIPPER_CD CREATE_TIM
---------- ---------- ---------- ----------
         1          1          1 2004-05-08
         3          1          2 2004-05-10
         4          1          3 2004-05-11
         6          2          1 2004-05-13
         8          2          2 2004-05-15
        10          2          3 2004-05-17
        11          3          1 2004-05-18
        13          3          2 2004-05-20
        14          3          3 2004-05-21
        20          3          4 2004-05-27

Not exists方式会形成n*n的结构，但右边的n只做条件不筛选，然后对左边的n过滤，最适合按 order_no,shipper_cd分组，而组内元素较多>10的情况，和左联正好相反。

方案四：最好理解又高效的rank分析函数方案

Oracle提供了分析函数rank，它可以按order_no,shipper_cd分组，按create_time逆序排序，然后将组内顺序赋上顺序值。

像这样

select id，rank() over (partition by order_no,shipper_cd order by create_time desc) as seq from peisong

之后我们只要取出seq=1的记录就达成需求了。

最终SQL：

select
id,
order_no,
shipper_cd,
to_char(create_time,'yyyy-mm-dd') as create_time
from peisong where id in (select a.id from (select id，rank() over (partition by order_no,shipper_cd order by create_time desc) as seq from peisong) a where a.seq=1)
order by id

执行结果：

SQL> select
  2      id,
  3      order_no,
  4      shipper_cd,
  5      to_char(create_time,'yyyy-mm-dd') as create_time
  6  from peisong where id in (select a.id from (select id，rank() over (partition by order_no,shipper_cd order by create_time desc) as seq from peisong) a where a.seq=1)
  7  order by id;

        ID   ORDER_NO SHIPPER_CD CREATE_TIM
---------- ---------- ---------- ----------
         1          1          1 2004-05-08
         3          1          2 2004-05-10
         4          1          3 2004-05-11
         6          2          1 2004-05-13
         8          2          2 2004-05-15
        10          2          3 2004-05-17
        11          3          1 2004-05-18
        13          3          2 2004-05-20
        14          3          3 2004-05-21
        20          3          4 2004-05-27

已选择10行。

这种方案由于rank的使用，第二个子查询就抛下了大多数无用数据行，因此效率也不错，还好理解，值得推荐。

可以看出，以上四种SQL文都能达成需求，具体选择哪种方案，需要根据实际数据分布进行实验，再选最合适的一种。

这次的分析应该比上次：https://www.cnblogs.com/heyang78/p/12079017.html 要更深入一些。

END