新三种求数列中值SQL之效率再比拼

在 https://www.cnblogs.com/xiandedanteng/p/12677688.html 中我列举了三种求中值方案，其中日本人MICK的做法因为不适用于二百万结果集而放弃，取而代之是新方案一。

新方案一：

经过思考后我又得出了一种中值的新解法，那就是利用排序后正向序列和反向序列交叉点为中值区的原理，如果两个序列相减小于等于一则求所在区域的均值即可。

SQL：

select avg(a.salary) from
(select salary,dense_rank() over (order by salary asc) as seq,dense_rank() over (order by salary desc) as revseq from tb_employee order by salary) a where abs(a.seq-a.revseq)<=1

解释计划：

SQL> select avg(a.salary) from
  2  (select salary,dense_rank() over (order by salary asc) as seq,dense_rank() over (order by salary desc) as revseq from tb_employee order by salary) a where abs(a.seq-a.revseq)<=1;
已用时间:  00: 00: 00.00

执行计划
----------------------------------------------------------
Plan hash value: 9035349

---------------------------------------------------------------------------------------------
| Id  | Operation             | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |             |     1 |    39 |       |  8850   (2)| 00:01:47 |
|   1 |  SORT AGGREGATE       |             |     1 |    39 |       |            |          |
|*  2 |   VIEW                |             |  2000K|    74M|       |  8850   (2)| 00:01:47 |
|   3 |    WINDOW SORT        |             |  2000K|  7812K|    22M|  8850   (2)| 00:01:47 |
|   4 |     WINDOW SORT       |             |  2000K|  7812K|    22M|  8850   (2)| 00:01:47 |
|   5 |      TABLE ACCESS FULL| TB_EMPLOYEE |  2000K|  7812K|       |  3045   (2)| 00:00:37 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(ABS("A"."SEQ"-"A"."REVSEQ")<=1)

从Cost看是目前最快的。

执行时间：

SQL> select avg(a.salary) from
  2  (select salary,dense_rank() over (order by salary asc) as seq,dense_rank() over (order by salary desc) as revseq from tb_employee order by salary) a where abs(a.seq-a.revseq)<=1;

AVG(A.SALARY)
-------------
         5500

已用时间:  00: 00: 02.46

从运行时间上看排第二。

经局部优化后两种方案SQL如下：

原有方案二：

select avg(b.salary) from
(select a.salary,abs(a.seq-a.revseq) as diff from (select id,salary,dense_rank() over (order by salary asc) as seq,dense_rank() over (order by salary desc) as revseq from tb_employee order by salary) a ) b
where b.diff=(select min(c.diff) from
(select abs(a.seq-a.revseq) as diff from (select dense_rank() over (order by salary asc) as seq,dense_rank() over (order by salary desc) as revseq from tb_employee order by salary) a ) c)

解释计划：

SQL> select avg(b.salary) from
  2  (select a.salary,abs(a.seq-a.revseq) as diff from (select id,salary,dense_rank() over (order by salary asc) as seq,dense_rank() over (order by salary desc) as revseq from tb_employee order by salary) a ) b
  3  where b.diff=(select min(c.diff) from
  4  (select abs(a.seq-a.revseq) as diff from (select dense_rank() over (order by salary asc) as seq,dense_rank() over (order by salary desc) as revseq from tb_employee order by salary) a ) c);
已用时间:  00: 00: 00.00

执行计划
----------------------------------------------------------
Plan hash value: 3874635296

-----------------------------------------------------------------------------------------------
| Id  | Operation               | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |             |     1 |    39 |       | 19969   (2)| 00:04:00 |
|   1 |  SORT AGGREGATE         |             |     1 |    39 |       |            |          |
|*  2 |   VIEW                  |             |  2000K|    74M|       | 11119   (2)| 00:02:14 |
|   3 |    WINDOW SORT          |             |  2000K|    19M|    38M| 11119   (2)| 00:02:14 |
|   4 |     WINDOW SORT         |             |  2000K|    19M|    38M| 11119   (2)| 00:02:14 |
|   5 |      TABLE ACCESS FULL  | TB_EMPLOYEE |  2000K|    19M|       |  3045   (2)| 00:00:37 |
|   6 |    SORT AGGREGATE       |             |     1 |    26 |       |            |          |
|   7 |     VIEW                |             |  2000K|    49M|       |  8850   (2)| 00:01:47 |
|   8 |      WINDOW SORT        |             |  2000K|  7812K|    22M|  8850   (2)| 00:01:47 |
|   9 |       WINDOW SORT       |             |  2000K|  7812K|    22M|  8850   (2)| 00:01:47 |
|  10 |        TABLE ACCESS FULL| TB_EMPLOYEE |  2000K|  7812K|       |  3045   (2)| 00:00:37 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(ABS("A"."SEQ"-"A"."REVSEQ")= (SELECT MIN(ABS("A"."SEQ"-"A"."REVSEQ"))
              FROM  (SELECT DENSE_RANK() OVER ( ORDER BY "SALARY") "SEQ",DENSE_RANK() OVER ( ORDER
              BY INTERNAL_FUNCTION("SALARY") DESC ) "REVSEQ" FROM "TB_EMPLOYEE" "TB_EMPLOYEE" ORDER
              BY "SALARY") "A"))

执行时间：

SQL> select avg(b.salary) from
2 (select a.salary,abs(a.seq-a.revseq) as diff from (select id,salary,dense_rank() over (order by salary asc) as seq,dense_rank() over (order by salary desc) as revseq from tb_employee order by salary) a ) b
3 where b.diff=(select min(c.diff) from
4 (select abs(a.seq-a.revseq) as diff from (select dense_rank() over (order by salary asc) as seq,dense_rank() over (order by salary desc) as revseq from tb_employee order by salary) a ) c);

AVG(B.SALARY)
-------------
5500

已用时间: 00: 00: 04.56

原有方案三：

select avg(a.salary) from
(select 
salary,row_number() over (order by salary asc) as seq,row_number() over (order by salary desc) as revseq,(select ceil(count(*)/2) from tb_employee) as ceil 
from tb_employee) a
where a.seq=a.ceil or a.revseq=a.ceil

解释计划：

SQL> select avg(a.salary) from
  2  (select
  3  salary,row_number() over (order by salary asc) as seq,row_number() over (order by salary desc) as revseq,(select ceil(count(*)/2) from tb_employee) as ceil
  4  from tb_employee) a
  5  where a.seq=a.ceil or a.revseq=a.ceil;
已用时间:  00: 00: 00.00

执行计划
----------------------------------------------------------
Plan hash value: 3541675306

-----------------------------------------------------------------------------------------------
| Id  | Operation              | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |              |     1 |    52 |       | 14656   (3)| 00:02:56 |
|   1 |  SORT AGGREGATE        |              |     1 |    52 |       |            |          |
|   2 |   SORT AGGREGATE       |              |     1 |       |       |            |          |
|   3 |    INDEX FAST FULL SCAN| SYS_C0012264 |  2000K|       |       |  1474   (2)| 00:00:18 |
|*  4 |   VIEW                 |              |  2000K|    99M|       | 14656   (3)| 00:02:56 |
|   5 |    WINDOW SORT         |              |  2000K|  7812K|    22M| 14656   (3)| 00:02:56 |
|   6 |     WINDOW SORT        |              |  2000K|  7812K|    22M| 14656   (3)| 00:02:56 |
|   7 |      TABLE ACCESS FULL | TB_EMPLOYEE  |  2000K|  7812K|       |  3045   (2)| 00:00:37 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("A"."SEQ"="A"."CEIL" OR "A"."REVSEQ"="A"."CEIL")

消耗时间：

SQL> ;
  1  select avg(b.salary) from
  2   (select
  3   salary,row_number() over (order by salary asc) as seq,row_number() over (order by salary desc) as revseq,(select ceil(count(*)/2) from tb_employee) as ceil
  4  - filter("A"."SEQ"="A"."CEIL" OR "A"."REVSEQ"="A"."CEIL")；
  5*  where a.seq=a.ceil or a.revseq=a.ceil
SQL> select avg(a.salary) from
  2  (select
  3  salary,row_number() over (order by salary asc) as seq,row_number() over (order by salary desc) as revseq,(select ceil(count(*)/2) from tb_employee) as ceil
  4  from tb_employee) a
  5  where a.seq=a.ceil or a.revseq=a.ceil;

AVG(A.SALARY)
-------------
         5498

已用时间:  00: 00: 01.98

而单查表数量的解释计划大家也参考看一下：

SQL> select count(*) from tb_employee;
已用时间:  00: 00: 00.01

执行计划
----------------------------------------------------------
Plan hash value: 2866911338

------------------------------------------------------------------------------
| Id  | Operation             | Name         | Rows  | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |              |     1 |  1474   (2)| 00:00:18 |
|   1 |  SORT AGGREGATE       |              |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| SYS_C0012264 |  2000K|  1474   (2)| 00:00:18 |
------------------------------------------------------------------------------

最后的大比拼表格：

方案	Cost	耗时
新方案一	8850	2.46秒
原有方案二	19946	4.56秒
原有方案三	14646	1.98

截至目前，方案三以较高效率胜出，新方案一其实也不错，原有方案三排第三位。

--2020年4月12日--