oracle 验证流水存在性火箭试优化

在生产中经常遇到“select * from tbl_IsExist where date=?”的SQL，经与开发人员沟通得知此SQL是验证流水存在性，若不存在则插入，若存在退出。

前台根据list的size值来判断符合条件的数据是否存在。我们构建测试表分析验证流水存在性的优化方向。

1、构建测试表

--创建测试表
drop table tbl_IsExist purge;
create table tbl_IsExist
(id number not null ,in_create date,in_remark varchar2(1000),
primary key(id)
);
--插入测试数据
insert into tbl_IsExist
select rownum,to_date(trunc(dbms_random.value(to_number(to_char(sysdate-5,'J')),to_number(to_char(sysdate,'J')))),'J'),rpad('*',1000,'*') from dual connect by rownum<20000;
--根据生产创建索引
SQL> create index ind_isexist_create on tbl_isexist(in_create);

Index created.

View Code

由于生产环境date的数据是均匀分配的，对于in_create字段的处理符合生产规则。执行生产中捕获的SQL，查看其执行计划。

2、发现验证存在性的SQL执行计划是全表扫描，代价很高：

--设置输出参数
SQL> set linesize 2000
SQL> set autotrace on;
--执行查询脚本
SQL> select * from tbl_isexist where in_create=TO_DATE('2020-09-24 00:00:00','YYYY-MM-DD HH24:MI:SS');

3962 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2271097464

---------------------------------------------------------------------------------
| Id  | Operation              | Name            | Rows    | Bytes | Cost (%CPU)| Time    |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                |  3269 |  1672K|   777   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL    | TBL_ISEXIST     |  3269 |  1672K|   777   (1)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("IN_CREATE"=TO_DATE(' 2020-09-24 00:00:00', 'syyyy-mm-dd
          hh24:mi:ss'))

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


Statistics
----------------------------------------------------------
      0      recursive calls
      0      db block gets
      3098  consistent gets
      0      physical reads
      0      redo size
      90787 bytes sent via SQL*Net to client
      3362  bytes received via SQL*Net from client
      266      SQL*Net roundtrips to/from client
      0      sorts (memory)
      0      sorts (disk)
      3962  rows processed

View Code

表上有供选择的索引，但是SQL并没有执行，此SQL的执行计划是全表扫描，cost为777，逻辑读3098数据块。若表中的记录很多，全表扫描的执行计划将耗用大量的逻辑读。
3、开始优化，验证流水存在性SQL

①优化思路1：

表中有可用索引，优化器未采纳是否由于代价高，进一步验证：

用hint强制走索引，查看执行计划
SQL> select /*+ INDEX(T IND_ISEXIST_CREATE)*/* from tbl_isexist T where in_create=TO_DATE('2020-09-24 00:00:00','YYYY-MM-DD HH24:MI:SS');

3962 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1453123986

----------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |  3269 |  1672K|  2004   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TBL_ISEXIST          |  3269 |  1672K|  2004   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                    | IND_ISEXIST_CREATE |  3269 |          |    10   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("IN_CREATE"=TO_DATE(' 2020-09-24 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


Statistics
----------------------------------------------------------
      0          recursive calls
      0          db block gets
      2648      consistent gets
      0          physical reads
      0          redo size
      4126831   bytes sent via SQL*Net to client
      3397      bytes received via SQL*Net from client
      266          SQL*Net roundtrips to/from client
      0          sorts (memory)
      0          sorts (disk)
      3962      rows processed

View Code

强制走索引cost是2004比走全表扫描代价更大，oracle是基于代价的优化器，故舍弃索引扫描的执行计划

②优化思路2：

在行数上做文章，验证流水存在性是否需要将所有记录都返回。因为在java中根据list的值不为0判断存在性，设想满足条件的第一条记录存在，岂不已经验证存在性。

现改写SQL查看其执行计划：

--只取第一条符合条件的记录，加限制条件rownum=1
SQL> select * from tbl_isexist where in_create=TO_DATE('2020-09-24 00:00:00','YYYY-MM-DD HH24:MI:SS') and rownum=1;

Execution Plan
----------------------------------------------------------
Plan hash value: 1281543153

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name               | Rows  | Bytes | Cost (%CPU)| Time    |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                    |    1  |   524 |    2   (0)| 00:00:01 |
|*  1 |  COUNT STOPKEY                        |                    |       |       |           |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED | TBL_ISEXIST        |  3269 |  1672K|    2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                   | IND_ISEXIST_CREATE |       |       |    1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM=1)
   3 - access("IN_CREATE"=TO_DATE(' 2020-09-24 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


Statistics
----------------------------------------------------------
      0  recursive calls
      0  db block gets
      3  consistent gets
      0  physical reads
      0  redo size
   1725  bytes sent via SQL*Net to client
    471  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      1  rows processed

我们由取全部符合条件的记录，改为只取第一行符合条件的记录，cost由777降为2，逻辑读由3098降为3，提速几百倍，甚是欣喜。高兴之余，试想是否还有可优化空间。分析执行计划：INDEX RANGE SCAN花去了1个cost，TABLE ACCESS BY INDEX ROWID BATCHED也就是回表花去了1个cost（2-1）。

回表的原理：查询的列不在索引中，需要通过索引的rowid定位表记录，把需要的字段取出，展示给客户端。

③优化思路3：

不回表降低cost，在SQL返回的列上做文章。只显示索引“IND_ISEXIST_CREATE”的列in_create，既能降低cost也能满足需求，完全可以。

SQL> select in_create from tbl_isexist where in_create=TO_DATE('2020-09-24 00:00:00','YYYY-MM-DD HH24:MI:SS') and rownum=1;

Execution Plan
----------------------------------------------------------
Plan hash value: 3010836204

----------------------------------------------------------------------------------------
| Id  | Operation          | Name                | Rows  | Bytes | Cost (%CPU)  | Time    |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                     |     1 |     9 |     1     (0)| 00:00:01|
|*  1 |  COUNT STOPKEY     |                     |       |       |              |         |
|*  2 |   INDEX RANGE SCAN | IND_ISEXIST_CREATE  |  3269 | 29421 |     1     (0)| 00:00:01|
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM=1)
   2 - access("IN_CREATE"=TO_DATE(' 2020-09-24 00:00:00', 'syyyy-mm-dd
          hh24:mi:ss'))

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)


Statistics
----------------------------------------------------------
      0  recursive calls
      0  db block gets
      2  consistent gets
      0  physical reads
      0  redo size
    556  bytes sent via SQL*Net to client
    479  bytes received via SQL*Net from client
      2  SQL*Net roundtrips to/from client
      0  sorts (memory)
      0  sorts (disk)
      1  rows processed

至此优化结束，代价从777降到1，逻辑读从3098降到2，可谓是火箭试的提速！

其他问题请关注目录:https://www.cnblogs.com/handhead/