实验：小表驱动大表速度比较

SQL性能优化的侧重点之一就是用小表驱动大表，今天我做了一次实验。

硬件环境：T440p

软件环境：Win10

实验数据库：Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production

实验表：bigtable含id,name,score,createtime，有百万数据；smalltable含id,name，createtime三个字段，有十万数据。

这些表是如何建表插值的请参考：https://www.cnblogs.com/xiandedanteng/p/12151536.html

需求：先从小表中查出姓名以张开头的姓名，再与大表进行半连接。这样是为了模拟《SQL优化核心思想》（罗炳森黄超等著）P187的内容。否则用select * from bigtable where name like '张%'就好了。

目的：对比常规方案和优化后的小表驱动大表方案的耗时;

常规方案：

select * from bigtable where name in (select name from smalltable where name like '张%')

小表驱动大表方案：

select /*+ leading(big@small) use_nl(big@small,big */ * from bigtable big 

where name in (select /*+ qb_name(small) */ name from smalltable where name like '张%')

比较结果：

#	常规方案耗时	小表驱动大表方案耗时
1	00: 00: 04.81	00: 00: 03.54
2	00: 00: 04.04	00: 00: 03.50
3	00: 00: 04.06	00: 00: 03.47

可以看去小表驱动大表的确实是快些，但优势还是不明显，下面让我把大表扩大些。

我使用以下SQL将bigtable的记录数扩成了两百万，这已经是我可怜的T440p能接受一次性插入的最大数据量了。

 Insert into bigtable
 select rownum,dbms_random.string('*',dbms_random.value(6,20)),dbms_random.value(0,20),sysdate from dual
 connect by level<=2000000
 order by dbms_random.random

比较结果如下：

#	常规方案耗时	小表驱动大表方案耗时
1	00: 00: 07.29	00: 00: 06.75
2	00: 00: 07.80	00: 00: 06.81
3	00: 00: 07.37	00: 00: 06.87

优势更明显些了。

第二次实验再次确认了小表驱动大表的优势。

第三次实验

限于bigtable总量的限制，这回我把大表里含张姓的数据调多了，具体看看：

SQL> select count(*) from smalltable where name like '张%';

  COUNT(*)
----------
       212

已用时间:  00: 00: 00.03

SQL> select count(*) from bigtable where name like '张%';

  COUNT(*)
----------
     48330

已用时间:  00: 00: 00.12

小表张姓数据是两百来条，大表是四万八千条。

再比三次：

#	常规方案	小表驱动大表方案
1	00: 01: 16.40	00: 01: 12.54
2	00: 01: 17.14	00: 01: 15.60
3	00: 01: 15.73	00: 01: 15.30

这把随着数据量的增多，优势反而越来越微弱了！！

忽然想起小表驱动大表后面还有一句，大表走索引，再给大表的name字段加上索引试试：

SQL> create index bigtable_name_index on bigtable(name);

索引已创建。

已用时间:  00: 00: 06.02

再比三次：

#	常规方案	小表驱动大表方案
1	00: 01: 15.62	00: 01: 13.59
2	00: 01: 16.27	00: 01: 16.24
3	00: 01: 17.50	00: 01: 16.58

让人晕菜的结果出来了，原以为加了索引两个方案都快了，而且驱动比常规方案优势明显，结果都没有。估计数据库还是走的全表查询。

让我们看看select * from bigtable where name in (select name from smalltable where name like '张%')的执行计划：

SQL> set autotrace trace exp
SQL> select * from bigtable where name in (select name from smalltable where name like '张%');
已用时间:  00: 00: 00.00

执行计划
----------------------------------------------------------
Plan hash value: 4026363122

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                     |   311 | 50693 |   446   (1)| 00:00:06 |
|   1 |  NESTED LOOPS                |                     |       |       |            |          |
|   2 |   NESTED LOOPS               |                     |   311 | 50693 |   446   (1)| 00:00:06 |
|   3 |    SORT UNIQUE               |                     |   178 | 11036 |   172   (2)| 00:00:03 |
|*  4 |     TABLE ACCESS FULL        | SMALLTABLE          |   178 | 11036 |   172   (2)| 00:00:03 |
|*  5 |    INDEX RANGE SCAN          | BIGTABLE_NAME_INDEX |     3 |       |     2   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID| BIGTABLE            |     2 |   202 |     5   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("NAME" LIKE U'5F20%')
   5 - access("NAME"="NAME")
       filter("NAME" LIKE U'5F20%')

Note
-----
   - dynamic sampling used for this statement (level=2)

select /*+ leading(big@small) use_nl(big@small,big */ * from bigtable big where name in (select /*+ qb_name(small) */ name from smalltable where name like '张%')的执行计划：

SQL> select /*+ leading(big@small) use_nl(big@small,big */ * from bigtable big 

where name in (select /*+ qb_name(small) */ name from smalltable where name like '张%');
已用时间:  00: 00: 00.00

执行计划
----------------------------------------------------------
Plan hash value: 4026363122

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                     |   311 | 50693 |   446   (1)| 00:00:06 |
|   1 |  NESTED LOOPS                |                     |       |       |            |          |
|   2 |   NESTED LOOPS               |                     |   311 | 50693 |   446   (1)| 00:00:06 |
|   3 |    SORT UNIQUE               |                     |   178 | 11036 |   172   (2)| 00:00:03 |
|*  4 |     TABLE ACCESS FULL        | SMALLTABLE          |   178 | 11036 |   172   (2)| 00:00:03 |
|*  5 |    INDEX RANGE SCAN          | BIGTABLE_NAME_INDEX |     3 |       |     2   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID| BIGTABLE            |     2 |   202 |     5   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("NAME" LIKE U'5F20%')
   5 - access("NAME"="NAME")
       filter("NAME" LIKE U'5F20%')

Note
-----
   - dynamic sampling used for this statement (level=2)

最后我们发现执行计划是近乎一致的，看来是Oracle内部将普通方案进行优化的结果，优化的方向就是小表驱动大表。

如果我强制让大表驱动，那么差别不就出来了？

SQL> select /*+ leading(big) */ * from bigtable big where name in (select /*+ qb_name(small) */ name from smalltable where name like '张%');
已用时间:  00: 00: 00.01

执行计划
----------------------------------------------------------
Plan hash value: 3046684681

-----------------------------------------------------------------------------------------
| Id  | Operation          | Name       | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |            |   312 | 50856 |       |  4297   (2)| 00:00:52 |
|*  1 |  HASH JOIN SEMI    |            |   312 | 50856 |  4920K|  4297   (2)| 00:00:52 |
|*  2 |   TABLE ACCESS FULL| BIGTABLE   | 44556 |  4394K|       |  3883   (2)| 00:00:47 |
|*  3 |   TABLE ACCESS FULL| SMALLTABLE |   178 | 11036 |       |   172   (2)| 00:00:03 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("NAME"="NAME")
   2 - filter("NAME" LIKE U'5F20%')
   3 - filter("NAME" LIKE U'5F20%')

Note
-----
   - dynamic sampling used for this statement (level=2)

这一步oralce让大表先筛选一遍，再是小表筛选一遍，两者再做哈希半连接。

看这把它要执行多少时间：

已选择48330行。

已用时间: 00: 00: 09.75

这比小表驱动大表还要快！

再看看终极 sql要跑多久：

select * from bigtable where name like '张%'

已选择48330行。

已用时间: 00: 00: 09.94

看来数据不够复杂，Oracle内部优化器轻易找到了最优方案，这次实验没有达到预期效果。

--END-- 2020年1月5日10点28分

实验：小表驱动大表 速度比较

实验：小表驱动大表速度比较